
Google's AutoML: Cutting Through the Hype - redman25
http://www.fast.ai/2018/07/23/auto-ml-3/
======
minimaxir
At Google Next (which happened after this post was published), Google
explicitly positioned its AutoML product as a _beginner_ approach to model
building, meant to get companies up and running with machine learning at low
time/cost, not an end-all-be-all approach.

Here's a recording from a session titled "How to Get Started Injecting AI Into
Your Applications" which illustrates case studies for AutoML:
[https://www.youtube.com/watch?v=O7iT1INWrqo](https://www.youtube.com/watch?v=O7iT1INWrqo)

~~~
raverbashing
Yeah, the last thing a beginner should be trying to do is come up with an NN
from scratch to solve their problem.

If you have a problem that's not amenable to existing architectures, then by
all means try to tune your own, but that's an advanced problem.

------
tensor
His comments about the "dangers" of academic research and PhDs are incredibly
strange. The very examples he uses as possibly superior alternatives to the
neural search approach also come from PhDs and academia. Possibly the fact that
he runs a machine learning trade school is leading to this out-of-place
criticism.

~~~
cromwellian
Also, it cherry-picks examples of failed startups or ideas, but ignores all of
the ML that Google did turn into viable end-user experiences: Smart Reply,
WaveNet, Translate, Speech Recognition, RankBrain, Photo/Image Search, Retinal
diagnosis, Waymo (soon), and numerous other components that are used in
Android and the Assistant.

The whole point of venture capital or R&D is that you don't know what will
work, and you expect 40 failures for 1 success. She seems to think that the
way science progresses is that you iterate on a PhD thesis until it's "Done"
and verified, and then it becomes applied. But most PhD theses, even those
that survive peer review, end up inapplicable, unused, or forgotten. Not
everyone publishes a General Relativity paper. A ton of PhD papers are junk; a
good number don't even have reproducible results.

The only way to know if something will be successful is to try it and let it
succeed or fail in the marketplace.

~~~
jph00
> _Smart Reply, WaveNet, Translate, Speech Recognition, RankBrain,
> Photo/Image Search, Retinal diagnosis, Waymo (soon), and numerous other
> components that are used in Android and the Assistant._

These aren't examples of just productionizing a PhD thesis - they're
thoughtfully designed products that solve a real problem.

Unfortunately we've watched as many ML PhD graduates launch startups which are
little more than an API wrapped around the key algorithm from their thesis.
These startups nearly always fail, because they don't actually address a
market need.

> _The only way to know if something will be successful is to try it and let
> it succeed or fail in the marketplace._

There are many ways to estimate potential market size, product-market fit,
etc. ahead of time. They're not perfect, but they're a lot better than nothing.

~~~
anjc
> These aren't examples of just productionizing a PhD thesis

His point is that sometimes research bears fruit, and sometimes it doesn't.
All universities (in my country) now have metrics for how much research _must_
successfully bear commercial fruit, or else they lose funding from the
government and the EU. As such, they have commercialisation pipelines that
funnel viable commercial research into products. However, even these funding
bodies don't expect that anything more than a fraction of research will be
commercialisable. That isn't the point of research.

> we've watched as many ML PhD graduates launch startups which are little more
> than an API wrapped around the key algorithm from their thesis. These
> startups nearly always fail, because they don't actually address a market
> need.

Why do you care what someone does with their PhD research anyway?

------
ebikelaw
Many of Google's advances in machine learning really do owe their existence to
just throwing computers at the problem. Jeff Dean et al. realized several years
back that there were techniques in the literature that didn't seem to work all
that well in the small, but they might work better with a crapton of
parameters, if you trained the model with unprecedented amounts of CPU time.
They were right: throwing megawatts at the problem is effective.

The line on this forum is that Google's product is its customers' data, but
that's never been right. Google's product is, and has always been, dirt-cheap
computing. They have a really large number of computers, they are building
more right now, and they want you to use them. The surplus of computer power
within Google is what makes Googlers sit around thinking "sure, that was more
CPU time than anyone has ever used for anything before, but what if we used
100x that much?"

~~~
denzil_correa
The paper titled "The Unreasonable Effectiveness of Data" (2009) by Halevy,
Norvig and Pereira from Google mentions this.

[https://static.googleusercontent.com/media/research.google.c...](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35179.pdf)

------
MrQuincle
So, what should we do instead? The author gives some recommendations:

"Research to make deep learning easier to use has a huge impact, making it
faster and simpler to train better networks. Examples of exciting discoveries
that have now become standard practice are:

* Dropout allows training on smaller datasets without over-fitting.

* Batch normalization allows for faster training.

* Rectified linear units help avoid gradient explosions.

Newer research to improve ease of use includes:

* The learning rate finder makes the training process more robust.

* Super convergence speeds up training, requiring fewer computational resources.

* “Custom heads” for existing architectures (e.g. modifying ResNet, which was initially designed for classification, so that it can be used to find bounding boxes or perform style transfer) allow for easier architecture reuse across a range of problems.

None of the above discoveries involve bare-metal power; instead, all of them
were creative ideas of ways to do things differently."

These seem quite fundamental (except for the last one, which sounds like a
favorite domain-adaptation strategy of the author's). The learning rate finder,
for instance, is a simple enough idea to sketch in a few lines; see below.
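
Roughly: train while growing the learning rate exponentially each batch and
record the loss; the useful range is where the loss falls fastest, before it
blows up. A minimal sketch, assuming a PyTorch model, data loader, and loss
function (fastai's actual implementation differs in the details):

    import torch

    def lr_find(model, loader, loss_fn, lr_min=1e-7, lr_max=10.0, num_steps=100):
        opt = torch.optim.SGD(model.parameters(), lr=lr_min)
        mult = (lr_max / lr_min) ** (1.0 / num_steps)
        lr, lrs, losses = lr_min, [], []
        for step, (x, y) in enumerate(loader):
            for group in opt.param_groups:
                group['lr'] = lr            # raise the LR for this batch
            loss = loss_fn(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
            lrs.append(lr)
            losses.append(loss.item())
            lr *= mult                      # exponential schedule
            if step >= num_steps or loss.item() != loss.item():  # NaN: diverged
                break
        return lrs, losses                  # plot these, pick the steepest descent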

I'm in industry myself, but it's quite hard to come up with fundamental
strategies. In particular, I'm trying to merge nonparametric Bayesian models
with deep networks. By endowing the latent variables in autoencoders with
richer priors, we might see improvements. At the same time, we need better
control variates to do inference in these more complex models. See my blog
post on the overlooked topic of control variates:
[https://www.annevanrossum.com/blog/2018/05/26/random-
gradien...](https://www.annevanrossum.com/blog/2018/05/26/random-gradients/).
If we really want to be creative, we need people from academia on board.
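
For readers unfamiliar with control variates: the trick subtracts a correlated
quantity with known mean to reduce the variance of a Monte Carlo estimator
(score-function gradient estimators use the same idea). A toy sketch:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(100_000)

    f = np.exp(x)   # estimate E[f(X)] for X ~ N(0,1); true value is exp(0.5)
    h = x           # control variate with known mean E[h] = 0

    c = np.cov(f, h)[0, 1] / np.var(h)   # near-optimal coefficient
    cv = f - c * (h - 0.0)               # same expectation, lower variance

    print(f.mean(), f.var())             # plain Monte Carlo
    print(cv.mean(), cv.var())           # control-variate estimate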

------
robius
We're not going to get very far letting machines guess what we want by making
models we can't interpret.

Explainability is important, and critical in applications where lives are on
the line.

Explainability that amounts to guessing what's happening inside a black-box
model won't do either. Nothing less than complete transparency will do: the
model, and why it's doing what it's doing, expressed as source code that makes
sense to humans. A full-on model audit. No guessing.

I can think of only one company that's attempting to do this, and it's not
anyone you hear working on explainability, including DARPA.

~~~
tensor
If the visual cortex were a machine model, people would be complaining about
how we can't explain it and how it's a dangerous black box. They'd probably
tout the many optical illusions as demonstrations of this danger.

Yet we don't demand that other humans explain how their visual cortex works.
There is a double standard here.

~~~
AstralStorm
That's a red herring. The visual cortex is one thing; intelligence and
decision-making/planning are another. One works very differently from the
other.

------
syllogism
This has to be one of the worst comment threads I've seen here.

First of all: Fast.ai are a non-profit. There's no ulterior motive here. I
think a lot of the commenters here are feeling clever "looking for an angle",
when there honestly is none.

Secondly, I really can't think of anyone who should have accrued more benefit
of the doubt than Rachel Thomas. It's just silly to take the point of view expressed
here as insincere or motivated reasoning. None of this means you have to agree
with the points raised, the predictions made, or the conclusions drawn. Of
course. But the snide tone of many of the comments here is really discordant.

Finally, it's a little... revealing that there's so much discussion of Jeremy
here, including comments that seem to assume he wrote the article. I don't
even know what to say about that.

------
anjc
1) "Google’s AutoML highlights some of the dangers of having an academic
research lab embedded in a for-profit corporation"

2) "We can remove the biggest obstacles to using deep learning ... by making
deep learning easier to use"

3) "Research to make deep learning easier to use has a huge impact"

Because this article rants against academics, but is written by an academic,
it reads as inconsistent and disingenuous. Wild guess... AutoML and Google are
worrying competitors to whatever their business is?

------
montenegrohugo
I know I sound like a fanboy, but fast.ai and Jeremy Howard are just so level-
headed and "practical". I highly recommend that anybody who wants to get into
ML/DL try out the two courses they offer. Really fantastic way of teaching,
and of applying these techniques to real life. In particular, the emphasis on
tabular data (often ignored in academia, but very much used in industry) was
very helpful.

~~~
stochastic_monk
They’re also doing some great research. If all you’re familiar with is their
(truly stellar) didactic work, give them a little more attention.

------
mark_l_watson
I have been doing general AI and machine learning since the 1980s. I now
manage a team that specializes in deep learning.

My bet is that most of data engineering/data science/designing neural network
architectures will be commoditized in 5 to 10 years. Maybe sooner.

------
QML
Isn’t the argument between transfer learning and neural architecture search
the same problem posed by the “no free lunch” theorem, which essentially
states that specialized algorithms will beat generalized algorithms on
specific tasks, and vice versa?
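
For reference, the Wolpert–Macready formulation says roughly that, averaged
over all possible objective functions, no search algorithm outperforms any
other. In their notation, for any two algorithms a_1 and a_2:

    \sum_f P(d_m^y \mid f, m, a_1) = \sum_f P(d_m^y \mid f, m, a_2)

where the sum runs over all objective functions f and d_m^y is the sequence of
cost values observed after m evaluations. The theorem says nothing about how
large the gap is on the structured problems we actually care about, though.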

~~~
riku_iki
Very few people write assembler code today, because compilers are good enough
most of the time. The same can happen if we get a generic and robust enough ML
tool.

~~~
QML
Could you expand on that analogy? I'm not sure a one-model-fits-all approach
is comparable to a one-optimizer-fits-all one. Wouldn't that be an argument
against transfer learning, since we don't have a universal programming
language, and thus multiple compilers exist?

------
halflings
"In evaluating Google’s claims, it’s valuable to keep in mind Google has a
vested financial interest in convincing us that the key to effective use of
deep learning is more computational power, because this is an area where they
clearly beat the rest of us. If true, we may all need to purchase Google
products. On its own, this doesn’t mean that Google’s claims are false, but
it’s good to be aware of what financial motivations could underlie their
statements."

Excuse the snark, but:

"In evaluating FastAI's claims, it’s valuable to keep in mind FastAI has a
vested financial interest in convincing us that the key to effective use of
deep learning is more machine learning experts, because ML education is their
business model. If true, we may all need to purchase FastAI courses. On its
own, this doesn’t mean that FastAI’s claims are false, but it’s good to be
aware of what financial motivations could underlie their statements."

I can think of thousands of ways that Google could increase the computational
power required that would be much easier than the AutoML effort (e.g. simply
recommending ridiculously deep models that take 100x as long to train without
because it just works. A lot of the things included in later parts of this
series are very useful (learning rate search, etc.) and decrease the
computational power required, but most people just want to drop a dataset and
pick the type of model (say, multiclass image classification) and leave the
rest for machines to optimize.
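
For what it's worth, that drop-a-dataset workflow already exists in open-source
form. A rough sketch using AutoKeras, a neural architecture search library (not
Google's product; API details may differ by version):

    import autokeras as ak
    from tensorflow.keras.datasets import mnist

    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    # Pick the task type, drop in the data, let the search choose the rest.
    clf = ak.ImageClassifier(max_trials=10)   # tries up to 10 architectures
    clf.fit(x_train, y_train, epochs=5)
    print(clf.evaluate(x_test, y_test))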

~~~
voidray
> most people just want to drop a dataset and pick the type of model (say,
> multiclass image classification) and leave the rest for machines to
> optimize.

I think the disconnect here is that you can reuse existing architectures and
get state-of-the-art performance without running something like AutoML. It's
not clear that creating a bespoke architecture for your specific problem is
always better, let alone always a good use of your resources.
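
As an illustration, reusing a pretrained architecture typically amounts to a
few lines. A minimal sketch with torchvision (num_classes is whatever your
task needs):

    import torch.nn as nn
    from torchvision import models

    num_classes = 10                          # e.g. your dataset's label count

    # Reuse an ImageNet-pretrained backbone; only the new head is untrained.
    model = models.resnet50(pretrained=True)
    for p in model.parameters():
        p.requires_grad = False               # freeze the backbone
    model.fc = nn.Linear(model.fc.in_features, num_classes)  # task-specific head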

~~~
halflings
Frankly, I don't think this is obvious. Case in point: sequential models
(granted, probably not yet included in the first batch of AutoML). There are
so many ways to model the problem, and tweaking each of them in a way that
makes sense for your dataset takes a lot of work.

I've built models that worked fairly well, only to have a colleague build a
separate model with +10% prAUC by virtue of adding some additional mechanism
(say, attention, a different RNN cell, more units, etc.).

I'll also say that this is aimed at people who are unfamiliar with ML and
would have trouble finding and re-implementing the state of the art in their
specific field.
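
To give a flavor of how small those mechanism tweaks are in code, here's a
rough PyTorch sketch of a sequence classifier where the cell type and an
attention pooling layer are swappable (illustrative only, not any particular
production model):

    import torch
    import torch.nn as nn

    class SeqClassifier(nn.Module):
        def __init__(self, vocab_size, dim=128, cell=nn.GRU, attention=False):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, dim)
            self.rnn = cell(dim, dim, batch_first=True)  # try nn.LSTM instead
            self.attn = nn.Linear(dim, 1) if attention else None
            self.out = nn.Linear(dim, 2)

        def forward(self, x):
            h, _ = self.rnn(self.emb(x))          # h: (batch, seq, dim)
            if self.attn is not None:
                w = torch.softmax(self.attn(h), dim=1)
                pooled = (w * h).sum(dim=1)       # attention-weighted pooling
            else:
                pooled = h[:, -1]                 # last hidden state
            return self.out(pooled)

Each variant (GRU vs. LSTM, attention vs. last-state pooling, a wider dim) is a
one-line change, but finding the combination that actually helps on your data
is the part that takes the work.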

------
h4b4n3r0
Meh. As far as I'm concerned, it does _exactly_ what it says on the tin --
democratizes access to custom ML. Will you get a state of the art model out of
it? Probably not. Will you be able to train a decent model and deploy it at
scale fairly easily? Probably yes. That's the problem they're solving: getting
people to use ML/AI without hiring a $400K/yr research scientist.

~~~
YeGoblynQueenne
I'm not a businessperson and I try to avoid talking as if I were one, but I
don't see what the, well, business incentive is for Google to "democratize
access to custom ML". Accordingly, I don't see them even trying to do that.

Rather, it seems to me that their policy so far amounts to an attempt at
platform lock-in. If you want to do Deep Learning like Google does, you
basically have to use their tools (yes, that's TensorFlow I'm talking about)
and their data (in the form of pre-trained models) and eventually pay them for
CPU or GPU time (you can also pay their competitors, of course).

In fact, I'd take this a step further and say that the whole Deep Learning
hype is starting to sound like a total marketing trick, just to get more
people to use Google's stuff, in the vain hopes of achieving their
performance. However, to beat Google at their own game, with their own tools
and their own data, running their models on their own computers... that does
not sound like a winning proposition.

~~~
h4b4n3r0
Well, the tools are free and open source, and other pretrained models are
available for download. All that’s missing is an expensive PhD or two to make
sense of it all, so that option is also available. You can do ML/DL on Google
Cloud too in a completely portable way, using, for instance, Facebook’s
PyTorch and pretrained models for it. You will discover, however, that it’s
much harder and much more expensive to get anything usable that way, without
Google doing most of the yak shaving for you, even though PyTorch is much
easier to use than TF.

------
benfortuna
ML = Machine Learning = Markup Language = confusion

~~~
IGI-111
Have some empathy for the loyal aficionados of the ML programming language.

Poor folks must live on a constant emotional rollercoaster reading this site.

