
Your AI skills are worth less than you think - secabeen
https://www.kdnuggets.com/2019/01/your-ai-skills-worth-less-than-you-think.html
======
m0zg
Stopped reading at "manual tuning is going the way of dodo, and good
riddance". The guy clearly has no idea whatsoever what he's talking about
(figures: he was an SRE at Google, nowhere close to any ML/DL efforts).

Hyperparameter tuning is an area of active research and a dark art of sorts,
and it will remain that way for the foreseeable future for one simple reason:
hyperparameters are interdependent and _data-dependent_ as well, and nobody,
not even Google with its TPUs, has the compute to tune them "automatically" to
any meaningful extent every time. What you see with the various "AutoML"
efforts is merely transfer learning: refining existing models on the
customer's data. You can learn how to do that in one evening with no prior DL
experience if you're a good coder, and in 2-3 evenings if you aren't. Point
is, it's not a difficult problem; just follow a tutorial.
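The "refine an existing model on the customer's data" recipe is roughly: freeze a pretrained feature extractor and train only a new head. A minimal sketch, where a fixed random projection stands in for the pretrained network (a real pipeline would use something like a torchvision ResNet) so the example is self-contained:

```python
# Transfer-learning shape: frozen featurizer + freshly trained head.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# "Customer data": 200 samples, 64 raw features, binary labels.
X_raw = rng.normal(size=(200, 64))
y = (X_raw[:, 0] + X_raw[:, 1] > 0).astype(int)

# Frozen "pretrained" feature extractor: its weights are never updated.
W_frozen = rng.normal(size=(64, 32))
features = np.maximum(X_raw @ W_frozen, 0)  # ReLU features

# Only the new classification head is fit on the customer's data.
head = LogisticRegression(max_iter=1000).fit(features, y)
print(round(head.score(features, y), 2))
```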

That's great, and that's how smart researchers build practical models, but
it's not really "AutoML" in the sense that there's no hyperparameter tuning
going on (as far as I know) and no new problems are being solved. It's just a
classifier: anybody can do that with common off-the-shelf tools and
pre-trained checkpoints.

Now achieving state-of-the-art results, solving novel tasks for which there
aren't any ready-made solutions, doing DL efficiently, figuring out creative
ways to get labeled data, semi-supervised or few-shot methods, and so on:
that's where it's at right now. And the comp has never been better if you
know what you're doing.

~~~
Zephyr314
To be fair, the hyperparameter tuning behind these AutoML systems is getting
fairly robust. Google bases theirs on Vizier [0]. The Amazon SageMaker group
has people from the GPyOpt project [1]. There are also tons of open-source
projects out there to help with non-enterprise projects [2] [3], as well as
stand-alone companies that help with this explicitly for enterprises [4]
(caveat: I am a founder).

Increasingly, I think more time will be spent on the creative/bespoke aspects
you mention later in your post, like making sure that you are building a
system that actually achieves some business value (vs. just getting a better
academic-oriented metric result). Hyperparameter tuning is basically
high-dimensional, non-convex optimization over functions that are
time-consuming and expensive to sample. Hand tuning is a terrible way to
approach this, and, as you point out, it is different for each problem.
Experts can leverage their domain expertise and the unique aspects of their
data, models, and applications in much better ways.

[0]: https://www.kdd.org/kdd2017/papers/view/google-vizier-a-service-for-black-box-optimization

[1]: https://github.com/SheffieldML/GPyOpt

[2]: https://github.com/Yelp/MOE

[3]: https://github.com/hyperopt/hyperopt

[4]: https://sigopt.com
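The loop these tools implement is sequential model-based search: spend a small budget of expensive evaluations and use past results to pick the next candidate, rather than hand tuning or exhaustive search. A toy sketch, where a cheap quadratic stands in for a real validation loss and a crude "perturb the best point" rule stands in for the surrogate models tools like Vizier, GPyOpt, and hyperopt actually fit:

```python
# Toy sequential search: explore randomly, then exploit the best point.
import random

random.seed(0)

def expensive_objective(lr, reg):
    # Stand-in for "train a model, return validation loss".
    return (lr - 0.1) ** 2 + (reg - 0.01) ** 2

history = []
for trial in range(20):
    if len(history) < 5:
        # Explore: a few random samples first.
        cand = (random.uniform(0, 1), random.uniform(0, 1))
    else:
        # Exploit: perturb the best point seen so far (a crude stand-in
        # for the surrogate model real Bayesian-optimization tools fit).
        best = min(history, key=lambda h: h[1])[0]
        cand = (best[0] + random.gauss(0, 0.05),
                best[1] + random.gauss(0, 0.05))
    history.append((cand, expensive_objective(*cand)))

best_params, best_loss = min(history, key=lambda h: h[1])
print(best_params, best_loss)
```

With 20 evaluations this already closes in on the optimum; a grid at the same resolution would need far more "training runs."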

~~~
m0zg
Are you sure this is actually being used for "AutoML"-type services? All of
the mentioned methods require a parameter search, which is computationally
infeasible in a "quick" AutoML use case, and expensive in the cases where you
actually need it. That is, you more or less run several training sessions in
parallel, and learn from whichever performs best when choosing the next
parameters. You don't do a full grid search (that's completely infeasible
most of the time); at best you tweak only a few parameters, and you don't do
it every time you train. Hyperparameters aren't just the learning rate and
weight decay; they also include the size and number of layers, where, when,
and by how much to quantize, the structure of the network, the parameters of
pooling, etc. I'd say we're still pretty early in the game with all of that,
especially when it comes to efficient architectures that demonstrate high
accuracy.
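A back-of-envelope count shows why a full grid over those knobs is infeasible: even coarse grids over a handful of axes multiply out fast. The axes and grid sizes below are illustrative, not taken from any real system:

```python
# Combinatorial explosion of grid search over a few hyperparameter axes.
from math import prod

grid_sizes = {
    "learning_rate": 5,   # candidate values per axis (coarse!)
    "weight_decay": 5,
    "num_layers": 6,
    "layer_width": 6,
    "pooling": 3,
    "quantization": 4,
}
trials = prod(grid_sizes.values())
print(trials)  # 10800 full training runs for this tiny grid alone
```

Each trial here is a complete training run, which is why smarter search (or no per-run search at all) wins in practice.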

~~~
Zephyr314
I agree that this isn't as common for most end-to-end "AutoML" systems that
take a CSV, do light feature engineering/combinations, pipe it into a random
forest / GBDT, and then output a model. For many of those approaches there are
fewer parameters to tune, and you don't get as much lift from tuning them
right. Often it is more about quantity of models and ease of use than
quality. I do think quality will increasingly matter, though, so some tuning
will start to be used as the volume, variety, or complexity of the models in
these systems increases, or as the value of the models themselves starts to
increase.
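That "CSV in, model out" shape is easy to sketch with scikit-learn: light preprocessing piped into a gradient-boosted model with mostly-default hyperparameters. Synthetic data stands in for the customer's CSV; this is an illustration of the pipeline shape, not any vendor's actual system:

```python
# Minimal end-to-end "AutoML-ish" pipeline: preprocess -> GBDT -> model.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for the uploaded CSV: 300 rows, 10 feature columns.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

automl_ish = make_pipeline(
    StandardScaler(),                            # light feature handling
    GradientBoostingClassifier(random_state=0),  # default hyperparameters
)
automl_ish.fit(X, y)
print(round(automl_ish.score(X, y), 2))
```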

However, for more complex model pipelines where an expert is probably
involved, there are lots of tools to help with it, and it is quickly becoming
automated and less of a "dark art." Some of these tools are built into
frameworks like Google's/Amazon's, some are built into open-source platforms
(like Katib in Kubeflow), and others are entire companies building model
experimentation platforms (like SigOpt). Many of these can handle everything
from traditional hyperparameters like the learning rate to architecture
parameters to tuning feature embeddings, all at once [1]. I agree with the
original author that playing with parameters and doing trial-and-error
optimization of hyper-, architecture, or feature-transformation parameters
will largely stop happening in the manual way it is done today. All of these
methods are orders of magnitude quicker than standard brute-force approaches.

Otherwise, I think you are completely right that there are a ton of aspects of
modeling that require domain expertise and nuance beyond pulling a model off
the shelf. I think a lot of that comes down to picking the model, picking the
data that matters, picking the objective that actually solves the problem for
the task at hand, etc. I believe less of that will be high-D non-convex
optimization done manually.

[1]: https://aws.amazon.com/blogs/machine-learning/fast-cnn-tuning-with-aws-gpu-instances-and-sigopt/

------
pizza
In a gold rush, sell pickaxes.

bitcoin boom => GPUs

AI boom => TPUs (and GPUs, too), cloud processing platforms, and, generally,
AI specific hardware platforms

~~~
partingshots
What are your opinions on the TPUs Google uses versus more traditional GPUs
from say Nvidia for machine learning?

~~~
pizza
Honestly, I’ve never used them. Cloud notebooks running on virtual GPU
servers seem awesome to me in theory, though.

