
Why is machine learning ‘hard’? - Dawny33
http://ai.stanford.edu/~zayd/why-is-machine-learning-hard.html
======
mattlondon
I think the problem is that we don't really understand ML properly yet.

Like picking hyperparameters - time and time again I've asked
experts/trainers/colleagues: "How do I know what type of model to use? How
many layers? How many nodes per layer? Dropout or not?" etc. And the answer
is always along the lines of "just try a load of stuff and pick the one that
works best".

To me, that feels weird and worrying. It's like we don't yet understand ML
well enough to say definitively, for a given data set, what sort of model
we'll need.

This can lead us down the debugging black hole TFA talks about, since we
appear to have zero clue about why we chose something, so debugging might
ultimately just be "oops - we chose 3 layers of 10, 15, and 11 nodes,
instead of 3 layers of 10, 15 and _12_ nodes! D'oh! Let's start training
again!"

It really grates on me to think about this, considering how much maths and
proofs and algorithms get thrown at you when being taught ML, only to be told,
when it comes to actually doing something, that it's all down to "intuition"
(guessing).

And yeah as others have said - data :-)

~~~
scoot
> the answer is always along the lines of "just try a load of stuff and pick
> the one that works best"

Forgive the naive question, but why couldn't ML figure out its own best
"stuff"?

~~~
chongli
Because then you've got to figure out how to tell the machine what _best_
means and that's what you were setting out to do in the first place.
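To be fair, you can punt on the philosophy and just define "best" operationally as a held-out validation score, which is what hyperparameter search loops do. A toy sketch in numpy - the ridge model, data, and grid here are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y depends linearly on x, plus a little noise.
x = rng.normal(size=(200, 3))
y = x @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
x_tr, x_va, y_tr, y_va = x[:150], x[150:], y[:150], y[150:]

def fit_ridge(x, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam*I) w = X^T y."""
    return np.linalg.solve(x.T @ x + lam * np.eye(x.shape[1]), x.T @ y)

def val_mse(w):
    """Mean squared error on the held-out validation split."""
    return float(np.mean((x_va @ w - y_va) ** 2))

# "Best" is simply whatever scores lowest on held-out data.
grid = [0.001, 0.1, 10.0, 1000.0]
best_lam, best_score = min(
    ((lam, val_mse(fit_ridge(x_tr, y_tr, lam))) for lam in grid),
    key=lambda pair: pair[1],
)
print(best_lam, best_score)
```

Of course this just pushes the question down a level: now you have to trust that validation score on that data split is what "best" means.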

------
blt
I think to some extent, this kind of difficulty occurs with any numerical
programming. With typical software engineering the bugs are usually in logic.
All but the hardest problems can be diagnosed with a visual debugger. With
numerical code, though, you usually can't look at the state of a few discrete-
valued variables and a stack trace to figure out what went wrong. "Why is this
matrix singular?" can take days to answer. You spend a lot of time staring at
the code, comparing it to the math on paper, trying to visualize high-
dimensional intermediate data, etc. Continuous math can be a lot harder to
reason about than discrete.

~~~
nabla9
It's even worse than that.

Speaking from personal experience in the private sector:

You go through the whole process and present your results to a customer. They
make the required changes, see a few percent improvement in the bottom line,
and are happy. A year later you pick up the same code for another project and
discover a small error in the data collection script that completely
invalidates everything.

Garbage-in, garbage-out errors can have an internal consistency that survives
cross-validation and produces similar results with many different algorithms
and models. Random changes in the real world can produce actual gains.

Standard machine learning datasets rarely suffer from this problem because
the results and semantics of the problem are known beforehand. If you have an
original black-box problem, it's possible to do random search and improve by
accident.

------
eva1984
So there is a greedy strategy for approaching problems with ML:

1. Start with a VANILLA model, a proven one, to establish a baseline you can
fall back on. For example, in deep learning, start with fully-connected nets,
then a vanilla CNN, adding BNs and ReLUs, then residual connections, etc.

2. Do not spend too much time tuning hyperparameters, especially in deep
learning: once you change your algorithm, a.k.a. the network structure,
everything changes.

3. Add complexity as you go. Once you have established a solid baseline, you
can start adding fancier ideas to your stack, and you will find that the
fancy ideas are improvements over the already-working ones, and they are not
that hard to add.

4. One important tip: as time goes on and you keep changing your algorithm,
those changes might not be happy with each other. So reduction is also very
important. Rethink your approach from time to time and take away stuff that
doesn't fit anymore.

5. Look At Your Data. Garbage in, garbage out. Couldn't be more true. Really.
Look at your data, or at least a sample of it, and see whether you yourself,
as the most intelligent being as of yet, can make sense of it. If you cannot,
then you probably need to improve its quality.

Anyway, ML is a very complex field developing like crazy, but I don't feel
the methodology for tackling it is any different from that for other complex
problems. It is an iterative process, starting from simple, proven solutions
and moving to something greater, piece by piece. Watch, think, then improve.
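Step 1 can be sketched in a few lines - a toy comparison of a majority-class baseline against the simplest possible model, on made-up data (the dataset and the one-feature "model" are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up binary classification data: the label mostly follows feature 0.
n = 1000
x = rng.normal(size=(n, 4))
y = (x[:, 0] + 0.3 * rng.normal(size=n) > 0).astype(int)

# Vanilla baseline you can always fall back on: predict the majority class.
majority = int(y.mean() >= 0.5)
baseline_acc = float(np.mean(y == majority))

# Simplest non-trivial model: threshold a single feature at zero.
simple_acc = float(np.mean(y == (x[:, 0] > 0)))

print(f"baseline: {baseline_acc:.2f}, simple model: {simple_acc:.2f}")
```

Anything fancier you add later has to beat `simple_acc`, or it goes back out of the stack.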

------
jupp0r
There are also lots of architectural problems with machine learning components
to consider, beautifully summarized in "Machine Learning: The High Interest
Credit Card of Technical Debt" by Sculley et al.
[http://research.google.com/pubs/pub43146.html](http://research.google.com/pubs/pub43146.html)

------
BickNowstrom
I actually think Machine Learning is relatively easy. There are a lot of
resources, the community is very open, state-of-the-art tools are available,
and all it needs to get incrementally better is trying out more stuff on
different data sets.

I worked in SEO before, which had far more elements of "black magic". Perhaps
SEO helps with the transition to ML, because you are basically reverse
engineering a model (Google's search engine) / crafting input to get a higher
ranked output. It's feature engineering, experimentation, and debugging all-
in-one.

And front-end development of the old days... debugging old javascript or IE6
render bugs makes ML debugging pale in comparison. You had to make a broken
model work, without being able to repair it.

As for the long debugging cycles in ML: John Langford coined "sub-linear
debugging" - output enough intermediate information to quickly know if you
introduced a major bug or hit upon a significant improvement [1]. Machine
learning competitions are won not so much by skill as by the teams iterating
faster and more efficiently: those who try more (failed) experiments hit upon
more successful experiments. No neural net researcher should let all nets
finish training before drawing conclusions/estimates about the learning
process.
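The schedule behind sub-linear debugging is simple: report a running average at doubling example counts, so a gross bug is visible after dozens of examples instead of millions. A toy sketch - the loss stream here is fake; only the reporting schedule is the point:

```python
# Report a running average loss at doubling example counts.
# The loss stream below is a stand-in for real per-example losses.
losses = [1.0 / (1 + 0.01 * i) for i in range(10000)]

total, next_report, reports = 0.0, 1, []
for n, loss in enumerate(losses, start=1):
    total += loss
    if n == next_report:
        reports.append((n, total / n))
        print(f"after {n:>5} examples: avg loss {total / n:.4f}")
        next_report *= 2  # 1, 2, 4, 8, ... so output stays logarithmic
```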

Sure, the ML field is relatively new, and computer programming has a longer
history of proper debugging and testing. It _is_ difficult to do monitoring on
feedback-looped models running in production, yet no more difficult than
control theory ;). And proper practices are being developed as we speak [2].
The author will probably write a randomization script to avoid malordered
samples automatically in the future.
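Such a randomization script is tiny - e.g. in numpy, shuffling features and labels with one shared permutation, shown here on toy arrays:

```python
import numpy as np

rng = np.random.default_rng(42)

# Pretend the data arrived sorted by label - the "malordered" case.
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0] * 5 + [1] * 5)

# One shared permutation keeps each row of X aligned with its label.
perm = rng.permutation(len(y))
X_shuf, y_shuf = X[perm], y[perm]

print(y_shuf)
```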

[1] [http://www.machinedlearnings.com/2013/06/productivity-is-
abo...](http://www.machinedlearnings.com/2013/06/productivity-is-about-not-
waiting.html)

[2]
[http://research.google.com/pubs/pub43146.html](http://research.google.com/pubs/pub43146.html)

------
aflam
The long and convoluted debugging cycle for machine learning really hurts my
faith in ML models. This issue - with some practical advice - was the center
of this interview (disclaimer: author).
[https://shapescience.xyz/blog/interview-data-science-
methodo...](https://shapescience.xyz/blog/interview-data-science-methodology/)

I'm convinced we lack decent tools for ML debugging: what could they be?

------
Matthias247
I think once you have the need to go deep enough into a topic they all get
hard.

Debugging and testing are also hard in all things that are somehow related to
realtime or concurrency, e.g. OS development, embedded firmware, network
stacks, etc. For these things you often also need to know about math, physics,
statistics, electronics, hardware and software architecture, etc.

Game engine development is also hard because you should also know about most
of this stuff to really find the most efficient solutions.

------
du_bing
That's right, machine learning requires knowledge of so many fields, so if
any problem occurs, the developer has to do so many checks to find the
problem and optimize it.

~~~
curiousgal
Exactly why I despise "ML Crash course" and "Learn ML in x hours" kind of
courses.

~~~
dzhiurgis
It gets worse. There's YouTube channel with uploads like:

Build a Neural Net in 4 Minutes

Build an Antivirus in 5 Min

Build a Self Driving Car in 5 Min

~~~
walrus
Have you watched any of them? I just watched "Build a Self Driving Car in 5
Min" and the content was good. My only complaint is that his demo used end-to-
end learning, which isn't how most self-driving cars actually work (but he
acknowledges this).

~~~
Drdrdrq
Link:
[https://www.youtube.com/watch?v=hBedCdzCoWM](https://www.youtube.com/watch?v=hBedCdzCoWM)

------
partykid92
One big dimension here: the "implementation error" can easily be debugged.
Gradients can be checked numerically. The model can be checked to work by
looking at the optimality conditions (not just whether the loss function goes
down). This shouldn't be an issue for anyone from a traditional coding
background.
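For instance, checking an analytic gradient against central finite differences takes only a few lines; here is a generic numpy sketch for a least-squares loss, not tied to any particular framework:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 5))
y = rng.normal(size=50)
w = rng.normal(size=5)

def loss(w):
    """0.5 * ||Xw - y||^2"""
    r = X @ w - y
    return 0.5 * float(r @ r)

def analytic_grad(w):
    # d/dw of 0.5*||Xw - y||^2 is X^T (Xw - y).
    return X.T @ (X @ w - y)

# Central finite differences, one coordinate at a time.
eps = 1e-5
num_grad = np.zeros_like(w)
for i in range(len(w)):
    e = np.zeros_like(w)
    e[i] = eps
    num_grad[i] = (loss(w + e) - loss(w - e)) / (2 * eps)

# Relative error with an absolute floor, so near-zero components don't blow up.
ref = analytic_grad(w)
rel_err = float(np.max(np.abs(num_grad - ref) / (np.abs(ref) + 1.0)))
print(rel_err)  # a correct gradient gives a tiny error here
```

If the error is large in one coordinate, you know exactly which part of the gradient code to stare at.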

------
peatfreak
Why would you expect it to be easy?

~~~
Kirth
I've met some people who believe that the more modern tools (such as
Tensorflow) magically require little human input, so that you do not need to
know/understand the mathematics and statistics. Not sure where they get this
idea.

Everyone wants to do machine learning, but nobody seems to want to learn
statistics.

~~~
an_account
How would you recommend learning ML? I've been interested but don't know where
to start.

~~~
blahi
Not who you asked, but:

Statistics in Plain English.

Regression Analysis by Gilman

Elements of Statistical Learning.

You need to throw some matrix algebra and calc 1 & 2 somewhere in between -
certainly before ESL. It also requires effort: you can't simply read the
books and go through the examples. You will be stuck on a concept on many
occasions, and you will battle it out until, after much googling and reading
of additional papers, you finally get it.

After those 3 books, you've got the basics.

~~~
redtexture
Fuller title on "Regression Analysis by Gilman"? I cannot find such a title
author combination online.

Perhaps?: "Data Analysis Using Regression and Multilevel/Hierarchical Models"
by Andrew Gelman & Jennifer Hill

~~~
blahi
Huh, I typo-ed his name and fudged the name in my head. Sorry about that.

That's the book.

------
Dzugaru
Even if you develop an "intuition" for known tasks (like classification),
there are so many problems that are not tackled yet, where no one has any
"intuition" at all. Common sense very often doesn't work there (in high-
dimensional spaces ;)).

For example, I only recently stumbled upon the "Explaining and Harnessing
Adversarial Examples" paper - and it completely changed my perception of my
current work in computer vision.

------
js8
I think it is hard because of
[https://en.wikipedia.org/wiki/No_free_lunch_theorem](https://en.wikipedia.org/wiki/No_free_lunch_theorem)

From that it follows there is no single "good" algorithm, and you need to
have and exploit domain knowledge in order to succeed.

------
norswap
> Machine learning often boils down to the art of developing an intuition for
> where something went wrong (or could work better) when there are many
> dimensions of things that could go wrong (or work better).

I'm not a practitioner, but I always thought this was the main challenge. Uses
of ML are rarely "right" or "wrong" per se, but they rely on intuition to get
a model that "works" in a practical sense.

There is no royal road to machine learning: you can't decide you are going to
make an algorithm that detects bad comments (as determined by human consensus)
and then just go make an implementation that you can reason out to be correct,
the way you could prove a graph algorithm correct. Trial-and-error and hard-
to-transcribe intuition are baked into the process.

(I'd love to get some insider insight on this comment!)

------
bitL
I will be a little sarcastic - given that we use sub-optimal/locally-optimal
algorithms everywhere in ML due to time complexity, why would you expect
nice/predictable results? It's more like a miracle if you find something that
works; otherwise you will be hitting the usual hard problems from optimization
and end up in a "catch-as-catch-can" situation where even Monte Carlo
randomness is a good guess. And way too many people assume ML is just applied
statistics, keeping their minds in that frame and missing out on the
large-data ML capabilities where statistics is irrelevant and you can directly
ask and find answers to many fundamental questions in your dataset.

------
ramblenode
The diagrams are pretty misleading. One can craft a space with any number of
arbitrary dimensions, but that's kind of meaningless until the space is
populated with data. Certainly the likelihood of a bug is not uniformly
distributed across the space, and certainly the density of bugs within a space
varies greatly depending on the problem. I imagine the average kernel
developer's 2-D space is both very dense and has greater spread than the 4-D
space of many ML engineers.

------
jorgemf
In software you can trace the program and detect which instruction is not
doing what it is supposed to do. In machine learning there is no program to
trace; it is not a set of instructions with a purpose - the whole thing
either works or it doesn't. To discover what could be failing, you need deep
knowledge of a lot of stuff (maths, statistics, CS) to figure out what is
wrong. And sometimes the answer is that the problem doesn't have a solution.

~~~
fnl
It is very much possible to trace your calculations and quite literally debug
your models. But, as in computer science, it is hard to find the data
scientist who can actually do that rather than just copy-paste tutorial code
from some blog post and then wonder what isn't working...

------
strictfp
People don't use it simply because it's not what they signed up for. It's not
obvious that a software engineer will enjoy being a data scientist. I for one
think it's tedious, and I don't enjoy spending the time and effort collecting
the necessary data to solve my problems the ML way.

------
DrNuke
Generally speaking, being hot in the media does not help: walking the walk is
way harder than talking the talk.

------
godmodus
Because of the preprocessing, and the need to choose the functions that'll do
the approximation - the process itself is semi-automatic, not fully
automatic. An ANN's inner nodes are specific functions that need parameter
tweaking (after choosing the right ones, that is); support vector machines
come in different kinds for different data, etc.

And those two things are very domain-specific, so you need to do a lot of
homework first, and debugging later.

------
edblarney
I've worked in computational linguistics:

1) It's 'hard' because you need a lot of 'training data' in order to train
models etc.. It's hard to get.

2) 'AI' type interfaces represent a whole new kind of UI challenge. For
'predictive typing' for example, you can optimize an algorithm so that it does
better for 90% of the US population, but then it gets 'worse' for the
remaining 10%. So it's a paradox. This can have weird effects.

For example, if you have an app in the app-store, you may leave the settings
so that it's 'broadly optimal'. You get ok stars.

If you then make it 'better' for those 90%, you might get a little boost in
ratings, but you get 1 and 0 star ratings from the 10% for whom it's a sub-par
experience. This can destroy your product.

Anyhow - 'there is no right answer' often in AI, and setting expectations can
be extremely difficult.

And all of that has nothing even to do with CS.

~~~
yomly
I appreciate that you've given a pretty top-line overview, but in the 90/10
example, if that 10% can be characterised and/or clustered, can the algorithm
be optimised for both groups? I appreciate that that's not always possible
and can lead to a lot of engineering overhead - curious what your thoughts
are, though...

~~~
edblarney
It's a very good point.

Yes, often it is possible to determine where the user belongs in that 90/10
setting, but it can take a lot of time in order to be 'pretty sure'. You need
a lot of 'user interaction' in order to make that assessment.

The 90/10 rule can broadly apply to things like culture: certain Latino
Americans speak/write very differently. A lot of 'le' and 'la' (gendered) in
there as well as a whole different set of proper names and colloquialisms.

But it can take some time to really establish if someone is 'latino' from
their writing.

Even harder: some people type more precisely, some people type more loosely.
You can actually adjust the probability spectrum of a predictive keyboard to
match someone's style. But get this: people's style changes all the time! I
noticed that when I'm tired, I type like I'm drunk. Or if I'm busy etc.. So
there's even variation in style that makes it difficult.

It's a really hard thing to do.

~~~
edblarney
I should add:

You can see 'massively decreasing returns to complexity' in these domains.

Meaning that you can do 'pretty good' with some basic algorithms.

For the next 'bump in performance' you need some complex code.

After that - you really start to have 10x larger models, or crazy complex
engineering just to move the needle.

It creates a completely different set of 'Product Management' rules. It's kind
of fun, unless you're a struggling startup trying to figure this out on the
fly :)

Usually, someone comes along with a new approach which changes the game.

As I understand it 'Neural Networks' i.e. 'Deep Learning' style AI has changed
everything voice related quite a lot.

And also - different business approaches can change the game. Google has
access to zillions of properly transcribed audio phrases. This is the
'golden asset' that can underpin a really great voice recognition engine.
Google voice is even better than the old industry standard - Nuance - in many
scenarios, and my hunch is that it's the size of their training data that has
given them the edge - at least partly.

~~~
yomly
>You can 'massively decreasing returns to complexity' in these domains.

This is a really concise expression of the sentiments you've just laid out,
which I've been looking for - so thanks for that!

Really like your insight on Google; I think it's spot on.

Re. 'Product Management' rules - I would love to know more about this. Do you
keep a blog?

------
tmptmp
Beautifully written and insightful.

>>After much trial and error I eventually learned that this is often the case
of a training set that has not been correctly randomized and is a problem when
you are using stochastic gradient algorithms that process the data in small
batches.

Take this single term from the above sentence - "stochastic gradient
algorithms" - it represents three key areas: statistics, calculus and CS.

These three things are very complex even when studied in isolation. For ML,
you must be able to juggle these 3 fireballs effectively. No surprise it's
much, much more difficult than many other software engineering problems.

~~~
ttub
No offense intended, but what is your background?

I'm a researcher with a physics/stats PhD, and if a colleague approached me
and said "stochastic gradient algorithms" entails three highly complex areas
of scientific knowledge, I would be stunned and assume an undergrad English
major had stumbled into our lab.

Just because you find something extremely challenging doesn't mean it is
inherently challenging. Considering what a lot of people in my field are
struggling with, your example is absolutely trivial. You might want to adjust
your ego downwards a bit.

~~~
tmptmp
No offense taken. I feel humbled. Indeed, your argument supports my view.

I have deep and very high regard for the people who are able to apply ML to
fields like DNA analysis or NLP which can take the dreaded "Turing test".

I stand nowhere in the ML arena, but I tried once and got a good shock: how
hellishly difficult ML can get, and how quickly. I really feel humbled. If
anything, I learnt to appreciate the width and depth of human brain
capabilities. It seems entirely magical to me now how on earth my brain
processes/understands things as complex as this very paragraph. Before some
exposure to ML, I couldn't have appreciated this.

>>Just because you find something extremely challenging, doesn't mean it is
inherently challenging.

Agreed. I never claimed that anyway. But for the kinds of problems ML is
being applied to, the state-of-the-art "analyzable algorithms" (like those
finding approximate near-optimal solutions for TSP) are far from trivial. In
addition, we must realize that the ML solution must "beat" these algorithms
hands-down in "non-trivial" cases. All this makes ML extremely difficult.

I agree that for real-world (and not necessarily state-of-the-art) ML
applications, you have to handle many more fields in addition to these 3. All
I am saying is that even these three things, taken together, are very complex
to handle.

edit: typo

~~~
adriand
I know a simple upvote may (and perhaps ought to) suffice instead of writing
this, but I feel compelled to comment on how impressed I am that you responded
to "You might want to adjust your ego downwards a bit" with such class and
humility. Truly refreshing, truly commendable. I'm using it as a learning
experience, because my initial reaction on reading that comment was highly
negative and I sincerely doubt whether I would have been able to summon the
kind of response you did.

On a morning where I woke up feeling anxious and worried after the events of
Monday, this was a small but appreciated little reminder that there's still
hope.

------
atomical
I'm getting a 404.

~~~
a_bonobo
Me too, the whole blog 404s

Here's the cached version:

[https://webcache.googleusercontent.com/search?q=cache:https:...](https://webcache.googleusercontent.com/search?q=cache:https://ai.stanford.edu/~zayd/why-
is-machine-learning-hard.html)

------
dschiptsov
Because most of the models are flawed or wrong.

------
fucking_idiot
Mostly because the implementation is really tough - mostly lots of matrices
and calculus. I recommend using sklearn.

