
Things I wish I knew earlier about Machine Learning - ColinWright
https://peadarcoyle.wordpress.com/2015/12/23/three-things-i-wish-i-knew-earlier-about-machine-learning/
======
abrichr
Also see (Sculley 2014):

Machine Learning: The High-Interest Credit Card of Technical Debt:

[https://static.googleusercontent.com/media/research.google.c...](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43146.pdf)

~~~
michaelsbradley
I could have sworn I saw this linked from the HN front-page earlier today, but
couldn't find it when I looked a few minutes ago (dug back through more than
several pages under both "new" and "news").

------
varelse
I would almost state "Clean datasets (like MNIST) considered harmful" because
without exception the data scientists I meet who work with them just shovel
real data blindly into ML code and kvetch when they don't beat simpler
existing approaches right out of the gate.

And without fail so far, just cleaning up the data (feature scaling, removing
input errors, etc.) changes this. I vacillate on building an automagic tool to
do this for them, but IMO such a tool would send tsunamis of technical debt up
their data pipelines as they got even sloppier about data, until the tool was
overwhelmed by stuff it couldn't detect. I'd much rather accept the lesser of
two evils: them getting nowhere until they realize this isn't magic.
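For the curious, a minimal sketch of the kind of cleanup meant here, feature scaling plus dropping obvious input errors. The data and the validity check are made up for illustration; real pipelines would need domain-specific rules:

```python
from statistics import mean, stdev

# Hypothetical raw feature column with typical input errors:
# a missing value and an impossible sentinel reading.
raw = [12.0, 15.5, None, 14.2, -999.0, 13.8]

# Step 1: drop obvious input errors before any modeling.
# (Here "valid" just means present and positive -- an assumption.)
cleaned = [x for x in raw if x is not None and x > 0]

# Step 2: standardize to zero mean and unit variance, so no feature
# dominates a model purely because of its units.
mu, sigma = mean(cleaned), stdev(cleaned)
scaled = [(x - mu) / sigma for x in cleaned]
```

Trivial as it looks, even this much is often skipped before data hits the ML code.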

~~~
Teodolfo
Would you write something for biologists entitled "working on Drosophila
considered harmful"?

~~~
varelse
Drosophila is a real, noisy living system, and as a biologist myself I can say
that, unlike the data scientists with whom I work, none of us would consider
the fruit fly to be an approximation of a human being.

Further, the biologists and chemists I know are aware of their limitations and
do their best to surpass them. The data scientists I work with, OTOH,
constantly tell me how code should be written despite knowing little more than
single-threaded Python and Hadoop. See also their contempt for code
maintenance and engineering, or "ops." Of course, this is unfair to the smart
ones who aren't this way, but I'm seeing more and more of it as time goes on,
not less.

Perfect example: the myriad of supposed improvements to neural-network
training for which only MNIST results are ever shown. Wouldn't it be great if
there were a 6+ TFLOP processor available, along with an open-source
specification for, and multiple vendors providing, servers with 8 of these
wonderful gadgets that could be used to train, say, AlexNet with this new and
improved algorithm? But that would require ops(tm), and ops is for engineers,
not data scientists. Or, TL;DR: they did try it, and it lost to Nesterov
momentum or something similar on larger systems (results not shown).

------
joe_the_user
Yeah,

 _One thing that I didn’t realise until sometime afterwards was just how
challenging it is to handle issues like model decay, evaluation of models in
production, dev-ops etc all by yourself._

Honestly, it seems like any standard machine learning application (based on
train/test/deploy, etc.) is going to be more subject to decay/bit-rot than an
equivalently complex application with a codified specification.

It seems like that decay is, over time, going to be a problem for any
institution which relies on such applications.

~~~
TeeWEE
Wait, what has bit rot to do with machine learning? I don't understand. Where
can I learn about it?

~~~
slv77
Bit rot here means the degradation of an ML model's performance over time.
When a model is trained, it learns a set of assumptions about the "universe";
over time the universe changes, which can invalidate some of those assumptions
and degrade the model's performance.

For example, you create a model to classify tweets by sentiment, and then
people start referring to your company as "sick" and "phat," which your model
doesn't understand.

The bigger challenge is that most models attempt to predict human behavior,
but humans have the ability to learn, so simply deploying the model changes
the universe.

For example your FICO score is an attempt to predict your credit default risk
but over time people learn how to trivially manipulate their scores which
degrades the ability to use FICO scores to predict default risk.
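To make the decay concrete, here's a toy sketch (my own, not from the article) of one common mitigation: track a deployed model's rolling accuracy against its accuracy at deployment and flag when it drifts below. The window size and tolerance are arbitrary illustrative choices:

```python
from collections import deque

class DriftMonitor:
    """Flags when a deployed model's recent accuracy falls below
    its deployment-time baseline by more than a tolerance."""

    def __init__(self, baseline_accuracy, window=100, tolerance=0.05):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        # Keep only the most recent `window` outcomes.
        self.recent = deque(maxlen=window)

    def record(self, prediction, actual):
        # Store whether this prediction was correct.
        self.recent.append(prediction == actual)

    def decayed(self):
        if not self.recent:
            return False
        current = sum(self.recent) / len(self.recent)
        return current < self.baseline - self.tolerance
```

In practice, a `decayed()` signal would trigger retraining on fresh data, which is exactly the "handle it all by yourself" dev-ops burden the article complains about.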

------
rrmm
This is where we break out the old Zawinski quote:

    Some people, when confronted with a problem, think
    “I know, I'll use regular expressions.”
    Now they have two problems.

I actually had to tell this to someone who wanted to build a machine
classifier. They wanted the classifier as a step in getting data for their
research. The problem was that, as far as I knew from the literature, building
the classifier would be a research project of its own.

~~~
joe_the_user
Indeed,

Most of the (relative) successes of machine learning have involved a circuit
something like

(input data) --> ML --> (human acceptable output)

We see this in translation, image classification, face recognition.

But if we have a circuit of

(input data) --> ML --> (data for further _significant_ processing by
computer)

we have a problem. The ML can't be a real "black box" here, in the sense that
it can't learn the correct classifications without regard to how those
classifications are going to be used. Since it's learning "what humans do,"
the code can't be much more reliable than humans, but it also has the problem
of not being adaptable in the fashion that humans (sometimes) are.

