
Are ML and Statistics Complementary? [pdf] - snippyhollow
http://www.ics.uci.edu/~welling/publications/papers/WhyMLneedsStatistics.pdf
======
ktamura
They definitely are as far as their roles at (most) startups are concerned.

Unless your startup's core strategy involves machine learning, statistics
tends to come in handier than machine learning in the early days. Most likely,
what moves your company is not a data product built atop machine learning
models but the ability to draw less wrong conclusions from your data, which is
the very definition of statistics. Also, in the early days of a startup, you
run into small-data and missing-data problems: you have very few customers and
very incomplete datasets with a lot of gotchas. Interpreting such bad data is
no small feat, but it's definitely different from training a Random Forest
model on millions of observations.

------
tristanz
LeCun has a comment on this paper here:
[https://www.facebook.com/yann.lecun/posts/10153293764562143](https://www.facebook.com/yann.lecun/posts/10153293764562143)

~~~
jupiter90000
Thanks for sharing that, he's got some interesting stuff to say about this
topic.

------
washedup
Here is a link to the paper referenced in the beginning:
[http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataSci...](http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataScience.pdf)

Great read for anyone interested in the debate.

------
nextos
I think they will eventually converge.

Probabilistic programming is already a hint of this. The most general class of
probability distributions is that of non-deterministic programs. ML is just a
quick and dirty way to write these programs.
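
A minimal sketch of that idea in plain Python (no PPL library assumed; the
sensor model is made up for illustration): the program below _is_ a
probability distribution, and running it draws one sample from it.

    import random

    def flaky_sensor():
        # A non-deterministic program defines a distribution over its
        # outputs; each run is an independent sample from it.
        temp = random.gauss(20.0, 2.0)        # latent true temperature
        if random.random() < 0.1:             # 10% chance of a glitch
            return temp + random.gauss(0.0, 10.0)
        return temp + random.gauss(0.0, 0.5)

    samples = [flaky_sensor() for _ in range(10000)]
    print(sum(samples) / len(samples))        # Monte Carlo mean, ~20.0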

~~~
murbard2
It's not just a way to write them, it's a way to do inference. Probabilistic
programming is extremely powerful in terms of representation but inference is,
in general, intractable. Yes, you can express all those ML models as
probabilistic programs, but the sampler isn't going to perform nearly as well
as the original algorithm.
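
To make that concrete, here is a toy sketch (all numbers made up): the same
Beta-Bernoulli model solved by generic rejection sampling over the program
versus the direct conjugate formula. The generic route works, but its
acceptance rate collapses as the dataset grows, while the direct algorithm is
exact and instant.

    import random

    random.seed(0)
    data = [1, 1, 0, 1, 1, 1, 0, 1]           # observed coin flips

    # Generic inference: rerun the program, keep runs matching the data.
    # The acceptance rate shrinks exponentially with the dataset size.
    accepted = []
    for _ in range(200000):
        p = random.random()                   # prior: p ~ Uniform(0, 1)
        flips = [1 if random.random() < p else 0 for _ in data]
        if flips == data:
            accepted.append(p)
    print(sum(accepted) / len(accepted))      # noisy posterior mean, ~0.7

    # Direct algorithm: Beta-Bernoulli conjugacy answers in closed form.
    heads, n = sum(data), len(data)
    print((1 + heads) / (2 + n))              # exact posterior mean, 0.7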

------
p4wnc6
What is commonly understood as 'statistics' is just a specialized subset of
machine learning. Machine learning generalizes statistics.

The correct complement to machine learning is cryptography -- trying to
intentionally build things that are provably intractable to reverse engineer.

~~~
51109
Working with both statisticians and pure machine learners on the same task, I
did notice some tendencies, presuppositions and modus operandi that were
different (beyond one being a specialized subset of the other). As this
position paper notes, machine learners like to throw computation and
parameters at the problem, where statisticians are more careful and sober. As
an analogy, a statistician will approach a cliff very carefully, stomping the
ground to make sure it is sturdy enough to carry a human. They'll approach the
edge of the cliff until they have their p-values, and that is their model.
Machine learners will jump head-first off the cliff, and as they plummet down
you can hear them yell: Cross-validatioooohhhh...

I like the complement with cryptography. I would add another coding method:
compression, i.e. approximating the simplest model that still has explanatory
power.
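
A toy minimum-description-length sketch of that compression view (my own
illustration, not anything from the paper): score each model by the bits
needed for its parameters plus the bits needed for its residuals, and pick
the shortest total description. On truly linear data the score bottoms out at
degree 1.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 50)
    y = 1.0 + 3.0 * x + rng.normal(0, 0.1, 50)   # truly linear data

    n = len(x)
    for degree in range(6):
        coeffs = np.polyfit(x, y, degree)
        rss = np.sum((np.polyval(coeffs, x) - y) ** 2)
        # Two-part code: a crude 32 bits per parameter, plus a Gaussian
        # code length for the residuals (relative, so it can go negative).
        bits = 32 * (degree + 1) + 0.5 * n * np.log2(rss / n)
        print(degree, round(bits, 1))            # smallest total wins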

~~~
p4wnc6
I have had the exact opposite experience with machine learning and statistics.
In my experience, those who come from the 'statistics' side tend to use
constructs, like null hypothesis significance testing, which are not even
theoretically consistent. And when they use them, they do awful things like
p-hacking, or using a direct comparison of t-stats as a model selection
criterion, practices that are rife with theoretical problems, not to mention
plenty of statistical biases.
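
A quick simulation of the p-hacking complaint (pure Python, numbers
illustrative): an analyst who tries 20 unrelated noise "outcomes" and keeps
the first significant one will report a discovery far more often than the
advertised 5%.

    import random

    random.seed(1)
    trials, hits = 1000, 0
    for _ in range(trials):
        for _ in range(20):                   # 20 tries at pure noise
            sample = [random.gauss(0, 1) for _ in range(30)]
            mean = sum(sample) / 30
            se = (sum((v - mean) ** 2 for v in sample) / 29 / 30) ** 0.5
            if abs(mean / se) > 1.96:         # "significant" at p < 0.05
                hits += 1
                break
    print(hits / trials)                      # roughly 0.7, not 0.05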

I find the machine learning approach far more humble. It starts out by saying:
I, as a domain expert or a statistician, probably don't know any better than a
lay person what is going to work for prediction, or how to best attribute
efficacy for explanation. Instead of coming at the problem from a position of
hubris, assuming that my stats background tells me what to do, I will instead
try to arrive at an algorithmic solution that has provable inference
properties, then let it work and commit to it.

Either side can fail if you just throw an off-the-shelf method at a problem
without thinking, but there's a difference between criticizing the naivety
with which a given practitioner uses a method and criticizing _the method
itself._

When we look at _the methods themselves_, I see much more care, humility, and
caution around statistical fallacies in the machine learning world. I see a
lot of sloppy hacks and from-first-principles-invalid approaches (like NHST)
on the 'statistics' side. And even when we consider how practitioners use
them, both sides are about equally guilty of throwing methods at a problem
like a black box. Machine learning is no more of a black box than a
garbage-can regression from which t-stats will be used for model selection.
However, all the notorious misuses of p-values, and the conflation around
policy questions (questions for which a conditional posterior is necessarily
required, but for which likelihood functions are substituted as a proxy for
the posterior), seem uniquely problematic for the 'statistics' side.
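
On that last point, a one-liner's worth of Bayes (all numbers assumed for
illustration) shows why a likelihood is not a posterior: a result with
p = 0.04 can still leave the null more likely than not.

    # P(data | H0) is not P(H0 | data). Toy numbers: the observed result
    # has likelihood 0.04 under H0 and 0.30 under the alternative, and
    # H0 is true 90% of the time a priori.
    p_h0 = 0.9
    lik_h0, lik_h1 = 0.04, 0.30
    post_h0 = lik_h0 * p_h0 / (lik_h0 * p_h0 + lik_h1 * (1 - p_h0))
    print(post_h0)                            # ~0.55: H0 still favored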

Three papers that I recommend for this sort of discussion are:

[1] "Bayesian estimation supersedes the t-test" by Kruschke,
[http://www.indiana.edu/~kruschke/BEST/BEST.pdf](http://www.indiana.edu/~kruschke/BEST/BEST.pdf)

[2] "Statistical Modeling: The Two Cultures" by Breiman,
[https://projecteuclid.org/euclid.ss/1009213726](https://projecteuclid.org/euclid.ss/1009213726)

[3] "Let's put the garbage-can regressions and garbage-can probits where they
belong" by Achen,
[http://www.columbia.edu/~gjw10/achen04.pdf](http://www.columbia.edu/~gjw10/achen04.pdf)

~~~
51109
Thanks for the links to interesting papers. I really liked the Breiman paper.
I did not try to qualify either machine learners or statisticians as good or
bad; I was just pointing out a difference in their approaches to problems.

I do not know enough about statistics to make a (negative) quality statement
about it. I know a bit more about machine learning, though, and there I also
see things like: picking the most favorable cross-validation evaluation
metric, comparing to the "state of the art" while ignoring the real SotA,
generating your own data sets instead of using real-life data, improving
performance by "reverse engineering" the data sets, reporting only on problems
where your algo works, and other such tricks. I believe you when you say much
the same happens on the statistics side.
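
The first trick on that list is easy to demonstrate (toy example, my own): on
imbalanced data, a model that always predicts the majority class looks great
on accuracy and useless on recall, so whichever metric you choose to report
changes the story completely.

    import random

    random.seed(0)
    y = [1 if random.random() < 0.1 else 0 for _ in range(1000)]
    pred = [0] * len(y)                # "model" that always says negative

    accuracy = sum(p == t for p, t in zip(pred, y)) / len(y)
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, y))
    recall = tp / max(1, sum(y))
    print(accuracy)                    # ~0.90 -- the metric in the paper
    print(recall)                      #  0.00 -- the metric left out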

Maybe it was my choice of words (careful, sober). I think it's fair to say that
(especially applied) machine learners care more about the result, and less
about how they got to that result. Cowboys, in the most positive sense of the
word. I retraced where I got the cliff analogy. It's from Caruana in his video
"Intelligible Machine Learning Models for Health Care"
[https://vimeo.com/125940125](https://vimeo.com/125940125) @37:30.

"We are going too far. I think that our models are a little more complicated
and higher variance than they should be. And what we really want to do is to
be somewhere in the middle. We want this guy to stop and we want that
statistician to get there, together we will find an optimal point, but we are
not there yet."

~~~
ivan_ah
Thx for linking to the Caruana video, very interesting.

------
sjg007
This is a great summary of the field.

------
fpoling
I think the hope of getting an explanation for the results of modern machine
learning is wishful thinking. I personally cannot explain my gut feelings, so
why should we expect an explanation when a machine deals with the same class
of problems?

Besides, it is easy to get a wrong explanation and, as Vladimir Vapnik
observed in his three metaphors for a complex world,
[http://www.lancaster.ac.uk/users/esqn/windsor04/handouts/vap...](http://www.lancaster.ac.uk/users/esqn/windsor04/handouts/vapnik.pdf)
, "actions based on your understanding of God’s thoughts can bring you to
catastrophe".

~~~
51109
As we start to use AI/ML for more tasks, the need for model interpretability
rises. We expect doctors to explain their gut feelings, much like we expect
computer vision models that detect disease to explain their findings and have
a (theoretically sound) estimate of confidence.

SVMs were so popular pretty much because they had a firm theoretical basis on
which they were designed (or "cute math", as deep learners may call it). As
Patrick Winston would ask his students (paraphrasing): "Did God really mean it
this way, or did humans create it because it was useful to them?" Except maybe
for the LSTM, deep learning models are not God-given. We use them because, in
practice, they beat other modeling techniques. Now we need to find the
theoretical grounding that explains why they work so well and allows for
better model interpretability, so these models can more readily be deployed in
health care and under regulation.

~~~
fpoling
The article calls not for an explanation of why some ML method works, but for
an explanation of a particular ML result, like why a car drives this way or
why a patient got cancer. While I am hopeful for the former, I just do not see
the basis for the latter.

If some regulation were to require such explanations, the end result would be
fake stories, like parents telling their children that the Moon does not fall
because it is nailed to the sky.

~~~
51109
I don't think the paper asked for that. Relevant quote:

> machine learning is more concerned with making predictions, even if the
> prediction can not be explained very well (a.k.a. “a black-box prediction”)

So in your example: an algo may explain that a car slows down before taking a
turn because otherwise it would likely crash. It may even get to a threshold
("under these weather conditions, anything over 55 mph is unsafe when taking a
turn of such and such a degree"). Statistics can help with that.

Welling is not asking deep learning models to explain why a person got cancer,
but to explain their reasoning when they diagnose a person with cancer ("I am
confident, because in a random population of 1,000 other patients, these
variables are within ..."). Statistics can help with that. It aligns with
statisticians' mindset and toolset.

Regulations can be cheated even with these kinds of explanations, but that is
another story (black-box models may provide some plausible deniability).

~~~
fpoling
I am referring to this fragment:

> Thus, for many applications, in order to successfully interact with humans,
> machines will need to explain their reasoning, including some quantification
> of confidence, to humans.

No doubt there are cases where an explanation is easy. Often this is because
we have a very solid model, like the physics of a car. In fact, since we know
the model, we do not need an explanation; we should demand that the algorithm
follow the model, or declare it unfit.

But how can we expect an explanation for behavior in a critical situation on
the road that was not explicitly programmed, when the algorithm decided to
turn to a particular degree based on a non-trivial inference? Similarly, when
an algorithm decides whether a patient needs an emergency operation or can
wait, why should we expect a simple explanation, especially for a patient with
rare conditions, when the algorithm again must perform an inference, not a
deduction from 1,000 very similar cases?

~~~
jupiter90000
Maybe you can't always expect an explanation, but having one would certainly
be useful. Methods that get the job done while also providing enough
information to explain, or at least shed some light on, why a solution was
chosen would likely be preferable.

