
What’s the difference between statistics and machine learning? - jwb133
https://thestatsgeek.com/2019/08/08/whats-the-difference-between-statistics-and-machine-learning/
======
mushufasa
The classic explanation is Leo Breiman's 'Two Cultures' paper. He was a
statistics professor who left for industry, came back, and tried to get
academics to adapt industry approaches. The paper is very readable.

[http://www2.math.uu.se/~thulin/mm/breiman.pdf](http://www2.math.uu.se/~thulin/mm/breiman.pdf)

An oversimplified version may be:

Statistics focuses on fitting data to formally understandable models, whereas
data science focuses on solving problems -- even if that means using
techniques that aren't formally understood.

Leo Breiman is also known for pioneering random forests and bootstrap
aggregation (bagging).

I think of machine learning as a subset of data science where you trick linear
algebra into thinking.

~~~
analog31
This seems perfectly fair. And I think that historically there was plenty of
use for statistics where people didn't care about the formal understanding, so
they were doing crude machine learning before the term became widespread.

I've sat through lengthy discussions of machine learning exercises, and could
not silence the voice in my head, saying: "This is just curve fitting."
Fitting data to an arbitrary curve, and then extrapolating the fitting
function, is as old as the hills.

~~~
AstralStorm
Formal understanding is critical to actually know the limitations of any given
system. Treating it as a black box has a short lifetime, as your method of
analysis will miss key features of the system or oversimplify it.

Understanding modern ML algebra is really "general relativity hard" if not
actually harder. Spiking NNs are "quantum physics hard". The math is very much
translatable between these domains.

~~~
TuringTest
_> Treating it as a black box has a short lifetime, as your method of analysis
will miss key features of the system or oversimplify it._

On the other hand, it can work indefinitely if it solves a problem well enough
and is always used on the same kind of problem.

Sure, you will need to understand it better to apply it to new markets, but
often that kind of research is outside the scope of a single business, and
they don't need it.

~~~
AstralStorm
How do you know it solves the problem, instead of just tricking your measure?

Do you know when it doesn't work?

Business does not care about rigor, but then terrible things happen when your
face recognition system happens to not work for people with dark skin tone.

Even more terrible things happen when it detects cats as people, and worse
once it's used to link data into a police database.

Even pretty bad things happen when it's used for detecting potential (treated
as absolute) copyright violations.

Security holes when it's used for detecting security problems.

Loss of business when it's used for ticket prioritization.

The business world does not care that it sells a broken solution as long as
it's not obvious and someone has been paid. Everyone else pays for the
failures. And you can't really sue an ML system.

~~~
dorgo
Why don't we do online learning? Each failure is a new training sample.

Why don't we do active learning? Or adversarial networks, or....

Sure, it is easy to fail in machine learning. But it doesn't mean that we need
to understand how and what our models learn. That's the big advantage of
machine learning: the model learns, so I don't have to.

------
edbaskerville
I'm not sure about machine learning specifically, but I heard somewhere that a
data scientist is someone who does statistics, on a Mac, in San Francisco.

~~~
PopeDotNinja
After spending 80% of their time cleaning up garbage input data.

~~~
rumanator
...and after spending 15% plotting graphs and diagrams.

------
lottin
Inferential statistics is about explaining an observed outcome in terms of its
causing factors. Once we have explained it, then we can make predictions.
Machine learning skips the explaining part and goes straight to making
predictions, without attempting to understand the underlying process that led
to the particular outcome. This would be the main difference, in my opinion.

~~~
peatmoss
Yes, this is my observation as well. Put another way, statistics is primarily
concerned with understanding the mechanisms behind something. Predictive power
can and will often be sacrificed if it aids explanatory power or conceptual
elegance.

By contrast, machine learning is primarily concerned with making the best
prediction possible, even if that means sacrificing an understanding of the
underlying mechanisms.

The Venn diagram of methods can have a fair degree of overlap.

Neither disposition is right or wrong, but they tend to have natural places
where each makes more sense. If you’re trying to predict whether a picture is
of a cat or a dog, you probably don’t care much about the constituent
contribution of factors to one picture’s dogness or catness. On the other hand,
if you’re trying to predict traffic collisions based on characteristics of a
roadway, you’re probably less concerned with the predicted number of crashes
and more concerned with the relative contribution of a handful of independent
variables.

~~~
nitrogen
If you are trying to predict the failure modes of a cat/dog discriminator on
arbitrary images, an understanding of catness/dogness is more useful.

------
theothermkn
This may sound a little like trivializing, but don't we have to know what
"statistics" are and what "machine learning" is to say anything about the
difference(s) between them?

Looking at this through even the lens of multinomial logistic regression, or
of econometrics generally, I don't think that "statistics draws population
inferences from a sample, while machine learning finds generalizable
predictive patterns" even makes sense _as a difference_. Any _prediction_ is
an _inference_ about the population of future events, or of contemporary
events not present in the _sample_. You plug 1000s of events each described by
100 columns into a logistic regression, and you're hoping to get something
predictive out of it. Further, as nice as the idea is that you can tease out
"factors" from your 100 columns, you don't have to look at "3.3375905e-5 x
(spent five years before age 18 in a smoker's home)" for very long to wonder
how much 'explanation' you're getting out of the terms in the exponents of
your probability functions.

I still can't resist tweaking ML enthusiasts and data scientists: Statistics
is what people who know what they're doing are doing. Machine learning is the
rest!

~~~
cwyers
Everybody owes it to themselves to read Breiman's Two Cultures[1]:

> There are two cultures in the use of statistical modeling to reach
> conclusions from data. One assumes that the data are generated by a given
> stochastic data model. The other uses algorithmic models and treats the data
> mechanism as unknown. The statistical community has been committed to the
> almost exclusive use of data models. This commitment has led to irrelevant
> theory, questionable conclusions, and has kept statisticians from working on
> a large range of interesting current problems. Algorithmic modeling, both in
> theory and practice, has developed rapidly in fields outside statistics. It
> can be used both on large complex data sets and as a more accurate and
> informative alternative to data modeling on smaller data sets. If our goal
> as a field is to use data to solve problems, then we need to move away from
> exclusive dependence on data models and adopt a more diverse set of tools.

The difference between inference and prediction can be illustrated with
something like a decision tree or a random forest. Statistical inference is
"the theory, methods, and practice of forming judgments about the parameters
of a population and the reliability of statistical relationships, typically on
the basis of random sampling." If you look at something like linear
regression, it makes a lot of assumptions about the data[2]: the distribution
of residual errors is normal, there's no multicollinearity, etc.

Random forest don't care. Random forest is a set of steps. You follow the
steps, you get an answer. You made no assumptions about your data, about the
distribution of it, any of it. You just followed an algorithm.
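
To make that concrete, here's a minimal sketch (assuming scikit-learn and
NumPy, neither of which the argument depends on) of what "just following the
steps" looks like; nowhere do you specify a distribution for the data:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))                               # made-up features
    y = np.exp(X[:, 0]) + X[:, 1] ** 2 + rng.normal(size=500)   # made-up target

    # No normality check, no multicollinearity check, no model of the
    # data-generating process: run the algorithm, read off predictions.
    forest = RandomForestRegressor(n_estimators=200, random_state=0)
    forest.fit(X, y)
    print(forest.predict(X[:5]))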

Algorithms are powerful! Not everything needs to be an inference problem.
We're better off when we have lots of tools. You don't have to choose
_between_ the two camps. But the "Everything is Statistics!" and the
"Everything is Machine Learning!" points of view rob us of ways of thinking
about our tools that help us understand what those tools are.

1)
[https://projecteuclid.org/download/pdf_1/euclid.ss/100921372...](https://projecteuclid.org/download/pdf_1/euclid.ss/1009213726)

2) [https://thestatsgeek.com/2013/08/07/assumptions-for-linear-r...](https://thestatsgeek.com/2013/08/07/assumptions-for-linear-regression/)

~~~
ssivark
Nice comment, but there's more nuance than you claim.

> You made no assumptions about your data, about the distribution of it, any
> of it. You just followed an algorithm.

You made _implicit assumptions_ that you are now unaware of, which might come
back and bite you later (e.g. using zip codes or names as a proxy for race in
credit scoring models, or neural networks overfitting to texture and
classifying a leopard-print couch as a leopard). This means that you are
liable to overfit to the data, and generalize poorly, or in ways you are not
supposed to.

This is what leads to the perspective that "machine learning" might end up
being a powerful tool for laundering bias:
[https://idlewords.com/talks/sase_panel.htm](https://idlewords.com/talks/sase_panel.htm)

~~~
cwyers
I think what you're saying is orthogonal to what I'm saying.

Yes, ZIP codes can be a proxy for race in models dealing with credit scores
(and recidivism, which is also a really bad place to put racial bias), as an
example. But if I put it in a mixed-effects model, it shows the same bias, and
a mixed-effects model is just an extended version of linear regression. Both
statistical and ML models suffer from the problem you're stating.

What you have not made any assumptions about in a random forest is the
distribution of the data you're looking at. One example of a case where the
assumptions that bog-standard OLS makes about your data can cause you problems
is zero-dominated data -- data with a lot of zeros in it. Basically any time
you're trying to make predictions about things that are rare in your measured
population.

OLS does a bad job on zero-dominated data. If you throw a zero-dominated
dataset into a random forest, you will get back better answers than if you use
OLS on zero-dominated data.

To be clear: there are strategies for dealing with zero-dominated data using
statistical inference. You don't have to resort to non-inferential learning
just because you have data that doesn't look like a bell curve. But machine
learning is a powerful way to get pretty good results on a lot of problem
spaces without having to understand the probability function involved (or in
cases where the probability function is too complicated to be tractable
computationally, like the probability function that determines the color of
pixels in a dataset where you're classifying dogs versus cats).
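
If you want to see the zero-dominated case for yourself, here's a rough sketch
(the data and the scikit-learn choice are mine, purely for illustration);
compare how each model does on held-out rows:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(1)
    X = rng.normal(size=(2000, 4))
    rare = X[:, 1] > 1.0                          # a rare regime, roughly 16% of rows
    # Zero-dominated target: exact zeros most of the time.
    y = np.where(rare, np.exp(X[:, 0]) + rng.exponential(size=2000), 0.0)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    ols = LinearRegression().fit(X_tr, y_tr)
    rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print("OLS test R^2:", ols.score(X_te, y_te))
    print("random forest test R^2:", rf.score(X_te, y_te))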

~~~
ssivark
> _machine learning is a powerful way to get pretty good results on a lot of
> problem spaces without having to understand the probability function
> involved_

The NFL (no free lunch) theorems essentially state that for every dataset on
which an algorithm generalizes well, there is another on which it generalizes
poorly.
So, there are _always_ implicit biases, even for a random forest model (eg: if
you try to model Boolean functions with a random forest, those functions which
can be approximated effectively with few trees will form a set of very small
measure. Specifically, it is my intuition that those which have many terms in
the sum of products form might need many trees to approximate). It then
becomes a question of whether the bias of your models are compatible with the
dataset/domain under consideration.

See this Minsky-Sussman koan:
[http://www.catb.org/jargon/html/koans.html#id3141241](http://www.catb.org/jargon/html/koans.html#id3141241)

> _Both statistical and ML models suffer from the problem you're stating._

And that is precisely why I don’t see how ML algorithms are “more powerful”
than statistics in any way, as per your claim.

~~~
cwyers
> And that is precisely why I don’t see how ML algorithms are “more powerful”
> than statistics in any way, as per your claim.

Well I'm glad you don't see that, but I'm a bit confused why you think I said
it because I didn't say it and don't believe it. I said that ML is "powerful,"
not "more powerful."

------
js8
That's easy - statisticians take pride in models that are understandable,
while machine learning practitioners take pride in models that are not.

~~~
ACow_Adonis
Understandable? Two words: statistical significance :P I've seen university
employees who still don't understand what it is and can't explain it...and
just about everyone who uses it gets it wrong...

~~~
chalst
Most scientists who use statistics are not statisticians.

~~~
bregma
95% of scientists who use statistics are not statisticians, plus or minus 5%,
19 times out of 20.

------
RcouF1uZ4gsC
The main difference is not in techniques, but in goals. The primary goal of
statistics is to help a human make an informed decision by quantifying
uncertainty. That quantification of uncertainty can be used to explain to
other humans why the decision was made.

The primary goal of machine learning is to help a machine make better
decisions. As long as it gets the "right" answer, the explanations to humans
are not as important.

~~~
6gvONxR4sf7o
I think that's close, but disagree with the uncertainty quantification bit.
Often in both fields, you'll have a proof that "errors" converge at some
asymptotic rate. Often in both fields, when you apply that tool to a finite
data set, you don't know the error magnitude in _this_ instance. I don't mean
exact error, in the sense of 'if I knew the error, I could correct for it.' I
mean you know big-O notation error, but not the constant factors.

I do like the rest of the point you make, and it seems to match OP. Inference
vs otherwise. Helping people learn/act/decide based on data vs helping
machines learn/act/decide based on data.

------
kmundnic
Reminded me of this [1] by Prof. Rob Tibshirani.

[1]:
[https://statweb.stanford.edu/~tibs/stat315a/glossary.pdf](https://statweb.stanford.edu/~tibs/stat315a/glossary.pdf)

------
phonebucket
I like Tom Mitchell's definition in his machine learning book:

'A computer program is said to learn from experience E with respect to some
class of tasks T and performance measure P, if its performance at tasks in T,
as measured by P, improves with experience E.'

Merriam Webster's definition of statistics:

'a branch of mathematics dealing with the collection, analysis,
interpretation, and presentation of masses of numerical data'

So establish a couple of definitions, and take it from there.

I believe such discussions can easily be taken too far: surely the point of a
subject is to group together related topics. Who is to say that the sets of
topics comprising different subjects need to be disjoint?

------
s_Hogg
The amount of hair-splitting that goes into discussing this subject is
unbelievable. Clearly, they both are fairly closely intertwined - particularly
given that one potential explanation boils down to the motivation of the user.

So do we need an explicit taxonomy to say that one application is machine
learning and another is statistics? Or does the form not matter as much as the
function?

Pace the OP, who I'm sure didn't do this with this end in mind, but a lot of
the cases I've seen of people trying to bring this up come down to wanting to
self-identify in a certain way as opposed to actually talking about the
subject at hand.

~~~
jointpdf
I think it’s actually both important and possible to differentiate between
stats and ML (and AI for that matter). Statisticians, ML engineers, and data
scientists all have related but distinct skill sets, knowledge, methodological
experience, and worldviews. The identification can be important because it
suggests what sorts of tasks and projects an individual is well-suited for.
(Notwithstanding the fact that I believe anyone can acquire skills/knowledge
to be competent at any of those jobs and transition between them with a bit of
effort).

For example, consider the work that the FDA does in evaluating clinical drug
trial data. In this case, you need statisticians on staff that can critically
analyze the statistical/methodological rigor, not to mention understand the
medical domain and relevant regulations/processes. It doesn’t matter if they
can only use the GUI versions of SAS and have never heard of TensorFlow, their
time is better spent deeply understanding nuances of experimental design,
sampling methods, causal analysis, estimation, etc. I would argue that a
“good” statistician in this context might very well be “not so good” at data
analysis, as long as they can clearly and accurately critique experiments and
analyses and identify when things go wrong. Also consider—what would happen if
you slotted an ML engineer into a role like this?

ML engineers are best at optimizing performance (not just accuracy) at some
defined task, where ideally the cost of the model being wrong in some exotic
way is not astronomical (e.g. approving a dangerous drug, convicting an
innocent person, initiating a stock crash, corrupting the attention span of
the human race). A good use case for machine learning comes from the book
_Pattern Recognition_ —sorting fish on a conveyor belt. No one gives an ounce
of chum about statistical rigor in this case, and the consequences of being
inaccurate are quantifiable and manageable. But building and tuning a ML
system to do this at an acceptable performance level requires a ton of work.
You’d need to define the sensor array / inputs, collect and label data,
engineer features (more relevant before deep learning, in the case of computer
vision), train and evaluate an object detection model, avoid overfitting the
training data, build an anomaly detector to weed out the stray crab, ensure it
works fast enough to be used in production, turbo-browse arXiv to make sure
your stuff isn’t obsolete (damn, it is). This is a vastly different focus from
that of statisticians, who wouldn’t get very far with this sort of
performance-optimization problem.

Data scientists are somewhere in between a statistician and ML engineer—like a
hybrid class in an RPG. They can cast a few key ML spells and have dabbled in
arcane math, but can also slice up data goblins by hand (maybe even with
style). With a foundation in a bit of everything, they can glue a team
together and flexibly grow in multiple directions. But they can also become a
waste of shiny gold if pitted against a specialized task without support from,
you know, specialists.

------
graycat
I can't speak for all of machine learning, but classic statistics has some
probabilistic assumptions and uses those to prove some theorems about the
results of how statistical methods manipulate data.

E.g., in regression analysis, the assumptions are (i) there really is a
_linear model_ in the variables to be used; (ii) typically the data will not
fit the model exactly and instead there are errors; (iii) the errors are
independent in the probabilistic sense and all have the same distribution,
which is Gaussian with mean zero. Then estimate the regression coefficients
and get, say, an F ratio statistic on the fit and t-tests on the coefficients.
For details see, say, Mood, Graybill, and Boes, or Morrison, or Draper and
Smith, or Rao, etc. If you need detailed references, request them here, but
these are all classic references going back decades. In that case, maybe some
of current machine learning does qualify as statistics.
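
For what it's worth, a small sketch of that classical setup (using
statsmodels, just one possible tool; the simulated data is built to satisfy
the assumptions above):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    # A true linear model with i.i.d. Gaussian, mean-zero errors, per (i)-(iii) above.
    y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=200)

    fit = sm.OLS(y, sm.add_constant(X)).fit()
    print(fit.summary())   # reports the F statistic for the fit and t-tests on each coefficient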

Here regression was just an example, though apparently one especially close to
much of current machine learning. But there is no end of (i) having data, (ii)
having some probabilistic assumptions, (iii) manipulating the data, (iv)
getting results, and (v) proving some theorems about the probabilistic
properties of the results.

One might argue that in machine learning the results of the training data on
the model yield some estimates of some probabilistic properties that can be
used to make probabilistic statements about the results of the trained model
on future real data.

~~~
alexgmcm
There are non-parametric statistical methods too though (such as bootstrap
methods) which don't make such assumptions.
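
E.g., a bootstrap confidence interval needs no distributional form up front; a
minimal sketch with invented data:

    import numpy as np

    rng = np.random.default_rng(0)
    sample = rng.exponential(size=40)            # skewed, clearly non-Gaussian data

    # Resample with replacement and look at the spread of the statistic itself.
    boot_means = np.array([rng.choice(sample, size=sample.size, replace=True).mean()
                           for _ in range(10_000)])
    low, high = np.percentile(boot_means, [2.5, 97.5])
    print(f"sample mean = {sample.mean():.3f}, 95% bootstrap CI = ({low:.3f}, {high:.3f})")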

~~~
graycat
Have two calculus teachers, A and B, each with 20 students. Look at the final
exam numbers. Put all the numbers in a bucket, stir briskly, draw out 20
numbers (test scores) and average, average the other 20, and get the
difference in the averages. Do this many times. Get the empirical distribution
of the differences in the averages.

Now look at the difference in the actual average for A and B. If this is out
in a tail of the empirical distribution, then we reject the null hypothesis
that the two teachers are equally good.
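
In code, that resampling test is only a few lines (the exam scores below are
invented):

    import numpy as np

    rng = np.random.default_rng(0)
    scores_a = rng.normal(70, 10, size=20)       # hypothetical final-exam scores, teacher A
    scores_b = rng.normal(76, 10, size=20)       # hypothetical final-exam scores, teacher B
    observed = scores_b.mean() - scores_a.mean()

    pooled = np.concatenate([scores_a, scores_b])
    diffs = []
    for _ in range(10_000):
        rng.shuffle(pooled)                      # put all the numbers in a bucket, stir briskly
        diffs.append(pooled[:20].mean() - pooled[20:].mean())

    # How often does a random split look at least as extreme as what we saw?
    p = np.mean(np.abs(diffs) >= abs(observed))
    print(f"observed difference = {observed:.2f}, permutation p-value = {p:.4f}")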

For this non-parametric, _distribution-free_, resampling, two-sample test, to
prove theorems about it we will likely need at least an independence
assumption, and likely an i.i.d. (independent, identically distributed)
assumption. Else maybe each student of teacher B is an older sibling of a
student of teacher A!!!!

In a nutshell, what's going on in statistical hypothesis testing is that we
make the null hypothesis, and that gives us enough assumptions, e.g., i.i.d.,
to calculate the probability of our calculated _test statistic_, e.g., the
difference in the two averages, being way out in a tail. Without some such
null hypothesis assumptions, we have no basis on which to _reject_ anything,
are not testing anything.

There's a chance of getting all twisted out of shape _philosophically_ here:
I outlined one statistical hypothesis test for the two teachers A and B. Okay,
now consider ALL reasonably relevant hypothesis tests. Maybe on the test I
outlined the two teachers look very different, not equal, maybe teacher B
better. But among all those hypothesis tests, maybe on one of them the two
teachers look equally good, or even teacher A looks better. Now what do we do?
That is, there is a suspicion that teacher B looked better ONLY because of the
particular test we chose. Maybe there has been some research to clean up this
issue.

~~~
graycat
Sorry, was way too busy and typed way too fast and had LOTS of typing errors.

------
candiodari
Statistics is a set of theories and methods that can be successfully executed
on any list of numbers and provably provides mathematically valid predictions
under a set of conditions that never occurs in the real world.

This means, more precisely, statistics only works on data series that obey the
law of large numbers, and combining/using 2 or more statistical predictions
nearly always requires total independence of the predictions and their inputs,
which is never the case (for one thing, they always occur on the same planet).
Furthermore, the reason people make statistical predictions is to change the
outcome, but doing anything to change the outcome always invalidates the
statistical method used to collect the data. There are a couple of things
statistics _never_ does. Used correctly, it can never predict extreme values.
It can never correctly predict values in systems that are too complex, where
too many independent variables determine the outcome. And "too many" is
something like 50 to 500. It can never correctly be used to verify if a
deliberate change worked.

Machine learning is much the same, except it never provides mathematically
valid predictions.

Despite this, it should probably be mentioned that both do provide _useful_
results, occasionally getting things very, very wrong.

[https://xkcd.com/605/](https://xkcd.com/605/)

------
cm2187
I think a better question is what is the difference between a non-linear
regression and machine learning!

------
grizzles
Statistics is good for modelling things that are simplistic & low dimensional.
Machine learning is good for modelling things that are nuanced & high
dimensional. Statisticians want to understand things but IMHO overestimate the
human ability to make sense of a complex world. Machine learning people want
the machine to understand the things for us and then teach us about it when
it's making us breakfast.

------
ACow_Adonis
Whatever it is, I imagine it's similar to the difference between math, stats,
ML, AI, logic, econometrics, actuarial science, epidemiology, etc.

For the record, I don't consider those intellectually separate fields, but do
accept them to be culturally separate, for better or worse...

But then I could never understand why physics, biology or chemistry were
considered separate fields either...

Or psychology, economics, philosophy, etc etc etc...

~~~
Retra
They're considered separate fields because they focus on different problems
which are amenable to different techniques, leaving their expert practitioners
with very different knowledge bases. You're right that it is a cultural
distinction, but that doesn't mean it isn't an important or practical one.

------
salusbury
One actually knows what's going on.

------
crazygringo
My $0.02: statistics is generally not very "sensitive" to underlying data --
it deals with small numbers of dimensions/features and lots of similar
examples and classification is about falling within easily understood bounds.

Machine learning, on the other hand, can be extremely "sensitive" to
underlying data -- it deals with high numbers of dimensions/features on
potentially extremely sparse data sets, and a small change in one of them
could result in a radically different classification. The potential accuracy
is far, far higher but so is the risk of overfitting.

Or: statistics looks for ranges, machine learning looks for patterns.

------
samch93
I agree that in general statistics is more concerned with inference, while
machine learning focuses on prediction. On the other hand, there is such a
huge overlap between the fields that it is hard to make a distinction. Also,
there are statistical fields which focus on prediction in the same way as ML
does. For example, in geostatistics prediction is often the only goal, e.g. to
predict heavy metal concentration across a domain. People accept that it is
impossible to explain every bit of spatial variation and just model it with a
Gaussian process (the same Gaussian process used in ML).
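
A minimal sketch of that kind of purely predictive spatial modelling
(scikit-learn's GP regressor is an assumed choice; the coordinates and
concentrations are invented):

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)
    coords = rng.uniform(0, 10, size=(50, 2))                  # hypothetical sampling locations
    conc = np.sin(coords[:, 0]) + 0.1 * rng.normal(size=50)    # stand-in for measured concentration

    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(1e-2),
                                  normalize_y=True).fit(coords, conc)

    # Predict at unsampled locations; the point is the map, not an explanation.
    xx, yy = np.meshgrid(np.linspace(0, 10, 25), np.linspace(0, 10, 25))
    grid = np.stack([xx.ravel(), yy.ravel()], axis=1)
    mean, sd = gp.predict(grid, return_std=True)
    print(mean.shape, sd.shape)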

------
didibus
Statistics + Computer Science = Machine Learning?

I think to me that's the case. But also, Machine Learning describes a computer
science goal: to have a computer that learns certain parts, or ideally all
parts, of its algorithm on its own, either from examples or by
experimentation.

It just so happens that some of the ideas from statistics lend themselves to
help computer science implement such machines that can learn. One can imagine
techniques for Machine Learning being discovered in the future which leverage
ideas from other fields apart from statistics.

------
samwalrus
One difference is that in Machine Learning you must think about data
structures and algorithms, i.e. the practical ways to compute a model: how to
represent and transform data while building it. I think this is given less
emphasis in statistics, where standard models are often used and theory is
built around those models, covering aspects such as power calculations for a
regression model.
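
For example, a power calculation is pure pre-data planning; a small sketch
(using statsmodels, and a simple two-group comparison rather than a full
regression model):

    from statsmodels.stats.power import TTestIndPower

    # How many observations per group are needed to detect a medium effect
    # (Cohen's d = 0.5) with 80% power at the 5% significance level?
    n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
    print(f"required sample size per group: {n_per_group:.1f}")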

------
dchichkov
Can someone correct me if I'm wrong, but statistics doesn't allow
Turing-complete models, right? Machine learning certainly allows that (for
example an RNN).

~~~
mturmon
This is kind of a fun question because it places the notion of “computational”
computability (classical CS) alongside numerical computation (finding a model
by, say, minimizing an error, which in turn boils down to calculus or linear
system solution).

Long story short, because of the power of real numbers — with each one
encoding an infinite sequence of bits — it’s not clear that even “simple”
computations important to elementary statistical models, like exponentiating,
have low computational complexity. Computing, say, “e” can be done by a Turing
machine, but it’s nontrivial.

Another observation along the same lines is that some simple statistical
models turn out to be really complex — like “find a such that the model

    y = sin(a x)

fits sample data (x1, y1), ..., where abs(y) < 1”. This simple model class is
well known to have infinite VC dimension, because a can be arbitrarily large.
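
A quick, non-rigorous numerical illustration: brute-force over a grid of
values of a and count how many of the 2^n possible +/- labelings
sign(sin(a x)) can realize on a handful of fixed points (with a fine enough
grid it typically finds all of them):

    import itertools
    import numpy as np

    x = np.array([0.11, 0.23, 0.41, 0.67])        # a few fixed sample points
    a_grid = np.arange(0.01, 2000.0, 0.01)        # candidate values of the one parameter a

    # For every candidate a, record the +/- pattern of sign(sin(a * x)).
    patterns = set(map(tuple, np.sign(np.sin(np.outer(a_grid, x)))))
    all_labelings = set(itertools.product([-1.0, 1.0], repeat=len(x)))
    print(f"labelings realized: {len(all_labelings & patterns)} of {len(all_labelings)}")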

Another way to say it may be that Turing completeness is not a very sharp tool
to separate model classes.

I found this helpful:
[http://www.cs.cmu.edu/~lblum/PAPERS/TuringMeetsNewton.pdf](http://www.cs.cmu.edu/~lblum/PAPERS/TuringMeetsNewton.pdf)

------
TrackerFF
One will give you a grant (if you're lucky), the other will give you
multi-million-dollar funding.

------
darod
the difference is branding

------
dboreham
Everything's a function, right?

------
bigred100
From Michael Jordan’s reddit AMA

I personally don't make the distinction between statistics and machine
learning that your question seems predicated on.

Also I rarely find it useful to distinguish between theory and practice; their
interplay is already profound and will only increase as the systems and
problems we consider grow more complex.

Think of the engineering problem of building a bridge. There's a whole food
chain of ideas from physics through civil engineering that allow one to design
bridges, build them, give guarantees that they won't fall down under certain
conditions, tune them to specific settings, etc, etc. I suspect that there are
few people involved in this chain who don't make use of "theoretical concepts"
and "engineering know-how". It took decades (centuries really) for all of this
to develop.

Similarly, Maxwell's equations provide the theory behind electrical
engineering, but ideas like impedance matching came into focus as engineers
started to learn how to build pipelines and circuits. Those ideas are both
theoretical and practical.

We have a similar challenge---how do we take core inferential ideas and turn
them into engineering systems that can work under whatever requirements that
one has in mind (time, accuracy, cost, etc), that reflect assumptions that are
appropriate for the domain, that are clear on what inferences and what
decisions are to be made (does one want causes, predictions, variable
selection, model selection, ranking, A/B tests, etc, etc), can allow
interactions with humans (input of expert knowledge, visualization,
personalization, privacy, ethical issues, etc, etc), that scale, that are easy
to use and are robust. Indeed, with all due respect to bridge builders (and
rocket builders, etc), but I think that we have a domain here that is more
complex than any ever confronted in human society.

I don't know what to call the overall field that I have in mind here (it's
fine to use "data science" as a placeholder), but the main point is that most
people who I know who were trained in statistics or in machine learning
implicitly understood themselves as working in this overall field; they don't
say "I'm not interested in principles having to do with randomization in data
collection, or with how to merge data, or with uncertainty in my predictions,
or with evaluating models, or with visualization". Yes, they work on subsets
of the overall problem, but they're certainly aware of the overall problem.
Different collections of people (your "communities") often tend to have
different application domains in mind and that makes some of the details of
their current work look superficially different, but there's no actual
underlying intellectual distinction, and many of the seeming distinctions are
historical accidents.

I also must take issue with your phrase "methods more squarely in the realm of
machine learning". I have no idea what this means, or could possibly mean.
Throughout the eighties and nineties, it was striking how many times people
working within the "ML community" realized that their ideas had had a lengthy
pre-history in statistics. Decision trees, nearest neighbor, logistic
regression, kernels, PCA, canonical correlation, graphical models, K means and
discriminant analysis come to mind, and also many general methodological
principles (e.g., method of moments, which is having a mini-renaissance,
Bayesian inference methods of all kinds, M estimation, bootstrap,
cross-validation, EM, ROC, and of course stochastic gradient descent, whose
pre-history goes back to the 50s and beyond), and many many theoretical tools
(large deviations, concentrations, empirical processes, Bernstein-von Mises, U
statistics, etc). Of course, the "statistics community" was also not ever that
well defined, and while ideas such as Kalman filters, HMMs and factor analysis
originated outside of the "statistics community" narrowly defined, they were
absorbed within statistics because they're clearly about inference. Similarly,
layered neural networks can and should be viewed as nonparametric function
estimators, objects to be analyzed statistically.

In general, "statistics" refers in part to an analysis style---a statistician
is happy to analyze the performance of any system, e.g., a logic-based system,
if it takes in data that can be considered random and outputs decisions that
can be considered uncertain. A "statistical method" doesn't have to have any
probabilities in it per se. (Consider computing the median).

When Leo Breiman developed random forests, was he being a statistician or a
machine learner? When my colleagues and I developed latent Dirichlet
allocation, were we being statisticians or machine learners? Are the SVM and
boosting machine learning while logistic regression is statistics, even though
they're solving essentially the same optimization problems up to slightly
different shapes in a loss function? Why does anyone think that these are
meaningful distinctions?

I don't think that the "ML community" has developed many new inferential
principles---or many new optimization principles---but I do think that the
community has been exceedingly creative at taking existing ideas across many
fields, and mixing and matching them to solve problems in emerging problem
domains, and I think that the community has excelled at making creative use of
new computing architectures. I would view all of this as the proto emergence
of an engineering counterpart to the more purely theoretical investigations
that have classically taken place within statistics and optimization.

But one should definitely not equate statistics or optimization with theory
and machine learning with applications. The "statistics community" has also
been very applied, it's just that for historical reasons their collaborations
have tended to focus on science, medicine and policy rather than engineering.
The emergence of the "ML community" has (inter alia) helped to enlargen the
scope of "applied statistical inference". It has begun to break down some
barriers between engineering thinking (e.g., computer systems thinking) and
inferential thinking. And of course it has engendered new theoretical
questions.

I could go on (and on), but I'll stop there for now...

------
ianamartin
The difference is in how the output is evaluated.

An ML model is evaluated empirically. You compare to real world results and
get an accuracy measure. An ML model tells you what something _should_ be.
And if it’s a good model, you will get a pretty high frequency of that model
telling you what the thing is.

Statistics does something entirely different. It tells you what something
_could_ be.

If you flip a coin, an ML model will tell you if it’s heads or tails.
Statistics will tell you how often it will be heads or tails.

Another way to think about it is the difference between probability and
likelihood.

Probability is measured by your theoretical priors and hypotheses. Likelihood
is measured by the results of actual trials.

The probability of a fair coin landing on heads is .5

But the likelihood of that happening isn’t actually .5, because pure
frequentist probabilities depend on some fundamentally problematic things,
like performing an infinite number of trials.

The actual line between ML and statistics is really blurry because all useful
statistical models are at least a little Bayesian. Priors get updated with
each trial. This is essentially machine learning.
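
A tiny sketch of that "priors get updated with each trial" idea, using a Beta
prior on a coin's heads probability (the flips are made up):

    import numpy as np

    # Beta(1, 1) prior on the probability of heads, i.e. "no idea yet".
    alpha, beta = 1.0, 1.0

    flips = [1, 0, 1, 1, 0, 1, 1, 1]              # hypothetical observed flips (1 = heads)
    for outcome in flips:
        # Conjugate update: each trial nudges the posterior.
        alpha += outcome
        beta += 1 - outcome
        print(f"posterior mean P(heads) = {alpha / (alpha + beta):.3f}")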

Outside of mostly bad/soft sciences (sociology, psychology, neuroscience,
nutrition, and climatology are all pretty godawful about abusing classical
statistics) pure frequentist statistics don’t get used much because they are
really only useful for getting papers published and generating squawking
headlines.

Most useful statistical methods _are_ machine learning methods. Specifically,
they are applied Bayesian methods with weak, randomized priors. Which is
exactly what ML is.

I sound like I hate statistics. I don’t really. ML models can do a bunch of
wacky things. There isn’t a coherent theory behind an ML model. You could
point a very good classifier at your wife, and it might tell you [(bird,.1),
(apple,.3), (woman,.9)]

That’s not a realistic interpretation of what _could_ be. It just happened to
get that correct.

A really excellent ML model in 2016 could’ve given the following result for
president [(trump,.8), (obama,.6), (rock,.5)]. And after the fact when we can
compare it to what happened, it would seem accurate.

But that doesn’t tell us the range of possibilities in our future. The reality
in 2016 was that we weren’t going to elect a fucking rock as president. There
was no chance of that. There is zero chance that if I point my camera at my
girlfriend, she might actually be a potato. Yeah, the ML model might be right
because it has guessed right, and that’s often all we care about.

But if I need to know what my chances are of my girlfriend becoming my wife or
the mother of my children, that's where we need statistics. An ML model can’t
have those kinds of priors baked in unless you force it. And if you do that
you’re just paying someone to do some really expensive Bayesian regression.

~~~
AstralStorm
No, an ML model is not evaluated empirically at all. Measures like accuracy and
precision are not empirical. Neither is generalization as evaluated by using
some model data set. These measures are _statistical predictions_ that may or
may not be correct.

Failure rate in the wild is empirical. Accuracy in the wild, the same.

Since the designer does not have access to real data, they are actually not
working empirically at all and as such you get common overfitting and methods
that work well on model datasets but not on real data.

~~~
ianamartin
You know that data scientists have access to real data, right? It's not all
just kids doing tricks on kaggle? That we have the ability to actually measure
shit?

Do you think that all of this is a total joke? I mean, you're allowed to think
that. But, umm, yeah. Real people do real things with real data. Believe it or
not.

------
visarga
statistics - descriptive, ML - predictive

------
known
Machine Learning = Heuristics + Statistics

------
bitL
In short:

Machine Learning - making predictions

Statistics - distilling huge amounts of data into a few indicators

Machine Learning uses algorithms, optimization/operations research,
differential calculus, probability and statistics as it sees fit.

~~~
mr_toad
Statistics makes predictions too. Indicators are usually predictions - a mean
is called an _expected_ value for a reason.

I find the spectacle of machine-learning practitioners and statisticians
furiously using different language to agree with each other quite amusing.

