
Statistical Modeling: The Two Cultures (2001) [pdf] - michael_fine
https://projecteuclid.org/download/pdf_1/euclid.ss/1009213726
======
mturmon
This is a great little paper -- with comments and rejoinder! I enjoyed it so
much that I presented it to a reading group back when it appeared. Always
worth re-reading, because Breiman is such a hero of useful probabilistic
modeling and insight.

One should remember that it is a reflection of its time, and the dichotomy it
proposed has been softened over the years.

Another, more recent paper that re-examines some of these same trends in a
broader context is by David Donoho:

[https://courses.csail.mit.edu/18.337/2015/docs/50YearsDataSc...](https://courses.csail.mit.edu/18.337/2015/docs/50YearsDataScience.pdf)

Highly recommended. Pretty good HN comments at:

[https://news.ycombinator.com/item?id=10431617](https://news.ycombinator.com/item?id=10431617)

~~~
FabHK
I was not too impressed by the paper.

In particular, I think the dichotomy stems not from statisticians neglecting
what works, or having a narrow mindset, or whatever.

It seems to me that it stems from different goals:

* business seeks to predict and classify

* science seeks to test hypotheses

And statisticians used to focus on the latter, for which you need classical
statistics ("data modeling", or "generative modeling", as Donoho calls it),
don't you?

And for prediction and classification, sure, there are the classical
techniques (regression, time series (ARCH, GARCH, ...), Fisher's linear
discriminant), there are Bayesian methods, newer statistical stuff such as
SVM, and ML techniques such as random forests.

However, it's just driven by different objectives. As the commenters state,
Efron: 'Prediction is certainly an interesting subject but Leo [Breiman]’s
paper overstates both its role and our profession’s lack of interest in it.',
or Cox: 'Professor Breiman takes data as his starting point. I would prefer to
start with an issue, a question or a scientific hypothesis [...]', or Parzen:
'The two goals in analyzing data which Leo calls prediction and information I
prefer to describe as “management” and “science.” Management seeks profit,
practical answers (predictions) useful for decision making in the short run.
Science seeks truth, fundamental knowledge about nature which provides
understanding and control in the long run.'.

So, different objectives call for different methods. And, certainly 20 years
ago, statisticians were mostly focusing on one rather than the other. Ok. So?

~~~
laichzeit0
“Ok. So?” Well, Computer Science has been focusing mainly on the predictive
side (ML/AI), and you had a lot of intellectual whining that they’re
“re-inventing” statistics, just with different terminology. I’m not sure if
this is just an attempt to downplay their results or if it’s more academic
jealousy, because the funding goes to the “cool stuff” like AI/ML in the CS
dept. and the Stats dept. is seen as old and boring. That’s what it feels
like. You’ll even see this type of commentary in the preface of books like
All of Statistics.

No matter what comes from empirical research in Computer Science re:
prediction/classification methods, you’ll hear the Stats camp crying that
it’s “just Statistics” at the end of the day. Fair enough, but computational
Statistics was then neglected for long enough that computer scientists had to
create more powerful techniques independently, and they can claim priority on
that front. Theory lags practice in this area.

~~~
thousandautumns
> I’m not sure if this is just an attempt to downplay their results or if
> it’s more academic jealousy, because the funding goes to the “cool stuff”
> like AI/ML in the CS dept. and the Stats dept. is seen as old and boring.

No one is trying to downplay the legitimately impressive results of AI/ML.
Deep learning, convolutional neural networks and GANs have had incredible
success in fields like computer vision and image/speech recognition. But
outside of those areas the "results" of the current fads in AI/ML have been
_grossly_ overstated. You have academic computer scientists like Judea Pearl
decrying the "backward thinking" of statistics and championing a "causal
revolution", despite not actually doing anything revolutionary. You have
modern machine learning touted ad nauseam as a panacea for any predictive
problem, only for systematic reviews to show it doesn't actually outperform
traditional statistical methods [1]. And you have industry giants like IBM
and countless consulting companies promising AI solutions to every business
problem that turn out to be more style than substance, with "machine
learning" algorithms that are just regression.

There's a reason why AI research has gone through multiple winters, and why
another is looming. Those in AI/ML seem to be more prone and/or willing to
overpromise and underdeliver.

[1]
[https://www.sciencedirect.com/science/article/pii/S089543561...](https://www.sciencedirect.com/science/article/pii/S0895435618310813)

~~~
laichzeit0
I would also add to that list RNNs for time-series forecasting, especially
LSTMs. I would say those are a bit more than “just regression”.

------
ivan_ah
This is a great paper. Very long, but worth every bit of it. BTW, here is a
recent blog post about the paper:
[http://duboue.net/blog27.html](http://duboue.net/blog27.html)

One of the key insights I took away was the importance of using out-of-sample
predictive accuracy as a metric for regression tasks in statistics, just like
in ML. The standard best practice in STATS 101 is to compute the R^2
coefficient (based on the sample data), which is akin to reporting error
estimates on your training data (in-sample predictive accuracy).
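
To make the gap concrete, here is a toy sketch (mine, not from the paper;
scikit-learn and the synthetic data are illustrative assumptions) contrasting
in-sample R^2 with cross-validated, out-of-sample R^2:

```python
# Toy sketch: in-sample R^2 vs. out-of-sample (cross-validated) R^2.
# Synthetic data: 50 samples, 20 predictors, only one of which matters.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))        # many predictors, few samples
y = X[:, 0] + rng.normal(size=50)    # only the first predictor is real

model = LinearRegression().fit(X, y)
print("in-sample R^2:", model.score(X, y))  # optimistic: scored on training data
print("out-of-sample R^2:", cross_val_score(
    LinearRegression(), X, y, cv=5, scoring="r2").mean())  # honest estimate
```

The in-sample number comes out much better than the cross-validated one,
which is exactly the point about reporting error on your training data.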

IMHO, statistics is one of the most fascinating and useful fields of study,
with countless applications. If only we could easily tell apart what is
"legacy code" vs. what is fundamental... See this recent article
[https://www.gwern.net/Everything](https://www.gwern.net/Everything) that
points out the limitations of Null Hypothesis Significance Testing (NHST),
another one of the pillars of STATS 101.

~~~
anthony_doan
> The standard best practice in STATS 101 is to compute the R^2 coefficient
> (based on the sample data), which is akin to reporting error estimates on
> your training data (in-sample predictive accuracy).

That's not the best practice at all. It isn't even standard, because adjusted
R^2 exists to penalize coefficient cheating, i.e. cheating on your degrees of
freedom. And on top of that we have other penalized criteria besides R^2,
such as AIC and AICc.

All of them are just for comparison between similar types of models; they're
not supposed to be used as a generalization test on unseen data. We're taught
CV too, and it's a statistics invention. We're also taught about training
sets and test sets along with CV.

If you want to evaluate on test data you have to do CV anyway for imbalanced
data. So R^2 vs. out-of-sample accuracy is comparing apples to bananas.

You can see all of this in the book that applied statistics uses for linear
regression, Applied Linear Statistical Models by Kutner et al.

------
astazangasta
The title is a reference to this famous essay by C.P. Snow about a split
between the humanities and science:
[https://en.wikipedia.org/wiki/The_Two_Cultures](https://en.wikipedia.org/wiki/The_Two_Cultures)

------
incompatible
See also "50 years of Data Science" by David Donoho (2015), which discusses
the question of whether there's any difference between "statistics" and "data
science".

[http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataSci...](http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataScience.pdf)

------
stared
Two? Two?!

There is a classic post here
[https://news.ycombinator.com/item?id=10954508](https://news.ycombinator.com/item?id=10954508):

" The Geneticists: Use evolutionary principles to have a model organize itself
The Bayesians: Pick good priors and use Bayesian statistics

The Symbolists: Use top-down approaches to modeling cognition, using symbols
and hand-crafted features

The Conspirators: Hinton, Lecun, Bengio et al. End-to-end deep learning
without manual feature engineering

The Swiss School: Schmidhuber et al. LSTM's as a path to general AI.

The Russians: Use Support Vector Machines and its strong theoretical
foundation

The Competitors: Only care about performance and generalization robustness.
Not shy to build extremely slow and complex models.

The Speed Freaks: Care about fast convergence, simplicity, online learning,
ease of use, scalability.

The Tree Huggers: Use mostly tree-based models, like Random Forests and
Gradient Boosted Decision Trees

The Compressors: View cognition as compression. Compressed sensing,
approximate matrix factorization

The Kitchen-sinkers: View learning as brute-force computation. Throw lots of
feature transforms and random models and kernels at a problem

The Reinforcement learners: Look for feedback loops to add to the problem
definition. The environment of the model is important.

The Complexities: Use methods and approaches from physics, dynamical systems
and complexity/information theory.

The Theorists: Will not use a method, if there is no clear theory to explain
it

The Pragmatists: Will use an effective method, to show that there needs to be
a theory to explain it

The Cognitive Scientists: Build machine learning models to better understand
(human) cognition

The Doom-sayers: ML Practitioners who worry about the singularity and care
about beating human performance

The Socialists: View machine learning as a possible danger to society. Study
algorithmic bias.

The Engineers: Worry about implementation, pipe-line jungles, drift, data
quality.

The Combiners: Try to use the strengths of different approaches, while
eliminating their weaknesses.

The Pac Learners: Search for the best hypothesis that is both accurate and
computationally tractable. "

~~~
antipaul
This is too funny. Engineers focus on “pipeline jungles” lol!

------
thousandautumns
God I hate this paper. Perhaps it was relevant in its time, but that was 18
years ago. The dichotomy it describes between the "two cultures" isn't nearly
as pronounced today, if it even exists. There are few statisticians today who
adhere entirely to the "data modeling culture" as described by Breiman.

I'm surprised how often this paper continues to get trotted out. In my
experience it seems to be a favorite of non-statisticians who use it as
evidence that statistics is a dying dinosaur of a field, to be superseded by
X (usually machine learning). Perhaps they think that if it's repeated enough
it will be spoken into existence?

------
brian_spiering
Here is a previous discussion
[https://news.ycombinator.com/item?id=10635631](https://news.ycombinator.com/item?id=10635631)

------
michalrichards9
I am not an expert and am still reading thru the article, but why is it such
a strong dichotomy? Don't all predictive algorithms also assume a data model?
For example, don't hidden Markov models, by assuming constant transition
probabilities, make a data assumption?

To my ears (eyes?), this discussion resembles the transition from linear,
Euclidean geometry into the fractal realm.

~~~
anthony_doan
> I am not an expert and am still reading thru the article, but why is it
> such a strong dichotomy? Don't all predictive algorithms also assume a data
> model? For example, don't hidden Markov models, by assuming constant
> transition probabilities, make a data assumption?

I'm going to try my best to answer your question from my experience and
background. I am from the statistics school of thought, so please keep that
in mind regarding any bias.

I can give you an example of the different mentalities in applied math vs.
statistics with regard to modeling. Then I'll try to expand to machine
learning.

Take univariate time series data as an example. Applied math people will use
probability to try to model the process that creates the time series data. A
statistician won't care about modeling the process that creates the data;
he/she only cares about using all the information in the data to create a
model. A very clear example of this in time series modeling is residual
analysis, which checks whether the statistical model has used all the
information in the data.
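
To illustrate (a toy sketch of mine, using statsmodels; the simulated AR(1)
series is made up): if the model really has used all the information, the
residuals should look like white noise.

```python
# Fit an AR(1) model to a simulated AR(1) series, then check the residuals.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(3)
y = np.zeros(200)
for t in range(1, 200):              # simulate y_t = 0.7 * y_{t-1} + noise
    y[t] = 0.7 * y[t - 1] + rng.normal()

res = ARIMA(y, order=(1, 0, 0)).fit()
# Ljung-Box test on the residuals: a large p-value means no autocorrelation
# is left, i.e. the model has extracted the information in the data.
print(acorr_ljungbox(res.resid, lags=[10]))
```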

I know it sounds superficial, but it also drives how each field invents and
researches different models.

Now let's go to statistics vs. machine learning thinking. If you lurk in
/r/statistics you will see that many statisticians separate ML models from
statistical models (data models, in Dr. Breiman's terms) by confidence
intervals. A statistical model gives a confidence interval for its
prediction, on top of inference on the parameters and such. ML models do not
give a CI. The downside is that a prediction without a CI is worthless to a
statistician, because it doesn't tell us how good that prediction is.

Let's take linear regression as an example. Many non-statistics books will
give you equations for solving it via a least-squares cost function.
Statistics books give you that plus the MLE way of doing it. We also see that
every prediction is the expected value of a distribution (see here
[https://stats.stackexchange.com/questions/148803/how-does-
li...](https://stats.stackexchange.com/questions/148803/how-does-linear-
regression-use-the-normal-distribution)). And most non-statistics books
aren't going to give you that point of view.
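
To show what that buys you (a toy sketch of mine, using statsmodels; the data
is made up): an OLS fit can report interval estimates alongside each point
prediction, which is exactly what a bare point-prediction API lacks.

```python
# Linear regression with interval estimates for its predictions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + rng.normal(size=100)

fit = sm.OLS(y, sm.add_constant(x)).fit()
new = sm.add_constant(np.array([5.0, 7.5]))   # points to predict at
pred = fit.get_prediction(new)
# summary_frame reports the point prediction plus a CI for the mean and a
# (wider) prediction interval for a new observation.
print(pred.summary_frame(alpha=0.05))
```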

Another example is deep learning vs. PGMs or Bayes networks (hierarchical
modeling). From my experience, ML is more empirically driven than statistics.
~~~
michalrichards9
Thanks. I think you are furthering the point of prediction vs. modeling,
right? You can, after all, get a confidence rating for a model.

~~~
anthony_doan
> I think you are furthering the point of prediction vs. modeling, right?

Kinda. I just wanted to point out, in general, why models get categorized as
statistical vs. non-statistical. Certain areas of statistics do prediction,
aka forecasting, too (time series). It's just that statistical models give
CIs for their predictions and ML models usually don't give any; most of the
time they're created empirically and there's no theory to get a CI from.

As for "confidence rating", I have no idea what it is; I've tried Google but
couldn't find much. What field is this from?

------
basetop
If people liked this paper, I suggest reading "The Two Cultures" by C.P.
Snow, which is not as technical but is more expansive, cultural, and
philosophical.

------
nestorD
> Interpretability is a way of getting information. But a model does not have
> to be simple to provide reliable information about the relation between
> predictor and response variables; neither does it have to be a data model.
> The goal is not interpretability, but accurate information.

------
vowelless
As others have said, great paper by a great author. Must read.

------
FabHK
(2001)

