
Translating Between Statistics and Machine Learning - BillPollak
https://insights.sei.cmu.edu/sei_blog/2018/11/translating-between-statistics-and-machine-learning.html
======
dhairya
It's fascinating to see the differences in language around statistics across
disciplines. I used to work in a group where my colleagues came from different
backgrounds (one had a PhD in particle physics, another a PhD in stats, and
the third a PhD in economics). We were all hired around the same time. We
spent the first month learning how to communicate with each other about basic
statistical concepts so that we were on the same page.

~~~
otoburb
I wonder if the economics PhD had the most difficulty due to graph axis
inversion[1].

[1]
[https://en.wikipedia.org/wiki/Wikipedia:Reference_desk/Archi...](https://en.wikipedia.org/wiki/Wikipedia:Reference_desk/Archives/Humanities/2009_January_24#why_does_my_economics_book_put_the_price_as_a_Y_axis_when_the_quantity_is_a_function_of_it.3F)

~~~
dhairya
That didn't really come up. We were doing institutional research for a
university to support senior leaders. Most of our work entailed working with
transactional HR data, educational outcomes data, and other information
collected at the university to provide policy analysis and strategic
recommendations.

The stats and econ PhDs came from the same school, so they had more shared
vocabulary, but definitely thought about problems differently. The physics
colleague came from Europe and also thought about problems at a much different
scale. So a lot of the initial time was spent deriving proofs so that everyone
felt comfortable with different statistical methods used for analysis. The
data we worked with was often small scale, sparse, and not really IID. The
proofs all essentially converged, but it was interesting that different
language and assumptions were used depending on each person's distinct
disciplinary background.

Sorry for the vague speak. A lot of the work we did was confidential, so I
can't really talk specifics.

------
kgwgk
[http://statweb.stanford.edu/~tibs/stat315a/glossary.pdf](http://statweb.stanford.edu/~tibs/stat315a/glossary.pdf)

------
wodenokoto
A few others I've noticed:

    
    
        Statistics      | Machine Learning
        Dummy variable  | one-hot encoding
        Fitting a model | training a model
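
In code, the two names in the second row describe essentially the same transformation. A minimal sketch with made-up categories; one practical difference worth noting is that statisticians' dummy coding usually drops one reference level to avoid collinearity in a regression, while ML one-hot encoding typically keeps every column:

```python
# One-hot (dummy) encoding by hand: each category becomes a 0/1 column.
categories = ["red", "green", "blue"]
levels = sorted(set(categories))  # column order: blue, green, red

one_hot = [[1 if c == level else 0 for level in levels] for c in categories]
print(one_hot)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0]]

# Dummy coding in the stats sense would drop one reference column,
# e.g. keep only the "green" and "red" columns.
```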

~~~
nkozyra
I still hear "fitting a model," but usually it's just shortened to "training."

------
zwaps
I love how Machine Learning has just taken over Decision Theory/Game
Theory/Economics as well with Reinforcement Learning or rather "Inverse
Reinforcement Learning" - like just renaming utility to reward.

There seems to be tons of scholars who have become big names by taking results
and concepts from other fields, such as heuristic/approximate dynamic
optimization and game theory... Now I wonder: were these independent
discoveries, or did they just read the originals and not provide citations?

------
killjoywashere
I spent an impressive amount of time with a biostats PhD (who had veto power
on my IRB protocol) working this sort of stuff out. In the end it became clear
he really didn't care about the machine's training process at all; he was
only interested in validation as a way to get to comparative statistics for
evaluating the results of many machines against each other.

I drew many, many tables on the whiteboard that day.

------
misterdoubt
_X causes Y if surgical (or randomized controlled) manipulations in X are
correlated with changes in Y_

_X causes Y if it doesn't obviously not cause Y._

This seems to conveniently overlook the _decades_ of quantitative social
science built on "I controlled for a couple things, and p is less than .05, so
X causes Y."

But I'm not clear on the community context here. Is this just good-natured
ribbing?

~~~
daxat_staglatz
I do not think it overlooks that research: those papers generally assume
that once those things are controlled for, the remaining variation is as good
as random, and thus we indeed recover the causal effect of X on Y. Many
papers are probably _wrong_ about this, but they still use "causation" in the
first sense and not in the second.

------
currymj
I would like to know what statisticians mean by "nonparametric", and what
machine learning people mean by "nonparametric", because they seem to mean
two very different things.

~~~
bonoboTP
In ML: when your model is not defined by a fixed set of parameters, but the
number of "parameters" varies with the training data. For example,
k-nearest-neighbor classification requires storing the entire training set in
order to make predictions. Gaussian process regression and
Dirichlet process based clustering (mixture fitting) are other examples.
Linear regression on the other hand is parametric as the model is defined by a
fixed set of coefficients whose count does not depend on the number of
training examples/observations.

~~~
currymj
this is how I understand it. but then I’ve heard statisticians describe neural
networks as “nonparametric”, even though they typically have a fixed number of
parameters. (millions of parameters! arguably they are the MOST parametric.)

~~~
srean
A neural network in general is indeed nonparametric, because the number of
weights is not something fixed in advance but is learned from the data. If
they are considered fixed, as for example in logistic regression, then it's
considered parametric.

~~~
bonoboTP
The number of parameters in a neural net as used today, specifically in
computer vision, is basically never learned from the training data. I
actually cannot recall any practically used method that does that.

------
esalman
Another example, though not related to statistics: "convolution" in machine
learning is not exactly the same thing as in signal processing.
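
Concretely, the difference is the kernel flip: signal-processing convolution reverses the kernel before the sliding dot product, while the "convolution" in conv nets is actually cross-correlation (no flip). A minimal sketch with a toy 1-D signal and hypothetical values; for symmetric kernels the two coincide, which is why the distinction rarely matters in practice:

```python
# "Valid"-mode 1-D convolution vs. cross-correlation.
def convolve_valid(x, k):
    kf = k[::-1]  # signal processing flips the kernel
    return [sum(a * b for a, b in zip(x[i:i + len(k)], kf))
            for i in range(len(x) - len(k) + 1)]

def cross_correlate_valid(x, k):
    # what a conv layer actually computes: no flip
    return [sum(a * b for a, b in zip(x[i:i + len(k)], k))
            for i in range(len(x) - len(k) + 1)]

x, k = [1, 2, 3, 4], [1, 0, -1]
print(convolve_valid(x, k))         # [2, 2]
print(cross_correlate_valid(x, k))  # [-2, -2]
```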

------
visarga
> Statistics: regressions

vs.

> ML: supervised learners, machines

I think ML uses the term regression for situations where the output is a
numeric value (as opposed to a label), and supervised learning is more than
just regression. Regression models usually have a mean squared error loss
function; that's one way to spot them.
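
A minimal sketch of that spotting rule, with hypothetical numbers: mean squared error averages the squared gaps between numeric targets and predictions, which only makes sense when the output is a number rather than a label.

```python
# Mean squared error: the usual loss for regression (numeric targets).
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(mse([3.0, 5.0], [2.0, 7.0]))  # (1 + 4) / 2 = 2.5
```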

~~~
wodenokoto
Yes, in ML (and I believe for regular statistics as well) regression is when
you are trying to predict a number rather than a label. Predicting house
prices is a classical example.

An exception to this is "logistic regression", a name accepted by both the
stats and ML communities even though it predicts labels.

Supervised learning is whenever you have a target variable for the
observations you use to train/fit your model.

If you only have targets for some of your observations, it is called "semi-
supervised learning". In deep learning you also often talk about "pre-
training" your model, which often means adjusting weights in an unsupervised
way.

So you can have supervised regression as well as supervised label predictors.
These can also be semi-supervised.

------
iagovar
Oh jesus, this is super useful. I've had a hard time with the way ML people speak.

------
wUabkSG6L5Bfa5
I once interviewed at a biostats shop where the interviewer kept using the
word "responses" to refer to feature values. I could not pin him down on the
problem statement. Pretty sure that dumbass thinks I'm a dumbass.

~~~
ende
That doesn't make sense. By "response" he probably meant response variables,
as in the dependent variables. Features would be the independent variables (or
"predictors" in some stats/biostats circles).

~~~
wUabkSG6L5Bfa5
Precisely! Post hoc, I figured out that he must have meant responses to an
assay, which makes sense in context, but like, I would have expected someone
with any stats background whatsoever to be able to clarify.

