An oversimplified version may be:
Statistics focuses on fitting data to formally understandable models, whereas data science focuses on solving problems -- even if that means using techniques that aren't formally understood.
Leo Breiman is also known for pioneering random forests and bootstrap aggregation.
I think of machine learning as a subset of data science where you trick linear algebra into thinking.
I've sat through lengthy discussions of machine learning exercises, and could not silence the voice in my head, saying: "This is just curve fitting." Fitting data to an arbitrary curve, and then extrapolating the fitting function, is as old as the hills.
Understanding modern ML algebra is really "general relativity hard" if not actually harder. Spiking NNs are "quantum physics hard".
The math is very much translatable between these domains.
On the other hand, it can work indefinitely if it solves a problem well enough and it is used always on the same kind of problem.
Sure, you will need to understand it better to apply it to new markets, but often that kind of research is outside the scope of a single business, and they don't need it.
Do you know when it doesn't work?
Business does not care about rigor, but then terrible things happen when your face recognition system happens to not work for people with dark skin tone.
Even more terrible things happen when it detects cats as people, and worse once it's used to link data into a police database.
Even pretty bad things happen when it's used for detecting potential (treated as absolute) copyright violations.
Security holes when it's used for detecting security problems.
Loss of business when it's used for ticket prioritization.
The business world does not care that it sells a broken solution, as long as it's not obvious and someone has been paid. Everyone else pays for the failures. And you cannot really sue an ML system.
Why don't we do active learning? Or adversarial networks, or....
Sure it is easy to fail in machine learning. But it doesn't mean that we need to understand how and what our models learn. That's the only big advantage of machine learning: the model learns, so I don't have to.
It's like saying software has been tested completely when we really should be saying it's passed a subset of test cases out of the complete set of possible scenarios.
The only way we can make that claim is if we can interpret the model in a mathematical proof.
That's not my experience at all. What kind of ML requires something beyond basic linear algebra? Being comfortable manipulating matrices is certainly harder than plugging data into an sklearn function, but it's also significantly easier than understanding general relativity.
I think it is not a good comparison. A small board calls for simpler algorithms: just try out all possible games and choose the most favorable one. For larger boards this simple approach doesn't work any more and you start looking for something different. But the architecture of neural networks doesn't change at all if you move from small to large scales. Only the number and size of layers increases.
Edit: actually, that's the reason they call it "deep learning". It has more layers.
(Gated and threshold units.)
> What’s the difference between statistics and machine learning?
> About $50k a year.
Now let’s go back to the system admin dungeon stereotype.
By contrast, machine learning is primarily concerned with making the best prediction possible, even if that means sacrificing an understanding of the underlying mechanisms.
The Venn diagram of methods can have a fair degree of overlap.
Neither disposition is right or wrong, but they tend to have natural places where each makes more sense. If you’re trying to predict whether a picture is of a cat or a dog, you probably don’t care much about the constituent contribution of factors to one picture's dogness or catness. On the other hand, if you’re trying to predict traffic collisions based on characteristics of a roadway, you’re probably less concerned with the predicted number of crashes and more concerned with the relative contribution of a handful of independent variables.
what? Then we should call it a machine oracle. It uses magic to make the right predictions without any understanding. It's like seeing a strong AI passing the Turing test and saying: it was just lucky. I can think of only two possible ways to make right predictions without understanding: luck and cheating. Sure, sometimes ML cheats.
statistics is about explaining an observed outcome in terms of its causing factors.
Looking at this through even the lens of multinomial logistic regression, or of econometrics generally, I don't think that "statistics draws population inferences from a sample, while machine learning finds generalizable predictive patterns" even makes sense as a difference. Any prediction is an inference about the population of future events, or of contemporary events not present in the sample. You plug 1000s of events each described by 100 columns into a logistic regression, and you're hoping to get something predictive out of it. Further, as nice as the idea is that you can tease out "factors" from your 100 columns, you don't have to look at "3.3375905e-5 x (spent five years before age 18 in a smoker's home)" for very long to wonder how much 'explanation' you're getting out of the terms in the exponents of your probability functions.
I still can't resist tweaking ML enthusiasts and data scientists: Statistics is what people who know what they're doing are doing. Machine learning is the rest!
> There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools.
The difference between inference and prediction can be illustrated with something like a decision tree or a random forest. Statistical inference is "the theory, methods, and practice of forming judgments about the parameters of a population and the reliability of statistical relationships, typically on the basis of random sampling." If you look at something like linear regression, it makes a lot of assumptions about the data: the distribution of residual errors is normal, there's no multicollinearity, etc.
Random forests don't care. A random forest is a set of steps. You follow the steps, you get an answer. You made no assumptions about your data, about the distribution of it, any of it. You just followed an algorithm.
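As a sketch of "just follow the steps" (an illustration on simulated data, assuming scikit-learn and NumPy are available; the data-generating process is made up):

```python
# A random forest asks nothing about the data's distribution:
# feed it data, follow the algorithm, get an answer.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))
# A nonlinear target with skewed noise -- classic OLS assumptions do not hold.
y = np.sin(X[:, 0]) * X[:, 1] + rng.exponential(0.1, size=500)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)
preds = forest.predict(X)
print(preds.shape)  # (500,)
```

No normality check, no multicollinearity diagnostics, no residual plots required to run it; whether you *should* trust the answer is a separate question, as the replies below argue.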
Algorithms are powerful! Not everything needs to be an inference problem. We're better off when we have lots of tools. You don't have to choose _between_ the two camps. But the "Everything is Statistics!" and the "Everything is Machine Learning!" points of view rob us of ways of thinking about our tools that helps us understand what those tools are.
> You made no assumptions about your data, about the distribution of it, any of it. You just followed an algorithm.
You made implicit assumptions that you are now unaware of, which might come back and bite you later (e.g.: using zip codes or names as a proxy for race in credit scoring models, neural networks overfitting to texture and classifying a leopard-print couch as a leopard). This means that you are liable to overfit to the data, and generalize poorly, or in ways you are not supposed to.
This is what leads to the perspective that "machine learning" might end up being a powerful tool for laundering bias: https://idlewords.com/talks/sase_panel.htm
Yes, ZIP codes can be a proxy for race in models dealing with credit scores (and recidivism, which is also a really bad place to put racial bias), as an example. But if I put it in a mixed-effects model, it shows the same bias, and a mixed-effects model is just an extended version of linear regression. Both statistical and ML models suffer from the problem you're stating.
What you have not made any assumptions about in a random forest is the distribution of the data you're looking at. One example of a case where the assumptions that bog-standard OLS makes about your data can cause you problems is zero-dominated data -- data with a lot of zeros in it. Basically any time you're trying to make predictions about things that are rare in your measured population.
OLS does a bad job on zero-dominated data. If you throw a zero-dominated dataset into a random forest, you will get back better answers than if you use OLS on zero-dominated data.
To be clear: there are strategies for dealing with zero-dominated data using statistical inference. You don't have to resort to non-inferential learning just because you have data that doesn't look like a bell curve. But machine learning is a powerful way to get pretty good results on a lot of problem spaces without having to understand the probability function involved (or in cases where the probability function is too complicated to be tractable computationally, like the probability function that determines the color of pixels in a dataset where you're classifying dogs versus cats).
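A contrived sketch of the zero-dominated point, with simulated data (assuming scikit-learn; the data-generating process, thresholds, and coefficients are all invented for illustration):

```python
# OLS vs. a random forest on zero-dominated data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000
X = rng.uniform(0, 10, size=(n, 2))
# ~80% of outcomes are exactly zero; the rest grow nonlinearly with X[:, 0].
active = X[:, 1] < 2.0
y = np.where(active, np.exp(0.3 * X[:, 0]) + rng.normal(0, 0.5, n), 0.0)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
mse_ols = np.mean((LinearRegression().fit(X_tr, y_tr).predict(X_te) - y_te) ** 2)
mse_rf = np.mean(
    (RandomForestRegressor(n_estimators=200, random_state=0)
     .fit(X_tr, y_tr).predict(X_te) - y_te) ** 2
)
print(mse_ols, mse_rf)  # the forest handles the mass of zeros far better here
```

The forest can carve out the zero region with a split; a single linear surface cannot, so its held-out error is dominated by bias. A statistician would instead reach for a hurdle or zero-inflated model, per the "strategies" mentioned above.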
The NFL theorems essentially state that for every dataset on which an algorithm generalizes well, there is another on which it generalizes poorly. So, there are always implicit biases, even for a random forest model (eg: if you try to model Boolean functions with a random forest, those functions which can be approximated effectively with few trees will form a set of very small measure. Specifically, it is my intuition that those which have many terms in the sum of products form might need many trees to approximate). It then becomes a question of whether the bias of your models are compatible with the dataset/domain under consideration.
See this Minsky-Sussman koan: http://www.catb.org/jargon/html/koans.html#id3141241
> Both statistical and ML models suffer from the problem you're stating.
And that is precisely why I don’t see how ML algorithms are “more powerful” than statistics in any way, as per your claim.
Well I'm glad you don't see that, but I'm a bit confused why you think I said it because I didn't say it and don't believe it. I said that ML is "powerful," not "more powerful."
Such that optimizing some other function based on the application of a decision tree is effectively a statistical question. A natural one.
Indeed, I'm not sure I see the difference. One is just explainable based on understandings of distributions and trusting that they hold. One is explainable based on understanding of decisions, and trusting those remain the best decisions.
I used my bike ride as an example earlier. It is made up of many decisions to get home. Knowing what all of those decisions are can help build up a solid range of when I'm likely to get home. For the level of accuracy I typically care about, so does just knowing the rough distribution of how long I typically take.
How are both of those not statistical in nature? One is just more fully exploring a space with ridiculous computational power, whereas the other is generalizing to a much quicker answer. Right?
Powerful tools like nonlinear polynomial models, OLS, HMM and RBMs were implemented and devised by statisticians. And not from tribe 2. The difference is the data model is explicit but general.
Then we go back and perturb the model to do some hand wavey guessing at real actionable insights
but hey clients are impressed by overfitting so who the fuck cares as long as the money is coming in and the press releases are going out!!!!
>In science one tries to tell people, in such a way as to be understood by everyone, something that no one ever knew before. But in poetry, it's the exact opposite.
statisticians take pride in models that are understandable, while machine learning practitioners take pride in models that have high accuracy.
And since we cannot define exactly what this value is (accuracy, RMSE, bias or a specific confusion matrix), we have to make a more abstract definition.
We are now left with a question. Is this abstract definition quantifiable? If so, we can still exclude the statistician w.r.t. understandability. However, if we allow qualitative value as well, the traditional statistician is back at the table.
Now, a further refinement needs to be made. Since fairness, accountability and transparency in Data Science are in the limelight, we can make a point that our ability to understand is a key metric for Machine Learning models as well. It is interesting to see the tendency to associate words or phrases to key neurons in embeddings. For example, in embeddings of faces, we can associate gender, ethnicity, fatness and gaze with some of the embedded neurons.
The primary goal of machine learning is to help a machine make better decisions. As long as it gets the "right" answer, the explanations to humans are not as important.
I do like the rest of the point you make, and it seems to match OP. Inference vs otherwise. Helping people learn/act/decide based on data data vs helping machines learn/act/decide based on data.
'A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.'
Merriam Webster's definition of statistics:
'a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data'
So establish a couple of definitions, and take it from there.
I believe such discussions can easily be taken too far: surely the point of a subject is to group together related topics. Who is to say that the set of topics comprising multiple subjects need to be disjoint?
So do we need an explicit taxonomy to say that one application is machine learning and another is statistics? Or does the form not matter as much as the function?
Pace the OP, who I'm sure didn't do this with this end in mind, but a lot of the cases I've seen of people trying to bring this up come down to wanting to self-identify in a certain way as opposed to actually talking about the subject at hand.
For example, consider the work that the FDA does in evaluating clinical drug trial data. In this case, you need statisticians on staff that can critically analyze the statistical/methodological rigor, not to mention understand the medical domain and relevant regulations/processes. It doesn’t matter if they can only use the GUI versions of SAS and have never heard of TensorFlow, their time is better spent deeply understanding nuances of experimental design, sampling methods, causal analysis, estimation, etc. I would argue that a “good” statistician in this context might very well be “not so good” at data analysis, as long as they can clearly and accurately critique experiments and analyses and identify when things go wrong. Also consider—what would happen if you slotted a ML engineer into a role like this?
ML engineers are best at optimizing performance (not just accuracy) at some defined task, where ideally the cost of the model being wrong in some exotic way is not astronomical (e.g. approving a dangerous drug, convicting an innocent person, initiating a stock crash, corrupting the attention span of the human race). A good use case for machine learning comes from the book Pattern Recognition—sorting fish on a conveyor belt. No one gives an ounce of chum about statistical rigor in this case, and the consequences of being inaccurate are quantifiable and manageable. But building and tuning a ML system to do this at an acceptable performance level requires a ton of work. You’d need to define the sensor array / inputs, collect and label data, engineer features (more relevant before deep learning, in the case of computer vision), train and evaluate an object detection model, avoid overfitting the training data, build an anomaly detector to weed out the stray crab, ensure it works fast enough to be used in production, turbo-browse arXiv to make sure your stuff isn’t obsolete (damn, it is). This is a vastly different focus than statisticians, who wouldn’t get very far with this sort of performance-optimization problem.
Data scientists are somewhere in between a statistician and ML engineer—like a hybrid class in an RPG. They can cast a few key ML spells and have dabbled in arcane math, but can also slice up data goblins by hand (maybe even with style). With a foundation in a bit of everything, they can glue a team together and flexibly grow in multiple directions. But they can also become a waste of shiny gold if pitted against a specialized task without support from, you know, specialists.
E.g., in regression analysis, the assumptions are (i) there really is a linear model in the variables to be used; (ii) typically the data will not fit the model exactly, and instead there are errors; (iii) the errors are, in the probabilistic sense, independent and identically distributed, with a Gaussian distribution of mean zero. Then estimate the regression coefficients and get, say, an F-ratio statistic on the fit and t-tests on the coefficients. For details see, say, Mood, Graybill, and Boes, or Morrison, or Draper and Smith, or Rao, etc. If you need detailed references, request them here, but these are all classic references going back decades. In that case, maybe some of current machine learning does qualify as statistics.
Here regression was just an example, however, apparently especially close to much of current machine learning. But there is no end of (i) having data, (ii) having some probabilistic assumptions, (iii) manipulating the data, (iv) getting results, and (v) proving some theorems about the probabilistic properties of the results.
Might argue that in machine learning the results of the training data on the model yields some estimates of some probabilistic properties that can be used to make probabilistic statements about the results of the trained model on future real data.
In ML one wouldn't care much about recovering the parameters. The results/theorems of interest would be that with large enough samples the predictions and the new data will converge (according to some interesting mode of convergence). If this comes at a cost of doing poorly in terms of parameter recovery, ML wouldn't be bothered.
According to ML, the epicycle-based geocentric model of planetary motion would be perfectly acceptable.
Now look at the difference in the actual average for A and B. If this is out in a tail of the empirical distribution, then we reject the null hypothesis that the two teachers are equally good.
To prove theorems about this non-parametric, distribution-free, resampling, two-sample test, one will likely need at least an independence assumption, and likely an i.i.d. (independent, identically distributed) assumption. Else maybe each student of teacher B is an older sibling of a student of teacher A!
In a nutshell, what's going on in statistical hypothesis testing is that we make the null hypothesis, and that gives us enough assumptions, e.g., i.i.d., to calculate the probability of our calculated test statistic, e.g., the difference in the two averages, being way out in a tail. Without some such null hypothesis assumptions, we have no basis on which to reject anything, are not testing anything.
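The two-teacher test sketched above could be coded roughly like this (the scores are invented for illustration):

```python
# Permutation (resampling) two-sample test: under the null hypothesis
# that the teachers are equally good, the A/B labels are exchangeable,
# so we shuffle the pooled scores and see how extreme the observed
# difference in averages is.
import numpy as np

rng = np.random.default_rng(0)
a = np.array([72, 85, 90, 68, 77, 81, 94, 70])  # teacher A's students
b = np.array([88, 92, 79, 95, 84, 91, 87, 96])  # teacher B's students

observed = b.mean() - a.mean()
pooled = np.concatenate([a, b])

count = 0
n_resamples = 10_000
for _ in range(n_resamples):
    rng.shuffle(pooled)
    diff = pooled[len(a):].mean() - pooled[:len(a)].mean()
    if diff >= observed:
        count += 1

p_value = count / n_resamples
print(p_value)  # small p => reject "the two teachers are equally good"
```

Note that even this "assumption-light" test leans on the i.i.d./exchangeability assumption from the comment above; the code just makes that explicit.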
There's a chance of getting all twisted out of shape philosophically here: I outlined one statistical hypothesis test for the two teachers A and B. Okay, now consider ALL reasonably relevant hypothesis tests: maybe on the test I outlined the two teachers look very different, not equal, maybe teacher B better. But among ALL those hypothesis tests, maybe in one of them the two teachers look equally good, or even teacher A looks better. Now what do we do? That is, there is a suspicion that teacher B looked better ONLY because of the particular test we chose. Maybe there has been some research to clean up this issue.
This means, more precisely, statistics only works on data series that obey the law of large numbers, and combining/using 2 or more statistical predictions nearly always requires total independence of the predictions and their inputs, which is never the case (for one thing, they always occur on the same planet). Furthermore, the reason people make statistical predictions is to change the outcome, but doing anything to change the outcome always invalidates the statistical method used to collect the data. There are a couple of things statistics never does. Used correctly, it can never predict extreme values. It can never correctly predict values in systems that are too complex, where too many independent variables determine the outcome. And "too many" is something like 50 to 500. It can never correctly be used to verify if a deliberate change worked.
Machine learning is much the same, except it never provides mathematically valid predictions.
Despite this, it should probably be mentioned that both do provide useful results, occasionally getting things very, very wrong.
For the record, I don't consider those intellectually separate fields, but do accept them to be culturally separate, for better or worse...
But then I could never understand why physics, biology or chemistry were considered separate fields either...
Or psychology, economics, philosophy, etc etc etc...
Machine learning, on the other hand, can be extremely "sensitive" to underlying data -- it deals with high numbers of dimensions/features on potentially extremely sparse data sets, and a small change in one of them could result in a radically different classification. The potential accuracy is far, far higher, but so is the risk of overfitting.
Or: statistics looks for ranges, machine learning looks for patterns.
I think to me that's the case. But also, Machine Learning describes a computer science goal, to have a computer that learns certain or ideally all parts of its algorithm on its own, either from example or by experimentation.
It just so happens that some of the ideas from statistics lend themselves to help computer science implement such machines that can learn. One can imagine techniques for Machine Learning being discovered in the future which leverage ideas from other fields apart from statistics.
Long story short, because of the power of real numbers — with each one encoding an infinite sequence of bits — it’s not clear that even “simple” computations important to elementary statistical models, like exponentiating, have low computational complexity. Computing, say, “e” can be done by a Turing machine, but it’s nontrivial.
Another observation along the same lines is that some simple statistical models turn out to be really complex, like "find a such that the model
y = sin(a x)
fits the data".
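A quick illustration of why that model is awkward: the squared-error loss in the single parameter a is wildly non-convex, so local search can stall far from the truth (simulated data; the true a = 7 is an arbitrary choice):

```python
# Scan the squared-error loss of y = sin(a*x) over a grid of a values.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
true_a = 7.0
y = np.sin(true_a * x)

a_grid = np.linspace(0, 12, 2401)
losses = np.array([np.mean((np.sin(a * x) - y) ** 2) for a in a_grid])

best_a = a_grid[np.argmin(losses)]
# Count interior grid points that are lower than both neighbors:
n_local_minima = int(np.sum(
    (losses[1:-1] < losses[:-2]) & (losses[1:-1] < losses[2:])
))
print(best_a, n_local_minima)  # grid search finds a ~ 7; gradient descent
                               # would face dozens of local minima
```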
Another way to say it may be that Turing completeness is not a very sharp tool to separate model classes.
I found this helpful: http://www.cs.cmu.edu/~lblum/PAPERS/TuringMeetsNewton.pdf
It does seem true that statistics is more suited to describing analytical models. But that seems as much a quirk of history as a foregone conclusion.
As an easy example, I can give you the statistics of my bike ride. Such that you can get a pretty solid understanding of what my next week's worth of rides will be like, per the parameters used in the description. This does basically nothing to help you build a bike. Or make the ride yourself. So too would a statistical model of an RNN.
And indeed, this is no different than a statistical model for how often humans will make mistakes in any process. Or a model for how many students will successfully learn a topic.
I personally don't make the distinction between statistics and machine learning that your question seems predicated on.
Also I rarely find it useful to distinguish between theory and practice; their interplay is already profound and will only increase as the systems and problems we consider grow more complex.
Think of the engineering problem of building a bridge. There's a whole food chain of ideas from physics through civil engineering that allow one to design bridges, build them, give guarantees that they won't fall down under certain conditions, tune them to specific settings, etc, etc. I suspect that there are few people involved in this chain who don't make use of "theoretical concepts" and "engineering know-how". It took decades (centuries really) for all of this to develop.
Similarly, Maxwell's equations provide the theory behind electrical engineering, but ideas like impedance matching came into focus as engineers started to learn how to build pipelines and circuits. Those ideas are both theoretical and practical.
We have a similar challenge---how do we take core inferential ideas and turn them into engineering systems that can work under whatever requirements one has in mind (time, accuracy, cost, etc), that reflect assumptions that are appropriate for the domain, that are clear on what inferences and what decisions are to be made (does one want causes, predictions, variable selection, model selection, ranking, A/B tests, etc, etc), can allow interactions with humans (input of expert knowledge, visualization, personalization, privacy, ethical issues, etc, etc), that scale, that are easy to use and are robust. Indeed, with all due respect to bridge builders (and rocket builders, etc), I think that we have a domain here that is more complex than any ever confronted in human society.
I don't know what to call the overall field that I have in mind here (it's fine to use "data science" as a placeholder), but the main point is that most people who I know who were trained in statistics or in machine learning implicitly understood themselves as working in this overall field; they don't say "I'm not interested in principles having to do with randomization in data collection, or with how to merge data, or with uncertainty in my predictions, or with evaluating models, or with visualization". Yes, they work on subsets of the overall problem, but they're certainly aware of the overall problem. Different collections of people (your "communities") often tend to have different application domains in mind and that makes some of the details of their current work look superficially different, but there's no actual underlying intellectual distinction, and many of the seeming distinctions are historical accidents.
I also must take issue with your phrase "methods more squarely in the realm of machine learning". I have no idea what this means, or could possibly mean. Throughout the eighties and nineties, it was striking how many times people working within the "ML community" realized that their ideas had had a lengthy pre-history in statistics. Decision trees, nearest neighbor, logistic regression, kernels, PCA, canonical correlation, graphical models, K means and discriminant analysis come to mind, and also many general methodological principles (e.g., method of moments, which is having a mini-renaissance, Bayesian inference methods of all kinds, M estimation, bootstrap, cross-validation, EM, ROC, and of course stochastic gradient descent, whose pre-history goes back to the 50s and beyond), and many many theoretical tools (large deviations, concentrations, empirical processes, Bernstein-von Mises, U statistics, etc). Of course, the "statistics community" was also not ever that well defined, and while ideas such as Kalman filters, HMMs and factor analysis originated outside of the "statistics community" narrowly defined, they were absorbed within statistics because they're clearly about inference. Similarly, layered neural networks can and should be viewed as nonparametric function estimators, objects to be analyzed statistically.
In general, "statistics" refers in part to an analysis style---a statistician is happy to analyze the performance of any system, e.g., a logic-based system, if it takes in data that can be considered random and outputs decisions that can be considered uncertain. A "statistical method" doesn't have to have any probabilities in it per se. (Consider computing the median).
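For instance, the median is just an algorithm on the data, with no probabilities inside it, and yet a statistician can happily analyze its behavior, e.g. with a bootstrap (the lognormal sample here is made up for illustration):

```python
# Computing a median involves no probabilities; attaching uncertainty
# to it via resampling is the statistician's "analysis style".
import numpy as np

rng = np.random.default_rng(0)
sample = rng.lognormal(mean=0.0, sigma=1.0, size=300)

point_estimate = np.median(sample)

boot_medians = np.array([
    np.median(rng.choice(sample, size=sample.size, replace=True))
    for _ in range(2000)
])
lo, hi = np.percentile(boot_medians, [2.5, 97.5])
print(point_estimate, (lo, hi))  # the median plus a bootstrap 95% interval
```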
When Leo Breiman developed random forests, was he being a statistician or a machine learner? When my colleagues and I developed latent Dirichlet allocation, were we being statisticians or machine learners? Are the SVM and boosting machine learning while logistic regression is statistics, even though they're solving essentially the same optimization problems up to slightly different shapes in a loss function? Why does anyone think that these are meaningful distinctions?
I don't think that the "ML community" has developed many new inferential principles---or many new optimization principles---but I do think that the community has been exceedingly creative at taking existing ideas across many fields, and mixing and matching them to solve problems in emerging problem domains, and I think that the community has excelled at making creative use of new computing architectures. I would view all of this as the proto emergence of an engineering counterpart to the more purely theoretical investigations that have classically taken place within statistics and optimization.
But one definitely shouldn't equate statistics or optimization with theory and machine learning with applications. The "statistics community" has also been very applied; it's just that for historical reasons their collaborations have tended to focus on science, medicine and policy rather than engineering. The emergence of the "ML community" has (inter alia) helped to enlarge the scope of "applied statistical inference". It has begun to break down some barriers between engineering thinking (e.g., computer systems thinking) and inferential thinking. And of course it has engendered new theoretical questions.
I could go on (and on), but I'll stop there for now...
An ML model is evaluated empirically. You compare to real world results and get an accuracy measure. An ML model tells you what something should be. And if it’s a good model, you will get a pretty high frequency of that model telling you what the thing is.
Statistics does something entirely different. It tells you what something could be.
If you flip a coin, an ML model will tell you if it’s heads or tails. Statistics will tell you how often it will be heads or tails.
Another way to think about it is the difference between probability and likelihood.
Probability is measured by your theoretical priors and hypotheses. Likelihood is measured by the results of actual trials.
The probability of a fair coin landing on heads is .5
But the likelihood of that happening isn’t actually .5, because pure frequentist probabilities depend on some fundamentally problematic things, like presupposing an infinite number of trials.
The actual line between ML and statistics is really blurry because all useful statistical models are at least a little Bayesian. Priors get updated with each trial. This is essentially machine learning.
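A minimal sketch of "priors get updated with each trial", using conjugate Beta-Binomial updating for a coin (the true bias of 0.7 and the flip count are invented):

```python
# Start from a weak prior and update it one flip at a time.
import numpy as np

rng = np.random.default_rng(0)
true_p = 0.7
flips = rng.random(500) < true_p

# Weak prior: Beta(1, 1), i.e. uniform over [0, 1].
alpha, beta = 1.0, 1.0
for heads in flips:
    if heads:
        alpha += 1
    else:
        beta += 1

posterior_mean = alpha / (alpha + beta)
print(posterior_mean)  # drifts toward the true 0.7 as trials accumulate
```

Each flip nudges the posterior; in that loose sense the "model" is learning from data, which is the parent's point about the blurry line.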
Outside of mostly bad/soft sciences (sociology, psychology, neuroscience, nutrition, and climatology are all pretty godawful about abusing classical statistics) pure frequentist statistics don’t get used much because they are really only useful for getting papers published and generating squawking headlines.
Most useful statistical methods are machine learning methods. Specifically, they are applied Bayesian methods with weak, randomized priors. Which is exactly what ML is.
I sound like I hate statistics. I don’t really. ML models can do a bunch of wacky things. There isn’t a coherent theory behind an ML model. You could point a very good classifier at your wife, and it might tell you [(bird,.1), (apple,.3), (woman,.9)]
That’s not a realistic interpretation of what could be. It just happened to get that correct.
A really excellent ML model in 2016 could’ve given the following result for president [(trump,.8), (obama,.6), (rock,.5)]. And after the fact when we can compare it to what happened, it would seem accurate.
But that doesn’t tell us the range of possibilities in our future. The reality in 2016 was that we weren’t going to elect a fucking rock as president. There was no chance of that. There is zero chance that if I point my camera at my girlfriend, she might actually be a potato. Yeah, the ML model might be right because it has guessed right, and that’s often all we care about.
But if I need to know my chances of my girlfriend becoming my wife, or the mother of my children, that's where we need statistics. An ML model can't have those kinds of priors baked in unless you force it. And if you do that, you're just paying someone to do some really expensive Bayesian regression.
Failure rate in the wild is empirical. Accuracy in the wild, the same.
Since the designer does not have access to real data, they are actually not working empirically at all and as such you get common overfitting and methods that work well on model datasets but not on real data.
Do you think that all of this is a total joke? I mean, you're allowed to think that. But, umm, yeah. Real people do real things with real data. Believe it or not.
Machine Learning - making predictions
Statistics - distilling huge amount of data into a few indicators
Machine Learning uses algorithms, optimization/operations research, differential calculus, probability and statistics as it sees fit.
I find the spectacle of machine-learning practitioners and statisticians furiously using different language to agree with each other quite amusing.