"In 2001, the renegade statistician Leo Breiman described 'two cultures' of using statistical models 'to reach conclusions from data.' The first 'assumes that the data are generated by a given stochastic data model.' The second 'uses algorithmic models and treats the data mechanisms as unknown.' This essay sketches the diverse sources of this second algorithmic culture, one stemming more from an engineering culture of predictive utility than from a scientific culture of truth."
As an historian, I was also interested in the part of the conclusion that drew a parallel between recent conceptual shifts among historians of science and data scientists.* Plus I just think it's cool that historians are now starting to really dig into the intellectual history of machine learning in the same way that they study, say, the history of biology or physics.
*This passage: "For at least three decades, professional historians of science have pushed against a vision of science modeled on theoretical physics; we now celebrate the diverse forms of knowledge focused on the careful empirical study of particular things. Exponents of data-focused computational science have a surprisingly similar evolution. Just as history of science embraced the study of the particular as it disconnected from a Cold War prioritizing of theory, the data sciences moved beyond the aggregates of mathematical statistics to draw upon granular data sets to characterize particular things—individual people, diseases, films. In a key manifesto celebrating the 'unreasonable effectiveness of data,' three Google researchers argued in terms that echo humanist denunciations of reductionist knowledge: “sciences that involve human beings rather than elementary particles have proven more resistant to elegant mathematics.” Something else is needed. 'Perhaps when it comes to natural language processing and related fields, we’re doomed to complex theories that will never have the elegance of physics equations. But if that’s so, we should...embrace complexity and make use of the best ally we have: the unreasonable effectiveness of data.'"