Hacker News
Computational and Inferential Thinking – The Foundations of Data Science (inferentialthinking.com)
161 points by gkst on Aug 13, 2016 | 12 comments



You can download this as .pdf, .epub, or .mobi here: https://www.gitbook.com/book/ds8/textbook/details


Can someone recommend some books that would pair nicely with this? Or maybe the next two books to read after it?


"Data Smart" by Foreman

"Intro to Statistical Learning" by James, Witten, Hastie, Tibshirani

"Elements of Statistical Learning" by Hastie, Tibshirani

The first emphasizes the application of basic concepts to the practice of data science, mostly using English and Excel. The second is more mathematical yet quite applied, again focusing on DS tasks, but using R. The third is more mathematical still, but very much in the same mold as the second. The second and third are free ebooks.


David MacKay: Information Theory, Inference, and Learning Algorithms

Kevin Murphy: Machine Learning: A Probabilistic Perspective

Christopher Bishop: Pattern Recognition and Machine Learning


These three are all excellent, but I'm not sure they pair well with the OP's book, which is written purely from the frequentist perspective. The books practically speak different languages.


Allen Downey offers many books for free [0] including a few on statistics and data science.

[0] http://greenteapress.com/wp/


Thanks!


It depends on what you want to do.


Suggest some.


Thanks a bunch, I've been looking around for a good data science text to read to get an intro to the field, and this seems like a good place to start.


Should I import an extra library? The first code example uses a function called read_url(), which isn't a built-in Python function.
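
For what it's worth, read_url is not part of the Python standard library, so it has to be defined or imported somewhere. Here is a minimal sketch of such a helper using only the standard library, assuming it just fetches the text at a URL and collapses runs of whitespace. The body below is my assumption, not confirmed as the book's own definition:

```python
import re
from urllib.request import urlopen


def read_url(url):
    """Fetch the resource at `url` and return its text.

    Assumption: the content is UTF-8 text, and any run of whitespace
    (newlines, tabs, repeated spaces) should become a single space.
    """
    with urlopen(url) as response:
        text = response.read().decode("utf-8")
    return re.sub(r"\s+", " ", text)
```

Calling read_url with the URL of a plain-text file should return the whole text as one long string, which can then be split or searched with ordinary string methods.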


via: http://www.inferentialthinking.com/chapter1/observation-and-...

> Snow used his map to convince local authorities to remove the handle of the Broad Street pump. Though the cholera epidemic was already on the wane when he did so, it is possible that the disabling of the pump prevented many deaths from future waves of the disease...Snow’s map is one of the earliest and most powerful uses of data visualization. Disease maps of various kinds are now a standard tool for tracking epidemics...Though the map gave Snow a strong indication that the cleanliness of the water supply was the key to controlling cholera, he was still a long way from a convincing scientific argument that contaminated water was causing the spread of the disease.

I like the treatment the book gives to Dr. Snow's statistical reasoning, so I am surprised to see it give this kind of weight to his cholera dot map (canonization by Tufte notwithstanding).

Other scholars seem to think the map was largely incidental to Snow's investigation -- he was not the first to use such a technique. Furthermore, others had used it to argue exactly against what he sought to prove:

> What sort of medical cartographer was Snow? Today a considerable part of his reputation hinges on his role in mapping the Broad Street outbreak, so it is essential that the historical assessment be accurate. When Snow modified his original map for the CIC, he introduced a substantial cartographic innovation by explicitly indicating the line of equidistance among the pumps. (Ironically, his CIC map is much less well known today than the more common and less accurate version of the map in MCC2.)

> Overall, however, Snow viewed his mapping activities as a minor aspect of his investigation of cholera. He never used his map as a true investigative tool, unlike Cooper and von Pettenkofer, whose theories of cholera transmission are today discredited. The structure of MCC2 makes clear that Snow intended his south London study to be the centerpiece in supporting his theory. In essence, the Broad Street investigation was merely preparation for the main event. It is due largely to the connection between Broad Street and a visually appealing icon, the map, that today's reader often gets it backward and attributes to the Broad Street investigation an importance that Snow never assigned to it in comparison with the much more extensive and conclusive study of the south London data.

Peter Vinten-Johansen; Howard Brody; Nigel Paneth; Stephen Rachman; Michael Rip; David Zuck. Cholera, Chloroform, and the Science of Medicine: A Life of John Snow (pp. 336-337). Kindle Edition.

> As for influence, it’s pretty to think of John Snow unveiling the map before the Epidemiological Society to amazed and thunderous applause, and to glowing reviews in The Lancet the next week. But that’s not how it happened. Its persuasiveness seems obvious to us now, living as we do outside the constraints of the miasma paradigm. But when it first began circulating in late 1854 and early 1855, its impact was far from dramatic. Snow himself seems to have thought that his South London Water Works study would ultimately be the centerpiece of his argument, the Broad Street map merely a piece of supporting evidence, a sideshow.

Johnson, Steven (2006-10-19). The Ghost Map: The Story of London's Most Terrifying Epidemic--and How It Changed Science, Cities, and the Modern World (p. 198). Penguin Group US. Kindle Edition.

Here's a blog post that sums up the main points: https://petewarden.com/2010/10/21/visualization-myths-around...

It's not that the map is trash -- it is consistent with what Snow postulated. And besides being accurate, it happens to be the easiest way to be introduced to his work and theory. But it's not the map that led him or his contemporaries to make that link. He was already persuaded by his past research. And giving so much weight to the map somewhat diminishes the amount of investigative work he did to find the data for that map. Visually, the map was not a conclusive argument for Snow's theory and could have been used just as well to argue the miasma theory.



