
Foundations of Data Science [pdf] - kercker
http://www.cs.cornell.edu/jeh/book.pdf
======
nevi-me
I'm an accountant but work consulting as a data/software "engineer" (or in
those roles). For a while I got excited by all the "data science" books, until
I admitted to myself 2 years ago that without the Math & Stats background, I'm
wasting my time.

I went back to varsity (part time) this year, studying Applied Maths and
Stats. Even with the basics that I now know, going through this book, I can
feel that a glass ceiling has been broken.

I'm going to print it bit by bit, and study its contents (after my exams).
Thanks!

------
danso
Not to be confused with "The Foundations of Data Science", a free textbook
used/created for UC Berkeley's data science course:

[https://www.inferentialthinking.com/](https://www.inferentialthinking.com/)

------
cschmidt
Previous discussion
[https://news.ycombinator.com/item?id=9437925](https://news.ycombinator.com/item?id=9437925)

~~~
johnhenry
The previous discussion is about the version dated November 4th, 2014. The
book referenced here is dated January 4th, 2018. From looking at the TOCs,
most chapters contain a few additional sections and there is an entirely new
chapter. This might warrant a second discussion.

------
chalmette
Looks like this book heavily intersects with "Probability and Computing:
Randomization and Probabilistic Techniques in Algorithms and Data Analysis" by
Mitzenmacher/Upfal [0].

[0]
[https://books.google.com/books?id=E9UlDwAAQBAJ&pg=PA1&source...](https://books.google.com/books?id=E9UlDwAAQBAJ&pg=PA1&source=kp_read_button#v=onepage&q&f=false)

------
bhuthesh_r
Ravindran Kannan, one of the authors, taught a course of the same name at CSA,
IISc. The video lectures of the course are available here:
[http://drona.csa.iisc.ernet.in/~chiru/datascience/iisclectur...](http://drona.csa.iisc.ernet.in/~chiru/datascience/iisclectures.html)

~~~
mrkstu
I'd avoid that site. From our Cisco proxy:

"Based on your organization's access policies, this web site (
[http://drona.csa.iisc.ernet.in/~chiru/datascience/iisclectur...](http://drona.csa.iisc.ernet.in/~chiru/datascience/iisclectures.html)
) has been blocked because it has been determined by Web Reputation Filters to
be a security threat to your computer or the organization's network. This web
site has been associated with malware/spyware."

~~~
abhishekjha
Microsoft's YouTube playlist:
[https://www.youtube.com/watch?v=WEBUWYxaqLQ&list=PLD7HFcN7LX...](https://www.youtube.com/watch?v=WEBUWYxaqLQ&list=PLD7HFcN7LXRcvobbHq_8zMyWq_tKwtebc)

I don't get what MS is doing here.

------
hellofunk
I always thought linear regression was a “foundation” of this field, but there
is no discussion of a technique by this name in this book. Is there another
name it goes by?

~~~
boxy310
Logistic regression is also referred to as a "supervised classification"
problem, which this book only addresses in the specialized space of document
clustering or image classification. They do also address Support Vector
Machines, which are a general algorithm for classification. However, there
is a wide variety of specific logistic regression topics that require quite
a bit more discussion (dummy variables, log-odds ratios, ordinal variables)
and that are more directly applicable to a general stats background than to
machine learning itself.

Considering that the authors are all CS professors or researchers and not
statisticians, it makes sense to me that they don't view logistic regression
as foundational.

~~~
hellofunk
I said linear regression, not logistic.

------
mlevental
Of all the modern grad books, this one has always struck me as the most
mathematically rigorous. An heir to Trevor's ESL.

------
hellofunk
Quite dense! Just reading the first paragraph assumes good background
knowledge. Wish I could grasp this stuff!

------
vazamb
Not to be confused with an 'Introduction to data science' book

------
anotheryou
Is there an EPUB?

~~~
johnhenry
[https://ebook.online-convert.com/convert-to-epub](https://ebook.online-convert.com/convert-to-epub)

------
ybrah
I also recommend ISLR [http://www-bcf.usc.edu/~gareth/ISL/](http://www-bcf.usc.edu/~gareth/ISL/)

~~~
thidr0
Any idea of what material overlaps or which to read first?

~~~
ybrah
My two cents is to follow courses, or start your own projects. These books
should be used as reference as you're learning things.

