
Understanding Machine Learning: From Theory to Algorithms - joddystreet
http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/
======
fwilliams
I took a course taught by the author of this book during my undergrad and
quite enjoyed it. It's a good theoretical introduction to machine learning
principles. The video lectures are available here:
[https://youtu.be/b5NlRg8SjZg](https://youtu.be/b5NlRg8SjZg)

As others have mentioned this is a fairly theoretical take on machine learning
which may not be useful if you just want to use a deep learning library. That
said, I think there is a lot of value in having a deeper theoretical grasp of
a topic even when practicing.

------
joker3
This is one of the standard PhD-level textbooks. It's probably a little bit
much if you just want to use somebody else's library, but if you're developing
your own algorithms or you want to engage seriously with the research
literature, you have to know what's in here.

The main other books I'd consider are Foundations of Machine Learning
(Mohri, Rostamizadeh & Talwalkar) and Machine Learning: A Probabilistic
Perspective (Murphy). There are a few others, but I think you'd need a good
reason not to pick one of those three.

~~~
repsak
How do they compare to

Bishop: [https://www.amazon.com/Pattern-Recognition-Learning-Information-Statistics/dp/0387310738/](https://www.amazon.com/Pattern-Recognition-Learning-Information-Statistics/dp/0387310738/)

and

ESL:
[https://web.stanford.edu/~hastie/ElemStatLearn/](https://web.stanford.edu/~hastie/ElemStatLearn/)

------
tekkk
Looks coherent and informative, and not too indulgent in proving things with
mathematical formulas. The problem is that the only way I could study it with
the meticulousness it requires would be during a course. Studying it at my own
leisure is a bit too much of a burden.

However, what irks me a little, as it did with my university's introductory
machine learning course, is the direct jump to the algorithms while omitting
an explanation of why they are the right thing to do in the first place.
Perhaps it's presumed that the reader knows the basic statistics behind the
idea that we can make assumptions about the data and predict values thanks to
the central limit theorem. Machine learning itself can then be seen as a
derivative of that: a more free-form, heuristic approach for cases where the
central limit theorem can't be applied or is too restrictive.

Am I speaking complete nonsense here? Those wiser than me, please tell me: am
I right?

~~~
SatvikBeri
Inferential statistics lets us make estimates about some quantity given a set
of assumptions about how a sample was drawn. E.g., you might assume the data
is normally distributed.

From a theoretical standpoint, machine learning basically just explores what
happens if you remove that assumption and tries to draw conclusions that apply
no matter what the underlying distribution is.
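To make that contrast concrete, here's a sketch (not from the book) comparing a normal-theory confidence interval with the distribution-free bound from Hoeffding's inequality, which only assumes the data is bounded; the sample size and confidence level are made-up numbers:

```python
import math

n = 1000      # sample size (arbitrary example value)
delta = 0.05  # we want 95% confidence

# Normal-theory CI half-width for data in [0, 1], using the fact that
# the standard deviation of a [0, 1]-valued variable is at most 0.5.
normal_halfwidth = 1.96 * 0.5 / math.sqrt(n)

# Hoeffding's inequality for [0, 1]-bounded data makes no distributional
# assumption: P(|mean - mu| >= eps) <= 2 * exp(-2 * n * eps**2).
# Solving 2 * exp(-2 * n * eps**2) = delta for eps gives:
hoeffding_halfwidth = math.sqrt(math.log(2 / delta) / (2 * n))

print(f"normal-theory interval: +/-{normal_halfwidth:.4f}")
print(f"distribution-free:      +/-{hoeffding_halfwidth:.4f}")
```

The distribution-free interval comes out wider; that looseness is the price you pay for dropping the assumption.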

The book does build up that theory. You need to understand the Probably
Approximately Correct framework (e.g., how many data points do you need to be
90% sure your algorithm will generalize with less than 5% generalization
error) and VC dimension (basically a measure of how much different algorithms
overfit). It doesn't even introduce an algorithm until chapter 8!
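For a finite hypothesis class, the standard PAC bound (realizable case) turns those percentages into a concrete sample size. A small sketch, plugging in the 90%/5% figures above; the hypothesis-class size is a made-up example, not from the book:

```python
import math

def pac_sample_size(hypothesis_count, epsilon, delta):
    """Number of samples sufficient so that, with probability >= 1 - delta,
    a hypothesis consistent with the data has error <= epsilon (finite
    class, realizable case): m >= (ln|H| + ln(1/delta)) / epsilon."""
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)

# "90% sure of less than 5% error" -> delta = 0.1, epsilon = 0.05,
# for a hypothetical class of 2**20 boolean hypotheses:
print(pac_sample_size(2 ** 20, epsilon=0.05, delta=0.1))  # 324
```

Note the sample size grows only logarithmically in the size of the hypothesis class, which is why the bound stays usable even for very large classes.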

VC dimension is sort of the machine learning equivalent of O(n)-style
complexity analysis. You can use this framework to estimate the effectiveness
of different algorithms from a theoretical standpoint given a certain amount
of data. For example, linear regression's VC dimension grows as O(n) with the
number of features, while decision trees' grows as O(2^n) with depth. This
means that trees are much more likely to overfit on small data, but more
likely to find complex structure in larger datasets.
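A rough sketch of that growth comparison (the exact constants depend on the formulation; these are the commonly quoted orders):

```python
# VC dimension of a linear classifier in d dimensions is d + 1; a binary
# decision tree of depth k has up to 2**k leaves and can shatter on the
# order of 2**k points, so its capacity grows exponentially with depth.
def vc_linear(num_features):
    return num_features + 1

def vc_tree(depth):
    return 2 ** depth

for d in (5, 10, 20):
    print(f"d={d:2}: linear ~ {vc_linear(d):>7}, tree ~ {vc_tree(d):>7}")
```

Even at d = 20 the gap is already five orders of magnitude, which is the quantitative version of "trees overfit small data but can capture structure in big data".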

------
SatvikBeri
I found this book very useful, personally. It does a great job of presenting
some of the theoretical foundations in an understandable way (e.g. the
PAC-learning framework and VC dimension). I also particularly liked the
explanation of PCA.
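For reference, the core of PCA on 2-D data fits in a few lines; here's a minimal stdlib-only sketch (toy data and closed-form 2x2 eigendecomposition, not the book's derivation):

```python
import math

# Toy 2-D data; PCA finds the direction of maximum variance.
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0)]
n = len(points)
mx = sum(x for x, _ in points) / n
my = sum(y for _, y in points) / n

# Sample covariance matrix [[a, b], [b, c]].
a = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
c = sum((y - my) ** 2 for _, y in points) / (n - 1)
b = sum((x - mx) * (y - my) for x, y in points) / (n - 1)

# Eigenvalues of a symmetric 2x2 matrix, in closed form; lam1 is the
# variance explained by the first principal component.
disc = math.sqrt((a - c) ** 2 + 4 * b ** 2)
lam1 = (a + c + disc) / 2
lam2 = (a + c - disc) / 2

print(f"explained variance ratio of PC1: {lam1 / (lam1 + lam2):.2f}")
```

In higher dimensions you'd use a proper eigensolver (or the SVD), but the idea is the same: diagonalize the covariance matrix and keep the top directions.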

------
kmax12
This seems like a great resource for understanding how machine learning
algorithms actually work. Anyone reading the book to help them build more
accurate models might also be interested in supplementing it with more
research into the importance of feature engineering before training machine
learning models.

While feature engineering hasn't been rigorously studied in the academic
literature, this book does have a section on feature generation, which gives
some practical tips once your data is in feature-matrix form.
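As a concrete illustration of the kind of transform being discussed, here's a hand-rolled sketch on hypothetical data (this is not the Featuretools API, just the sort of aggregation such tools automate):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw transaction rows: too granular to model directly.
transactions = [
    {"customer": "a", "amount": 10.0},
    {"customer": "a", "amount": 30.0},
    {"customer": "b", "amount": 5.0},
]

# Roll the rows up into one feature row per customer (count/sum/mean),
# turning a child table into columns of a feature matrix.
by_customer = defaultdict(list)
for row in transactions:
    by_customer[row["customer"]].append(row["amount"])

features = {
    cust: {"count": len(v), "sum": sum(v), "mean": mean(v)}
    for cust, v in by_customer.items()
}
print(features["a"])  # {'count': 2, 'sum': 40.0, 'mean': 20.0}
```

Automated tools generate and name many such aggregations across related tables so you don't have to write each one by hand.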

For those interested in even more, I work on a Python library for automated
feature engineering called Featuretools
([https://github.com/featuretools/featuretools/](https://github.com/featuretools/featuretools/)).
It can help when your raw data is still too granular for modeling or is
spread across multiple tables. We have several demos you can run yourself to
apply it to real datasets here:
[https://www.featuretools.com/demos](https://www.featuretools.com/demos).

------
brudgers
Direct link to the PDF:
[http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/understanding-machine-learning-theory-algorithms.pdf](http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/understanding-machine-learning-theory-algorithms.pdf)

------
potbelly83
There are a lot of books out there now for ML. Does this one offer something
new (material/presentation) that the others do not?

------
interdrift
I'm following it in my master's course on big data, and it has some nice
applications in sampling :)

------
baalimago
Well, that's summer studies sorted, thanks!

------
master_yoda_1
I think this is a very advanced theoretical book by Hacker News standards.

~~~
ealhad
What do you mean?

~~~
fwdpropaganda
I'm not the person you asked, but from what I've seen HN tends to have the
attitude that you don't need to understand "maths", "algorithms",
"statistics", "data structures", or any of that complicated stuff, all you
need is to get your hands dirty with code. If you buy that view, then this
book is the opposite of what HN usually likes.

~~~
mindcrime
_I'm not the person you asked, but from what I've seen HN tends to have the
attitude that you don't need to understand "maths", "algorithms",
"statistics", "data structures", or any of that complicated stuff, all you
need is to get your hands dirty with code._

I don't get that at all. From what I've seen over the years, there is a clear
divide on HN between the "you don't need any theory at all" people and the
"you need to know all the theory to do anything" people, with a slight bias
towards the latter in general.

Interesting how perspectives differ on this site. I guess it's similar to how
some people argue that HN is overrun with government hating libertarians,
while other people think the site is overrun by granola eating, tree hugging,
hippie communists. :-)

~~~
fwdpropaganda
> I don't get that at all. From what I've seen over the years, there is a
> clear divide on HN between the people who lean towards the "you don't need
> any theory at all" people and the "you need to know all the theory to do
> anything" people, with there being a slight bias towards the latter in
> general.

Haha I think I know what camp you're in, since no one claims you need to know
_all_ the theory to do _anything_. ;-)

While we disagree on which group is larger on HN, I'd be open to consider that
maybe our difference in perception is entirely explained by selective memory
(in the sense that you're more likely to remember when people rubbed you the
wrong way than when they agreed with you).

~~~
mindcrime
_Haha I think I know what camp you're in, since no one claims you need to
know _all_ the theory to do _anything_. ;-)_

Interestingly enough, I don't really think of myself as falling specifically
into either. But I'm a pragmatist, so I lean a bit towards the idea that you
can get things done without necessarily always needing a lot of theoretical
knowledge. BUT at the same time, I think that having a solid theoretical
understanding is almost always desirable, and all other things being equal,
I'd rather know the theory than not. :-)

_I'd be open to consider that maybe our difference in perception is entirely
explained by selective memory (in the sense that you're more likely to
remember when people rubbed you the wrong way than when they agreed with
you)._

Agreed. I would think that is probably a big part of it.

