Machine Learning Books That Helped Me Level Up (datastuff.tech)
307 points by strikingloo on Apr 28, 2019 | 50 comments



I think the data scientist hiring frenzy may soon collapse as large companies run out of patience with their recently formed data science departments that struggle to deliver ROI (for many reasons, often not the fault of the data scientists). But I think we are not yet at the final iteration of the job market for these sorts of skills. Companies usually don't really want someone who specializes in model tuning and algorithm creation; they want something like a "full-stack data analyst": someone who acknowledges that the modeling may be 2% of the effort and the rest is business analysis, data wrangling, engineering, stakeholder management, building tools for users/operators, etc., and who rolls up their sleeves to deliver an end-to-end solution. There does not yet exist a catchy name for this role, but I bet that in a few years it will be what everyone wants to hire. So skate to where the puck will be...


Great comment. What you say matches my experience, but not entirely. My take:

The 2% is a bit low as an avg. for ML work; I would say modeling/ML work ranges from 0 to 20% across the "data team". Most of the work is still data wrangling/engineering.

I agree that most companies need "full stack data guys". At my current company, there were 2-3 iterations of previous DS teams who had zero impact and left behind nothing in production. Then I came in, and being a full stack guy, I built up a DWH, ETL, dashboards [1] and eventually found opportunities to put ML into production [2].

I don't agree that the hype will end. The reason: companies/CEOs aren't rational and don't know what you said above. They'll just keep hiring DS people/teams and hope that eventually some magic gift pops out. I think this will keep going for at least 5 years. I get pinged 1-2 times per week by recruiters for Head of DS type positions in London/Dubai; everybody wants to build data science teams.

[1] http://bytepawn.com/fetchr-data-science-infra.html

[2] http://bytepawn.com/automating-a-call-center-with-machine-le...


Great comment indeed. The number of universities that have created MS in Analytics or Data Science programs, and the number of students that have enrolled in them over the last few years, is staggering and IMO unsustainable. The quality of these programs is also very uneven - there are a few great ones, but many seem to have just cobbled together a few database courses from the CS department, a few classical statistics and probability courses from the Math department, and a few marketing courses from the Business department, and called it an MS in Analytics program. Likewise, the expectations students have about the type of work they'll be doing and the expectations companies have about the value that will be realized are both completely out of alignment. /rant off


Ian Goodfellow's Deep Learning book is pretty much useless. I own it and have read through most of it. I couldn't explain it better than the top Amazon reviews:

https://www.amazon.com/Deep-Learning-Adaptive-Computation-Ma...

And I'm surprised not to find Aurélien Géron's absolute masterpiece on the list. I believe it is the best machine learning book ever, although the Statistical Learning book mentioned in the article is really good as well:

https://www.amazon.com/Hands-Machine-Learning-Scikit-Learn-T...


For what it's worth, I disagree quite strongly with that review. The book is aimed at those with a pretty mature appetite for abstract mathematical reasoning, but not much specific knowledge in the areas of statistics, machine learning, and neural networks. It's an actual graduate-level book, and one must approach it with the appropriate background and education.

The Goodfellow book is not complete as an academic intro, but no one book can be. It's not very useful as a practical tutorial, but no book seeking this could cover the mathematical arguments that Goodfellow's book does. I found Goodfellow's book extremely useful for consolidating a lot of handwaving that I'd seen elsewhere and putting it in a slightly more rigorous framework that I could make sense of and immediately work with as a (former) mathematician.

Goodfellow's treatment is especially useful for mathematicians and mathematically-trained practitioners who nevertheless lack a background in advanced statistics. The Elements of Statistical Learning, for instance, is extremely heavy on statistics-specific jargon, and I personally found it far more difficult to extract useful insights from that book than I did from Goodfellow's.


The problem with Goodfellow's book is that it is half-baked. I don't know why he has to spend a third of the book introducing linear algebra (a section which ends abruptly) before moving on to ANNs. If Goodfellow intended this book for mathematicians, that whole linear algebra section could be omitted with literally no loss to the book's value. The whole book arguably feels rushed.

So, no amount of praise and mathematician's justification makes sense to me. I agree that it leans toward mathematicians, but this book is overrated, and it is badly due for a rewrite, an update, and, frankly, in my personal view, a new writing style.

I am curious which specific parts of the book you found valuable.


The section on linear algebra moves quickly. I don't see how a book including extra prerequisite material is an example of it being half-baked or rushed. Surely that would actually be an example of the opposite? That section is a nice reference to have immediately available for things like SVD and PCA.
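As a rough illustration of why having that material immediately available is handy, here's a minimal numpy sketch of PCA computed via the SVD (my own toy example, not the book's):

    import numpy as np

    # Hypothetical toy data: 100 samples, 5 features
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))

    # Center the data, then take the thin SVD
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

    # Rows of Vt are the principal directions; project onto the top 2
    X_reduced = Xc @ Vt[:2].T

    # Variance explained by each component
    explained_var = S**2 / (len(Xc) - 1)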


Perhaps restructuring the LA part into an appendix would be preferable. For mature readers, it nevertheless serves to fix and agree on the notation used in the rest of the book.


The best textbooks, the ones considered classic gems in their field, are careful about what they say and what they leave out. You get the sense that every word is in its right place.

By comparison, Goodfellow and his co-authors seemed to just dump everything they know onto the page. It's fragmented, bloated, and it meanders all over the place. Goodfellow was on a recent podcast where he seemed to acknowledge that the book straddles an awkward place between tutorial and reference.

I don't mean to sound too harsh. I appreciate its scope, and I've certainly read much worse textbooks.


Considering how much is not in the book -- and that the book is not even that large as far as textbooks go -- part of me feels this criticism is somewhat disingenuous. I'd agree that the direction of the final part on research topics feels fragmented, but the rest of the book certainly doesn't. It's very clearly focused on developing deep neural networks and doesn't actually meander at all.

If the worst you can say is it's not a classic text, that's really not saying much at all. I feel weird defending the book so much when to me it's just a book I found useful and I don't even feel that strongly towards it. But the strength of some of the criticism here doesn't seem motivated by the book itself.


This is pretty harsh. I use Goodfellow as a reference text and then supplement the mathematics behind it with more comprehensive texts like Hastie or Wasserman. Maybe if you sit down and read it cover-to-cover, it will seem disjointed. But I usually read chapters independently - I recently read the convolutional neural network chapter in preparation for an interview and I thought it was fine.


So it's a reference, not a pedagogical tool then?

A reference implies you already know the topic and just want an index to jog your memory for things you can't hold all in your head at once.

That is different from a pedagogical tool. If so, you shouldn't recommend it to those who want to learn the topic.


Phrased like this, I agree with you. I didn't think about it like that.

I agree with the poster below. Outside of classes, lecture notes, the books I listed, and Sutton/Barto (Intro. to Reinforcement Learning) have taught me the material. I use Goodfellow to brush up before interviews or jog my memory about topics I don't work with very often (like computer vision).


I much prefer reading Karpathy's notes and watching Stanford CS231n to delving into Goodfellow. It sits on my shelf collecting dust.


What are Wasserman and Hastie?


- Wasserman has a book called "All of Statistics" that gives a lot of the background required to understand modern machine learning.

- Hastie is a co-author of two machine learning books: "The Elements of Statistical Learning", which is very comprehensive, and "An Introduction to Statistical Learning", which is more approachable for people without much background in stats.


FYI, there's a new edition of Geron's book coming out in August which will include Keras: https://www.goodreads.com/book/show/40363665-hands-on-machin...


I think the Amazon review is rather dramatic, and the reviewer is probably not in a position to comment on style. I thought both Goodfellow and Géron were good. Goodfellow is deep learning for academics coming from a different field; Géron is deep learning for software engineers.


That's the thing: even as an academic book it falls short. It feels disjointed, unorganized and poorly written - and most frustratingly, incomplete.

The reason for Goodfellow's popularity is that it was published in 2016, right at the turn of exponential interest in deep learning after AlexNet. It took off and became popular, but readers now feel it is stale for the aforementioned reasons.


I certainly agree it could be better, but I also think "disjointed, unorganized and poorly written - and most frustratingly, incomplete" more or less sums up the field of deep learning :).


I can second the recommendation for Geron's book, it's absolutely stellar.


Do you think there is a need for a better written DL textbook? I definitely agreed with the review you linked.

I've always thought that Hands-On ML by Géron was great implementation-wise, but lacking in mathematical rigor and depth. While I would have a general sense of what is going on after reading it, and I'd certainly be able to structure and implement a model, I don't know if I would have any deep intuitions.


Géron's book is more of a tutorial/cookbook combined with important insights into the practice of machine learning. So, I recommend reading Introduction to Statistical Learning (and Elements of Statistical Learning for theoretical background) before jumping into Géron's book. As engineers, I agree we need some theoretical background, but at the same time we are applying this knowledge to real-world problems. Géron's book is invaluable, and I hope he publishes more; it is a gem.


2nd edition coming in August. Preorders opened a few days ago; he posted on Twitter. Some preview chapters are available on O'Reilly's site.


I've heard poor things about Goodfellow's Deep Learning book as well. What is a good alternative?


For what it's worth, I read Goodfellow's book cover to cover and loved it. It answers "why" questions rather than "how" questions, but those are the questions I had, and you can find "how" questions answered for your framework of choice on the internet.


Take CS231n; I believe it used to be taught by Fei-Fei Li, followed by Karpathy, and I don't know who teaches it now.


Géron's book is both entertaining and educational; I'm really enjoying it so far.


I really recommend Murphy's "Machine Learning: A Probabilistic Perspective". Murphy lays the groundwork for understanding how the algorithms work, why they work, and how they can be adapted to the problem you're dealing with. It takes you from complete beginner (with a reasonable math level) to one step above `import sklearn as sk`.

The other books I read make the field look like a bunch of heuristics that just happen to work.


Kevin Murphy's book is already the equivalent of the ML bible in ML grad classes. I am very surprised to not see it at the top.

I absolutely love this book. The breadth of topics is unparalleled, and my STEM-undergrad-level math knowledge was perfectly sufficient for understanding the math.


Murphy is a nice book, but I’ve always felt like it was more of a survey text and not one made for diving deep into a given subject. For instance, if you want to go from theory to writing code, Murphy isn’t necessarily the best book for that.


I think it was intended to show you the landscape and give you enough tools and background knowledge so you can go and explore the literature by yourself. Years after its publication, it still does a really good job at it.


Since the site mentions "An Introduction to Statistical Learning":

The first book on statistical learning by Hastie, Tibshirani and Friedman, which is absolutely terrific, is freely available for download:

The Elements of Statistical Learning

http://web.stanford.edu/~hastie/ElemStatLearn/



If you can get through ESL, then you are very well set theoretically for machine learning work! Fabulous book that is very dense with information.


Lists like this are awesome, but I can't help but think we need some sort of lists tool that lets people create them, lets others vote on them Reddit-style, leave comments, rate each item, etc. - almost like a subreddit-type thing per list, maybe without the temporal decay component of the ranking algorithm.

Then we could have “10 best intro to machine learning resources” as a living breathing list.


Please, someone do this. I despise Amazon's stranglehold on product reviews, especially books. Sometimes half of the ratings are about the condition the item arrived in.


Not to mention Amazon isn't always going to be the only source for resources. It could be a mix of blog posts, video courses, research papers... anything, really.


A few days ago I found https://getpolarized.io. Mind blown. Maybe this will help you. The author has plans for voting, but it's not available currently. Still, it's very cool software for anyone who wants to study for life.


There's a site called Zeef. They can probably add this.


The Introduction to Statistical Learning book is great.

But, and I think this is not stated enough, there is a big difference between statistical learning and machine learning in terms of how you approach a problem. The subject matter might be the same, but the approach to solving problems is different: one is a 'statistics' approach, the other a 'CS' approach. Depending on your background, you might like one but not the other.

You can learn more about what I am talking about by reading this famous piece by Leo Breiman [0].

Personally, I feel I was fortunate enough to learn ML from a so called 'CS' perspective through Andrew Ng's course on Coursera.

0. https://projecteuclid.org/download/pdf_1/euclid.ss/100921372...
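As a rough sketch of the contrast (my own toy example, not Breiman's): the 'statistics' culture fits an interpretable data model and asks what the coefficients mean, while the 'CS' culture fits a black box and asks how well it predicts on held-out data. Something like:

    import numpy as np
    import statsmodels.api as sm
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    # Hypothetical toy data
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 3))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300) > 0).astype(int)

    # 'Statistics' culture: fit a data model, inspect the coefficients
    logit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
    print(logit.params)

    # 'CS' culture: black-box model, judged by predictive accuracy
    rf = RandomForestClassifier(random_state=0)
    print(cross_val_score(rf, X, y, cv=5).mean())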


Agreed, and I also think the pros and cons of each approach aren't stated enough either.

I recently attended Dr. Frank Harrell's workshop and he really put the differences between the two from his experiences in perspective.

He advocates more for continuous response variables than most ML does. He also put into context where ML does well: where the signal-to-noise ratio is high, so there is little noise and a lot of signal. But when the noise gets bigger, statistical models will do better.

His example is the Titanic dataset. Paraphrasing, but he stated, "I don't care who lives or dies." The better question is: what is their tendency to live or die? Instead of classifying into two categories, why not give a percentage? What if a person is 49% likely to live? Aren't you playing God? It is up to the person using the model to decide, not the modeler. Another point is that forcing it into a classification problem loses information.

He went on about how ML tends to do this forced classification, and how it is very good at things that have a high signal-to-noise ratio, such as the game of Go, speech recognition, and visual recognition.
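To make that point concrete, here's a minimal sketch (my own, on hypothetical toy data standing in for something like the Titanic problem, not Dr. Harrell's actual example): scikit-learn classifiers expose both the hard 0/1 classification and the underlying probability, and the probability is what he argues we should report.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # Hypothetical toy data standing in for a Titanic-style problem
    X, y = make_classification(n_samples=500, n_features=8, random_state=0)
    model = LogisticRegression().fit(X, y)

    # Forced 0/1 classification - the "playing God" step
    print(model.predict(X[:3]))

    # Tendency toward each outcome - what Harrell argues we should
    # report, leaving the decision to the model's user
    print(model.predict_proba(X[:3]))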

It's an interesting contrast to Dr. Leo Breiman's "Two Cultures" paper. At least that's my takeaway; Dr. Breiman may be less opinionated? Both are very critical of statistical/data models, but Dr. Frank Harrell has good remedies and counterpoints.


Personally, I don't agree with a lot of Dr. Harrell's definitions of SL and ML to begin with. He mostly talks about clinical trials, a field I don't have much knowledge of. But a lot of the arguments seem to attack a strawman defined as ML.


Oh I can see that.

I think there are just pros and cons to both. There are enough problems out there for both of them to coexist.

With that said, I think the future of AI is going to be a hybrid of SL/ML, as seen in the M4 time series competition. ML may be just a stepping stone, like the expert systems that died out across the two AI winters. Or perhaps ML will just keep advancing, the way NNs did when they went deep. Dr. Leo Breiman was right to harshly criticize SL, but I believe both sides will have to take harsh, objective criticism to move forward.


I wouldn't call these "level-up" books; they're introductory material that covers the basics.

Swapping Introduction to Statistical Learning for Elements of Statistical Learning is a good step up if you don't need as much hand-holding (it's essentially the same book, by the same authors, just more thorough). Then, adding Bishop's ML book is a good idea. Although also introductory, it covers a lot more topics (some kernel methods and probabilistic material) in a more disciplined way.

Also, while not that popular in the deep learning hype era, Vapnik's The Nature of Statistical Learning Theory is a great read.


Bishop's looks like an awesome book, and it's on my reading list after many colleagues recommended it to me. I wouldn't have added it to the list though, because I haven't read it yet.


I find watching all of the machine learning courses that are posted to YouTube to be a good way to keep up and to get insight into the thinking of the authors of recent papers. It has more or less become my morning ritual to watch one lecture a day.

That said, the past few weeks have been an absolute tsunami of potentially groundbreaking papers, and it is hard to keep up with the cutting edge.


Would you care to list some of them for us?


One piece of godfatherly advice: you won't get a data science job just by reading these three books. You need to work hard and do other things too, like working on many projects.


I agree! Personally, I'd advise anyone who starts reading these books to also practice everything they read in them, and maybe keep all of those projects on GitHub for potential employers to see.



