The 2% figure is a bit low as an average for ML work; I would say modeling/ML work ranges from 0 to 20% across the "data team". Most of the work is still data wrangling/engineering.
I agree that most companies need "full stack data guys". At my current company, there were 2-3 iterations of previous DS teams who had zero impact and left behind nothing in production. Then I came in, and being a full stack guy, I built up a DWH, ETL, dashboards, and eventually found opportunities to put ML into production.
I don't agree that the hype will end. The reason: companies/CEOs aren't rational and don't know what you said above. They'll just keep hiring DS people/teams and hope that eventually some magic gift pops out. I think this will keep going for at least 5 years. I get pinged 1-2 times per week by recruiters for Head of DS-type positions in London/Dubai; everybody wants to build data science teams.
And I'm surprised not to find Aurélien Géron's absolute masterpiece listed below. I believe it is the best machine learning book ever, although Statistical Learning, mentioned in the article, is really good as well.
The Goodfellow book is not complete as an academic intro, but no one book can be. It's not very useful as a practical tutorial, but no book aiming to be one could cover the mathematical arguments that Goodfellow's book does. I found Goodfellow's book extremely useful for consolidating a lot of handwaving that I'd seen elsewhere and putting it in a slightly more rigorous framework that I could make sense of and immediately work with as a (former) mathematician.
Goodfellow's treatment is especially useful for mathematicians and mathematically-trained practitioners who nevertheless lack a background in advanced statistics. The Elements of Statistical Learning, for instance, is extremely heavy on statistics-specific jargon, and I personally found it far more difficult to extract useful insights from that book than I did from Goodfellow's.
So, no amount of praise or mathematician's justification changes that for me. I agree that it is slanted toward mathematicians, but this book is overrated and terribly overdue for a rewrite and an update, and frankly, in my personal view, the writing style needs work too.
I am curious which specific parts of the book you found valuable.
By comparison, Goodfellow and his co-authors seemed to just dump everything they know onto the page. It's fragmented, bloated, and it meanders all over the place. Goodfellow was on a recent podcast where he seems to acknowledge that the book straddles an awkward place between tutorial and reference.
I don't mean to sound too harsh. I appreciate its scope, and I've certainly read much worse textbooks.
If the worst you can say is it's not a classic text, that's really not saying much at all. I feel weird defending the book so much when to me it's just a book I found useful and I don't even feel that strongly towards it. But the strength of some of the criticism here doesn't seem motivated by the book itself.
A reference implies you already know the topic and just want an index to jog your memory for things you can't hold all in your head at once.
That is different from a pedagogical tool. If it's a reference, you shouldn't recommend it to those who want to learn the topic.
I agree with the poster below. Outside of classes, lecture notes, the books I listed, and Sutton/Barto (Intro. to Reinforcement Learning) have taught me the material. I use Goodfellow to brush up before interviews or jog my memory about topics I don't work with very often (like computer vision).
- Hastie is a co-author of two machine learning books: "Elements of Statistical Learning", which is very comprehensive, and "Introduction to Statistical Learning", which is more approachable for people without much background in stats.
The reason for Goodfellow's popularity is that it was published in 2016, right at the turn of exponential interest in deep learning after AlexNet. It took off and became popular, but readers now feel it is stale for the aforementioned reasons.
I've always thought that Hands on ML by Géron was great implementation-wise, but lacking in mathematical rigor and depth. While I would have a general sense of what is going on after reading it, and I'd certainly be able to structure and implement a model, I don't know if I would have any deep intuitions.
The other books I read make the field look like a bunch of heuristics that just happen to work.
I absolutely love this book. The breadth of topics is unparalleled, and my STEM undergrad-level math knowledge was perfectly sufficient for understanding the math.
The first book on statistical learning by Hastie, Tibshirani and Friedman, which is absolutely terrific, is freely available for download:
The Elements of Statistical Learning
Then we could have “10 best intro to machine learning resources” as a living breathing list.
But, and I think this is not stated enough, there is a big difference between statistical learning and machine learning in terms of how you approach a problem. The subject matter might be the same, but the approach to solving problems is different: one is a 'statistics' approach, one is a 'CS' approach. Depending on your background, you might like one but not the other.
You can get a better sense of what I am talking about by reading this famous piece from Leo Breiman.
Personally, I feel I was fortunate enough to learn ML from a so called 'CS' perspective through Andrew Ng's course on Coursera.
I recently attended Dr. Frank Harrell's workshop and he really put the differences between the two from his experiences in perspective.
He advocates more for continuous variable responses than most ML does. He also put into context where ML does well: problems where the "signal to noise" ratio is high, i.e. little noise and strong signal. But when the noise gets bigger, statistical models will do better.
His example is the Titanic dataset. Paraphrasing, but he stated, "I don't care who lives or dies." The better question is: what is their tendency to live or die? Instead of classifying it into two categories, why not give a percentage? What if a person is 49% likely to live? Aren't you playing God? It is up to the person using the model to decide, not the modeler. The other problem is that forcing it into a classification problem loses information.
He went on about how ML tends to do this forced classification. And how it is very good at things that have high signal-to-noise ratios, such as the game of Go, speech recognition, visual recognition, etc.
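A minimal sketch of Harrell's point, using scikit-learn on a synthetic dataset (not the actual Titanic data, which isn't part of this thread): the same fitted model can report a probability, which preserves information, or a forced 0/1 label, which throws it away.

```python
# Sketch: probability vs. forced classification (synthetic data, hypothetical features).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                     # three made-up features
logits = X @ np.array([1.5, -1.0, 0.5])
y = (rng.random(500) < 1 / (1 + np.exp(-logits))).astype(int)

model = LogisticRegression().fit(X, y)

p = model.predict_proba(X[:1])[0, 1]   # tendency to "live": a number in [0, 1]
label = model.predict(X[:1])[0]        # forced classification: 0 or 1

# A 49% case and a 1% case collapse to the same hard label;
# the probability keeps that distinction for the decision-maker.
print(f"probability={p:.2f}, hard label={label}")
```

The hard label is just the probability thresholded at 0.5, so it discards exactly the gradation Harrell argues the end user should see.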
It's an interesting contrast to Dr. Leo Breiman's The Two Cultures paper. At least that's my takeaway; Dr. Breiman may be less opinionated? Both are very critical of statistical/data models, but Dr. Frank Harrell has good remedies and counterpoints.
Personally, I think there are just pros and cons to both. There are enough problems out there for both of them to coexist.
With that said, I think the future of AI is going to be a hybrid of SL/ML, as seen in the M4 time series competition. ML may be just a stepping stone, like the two AI winters when expert systems died out. Or perhaps ML will just keep advancing, like NNs did when they went deep. Dr. Leo Breiman was right to harshly criticize SL, but I believe both sides will have to take harsh, objective criticism to move forward.
Swapping Introduction to Statistical Learning for Elements of Statistical Learning is a good step up if you don't need as much hand-holding (it's essentially the same book, by the same authors, just more thorough). Then, adding Bishop's ML book is a good idea. Although also introductory, it covers a lot more topics (some kernel methods and probabilistic stuff) and in a more disciplined way.
Also, while not that popular in the deep learning hype era, Vapnik's The Nature of Statistical Learning Theory is a great read.
That said, the past few weeks have been an absolute tsunami of potentially groundbreaking papers, and it is hard to keep up with the cutting edge.