Ask HN: Full-on machine learning for 2020, what are the best resources?
423 points by jamesxv7 on Dec 31, 2019 | 118 comments
I want to focus on Machine Learning in 2020, but I see too many options: Deep Learning, AI, Statistical Theory, Computational Cognitive and more... To focus just on ML, where should I start? I work mostly as a data analyst on pharma where the focus is batch process.

Whoever reads this - please please please ignore the posts that suggest just playing with numbers. This is the equivalent of telling someone who wants to learn how to code to copy-paste formulas into Excel. Just don't be that person.

To be very blunt, in 2020 most ML is still glorified statistics, except you lose the insights and explanations. The only tangible improvement can be random forests - sometimes. 99% of the stuff you can do with basic statistics. 99% of the coders I know just don't know statistics beyond the mean (and even with that, they do senseless things like taking means of means).
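To make the "means of means" trap concrete, here is a minimal sketch using only the Python standard library. The two groups and their values are made up for illustration; the point is only that averaging group averages ignores group sizes, while weighting by size recovers the pooled mean.

```python
from statistics import mean

# Hypothetical order values from two regions of very different size.
region_a = [10.0, 12.0, 14.0]  # three orders
region_b = [100.0]             # one order

# Naive: average the two group means, ignoring how many orders each has.
mean_of_means = mean([mean(region_a), mean(region_b)])

# Correct: average over every individual order.
pooled_mean = mean(region_a + region_b)


def weighted_mean_of_means(groups):
    """Combine group means weighted by group size (equals the pooled mean)."""
    total = sum(len(group) for group in groups)
    return sum(mean(group) * len(group) for group in groups) / total


print(mean_of_means)                                   # 56.0
print(pooled_mean)                                     # 34.0
print(weighted_mean_of_means([region_a, region_b]))    # 34.0
```

If only the group means are available, they need to be carried around together with the group counts, otherwise the pooled figure cannot be reconstructed.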

So learn statistics - basic statistics, like in the "for dummies" book series.

If you want to be a little more practical, stats "for dummies" is often found in disciplines that depend on stats but are not very strong in math - biology, psychology, and economics are great candidates.

So just download biology basic stats (to know how to compare means - this gives you the A/B test superpower), then psychology factor analysis (to know PCA - this gives you the dimension reduction superpower), then econometrics basic regression (to know linear regression).

With these 3 superpowers, you will be able to do more than most of the "machine learning" people. When you have mastered that, try stuff like random forest, and see if you still think it's as cool as it's hyped to be.
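As a rough illustration of the "compare means" step behind an A/B test, here is a minimal sketch of Welch's t-statistic using only the Python standard library. The sample values and the function name `welch_t` are assumptions made for the example, not something from the comment above.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t-statistic for the difference between two sample means.

    Uses the sample variance of each group separately, so the groups
    do not need to have equal variances or equal sizes.
    """
    var_a = variance(sample_a)
    var_b = variance(sample_b)
    standard_error = sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical measurements for variant A and variant B.
variant_a = [1.0, 2.0, 3.0, 4.0, 5.0]
variant_b = [2.0, 3.0, 4.0, 5.0, 6.0]

t_statistic = welch_t(variant_a, variant_b)
print(t_statistic)
```

A large absolute value suggests the difference in means is unlikely to be noise; turning the statistic into a p-value needs the t distribution, which a library call such as `scipy.stats.ttest_ind(a, b, equal_var=False)` handles.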

Given that much of the data people run across is tabular, I appreciate your advice about the importance of statistics. Also kudos for mentioning hypothesis testing (no one in this thread mentioned it). Lastly, I’d add that ML practitioners will gain a lot by listening to statisticians and economists on the issue of data quality, e.g. selection bias.

That said, I am not as cynical about “machine learning.” ML and “data science” brought the importance of prediction front and center, i.e. can you fit a model that accurately predicts the target value given a previously unseen input? This point is made by the recently published stats textbook Computer Age Statistical Inference (Efron and Hastie).

In some applications, it may be beneficial to choose black box models with high predictive accuracy, as the goal for these applications is prediction, not interpreting individual model coefficients.

You can do pose estimation with basic statistics?

Much business data is tabular (possibly with a time component), and if you are working with tabular data, the OP’s advice is sound.

The answer to this question depends on your level of computer & math proficiency. Some folks here have been debating about the relative merits of practice vs. theoretical foundations, but this dispute makes some assumptions about where you are starting from and where you are most comfortable. The fastest way to learn something is to fit it into a framework that you already understand. If you have a PhD in theoretical physics/abstract mathematics (like a lot of ML researchers), then the more mathematical (theoretical) frameworks will be a good way to build deep intuitions. If, on the other hand, you are more into applied data analysis, then you will probably find that working on applications will be the easiest way to go.

Personally, I enjoyed both Andrew Ng's and Geoffrey Hinton's respective courses on ML and Neural Networks on Coursera. You may also want to check out Michael Nielsen's online book on deep learning (http://neuralnetworksanddeeplearning.com). Ultimately I would also encourage you to supplement your understanding by applying this work to your own applications. The universe is often the best teacher.

I’d suggest:

https://fast.ai - good intro on practical neural networks.

I wrote a guide to ML based NLP. We identify if a sentence is a question, statement or command using neural networks:


The truth is you don’t need to understand all the math right away with neural networks. Mostly it’s getting an understanding of why you use a given layer, bias, etc and when. Once you get some intuition then I’d learn the math.

That’s at least how I instruct others. In any case, there are lots of guides for any flavor. I’d start with deep learning and focus on the “practical” then move to the “theoretical”.

Machine Learning:

* https://www.youtube.com/watch?v=UzxYlbK2c7E: Andrew Ng's Machine Learning course, the entry point most people recommend

* https://mlcourse.ai/ : More Kaggle-focused, but also more modern and has interesting projects

Do both courses simultaneously, take good notes, write useful flashcards, and above all do all the exercises and projects

Deep Learning

* https://www.fast.ai/ - Very hands-on, begin with "Practical Deep Learning for Coders" and then "Advanced Deep Learning for Coders"

* https://www.coursera.org/specializations/deep-learning : More bottom-up approach, helps to understand the theory better

Do those two courses in parallel (you can try 2 weeks of Coursera followed by one of fast.ai in the beginning, and then just alternate between them), take notes, write good flashcards and above all do the exercises and projects.

After that you will be done with the beginning. Your next step will depend on which area interested you the most, and getting way too many resources right now can be extremely confusing, so I would recommend doing a follow-up post after you have worked through the above resources. Also, as non-ML stuff, I recommend Scott Young's Ultralearning and Azeria's self-improvement posts (https://azeria-labs.com/the-importance-of-deep-work-the-30-h...).

Good free resources:

- MIT: Big Picture of Calculus

- Harvard: Stats 110

- MIT: Matrix Methods in Data Analysis, Signal Processing, and Machine Learning

If any of these seem too difficult - Khan Academy Precalculus (they also have Linear Algebra and Calculus material).

This gives you a math foundation. Some books more specific to ML:

- Foundations of Data Science - Blum et al.

- Elements of Statistical Learning - Hastie et al. The simpler version of this book - Introduction to Statistical Learning - also has a free companion course on Stanford's website.

- Machine Learning: A Probabilistic Perspective - Murphy

That's a lot of material to cover. And at some point you should start experimenting and building things yourself, of course. If you're already familiar with Python, the Python Data Science Handbook (Jake VanderPlas) is a good guide through the ecosystem of libraries that you would commonly use.

Things I don't recommend - Fast.ai, Goodfellow's Deep Learning Book, Bishop's Pattern Recognition and ML book, Andrew Ng's ML course, Coursera, Udacity, Udemy, Kaggle.

Bear in mind Elements of Statistical Learning is a grad-level text. I would never recommend it to a beginner over An Introduction to Statistical Learning, which shares authors with it.

Aurélien Géron's O'Reilly book is great - Hands-On Machine Learning with Scikit-Learn and TensorFlow. Get the second edition, which covers TensorFlow 2.

You're right about ESL, that's why I started the list with some more fundamental material. Also, +1 for Aurelien's book, it's really good; I didn't know he had a revised edition for TensorFlow 2.

Why don't you recommend fast.ai and kaggle?

and Andrew Ng's ML course?

If you like books and you want to deeply understand ML techniques I'd suggest jumping straight into "Introduction to Statistical Learning" and only learning calculus/stats/matrix methods (linear algebra) as you need them (you really don't need much from them in practice).

But it's ok to start using libraries and fitting models without understanding deeply how they work, and to come back to these books later (just make sure you come back; there are lots of useful ideas in them!). In that case I'd recommend some of the resources the parent doesn't recommend.

> If you like books and you want to deeply understand ML techniques I'd suggest jumping straight into "Introduction to Statistical Learning" and only learning calculus/stats/matrix methods (linear algebra) as you need them (you really don't need much from them in practice).

This doesn't work. ISL is good, but it aims to be accessible by excluding most of the math. So if you go over it, you'll neither "deeply understand ML techniques", nor will you encounter enough math that you can learn along the way as you suggest.

A lot of the resources proposed in the comments focus on theoretical knowledge, or a particular sub-domain (Reinforcement Learning, or Deep Learning). I recommend a top down approach where you pick a project and learn by building it. This can be easier said than done however, and after mentoring dozens of junior Data Scientists I wrote a how-to guide for people interested in using ML for practical topics.

You can find it from O'Reilly here (http://shop.oreilly.com/product/0636920215912.do) or on Amazon here (https://www.amazon.com/Building-Machine-Learning-Powered-App...).

I think it depends on what you want to focus on. If you want to do deep learning, fast.ai is probably the best resource available. Jeremy Howard and Rachel Thomas (the two founders) have poured quite a lot into fostering a positive, supportive community around fast.ai which really does add quite a lot of value.

If you want to really understand the fundamentals of machine learning (deep learning is just one subset of ML!), there is no substitute for picking up one of the classic texts, like Elements of Statistical Learning (https://web.stanford.edu/~hastie/ElemStatLearn/) or Machine Learning: A Probabilistic Perspective (https://www.cs.ubc.ca/~murphyk/MLbook/), and going through it slowly.

I'd recommend a two pronged approach: dig into fast.ai while reading a chapter a week (or at w/e pace matches your schedule) of w/e ML textbook you end up choosing. Despite all of the hype of deep learning, you really can do some pretty sweet things (ex: classify images/text) with neural nets within a day or two of getting started. Machine learning is a broad field, and you'll find that you will never know as much as you think you should, and that's okay. The most important thing is to stick to a schedule and be consistent with your learning. Good luck on this journey :)

Excellent recommendation. I really appreciate all the recommendations proposed. Happy New Year eachro.

Be sure to check out 3Blue1Brown's linear algebra series as well. (Maybe after you've built your own MNIST network) Blew my mind when I made the connection that each layer in a dense NN is learning how to do a linear transformation + a non-linear "activation" function.
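To spell out that connection, here is a minimal sketch of a single dense layer in plain Python: a linear transformation (weights times inputs, plus a bias) followed by a non-linear activation. The weights, bias, and input below are made-up numbers, and ReLU is just one common choice of activation.

```python
def dense_layer(weights, bias, inputs):
    """Apply W x + b, then a ReLU activation, to one input vector."""
    linear = [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, bias)
    ]
    # ReLU: keep positive values, clamp negative values to zero.
    return [max(0.0, value) for value in linear]


# Hypothetical 2x2 weight matrix and bias vector.
weights = [[1.0, -1.0], [0.5, 0.5]]
bias = [0.0, -3.0]

output = dense_layer(weights, bias, [3.0, 1.0])
print(output)  # [2.0, 0.0]
```

Training a network amounts to adjusting `weights` and `bias` in every such layer; without the activation, stacking layers would collapse into a single linear transformation.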

In the following order:

1. Michael Nielsen's book: http://neuralnetworksanddeeplearning.com/

2. Stanford CS231n course: http://cs231n.stanford.edu/

3. DRL hands on book: https://www.amazon.com/Deep-Reinforcement-Learning-Hands-Q-n...

After this, churn through research papers or Medium articles on conv net architecture surveys, batchnorm, LSTM, RNN, transformers, BERT. Write lots of code, try things out.

This may make sense if you want to do image processing and deep reinforcement learning. But there are lots of other domains.

For tabular data (which is probably most relevant in pharma, and probably the best place to start), Introduction to Statistical Learning by Hastie et al. and Max Kuhn's Applied Predictive Modeling cover a lot of the classical techniques.

For univariate time series forecasting "Forecasting Principles and Practice" is great.

For natural language processing foundations Jurafsky's Speech and Language Processing is broadly recommended; for cutting edge natural language processing Stanford's CS224n is great: http://web.stanford.edu/class/cs224n/

I can't suggest Introduction to Statistical Learning enough, it's a fantastic book! I loaned my copy to another data scientist because I didn't want to hog such a valuable resource.

Study calculus, from the definition of real numbers to taking complex integrals via residues; then study linear algebra up to some theorems about eigenvectors. 1 month total, assuming you're somewhat talented and determined to spend 12 hours a day learning proofs of boring theorems. After that you'll realise that most of the ML papers out there are just ad-hoc composed matrix multiplications with some formulas used as fillers. At that point I think it's more useful to learn what ML models work in practice (although nobody will be able to explain why they work, including the authors) and mix this practical knowledge with the math theory to develop good intuition.

I'd compare ML with weather models: we understand physics driving individual particles, we understand the high level diff equations, but as complexity builds up, we have to resort to intuition to develop at least somewhat working weather models.

What are complex integrals used for in machine learning?

They aren't. It's just a very coarse stopping point.

I started with the machine learning course[0] on Coursera followed by the deep learning specialization[1]. The former is a bit more theoretical while the latter is more applied. I would recommend both, although you could jump straight to the deep learning specialization if you're mostly interested in neural networks.

[0] https://www.coursera.org/learn/machine-learning

[1] https://www.coursera.org/specializations/deep-learning

Is C/C++ still worth learning if I want to create some models from scratch (new layers or different paradigms)?

I hear that C++ is a nightmare to work with and was wondering if Rust, Julia, or even Swift would be worth learning instead.

I know Python, but deep learning frameworks seem to be written in C++, so to come up with new layers I need to understand C++, which I was told has a lot of peculiarities that take time to pick up. The compiler also isn’t very user friendly (from what I’ve read).

C++ is not as tricky as people make it out to be. There is a lot of elitism among programmers, and a lot of people seem to claim it’s hard solely to make themselves look smarter for being able to write it.

If you know the basics of programming and have the persistence to RTFM (Read The Fucking Manual), C++ will not give you any trouble. In fact, you might actually start to enjoy it more than the other languages you used in the past.

All that said, if you are focusing on machine learning rather than programming, then you should look into Python and R. A great resource is “an introduction to data science with R” by David Langer: https://m.youtube.com/watch?v=32o0DnuRjfg

If this is the case I would actually love to play around with C++, as a lot of the software that Python wraps is written in it, and it gives me a chance to look a little deeper into the source code.

Julia is a blast to do research on this stuff in, if you want to go beyond the basics that TensorFlow and PyTorch allow. The 2020s are going to be the decade of mixing numerical PDEs with machine learning IMO, and Julia already has a lot of features along these lines that are missing from "traditional ML" libraries.

Interesting. I was going to go through their yearly conference talks to get a sense of Julia’s capabilities. JuliaCon 2019 etc. on YouTube. Is that the best way?

Possibly. On this topic (machine learning, differentiable programming, GPU and parallel computing) I'd recommend the following videos:








You can actually implement most new layers or experimental ideas using frameworks like PyTorch or TensorFlow. They support fairly low-level primitives which are much more flexible than Keras or PyTorch sequential models. That said, C/C++ is still very useful for implementing high-performance systems.

Ah. I haven’t played around with PyTorch custom layers enough, so I am going to give it a try. I was initially trying to do it in Keras, but Keras was just using TensorFlow layers for most operations, so I couldn’t easily tweak the original TensorFlow layers through Keras.

The concept of "layers" is not in fact enforced by PyTorch or TensorFlow at all. This tutorial is a really nice overview of the levels of abstraction available in PyTorch: https://pytorch.org/tutorials/beginner/nn_tutorial.html

No point unless you have an interest in numerical linear algebra. The people who write the foundational Fortran/C/C++ libraries are experts in numerical analysis which is another rabbit hole.

If you want to write your own for fun, then there are some great algebra libraries in C++ you can use or you can use bindings for PyTorch or TF.

Yeah I don’t want to write my own libraries but create new layers from the existing numerical algebra layers.

I was originally trying to create a new type of convolution layer in Keras and, after being stuck for a while, asked on their official Google board, Stack Overflow, etc., but the answers I got didn’t solve the problem.

I haven’t tried creating custom layers in PyTorch yet though, so maybe it’s possible to do so with PyTorch and I can just learn C++ for other purposes.

The Rust SDK for TensorFlow is worth a look.

Honestly, I would start with fast.ai - if you don't like it by lesson 3, switch to another resource. If you do like it, though, fast.ai is probably the biggest bang for your buck (time).

I was in the same boat in 2014. I went a more traditional route by getting a degree in statistics and doing as much machine learning as my professors could stand (they went from groaning about machine learning to downright giddy over those two years). I worked as a data scientist for an oil-and-gas firm, and now work as a machine learning engineer (same thing, basically) for a defense contractor.

I’ve seen some really bad machine learning work in my short career. Don’t listen to the people saying “ignore the theory,” because the worst machine learning people say that and they know enough deep learning to build a model but can’t get good results. I’m also unimpressed with Fast AI for the reasons some other people mentioned, they just wrapped PyTorch. But also don’t read a theory book cover-to-cover before you write some code, that won’t help either. You won’t remember the bias-variance trade-off or Gini impurity or batch-norm or skip connections by the time you go to use them. Learn the software and the theory in tandem. I like to read about a new technique, get as much understanding as I think I can from reading, then try it out.

If I were to do it all over again, I would:

1. Get a solid foundation in linear algebra. A lot of machine learning can be formulated in terms of a series of matrix operations, and sometimes it makes more sense to. I thought Coding the Matrix was pretty good, especially the first few chapters.

2. Read up on some basic optimization. Most of the time it makes the most sense to formulate the algorithm in terms of optimization. Usually, you want to minimize some loss function and that’s simple, but regularization terms make things tricky. It’s also helpful to learn why you would regularize.

3. Learn a little bit of probability. The further you go the more helpful it will be when you want to run simulations or something like that. Jaynes has a good book but I wouldn’t say it’s elementary.

4. Learn statistical distributions: Gaussian, Poisson, Exponential, and beta are the big ones that I see a lot. You don’t have to memorize the formulas (I also look them up) but know when to use them.
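As a small illustration of point 2 (a loss function plus a regularization term), here is a minimal sketch of gradient descent on a one-parameter ridge regression, using only plain Python. The data, the penalty values, and the function name `fit_ridge_slope` are assumptions made for the example. The loss is the mean squared error of `w * x` against `y`, plus `penalty * w**2`.

```python
def fit_ridge_slope(xs, ys, penalty, learning_rate=0.05, steps=500):
    """Minimize mean((w*x - y)^2) + penalty * w^2 by gradient descent."""
    n = len(xs)
    w = 0.0
    for _ in range(steps):
        # Gradient of the data-fit term with respect to w.
        data_grad = (2.0 / n) * sum((w * x - y) * x for x, y in zip(xs, ys))
        # Gradient of the regularization term with respect to w.
        penalty_grad = 2.0 * penalty * w
        w -= learning_rate * (data_grad + penalty_grad)
    return w


# Hypothetical data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

unregularized = fit_ridge_slope(xs, ys, penalty=0.0)
regularized = fit_ridge_slope(xs, ys, penalty=1.0)
print(unregularized, regularized)
```

With no penalty the slope converges to 2; with the penalty it is pulled toward zero (the closed form here is `sum(x*y) / (sum(x*x) + n * penalty)`, i.e. 28/17). That shrinkage is the reason to regularize: it trades a little fit on the training data for less sensitivity to noise.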

While you’re learning this, play with linear regression and its variants: polynomial, lasso, logistic, etc. For tabular data, I always reach for the appropriate regression before I do anything more complicated. It’s straightforward, fast, you get to see what’s happening with the data (like what transformations you should perform or where you’re missing data), and it’s interpretable. It’s nice having some preliminary results to show and discuss while everyone else is struggling to get not-awful results from their neural networks.

Then you can really get into the meat with machine learning. I’d start with tree-based models first. They’re more straightforward and forgiving than neural networks. You can explore how the complexity of your models affects the predictions and start to get a feel for hyper-parameter optimization. Start with basic trees and then get into random forests in scikit-learn. Then explore gradient boosted trees with XGBoost. And you can get some really good results with trees. In my group, we rarely see neural networks outperform models built in XGBoost on tabular data.
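For anyone who has not looked inside a decision tree yet, here is a minimal sketch of Gini impurity, the default split criterion in scikit-learn's decision tree classifier, written in plain Python. The labels and helper names are made up for illustration; a real tree searches over many candidate splits and picks the one with the lowest weighted impurity.

```python
from collections import Counter


def gini_impurity(labels):
    """1 - sum(p_k^2) over the class proportions p_k in `labels`."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


def split_impurity(left, right):
    """Size-weighted average impurity of the two sides of a split."""
    total = len(left) + len(right)
    return (
        len(left) / total * gini_impurity(left)
        + len(right) / total * gini_impurity(right)
    )


# Hypothetical node with two classes, and two candidate splits of it.
node = ["a", "a", "b", "b"]
perfect_split = split_impurity(["a", "a"], ["b", "b"])
useless_split = split_impurity(["a", "b"], ["a", "b"])

print(gini_impurity(node))  # 0.5
print(perfect_split)        # 0.0
print(useless_split)        # 0.5
```

A tree grows by repeatedly choosing the split that lowers this number the most, which is why depth (how many times you allow it to split) is the main complexity knob to experiment with.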

Most blog posts suck. Most papers are useless. I recommend Geron’s Hands-On Machine Learning.

Then I’d explore the wide world of neural networks. Start with Keras, which really emphasizes the model building in a friendly way, and then get going with PyTorch as you get comfortable debugging Keras. Attack some object classification problems with-and-without pretrained backends, then get into detection and NLP. Play with weight regularization, batch norm and group norm, different learning rates, etc. If you really want to get deep into things, learn some CUDA programming too.

I really like Chollet’s Deep Learning with Python.

After that, do what you want to do. Time series, graphical models, reinforcement learning— the field’s exploded beyond simple image classification. Good luck!

This is the correct progression IMHO. I can tell you’ve been in industry because it mimics my experiences.

Always start with a simple model and see how far you can get. Most of the improvements I’ve seen come from “working the data” anyway. You will be surprised how much you can improve model performance just by working the data, or improving the quality of the underlying data alone. Also simple models give you a “baseline”. What is the point of reaching for neural networks if you don’t have a baseline performance metric to compare against? XGBoost is a godsend. It trains extremely quickly and is surprisingly difficult to beat in practice.
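A minimal sketch of what a "baseline" comparison can look like, in plain Python: predict the training mean for everything, then check whether a simple least-squares line does better on held-out data. The numbers and helper names are made up for illustration.

```python
from statistics import mean


def mse(predictions, targets):
    """Mean squared error between two equally long sequences."""
    return mean([(p - t) ** 2 for p, t in zip(predictions, targets)])


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    spread = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / spread
    return slope, y_bar - slope * x_bar


# Hypothetical training and held-out data.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [1.2, 2.9, 5.1, 6.8]
test_x, test_y = [5.0, 6.0], [9.1, 10.8]

# Baseline: always predict the mean of the training targets.
baseline_mse = mse([mean(train_y)] * len(test_y), test_y)

# Simple model: a fitted straight line.
slope, intercept = fit_line(train_x, train_y)
model_mse = mse([slope * x + intercept for x in test_x], test_y)

print(baseline_mse, model_mse)
```

Any fancier model then has to beat `model_mse`, not just look impressive on its own.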

As you say, constantly sharpen your saw with regards to probability theory and mathematics in general. There is simply no way around this in the long run.


Excellent detailed advice! This is THE roadmap for ML study.

PS: While many of us may not have the time/resources for a graduate course, one can absolutely get the mandatory theoretical ideas from books/courses/videos/etc.

Impressed with your response, thanks for the clarity you have presented through your examples. Once again, thanks a lot.

I'm not an expert, but I had heard lots of good things about Fast.ai's online course/content: https://course.fast.ai/

I've started/stopped a few courses with Georgia Tech's OMSCS program as well, which might have been useful, but I still feel like I'm missing some of the mathematical foundation needed to make sense of those courses. Fast.ai's approach seems like it could be a better fit for someone like me who's more interested in the practical aspects of using ML (I just haven't made the effort to go through their content myself).

Ask yourself: Do you really need ML to solve the problems you're interested in solving?

If you're learning it for career purposes, keep in mind that many corporate ML use-cases are problematic at best. At worst, you will produce something that kills someone inadvertently, possibly more than one person.

Learn about the many pitfalls and limitations of ML. Learn about inadvertent bias in datasets. Learn about the issues with inputs not represented (or not adequately represented) in your training dataset.

Most importantly, understand that ML is not magic and without significant guardrails in place, there's a good chance something will fuck up.

A lot of good advice here.

One thing I would add is replicate a couple of ML papers. It can help develop a lot of intuition about the specific area.

Actually this is a great idea. Seems I'll try this approach for 2020 Q1.

No one suggested Stanford CS231n: http://cs231n.github.io/. I'd recommend the winter 2016 lectures (by Fei-Fei Li, Karpathy and Johnson). For getting started with convnets / deep learning, I think this is one of the best hands-on resources out there.

AFAIK FastAI courses are well recommended for their Deep Learning stuff, but they also have an ML course[0]. Another usual recommendation is the Elements of Statistical Learning book. Another option is finding a MOOC that you enjoy and following it.


There's a MOOC that uses 'Introduction to Statistical Learning' by the authors of 'Elements of Statistical Learning', here: https://lagunita.stanford.edu/courses/HumanitiesSciences/Sta...

I had a nice experience with Adam Geitgey's Machine Learning is Fun course.

He published a lot of free ML blog posts, in easy-to-understand writing with nice examples, so it never made anything seem out-of-reach. I found that a lot of other material was a little too abstract, so his stuff was great.

The blog posts are here: https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec...

And I also bought his paid course with code samples -- it's affordable and good value.

Does anyone have any resources for people with more advanced ML experience?

Have you read Goodfellow / Bengio / Courville’s deep learning book? The later chapters go in more depth than most other resources I’ve found.

1. Find a paper you like/admire

2. Implement their methods from scratch (i.e. numpy not pytorch)

3. Experiment a bit, tweaking the models/algs to gain intuition

4. Repeat 1-3

> Implement their methods from scratch (i.e. numpy not pytorch)

lol this is basically impossible and completely pointless. please show me a numpy implementation of BERT or CycleGAN or deformable convolutions (note that jax != numpy). it's like suggesting implementing a kernel to someone who wants to learn about virtual memory or scheduling.

better advice would be take a paper and implement the model using pytorch without looking at their implementation and fiddle with that.

Another suggestion: I like https://spinningup.openai.com for learning reinforcement learning.

I'm impressed by the responses generated in this conversation. My expectation was to get several links and start browsing each one of them. However, many have agreed that the best way is to start with a specific example and start creating a model. Many times I have tried to answer that same question, "which model to apply"? How do I know I'm not re-inventing the wheel?

> How do I know I'm not re-inventing the wheel?

You probably are, but for learning purposes that doesn't matter at all.

If you successfully invent a new, improved, wheel then you don't need help or guides.

Humblebundle has a bundle of machine learning books right now: https://www.humblebundle.com/books/python-machine-learning-p... I'm considering buying this bundle. Any of these books you would recommend?

I'm not affiliated with humblebundle in any way, and this was a genuine question. I know that the Packt books are not the best quality, but if one of these books is a good introduction to practical ML, I would consider it a good deal. In my opinion much better than googling algorithms and tutorials and visiting 10s or 100s of sites full of ads and ad trackers to find a suitable algorithm for a given problem. Reading an EPUB on my daily commute sounds much better and works offline.

> where should I start? I work mostly as a data analyst on pharma where the focus is batch process.

Any tool needs an applied field but any applied field does not need all the tools. You have an applied field already (pharma), so start looking for one or two state-of-the-art ML papers for that? Happy 2020 and good luck, it’s going to be fun!

I am currently going through fast.ai's Deep Learning course and would totally recommend it because of its top-down approach.

Has anyone done the non-DL courses on their website? E.g., any thoughts on Rachel's Computational Linear Algebra?

Does anybody have resources on the math behind ML? I hit a dead end using Python frameworks because it was a black box, and I simply lacked the underlying knowledge.

Mathematics for Machine Learning - https://mml-book.github.io/

- 100-page ML book, for a brisk tour

- Deep Learning (Goodfellow)

- Introduction to Statistical Learning

I'm using machine learning to solve some computer-vision problems. If you're interested in joining my project, email me at alex at roadometry.com

Excellent thread - I have the same goal and currently am working mostly with databases. Thanks for asking this question!

There is a question I have been asking for quite some time. It is known that Python is the language of choice when practicing ML. But can similar results be achieved using PowerShell? What makes Python superior to PowerShell when making models for ML?

Technically you can do it in any language, but in software engineering we tend to stand on the shoulders of giants in order to get the job done on time.

A lot of the original excellent data processing, statistical analysis, and ML libraries were built for Python and R, so all the deep learning stuff was built on top of those. R is somewhat harder to integrate into a production pipeline due to its typical reliance on something like RStudio, so Python ended up being the de facto standard as it is also well supported in cloud computing environments.

With TensorFlow APIs being written for Swift, we might start to see Swift competing with Python.

Wow, I would never think to use PowerShell outside of some Windows-specific tinkering. I guess every language has its diehard fans.


Honestly, skip all of the courses. Pick a problem to solve, start googling for common models that are used to solve the problem, then go on GitHub and find code that solves that problem or a similar one. Download the code and start working with it, change it, experiment. All of the theory and such is mostly worthless; it's too much to learn from scratch and you will probably use very little of it. There is so much ML code on GitHub to learn from, it's really the best way. When you encounter a concept you need to understand, google the concept and learn the background info. This will give you a highly applied and intuitive understanding of solving ML problems, but you will have large gaps. Which is fine, unless you are going in for job interviews.

Also bear in mind that courses like fast.ai (as you see plastered on here) aggressively market themselves by answering questions all over the internet. It's a form of SEO.

EDIT (Adding this here to explain my point better):

My opinion is that the theory starts to make sense after you know how to use the models and have seen different models produce different results.

Very few people can read about the bias-variance trade-off and then, in the course of using a model, understand how to take that concept and directly apply it to the problem they are solving. In retrospect, they can look back and understand the outcomes. Also, most theory is useless in the application of ML, and only useful in the active research of new machine learning methods and paradigms. Courses make the mistake of mixing in that useless information.

The same thing is true of the million different optimizers for neural networks. Why different ones work better in different cases is something you would learn when trying to squeeze out performance on a neural network. Who here is intelligent enough to read a bunch about SGD and optimization theory (Adam etc), understand the implications, and then use different optimizers in different situations? No one.

I'm much better off having a mediocre NN, googling, "How to improve my VGG image model accuracy", and then finding out that I should tweak learning rates. Then I google learning rate, read a bit, try it on my model. Rinse and repeat.
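As a toy version of that "tweak the learning rate and see" loop, here is a minimal sketch of gradient descent on f(w) = w^2 in plain Python. The function name `descend` and the three learning rates are made up for illustration; real networks are far messier, but the same too-small / about-right / too-large pattern shows up.

```python
def descend(learning_rate, steps=20, start=1.0):
    """Run gradient descent on f(w) = w**2 and return the final w."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * w  # derivative of w**2
        w -= learning_rate * gradient
    return w


slow = descend(0.1)      # shrinks w by a factor of 0.8 per step
direct = descend(0.5)    # jumps straight to the minimum at 0
unstable = descend(1.1)  # overshoots further on every step and diverges

print(slow, direct, unstable)
```

The minimum is at w = 0, so the distance of each result from zero shows how well that learning rate worked after the same number of steps.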

Also, I will throw in my conspiracy theory that most ML researchers and such push the theory/deep stats requirement as a form of gatekeeping. Modern deep learning results are extremely thin when it comes to theoretical backing.


Learn top down, not bottom up.

Watch maybe one or two short videos on back propagation. You don't need to be muddled in the theory and the math - you can become productive right away.

Once you start playing with pytorch and tensorflow models (train them yourself or do transfer learning), you'll start to develop an intuition for how the network graphs fit together. You'll also pick up tools like tensorboard.

Also, do transfer learning. It's so awesome to train on a publicly-available high quality and large data set, train for a lot of epochs for good problem domain fit, then swap out your own smaller data set. It's magical.

I have a feeling that ML in the future will be like engineering today. You can learn by doing and don't need a degree or formal background to be productive and eventually design your own networks.

I have no formal training (save one undergrad course that was way outdated in "general AI"), and I've designed my own TTS and voice conversion networks. I have real time models that run on the CPU for both of these, and as far as I know they're more performant than anything else out there (on CPU).

Eventually you might start reading papers. (You'll be productive long before you need to do this.) Most ML papers are open access, but review (broad survey) articles might need pirating. Thankfully there are websites that can help you get these. The papers aren't hard to read if you've spent some time playing with the networks they pertain to. Read the summary, abstract, and figures before diving into the paper. It may take a few reads and some googling.

You do not need to be a data scientist. Anybody can do it. That said, a good GPU will help a lot. I'm using two 1080Ti in SLI and they're pretty decent.

I feel somewhat similarly. If you want to learn ML from the “ground up” that means learning math (at least a few subjects) to the senior undergraduate level, some numerical methods, some probability and statistics, sprinklings of other stuff before you even get to the models. And it’s not even clear that stuff is important for ML in practice.

I’m someone who took all those math courses and some grad ML coursework. And what that means is that I’m qualified to try and hack together some specific research level things that a practitioner will be confused by, and then try to write a paper about it. It doesn’t mean I’m qualified to do what the practitioner does. Frankly I never ran my code on anything other than MNIST yet and don’t know the different architectures or applications well, since they’re not directly what I work on. They’re just different things, as I see it.

> I have no formal training (...) I have real time models that run on the CPU (...) and as far as I know they're more performant than anything else out there

> You do not need to be a data scientist. Anybody can do it. That said, a good GPU will help a lot. I'm using two 1080Ti in SLI and they're pretty decent

An alternative is that, by not knowing what you are doing, you may not see all the options that exist -- and when you hit a problem too hard, you just throw more hardware (GPUs) at it.

This is not to say it is not sometimes a valid approach, but I'd be wary of someone who, say, hasn't had any formal training in C and says that his stuff is more performant than anything out there, because lack of training means not knowing about stuff that already exists.

> An alternative is that, by not knowing what you are doing, you may not see all the options that exist -- and when you hit a problem too hard, you just throw more hardware (GPUs) at it.

Maybe some will. I just explained that I'm running my models on CPUs, so I'm actually developing sparse and efficient resource constrained models that evaluate quickly.

I've been working with libtorch's JIT engine in Rust (tch.rs bindings).

I'm currently trying to adapt Melgan to the Voice Conversion problem domain so I can get real time, high-fidelity VC without using a classical vocoder. WORLD works great and quickly, but it's a poor substitute for the real thing as it only maps the fundamental frequency, spectral envelope, and aperiodicity. Melgan is super high quality and faaast.

Are you working on VC (input: speech of one speaker, output: the same spoken content, but sounds like another speaker) or speaker-adaptive speech synthesis (input: text, output: speech)?

Also check out ParallelWaveGAN, another high-quality and very fast (on CPU) neural vocoder.

> You do not need to be a data scientist. Anybody can do it. That said, a good GPU will help a lot. I'm using two 1080Ti in SLI and they're pretty decent.

You can also use Google colab for a free GPU/TPU

>Also, I will throw in my conspiracy theory that most ML researchers and such push the theory/deep stats requirement as a form of gatekeeping.

Learning the fundamentals of a field is supposed to be gatekeeping. It's what stops you from making stupid mistakes. The field of ML is littered with horrible errors made by people who don't know the fundamentals.

Please don't follow this terrible advice.

Doesn't it depend on what you're trying to do?

I think there's a huge difference between research and learning enough to scrap something together for a hobby project. The deep maths can come later.

I don't need to study compiler theory to use GCC.

Your analogy is wrong i.e. you are comparing apples to oranges. ML is very different from other "normal" computation systems.

* Non-ML: Input + {Rules} = Output

* ML: Input + Output = {Rules}

where "{Rules}" = Infinite set of possible "Programs" each of which is a trace through a very large state space of variables.
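The two equations above can be sketched in a few lines. This is only an illustration under assumptions: the Celsius-to-Fahrenheit rule stands in for a hand-written program, and the hypothetical helper `fit_line` uses ordinary least squares as the simplest possible "learner".

```python
def fahrenheit_rule(celsius):
    """Non-ML: a human writes the rule down."""
    return 1.8 * celsius + 32.0


def fit_line(xs, ys):
    """ML in miniature: recover slope and intercept from examples
    using ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


inputs = [-10.0, 0.0, 10.0, 20.0, 30.0]
outputs = [fahrenheit_rule(c) for c in inputs]   # Input + {Rules} = Output
slope, intercept = fit_line(inputs, outputs)     # Input + Output = {Rules}
print(slope, intercept)
```

Here the recovered rule matches the hand-written one because the data is clean and the model family is right. With noisy data or the wrong model family, nothing in the code tells you how far off the learned rule is, which is where the statistics comes in.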

In the first case, we humans use all our ingenuity to write the program and tweak it to get the right results. We already know the difficulties involved in writing "correct" programs but have mastered it to some extent.

In the second case, you cannot do that. Your "Programs" are derived by the system and encoded in numbers. How in the world do you even know that your encodings are correct? This is why you need the techniques of Mathematics to transform (eg. Linear Algebra) and constrain (eg. Inferential Statistics/Probability) the output "Rules" so you can have some measure of confidence in it. This is the fundamental challenge inherent in ML.

> How in the world do you even know that your encodings are correct?

Easy: you know that they aren't and never will be entirely correct for complex enough ML problems, just like humans. How to handle those errors is not an ML topic though; you just have to ensure via old-fashioned system design that the system you build doesn't depend on any ML model always outputting correct results.

You can say that about any field, discipline or skill.

At the same time, there is a difference between someone who is just starting to learn and someone who wants to apply it in a large production system with social implications (be it advertising, medicine, or anything). Hobby projects, or even small startups, rarely fall in that region.

Moreover, even a profound knowledge of mathematics does not give any edge in ethics, or even in awareness of problems with real data (noise, bias, malicious use, social reception, etc).

I hope you're trolling, because this is a guaranteed way to climb the peak of stupidity [1]. If OP is determined to get a bit deeper than 30-minute guides on Medium, there is certainly theory to learn. But it is merely second-year college material, and you can probably skip Kolmogorov's axioms and measure theory; that won't hurt your understanding of bleeding-edge research.

[1] https://en.m.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effec...

I disagree with you. Both ways work. Starting from theory, or starting from practice.

However, in a business setting, starting from practice is much more effective. As a lead dev and a manager who's had over 20 years of experience in AI/ML I've trained several engineers in building ML systems.

I always start with a business problem and point them to resources (frameworks, blogs, jupyter notebooks) to help them along. The problem is small enough for them to solve in less than a quarter. I avoid micromanaging them and will only answer larger questions by providing more resources. If they really get stuck I'll sit with them and walk through the issue. I have yet to have an engineer be unable to 1) get a model working and 2) tune it to production quality.

My opinion is that the theory starts to make sense after you know how to use the models and have seen different models produce different results.

Very few people can read about bias variance trade off and in the course of using a model, understand how to take that concept and directly apply it to the problem they are solving. In retrospect, they can look back and understand the outcomes. Also, most theory is useless in the application of ML, and only useful in the active research of new machine learning methods and paradigms. Courses make the mistake of mixing in that useless information.

The same thing is true of the million different optimizers for neural networks. Why different ones work better in different cases is something you would learn when trying to squeeze out performance on a neural network. Who here is intelligent enough to read a bunch about SGD and optimization theory (Adam etc), understand the implications, and then use different optimizers in different situations? No one.

I'm much better off having a mediocre NN, googling, "How to improve my VGG image model accuracy", and then finding out that I should tweak learning rates. Then I google learning rate, read a bit, try it on my model. Rinse and repeat.

What usually happens is that people get something working, think they now know ML, but don't even generally know enough to understand the things they did wrong, and never end up getting to the theory.

The best approach is to learn both concurrently. Learn some theory, apply it and understand the application, including its pitfalls, then learn a bit more and repeat. Incremental learning with a solid base. It's fun to hate on academia but this is how experts with deep knowledge of a domain get to where they are.

Sadly, you are 100% correct. I see the same problems over and over in newly published AI research papers.

That said, playing for 1-2 weeks might be a good start towards getting motivated for learning the difficult and dry theory needed to excel in this field.

I personally started with Kaggle competitions and lots of googling (duckduckgoing, right?), but quite quickly hit the wall of not understanding. I felt like a mindless creature making decisions based on a couple of guides out there. Watching lectures from Andrew Ng and reading some books helped a lot, but I can't see a reason why one wouldn't want to start with theory. It's not gold and glitter, and no one promised you that, unless you really want to delegate your work to AutoML.

I guess his point is to tackle it from a top-down approach. For me, that's how I am breaking ground in my ML study. I tried Andrew Ng's course, I didn't understand a thing.

Then I tried Kaggle's mini-course. It kickstarted me into ML and motivated me to learn the theory as I go. For example, when I got to apply Random Forest Regressor, I went to Wikipedia and tried to read on it. Got some idea. And the progress is good.

Maybe for some of us, I think top-down is motivating and makes the learning process enjoyable.

Same here. I tried Andrew Ng's course a few times ever since it launched a few years back but I could only get through half of it. Fast ai makes more sense to me and I've picked up a decent amount of concepts where I can now go back and feel confident enough to tackle theory.

The danger is throwing something into production without understanding bias and variance, overfitting (or other important concepts), with potentially disastrous results.


One cannot do ML without some basic theoretical knowledge of Statistics and Probability. This gives you the What and the Why behind everything. GI-GO is more true of ML than other disciplines. The techniques used are so opaque that if you don't know what you are doing, you can never trust the results.

One thing that made the Uber fatality possible was their over-confidence in their AI, which they apparently did not fully understand. They considered it unnecessary and disabled the car-integrated emergency collision brake system ...

“Scientists start out doing work that's perfect, in the sense that they're just trying to reproduce work someone else has already done for them. Eventually, they get to the point where they can do original work. Whereas hackers, from the start, are doing original work; it's just very bad. So hackers start original, and get good, and scientists start good, and get original.” - Paul Graham in Hackers and Painters

Which, well, I use as an opening quote to my intro to deep learning, https://github.com/stared/thinking-in-tensors-writing-in-pyt....

BTW: While information theory is everywhere, I have yet to see where measure theory makes a practical impact on practical deep learning. The importance of pure math for practical machine learning is highly overrated (and I speak as someone who did study that).

> All of the theory and such is mostly worthless

No. You will not get beyond copy-paste level without being comfortable with ML foundations. That doesn't mean you need to be able to prove variational inference bounds in your sleep, but you'll want to know why we need things like lower bounds for approximate inference.

>No. You will not get beyond copy-paste level without being comfortable with ML foundations.

but everyone else in here is hyping fastai, which is not just copy-paste but wrapped copy-paste at that (so you're not even learning pytorch).

Sure, go through the fastai material and maybe write a blog post about how you learned ML (read: DL) in a few months. What you really learned is copy-pasting code (as you mentioned) and some neural net tricks (like a good learning-rate to start SGD).

How to learn ML? Do fastai + reading Daphne Koller's and Chris Bishop's books on PGMs + re-implementing a paper on Gaussian process classification + another paper on GNNs + ....

bishop's book is a good suggestion (i prefer hastie) for ml but you have to admit that

1. fastai is neural nets

2. bishop's book (and whoever else's) are grad books that require considerable mathematical training to really profit from

3. the aforementioned books don't teach anything practical!

so ultimately i completely agree with the op of this thread - just jump in and read around when things don't work how you expect.

Just go for it. Learning the math just helps you understand it’s not magic, like learning to program helps you understand computers aren’t magic.

As someone that learned a good bit of the math and implemented NN code with backprop from scratch, I agree with the parent. To learn the math and get better results than cutting edge ML researchers would be as likely as winning the lottery.

As an exercise, the math is fun to learn and not terribly complicated for backprop type of stuff.

For what it's worth, this is basically the learning model fast.ai works on. You start by just applying pre-built models to things, then learn how to tweak them, then learn the theory that makes the tweaks work.

kudos to OP! AI & ML are also on my list for 2020!!

> All of the theory and such is mostly worthless, it's too much to learn from scratch and you will probably use very little of it.

i, too, believe in code before theory. but not for stats, artificial intelligence, or machine learning, numerical computing, etc. why?

because, for instance, if you compare a popular & successful machine learning framework to a "build your own deep neural network in 150 lines of python", the difference as far as data structures or programming constructs choices will be staggering.

especially if you are an experienced programmer. or just someone who cares about the data structures and programming constructs in the first place. but these choices are not accidental!

you will find that "parameters" are represented by a "class", ie. objects with associated operations and not values. why? because you want to do things like accumulate contributions to derivatives, and all these other calculus things i thought i was never going to ever use.
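A minimal sketch of why a parameter ends up being an object rather than a bare number. This is a toy scalar autograd in the style of small teaching implementations, not the design of any particular framework; the class name `Value` and its fields are assumptions made for illustration.

```python
class Value:
    """A scalar that remembers how it was computed, so that derivative
    contributions can be accumulated into .grad on the backward pass."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(out)/d(self) = 1 and d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(out)/d(self) = other and d(out)/d(other) = self
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Visit nodes in reverse topological order so each node's grad is
        # complete before it is passed on to its parents.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


w = Value(3.0)
x = Value(2.0)
loss = w * x + w * w      # w appears three times in the expression
loss.backward()
print(w.grad, x.grad)     # d(loss)/dw = x + 2w, d(loss)/dx = w
```

The `+=` in each `_backward` is the "accumulate contributions" part: `w` is used in three places, so its gradient is the sum of three separate contributions, which a plain float could not keep track of on its own.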

theory is important for people who truly care!

ML engineer here. I didn’t take any ML classes in college and picked up most of what I know on the job.

I think this advice is directionally correct - reading through a theory-dense textbook like Bishop, which many consider to be a foundational ML textbook, is likely to be a bad use of your time. However, I think it does help to start with some theory, if only to give you the vocabulary with which to think about and get help with issues that you run into. At the risk of sounding like a broken record, Andrew Ng’s class on Coursera (https://www.coursera.org/learn/machine-learning) is quite good - it’s accessible with a bit of basic calculus knowledge (simple single variable derivatives and partial derivatives are all you need) and basic linear algebra (like, matrix multiplication). The whole class took me around 30 hours to get through, so if you’re determined, you could probably finish it in 2-3 weeks even if you’re pretty busy.

Also, if you like having text notes to refer to, I made these notes for myself a few years back when taking the class: https://github.com/tlv/ml_ng. There are some spots where, for my own understanding (I’m a bit of a stickler for mathematical rigor), I added more of the reasoning/equation pushing that Ng glosses over in his lectures. I would say that for a practical understanding of how to apply the concepts covered in the class, there’s no need to read those parts carefully (there’s a reason why Ng glossed over them).

But yeah, to all the people saying you should start by reading entire textbooks on multivariable calculus, statistics, and linear algebra...that’s not necessary. Most ML engineers I’ve met (and even most industry researchers, although my sample size there is much smaller) don’t understand all of those things that deeply.

Also, one last semi-related note - if you’re reading a paper and get intimidated by some really complex math, oftentimes that math is just included to make the paper look more impressive, and sometimes it’s not even correct.

Without experience in ML it's often hard to know what problems are solvable, how to frame the problem, and to tell a good solution on GitHub from a bad one, etc.

If you want to go an applied route I'd suggest starting somewhere like Kaggle and looking through the competitions for ones vaguely similar to yours. They've done all the hard work of choosing a challenging but solvable problem, sourcing and splitting the data, and choosing a metric. You then can see what techniques actually work really well, and benchmark different approaches. Academic challenges like Imagenet or Coco are also good for this, but you'll have to work harder to find relevant resources.

Once you've done this a couple of times, you can start framing your own problems, collecting and annotating your own datasets, deploying and maintaining models.

One thing I’ve personally seen is software engineers with an interest in deep learning use it to solve very simple problems that just need a linear statistical model. That’s a risk you take, and one reason “gatekeeping” happens.

If you want to raise your salary from $10 to $20 per hour, playing with existing models is the way to go.

If you want to make serious money solving real problems, take the time to learn about automatic differentiation and all the related mathematics about how gradients flow backwards through the network.

But like the coding slave (great nick BTW) said, first play a bit, then learn how it works. Image transformation GANs are a lot of fun.

Here's why the learning part will be crucial to differentiate you from all the clueless outsourced cheap labor:

Recently, there has been a load of new AI papers by so-called scientists on optical flow, and even the greatest new approaches using millions of parameters and costing hundreds of thousands of dollars to train still DO NOT reach the general level of quality that the 2004 census transform approach had.

Similarly, there have been high-profile papers where people randomly chained together TensorFlow operations to build their loss function, oblivious to the fact that some intermediate operations were not differentiable and, hence, their loss would never back-propagate. As a result, all of their claims had to be fraudulent because one could mathematically prove that their network was incapable of learning.
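The mechanism described above is easy to reproduce on a toy example. The sketch below is not taken from any of the papers mentioned; it assumes a one-parameter loss and uses a finite-difference gradient (the hypothetical helper `numerical_gradient`) to compare a smooth loss with one that passes through `round()`, a piecewise-constant operation.

```python
def numerical_gradient(loss_fn, w, eps=1e-4):
    """Central finite-difference estimate of d(loss)/dw."""
    return (loss_fn(w + eps) - loss_fn(w - eps)) / (2.0 * eps)


def smooth_loss(w, target=3.0):
    """Differentiable everywhere: gradient is 2 * (w - target)."""
    return (w - target) ** 2


def rounded_loss(w, target=3.0):
    """round() is piecewise constant, so its derivative is zero almost
    everywhere and no signal about w gets through."""
    return (round(w) - target) ** 2


w = 1.2
print(numerical_gradient(smooth_loss, w))    # non-zero, points toward the target
print(numerical_gradient(rounded_loss, w))   # zero: nothing to descend along
```

The rounded loss still reports a sensible-looking value, which is why this kind of bug is easy to miss: the number goes into the logs, but gradient descent has nothing to follow.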

The larger AI competitions have by now limited the number of submissions that teams are allowed to make per week, simply to discourage people from trying to guess the test results when their AI doesn't work as it should.

Or consider the Uber pedestrian fatality where their neural network was overtrained ( = bad loss function ) to the point where it was unwilling to recognize bicycles at night.

And lastly, not knowing about gradient descent will just waste boatloads of money by 100x-ing your training time. Most stereo disparity and depth estimation AI papers use loss functions that only work on adjacent pixels. That means for a single correction to propagate to all pixels in a HD frame, you'll need 1920 iterations when only 1 could be sufficient.
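A rough sketch of the propagation argument, under a strong simplifying assumption: it ignores the actual loss and gradient values and only tracks which pixels in one row a correction has reached, given that each iteration passes information between adjacent pixels only. The function name `iterations_to_reach_all` is hypothetical.

```python
def iterations_to_reach_all(width, start=0):
    """Count iterations until a correction at `start` has influenced every
    pixel in a row, when each iteration only links adjacent pixels."""
    reached = [False] * width
    reached[start] = True
    iterations = 0
    while not all(reached):
        previous = reached[:]
        for i in range(width):
            left = previous[i - 1] if i > 0 else False
            right = previous[i + 1] if i < width - 1 else False
            reached[i] = previous[i] or left or right
        iterations += 1
    return iterations


print(iterations_to_reach_all(1920))             # correction starts at the left edge
print(iterations_to_reach_all(1920, start=960))  # correction starts mid-row
```

In this model the count grows linearly with the row width (1919 iterations from the edge of a 1920-pixel row), which is the order of magnitude the comment is pointing at. A loss term that links distant pixels directly would not have this limit.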

You will find that my examples are all from autonomous driving. That's because here the discrepancy between GPU-powered brute force amateurs and skilled professionals is the most striking. German luxury cars have integrated lane-keeping, street sign recognition, and safety distance keeping for 10+ years, so for those tasks there are proven algorithms that work on a Pentium III in real time. And now there's lots of NVIDIA GPU kiddies trying to reinvent the wheel with limited success.

For your future employer, you having a firm grasp of how gradients work is the difference between mediocre and state of the art results, and between affordable and too expensive. So if there is one single AI skill that is both exhausting to learn and crucially important, it is differentiation and gradient flow.

https://www.tensorflow.org/tutorials/generative/cyclegan Just click [Run in Google Colab] on the top left and start evaluating the code blocks.

Awesome answer. Any particular resources that you can point to in order to learn this?

As others have pointed out, terrible wrong-headed advice to ignore all "Theory". ML cannot be studied by a scatter-shot approach but needs a systematic plan with Theory first followed by Practice and constant iteration between the two.

When I've learnt something, it's often helpful to work on well-known problems so you get to compare with how others solve them too. Kaggle was good for big data stuff like that. I'm not sure about ML.

true for other fields than AI as well. theory most of the time makes sense at the moment you need it for a practical problem.

I definitely agree that you don't need to go deep into theory to be able to do useful things. But I think the bias-variance tradeoff is a very bad example of "useless theory". It's essentially just another name for overfitting/underfitting, which are approximately the most important ML concepts there are.

I would again argue, the natural progression for this concept would be:

1.) Trains classifier

2.) My train error was so low! Why is my validation error so high?

3.) Googles -> Why is my classifier training error lower than my validation error

4.) Learns about overfitting

5.) Learns about bias variance

It's always a natural progression. Reading about this stuff without encountering it means it usually doesn't stick, and really doesn't make that much sense.
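For anyone who has not hit step 2 yet, here is a minimal sketch of what it looks like. Everything in it is an assumption made for illustration: a hand-built 1D dataset with a few deliberately flipped training labels, and a 1-nearest-neighbour classifier chosen because it memorizes its training set.

```python
def nearest_neighbor_predict(train_x, train_y, x):
    """1-nearest-neighbour: return the label of the closest training point."""
    closest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[closest]


def error_rate(train_x, train_y, xs, ys):
    """Fraction of points in (xs, ys) that the classifier gets wrong."""
    wrong = sum(nearest_neighbor_predict(train_x, train_y, x) != y
                for x, y in zip(xs, ys))
    return wrong / len(xs)


def true_label(x):
    return 1 if x >= 10 else 0


# Training set: 20 points, with every fifth label deliberately flipped (noise).
train_x = [float(i) for i in range(20)]
train_y = [1 - true_label(x) if i % 5 == 2 else true_label(x)
           for i, x in enumerate(train_x)]

# Validation set: fresh points close to the training points, clean labels.
val_x = [x + 0.25 for x in train_x]
val_y = [true_label(x) for x in val_x]

train_error = error_rate(train_x, train_y, train_x, train_y)
val_error = error_rate(train_x, train_y, val_x, val_y)
print(f"train error = {train_error:.2f}, validation error = {val_error:.2f}")
```

The classifier scores perfectly on the data it memorized, noise included, and then repeats that noise on new points. A simpler rule such as a single threshold would get the flipped training points wrong but do better on validation, which is the bias-variance trade-off in its smallest form.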

If you already have concepts of training and validation error then you're already there. The risk is not realising you can't test on your training data, or more subtly that you can't tune hyperparameters on your test data.

True, but I guess it depends on the person. Was just trying to give HN a view of how I write code. I've found it to be faster, but I go in knowing I will be doing a ton of googling.

This is one of the very few (!) concepts you need to know to get practical with ML. Why not watch a few videos on the concepts before you begin? They are all using high-school math anyway.
