Start with (re)learning math. Take those boring, university-level, full-length courses in calculus, linear algebra, and probability/statistics. No, unless you're a fresh STEM graduate, 10-page math refreshers won't do. If you're self-studying, make sure you do all the exercises and take practice exams to test yourself.
1. (Re)learn C/C++, as well as a linear algebra environment such as NumPy or MATLAB. You will also have to learn parallel and distributed programming at some point (CUDA, MPI, OpenMP, etc.). Next, take a boring, university-level, full-length course on algorithms and data structures.
2. Get a book describing ML algorithms and implement them yourself: first in plain C, then with MPI or CUDA, and finally in plain NumPy/MATLAB or one of the low-level ML frameworks (Theano or TensorFlow).
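To give a flavor of step 2, here is a minimal, purely illustrative sketch of one classic algorithm (ordinary least squares) in plain NumPy; the function name and data are made up for the example:

```python
import numpy as np

def fit_least_squares(X, y):
    """Ordinary least squares: solve min_w ||Xw - y||^2.

    np.linalg.lstsq is preferred over the explicit normal-equations
    solution (X^T X)^{-1} X^T y for numerical stability.
    """
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Sanity check: recover known weights from noiseless synthetic data.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w
w = fit_least_squares(X, y)
# w matches true_w up to floating-point error
```

The vectorized formulation is exactly the kind of thing you appreciate more after having written the triple-loop C version first.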
Finally, start doing ML. Not learning about it, doing it. Choose an application that interests you (computer vision, NLP, speech recognition, etc.) and start learning what you need to make something work. Focus on specific, practical tasks. If you don't have any particular application in mind, go to Kaggle, choose a competition, and read what models/tricks the winners used. Then jump right in and start competing.
The first two requirements might take years to master, but if you skip them, you won't be able to do any serious work in ML, or even understand the latest papers. You will be a script kiddie, not a hacker.
And you end with patronizing insults. Do you feel better?
That sort of crap doesn't happen in other, more mature engineering fields. At least not to the extent it does in computing. It's as if there is some unaddressed need to be considered part of the intellectual elite that festers and ferments into casual, ego-driven aggression.
Second, with the exception of civil engineers, other disciplines have no problem getting jobs without licenses. The license just opens more doors and makes finding work easier.
Actually I feel like I forgot a lot of math since university and I would like to pick it up again. Does anyone have a resource they can recommend either for self-study or in person?
Not sure if that's what you're looking for, but it seems to have all the stuff I forgot from college.
If you don't know that stuff but know generally how ML processes work (measurements, models, data partitioning and cleaning), and you start getting to know the frameworks (e.g. Caffe/TensorFlow/Torch), you'll do just fine.
Like, you can know what MapReduce is and how it works and just use abstractions over it, never having implemented it yourself, and you're still a data engineer.
It's a myth that X engineer needs to know everything about field X. There are different levels of everything, and different ways to get there.
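To make the MapReduce point concrete: the contract is just map, shuffle/group by key, and reduce, and a toy single-process word count captures it without any distributed machinery (all names here are made up for illustration):

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    # Emit (key, value) pairs: one (word, 1) per word.
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Combine each key's values into a final result.
    return {key: sum(values) for key, values in groups.items()}

docs = ["the quick brown fox", "the lazy dog", "the fox"]
pairs = chain.from_iterable(map_phase(d) for d in docs)
counts = reduce_phase(shuffle(pairs))
# counts["the"] == 3, counts["fox"] == 2
```

Knowing this contract is enough to use the real abstractions productively, even if you never implement the distributed parts yourself.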
You don't seem to understand what high-level and low-level mean. Why on earth would you write something in C and then rewrite it in MATLAB? The point of higher-level languages is rapid development at the expense of performance.
Rewriting code in a different language is quite a basic skill and not something to waste time perfecting. Instead, the focus should be on understanding the languages/systems themselves at a deeper level.
No offense, but it sounds like you have a lot to learn before you should be giving out advice. Seriously, you're talking about Kaggle? And yet you don't make a single mention of computer science?
I first learned about neural networks from the deeplearning.net tutorials. They use Theano, so when I got a convolutional network running, I was left wondering about two critical pieces: backpropagation and the actual convolution operation. Theano completely hides them from the user. So I decided to implement a simple convnet using only Python and NumPy. That was when I realized I had no idea how backprop works for convnets. I'm guessing most people who have only used Theano or a similar framework to run convnets don't know how backprop works there.
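For anyone curious, the core of that exercise boils down to something like the following sketch: a naive valid cross-correlation and its gradients in pure NumPy (single channel, no strides or padding, purely illustrative):

```python
import numpy as np

def conv2d_forward(X, K):
    """Valid cross-correlation: Y[i, j] = sum_{a,b} X[i+a, j+b] * K[a, b]."""
    kH, kW = K.shape
    oH, oW = X.shape[0] - kH + 1, X.shape[1] - kW + 1
    Y = np.empty((oH, oW))
    for i in range(oH):
        for j in range(oW):
            Y[i, j] = np.sum(X[i:i + kH, j:j + kW] * K)
    return Y

def conv2d_backward(X, K, dY):
    """Given dL/dY, return dL/dK and dL/dX via the chain rule."""
    kH, kW = K.shape
    dK = np.zeros_like(K)
    dX = np.zeros_like(X)
    for i in range(dY.shape[0]):
        for j in range(dY.shape[1]):
            dK += dY[i, j] * X[i:i + kH, j:j + kW]  # correlate X with dY
            dX[i:i + kH, j:j + kW] += dY[i, j] * K  # "full" conv of dY with K
    return dK, dX
```

A finite-difference check against conv2d_forward is a good way to convince yourself the gradients are right.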
Next, I realized that the code I wrote was too slow for any practical purpose, and I decided to rewrite it in pure C. Well, actually I gave up halfway and used the Armadillo package (a NumPy analog for C/C++), but the end result was that my convnet became 5 times faster. It was still too slow, and I noticed that my code was using only one of the 4 cores available on my computer, so I decided to parallelize it using Cilk+. It was actually easier than I expected, and soon I got an almost linear speedup using all 4 cores.
The final exercise was to implement it with CUDA, and I got another factor of 5 speedup running it on my GPU.
The benefit of this experience was enormous: I learned a lot about convnets, a lot about the tools and libraries useful for ML, and a lot about making my code faster.
p.s. I don't quite understand what you meant about Kaggle and CS.
p.p.s. And yes, I do still have a lot to learn, but I wish someone gave me this advice when I was starting out.
Computer science is key to all of this. Understanding how to write performant code isn't just a programming skill; it also comes from understanding complexity theory, graph theory, and computer architecture.
When going for a job at these places (you did specify 'ML engineer', which is different from a researcher), they will grill you to make sure you understand how to use a computer. They want you to know it inside out: not just having done some tutorials, but actually being able to debug deep, complex issues and performance-tune.
Getting familiar with modifying the Linux kernel is much more important than having done some silly competitions/tutorials. You have to separate the marketing material from reality.
It's not about being able to "write something in C"; it's about understanding how every bit of code is conceived, and what is happening underneath, in order to engineer a quality solution.
I did mention a course on algorithms and data structures as a requirement, that's where you learn about complexity, graphs, and other things.
I'm guessing you don't know what Kaggle is. It's not a silly competition. It's a place where you get exposed to real-world problems, using real-world data. It's a good alternative to an actual internship as an ML/data engineer.
You expect ML engineers to know how to modify the Linux kernel? Do you also want them to know how to design a superscalar processor in Verilog? How about simulating circuits with SPICE? I don't think so.
Kaggle is just a hobbyist competition. How many engineers do you think have used Kaggle to gain their employment? I get that it's hard to understand from the ivory tower of academia what is actually required in the real world.
You seem to keep missing the point: an engineer should be comfortable working at a low level, should understand performance tuning, and should have a solid grasp of low-level architecture.
Experimenting with the Linux kernel is one way to get an intuitive feel for how things work, not a requirement.
You also seem to keep misunderstanding the domain space. Do you even know what an engineer does? What do you think they're doing all day? They're writing high-performance code, not solving stupid riddles. They will almost never write anything in MATLAB; they engineer things to specifications.
I hope your university has an internship; after you complete it, maybe you'll understand.
Because knowing how to implement something in C does not mean you know how to implement it in MATLAB, and the best way to learn both is to implement the same thing in both languages.
How many engineers do you think have used Kaggle to gain their employment?
I personally know two people who were asked about, and bragged about, their Kaggle experience during ML engineer job interviews. Moreover, several of the ML positions I was interested in listed Kaggle experience as a desired qualification in the job description.
an engineer should be comfortable with working on the low level, understanding performance tuning and they should have a solid understanding of architecture on the low level
If you reread my comments, you will see that this was kind of my point (and a lot of people here disagreed with me on this). However, learning the details of the Linux kernel is not the best way to learn about computing, and is definitely not the best way to learn the skills needed to do machine learning.
do you even know what an engineer does?
Before enrolling in a PhD program, I worked as an engineer and an engineering manager for 12 years. Since I went back to school, I did two ML related internships, and I occasionally work on freelance ML projects for local companies.
What are your credentials, relevant to this discussion?
Learn Theano, TensorFlow, and maybe Torch. They'll handle the low-level details for you. You just have to know some math to use them.
If you just want to dive in and be immediately productive, you'd be hard-pressed to do better than scikit-learn and Keras (for machine learning and deep learning, respectively).
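To show how quickly you can get going, a complete scikit-learn train/evaluate loop fits in a dozen lines (the dataset and model choices here are arbitrary, just for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load a toy dataset, hold out a test split, fit a model, and score it.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```

The same fit/predict/score interface carries over to nearly every estimator in the library, which is exactly why it's so productive.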
Which one would you recommend?
Learn machine learning tools: XGBoost, Scikit-learn, Keras, Vowpal Wabbit.
Do data science competitions: Kaggle, DrivenData, TopCoder, Numerai.
Take these courses: https://www.coursera.org/learn/machine-learning and http://work.caltech.edu/telecourse.html
Work on soft skills: Business, management, visualization, reporting.
Do at least one real-life data science project: Open Data, Journalism, Pet project.
Contribute to community: Create wrappers, open issues/pull reqs, write tutorials, write about projects.
Read: FastML, /r/MachineLearning, Kaggle Forums, Arxiv Sanity Preserver.
Implement: Recent papers, older algorithms, winning solutions.
As a software engineer, you have a major advantage for applied ML: you know how to code. AI is just advanced informatics. If you want to become a machine learning researcher, skip all this and start from scratch: a PhD. Otherwise: learn by doing. Only those who got burned by overfitting will know how to avoid it next time.
They're called university course offerings.
Just look up a class you want to take, figure out what the prerequisites are (often listed in the course description or syllabus), and follow the chain of prerequisites until you hit a level that's appropriate.
You can easily do this with a few different universities and just compare their courses for a particular subject, find the most common textbooks, etc.
And nowadays, this is even easier since you can find a MOOC or online course for basically any course in the DAG leading to nontrivial machine learning.
(Programming languages should never be a factor, unless you're choosing between two otherwise equivalent offerings.)
If you are going to cherry pick concepts it may be better to just go full bore into the classes that most interest you. You will be motivated to pick up the concepts you are lacking and the learning profile will exactly fit the specific course. You wouldn't want to do this in a college setting because you are paying a lot for each hour, but as an auto-didact the time investment is on you.
If you already have a background in CS or Engineering you can probably pick up the additional concepts with focused study/refresher.
The class you're currently looking at might only need that tiny subset, but what about classes you'll be interested in taking in the future? Is it really likely that the same tiny subset will be sufficient, or is it more likely that the class as a whole, studied deeply rather than shallowly, will let you tackle more advanced topics?
The analogy to building a foundation is very apt and especially so for a subject as ubiquitous and illuminating as linear algebra.
The course prerequisite list is, I think, usually a good balance of the right amount of information. You should usually prioritize understanding those subjects deeply if you want a high ROI thing to spend your time on.
I strongly disagree with this statement. I see many college curricula as hugely time-inefficient, especially when you want to learn a specific topic deeply. I personally disregarded the prerequisites for many elective engineering classes because of scheduling constraints, and anecdotally I found you only needed small pieces of the prereqs.
You can argue for a 'foundation' all you want. I was simply stating that I would like a detailed list of the knowledge required to take a course instead of a generic 'you need a 100-level linear algebra course.' A detailed list would let people decide whether they know/remember enough or need to learn/brush up on a topic. Speaking of linear algebra, it seems like I struck a nerve with my made-up example.
I also think it is a disservice to say that you need 'understanding [of] those subjects deeply if you want a high ROI.' These kinds of statements discourage learning. I have seen many people struggle with so-called necessary prereqs only to flourish in more advanced classes.
People who only ever search for the quick secret trick to learning X never end up learning X beyond toy examples, unless they already have the technical foundation upon which more advanced courses can rest.
I've seen far more people struggle in courses because of weak foundations of prerequisite knowledge than I have people who were irrationally intimidated by simpler material. Weak foundations are arguably the greatest source of problematic learning outcomes in education.
It seems strange to me that a list of prerequisites isn't sufficient. You can always just pick up some old exams or homework assignments (your own or some publicly posted ones) and quickly see if you remember.
The type of person who would be discouraged by the idea that becoming fluent in sophomore mathematics would have a large ROI was never going to learn anything nontrivial without changing their mindset.
I would argue that the current prereq system is responsible. It lumps in a full class as a prerequisite, and someone can pass a full class while not understanding pieces of it. Those pieces could be the essential ones needed for the specialty knowledge a person pursues. If, instead of generically requiring a 200-level linear algebra course, the prerequisites enumerated the topics from that course that are necessary, then people would know the foundation that is needed. Saying that 'fluency' in some broad course is required is intellectual elitism.
"I see the pursuit of quick and shallow ways of learning advanced subjects"
Also, in what way is clearly defined prerequisite knowledge 'shallow'? And where is this coming from:
"People who only ever search for the quick secret trick to learning X"
You are injecting biases into this debate that weren't there in the first place, to make it seem like I am advocating some kind of get-rich-quick scheme for learning things. I am actually advocating clearer and more descriptive knowledge ontologies. This would allow for better learning efficiency and may benefit the field overall. People would be able to spend more time on depth in the field they want to pursue.
Also using 'never' in an argument is a habit you need to break. It is hyperbolic and usually trivial to find counter examples.
It's still a work in progress, but we have published the first two parts. Does this help? Feedback on the format and the content would be appreciated.
Part 1: The Best Intro to Programming Courses for Data Science 
Part 2: The Best Statistics & Probability Courses for Data Science 
They offer nanodegrees; check the syllabus of a nanodegree and follow the free courses according to the syllabus.
For machine learning, the Udacity courses focus on Python.
PS: I am a graduate of the Data Analyst nanodegree at Udacity.
People generally think in terms of how much time they have. These lists are great but hard to act on. I'd like to know that if I spend x hours, I'll learn y skills.
* Metacademy (http://metacademy.org) If you just want to check out what ML is about, this is the best site.
* Better Explained (https://betterexplained.com/) if you need to brush up on some of the math
* Introduction to Probability (https://smile.amazon.com/Introduction-Probability-Chapman-St...)
* Stanford EE263: Introduction to Linear Dynamical Systems (http://ee263.stanford.edu/)
* Andrew Ng's class (http://cs229.stanford.edu)
* Python Machine Learning (https://smile.amazon.com/Python-Machine-Learning-Sebastian-R...)
* An Introduction to Statistical Learning (https://smile.amazon.com/Introduction-Statistical-Learning-A...)
* Pattern Recognition and Machine Learning (https://smile.amazon.com/Pattern-Recognition-Learning-Inform...)
* Machine Learning: A Probabilistic Perspective (https://smile.amazon.com/Machine-Learning-Probabilistic-Pers...)
* All of Statistics: A Concise Course in Statistical Inference (https://smile.amazon.com/gp/product/0387402721/)
* Elements of Statistical Learning: Data Mining, Inference, and Prediction (https://smile.amazon.com/gp/product/0387848576)
* Stanford CS131 Computer vision (http://vision.stanford.edu/teaching/cs131_fall1617/)
* Stanford CS231n Convolutional Neural Networks for Visual Recognition (http://cs231n.github.io/)
* Convex Optimization (https://smile.amazon.com/Convex-Optimization-Stephen-Boyd/dp...)
* Deep Learning (http://www.deeplearningbook.org/ or https://smile.amazon.com/Deep-Learning-Adaptive-Computation-...)
* Neural Networks and Deep Learning (http://neuralnetworksanddeeplearning.com/)
* Probabilistic Graphical Models: Principles and Techniques (https://smile.amazon.com/Probabilistic-Graphical-Models-Prin...)
I have also found that looking into probabilistic programming is helpful too. These resources are pretty good:
* The Design and Implementation of Probabilistic Programming Languages (http://dippl.org)
* Practical Probabilistic Programming (https://smile.amazon.com/Practical-Probabilistic-Programming...)
The currently most popular ML frameworks are scikit-learn, TensorFlow, Theano, and Keras.
* Neural Networks and Deep Learning (http://neuralnetworksanddeeplearning.com/) - a perfect overview; go over it at least twice (the second time, you will understand many of the decisions made at the start)
* Tensorflow and deep learning, without a PhD (https://www.youtube.com/watch?v=sEciSlAClL8) - as much as I hate video lectures, this one was worth it; a good complement to the book above
* Theano Tutorial (http://deeplearning.net/software/theano/tutorial/index.html) - using Theano or TensorFlow takes some getting used to. I found TensorFlow documentation absolutely horrible for beginners, probably because the authors expect users to already know such frameworks. Once you learn Theano you won't have trouble with TensorFlow (if that's what you want to use).
Then there are more specific papers, but I guess those depend on the problem at hand.
When CS majors or programmers estimate how fast/efficient code is, they are generally applying big-O notation, or some similar concept, intuitively. This can be gained from an intro-level algorithms class, or you could look up the topic and read about it.
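A standard illustration of the kind of intuition meant here: two ways to check a list for duplicates, one quadratic and one linear thanks to a hash set (function names are made up for the example):

```python
def has_duplicate_quadratic(xs):
    # O(n^2): compares every pair of elements.
    n = len(xs)
    return any(xs[i] == xs[j] for i in range(n) for j in range(i + 1, n))

def has_duplicate_linear(xs):
    # O(n): one pass, remembering what we've seen in a hash set.
    seen = set()
    for x in xs:
        if x in seen:
            return True
        seen.add(x)
    return False
```

Both return the same answers, but on a million elements one finishes instantly and the other grinds for ages; spotting that difference at a glance is the intuition an algorithms class builds.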
The classical coding interview questions mostly stem from an intro algorithms and data structures class. Or you could get a book on interview prep and work through a couple a day.
Finally, coding style is usually ancillary knowledge and not found (or at least not done well) in typical CS curricula. I would suggest the book "Clean Code" by Robert Martin. It goes through code smells, commenting, and general style, and is well written.
So overall, you could look at algorithms and data structures MOOCs. Also read a book on CS interview prep and one on writing maintainable code. Hope this helps a little. Good luck!
Wondering what the rationale could be? I mean, is mobile dev so critically different from ML?
"Mobile-dev" is more or less application development for a target client, possibly with a client-server relationship. It may incorporate the products of a ML system, or any number of other systems, but is itself the same application development model we've had for 30 years, but with a mobile client.
Machine Learning is applied computational statistics. It and its close cousin "data science" have become incredibly hyped the last few years. I suppose it's a matter of taste, but my view is that applying simple probability to a problem (e.g. "recommending" based on picking the thing most frequently voted up by other users and adjusting the recommendation as data comes in) isn't "machine learning" or "data science", but I've seen it called such by candidates I've interviewed and even one colleague. It's not that simple, and trivializing the terms just contributes to the hype.
It really depends on what you mean by 'mobile developer'. If you're the kind of guy that can open up the RFC for SNMP and turn that spec into code, you're probably in decent shape. You can take abstract, but very specific, description and turn it into reliable code.
Even better would be to take an image processing paper and turn that into code. Maybe something like seam carving to delete stuff from photos. This is better than an RFC, because papers skip steps that are 'obvious' to readers. Video stabilization, AR, there are a bunch of applications that have more intense math.
At the end of the day, it's just "finding the minimum" of some function. (Like programming is just zeros and ones, easy, right?) Each paper is some trick to descend a gradient in a new way, hop over a local minimum, or avoid overfitting.
You don't really need to understand a whole lot. You can pattern-match math syntax and retype it as code. If you make an error, debugging will suck, but really, you can do a lot of damage without knowing a whole lot. Some linear algebra, some diff eq. You don't have to be super fast at it; there's nothing wrong with taking a couple of days to work through what a math grad student knows off the top of their head. The goal is simply faithful translation.
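The "finding the minimum" idea translates almost verbatim into code. Here is plain gradient descent on a toy one-dimensional function (the step size and iteration count are arbitrary choices for the example):

```python
def gradient_descent(grad, x0, lr=0.1, steps=200):
    # x_{t+1} = x_t - lr * grad(x_t): repeatedly step downhill.
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is f'(x) = 2 * (x - 3).
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
# x_min converges to 3.0
```

Everything else in the papers (momentum, adaptive step sizes, stochastic minibatches) is a variation on this loop.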
So if a developer is just sticking buttons on a panel, that mobile developer probably wouldn't make it. But there are plenty of mobile developers who could make the switch.