First, I went through the Recurse Center, which is a 3 month program sort of like a writing retreat for programmers. I learned a lot about Python and AWS in that time, and got an internship as a data engineer.
That fall, I started a computer science master's. I've taken mostly machine learning courses, including Machine Learning Theory, Deep Learning, Probabilistic Graphical Models, NLP, and GPUs. I've collaborated with two professors on research papers, which has definitely been the highlight of my degree, although I think the courses were necessary too, as I continue to use the material they covered.
Finally, I'll be starting this summer as a research engineer doing deep learning! This process took me 2.5 years, but I feel very prepared for my new role. It probably is possible to do this faster by joining a program like Metis or Insight, which prepare you for data-science-like jobs within 3 months. I would say that approach is somewhat more challenging and higher risk. If you really want to go into machine learning, I'd say the degree is a more surefire approach, granted it's more expensive in time and money.
And congratulations on your achievement and your bravery in quitting your job!
That is VERY different from what 99% of people should be doing with ML, which is: spinning up some K80s on Azure, installing TF/CUDA/OpenCL, pulling existing pre-trained models off the shelf, and running inference on a novel data set.
That's how you get into it as a garden variety dev.
Otherwise, go for the PhD if you want to actually make new stuff.
I have worked with people who take example training code and apply it to a dataset. A few weeks later they were still pulling their hair out because the model wasn't working in production even though they had such great results in their tests. I took a look at their way of doing the training and could point to so many errors that explained why the model would never work in production.
That is not cutting edge, but at some point there will be a new model that works better, and you should understand why in order to improve your current model. So you will probably have to read the paper and understand it.
The garden variety dev shouldn't be trying to implement a research paper or train new models - that's the point. There are enough proven tools out there to do good work and more are being put out there every day.
> Spinning up some K80s on Azure, installing TF/CUDA/OpenCL
I think a single K80 instance is roughly ~$1/hr. If you had an experiment running 24 hrs a day for a year, you'd spend a little over $8.5k. You can build an equivalent desktop machine for less than $2k, which might be slightly more convenient (once it's built), although I haven't really factored in energy costs.
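Back-of-the-envelope, using the ~$1/hr rate above and treating the ~$2k desktop figure as a given (both numbers are the parent comment's estimates, not quoted prices):

```python
# Rough cloud-vs-desktop cost comparison. Both rates are the estimates from
# the comment above, not current pricing.
CLOUD_RATE = 1.00       # $/hr for a single K80 cloud instance (approximate)
DESKTOP_COST = 2000.00  # one-time cost of an equivalent desktop build

hours_per_year = 24 * 365
cloud_yearly = CLOUD_RATE * hours_per_year
break_even_hours = DESKTOP_COST / CLOUD_RATE

print(f"Cloud, 24/7 for a year: ${cloud_yearly:,.0f}")                 # $8,760
print(f"Desktop pays for itself after {break_even_hours:,.0f} GPU-hours")  # 2,000
```

So at those rates the desktop breaks even after roughly 2,000 GPU-hours, i.e. under three months of round-the-clock experiments.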
> That's how you get into it as a garden variety dev.
Btw, you don't really need a GPU to start learning about deep learning. You can train a SotA model on MNIST using Caffe in roughly 10m on CPU, I think (maybe 1m on GPU). You can also train a reasonable sentiment classifier or natural language inference classifier in less than an hour on CPU. My perception is that these types of tasks are really solid for someone who is beginning to learn about machine learning or deep learning, as they provide a playground to mess around with different optimization techniques (SGD vs. SGD+momentum vs. Adam, etc.), regularization (L1, L2, dropout, batch norm, etc.), data augmentation, error analysis, and so on. If you do an ML interview for an entry-level position, chances are these are the types of things they will ask about.
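As a small-scale illustration that no GPU is needed: the sketch below trains a little neural net on CPU in a few seconds. It uses scikit-learn's built-in 8x8 digits dataset (a miniature stand-in for MNIST) and MLPClassifier rather than Caffe, so it's a toy version of the exercise described above.

```python
# Train a small neural net on CPU in seconds, using scikit-learn's 8x8
# digits dataset as a miniature MNIST.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Swap solver between "sgd" and "adam" to compare optimizers -- one of the
# knobs mentioned above that these small tasks let you play with.
clf = MLPClassifier(hidden_layer_sizes=(64,), solver="adam",
                    max_iter=300, random_state=0)
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
```

Changing the solver, adding `alpha` (L2 regularization), or shrinking the hidden layer gives a cheap feel for the optimization and regularization trade-offs interviewers tend to ask about.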
I guess deploying ML solutions for a company you are working at is a different story.
> Otherwise, go for the PhD if you want to actually make new stuff.
There's some truth to this! PhD (like a Master's) probably doesn't make sense most of the time as a dollar-efficient career move. Rather, it's something you should pursue if you find being in an academic environment personally satisfying. You definitely don't need to be in a PhD program to work on new stuff (although it might make things easier because you will hopefully be surrounded by lots of fresh ideas). I've heard about people in bootcamps working on novel research. Now that so many powerful tools are open source and easy to use (Pytorch, Tensorflow, etc.), it's pretty easy for anyone to put together a novel model.
Unless you have a novel data set and a way to train quickly, you're probably better off using existing trained models in most cases.
I agree with the transfer learning piece wholeheartedly though.
Training models is often tricky, but it's not that hard. My experience shows that decent undergrads learn to train standard models on their own datasets after a single one-semester course, and quite difficult models after two semesters; so teaching/learning basic ML takes comparable time and effort to, e.g., teaching/learning basic JS frontend development.
So if some company's IT department has some minimum ML skills, lack of expertise shouldn't be preventing them from training models. And even more so, using your own data (IMHO) is the whole point of adopting ML; if the problem is so generic that you don't need to adapt it to your data, then you shouldn't be learning to use ML but rather buying and integrating a SaaS API run by someone else.
Which is a form of hard... for example, if you need 60,000 semantically labeled images, you need to train people to do that specific kind of labeling and then have them do it, then QC the data, break it up into training and validation sets, etc.
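The split step at the end is at least mechanical. A minimal sketch with scikit-learn, where the file names and labels are dummies standing in for the hand-labeled images described above:

```python
# Split a labeled dataset into training and validation sets.
# File names and labels here are placeholders for illustration.
from sklearn.model_selection import train_test_split

images = [f"img_{i:05d}.png" for i in range(60000)]  # hypothetical file names
labels = [i % 10 for i in range(60000)]              # dummy class labels

# Hold out 20% for validation, stratified so every class appears in both sets.
train_x, val_x, train_y, val_y = train_test_split(
    images, labels, test_size=0.2, stratify=labels, random_state=42)

print(len(train_x), len(val_x))  # 48000 12000
```

The labeling and QC work before this point is the genuinely expensive part; the split itself is one function call.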
Don't forget that this advice is for a front end dev who hasn't ever touched caffe or torch or whatever. In many cases it takes new people a week to set up drivers and an environment on a GPU.
I've got the programming, math, some stats, and am currently involved in research, but I only know a little about deep learning. I'm currently working my way through the course.fast.ai deep learning courses and am going to do Part 2 when it is released.
Any other resources that would be useful for getting a job in this area? Best to just work on my own projects?
Edit: for the Master's degree.
I am happy to give more targeted advice on grad school. Please send me an email at andrew [at] mrdrozdov.com.
Did you throw enough CPU at it to compare results?!
So I have started using IBM's Watson platform, and some of Google's AI tools. I was specifically interested in speech processing applications (I have some background in signal processing and audio, which helps a little), and I've found the Watson stuff particularly useful.
At the end of the day, it depends on your motivation. If you really want to become a true expert, stop reading this and start studying. Otherwise, I think there does exist a significant "gap in the market", as it were, to build useful front ends to these technologies, which currently exist as raw APIs.
In terms of career prospects, I have already met several Watson consultants, who do exactly that, and charge top dollar for it. The plain fact is, it doesn't take very much to be considered an AI "expert" in the current climate. And you're probably more likely to get there quickly by standing on the shoulders of giants than by dedicating your life to a PhD.
Basically you get your source speech as an uncompressed WAV file, create an IBM Bluemix account (free trial), create a Watson "app" on the site (basically gives you some credentials for calling the API), and then write a script to upload your WAV file to the API and decode the JSON response.
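The "decode the JSON response" step might look like the sketch below. The response shape (a `results` list of `alternatives` with `transcript` and `confidence` fields) is an assumption based on how I recall the Watson Speech to Text API returning results; check the current docs before relying on it.

```python
# Pull the full transcript out of a Watson Speech to Text-style JSON response.
# The response structure here is an assumption, hard-coded for illustration.
import json

sample_response = json.loads("""
{
  "results": [
    {"alternatives": [{"transcript": "hello world ", "confidence": 0.92}]},
    {"alternatives": [{"transcript": "this is a test ", "confidence": 0.87}]}
  ]
}
""")

def extract_transcript(response: dict) -> str:
    """Join the top alternative of each result chunk into one transcript."""
    return "".join(
        r["alternatives"][0]["transcript"] for r in response["results"]
    ).strip()

print(extract_transcript(sample_response))  # hello world this is a test
```

The upload side is just an authenticated HTTP POST of the WAV file; the parsing above is where the result becomes usable.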
It gets more complex when you want to start parallelizing the process to make it faster, and dealing with the results in an intelligent manner, but the initial proof of concept is remarkably easy.
If I recall, the Google one was even easier - no script at all, did it all with curl I think.
What I've found after taking 12 months off for self-study is that it quickly dissolves into "you must know math." As far as I can tell, it's: take a problem, map it into a vector space, then use the full power of mathematical analysis on it.
There is a huge push by large companies to make AI as a service though, and for that, you only really need to know how to use the APIs.
Which makes me wonder why in hell I burnt through all my savings for this. Sure, I have a newfound love for math, but I'm never going to be accepted as a mathematician without the rigor of a formal education, and if I just wanted to use APIs... I could have continued doing what I was doing.
I think there are two separate things here:
1. Being a research mathematician requires a degree of expertise which is easier to come by with formal education.
2. Knowing how to use APIs and make correct distributional assumptions: despite the bullshit fed to us by our industry, designing a completely idiot-proof API is neither easy nor trivial. So having some know-how of how the math works under the hood is helpful even if you are just going to use the API.
I would encourage you to take part in some Kaggle competitions to get a better feel for the practical aspect of machine learning.
That level of math helps you model the problem domain. Part of modeling the problem is seeing that everything in ML is a graph, so you can look at it from that point of view as well, at least computationally. Mapping the math to the graph is the heart of it all.
Honestly I haven't started the job search, I plan to start in earnest in about a month.
Udacity seems sure about placing us in jobs, though; apparently the market has great demand and a small supply.
Do you have any advice?
Kidding aside, I've seen these kinds of posts so many times, and for those who are thinking "strategically" about their profession, career, and passions, I would advise buckling down and getting a good BS in Math at a minimum (or CS).
But why? I'm your older self telling you that you will grow to really, really like and enjoy programming, computers, tech, etc., and may want to continuously dive deeper. And when you attempt to do that, it all comes down to math. So save yourself a ton of time and money and just do it: close all your browser tabs, cobble together all your transcripts, and get into a math program (if you already have, go get an MS in Math at a uni that has a strong CS program).
The only thing that sucks about it is all the GE classes I have to take before I can even start my fun Comp-Sci classes.
In terms of the age thing, I get it. At 19 I hated being at school because I wanted to be making $20+/hour working on websites. But now at 25 and right under 6-figures, I see that I can simply work during the day and then go to school at night. It just means I have to do that for 4+ years. I started at 24, but I will not be done until I am 30. So, if I would have done it right the first time, and not dropped out, I __might__ have been further.
I don't regret it tho. I rolled the dice and although I was trying to hit the start-up lottery at that age (why I got into programming), I ended up on a great career with incredibly valuable experiences that I would not have had if I didn't drop out at 19.
8/10 would do again.
This is the link to his channel
When I stopped feeling like I was getting anywhere with front-end I started to take on more back-end projects and sell myself as a full-stack developer.
I've never been in a situation where I've felt "man, I really should have gotten a CS degree": but what do I know; I'm just a web developer.
Maybe we're not talking about web developers?
No advanced engineering skills have been required to make the frontends/backends of, or architect a system for, the Fortune 500s and local businesses I've been involved with. (I'm speaking of things I would have learned in school with CS and math principles, had I been able to get past CS 101.)
-- college dropout with successful programming career.
I am a bootcamp grad who specialized in electrical engineering in undergrad. I worked for a few years as a researcher at one of the top tech universities in the country, and I have found that I have a stronger math and physics background than many CS grads. Yet many times while doing something I have been stopped with "wait, aren't you a bootcamp grad? Can you even understand this?"
I'm not alone; my bootcamp had plenty of STEM majors. One of my colleagues was a biomedical engineer who worked in a research lab and now works in front end at a top firm in SF. Another was a math major at an Ivy before going full stack. The top guy at my bootcamp went to Berkeley in biochem and was way, way smarter than me.
So stop generalizing. I understand that they are unaccredited institutions, so you get a wide variance of talent, but you can't shit on everyone.
To be honest, your issue is probably in how you screen talent. My current company has found success hiring from bootcamps and from top schools in the area (Berkeley, Stanford).
It's just a very simple fact that 90% of the bootcamp grads can't program their way out of a paper bag. In fact, most self-trained programmers are vastly better than even the best bootcamp graduates.
The problem with bootcamp grads is that they don't know what they don't know. And they don't know a lot. Undoubtedly some of them will turn out to be great programmers, but not after 3 months. Or even 6. Or even a year.
And you know how I know what I don't know? I constantly read, get mentorship from senior engineers at my company, build side projects, etc. The learning process hasn't stopped and it hasn't for many of my colleagues.
I do agree self-trained programmers are better, because frankly that's way harder.
I took them both to mean "no formal schooling" and "self-taught". In my brief experience with I.T. training, some knowledge already had to be assumed.
Wanted to know for my own financial planning.
I don't even know why you are screening these candidates, the profile you are talking about doesn't even hit our hiring team usually or gets binned within 5 minutes.
I really think you have a recruiting process problem.
As a bootcamp graduate, what do you want me to ... do? I can't go back in time and major in CS. My employer is satisfied with my work and I build things I'm asked to build independently. Should I give up a startup salary, inflated as it may be, and ship myself off to a CS monastery?
In more than a dozen years of programming, never once have I had to talk to someone about IPC. And if someone pulled that sort of holier-than-thou test-by-acronym-recognition at a job interview, I would laugh at them unless the salary were literally 200k and the job involved kernel hacking, which is pretty much the only situation in which I think it would be reasonable.
Reality is that there is a bias against hiring self-taught people and many common practices like these are basically screeners against that as opposed to measures of actual skill and expertise. One tip if you are job-hunting is to ask potential employers to review your code before you jump through their interview hoops. The places/people who are serious about hiring will do it. Those who will not are more likely to filter using this sort of arcane minutiae and are not worth your time unless they are paying for you to attend the interview.
Your salary expectations are also quite reasonable fwiw.
What most people mean is something like COM (Windows), named pipes, memory-mapped files, etc. that allow processes on the same machine to communicate with each other.
If I was hiring for desktop software or pure server software, I would expect someone to know some IPC mechanisms.
It's awesome. I have a library stacked full of software and CS books. Maybe try to get your bachelors at night? I got my master's at night while working.
This is a craft. If you want to be an expert, train like an expert. Realize 3 months of a bootcamp isn't going to cut it, at least not until you've been working for 4, 6, 8 years.
If you can't tell me what inter process communication is, what it's used for, pipes, signals, etc. then I think you have some pretty big gaps in knowledge that preclude you from being an expert at this point.
Frankly, the recommendations @seibelj is making are in a particular niche--OS fundamentals--and one that I'm guessing makes him feel smart knowing about. But they aren't necessarily relevant to you, or important to know. It depends on what you work on. Some people have trouble realizing that their pet interview question isn't actually as universal as they think it is.
PS: IPC is "Interprocess communication," and it's how you can have multiple processes coordinate with each other. You may have heard of pipes or sockets--those are for IPC. (Technically, so are files.) If not, don't worry about it. I have over 20 years' experience as a professional developer and while IPC primitives like pipes and sockets have come up from time to time, it's hardly central to my work.
Not to mention starting every task with implementing a linked-list or sorting algorithm from scratch.
Day to day, I don't think I know anyone who would actually use the acronym IPC over the expanded term "interprocess communication", assuming the subject even comes up.
The generic term just isn't particularly useful, given the wide variety of mechanisms included. Instead, you will hear developers talk about pipes, sockets, ports, connections, queues, and so on as may be warranted.
Knowing when to use a pipe and when to use a socket is important, but the fact that both are grouped together as "interprocess communication" mechanisms along with a bad idea like shared memory really isn't.
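For a concrete picture of what those mechanisms look like in everyday code, here is one hedged sketch: a parent process talking to a child over the pipes `subprocess` attaches to the child's stdin/stdout. This is probably the most common form in which working developers actually touch pipe-based IPC.

```python
# Parent-to-child IPC over stdin/stdout pipes, via the subprocess module.
import subprocess
import sys

# Child process: read one line from stdin, shout it back on stdout.
child_code = "import sys; print(sys.stdin.readline().strip().upper())"

result = subprocess.run(
    [sys.executable, "-c", child_code],  # run a second Python interpreter
    input="hello from the parent\n",     # written to the child's stdin pipe
    capture_output=True,                 # child's stdout pipe captured here
    text=True,
)
print(result.stdout.strip())  # HELLO FROM THE PARENT
```

Sockets would look much the same from the application's point of view (a readable/writable channel), but also work across machines, which is the main reason to reach for them instead.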
His videos are pretty dense but if you know Python and some basic ideas about neural networks you should be able to follow through.
I'm a plain dev, gone fullstack, gone frontend and it was pretty fun for me to reproduce and get to understand what's happening.
It's hands down the best course for getting your hands dirty with the latest, state-of-the-art stuff, and then learning how it works. It takes a completely different approach from most courses: it is top-down.
Do this first; you can immediately apply it to cool stuff like image classification, NLP, etc.
The assignments have additional resources where you can get into more detailed math and dive even deeper (the course doesn't dumb things down, but gives more intuitive explanations).
I have worked with AI and NLP guys. How I have seen this play out: there is a problem X. They get the best, most recent, respected research on problem X. They implement it (most of the time it's on GitHub).
If it doesn't solve the problem at hand, they shrug their shoulders and say something like "it's the Stanford NLP parser, can't do better than that!"
The concept of "getting into AI" confuses me. We need more people to git clone AI repos? Or are these people truly interested in AI research? At that point they should be looking at a PhD.
Then people pile on: "learn how an NN works!" Uh, why? Anyone can git clone and set up nodes. I am missing something. Please help.
Let me first say that I am unlikely to ever design a new novel algorithm like an SVM kernel. I have however studied ML theory extensively and have a good grasp of the underlying math. I also had the advantage of working in medical research starting in high school and even before college I had learned a lot about statistics and was comfortable using a tool like SPSS to perform ROC analysis as well as gaining a solid understanding of what real statistical rigor was.
I, and those I know and work with, do a lot more than clone some repos from GitHub and see if they work. Typically there is some sort of business problem that needs solving. Sometimes we know of an approach that will work, but often there is a literature survey to conduct, to see if anyone has solved a similar enough problem and written about it. I am comfortable reading ML/NLP literature and evaluating the methodologies described. Often there is some open source stuff to get us started, but rarely (I can't think of any, but it's early in the morning) have I been able to put together a complete solution without solving some difficult problems on my own.
If I were to give someone advice, it would probably not be the advice they want, but here goes. I assume the person already has a solid mathematical foundation, like engineering calculus.
1. Start by getting a solid foundation in statistics and probability.
2. You will need a foundation in linear algebra.
3. Find a mentor (or mentors) who can help you with both the theoretical side of ML and the applied side. In my case they were different people.
4. Implement some learning algorithms from scratch. I built an NN library a long time ago. I never used it in a production application, but the lessons it taught me are still invaluable.
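The kind of from-scratch exercise step 4 has in mind might look like this: a two-layer network in plain NumPy, trained on XOR with hand-derived gradients. Deliberately toy-sized, and one sketch of the exercise rather than the only way to do it:

```python
# A two-layer neural net from scratch: forward pass, hand-derived backprop,
# plain gradient descent, trained on XOR.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 8 units, sigmoid activations everywhere.
W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)

losses, lr = [], 1.0
for _ in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    losses.append(float(np.mean((p - y) ** 2)))
    # Backward pass: chain rule written out by hand (the point of the exercise).
    dp = 2 * (p - y) / len(X) * p * (1 - p)
    dW2 = h.T @ dp; db2 = dp.sum(axis=0)
    dh = dp @ W2.T * h * (1 - h)
    dW1 = X.T @ dh; db1 = dh.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Deriving `dW1`/`dW2` yourself, and seeing the loss actually fall, teaches more about backprop than any amount of calling `model.fit`.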
5. Read the research. You need to feel comfortable picking up a paper, understanding it, and evaluating whether you should believe the authors or not.
Maybe there are shorter roads. Personally I don't believe so. I was lucky to be paid to learn these skills through my career. I am sure there are people who are smarter than me or who can just learn by reading. I learn by doing. But this has led to success for me and I think gave me the ability to succeed in different environments, using different technologies, and long before the entire world was so enamored with deep learning.
But there's still a lot of work that people can do applying the "GitHub repositories" to new problems. And to do that effectively, you also have to know stuff (e.g. you need to be able to read the most recent research, know when tool X is appropriate over tool Y, know what preprocessing makes sense in a given situation, etc.). There's money to be made there and people want to do that work.
When things "just work" with off the shelf tools then you probably don't need the researcher (although sometimes you will need them to just find the right solution/tool). When things don't work, you will need them. I guess this can be said about many fields though? (Databases, front end development, etc)
- Make sure you know Python well - pretty much everything interesting in ML is in Python. If you already know JS this shouldn't be too difficult.
- Learn you some classic ML using scikit-learn and some online course.
- Learn Deep Learning using TensorFlow, Keras and/or PyTorch with one of the online courses.
- Get in the habit of reading new papers in ML (which are out on a daily basis) and replicating the results.
- Start working on some cool, original stuff.
Also, you can join communities.
So take a look at http://www.gitxiv.com and http://www.arxiv-sanity.com
The latter "might" be a challenge. I started looking into LinkedIn profiles of data scientists at top tech companies after I realized there were more Wharton MBAs working as data scientists than there were people who had mastered in CIS in my program.
So far, 50% had a master's/PhD in statistics, 25% information/data science, 25% a business background. However, I am still looking into it, as my sample is small and I have selection bias in my sampling.
The jobs for people working in machine learning tend to fall into: software engineer (machine learning), research engineer, and research scientist.
Yann LeCun has an excellent Quora Q&A where he delves into the differences of the roles. I do not have the link on hand but it is probably easy to find.
It's also weird when we conflate big data and data science.
Any statistician worth their salt can do inference in limited/noisy data sets.
And you never know, you might be already using some api which does machine learning.
you meant as such.
Whenever I encountered something I didn't know, I googled around until I had even a vague idea.
That said, a) I don't know how doable that would have been without the math background I've already got, and b) I haven't gotten much further in that reading than that one paper... yet.
Regarding the math - The only thing I know for sure to point you at is multivariable calculus, which is also (probably) the most useful math class I ever took. (It helps that the teacher was completely amazing). The set of concepts it introduced me to are amongst the most useful things I ever learned, and it's that understanding that has always given me a leg up on jumping into things at the deep end.
(I took multivar from UC berkeley in, I think, spring 2002? I went looking for video lectures, but did not find what I was looking for...)
From the description: These tutorials have been chosen to maximize learning curve, i.e. learn the most in the shortest amount of time and cover topics from basic deep learning all the way to research done within the last 1 year.
They cover significantly more material than a typical deep learning course and took me less time. Good luck!
This will let you focus on the essence of machine learning (data gathering and cleansing, interpretation) rather than the mechanics. The gathering of data and building of intuition about results are by far the hardest parts of machine learning, in my experience and reading. This is especially true if you are just getting started.
Plenty of time to focus on the mechanics later.
(Full disclosure: I am working on an ebook about Amazon machine learning; link in profile.)
If you want to get more hands-on, I'd look into TensorFlow. You can search GitHub for popular projects using it to get some ideas. I haven't played with it in a while.
EDIT: On top of that, if software engineering is your strength (testing, automatic deployment, etc.), data scientists will also benefit, since you can show them how to professionally develop a software product.
Not sure how applicable this is to a straight-up front end developer, though. I would definitely suggest learning Python if you go this route. R would be good as well.
Look at ML courses and their prereqs.
It does require MATLAB, though.
Generally speaking though, it's easier to go from maths/stats -> programming than the other way.
One is very programmer heavy. There's a lot of data processing that goes into machine learning, but you can't quite separate the two. Often, the programmer who prepares and processes the data needs to write the code that actually runs the model and parses the result, and this means you benefit from more understanding of machine learning. That role is most likely the best one available to programmers who don't have much mathematics background in this area.
To really get into machine learning itself as a data scientist, though... I do think it requires some math. There's a reason a large percentage of people who work in this field have an MS or PhD in a very quantitative field. And I don't just mean algorithm designers - to really be able to explain the difference between naive bayes, random forest, neural nets, and logistic regression, it helps enormously to have a background in math.
To illustrate this, I've taken two coursera courses on data science. They were both excellent, but approached from different angles. Bill Howe's data science class involved an exercise to use a random forest to do some classification, but the focus was on calling the scikit-learn library. We did of course review the algorithm, but not in mathematical depth.
Andrew Ng's course on machine learning got into implementing the algorithms (with a language called Octave, which honestly I didn't like much, but that's a completely different topic where plenty of people would disagree with me). To do that class, honestly, I'd just ask if the terms "vector calculus", "matrix of second-order partial derivatives", or "logistic function" mean something to you. It's OK if you can't define these things on the spot, but was there a time when you could? You can get up to speed, but I'd say if you haven't taken basic calculus through differential equations (with linear algebra), you won't be able to understand this material.
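To make those terms concrete, here is the logistic function and a few gradient descent steps for 1-D logistic regression, the kind of exercise that course builds up to (in NumPy rather than Octave, as one possible rendering):

```python
# The logistic function and gradient descent on the log loss, in 1-D miniature.
import numpy as np

def logistic(z):
    """The logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1-D data: points below 0 are class 0, points above 0 are class 1.
X = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0, 0, 0, 1, 1, 1])

w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):
    p = logistic(w * X + b)
    # Gradient of the log loss w.r.t. w and b -- this is where the
    # "vector calculus" shows up, shrunk down to one dimension.
    w -= lr * np.mean((p - y) * X)
    b -= lr * np.mean(p - y)

preds = (logistic(w * X + b) > 0.5).astype(int)
print(preds.tolist())  # [0, 0, 0, 1, 1, 1]
```

If the gradient expressions `mean((p - y) * x)` make sense to you after a moment's thought, the math in that course is within reach.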
I've been impressed with how well people learn on their own, picking up a lot of math as they go along. And I don't think you need to be able to implement these algorithms yourself to use them meaningfully. But if you're going to be deciding what kind of model to use, even if you're using libraries to do it (and most people who can implement these algorithms would still use a library), I think you do need to be able to describe how a neural net works vs. random forests vs. logistic regression vs. naive Bayes. There is a side of this that is very math-y as well.
On the bright side, we live in an era of amazingly available learning material. I personally think a dedicated person can probably learn calc, linear algebra, and differential equations through web-based coursework now.
So overall, I'd say: start on the data side as much as possible, leveraging your programming skills. While doing this, keep getting more exposure to ML algorithms, and make sure you are taking a Coursera or other web-based class on the side.
From what I've read on machine learning, a lot of the more basic techniques include statistical methods (linear regression, logistic regression, random forests, Bayesian statistics) that are taught in master's-level statistics courses at most, not doctorate level. If I remember right, basic linear regression even showed up in stat 101.
I realize that many of these techniques can't solve some of the problems the deeper, more complex machine learning techniques can (for which your Ph.D statement might be right). But not every problem needs a very complex solution.
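To underline how basic the basic end is: stat-101 linear regression in a few lines of NumPy, fitting y = a*x + b by least squares to noisy synthetic data with a known slope and intercept (the "true" values here are made up for the demo):

```python
# Ordinary least-squares linear regression on synthetic data.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 200)
# True relationship: slope 3, intercept 2, plus Gaussian noise.
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=x.size)

slope, intercept = np.polyfit(x, y, deg=1)
print(f"slope ~ {slope:.2f}, intercept ~ {intercept:.2f}")
```

The fit recovers the generating parameters closely, which is the whole point: for many business problems this level of technique, not a PhD-grade model, is the right tool.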
If you want to do simple recommendation systems or spam filters, then OK. Those are solved problems, hence commoditized.
If you want to build novel things, you really need academic-grade ML.
If you want another argument: I came from working in VC and startups, and they think they understand ML. Boy, they really don't. They are like kids pretending to play guitar who can't strike a single chord right.
For an anecdote, I recall hearing one of the Kaggle founders mention that many of their bounties are won by non-statisticians/ML-ists. Producing novel (in the academic sense) stuff is unlikely outside of an academic setting, but producing products or solving problems is do-able.
Edit/comment: no need to downvote rsrsrs86, people. He's putting forward a position and defending it, not trolling. If you disagree, then disagree. The whole point of a thread like this is hearing people's takes. Surely, a PhD is a valid suggestion.
But ML can do much more than analytics, and much more than supervised problems. And the great problems to be solved are not supervised problems. They involve learning as you go, without a clean database of examples to learn from. They are adaptive problems.
You might try to optimize prices for an online retailer by estimating supply and demand curves, but you will fail; the best way to do it is not much different from teaching a neural network to play video games, and is fundamentally different from supervised learning and regressions.
ML can do self-driving cars, it can build drones that learn to fly, it can translate horses to zebras, it can defeat humans at Go, it can make guitars sound like pianos.
There is a lot of technique and theory into framing any problem as a problem that can be solved by machine learning. Machine learning is generally not feasible unless you restrict the problem properly.
If you want to achieve novel (better than yesterday's state of art) results on existing problems, then yes, you really need academic grade ML. Especially for "solved" (i.e. well researched) problems - if the current solution isn't good enough for your needs, then you're going to need serious work to improve on that.
However, if you want to attack novel business problems, then it's quite likely you can solve them without solving any new ML problems. You have to know what "instruments" are available, and you have to be able to read and learn how to implement a particular solution that you choose, but generally you just need to squint hard enough to map your business problem onto one or more ML tasks that have a known solution.
One approach might be to tackle a MOOC like Andrew Ng's coursera offering (which isn't very math heavy) while simultaneously brushing up on your stats and linear algebra (mostly stats tbh). Even if you end up just focusing on implementation, I think this will be time well spent.