I've been coding for 20 years, professionally for almost 15, and I started watching the first video and found it pretty difficult to follow. I think you make a pretty common mistake that a lot of technical people do, which is to take for granted how much "institutional" knowledge you have.
The topics touched on in the first 30 minutes of the video include: AWS, Jupyter Notebooks, Neural Networks, Tmux, and a few others. I understand that this is the reality of the situation today (a very large up-front cost of setting everything up), but it would be better not to even touch on something like Tmux because it's not absolutely essential and just results in information overload. You could replace it with something like "I like using Tmux to save my terminal sessions, check out the wiki to learn more" instead of "here's this 3-minute tutorial on Tmux in the middle of how to use Jupyter notebooks". Very few people are smart enough to concentrate on following what's going on with AWS/Jupyter notebooks, pause that, process the Tmux stuff, and then context switch back to AWS/Jupyter.
There's a reason why the wiki/forums are so invaluable. There's definitely some really good information in the videos, so if you guys had the time I really hope you edit the videos into "abridged" versions that each focus on only one topic instead of jumping around so much.
That's a great point - but I would like to add that the expectation is that people put in at least 10 hours a week. So the amount of material is designed to be what you can learn in 10 hours of study, with the support of the wiki and forums.
The lessons are not designed to stand alone, or to be watched just once straight through.
The details around setting up an effective environment are a key part of the course, not a diversion. We really want to show all the pieces of the applied deep learning puzzle.
In a more standard MOOC setting this would have been many more lessons, each one much shorter - but we felt the upside of having the live experience with people asking questions was worth the compromise.
I hope you stick with it, and if you do I'd be interested to hear whether you still feel the same way at the end. We're doing another course next year, so feedback is important. On the whole most of our in-person students ended up liking the approach, although most found it more intensive than they expected. (We did regular anonymous surveys.)
Awesome stuff, but please let's not pretend that tmux is an important piece of the applied deep learning puzzle that needs to have anything to do with the intensiveness of the course.
I apologise if that's how my response came across. What I was trying to express is that I wish to show what I believe are best practices, and these tools are used throughout the course. I do not mean to suggest that tmux is critical, or that there are not other good options.
Personally I enjoy seeing how others set up and use their tools, and feel it is underappreciated in many courses; attempting to fill that gap is a little of my particular bias.
> Personally I enjoy seeing how others set up and use their tools, and feel it is underappreciated in many courses
Interesting - that explains a lot, thanks. Sorry to say, but I for one hate courses / tutorials where the tutor tries to force me to use his/her pet technology, totally unrelated to the topic at hand. I have always wondered what made them do it, and I guess they just want to help.
Still, if you want to be helpful, show me this favorite tech of yours and optionally point me to a separate resource for learning it, I might check it out sometime. But I didn't come here to learn tmux, I came for deep learning. My time is precious so please don't waste it. I would understand if you wanted to teach me basics of TensorFlow (and TensorBoard) or similar, but tmux, vi, emacs, bash,... are all outside the scope. IMHO of course. :)
You may well be right, and that's certainly how material is usually taught.
However my reading of the literature on this topic has led me to believe that the standard treatment is not the best approach for most students. I'm particularly influenced by David Perkins on this point - although I'm sure I could be applying his theories a lot better!
My collaborator, Rachel Thomas, has written about these ideas: http://www.fast.ai/2016/10/08/teaching-philosophy/ . We're learning as we go and we're spending a lot of time talking to our students to understand what's working and what's not. I hope each lesson sees some improvements...
It's okay to teach a little tmux, a little notebooks, and a little AWS each lesson, rather than organizing the course topic by topic. I agree with your point that it's a better way to teach material in a course, because it allows you to get people started doing things.
My point was that within a lesson, you should separate topics, because people don't handle the topic switches well when trying to listen -- it's basically just a cache/stack smash. Diversions are distracting, even when materials are interwoven in the course.
I like the idea of this: blending offline and online education. I took CodePath's iOS class ("bootcamp" but for senior engineers), and I loved it. Keep up the good job!
I got so excited when I read your comment. No offense, but there are so many basic intro videos like what you're asking for. However, after that... there's nothing. I've been looking for something to take me to the next level, and when I read "in the first 30 minutes of the video include: AWS, Jupyter Notebooks, Neural Networks, Tmux" I squealed with joy. If anyone knows other advanced tutorials on how to design, manage, and scale up their operation into some seriously organized, efficient, and automated machine learning, I'm dying to find out.
This is really stunning. I can't wait to commence the course. I finished a Masters from a top-50 worldwide university, and frankly, the approach to data science was mediocre at best. The NLP module notes were plagiarised from Stanford and we were quite happy with this! It gave us a break from 20-year-old textbooks that set the plodding pace for the Data Mining module. And don't get me started on my deep learning dissertation. The only expert in the uni on the topic got poached by Facebook halfway through the project. The universities are finding it difficult to keep up and are resorting to 'interesting' techniques to retain talent - witness the Turing Institute in the UK. They gave out titles to many professors in several universities a year or so ago... as I gather, as a precursor to pivotal data science research.
An MS won't be so fruitful unless you do some research and publish (which is difficult, and many universities don't support giving research work to MS students).
It's better to go for a Ph.D.
Remember that to get into OpenAI or Google Brain you need to be among the top even after a Ph.D.
I'll note that my MS was hugely useful and didn't result in publications directly. The mileage of your MS or PhD is dependent on many factors.
OpenAI and Google Brain, like most other more research driven deep learning institutions, are more interested in the results you can produce rather than the accreditation you hold. Publications obviously count but well used or written deep learning projects / packages would too. Many PhDs who come out having spent many years in academia still wouldn't get an offer from these places and many of the talented people I know in these places don't have a PhD either.
To the parent of this post, I'd also look into what I'd refer to as "Masters in industry", i.e. the Google Brain Residency[1] and other similar opportunities. From their page: "The residency program is similar to spending a year in a Master's or Ph.D. program in deep learning. Residents are expected to read papers, work on research projects, and publish their work in top tier venues. By the end of the program, residents are expected to gain significant research experience in deep learning." This is likely an even more direct path than most institutions would provide. Though obviously the competition is fierce, many of my friends who participated in this ended up with a paper in a top tier conference by the end of the program.
The best way to get the attention of those companies is to do peer-reviewed, published research in ML. Which is certainly possible while getting a Master's at one of those universities.
Just FYI: The AWS console UI is currently different in a number of ways from the interface shown in the "how to set up your aws deep learning server" video, beyond what is accounted for in the overlay-text amendments to the video. (e.g. creating a user has 4 involved steps before you get to see the security credentials, including setting up a user group and access policies for that group -- I made my user have the AdministratorAccess policy...)
Recommendations on how to go from basics (being able to fine-tune pretrained ImageNet/Inceptionv3 with new data etc) to a real project? I'd like to play with semantic segmentation of satellite images (hyperspectral). Any pointers?
That's really what this course tries to do. Lesson 7 shows how to do localization in a couple of different ways. And in every lesson I try to show how to get a state of the art result on a real dataset, showing the process from end to end. Jupyter notebooks are available for all of these projects.
(We'll be looking at more segmentation techniques in the next part of the course next year.)
You need to have a basic understanding of matrix products (from linear algebra) and the chain rule (from calculus). Both can be learnt on demand using Khan Academy - no need to go study math first, but just study what you need when you need it. We give brief explanations of both when they come up.
The wiki (http://wiki.fast.ai) has links to necessary learning resources for each lesson.
Deep learning is pretty much entirely linear algebra and calculus, at a mathematical level. But I really think that a code-first approach is much better. So I'd suggest just getting started working through the course, and really dive into the Jupyter notebooks for each lesson. Do lots of experiments.
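To make that concrete, here's a toy numpy sketch (a made-up example, not one of the course notebooks) of those two ideas: a dense layer is just a matrix product, and backprop is just the chain rule.

    import numpy as np

    x = np.random.randn(5, 3)      # batch of 5 inputs with 3 features
    W = np.random.randn(3, 4)      # weights of a dense layer with 4 units
    y = x @ W                      # forward pass: a matrix product

    loss = (y ** 2).mean()         # a toy loss
    dL_dy = 2 * y / y.size         # gradient of the loss w.r.t. the layer output
    dL_dW = x.T @ dL_dy            # chain rule: combine dL/dy with dy/dW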
>"But I really think that a code-first approach is much better."
This is really refreshing to hear. Most of the books I have looked at seem to be mostly math-first, and very densely so at that. As much as I appreciate math, it's not always the warmest welcome.
I am looking forward to working through your course and seeing part 2 as well. Cheers.
If you want a solid understanding of DL⊂ML, you'll also need a good understanding of probability theory and information theory. While the mechanics require just matrix math and some vector calc, you'll also want to know linear algebra proper and these days, function spaces especially. A small helping of analysis, for measure theory, will not hurt either.
Interesting that you mentioned Information Theory, when I think of this I think of Shannon, encoding symbols, bauds, bit, noise and channels in the Telecom network sense.
But I guess this all applies quite similarly in the neural network sense in DL/ML as well?
There's a direct correspondence between (negative log-) likelihood and cross-entropy, so a background in information theory is likely to make your understanding of loss functions come quicker.
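A tiny numerical check of my own (not from the parent): for a categorical output, the cross-entropy of the predicted probabilities against the labels is exactly the average negative log-likelihood of those labels.

    import numpy as np

    probs = np.array([[0.7, 0.2, 0.1],    # predicted class probabilities
                      [0.1, 0.8, 0.1]])
    labels = np.array([0, 1])             # observed classes

    nll = -np.mean(np.log(probs[np.arange(len(labels)), labels]))
    onehot = np.eye(3)[labels]
    cross_entropy = -np.mean(np.sum(onehot * np.log(probs), axis=1))
    print(nll, cross_entropy)             # identical values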
First of all, let me say that what you all have put together is truly excellent. It covers the sort of things toy treatments and papers leave out or gloss over, but that you need if you actually want to get something working in the real world.
That said, I strongly disagree with your disagreement. There was a recent paper whose abstract I read that made me think of homotopies between convolutional networks. Unfortunately I lost the paper behind some stream and never got to read it proper. In the context of that, I realized that the search for convnet design will likely soon be highly automatable, obsoleting much of the work that many DLers are doing now.
What will be future proof is understanding information theory so that loss functions become less magical. Information theory is needed to understand which aspects of the current approaches to reinforcement learning are likely dead ends (typically to do with getting a good exploration strategy, also related to creativity). Concentration of measure is vital to understanding so many properties we find in optimization, dimensionality reduction and learning. Understanding learning stability and the ideal properties of a learner/convergence means being comfortable with concepts like Jacobians and positive semi-definiteness for a start.
Probability theory is needed for the newer variational methods, whether in the context of autoencoders or a library like Edward (the likes of which I think are the future). Functionals and the calculus of variations are becoming more important, both in deep learning and for understanding the brain. There's lots of work in the game theory of dynamical systems (think evolutionary game theory) that can help contextualize GANs as a special case of a broader category of strategies.
Much to the contrary, the topics I mentioned are both the future of deep learning and future proof in general. This blog post by Ferenc captures my sentiment on the matter: http://www.inference.vc/deep-learning-is-easy/
I agree with all you say here. And indeed for researchers interested in building the next stage of deep learning, such topics may be important.
However this course is about teaching how to use deep learning today to solve applied problems. Most participants in the in-person course worked in business, medicine, or life sciences. It is to that audience that my comment was directed.
So we're perhaps each guilty of assuming the person asking the question wishes to study deep learning in the way we are focused on ourselves. :) Hopefully between the two of us we have now answered the question more fully than either of us individually!...
Yes, I certainly do think it possible. But it will be more difficult and slower going. Read widely, as the biggest hurdle for a self-learner will be in identifying and filling their gaps of knowledge. Read papers so you learn how to use language such that you are not setting off any crank or dangerous-dabbler alarms. Reading lots is also necessary to habituate yourself to the jargon. Read the books by David MacKay and Christopher Bishop. Don't waste time if you're stuck on a paper: it means either you're lacking the shared background knowledge or the authors are being purposely obtuse. Save anything that needs prior knowledge you don't yet have for later (unless it's a minor iteration of something, which most are). It might take multiple readings spread out over a large time span before you can truly understand the important concepts and papers.
If you can, try and organize or join study groups where the levels of skill are varying. Having a group will help in those times when it seems all too much and you're ready to quit.
Finally, don't try to compete with the well heeled industry titans and their GPU Factory Farms. Find an understudied but important niche where your lack of knowledge is not so much a setback because even if available mental tools will differ, everyone is equally ignorant on the terrain.
There's no way that's graduate level math. I've gone through Nando's lectures and although they're more mathematically involved than most courses, they aren't grad level math.
You might be right. The level of education is different for every individual, and people also come from different countries.
In America you can be a graduate in computer science without knowing calculus, and in some other countries they teach calculus/probability in high school.
So the above person might have studied the chain rule and entropy in high school. So it's not exactly graduate-level math for everyone.
> In America you can be a graduate in computer science without knowing calculus, and in some other countries they teach calculus/probability in high school.
This certainly isn't true at my university in the US. As a matter of fact, I'm not sure why you named a country at all seeing as this varies by university. Anyway, I would definitely agree that on the scale of reputable international universities this is not grad-level math. A graduate school in a well-qualified university would expect students to either know this material or be able to learn it on their own. They are intro classes for the math major at my (and many other) universities.
My academic master's program still had a couple of classes on data science and mathematical finance. So besides their research, people are able to work outside academia.
Will it be available in the future - that is, can I take the course later (I'm currently involved with the Udacity Self-Driving Car Engineer nanodegree - and that will finish at the end of July roughly - I have no free time for anything else right now)?
Thanks for announcing this, looks amazing! As somebody that's been toying around with deep learning and machine learning, I've been wondering what the steps are to move from 'cool example' to viable product. I know somebody else mentioned something general like that in another comment, but I had a concrete example.
For instance, it's extremely easy to set up an MNIST clone and achieve almost world-record performance for single character recognition with a simple CNN. But how do you expand that to a real example, for instance to do license plate OCR? Or receipt OCR? Do you have to do two models, 1 to perform localization (detecting license plates or individual items in a receipt) and then a second model which can perform OCR from the regions detected from the first model? Or are these usually done with a single model that can do it all?
I'm not sure if answering these questions is a goal of your course, or if they're perhaps naive questions to begin with.
They are excellent questions and that indeed is exactly the goal of this course. I hope you try it out and let us know if you find the answers you need.
For this particular question, a model that does localization and then integrated classification is called an "attentional model". It's an area of much active research. If your images aren't too big, or the thing you're looking for isn't too small in the image, you probably won't need to worry about it.
And if you do need to worry about it, then it can be done very easily - lesson 7 shows two methods to do localization, and you can just do a 2nd pass on the cropped images manually. For a great step by step explanation, see the winners of the Kaggle Right Whale competition: http://blog.kaggle.com/2016/01/29/noaa-right-whale-recogniti...
(There are more sophisticated integrated techniques, such as the one used by Google for viewing street view house numbers. But you should only consider that if you've tried the simple approaches and found they don't work.)
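As a very rough illustration of that two-pass idea (my own sketch, not the lesson 7 notebook; bbox_model and classifier stand in for already-trained models):

    import numpy as np
    from PIL import Image

    def two_pass_predict(image, bbox_model, classifier, crop_size=(224, 224)):
        # Stage 1: regress a bounding box [x, y, w, h] on the full image.
        x, y, w, h = bbox_model.predict(image[np.newaxis])[0].astype(int)
        # Stage 2: crop, resize, and classify just the region of interest.
        crop = Image.fromarray(image[y:y + h, x:x + w]).resize(crop_size)
        return classifier.predict(np.asarray(crop)[np.newaxis])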
The big struggle with deep models is their thirst for data.
MNIST is considered a simple toy example, and it has 50k images spread across 10 classes.
ImageNet has 1m images spread across 1k classes.
One of the things that has made image recognition in the form of categorisation easier is that using a network pre-trained on ImageNet, and then finetuning it to your task actually works pretty well and requires far fewer images.
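A rough sketch of that finetuning recipe using the VGG16 weights bundled with Keras (the layer sizes and the 10-class head here are placeholders, not anything tuned):

    from keras.applications import VGG16
    from keras.models import Model
    from keras.layers import Flatten, Dense

    base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
    for layer in base.layers:
        layer.trainable = False       # keep the ImageNet features frozen at first

    x = Flatten()(base.output)
    x = Dense(256, activation='relu')(x)
    preds = Dense(10, activation='softmax')(x)   # 10 = number of your own classes

    model = Model(inputs=base.input, outputs=preds)
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    # Train the new head on your small dataset, then optionally unfreeze the
    # top conv block and continue with a low learning rate.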
The struggle with doing something like license plate OCR is that it's unlikely you can transfer the learning from ImageNet to your target task.
So, in reality your struggle is going to be more around the data than the model. If you already had a system deployed that was getting data in and you were getting some feedback of when your model failed, then this problem would be easily solved, but if you're building from scratch this is going to be your biggest problem.
And since you don't necessarily know ahead of time how easy or hard your problem is, you don't know how many samples you will need or how much it will cost you.
So, if you did actually want to build a license plate reader using deep learning, my suggestion would be to try and artificially create a dataset by generating images that look like license plates and sticking them in photos in the state you expect to see them in (i.e. blurred, at weird angles, etc) and then training a neural net to recognise them. That would give you a sense for how hard the problem is, and how much data you will need to collect.
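A very rough sketch of that data-generation idea with Pillow (the plate size, font, and augmentation ranges are all made-up assumptions you'd tune for your target cameras):

    import random, string
    from PIL import Image, ImageDraw, ImageFont, ImageFilter

    def make_synthetic_example(background):
        # Draw a plate-like rectangle with random characters.
        text = ''.join(random.choice(string.ascii_uppercase + string.digits)
                       for _ in range(6))
        plate = Image.new('RGB', (180, 40), 'white')
        ImageDraw.Draw(plate).text((10, 12), text, fill='black',
                                   font=ImageFont.load_default())
        # Cheap augmentations to mimic real capture conditions.
        plate = plate.rotate(random.uniform(-15, 15), expand=True)
        plate = plate.filter(ImageFilter.GaussianBlur(random.uniform(0, 2)))
        # Paste onto a real photo at a random position (assumes it fits).
        bg = background.copy()
        pw, ph = plate.size
        bw, bh = bg.size
        x = random.randint(0, bw - pw)
        y = random.randint(0, bh - ph)
        bg.paste(plate, (x, y))
        return bg, text, (x, y, pw, ph)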
In terms of the model, I would probably just try having 6 outputs with 36 classes per output, corresponding to the characters/digits in order. I don't know if it will work well, but it's a good baseline to start with before trying more complicated things like attention models or sequence decoders (https://github.com/farizrahman4u/seq2seq ).
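A minimal Keras sketch of that "6 outputs x 36 classes" baseline (the input size and conv stack here are assumptions, not a tested architecture):

    from keras.models import Model
    from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense

    inp = Input(shape=(64, 128, 1))            # assumed greyscale plate crops
    x = Conv2D(32, (3, 3), activation='relu')(inp)
    x = MaxPooling2D()(x)
    x = Conv2D(64, (3, 3), activation='relu')(x)
    x = MaxPooling2D()(x)
    x = Flatten()(x)
    x = Dense(256, activation='relu')(x)

    # One 36-way softmax (A-Z plus 0-9) per character position.
    outputs = [Dense(36, activation='softmax', name='char_%d' % i)(x)
               for i in range(6)]

    model = Model(inputs=inp, outputs=outputs)
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    # model.fit(images, [y0, y1, y2, y3, y4, y5]) with one one-hot array per head.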
I see a lot of criticism about tmux and other non-core items being included in the overall curriculum. I think the author is trying to portray the workflow he is currently on and exposing the full tool kit he uses.
I don't think he is saying - this is "THE" approach one has to follow. I for one think that this is a perfectly legitimate way of teaching. People can leave out pieces which do not interest them or substitute them with other tool sets if they chose to.
For me the key takeaway here is that someone who has consistently topped Kaggle competitions for two years and the founder of an ML company is teaching a "hands-on" course which fills a gap (from tech talk to step-by-step hands-on), and I think I can live with this method of teaching.
If you've no experience with ML stuff, you might want to start with Andrew Ng's course, which has a small bit on neural networks (MLP and backpropagation), with examples in Matlab/Octave. I found it useful, along with "Make Your Own Neural Network" by Tariq Rashid. Very good intro to coding MLPs directly in Python.
Ng's course would be a great place to start, imho - though I am a bit biased: I started out my journey by taking the ML Class in 2011. Lots of "concretely"s strewn about!
Anyhow, it was a great introduction, and light on the calc (but more emphasis on probability and linear algebra). If you have matlab or octave experience, it will also help (I didn't - the revelation of having a vector primitive was wonderful once I got the swing of it, though).
Note again, though, that I took the ML Class - not the Coursera version; I have heard that they are identical, but it has been 5+ years since I took it, too.
Strongly agree. Andrew's course is a great choice to take before this one, if you have the time. It's not a prerequisite however.
The Udacity course has a very different aim - it covers much less territory and takes much less time. If you're just wanting to get a taste of deep learning, it's a very good option, but it's not a great platform to build longer term capability on IMHO.
I'm currently taking the Udacity Self-Driving Car Engineer nanodegree, and I'm working on the lab right before the 2nd project lesson (I'm in the November cohort). That lab is to re-create the LeNet-5 CNN in TensorFlow (we have to re-create the layers of the convnet).
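For anyone curious what that involves, here's a rough Keras sketch of the classic LeNet-5 layer stack (not the lab solution, and the lab itself uses raw TensorFlow):

    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

    model = Sequential([
        Conv2D(6, (5, 5), activation='relu', input_shape=(32, 32, 1)),
        MaxPooling2D(),                    # 28x28x6 -> 14x14x6
        Conv2D(16, (5, 5), activation='relu'),
        MaxPooling2D(),                    # 10x10x16 -> 5x5x16
        Flatten(),
        Dense(120, activation='relu'),
        Dense(84, activation='relu'),
        Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])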
Last night I spent an hour or so getting my system (Ubuntu 14.04 LTS) set up to use CUDA and cudnn with Python 3; setting up the drivers and everything for TensorFlow under Anaconda - for my GTX-750ti.
That wasn't really straightforward, but I ultimately got it working. It probably isn't needed for the course, but it was fun to learn how to do it.
I would like to take this fast.ai course as well, but so far the Udacity one is eating all of my (free) time. Maybe I can give it a shot in the future.
I have used and recommend [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) for exploration (at least for those with a docker-capable kernel as the base OS).
Hey, so I know that 0.90 isn't much...
But is there a difference between running the programs on AWS than on my local computer with my mediocre video card?
Will they take much longer to train? I honestly don't know the difference, so I'm asking.
Your mediocre graphics card may or may not have CUDA support. If it doesn't, or it only supports an old version, then it's the same as not having a graphics card at all. And of course, Nvidia only.
If it is supported, then the major difference is GPU memory, which limits the size of the network you can train. The newest models are faster than some 1-2 year old ones, but older hardware does the job fine.
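As a back-of-the-envelope check (my own rough numbers): VGG16 has roughly 138 million parameters, so its float32 weights alone need about half a gigabyte before you count activations, gradients, and optimizer state.

    params = 138 * 10**6            # approx. parameter count of VGG16
    print(params * 4 / 2**30)       # float32 weights alone: ~0.51 GiB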
In addition to this excellent advice, I'll just add that in my experience a 2GB CUDA-compatible card is (just) enough to complete this course, although 4GB makes life much easier.
This is what I have and it's been holding up great for everything I've wanted to do till now. Core i3 6100, Nvidia GTX 950 2GB, 8GB DDR4 RAM. Did not try running a neural net on the GPU, but it should do the job. The whole thing - http://lonesword.in/hardware/2015/11/10/Assembled-a-computer... - including the cheap table and chair cost me around 60K indian rupees which roughly translates to 1000 USD - but that's because computer components are around 30-40% more expensive in India than in the US.
A used Dell workstation off eBay plus a GTX 1070 will get you exceptional performance for under $1000. If you want to spend less, a used GTX 980 is also a good option.
I just set up CUDA and cudnn for python3 and tensorflow last night on my machine - with that same card. That card should have 640 cuda cores; every little bit counts, imho. I'm not really sure what the difference will be compared to what I was doing before, but anything has to be better than the quad-core cpu I'm currently using (some older AMD thing).
In short, my wife got sick and needed brain surgery while she was pregnant, and I ended up being away from Enlitic for nearly a year. It made me reassess what I really wanted to do with my time.
Now that I spend all my time coding and teaching, I'm much happier. And I think that making deep learning more accessible for all will be more impactful than working with just one company. Deep learning has been ridiculously exclusive up until now, on the whole, and very few people are researching in the areas that I think matter the most.
Finally, I think I achieved what I set out to do with Enlitic - deep learning for medicine is now recognized as having great potential and lots of folks are working on it.
Thanks for sharing this -- I've been doing hobby-level work with computer vision on the side for a couple years now, but always kinda hit a wall when moving beyond anything trivial. I'll give this a shot and see where it takes me!
Yeah I know just what you mean. Check out the feedback from Christopher Kelly here, who describes something very similar: http://course.fast.ai/testimonials.html . Perhaps you'll find some inspiration there...
I really hope that you get past the wall! If you do find yourself getting stuck, demotivated, etc, please do come join the community on the forums, since they can really help overcome any issues you have: http://forums.fast.ai/
Our preference is for most questions to go through the forums. When you join the forums you'll get more information on why we think that's best for the overall community (in short, because it's easier for others to find answers when they are organized by topic), and also how to access the Slack channel.
Last time I checked they didn't have GPU virtual machines. So no, probably not. If there is a way to run jupyter notebooks on GPU machines on Google, I'd certainly be interested in learning about it.