I've really enjoyed talking to Jacques about his lego project over the last few days, and I hope that it will lead to some additional learning materials on course.fast.ai (which he kindly links to in his article) as a result. It's great to see such a thorough writeup of the whole process, which I think is much more interesting and useful than just showing the final result.
The key insight here, to me, is that deep learning saves a lot of time, as well as being more accurate. I hear very frequently people say "I'll just start with something simple - I'm not sure I even need deep learning"... then months later I see that they've built a complex and fragile feature engineering system and are having to maintain thousands of lines of code.
Every time I've heard from someone who has switched from a manual feature engineering approach to deep learning I've heard the same results as Jacques found in his lego sorter: dramatic improvements in accuracy, generally within a few days of work (sometimes even a few hours of work), with far less code to write and maintain. (This is in a fairly biased sample, since I've spent a lot of time with people in medical imaging over the past few years - but I've seen this in time series analysis, NLP, and other areas too.)
I know it's been trendy to hate on deep learning over the last year or so on HN, and I understand the reaction - we all react negatively to heavily hyped tech. And there's been a strong reaction from those who are heavily invested in SVMs/kernel methods, bayesian methods, etc to claim that deep learning isn't theoretically well grounded (which is not really that true any more, but is also beside the point for those that just want to get the best results for their project.)
I'd urge people that haven't really tried to build something with deep learning to have a go, and get your own experience before you come to conclusions.
You're seeing hate for deep learning on HN!?! I'd love to see something approaching critical thought! Most of the comments I see are silly fluff about the singularity and/or AGI and killer robots.
But seriously: I think you're overstating the case. I've enjoyed watching the fast.ai videos, and I think deep learning is a clear choice for some areas right now. Many others, no.
If you're working in a data-limited domain in particular (i.e. most of them), there are probably better first choices. I know plenty of folks who have used deep learning to achieve no better than comparable results to the simpler methods they were using before.
But yes, the initial engineering costs of a deep learning system have come down a lot further than I thought before I watched your videos.
Most of the hate for deep learning on HN is aimed at its use in titles as clickbait (i.e. simply taking a pretrained/predefined model and swapping in a different source dataset without any optimization).
The in-depth explanations of the mechanics of deep learning in the fast.ai courses and in this submission do justify the use of the deep learning moniker, though.
Deep learning is great if you're dealing with a high-dimensional problem and you have the data to train the model. If one of those things is not true (and usually, one of those things is not true), you're better off starting simple.
That's an excellent point and one that kept me from trying the deep learning approach for a long time. But in the end the machine + rudimentary deep learning provided its own dataset and that really made it work.
So even if I didn't have the data to train the model I had enough data to bootstrap the process and sometimes that's all you need.
>Every time I've heard from someone who has switched from a manual feature engineering approach to deep learning I've heard the same results ... dramatic improvements in accuracy, generally within a few days of work
It bothers me how often otherwise rational individuals continue to be susceptible to survivorship bias.
Of course you'd only hear about the successes, the failures are either embarrassed or know better than to tell you their story.
The people who tried it for several months, saw accuracy cap out at something like 60-70% for processes that need something like >90% confidence to justify the expense, and who then got ignominiously shifted into a different team (or fired) for building what the higher-ups view as a massive waste of money at their salary and the other engineers see as a data-guzzling black box. No, these guys aren't likely to tell their story. Or, at least THIS story.
The story you instead get from these guys is how
>After having had some fun times doing deep learning and data analysis, I'm excited to get into the big new field of $THING_THAT_WAS_HIRING
and all the other dross that modern vocal programmers use to mask any possible scent of failure and actual on-the-job difficulty experienced.
This goes pretty much for every tool though. Applying a tool to a job it isn't well suited to is always going to come out as a frustrating experience, be it a hammer or a piece of software.
What I always try to do is get a feel for the problem space by trying different methods with as little investment as possible. That way - once you decide to go full power with a certain solution method - you will at least have a feeling that you are on the right path.
Dogmatically trying to shoehorn every problem into the toolset that you already know how to use is a way to stay reasonably productive, but it rarely leads to optimal outcomes; sometimes you simply have to learn how to use a new tool in order to get to the maximum.
I'm the last person to jump on new bandwagons, still have a dumb phone, don't use facebook and still run my own mailserver. Even so, when a tool has a significant and most importantly measurable advantage compared to the tools I'm already familiar with I'll adapt.
I see plenty of stories about "$POPULAR_THING is not the solution and here's why". If it's true that HN has been hating on Deep Learning for the past year, there should be a market for articles explaining why or when it doesn't work. If someone started out using Deep Learning and then switched to something else, that seems like an excellent subject for an article.
Good points. How about the following, though: driven by all the hype surrounding deep learning and AI, consultants are currently jumping on this new thing they can sell to management without understanding much of the intricacies behind it. So even after facing many disillusionments just a few years ago, when the talk was all about big data, companies are now jumping towards investing in products promising some of that sweet deep learning magic.
That's fine, but the thing is they once again plan to apply it to their badly maintained data warehouses (now often somewhere in their dusty HDFS stack) to build traditional predictive models. Customer churn, next best offer, propensity modeling, that sort of thing.
I try to keep telling them that with structured data sets and simple classification problems, a random forest or, God forbid, a logistic regression model would do just as well, given that you spend some solid time on feature cleaning and preprocessing. Am I wrong in this setting? Perhaps I should also jump on the deep learning bandwagon and start selling 10-layer binary classification networks :).
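For reference, the kind of baseline being described can be very small. A hedged sketch, assuming a hypothetical tabular churn dataset with a binary "churned" column (every file and column name here is illustrative, not from any real project):

    # Baseline sketch: logistic regression and a random forest on structured data,
    # evaluated with cross-validated AUC. All names are placeholders.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    df = pd.read_csv("churn.csv")                       # hypothetical dataset
    X = pd.get_dummies(df.drop(columns=["churned"]))    # one-hot encode categoricals
    y = df["churned"]

    for name, model in [
        ("logistic regression", make_pipeline(StandardScaler(),
                                              LogisticRegression(max_iter=1000))),
        ("random forest", RandomForestClassifier(n_estimators=300, n_jobs=-1)),
    ]:
        scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
        print(f"{name}: mean AUC {scores.mean():.3f}")

Most of the work in practice goes into the feature cleaning step above the model, which is exactly the point.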
Not disagreeing with you, but I do fear that many traditional industries will be jumping on board pretty soon without any of the actual use cases to warrant deep learning.
I can't recommend this comment enough! So many people give outdated advice with regards to deep learning like "it requires too much data and takes too much time" and "start with something theoretically grounded (tm) like an SVM".
The truth is that an ImageNet pre-trained network will often work astonishingly well. The only other mark against deep learning is computational complexity, which I expect to go away soon (yours truly ;) )
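To make that concrete, here is a hedged sketch of reusing an ImageNet pre-trained network as a frozen feature extractor in Keras (TensorFlow backend). The model choice and class count are placeholder assumptions, not details from this project:

    # Use a pre-trained ResNet50 as a fixed feature extractor with a small new head.
    from keras.applications.resnet50 import ResNet50, preprocess_input
    from keras.layers import Dense, GlobalAveragePooling2D
    from keras.models import Model

    base = ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
    for layer in base.layers:
        layer.trainable = False              # keep the pre-trained filters as-is

    x = GlobalAveragePooling2D()(base.output)
    out = Dense(10, activation="softmax")(x) # 10 = placeholder number of classes
    model = Model(inputs=base.input, outputs=out)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(...) on your own images, pre-processed with preprocess_input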
> And there's been a strong reaction from those who are heavily invested in SVMs/kernel methods, bayesian methods, etc to claim that deep learning isn't theoretically well grounded (which is not really that true any more, but is also beside the point for those that just want to get the best results for their project.)
Could you elaborate more on what has changed recently in the theoretical grounding of deep learning?
It's not so much that there are one or two specific results, but rather that there are far more researchers working on this now, and quite a few are working on more theoretical stuff. Sometimes that theory results in really practical outcomes - a good example would be https://arxiv.org/abs/1701.07875 . Or another by the same researcher: https://arxiv.org/abs/1506.00059 .
We've seen, in the last year or two, interesting results in nearly every area of theoretical research of deep learning, including generalization, optimization, generative modelling, bayesian models, and network architecture.
I've really enjoyed the fast.ai videos and am in the process of using them to build some hardware/software solutions. Even 2 years ago, it was not possible to get started on a problem like this and solve it in a short time.
What gave you that impression? It works just fine for my use-case, and I will most likely be able to add more use-cases by virtue of being able to enrich the data from other data sets out there once I have the part identified.
It is possible that at some point, to get to the last 0.1% or so, I may have to embed features that were engineered, but right now there is no indication that that will be the case.
It sounds like you went through a similar process as the computer vision community over the last couple of decades.
First people used to write classifiers by hand, but they found that too tedious and unreliable, and it had to be redone for each object you want to classify. Then they tried to detect objects by running local feature detectors and training a machine learning model to classify objects based on those features. This worked much better, but still made some mistakes. Convolutional Neural Networks were already being used to classify small images of digits, but people were skeptical they would scale to larger images.
That was the case until AlexNet came along in 2012. Since then the performance of convolutional networks has improved every year. Now they can classify images with performance similar to humans.
Am I wrong to see this as a bit scandalous for computer vision as a field before 2012? (It kind of seems like maybe a decade of research at the Berkeley CS department will be tossed out?)
Not entirely. In a lot of the fields where DNNs (or other ML techniques) have shown dramatic improvements of late, there are several reasons why the field didn't show improvement in the past.
A big reason is the tremendous increase in computing power available to the researcher for low cost. Most of these improvements depend on CPU-expensive training over lots of examples. In the past, the time to train a model or evaluate a situation would have been very high.
Another big reason is that datasets in a lot of these areas were fairly small, and the newer techniques tend to need a lot of data to train.
Another reason is that most previous researchers were focused on feature engineering, whereas modern techniques tend to move feature engineering into the ML system itself. This is a sort of conceptual change.
I don't really see it as "scandalous" in the sense that you'd expect people to have realized manual feature engineering wasn't the fastest way to get human-like results, or that computers were going to get faster for ML tasks, or that it would be possible to train deep networks, or that having good datasets against which everybody in the community can run and evaluate would be valuable.
But not a petaflop @ 200W, which is possible for deep learning. GPUs have driven a lot of progress in deep learning; I'll be excited to see what DPUs will do in terms of deep learning progress, assuming of course they have floating point (otherwise creative algorithm design will be a problem).
Commercial deep learning ASICs will definitely happen. The Nvidia stuff is in a way getting there, but going from 3500 ops/tick to 35000 ops/tick with the same power consumption will most likely require more than a merely incremental improvement in the hardware.
It would have to be:
- less general
- a smaller process node
- possibly more than one chip on a board tightly coupled
- specialized data types
- very tight coupling between memory and computation (so maybe memory on the chip)
- a slightly higher clock speed, say twice as high
GPUs are much too general, but if all the factors above can be realized, a factor of 10 in a PCIe add-in card should be possible.
It's not just computer vision. Many traditional methods in speech recognition and synthesis, translation, and game AI are being replaced by the same algorithms.
Suggestion: why not use three cameras simultaneously, each from a different angle, then classify the three images? Those cameras must be nearly free in cost.
Also, to get more training data, what about setting up a puffer to blow the part back on the belt and tumble it? If you could configure the loader belt to load parts slowly and stop after one is seen, you could automatically re-image the first part an arbitrary number of times by blowing it backwards before letting it move along and restarting the first belt to get another.
And question: do you normalize out color at any stage? As in, classify a black and white image, with a separate classifier for the color?
> why not use three cameras simultaneously, each from a different angle, then classify the three images? Those cameras must be nearly free in cost.
Because you really only need one camera (and two mirrors).
> Also, to get more training data, what about setting up a puffer to blow the part back on the belt and tumble it?
That's an interesting and novel idea. It probably will not work because the difference between the heaviest and the lightest parts is such that you'd blow most of the parts clear off the belt. Also, the camera is sampling fast enough that the part would end up imaged in many positions without the ability to stitch the parts together again. But interesting.
> And question: do you normalize out color at any stage? As in, classify a black and white image, with a separate classifier for the color?
No, but I am considering using HSV or LAB as the colorspace to see if that improves accuracy or reduces training time to get to a given accuracy.
Changing to a different fixed color space is unlikely to help. However adding a learnable color space transformation is a great (and rarely known) trick. Both are discussed and compared in this terrific paper: https://arxiv.org/abs/1606.02228
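For illustration, a learnable color-space transform can be as small as a couple of 1x1 convolutions trained jointly with the rest of the network. A hedged Keras sketch (channel counts are arbitrary assumptions):

    # A "learnable color space": 1x1 convolutions applied to the raw RGB input
    # before whatever classification network you were already using.
    from keras.layers import Conv2D, Input
    from keras.models import Model

    inp = Input(shape=(None, None, 3))               # raw RGB frames
    x = Conv2D(10, (1, 1), activation="relu")(inp)   # per-pixel color mixing
    x = Conv2D(3, (1, 1))(x)                         # back to 3 "learned" channels
    # ... feed x into the existing classifier so the transform trains end to end
    color_front_end = Model(inputs=inp, outputs=x)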
Could you have another belt going back the other way, and blow lego to hit a wall and drop on to it, instead of in a bin? Send it back towards the ramp, and find a mechanical or air powered way of getting the part over the plywood wall to be sent back round. That way you could just send a batch of 3-4 of the same part round repeatedly for a bit.
What about something easier? Instead of a puffer, just take a big bin of the same part and loop it through the hopper for a few hours. Presto, as much accurately-labeled training data as you want (though limited to a single part at a time).
If I had big bins of sorted parts I wouldn't need the machine ;)
This is an approach that will work for parts that are so common that even after a few runs you have lots of them, but the 'long tail' of lego parts makes up the vast majority of part types, and those are quite rare.
This is true hacking. I mean, at its essential core. The purpose, the methods, the tools, the rationale. If there is an archetype for Hacker, it's jacquesm.
I can also recommend the Logitech HD Pro Webcam C920 for visual computing projects like this. I was pleasantly surprised by the range of focus and that everything can be controlled via UVC. It could be cheaper though...
Which of the two do you think would function better for moving objects? I got a crappy off-the-shelf camera to put on a robot I made to do some work while moving horizontally... it moved too fast to acquire anything.
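One thing that often helps with fast-moving objects is locking focus and shortening the exposure so frames don't blur. A hedged OpenCV sketch; the property values are driver-dependent guesses, not known-good numbers for any particular camera:

    # Lock focus and exposure on a UVC camera via OpenCV (values are illustrative).
    import cv2

    cap = cv2.VideoCapture(0)
    cap.set(cv2.CAP_PROP_AUTOFOCUS, 0)         # stop autofocus hunting
    cap.set(cv2.CAP_PROP_FOCUS, 30)            # fixed focus at working distance (arbitrary)
    cap.set(cv2.CAP_PROP_AUTO_EXPOSURE, 0.25)  # 0.25 often means "manual" on V4L2 backends
    cap.set(cv2.CAP_PROP_EXPOSURE, -6)         # shorter exposure -> less motion blur

    ok, frame = cap.read()
    if ok:
        cv2.imwrite("test_frame.png", frame)
    cap.release()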
> I simply don’t think it is responsible to splurge on a 4 GPU machine in order to make this project go faster.
2 things: 1. You can rent 8-GPU machines on AWS, Azure or GCE. 2. The incredibly wide applicability of machine learning means that an investment in hardware might not be wasted. Even if you only use the machine for this one project, if it helps you learn more about the field it will probably still be a good investment career wise.
Or just use finetuning. He mentioned in the comments last time that he was training from scratch, but I still don't understand why. If you are not changing the architecture and you are using essentially the same dataset (just augmented with some more active-learning-created datapoints), why wouldn't you reuse a fully converged checkpoint? It would potentially save orders of magnitude of time, and it's trivial to implement: you just load the checkpoint and pass that into your pre-existing code. It's literally three or four lines (one line for a CLI argument '--checkpoint' and three lines for a conditional to load either the checkpoint or a blank model).
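In Keras that resume-from-checkpoint conditional could look roughly like this. A sketch only: build_model() is a hypothetical stand-in for whatever code already constructs the network.

    # Resume from a saved checkpoint if one is given, otherwise start fresh.
    import argparse
    from keras.models import load_model

    parser = argparse.ArgumentParser()
    parser.add_argument("--checkpoint", default=None)
    args = parser.parse_args()

    if args.checkpoint:
        model = load_model(args.checkpoint)   # resume from a fully converged run
    else:
        model = build_model()                 # hypothetical existing constructor
    # ... model.fit(...) continues exactly as before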
I have tried finetuning extensively, a typical run over a pre-trained set before expanding the number of classes has the loss steadily increasing without any clear indication of how long that would last. Maybe I should let a test run for a couple of days to see if it will eventually converge.
Also, keep in mind that the dataset is still tiny and that a method that works for large numbers of images may very well fail if you only have a few tens to maybe 100 or so images per class.
For finetuning on additional data, you would have to lower the learning rate, because you're only adding a few datapoints and it's almost entirely converged as it is. If your loss is increasing, that suggests overfitting to me via a too-high learning rate.
Now, if you're changing the architecture (such as by adding additional categories of pieces), as I said, that's more tricky - what people usually do there is something like lop off the top layers and retrain them from scratch, possibly while freezing the rest of the NN (the assumption there being that the learned filters and lower layers ought to already be sufficient to classify a new category, which is reasonable since the lower layers tend to be learning things like lines and corners, all primitives which should be able to classify yet another square or rectangle etc).
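A hedged Keras sketch of that kind of surgery on an existing trained model; the saved-model file name, the new class count, and the learning rate are placeholders, not details from this project:

    # Replace the old softmax with one sized for the new class set, freeze the rest,
    # and finetune with a small learning rate.
    from keras.layers import Dense
    from keras.models import Model, load_model
    from keras.optimizers import Adam

    old = load_model("lego_net.h5")                  # previously converged net (placeholder name)
    features = old.layers[-2].output                 # everything below the old softmax
    new_out = Dense(250, activation="softmax", name="new_softmax")(features)

    model = Model(inputs=old.input, outputs=new_out)
    for layer in model.layers[:-1]:
        layer.trainable = False                      # optionally freeze the lower layers

    model.compile(optimizer=Adam(lr=1e-4),           # low learning rate, as discussed above
                  loss="categorical_crossentropy", metrics=["accuracy"])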
Since this is the obvious response any reader familiar with deep learning would have while reading complaints about how slow your CNN is to train from scratch, it'd be good to discuss in some detail what sort of finetuning you've tried and how it failed.
I will send you an email with a re-run of my original experiments, they were roughly what you described (take a pre-trained net, remove the last layer and re-connect to a layer with the right number of classes), learning rates I tried were from 1e-6 to 1e-3 and none of those had satisfactory results.
I was about ready to give up on it when I decided to try to bring up a net from scratch and that worked quite well.
Do I understand correctly that a checkpoint is just a snapshot of the model at a point in time? i.e. "Here are the probabilities of each outcome given the characteristics I have observed already."
Also, what does "fully converged" signify? Are there points in the course of training the model at which it is more appropriate to "save" progress than at other times?
In machine learning/deep learning, the decrease in training loss has major diminishing returns as training continues. Eventually, training the model hits a point where the loss barely improves each epoch/iteration. (fun visualization from one of my projects: http://minimaxir.com/img/char-embeddings/epoch-losses.png)
In some cases, the loss can stop improving entirely, or increase.
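In Keras this is typically handled with callbacks that snapshot the model whenever validation loss improves and stop training once it plateaus. A minimal sketch; the file name and patience value are arbitrary:

    # Save the best checkpoint and stop once validation loss stops improving.
    from keras.callbacks import EarlyStopping, ModelCheckpoint

    callbacks = [
        ModelCheckpoint("best_weights.h5", monitor="val_loss", save_best_only=True),
        EarlyStopping(monitor="val_loss", patience=10),
    ]
    # model.fit(X_train, y_train, validation_data=(X_val, y_val),
    #           epochs=200, callbacks=callbacks)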
Start with Keras; if you run into something you want to do that is not supported by Keras, drop into TensorFlow. They are not mutually exclusive and all of TensorFlow is available.
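A hedged sketch of what "dropping into TensorFlow" from Keras can look like with the TensorFlow backend; the specific ops and shapes are purely illustrative:

    # Keras tensors are TF tensors, so raw TF ops can be wrapped in a Lambda layer
    # or used inside a custom loss.
    import tensorflow as tf
    from keras.layers import Dense, Input, Lambda
    from keras.models import Model

    inp = Input(shape=(64,))
    x = Dense(32, activation="relu")(inp)
    x = Lambda(lambda t: tf.clip_by_value(t, 0.0, 6.0))(x)   # raw TF op in the graph
    out = Dense(1, activation="sigmoid")(x)

    def tf_loss(y_true, y_pred):
        # arbitrary TF ops work here too
        return tf.reduce_mean(tf.square(y_true - y_pred))

    model = Model(inputs=inp, outputs=out)
    model.compile(optimizer="adam", loss=tf_loss)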
Then it'd be hitting a new limit for speed, which is the round-trip time between the AWS machine and the local sorter. It can't take any longer than the belt takes to travel to the first sorting point or you'll miss parts and have to rerun them constantly.
I love projects like this because, while it doesn't necessarily have a direct application right away, it solves a piece of a problem that could go a long way to help something else. Reminds me of the skittles/M&M sorting machine that someone built a little while ago. As more projects like this develop, we're teaching computers more and more about visual recognition.
> while it doesn't necessarily have a direct application right away
This is one part I didn't fully understand. In the previous post jacquesm said that the sorted Lego sets are more expensive than the unsorted one (and a fake piece destroys the price). So:
* Is he planning to make a few additional bucks buying unsorted sets and selling them after sorting?
* Does he have a huge collection and is he tired of trying to find the pieces?
Go re-read the first few paragraphs of the first post:
> After doing some minimal research I noticed that sets do roughly 40 euros / Kg and that bulk lego is about 10, rare parts and lego technic go for 100’s of euros per kg. So, there exists a cottage industry of people that buy lego in bulk, buy new sets and then part this all out or sort it (manually) into more desirable and thus more valuable groupings.
> I figured this would be a fun thing to get in on and to build an automated sorter.
He then impulsively bid on a ton of bulk lego on eBay and ended up with a garage completely full of the stuff.
Sounds like it started for fun, then spiraled out of control and is now a thing he would very much like to do to get all this lego the hell out of his life for at least enough of a profit to cover shipping it to and from his place, if not much more.
Nice work, I enjoyed the write-ups. You wrote that you wanted to sell off complete sets.
Would you be able to first make an inventory of all your available pieces, then load a DB with (all?) complete sets and let the machine sort different sets into one bucket each (starting with the most expensive set first)? Or how are you going to get your sets together?
Optimizing a pile of Lego parts into complete sets for maximum value would be a fun and interesting challenge. I'd be more than happy to help out if you're interested.
One question: wouldn't it have been easier to use a line scan camera and tie line acquisition to the belt's movement by attaching a rotary encoder whose output would trigger individual line scans? That's the standard solution in the industry.
Line scan cameras and (good) encoders are expensive, I figured I'd try to do it with an absolute minimum in terms of fancy hardware. That sort of constraint also helps to boost creativity :)
As a tinkerer, let me tell you that, had I been confronted with the problem, I would probably have taken apart one of the old Logitech Scanman handheld scanners still collecting dust in my attic. A little bit of mechanical modification should do the trick. These are essentially line scan cameras and they come with a high-resolution rotary encoder.
"then several things happened in a very short time: about two months ago HN user greenpizza13 pointed me at Keras, rather than going the long way around and using TensorFlow directly (and Anaconda actually does save you from having to build TensorFlow). And this in turn led me to Jeremy Howard and Rachel Thomas’ excellent starter course on machine learning."
This is why you read HN. Interesting, though: had Jacques not made the original attempts, I don't think the payoff above would have been as useful.
It could be fun if you released your tagged data set of lego piece pictures so people in the ML community could try to write classifiers. Even untagged pics could be interesting.
> [The stitcher determines] how much the belt with the parts on it has moved since the previous frame (that's the function of that wavy line in the videos in part 1, that wavy line helps to keep track of the belt position even when there are no parts on the belt)
I'm curious about this wavy line--does it need to be specially encoded in any way or did you just squiggle the belt with a marker and let the software figure out how it lines up?
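For context, one way a marked line could be used to measure belt travel - a hedged sketch, not necessarily what the actual stitcher does - is to cross-correlate a thin strip of the previous frame against the current one:

    # Estimate how far the belt moved (in pixels) between two grayscale frames by
    # sliding a strip from the previous frame over the same region of the new frame.
    import cv2

    def belt_shift(prev_gray, curr_gray, strip_y=10, strip_h=20, margin=100):
        # patch from the previous frame, trimmed so it can slide within the new frame
        patch = prev_gray[strip_y:strip_y + strip_h, margin:-margin]
        search = curr_gray[strip_y:strip_y + strip_h, :]
        result = cv2.matchTemplate(search, patch, cv2.TM_CCOEFF_NORMED)
        _, _, _, max_loc = cv2.minMaxLoc(result)
        return max_loc[0] - margin        # 0 means the belt did not move

Any marking with enough texture would do; it doesn't need to be specially encoded, only non-repeating over the search window.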
Yes, many, and those are really hard to get around. Two views at the same time eliminate the vast majority though, so that's why the system now works that way. A third view would likely get rid of the remainder but is for various reasons very hard to implement reliably.
Would a training set of 3d renderings of every angle of each lego piece work? That should be easy to produce and would make the manual labeling step obsolete.
Has anyone applied this sort of thing to voice recognition? I see a lot of computer vision applications, but haven't found any audio classifiers amongst the CV articles.
> Right now training speed is the bottle-neck, and even though my Nvidia GPU is fast it is not nearly as fast as I would like it to be. It takes a few days to generate a new net from scratch but I simply don’t think it is responsible to splurge on a 4 GPU machine in order to make this project go faster.
You should stick up a donate button, if you keep writing interesting articles about how it all works, I'd happily throw a few dollars towards the process.
For those who don't know, ww.com was the first multi-webcam site on the net. Jacques has shared some stories over the years of hacking that together (both the technical side and the nascent business model) - very much in the spirit of this Lego project and his others.
Wow, looks like he is really 'hoarding' a lot of domains.
I'd be interested to know what the HN sentiment is for this kind of behaviour; his only claim to these domains is that he got there first - he obviously has no intention of using them beyond selling to the highest bidder.
I think it's shitty behaviour - but what's the alternative? I've found good fresh domains in the past, and let them drop when the project I registered them for (inevitably) didn't go anywhere, only to see them grabbed by a domain squatter.
Recently I listed an unneeded but above-average domain (trademarkable, keyword, .com) on HN as free to anyone who could use it, with the stipulation that they pass it on if subsequently not used.
That was the asking price; it fetched a lot less than that (about $1M). Then there were taxes and some debt to get rid of, but in the end it was a good deal.