
Sorting Two Tons of Lego, the Software Side - jacquesm
https://jacquesmattheij.com/sorting-lego-the-software-side
======
jph00
I've really enjoyed talking to Jacques about his lego project over the last
few days, and I hope that it will lead to some additional learning materials
on course.fast.ai (which he kindly links to in his article) as a result. It's
great to see such a thorough writeup of the whole process, which I think is
much more interesting and useful than just showing the final result.

The key insight here, to me, is that deep learning saves a lot of time, _as
well_ as being more accurate. I very frequently hear people say "I'll just
start with something simple - I'm not sure I even need deep learning"... then
months later I see that they've built a complex and fragile feature
engineering system and are having to maintain thousands of lines of code.

 _Every_ time I've heard from someone who has switched from a manual feature
engineering approach to deep learning, they've reported the same results
Jacques found with his lego sorter: dramatic improvements in accuracy,
generally within a few days of work (sometimes even a few hours), with far
less code to write and maintain. (This is a fairly biased sample, since I've
spent a lot of time with people in medical imaging over the past few years -
but I've seen this in time series analysis, NLP, and other areas too.)

I know it's been trendy to hate on deep learning over the last year or so on
HN, and I understand the reaction - we all react negatively to heavily hyped
tech. And there's been a strong reaction from those who are heavily invested
in SVMs/kernel methods, Bayesian methods, etc. to claim that deep learning
isn't theoretically well grounded (which is not really true any more, but is
also beside the point for those who just want the best results for their
project).

I'd urge people who haven't really tried to build something with deep
learning to have a go, and to get their own experience before coming to
conclusions.

~~~
irq11
You're seeing hate for deep learning on HN!?! I'd love to see something
approaching critical thought! Most of the comments I see are silly fluff about
the singularity and/or AGI and killer robots.

But seriously: I think you're overstating the case. I've enjoyed watching the
fast.ai videos, and I think deep learning is a clear choice for some areas
right now. For many others, no.

If you're working in a data-limited domain in particular (i.e. most of them),
there are probably better first choices. I know plenty of folks who have used
deep learning and achieved results no better than those of the simpler
methods they were using before.

But yes, the initial engineering costs of a deep learning system have come
down a lot further than I thought before I watched your videos.

~~~
deepnotderp
A pre-trained deep net can work very well in a data-limited domain if there's
sufficient overlap.

~~~
irq11
Yes, but that's true of any ML approach.

Deep learning is great if you're dealing with a high-dimensional problem and
you have the data to train the model. If one of those things is not true (and
_usually_, one of those things is not true), you're better off starting
simple.

~~~
jacquesm
That's an excellent point and one that kept me from trying the deep learning
approach for a long time. But in the end the machine + rudimentary deep
learning provided _its own dataset_ and that really made it work.

So even if I didn't have the data to train the model I had enough data to
bootstrap the process and sometimes that's all you need.
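
In outline, the bootstrap loop looks something like this (a hypothetical
sketch, not the actual sorter code; 'bootstrap', its arguments, and the
threshold are all made up for illustration):

    # A rough model labels fresh images; only confident predictions are kept
    # for the next round of training.
    def bootstrap(model, unlabeled_images, threshold=0.95):
        keep = []
        for img in unlabeled_images:
            probs = model.predict(img[None, ...])[0]  # predict on a batch of one
            if probs.max() >= threshold:
                keep.append((img, probs.argmax()))    # image plus predicted class
        return keep  # spot-check these by hand, then retrain on the enlarged set

Each pass gives you a slightly better model, which in turn labels more of the
backlog, and so on.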

~~~
irq11
Yes, agreed. Your use case was interesting to me in the way that you
bootstrapped the problem. If only more domains were like that!

------
ma2rten
It sounds like you went through a similar process to the one the computer
vision community went through over the last couple of decades.

First people wrote classifiers by hand, but they found that was too tedious
and unreliable, and had to be redone for each object you want to classify.
Then they tried using local feature detectors and training a machine learning
model to classify objects based on those. This worked much better, but still
made some mistakes. Convolutional Neural Networks were already being used to
classify small images of digits, but people were skeptical that they would
scale to larger images.

That was until AlexNet came along in 2012. Since then the performance of
convolutional networks has improved each year. Now they can classify images
with performance similar to that of humans.

~~~
danblick
Am I wrong to see this as a bit scandalous for computer vision as a field
before 2012? (It kind of seems like maybe a decade of research at the Berkeley
CS department will be tossed out?)

~~~
dekhn
Not entirely. In a lot of fields where DNNs (or other ML techniques) have
shown dramatic improvements of late, there are several reasons why the field
didn't show improvement in the past.

A big reason is the tremendous increase in computing power available to the
researcher for low cost. Most of these improvements depend on CPU-expensive
training over lots of examples. In the past, the time to train a model or
evaluate a situation would have been very high.

Another big reason is that datasets in a lot of these areas were fairly small,
and the newer techniques tend to need a lot of data to train.

Another reason is that most previous researchers were focused on feature
engineering, whereas modern techniques move feature engineering into the ML
system itself. This is a real conceptual change.

I don't really see it as "scandalous": you wouldn't expect people to have
realized in advance that manual feature engineering wasn't the fastest way to
get human-like results, that computers were going to get faster for ML tasks,
that it would be possible to train deep networks, or that having good
datasets against which everybody in the community can run and evaluate would
be so valuable.

~~~
jacquesm
Yes, the speed of these GPUs is insane. Teraflops @ 200 Watts.

~~~
deepnotderp
But not a petaflop @ 200W, which is possible for deep learning. GPUs have
driven a lot of progress in deep learning; I'll be excited to see what DPUs
will do in terms of deep learning progress, assuming, of course, that they
have floating point (otherwise creative algorithms will be a problem).

~~~
jacquesm
Commercial deep learning ASICs will definitely happen. The Nvidia stuff is in
a way getting there; going from 3500 ops / tick to 35000 ops / tick with the
same power consumption will most likely require more than a merely
incremental improvement in the hardware.

It would have to be:

- less general

- a smaller process node

- possibly more than one chip on a board, tightly coupled

- specialized data types

- very tight coupling between memory and computation (so maybe memory on the
chip)

- a slightly higher clock speed, say twice as high

GPUs are _much_ too general, but if all the factors above can be realized, a
factor of 10 in a PCIe add-in card should be possible.

------
marze
Suggestion: why not use three cameras simultaneously, each from a different
angle, then classify the three images? Those cameras must be nearly free in
cost.

Also, to get more training data, what about setting up a puffer to blow the
part back onto the belt and tumble it? If you could configure the loader belt
to load parts slowly and stop after one is seen, you could automatically
re-image the first part an arbitrary number of times by blowing it backwards
before letting it move along and restarting the first belt to get another.

And question: do you normalize out color at any stage? As in, classify a black
and white image, with a separate classifier for the color?

~~~
jacquesm
> why not use three cameras simultaneously, each from a different angle, then
> classify the three images? Those cameras must be nearly free in cost.

Because you really only need one camera (and two mirrors).

> Also, to get more training data, what about setting up a puffer to blow the
> part back on the belt and tumble it?

That's an interesting and novel idea. It _probably_ will not work, because
the difference between the heaviest and the lightest parts is such that you'd
blow most of the parts clear off the belt. Also, the camera is sampling fast
enough that the part would end up imaged in many positions without the
ability to stitch the images back together again. But interesting.

> And question: do you normalize out color at any stage? As in, classify a
> black and white image, with a separate classifier for the color?

No, but I am considering using HSV or LAB as the colorspace to see if that
improves accuracy or reduces training time to get to a given accuracy.
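
The conversion itself is cheap to try - in OpenCV it's a one-liner per
colorspace (a sketch; 'part.png' is just a placeholder input file):

    import cv2

    # OpenCV loads images as BGR, so these flags convert from BGR.
    frame = cv2.imread('part.png')
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)  # hue / saturation / value
    lab = cv2.cvtColor(frame, cv2.COLOR_BGR2LAB)  # lightness + two color axes

So running the same training job against both colorspaces is a small
experiment.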

~~~
iDemonix
Could you have another belt going back the other way, and blow the lego so it
hits a wall and drops onto that belt instead of into a bin? Send it back
towards the ramp, and find a mechanical or air-powered way of getting the
part over the plywood wall to be sent back round. That way you could just
send a batch of 3-4 of the same part round repeatedly for a bit.

~~~
marze
Simpler to put up side walls and a back barrier to prevent the blown piece
from leaving the belt.

------
11thEarlOfMar
This is true hacking. I mean, at its essential core. The purpose, the methods,
the tools, the rationale. If there is an archetype for Hacker, it's jacquesm.

Bravo!

~~~
chris_st
Absolutely correct, but for me what really nails it is the _explanation_.
That's rare, at such a high quality!

------
ziikutv
For anyone wondering, that is a USB Microscope camera which can be ordered via
Amazon[1].

[1]: [https://www.amazon.com/XCSOURCE-Microscope-Endoscope-Magnifier-TE071/dp/B00N4K22OA/ref=lp_2742273011_1_9?s=photo&ie=UTF8&qid=1494089043&sr=1-9](https://www.amazon.com/XCSOURCE-Microscope-Endoscope-Magnifier-TE071/dp/B00N4K22OA/ref=lp_2742273011_1_9?s=photo&ie=UTF8&qid=1494089043&sr=1-9)

~~~
nom
I can also recommend the Logitech HD Pro Webcam C920 for visual computing
projects like this. I was pleasantly surprised by the range of focus and by
the fact that everything can be controlled via UVC. It could be cheaper
though...

~~~
ziikutv
Which of the two do you think would function better for moving objects? I got
a crappy off-the-shelf camera to put on a robot I made to do some work while
moving horizontally... it moved too fast to acquire anything.

~~~
taneq
If you can control the lighting, you can use an LED strobe to capture sharp
images while moving.

------
justforlego
Would it be possible to use existing 3D descriptions of the bricks to train
the model? The LDraw library contains nearly every LEGO brick [1].

[1]: [http://www.ldraw.org/](http://www.ldraw.org/)

~~~
leipert
This was previously discussed (and answered by jacquesm) in the comments on
the first blog post:
[https://news.ycombinator.com/item?id=14227862](https://news.ycombinator.com/item?id=14227862)

------
modeless
Awesome project!

> I simply don’t think it is responsible to splurge on a 4 GPU machine in
> order to make this project go faster.

Two things: 1. You can rent 8-GPU machines on AWS, Azure or GCE. 2. The
incredibly wide applicability of machine learning means that an investment in
hardware might not be wasted. Even if you only use the machine for this one
project, if it helps you learn more about the field it will probably still be
a good investment career-wise.

~~~
gwern
Or just use finetuning. He mentioned in the comments last time that he was
training from scratch, but I still don't understand why. If you are not
changing the architecture and you are using essentially the same dataset
(just augmented with some more active-learning-created datapoints), why
_wouldn't_ you reuse a fully converged checkpoint? It would save potentially
orders of magnitude of time, and it's trivial to implement: you just load the
checkpoint and pass that into your pre-existing code. It's literally three or
four lines (one line for a CLI argument '--checkpoint' and three lines for a
conditional to load either the checkpoint or a blank model).
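
Something like this, in Keras terms (a sketch; 'build_model' is a
hypothetical stand-in for whatever constructs the fresh net):

    import argparse
    from keras.models import load_model

    parser = argparse.ArgumentParser()
    parser.add_argument('--checkpoint', default=None,
                        help='path to a previously saved model')
    args = parser.parse_args()

    if args.checkpoint:
        model = load_model(args.checkpoint)  # resume from converged weights
    else:
        model = build_model()  # hypothetical helper that builds a fresh net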

~~~
jacquesm
I have tried finetuning extensively; a typical run over a pre-trained net
before expanding the number of classes has the loss steadily increasing,
without any clear indication of how long that would last. Maybe I should let
a test run for a couple of days to see if it will eventually converge.

Also, keep in mind that the dataset is still _tiny_ and that a method that
works for large numbers of images may very well fail if you only have a few
tens to maybe 100 or so images per class.

~~~
gwern
For finetuning on additional data, you would have to lower the learning rate,
because you're only adding a few datapoints and the net is almost entirely
converged as it is. If your loss is increasing, that suggests overfitting to
me, via a too-high learning rate.

Now, if you're changing the _architecture_ (such as by adding additional
categories of pieces), as I said, that's more tricky - what people usually do
there is something like lop off the top layers and retrain them from scratch,
possibly while freezing the rest of the NN (the assumption there being that
the learned filters and lower layers ought to already be sufficient to
classify a new category, which is reasonable since the lower layers tend to be
learning things like lines and corners, all primitives which should be able to
classify yet another square or rectangle etc).
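
In Keras-style code, that freeze-and-retrain pattern looks roughly like this
(a sketch, not the project's actual code; 'num_classes' is a placeholder):

    from keras.applications import VGG16
    from keras.layers import Dense, GlobalAveragePooling2D
    from keras.models import Sequential

    num_classes = 100  # placeholder: however many piece classes you have now

    base = VGG16(weights='imagenet', include_top=False,
                 input_shape=(224, 224, 3))
    base.trainable = False  # freeze the learned filters in the lower layers

    # Lop off the top and put a fresh classifier head on for the new classes.
    model = Sequential([
        base,
        GlobalAveragePooling2D(),
        Dense(num_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])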

Since this is the obvious response any reader familiar with deep learning
would have while reading complaints about how slow your CNN is to train from
scratch, it'd be good to discuss in some detail what sort of finetuning
you've tried and how it failed.

~~~
jacquesm
I will send you an email with a re-run of my original experiments. They were
roughly what you described (take a pre-trained net, remove the last layer and
re-connect to a layer with the right number of classes); the learning rates I
tried ranged from 1e-6 to 1e-3 and none of them had satisfactory results.

I was about ready to give up on it when I decided to try to bring up a net
from scratch and that worked quite well.

------
dpkonofa
I love projects like this because, while it doesn't necessarily have a direct
application right away, it solves a piece of a problem that could go a long
way to help something else. Reminds me of the Skittles/M&M sorting machine
that someone built a little while ago. As more projects like this develop,
we're teaching computers more and more about visual recognition.

Can't wait for Skynet to go live! :-P

~~~
gus_massa
> _while it doesn't necessarily have a direct application right away_

This is one part I didn't fully understand. In the previous post jacquesm
said that sorted Lego sets are more expensive than unsorted ones (and a fake
piece destroys the price). So:

* Is he planning to make a few additional bucks buying unsorted sets and selling them after sorting?

* Does he have a huge collection and is bored of trying to find the pieces?

* Just a project for fun?

~~~
egypturnash
Go re-read the first few paragraphs of the first post:

> After doing some minimal research I noticed that sets do roughly 40 euros /
> Kg and that bulk lego is about 10, rare parts and lego technic go for 100’s
> of euros per kg. So, there exists a cottage industry of people that buy lego
> in bulk, buy new sets and then part this all out or sort it (manually) into
> more desirable and thus more valuable groupings.

> I figured this would be a fun thing to get in on and to build an automated
> sorter.

He then impulsively bid on a ton of bulk lego on eBay and ended up with a
garage completely full of the stuff.

Sounds like it started for fun, then spiraled out of control and is now a
thing he would very much like to do to get all this lego the hell out of his
life for at least enough of a profit to cover shipping it to and from his
place, if not much more.

------
Saus
Nice work, I enjoyed the write-ups. You wrote that you wanted to sell off
complete sets.

Would you be able to first make an inventory of all your available pieces,
and then load a DB with (all?) complete sets and let the machine sort the
pieces for each set into one bucket (starting with the most expensive set
first)? Or how are you going to get your sets together?

~~~
froindt
Optimizing a pile of Lego parts into the most valuable combination of
complete sets would be a fun and interesting challenge. I'd be more than
happy to help out if you're interested.

------
datenwolf
Cool project!

One question: Wouldn't it have been easier to use a line scan camera and
tether line acquisition to the belt's movement by attaching a rotary encoder
whose output would trigger individual line scans? That's the standard
solution in the industry.

~~~
jacquesm
Line scan cameras and (good) encoders are expensive, I figured I'd try to do
it with an absolute minimum in terms of fancy hardware. That sort of
constraint also helps to boost creativity :)

~~~
datenwolf
As a tinkerer, let me tell you that, had I been confronted with the problem,
I would probably have taken apart one of the old Logitech Scanman handheld
scanners still collecting dust in my attic. A little bit of mechanical
modification should do the trick. These are essentially line scan cameras and
they come with a high-resolution rotary encoder.

------
bootload
_" then several things happened in a very short time: about two months ago HN
user greenpizza13 pointed me at Keras, rather than going the long way around
and using TensorFlow directly (and Anaconda actually does save you from having
to build TensorFlow). And this in turn led me to Jeremy Howard and Rachel
Thomas’ excellent starter course on machine learning."_

This is why you read HN. Interesting, though: had Jacques not made the
original attempts, I don't think the payoff above would have been as useful.

------
otaviogood
It could be fun if you released your tagged data set of lego piece pictures so
people in the ML community could try to write classifiers. Even untagged pics
could be interesting.

~~~
jacquesm
It will happen. I need more images first. Right now I have about 20K; it
needs to grow to about 300K, then it will be really useful.

------
wolfgang42
_> [The stitcher determines] how much the belt with the parts on it has moved
since the previous frame (that’s the function of that wavy line in the videos
in part 1, that wavy line helps to keep track of the belt position even when
there are no parts on the belt)_

I'm curious about this wavy line--does it need to be specially encoded in any
way or did you just squiggle the belt with a marker and let the software
figure out how it lines up?

~~~
jacquesm
As long as it squiggles enough to avoid aliasing it seems to work well.
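
One plausible way to implement that tracking (an assumption about the method,
not the actual code; the names are made up) is plain template matching
between consecutive frames:

    import cv2

    # Slice a strip containing the wavy line out of the previous frame and
    # find where it best matches in the current frame; the offset of the
    # best match is how far the belt has moved.
    def belt_shift(prev_frame, cur_frame, strip_width=64):
        template = prev_frame[:, :strip_width]
        scores = cv2.matchTemplate(cur_frame, template, cv2.TM_CCOEFF_NORMED)
        _, _, _, best = cv2.minMaxLoc(scores)
        return best[0]  # columns the belt has moved since the previous frame

If the squiggle repeated too regularly, several offsets would match equally
well - which is exactly the aliasing problem mentioned above.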

------
RoboTeddy
> Right now training speed is the bottle-neck, and even though my Nvidia GPU
> is fast it is not nearly as fast as I would like it to be. It takes a few
> days to generate a new net from scratch but I simply don’t think it is
> responsible to splurge on a 4 GPU machine in order to make this project go
> faster.

Easy cloud training: [https://www.floydhub.com/](https://www.floydhub.com/)

------
unityByFreedom
Thank you for posting this follow-up!

I look forward to seeing if you can push it further by leveraging faster
hardware in the cloud.

I suspect the training time could cause you to lose interest in iterating
improvements. But, how cool would it be to make the project even better =)

------
Nexxxeh
Are there instances where multiple parts look the same from different angles?

~~~
jacquesm
Yes, many, and those are really hard to get around. Two views at the same
time eliminate the vast majority, though, so that's why the system now works
that way. A third view would likely get rid of the remainder, but is for
various reasons very hard to implement reliably.

------
Jakob
Would a training set of 3D renderings of every angle of each lego piece work?
That should be easy to produce and would make the manual labeling step
obsolete.

------
geoffbrown2014
What a terrific project! So many levels of challenges. How many different
images of each piece were needed before you could train the system?

~~~
jacquesm
Some I now have 100's of images of, some only a few tens or even just a few.
Obviously the error rates for the latter are higher.

------
tezza
Excellent writeup.

Has anyone applied this sort of thing to voice recognition? I see a lot of
computer vision applications, but haven't found any audio classifiers amongst
the CV articles.

------
wwarner
Great write-up, thank you for sharing it.

------
iDemonix
> Right now training speed is the bottle-neck, and even though my Nvidia GPU
> is fast it is not nearly as fast as I would like it to be. It takes a few
> days to generate a new net from scratch but I simply don’t think it is
> responsible to splurge on a 4 GPU machine in order to make this project go
> faster.

You should stick up a donate button, if you keep writing interesting articles
about how it all works, I'd happily throw a few dollars towards the process.

~~~
jamesgagan
Looks like he sold one of his domains, ww.com, for 3 million; guessing he
probably doesn't need donations :) [https://jacquesmattheij.com/domains-for-
sale/](https://jacquesmattheij.com/domains-for-sale/)

~~~
GordonS
Wow, looks like he is really 'hoarding' a lot of domains.

I'd be interested to know what the HN sentiment is for this kind of behaviour;
his only claim to these domains is that he got there first - he obviously has
no intention of using them beyond selling to the highest bidder.

~~~
mbrookes
I think it's shitty behaviour - but what's the alternative? I've found good
fresh domains in the past, and let them drop when the project I registered
them for (inevitably) didn't go anywhere, only to see them grabbed by a
domain squatter.

Recently I listed an unneeded but above-average domain (trademarkable,
keyword, .com) on HN as free to anyone who could use it, with the stipulation
that they pass it on if it's subsequently not used.

I'd be glad to see more of that.

