I tried, but in the end a straight-up train-correct-retrain loop took care of all the edge cases much more quickly and much more reliably than any of the feature engineering and database correlation I tried before. This is roughly the fourth incarnation of the software and by far the cleanest and most effective. HN pointed me in the direction of Keras a few weeks ago; that, coupled with Jeremy Howard's course, gave me the keys to finally crack the problem in a decisive way.
> It sounds like you trained the classifier manually.
Only the first batch; after that it was mostly corrections. While it classifies one batch it saves a log, which gives me more data to feed the classifier for the next training session. There are so few errors now that I can add another 4K images to the training set in half an hour or so.
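The correction-log step could be sketched like this; the log format (image path mapped to corrected class) and the part names are my own assumptions, not the author's actual pipeline:

```python
# Sketch of folding a day's correction log into the labelled set.
# The log format (image path -> corrected class name) is an assumption.
def merge_corrections(labels, correction_log):
    """Corrected entries overwrite the classifier's original guesses."""
    merged = dict(labels)
    merged.update(correction_log)   # a correction always wins over the old guess
    return merged

labels = {"img_001.png": "2x4_brick", "img_002.png": "1x2_plate"}
log = {"img_002.png": "1x2_tile"}   # img_002 was misclassified and hand-corrected
print(merge_corrections(labels, log)["img_002.png"])  # -> 1x2_tile
```

The merged dict then becomes the labelled set for the next training session, so every mistake the machine makes becomes a training example.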
> Further to that, if you have pricing data on sets you have a nice little optimisation problem - given my metric ton of parts, what are the most valuable complete sets I can make?
I'm on that one :) And a few others that are not so obvious. There is a lot to know about lego. Far more than you'd think at first glance.
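The set-valuation question above can be sketched as a greedy heuristic; all set names, part lists, and prices below are made up for illustration (a real version would pull from an inventory database and a price list, and might use an ILP solver rather than greedy):

```python
# Greedy sketch of "given my parts, which complete sets are worth building?"
# Set contents and prices here are invented for illustration only.
from collections import Counter

def completable_sets(inventory, sets, prices):
    """Repeatedly build the highest-priced set the inventory still covers."""
    inventory = Counter(inventory)
    built = []
    # Try sets in descending price order (greedy, not guaranteed optimal).
    for name in sorted(sets, key=lambda s: prices[s], reverse=True):
        need = Counter(sets[name])
        while all(inventory[p] >= n for p, n in need.items()):
            for p, n in need.items():
                inventory[p] -= n
            built.append(name)
    return built

parts = {"2x4_brick": 10, "1x2_plate": 6, "wheel": 4}
sets = {"car": {"2x4_brick": 2, "wheel": 4},
        "wall": {"2x4_brick": 4, "1x2_plate": 2}}
prices = {"car": 12.0, "wall": 5.0}
print(completable_sets(parts, sets, prices))  # -> ['car', 'wall', 'wall']
```

Greedy can leave money on the table when a cheap set's parts block a pricier one, which is why the full problem is a nice little optimisation exercise.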
I'd love to hear more about what you tried specifically. I'm considering doing this myself, and I was thinking of building a very large labeled dataset of 3D-rendered images using the LDraw parts library and training on that. I could include hundreds of images per part by varying viewing angles, zoom levels, focus, etc. in the rendering process. Did you try anything like that?
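The parameter sweep described above might look something like this; the part numbers and parameter ranges are hypothetical, and each job dict would be handed to whatever renderer drives the LDraw geometry:

```python
# Sketch of enumerating render parameters for a synthetic training set.
# Part IDs and ranges are invented; a real pipeline would feed each
# combination to a renderer that consumes the LDraw part files.
from itertools import product

def render_jobs(part_ids, n_azimuths=8, elevations=(15, 45, 75), zooms=(0.8, 1.0, 1.2)):
    """Yield one render job per (part, azimuth, elevation, zoom) combination."""
    azimuths = [i * 360 / n_azimuths for i in range(n_azimuths)]
    for part, az, el, zoom in product(part_ids, azimuths, elevations, zooms):
        yield {"part": part, "azimuth": az, "elevation": el, "zoom": zoom}

jobs = list(render_jobs(["3001", "3020"]))  # two hypothetical LDraw part numbers
print(len(jobs))  # 2 parts x 8 azimuths x 3 elevations x 3 zooms = 144
```

Adding lighting and focus variation as extra axes multiplies the count the same way, which is how "hundreds of images per part" falls out almost for free.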
After endless messing around I finally bit the bullet and trained a neural net, from 0 to 100 in a few weeks, and it is rapidly getting more usable now.
The feature-detection code may get a second life though: as a meta-data vector to be embedded into the net. But only if it is really necessary.
I'm quite curious though whether you can get your method to work, especially for parts that are very rare and for rare colors.
I was assuming that at minimum I'd need to do a lot of filtering in order to get the camera images and renders into a state where they are similar enough to work for training.
Any chance that you'll be releasing source code for this project and/or your labeled dataset?
Yes, but not yet. It needs to get a lot better before I'm going to stamp my name on it as a release. Right now it is rather embarrassing from a code-quality point of view; it has been ripped apart and put back together several times now, and every time it gets a lot better, but we're not there yet.
Just so I understand the process correctly: did you manually sort some pieces to get a labeled training set, feed those through the machine, train the NN with that, then manually correct the errors when sorting unknown pieces, add all those pictures to the same training set, and finally run the full training again? How many labeled images do you need to start getting acceptable performance? Are you training the NN continuously with every new image, or from scratch with an ever-growing data set?
Do you think a stereo camera would improve the classification in a meaningful way, or maybe a second camera from a different angle?
Yes, but that cycle repeats every day, so the training never really stops: it runs at night and the machine runs during the day. Today it sorted close to 10K parts; those images will now be added to the training set, then I'll start the training overnight, so tomorrow morning my error rate should be much better than it was today, and so on.
> How many labeled images do you need to start getting acceptable performance?
Good question! Answer: I don't really know, but judging by how fast the error rate is improving, somewhere between 100 and 200 per 'class', so that will be 200K images or so when it is done with the 1000 most commonly found parts.
> Are you training the NN continuously with every new image, or from scratch with an increasing data set?
From scratch with every expanded set. I suspect that's the better way, but I have no proof. My intuition is that it is hard to make a neural net learn something entirely new that it has not seen before, and every day totally new stuff gets added. So I re-train all the way from noise.
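The nightly from-scratch cycle could be orchestrated roughly like this; `init_model` and `train` are hypothetical stand-ins for the real model construction and fit calls (Keras in the author's case), reduced to stubs so the loop structure is visible:

```python
# Sketch of the nightly retrain-from-scratch cycle described above.
# init_model and train are placeholder stubs, NOT the author's real code.
import random

def init_model(seed):
    """Fresh random weights every night -- nothing is carried over."""
    rng = random.Random(seed)
    return {"weights": [rng.random() for _ in range(4)]}

def train(model, dataset):
    """Placeholder for a full training run over the entire labelled set."""
    model["trained_on"] = len(dataset)
    return model

labelled = []                     # grows every day with corrected images
for day in range(3):
    labelled += [f"day{day}_img{i}" for i in range(100)]  # today's corrections
    model = train(init_model(seed=day), labelled)         # restart from noise
print(model["trained_on"])  # -> 300: the last run saw the whole accumulated set
```

The key point is that each night's model starts from random initialization but sees every image collected so far, which sidesteps the continuous-learning problem entirely at the cost of longer training runs.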
> Do you think a stereo camera would improve the classification in a meaningful way, or maybe a second camera from a different angle?
You're getting close to the secret sauce :)
I guess my lack of knowledge in the field shines through. Continuous learning is apparently under active research at the moment, and this blog post about it is less than two months old, so your intuition was right.
If I were to guess the secret sauce I'd say that a mirror might be involved. Is depth information not worth the trouble for these kinds of classification problems?
You might be right there :)
> Is depth information not worth the trouble for these kinds of classification problems?
Yes, it would be, but there's much more to it than that. Also keep in mind that some parts are almost transparent, and no matter what background color you come up with, there will be a bunch of Lego parts that match it.
Colored strobes may also help separate out different-colored pieces, although I expect that would be overkill.
This way you could also cross-reference with part seller sites to see the going rate for a part and determine whether it's worth your time to separate it manually. Have a bin for rare parts worth separating by hand.
Yes, but that did not give me enough accuracy, so now I train from scratch. I had hoped to avoid having to train the conv layers.
> also, did you use batch normalization?
> also, did you try ResNets?
> you probably don't care at this point, but all of this would __massively__ decrease training time
Oh, I care all right :) I'm re-training the net every evening (it's running right now) after adding another batch of training images.