Just so I understand the process correctly: did you manually sort some pieces to get a labeled training set, feed those through the machine, train the NN with that, then manually correct the errors when sorting unknown pieces, add all those pictures to the same training set, and finally run the full training again? How many labeled images do you need to start getting acceptable performance? Are you training the NN continuously with every new image, or from scratch with an increasing data set?
Do you think a stereo camera would improve the classification in a meaningful way, or maybe a second camera from a different angle?
Yes, but that cycle repeats every day, so the training never really stops: it runs at night and the machine runs during the day. Today it sorted close to 10K parts. Those images will now be added to the training set and I'll start the training overnight, so tomorrow morning my error rate should be much better than it was today, and so on.
> How many labeled images do you need to start getting acceptable performance?
Good question! Answer: I don't really know, but judging by how fast the error rate is improving, somewhere between 100 and 200 per 'class'. So that will be 200K images or so when it is done with the 1000 most commonly found parts.
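For anyone who wants to redo that back-of-the-envelope estimate with their own numbers, here is a minimal sketch. The function name and its parameters are made up for illustration; only the figures (100 to 200 images per class, 1000 classes) come from the comment above.

```python
def labeled_images_needed(num_classes, per_class_low, per_class_high):
    """Return a (low, high) estimate of the total labeled images required.

    Hypothetical helper: it simply multiplies the number of part classes
    by the per-class range quoted above.
    """
    return num_classes * per_class_low, num_classes * per_class_high


low, high = labeled_images_needed(num_classes=1000, per_class_low=100, per_class_high=200)
print(f"roughly {low:,} to {high:,} labeled images")
```

The upper end of that range is the "200K or so" figure.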
> Are you training the NN continuously with every new image, or from scratch with an increasing data set?
From scratch with every expanded set. I suspect that's the better way, but I have no proof. My intuition is that it is hard to make a neural net learn something entirely new that it has not seen before, and every day totally new stuff gets added. So I re-train all the way from noise.
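To make the daily cycle concrete, here is a rough sketch of "sort during the day, fold in the corrected labels, retrain from scratch at night". It is not the actual setup: a toy nearest-centroid classifier on made-up 2D points stands in for the neural net and the camera images, and every name in it (`train_from_scratch`, `run_days`, the `part_N` labels) is hypothetical.

```python
import random
from collections import defaultdict


def train_from_scratch(dataset):
    """Fit a brand-new nearest-centroid model on the full dataset (no state reused)."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for (x, y), label in dataset:
        sums[label][0] += x
        sums[label][1] += y
        counts[label] += 1
    return {lbl: (s[0] / counts[lbl], s[1] / counts[lbl]) for lbl, s in sums.items()}


def predict(model, point):
    """Return the label whose centroid is closest to the point."""
    px, py = point
    return min(model, key=lambda lbl: (model[lbl][0] - px) ** 2 + (model[lbl][1] - py) ** 2)


def error_rate(model, samples):
    wrong = sum(1 for point, label in samples if predict(model, point) != label)
    return wrong / len(samples)


def make_parts(rng, centers, n):
    """Simulate n parts: noisy 2D features plus the (human-corrected) true label."""
    parts = []
    for _ in range(n):
        label = rng.choice(sorted(centers))
        cx, cy = centers[label]
        parts.append(((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)), label))
    return parts


def run_days(days=5, parts_per_day=200, seed=0):
    rng = random.Random(seed)
    all_centers = {f"part_{i}": (3.0 * (i % 4), 3.0 * (i // 4)) for i in range(8)}
    training_set, model, history = [], {}, []
    for day in range(days):
        # Assumption: one previously unseen part type shows up each day.
        known = {k: all_centers[k] for k in sorted(all_centers)[: 4 + day]}
        todays_parts = make_parts(rng, known, parts_per_day)
        if model:
            # Daytime: yesterday's model sorts today's parts; record its error rate.
            history.append(error_rate(model, todays_parts))
        # Evening: corrected labels join the ever-growing training set.
        training_set.extend(todays_parts)
        # Night: retrain from scratch on everything collected so far.
        model = train_from_scratch(training_set)
    return history, model, training_set


history, model, training_set = run_days()
print("daytime error rate per day:", [round(e, 3) for e in history])
```

The point of the sketch is only the loop structure: the model is thrown away and rebuilt on the whole expanded set each night, rather than being updated incrementally with the new day's images.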
> Do you think a stereo camera would improve the classification in a meaningful way, or maybe a second camera from a different angle?
You're getting close to the secret sauce :)
I guess my lack of knowledge in the field shines through. Continuous learning is apparently under active research at the moment, and this blog post about it is less than two months old, so your intuition was right.
If I were to guess the secret sauce I'd say that a mirror might be involved. Is depth information not worth the trouble for these kinds of classification problems?
You might be right there :)
> Is depth information not worth the trouble for these kinds of classification problems?
Yes, it would be, but there's much more to it than that. Also keep in mind that there are parts that are almost transparent, and that no matter what background color you come up with, there will be a bunch of Lego parts that match it.
Colored strobes may also help separate out differently colored pieces, although I expect that would be overkill.