Sorting 2 Tons of Lego, Many Questions, Results (jacquesmattheij.com)
322 points by darwhy on June 28, 2017 | 205 comments

I'm begging you, please post a full-speed video that shows a seemingly random assortment of pieces sorted into boxes with identical color/size/shape pieces. I think it would be insanely satisfying for many people to watch. I would literally grab popcorn and watch it for hours (and I think many others would too).

Given your current setup, it might be easiest to just take one bin of identical pieces and then sort it by color into various bins. I'm salivating over what this would look like.

If you posted that video, I bet it would rack up millions of YouTube views.

Ok, will do but first I have to speed up that first feeder stage because I really don't want to stand there for an hour to drop pieces on the recognition belt.

Any good ideas for that first stage are very welcome. I've looked at all kinds of mechanisms (belts, pickers, vibrating drums) and so far none has the required speed and regularity while being quiet enough to be used in a home.

https://www.youtube.com/watch?v=wqAsOxUrXIw at about the 54 second mark you can see how this system does size sorting of trash (legos are not trash)

A combination of pins and holes will get you roughly sized items.

A vibrating table with lots of W-shaped channels underneath is going to help too.

A belt with "scoops" is a great way to feed out of your initial hopper; if you drop it onto a perpendicular belt (one with W grooves) you're going to get a good initial separation more naturally.

Air knives and puffers are your friend as well. Since you already have a detection system running it twice might be of benefit to you, just puff off anything that you can't sort as a single item.

I don't think "quiet" and "lego" really go together so no happy solution for you there other than sound isolation.

What about a big subwoofer for a car, and you can try driving it with random sound waves until you find one that works. Or go through a set of tones that can hit the resonant frequencies of various pieces?

Noise is mentioned as a problem. That said, hearing it sing to its lego would be interesting.

That's a very nice idea, I will try that. Thank you!

Proposed feeder design:

Massive hopper (non-tapered if the tapering causes jams), feeding into slowly rotating (60 rpm) spiral-inner drum, falling onto very fast moving (10 feet per second) belt.

Parts get spread into a ~2 inch thick layer inside the drum, and then falling (at least a foot) onto the fast moving belt will then likely separate them further. For parts that end up entangled, end up on the belt too close together, or are unrecognisable in the orientation they fall, simply redirect them back into the top of the hopper to be resorted.

Even a 50% successful sort rate should be good enough.
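
To see why a 50% success rate can be tolerable when rejects are fed back into the hopper, here is a minimal sketch. It assumes every pass succeeds independently with the same probability, which the proposal does not state, and the function names and the 5 parts per second figure are purely illustrative.

```python
def expected_passes(success_rate: float) -> float:
    """Mean number of trips over the belt until a part is sorted.

    Assumes independent passes, so the count follows a geometric distribution.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return 1.0 / success_rate


def effective_throughput(belt_parts_per_second: float, success_rate: float) -> float:
    """Parts actually binned per second when failed parts are recirculated."""
    return belt_parts_per_second * success_rate


if __name__ == "__main__":
    print(expected_passes(0.5))            # 2.0 passes per part on average
    print(effective_throughput(5.0, 0.5))  # 2.5 parts binned per second
```

Under those assumptions a 50% rate simply halves the net throughput rather than breaking the design.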

Hmm. Tough problem. What about a couple of "tables" which vibrate back and forth at a really shallow angle, just enough to move pieces in one direction? I have gravel sorting tables with screens in mind, but the same principle with a solid table to just break down a heap into a flat layer may just work.

I think that a couple of tables, cones, and dividers (to create lanes of pieces) could do a reasonable job of turning a heap of pieces into a steady stream of individual pieces.

what about a screw feed?

I've tried that but it kept damaging the pieces. Is there a way to 'tame' a screw feed so that it doesn't eat up pieces that are sloped between the tube and the screw?

A coat of teflon could be sufficient for the taming of the screw.

Make the outer edge of the screw have door brushes[1]?

[1]: http://www.formseal.co.uk/

That is an excellent idea. And then elongate the screwthread spacing as it gets nearer to the drop off point to separate the pieces further.

Palm sanders are frequently used to vibrate stuff like concrete forms (There are also specialized tools).

How about a concrete drum mixer that was slowly rotating and slowly tilted :)

Too noisy

Heck, make it a Twitch livestream and you will get corrections via chat.

Probably become the next big twitch thing too. Probably making some nice cash.


Yes crowdsource the corrections from the viewers to automatically feed into the training pipeline. Sounds Twitchworthy.

What about 4chan? As soon as they found out... Trolls, trolls everywhere!

I don't think there are that many troll-shaped bricks...

I stand corrected. :D

I was talking about 4chan users trolling the system with fake data =P

> The next morning I woke up to a rather large number of emails congratulating me on having won almost every bid (lesson 1: if you win almost all bids you are bidding too high)

That reminds me of the economist George Stigler's observation: "If you never miss a plane, you're spending too much time at the airport".

That's a strange observation. I would think an economist of all people would know that the negative utility of missing a plane is huge compared to that of spending a little unnecessary time at the airport. I'm rarely doing anything productive with the time prior to the airport arrival on "travel days" anyway.

It's an analogy for understanding the idea of fair value. If you want to spend a "fair" amount of time at the airport, it means that you need to balance the utility of missing one plane against the utility of always spending a lot of time at the airport in order to never miss a plane.

If you take 1000 planes in your life, and spend an extra 1 hour per plane in order to be sure that you will be on time, that's 1000 hours in your life. If, instead, you cut down that hour to 30 minutes, and that makes you miss 3 planes in your life, you'd still have "earned" 500 hours, while losing 3 planes. It's a better trade off overall.

But yeah, just an analogy, I bet the guy never missed a plane...
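
The trade-off in this comment can be written out as a few lines of arithmetic. This is only a sketch: the 1000 flights, 30 minutes and 3 misses come from the comment, while the cost of a missed flight in hours is an assumed parameter (24 hours is the figure raised in a reply further down), and the function names are made up for illustration.

```python
def hours_saved(flights: int, buffer_cut_hours: float) -> float:
    """Total time saved by trimming the airport buffer on every flight."""
    return flights * buffer_cut_hours


def net_hours(flights: int, buffer_cut_hours: float,
              missed_flights: int, hours_lost_per_miss: float) -> float:
    """Time saved minus time lost to missed flights."""
    return hours_saved(flights, buffer_cut_hours) - missed_flights * hours_lost_per_miss


def break_even_hours_per_miss(flights: int, buffer_cut_hours: float,
                              missed_flights: int) -> float:
    """How costly a single miss may be before the shorter buffer stops paying off."""
    return hours_saved(flights, buffer_cut_hours) / missed_flights


if __name__ == "__main__":
    print(hours_saved(1000, 0.5))                   # 500.0
    print(net_hours(1000, 0.5, 3, 24.0))            # 428.0
    print(break_even_hours_per_miss(1000, 0.5, 3))  # a little under 167 hours
```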

If you have lounge access, then the extra time at the airport is just as productive as at the office. No lost time, and plenty of peace of mind.

Depends how often you travel by plane: all those little bits of lost time do add up.

The concept of 'lost time' should be abandoned, so our lives become less hectic again.

Do they add up to more than the time you spend waiting for the next flight, which may take a full 24 hours?

I guess that's a rhetorical question, because the answer is the same to what you wrote; nvm.

A few years ago a friend and I started work on a similar machine for sorting Magic: the Gathering cards. We weren't using any machine learning (this was beyond our capabilities) but our design was similar, except the bins were on a moving platform which slowed down as the weight of the cards in bins increased - due to the cheap parts/weakness of the motors we were using. Unfortunately it occupied an entire room in my house and was never completed and ultimately dismantled.

That's really interesting actually. In the distant past I spent more time sorting the cards than playing the game so there is probably a good trade off on automating it.

I went and bought a duel deck the other day to play with the wife so this is going to become another issue in the future again.

There are card identifier apps that connect to your webcam - you just flash a card under the camera and it will identify the card and set (used most often to give a quick market price or help you catalog your collection digitally) but you could definitely fit that into a larger automation project.

Yeah I remember seeing these for the first time not long ago. We were scraping magiccards.info without the owner's permission to use color coherence vector for comparison. As software engineers this was the really fun stuff for us - trying to figure out computer vision problems having absolutely no experience.
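
For readers who have not met the technique, a color coherence vector splits each color bucket's pixel count into "coherent" pixels (part of a large same-colored region) and "incoherent" ones. The sketch below is not the code described in this comment, which is not shown; it assumes the image has already been quantized into a small grid of integer color buckets, uses 8-connected regions, and compares two vectors with a simple L1 distance. The function names and the threshold `tau` are illustrative.

```python
from collections import deque


def color_coherence_vector(grid, tau):
    """Return {bucket: (coherent_pixels, incoherent_pixels)} for a 2D grid of buckets.

    A pixel is coherent when its 8-connected same-bucket region has at least tau pixels.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    ccv = {}
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            bucket = grid[r][c]
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            # Breadth-first flood fill over the region containing (r, c).
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx] and grid[ny][nx] == bucket):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            coherent, incoherent = ccv.get(bucket, (0, 0))
            if size >= tau:
                coherent += size
            else:
                incoherent += size
            ccv[bucket] = (coherent, incoherent)
    return ccv


def ccv_distance(a, b):
    """L1 distance between two coherence vectors; smaller means more similar."""
    total = 0
    for bucket in set(a) | set(b):
        ac, ai = a.get(bucket, (0, 0))
        bc, bi = b.get(bucket, (0, 0))
        total += abs(ac - bc) + abs(ai - bi)
    return total
```

A card scan would be matched against reference images by picking the reference with the smallest distance; on real photos the quantization step and the choice of `tau` matter a great deal.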

Do you have any pictures?

No; I had the bins and the conveyer/webcam/structure in my garage for years until I finally tossed non-Lego Mindstorms parts.

Fun, what were your sorting criteria? Set alone or more?

We were trying to get it to sort by set at first - when ingesting a large collection of cards (my wife owned a brick and mortar card shop) they were often unsorted. Then with multiple passes we wanted to pull rarity, color and even alphabetize uncommons and rares [way, way too many passes]. The real goal was to be able to process a booster box quickly, so when a new set was released 40 cases in singles could be available very quickly - never even came close to that goal. It turns out that on release night customers are always willing to sort a booster box for you for a pack of cards...

Customer labor - love it!

If, like me, you find photography of piles of sorted LEGO pieces hypnotically soothing, you will probably like this album I put together:


A while back, I sorted my childhood LEGO pile before sending it to my brother. 73 pounds of LEGO sorted by hand. I'm sure it says things about how strangely wired my brain is, but this was an eminently enjoyable endeavor.

That's very impressive to do by hand. Sorting 10 kg or so back into sets takes a week by hand; how much time did this take you?

Sorting into sets would be a lot harder. Just grouping by similar pieces is a fairly fast, automatic process. I could just sort of zone out and run on pleasant zen autopilot. I didn't need to keep running counts, etc.

I found that by looking for pieces with certain shapes, my brain/eyes would lock in after a little while and it got really easy to spot them as I sifted through the pile.

What I find is that if I look at them myself I tend to match shape and color, then I have to really force myself to look for the same shape in a different color. Very strange how automatic this is. It's like some subroutine taking over.

Yes! It did take some effort to untrain myself from matching on color.

Another one like that: Looking for a certain shape, being 100% sure that you have all of them, then you start looking for a different shape and you keep on finding that previous shape over and over again.

The album says it was about a week of work. It really is a very impressive album, too!

A week is insane speed for that quantity. That's day-and-night work.

Not that bad. Usually a couple of hours in the evening after the kids went to bed for a couple of weeks.

Did you use any sieves or other tricks to speed it up?

Nope. Spread it all out on a bedsheet and worked my way through the pile. I did (roughly) one category of piece at a time, starting with the bigger pieces.

Even more impressive. Large to small is exactly the right way to do it. You seem to have thought through the problem for a bit before embarking on it, almost every choice you made is optimal or near optimal to get this done in an acceptable time.

It's interesting how much you learn about how our minds work by doing a job like this. All kinds of insights into color discrimination, shape recognition and so on, as well as how much of a job gets pushed down to our subconscious once we've done enough of it.

Sorting things by hand is very relaxing and built in to us humans. I sort beads with my daughter when we make those bead plates that you can melt with a hot iron. Or puzzle pieces, lego, whatever :)

This series is the best thing ever.

An idea: I have a three-month-old daughter who I'll definitely be buying Lego for when she gets old enough. What I'll be looking for is whatever's the least valuable in your collection -- big bins of cheap mixed stuff to see what she invents. Maybe that would be a cool side output for your project? Cheap chaff packs for kids of hackers ...

Oh that's a great idea. Thank you! I have garbage bags full of 2x4, 2x3 and 2x2 bricks in common colors that are worth relatively little. I'll ask my panel of experts (5 and 7 years old) to see what they would do with such a treasure trove.

I second that- when I was a kid I had tons of common bricks (1- and 2-by-N's) and regularly built decent-sized houses with full-height walls and multiple rooms and such. My 5 year-old daughter recently graduated from Duplo to Lego and I don't see a good way of building up a decent foundation of just "normal" bricks from which she can build lots of things. I definitely feel like there's a lack of sets with good quantities of them. Even the large "Classic" sets are pretty light on them compared to other pieces, and what they do have is spread out over like 15 colors, so you can forget about trying to make anything match.

I've looked on eBay, but most of what I see in bulk there are lots of just one size or one color bricks, or just pounds of completely random pieces. I'd love it if I could find something like an assortment of 1,000 mixed 1-by-1/2/3/4/6/8 bricks in 4-5 colors, or the same with 2-by-N bricks.

Not a fanatic builder at all, been almost two decades since I built something with Lego, but really surprised to see that the categories are not color separated.

If you're building, any aesthetic Lego project would have color restrictions. Also I imagine some colors are worth more than others, just because they're used in more projects.

Also, color separated Lego bins just look pretty.

Assuming there's a practical limit to the number of sorting categories that rules out separation by color and function, I'd much rather have them separated by function as shown.

Is it easier to find the functional piece you need in a bin full of same-colored pieces, or to find the color you need in a bin full of same-function pieces? For me, the latter would be far faster/easier.

That said, I do agree that individual pieces labeled by function and color is probably ideal for sale to buyers looking for a single elusive piece (or groups of pieces).

I had mine separated out by type (function) as you describe, and unsorted by color. That worked, and would continue to do so until the collection reached a scale far greater than anything I ever contemplated.

You're right about the need for individual separation when selling parts. Bricklink does that, so that people in need of, say, exactly four orange cheese wedges, can go and buy them. That seems to be the preferred style for most builders, since it makes possible a mapping from a parts list to a purchase order; there are other styles in use, including bulk sale by mass or approximate quantity, but the individual price for a given part is much lower when sold in bulk than alone. So if it can be made feasible to automatically, or mostly automatically, sort down to the individual part level by type and color, that'll show the best return on investment per part at the point of sale.

(Update: based on jacquesm's comments elsethread, I don't suppose I can any longer recommend selling on Bricklink. But I'd imagine the pricing effects I describe are similar elsewhere.)

If you have a bin of different Lego pieces of the same color, you have to carefully look for the piece you want.

If you have a bin of the same pieces in various colors, you just pick the color you want.

I suspect this is our evolutionary trained neural net looking for fruits in vegetation that helps us here.

It makes me wonder if there are species (that have full color vision) that are better at the opposite. Is shape discrimination simply harder than color discrimination, or is it related to our evolutionary history as you posit?

I would say color discrimination is easier because it's basically 1-dimensional. In the best case you could do it with a resolution of a single photoreceptor. If you account for ambient lighting, shadows, etc you still just have a 2-dimensional task.

Shape discrimination is a 3-dimensional task. In the worst case you need to change your point-of-view or the orientation of the object.

Adding color sorting is fairly easy, the only hard to distinguish colors are the grays, especially old and new gray.

I think this is more a proof of concept. You could probably easily add that sort criterion yourself.

Why not go big? Write up a business plan that involves buying 100 tons of lego, making a bigger machine with 200 bins off the belt, or whatever, and making a new brickowl / bricklink site of your own. I'm sure you could find investors, you've got great publicity already. And you could propose it as a general sorting AI (or web sale of small item) firm in the long term, with the lego being your initial application.

You'd need to solve the inverse problem of how to efficiently go from bin to packing container, which doesn't seem too hard. Rotating carousels of bins with little robots or something. Set the item on a tray, and if it isn't the right item, puff it back into the bin.

If it won't work small it won't work big. First learn to walk. If I can overcome all the challenges and there is a sufficiently large market for the end product and when all the costs (energy, for instance) have been accounted for there is still a business case left then this might be worth it. But there is a ton of work to be done before you could begin to answer those questions. For now it is a fun project and I'm trying hard to keep it that way. I've been approached by some of the larger bulk resellers that would love to get rid of the humans they employ, none of them made a good enough case why I should partner with them or even seemed to be nice people.

The general nature of neural nets for sorting and classification I think is well recognized in industry and there are many companies that are capitalizing on this (and have been for some time). I highly doubt I'm doing anything original in that sense.

If someone wants to give you 100k euros and develop some interesting technology, it seems like it would make the project more fun.

What does anyone else think? Should Jacques open it up for investment?

I doubt anybody would give 100k euros. What is more likely to happen is that they would want equity in some company, a seat at the non existent board table, some wild eyed exit plan 3 years down the line and if they don't get their 20% ROI within the first year they'll be staging a minority shareholder revolt and possibly a lawsuit due to mis-management.

For 100K I will gladly sidestep all those problems and self-finance.

The problem is that 100k is nothing when building machines for professional/industrial use. Cost of material and various prototypes adds up quickly.

True. But there are a number of business models that don't require that.

If it were me, I'd position it as an AI startup. I think Jacques might overestimate how awesome existing solutions are for sorting. Customer provides belts and bins, he provides software. The lego is a proof of concept and prototype.

Or he could do a lego website startup, and expand his project and purchase inventory, and operate a warehouse of lego. Bootstrap this way until the software is ready for general use.

Or he could construct systems like the one he has built and sell them to other lego resellers, and expand into industrial or agricultural sorting in time.

Are there CAD models available for the various Lego components? I'm wondering if, given such models, you could use rendering software to generate labeled training images. That could allow you to get a large training set with many images for each kind of component in many different orientations, even for components you do not yet have.

If the CAD models are not available, how much work is it to make a CAD model given an instance of a particular component? I'd expect that for some components, such as those NxM plates, you'd only need to build one model by hand in CAD software, and then could algorithmically generate the models for other sizes.
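
As a rough illustration of generating NxM plates algorithmically, the sketch below derives nominal dimensions and stud positions from the stud counts. It assumes the commonly quoted nominal figures of an 8 mm stud pitch and a 3.2 mm plate height and ignores tolerances, the underside, and everything else a renderer would need; `plate_spec` and its output layout are hypothetical.

```python
STUD_PITCH_MM = 8.0      # nominal stud-to-stud spacing (assumed)
PLATE_HEIGHT_MM = 3.2    # nominal plate height without studs (assumed)


def plate_spec(studs_x: int, studs_y: int) -> dict:
    """Nominal outline and stud centers for a studs_x by studs_y plate."""
    if studs_x < 1 or studs_y < 1:
        raise ValueError("a plate needs at least one stud in each direction")
    centers = [
        ((i + 0.5) * STUD_PITCH_MM, (j + 0.5) * STUD_PITCH_MM)
        for i in range(studs_x)
        for j in range(studs_y)
    ]
    return {
        "size_mm": (studs_x * STUD_PITCH_MM, studs_y * STUD_PITCH_MM, PLATE_HEIGHT_MM),
        "stud_centers_mm": centers,
    }


if __name__ == "__main__":
    spec = plate_spec(2, 4)
    print(spec["size_mm"])               # (16.0, 32.0, 3.2)
    print(len(spec["stud_centers_mm"]))  # 8
```

A spec like this could feed a CAD or rendering script that stamps one stud model at each center, so only the stud and the slab need to be modelled by hand.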

I don't know CAD models, but it would probably make sense to scrape the lego website where you can select all their bricks. Each named brick has 8 images (various orientations):


I find it hard to believe that there are only 1421 bricks, including color combinations.

Example: "Apple with leaf", Color Family: Dark Green, Exact Color: Bright green, Category: Foodstuff Element ID: 4107050, Design ID: 33051, orientation 8: https://sh-s7-live-s.legocdn.com/is/image/LEGOPCS/4107050_s8

...This also makes me wonder whether it would make sense to set up multiple cameras, to get multiple orientations for every block. Use the image that has the lowest classification error, but use all orientations to train the network.
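
A sketch of the multi-view idea: at sorting time the true classification error is unknown, so this version uses the classifier's own confidence as a stand-in and keeps the most confident view. The input format (one label-to-probability dict per camera view) and the function name are assumptions, not anything from the actual machine.

```python
def pick_best_view(view_predictions):
    """Return (label, probability) from the most confident view.

    view_predictions: list of dicts mapping label -> probability, one per view.
    """
    best_label, best_prob = None, -1.0
    for probs in view_predictions:
        label = max(probs, key=probs.get)  # top label for this view
        if probs[label] > best_prob:
            best_label, best_prob = label, probs[label]
    return best_label, best_prob


if __name__ == "__main__":
    views = [
        {"2x4 brick": 0.6, "2x3 brick": 0.4},  # ambiguous top-down view
        {"2x4 brick": 0.2, "2x3 brick": 0.8},  # clearer side view
    ]
    print(pick_best_view(views))  # ('2x3 brick', 0.8)
```

Averaging the probabilities across views is an equally plausible alternative; which works better would have to be measured.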

The Lego pick-a-brick store is very very limited.

The images are not very useful to me because the images the machine takes are very different from those. But maybe it would somehow contribute. I'd be very wary of using other people's copyrighted content though.

I do use multiple orientations.

Point taken re copyright. Re similarity, it may be possible to write some sort of ImageMagick filter that makes the bricks appear more realistic somehow. (But yeah, this would probably only be useful as a starting point for training the very rare bricks)

> set up multiple cameras

I believe he mentioned somewhere that he's using mirrors to achieve essentially the same effect with a single camera.


Jacques, John Tooker might be able to help you with the sorting issue: http://lego.jtooker.com/sorting/.

Thank you! That's a great hint.

I actually just gave away all my Lego to a colleague whose three sons are much better able to appreciate it than I am, and (because!) it's been quite a few years since I actually built anything.

But, that said, were I looking to buy parts, I'd hesitate to do so in bulk by these categories, unless the price were very good indeed - between the inability to match a parts list to a purchase order, and the extensive manual labor required to sort parts sold by mass at a useful degree of granularity, they'd need to be very cheap in order to make the purchase worth my while.

(On the other hand, most of my builds back in the day were spacecraft of various fictional types, and I heavily favored complexity, which means a large number of small parts, designing in wiring channels for LED lighting, and the like. I'm not sure how representative my comments here are of any other style.)

At some point you can completely reverse entropy and re-create the original sets :-) But in terms of just selling off excess Lego, looking at the mix of the builder sets would let you figure out a sort of SKU you could put together and sell on eBay or elsewhere as a 'builder pack 1', 'builder pack 2' etc.

> At some point you can completely reverse entropy and re-create the original sets :-)

Already did that for a batch, about 150 partial sets as a result. Those would then need to be completed and checked by hand, I don't think that can be done profitably, but maybe there is some way.

> But in terms of just selling off excess Lego, looking at the mix of the builder sets would let you figure out a sort of SKU you could put together and sell on eBay or elsewhere as a 'builder pack 1', 'builder pack 2' etc.

Yes, and right now I'm looking for what the contents of those packs should be.

I think your best bet to minimize human effort is this:

* Run the machine over a sample of raw parts, just counting what it sees. Use that estimate to predict which sets can be completely made with high certainty.

* Next choose 6 types of set which can be made, and program the machine to put into 6 bins the correct parts with the correct ratio to form the sets. Anything uncertain or not within the ratios goes in the junk box.

* Now rerun the contents of each box, this time programming the machine to deposit one complete set into each bucket, again discarding parts which don't have a high probability of being correct. Make it beep when it believes a set is complete.

* Now you have an operator weigh the complete set, and if it matches the correct set weight, bag and ship it (self sealing padded bag and auto-printed label), and if not, throw it all back into the input hopper.

Total human time per set should be sub 30 seconds, so assuming a large enough market, it sounds profitable. Assuming 300 pieces per set and 5 parts sorted per second with a 50% failure rate, you would need 4 machines to keep a human busy for the 2nd part of the process. The first part is slower, but could run without supervision.
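
One reading of those numbers that reproduces the figure of 4 machines is sketched below. It assumes the 30 seconds of operator time is counted per successful set and that a failed attempt costs the machine a full set's worth of sorting time; neither is spelled out above, and the function name is illustrative.

```python
import math


def machines_per_operator(pieces_per_set: int, parts_per_second: float,
                          failure_rate: float, operator_seconds_per_set: float) -> int:
    """Machines needed so that one operator always has a finished set waiting."""
    seconds_per_attempt = pieces_per_set / parts_per_second
    # With a given failure rate, only (1 - failure_rate) of attempts yield a good set.
    seconds_per_good_set = seconds_per_attempt / (1.0 - failure_rate)
    return math.ceil(seconds_per_good_set / operator_seconds_per_set)


if __name__ == "__main__":
    print(machines_per_operator(300, 5, 0.5, 30))  # 4
```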

Assuming you have a 95% part recognition accuracy first time round, and 99.9% the 2nd time (since you are selecting from what is already believed to be 95% the correct part mix for the job, and it's fine to have a very high reject rate), most sets of <1000 parts will be all correct. The weighing step will then probably weed out >75% of errors, and the remaining errors are likely only color related and not so serious.

With this method, you should be able to get major (ie. not colour) errors in sets of 300 to below 3%. Make that lower still by including a couple of extras of common and cheap parts.

For the remaining 1%, just send out a new pack entirely if the customer complains. It adds 1% to your costs. Big deal.
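
The accuracy estimate can be checked with a short calculation. This sketch assumes every part is recognized independently with the same accuracy, which is a simplification the comment does not claim, and treats the 75% weighing catch rate as applying to any set that contains an error; the function names are illustrative.

```python
def p_set_all_correct(per_part_accuracy: float, parts_in_set: int) -> float:
    """Chance that every part in a set is right, assuming independent errors."""
    return per_part_accuracy ** parts_in_set


def p_shipped_with_error(per_part_accuracy: float, parts_in_set: int,
                         weigh_catch_rate: float) -> float:
    """Chance that a set contains an error and the weighing step misses it."""
    p_error = 1.0 - p_set_all_correct(per_part_accuracy, parts_in_set)
    return p_error * (1.0 - weigh_catch_rate)


if __name__ == "__main__":
    print(p_set_all_correct(0.999, 300))           # about 0.74
    print(p_set_all_correct(0.999, 1000))          # about 0.37
    print(p_shipped_with_error(0.999, 300, 0.75))  # about 0.065
```

Under these assumptions the outcome is quite sensitive to set size and to the second-pass accuracy, so it is worth plugging in measured numbers before relying on the estimate.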

Wonder if there would be a copyright issue with selling original sets as well? I'm not a lawyer, but I believe in the US you can't copyright a recipe but you can copyright a recipe book.

I wonder if though, down the road, you could set up a web site with the complete inventory of pieces that you cataloged, and let people pick and choose what pieces they want for a specific order.

Then just put the pieces through the hopper to pick/pack individual orders. Maybe get a super accurate scale, weigh pieces when cataloging and weigh the end sets to help ensure order accuracy?

Again, I'm not very knowledgeable, but I think the first sale doctrine just says it's now your lego and you can sell the lego.

If you start copying the piece assortments ("lego recipes") that lego also sells, that's where I'm unsure that you're in a safe legal spot. Lego put time and effort making sets for specific purposes, and I would think that those "lego recipes" are possibly protected.

Especially if you were thinking of selling a large number of different sets, which would mean you'd be creating kind of a "lego recipe" catalog?


> Already did that for a batch, about 150 partial sets as a result. Those would then need to be completed and checked by hand, I don't think that can be done profitably, but maybe there is some way.

There's likely someone who's willing to pay X% of the set's price for a bucket that has a 95% chance of having all of the pieces, and a 100% chance of having too many pieces, including some that don't belong in the set, since you'd want to err on the side of caution.

But even that might not be profitable.

I think the problem with this would be to find sufficient sources of random bricks where the good stuff has not already been removed.

E.g. minifigs can get high enough prices in some instances that the total value of a set can be higher if you sell the minifigs and higher value bricks separately, because some people want to buy just the minifig, or need some piece that only appears in a few sets.

This would be my main issue with buying Lego to resell - I'd want to be very sure I'm not buying from other resellers where there'd be a risk it'd been stripped of the more valuable pieces.

Would it be possible to modify the sort such that it counts parts it identifies?

It already makes two kinds of logs, one textual log and an image log, I use the image log to expand the training set, the textual log for debugging purposes.

So could you change the sorting rule after X number of pieces are collected? This could then be run through again to double check/increase accuracy (i.e. build sets this way).

One idea for selling the parts would be to keep a precise inventory of everything entered into the system. A precise inventory + a database of the parts needed for any LEGO set would allow customers to order an arbitrary set for a nice price. A total system which would take random parts as input, and output packaged complete LEGO sets would be neat.
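
The inventory check described here amounts to a multiset comparison. Below is a minimal sketch using `collections.Counter`; keying parts by a (design ID, color) tuple and the sample quantities are assumptions made for illustration, not real set data.

```python
from collections import Counter


def can_build(inventory: Counter, set_parts: Counter) -> bool:
    """True when the inventory holds every part of the set in sufficient quantity."""
    return all(inventory[part] >= qty for part, qty in set_parts.items())


def reserve(inventory: Counter, set_parts: Counter) -> Counter:
    """Return the inventory left after pulling one set; the input is not modified."""
    if not can_build(inventory, set_parts):
        raise ValueError("inventory cannot complete this set")
    return inventory - set_parts  # Counter subtraction drops zero counts


if __name__ == "__main__":
    inventory = Counter({("3001", "red"): 4, ("3020", "blue"): 1})
    small_set = Counter({("3001", "red"): 2, ("3020", "blue"): 1})
    print(can_build(inventory, small_set))  # True
    remaining = reserve(inventory, small_set)
    print(can_build(remaining, small_set))  # False, the blue part is used up
```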

Ditto. I think that's definitely what would sell at the best value.

Coincidentally, it kind of simplifies the sorting problem too. Just do like Amazon and don't sort!

Instead, use the machine to figure out exactly what was in that bin you just bought. Give the bin an ID, and store it as is. Then, when putting together a set, have your software find the minimum number of bins you need to pull from to assemble the set. Run them through, and have the machine pull the parts.
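
Finding the true minimum number of bins is an instance of set cover, which is hard in general, so the sketch below uses the usual greedy approximation instead: repeatedly pull the bin that supplies the most still-missing parts. The bin IDs, part names and the `choose_bins` function are hypothetical.

```python
from collections import Counter


def choose_bins(bins: dict, needed: Counter) -> list:
    """Greedily pick bin IDs whose combined contents cover the needed parts.

    bins maps bin ID -> Counter of parts in that bin. The result is not
    guaranteed to be the smallest possible selection.
    """
    remaining = Counter(needed)
    available = dict(bins)  # shallow copy so the caller's dict is untouched
    chosen = []
    while remaining:
        best_id, best_gain = None, 0
        for bin_id, contents in available.items():
            # Parts this bin would contribute toward what is still missing.
            gain = sum((contents & remaining).values())
            if gain > best_gain:
                best_id, best_gain = bin_id, gain
        if best_id is None:
            raise ValueError(f"bins cannot supply: {dict(remaining)}")
        chosen.append(best_id)
        remaining -= available.pop(best_id)  # in-place subtract keeps positive counts only
    return chosen


if __name__ == "__main__":
    bins = {
        "A": Counter({"2x4": 3}),
        "B": Counter({"2x4": 1, "1x2": 2}),
        "C": Counter({"1x2": 2, "2x4": 2, "wheel": 1}),
    }
    print(choose_bins(bins, Counter({"2x4": 2, "1x2": 2, "wheel": 1})))  # ['C']
    print(choose_bins(bins, Counter({"2x4": 5})))                        # ['A', 'C']
```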

This is already possible, but I'm not using it because in some trial sales I've found that sets are rather hard to sell profitably:

- the sets have to be absolutely perfect to be worth something

- you need to check to make sure all parts are present, this is hard enough for Lego which starts from known quantities and brand new parts, with second hand parts and people being ultra picky about such details as year-of-manufacture of the bricks it becomes an intractable problem.

- Rare parts are really rare. In fact, the only parts you would have to document like this are the rare ones, and 'rare' is actually one of the categories that I can sort for. This gives the option to sell only the rare parts for a certain set and to leave the bulk parts to some other method. Much easier to do that profitably.

Gotcha. The gap between hobby-grade solutions and business-grade solutions blows my mind every time.

Can you choose which sets to assemble based solely on the rare pieces you have in hand (i.e. obtained), prioritized by some notion of complexity-weighted value of the set?

Said another way, assemble sets around the rare needles you have stumbled across unintentionally while sorting the haystack.

This is almost exactly what https://www.bricklink.com/ is for. Sellers post the listing of which parts they have, in what colors/quantities and at what price, and buyers can custom-order any permutations they need. To my knowledge it's the most popular Lego parts resale site among AFOLs.

This is all true. But: bricklink stores are captive so bricklink store owners are just creating brand recognition for bricklink, they can't legally sell their stores and bricklink.com has been sold to some Hong Kong entity after the original founder died. Slowly bricklink sellers are being squeezed for more of their profits because the site has critical mass.

And then there is the 'incrowd' which does everything they can to get rid of newcomers. All in all not too impressed with Bricklink in its current incarnation, the Lego group, which used to quietly promote bricklink has gone completely silent on them.

A final item about bricklink that I don't like is that they tend to sue or send C&D letters to other Lego related sites for the use of 'their' content whereas that content is all user generated, it's just that Bricklink overnight went from having all that content owned by the contributors to claiming copyright on images and terminology. They really pulled a Gracenote/Freedb stunt here.

Yikes. I haven't done any meaningful business on Bricklink since before Dan died, and it's always been a somewhat prickly place, as often happens with niche hobbies - but I had no idea how it's gone downhill since. Sad to see, but it does make me feel like I dodged more of a bullet than I might've realized when I decided to give away my own collection instead of trying to sell out of it.

It's a real pity, Dan was a super nice guy with his heart in the right place, the present owners are not in any way approaching this in a nice way.

Wait, how can Bricklink send C&D letters, unless it's actually owned by Lego? It's like eBay sending C&D letters to Craigslist because someone happens to list something that's also sold there.

That's part of the craziness. Most small Lego trading sites are not wealthy enough to defend against this so they fold.


Imo they don't have a leg to stand on with 'their' numbering system because it is based on Lego mold numbers to begin with.

That would require a robotic warehouse as well to pick the orders in a profitable way.

Possibly not, right? If you don't sort but instead use the machine to do the inventory of your bins, then you could have your software tell you which bins you are going to need to fulfill your orders for the day. The big "if" in that scenario is that your machine is not categorizing anymore but identifying. I don't have the experience to assess how much harder it is.

Then you'd go and pick the bins, run them through, and the machine assembles the sets from those bins. That's similar to the way Amazon does it. Now they have the shelves on robotic trolleys that bring them directly to the packers, but that's just a required efficiency at their scale.

I guess the problem with this scheme is that you move the problem from classifying to identifying... twice. So the precision requirement goes up. I don't know how big your dataset would need to be to require minimum human intervention.

That, and you'd waste a ton of time because in such a re-run you'd be looking for a very limited number of parts from a much larger bin. So in the end you'd run around with bins rather than picking the parts from pre-sorted and binned by color and type parts. Either way it is a boatload of work.

The closest I've come to this is where you identify which sets are present in a batch by doing a trial on a sample (that's a pretty easy statistical job), and then sort directly into sets starting from the largest sets down. That way you reduce the parts count very rapidly. So, I did this for a bit and now have 18 60 liter crates of almost complete sets which all need to be manually completed and checked. Again, not profitable.
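
For anyone wondering what "sort directly into sets starting from the largest sets down" might look like in software, here is a minimal sketch. It is only an illustration, not the code used on the machine: the set inventories, part labels and the `allocate_sets` name are all made up, and it assumes the candidate sets have already been identified from a sample.

```python
from collections import Counter

def allocate_sets(batch, set_inventories):
    """Greedily assign parts from a batch to sets, largest set first.

    batch: {part_label: count} for the whole batch (hypothetical labels).
    set_inventories: {set_name: {part_label: count}} for the candidate sets.
    Returns (per_set, leftover); per_set[name] holds the parts found for that
    set and the parts still missing.
    """
    remaining = Counter(batch)
    per_set = {}
    by_size = sorted(set_inventories.items(),
                     key=lambda item: sum(item[1].values()),
                     reverse=True)
    for name, inventory in by_size:
        # Take as many of each required part as the batch still has.
        found = +Counter({part: min(need, remaining[part])
                          for part, need in inventory.items()})
        remaining -= found
        per_set[name] = {"found": found, "missing": Counter(inventory) - found}
    return per_set, remaining
```

The "missing" entries are the parts that would still have to be completed by hand, which is where the manual work described above comes from.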

If you just want to do this to keep busy it is easy, if you want to make more than what you could make by flipping burgers it is surprisingly hard.

How many distinct Lego parts are there? Assuming they're all sorted into separate bins, picking them doesn't seem too hard. I guess sorting + putting them in the right bins would require something more complex. Surely there's a way.

About 40,000 shapes, about 100 different colors.

And picking them is stupendously hard due to the large variety in shapes and sizes.

Ah I see. Looks like that ideal setup is a ways off. It would be really cool to see it happen some day!

Thanks for the heads up. Apparently serving up large numbers of images through ssl kills the server :( Working on it. I've disabled some of the images further down in the article to make it load again, so at least it should work.

Ok, should be much better now. Ironically, your helpful cache link made things worse because it kept people clicking on a version of the page with huge images, I totally forgot to resize them. Now I've copied the small images over the larger ones so that this link works too and that appears to have solved the issue.

On commercial pecan farms, the nuts need to be graded based on color, size, and variety. Currently, pecans are tumbled in rotating chutes composed of grates of increasing spacing, so that pecans are sorted based on size. Color and variety are not filtered, ultimately resulting in a lower sales price for the farmer. Your device could take this type of sorting to the next level.

Optical sorting is pretty common in agriculture. Sounds like these guys do what you are proposing for pecans among their competitors: http://www.buhlergroup.com/global/en/process-technologies/op...

Optical sorting in Ag with devices less than $1,000 ... not common.

That's a good point, industrial machinery is very expensive, smart industrial machinery is extremely expensive.

That's a little grim. It's the method used for sorting rocks too, so that your gravel doesn't have bits that are too big.

With the machine working so well, you could run each sorted batch through it again, and sort into very specific pieces/colours, if you wanted.

You just need to re-train on those classes instead- but I bet the front-end of the tensorflow network would already be trained well enough that re-training for new classes would be very fast.

IIRC the machine sorts like that already, but you can't have a bin for every separate type - you'd need a warehouse of boxes, and the boxes would have to be of several sizes. So I think the pictures just illustrate how he's chosen to cluster for now, as an example to people who try to decide how they'd actually want their purchase to be grouped.

Correct. A typical bin contains many different part ids but it would be trivial to make other divisions.


So each item is actually identified right down to the size, colour, type, etc, but then literally bucketed into a group of similar items?

Yes, because otherwise I'd have to make very many passes through the machine to sort a particular batch.
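
A small sketch of the "identify precisely, then bucket" idea described here. The part labels, bin names and function names are invented for illustration and are not the machine's actual classes; the point is only that regrouping means editing a lookup table rather than retraining.

```python
from collections import defaultdict

# Hypothetical fine-grained labels mapped to coarse bins.
BIN_FOR_PART = {
    "brick_2x4": "basic bricks",
    "brick_2x2": "basic bricks",
    "plate_2x4": "plates",
    "technic_beam_5": "technic",
}

def bin_for(part_label, default_bin="unsorted"):
    """Look up the bin for one identified part; unknown parts go to a catch-all."""
    return BIN_FOR_PART.get(part_label, default_bin)

def group_by_bin(part_labels):
    """Group a stream of identified parts by the bin each one would land in."""
    bins = defaultdict(list)
    for label in part_labels:
        bins[bin_for(label)].append(label)
    return dict(bins)
```

Making "other divisions" would then just mean swapping in a different `BIN_FOR_PART` table for the next pass.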

I was thinking that you had trained the identifier just on the bucketized classes instead- fewer classes comprising all the similar parts. Maybe higher identification rate that way, or maybe not. Identifying right down to the specific colour and part means a heck of a lot of classes to train though.

This kind of stuff fascinates me. I've worked on software for package sorting machines in Amazon warehouses. Very similar idea to this: identify, remove from conveyor at right place/time. Only the machines are millions of dollars, run at very high speeds, and use barcode scanning for identification.

I can only dream of the kind of setup that you could make with a large budget. My only advantage is that I can put a lot of time into this project.

One approach that seems obvious is to re-sort based on orders. If someone wants 30 different piece types, but all in blue, that'll be a pain. But if someone wants any subcategory of what's already identified, that could be approached by pulling the existing box for a new sorting run.

I've thought about this machine a number of times since the previous post. In that post, there was a movie showing the machine's operation, but it was operating at a slow speed, and he's mentioned multiple times that there's been speed improvements.

Does anyone know if there's a realtime movie anywhere?

The speed of the machine is mostly a function of the mechanical feed speed of the parts. This is the hopper/belt mechanism and it is by far the slowest part of the machine.

Once the parts are on the recognition belt things move very fast, a single part can be scanned and recognized in about 30 milliseconds, so roughly 30 parts / second.

Making the machine much faster then means making that feed mechanism faster than it is today (roughly 1 part per second tops) and that's the challenge I'm looking at right now. Ideas more than welcome! Even bad ones, you never know which bad idea with a twist can become a good idea or even a very good idea.
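
To put numbers on that gap, here is a back-of-envelope sketch using only the two figures quoted above (about 30 ms per recognition, about 1 part per second from the feeder). The variable and function names are just for illustration.

```python
def parts_per_second(seconds_per_part):
    """Convert a per-part processing time into a throughput."""
    return 1.0 / seconds_per_part

recognizer_rate = parts_per_second(0.030)  # about 33 parts/s, i.e. "roughly 30"
feeder_rate = 1.0                          # parts/s, the quoted upper bound

# The slowest stage sets the throughput of the whole line.
line_rate = min(recognizer_rate, feeder_rate)

# How much faster the feeder could get before recognition becomes the limit.
headroom = recognizer_rate / feeder_rate
```

So on these figures the feeder could in principle get some thirty times faster before the recognizer would be the bottleneck.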

Not to state the obvious, but if the "hopper" is slow, why not just have multiple of them? Are they expensive? Take up too much space?

I assume you've thought of this but I'm still curious.

Really cool stuff by the way!

Space wise this is hard, also parts tend to fall in clusters so the second belt now has to be much faster than the first one to make sure there is enough separation. I do have a second belt that I have not used yet so if there is no better idea then that is what I'll end up doing.

So just to make sure I got this right, you need a way to get a single piece of any size out of a container full of random pieces, and onto the first moving belt, at a rate higher than 1 piece per second?

I'm not familiar with the project but this seems like an interesting challenge.

Yes, that's roughly the gist of it. This is a hard problem because Lego comes in many shapes and sizes, a pre-sieving step limits the pieces that hit the belt at 42 mm largest cross section. So anything below that (but possibly much longer than that, all the way up to 2x1x16) can be found in the hopper.

There are some really good suggestions in this thread, the one that stands out for me most is the brush edged screw feeder, I think that one has most properties just right to work.

Perhaps the dumbest idea would be to build a second feeder.

Would double the speed and sounds like the rest of it is fast enough that you wouldn't even need to worry about parts falling on top of each other.

That's not a dumb idea, however: space is an issue, the rest is fast enough but even so parts do tend to fall in clumps and if two such clumps were to fall at the same time it could lead to a jam which requires manual intervention.

Ideally parts would land on the belt in a very regular rhythm, without any kind of irregularity whether the parts are 8 mm round 1x1 plates or 16 long 1x1 bricks. You need about 3 cm of space on either side to make sure the right part gets blown off the belt, that's the most important thing that will eventually limit the speed.
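
A rough sketch of why that spacing ends up capping the speed. The belt speed used in the example is a made-up number (none is given here), and the sketch assumes neighbouring parts can share the 3 cm clearance; if each part needs its own 3 cm on both sides, double the clearance.

```python
def max_parts_per_second(belt_speed_cm_s, part_length_cm, clearance_cm=3.0):
    """Upper bound on throughput for evenly spaced parts of one length.

    Each part occupies its own length plus one clearance gap on the belt,
    so the belt can carry at most speed / pitch parts past the camera
    per second.
    """
    pitch_cm = part_length_cm + clearance_cm
    return belt_speed_cm_s / pitch_cm

HYPOTHETICAL_BELT_SPEED = 50.0  # cm/s, assumed for illustration only

small = max_parts_per_second(HYPOTHETICAL_BELT_SPEED, 0.8)   # 8 mm round 1x1 plate
long_ = max_parts_per_second(HYPOTHETICAL_BELT_SPEED, 12.8)  # 16-long 1x1 brick, 16 x 8 mm
```

Whatever the real belt speed is, the long parts cut the achievable rate several-fold compared to the small ones, so a mixed feed is limited by whatever clearance the air jets need.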

Oh, another factor limiting the speed is the maximum acceleration over the stretch towards the camera, otherwise the parts are still moving on the belt when they reach the camera and then you get bad images.

My five-year-old daughter mostly just wants to build castles, so if there was a less-expensive way to get a large number of 2-peg width bricks (especially in pink and purple, which she reminds me daily are her favorite colors), I could see that being useful.

Watch out, they are modern day caltrops. My son really enjoyed them and I bought him obscene amounts, after he learned to stop leaving them on the floor.

Initially, after stepping on one in the middle of the night, any that I found left on the floor went right into the trash. I assume he dug them back out, being a kid and all. However, he learned.

Eventually, he amassed a giant collection. With his permission, I gave them away to a kid that lives nearby. There were six large tote boxes and several dozen box sets. That kid now wants to be an engineer, though my son went into biology.

So, they are really great and it's even better that the old blocks still work with the new blocks. Just look out, they're caltrops, on par with a four-sided die.

I had to look up what a caltrop is:


You should play more D&D.

Yeah, watch out for them when you're not wearing shoes. Evil buggers.

Almost the only thing worse is a UK electrical plug.

You can order them.


Order all of them.

Bricklink has a much wider selection and better prices.


It seems wild to me that there's someone with a single black 2x4 brick in their inventory: https://store.bricklink.com/Teo.92&itemID=119350903#/shop?o=...

Testing, maybe? I might do likewise to get a sense of the UI, markup, sales flow, et cetera, before sinking a bunch of time listing everything.

Maybe, but they've got 213 other items for sale.

Oh - didn't see that.

If I'd decided to sell out of my collection instead of giving it away, I'd have had some parallel cases - any large enough collection is liable to include a small but significant proportion of one-off parts. I'm not sure a 2x4 black brick is likely enough to sell that it'd be worth listing alone, but I suppose someone with a sufficiently completist bent might go to the trouble.

I wonder if they bought some sets, and are selling based off known contents - since they're technically pre-sorted.

> Overall the machine works well but I’m still not happy with the hopper and the first stage belt. It is running much slower than the speed at which the parts recognizer works (30 parts / second) and I really would like to come up with something better.

It sounds like your existing hopper is reliable but slow? If that's the case, could you just make two (perhaps somewhat smaller) hoppers and merge their outputs?

It's not even all that reliable, plenty of times it will vomit 5 to 10 parts in one go because they got clumped together somehow and when one falls the rest will too. I've tried to improve on this by using a slide towards the second belt and this works sometimes but not always. Still plenty of 'multipart' images.

I suspect Mobile Frame Zero ( http://mobileframezero.com/mfz/ ) players would be delighted to get a source of the more esoteric bricks that go into their builds

Oh, for - of all the things to discover the month after I give away my entire collection...

I wonder if Jacques tried to do dropout to increase the efficiency? Also, if he made the trained neural net available you could make a Lego identifier app.

I've been thinking about making such an app. There are some problems in using the net as it is to power an app: the machine has a uniform background as well as multiple images of the same piece. I don't think the net as it is right now would easily work to recognize images taken with a smartphone. But if I had access to a large number of such smartphone taken images I do think that I could use the method I have right now to help label an initial training set from which a much better network could be derived.

Could you artificially vary the background of your current training set to create a larger and more varied set of backgrounds to train against initially. That is, if it's a uniform color or pattern just treat it like a green screen. Then use real photos with different background and lighting conditions to refine it later?

Yes, this is possible, already I know with fairly large accuracy where the part is in the image and changing or texturing the background is possible. But the optical setup is quite complex and hard to replicate with a smartphone, that would need some work.
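
A minimal sketch of the green-screen style augmentation being discussed, assuming a per-pixel mask of where the part is already exists. It uses plain nested lists so it runs without any libraries (real code would more likely use image arrays), and all function names are invented for illustration.

```python
import random

def replace_background(image, part_mask, background):
    """Keep pixels where part_mask is True, take the rest from background.

    image, background: rows of (r, g, b) tuples; part_mask: rows of bools.
    """
    return [
        [pix if keep else bg for pix, keep, bg in zip(img_row, mask_row, bg_row)]
        for img_row, mask_row, bg_row in zip(image, part_mask, background)
    ]

def flat_background(height, width, rng):
    """One randomly coloured flat background of the given size."""
    colour = (rng.randrange(256), rng.randrange(256), rng.randrange(256))
    return [[colour] * width for _ in range(height)]

def augment(image, part_mask, copies, seed=0):
    """Return `copies` variants of one labelled image on different backgrounds."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    return [
        replace_background(image, part_mask, flat_background(height, width, rng))
        for _ in range(copies)
    ]
```

Flat colours are the simplest case; textured or photographic backgrounds would slot into the same `replace_background` step. None of this addresses the difference in optics between the machine and a phone camera.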

You may be able to normalize the labeled images by removing the background. I found some OCR examples of doing that with blurring and subtraction. The same normalization steps could be run on smartphone images taken with a consistent background.

Hm. Ok, so 'white sheet of paper' would qualify I guess. Then only white parts would be hard :)

Maybe search Flickr/google images / imgur with the lego tag ? Or ask the lego subreddit?

Good ideas, maybe just make the app, explain that it will work lousy in the beginning but will get better over time?

In part 1 you mention that under vibration the bricks self assemble into bridges that obstruct the conveyor belt, and you want to minimize that.

How about maximizing it instead? Try to build bridges as long as possible? Or some other fitness function?

In theory you could control the vibration sequence and the part feeding sequence (once you have sorted all the parts :) ) and use an evolutionary algorithm...

That's a hilarious idea. I'm somewhat tempted to try this. I've managed to bridge 35 cm 'by accident', I'm pretty sure you could go substantially larger than that with relatively little work.

Something I've wanted to try for years is what about a Lego fractional distillation tower? Similar to how oil refineries use a tower to sort hydrocarbon compounds by length based on their volatility. Except in this case instead of using heat we would use air blowing upward in a long tube to make the pieces "volatile". Picture a 6" diameter clear tube stretching floor to ceiling with a powerful blower at the base. Or a smaller version could be prototyped using an air popcorn popper with the heating element disabled.

Essentially it would sort pieces according to their free-fall terminal velocity. You could exploit the Venturi effect to make the bottom of the tube have a higher velocity than the top, or simply add bleed-air holes at progressive levels. There could be take-off doors at different levels of the tube. A nice property is that if the ingress and egress mechanisms are designed elegantly, it could run continuously.

(Although, that last point is one reason I've never built it: it seems like there would be a fine line between "air powered sorter" and "air powered rock tumbler" and I worry it could wear the corners of all my Lego bricks if it goes wrong.)

Another potential issue is that there might not be much logical relationship between the pieces which share similar terminal velocities, and different pieces may have different or metastable terminal velocities depending on initial orientation.
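
To get a feel for how much orientation matters, here is a sketch using the standard drag-balance formula v_t = sqrt(2 m g / (rho A C_d)). The mass and drag coefficient are guesses for illustration only, and the plate dimensions are nominal (8 mm stud pitch, 3.2 mm plate height); real tumbling parts will not hold one orientation.

```python
import math

AIR_DENSITY = 1.2  # kg/m^3, roughly room temperature at sea level
G = 9.81           # m/s^2

def terminal_velocity(mass_kg, frontal_area_m2, drag_coefficient):
    """Speed at which drag balances weight for a given frontal area."""
    return math.sqrt(
        2 * mass_kg * G / (AIR_DENSITY * frontal_area_m2 * drag_coefficient)
    )

# A 2x4 plate, nominally 16 mm x 32 mm x 3.2 mm.
PLATE_MASS = 0.001  # kg, assumed for illustration
ASSUMED_CD = 1.0    # assumed, and taken to be the same in both orientations

flat = terminal_velocity(PLATE_MASS, 0.016 * 0.032, ASSUMED_CD)      # falling face-on
edge_on = terminal_velocity(PLATE_MASS, 0.032 * 0.0032, ASSUMED_CD)  # falling on its long edge
```

With the same drag coefficient the ratio depends only on the areas: the edge-on plate has one fifth of the frontal area, so its terminal velocity is about 2.2 times higher. That is a big enough spread for one part to show up at two different take-off levels.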

Obviously this wouldn't be a perfect sorter, but it could be a good pre- or post- sorter for your visual system. And for my purposes it might work well since instead of adapting the system to suit my sorting preferences I could simply train my sorting expectations based on what I learn the machine tends to sort together.

Hmmm now that I have an industrial dust collector installed in my workshop, I actually already have the bulkiest / most expensive part of this invention. I just need a tall acrylic tube and I could try this!

How do asymmetry and angular momentum affect this? Or how do they affect terminal velocity in general? (For example, what can happen to a Lego plate which has a very different terminal velocity in one orientation than in another?)

> Judging by the increase in accuracy from 16K images to 60K images there is something of a diminishing rate of return here.

This is a well-known thing in machine learning. As far as I know all ML algorithms have this property, where the learning curve is basically logarithmic and each additive increase in performance requires multiplicatively more training material.
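
A sketch of what "logarithmic" implies in practice. Only the 16K and 60K image counts come from the quote; the two accuracy values below are invented so there is something to fit, and the log-curve form itself is the assumption being illustrated, not a measured result.

```python
import math

def fit_log_curve(n1, acc1, n2, acc2):
    """Fit acc = a + b * ln(n) through two (images, accuracy) points."""
    b = (acc2 - acc1) / (math.log(n2) - math.log(n1))
    a = acc1 - b * math.log(n1)
    return a, b

def images_needed(target_acc, a, b):
    """Invert the fitted curve: how many images for a target accuracy."""
    return math.exp((target_acc - a) / b)

# Hypothetical accuracies at the two dataset sizes mentioned in the article.
a, b = fit_log_curve(16_000, 0.80, 60_000, 0.90)
needed = images_needed(0.95, a, b)
```

On this made-up curve the first ten points cost 3.75 times more images, and the next five points cost another factor of about 1.94, which is the "additive gain, multiplicative cost" pattern described above.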

Pretty cool to keep seeing follow-ups on this.

Jacques, any plans to publish your models or image dataset somewhere?

Eventually yes, but there is a long way to go until it is complete. The software will be published as well.

A university has asked for copies of the images that I have right now to make a project out of it.

That's cool. Good luck, hope to see part 4!

I have a suggestion but am not a Lego guy. Maybe you can do a few things now that the sorter works, potentially using it to automate or minimize order fulfillment.

The idea is that if you can get a dataset of Lego packs / models you can use your inventory to make them. You can offer sales of the kits based on dynamic info. You can also provide inventory dynamically to users to purchase parts or upload a build manifest which will be filled.

I think the 2 huge assets you have are sorting for processing AND fulfillment, and real-time inventory levels of verified authentic parts.

Also, I would make some sort of contest, auction fun game where a person can win the whole pile of discarded stuff. Idk what's in there but I bet someone would want it.

I would pay good money to send you a big box of Lego from mixed sets and have it come back sorted into one bag per set.

You'd probably need the set numbers so that it'd be easier to choose which set to associate a part with.

I'm more than happy to be a beta customer!

With international shipping there and back I don't know if you'd still like that. But I'm game :) SaaS: Sorting as a Service.

I make enough trips across the Atlantic to make it work, at least for me.

To scale... I've always wanted to rent a cargo ship!

Well, feel free to drop by with a suitcase of Lego and I'll be happy to run the machine for you.

I can generate a cheaper shipping ticket for PostNL for you. Mail in profile

Mail sent.

You can really earn money by modifying this machine to sort various nuts and fruits by quality in the food industry...

On speeding up your feed, how about adding another belt prior to the main belt, with a variable speed?

You drop parts onto the variable speed belt from enough sources to get a good feed rate, and constantly vary the speed of that belt with video feedback to drop them onto the main belt with reasonably even spacing.
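
One way the speed calculation for such a pre-belt could look, as a sketch only. It assumes the camera can measure how far the next part is from the end of the pre-belt, and every name and number here is hypothetical.

```python
def prebelt_speed(gap_to_next_cm, main_speed_cm_s, target_spacing_cm, max_speed_cm_s):
    """Pre-belt speed that lands the next part target_spacing_cm behind the last.

    Time until the next part drops = gap_to_next_cm / pre-belt speed.
    In that time the main belt advances main_speed_cm_s * time, which should
    equal target_spacing_cm. Solving for the pre-belt speed gives `wanted`.
    """
    wanted = gap_to_next_cm * main_speed_cm_s / target_spacing_cm
    # The motor has a top speed; a large gap simply cannot be closed in time.
    return min(wanted, max_speed_cm_s)
```

For example, with the next part 2 cm from the edge, a main belt at 50 cm/s and a 5 cm target spacing, the pre-belt would need to run at 20 cm/s. Clumps that arrive touching each other would still come off together, so this evens out the rhythm but does not separate parts.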

Now you have a bunch of pieces each with different classifications.

Possibly you could feed in parts lists from a huge amount of existing sets - have some sort of (handwavey learning thing or even markov chain), and generate mixes of pieces based on data sets from spaceships / cars / etc.

Why not sort Lego the same way we sort plastic in Norway? Watch this video. Drawing of the technique at 0:35 and shown from 0:52 https://youtu.be/Mi0FHNLkim0

I think the sorted photos are great art. Maybe more valuable than the contents ;)

Consider them public domain, if you want high res versions to print then just drop the .small. from the image filename and you should be good to go.

Technics and larger wheels are useful for Lego robotics (FLL) teams everywhere; certainly useful for our team.

Edit: this is in response to your asking for feedback about categories/demand.

Larger wheels is a good category to have. Anything larger than 42 mm diameter gets filtered out very early on so that's a trivial class to add.

Thank you!

You should extend it so it can assemble the pieces ;)

Hehe, ok, how about you do that bit and I'll stick to the sorting. The complexities of assembly are orders of magnitude larger for the general case than sorting.

One guy made a little Lego car factory:


Has Lego approached you at all? If they don't have similar in-house technology, I could see this adding value to them.

No, but I don't think I'm qualified to polish the shoes of the maintenance technicians that designed Lego's internal logistics, let alone measure myself with their engineers. There are some videos floating about of what Lego's manufacturing lines look like, very impressive stuff.

Impressive, but I'm still wondering what 2 tons of lego look like.

I really should make a picture of the storage, it's essentially a garage full of Lego. The garage is nearby, not at the same location where the machine is (that's at my house), I don't have room enough here to store that much stuff.

You should use https://www.cloudflare.com/. It's free and very easy to implement.

We detached this subthread from https://news.ycombinator.com/item?id=14654396 and marked it off-topic.

Yes, but I'm not a fan of the way in which several really large players (including CF) are dominating the internet. I think such monoculture is very bad. Besides that this was just a stupid mistake (forgot to resize the images) and easily fixed, if these had gone out through cloudflare chances are that I would not even have realized it until days later wasting everybody's bandwidth.

CloudFlare is getting rather large. I'd rather use a competitor. Does one exist?

EDIT: Some discussion on alternatives: https://news.ycombinator.com/item?id=11913330

Just a simple NGINX server with Letsencrypt with images running through an origin-pulling CDN like CDN77 should survive the HN effect without having to rely on third parties like CloudFlare. Especially when you have a CMS that has good caching or static HTML pages.

Netlify has been getting some love for free hosting for static sites like this blog post.

Why exactly do you want to go to a competitor? CloudFlare is fantastic and they have a lot of Free services

1. CEO is an ass.

2. Supports Booters, script kiddies, and pay-to-play blackhat hackers

3. Fucks over Tor any chance they can get.


Proof for said allegations:

[Youtube video] https://www.youtube.com/watch?v=wW5vJyI_HcU

A: He continually berates because "he called and Brian Krebs never responded"

B: Cloudflare CEO states that the booters (ddos pay-as-you-go sites protected by cloudflare) don't even pay, or pay with stolen credit cards. And admits it is "just a disaster".

C: Direct insult towards Krebs onstage "Well, who needs to actually ask questions as a journalist?" 48:08 [/Youtube Video]

https://news.ycombinator.com/item?id=12575047 look for user:eastdakota page text:"Yes, you can see Brian's critique of us here:"

The exchange starts at https://youtu.be/wW5vJyI_HcU?t=45m17s

I'm not sure I see this from your perspective; multiple people in the crowd clap after your point "C", and much more clapping is heard when he sits down. If I were forced to classify the behavior (edit:specifically at 48:08) I would choose "smartass". The entire discussion is interesting in light of what happened recently with Uber/Kalanick.

I was listening to the booing, shouts of disbelief and guffawing (and clapping) when he gave his zinger about journalists.

He was invited on stage, and continually berates Krebs. He could have done that graciously, but instead continually insults. He admits that when the Booters do pay, they're paid with stolen CCs. And then, finally attacks Krebs' standing as a Journalist.

Uh, no. Absolutely not. No way, no how.

I'll deal with Google, Amazon, Microsoft.. Whomever for a CDN. I'm not touching CloudFlare.

My impression was the crowd's response was more of a "sick burn" (Krebs himself smiles and nods in this manner) rather than booing, shouts of disbelief and guffawing (and clapping). If anyone is seriously considering avoiding CloudFlare based on your summary of the interaction, I encourage them to view the video for themselves.

I personally appreciate the willingness to be true to oneself with a quick-witted reply rather than 100% PR-safe professionalism all the time. However, I don't mean to imply that your impression somehow should change or that you should change your decision! My world would be so much nicer if I could just avoid working with people that rubbed me the wrong way.


Perhaps you're right.

But the content is about whether CloudFlare will kick booters off their platform. And he defends them, EVEN after saying they don't even pay.

This isn't a "Should we let the KKK march in the streets?", or "Should $sexist group be able to spread propaganda?". This issue is directly about people who harm the fabric of the Internet, and the companies that knowingly allow it and continue to propagate it.

I get defending controversial things in terms of defending Free Speech. But these skiddies want to precisely use their "Free Speech" and actions to deprive others of it.

No matter how fantastic and benevolent a company is at the moment, they always need competition to keep them in check and on their toes.

Pretty much. I also dislike how they handled the CloudBleed fiasco and how the CEO tried to offload some of the blame onto Google (incorrectly). https://news.ycombinator.com/item?id=13721644

I agree with you, and so do others -- some much more vehemently!


Why do we need competition in our society at all? /s

Fastly. Google sites.

Fastly does have a good reputation, but they are currently experiencing a global outage. https://news.ycombinator.com/item?id=14654231
