I found two identical packs of Skittles among 468 packs (possiblywrong.wordpress.com)
826 points by bookofjoe on Apr 17, 2019 | 207 comments

>So, what’s the point? Why bother with nearly three months of effort to collect this data? One easy answer is that I simply found it interesting. But I think a better answer is that this seemed like a great opportunity to demonstrate the predictive power of mathematics. A few months ago, we did some calculations on a cocktail napkin, so to speak, predicting that we should be able to find a pair of identical packs of Skittles with a reasonably– and perhaps surprisingly– small amount of effort. Actually seeing that effort through to the finish line can be a vivid demonstration for students of this predictive power of what might otherwise be viewed as “merely abstract” and not concretely useful mathematics.

>Actually seeing that effort through to the finish line can be a vivid demonstration for students of this predictive power of what might otherwise be viewed as “merely abstract” and not concretely useful mathematics.

This is how I feel about engineering.

If you design something, prove it out with logic and math. It will work.

Or you will learn where you were wrong.

>If you design something, prove it out with logic and math. It will work.
>Or you will learn where you were wrong.

In other words, if you design something and prove it out with logic and math, sometimes it will work and sometimes it won't...

Honestly, pretty much. It has a better chance of working than just trying stuff, though. For advanced stuff, a thought-out model can bring the success rate up from 0.0001% to 50%. And when it fails, you will learn more from the failure.

Except that if you build something on a model that is based on sound logic and math, it will have a much higher probability of working than otherwise.

Also, specific to engineering, there are countless tools that help make sure your product will have a high probability of working as intended: anything from CAD tools, to simulation tools (which, it could be argued, are a more automated way of applying logic and math), to QA processes, etc.

Just on the point of simulations, and not to knock your suggestion of using them, but I thought people would be interested: I listened to a great podcast from Sean Brady, a forensic structural engineer, who said that if you can't verify your simulation by doing the maths on paper, then you shouldn't be using it, because you don't understand how it is getting its results. He said that simulation tools allow engineers to make more mistakes faster than ever before, because the simulations are complicated to set up, and engineers tend to get distracted from what they are trying to simulate by tweaking the parameters until the simulation 'compiles', feeling they have achieved the correct result once that happens...

Doing the math by hand might work for some structural problems, but it isn't feasible for many domains.

Yes and no... when learning about LRFD (Load and Resistance Factor Design) for structural engineering, we'd design a component (a portion of a floor, for example) and size it for live loads (dancers), dead loads (piano), environmental loads, and NVH issues (the floor moving when people walked on it).

The last one there was more a comfort-level thing... people like to not fall down to the next floor, but they also like to not feel AS IF they'd fall down to the next floor... so you overdesign to remove the vibration.

9 times out of 10, it was that NVH aspect that drove the design.

More that your model was wrong and your math needs to get better.


And for 'discovered' things, there is no reason to make a mistake.

However, most of the time you are working on something that is New.

>However, most of the time you are working on something that is New.

So I was just thinking about that today. I just bought one of those ridiculous 'incline' rowing machines; as you 'row' it lifts your body off the ground to provide resistance (in addition to the normal rowing motions) - I'd bet money this design lasts rather longer than the type of cheap rowing machine that uses gas pistons to provide the resistance.

There's all sorts of ordinary, everyday stuff that must be engineered to not come apart while using the absolute minimum of material (or the absolute cheapest and lightest of material; I imagine shipping was a significant portion of the cost to me)

I mean, that's the thing about engineering stuff. If you are willing to spend money on lots of high-quality raw materials, even someone like me without much engineering background at all can design something that lasts, just by massively over-speccing everything. The engineering skill comes in when you have to build a rowing machine that can stand up to vigorous use by a fifteen-stone American, but they only give you a buck fifty worth of aluminum to work with.

I mean, I guess what I'm saying is that I look around my world and I see a bunch of stuff that must have taken a huge amount of engineering skill to make (I mean, to make at the price point that I got it at... to be functional in spite of the absolute minimum of raw materials) - would all that stuff be 'new' to an engineer?

I tell my students that the purpose of Engineering is not to build something stronger. It is to build something as weak as possible without breaking. Any baron can conscript a bunch of peasants and build a castle with walls 10 meters thick. But some will go bankrupt trying. Or, get an engineer to figure out how to build a wall 50 cm thick that can withstand trebuchet attacks.

If you're not minimizing something, you're not engineering.

"Among the hundreds of quotes that [Alice] Calaprice notes are misattributed to Einstein are many that are subtly debatable. Some are edited or paraphrased to sharpen or neaten the original. 'Everything should be made as simple as possible, but no simpler' might, says Calaprice, be a compressed version of lines from a 1933 lecture by Einstein: 'It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience.'" From: https://www.nature.com/articles/d41586-018-05004-4

You forget "...to last 5 days longer than the warranty."

Really the blame for that lies with the public for not demanding a longer warranty.

If enough people were prepared to pay double to buy something with a 25 year warranty, I'm sure companies would step up and make those products. People don't want to pay enough for it though.

No warranty necessary. Reputation alone would be enough. At least in those cases where people care about a reputation for longevity.

Paying more now for a longer-lasting product depends on how much you discount the future (i.e. effectively your internal interest rate), and on how likely you think it is that you'll even want to use that gadget in the future. E.g. the NES was probably ridiculously over-engineered for most people, since realistically it only needed to not fall apart until you bought a SNES or PlayStation.

People do, though; there is a thriving market for (heavy and expensive) stuff built to last a really long time. Obviously, if you want your product to be light and/or cheap, especially if (as in my case) your light product is supporting something very heavy, some sacrifices will have to be made.

Thank you. Honestly, I've been struggling with a decision I made at work around array design... but based on this concept I can defend the way I came down.

How much was it? A Concept 2 rower will last 'forever' and they are only about $900.

Like $180 or so. Where I live, the major cost of owning stuff is storage, so something light and compact is worth a premium, which I am paying here in terms of reduced longevity compared to a better machine. I bought it with the expectation that if it fell into disuse for any period of time, I would give it away rather than store it, and with the expectation of a 12-month service life.

If I enjoyed selling things more, I probably would have gone for something more expensive anyhow, but I think the light weight is worth a lot in my situation.

But if it doesn't, you'll be able to trace the failure back to the flaw in the logic (or mistake in the math), which will give you a path forward to correct the failure. If you just try things and they don't work, it can be much less obvious where to go from there.

It's a good exercise for discovering that how right you think you are is mostly irrelevant.

Sometimes the model that you did the logic and math on really did describe the real world enough to predict success - sometimes it is wrong enough to matter.

As opposed to say, faith, where your explanation always works.

In theory, theory and practice are the same. In practice…

The stronger point is that if you design it without logic and math, it very likely won't work.

Alternatively, it will work, but you won't know why it worked.

And so you might not know how to replicate it, let alone adapt it.

With enough experience you might be able to build an informal, tacit model in your brain, though.

And sometimes it works, but users hate it.

> Beware of bugs in the above code; I have only proved it correct, not tried it.

– Donald E. Knuth (1977)

Why climb Everest? Because it was there!

Pursue your passions, whether it is climbing a huge mountain or finding identical packs of skittles. It is what makes the world great and interesting.

Can I ask the point of quoting a snippet of the article without comment?

I admit: I don't read most articles. I prefer HNer snippets like this. I generally find HN comments to be more cogent and provocative than most editorial nonsense. So, thank you, parent, for choosing and posting a snippet which you considered worthy.

I also don't read most articles posted on HN, although I voraciously read the comments. And that's because the comments are better and more interesting than the articles in most cases.

How would you know the article is worse, then? ;)

I used to read the articles more often. And sometimes I do click through.

Yes, it's pretty obvious that lots of HN readers don't actually read the article.

At this point, it should be an official HN rule: "do not read the article, go directly to the comments and speculate what the article is about based on the character-limited title"

For many of us, submitted articles are just social objects for provoking a discussion on particular topics. Also, sometimes the articles are really worth a read, other times they're garbage; usually they're something in-between. Going straight to comments is the fastest way to discover which is which.

Commenting without reading the article is fine. Speculating about what's in the article without having read it is a problem.

so we're reddit, just without the memes?

That does seem a bit disappointing. I think anyone simply quoting the article needs to explicitly show that that is the case, because otherwise it is very misleading.

Still, this article was fascinating to me, from the idea that someone would go to the effort, to the results of such effort. Then top off my fascination with the idea of trying elsewhere in the country, should there be more than one manufacturing point, or trying to buy up a production lot and seeing the results of that as well.

Now how many packs of M&Ms? There are six colors there. If you go with the peanut version it is probably worse, because then you have the variability of the peanuts, which adds variation in just the number of candies per pack.

> so we're reddit, just without the memes?

Kind of? And with slightly higher average standards of discourse? And I believe this is actually a compliment.

In my experience, HN comments under article are almost always more useful and more informative than the original article. The same is the case with various subreddits. When I read, say, /r/SpaceX, I also immediately jump into comments, as there is better quality info there.

This applies to mainstream news stories in particular. On HN, there's a good chance you'll find someone who was - or knows someone who was - involved in the topic first-hand, and who then proceeds to debunk various nonsense a typical news story contains. That's a huge value-added.

> I think anyone simply quoting the article needs to explicitly show that that is the case because it was very misleading.

Sure, I think making it clear what text is quoted (and from where) should be an obvious rule. And it doesn't have anything to do with whether or not others read the article; it simply saves brain cycles trying to understand the comment.

It's weird to me how often people try to shoehorn aspects of the site into reddit/not-reddit. It has always been common to talk about some piece of tech that's related to the article, in a way that's not touched by the article itself and doesn't need it to be read first. That doesn't make those comments shallow, and I wouldn't put it in a list of distinguishing factors of reddit either. I agree with the idea that it's not a problem until you're speculating on the article, or raising a point that the article already addressed.

>so we're reddit, just without the memes?

There are memes here too, just not in the form of image macros, so the faux-intelligentsia here pretends they're better than those boorish rubes that frequent reddit

>> so we're reddit, just without the memes?

HN has memes. Turning unlikely things into SaaS, Rust/Crystal strike force and bikeshedding are all too real here.

And now apparently not reading source material has that potential as well. Which is kind of hilarious.

Well, TBF, that's the normal forum behaviour when you 'know your audience'.

For example, I know that most HNers are fairly informed individuals in tech. Thus, I trust that I am going to find a good deal of content and insightful comments without reading the article, because I can typically glean the gist of the article, and the tidbits that would have interested me anyway, from the HN community.

> For example, know that most HNers are fairly informed individuals in tech

Ha. If that were true, why are there hourly front-page articles about "facebook is leaking your info!!!" and "google is evil!!", or my favorite "bitcoin {is,isn't} a great thing!!" ?

At the risk of angering folks and being downvoted/flagged, I think either there is a major disconnect between the types of people on HN that upvote articles and the people on HN who comment on articles, OR folks here are, on average, not as informed as you think they are.

Because if you're also well informed, you should be able to filter those out super easily!

Plenty of other sites exist that link to content. I don't come to HN for that; I come to HN for the forum and their take on the content, which in my opinion just isn't matched by any other place these days. (Though if anyone wants to share a contrary opinion, I'd love to hear about it!)

I will go out on a limb and speculate that even more don't read the directions.

I thought it was the standout quote that would get people interested in this article.

Provides us (hn users) a place to discuss that thought.

In my experience, without this, there would have been a ton of "yeah, why bother?" comments on HN.

Sometimes it's an invitation to discuss an interesting part of the article.

This sounds like a version of the birthday paradox. The chance that two Skittles packs are identical is very small, and the chance that a Skittles pack is identical to at least one pack in a large group is still small. However, the chance that an identical pair exists in a large group is not small.
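The birthday-paradox arithmetic can be sketched directly. This is only an illustration, not the article's actual model: it treats every one of some number of possible color-count vectors as equally likely, whereas real packs are far from uniform, so `n_outcomes` would have to be an "effective" count.

```python
import math

def p_identical_pair(n_packs: int, n_outcomes: float) -> float:
    """Generalized birthday bound: probability that at least one pair
    among n_packs matches, assuming each of n_outcomes possible
    color-count vectors is equally likely (a simplification)."""
    log_p_no_match = 0.0  # accumulate in log space to avoid underflow
    for i in range(n_packs):
        log_p_no_match += math.log1p(-i / n_outcomes)
    return 1.0 - math.exp(log_p_no_match)

# Sanity check against the classic birthday problem: 23 people, 365 days.
print(round(p_identical_pair(23, 365), 3))  # -> 0.507
```

The same square-root scaling is what makes 468 packs enough: the number of packs needed grows roughly with the square root of the number of effective outcomes, not with the number itself.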

It's really hard for the human brain to comprehend what these types of long odds mean, practically speaking. This is a great project that illustrates it rather well.

This is a big part of why I like to play games of chance like poker and football simulators. It's a way of testing out your math-predictive skills.

I'd think that he could have also proven this with a lot less work by writing a simulation.

Simulation proves only the theory; sampling and experimentation prove the implementation. Otherwise it might be entirely possible for the packaging process to produce exact matches at a far higher rate, and for the mix to NOT be random.

But the experiment also doesn't prove much beyond the fact that it took 468 packs in this one particular case, does it?

He also has 468 samples of Skittles distributions, which can be used to validate the model assumptions on their own.

Or to make a lot of kids happy.

Right, so he didn't really prove the implementation, he'd need to repeat his experiment a number of times. Trivial in a simulation, less so if you have to buy and count them.

It would not have been practical to design a simulation of a Skittles packaging machine, as the design of those machines is not public information.

How would he know what distribution to use for the simulation without a sufficiently large dataset to use as an example? A simulation of a made-up system is pretty worthless.

But a simulation would not be able to prove your assumptions correct and would do nothing but validate your math.

All your math can be correct but if your assumptions are incorrect, your results may not be accurate.

The simulation needn't use the same math. It does, however, help validate the assumptions -- that bags are filled randomly.

But to write his simulation, he just needs to fill virtual bags with random colors and count them, no math needed.

But "fill bags with random colors" is already basing your simulation on the assumption that the colors are uniformly distributed. And in fact, that seems to be one of the things he found incorrect in his actual experiment.

I'm so glad to see there are people who think like me. I have spent thousands of hours doing various projects like this.

The only difference is, I normally spend my time automating the process with computer vision (a skittle sorter/counter wouldn't be that hard to build; opening the bag is harder than identifying colors). And then I never really finish the project.

As someone who (also?) builds automated industrial machinery for a living (and agrees that opening the bag would be harder than picking colors), the challenge would be in improving reliability from a typical 97% to the nigh-impossible task of getting 27,000 operations correct in a row. What will you do with small pieces of skittle shell, or broken skittles that lack a shell, or 2 skittles that are bonded together, or shreds of red bag that got into the hopper? What will you do if the dye is running low, or is contaminated, and the color fades or shifts between bags? The original article had some rather difficult to automate, subjective criteria for handling these questions.

One thing to remember in the panic over automation killing off jobs is that handling the normal case is maybe 20% of the work. Handling routine errors is 80% of the work. Handling the errors that crop up once in a thousand parts - or dozens of times in this small sample - is why running lights-out is so difficult, and why we'll always need human operators and maintenance staff who can diagnose and correct unexpected problems.

That said, I'm at least 97% confident that OP, in manually counting and marking down these 27,000 items, mis-counted or mis-marked at least once. So I'm not feeling too bad about my pessimism regarding the automated solution. Even if it can't identify if a given sample is "gross" or not, it will add 1 to a counter reliably!

Also, one test of 468 bags that shows one result does not prove anything about the average number required. You'd need to run this test many times for that, and for that you'd need automation!

I work in biotech now using microscopy to understand diseases better. Same exact problems there- so much still depends on expert humans who can diagnose and deal with all the edge cases with ease. I see all these articles about amazing results using image recognition, and while some of the results are really good, they always depend on massively cleaned-up datasets.

In high school, our second year engineering classes required us to build a skittle sorter out of lego and some sensors/software.

The only requirement was a hopper for the teacher to dump the skittles into at the beginning. She would have a different amount for each group that was pre-counted and our numbers had to match hers at the end within a certain % of error.

It was a month+ long project and really fun to see each team's solution in how they moved/sorted/stored the final product. Some teams used long conveyor belts, others used short stacked belts, etc. It was all really interesting and I thoroughly enjoyed it!

Sounds like a great project.

Designing a hopper was a recent project for me and it was really fun. I learned a bunch about mass flow and realized that there are all sorts of interesting industrial problems involving hoppers.

That reminds me of this post about sorting 2 tons of lego: https://jacquesmattheij.com/sorting-two-metric-tons-of-lego/

That post was a revelation to me and led me down the path of making object-detection classifiers for various projects. I was shocked at how easy it was to build a classifier that was as good as a human, but faster (or more accurate than a human in the same amount of time).

I ended up learning there is a whole area of computer vision in industry: relatively "boring" stuff like just looking at a line of bottles going by. I went on a tour of the money-making machines in DC, and they stream sheets of bills past industrial vision cameras to detect whether the bills are within QC.

Still amazes me people get paid to do this. To me it's just like a big fun game.

Ah, I forgot I had seen that before. Yes, it's really nice. Pretty much what I was thinking. I've also been through the "hopper design process" (built a Galton board, needed to feed it a ton of balls, learned about mass flow in the process).

Yup, dump them on a piece of paper and segment by contours; the colours are bright, so it's super easy. Though you'd lose the nice Skittles graphs, which I think is half the fun of the process.

Well, opening the packs is probably not that much effort compared to sorting/counting 27,000 Skittles. Instead of 3 months, you could cut it down to one or two days.

If you have to put the bag in the machine, it's not much more effort to tear it open. If you do that over a large funnel, that should work to get all the skittles into the counter most of the time.

I was thinking that time could be saved by not sorting the skittles so nicely. Just take a picture of the skittles from each bag spread out randomly. You should be able to figure out the color distribution just by counting pixels within each color range.

You do not even need to count the pixels. OpenCV will identify the objects, then their dominant colour can be determined.

Actually, ripping a small bag filled with stuff open reliably is a bit harder than you might think.

"That's why we built Loppy the Chopper over there."

That's why I was saying I wouldn't try to automate that part. You don't need a super-reliable method if you're doing it by hand. On average, it probably takes a lot longer to count the skittles by hand than it does to open the bag, so most of the savings comes from the part that's easy to do reliably.

The experiment is great hands-on math, but I would have enjoyed a discussion of variance versus expected value, and of the difference between short- and long-term averages... It's too easy to infer that everything is great because he was lucky enough to get within the target range, but the likelihood of that occurring is not actually that high, and was only implied by the shape of the Monte Carlo distribution in his previous post. When such experimental results are this conveniently "accurate", amateurs in the audience may take away the kinds of wrong inferences which create "it's a hot day, so it must be global warming" types of logical inaccuracies.

Author of the article here; this is a great point. This experiment initially stemmed from a nice analytical solution to the problem of computing the expected value (via generating functions as described in the post). Computing other moments, let alone the entire distribution, required some Monte Carlo simulation, as shown at the end of the first article (https://possiblywrong.wordpress.com/2019/01/09/identical-pac...) before I started the experiment.

And even this histogram assumes a distribution of total number of Skittles per pack (that varies) that I had to guess at beforehand. In hindsight, the final sample distribution suggests that I probably initially overestimated the true variance, and thus also overestimated the expected number of packs I would need to inspect. In other words, this experiment arguably took longer than "average."

So you're right-- this experiment could have extended into 700 packs, 800 packs... and still have been consistent with the assumed model, but I would have simply been in an unfortunate 90-th percentile possible universe where it took much longer than "average."

> From 12 January through 4 April, I worked my way through 13 boxes, for a total of 468 packs, at the approximate rate of six packs per day.

Honestly thought I was going to read a "just cause" blog on machine vision and process automation, e.g. 3 months to develop a functional prototype and train the system, 3 days to process 468 packs...and automated repeatability at the end of it all.

I assume most people read the title and were expecting machine vision. I definitely did within the first two seconds of reading.

Yeah as soon as I saw the title I made the assumption that it'd be automated.

I can't imagine doing this manually - though it'd take me more time to write a machine vision solution than to just do it manually. :P

This was really hard to read through because I kept dreaming up systems and applications which might automate the whole process.

This guy missed an excuse to build a Lego sorting machine....

Sorting machine? This is Hacker News; a visual AI that counts each color is all the rage these days.

Killing a fly with an RPG is all the rage these days. You can achieve this with simple computer vision techniques that have been around for decades.

Or by counting them yourself

Lego Mindstorms has a color sensor, so theoretically you can build that AI in Lego.

I suspect the parent's suggestion is more along the lines of 1 snapshot that captures all Skittles in a pack, then ML object detection to classify and count outcome...processes an entire pack in a single shot, as opposed to 1 Skittle at a time.

> ML

Why? There is zero need for this. I know it's one of the buzzwords of the recent years, but identifying colors? Come on.

Why not? Is the problem really just naive color identification? It looks more like classifying uniformity, shape, size, orientation, character imprint, malformation, anomalous objects, etc. Can the problem be solved with a more traditional image processing toolbox? Sure, why not? But what fun is there if we're not swinging HN's favorite rubber mallet at the problem...and then doing it again and again just because.

Keeping with the context of the GP's remark, a solution is more about achieving process speed while minimizing error and human intervention in the face of uncertainty--i.e. single-shot at the package level, which is at least a 50x speed increase over individual mechanical sorting.

Except one of his criteria for whether or not to count a defect was whether the candy was uniformly coated or not. So now you’re back to the sorting machine with a tumbler or multiple cameras.

With that additional requirement, consider the following naive 2-camera solution flow:

1. load packet contents onto a transparent substrate
2. short vibration to randomly disperse
3. image capture, top and bottom
4. ML processing
5. dispose of load
6. go to step (1)

Cameras for this class of application are cheap, but if 1 camera is an explicit design requirement, then a simple motor with 90-degree camera mount to capture top, rotate 180-deg, capture bottom is simple enough as well (albeit less reliable given the introduction of a precision electromotive element to the system).

If alignment is critical, then error budget can be appreciably extended by marking corners of transparent rectangular platter with distinct color/shape on top and bottom to serve as fiducials in computer-aided skew correction/alignment, compensating for inevitable drift over time; it could also supplement human validation of archived images so orientation is easily determined.

>2. short vibration to randomly disperse; 3. image capture top and bottom

So you're saying a machine and multiple cameras?

2 fixed cameras, or 1 fixed motor-mounted camera... take your pick? Suggest one?

Wouldn't really consider a discrete vibrator affixed to a solid platter a "machine" per se, but if you want to call it that, sure. I gave the problem 10 sec of back-of-the-envelope thought. 10 sec more suggests you can skip vibrator integration by selecting a more rigid transparent substrate, e.g. glass... Skittles would naturally rattle on that as a package is loaded. 10 sec more suggests you can constrain camera movement even further by using carefully oriented mirrors. How far down the rabbit hole would you like to go?

I think he means something to count a pack dumped on the table.

Stats is the science of sciences in my opinion, and anyone who brings it to life like this is awesome.

Have you seen Efficiency Is Everything's data on food?


Nope hadn’t seen that before. Thank you! Ironically I am not a big efficiency person but where I am very interested in efficiency is foods. Like what food has the best amino acid profile for the dollar, that kind of thing. And this appears to be that.

That site is amazing and engrossing. Thank you for this link - I think I'm going to lose myself in Excel for a few hours. Nutrients/calorie and nutrients/dollar are incredibly useful metrics for anyone while shopping!

It's not really useful. It's extremely easy to get plenty of nutrients and calories extremely cheaply, but that's not what people are paying for when they shop for food.

Please share your data on everyone's shopping habits.

I take it you don't engage in strenuous physical activities far from grocers. For instance.

How does distance traveled relate to willingness to pay a higher price for food?

It’s almost as if it costs money to transport goods or something.

Awesome data, I really wish it was presented better. Screenshots of tabular data?! Nothing is searchable. Alignment is all over the place.

Any pages in particular? I tried to either have the excel file on page, or leave it in an HTML table.

Btw, that's my next big overhaul. How would you want the data? A JavaScript HTML table?

Yes, a clean HTML table would be perfect. Sortable! There is just data everywhere; simple alignment would go a long way. Thanks for your hard work!

I wonder if it would have been worth it to build a device to scan the skittles for you.

edit: would love an explanation for the downvotes

I didn't downvote you but he did have a threshold for "this misshapen lump isn't a skittle"

Thanks. I also think people are latching onto the word "device" strongly. I really just meant some form of automation.

A device seems like overkill. Write code to apply machine vision to this grid: https://possiblywrong.files.wordpress.com/2019/04/skittles_a...

I think the fastest and most accurate process would be to:

* empty a pack of skittles onto white paper and take a photo

* use some existing image recognition libraries to count each color

* export to CSV or something and do Excel magic

Even without an image recognition library, it should be possible to simply count the pixels and how close to red, green, blue, etc. they are to know how many skittles there are in the picture.
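A rough sketch of that pixel-counting approach in plain Python, with no image library. The reference colors and the pixels-per-candy divisor below are invented for illustration; a real photo would need both calibrated.

```python
# Classify each pixel by its nearest reference color, then divide each
# color's pixel total by an assumed pixels-per-Skittle figure. Both the
# reference RGB values and the 400-pixels-per-candy number are made up.

REFERENCE = {
    "red":    (200, 30, 40),
    "green":  (50, 160, 60),
    "yellow": (230, 200, 40),
    "orange": (240, 130, 30),
    "purple": (110, 50, 130),
    "white":  (245, 245, 245),  # the paper background
}

def nearest_color(pixel):
    """Return the name of the reference color closest to this RGB pixel."""
    return min(REFERENCE,
               key=lambda name: sum((p - r) ** 2
                                    for p, r in zip(pixel, REFERENCE[name])))

def count_candies(pixels, pixels_per_candy=400):
    """Estimate candies per color from a flat list of RGB tuples."""
    totals = {name: 0 for name in REFERENCE if name != "white"}
    for px in pixels:
        name = nearest_color(px)
        if name != "white":
            totals[name] += 1
    return {name: round(n / pixels_per_candy) for name, n in totals.items()}

# Tiny synthetic "image": 800 reddish pixels, 400 greenish, rest background.
image = [(205, 35, 45)] * 800 + [(55, 155, 65)] * 400 + [(250, 250, 250)] * 5000
print(count_candies(image))
# → {'red': 2, 'green': 1, 'yellow': 0, 'orange': 0, 'purple': 0}
```

In practice shadows and candy edges blur the colors, which is why counting contiguous blobs (as OpenCV does) beats raw pixel totals.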

You'd probably want a jig for that, wouldn't you?

Hence, building.

It's interesting this person can program, but chose to manually count 27,740 Skittles instead of automating it in some way.

It feels like an almost-trivial computer vision problem.

Since no one seemed to believe it's almost trivial, I went ahead and implemented it:


It took about an hour of Googling OpenCV questions. Also, I've never actually used OpenCV before, and don't use Python regularly.

I haven't tested it against the full set of images. Even if it's not perfect you could at least use this to narrow down ones that are within some threshold, then manually verify they were tagged correctly.

I'll post the code shortly.

...but it's almost trivial to count them as you go along, in addition to laying them out.

Have you verified the author's claim that this is the only match in the whole data set?

It's much quicker to dump the bag of Skittles out and snap a photo than to arrange them and count them.

My program isn't perfect, but I only spent a couple hours on it. I think it could be improved, and better lighting when taking the photos would help too.

However, I was curious how well it did so I compared it to the author's results. My program misclassifies approximately 80 out of 27,740 Skittles.

I searched for matches and it did indeed find the same match the author found:

    (0, './skittles/images/334.jpg', './skittles/images/464.jpg')
    (1, './skittles/images/103.jpg', './skittles/images/384.jpg')
    (1, './skittles/images/111.jpg', './skittles/images/134.jpg')
    (1, './skittles/images/118.jpg', './skittles/images/353.jpg')
    (1, './skittles/images/139.jpg', './skittles/images/168.jpg')
    (1, './skittles/images/152.jpg', './skittles/images/281.jpg')
    (1, './skittles/images/158.jpg', './skittles/images/244.jpg')
    (1, './skittles/images/198.jpg', './skittles/images/255.jpg')
    (1, './skittles/images/198.jpg', './skittles/images/334.jpg')
    (1, './skittles/images/198.jpg', './skittles/images/464.jpg')
    (1, './skittles/images/201.jpg', './skittles/images/338.jpg')
(first row is identical, subsequent rows are one different)

Since there are some errors, I did get lucky. However, if it's good enough, you could also manually verify ones that are close.

Anyway, personally I'd rather write code than manually count 27,740 Skittles, but I'm grateful the author did this because it's an interesting dataset to work with.
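Once each pack is reduced to a vector of color counts, the duplicate search itself is only a few lines. The distance used below (total count difference across colors, so one moved candy scores 2) is a guess at the metric in the listing above, which apparently counts a moved candy as 1; the pack vectors are made up.

```python
from itertools import combinations

def pack_distance(a, b):
    """L1 distance between two packs' (red, green, yellow, orange, purple) counts."""
    return sum(abs(x - y) for x, y in zip(a, b))

def find_matches(packs, max_distance=2):
    """Yield (distance, i, j) for every pair of packs within max_distance."""
    for (i, a), (j, b) in combinations(enumerate(packs), 2):
        d = pack_distance(a, b)
        if d <= max_distance:
            yield (d, i, j)

packs = [
    (12, 11, 13, 12, 11),  # pack 0
    (12, 11, 13, 12, 11),  # pack 1: identical to pack 0
    (12, 12, 13, 12, 10),  # pack 2: one candy "moved" relative to 0 and 1
    (20, 5, 13, 12, 9),    # pack 3: far from everything
]
print(sorted(find_matches(packs)))
# → [(0, 0, 1), (2, 0, 2), (2, 1, 2)]
```

With 468 packs this is about 110,000 comparisons, which is instant; no indexing cleverness needed at this scale.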

Author of the article here; I suppose I should confess that I did consider automating the counting, which did seem like an interesting problem... but I rejected the idea, for what I think is good reason: the sorting was the time-consuming part. If we skip the sorting, and depend on the code to get the counting right, and we assume that we are at worst off by one-- or more realistically, off by two, to account for the questionably-identifiable chunks of paste and such-- then how often do we need to go back for manual verification?

With hindsight (although up-front simulation bore this out as well), we see that we would have to go back and manually verify anywhere from 8% (off-by-one) to 25% (off-by-two); that's every fourth pack. Now consider how much more unpleasant that manual counting is, since the Skittles are haphazardly un-arranged in the image. In short, I'm unconvinced that an automated-- while similarly accurate-- accounting would be that much more efficient.

> the sorting was the time-consuming part

I'm not sure I understand why sorting is necessary.

> Now consider how much more unpleasant that manual counting is, since the Skittles are haphazardly un-arranged in the image.

It's actually very easy: my program outputs images annotated with each recognized Skittle circled with the guessed color: https://imgur.com/a/jlPWXRf You don't need to manually count, just make sure all the circles are the correct colors.

I'm also pretty confident you could improve the accuracy quite a bit by improving the lighting when taking the photos, and/or incorporating something more sophisticated like a neural network (probably trained with Skittles from several bags, separated by color+rejects, with many photos in different random arrangements)

> You don't need to manually count, just make sure all the circles are the correct colors.

I guess this takes me a while, and seems significantly more error-prone to me than the corresponding re-count when they are sorted. Granted, it's pretty easy when the Skittles are sorted as they are in these images. But how quickly can you scan this image: https://imgur.com/a/KpddGdH and check whether there are any errors? And how confident are you in that visual check?

You are right that this approach may be good enough to find a duplicate, which was the primary objective of the experiment. But I had hoped that this might also serve as a useful dataset for future student exercises, in probability, or even in just this sort of computer vision project... but I wanted to have accurate ground truth, so to speak. Inspecting your spreadsheet, it looks like this algorithm is still less than 95% accurate, even if we only evaluate the "clean" images with Uncounted=0.

Any update on posting the code?


It's definitely not perfect but could probably be improved a bit.

There are a few where strawberry and grape or strawberry and lemon are swapped. I think better lighting would help.


> I think better lighting would help.

That also demonstrates the difference between being able to optimize the input until the output matches known values, and having to process "real life" images made under varying conditions.

The latter is certainly more interesting: e.g. I dream of having an OCR that is "always right" no matter how bad the image is.

Controlled lighting is extremely helpful in computer vision and key to the operation of a lot of industrial uses of it. From a system design point of view it makes a lot of sense, but obviously it's not possible in many applications (and researchers obviously want to go after the toughest problems, and would rather mostly solve a hard problem than entirely solve an easier one).

Thank you, forked! I'll be learning from this.

FWIW I basically just cobbled together Stack Overflow answers :)

This is awesome.

What would it take to determine which brand (skittles, m&ms etc) and which sub-brand (tropical / peanut, etc) and how many calories are on the table?

In my head, you would determine the relative color difference of each piece to determine the brand, but I'm not 100% clear on how that kind of ai works.

Thanks. I have no idea if it's the "right" way to do this. It's the first computer vision problem I've done in a long time.

With better lighting you could probably distinguish between colors of different types of candy, assuming they're different enough. If you can determine the type of candy and know how many calories are in each piece it would be easy to count the calories, of course.

FYI all of your new comments are being marked as [dead] for some reason.

It's been done! A teenager built an M&M and Skittles sorting machine a few years ago. I expected this article to be based on that, but to my surprise, he counted them all manually.


To the author's credit though, doing it manually probably made more sense in his situation. It would take a few months to build the auto-sorter, which would've had to be tweaked to handle the misshapen blobs. The collection of Skittles photos is impressive too, and hopefully will end up in a Maths classroom.

Hard to imagine that process would be trivial, especially since, as he noted in the article, he had to have criteria for when to count a Skittle, such as whether it was too small, appeared inedible, or had other such characteristics. But after completing it the manual way, it would be interesting to run programs on the images to see how close they come to the actual numbers.

Computer vision is trivial? When did that happen? And how was it made trivial?

I've done a few computer vision projects lately and it's really amazing what you can get done with opencv for relatively clean data. In this case you're trying to identify circles and classify their color into one of several groups. This is pretty common, the main challenge is cleaning up the raw data so that the hough transform runs quickly and accurately, and the color cluster identifier is robust.

If you're pretty good with Tensorflow, you could do it with an object detector. I've built absurdly accurate object detectors using the TF Object Detector tutorial and just a little hand labelled data plus some good synthetic augmentation.

For this project's scope and scale it's probably not worth automating unless you're going to be repeatedly running the process.

He's saying detecting skittle colors is "trivial" in the field of computer vision, and he would be correct. A 50 line Python script with a popular classifier library would solve this problem with high accuracy. This is perhaps the easiest problem to solve with CV and would be a great first tutorial for such libraries.


Counting the number of round colored objects on a white background is not a hard problem.

I think I've found a project for this evening...

I like practical statistics, for example when betting red/black in a casino, what is the chance that you would lose 10 times in a row ? Just keep doubling they say, but eventually you will get a bad streak and wont have enough money or you'll hit the limit.
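That "bad streak" risk is simple to quantify on an American double-zero wheel; this sketch just assumes a base bet of 1 unit and the classic doubling (martingale) plan.

```python
# Betting red: it pays even money but hits only 18/38 spins, because of
# the 0 and 00. Ten straight losses forces the 11th bet past 1023 units.

p_lose = 20 / 38                     # black plus the two green zeros
p_ten_losses = p_lose ** 10          # ten straight losses from any starting spin
bankroll_needed = sum(2 ** k for k in range(10))  # 1 + 2 + 4 + ... + 512

print(f"P(10 straight losses) = {p_ten_losses:.4f}")          # ≈ 0.0016
print(f"bankroll to survive them = {bankroll_needed} units")  # 1023
```

So roughly one session in 600 a doubling run of length 10 begins, and each time it does, the player needs over a thousand base bets in reserve just to win one unit back.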

If it landed on red 10 times in a row so far, the odds of landing another red on the next round are almost 50%. Gambler's fallacy is so much fun to observe at the tables though :)

Under the assumption that it is a fair table. So what is the likelihood that it is a fair table knowing that 1% of tables play 80% red?
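That question is a one-line Bayes update. The 99%/1% prior and the 80%-red biased table are taken from the comment above; the 18/38 red probability for a fair double-zero wheel is the one assumption added here.

```python
# Posterior probability the table is fair after seeing 10 reds in a row,
# with a prior of 99% fair tables and 1% tables that land red 80% of the time.

prior_fair, prior_biased = 0.99, 0.01
p_red_fair, p_red_biased = 18 / 38, 0.80

like_fair = p_red_fair ** 10      # P(10 reds | fair)
like_biased = p_red_biased ** 10  # P(10 reds | biased)

posterior_fair = (prior_fair * like_fair) / (
    prior_fair * like_fair + prior_biased * like_biased)
print(f"P(fair | 10 reds in a row) ≈ {posterior_fair:.2f}")  # ≈ 0.34
```

Ten reds in a row flips the odds: a table that started 99% likely to be fair is now most likely the biased one.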

In The Newtonian Casino (published in the US as The Eudaemonic Pie), students who built computers into their shoes (in the 1970s!) to predict the outcome of roulette wheels discovered that lots of wheels had pretty severe tilts (which affected the outcome) and that skilled croupiers had a "signature": they knew that particular wheel very well, had practiced many times, and could increase their chance of hitting a particular number. (This is described in the chapter titled "Lady Luck").

It's a good book!


It's supposedly possible to predict to a reasonable degree of accuracy where a roulette ball will land if you know its speed (ie time it hits the wheel, and time to pass a predefined marker).


Also: Don't forget green. The 0 and 00 slices of the wheel are green because that's where the house makes their money.

I remember an experiment in high school math of some variety.

The teacher had us all "guess" what twenty coin flips would look like. The longest streak any student wrote on their paper was maybe 4-ish in a row.

He then had us all actually flip the coins and record the results. One student had like 11 in a row, most hit a streak of somewhere between 5-8 of the same result.

Lesson learned, we're really bad at guessing 50/50 streaks.

Empirically, you should (~85% of the time) be seeing a streak of about 3-6; anything 8 or larger has a probability of about 1 in 20.

To a rough approximation, 20 coin flips gives you 16 chances to get a 1/16 probability event of 5 in a row of heads or tails, so the average number of 5-streaks is about 1 per student. With 32 kids in a class, add 5 more to the longest streak you'd expect to see somewhere in the room, since each doubling of the total number of flips adds about one to the expected maximum run.
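The rough approximation above can be made exact with a small recurrence: the number of length-n flip sequences whose longest run stays below k is 2 * C(n), where C(n) counts compositions of n into parts of size 1..k-1 (a tetranacci-style recurrence for k = 5).

```python
def p_streak_at_least(n, k):
    """P(some run of >= k identical outcomes in n fair coin flips)."""
    c = [1] + [0] * n  # c[m] = compositions of m into parts of size < k
    for m in range(1, n + 1):
        c[m] = sum(c[m - j] for j in range(1, k) if m - j >= 0)
    # 2 * c[n] sequences (heads-first or tails-first) avoid any k-run.
    return 1 - 2 * c[n] / 2 ** n

print(f"{p_streak_at_least(20, 5):.3f}")  # ≈ 0.458: a 5-streak is nearly a coin flip itself
print(f"{p_streak_at_least(20, 8):.3f}")  # ≈ 0.054: the "1 in 20" mentioned above
```

So roughly half the class should see a streak of 5 or more, which matches the teacher's demonstration.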

Big fan of skittles - never crossed my mind. Very cool! Perhaps try different kinds as another follow up? Machine learning and openCV can do the trick for counting without sorting.

So if I'm a statistical stupid what books do you recommend for the beginner?

Andrew Gelman has some pretty good books/lectures, but it might be too advanced.

This immediately made me wonder what the odds are of getting a bag of all the same skittles. My stats are rusty so feel free to correct me, but here's what I got:

(1/5)^59 ≈ 1/(1.73 × 10^41)

Apparently ~200 million skittles are made daily, so at that rate, we might expect to get a monochrome bag of skittles after ~10^33 years.
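A quick sketch of that estimate with the assumptions spelled out: 59 Skittles per pack, 200 million Skittles produced daily, and allowing the monochrome bag to be any of the 5 colors (which multiplies the odds by 5). Under these assumptions it lands a bit below the ~10^33 figure.

```python
import math

# Expected waiting time for a single-color bag of Skittles,
# under the stated (back-of-envelope) production assumptions.

p_one_color = 5 * (1 / 5) ** 59      # any of the 5 colors: ≈ 2.9e-41
bags_per_year = 200e6 / 59 * 365     # ≈ 1.2e9 bags per year
years_to_expect = 1 / (p_one_color * bags_per_year)
print(f"~1 monochrome bag expected per {years_to_expect:.1e} years")
```

Either way the answer is many orders of magnitude past the age of the universe, so the qualitative conclusion survives any reasonable choice of pack size.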

Jesus Christ, apple and grape?!? Poor Americans... in the UK that's Lime and Blackcurrant.

Maybe I just have a really bad sense of taste, but I thought they were just "green" and "purple" (also in the UK).

Why poor Americans? It's purely a matter of taste.

Apple is culturally significant, and blackcurrants are banned in the US (due to their carrying a disease that would devastate pine forests). So one is more important than lime and the other is totally unfamiliar.

To be fair, in America green skittles used to be Lime, too. I remember being quite disappointed when they switched to apple.

Blackcurrants are banned in the US? Do they not even have Ribena over there!?

You can consume blackcurrants in the US; you just can't grow them (in some states nowadays). Read more at https://en.wikipedia.org/wiki/Blackcurrant#History

The dataset itself is nice, but if weighing each pack had been part of the procedure it would have been even more interesting, no? Anyway, this dataset is already among my favorite examples for illustrating a birthday attack in my crypto lectures ;)
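The birthday-attack arithmetic behind that lecture example is compact: with N equally likely outcomes, a collision becomes 50/50 after roughly sqrt(2 N ln 2) draws. N below is a stand-in; the effective number of distinct pack compositions isn't computed in this thread.

```python
import math

def trials_for_collision(n_outcomes, p=0.5):
    """Approximate draws needed for probability p of seeing a repeat
    among n_outcomes equally likely values (birthday bound)."""
    return math.sqrt(2 * n_outcomes * math.log(1 / (1 - p)))

print(trials_for_collision(365))      # ≈ 22.5, near the exact birthday answer of 23
print(trials_for_collision(2 ** 32))  # ≈ 77,000 draws to collide a 32-bit hash
```

The square root is the whole story: 468 packs sufficing to find a duplicate implies the effective space of pack compositions is only on the order of the square of that, not astronomically large.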

The picture he shows of all the Skittles' counts looks like something I would've seen at the local museum. He missed his chance of taking an ultra high resolution photo of it and putting it on display:


that is a seriously skewed candy budget... do you have any teeth left?

OP didn't eat them, per the post.

He said, "Yeah, I learned from this experiment that I don’t actually like Skittles, which is probably good, so a lot of Skittles were bagged and handed off to relatives." To have learned from the experiment that he doesn't like them, he must have started out eating them. But yes, he didn't eat them all (thank goodness for his teeth!)

Anyone do the math on this?

The cheapest 36 packs of 2.17oz bags I can find on Amazon are $20.99[1], which would cost $272.87 (plus tax?) for the 13 boxes used in the study.

[1]: https://www.amazon.com/Skittles-Original-Candy-2-17-Pack/dp/...

as one gets older, cavities become less common. sigh

This is brilliant. Takes a theory (iid colors), which turns out to be wrong, but still gets a hell of a lot of conclusions out of it.

Since this is a nerd site, the next step is to use this:


Build a Lego contraption to push some skittles through a sensor that counts them.

I don't think the author presented any evidence that invalidates the assumption of IID colors. They show a histogram of the different counts of each color, which seems more or less consistent with a uniform distribution. There are some fluctuations but they're not surprising; the Poisson error bars would be roughly +/- 0.16 in that figure. With error bars that large, it would be surprising if the data were in more exact agreement with the flat line; it's actually related to the same question that the author is examining ("what are the odds to observe all 5 colors at exactly their expected rate of 1/5 within measurement error?").

They do speculate that the number of candies per pack is not IID, i.e., that there are (anti)correlations from one pack to the next. But without knowing more about the packing process, and presumably also having some lot/serial number information for each pack, it would be pretty hard to establish this.
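For the curious, the quoted +/- 0.16 error bar can be reproduced from figures mentioned elsewhere in the thread (27,740 Skittles across 468 packs); the Poisson model for each color's per-pack count is the assumption here.

```python
import math

# Standard error of the mean per-pack count for one color, Poisson model:
# variance ≈ mean, so the error bar is sqrt(mean) / sqrt(n_packs).

n_packs, n_skittles, n_colors = 468, 27740, 5
mean_per_color = n_skittles / n_packs / n_colors   # ≈ 11.9 per pack per color
std_err = math.sqrt(mean_per_color) / math.sqrt(n_packs)
print(f"error bar ≈ +/- {std_err:.2f}")            # ≈ +/- 0.16
```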

Cool thing to do.

This is similar to the birthday problem: How many people do you need to have a probability of > x% that two are born on the same day?

It takes just 23 people to get a probability of > 50%, and with around 50 people it's already about 97%. You can conduct this experiment at a school, using each class as a sample experiment to see if there are two pupils born on the same day.
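The exact numbers are easy to compute by multiplying out the probability that all n birthdays are distinct; the classic figures for 23 and 75 people fall out directly.

```python
def p_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), ignoring leap days."""
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (days - k) / days
    return 1 - p_all_distinct

for n in (23, 32, 50, 75):
    print(f"{n:2d} people: {p_shared_birthday(n):.2%}")
```

A typical class of 30-ish pupils sits around 70-75%, so running it across several classes makes a nice demonstration: most classes find a match, some don't.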

Author of the article here-- right! This was the key "real world" motivation for this experiment as an attempt at a pedagogical tool; from the article:

> As an aside, I think the fact that this particular concrete application happens to be recreational, or even downright frivolous, is beside the point. For one thing, recreational mathematics is fun. But perhaps more importantly, there are useful, non-recreational, “real-world” applications of the same underlying mathematics. Cryptography is one such example application; this experiment is really just a birthday attack in slightly more complicated form.

Sometimes I find these concrete investigations necessary for our brains to make peace with the unreasonable effectiveness of mathematics, as it's been called.

I would say one of the first great discoveries for a person is the exponential series (a real-world example: population growth). Another is the divergence of the harmonic series 1/n and convergence of 1/n^2 (my preferred real-world example: pizza slices that converge to 1 pizza or diverge to infinitely many). E.g. give me 1/n slices for the rest of my life and I'll pay you $100 (-:

When travelling, I also have go-to experiments that I like doing (e.g., elementary proofs that the earth is round/spherical such as: great circles; N-E-S-W always at 90 degrees; shadow angles [Eratosthenes]; seasons; etc.)

There are other things to investigate that are not really "proofs" or "combinatorial evidence", but equally interesting. One example is using music (esp. the piano) as a physical logarithm device. The music "sounds" additive but the frequencies are multiplicative.

"In a room of just 23 people there’s a 50-50 chance of at least two people having the same birthday. In a room of 75 there’s a 99.9% chance of at least two people matching." https://betterexplained.com/articles/understanding-the-birth...

I love this and it demonstrates the kind of statistics I wish I was a lot better at.

I've just ordered a huge box of fun sized M&M packets and will try using computer vision to count them to copy this study from start to finish for learning purposes.

It's nice to read science like this without it being reduced to a once a year period of https://www.improbable.com/ig/ related links.

I love this. Hacking is not always practical. Let us have some fun. Or sweet!!!

Fun article. The Birthday Paradox was my introduction to recreational mathematics.


What a time to be alive!

This is Two Minute Papers, and I am your host.

Haha. So, this ingenious method started with packet number 1 and examined... thank you to the Patreon supporters John, Jim and Richard... now onto packet number 456, where the color red was found. Thank you, and please subscribe, like and share.

Don't care. Kill green apple and bring back lime!

This is the real truth here.

Just out of curiosity, did you actually consume them?

OP said in the comments of his blog that he didn't actually consume them; instead, he gave them to relatives.

Thanks, I skipped over the comments. Apparently, though, the OP may not have immediately started handing them out to his/her relatives:

"Yeah, I learned from this experiment that I don’t actually like Skittles, which is probably good, so a lot of Skittles were bagged and handed off to relatives. "

I would recommend openCV to anyone considering a similar experiment.

A few hours with a blob detection tutorial would have saved hours of tedium.

I hope he uses all the Skittles to decorate a wall or something :) otherwise it's a lot of wasted food

High fructose corn syrup is barely food.

I would have loved to see all the counts of each in piechart form with slices for each color :-)

What's the likelihood of getting a potato chip bag with only one chip?

Zero because QA :)

hmm, I'd disagree.

There is no formal definition of "a potato chip bag". Hence it's possible to "just" create "a single potato chip bag"...

Uh, congrats, I guess...

The best justification ever for making Skittles-based vodka :)

This is important and high quality research. It deserves an Ig Noble nomination.

..... Let's get this guy some cancer microscopy data!

I've always wanted to write an app that can take a picture of a poured out bag of skittles or M&Ms and not only count how much of each color, but use the relative color difference to tell you what brand (skittles/m&m) or sub-brand (regular, tropical, peanut, etc) and how many calories are on the table.

No reason, no monetary plan.


I posted it. I'm a 70-year-old retired (38 years experience) neurosurgical anesthesiologist. This is the first time in my life anyone has ever alluded to the possibility that I might be on the spectrum. Better late than never!

No one's mentioned this, but it's unlikely Skittles are "randomly distributed". Instead it's likely that in the Skittles factory there's some system that attempts to reasonably distribute colours so no one bag is too skewed. So the whole premise is faulty.

No one has mentioned it? Did you read the article? The article discusses exactly what you just said.

Where? I don't see it anywhere. There's some discussion of total volume perhaps being non-independent from box to box, so an underfilled box is followed by an overfilled one, but this isn't the point the grandparent is making. The grandparent is making the point that there are probably systems in place to prevent a box from being 80% red, say, so that the assumption each individual skittle in a box is independently uniformly drawn from each possible color does not likely accurately model the dynamics.

Author of the article here; you have a point that there is no explicit discussion of validating this assumption, beyond the variability shown in the colored curves in the "count per pack" plot.

Having said that, this small sample is indeed reasonably consistent (or at least not inconsistent) with that iid assumption for the color of each individual Skittle. We would not expect to see any 80+% red packs even assuming that color was perfectly uniformly iid, because the probability of observing such a pack is so small (less than 10^(-19)).

However, still assuming this model, we should expect to see packs with very small proportion of reds... and we do, with one pack having just 3 red Skittles, for example. The entire distribution of proportion of red follows the assumed binomial distribution very closely.
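That "less than 10^(-19)" tail is quick to verify exactly; the sketch below assumes a 59-Skittle pack, which is roughly the per-pack average in the dataset.

```python
import math

# P(a pack is at least 80% red) when each of n Skittles is independently
# red with probability 1/5: an exact binomial upper-tail sum.

def binom_tail(n, p, k_min):
    """P(X >= k_min) for X ~ Binomial(n, p), summed exactly."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(k_min, n + 1))

n = 59
k_min = math.ceil(0.8 * n)           # 48 reds out of 59
p80 = binom_tail(n, 0.2, k_min)
print(f"P(>= 80% red) ≈ {p80:.1e}")  # comfortably below 1e-19
```

So even across all 468 packs, an 80%-red pack under the iid model is effectively impossible, which is why its absence tells us nothing about factory mixing.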

I would bet the Skittles factory has no such system - the data appears to be highly consistent with what you would expect from mixing the skittles well and making entirely random bags.

Kind of disappointed at the methodology. It would have been more impressive to turn this around in a day with ML.
