
I found two identical packs of Skittles among 468 packs - bookofjoe
https://possiblywrong.wordpress.com/2019/04/06/follow-up-i-found-two-identical-packs-of-skittles-among-468-packs-with-a-total-of-27740-skittles/
======
fouc
>So, what’s the point? Why bother with nearly three months of effort to
collect this data? One easy answer is that I simply found it interesting. But
I think a better answer is that this seemed like a great opportunity to
demonstrate the predictive power of mathematics. A few months ago, we did some
calculations on a cocktail napkin, so to speak, predicting that we should be
able to find a pair of identical packs of Skittles with a reasonably– and
perhaps surprisingly– small amount of effort. Actually seeing that effort
through to the finish line can be a vivid demonstration for students of this
predictive power of what might otherwise be viewed as “merely abstract” and
not concretely useful mathematics.

~~~
indigodaddy
Can I ask the point of quoting a snippet of the article without comment?

~~~
ademup
I admit; I don't read most articles. I prefer the HNer snippets like this. I
generally find HN comments to be more cogent and provocative than most
editorial nonsense. So, thank you parent, for choosing and posting a snippet
which you considered worthy.

~~~
chasontherobot
Yes, it's pretty obvious that lots of HN readers don't actually read the
article.

~~~
craftyguy
At this point, it should be an official HN rule: "do not read the article, go
directly to the comments and speculate what the article is about based on the
character-limited title"

~~~
TeMPOraL
For many of us, submitted articles are just social objects for provoking a
discussion on particular topics. Also, sometimes the articles are really worth
a read, other times they're garbage; usually they're something in-between.
Going straight to comments is the fastest way to discover which is which.

Commenting without reading the article is fine. Speculating about what's in
the article without having read it is a problem.

~~~
Shivetya
so we're reddit, just without the memes?

that does seem a bit disappointing. I think anyone simply quoting the article
needs to explicitly show that that is the case because it was very misleading.

still this article was fascinating to me, from the idea that someone would go
to the effort to the results of such effort. then top off my fascination with
the idea of trying elsewhere in the country should there be more than one
manufacturing point or trying to buy up a production lot as seeing the results
of that as well.

Now how many packs of M&Ms? There are six colors there. If you go with the
peanut version it is probably worse, because then you have the variability of
the peanuts, which would raise the chances of encountering packs with more
variation in just the number of candies.

~~~
TeMPOraL
> _so we're reddit, just without the memes?_

Kind of? And with slightly higher average standards of discourse? And I
believe this is actually a compliment.

In my experience, HN comments under an article are almost always more useful and
more informative than the original article. The same is the case with various
subreddits. When I read, say, /r/SpaceX, I also immediately jump into
comments, as there is better quality info there.

This applies to mainstream news stories in particular. On HN, there's a good
chance you'll find someone who was - or knows someone who was - involved in
the topic first-hand, and who then proceeds to debunk various nonsense a
typical news story contains. That's a huge value-added.

> _I think anyone simply quoting the article needs to explicitly show that
> that is the case because it was very misleading._

Sure, I think making it clear what text is quoted (and from where) should be
an obvious rule. And it doesn't have anything to do with whether or not others
read the article; it simply saves brain cycles trying to understand the
comment.

------
dekhn
I'm so glad to see there are people who think like me. I have spent thousands
of hours doing various projects like this.

The only difference is, I normally spend my time automating the process with
computer vision (a skittle sorter/counter wouldn't be that hard to build;
opening the bag is harder than identifying colors). And then I never really
finish the project.

~~~
yots
You might enjoy this :) [https://willemm.nl/mm-skittles-sorting-
machine/](https://willemm.nl/mm-skittles-sorting-machine/)

~~~
Insanity
That reminds me of this post about sorting 2 tons of lego:
[https://jacquesmattheij.com/sorting-two-metric-tons-of-
lego/](https://jacquesmattheij.com/sorting-two-metric-tons-of-lego/)

~~~
dekhn
that post was a revelation to me and led me down the path of making object
detection classifiers for various projects. I was shocked at how easy it was
to build a classifier that was as good as a human, but faster (or more
accurate than a human in the same amount of time).

I ended up learning there is a whole area of computer vision in industry-
relatively "boring" stuff like just looking at a line of bottles going by. I
went on a tour of the money making machines in DC and they stream sheets of
bills by industrial vision cameras to detect whether the bills are within QC.

Still amazes me people get paid to do this. To me it's just like a big fun
game.

------
eriktrautman
The experiment is great hands on math but I would have enjoyed a discussion of
variance versus expected value and the difference between short and long term
averages... it’s too easy to infer that everything is great because he was
lucky enough to get within the target range but the likelihood of that
occurring is not actually that high and was only implied by the shape of the
Monte Carlo distribution in his previous post. When such experimental results
are this conveniently “accurate”, amateurs in the audience may take away the
kinds of wrong inferences which create “it’s a hot day so must be global
warming” type of logical inaccuracies.

~~~
possiblywrong
Author of the article here; this is a great point. This experiment initially
stemmed from a nice analytical solution to the problem of computing the
expected value (via generating functions as described in the post). Computing
other moments, let alone the entire distribution, required some Monte Carlo
simulation, as shown at the end of the first article
([https://possiblywrong.wordpress.com/2019/01/09/identical-
pac...](https://possiblywrong.wordpress.com/2019/01/09/identical-packs-of-
skittles/)) before I started the experiment.

And even this histogram assumes a distribution of _total_ number of Skittles
per pack (that varies) that I had to guess at beforehand. In hindsight, the
final sample distribution suggests that I probably initially overestimated the
true variance, and thus also _overestimated_ the expected number of packs I
would need to inspect. In other words, this experiment arguably took _longer_
than "average."

So you're right-- this experiment _could_ have extended into 700 packs, 800
packs... and still have been consistent with the assumed model, but I would
have simply been in an unfortunate 90-th percentile possible universe where it
took much longer than "average."
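
For anyone who wants to see that spread directly, here is a minimal Monte Carlo
sketch of "packs opened until the first exact duplicate." It is not the code
from the article. The pack-size model (a rounded normal with mean 59.3, roughly
27740/468, and a guessed standard deviation of 2) and the names `random_pack`,
`packs_until_duplicate` and `simulate` are assumptions made up for this
illustration; colors are taken as uniform and iid.

```python
import random
from statistics import mean, quantiles

COLORS = 5  # number of flavors in a pack


def random_pack(rng, mean_size=59.3, sd_size=2.0):
    """Return one simulated pack as a tuple of per-color counts.

    The pack size is an ASSUMED rounded normal; each Skittle's color is
    drawn uniformly and independently.
    """
    size = max(1, round(rng.gauss(mean_size, sd_size)))
    counts = [0] * COLORS
    for color in rng.choices(range(COLORS), k=size):
        counts[color] += 1
    return tuple(counts)


def packs_until_duplicate(rng, **pack_kwargs):
    """Open packs until one exactly matches an earlier pack; return how many were opened."""
    seen = set()
    while True:
        pack = random_pack(rng, **pack_kwargs)
        if pack in seen:
            return len(seen) + 1
        seen.add(pack)


def simulate(trials=100, seed=0, **pack_kwargs):
    """Run independent trials and return the sorted stopping times."""
    rng = random.Random(seed)
    return sorted(packs_until_duplicate(rng, **pack_kwargs) for _ in range(trials))


if __name__ == "__main__":
    results = simulate()
    deciles = quantiles(results, n=10)  # 9 cut points: 10th ... 90th percentile
    print("mean packs needed:", mean(results))
    print("10th / 50th / 90th percentile:", deciles[0], deciles[4], deciles[8])
```

Changing `sd_size` shows how sensitive the result is to the guessed variance of
the pack size: a wider spread of totals means more possible packs, so more packs
are needed before a repeat.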

------
metaphor
> _From 12 January through 4 April, I worked my way through 13 boxes, for a
> total of 468 packs, at the approximate rate of six packs per day._

Honestly thought I was going to read a "just cause" blog on machine vision and
process automation, e.g. 3 months to develop a functional prototype and train
the system, 3 days to process 468 packs...and automated repeatability at the
end of it all.

~~~
SlowRobotAhead
I assume most people read the title and were expecting machine vision. I
definitely did within the first two seconds of reading.

~~~
Insanity
Yeah as soon as I saw the title I made the assumption that it'd be automated.

I can't imagine doing this manually - though it'd take me more time to write a
Machine Vision solution rather than just doing it manually. :P

------
joemaller1
This was really hard to read through because I kept dreaming up systems and
applications which might automate the whole process.

------
bakul
This guy missed an excuse to build a Lego sorting machine....

~~~
pixl97
Sorting machine?, this is hacker news, a visual AI that would count each color
is the rage these days.

~~~
DCoder
Lego Mindstorms has a color sensor, so theoretically you _can_ build that AI
in Lego.

~~~
metaphor
I suspect the parent's suggestion is more along the lines of 1 snapshot that
captures all Skittles in a pack, then ML object detection to classify and
count outcome...processes an entire pack in a single shot, as opposed to 1
Skittle at a time.

~~~
rplnt
> ML

Why? There is zero need for this. I know it's one of _the_ buzzwords of the
recent years, but identifying colors? Come on.

~~~
metaphor
Why not? Is the problem _really_ just naive color identification? It looks
more like classifying uniformity, shape, size, orientation, character imprint,
malformation, anomalous objects, etc. Can the problem be solved with a more
traditional image processing toolbox? Sure, why not? But what fun is there if
we're not swinging HN's favorite rubber mallet at the problem...and then doing
it again and again just because.

Keeping with the context of the GP's remark, a solution is more about
achieving process speed while minimizing error and human intervention in the
face of uncertainty--i.e. single-shot at the package level, which is at least
a 50x speed increase over individual mechanical sorting.

------
dmitryminkovsky
Stats is the science of sciences in my opinion, and anyone who brings it to
life like this is awesome.

~~~
robertAngst
Have you seen Efficiency Is Everything's data on food?

[https://efficiencyiseverything.com/food/](https://efficiencyiseverything.com/food/)

~~~
ed312
That site is amazing and engrossing. Thank you for this link - I think I'm
going to lose myself in excel for a few hours. Nutrients/calorie and
nutrients/dollar are incredibly useful metrics for anyone while shopping!

~~~
hopler
It's not really useful. It's extremely easy to get plenty of nutrients and
calories extremely cheaply, but that's not what people are paying for when
they shop for food.

~~~
_jal
Please share your data on everyone's shopping habits.

I take it you don't engage in strenuous physical activities far from grocers.
For instance.

~~~
hopler
How does distance traveled relate to willingness to pay a higher price for
food?

~~~
benaiah
It’s almost as if it costs money to transport goods or something.

------
city41
I wonder if it would have been worth it to build a device to scan the skittles
for you.

edit: would love an explanation for the downvotes

~~~
omarchowdhury
A device seems like overkill. Write code to apply machine vision to this grid:
[https://possiblywrong.files.wordpress.com/2019/04/skittles_a...](https://possiblywrong.files.wordpress.com/2019/04/skittles_all.png)

~~~
marmshallow
I think the fastest and most accurate process would be to:

* empty a pack of skittles onto white paper and take a photo

* use some existing image recognition libraries to count each color

* export to csv or smn and do excel magic

~~~
laurent123456
Even without an image recognition library, it should be possible to simply
count the pixels and how close to red, green, blue, etc. they are to know how
many skittles there are in the picture.
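
A rough sketch of that pixel-counting idea, in plain Python with no image
library. Everything specific here is an assumption for illustration: the
reference RGB values in `REFERENCE`, the `max_distance` cutoff, and the
`pixels_per_candy` calibration would all have to be measured from a real photo.

```python
from collections import Counter

# HYPOTHETICAL reference colors (RGB); real values need calibrating per photo.
REFERENCE = {
    "red": (200, 30, 40),
    "orange": (240, 130, 30),
    "yellow": (240, 220, 50),
    "green": (60, 180, 60),
    "purple": (90, 40, 110),
}


def squared_distance(a, b):
    """Squared Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_label(pixel, max_distance=80):
    """Return the closest reference color name, or None if nothing is close
    (for example white paper background)."""
    label, ref = min(REFERENCE.items(), key=lambda item: squared_distance(pixel, item[1]))
    if squared_distance(pixel, ref) > max_distance ** 2:
        return None
    return label


def estimate_counts(pixels, pixels_per_candy):
    """Estimate candies per color from an iterable of RGB pixels.

    pixels_per_candy is an ASSUMED calibration: the average number of
    pixels one candy covers at this camera distance.
    """
    totals = Counter()
    for pixel in pixels:
        label = nearest_label(pixel)
        if label is not None:
            totals[label] += 1
    return {name: round(totals[name] / pixels_per_candy) for name in REFERENCE}


if __name__ == "__main__":
    # Synthetic "photo": 3 red and 2 yellow candies of 100 pixels each on white paper.
    fake = [(200, 30, 40)] * 300 + [(240, 220, 50)] * 200 + [(255, 255, 255)] * 1000
    print(estimate_counts(fake, pixels_per_candy=100))
```

Dividing pixel area by an average candy size is only as good as the
calibration; shadows, glare, and overlapping candies would all skew the counts.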

------
tlrobinson
It's interesting this person can program, but chose to manually count 27,740
Skittles instead of automate it in some way.

It feels like an almost-trivial computer vision problem.

~~~
tayo42
computer vision is trivial? when did that happen? and how was it made trivial?

~~~
dekhn
I've done a few computer vision projects lately and it's really amazing what
you can get done with opencv for relatively clean data. In this case you're
trying to identify circles and classify their color into one of several
groups. This is pretty common, the main challenge is cleaning up the raw data
so that the hough transform runs quickly and accurately, and the color cluster
identifier is robust.

If you're pretty good with Tensorflow, you could do it with an object
detector. I've built absurdly accurate object detectors using the TF Object
Detector tutorial and just a little hand labelled data plus some good
synthetic augmentation.

For this project's scope and scale it's probably not worth automating unless
you're going to be repeatedly running the process.

------
z3t4
I like practical statistics, for example when betting red/black in a casino,
what is the chance that you would lose 10 times in a row ? Just keep doubling
they say, but eventually you will get a bad streak and won't have enough money
or you'll hit the limit.
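
A quick way to put numbers on that, as a sketch. The wheel is assumed to be a
single-zero (European) one, so an even-money bet loses with probability 19/37;
the function names are made up for this example.

```python
from fractions import Fraction

# ASSUMPTION: single-zero wheel with 18 red, 18 black, 1 green pocket.
P_LOSE = Fraction(19, 37)


def prob_losing_streak(k, p_lose=P_LOSE):
    """Probability that the next k even-money bets all lose."""
    return p_lose ** k


def total_lost_after(k, base_stake=1):
    """Total lost after k straight losses when the stake doubles each time."""
    return base_stake * (2 ** k - 1)


def next_stake(k, base_stake=1):
    """Stake required for the bet that follows k straight losses."""
    return base_stake * 2 ** k


if __name__ == "__main__":
    k = 10
    p = prob_losing_streak(k)
    print(f"P({k} losses in a row) = {float(p):.5f}, about 1 in {float(1 / p):.0f}")
    print(f"lost so far: {total_lost_after(k)} units, next bet: {next_stake(k)} units")
```

This is the chance for one particular run of 10 bets. Over a long session there
are many overlapping runs, so the chance of hitting such a streak somewhere is
considerably higher.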

~~~
sharkweek
I remember an experiment in high school math of some variety.

The teacher had us all "guess" what twenty coin flips would look like. The
longest streak any student wrote on their paper was maybe 4-ish in a row
or something.

He then had us all actually flip the coins and record the results. One student
had like 11 in a row, most hit a streak of somewhere between 5-8 of the same
result.

Lesson learned, we're really bad at guessing 50/50 streaks.

~~~
saagarjha
Empirically, you should (~85% of the time) be seeing a streak of about 3-6;
anything 8 or larger has a probability of about 1 in 20.
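
Figures like these can also be computed exactly instead of empirically. A
minimal sketch, assuming fair independent flips and counting a streak of either
heads or tails; the function names are invented for this example. It uses the
fact that a flip sequence is a first symbol plus a list of run lengths that sum
to n.

```python
from fractions import Fraction


def prob_longest_run_at_most(n, k):
    """P(longest run of identical outcomes <= k) in n >= 1 fair coin flips.

    Sequences with all runs <= k correspond to compositions of n into
    parts of size 1..k, times 2 for the choice of the first outcome.
    """
    if k <= 0:
        return Fraction(0)
    comps = [1] + [0] * n  # comps[m] = number of compositions of m with parts <= k
    for m in range(1, n + 1):
        comps[m] = sum(comps[m - j] for j in range(1, min(k, m) + 1))
    return Fraction(2 * comps[n], 2 ** n)


def longest_run_pmf(n):
    """Exact distribution of the longest run length in n fair flips."""
    return {
        k: prob_longest_run_at_most(n, k) - prob_longest_run_at_most(n, k - 1)
        for k in range(1, n + 1)
    }


if __name__ == "__main__":
    n = 20
    pmf = longest_run_pmf(n)
    for k in range(1, 12):
        print(f"longest run = {k:2d}: {float(pmf[k]):.4f}")
    print("P(3 <= longest <= 6):", float(sum(pmf[k] for k in range(3, 7))))
    print("P(longest >= 8):     ", float(1 - prob_longest_run_at_most(n, 7)))
```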

------
praeconium
Big fan of skittles - never crossed my mind. Very cool! Perhaps try different
kinds as another follow up? Machine learning and openCV can do the trick for
counting without sorting.

------
towndrunk
So if I'm a statistical stupid what books do you recommend for the beginner?

~~~
RobertDeNiro
Andrew Gelman has some pretty good books/lectures, but it might be too
advanced.

------
gibspaulding
This immediately made me wonder what the odds are of getting a bag of all the
same skittles. My stats are rusty so feel free to correct me, but here's what
I got:

(1/5)^59 = 1/(1.73*10^41)

Apparently ~200 million skittles are made daily, so at that rate, we might
expect to get a monochrome bag of skittles after ~10^33 years.
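
The same back-of-the-envelope estimate as a short script, for anyone who wants
to vary the inputs. It assumes 59 Skittles per bag, five equally likely and
independent colors, and the ~200 million per day figure quoted above. One
refinement: (1/5)^59 is the chance of a bag in one *particular* color, so the
chance of a monochrome bag in *any* color is 5 times that, i.e. (1/5)^58.

```python
from fractions import Fraction

COLORS = 5
PACK_SIZE = 59                  # ASSUMED Skittles per bag, as in the estimate above
SKITTLES_PER_DAY = 200_000_000  # figure quoted in the comment, not verified

# Chance that every Skittle in a bag is one specific color, e.g. all red.
p_specific = Fraction(1, COLORS) ** PACK_SIZE
# Chance that a bag is monochrome in any of the five colors.
p_any = COLORS * p_specific

packs_per_year = SKITTLES_PER_DAY / PACK_SIZE * 365


def expected_years(p):
    """Expected years of production until the first bag with per-bag probability p."""
    return 1 / (float(p) * packs_per_year)


if __name__ == "__main__":
    print(f"P(all one given color) = {float(p_specific):.3e}")
    print(f"P(monochrome, any color) = {float(p_any):.3e}")
    print(f"expected wait (given color): {expected_years(p_specific):.2e} years")
    print(f"expected wait (any color):   {expected_years(p_any):.2e} years")
```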

------
jordn
Jesus Christ, apple and grape?!? Poor Americans... in the UK that's Lime and
Blackcurrant.

~~~
kosievdmerwe
Why poor Americans? It's purely a matter of taste.

Apple is culturally significant and Blackcurrants are banned in the US (due to
them carrying a disease that would devastate one of the forests). So one is
more important than lime and the other is totally unfamiliar.

~~~
SmellyGeekBoy
Blackcurrants are _banned_ in the US? Do they not even have Ribena over
there!?

~~~
avaika
You can consume blackcurrants in US. You can't grow it (in some states
nowadays). Read more at
[https://en.wikipedia.org/wiki/Blackcurrant#History](https://en.wikipedia.org/wiki/Blackcurrant#History)

------
kerouanton
The dataset itself is nice, but if weighing each pack had been part of
the procedure it would have been even more interesting, no? Anyway, this
dataset is already in my favorite examples of illustrating a birthday attack
for my crypto lectures ;)

------
Spectral
The picture he shows of all the Skittles' counts looks like something I
would've seen at the local museum. He missed his chance of taking an ultra
high resolution photo of it and putting it on display:

[https://possiblywrong.wordpress.com/2019/04/06/follow-up-
i-f...](https://possiblywrong.wordpress.com/2019/04/06/follow-up-i-found-two-
identical-packs-of-skittles-among-468-packs-with-a-total-
of-27740-skittles/#jp-carousel-3199)

------
justaaron
that is a seriously skewed candy budget... do you have any teeth left?

~~~
atdrummond
OP didn't eat them, per the post.

~~~
heavymark
He said, "Yeah, I learned from this experiment that I don’t actually like
Skittles, which is probably good, so a lot of Skittles were bagged and handed
off to relatives." To learn from this experiment that he doesn't like them,
that means he must have started out eating them. But yes, he didn't eat them
all (thank goodness for his teeth!)

------
lordnacho
This is brilliant. Takes a theory, iid colors, it turns out to be wrong, but
still gets a hell of a lot of conclusions out of it.

Since this is a nerd site, the next step is to use this:

[https://www.planet-gbc.com/](https://www.planet-gbc.com/)

Build a Lego contraption to push some skittles through a sensor that counts
them.

~~~
cshimmin
I don't think the author presented any evidence that invalidates the
assumption of IID colors. They show a histogram of the different counts of
each color, which seems more or less consistent with a uniform distribution.
There are some fluctuations but they're not surprising; the Poisson error bars
would be roughly +/- 0.16 in that figure. With error bars that large, it
would be surprising if the data were in more exact agreement with the flat
line; it's actually related to the same question that the author is examining
("what are the odds to observe all 5 colors at exactly their expected rate of
1/5 within measurement error?").

They do speculate that the number of candies per pack is not IID, i.e., that
there are (anti)correlations from one pack to the next. But without knowing
more about the packing process, and presumably also having some lot/serial
number information for each pack, it would be pretty hard to establish this.

------
mikorym
Cool thing to do.

This is similar to the birthday problem: How many people do you need to have
a probability of > x% that two are born on the same day?

It's something like 50 people to have a probability of > 80%. You can conduct
this experiment at a school, using each class as a sample experiment to see if
there are two pupils born on the same day.
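
For a classroom version, the exact numbers are easy to compute. A minimal
sketch, assuming 365 equally likely birthdays and ignoring leap years and
seasonal effects; the function names are made up here.

```python
def prob_shared_birthday(people, days=365):
    """P(at least two of `people` share a birthday), assuming uniform birthdays."""
    if people > days:
        return 1.0
    p_all_distinct = 1.0
    for i in range(people):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


def min_people_for(target, days=365):
    """Smallest group size whose shared-birthday probability exceeds target (< 1)."""
    people = 1
    while prob_shared_birthday(people, days) <= target:
        people += 1
    return people


if __name__ == "__main__":
    for n in (10, 23, 30, 35, 50):
        print(f"{n:2d} people: {prob_shared_birthday(n):.3f}")
    for target in (0.5, 0.8, 0.99):
        print(f"> {target:.0%} needs {min_people_for(target)} people")
```

Under this model a class of 50 clears the 80% mark comfortably, so the school
experiment should work with noticeably smaller classes as well.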

~~~
possiblywrong
Author of the article here-- right! This was the key "real world" motivation
for this experiment as an attempt at a pedagogical tool; from the article:

> As an aside, I think the fact that this particular concrete application
> happens to be recreational, or even downright frivolous, is beside the
> point. For one thing, recreational mathematics is fun. But perhaps more
> importantly, there are useful, non-recreational, “real-world” applications
> of the same underlying mathematics. Cryptography is one such example
> application; this experiment is really just a birthday attack in slightly
> more complicated form.

~~~
mikorym
Sometimes I find these concrete investigations necessary for our brains to
make peace with the _unreasonable effectiveness of mathematics_ , as it's been
called.

I would say one of the first great discoveries for a person is the exponential
series (a real-world example: population growth). Another is the divergence
of the harmonic series 1/n and convergence of 1/n^2 (my preferred real world
example: pizza slices that converge to 1 pizza or diverge to infinitely many).
E.g. give me 1/n slices for the rest of my life and I'll pay you $100 (-:

When travelling, I also have go-to experiments that I like doing (e.g.,
elementary proofs that the earth is round/spherical such as: great circles;
N-E-S-W always at 90 degrees; shadow angles [Eratosthenes]; seasons; etc.)

There are other things to investigate that are not really "proofs" or
"combinatorial evidence", but equally interesting. One example is using music
(esp. the piano) as a physical logarithm device. The music "sounds" additive
but the frequencies are multiplicative.

------
jackvalentine
I love this and it demonstrates the kind of statistics I wish I was a lot
better at.

I've just ordered a huge box of fun sized M&M packets and will try using
computer vision to count them to copy this study from start to finish for
learning purposes.

------
Zenst
It's nice to read science like this without it being reduced to a once a year
period of [https://www.improbable.com/ig/](https://www.improbable.com/ig/)
related links.

------
ngcc_hk
I love this. Hacking is not always practical. Let us have some fun. Or
sweet!!!

------
User23
Fun article. The Birthday Paradox was my introduction to recreational
mathematics.

------
megous
60kg.

------
mindfulplay
What a time to be alive!

~~~
HNLurker2
This is 2 minutes paper. And I am your host.

~~~
mindfulplay
Haha. So this ingenious method started with packet number 1 and examined thank
you to the Patreon supporters John Jim and Richard now onto Packet number 456
where the color red was found. Thank you and please subscribe like and share.

------
dblohm7
Don't care. Kill green apple and bring back lime!

~~~
thebouv
This is the real truth here.

------
pseudolus
Just out of curiosity, did you actually consume them?

~~~
m45t3r
OP said in the comments of his blog that he didn't actually consume them,
instead he gave them to relatives.

~~~
pseudolus
Thanks, I skipped over the comments. Apparently, though, it appears that the
OP may not have started immediately handing them out to his/her relatives:

"Yeah, I learned from this experiment that I don’t actually like Skittles,
which is probably good, so a lot of Skittles were bagged and handed off to
relatives. "

------
elif
I would recommend openCV to anyone considering a similar experiment.

A few hours with a blob detection tutorial would have saved hours of tedium.

------
mtw
I hope he uses all the Skittles to decorate a wall or something :) otherwise
it's a lot of wasted food

~~~
acct1771
High fructose corn syrup is barely food.

------
samstave
I would have loved to see all the counts of each in piechart form with slices
for each color :-)

------
lanius
What's the likelihood of getting a potato chip bag with only one chip?

~~~
always4getpass
Zero because QA :)

~~~
xxs
hmm, I'd disagree.

There is no formal definition of "a potato chip bag". Hence it's possible to
"just" create "a single potato chip bag"...

------
krick
Uh, congrats, I guess...

------
zie1ony
The best justification ever for making Skittles-based vodka :)

------
atrn
This is important and high quality research. It deserves an Ig Nobel
nomination.

------
subcosmos
..... Let's get this guy some cancer microscopy data!

------
MentallyRetired
I've always wanted to write an app that can take a picture of a poured out bag
of skittles or M&Ms and not only count how much of each color, but use the
relative color difference to tell you what brand (skittles/m&m) or sub-brand
(regular, tropical, peanut, etc) and how many calories are on the table.

No reason, no monetary plan.

------
personjerry
No one's mentioned this, but it's unlikely skittles are "randomly distributed".
Instead it's likely in the Skittles factory there's some system that attempts
to reasonably distribute colours so no one bag is too skewed. So the whole
premise is faulty.

~~~
jdreaver
No one has mentioned it? Did you read the article? The article discusses
exactly what you just said.

~~~
Chinjut
Where? I don't see it anywhere. There's some discussion of total volume
perhaps being non-independent from box to box, so an underfilled box is
followed by an overfilled one, but this isn't the point the grandparent is
making. The grandparent is making the point that there are probably systems in
place to prevent a box from being 80% red, say, so that the assumption each
individual skittle in a box is independently uniformly drawn from each
possible color does not likely accurately model the dynamics.

~~~
possiblywrong
Author of the article here; you have a point that there is no explicit
discussion of validating this assumption, beyond the variability shown in the
colored curves in the "count per pack" plot.

Having said that, this small sample is indeed reasonably consistent (or at
least not inconsistent) with that iid assumption for the color of each
individual Skittle. We would not _expect_ to see any 80+% red packs even
assuming that color was perfectly uniformly iid, because the probability of
observing such a pack is so small (less than 10^(-19)).

However, still assuming this model, we _should_ expect to see packs with very
_small_ proportion of reds... and we do, with one pack having just 3 red
Skittles, for example. The entire distribution of proportion of red follows
the assumed binomial distribution very closely.
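
A sketch of how those tail probabilities can be checked under the assumed
model. The pack size of 59 is an assumption here (close to the sample average
of 27740/468); the helper names are made up for this example.

```python
from fractions import Fraction
from math import comb


def binom_pmf(n, k, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def prob_at_least(n, k, p):
    """Upper tail P(X >= k) of a binomial(n, p)."""
    return sum(binom_pmf(n, j, p) for j in range(k, n + 1))


def prob_at_most(n, k, p):
    """Lower tail P(X <= k) of a binomial(n, p)."""
    return sum(binom_pmf(n, j, p) for j in range(0, k + 1))


N = 59                    # ASSUMED pack size, near the observed average
P_RED = Fraction(1, 5)    # uniform iid colors
EIGHTY_PERCENT = (4 * N + 4) // 5  # smallest count that is at least 80% of N

mostly_red = prob_at_least(N, EIGHTY_PERCENT, P_RED)
few_red = prob_at_most(N, 3, P_RED)

if __name__ == "__main__":
    print(f"P(at least {EIGHTY_PERCENT} of {N} red) = {float(mostly_red):.3e}")
    print(f"P(at most 3 of {N} red) = {float(few_red):.3e}")
    print(f"expected such packs among 468: {468 * float(few_red):.2f}")
```

Using `Fraction` keeps the arithmetic exact until the final conversion to
float, which matters little here but avoids any doubt about rounding in a tail
this small.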

------
rorykoehler
Kind of disappointed at the methodology. It would have been more impressive to
turn this around in a day with ML.

