
Reminds me of how the Allies estimated the production capacity of German tanks during WW2: https://medium.com/dataseries/how-data-science-gave-the-alli...



If you're interested, I referenced this and used the technique to estimate the quantity of STS SRBs recently:

https://space.stackexchange.com/questions/9261/how-many-soli...

Oh, no.... The date on that post is 2015... "recently"... I'm old...


My intuitive answer to that problem was to say that if we assume the captured serial numbers are randomly distributed, and the numbering starts at 1, then they will have the same average as all the numbers, so the estimate should be the average of the captured serial numbers times 2. That gives a result close to the formula used in this article, but not the same. I'm not sure where the flaw is.


If there are 100 tanks, and you get 1, 2, 5, and 99, your method would give about 54 tanks ((1 + 2 + 5 + 99)/4 * 2 = 53.5), which is obviously wrong, since you've already seen tank number 99.

Your error is in stating "if we assume the captured serial numbers are randomly distributed" - you're assuming they're -evenly- spread across the range. A random sample != an evenly spread sample, especially a small one.

Their method would give you about 123 as a guess (m + m/k - 1 = 99 + 99/4 - 1 = 122.75). It's including the known info (i.e., adding "m") to take into account the fact that they're not necessarily evenly distributed.

On that note, if you continued to get tanks at low numbers (3, 4, 6, etc), averaging gets -less- accurate, because that 99 becomes more and more of an outlier. Their method gets MORE accurate, again, because they're taking advantage of all data that is known (we know it goes at least to 99), and averaging doesn't. The new low numbers we've added make it less likely that there are many tanks, and the formula in the link takes that into account with m/k.

Both methods will be accurate if you have 100% of the data, but taking twice the average ignores known data, so the sparser the data the less likely it is to be correct.
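
To make the comparison concrete, here is a rough Python sketch of both estimators. It assumes the article's formula is the standard m + m/k - 1 (m = largest serial seen, k = number of tanks captured); the function names and the little simulation are my own illustration, not anything from the article.

```python
import random
from statistics import mean

def twice_mean_estimate(serials):
    """Intuitive method: twice the average of the captured serials."""
    return 2 * mean(serials)

def max_based_estimate(serials):
    """Assumed article formula: m + m/k - 1."""
    m, k = max(serials), len(serials)
    return m + m / k - 1

def rmse(estimator, n_tanks, k, trials=10_000, seed=0):
    """Root-mean-square error of an estimator over simulated captures."""
    rng = random.Random(seed)
    population = range(1, n_tanks + 1)
    squared_error = 0.0
    for _ in range(trials):
        sample = rng.sample(population, k)  # k distinct tanks, no repeats
        squared_error += (estimator(sample) - n_tanks) ** 2
    return (squared_error / trials) ** 0.5

sample = [1, 2, 5, 99]
print(twice_mean_estimate(sample))  # 53.5
print(max_based_estimate(sample))   # 122.75

# Typical error of each method with 100 real tanks and 4 captured.
print(rmse(twice_mean_estimate, 100, 4))
print(rmse(max_based_estimate, 100, 4))
```

In the simulation the max-based estimate should show the smaller typical error, and unlike twice-the-mean it can never come out below a serial number you have already seen.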


Hmmm, on the other hand, suppose you first find a tank with the serial number 1234.

Then the next 50 tanks you find are all from the range [1, 100].

Is it more reasonable to assume that there are around 1258 tanks, or that there are probably closer to 100 tanks, and that first one with the very large serial number was not a sequentially numbered tank?


Certainly!

But, from the article's initial proposition - "You do know that the Germans have a sequential numbering system (1, 2, …, n)" and in giving historical context "On investigation, it became clear that the serial numbers were sequential, without gaps."

So, yes, without that being a prior, of course it's more likely that the outlier is a strange one-off, and you'd do better to exclude it from your data set (and/or continue to investigate, because it's NOT at all clear yet that the serial numbers are sequential).

But that context and ordering matters. Assume just the opposite series of events - you started by finding 50 tanks with serial numbers in [1, 100]. Then three or four months go by without any tank serials being sent to you. And then you get 1234. 1258 tanks seems really reasonable at that point (and, in fact, would fit the reality; the Germans were producing ~256 tanks per month per the article).
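
For what it's worth, the formula itself can't tell those two stories apart: it only sees the largest serial and the sample size. A quick sketch, again assuming the standard m + m/k - 1 form, and assuming (hypothetically) that the largest of the 50 low serials is exactly 100:

```python
def max_based_estimate(m, k):
    """Assumed formula m + m/k - 1: m = largest serial, k = sample size."""
    return m + m / k - 1

# All 51 tanks, with 1234 as the largest serial.
with_outlier = max_based_estimate(1234, 51)

# Only the 50 tanks from [1, 100]; a maximum of exactly 100 is an assumption.
without_outlier = max_based_estimate(100, 50)

print(round(with_outlier, 1))  # 1257.2
print(without_outlier)         # 101.0
```

So whether you land near 1258 or near 100 comes down entirely to whether you trust that one serial, which is a judgment about context rather than something the formula can settle.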


Great read!

In comparison, I have to wonder why the "intelligence estimates" were such severe overestimates.


My intuition says counter-intelligence. The Allies were using intercepted communications, visual confirmation, captured sources, etc.

You can send fake reports if you think the other side is listening. You can move or mock material if you think the other side is watching. You can feed false information if you are captured.


And this is why you may be better off solving the “serial number problem” not by switching to entirely random serial numbers, but by switching to a numbering scheme that implies the false data you want the enemy to find.


Thank you! Great read, indeed!


paywalled for me.



