
The Missing 11th of the Month (2015) - shawndumas
http://drhagen.com/blog/the-missing-11th-of-the-month/
======
Unknoob
This was a really fun read. The explanation for the 1860s error rate peak blew
my mind. I wonder how many other characters could generate similar errors and
how long it will take for us to notice the patterns.

~~~
drhagen
The only reason anyone noticed the error with the 11th of the month was
because we had enough similar phrases (8th, 12th, etc.) that we could make a
statistical argument that something was amiss. It would be hard to even detect
most other errors in the database.

------
drhagen
Oh hello HN. I'm the author of the linked article. Someone texted me that I
was on the front page of HN, though I couldn't guess what for. I guess I'll
stick around a bit and answer questions.

------
mst
Note that, the article being from 2015, the promised future post for the 23rd
is already live at
[http://drhagen.com/blog/the-missing-23rd-of-the-month/](http://drhagen.com/blog/the-missing-23rd-of-the-month/)
and differently fascinating.

------
indigochill
There are some other amusing typographical glitches. For instance, I was once
curious about the origin of "badass" so I searched Google Books for usages. I
got several hits from the 19th century and was excited at my historical
discovery until I checked them and saw Google was just misreading things.

------
p1mrx
Coincidentally, this past 11th was the first "missing day" of my life,
according to the local clocks: I flew out of SFO Friday night, and landed in
SIN Sunday morning.

------
interfixus
Oh yes. I grew up in the sixties, and I pounded my father's portable Hermes
since long before I could actually write. Transitioning to computers in the
early eighties, it took me unreasonable amounts of time and effort to unlearn
my muscle memory and stop typing l and o for 1 and 0.

Also, in seventh grade I took a typewriting class. Since starting school I had
turned in absolutely all the homework I could get away with in neatly
typewritten form, while I got the impression that none of my classmates had
ever even gone near such a writing machine. I expected great things!

Alas, I flunked. The only one to do so, and hugely. I still typed better than
anyone else, but I never got the hang of touch-typing. Sad to say, I haven't
to this day. I seem to be stuck with the method developed by four-year-old me.

~~~
eru
I converted to Dvorak out of general geekery a few years ago. But since I did
that on a 'normal' keyboard, it essentially forced me to learn how to
touch-type.

(If you want to give it a try, have a look at Colemak, I hear great things
about it on HN every once in a while. Or Neo2 layout, if you need non-English
characters.)

~~~
adrianN
For true geekery, compile a corpus of text that you typed and generate a good
layout for your inputting needs using a genetic algorithm or something like
that. There is some code on GitHub that you can steal.

~~~
eru
Yeah, you could do that. Dvorak has the advantage of being comparatively
mainstream, it even comes with random Windows computers.

Also, by now I have a keyboard that speaks Dvorak natively: the Kinesis
Advantage. Of course, that keyboard is programmable enough that I could make
it speak custom layouts too, or just fix it in the OS.

------
jbclements
Okay, this is going to sound mean, but this is like the _definition_ of
p-hacking. When you look at 30 values, you simply can't be surprised that one
of them is lower than the mean, with a p value around 1/20th. Use something
like a Bonferroni correction to get a significance level of 1/600. Does the
result still stand up? In fact, there's an xkcd about this very topic.
[https://www.xkcd.com/882/](https://www.xkcd.com/882/)
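
For concreteness, the arithmetic behind that correction can be sketched in a
few lines of Python. This is only an illustration: the 0.05 level and the
count of 30 come from the comment itself, the function name is made up, and
the family-wise error figures assume the 30 tests are independent, which
day-of-month counts probably are not.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise rate at or below alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


ALPHA = 0.05  # the "1/20th" level mentioned in the comment
N_DAYS = 30   # number of values being compared, as in the comment

threshold = bonferroni_threshold(ALPHA, N_DAYS)

# Chance of at least one false positive across all tests,
# ASSUMING the tests are independent (an illustrative simplification).
uncorrected_fwer = 1 - (1 - ALPHA) ** N_DAYS
corrected_fwer = 1 - (1 - threshold) ** N_DAYS

print(f"per-test threshold: {threshold:.6f} (about 1/{round(1 / threshold)})")
print(f"family-wise error rate without correction: {uncorrected_fwer:.3f}")
print(f"family-wise error rate with correction:    {corrected_fwer:.3f}")
```

Under the independence assumption, testing 30 values at the uncorrected level
makes at least one "significant" result more likely than not, which is the
point of the xkcd strip.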

~~~
drhagen
I completely agree that it's important to take this kind of thing into account
when approaching a problem like this. As I say in the post, "There are 31 days
and one of them _has_ to be smallest. Maybe the 11th isn’t an outlier; it’s
just on the smaller end and our eyes are picking up on a pattern that doesn’t
exist."

I'll admit that a straight p-value is not the appropriate statistic here. I
don't even know what the perfect statistic for this problem is. A
Bonferroni correction is not enough because not only is the 11th of the month
the lowest for a particular year--it's the lowest for every year.

I was convinced that this was real when I looked at the first line graph of
the post. The 11th is the lowest either every year or almost every year, being
3-5 standard deviations below the mean for the bulk of the last 200 years.
That just can't happen by chance no matter how you slice it.

If anyone knows the proper way to calculate a statistic on something like
this, I would love to hear about it.
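
One rough way to put a number on "lowest every year", offered as a sketch
rather than as the proper statistic: treat each year as an independent draw
in which each of the 31 days is equally likely to come out lowest, and ask
how likely it is that one day is lowest in at least k of n years. The
independence and equal-chance assumptions are simplifications, the function
names are hypothetical, and the example counts (18 of 20 years) are made up
for illustration, not taken from the article's data.

```python
from math import comb


def prob_lowest_at_least(k: int, n_years: int, n_days: int = 31) -> float:
    """P(a pre-chosen day is the lowest in at least k of n_years years).

    ASSUMES years are independent and every day is equally likely
    to be the lowest in any given year (the "pure chance" null).
    """
    p = 1.0 / n_days
    return sum(
        comb(n_years, j) * p**j * (1 - p) ** (n_years - j)
        for j in range(k, n_years + 1)
    )


def any_day_bound(k: int, n_years: int, n_days: int = 31) -> float:
    """Upper bound on P(some day is lowest in at least k years).

    Multiplying by n_days is a Bonferroni-style correction for having
    picked the day after looking at the data.
    """
    return min(1.0, n_days * prob_lowest_at_least(k, n_years, n_days))


# Hypothetical counts, for illustration only.
N_YEARS = 20
K_LOWEST = 18

print(f"pre-chosen day: {prob_lowest_at_least(K_LOWEST, N_YEARS):.3e}")
print(f"any day (bound): {any_day_bound(K_LOWEST, N_YEARS):.3e}")
```

This only uses which day was lowest and ignores how far below the mean it
was, so it throws away information. A run of the same day being lowest year
after year still yields a vanishingly small probability under this null,
even after correcting for all 31 days.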

