
English Letter Frequency Counts: Mayzner Revisited or ETAOIN SRHLDCU (2013) - signa11
http://norvig.com/mayzner.html
======
gumby
In the not too distant past you could earn a PhD by writing a concordance
(a massive index, essentially) of every word in the works of Shakespeare, all
mentions of the differences between men and women, etc. The same went for
generating pre-computed tables (books of logarithms, roots, etc.).

These books were quite useful so that work really did have valuable impact,
but what a waste of resources!

------
alejohausner
Oddly enough, when you play hangman, you should use this sequence instead:

ESIARN TOLCDU

In hangman, you are guessing words, not text. The high frequency of the
trigram THE makes T and H more probable in a text corpus, because every
repeated occurrence of a word is counted. But if every _word_ is equiprobable,
as in a dictionary list of words, each word is counted once and you get the
above sequence.

[http://datagenetics.com/blog/april12012/index.html](http://datagenetics.com/blog/april12012/index.html)
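
To make the text-versus-dictionary difference concrete, here is a minimal sketch. The helper `letter_order` and the toy sentence are my own made-up illustration, not the method used by Norvig or the linked article; ties are broken alphabetically, which is an arbitrary choice.

```python
from collections import Counter


def letter_order(words):
    """Return the letters in `words`, most frequent first (ties: alphabetical)."""
    counts = Counter()
    for word in words:
        # Count only alphabetic characters, ignoring case.
        counts.update(ch for ch in word.lower() if ch.isalpha())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return "".join(letter for letter, _ in ranked)


# Toy corpus (hypothetical): "the" is repeated, as it would be in real text.
text = "the cat and the dog saw the other hen"
tokens = text.split()            # running text: repeats are counted
types = sorted(set(tokens))      # dictionary view: each word counted once

by_text = letter_order(tokens)
by_dictionary = letter_order(types)
print("text order:      ", by_text)
print("dictionary order:", by_dictionary)
```

In the running text the letters of "the" lead the ranking; once each word is counted only once, they lose that advantage. A real comparison would need a real corpus and a real word list, of course.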

~~~
bloak
Someone should do a proper game-theoretical analysis of hangman: you're
playing hangman with DEATH, who has chosen an n-letter word according to an
optimised strategy to minimise your chances of survival, assuming that you too
will be following an optimal strategy. So a word containing very few common
letters will probably be chosen with a higher probability than one that
contains lots of common letters, so a frequency analysis of a dictionary won't
directly help you.
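
A rough sketch of the easy half of that idea, assuming a guesser who naively works through a fixed letter order. This is not the game-theoretic optimum described above (there are no mixed strategies and the guesser never adapts); the functions `misses_for` and `hardest_word`, the word list, and the letter order are all hypothetical.

```python
def misses_for(word, order):
    """Count the wrong guesses a fixed-order guesser makes before `word` is revealed."""
    remaining = set(word)
    misses = 0
    for letter in order:
        if not remaining:
            break  # every letter of the word has been uncovered
        if letter in remaining:
            remaining.discard(letter)
        else:
            misses += 1
    return misses


def hardest_word(words, order):
    """Pick the word that costs this particular guesser the most wrong guesses."""
    return max(words, key=lambda w: misses_for(w, order))


# Hypothetical guessing order and candidate words, chosen only for illustration.
order = "eatbzu"
words = ["tea", "buzz"]
for w in words:
    print(w, misses_for(w, order))
print("adversary picks:", hardest_word(words, order))
```

Against a guesser who just follows a frequency table, the adversary's best reply is a word whose letters sit late in that table, which is why the table alone is not a winning strategy.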

------
ape4
I googled that on a whim and got an article.
[https://en.wikipedia.org/wiki/Etaoin_shrdlu](https://en.wikipedia.org/wiki/Etaoin_shrdlu)

~~~
jws
And now the long-dead typesetters' mistakes are indexed by Google for us all
to read…

From the January 1917 issue of the Southern and Southwestern Railway Club
bimonthly proceedings…

[https://books.google.com/books?id=Kx0wAQAAMAAJ&pg=RA4-PA24&d...](https://books.google.com/books?id=Kx0wAQAAMAAJ&pg=RA4-PA24&dq=shrdlu&hl=en&sa=X&ved=0ahUKEwj6g-yZqdjjAhWsB50JHfN8AwMQ6wEITzAH#v=onepage&q=shrdlu&f=false)

That's a fair bit of technological evolution in 102 years.

The publication seems like a forerunner of the podcast. It is a verbatim
transcript of an enthusiasts' meeting with advertisements injected.

~~~
dredmorbius
NB, those aren't "enthusiasts", the S&SWRWC is an industry association.

------
tptacek
It should probably be "ETAOINSRHLDCU", or " ETAOINSRHLDCU", right?

~~~
StavrosK
I'm sure there's some joke I'm not getting, but just in case, it's in
reference to this:

[https://en.m.wikipedia.org/wiki/Etaoin_shrdlu](https://en.m.wikipedia.org/wiki/Etaoin_shrdlu)

~~~
tptacek
The point is that "space" is probably not the 7th most frequent character in
English text.

~~~
StavrosK
Space isn't a letter either; it was just added to the phrase because the
typesetters' lines had a length of 6. It's a relic.

