

Google Labs: word frequency in books over the last 200 years - prat
http://ngrams.googlelabs.com/graph?content=fuck&year_start=1800&year_end=2000&corpus=0&smoothing=3
I was surprised to see the high popularity of the word "fuck" prior to 1820
======
EliAndrewC
The example in the OP (fuck) shows up so often until the early 1800s because
of the typographic convention of printing an s in a form that looks like an f.
In other words, the word "suck" was being printed as what reads like "fuck".

~~~
tseabrooks
Any background on the origin of this typesetting convention? I'd like to know
the whys and whatfors...

~~~
EliAndrewC
The technical term for this is a "long s", which Wikipedia describes at great
length: <http://en.wikipedia.org/wiki/Long_s>
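
To make the mechanism concrete, here is a rough Python sketch. It assumes a simplified rule (long s everywhere except at the end of a word) and a hypothetical OCR step that reads every ſ as an f. Real typesetting rules and real OCR behavior are messier, and the function names are made up for illustration.

```python
LONG_S = "\u017f"  # ſ, LATIN SMALL LETTER LONG S


def to_long_s(word: str) -> str:
    """Simplified old-style typesetting: every lowercase s except a final one becomes ſ."""
    if not word:
        return word
    body, last = word[:-1], word[-1]
    return body.replace("s", LONG_S) + last


def naive_ocr(text: str) -> str:
    """Hypothetical OCR pass that mistakes every ſ for an f (an assumption, not a real OCR API)."""
    return text.replace(LONG_S, "f")


for w in ["suck", "such", "lost", "glass"]:
    printed = to_long_s(w)
    print(w, "->", printed, "->", naive_ocr(printed))
```

Under these assumptions "suck" is printed as "ſuck" and read back as "fuck", while a final s (as in "glass") survives.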

~~~
edge17
thanks, that was incredibly fascinating and enlightening, especially the
examples.

 _The long s survives in elongated form, and with an italic-style curled
descender, as the integral symbol ∫ used in calculus; Gottfried Wilhelm von
Leibniz based the character on the Latin word summa ("sum"), which he wrote
ſumma. This use first appeared publicly in his paper De Geometria, published
in Acta Eruditorum of June 1686,[2] but he had been using it in private
manuscripts at least since 1675.[3]_

------
Groxx
Utterly awesome.
[http://ngrams.googlelabs.com/graph?content=My+name+is+Inigo+...](http://ngrams.googlelabs.com/graph?content=My+name+is+Inigo+Montoya&year_start=1800&year_end=2011&corpus=0&smoothing=3)

Potentially even more awesome is that they have the entire dataset available
for download o_O

edit: case sensitivity is more fun than insensitivity:
[http://ngrams.googlelabs.com/graph?content=Star+Trek%2Cstar+...](http://ngrams.googlelabs.com/graph?content=Star+Trek%2Cstar+trek%2CStar+trek%2Cstar+Trek&year_start=1900&year_end=2008&corpus=0&smoothing=3)
vs
[http://ngrams.googlelabs.com/graph?content=star+trek%2CStar+...](http://ngrams.googlelabs.com/graph?content=star+trek%2CStar+trek%2Cstar+Trek&year_start=1900&year_end=2008&corpus=0&smoothing=3)
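
If you want to generate comparison links like these instead of editing them by hand, a minimal sketch follows. The parameter names (`content`, `year_start`, `year_end`, `corpus`, `smoothing`) are read off the links in this thread; what the `corpus` and `smoothing` values mean is not documented here, and `build_ngram_url` is a made-up helper name. It only builds a string and makes no network request.

```python
from urllib.parse import urlencode

BASE_URL = "http://ngrams.googlelabs.com/graph"  # host taken from the links in this thread


def build_ngram_url(phrases, year_start, year_end, corpus=0, smoothing=3):
    """Build a graph URL comparing several phrases (each casing is a separate phrase)."""
    query = urlencode({
        "content": ",".join(phrases),  # comma-separated list of phrases
        "year_start": year_start,
        "year_end": year_end,
        "corpus": corpus,
        "smoothing": smoothing,
    })
    return f"{BASE_URL}?{query}"


print(build_ngram_url(["Star Trek", "star trek", "Star trek", "star Trek"], 1900, 2008))
```

`urlencode` turns spaces into `+` and commas into `%2C`, which is the same encoding the links above use.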

edit2: there are a whole bunch of geek-term bumps around and just after 1900.
Anyone know why? E.g.:
[http://ngrams.googlelabs.com/graph?content=Star+Wars&yea...](http://ngrams.googlelabs.com/graph?content=Star+Wars&year_start=1900&year_end=2008&corpus=0&smoothing=3)

~~~
splat
I have no idea, but my guess is that they don't know the dates for some books
and the system automatically classifies the publication date as "1900" or
"1901." If you search the word "quark," you also get a bump at around 1900
even though the word wasn't coined until Joyce's _Finnegans Wake_ in 1939.
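
A quick way to check a guess like this, sketched in Python: list the years in which a word appears before it was coined. The counts below are toy numbers invented for illustration, not real Ngram data, and `anachronistic_years` is a hypothetical helper.

```python
def anachronistic_years(counts_by_year, coined_year):
    """Return the years before `coined_year` in which the word still has a nonzero count."""
    return sorted(
        year for year, count in counts_by_year.items()
        if year < coined_year and count > 0
    )


# Toy numbers, invented for illustration only (NOT real Ngram data).
toy_quark_counts = {1899: 0, 1900: 12, 1901: 7, 1902: 0, 1939: 3, 1970: 450}

print(anachronistic_years(toy_quark_counts, coined_year=1939))  # [1900, 1901]
```

If the flagged years cluster on 1900 and 1901 for many unrelated modern words, a default publication date would be a plausible explanation, though OCR errors could produce early hits too.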

------
PetrolMan
I find it kind of interesting that a lot of words peak around the middle of
the 19th century and have been in decline ever since. I'm guessing this has
something to do with the increasing number of books published but it is still
kind of hard for me to imagine that "the" is less commonly used now than one
hundred years ago. The pattern holds true for a lot of common words...
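
One way to reason about this, sketched in Python: if the chart plots a word's share of all words printed in a year rather than its raw count (an assumption in this sketch), then publishing more books does not lower the curve by itself. The share only drops if the total grows faster than the word does. The totals below are invented for two hypothetical years and are not real data.

```python
def share(word_count, total_words):
    """Relative frequency: occurrences of the word divided by all words printed that year."""
    return word_count / total_words


# Invented totals for two hypothetical years (NOT real data).
early = {"the": 60_000, "total": 1_000_000}
late = {"the": 500_000, "total": 10_000_000}

print(share(early["the"], early["total"]))  # 0.06
print(share(late["the"], late["total"]))    # 0.05
```

In this toy case the raw count of "the" goes up more than eightfold while its share still falls.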

------
sylvinus
Is this weird? :)

[http://ngrams.googlelabs.com/graph?content=google&year_s...](http://ngrams.googlelabs.com/graph?content=google&year_start=1800&year_end=2003&corpus=0&smoothing=3)

------
edge17
[http://ngrams.googlelabs.com/graph?content=terrorist&yea...](http://ngrams.googlelabs.com/graph?content=terrorist&year_start=1800&year_end=2011&corpus=0&smoothing=3)

------
thekevan
They had smartphones in the 1900s? Could this be related to that woman
supposedly seen speaking on a cell phone in the Charlie Chaplin video?

[http://ngrams.googlelabs.com/graph?content=smartphone&ye...](http://ngrams.googlelabs.com/graph?content=smartphone&year_start=1800&year_end=2000&corpus=0&smoothing=1)

(Actually, "internet" also has a similar spike. I suspect some books are
mislabeled in their dates.)

~~~
nrkn
Some dates are mislabeled, but it's mostly OCR errors:

[http://www.google.com/search?q=%22internet%22&tbs=bks:1,...](http://www.google.com/search?q=%22internet%22&tbs=bks:1,cdr:1,cd_min:1600,cd_max:1681&lr=lang_en)

------
jalmos
Given the birther-related news today, I was curious about another uncouth
term. Sad results:

[http://ngrams.googlelabs.com/graph?content=nigger&year_s...](http://ngrams.googlelabs.com/graph?content=nigger&year_start=1800&year_end=2000&corpus=0&smoothing=3)

------
iunk
I don't know in what context they could use Geek in 1800.
[http://ngrams.googlelabs.com/graph?content=geek&year_sta...](http://ngrams.googlelabs.com/graph?content=geek&year_start=1800&year_end=2000&corpus=0&smoothing=3)

~~~
Groxx
Or the 1700s:
[http://ngrams.googlelabs.com/graph?content=geek&year_sta...](http://ngrams.googlelabs.com/graph?content=geek&year_start=1000&year_end=2000&corpus=0&smoothing=3)

Perhaps weirder, "Woot":
[http://ngrams.googlelabs.com/graph?content=Woot&year_sta...](http://ngrams.googlelabs.com/graph?content=Woot&year_start=1500&year_end=2000&corpus=0&smoothing=3)

~~~
iunk
Seems like some are incorrect when I look at the scanned pages, and one of
those says Geck instead of Geek:
http://books.google.com/books?id=gvYqAAAAYAAJ&pg=PA497&#...

~~~
Groxx
Ah, good point. This _is_ generated by OCR + reCAPTCHA, so error is guaranteed
to creep in.

------
ryan42
some interesting results: (sad about liberty)

[http://ngrams.googlelabs.com/graph?content=liberty&year_...](http://ngrams.googlelabs.com/graph?content=liberty&year_start=1800&year_end=2000&corpus=0&smoothing=3)

[http://ngrams.googlelabs.com/graph?content=l33t&year_sta...](http://ngrams.googlelabs.com/graph?content=l33t&year_start=1800&year_end=2000&corpus=0&smoothing=3)

[http://ngrams.googlelabs.com/graph?content=hacker&year_s...](http://ngrams.googlelabs.com/graph?content=hacker&year_start=1800&year_end=2000&corpus=0&smoothing=3)

------
samuel1604
while love is going down
[http://ngrams.googlelabs.com/graph?content=love&year_sta...](http://ngrams.googlelabs.com/graph?content=love&year_start=1800&year_end=2000&corpus=5&smoothing=3)
sad..

------
dlsspy
I'm going to have an impact on Google's internet bill this month.

