

Google Books Ngram datasets - abraham
http://ngrams.googlelabs.com/datasets

======
moultano
Here's some fun I had with the viewer:
[http://moultano.blogspot.com/2010/12/history-through-google-...](http://moultano.blogspot.com/2010/12/history-through-google-books-ngrams.html)

I've always kind of scoffed when people complain that a history teacher didn't
make what they were learning "relevant" or "relatable", but after playing with
this for the first time I understand the benefits. At this point in my life,
making something "relatable" means expressing it in terms of term frequency
statistics and graphs, so this totally blew me away. I was obsessed with it to
the point of mania for the first 24 hours. I believe in history now. :)

------
Smerity
This is hugely exciting news. I previously used Google's Web1T corpus in NLP
experiments and the restrictive license limited a number of potential uses.

This new corpus has a temporal aspect (it tracks each word's usage by
publication year) and is additionally under a Creative Commons license. I'd
love to see this become the basis of a large-scale database benchmark /
competition or an open source linguistic application.

------
kristopher
Lots of interesting treasures hidden in this dataset. For example, here is
Benford's Law:

[http://ngrams.googlelabs.com/graph?content=1,2,3,4,5,6,7,8,9...](http://ngrams.googlelabs.com/graph?content=1,2,3,4,5,6,7,8,9&year_start=1800&year_end=2008&corpus=0&smoothing=1)
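
For anyone who wants to compare the graph against the textbook curve, here is a minimal sketch of the expected proportions under Benford's Law, P(d) = log10(1 + 1/d). The law is normally stated for the leading digit of a number, so treating the single-digit token frequencies in the graph as comparable is an assumption, and the function name `benford_expected` is just an illustrative choice.

```python
import math


def benford_expected(digit: int) -> float:
    """Expected share of leading digit `digit` (1-9) under Benford's Law."""
    if not 1 <= digit <= 9:
        raise ValueError("digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


# Expected proportion for each digit; the nine values sum to 1.
expected = {d: benford_expected(d) for d in range(1, 10)}

for d, p in expected.items():
    print(f"{d}: {p:.3f}")
```

The proportions fall steadily from 1 to 9, which is the shape to look for in the graph.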

------
kraemate
For anyone in NLP (Natural Language Processing), this is a goldmine.

