

The unreasonable effectiveness of data - helwr
http://www.scribd.com/doc/13863110/The-Unreasonable-Effectiveness-of-Data

======
imurray
_Google released a trillion-word corpus with frequency counts for all
sequences up to five words long... contains...all sorts of other errors...But
the fact that it's a million times larger than the Brown Corpus outweighs
these drawbacks._

Maybe not: [http://nlpers.blogspot.com/2010/02/google-5gram-corpus-has-u...](http://nlpers.blogspot.com/2010/02/google-5gram-corpus-has-unreasonable.html)

------
xanderhud
Anyone have a link to a readable copy? I even tried logging in to Scribd with my FB account, and there were still more hoops to jump through to get a PDF.

