I am glad to find out there are some other issues with it. My main disappointment with it, however, remains its license. It would have been really cool if Google had released it under a CC or MIT license, but instead it's restricted to academic use only. Better to spend time with other corpora.
Better yet if they had also released the corpus over BitTorrent instead of requiring one to spend $100+ on the CDs. Let everyone play with it, not just full-time researchers.