I worked on Google's statistical machine translation system during my internship. There, I learned that data really is king. The Google Translate team spends as much effort collecting data as it does improving its algorithms.
The 2008 NIST results [1] show that Google's translator swept every category with unconstrained training sets. That is, when Google was allowed to use all of the data it had collected, it smoked the competition. When the training sets were constrained to a common set for all competitors, better algorithms prevailed. You can be sure that the very talented team at Google will be improving their algorithms to ensure that never happens again. But you can also be sure that competitors will be collecting even more data to counter Google's victories.
But who remembers how many webpages Google indexed in 1998, and how many AltaVista indexed? Data is important for spam filtering, translation, and so on, but it is far from "king". We only fancy that data matters more than algorithms because we now have some really good statistical learning methods.
[1] http://www.nist.gov/speech/tests/mt/2008/doc/mt08_official_r...