It's very Norvig-esque (http://www.youtube.com/watch?v=LNjJTgXujno), but there's also http://anand.typepad.com/datawocky/2008/03/more-data-usual.h... and Chris Anderson's flame bait in Wired, http://www.wired.com/science/discoveries/magazine/16-07/pb_t... (that month's Wired was dedicated to this subject).
And like someone in the previous discussions said, this is the basis of the scientific method, not its death.
Q: What's your opinion about the semantic web?
A: Semantic web. Future of the web. And it always will be.
If I assigned engineers to (semantic web) formats based on the percentage of pages that had those formats, then the correct number of engineers for semantic web was zero.
If you're Google, Yahoo! or one of their friends, you can get away with relying just on correlations extracted directly from data. After all, you have all the data you could possibly want, and if you don't have it, you can easily measure it in a straightforward way.
Everybody else, however, has to do a much better job of developing the right algorithms and insights to get the upper hand. The best way to do this, of course, is to use whatever data you manage to scrape together.
Luckily, they also seem to recognize that sometimes data just isn't enough, and they ask for help. You've seen this in the Netflix Prize, the AOL search log debacle and, more recently, Microsoft's release of search logs for WSCD09.
I just said that sometimes, all the data in the world isn't enough if you don't have the right algorithms or insights.
I guess cutting-edge natural language apps are going to be the playground of the big boys until PCs reach the scale necessary to do experiments.