In other words, work produced by lonely geniuses is quite likely to go unnoticed.
For all we know, the content that is being produced by companies like Demand Media has already been produced by thoughtful people, writing at length about subjects they love on obscure websites that no one ever links to. What a shame that would be!
It’s not quite as depressing as that. I recently made a quaint little site for a band and it has exactly zero other sites linking to it. It’s the first result when you search for the name of the band (which is town-name+generic-term-used-in-bandnames).
This only works with stuff that’s rare on the web, though. If there were other bands with the same name and someone linked to them, my little website would probably get swamped. (The same would presumably happen if someone were to write a blog post about the band – say, a scathing review of their last gig – and if that one post got only a handful of links.) Hm, so getting a few links seems at least like a good defense in such cases. Luckily many of the band’s target demographic aren’t actually all that internet savvy :)
Google's PageRank algorithm is designed in such a way that the work of geniuses should go unnoticed. PageRank is designed for the masses. For the masses of consumers specifically.
Google is not designed for the geniuses. It's designed for people who want what everyone else wants.
In the beginning, when Google was a tool used primarily by geniuses, the geniuses were the community. They were the masses that used Google. Its algorithms now pick selections from a new community: bloggers who can copy/paste, and bloggers with lots of friends who will link to their posts because the friends are asked to and because other friends reciprocate.
Google doesn't know if you are linking to a web page because you like the web page or because someone who built the web page asked you to link to it or because you are getting paid.
And Google doesn't care.
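To make the point about links concrete, here is a minimal sketch of the textbook PageRank power iteration. It is an illustration only, not Google's production ranking; the page names, the damping value of 0.85, and the iteration count are all assumptions. Note that the input is just a map of who links to whom, so the reason behind a link never enters the calculation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration (illustrative sketch only).

    `links` maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share.
        new = {p: (1 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank


# Hypothetical graph: three pages link to "popular", nobody links to "genius".
# Whether "paid" linked out of enthusiasm or for money is not in the data.
example = {
    "fan1": ["popular"],
    "fan2": ["popular"],
    "paid": ["popular"],
    "popular": [],
    "genius": [],
}
ranks = pagerank(example)
```

In this toy graph "popular" ends up ranked above "genius" purely because of the inbound links, whatever motivated them.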
The problem is that "indistinguishable" does not mean "identical". The Optimization-by-Proxy concept also applies to the way we recognize useful content and distinguish it from spam: if spam-creators exploit the gap between our perception of content and the actual quality of the content, they will ultimately create spam that fools even savvy users, and we will be influenced by it without even realizing it.
One of the characters in Neal Stephenson's "Anathem" described this phenomenon, occurring on his world's equivalent of the internet: sophisticated AI had led to spam (or "crap" as he called it) which was created by taking perfectly valid, reasonable ideas, combining them with falsehoods or biased information expressed clearly and reasonably, and releasing it in the form of real, substantive communications between users. A great deal of time and energy had to go into sorting "crap" from valid information.
I think this is something that has happened throughout history. The web probably makes it easier for their work to be uncovered than before, but they are still at a disadvantage.
The summary of the history seems pretty accurate to my perception of it, but I don't think it's hopeless from here. :)
"Deep-sea ice crystals stymie Gulf oil leak fix - Yahoo! News
8 May 2010 ... thick blobs of tar began washing up on Alabama's white sand beaches. ... platform at the Deep Sea Horizon oil spill site in the Gulf"
At least a result from 4 days ago is an improvement on when I'd get usenet or mailing list results from 1999-2004 whenever I searched for anything linuxy.
All of the fascinating things about signals are confidential for all of the reasons listed in the article, and Google has been sued so many times by sites that think they should rank better than they do that I can't really give examples.
I think it's safe to say though that there are a lot of people worried about and thinking hard about what the web is turning into and how to rank it appropriately.
Most of the content is no longer written by devoted hobbyists, people no longer link as often to things they like, and much of the content on the front pages of reddit, digg (and sometimes even hackernews) was put there by people trying to make your search results worse.
Similarly, stackoverflow.com doesn't have very good answers for a number of technical topics, but it's often in the first 10 hits, even when the answers are useless and there are much better answers ranked lower (like project mailing list archives).
After all, Demand Media does produce real, editorially vetted content from real human writers. The payment system encourages what I'll call extreme efficiency of research and writing, but that simply optimizes it for the handy-reference domain of search results (e.g. How to fillet a smallmouth bass), which may not be "high quality" as such but does provide direct, clearly written and reasonably valid responses to the search queries that elicit them.
She's a Markov-based script inspired by the public domain works of a certain poet. All of her incantations are checked to make sure they don't repeat her model verbatim.
Every once in a while she comes up with something inspired.
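For anyone curious how a script like that might work, here is a minimal sketch, assuming a word-level Markov chain of order 2 and a plain substring test as the "no verbatim repeats" check. The function names, the chain order, and the retry limit are hypothetical choices for illustration, not the actual script.

```python
import random
from collections import defaultdict


def build_chain(words, order=2):
    """Map each run of `order` words to the words that follow it in the corpus."""
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain


def generate(chain, order, length, rng):
    """Random walk over the chain, stopping early at a dead end."""
    out = list(rng.choice(sorted(chain)))
    while len(out) < length:
        followers = chain.get(tuple(out[-order:]))
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)


def compose(corpus_text, length=8, order=2, seed=0, max_tries=1000):
    """Return a generated line that does not appear verbatim in the corpus,
    or None if no such line turns up within `max_tries` attempts."""
    rng = random.Random(seed)
    words = corpus_text.split()
    chain = build_chain(words, order)
    if not chain:
        return None  # corpus too short to build any transitions
    normalized = " ".join(words)
    for _ in range(max_tries):
        line = generate(chain, order, length, rng)
        if line not in normalized:
            return line
    return None
```

A higher order copies longer runs from the source and so trips the verbatim check more often; a lower order wanders further from it.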
This is exactly what makes Fox News, as an example, so dangerous. They don't care about the truth when they report; they only care about getting more eyeballs. I suspect that ANY spam that humans have to deal with to determine if it's useful is much the same.
I think this issue is very important - to Google, to web searchers, to businesses seeking to be found by Google and even to less scrupulous web operators. I'd love the opportunity to engage in a 20-30 minute written chat with you and publish it (anywhere on the web you'd like).
As background, I've worked for years as an SEO consultant, founded a community and company in the space (SEOmoz.org), and have been spending the last few years developing and launching search marketing software.
I certainly respect your background and beliefs, but I think there's some flawed logic in your assumptions and arguments that I'd love to dig into, talk about and maybe even have some of my own perceptions changed. I would not ask you to disclose anything that's confidential - I'm much more interested in the theory and logic behind web spam, SEO and search relevancy.
You can reach me via email - email@example.com. Would love to hear from you!
I like Matt a lot, too, but his opinions (at least, those he publicly offers) are well known and well publicized. It would be great to hear other voices.
If not, how about if/when you ever leave the team? Happy to be patient :-)
Maybe that could become one of the first privacy disasters, when people realize they made their email unusable by publishing their connections.
Maybe in some cases keeping a system convoluted and inconsistent can actually help ensure stability and durability?
And sufficiently advanced errors are indistinguishable from pages made for pure irony.