In a sense, Google is the largest crowdsourced project of all time. It's a lot like Reddit or Hacker News in that every link is an upvote, but the genius of Google is that each link carries a different weight, and that links are a natural byproduct of using the Internet. In short, the people contributing to the crowdsourced ranking system don't even realize they're doing so most of the time. They're just doing what they like and leaving behind a byproduct (links, social signals) that Google can use to tell you which sites people consider valuable.
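The "weighted upvote" idea is essentially PageRank: a link from an important page counts for more than a link from an obscure one. A minimal power-iteration sketch (a toy illustration, nothing like Google's production system):

```python
# Toy PageRank: every link is a vote, but votes from important pages
# count more. Illustrative only - not Google's actual algorithm.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping page -> list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
        rank = new_rank
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
# "c" should come out on top: it collects links from a, b, and d.
print(max(ranks, key=ranks.get))
```

The damping factor models a surfer who occasionally jumps to a random page, which is also what keeps rank from pooling forever in link cycles.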
But that means that once people realize what Google is using to rank, they can mimic those signals, and sway the algorithm in their favor. The problem Google is going to run into is that once spammers can closely (and at times programmatically) mimic what is happening "organically," Google's algorithms cannot tell the difference.
Right now, Google's approach seems to be to ignore or heavily devalue portions of the Internet that have been overrun with spam: article submissions, blog comments, and now, apparently, guest posting. That sucks for people who do really high-quality, organic guest posting; for Google, that has to be collateral damage. Spammers will simply move on to the next portion of the Internet, mimicking whatever Google still uses as a ranking signal. It's an endless battle.
One of the big things I see happening now is entire website hijackings (I've been meaning to email you about that, Matt). I did a quick little report on the search engine results for "Viagra," and 81 of the top 100 are hijacked websites, including a client's site that I have to upgrade to a newer version of Drupal, as the older one has been compromised. I don't know if or how Google will win this battle, but it's far from over. I honestly feel like the way we gather data to rank websites, and what will be successful in 25 years, will have to be completely unrelated to what Google is doing now, and much harder to manipulate than spreading links all over the Internet.
It's true that website hacking remains a big issue in the spammiest areas of the web, even though it's completely illegal. Unfortunately, there are a lot of unmaintained, unpatched web servers out there that blackhats exploit. It's fundamentally a hard problem, but we've been working on the next generation of our hacking-detection algorithms.
I've always wondered why we don't see more startups offering hacker protection, detection, clean-up, etc. Companies like McAfee made a lot of money protecting personal computers, and there's a similar opportunity on the web server side.
I definitely agree with you that it's somewhat of an under-tapped market, but I think it needs to be headed up by the right individual(s).
To be sure, there are many patched, maintained web servers and applications that are also exploited (by someone, for some purpose).
The best and the brightest get hacked, and software maintained even by professionals regularly needs to be patched for new exploits. (Take Flash, which seems to be running at between one and four updates per month, or even OS X security updates.)
I know everyone seems to think it's the other guy who doesn't have his act together and isn't following the obvious advice, but there are many "other guys" who are quite capable and still end up having problems and being exploited. (Source: stuff that I read in news stories, the same as everyone else.)
There's one focused on exactly what you mentioned (website recovery, monitoring, and protection).
*I work there :)
I know very little about SEO or website design, but isn't Google already kinda tipping its hand about what the future will hold?
Without Adblock, a quick Google search shows that the entire top half of the results page is ads. To me, it's fairly obvious the future will bring pay-for-priority search inclusion. You want a top spot? Break out your wallet.
Whether that's a good thing or a bad thing, only time will tell. I'm more curious about search engine competitors 25 years from now. Google's had a great decade, but that can't last forever ... can it?
For instance, I recently watched a terrible movie, and I'm trying to remember what it's called so I can warn people off of it. Searching for that is looking for a specific piece of information, not a generic "noun"-type thing.
A query like "sports" or "news" may or may not have an ad in the search results. Keywords like "credit cards for people with poor credit scores" will almost always show lots of ads, even though the query is quite specific and contains 8 words.
In some particularly valuable verticals (like hotels), Google further adds its other paid verticals to the results in addition to the AdWords ads.
One of Matt's videos (not sure which one) mentioned that online publishers tend to focus heavily on, say, the 10% to 20% of terms that are highly commercial, while often paying much less attention to the rest of the searches, since the commercial terms are typically far more valuable than the informational ones. (Mesothelioma lawyers can afford to pay more for traffic than, say, people selling flour, or people offering recipes that contain flour.)
First of all, it's really hard to replicate high-quality signals. Yes, we've got spammy guest posts and spammy comments and entire websites overrun with spam. But if you were to analyze those links, you'd notice that they are still islands. As in, do you see any Viagra-related links on Hacker News or in your Reddit subscriptions? Do you see Viagra-related links in your Twitter or Facebook stream? What about all your other news channels? The only places I see Viagra stuff these days are my Gmail spam folder and porn sites. And I don't know why, but I'm not seeing much spam in Google's search results while I'm logged in; maybe it's the stuff I search for, or maybe Google has learned my interests. But in Incognito mode I get a lot more spam.
And second, if there are dark corners of Google's search engine, search keywords that have been overrun with spam, then Google is partly to blame, because they've turned a blind eye to spam for far too long, as they tolerated and still tolerate AdSense spam and content farms.
If Google is indeed facing a spam problem, then there's a whole lot more they could do. Off the top of my head: why not penalize websites hosted on old, insecure versions of WordPress or Drupal? Why not expose a "Report Spam" button to logged-in users? Why did it take so long for Google to detect and ban scraped websites?
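Flagging outdated installs isn't even that exotic: many WordPress sites advertise their version in a generator meta tag. A hedged sketch (the regex and the "minimum safe version" threshold are my own illustration; real detection would need many more signals, since plenty of sites strip the tag):

```python
import re

# Many WordPress installs expose their version in a <meta name="generator"> tag.
# This is only a heuristic: lots of sites remove the tag, and the
# "minimum safe version" below is a made-up cutoff for illustration.
MIN_SAFE_WORDPRESS = (3, 8)

def outdated_wordpress(html):
    """True if the page advertises a WordPress version below MIN_SAFE_WORDPRESS."""
    m = re.search(
        r'<meta[^>]+name=["\']generator["\'][^>]+content=["\']WordPress ([\d.]+)',
        html, re.I)
    if not m:
        return False  # no generator tag: can't tell from this signal alone
    version = tuple(int(x) for x in m.group(1).split("."))
    return version < MIN_SAFE_WORDPRESS

page = '<head><meta name="generator" content="WordPress 3.5.1"></head>'
print(outdated_wordpress(page))  # an old, frequently exploited release
```

A crawler already parsing every page could collect this for free; the harder question is whether penalizing victims of neglect is fair to their visitors.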
SEO is sort of the same way in terms of there being a range of options. Some people might use Sape, XRumer, or Fiverr and the like (a nude person streaking at a sporting event with a URL painted on them), whereas others might use more nuanced strategies where any SEO impact appears incidental.
Design itself can play a big role in the perception of spam. Designs that look crisp can contain nonsensical content without feeling as spammy as they are.
But another factor here is that Google wants to keep raising the bar over time. Things that are white hat fade to gray then black as they become more popular & widespread. And as Google scrapes-n-displaces a wider array of content types, that forces publishers to go deeper (if they can do so without bankrupting themselves).
How could this work? Well, my company is trying something like that now, though it's early going. The easiest way to do it is by filtering the Web by domains instead of individual web pages. That alone removes several orders of magnitude of complexity.
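Domain-level filtering can be sketched in a few lines. A minimal illustration (the trusted list and the naive "strip www." domain extraction are my own simplifications; production code would want the Public Suffix List to handle suffixes like .co.uk correctly):

```python
from urllib.parse import urlsplit

def domain_of(url):
    """Naive domain extraction: drop the port and a leading 'www.'.
    Real systems should consult the Public Suffix List instead."""
    host = urlsplit(url).netloc.lower().split(":")[0]
    return host[4:] if host.startswith("www.") else host

def filter_by_domain(urls, trusted_domains):
    """Keep only URLs whose whole domain is on the trusted list -
    one decision per domain instead of one per page."""
    return [u for u in urls if domain_of(u) in trusted_domains]

trusted = {"wikipedia.org", "example.com"}
urls = [
    "https://www.example.com/page/1",
    "http://spammy-pills.biz/buy-now",
    "https://wikipedia.org/wiki/PageRank",
]
print(filter_by_domain(urls, trusted))
```

The complexity win is that you maintain one verdict per domain (millions of entries) rather than one per page (hundreds of billions), at the cost of every page on a domain sharing its fate.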
- many topics tend to bleed across niches
- sites themselves will change purposes over time
- how sites are monetized (and other user-experience choices) may change over time
- people buy and sell sites
- sometimes the desired information is only accessible on sites which are not particularly popular, because they are not aligned with moneyed interests
- sometimes sites that are popular might be popular because they are inaccurate, conforming to an expected and desired bias
- if the whitelist is black and white, entities may change their approach after becoming well trusted (one of the elegant aspects of Panda is how it can remeasure over time)
One idea would be to have sites that concentrate on some topic(s) of interest, where people can submit interesting links they find directly, and where others who share those common interests can lend their support to promote the best material and share other related links they've found themselves. I know, I know, it's a radical concept, but I truly believe there's some potential there...
Then you could maybe even organize them publicly based on those tags. You'd need a hierarchy of tags then, though, but that would give you a rather comprehensive, yet still curated, view of the web...