Especially when it comes down to a site allowing fair use of their content.
From what I can gather, Google can and does penalize sites that show copyrighted content even when the site has very legitimate fair use claims. DMCA notices are never served and never appear on chillingeffects.org. The sites aren't even delisted, as described in Google's "Transparency Report", but rather moved to lower and lower positions in the search results. Google will never acknowledge that the site is being penalized, and it seems that completely removing the "offending" content won't resolve it. (Source: personal experience.)
"Hi Nathan, my name is Matt Cutts and I'm an engineer in the search quality group at Google. Thanks for asking about this; it helped the indexing team uncover an issue in how we're indexing Craigslist, and we're in the process of fixing it right now.
To understand what happened, you need to know about the "Expires" HTTP header and Google's "unavailable_after" extension to the Robots Exclusion Protocol. As you can see at http://googleblog.blogspot.com/2007/07/robots-exclusion-prot... , Google's "unavailable_after" lets a website say "after date X, remove this page from Google's main web search results." In contrast, the "Expires" HTTP header relates to caching, and gives the date when a page is considered stale.
A few years ago, users were complaining that Google was returning pages from Craigslist that were defunct or where the offer had expired a long time ago. And at the time, Craigslist was using the "Expires" HTTP header as if it were "unavailable_after"--that is, the Expires header was describing when the listing on Craigslist was obsolete and shouldn't be shown to users. We ended up writing an algorithm for sites that appeared to be using the Expires header (instead of "unavailable_after") to try to list when content was defunct and shouldn't be shown anymore.
You might be able to see where this is going. Not too long ago, Craigslist changed how they generated the "Expires" HTTP header. It looks like they moved to the traditional interpretation of Expires for caching, and our indexing system didn't notice. We're in the process of fixing this, and I expect it to be fixed pretty quickly. The indexing team has already corrected this, so now it's just a matter of re-crawling Craigslist over the next few days.
So we were trying to go the extra mile to help users not see defunct pages, but that caused an issue when Craigslist changed how they used the "Expires" HTTP header. It sounded like you preferred Google's Custom Search API over Bing's so it should be safe to switch back to Google if you want. Thanks again for pointing this out."
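To make the distinction in that reply concrete, here is a minimal Python sketch that treats the two signals as separate questions: "is my cached copy stale?" (Expires) versus "should this page still be shown in results?" (unavailable_after). The function names and the return shape are hypothetical, and the sketch assumes the directive carries an RFC 822 style date; it is not a description of how Google's indexer actually works.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_unavailable_after(robots_value):
    """Return the datetime from an 'unavailable_after: <date>' directive, or None.

    Assumption: the date is in an RFC 822 style format that
    email.utils.parsedate_to_datetime can read.
    """
    prefix = "unavailable_after:"
    value = robots_value.strip()
    if not value.lower().startswith(prefix):
        return None
    return parsedate_to_datetime(value[len(prefix):].strip())


def page_status(headers, robots_value, now):
    """Answer the caching question and the indexing question separately."""
    expires = headers.get("Expires")
    # Expires only says when a cached copy should be considered stale.
    stale = expires is not None and parsedate_to_datetime(expires) <= now
    # unavailable_after says when the page should leave search results.
    cutoff = parse_unavailable_after(robots_value) if robots_value else None
    drop = cutoff is not None and cutoff <= now
    return {"cache_is_stale": stale, "drop_from_results": drop}
```

With only an Expires header in the past, this reports a stale cache but does not drop the page. Conflating the two is roughly the failure mode described above.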
The nice thing about SearchTempest (in my obviously biased opinion) is that you can set a radius to only search nearby cities, rather than the shotgun approach of googling (or binging) everywhere.
However, about half of the ads that were relevant had expired. Do you have a way of dealing with that?
Some searches tend to have more expired posts than others, but if you find they're a problem, we have a couple of alternatives. For searches across the whole country, the best option is RSS feeds. (Too bad 'RSS is dying' and all... ;) ) You can run any search on SearchTempest and click the 'Get Feeds for this search' link to grab an OPML file of all the craigslist results RSS feeds matching your search, within the search radius you specified. Import that file into a folder in your favorite RSS reader, and you've got a convenient, auto-updating feed of new results for your search, straight from craigslist.
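If you would rather script against the OPML file than import it into a reader, a rough Python sketch for pulling out the feed URLs might look like the following. The sample document, its example.com URLs, and the feed_urls helper are all made up for illustration; the only assumption about the real file is that it follows the common OPML convention of putting each feed's address in an xmlUrl attribute on an outline element.

```python
import xml.etree.ElementTree as ET

# Hypothetical OPML content; a real export will have different titles and URLs.
SAMPLE_OPML = """<opml version="1.0">
  <head><title>Example search feeds</title></head>
  <body>
    <outline text="City A" type="rss" xmlUrl="https://example.com/city-a/search?format=rss"/>
    <outline text="Group">
      <outline text="City B" type="rss" xmlUrl="https://example.com/city-b/search?format=rss"/>
    </outline>
  </body>
</opml>
"""


def feed_urls(opml_text):
    """Collect every xmlUrl attribute from outline elements, nested or not."""
    root = ET.fromstring(opml_text)
    return [
        outline.attrib["xmlUrl"]
        for outline in root.iter("outline")
        if "xmlUrl" in outline.attrib
    ]


if __name__ == "__main__":
    for url in feed_urls(SAMPLE_OPML):
        print(url)
```

From there, each URL can be handed to whatever feed parser you prefer.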
The other alternative is our Direct Results mode. It opens two windows: one for the results from craigslist, and one as an index to flip through cities. You only see results for one city at a time, but you can quickly flip through them with the 'Next' link. That can take a while for searches across the whole country, so we recommend it more for smaller searches. It's essentially a small optimization over manually opening a few separate CL cities and pasting your search into each one.
More info here: http://www.searchtempest.com/faq.php#deleted
Thanks for building this!
And you're welcome. I actually built it back in 2006, but it's evolved a fair bit since then!