

Google not indexing Craigslist – SearchTempest switches to Bing - tempestn
http://www.tempestblog.com/2013/03/14/google-not-indexing-craigslist-searchtempest-switches-to-bing/

======
Matt_Cutts
I left a comment on the original blog which I'll also paste here in case
people want to understand what happened.

"Hi Nathan, my name is Matt Cutts and I'm an engineer in the search quality
group at Google. Thanks for asking about this; it helped the indexing team
uncover an issue in how we're indexing Craigslist, and we're in the process of
fixing it right now.

To understand what happened, you need to know about the "Expires" HTTP header
and Google's "unavailable_after" extension to the Robots Exclusion Protocol.
As you can see at
[http://googleblog.blogspot.com/2007/07/robots-exclusion-prot...](http://googleblog.blogspot.com/2007/07/robots-exclusion-protocol-now-with-even.html),
Google's "unavailable_after" lets a website say "after date
X, remove this page from Google's main web search results." In contrast, the
"Expires" HTTP header relates to caching, and gives the date when a page is
considered stale.

A few years ago, users were complaining that Google was returning pages from
Craigslist that were defunct or where the offer had expired a long time ago.
And at the time, Craigslist was using the "Expires" HTTP header as if it were
"unavailable_after"--that is, the Expires header was describing when the
listing on Craigslist was obsolete and shouldn't be shown to users. We ended
up writing an algorithm for sites that appeared to be using the Expires header
(instead of "unavailable_after") to try to list when content was defunct and
shouldn't be shown anymore.

You might be able to see where this is going. Not too long ago, Craigslist
changed how they generated the "Expires" HTTP header. It looks like they moved
to the traditional interpretation of Expires for caching, and our indexing
system didn't notice. We're in the process of fixing this, and I expect it to
be fixed pretty quickly. The indexing team has already corrected this, so now
it's just a matter of re-crawling Craigslist over the next few days.

So we were trying to go the extra mile to help users not see defunct pages,
but that caused an issue when Craigslist changed how they used the "Expires"
HTTP header. It sounded like you preferred Google's Custom Search API over
Bing's, so it should be safe to switch back to Google if you want. Thanks again
for pointing this out."
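To make the distinction in that comment concrete, here is a minimal Python sketch. It is not Google's code: the comment doesn't say how the algorithm decided that a site "appeared to be" using Expires this way, so the fallback rule below (read Expires as a defunct date only when it sits at least a day past the response's `Date` header) is a hypothetical heuristic, and the function names and threshold are assumptions. It also assumes exact-case header lookups, a single `unavailable_after:` directive in `X-Robots-Tag`, and dates in the RFC 1123 form `Thu, 14 Mar 2013 12:00:00 GMT`.

```python
from datetime import datetime, timedelta
from email.utils import parsedate_to_datetime
from typing import Optional

UNAVAILABLE_AFTER = "unavailable_after:"
# Hypothetical cut-off: an Expires this far past the response's Date is read
# as a listing lifetime, anything shorter as an ordinary cache lifetime.
LISTING_LIFETIME_THRESHOLD = timedelta(days=1)


def cache_is_stale(headers: dict, now: datetime) -> bool:
    """Caching question: must a stored copy be refetched before reuse?"""
    expires = headers.get("Expires")
    if expires is None:
        return False  # this sketch ignores Cache-Control and other freshness rules
    try:
        return now >= parsedate_to_datetime(expires)
    except (TypeError, ValueError):
        return True  # HTTP caches treat an invalid Expires (e.g. "0") as already expired


def defunct_after(headers: dict) -> Optional[datetime]:
    """Indexing question: when should the page leave search results (None = keep it)?"""
    tag = headers.get("X-Robots-Tag", "").strip()
    if tag.lower().startswith(UNAVAILABLE_AFTER):
        # Explicit signal: the site says exactly when to drop the page.
        return parsedate_to_datetime(tag[len(UNAVAILABLE_AFTER):].strip())
    if "Expires" not in headers or "Date" not in headers:
        return None
    try:
        served = parsedate_to_datetime(headers["Date"])
        expires = parsedate_to_datetime(headers["Expires"])
        lifetime = expires - served
    except (TypeError, ValueError):
        return None  # unparseable or mixed naive/aware dates: don't guess
    # Fallback heuristic: a long Expires looks like "this listing ends on X",
    # a short one looks like plain caching and is ignored for indexing.
    return expires if lifetime >= LISTING_LIFETIME_THRESHOLD else None
```

For a response served at noon with `Expires` ten minutes later, `cache_is_stale` turns true at 12:10 while `defunct_after` returns None, so the page stays listed. A rule that had instead decided once per site that "Expires means defunct" would drop such pages within minutes, which is one way to picture the Craigslist problem described above.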

------
JohnTHaller
Considering that craigslist is hostile to basically everyone who tries to use
anything on their site for anything (fair use or not), I'm not sure I have any
problem with Google simply ditching them.

~~~
WalterGR
I'm really surprised that so many people trust Google so implicitly that
they'd give up their own choice of not using a site when it appears in Google
search results, in favor of letting Google decide whose business practices
deserve to be listed at all.

Especially when it comes down to a site allowing fair use of their content.

From what I can gather, Google can and does penalize sites that show
copyrighted content even when the site has very legitimate fair use claims.
DMCA notices are never served and never appear on chillingeffects.org. The
sites aren't even delisted, as described in Google's "Transparency Report"[1],
but rather moved to lower and lower positions in the search results. Google
will never acknowledge that the site is being penalized, and it seems that
completely removing the "offending" content won't resolve it. (Source:
personal experience.)

[1]
[http://www.google.com/transparencyreport/removals/copyright/...](http://www.google.com/transparencyreport/removals/copyright/faq/)

------
jzawodn
<http://www.craigslist.org/robots.txt>

~~~
Groxx
Doesn't seem to include the /sys posts that are the main topic of discussion.
Unless it has changed?

------
AnthonyMouse
Why is Google not indexing craigslist?

~~~
tempestn
Looks like it's probably a bug:
[http://productforums.google.com/d/msg/websearch/p6VbIaBkbWA/...](http://productforums.google.com/d/msg/websearch/p6VbIaBkbWA/B1q96Q1V0WgJ)

------
jrussbowman
Why would I search on Google, Bing, or wherever, and not just use the search
on craigslist?

~~~
tempestn
The most common reasons are that you want to search multiple cities at the
same time, or that you want to use advanced logical operators.

The nice thing about SearchTempest (in my obviously biased opinion) is that
you can set a radius to only search nearby cities, rather than the shotgun
approach of googling (or binging) everywhere.
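SearchTempest's actual implementation isn't described here, but a radius search boils down to keeping only the cities whose distance from the user is under the limit. A minimal sketch using the haversine great-circle formula; the city list and its rounded coordinates are illustrative sample data, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Illustrative sample data: (latitude, longitude) in degrees, rounded.
CITIES: Dict[str, Tuple[float, float]] = {
    "Seattle": (47.6062, -122.3321),
    "Tacoma": (47.2529, -122.4443),
    "Portland": (45.5152, -122.6784),
    "Spokane": (47.6588, -117.4260),
}

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points, in miles."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def cities_within(center: str, radius_miles: float) -> List[str]:
    """Names of cities within radius_miles of center, nearest first."""
    origin = CITIES[center]
    distances = [(haversine_miles(origin, coords), name) for name, coords in CITIES.items()]
    return [name for dist, name in sorted(distances) if dist <= radius_miles]
```

With Seattle as the center, a 100-mile radius keeps Seattle and Tacoma (about 25 miles apart), 200 miles adds Portland (about 145 miles), and Spokane (about 228 miles) stays out. Only the cities that survive the filter would then be searched, instead of every site in the country.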

------
tempestn
It's really amazing how little attention the Google problem is getting so far.
It seems like most people search, see some posts in the past few minutes and
some in February, and totally miss the fact that there's nothing in-between!

------
jonah
Side note: I was just wishing I could search across all of CL. (Looking for a
particular car.)

However, about half of the ads that were relevant had expired. Do you have a
way of dealing with that?

~~~
tempestn
We do. First off, the reason that's happening is that Bing (and Google before
it) indexes craigslist pages, but has no way of knowing immediately when a
previously-existing page disappears. They obviously can't be continually
spidering every single page on the site to see if it's still there, and while
craigslist could notify search engines of deleted posts, they have no
incentive to do so.
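A short Python sketch of why this is expensive: the only way for an index to learn that a post is gone is to re-fetch it, one request per URL. The status codes treated as "gone" (404 and 410) and the function names are assumptions for illustration; how craigslist actually responds for a deleted post isn't stated here.

```python
import urllib.error
import urllib.request
from typing import Dict, List

# Assumption: the responses this sketch treats as "post deleted".
GONE_STATUSES = {404, 410}


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Re-check one URL with a HEAD request and return its HTTP status code."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # urlopen raises on 4xx/5xx instead of returning them


def still_live(statuses: Dict[str, int]) -> List[str]:
    """Keep only the URLs whose latest status does not mean 'gone'."""
    return [url for url, status in statuses.items() if status not in GONE_STATUSES]
```

Keeping `still_live` accurate for millions of postings would mean calling `fetch_status` on every one of them, over and over, which is the cost described above.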

Some searches tend to have more expired posts than others, but if you find
they're a problem, we have a couple of alternatives. For searches across the
whole country, the best option is RSS feeds. (Too bad 'RSS is dying' and
all... ;) ) You can run any search on SearchTempest and click the 'Get Feeds
for this search' link to grab an OPML file of all the craigslist results RSS
feeds matching your search, within the search radius you specified. Import
that file into a folder in your favorite RSS reader, and you've got a
convenient, auto-updating feed of new results for your search, straight from
craigslist.
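An OPML file is just an XML list of feed URLs, which is why any RSS reader can import it. A minimal Python sketch of generating one with the standard library; the function name is hypothetical, and the feed titles and URLs in the usage are placeholders, not real craigslist feed addresses.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def build_opml(title: str, feeds: List[Tuple[str, str]]) -> str:
    """Return an OPML 2.0 document listing (name, feed URL) pairs as RSS outlines."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for name, url in feeds:
        # ElementTree escapes characters such as "&" in query strings for us.
        ET.SubElement(body, "outline", type="rss", text=name, title=name, xmlUrl=url)
    return ET.tostring(opml, encoding="unicode")


if __name__ == "__main__":
    # Placeholder feeds; real craigslist feed URLs are not reproduced here.
    feeds = [
        ("city A: honda civic", "https://example.org/a/search?query=honda+civic&format=rss"),
        ("city B: honda civic", "https://example.org/b/search?query=honda+civic&format=rss"),
    ]
    print(build_opml("craigslist: honda civic", feeds))
```

Saving that output to a `.opml` file and importing it into a reader folder gives one subscription per city, each updating independently.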

The other alternative is our Direct Results mode. Basically that just opens up
two windows: one for the results from craigslist, and one as an index to flip
through cities. So you only see results for one city at a time, but can
quickly flip through them with the 'Next' link. Obviously that can take a
while for searches across the whole country though, so we recommend it more
for smaller searches. Basically just a small optimization compared to manually
opening up and pasting a search into a few separate CL cities directly.

More info here: <http://www.searchtempest.com/faq.php#deleted>

~~~
jonah
I guess you've gotta do the best you can with what you have to work with. ;)

Thanks for building this!

~~~
tempestn
I almost think that should be the site's tagline.

And you're welcome. I actually built it back in 2006, but it's evolved a fair
bit since then!

------
brentledent
I wish they would make it less ugly.

~~~
tempestn
'They' craigslist, or SearchTempest? I can't do anything about the former, but
I might be able to for the latter! :) (Our blog, certainly, could use an
overhaul, but there's never a time when I would prefer to do that than work on
the site itself!)

