3taps says CL blocking "all general search engines"
25 points by sigmadelta on Aug 7, 2012 | 16 comments
homepage pop-up: "At approximately noon on Sunday August 5th, Craigslist instructed all general search engines to stop indexing CL postings -- effectively blocking 3taps and other 3rd party use of that data from these public domain sources. We are sorry that CL has chosen this course of action and are exploring options to restore service but may be down for an extended period of time unless we or CL change practices. As soon as we know more, we will share it here and on our Twitter account."

I don't think this is accurate. As far as I can tell, there is nothing in CL's robots.txt, meta tags, or response headers that prevents Google from indexing them. Further, requesting a CL post with the Googlebot user agent yields the same content. This only leaves the possibility that they are excluding Google via specific IP blocks, which seems unlikely. Is there something I'm missing?
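The robots.txt and response-header checks described above can be sketched with Python's standard-library `urllib.robotparser`. This is a minimal sketch that assumes the robots.txt text and a post's response headers have already been downloaded. The function name and the sample rules in the usage line are hypothetical, not Craigslist's actual file. It ignores meta tags in the HTML body and IP-based blocking, and `urllib.robotparser` does not implement Google's wildcard extensions, so treat it as a rough first pass only.

```python
from urllib.robotparser import RobotFileParser


def googlebot_may_index(robots_txt: str, url: str, headers: dict) -> bool:
    """Return True if neither robots.txt nor an X-Robots-Tag header blocks Googlebot.

    Hypothetical helper for illustration: meta tags and IP blocks are not checked,
    and user-agent-scoped header values like "googlebot: noindex" are not handled.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse already-fetched text, no network access
    if not parser.can_fetch("Googlebot", url):
        return False

    # HTTP header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    tokens = {d.strip().lower() for d in lowered.get("x-robots-tag", "").split(",")}
    # "noindex" and "none" keep a page out of the index; "noarchive" does not.
    return not ({"noindex", "none"} & tokens)
```

For example, with the made-up rules `"User-agent: *\nDisallow: /reply/\n"`, a post URL such as `http://sfbay.craigslist.org/sby/boa/3191071012.html` comes back as indexable unless the response carries `X-Robots-Tag: noindex`.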

I don't know, but if I understand http://3taps-statistics.qatro.com/craigslist/index.pl correctly, then they are missing lots of posts. The numbers seem to be percentages.


Scroll down to the "daily report" and they've dropped from almost 2 million posts a day to 153.

Yeah. This is all talking about 3taps, not "general search engines". 3taps seems to be claiming that Craigslist has cut off Google, but I think it's just that Craigslist has cut off 3taps.

Edit: Craigslist added nocache directives to their posts, which means that 3taps can't scrape Google's cached copies. They're not blocking anyone. Interestingly, this also reveals that 3taps was previously violating Google's TOS, which prohibits automated access to the Google cache.
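The distinction drawn here, between removing a page from the index and merely suppressing its cached copy, can be illustrated with a small parser for robots meta tags. This is a minimal sketch using Python's `html.parser`; the sample HTML is hypothetical. The thread doesn't confirm which exact spelling Craigslist used (`nocache`, or Google's documented `noarchive`), so the sketch treats both as cache-only directives.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":  # HTMLParser already lowercases tag and attribute names
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attrs.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(",") if d.strip())


def classify(html: str) -> dict:
    """Hypothetical helper: report whether a page stays indexed and whether a cached copy is offered."""
    parser = RobotsMetaParser()
    parser.feed(html)
    d = parser.directives
    # "noindex" (or "none") drops the page from search results entirely.
    indexable = not ({"noindex", "none"} & d)
    # "noarchive"/"nocache" only hide the cached copy; the page itself stays indexed.
    cache_shown = indexable and not ({"noarchive", "nocache"} & d)
    return {"indexable": indexable, "cache_shown": cache_shown}
```

Under these assumptions, a post carrying only a cache directive still shows up in `site:` searches, but there is no cached copy for a third party to scrape. That fits the observation that fresh CL listings remain searchable.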

Do some site:city.craigslist.org searches and limit to 24 hours. You'll get no results. You can't cut off 3taps without cutting off Google.

Just tried again, and I got a bunch of results:


Of the 10 results on the first page, 7 are within the last hour.

(1) Boat sfbay.craigslist.org/sby/boa/3191071012.html 2 hours ago ... 408-726-8722 i don't know about the motor or about the boat but if you want to see it call.... that is the reason that i can't wrote about the ...

SF bay area boats - by owner classifieds - craigslist sfbay.craigslist.org/boa/ - Cached - Similar SF bay area boats - by owner classifieds - craigslist.

(2) 1964 Fabuglas 16ft fishing boat sfbay.craigslist.org/nby/boa/3191162483.html 1 hour ago ... This was my dads fishing boat. It runs good. But could use a little TLC. Its a 1964 16 foot Fabuglas out of Nashville Tennessee. Most of the ...

SF bay area marine services classifieds - craigslist sfbay.craigslist.org/mas/ - Cached Sat Aug 04. Shipwright/Boat Work - (berkeley) ... Boat & Marine Related Service - (hayward / castro valley) ... SF Charter Boat, Book Now - (San Francisco Bay) ...

(3) SIDEWINDER 16' SPEED BOAT trade for services or...??? - Craigslist sfbay.craigslist.org/eby/bar/3191164088.html 1 hour ago ... 1980 BLUE sidewinder motor boat. Seats 4. 35+mph. 70 hp 2 stroke VRO Evinrude motor, no need to premix fuel. Runs strong. Starts right up.

Fishing / Hunting Boat sfbay.craigslist.org/sby/boa/3176240765.html 6 days ago ... 2004 War Eagle Boat, semi v front flat bottom(17ft.), with a 2003 40hp Mercury(4 stroke) motor with 20- 25 hours on it. The boat is on an EZ ...

(4) Sailboat rudder off a 22' boat sfbay.craigslist.org/sby/boa/3191237551.html 14 minutes ago ... 4'10'' tall rudder from a 22' boat. It is in great condition and is a solid Bay rudder. Can be brought up in case of a grounding by pulling on a rope ...

(5) MB Sports V Drive Ski & Wakeboard Boat sfbay.craigslist.org/eby/boa/3187986181.html 1 day ago ... 2002 MB SPORTS 220 V-Drive. This boat is an excellent for both skiing and wakeboards. We have used it and enjoyed it for slalom skiing and ...

(6) * BAYLINER* NICE BOAT, NICE PRICE... BEST OFFER MOVING ... sfbay.craigslist.org/eby/boa/3191163056.html 1 hour ago ... VERY CLEAN BOAT New wheel bearings on trailer. 3.0 Mercruiser 135 H.P. Great on gas. 40-45 mph top speed. Just registered! Garaged for 7 ...

(7) wanted boat polisher sfbay.craigslist.org/eby/boa/3191196564.html 50 minutes ago ... wanted boat polisher (pittsburg / antioch) ... I am looking for someone to polish and wax my 28 ft boat. topside only dont have to do the hull.

Pretty brilliant. I don't think Craigslist needs Google traffic at all anymore. People know to go there to buy and sell.

Not to mention that local classifieds are not really meant to have any permanence, so oftentimes when I saw a CL listing on Google for an item I was looking for, it was already sold or the posting was deleted. I'm still looking forward to a viable Craigslist competitor, though.


"One data harvester, 3taps, said earlier this week that Craigslist had blocked search engines such as Google from including Craigslist pages in search results. But that report was inaccurate.

3taps’ product and quality assurance leader, Meg Nakamura, acknowledged Wednesday in a chat with The Chronicle that something fishy was taking place, but developers there haven’t fully figured out what’s going on."


Not sure I agree with most of the conclusions drawn in that article.

The article does say that "sure enough, Google displays recent listings from Craigslist right now," which does seem to be true for me, too, when I try.


Mark Milian @markmilian 7 Aug

Contradicting earlier statement, 3Taps spokeswoman emails to say, "Craigslist is still allowing indexing of pages." Still nothing from CL PR

Actually the part about search engines doesn't seem to be true... I just performed searches using Google, Yahoo, and Bing and got links to CL postings that were made within the last hour.

They probably did it through the webmaster tools portals provided by the search engines themselves. 3Taps has gone from indexing ~2 million posts a day to virtually 0. I did a couple of site:cityname.craigslist.org searches on Google, restricted to the past 24 hours, and got no results.

Google, and I presume other big search engines too, cache robots.txt for a week by default. We're well within the window for them to still be indexing CL.

If that's true, how does 3taps know that CL is blocking search engines? Also:

$ wget -q -O- --save-headers http://www.craigslist.org/robots.txt | grep -F Last-Modified

Last-Modified: Fri, 04 Nov 2011 18:13:24 GMT
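The point of the `Last-Modified` check can be made explicit by comparing that timestamp with the time 3taps says the blocking began (noon on Sunday, August 5, 2012). The pop-up doesn't give a time zone, so the sketch below assumes Pacific Daylight Time (UTC-7). It is a minimal sketch using the standard library's `email.utils.parsedate_to_datetime`, which parses HTTP-style dates into timezone-aware datetimes.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Header value copied from the wget output above; "GMT" yields an aware UTC datetime.
last_modified = parsedate_to_datetime("Fri, 04 Nov 2011 18:13:24 GMT")

# Assumption: "noon on Sunday August 5th" means 12:00 PDT (UTC-7).
pdt = timezone(timedelta(hours=-7))
blocking_reported = datetime(2012, 8, 5, 12, 0, tzinfo=pdt)

# A robots.txt edit made to block crawlers would have to be newer than the reported start.
changed_before_blocking = last_modified < blocking_reported
age = blocking_reported - last_modified
print(changed_before_blocking, age.days)
```

This prints `True 275`: the file was last modified about nine months before the reported cutoff, so the served robots.txt cannot be what changed on August 5.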
