

The Craigwatch Story - stickfigure
http://craigwatch.com/story.html

======
stickfigure
I used Craigwatch. It was a useful tool - when a search pops, I get an email.
I used to use Google Reader for this but when Reader died so did my RSS
reading habit, so now email is the only part of my daily workflow that I poll
frequently.

The real mystery to me is why Craigslist shut the service down, because
(protocols aside) it wasn't fundamentally different from any other way of
consuming craigslist. Craigslist's terms of service are actually pretty
confusing:

 _USE. You agree not to use or provide software (except for general purpose
web browsers and email clients, or software expressly licensed by us) or
services that interact or interoperate with CL, e.g. for downloading,
uploading, posting, flagging, emailing, search, or mobile use. Robots,
spiders, scripts, scrapers, crawlers, etc. are prohibited, as are misleading,
unsolicited, unlawful, and /or spam postings/email. You agree not to collect
users’ personal and/or contact information (“PI”)._

...which if you think about it, bans the use of RSS readers too. There's a
short blurb on the RSS page:

 _craigslist RSS feeds are for your personal use only, and are not available
for commercial use without first obtaining a license from craigslist. Please
consult our <Terms of Use> for more information on using craigslist RSS
feeds._

The only relevant part of the Terms of Use is the section mentioned above,
which doesn't really shed any light on the situation.

Honestly this is really confusing. Once again, Craigslist disappoints.

~~~
yonran
By my reading, prior to Dec 2013, the old terms[1] permitted using personal
RSS readers but forbade aggregators. However, the new terms are much more
restrictive; they prohibit any use of the craigslist website outside of a web
browser. Thus, I think that it is now against the terms to use an RSS reader
to read the craigslist RSS feeds! (By the way, it is also against the
craigslist terms to write a crawler such as google.com or archive.org even if
it respects robots.txt!)

I happened to review these terms because I too was sent a letter from Brian
Hennessy representing craigslist last Friday regarding a little Chrome
extension I wrote a couple years ago[2].

[1]
[https://web.archive.org/web/20131128233421/http://www.craigs...](https://web.archive.org/web/20131128233421/http://www.craigslist.org/about/terms.of.use)

[2] [https://github.com/yonran/craigslist-shortcuts](https://github.com/yonran/craigslist-shortcuts)

~~~
mkaziz
I'm curious as to whether you got a response to your refusal to "comply." Did
he get back to you after that?

~~~
yonran
Not yet.

------
kazinator
In the past, I have sometimes used Craigslist's RSS feature to watch for the
appearance of items. That is to say, the ability of CL to export a keyword
search as an RSS feed that you can save in your feed reader and watch for
updates.

In what ways did Craigwatch improve over RSS? If it still operated today, why
would I use that instead of scraping RSS items directly from Craigslist to my
feed reader?

Did Craigwatch use RSS feeds, or was it scraping material from the rendered
HTML?

~~~
wiseleo
Useful free stuff is typically gone in minutes. Only junk remains posted.

Its RSS feeds have a refresh period of 1 hour, which is simply too long to
have the opportunity to grab hot items.

As Craigslist has very shallow category listings in its "free" category, where
you can't separate furniture from clothes as an example, the site forces users
to rely on manually refreshing pages, which change very often.

I have better user experience using Craigslist from my iPad. Craigslist
appears to selectively make the service available to app makers but not to
websites.

In the past 3 days of shopping for a specific car within about 15 Craigslist
local communities, I managed to get automatically blocked by the service
several times. What I am looking for is very specific and I don't mind
traveling a bit to get what I want.

I ended up having to write code with sleep timers to reduce my number of web
queries once I narrowed what I wanted.
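A minimal sketch of what "sleep timers" could look like, assuming nothing about the code actually written: a small helper that enforces a minimum gap between successive requests. The `Throttle` name and the 30-second figure are illustrative assumptions, not values from Craigslist or from this post.

```python
import time


class Throttle:
    """Enforce a minimum number of seconds between successive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the helper can be tested without waiting
        self._sleep = sleep
        self._last = None        # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon since the last call: sleep off the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Hypothetical usage: call throttle.wait() before each query.
throttle = Throttle(min_interval=30)  # 30 seconds is an arbitrary example value
```

Calling `throttle.wait()` before every request keeps the query rate bounded no matter how fast the surrounding loop runs.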

An essential feature in the iPad app is that I can cross off listings that are
no longer interesting to me and highlight favorites. There is no listing
cross-off feature for the website.

I guess my next step is to add that feature. _laugh_

~~~
kazinator
Wiseleo, is that really true about the refresh rate? RSS readers have a
configurable rate. Often there is a one hour default in order to be reasonably
nice to the server. Basically, a Craigslist RSS item is just a URL with the
search parameters embedded. Maybe I'm wrong, but I suspect that whenever you
fetch this URL, the server executes the search and produces the results as RSS
XML items, so the fetch rate is controlled by you (your reader). Or are you
saying there is some additional throttling on the server side, so that
RSS-based searches do not see up-to-the-second updates that are visible through
the Web interface? So that no matter how often you refresh the RSS feed, you
don't see new items that are already visible via HTML?

~~~
wiseleo
Gotta love being downvoted for an informative post... :)

If you look at the XML returned for RSS feeds, you will notice:

<syn:updateBase>2014-07-16T14:25:18-07:00</syn:updateBase>
<syn:updateFrequency>1</syn:updateFrequency>
<syn:updatePeriod>hourly</syn:updatePeriod>

That is the configuration setting for RSS readers to not update more than once
per hour.

See spec:
[http://web.resource.org/rss/1.0/modules/syndication/](http://web.resource.org/rss/1.0/modules/syndication/)
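As a rough illustration of how a reader might turn those three elements into a polling interval, here is a sketch using only the standard library. The wrapping `<channel>` element in `SAMPLE` and the function name are assumptions made for the example; the defaults (`daily`, frequency 1) follow the syndication module spec, and the month and year lengths are approximations.

```python
import xml.etree.ElementTree as ET

SYN = "http://purl.org/rss/1.0/modules/syndication/"

# Seconds per updatePeriod value; monthly and yearly are approximations.
PERIOD_SECONDS = {
    "hourly": 3600,
    "daily": 86400,
    "weekly": 7 * 86400,
    "monthly": 30 * 86400,
    "yearly": 365 * 86400,
}

# The three syn: elements quoted above, wrapped in a stand-in channel element.
SAMPLE = """<channel xmlns:syn="http://purl.org/rss/1.0/modules/syndication/">
<syn:updateBase>2014-07-16T14:25:18-07:00</syn:updateBase>
<syn:updateFrequency>1</syn:updateFrequency>
<syn:updatePeriod>hourly</syn:updatePeriod>
</channel>"""


def suggested_interval_seconds(xml_text):
    """Return the polling interval the feed asks for, in seconds."""
    root = ET.fromstring(xml_text)
    # Per the spec, a missing period means daily and a missing frequency means 1.
    period = root.findtext(f".//{{{SYN}}}updatePeriod", default="daily").strip()
    frequency = int(root.findtext(f".//{{{SYN}}}updateFrequency", default="1"))
    return PERIOD_SECONDS[period] / frequency


print(suggested_interval_seconds(SAMPLE))  # 3600.0
```

This is only a hint to well-behaved readers; nothing in the feed itself stops a client from fetching more often.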

Craigslist has aggressive blocking for excessive GET requests. The RSS feed
contains only the first 250 characters of text description of the ad. Thus,
you will see that something got added but you will not see its details. More
importantly, attributes are not available as part of the RSS feed.

That means that if you were interested in specific colors of a car, you would
need to define a separate RSS feed for red, yellow, and so on.

It is hard to test whether the RSS search results are additionally throttled,
but you will likely get blocked while testing. :)

While legitimately shopping, I got blocked multiple times for becoming more
efficient.

RSS is not hard to read. Here are some red manual transmission cars in San
Francisco Bay Area
[http://sfbay.craigslist.org/search/cto?auto_paint=7&auto_tra...](http://sfbay.craigslist.org/search/cto?auto_paint=7&auto_transmission=1&s=0&format=rss)
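To make the "one feed per color" point concrete, here is a small sketch that rewrites a single query parameter in that URL. Only `auto_paint=7` (red) comes from the link above; the other paint code in the example is a placeholder, and `with_param` is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

BASE = ("http://sfbay.craigslist.org/search/cto"
        "?auto_paint=7&auto_transmission=1&s=0&format=rss")


def with_param(url, key, value):
    """Return url with one query parameter replaced (or added)."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))  # insertion order is preserved
    query[key] = str(value)
    return urlunsplit(parts._replace(query=urlencode(query)))


# 7 is the code used in the link above; 8 is a made-up placeholder,
# not a known Craigslist color code.
feeds = [with_param(BASE, "auto_paint", code) for code in (7, 8)]
for feed in feeds:
    print(feed)
```

Each resulting URL is a separate feed that a reader would have to poll on its own.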

------
CanSpice
So will IFTTT get a cease and desist sometime in the near future, or have they
come to an agreement with Craigslist to provide their channel?

------
8ig8
A while back, I hacked together notifications using RSS, but lately, I just
use IFTTT...

[https://ifttt.com/craigslist](https://ifttt.com/craigslist)

~~~
pwenzel
I use IFTTT's craigslist channel all the time for long-term polling of
craigslist. Not sure how instant it is, but notifications are pretty reliable.

------
houston-mouse
One thing I like about this reflection is the highlighting of the weirdness of
the connection between a service's creators and its users. I'm on both sides
of the fence, and I find myself wrestling with _lots_ of different feelings
towards my users (mostly positive, but not all). I guess this dynamic has
existed since the printing press, but it feels a bit more intimate now since
you can see the emails and read the tweets.

~~~
jarofgreen
This is a very good point, and something that was also touched on in
[http://www.saurik.com/id/20](http://www.saurik.com/id/20) recently.

------
dpweb
I would think CL using a proxy that silently blocks client ips that interact
with the service suspiciously may be cheaper than sending out lawyer letters
and more immediately effective. I just prefer technical solutions to legal
threats, I guess.

~~~
pyoung
For fun and as a learning experience, I built a simple CL scraping web site.
I set it up on a small AWS instance about 6 months ago and after playing
around with and making modifications/updates to it over the course of a month
or two, I largely forgot about it. After seeing this thread, it dawned on me
that I had not received an e-mail update on any of my search alerts for a
while. I ssh'ed into the server and got a 403 with 'wget craigslist.com', so
my guess is that they are doing some sort of blocking. Looking at the logs,
the block probably started about a month ago.

~~~
Wingman4l7
Modifying the user agent to mimic a normal browser might solve the problem.

------
jijji
I ran a system that was a back end for craigslist for many years, and I never
stopped even after getting a cease and desist from them. What really got me
was after they started charging people to post ads... that really killed my
business model. Generally it's not a great idea to build a business model
around the availability of a third party... I think the whole concept of
craigslist sending a 'cease and desist' over what is generally considered
public_html is a laughable joke in my opinion, and good luck trying to
restrict public html from crawlers and robots.

------
dbla
A little bit off topic but interesting nonetheless.

I've known Beau for many years now and his writing style has improved so much.
He is a testament to the "practice makes perfect" (or better at least) mantra.
It's really interesting to see the transition in his travel blog from when he
started writing to now. If you're interested you can check it out. Compare the
first post to the most recent.
[http://dangertravels.com/](http://dangertravels.com/)

~~~
tabrischen
Thanks for sharing, his travel stories are inspiring

------
mkolodny
I'd be willing to bet that Craigslist shut you down because they're building
their own "watch" feature. The same thing happened with Padmapper - they sent
them a cease and desist letter, and then came out with their own map feature
([http://newyork.craigslist.org/aap/#map](http://newyork.craigslist.org/aap/#map)).

------
laxk
Does it mean that any browser extension which modifies CL web page(s) is
violating their terms of use?

------
intellegacy
it's a great story.

BTW, can anyone give me perspective on what it would take to recreate this in
Python? As a learning exercise

Is it possible to do? how long would it take for someone relatively new, etc.

~~~
bnejad
With all the libraries available for parsing and scraping I suspect this could
be recreated very quickly.
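For a sense of scale, the core of such a watcher (finding items not seen before) fits in a few lines of standard-library Python. This is a sketch under assumptions, not how Craigwatch worked: it assumes the feed is RSS 1.0 (RDF), uses a made-up sample feed with example.org links, and leaves out fetching, scheduling, and sending email.

```python
import xml.etree.ElementTree as ET

RSS = "http://purl.org/rss/1.0/"  # RSS 1.0 namespace (assumed feed format)

# Invented sample data standing in for a fetched feed.
SAMPLE_FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="http://example.org/post/1.html">
    <title>Red hatchback, manual</title>
    <link>http://example.org/post/1.html</link>
  </item>
  <item rdf:about="http://example.org/post/2.html">
    <title>Free couch</title>
    <link>http://example.org/post/2.html</link>
  </item>
</rdf:RDF>"""


def new_items(feed_xml, seen):
    """Return (title, link) pairs whose link is not yet in seen, and record them."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter(f"{{{RSS}}}item"):
        link = item.findtext(f"{{{RSS}}}link", default="").strip()
        title = item.findtext(f"{{{RSS}}}title", default="").strip()
        if link and link not in seen:
            seen.add(link)
            fresh.append((title, link))
    return fresh


seen_links = set()
for title, link in new_items(SAMPLE_FEED, seen_links):
    print(f"new: {title} -> {link}")
```

A second call with the same feed and the same `seen_links` set returns an empty list, which is the signal that there is nothing new to notify about.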

------
atomical
Did craigwatch.com agree to the terms on the site?

~~~
jstanek
There's probably a provision in the TOS that says that you implicitly agree to
the terms by virtue of using the service. I don't know for sure, just
speculating.

------
asaamaraa
mmm

------
melvinng
Stand up to the MAN!!!

~~~
melvinng
Everyone is too law abiding.. But really what laws is he breaking here??
Trademark? Copyright?

Spoke to my legal counsel and I don't see either..

~~~
unreal37
Terms of service are a contractual agreement between a service and its users.
There are no laws being broken, but it's breach of contract.

Your legal counsel would know that I suspect.

~~~
jijji
click through agreements and terms of service agreements (TOS) are not legally
binding contracts. they are used by service providers to provide fear,
uncertainty and doubt (FUD).

