

A Week of Mining Seattle’s Craigslist Apartment Pricing - racketracer
https://racketracer.wordpress.com/2014/12/23/a-week-of-seattles-craigslist-apartment-pricing/

======
alexyang21
Pretty cool analysis! Scrapy is great but if you're looking to extend this
further, I recommend checking out the 3taps API. 3taps is the only API I've
seen that breaks down the different components of a Craigslist post (e.g.
heading, price, number of bedrooms) and makes them available for scraping. Using
3taps will also make it easy for you to extend the analysis to new cities (you
literally change a single parameter). And you should set up a cron job (or
Heroku scheduler task if a web app) so you don't have to run the scraper
manually every day :-)
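
For the cron part, a minimal sketch of a wrapper that cron could call once a day might look like the following. The spider name, the output file naming, and the paths in the crontab comment are all placeholders rather than anything from the original post; only `scrapy crawl <spider> -o <file>` is the standard Scrapy command line.

```python
import datetime
import subprocess

# Hypothetical spider name; use whatever your Scrapy project defines.
SPIDER_NAME = "craigslist_apartments"


def build_command(spider, day):
    """Return the `scrapy crawl` command that writes one CSV file per day."""
    output_file = f"listings-{day.isoformat()}.csv"
    return ["scrapy", "crawl", spider, "-o", output_file]


def run_daily_crawl():
    """Run today's crawl from the Scrapy project directory.

    Returns scrapy's exit code so the caller (or cron) can notice failures.
    """
    command = build_command(SPIDER_NAME, datetime.date.today())
    return subprocess.run(command).returncode


# Example crontab entry (placeholder paths), every day at 06:00:
#   0 6 * * * cd /path/to/project && python run_daily.py
```

Saving this as `run_daily.py` with a call to `run_daily_crawl()` at the bottom gives cron a single script to run. Writing to a dated file keeps each day's listings separate.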

If you want to check out an example of automated Craigslist scraping, you can
check out a search tool I built (craigslist-scraper.herokuapp.com) and the
accompanying tutorial (baserails.com/apiscraper). Both are best viewed via
laptop.

------
OrwellianChild
I like the varied analysis! Though, if you want to build a prediction model,
it's probably best to avoid variables that can be gamed, e.g. # of pictures in
a listing... For the larger zip codes, a heat-map of prices might help cut
them down to more consistent behavior for prediction. Keep iterating - would
love to see your progress over time!

------
dreen
Did you do anything to avoid an IP ban for the crawler?

~~~
dave5104
I'm curious about this too, especially since OP says they'd like to continue
scraping. I've heard of other people crawling craigslist getting banned pretty
fast and even one site getting slapped with a lawsuit [0]. (Which looks like it
was not entirely successful for craigslist.)

Craigslist has even taken some measures to get their users to assign copyright
of the content over to craigslist themselves [1].

[0] http://venturebeat.com/2013/05/01/padmapper-craigslist-3taps/

[1] http://www.on-site.com/what-you-need-to-know-about-craigslist-copyright-assignment/

~~~
racketracer
I scraped it probably at least 50 times and nothing has happened. I heard that
craigslist only blocks IPs that direct a lot of traffic toward their sites
like Padmapper did.
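
If traffic volume really is what triggers a block, one way to stay on the low side is to pause between requests. This is only a sketch under that assumption: the 5-second default is a guess, not a documented Craigslist limit, and `crawl_politely` is a made-up helper name.

```python
import time
import urllib.request


def fetch(url):
    """Download one page and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def crawl_politely(urls, delay_seconds=5.0, fetch_page=fetch, sleep=time.sleep):
    """Fetch each URL in turn, pausing between consecutive requests.

    delay_seconds is an assumed value, not a known safe threshold.
    fetch_page and sleep are injectable so the loop can be tried offline.
    """
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        pages[url] = fetch_page(url)
    return pages
```

In a Scrapy project, the `DOWNLOAD_DELAY` setting serves the same purpose without any custom loop.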

------
masonhensley
That's really neat.

Fun fact - many apartments run by management companies / owned by
institutional investors dynamically price their units on a daily basis based
on the number of phone inquiries, walk-ins to the leasing office, occupancy
rates in the neighborhood, and many other factors.

Ex: http://www.realpage.com/yieldstar/

~~~
gyardley
Yes, and scraping their publicly-posted apartment rents daily for a couple
months can reveal patterns that'll save you a decent chunk of your lease.

Biggest determinant is supply of a particular unit size, though. If there are
five one-bedrooms available in a yield-managed complex, you're getting a
better deal relative to the building than if there's only one one-bedroom
remaining - sometimes to the tune of hundreds of dollars a month.

~~~
dave5104
A few other products out there apart from YieldStar include LRO and Rent
Maximizer.

It's fascinating how much goes into the pricing. It's all based on quite a few
variables, ranging from availability (like you mention), competing communities
in the area, lease end date optimizations (I don't want >X leases expiring in
one month, since that's a lot of work for my leasing staff), time of year, and
price history. And on top of this, prices can change daily.

I've had the chance to work on a revenue management product myself--and I can
safely say that it's changed my perspective as a renter. There's a lot that
can go into the pricing.

~~~
Nicholas_C
My roommate swears that checking the posting a few times can raise the price.
Is the number of clicks/views on a posting taken into account when prices are
changed daily?

~~~
dave5104
No systems I've seen use that specific data point. Sadly for communities (and
probably good for prospective renters), the third-party sites you might find
listings on (Craigslist, Apartments.com, ForRent.com, etc.) don't share that
data, as far as I've seen.

However, demand, which is usually a variable in the pricing equations, will
definitely be derived from the lead tracking the community does, which can
include: # of incoming phone calls, # of walk-ins, # of incoming emails.

tl;dr: Clicking on the ads won't do anything. Taking action and reaching out
to the community can do something. And a lead is usually tracked on a unique
user basis--so one person calling 50 times vs. 50 people calling once is
vastly different.
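
To make the unique-user point concrete, here is a toy sketch of counting leads per person rather than per contact. The record layout and the `summarize_leads` name are invented for illustration; real revenue management systems will track this differently.

```python
def summarize_leads(contacts):
    """Summarize a list of (person_id, channel) contact records.

    total_contacts counts every call, walk-in, or email;
    unique_leads counts each person once, however often they reach out.
    """
    contacts = list(contacts)
    unique_people = {person_id for person_id, _channel in contacts}
    return {
        "total_contacts": len(contacts),
        "unique_leads": len(unique_people),
    }


# Illustrative inputs only: one person calling 50 times vs. 50 people calling once.
one_person = [("renter-0", "phone")] * 50
fifty_people = [(f"renter-{i}", "phone") for i in range(50)]

print(summarize_leads(one_person))    # {'total_contacts': 50, 'unique_leads': 1}
print(summarize_leads(fifty_people))  # {'total_contacts': 50, 'unique_leads': 50}
```

Both cases show the same raw contact count, but only the second registers as 50 leads.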

------
timtk
Insightful read. Your mention of text mining in "Future Look" reminds me of
Levitt's findings regarding sales correlations with keywords in real estate
listings.

http://www.nber.org/papers/w11053.pdf

http://freakonomics.com/books/freakonomics/chapter-excerpts/chapter-2/

