Craigslist posts shown on a map while viewing Craigslist housing (chrome.google.com)
93 points by welder on Aug 1, 2012 | 72 comments

In the end, client-side scraping of the data actually increases the traffic on CL's servers, because many people each have to scrape the same posts, rather than a central server scraping them once. While padmapper remains small, it probably scrapes more posts than it has unique views: of the 100 posts it scrapes, maybe only 10 are seen. As padmapper et al. grow, central scraping may reduce the bandwidth on CL.

In 10 years, however, if everyone looking for apartments uses a plug-in, CL ends up paying for a lot more bandwidth than it would by outsourcing this functionality to a centralized service, because for every 100 posts a user scrapes, they may only really look at one, and they want the ability to show thousands on a single map.
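
To put rough numbers on that argument, here is a minimal sketch. The request model and every figure used with it (10,000 live posts, 1,000 users, 100 posts scraped per user) are assumptions chosen for illustration, not measurements of CL or PadMapper traffic.

```python
def requests_with_central_scraper(total_posts: int, users: int, viewed_per_user: int) -> int:
    """Requests hitting CL when one central service scrapes every post once.

    Assumption: users only reach CL when they click through to a post
    they actually want to read.
    """
    return total_posts + users * viewed_per_user


def requests_with_client_scraping(users: int, scraped_per_user: int) -> int:
    """Requests hitting CL when every user's browser scrapes posts itself.

    Assumption: the posts a user reads are among the ones already scraped,
    so reading adds no further requests.
    """
    return users * scraped_per_user
```

With 1,000 users who each scrape 100 posts and read one, the client-side model yields 100,000 requests, against 11,000 for a central scraper covering 10,000 posts. With a single user the comparison flips, which is the "small padmapper" case.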

I really don't understand why cl doesn't just do it themselves. It's not hard. I've built these maps countless times on top of all the popular mapping services and using different geocoders.

The hardest part is figuring out what the poster means in their location: where is "forth and main" [sic] or do you mean "East 4th St at Main St?", but I can see a simple address validator that updates a map with their location until they get something that will geocode correctly. That's the best solution and everyone wins here, except padmapper and the like, but there are other sources of data to map that aren't from craigslist too.
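
A rough sketch of that validator idea, under stated assumptions: `geocode` is a hypothetical callback standing in for whichever geocoding service is used (it returns a lat/lng pair or `None`), and the word list and splitting rule are illustrative, not a complete address parser.

```python
import re
from typing import Callable, Optional, Tuple

LatLng = Tuple[float, float]

# Illustrative only: a real validator would need a much larger vocabulary.
ORDINAL_WORDS = {
    "first": "1st", "second": "2nd", "third": "3rd",
    "fourth": "4th", "forth": "4th", "fifth": "5th",
}


def normalize_intersection(text: str) -> str:
    """Rewrite free text such as 'forth and main' as '4th & Main'."""
    parts = re.split(r"\s+(?:and|at|&|@)\s+", text.strip(), flags=re.IGNORECASE)
    cleaned = []
    for part in parts:
        words = []
        for word in part.split():
            replacement = ORDINAL_WORDS.get(word.lower())
            if replacement is None:
                # Capitalize all-lowercase words; leave 'St' or 'NW' as typed.
                replacement = word.capitalize() if word.islower() else word
            words.append(replacement)
        cleaned.append(" ".join(words))
    return " & ".join(cleaned)


def validate_location(text: str, geocode: Callable[[str], Optional[LatLng]]) -> Optional[LatLng]:
    """Try the raw text, then the normalized form.

    Returning None means the form should ask the poster to try again.
    """
    for candidate in (text, normalize_intersection(text)):
        result = geocode(candidate)
        if result is not None:
            return result
    return None
```

The posting form would call `validate_location` on each change and move the map marker whenever it returns coordinates.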

It's really trivial. The only other issue is cost, but those can be covered easily with all the money CL leaves on the table.

> client side scraping of the data actually increases the traffic on CL's servers

Of course, but not in PadMapper's case.

"Padmapper gets its data from 3Taps, a site that collects and republishes Craigslist's content. 3Taps, in turn, says it gets its data from cached copies supplied by Google and Bing, rather than Craigslist directly."

Source: http://arstechnica.com/tech-policy/2012/08/craigslist-tighte...

I'm sorry, but you should really ask yourself if this is true. Bing maybe, as they can do what ddg does and pull a feed from Yahoo/Bing BOSS (maybe?) but to scrape Google would be a very bad and unreliable (and counter to the Google TOS) way of getting this information.

This whole situation is very sketchy, honestly.

Padmapper used to scrape Craigslist, but stopped after a cease and desist letter. While buying the data from a service that scrapes search engine caches may be unreliable, my understanding is that it's legally much more defensible.

Is that argument in consideration of the fact that padmapper started off by scraping CL directly and forwarding everyone back over to CL?

They used to do that. They were told to stop. They stopped. They found a new way that didn't affect CL bandwidth. So what they started off doing is not all that relevant to what they are doing now. However, what this Chrome Extension is doing is pretty much what PadMapper was doing... so it would follow that this Chrome Extension will get smacked. Or worse... the extension users each get smacked.

Not only that, but for every housing page with a list on it, the scraper is going to pull all those pages to parse the data and display on a map. I suspect a person who is doing a lot of apartment hunting with this extension runs the risk of just getting their IP address banned. I assume CL has that capability.

Since users will be browsing the CL website, CL will still control the distribution of their content. I'm betting tech details are just details as long as users are browsing the CL website.


I don't think CL ever made the argument that the problem was traffic. The whole "we're not hurting your servers" thing seems like a strawman to justify violating CL's TOS.

CL repeatedly claimed that traffic was a major issue with scraping.

> I really don't understand why cl doesn't just do it themselves. It's not hard.

It's not hard to do a mediocre job, but it is super hard to build something really great.

And it's super easy to build something so shitty, that it'll drive away the users that you have now.

I think CL is worried that their bandwidth would start going down as people start putting their apartment listings in PadMapper instead of CL because CL doesn't have any apartment listings any more.

I've just pushed the full source code to github: https://github.com/alanhamlett/CLMapper

If you find issues or have feature suggestions, please add them and I will push fixes: https://github.com/alanhamlett/CLMapper/issues

This is not working for New York for me, didn't show up at all, but it was definitely working in Gainesville, FL (where I'm at now), popped up right away. Are there certain cities it will not function in, for whatever reason?

It works for: http://newyork.craigslist.org/hhh/ but newyork.craigslist.org has housing categories different from sfbay.craigslist.org and a Chrome extension's client script only runs on the pages which match the manifest.json: https://github.com/alanhamlett/CLMapper/blob/master/src/mani...

I'll add the newyork urls to the manifest.
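
For anyone wondering why a missing URL pattern makes the map silently not appear, here is a simplified sketch of the matching step. It treats `*` as a plain wildcard, which is an assumption: real Chrome match patterns have extra rules for the scheme and host parts. The patterns in the usage note are illustrative and are not copied from the extension's actual manifest.json.

```python
import re
from typing import Iterable


def url_matches(pattern: str, url: str) -> bool:
    """Return True if url matches pattern, where '*' matches any run of characters."""
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.fullmatch(regex, url) is not None


def script_runs_on(url: str, patterns: Iterable[str]) -> bool:
    """A content script is injected only if at least one pattern matches the page URL."""
    return any(url_matches(pattern, url) for pattern in patterns)
```

With the patterns `http://sfbay.craigslist.org/apa/*` and `http://newyork.craigslist.org/hhh/*`, the page http://newyork.craigslist.org/hhh/ matches, while a New York page under any other category path does not, so the script never runs there.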

Cool idea. Will be interesting to see what companies like Craigslist can/will do about this stuff. It's not like Craigslist is ideologically opposed to maps, they just didn't want them to be hosted on a competing site.

So I imagine they might not mind this at all..

I use a Chrome extension that automatically displays the embedded photos as thumbnails next to each listing. Per request, it probably increases the resource burden on CL's servers by one or two orders of magnitude. I can't imagine they are happy about that and keep waiting for the day when they implement some sort of rate limiting.

Or maybe it really is as they say, and they just don't want 3rd parties doing this sort of thing, even though it's far more efficient. Personally I'm skeptical.

What extension? I was just looking for something like this.

I agree, a good workaround for the padmapper problem. I wonder how padmapper will react.

I'm the person who made PadMapper, I think this is great. I'm worried that CL doesn't distinguish, though. (I'm a bit confused as to why you think I'd have a problem with it, did you mean to type Craigslist there?)

I think some tend to think you are on a mission to prove a point or make money and have a popular site, slightly contrary to your stated position of just benevolently wanting to make craigslist apartment hunting better, so this could take your traffic and set you off the mission they really think you are on.

Hey there. Somewhat offtopic, but: thank you for shaking up an old industry by creating Padmapper. Kudos.

What Sillysaurus said. I'm still confused by a) this is one area CL makes revenue b) Your service makes it more usable thus increasing demand = c) more revenue to offset any costs associated with load?

Thanks :-)

Much love for PadMapper. I'll miss it man...I even recommended it to my landlord so he can track neighborhood apartment prices better (for rent adjustment)

Haha cool. No need to miss it, it's still there.

ditto, PadMapper is the sh*t! It has saved me countless hours of apartment hunting. Thanks!

Thanks, and nice work yourself :-)

Because it turns your entire platform / service into a Chrome widget and eliminates the need for your product?

I think it's a cool solution. I want apartment hunting to be less awful, I don't necessarily have to be the one to do it.

Yep. While it's cool, most people apartment shop for max a few weeks per year? I don't know about others, but I tend to avoid installing anything that has a single purpose and limited use, even if it's free, because I don't like being cluttered with apps and widgets and extensions and the like.

If the general public is anything like me, a website is far more inviting than something like this.

Some people will use it. Maybe not as many as PadMapper, but I developed CLMapper with myself as the ideal user.

... eliminates the need for his product only by people who are into using Chrome plugins like that. The rest of the people looking for apartments will still have a need for his product. I wonder which user base would be bigger. I suspect it is not the Chrome Plugin users.

There is nothing that prevents this Chrome widget from becoming as threatening to CL as was padmapper.

http://housingmaps.com - the mothership - how do they still exist?

They aren't trying to use Craigslist to jumpstart their own business - they are simply offering a service to users, so CL is fine with them.

CL isn't opposed to these services existing on top of CL, it's opposed to them using CL to jumpstart their own business that is supposed to _replace_ CL.

Padmapper's eventual goal is to destroy CL - which, of course, CL is not going to go out of its way to enable.

> Padmapper's eventual goal is to destroy CL

What? I don't think that's ever been Padmapper's goal.

It is the unavoidable end-game if padmapper is successful.

I think I have some say in the end-game for PadMapper, no?

http://www.padlister.com/ - "PadLister is a full suite of tools for renting"

Are you seriously telling me that you're not trying to compete with Craigslist in the Apartment Listings space?

All incentives, evidence, and your actions point in this direction. The only dissonance is with what you're saying, not with what you're doing.

Even if you mean it now (in context, this seems unlikely), incentives have a way of wearing down your resistance. If PadMapper actually succeeded in supplanting Craigslist as the primary destination for apartment ads, you'd be outright foolish to not start accepting ads directly yourself ... ___which you already do, through PadLister.___

So how are you not aiming to supplant Craigslist, exactly? There's nothing wrong with competing with them, but it's insanely disingenuous to claim that you aren't.

Yeah, that might be the issue, and I'd strongly consider shutting it down if that would make working with CL possible. I disagree that competition is inevitable, though.

> Yeah, that might be the issue, and I'd strongly consider shutting it down if that would make working with CL possible.

Carrot and stick. That's not a highly principled stand.

> I disagree that competition is inevitable, though.

You're doing this out of the pure altruistic goodness of your heart?

All the incentives point in a single direction. There's nothing wrong with competing with Craigslist. You'd be foolish to not take a piece of the pie -- if you can. Likewise, Craigslist would be idiotic to give away the valuable data that you need to take a piece of their pie.

I'd need to be able to talk with them to see if it's actually the thing they have an issue with before I would nuke that project. And yes, I started and continue this project because I wanted to solve a problem, and I find it fulfilling. Money isn't a strong motivator for me.

" Money isn't a strong motivator for me."

Please don't say that - it is disingenuous. Money is a strong motivator for 99.99999% of the world. Money lets you do things you otherwise could not do without that money. It's reasonable to say things like, "Money isn't the only motivator for me."

Sorry, that wasn't specific enough, money past my current point isn't a strong motivator for me, since I have relatively cheap tastes and low expenses. Even below market salary for an engineer, being childless, I have such low expenses that I can afford to go on trips and do the other things I want without needing to make more. Obviously, if I couldn't afford good food, I would care a lot more about money and less about doing stuff I enjoy doing. And of course, that might change over time if I ever wanted to buy a house or car or had kids.

The whole notion that successful competition ends in destruction is pretty...wrong. This isn't a football match where team A needs to lose in order for team B to win. Competition is healthy and normal, and multiple organizations can thrive in the same space.

Network effects suggest the market will coalesce around a single free provider of classified listings. If (when?) Craigslist goes down - it will be fast, and very, very destructive. One moment they have 99+% of the Bay Area rentals, a month or two later they'll have close to 0%.

unavoidable? It would be very easy for padmapper to be successful and not destroy CL. If padmapper continues to pull search engine cached listings from sources that don't increase CL bandwidth, then displays them in a really nice UI and also links them directly to the CL listing page... that is successful. And CL is no worse off. CL is being a jerk about this. This Chrome extension is just going to increase CL bandwidth and probably get people IP banned... and very likely further piss off an already irrational entity.

If, if, if.

If Padmapper succeeds in becoming the primary target for traffic, and Padmapper has the ability to post listings (which it already does!), then Padmapper will innately be supplanting Craigslist.

It's unavoidable because the incentives are extremely strong, and doing so would be a natural, easy progression. That's just how humans work.

Well... technically it is PadLister that has the ability to post listings. Yes... same guy but different site. I would imagine that PadMapper runs PadLister listings alongside CL listings. Which is a smart move for any aggregate service like PadMapper. PadLister is really the CL competition.

> PadLister is really the CL competition.

You said it yourself: same guy.

I'm still not seeing the difference between what they are doing and what padmapper was doing.

Housing Maps doesn't have any intention of replacing CL. CL is required for Housing Maps to exist. Padmapper was very clear in its intent to eliminate craigslist eventually, even if they made no explicit statements stating that was their goal. Anybody with even the slightest modicum of business development experience could explain what the eventual trajectory would be for these two organizations.

HousingMaps has zero desire to be anything other than a Google Maps interface to craigslist. That's it. Nothing more.

The instant they start moving out of that space (going commercial, accepting rental submissions, etc...) - they'll get shut down and/or hit with a C&D just like padmapper.

Local client-side scraping of CL to bring about similar results to Padmapper - I like it.

Noting that CL has recently gone on the offensive, I wonder where they would draw the line. At the moment, everything is done locally and it doesn't look like the extension is communicating with any central source. What if there was a central server that aggregated the results of all the distributed scraping to cache results and a) display them more quickly to users, and b) reduce the number of hits to CL?

Would CL rebuke the extension because it's communicating with a central server and sharing CL's data in a manner not controlled by CL?
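
A minimal sketch of the shared-cache idea, to make the question concrete. Everything here is hypothetical: the class name, the time-to-live policy, and the `scrape` callback are assumptions for illustration, and nothing suggests the extension does this today.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class SharedListingCache:
    """In-memory stand-in for a central server that caches scraped listings."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> Optional[str]:
        entry = self._entries.get(url)
        if entry is None:
            return None
        stored_at, body = entry
        if self._clock() - stored_at > self._ttl:
            # Expired: drop the entry so the next client scrapes a fresh copy.
            del self._entries[url]
            return None
        return body

    def put(self, url: str, body: str) -> None:
        self._entries[url] = (self._clock(), body)


def fetch_listing(url: str, cache: SharedListingCache, scrape: Callable[[str], str]) -> str:
    """Ask the shared cache first; only hit the origin site on a miss."""
    cached = cache.get(url)
    if cached is not None:
        return cached
    body = scrape(url)
    cache.put(url, body)
    return body
```

Under this design the first client to view a listing pays for the scrape and every later client within the time-to-live is served from the cache, which is point b) above. It is also exactly the redistribution step the question is about.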

I had the idea that browser clients could scrape content as they browse then send that anonymously to a central system that could be used by all, but with the copyright issue now in play it seems that it would give CL due process to shut down such a service.

That's essentially what NotifyWire.com did and got a C&D.

Oh well, good to know the idea was put into practice at least.

It'll be interesting to see how CL views this in light of section 5 of their TOU:

    Any copying, aggregation, display, distribution, performance or 
    derivative use of craigslist or any content posted on craigslist
    whether done directly or through intermediaries (including but not
    limited to by means of spiders, robots, crawlers, scrapers, framing,
    iframes or RSS feeds) is prohibited. 

Would CL really want to sue its users though?

Isn't a user's browser technically an intermediary? The browser reads html from an initial request and makes automated additional requests for images, javascript and css files, etc and then renders the page using its own rendering engine. The only difference here is that the browser is making multiple requests and using a different rendering engine (the extension) to display it using a map.

There's an obvious market for people who want the amount of data that craigslist has, but want it displayed in a smarter way that allows better filtering and searching. They could make a killing licensing their data to 3rd party sites (or launching a new site with an improved interface). Instead they are giving users more and more reasons to switch to other services.

Plus he has open sourced it now https://github.com/alanhamlett/CLMapper

I don't think this is aimed at users at all but at what padmapper was doing - using an intermediary to copy CL data.

Neat. I have heard of CL C&Ding some extensions as well, but it may have just been a subset of extensions that were doing something they didn't like.

You mean like overloading their servers with needless scraping?

There's no reason to assume this will overload CL servers or CDN.

FYI, http://www.housingmaps.com/ is still a decent CL housing + Map mashup

Would be great if it worked for cities outside of the US.. the ones I tested (even in English) did not. Map stayed centered on California.

Sorry, I made this in 24 hrs. That's one of the things that will be fixed soon.

No need for apologies. Craigslist should've put out a plugin like this, that way they can keep their site simple, but also give it some boost for people that want it.

I was very close to making a bookmarklet / extension like this, but I'd rather see padmapper succeed.

I like this. It's interesting that a viable solution to the Craigslist third party problem is to just make everybody scrape Craigslist. From a purely technical standpoint, that seems rather unfortunate, but I'd rather have a local scraper + Padmapper UI than have no Padmapper at all.
