

Ask HN: Why not build a library that scrapes Twitter's DOM into our own API? - benguild

OK, so there are multiple things to consider here:<p>1) Users provide all content to Twitter. Without the users, Twitter is nothing.<p>2) If users cannot use their favorite clients, what is Twitter worth when it restricts the free flow of content that generates this so-called "firehose"?<p>3) Twitter's content is not particularly challenging to analyze. Each tweet is at most 140 characters of text, which should make it fairly easy to scrape.<p>---------------<p>Strategies:<p>a] Depending on what the developer wants, there are multiple approaches. The first would be to use cloud instances, or even the third-party Twitter clients themselves, to spider or scrape Twitter constantly in a distributed fashion and provide a third-party "stream" or unofficial index.<p>b] If the client only wants the user's timeline, this can be done over any HTTP connection, as long as the library is kept up to date enough to decode the HTML and DOM elements into the JSON fields the API used to provide.<p>Although the API is a convenience, it almost seems like a non-issue to access content that is publicly available, or available once the user logs in, if the user is willing to hand over their credentials. I agree it's a nuisance, but read-only access shouldn't be terribly difficult to accomplish, and write access would be even easier, given that it's just HTTP POST and AJAX calls.<p>Why not build a replacement "API"?
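
To make strategy b] concrete, here is a rough sketch of the "decode the HTML into JSON" step, using only Python's standard library. The markup in SAMPLE, the class names "tweet" and "tweet-text", the data-user attribute, and the names TweetParser and timeline_to_json are all made up for illustration; they are not Twitter's actual DOM, and a real library would need updating every time the page layout changed.

```python
import json
from html.parser import HTMLParser


class TweetParser(HTMLParser):
    """Collects tweets from HYPOTHETICAL markup (not Twitter's real DOM):
    <div class="tweet" data-user="..."><p class="tweet-text">...</p></div>
    """

    def __init__(self):
        super().__init__()
        self.tweets = []        # finished tweet dicts
        self._current = None    # tweet dict currently being built
        self._in_text = False   # True while inside a tweet-text <p>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "tweet" in classes:
            self._current = {"user": attrs.get("data-user"), "text": ""}
        elif tag == "p" and "tweet-text" in classes and self._current is not None:
            self._in_text = True

    def handle_data(self, data):
        # Character references such as &amp; arrive already decoded.
        if self._in_text:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "p" and self._in_text:
            self._in_text = False
            self._current["text"] = self._current["text"].strip()
            self.tweets.append(self._current)
            self._current = None


def timeline_to_json(html):
    """Parse a timeline page and return its tweets as a JSON array string."""
    parser = TweetParser()
    parser.feed(html)
    parser.close()
    return json.dumps(parser.tweets)


# Stand-in for a fetched timeline page; the structure is an assumption.
SAMPLE = """
<div class="tweet" data-user="alice"><p class="tweet-text">Hello, world</p></div>
<div class="tweet" data-user="bob"><p class="tweet-text">Scraping is brittle &amp; fragile</p></div>
"""

print(timeline_to_json(SAMPLE))
```

Fetching the page and handling login are deliberately left out. The point is only that the parsing layer is small, and that it is tied to whatever markup the site happens to serve.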
======
dangrossman
> Why not build a replacement "API"?

Because what you're suggesting is entirely illegal, and playing with Twitter
is not worth going to court and bankrupting yourself over.

It's not just civil liability either. Remember the woman who made a fake
MySpace profile to harass a girl who later committed suicide? A jury convicted
her under the federal Computer Fraud and Abuse Act (the judge later set the
conviction aside) on the theory that once you violate a site's terms, your
use becomes unauthorized access to a computer system.

