
My Hacker News firehose is flowing again - davewiner
http://scripting.com/stories/2010/11/09/myHackerNewsFirehoseIsFlow.html
======
michaelhart
Developers themselves could also simply scrape this data as needed...

    
    
      preg_match_all('%<td>.*?id=up_([0-9]{5,25}).*?<td class="title"><a href="(.*?)".*?>(.*?)</a>.*?</td>.*?score_[0-9]{5,25}>(.*?) poin.*?user\?id=(.*?)">.*?</a> (.*?) ago%s', $html, $result, PREG_PATTERN_ORDER);
    

I'm working on an interesting app now that I plan to release later today using
that same regex. (PLEASE NOTE: I am not a regex God, so I KNOW that it can
probably be optimized/simplified significantly... But it works great :D)

Simply curl the ./newest page and apply that. The output is a beautiful array
that can be manipulated at your pleasure.

This method is also nearly impossible to "kill", since the requests look like
normal traffic. As long as they don't do a major redesign or block your user
agent (which you can easily change) or your IP (unlikely), this method works
great.
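For anyone not on PHP, the same extraction translates almost mechanically to Python's `re` module (`re.S` plays the role of PHP's `/s` modifier). A minimal sketch; the HTML snippet below is a hypothetical stand-in for a fetched ./newest page, not real HN output:

```python
import re

# Hypothetical snippet mimicking the HN markup the regex above targets;
# in practice $html / html would come from curling the ./newest page.
html = """
<td></td><td><a id=up_1234567 href="vote"></a></td>
<td class="title"><a href="http://example.com/post">Example story</a></td>
...
<td class="subtext"><span id=score_1234567>42 points</span> by
<a href="user?id=alice">alice</a> 2 hours ago</td>
"""

# Same pattern as the preg_match_all() call, split for readability.
pattern = re.compile(
    r'<td>.*?id=up_([0-9]{5,25}).*?<td class="title">'
    r'<a href="(.*?)".*?>(.*?)</a>.*?</td>.*?score_[0-9]{5,25}>'
    r'(.*?) poin.*?user\?id=(.*?)">.*?</a> (.*?) ago',
    re.S,  # let . match newlines, like PHP's /s
)

# findall returns one (id, url, title, points, user, age) tuple per story.
for story_id, url, title, points, user, age in pattern.findall(html):
    print(story_id, url, title, points, user, age)
```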

~~~
mmastrac
I just use BeautifulSoup for my own personal HN/Proggit aggregator. It's not
that hard to do and I don't have to rely on someone else to keep it up.
Relying on someone else scraping for you means that you're dealing with your
own whimsical interests as well as someone else's.

FWIW, my personal aggregator is here: <http://progscrape.com>. It's not really
intended for public use, though a handful of people use it. I mainly use it to
output Atom feeds that I can use for sharing via Google Reader.
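The BeautifulSoup version of the same scrape is a few lines once the page is fetched. A minimal sketch, assuming `bs4` is installed; the markup and selector below are illustrative, not progscrape's actual code:

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Illustrative HN-style markup; in a real aggregator this string
# would come from an HTTP fetch of the front page or ./newest.
html = """
<table>
  <tr><td class="title"><a href="http://example.com/a">Story A</a></td></tr>
  <tr><td class="title"><a href="http://example.com/b">Story B</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS selector instead of a hand-rolled regex: survives minor markup
# changes far better, since it keys on structure rather than byte offsets.
stories = [(a.get_text(), a["href"]) for a in soup.select("td.title a")]
```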

