

News.YC in Polar Coordinates - nostrademons
http://labs.diffle.com/ycpolar/

======
nostrademons
This was something that John Yu (yubrew) and I hacked up during DevHouseBoston
this weekend. It scrapes the front page, figures out which stories have
commenters in common, and plots it in polar coordinates. The radial coordinate
is the total number of comments, on a logarithmic scale (with the inner 1/3 or
so reserved to reduce clutter). The angular coordinate attempts to cluster
together stories with similar commenters. It updates every 6 hours, and
there's a delay while scraping to keep from overloading PG's server.
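
For anyone curious how the radial mapping could work, here is a minimal sketch, not the actual code behind the plot. The function names, the exact 1/3 inner fraction, and the use of `log1p` are assumptions for illustration; the description above only says the scale is logarithmic with roughly the inner third reserved.

```python
import math

# Assumed share of the radius left empty at the center ("inner 1/3 or so").
INNER_FRACTION = 1 / 3


def radial_position(comments, max_comments, inner=INNER_FRACTION):
    """Map a comment count to a radius in [inner, 1] on a logarithmic scale."""
    if max_comments <= 0:
        # No comments anywhere: park everything on the inner ring.
        return inner
    # log1p keeps stories with zero comments well defined (log1p(0) == 0).
    scaled = math.log1p(comments) / math.log1p(max_comments)
    return inner + (1 - inner) * scaled


def shared_commenters(story_a, story_b):
    """Count the commenters two stories have in common (hypothetical helper)."""
    return len(set(story_a) & set(story_b))
```

A story with no comments lands on the inner ring and the most-commented story lands on the outer edge. The overlap count is only the raw input an angular clustering step might use; how the angles are actually assigned is not described here.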

A lot of the features (animation, selection, title/domain swapping) are
basically gratuitous - I wanted to get more practice with jQuery since I need
that functionality in my startup. Hell, the whole thing is basically
gratuitous. But it was pretty fun to write.

John was working on a scraper for Reddit...that may be a bit more interesting,
since it updates more frequently and the larger audience means more posts in
common.

~~~
bootload
Nice hack nostro. What process did you use to scrape? Been working on a newsyc
scraper

- extract & parse with python + beautifulSoup, display with YUI
- parsing the RSS feed for user, story points, story title and date posted
- then grabbing user details via homepage for karma, inception date
- exporting to xml
- skinning with YUI

Will display when ready, here's a screenshot:
<http://flickr.com/photos/bootload/1400863977/>
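
A rough sketch of the RSS-to-XML steps listed above, using only the standard library. The sample feed, the field names, and the `<stories>` output layout are made up for illustration; the real feed's fields (points, user) are not reproduced here.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 sample; it stands in for the real feed.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>Story one</title><link>http://example.com/1</link></item>
<item><title>Story two</title><link>http://example.com/2</link></item>
</channel></rss>"""


def parse_items(rss_text):
    """Return one dict per <item> with its title and link."""
    root = ET.fromstring(rss_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in root.iter("item")
    ]


def to_xml(stories):
    """Export the parsed stories to a simple, hypothetical XML layout."""
    root = ET.Element("stories")
    for story in stories:
        element = ET.SubElement(root, "story", link=story["link"])
        element.text = story["title"]
    return ET.tostring(root, encoding="unicode")
```

Calling `to_xml(parse_items(SAMPLE_RSS))` round-trips the sample into the export format; user details such as karma would need a separate fetch per user.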

~~~
nostrademons
Regexps. John tried a beautifulSoup parser, but it turned out to be more
trouble than it was worth. Insert joke about "now you have two problems", but
it works in a pinch, and it really only had to last till we showed off our
projects at 8:00 PM ;-).

I don't mind sharing the code if PG doesn't mind the potential flood of screen
scrapers. It just handles the front page and comment page for individual
articles, and it's pretty compact - only 60 lines including doc comments.
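
To give a flavor of the regexp approach without posting the real thing, here is a small sketch. The HTML snippet and the pattern are invented for illustration and are not the site's actual markup; `fetch_all` and its `fetch` argument are likewise hypothetical and only show the kind of polite delay mentioned earlier.

```python
import re
import time

# Hypothetical markup; the real front-page HTML is not shown in this thread.
SAMPLE_HTML = """
<td class="title"><a href="http://example.com/a">First story</a></td>
<td class="title"><a href="http://example.com/b">Second story</a></td>
"""

# Capture the link target and the link text of each title cell.
LINK_RE = re.compile(r'<td class="title"><a href="([^"]+)">([^<]+)</a>')


def extract_stories(html):
    """Return a list of (url, title) tuples found in the page."""
    return LINK_RE.findall(html)


def fetch_all(urls, fetch, delay_seconds=1.0):
    """Call fetch(url) for each URL, sleeping between requests."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request after the first to spare the server.
            time.sleep(delay_seconds)
        pages.append(fetch(url))
    return pages
```

A pattern like this breaks as soon as the markup changes, which is the usual trade-off against a real parser.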

~~~
bootload
_"... Regexps. John tried a beautifulSoup parser, but it turned out to be more
trouble than it was worth ..."_

Took me a bit of mucking around to get it working. My trick was to use the
FOX HTML Validator to find the page structure, then CUT+PASTE the text into
IDLE, call up BS there, and write the BS expression to parse the string till
I had the right data.

 _"... I don't mind sharing the code if PG doesn't mind the potential flood of
screen scrapers. ..."_

A better suggestion might be to supply the raw data you collected as a service
and let others do what they want with it. Most just want the data and are not
particularly interested in the HOW. If they are interested in the HOW then
it's better to point them to some scrapers to play with.

------
Kaizyn
This is awesome! You may be able to improve it further by considering other
things besides common comment posters to determine the angle measure for a
story. I haven't seen a web information graphic this good since the treemap
used at "newsmap", found here:
<http://www.marumushi.com/apps/newsmap/newsmap.cfm>

------
khoerling
This is a hot new view.

