

My project: CSVGet -- Get structured data from sites as CSV - fizx
http://github.com/fizx/csvget/tree/master

======
mc
The real magic behind this gem is another one called Parsley, which is an
astounding library for screen scraping.

<http://github.com/fizx/parsley/tree/master>

They've even built an entire app on top of the library:
<http://parselets.com/>

------
tectonic
Use <http://selectorgadget.com> to help make the selectors, too.

~~~
vbar
But why mess with the selectors by hand?
<http://search.cpan.org/~vbar/HTML-ListScraper-0.05/> can discover the
structure automatically (well, sometimes :-) - some pages just aren't regular
enough, but it does work on HN, for example)...
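
To make "discover the structure automatically" concrete, here is a rough Python sketch of one possible heuristic: count how often each tag path occurs and treat the frequently repeated ones as candidate list items. This is only an illustration of the general idea, not a description of how HTML::ListScraper works internally; `PathCounter`, `repeated_paths` and the `min_count` threshold are names made up for this example.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PathCounter(HTMLParser):
    """Counts how many times each tag path (e.g. 'ul/li/a') is opened."""

    def __init__(self):
        super().__init__()
        self.stack = []          # currently open tags, outermost first
        self.counts = Counter()  # tag path -> number of occurrences

    def handle_starttag(self, tag, attrs):
        self.counts["/".join(self.stack + [tag])] += 1
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray closers.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def repeated_paths(html, min_count=3):
    """Return (path, count) pairs for paths seen at least min_count times."""
    parser = PathCounter()
    parser.feed(html)
    parser.close()
    return [(p, n) for p, n in parser.counts.most_common() if n >= min_count]


page = "<ul><li><a>1</a></li><li><a>2</a></li><li><a>3</a></li></ul>"
for path, count in repeated_paths(page):
    print(count, path)
```

On irregular pages a frequency count like this will pick up layout noise as well, which fits the "well, sometimes" caveat above.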

------
sgrove
Wow, incredibly cool. I did the same thing with a collection of curl / grep /
sed / awk, and it was awful. I later redid it with some Python library, and
then with hpricot, and most recently with scrubyt. Each step was a little bit
better, but I really should have been looking towards making a more
generalized solution like this.

Well done!

------
skorgu
A flag to output JSON would be nifty, then you could pass it through a filter
[1] and get interweb-grep on steroids.

[1] <http://goessner.net/articles/JsonPath/>
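
As a rough illustration of the "filter the JSON" idea, the sketch below walks a dotted path through parsed JSON, with `*` fanning out over list items. It is a deliberately simplified stand-in, not real JSONPath syntax, and the sample document and its `stories` / `title` / `points` keys are hypothetical; the actual output shape would depend on the parselet used.

```python
import json


def select(data, path):
    """Follow a dotted path; '*' fans out over list items or dict values."""
    current = [data]
    for key in path.split("."):
        matched = []
        for node in current:
            if key == "*":
                if isinstance(node, list):
                    matched.extend(node)
                elif isinstance(node, dict):
                    matched.extend(node.values())
            elif isinstance(node, dict) and key in node:
                matched.append(node[key])
        current = matched
    return current


# Hypothetical scraper output, made up for this example.
sample = json.loads(
    '{"stories": [{"title": "CSVGet", "points": 42},'
    ' {"title": "Parsley", "points": 17}]}'
)
print(select(sample, "stories.*.title"))
```

Paths that match nothing simply yield an empty list, which keeps the filter safe to chain in a pipeline.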

~~~
fizx
Um, you got it. :) I just added a "jsonget" binary to the package.

<http://gist.github.com/177304>

~~~
skorgu
Now that's service!

------
Aschwin
Oh my, this rocks. What would be even cooler is a library that exposes these
functions. That would beat me to it, since I was thinking about a general data
interface for websites too. If I could integrate your work into my code
(PHP), that would be so awesome. Anyhow, nice idea.

~~~
fizx
<http://github.com/fizx/parsley/tree/master> is a C library which represents
the core of the parsing functionality. A PHP binding is quite possible, and
indeed, there already are bindings for Python and Ruby. In fact, I might just
go look at how PHP bindings are done in general...

------
hadley
This is fantastic. I do a load of data scraping of the web and this (plus
selector gadget) is going to make my life sooooo much easier.

