

Wiki for tools to scrape website data - araneae
http://scraperwiki.com/get_involved/

======
tzury
<http://theinfo.org/> is a good one as well (by Aaron Swartz)

get | process | view

get (<http://theinfo.org/get/tools>)

process (<http://theinfo.org/process/tools>)

view (<http://theinfo.org/view/tools>)

------
WalterGR
Previous discussion, with 39 comments:

<http://news.ycombinator.com/item?id=1584597>

------
mdaniel
If you have not yet started the tutorial, click on at least the first one.
The Firebug-ish console they have going on there is snazzy, and so is the
Bespin/Skywriter editor.

If this doesn't show what the next generation of web apps looks like, I don't
know what would. It remains to be seen, however, how that model holds up to
"real" work - which is the same concern I had about Bespin/Skywriter.

~~~
jerhinesmith
The tutorial is certainly nice; however, I'm guessing that running the
following snippet in the Ruby tutorial:

    output = `cd \\etc && cat passwd`
    puts output

shouldn't actually be returning the contents of the passwd file.

~~~
alvinliang
I reported this to their staff; hopefully they will fix it soon.

------
mike4u2
Interesting. Can it be used to post and request iMacros macros too?

If so, I can donate at least 10 web scraping macros right away.

PS: The software I refer to is <http://wiki.imacros.net/Data_Extraction> (open
source and commercial web scraping browser addon)

------
jdee
Had a play with it and it's lots of fun. I scraped the Hacker News headlines.
Is mucking about with HN the new 'Hello World'?

Saw the chaps behind it pitch at Software City in Liverpool this week. Good
guys and lots of potential for use in journalism, government and beyond. I
think it's partly open source too, which is always a bonus.

------
tcarnell
Nice project! You might find cQuery useful for more complex content extraction:

<http://cquery.com/>

