
Scrappy - Simple Perl Scraping Framework - alnewkirk
http://ana.im/press/2010/09/scrappy/
======
mwexler
Most scrapers seem to break down on heavily dynamic/AJAX pages. For example,
anything made with GWT appears to provide little for the average scraper to
grab (say, for automated daily tracking of Android app downloads). Short of
reverse-engineering the foreign page's API calls, has anyone encountered a
solution that does more processing and then scrapes the rendered page?

(well, and short of using Selenium to script a login, then scrape the rendered
page via a controlled Firefox... which works, but is clunky)

~~~
arkitaip
I was looking at several scraping solutions (e.g. iMacros, Selenium) that can
handle DHTML for a project, and they all have significant performance issues
since they need to render the actual pages before processing them. A couple of
thousand rows isn't a problem, but try anything more and you've got a real
performance bottleneck.

~~~
odonnell
DHTML is server-side. You mean AJAX. Also, think of the page as an interface
to a more lightweight web service. You should probably be parsing that
directly.
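
A minimal sketch of that idea in Python, assuming the page loads its data from
a JSON endpoint. The payload shape and field names below are hypothetical, not
taken from any real site; in practice you would find the real request in the
browser's network tab and fetch its body with an HTTP client.

```python
import json

# Hypothetical payload: the kind of JSON an AJAX-heavy page might fetch
# from its backend. The field names are made up for illustration.
SAMPLE_RESPONSE = (
    '{"apps": [{"name": "example-app", "downloads": 1200},'
    ' {"name": "other-app", "downloads": 87}]}'
)


def extract_downloads(raw_json):
    """Return a {name: downloads} dict from the assumed payload shape."""
    data = json.loads(raw_json)
    # A payload without an "apps" key yields an empty dict.
    return {app["name"]: app["downloads"] for app in data.get("apps", [])}


if __name__ == "__main__":
    print(extract_downloads(SAMPLE_RESPONSE))
```

Nothing gets rendered here, so the per-page cost is one HTTP request plus a
JSON parse instead of a full browser session.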

~~~
thirdusername
He's referring to this: <http://en.wikipedia.org/wiki/Dhtml> I'm not sure what
DHTML you are thinking of that would be server-side.

~~~
odonnell
Fuck, thinking of SHTML for some reason.

------
bravura
At the very least, Scrappy (Perl) should link to Scrapy (Python).

Otherwise, it seems remiss of this developer to pick a name that is easily
confused with that of an existing open-source project with a similar purpose.

~~~
jonathansizz
Oh, at the very _least_!

Then the module author should prostrate himself at the feet of the almighty
Python community, trembling in unreserved awe whilst acknowledging that the
world quite deservedly revolves around _them_!

Module authors of other (obviously inferior) languages, take note: always
check if there's any Python code with a similar name _before_ you choose a
title for your project, to avoid embarrassment!

~~~
devinj
I don't think that Python had anything to do with it. It's sort of like me
calling my new programming language made for sysadmin work and
bioinformatics "Pearl". It's just a confusing name considering that there's
another programming language that's used for the exact same things, spelled
almost identically, and pronounced the same.

Now imagine if I said on my website, "By the way, feel free to call it Perl,
just be aware that there's also another project that calls itself that". Now
I'm giving permission for people to be even more confusing!

~~~
benatkin
This isn't a great comparison. Perl is huge compared to these scraping tools.
Given how the name "Perl" came to be, it's not likely that two people
developing a similar kind of thing would independently come up with a similar
name. It's quite common to add a _y_ to a concept to come up with a name (or
to double the last consonant and add _y_, like scrappy, or to add _ly_).

