

Ask YC: How to web scrape 100s of sites/forms? - bdouglas

Hi...

A potential research app requires scraping a few hundred websites/forms and following the child links to obtain the parent/child structure, i.e. company->dept->title->name.

In this case, that would involve going 4 levels deep and getting the required information at each level.

So, does anyone know of a method/app/company that can be used to accomplish this? Or am I going to have to figure out how to get a number of cheap guys to write a bunch of Python scripts!!

Thanks
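A minimal sketch of the 4-level crawl described above, using only Python's standard library. It assumes the hierarchy is reachable by following ordinary `<a href>` links on the same host and that pages decode as UTF-8; `crawl`, `levels`, `fetch`, `delay` and `path_to_root` are hypothetical names chosen for illustration. Sites that hide the hierarchy behind form posts or JavaScript would need more than this.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch_html(url):
    """Default fetcher; assumes pages are UTF-8."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, levels=4, fetch=fetch_html, delay=1.0):
    """Breadth-first crawl of at most `levels` levels (start page = level 1).

    Returns (parent, pages): parent maps each URL to the URL it was found on
    (None for the start page); pages maps each URL to its HTML.
    Only links on the same host as start_url are followed.
    """
    host = urlparse(start_url).netloc
    parent = {start_url: None}
    pages = {}
    queue = deque([(start_url, 1)])
    while queue:
        url, level = queue.popleft()
        time.sleep(delay)  # be polite to the target server
        pages[url] = fetch(url)
        if level >= levels:
            continue  # deepest level: keep the page, don't follow its links
        collector = LinkCollector()
        collector.feed(pages[url])
        for href in collector.links:
            child = urljoin(url, href).split("#")[0]  # absolute URL, no fragment
            if urlparse(child).netloc != host or child in parent:
                continue  # off-site or already queued
            parent[child] = url
            queue.append((child, level + 1))
    return parent, pages


def path_to_root(parent, url):
    """Rebuild the chain (e.g. company -> dept -> title -> name) ending at url."""
    chain = []
    while url is not None:
        chain.append(url)
        url = parent[url]
    return chain[::-1]
```

Injecting `fetch` makes it easy to try the crawler against canned pages before pointing it at real sites; for those you would also want to honour robots.txt (the standard library's `urllib.robotparser` can read it) and keep a non-zero delay.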
======
nreece
_...cheap guys to write a bunch of python scripts_

You know what will be cheap? Writing it yourself.

------
qhoxie
Libraries like mechanize (for navigating pages and submitting forms) and
hpricot (for parsing HTML) are shortening the learning curve for scraping
tasks. That's not to say it is easy, but it should not take a bunch of people
working on it. One good developer with the right experience would be ample, in
my opinion.
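Neither library is shown here, but as a dependency-free sketch of the kind of page inspection they make easy, Python's standard `html.parser` can list a page's links and form fields, roughly the information mechanize's `Browser` exposes through `links()` and `forms()`. `PageScanner` is a hypothetical name, and messy real-world HTML (unclosed tags, fields outside any form) will need more care than this.

```python
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Collects (href, text) for links and the name attributes of form fields."""

    def __init__(self):
        super().__init__()
        self.links = []       # list of (href, link text)
        self.forms = []       # list of {"action", "method", "fields"}
        self._href = None     # href of the <a> currently open, if any
        self._text = []
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a" and a.get("href"):
            self._href, self._text = a["href"], []
        elif tag == "form":
            self._in_form = True
            self.forms.append({
                "action": a.get("action") or "",
                "method": (a.get("method") or "get").lower(),
                "fields": [],
            })
        elif tag in ("input", "select", "textarea") and self._in_form and a.get("name"):
            # Only named fields are submitted with a form.
            self.forms[-1]["fields"].append(a["name"])

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None
        elif tag == "form":
            self._in_form = False

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)  # accumulate the visible link text
```

Feeding it a page and reading `links` gives the child pages to visit next (e.g. company -> dept), while `forms` shows which named fields a search form expects.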

------
olefoo
Or get one expensive guy to write you a script that writes the scripts to
scrape the sites by scraping the sites to read the structure to write the
scripts to scrape the sites.

------
Anon84
Check out the search.wikia.org project. They make their crawler (and crawl
data) available. Maybe you can get away with using theirs. That would _really_
be cheap!

------
gaius
I am unable to think of an application for this technology other than
spamming. Care to provide more details before we shoot ourselves in the face
by helping you?

