Crawly – Never write another web scraper

chejazi · on March 23, 2016

0. Arrived at site

1. Entered HN URL

2. Entered email reluctantly

3. Checked email, nothing received

4. Started writing comment on HN, frustrated

5. Checked email again, still nothing

6. Posted comment, disappointed

brianwawok · on March 23, 2016

I think you just got harvested!

ki85squared · on March 23, 2016

Is there a link to an overview or documentation that doesn't require forking over an email address? Silly.

dwynings · on March 23, 2016

DrScump · on March 23, 2016

I got a reply within 3 minutes.

The email said, "We set our crawler loose on <site>, and WOW did we find some interesting results."

The resulting CSV file? 0 bytes. I guess that's "interesting" in its own way.

dwynings · on March 23, 2016

Sorry about that! If you let me know the result id, I can take a look at what happened.

pink_dinner · on March 23, 2016

This won't work when I need to scrape 100,000 pages in an hour.

cat-dev-null · on March 23, 2016

AWS FTW

edoceo · on March 23, 2016

CloudScrape works

cat-dev-null · on March 23, 2016

www.ncbi.nlm.nih.gov };)