Hacker News
Crawly – Never write another web scraper (diffbot.com)
9 points by rezist808 on March 23, 2016 | 10 comments



0. Arrived at site

1. Entered HN URL

2. Entered email reluctantly

3. Checked email, nothing received

4. Started writing comment on HN, frustrated

5. Checked email again, still nothing

6. Posted comment, disappointed


I think you just got harvested!


Is there a link to an overview or documentation that doesn't require forking over an email address? Silly.



I got a reply within 3 minutes.

The email said, "We set our crawler loose on <site>, and WOW did we find some interesting results."

The resulting CSV file? 0 bytes. I guess that's "interesting" in its own way.


Sorry about that! If you let me know the result id, I can take a look at what happened.


This won't work when I need to scrape 100,000 pages in an hour.


AWS FTW


CloudScrape works


www.ncbi.nlm.nih.gov };)



