Hacker News

dang · on Aug 6, 2014

This post was killed by user flags.

tombrossman · on Aug 6, 2014

I've just flagged this due to the mechanical '10/10 great job, would crawl again' comments popping up, though I'm not sure this is best practice, since it's the comments that are suspect, not necessarily the linked story.

Is there a better approach to use?

jakequist · on Aug 6, 2014

Actually the comments are a good mix of positive & negative.

forca · on Aug 6, 2014

Why pay for this when there are OSS/Libre tools that do this for the proper cost of $0?

rb2k_ · on Aug 6, 2014

The example, for instance, uses Anemone (https://github.com/chriskite/anemone):

https://github.com/zillabyte/domain_crawl/blob/a22e1fe338e28...

jakequist · on Aug 6, 2014

Web scraping is just one example of what Zillabyte can do. The open source libraries out there can be a good solution, but our cloud platform lets you scrape the web at scale. That's the main value we can provide in terms of web scraping.

We also support analyzing internal data such as logs, and we provide a pre-scraped web corpus for our users to mine.

alexthedesigner · on Aug 6, 2014

Here is another effective one-liner: wget --recursive --html-extension --page-requisites --convert-links www.randomwebsite.com

ianhathaway · on Aug 6, 2014

Zillabyte allows even fairly non-technical types like me to extract information from the largest repository in history: the web. I'm a non-traditional user of the tool, but I have found that, for social science research, it adds an extra dimension to my information set. Thank you, Zillabyte.

akosner · on Aug 6, 2014

Zillabyte radically simplifies the process of setting up a web crawler. More and more data-minded developers and marketers are looking to create custom data sets from the open web, and these kinds of components, combined with ZB's infrastructure, will make it much faster and easier to roll your own big data. One to watch.