Hacker News
[flagged]
jakequist on Aug 6, 2014



This post was killed by user flags.


I've just flagged this due to the mechanical '10/10 great job, would crawl again' comments popping up, though I'm not sure this is best practice as the comments are suspect, not necessarily the linked story.

Is there a better approach to use?


Actually the comments are a good mix of positive & negative.


Why pay for this when there are OSS/Libre tools that do this for the proper cost of $0?



Web scraping is just one example of what Zillabyte can do. The open source libraries out there can be a good solution, but our cloud platform lets you scrape the web at scale. That's the main value we can provide in terms of web scraping.

We also support analyzing internal data such as logs. We also provide a pre-scraped web corpus for our users to mine.


Here is another effective one-liner: wget --recursive --html-extension --page-requisites --convert-links www.randomwebsite.com


Zillabyte allows even fairly non-technical types like me to extract information from the largest repository in history -- the web. I'm a non-traditional user of the tool, but as a social science researcher I have found that it adds an extra dimension to my information set. Thank you, Zillabyte.


Zillabyte radically simplifies the process of setting up a web crawler. More and more data-minded developers and marketers are looking to create custom data sets from the open web, and these kinds of components, combined with ZB's infrastructure, will make it much faster and easier to roll your own big data. One to watch.



