Common Crawl has a really neat mission: there isn't a whole lot of free and open data out in the world right now, and they're trying to change that. With this donation, their commons will be augmented with some great stuff, and that can only mean awesome things.

-----


That's an interesting idea, and I dig using a fully open stack; we'll consider adding it to our next howto!

-----


Hey Curt! Most of my own runs, using the default of 2 small VMs, came out to 3 normalized hours of usage, which equated to around 25 cents per run.
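
If you want to sanity-check that number, here's a quick back-of-the-envelope in Python; the hourly rates are my own assumptions based on then-current m1.small EC2 and EMR pricing, not official figures:

    # Rough per-run cost estimate; the rates below are assumptions,
    # not official AWS pricing.
    normalized_hours = 3     # usage reported for a default 2-small-VM run
    ec2_rate = 0.08          # assumed USD/hour for an m1.small instance
    emr_surcharge = 0.015    # assumed EMR fee, USD/hour per instance
    cost = normalized_hours * (ec2_rate + emr_surcharge)
    print("approx cost per run: $%.2f" % cost)  # -> roughly a quarter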

-----


That's for the crawl sample, not the entire 4 TB index, right?

How much data was that?

-----


That was just for the crawl sample, yes, and it was approximately 100 MB of data, though you can specify as much as you'd prefer.
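
If you'd like to poke at the sample outside of a full MapReduce run, something like this boto sketch will pull it down from S3; the bucket name and prefix here are hypothetical placeholders, so grab the real paths from the howto:

    # Hedged sketch: fetch crawl sample files from S3 with boto.
    from boto.s3.connection import S3Connection

    conn = S3Connection()  # picks up AWS credentials from the environment
    bucket = conn.get_bucket('commoncrawl-sample-bucket')  # placeholder name
    for key in bucket.list(prefix='sample/'):              # placeholder prefix
        key.get_contents_to_filename(key.name.split('/')[-1])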

The cool thing about running this job inside Elastic MapReduce right now is the ability to get at the S3 data for free from within EMR, or for standard transfer costs from outside of it; both are pretty reasonable. Right now, you can analyze the entire dataset for around $150, and if you build a good enough algorithm, you'll be able to get a lot of good information back.
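
For reference, kicking off a run from within Elastic MapReduce looks roughly like this with boto's EMR bindings; every s3n:// path below is a hypothetical stand-in for your own scripts and buckets:

    # Hedged sketch: launch a streaming job flow on EMR via boto.
    from boto.emr.connection import EmrConnection
    from boto.emr.step import StreamingStep

    conn = EmrConnection()
    step = StreamingStep(
        name='Crawl analysis step',
        mapper='s3n://yourbucket/scripts/mapper.py',   # placeholder paths
        reducer='s3n://yourbucket/scripts/reducer.py',
        input='s3n://yourbucket/crawl-input/',
        output='s3n://yourbucket/crawl-output/')
    jobflow_id = conn.run_jobflow(
        name='Common Crawl analysis',
        log_uri='s3n://yourbucket/logs/',
        steps=[step],
        num_instances=2,                  # matches the 2-small-VM default
        master_instance_type='m1.small',
        slave_instance_type='m1.small')
    print(jobflow_id)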

We're working to index this information so you can process it even more inexpensively, so stay tuned for more updates!

-----


How is the $150 broken down?

-----

