
This does look really interesting for research and discovering content, but I'm not sure how good a replacement it would be for more general content scraping.

Firstly, if you are scraping, you are generally only targeting a specific list of sites, and you want to make sure you are getting the freshest content, which means going straight to the source.

Secondly, while plenty was shown around metadata, there wasn't much shown about extracting actual content. I had expected it to be some kind of clever, AI-hype product that extracted semantic data, but it appears to be much more rudimentary than that, effectively letting you query the DOM with SQL.
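To illustrate what I mean by "query the DOM with SQL" (and this is only my guess at the idea, not their actual schema or API), you can get surprisingly far by flattening a page into an element table and running plain SQL over it:

    # Minimal sketch, stdlib only: parse HTML into (tag, attr, value) rows,
    # then query them with ordinary SQL. Table layout is an assumption.
    import sqlite3
    from html.parser import HTMLParser

    class ElementCollector(HTMLParser):
        """Flattens start tags into (tag, attr, value) rows."""
        def __init__(self):
            super().__init__()
            self.rows = []

        def handle_starttag(self, tag, attrs):
            for name, value in attrs:
                self.rows.append((tag, name, value or ""))

    html_doc = '<div class="price">$9.99</div><a href="/item/42">item</a>'
    collector = ElementCollector()
    collector.feed(html_doc)

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE dom (tag TEXT, attr TEXT, value TEXT)")
    db.executemany("INSERT INTO dom VALUES (?, ?, ?)", collector.rows)

    # "Query the DOM with SQL": find every outbound link target.
    for (href,) in db.execute("SELECT value FROM dom WHERE tag='a' AND attr='href'"):
        print(href)

Which is useful, but it's still on you to know which tags and attributes on which site carry the data you actually care about.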

I don't mean to hate on it; this really does look interesting. I'm just not convinced there is any real value over existing (or custom) scraping tools.




It would be great if they let people write custom views for a given group of pages and allowed those views to be run and indexed by default. Then you could create, for example, an Amazon item-page view that scrapes price, description, reviews, quantity, seller and all the rest, and it would be scraped and indexed for you (rough sketch below). They could keep it opt-in and only make a view the default once it becomes popular based on their own stats. How awesome and useful would that be?
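A shared view could be as simple as a named bundle of selectors per site. The field names and selectors below are made up purely to illustrate the idea; none of this reflects how the service actually works:

    # Hypothetical "custom view": a mapping of field names to CSS selectors,
    # applied to one page's HTML with BeautifulSoup. Selectors are invented.
    from bs4 import BeautifulSoup

    AMAZON_ITEM_VIEW = {
        "price": "span#priceblock_ourprice",
        "description": "div#productDescription",
        "seller": "a#sellerProfileTriggerId",
    }

    def apply_view(view, html):
        """Run every selector in the view against the page and collect text."""
        soup = BeautifulSoup(html, "html.parser")
        out = {}
        for field, selector in view.items():
            node = soup.select_one(selector)
            out[field] = node.get_text(strip=True) if node else None
        return out

    # apply_view(AMAZON_ITEM_VIEW, page_html) -> {"price": "$19.99", ...}

The service would just run the popular views on its own crawl and hand everyone the indexed results.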


And if this were centralized, everyone would benefit: Amazon would only get indexed by this one service, rather than by thousands of individual companies running their own bots to do similar things.




