
Ask HN: Created Scraping SaaS, best way to avoid misuse? - agibsonccc
In light of the last few days, and all the scraping topics popping up, I figured I'd ask how to approach this.

I created a service, in very early alpha, that can use lots of different techniques (pattern matching, NLP, JavaScript) to get data out of webpages.

For data gathering, I planned on having an interface to different kinds of APIs, alongside the scraping, that users might need to do mashups.

It's already paying the bills for me, so I'd like to start scaling this out now. There can be a lot of potential for misuse here.

Besides the obvious things like robots.txt, naming my user agent, throttling, and letting users opt out of things like emails gathered by the system, what other things should I be aware of?
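
To make the "obvious things" concrete, here is a rough Python sketch of a robots.txt check, a named user agent, a per-host throttle, and an opt-out list. It is only an illustration under assumptions: the bot name, the `HostThrottle` class, the `may_fetch` helper, and the idea of keeping opt-outs as a set of hostnames are all hypothetical and not taken from the service described above.

```python
import time
from urllib import robotparser
from urllib.parse import urlparse

# Hypothetical bot name; a real service would publish its own contact URL.
USER_AGENT = "ExampleScraperBot/0.1 (+https://example.com/bot)"


def build_robots(robots_txt: str) -> robotparser.RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


def may_fetch(rp: robotparser.RobotFileParser, url: str, opted_out_hosts: set) -> bool:
    """Allow a fetch only if the host has not opted out and robots.txt permits it."""
    host = urlparse(url).netloc
    if host in opted_out_hosts:  # assumed opt-out model: a set of hostnames
        return False
    return rp.can_fetch(USER_AGENT, url)


class HostThrottle:
    """Tracks the last request time per host and enforces a minimum delay."""

    def __init__(self, min_delay: float = 1.0, clock=time.monotonic):
        self.min_delay = min_delay
        self.clock = clock
        self._last = {}

    def wait_time(self, url: str) -> float:
        """Seconds the caller should still wait before hitting this host again."""
        last = self._last.get(urlparse(url).netloc)
        if last is None:
            return 0.0
        return max(0.0, self.min_delay - (self.clock() - last))

    def mark(self, url: str) -> None:
        """Record that a request to this URL's host was just made."""
        self._last[urlparse(url).netloc] = self.clock()
```

A fetch loop would call `may_fetch` first, sleep for `wait_time(url)`, send the request with `USER_AGENT` in the headers, and then call `mark(url)`.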
======
thecommentator
I'd be interested in using this type of service; I've tried a few but they
were mostly too complicated to use, failed to work as advertised, or cost more
money than hiring someone off vWorker to write a specific one-time solution.

~~~
agibsonccc
That's what the NLP is for. It's a three-step process: specify what you want,
specify where you want it from, then come back later and download your
spreadsheet.

I plan on adding more advanced features later so users can specify CSS
selectors and stuff.

The main feature is a preview-and-adjust step I built that highlights the
data it "thinks" it would grab for each website.

I took a lot of notes as I was creating this.

------
volokoumphetico
What's the URL?

