For anything released at a set time on a fixed calendar, you can bet that multiple parties are trying to scrape it as fast as possible.
If you can scrape this data (the easy part), put it in a structured format (somewhat hard), and deliver it in under a few seconds (this is where you get paid), then you can almost name your price.
It's an interesting niche that hasn't been computerized yet.
If you can't get the speed, the first two steps can still be useful to the large number of funds springing up that use "deep learning" techniques to build portfolios over timelines of weeks to months.
To answer the question:
> Wouldn't this require a huge network of various proxy IPs to constantly fetch new data from the site without being flagged and blacklisted?
This is why I gave the caveat of only looking at data that comes out at certain times. That way you only have to hit the server once, when the data comes out, or at least only a few hundred times in the seconds leading up to the data's release :)
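A rough sketch of that "burst around the release time" idea, in Python. Everything here is an assumption for illustration: the names `burst_schedule` and `poll_until_changed`, the 3-second lead, and the 10 ms interval are all made up, and `fetch` stands in for whatever HTTP call you would actually make.

```python
import time
from datetime import datetime, timedelta
from typing import Callable, List, Optional


def burst_schedule(release_at: datetime, lead_seconds: float = 3.0,
                   interval_seconds: float = 0.01) -> List[datetime]:
    """Poll times from `lead_seconds` before the release up to the release itself."""
    steps = int(round(lead_seconds / interval_seconds))
    start = release_at - timedelta(seconds=lead_seconds)
    return [start + timedelta(seconds=i * interval_seconds) for i in range(steps + 1)]


def poll_until_changed(fetch: Callable[[], str], baseline: str,
                       max_attempts: int, interval_seconds: float,
                       sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Call `fetch` until its result differs from `baseline` or attempts run out.

    `fetch` is a hypothetical callable returning the page body as a string.
    Returns the first changed body, or None if nothing changed in time.
    """
    for _ in range(max_attempts):
        body = fetch()
        if body != baseline:
            return body
        sleep(interval_seconds)
    return None
```

With the defaults, `burst_schedule` gives 301 poll times in the 3 seconds before the release, which is the "few hundred hits" case; outside that window you never touch the server at all.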
What types of data formatting are you talking about here? Would it require a unique template for each individual site?
Wouldn't this require a huge network of various proxy IPs to constantly fetch new data from the site without being flagged and blacklisted?
Or are you referring to the time from scraping the data to delivering it, in under 3 seconds?
"Speed Traders Get an Edge" - Feb 6, 2014 - http://online.wsj.com/news/articles/SB1000142405270230445090...
"Firm Stops Giving High-Speed Traders Direct Access to Releases" - Feb 20, 2014 - http://online.wsj.com/news/articles/SB1000142405270230377550...
edit: replaced mysql with myself
That's quite an assertion. I'm certain it has been.