Because scraping data sucks, occasionally has compliance concerns, and is a different core competency from trading. They would rather offload all of the bullshit involved in maintaining a robust scraping operation than pay their research team to do it.
Time spent on maintaining a scraping operation is time taken away from optimizing your ETL process and producing actionable research for your trading team. You know how people pay to have their pipes unclogged even when they know how it's done? Same idea.
If all it is is some data scraped off a few websites that they could get an intern to collect in a week or three, then it's unlikely to be valuable enough for them to pay you a substantial sum of money.
The most valuable data is data that is difficult to gather. Think of things like proprietary (i.e., unpublished) industry data. The canonical "sexy" alternative data set sold to hedge funds is counts of cars in retail parking lots from satellite photos.
> If all it is is some data scraped off a few websites that they could get an intern to collect in a week or three, then it's unlikely to be valuable enough for them to pay you a substantial sum of money.
If the data is compelling and clearly correlates to earnings KPIs, I can tell you from experience that "some data scraped off a few websites" can be salable to the tune of $50,000/quarter. Hedge funds will frequently choose to pay that instead of setting up their own scraping operation, because scraping sucks.
Not all hedge funds, of course. Some do actively try to reverse engineer your dataset. But you probably don't want to work with those anyway.
> The canonical "sexy" alternative data set sold to hedge funds is counts of cars in retail parking lots from satellite photos.
I've personally developed forecasts from (ostensibly) public, scraped sources which beat drone and satellite footage of manufacturing facilities. That one Bloomberg article is not representative; satellite footage sounds sexy but it's not what most alternative data looks like.