Hey Hacker News,
We're working on an investment aggregator that tracks the value of our customers' portfolios over time. We've faced a bunch of challenges in getting public stock market data (live/historical prices, splits, etc.). We're currently cobbling together a dataset from a few different sources (Polygon, IEX, etc.), but it's been a massive pain.*
I'm wondering if this is the case for other fintech devs. Does everyone face the pain of assembling their own financial datasets? Or do we just have unusual needs or a bad solution?
So HN, where do you all source your financial data?
* Our main challenges:
Data quality:
- Ticker symbol changes (and CUSIP/ISIN changes/challenges)
- Missing or wrong values for some days
- Missing or incorrect splits
Speed:
- Rendering a single screen takes tens of API calls
- Syncing historical data would take days of API calls
Burden:
- Enterprise sales contracts instead of self-serve
- Building and maintaining your own ETL pipeline to ingest data
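To make the "missing or incorrect splits" problem concrete, here is a rough Python sketch of the kind of check we mean. It is only an illustration under assumptions: the function names, the (date, close) tuple layout, and the 5% tolerance are made up for this example and are not any vendor's API. It back-adjusts closes using the splits a vendor reports, and flags days where the price drop looks like a forward split that the vendor never reported.

```python
from datetime import date


def back_adjust(prices, splits):
    """Return split-adjusted closes.

    prices: list of (date, close) sorted by date ascending (assumed layout).
    splits: {effective_date: ratio}, e.g. 2.0 for a 2-for-1 split.
    Closes before a split's effective date are divided by its ratio.
    """
    pending = sorted(splits.items(), reverse=True)  # newest split first
    adjusted = []
    factor = 1.0
    idx = 0
    for day, close in reversed(prices):
        # Fold in every split that took effect after this day.
        while idx < len(pending) and pending[idx][0] > day:
            factor *= pending[idx][1]
            idx += 1
        adjusted.append((day, close / factor))
    adjusted.reverse()
    return adjusted


def suspect_split_days(prices, splits, tolerance=0.05, ratios=(2, 3, 4, 5, 10)):
    """Flag days whose overnight drop is close to a common forward-split
    ratio but that have no split on record. Reverse splits are ignored
    to keep the sketch short; the ratios and tolerance are arbitrary."""
    suspects = []
    for (_, prev_close), (day, close) in zip(prices, prices[1:]):
        if day in splits or close <= 0:
            continue
        jump = prev_close / close
        for ratio in ratios:
            if abs(jump - ratio) / ratio <= tolerance:
                suspects.append((day, ratio))
                break
    return suspects


prices = [
    (date(2024, 1, 2), 100.0),
    (date(2024, 1, 3), 102.0),
    (date(2024, 1, 4), 51.0),   # 2-for-1 split takes effect here
    (date(2024, 1, 5), 52.0),
]
reported = {date(2024, 1, 4): 2.0}

adjusted = back_adjust(prices, reported)
missing = suspect_split_days(prices, {})       # vendor reported no split
clean = suspect_split_days(prices, reported)   # vendor reported the split
```

A check like this only produces candidates for manual review, since a real crash can also halve a price overnight.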
Here's a good starting list of vendors (HINT: click on "Data Dictionary" for anything that looks interesting):
https://wrds-www.wharton.upenn.edu/pages/about/data-vendors/
For linking across different data sets and tracking companies over their entire history (check out the video starting around 6 minutes in):
https://wrds-www.wharton.upenn.edu/pages/about/data-vendors/...
EDIT - FYI, WRDS is the financial data platform that almost all large universities worldwide use. They handle all the management of it, and your school just has to write a few checks. Many of these data sets are licensed to all students, not just business school students. So if you are a CS student and want this kind of data for building an ML model or something like that, you should be able to get it by requesting an account on the WRDS page linked above. They might push back on you a little, in which case you'll have to go over to the business school at your university to get things ironed out. They have a web interface aimed at non-technical users, but they also offer a Postgres interface so you can connect directly from Python or R or whatever with your account credentials.
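For the Postgres route, a minimal Python sketch of building the connection string might look like the following. The host, port, and database name are what I recall from the WRDS documentation, so treat them as assumptions and confirm them against your own account instructions. The helper name `wrds_dsn` is made up for this example.

```python
from urllib.parse import quote


def wrds_dsn(user, password,
             host="wrds-pgdata.wharton.upenn.edu",  # assumed, verify with WRDS docs
             port=9737,                             # assumed, verify with WRDS docs
             dbname="wrds"):                        # assumed, verify with WRDS docs
    """Build a libpq-style connection URI from WRDS account credentials.

    The user name and password are percent-encoded so characters such
    as '@' or '/' do not break the URI.
    """
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{dbname}?sslmode=require"
    )


# Hypothetical credentials, for illustration only.
dsn = wrds_dsn("alice", "p@ss/word")
```

A URI like this can be handed to any Postgres client that accepts libpq connection URIs, for example `psycopg2.connect(dsn)`, assuming you have that driver installed.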