Ask HN: Where do you get your financial data?
25 points by sdcoffey on May 5, 2022 | 17 comments
Hey Hacker News,

We're working on an investment aggregator that tracks the value of our customers' portfolios over time. We've faced a bunch of challenges in getting public stock market data (live/historical prices, splits, etc). We're currently cobbling together a dataset from a few different sources (Polygon, IEX, etc), but it's been a massive pain*

I'm wondering if this is the case for other fintech devs. Does everyone face the pain of assembling their own financial datasets? Or do we have unique needs/a bad solution?

So HN, where do you all source your financial data?

* Our main challenges:

Data quality:
- Ticker symbol changes (and CUSIP/ISIN changes/challenges)
- Missing or wrong values for some days
- Missing or incorrect splits

Speed:
- Tens of API calls that would be necessary to render one screen
- Historical data syncing that would take days of API calls

Burden:
- Enterprise sales contracts instead of self-serve
- Building and maintaining your own ETL pipeline to ingest data
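
To make the data quality items concrete, here is a minimal Python sketch of two checks: finding days with no price, and flagging price moves that look like a split the vendor did not report. The function names, the candidate ratios, and the 3% tolerance are all assumptions chosen for illustration, and the list of expected trading days has to come from a calendar you trust. It is a heuristic, so a genuine 50% crash would also be flagged as a possible 2-for-1 split.

```python
from datetime import date


def missing_days(observed, expected):
    """Return the expected trading days that have no observed price, in order."""
    seen = set(observed)
    return [d for d in expected if d not in seen]


def possible_unrecorded_splits(closes, known_split_dates,
                               ratios=(2, 3, 4, 5, 10), tol=0.03):
    """Flag days whose close-to-close move looks like an unreported split.

    closes: list of (date, close) tuples sorted by date.
    known_split_dates: set of dates the vendor already reports as splits.
    ratios and tol are hypothetical defaults, not values from any vendor.
    """
    flagged = []
    for (_, prev), (day, cur) in zip(closes, closes[1:]):
        if day in known_split_dates or prev <= 0 or cur <= 0:
            continue
        move = prev / cur
        for r in ratios:
            forward = abs(move - r) / r <= tol       # price fell by about r
            reverse = abs(move - 1 / r) * r <= tol   # price rose by about r
            if forward or reverse:
                flagged.append((day, r if move > 1 else 1 / r))
                break
    return flagged
```

Running both checks against every new vendor before trusting its history is one cheap way to compare sources.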




You have to agree to serious licenses, and then you pay serious money for it.

Here's a good start to the list of vendors (HINT: click on "Data Dictionary" for things that look interesting):

https://wrds-www.wharton.upenn.edu/pages/about/data-vendors/

For linking across different data sets and tracking companies over their entire history (check out the video starting around 6 minutes in):

https://wrds-www.wharton.upenn.edu/pages/about/data-vendors/...

EDIT - FYI, WRDS is the financial data platform that almost all large universities use (worldwide). They handle all the management of it, and your school just has to write a few checks. Many of these data sets are licensed to all students, not just business school students. So if you are a CS student and want this kind of data for building an ML model or something like that, you should be able to get it by requesting an account on the WRDS page linked above. They might push back on you a little, in which case you'll have to go over to the business school at your Uni to get things ironed out. They have a non-techie-friendly interface, but also offer a Postgres interface so you can connect directly from Python or R or whatever with your account credentials.
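
As a rough sketch of what connecting directly from Python could look like, the helper below only builds a Postgres connection URL from account credentials. The host, port, and database name are assumptions based on my recollection of WRDS's setup and should be confirmed against WRDS's own documentation; the function name is hypothetical.

```python
from urllib.parse import quote

# Assumed connection details; confirm against WRDS's own documentation.
WRDS_HOST = "wrds-pgdata.wharton.upenn.edu"
WRDS_PORT = 9737
WRDS_DB = "wrds"


def wrds_postgres_url(username, password):
    """Build a Postgres URL, percent-encoding the credentials so that
    characters like '@' or ':' in a password do not break the URL."""
    user = quote(username, safe="")
    pw = quote(password, safe="")
    return (
        f"postgresql://{user}:{pw}@{WRDS_HOST}:{WRDS_PORT}/{WRDS_DB}"
        "?sslmode=require"
    )
```

The resulting string can be handed to a Postgres client that accepts connection URIs, for example `psycopg2.connect(url)` or `sqlalchemy.create_engine(url)`.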


Interesting! I'm no longer a student, so I couldn't get access, but it's cool to know that this exists.

Curious if you've used any of these WRDS data sets? Also really interested to know if you've used the postgres interface to this data, and how you liked working with it as opposed to a regular API-like interface. Thanks!


I have used some of those data sets.

This book demonstrates some usage with R and the WRDS Postgres connector (but obviously you can use anything):

https://tidy-finance.org/accessing-managing-financial-data.h...

PS - some of the data sets referenced in that book are freely available.


Everyone faces this pain. There is no single good source for all the data any nontrivial app needs. The main vendors are predatory and will move the goalposts to charge you as much as you can bear, and more as you grow. In my experience this is a huge obstacle and probably a major reason there is limited innovation in anything like derived market data analytics. It’s not just you, it’s everyone.


FWIW, I went deep with Interactive Brokers for my last Fintech. It was the only place we could source real-time currency options data and we managed to reduce our total latency from market to database to about 10ms by using a NYC datacenter, which was enough for us. They had historic ticks too, but an inbuilt ratelimiter made it a multi-month project to pull serious volumes of data.

ib_insync was the Python client library, and I open-sourced the market gateway I built: https://github.com/dvasdekis/ib-gateway-docker-gcp
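
To see how a rate limiter turns a backfill into a multi-month project, here is a back-of-the-envelope Python sketch. The limit used in the example (60 requests per 10 minutes) and the workload (2,000 symbols, two years of daily tick pulls) are hypothetical numbers for illustration, not Interactive Brokers' actual pacing rules.

```python
import math

SECONDS_PER_DAY = 86_400


def requests_needed(symbols, trading_days, days_per_request):
    """Total requests if each request returns days_per_request days for one symbol."""
    return symbols * math.ceil(trading_days / days_per_request)


def backfill_days(total_requests, requests_per_window, window_seconds):
    """Wall-clock days needed to issue total_requests under a fixed-window limit."""
    windows = math.ceil(total_requests / requests_per_window)
    return windows * window_seconds / SECONDS_PER_DAY


# Hypothetical workload: 2,000 symbols, 504 trading days, one day per request.
total = requests_needed(2000, 504, 1)
print(total, round(backfill_days(total, 60, 600), 1))
```

With those assumed numbers the pull needs about a million requests and roughly 117 days of continuous running, before any retries or outages.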


Multi-month, wow. Cool to know this exists though, thanks!


Interesting. I didn't think this was a challenge until I came across this discussion.

Recently I saw this project on HN: https://openbb.co/ A VentureBeat press release states that "The platform gleans its investment data via publicly available sources, among others that require an API key — these include Alpha Vantage, Financial Modeling Prep, Finnhub, Reddit, Twitter, Coinbase, the SEC, and many more." I haven't checked their git for data sources.

My question then is how do you build a data provider? Where do data providers like Bloomberg get the data that they sell?


What asset classes are you looking for?

I’ve used Bloomberg Backoffice files in the past, and later went to work at Bloomberg to try and make that data more easily usable.

MarketQA had a product that can give you historical data as well, but tied more into the Reuters world.

Corporate actions are a complete pain. Bloomberg’s back office file data for the adjustment factors isn’t consistent with the data you can pull from a Bloomberg terminal.

The wider your coverage the harder it is to do this correctly.

If you then want historical intraday prices as well, this gets much more expensive and much more complicated to set up. My last job had an entire team trying to get all this right, and still got it wrong a lot.
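
For readers who have not worked with adjustment factors, here is a minimal Python sketch of back-adjusting closing prices for splits. It assumes a split ratio of 2.0 means 2-for-1 and that the ratio applies from the ex-date onward; it ignores dividends and other corporate actions, and it is not a description of how Bloomberg or any other vendor computes its factors. The function name is hypothetical.

```python
from datetime import date


def back_adjust(closes, splits):
    """Return split-adjusted closes.

    closes: list of (date, close) tuples sorted ascending by date.
    splits: dict mapping ex-date -> ratio (2.0 means 2-for-1).
    Prices before each ex-date are divided by the ratio so the whole
    series is expressed in terms of the most recent share count.
    """
    adjusted = []
    factor = 1.0
    # Walk backwards so the factor accumulates every split that
    # happens after the day being adjusted.
    for day, close in reversed(closes):
        adjusted.append((day, close / factor))
        if day in splits:
            factor *= splits[day]
    adjusted.reverse()
    return adjusted
```

Two sources that disagree on an ex-date or a ratio will produce different adjusted histories from the same raw prices, so it helps to store raw prices and corporate actions separately and derive adjusted series on demand.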


We're mainly looking at publicly traded US securities, including stocks, ETFs, and crucially, mutual funds. Many sources (looking at you, Polygon) don't publish price data for mutual funds. Haven't heard of MarketQA, will definitely check it out.


I’ve never really looked at Mutual Fund data, since personally I use ETFs, and professionally I never had to work with Mutual Funds.

I have no idea if MarketQA has that data. I imagine Bloomberg does, but I couldn’t be sure since I never looked.

Stockevents is an iOS app I saw on HN that, from a quick glance, seems to have MF data. I wonder where they source it from.


Have you tried the obvious sources? Bloomberg, S&P, Nasdaq. Fintech isn't a cheap game. If you get your data from small data vendor startups, you risk getting poor fundamentals data. You don't want to waste time on debugging the calculations when you could be iterating. Some brokers like IB also have a data vendoring side business.

If you are really tight on cash, try Intrinio. No idea about their quality but they have been around for a while.


It is indeed, which is why we stopped after window-shopping Bloomberg and Nasdaq. Seems like it's hard to have cost-effective, high-quality data with an API that's amenable to bulk backfilling.

We have actually looked at Intrinio! Specifically for options data. Again the problem was that the API is not set up for bulk, historical backfills.


No data vendor will give you bulk historical backfills cheaply because then they will be out of business.

I will give you some more names. Go poach a quant or one of their ex-data engineers and maybe you can learn more:

- Bloomberg

- Thomson Reuters

- FactSet

- Refinitiv Eikon

There is a reason why so many fintechs are going crypto first. The underlying technology may not be sound but the open business model and accessibility makes innovation a lot easier than dealing with old school financial gatekeepers.


I’ve tried many APIs and there are always gaps in the data. I was working on a stock market API to scratch an itch (hotstoks.com) but now it’s in stealth mode because of all the data issues I was having. I’m using IEX and Yahoo Finance.


Check out Pyth. If it has the data you’re looking for, it should be very high quality and timely.

https://pyth.network/


Bloomberg Terminal. Free account. For life. God help me if I ever lose my login.


Bloomberg Terminal



