Not exactly an "open problem" (as in science), but an underlying problem is that there is no "Google for finance". Financial data is scattered across different places, some of it not openly accessible, some of it in crude formats, etc.

Recently mentioned on HN: https://www.tiingo.com/welcome

And of course https://www.quandl.com/

Thanks, I didn't know those yet.

IMHO financial data is scattered for two primary reasons:

1) "Finance" is extremely fluid because of the speed with which transactions are conducted (e.g., the stock market)...during the course of a day's trading, market reports are given minute by minute, but the most useful "summary" of "performance" is often available only at the end of the day, when the market closes...

2) Entities derive a capitalistic advantage by protecting their own data, until, and unless, they're given sufficient incentive to share it...

To me this implies that a system complicated enough to provide "meaningful" data captures might have to be nearly as complicated as the field of finance itself...the best we seem to be able to do just now is provide snapshots--AKA, the "leading indicators", etc...

3) Data is protected/hoarded/scattered for regulatory reasons. Some countries (LU and CH come to mind) don't allow "their" financial data to be stored abroad, for instance.

4) There are many (MANY!) financial standards and formats. But at the end of the day, a payment is a payment. There are only so many different kinds of financial instruments and transactions.

There's a lot of noise, but the signal is still there. And solutions exist to make sense of it all.
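
To make point 4 a bit more concrete, here is a rough sketch of what "a payment is a payment" could look like in code: two differently shaped inputs mapped onto one common record. Everything in it is made up for illustration (the `Payment` record, both input layouts, and the field names `from`, `to`, `amount_minor`, `ccy`); real standards carry far more fields than this.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Payment:
    """Hypothetical common record; not taken from any real standard."""
    debtor: str
    creditor: str
    amount: Decimal
    currency: str


def from_csv_row(row: str) -> Payment:
    # Assumed CSV layout: debtor,creditor,amount,currency
    debtor, creditor, amount, currency = row.strip().split(",")
    return Payment(debtor, creditor, Decimal(amount), currency.upper())


def from_json_record(record: dict) -> Payment:
    # Assumed JSON layout with the amount in minor units (e.g. cents).
    # Dividing by 100 only holds for currencies with two decimal places.
    return Payment(
        debtor=record["from"],
        creditor=record["to"],
        amount=Decimal(record["amount_minor"]) / Decimal(100),
        currency=record["ccy"].upper(),
    )


csv_payment = from_csv_row("alice,bob,12.50,eur")
json_payment = from_json_record(
    {"from": "alice", "to": "bob", "amount_minor": 1250, "ccy": "EUR"}
)

# Both inputs describe the same payment once normalized.
same_payment = csv_payment == json_payment
```

Once everything is in one shape, searching and comparing across sources becomes an ordinary data problem; the hard part is writing and maintaining the adapters.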

Good points...

The "signal" is there, I agree...I used the word "snapshot" and I'm guessing that we mean something similar...

Global finance is a moving target...gleaned information is useful one minute, sometimes meaningless an hour later...

Agree 100%. Good data is very expensive, and even then it often needs additional cleaning. Public data sources are for the most part unusable.

What do you mean when you say that public data sources are mostly unusable? Which public data sources are there, and what renders them unusable?

There are a couple of "googles for financial transactions" out there: http://www.intix.eu

