
Nasdaq Acquires Quandl to Advance the Use of Alternative Data - rainboiboi
https://business.nasdaq.com/mediacenter/pressreleases/1855930/nasdaq-acquires-quandl-to-advance-the-use-of-alternative-data
======
peterbraden
It's crazy how poor the financial data provider offerings out there are. Most
financial data is riddled with inconsistencies, wildly overpriced, and in
esoteric formats. Simply ingesting financial data in a reliable manner
requires significant engineering.

For something so important to the economy, it's amazing that there isn't a
better solution, or that an open standard hasn't been mandated.

~~~
erichurkman
For my current job, we wanted to get a mapping of stock tickers and exchanges
to CUSIPs. Every provider we looked at — and this is fundamental trade data —
was full of errors and missing values. Couple that with the extortion that is
CUSIP (you can't use CUSIP values without a license from them, and licenses
start at $xx,xxx+). It's criminally inept. And when you do fix it up, you
don't want to publish it, because you spent all your time and resources fixing
it… and it becomes a trade secret.
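
Worth noting for anyone reconciling vendor feeds: while the CUSIP database itself is licensed, the check-digit scheme (a modulus-10 "double-add-double" over the first 8 characters) is public, so you can at least catch malformed identifiers before they poison a mapping. A minimal Python sketch; the function names are my own:

```python
def cusip_check_digit(base8: str) -> int:
    """Compute the check digit for the first 8 characters of a CUSIP."""
    total = 0
    for i, ch in enumerate(base8.upper()):
        if ch.isdigit():
            v = int(ch)
        elif ch.isalpha():
            v = ord(ch) - ord('A') + 10       # A=10 ... Z=35
        else:
            v = {'*': 36, '@': 37, '#': 38}[ch]  # rare but valid characters
        if i % 2 == 1:                        # double every second character
            v *= 2
        total += v // 10 + v % 10             # sum the digits of each value
    return (10 - total % 10) % 10

def is_valid_cusip(cusip: str) -> bool:
    return len(cusip) == 9 and cusip_check_digit(cusip[:8]) == int(cusip[8])

print(is_valid_cusip("037833100"))  # Apple's CUSIP: True
```

It won't tell you the ticker mapping is *right*, but it filters out a surprising amount of vendor garbage for free.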

~~~
mlthoughts2018
This is why finance is lucrative, similar to esoteric codes in various types
of law. Nothing to do with math models or superior prediction, just paying for
someone else to fight through identifier hell, exchange protocol hell, etc.,
and be able to do some mickey mouse math at the end of it.

Honestly, this stuff is so bad that the headache of it might fully justify
huge finance compensation, and I’ve had colleagues who turned down huge
bonuses and raises to leave finance companies solely to avoid this type of
stuff and seek a career where the headaches bother them less and they are paid
less.

~~~
hendzen
Data cleaning/transformation ends up being a huge percentage of the work in
pretty much any real-world ML context I'm familiar with. Not unique to finance
at all.

~~~
anongraddebt
I come from the non-technical side of things. Do you know of any resources
that would cover this issue, but for someone on the business side?

Not an engineer, so while I understand this in a general/abstract sense, my
understanding is limited to, "Cleaning/transformation is messy and a time sink
due to non-standardization of data."

~~~
pmart123
One good example I uncovered a while back was that Bloomberg timestamped its
crude oil futures data by finding the last trade to occur in a given second
and rounding down. This means that the user of the data had no idea whether
the price used on the 10:30:30 AM print occurred at 10:30:30.999 or
10:30:30.001. Obviously, this could create problems if you thought you found a
lead/lag relationship between, say, oil and oil stocks.
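
A toy sketch of that rounding problem (not Bloomberg's actual pipeline, just an illustration of why flooring to the second destroys sub-second ordering):

```python
from datetime import datetime

def floor_to_second(ts: datetime) -> datetime:
    # Vendor-style stamping: keep the last trade in each second,
    # but round its timestamp down to the whole second.
    return ts.replace(microsecond=0)

early = datetime(2018, 12, 4, 10, 30, 30, 1000)    # 10:30:30.001
late  = datetime(2018, 12, 4, 10, 30, 30, 999000)  # 10:30:30.999

# Both collapse to the same print time, so any lead/lag inference
# finer than one second is unrecoverable from the vendor feed.
assert floor_to_second(early) == floor_to_second(late)
```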

Similarly, say a vendor aggregated website visits/pageviews but didn't account
for the fact that 1/3 of the traffic was coming from click-bots in developing
countries. If they presented you with the raw data you could figure it out and
filter those countries out, but if it is aggregated, you might not discover
the issue.

Then, there could be even simpler ones, like determining the opening price for
a stock. If, say, the first print of stock XYZ trades 10 shares at a price of
$20, but a millisecond later, 100k shares trade at $20.11, which print should
you use in your simulation algorithms as the opening print?
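
A sketch of how two plausible conventions disagree on that open; `opening_print_min_size` and its 100-share cutoff are made-up choices for illustration, not any exchange's official rule:

```python
trades = [  # (price, shares) in time order at the open; hypothetical data
    (20.00, 10),
    (20.11, 100_000),
]

def opening_print_first(trades):
    # Convention 1: the literal first print is the open.
    return trades[0]

def opening_print_min_size(trades, min_shares=100):
    # Convention 2: skip odd-lot "noise" prints; fall back to the first trade.
    return next((t for t in trades if t[1] >= min_shares), trades[0])

# The two conventions disagree, which can quietly shift backtest results.
print(opening_print_first(trades))     # (20.0, 10)
print(opening_print_min_size(trades))  # (20.11, 100000)
```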

------
monkeydust
Spoke to these guys a while back. Asked for examples of real alternative data
they had... one interesting example was flight data for private jets, labeled
with which company owned them. The theory being that if the CEO of company X
keeps visiting a place near company Y, there may be an acquisition or merger
in play.

~~~
jdironman
Out of curiosity, at what point could things be considered insider trading /
insider data?

~~~
claytonjy
In America, this kind of research is explicitly encouraged, and very much NOT
insider trading according to the SEC. Insider trading has to involve _theft_,
not just insider knowledge.

If you overhear someone talking about an impending acquisition in a coffee
shop, and you trade on that information, you're quite safe in the US. European
countries can and do consider that insider trading, though.

~~~
budu3
I thought that inside info was material non-public info, and air-traffic data
isn't non-public per se, so it's fair game. No?

~~~
throwawaymath
No. You can trade on material, nonpublic data as much as you'd like. Insider
trading is not illegal unless you're breaking a confidentiality agreement or
fiduciary duty.

If you manage to discover confidential data in a way that does not compromise
such an agreement or duty, you're fine. Obviously you should engage with an
attorney instead of taking legal advice from a random HN comment, but there's
really no issue with this. Information asymmetry is a fundamental part of the
market and not illegal on its own.

Source: I used to work in financial forecasting using significant amounts of
alternative data.

------
anonu
We're in the midst of a data gold rush. People who have data are struggling to
monetize it. If you're a data buyer, you're probably swamped with the quantity
and breadth of data providers out there. AI/ML techniques to make sense of
this data are still only scratching the surface. I think this is where there
is a lot of low-hanging fruit: creating services or tools that allow non-
CS/non-Quant people to extract insights from TBs of data...

On the exchange side: these guys are always on the prowl for hot new
properties to scoop up. The traditional business model of simply earning fees
on exchange trading has been slowly eroding for the last 10 years, so they
need to branch out into services and other data plays...

~~~
inputcoffee
Alternative take: there isn't that much low hanging fruit there.

Hear me out.

"To the person who only has a hammer, everything looks like a nail."

The data in front of you is the data you want to analyze, but it doesn't
follow that it is the data you ought to analyze. I predict that most of the
data you look at will result in nothing. The null hypothesis will not be
rejected in the vast majority of cases.

I think we -- machine learning learners -- have a fantasy that the signal is
lurking and if we just employ that one very clever technique it will emerge.
Sure, random forests failed, and neural nets failed, and the SVR failed, but
if I reduce the step size, plug the output of the SVR into the net, and change
the kernel...

Let me give an example: suppose you want to analyze the movement of the stock
market using the movement of the stars. Adding more information on the stars
and more techniques may feel like progress, but it isn't.

Conversely, even a simple piece of information that requires minimal analysis
(this company's sales are way up and no one else but you knows it) would be
very useful in making that prediction.

The first data set is rich, but simply doesn't have the required signal. The
second is simple, but has the required signal. The data that is widely
available is unlikely to have unextracted signal left in it.

~~~
rademacher
Isn't there utility in accepting the null hypothesis? It's almost as valuable
to know that there is no signal in the data as it is to know the opposite,
i.e., knowing where not to look for information.

I think your example is really justifying a "machine learner" that has some
domain expertise and doesn't blindly apply algorithms to some array of
numbers.

~~~
whatshisface
I think his argument is that some null hypotheses can be rejected out of hand,
but that people are wasting time and effort obtaining evidence that, if they
had better priors, would be multiplied by 0.0000000000001 to end up with an
insignificant posterior. That's what the astrology example indicates.
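
In odds form, Bayes' rule makes that point directly: posterior odds are prior odds times the likelihood ratio, so a microscopic prior swamps even strong-looking evidence. A quick sketch (the numbers are illustrative, not measured):

```python
def posterior_odds(prior_odds: float, likelihood_ratio: float) -> float:
    # Bayes' rule in odds form: posterior odds = prior odds x likelihood ratio.
    return prior_odds * likelihood_ratio

# Even "strong" evidence (a likelihood ratio of 100) can't rescue an
# astronomically small prior on, say, star positions driving stock returns.
prior = 1e-13
print(posterior_odds(prior, 100.0))  # still on the order of 1e-11: negligible
```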

------
Maven911
I've been researching this topic, alternative data, for some time now, and
I'm not surprised, since Nasdaq is a large provider of software (e.g.
market-making software, amongst dozens of other products):

QUANDL SPECIFIC: -Quandl has a pretty decent blog that I would check out; you
never know when some new corporate policy might get rid of it:
[https://blog.quandl.com/](https://blog.quandl.com/)

GENERAL NOTES:

-More and more asset managers are using it, and there is some worry that everyone is drawing the same conclusions from the same data sets, and thus there is no money to be made. Most practitioners say this is a non-issue, though: there are more and more alternative data sets out there to choose from, and cleaning the data, testing its veracity, and knowing how to combine it with other sets is a key competitive advantage that not every asset manager is good at.

-The ROI is something that is top of mind but not always easily attributable throughout the year, e.g. one large insight very late in the financial year can bring +100x returns on what was paid for a data provider's software.

-Hugely successful funds like Renaissance's Medallion have likely been doing this for a long, long time, coupled with top PhDs looking for statistical correlations with traditional data as well.

-More and more data sets are being created and thrown into self-learning financial models (aka AI), which has a lot of people excited, and certainly a lot of small funds are being created, though mostly by young people or not-so-great hedge fund managers. Getting large investors to lay down significant capital has a huge trust component to it, i.e. they want to bet only on successful, grey-haired, largely male folks.

-A lot of alternative data can be found directly on the Bloomberg terminal, e.g. the MAPS <Go> function. However, my understanding is that it's not that deep, quality is an issue, and everyone has access to it (no real competitive advantage).

------
epapsiou
Any idea as to its valuation?

~~~
minimaxir
Given that its last raise was $15M (Canadian) in 2016
([https://www.crunchbase.com/organization/quandl#section-
fundi...](https://www.crunchbase.com/organization/quandl#section-funding-
rounds)), and I haven't heard anything about Quandl since then, I'm guessing
it's not a 10x exit.

~~~
pmart123
That's probably fair. Quandl started by offering the "everyday" investor API
access. I know the typical VC approach is to first get users and then scale,
but often in investing/financial data products, it seems better to price high
and then move down market. If you study the companies with the most success in
the past (Bloomberg, CapIQ, MSCI, Eze, Advent, Factset, Morningstar, etc.),
none of them started by trying to cater to the DIY investor.

------
kaybe
What is 'alternative data'? The text only says

> 'The company offers a global database of alternative, financial and public
> data, including information on capital markets, energy, shipping,
> healthcare, education, demography, economics and society.'

which doesn't really answer the question.

~~~
Desustorm
Alternative data is non-financial data which can be tied to various
securities.

Financial data, for example, would be EUR USD spot prices. Non-financial data
(i.e. "alternative data") could be healthcare reports which you could
theoretically couple to e.g. pharma stocks.

~~~
bostik
There are quite a few, and I can think of these off the top of my head:

\- Real-time weather data from major ports and across the main shipping lines

\- Telemetry from crop and soil report systems

\- Up-to-date satellite imagery of basically anything large under construction
(solar farms, factories, ...)

Provide information like that in a machine-readable, consistent format and you
have a business.

Btw... Using satellite images to track car manufacturers' inventory levels is
an old idea, used for more than a decade.

------
garysahota93
I had no idea Nasdaq did acquisitions as well. Maybe that's just the engineer
in me..

