
Fitting to Noise or Nothing at All: Machine Learning in Markets
http://zacharydavid.com/2017/08/fitting-to-noise-or-nothing-at-all-machine-learning-in-markets/
======
dkural
Most academic CS literature is complete BS. The vast majority of papers fit
a simple formula: "We apply method X to problem Y, and outperform other
approaches that are themselves variants of method X."

Meanwhile, no one uses method X or any of its cousins, because in the real
world the problem is solved very differently with a combination of both
principled algorithms and heuristics derived from real-world datasets.

The paper also fails to give any theoretical reason or mathematical insight as
to why their version of X is better.

Thus, it doesn't actually solve a real world problem OR advance scientific
understanding.

~~~
self_assembly
I think this is a reasonable point, but I would just add that a lot of people
in CS academia are well aware of this. The problem is that we all serve
multiple masters and one of the things we have to do is publish frequently. I
think you'll find that many CS academics try to strike a balance between
publishing for the sake of publishing and actually working towards a larger
scientific goal. Personally speaking I'm definitely guilty of publishing work
that looks good on my CV, but does not advance my deeper scientific agenda.
That said the science is always on the front of my mind even if it only makes
up 10-20% of my actual publications.

In regards to solutions it would be great if we focused less on the frequency
at which we published and editors were more willing to publish work that had
novel ideas even if it did not have state of the art performance (yet).
Although like any job there will always be parts that are tedious, involve
politics and yes parts that are even counter productive. At some point as an
individual you just have to play the game while still thinking about and
trying to advance the bigger picture.

~~~
collyw
>Personally speaking I'm definitely guilty of publishing work that looks good
on my CV, but does not advance my deeper scientific agenda.

This sounds analogous to resume-driven development. And I completely
understand it. I have pragmatically chosen the best tech for the job
(factoring in the learning curve for new tech and what I already know) and
haven't learned much new tech for the last couple of years. Now I get the
feeling my CV is looking a bit dated. Does it make me worse at developing
software? I would say not. I have mastery in a few areas rather than shallow
knowledge in a lot. Will it make it harder for me to get a job? Quite possibly,
if I keep going this way.

------
lettergram
I can't upvote this enough.

I've spent an unreasonable amount of time reading research related to finance
for my web app Piglet:

[https://projectpiglet.com/](https://projectpiglet.com/)

Long story short, all "research" is pretty much B.S.

I used to assume this is because people want to make money, so they keep the
good analysis secret. However, after working in the industry a few years; it's
mostly because they just don't know how to apply the algorithms or if it's
even possible.

I think my favorite example is the seminal paper on using twitter sentiment to
predict stock movement[1]. They don't use a large enough data set, and more
importantly they use Granger causality to identify "causality"[2] between
sentiment and stock value. They then claim they found a specific lag range
with a p-value indicating the two are correlated... Of course you'll find a
correlation when you look at two normalized signals and try to match them up.

Now, if they had used 500 individual stocks instead of the DJIA (Dow Jones
Industrial Average), and found that twitter sentiment correlated with stock
values 90% of the time at a lag of 5 to 10 days, I'd argue they'd probably
have something.

However, because their method is essentially a brute-force search for a
correlation between only two signals, they must correct the p-value for
multiple comparisons. i.e. "look and you shall find"[3]

This is just one of a hundred issues I've found, but it really sheds light on
how bad that industry is.

[1]
[https://scholar.google.com/scholar?hl=en&q=twitter+stock+sen...](https://scholar.google.com/scholar?hl=en&q=twitter+stock+sentiment+analysis&btnG=&as_sdt=1%2C47&as_sdtp=&oq=twitter)

[2] Granger Causality offsets two signals in an attempt to find a correlation
between them at an offset between two times. AKA find "causality" by finding
correlation at an offset in time

[3] [https://stats.stackexchange.com/questions/5750/look-and-you-shall-find-a-correlation](https://stats.stackexchange.com/questions/5750/look-and-you-shall-find-a-correlation)
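
To make [3] concrete, here's a toy sketch in Python (pure simulation; every number is made up). Two independent white-noise series are scanned across 30 lags, the way a naive Granger-style search would be run, and we count how often the single best lag looks "significant":

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_lags, n_trials = 30, 200
false_hits = 0

for _ in range(n_trials):
    # Two independent white-noise "signals": no relationship by construction.
    a = rng.normal(size=300)
    b = rng.normal(size=300)
    # Scan every lag and keep only the single best (smallest) p-value,
    # exactly the "look and you shall find" mistake.
    best_p = min(pearsonr(a[:-lag], b[lag:])[1] for lag in range(1, n_lags + 1))
    if best_p < 0.05:
        false_hits += 1

print(f"trials where some lag looked 'significant': {false_hits / n_trials:.0%}")
```

With 30 uncorrected tests you'd expect roughly 1 - 0.95^30 ≈ 78% of trials to throw up at least one p < 0.05 by pure chance, which is why a Bonferroni-style correction (or a few pre-registered lags) is needed.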

~~~
aagha
> Long story short, all "research" is pretty much B.S.

I'm curious about this as it relates to Piglet. Are you monetizing on the idea
that most people won't realize this and will pay for Piglet in the hope that
the signals they get there are somehow positively correlated with what's
happening in the real world for a particular stock/idea/company/politics, when
in reality you don't think it really does?

~~~
lettergram
For Piglet, I'm actually going back and conducting research to find valuable
signals and releasing everything in as clear a way as possible.

Importantly, I'm providing access to signals first with explanation(s) of what
each signal is / how it's useful today.

I also have an investment model currently returning on average 50%+ YoY
(backtested over 8 years, live for 4 years).

From there, I'm building further machine learning models on top of the
signals. However, (as mentioned prior from the sentiment analysis paper) I'm
doing much more robust research. For example, comparing all stocks as opposed
to DJIA.

------
murbard2
I got into quant finance 12 years ago with the mistaken idea that I was going
to successfully use all these cool machine learning techniques (genetic
programming! SVMs! neural networks!) to run great statistical arbitrage books.

Most machine learning techniques focus on problems where the signal is very
strong, but the structure is very complex. For instance, take the problem of
recognizing whether a picture is a picture of a bird. A human will do well on
this task, which shows that there is very little intrinsic noise. However, the
correlation of any given pixel with the class of the image is essentially 0.
The "noise" is in discovering the unknown relationship between pixels and
class, not in the actual output.

Noise dominates everything you will find in statistical arbitrage. An R^2 of 1%
_is_ something to write home about. With this amount of noise, it's generally
hard to do much better than a linear regression. Any model complexity has to
come from integrating over latent parameters or manual feature engineering,
the rest will overfit.
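
A quick simulated illustration of that point (numpy only; all numbers are made up). The true relationship explains about 1% of the variance; a one-slope linear fit recovers roughly that, while a model flexible enough to memorise the training noise, here a 1-nearest-neighbour lookup, is worse than just predicting the mean out of sample:

```python
import numpy as np

rng = np.random.default_rng(0)

# A predictor with ~1% R^2: a tiny linear effect buried in noise.
n = 4000
x = rng.normal(size=n)
y = 0.1 * x + rng.normal(size=n)

x_tr, x_te = x[:2000], x[2000:]
y_tr, y_te = y[:2000], y[2000:]

def oos_r2(pred):
    """Out-of-sample R^2 against the held-out half."""
    ss_res = np.sum((y_te - pred) ** 2)
    ss_tot = np.sum((y_te - y_te.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Model 1: plain linear regression (one slope, one intercept).
slope, intercept = np.polyfit(x_tr, y_tr, 1)
r2_linear = oos_r2(slope * x_te + intercept)

# Model 2: 1-nearest-neighbour, complex enough to memorise training noise.
nearest = np.abs(x_te[:, None] - x_tr[None, :]).argmin(axis=1)
r2_knn = oos_r2(y_tr[nearest])

print(f"linear OOS R^2: {r2_linear:+.4f}")   # around the true ~1%
print(f"1-NN   OOS R^2: {r2_knn:+.4f}")      # negative: it learned the noise
```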

I think Geoffrey Hinton said that statistics and machine learning are really
the same thing, but since we have two different names for it, we might as well
call machine learning everything that focuses on dealing with problems with a
complex structure and low noise, and statistics everything that focuses on
dealing with problems with a large amount of noise. I like this distinction,
and I did end up picking up a lot of statistics working in this field.

I'll regularly get emails from friends who tried some machine learning
technique on some dataset and found promising results. As the article points
out, these generally don't hold up. Accounting for every source of bias in a
backtest is an art. The most common mistake is to assume that you can observe
the relative price of two stocks at the close, and trade at that price. Many
pairs trading strategies appear to work if you make this assumption (which
tends to be the case if all you have are daily bars), but they really do not.
Others include: assuming transaction costs will be the same on average (they
won't; your strategy likely detects opportunities at times when the spread is
very large and prices are bad), assuming index memberships don't change (they
do, and that creates selection bias), assuming you can short anything (stocks
can be hard to short or have high borrowing costs), etc.
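
Here's a toy simulation of that close-price mistake (everything is synthetic: a driftless random walk plus bid-ask bounce, with made-up scale and spread). Pretending you traded at the very close you conditioned on manufactures profit out of the bounce; forcing the fill to the next bar, at the ask, makes it disappear:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic price series: a driftless random walk ("mid") plus bid-ask
# bounce, so each printed close sits half a spread above or below mid.
n, spread = 100_000, 0.10
mid = 100 + np.cumsum(rng.normal(scale=0.05, size=n))
bounce = rng.choice([-spread / 2, spread / 2], size=n)
close = mid + bounce

# "Strategy": buy after a down close, exit one bar later.
signal = close[1:-2] < close[:-3]          # down move at bar t
t = np.nonzero(signal)[0] + 1              # indices of entry bars

# Biased backtest: pretend we bought at the very close we conditioned on.
pnl_biased = close[t + 1] - close[t]

# More honest: the earliest fill is the *next* bar, and we pay the ask.
entry = mid[t + 1] + spread / 2
pnl_honest = close[t + 2] - entry

print(f"biased mean P&L per trade: {pnl_biased.mean():+.4f}")
print(f"honest mean P&L per trade: {pnl_honest.mean():+.4f}")
```

The "edge" in the biased version is nothing but the bid-ask bounce you conditioned on; with a realistic fill it turns into a loss of roughly half the spread per trade.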

In general, statistical arbitrage isn't machine learning bound(1), and it is
not a data mining endeavor. Understanding the latent market dynamics you are
trying to capitalize on, finding new data feeds that provide valuable
information, carefully building out a model to test your hypothesis, and
deriving a sound trading strategy from that model is how it works.

(1: this isn't always true. For instance, analyzing news with NLP, or using
computer vision to estimate crop outputs from satellite imagery can make use
of machine learning techniques to yield useful, tradeable signals. My comment
mostly focuses on machine learning applied to price information. )

~~~
aatchb
A few years ago I graduated with a PhD in statistics with lots of ML
inspiration. Since then I have always dreamed of applying my knowledge and
skills in this domain. However, despite believing I was 'probably' in a decent
position to do so, I consistently read about how impossible it was. I have a
boring 'normal' person's job, but posts like this are somewhat reassuring that
I made a reasonable decision to abandon a life of fruitless datamining and
overfitting.

~~~
chillingeffect
I don't think the message here is "don't do it," but "have domain knowledge."
The crux of the paper was scientists applying ML to a bunch of data without
really understanding trading.

~~~
dsacco
You can actually have scientists find signals in data they have no domain
experience in. In a typical hedge fund the quantitative researchers will be a
different group from the quantitative developers and traders. There are fuzzy
lines between those depending on culture, but those three groups are broadly
the front office. You really need domain experience for execution and risk
management, but pure insights can be derived without necessarily needing any
domain experience.

That said, quant researchers typically understand how the market works. They
are just able to quickly excel without a background in it.

------
zacharydavid
Special thanks to Nickolas Younker (at LiquidWeb) for saving my behind and
getting this all set up.

~~~
arjie
Doesn't load for me, mate. Also, if you put your email in your profile, I can
send you an email instead of posting a comment here.

~~~
arthurcolle
Same here. Unfortunate...

~~~
zacharydavid
Sorry you two are still having issues. I had to open a new incognito window to
get to it.

------
zacharydavid
Sorry guys. Traffic killed the site. Booting up a new server

~~~
rubatuga
Do you have a cached copy?

~~~
zacharydavid
En route

------
chvid
Does anyone know of any paper that describes a reproducible method of
generating above normal returns in the mature western financial markets? Nope.
Me neither.

~~~
proofofstake
Do you consider cryptocurrency mature?

~~~
chvid
Nope. But even for those types of markets I doubt that there exists an
academic paper describing a reproducible trading strategy/method with
above-normal returns.

~~~
proofofstake
[https://pdfs.semanticscholar.org/db15/1836543d8a70db1dabef3d...](https://pdfs.semanticscholar.org/db15/1836543d8a70db1dabef3dee43637a7cd29f.pdf)
"Bayesian regression and Bitcoin"

> The strategy is able to nearly double the investment in less than 60 day
> period when run against real data trace.

[http://cs229.stanford.edu/proj2015/029_report.pdf](http://cs229.stanford.edu/proj2015/029_report.pdf)
"Algorithmic Trading of Cryptocurrency Based on Twitter Sentiment Analysis"

[http://journals.plos.org/plosone/article?id=10.1371/journal....](http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0177630)
"When Bitcoin encounters information in an online forum: Using text mining to
analyse user opinions and predict value fluctuation"

[https://pdfs.semanticscholar.org/e065/3631b4a476abf5276a264f...](https://pdfs.semanticscholar.org/e065/3631b4a476abf5276a264f6bbff40b132061.pdf)
"Automated Bitcoin Trading via Machine Learning Algorithms"

~~~
chvid
"Based on this price prediction method, we devise a simple strategy for
trading Bitcoin. The strategy is able to nearly double the investment in less
than 60 day period when run against real data trace."

From the paper "Bayesian regression and Bitcoin".

I think a more relevant question is: this is clearly not the case in the
real world, so why does it appear to be in the papers?
But thanks for the links; I will enjoy reading through them.

~~~
proofofstake
Do you mean by "the real world" that Bitcoin price prediction does not work in
the real world? Or that stock market prediction/strategies do not work in the
real world?

------
dogruck
Would be nice to see a standard academic platform for backtesting. Then, the
paper could say "we submitted our implementation of this strategy to Backtest
(which includes transaction costs and slippage)."
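
As a sketch of the minimum such a platform would have to bake in, here's a toy daily backtest (the signal, commission and slippage numbers are all made up for illustration) that charges costs on every unit of turnover:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily prices: geometric random walk, no real signal present.
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=1000)))

# Example signal: long above the 20-day moving average, flat below.
ma = np.convolve(prices, np.ones(20) / 20, mode="valid")
position = (prices[19:] > ma).astype(float)

returns = np.diff(prices[19:]) / prices[19:-1]
gross = position[:-1] * returns                 # hold yesterday's position

# Charge commission plus slippage on every unit of turnover.
turnover = np.abs(np.diff(position, prepend=0.0))[:-1]
commission_bps, slippage_bps = 1, 5             # assumed, in basis points
costs = turnover * (commission_bps + slippage_bps) / 10_000

net = gross - costs
print(f"gross total return: {gross.sum():+.2%}")
print(f"net   total return: {net.sum():+.2%}")
```

A flat per-trade charge like this is itself optimistic; a real platform would model spread and market impact as a function of trade size and liquidity.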

~~~
sgt101
I think the pymc3 people have made something available

