
Stock Price Prediction with Big Data and Machine Learning - reality
http://eugenezhulenev.com/blog/2014/11/14/stock-price-prediction-with-big-data-and-machine-learning/
======
chollida1
I read this last night and thought it was a great writeup.

Meta note: I especially like how the code was intertwined with descriptive
text like an R Knitr file. It makes it easy to follow along and verify that
what the author says he's doing is actually what he's doing:)

A few issues about using this in production, none of which are intended to
slight the article's author or his work.

1) The biggest issue for me has always been speed. In this instance he's using
one symbol; imagine trying to do this against 1000 symbols with the real depth
of market and not just the precanned market data he's using. It's a lot more
data to parse and classify. You can start to see why many HFT systems are more
of an IT and programming endeavor than a quantitative one, not that the
quantitative portion isn't important:)

2) Just being able to detect which way the stock will tick doesn't really help
as much as you'd think.

Assume that you can correctly identify the direction of the next tick 100% of
the time. To make money off of this information you need to:

\- Be able to do this classification and send the order to market faster than
it takes to receive the next tick (n+1), a difficult task for most stocks that
trade in major US markets.

\- Get to the top of the order book, again a difficult task as the bid/ask
spread is already tight, and sometimes at its penny limit.

\- Get someone to fill your order, again a factor of being at the top of the
order book.

\- Identify if the tick n+2 is going in the same or opposite direction. If you
guess wrong, then you lose. If you guess right you still need to be able to
exit your position.
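
A minimal Python sketch of why the chain above is so demanding, even with a
perfect call on tick n+1. The function name and every probability below are
hypothetical, picked only for illustration; none of them come from the article
or from real market data.

```python
def expected_pnl_per_signal(p_in_time, p_filled, p_next_right, gain, loss):
    """Expected profit per signal, assuming tick n+1 is always called correctly.

    p_in_time    -- chance the order reaches the market before tick n+1
    p_filled     -- chance the order is filled once it is in the book
    p_next_right -- chance the call on tick n+2 is right (clean exit)
    gain, loss   -- profit on a good exit, loss on a bad one (same units)
    """
    # A trade only happens if the order is both on time and filled.
    p_trade = p_in_time * p_filled
    # Once in the position, the outcome depends on the tick n+2 call.
    return p_trade * (p_next_right * gain - (1.0 - p_next_right) * loss)


# Hypothetical inputs: perfect everywhere versus a coin flip on tick n+2.
print(expected_pnl_per_signal(1.0, 1.0, 1.0, 0.01, 0.01))
print(expected_pnl_per_signal(0.5, 0.5, 0.5, 0.01, 0.01))
```

With a coin flip on tick n+2 and symmetric gain and loss, the expectation is
zero before fees, no matter how good the n+1 call was.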

As always, if you are capable of doing this kind of work and able to work in
Canada. I'd love to chat with you!! Heck even if you are interested in machine
learning and the markets, I'll make time to chat.

~~~
haddr
Exactly, the main problem is that it might be much easier to do such
prediction "a posteriori" rather than use such algorithms in the real-time
bidding...

The question that seems interesting: is it possible to guess the price
movement for some t+delta moment, where the delta could be for instance 0.1
sec? Or would it be completely unpredictable?

~~~
skillachie
I would also be very interested in seeing this applied to predictions at t +
delta as well. I will probably attempt it over the Christmas break.

~~~
haddr
Great! Let me/us know your results!

------
IgorPartola
From a 10k foot view: if you actually manage to build something like this that
you can use to trade, why would you ever publish it? No matter how small of an
edge you get over the rest of the market, you can turn that into a huge amount
of money, so why reveal it? The corollary to this is that if something like
this is published, it means it doesn't actually work in practice.

I did a little bit of BTC trading, and I thought I had an interesting idea. I
traded on one of the smaller exchanges, but used Mt. Gox as the oracle to
predict which way the price would move. The basic idea worked pretty well:
there was correlation. The problem ended up being my order placing algorithm
which would actually do the wrong thing on very large/fast swings.

I think that idea may be interesting to apply, in terms of correlated stocks.
If you detect a price drop in stock for a lithium mining company, you might
predict a stock drop for Apple, since Apple uses lithium batteries for their
devices.
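
A minimal sketch of how such a lead-lag relationship could be tested, assuming
two equal-length series of per-period returns. The helper names are
hypothetical, and a high lagged correlation says nothing about whether the
signal can be traded after costs and latency.

```python
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation; undefined if either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def lagged_correlation(leader, follower, lag):
    """Correlate leader[t] with follower[t + lag], for lag >= 0."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    n = len(leader) - lag
    return pearson(leader[:n], follower[lag:lag + n])


# Toy returns where the follower simply repeats the leader one step later.
leader = [1, -2, 3, -1, 2, 0, -3, 1]
follower = [0] + leader[:-1]
for lag in (0, 1, 2):
    print(lag, lagged_correlation(leader, follower, lag))
```

Scanning a few lags like this shows where the correlation peaks; a peak at a
positive lag is what "leader" would mean here.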

~~~
gjm11
> why would you ever publish it?

Suppose you have a miraculous trading algorithm that you believe makes
somewhat above-market returns with somewhat below-market risks, but you have
only a small amount of capital you're happy to gamble with.

Then what you can do by keeping your algorithm to yourself is: have
investments that perform slightly better than most other people's. That's
nice, for sure, but getting rich that way takes a long time unless you start
out quite rich (or gamble a lot and get lucky, but that's also a way to get
poor).

On the other hand, what you get by publishing it might be a lucrative job
offer at a hedge fund or investment bank, who hope you can use the skills
you've just demonstrated to get them an extra 0.1% of return on their $10B
pot. Now you have the opportunity to apply the same techniques to thousands of
times more money than before, along with much more data and a server room full
of hardware for backtesting and other smart people who will look at your
clever ideas and maybe notice if there's a big mistake or omission. Of course
most of the gains from your miraculous algorithm now go to the investors, and
most of the rest probably go to other people with more seniority, but you are
still likely to get rich faster and more reliably that way.

I make no claim about the particular algorithm here or the person who
published it. But the above seems to me like a pretty plausible reason why
someone might prefer to publish, even if they have good reason to think their
algorithm works.

~~~
freddealmeida
I suppose this is the argument made for closed source over open source
software as well. There are benefits in opening an algorithm, since the
current algorithm is far from optimal. Of course, if you believe the market assumes
all available information in pricing, then sharing an algorithm shouldn't
really have an effect except for small rent extraction.

~~~
IgorPartola
I don't know if that's a fair comparison. You can open source just about
anything, because in software the execution matters a whole lot more than the
core of the software in all cases (to a first approximation). For example, if
Netflix open sourced their recommendation algorithm, do you think they'd go
out of business because another movie streaming service would pop up overnight
and take over? No, Netflix has name recognition, etc.

On the other hand, there is very little barrier to entry if you have a winning
stock trading algorithm. All you really need is money, which is easy to
acquire (for this purpose at least).

------
discardorama
The problem is: "70% accuracy" doesn't mean much, since he's framed it like a
classification problem. It's more like a regression problem.
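
A small sketch of why hit rate alone can mislead. The numbers are invented for
illustration and are not from the article: the predictor always says "up", is
right 7 times out of 10, and still loses because the misses are bigger moves.

```python
def accuracy_and_pnl(predicted_dirs, actual_moves):
    """Return (hit rate, total profit) for +1/-1 calls against signed moves."""
    pairs = list(zip(predicted_dirs, actual_moves))
    hits = sum(1 for d, m in pairs if d * m > 0)
    pnl = sum(d * m for d, m in pairs)
    return hits / len(pairs), pnl


# Hypothetical moves: seven small gains, three larger losses.
predicted = [1] * 10
actual = [0.01] * 7 + [-0.05] * 3
accuracy, pnl = accuracy_and_pnl(predicted, actual)
print(accuracy, pnl)  # 70% of calls are right, yet the total is negative
```

A regression target (the size of the move, not just its sign) is what lets you
weigh the rare large moves properly.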

Plus, the data used is only for a couple of days.

And finally: if someone had an ML model to reliably make money on the stock
market, they wouldn't be writing about it; they'd be laughing all the way to
the bank.

~~~
petegrif
Many years ago I was a trader. I recall watching analysts 'explaining' what
had just happened in the markets. The question that naturally arose was why
they weren't speaking from their yachts.

~~~
hayksaakian
The simple answer is that you don't have the same information in front of you
that you did at the time.

In hindsight, we can easily identify what is and is not relevant/influential,
whereas in real-time anything could prove to be relevant/influential.

------
encoderer
ThinkOrSwim has had a feature like this called ThinkAI for many years.
Personally, I think it's not better than random.

Years of thinking and ruminating and learning (and investing) on the subject
have left me solidly in the "random walk" camp. At any given point, a stock is
equally likely to go up or down. There's a small upward bias in the market
(greater than inflation), and I reason that it's the premium offered over debt
to take the higher risk of equity.

If you plot number of consecutive up-days and down-days, it's a normal
distribution, skewed to the right, and with fat tails.
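
A sketch of how that plot could be reproduced from a list of daily returns.
The helper name is made up, and flat days (return exactly 0) are dropped as a
simplifying assumption, which merges the runs on either side of them. The code
makes no claim about what shape real data will show.

```python
from collections import Counter
from itertools import groupby


def signed_run_lengths(daily_returns):
    """Lengths of consecutive up-day runs (positive) and down-day runs (negative)."""
    signs = [1 if r > 0 else -1 for r in daily_returns if r != 0]
    return [sign * len(list(group)) for sign, group in groupby(signs)]


# Toy returns: 2 up, 1 down, 3 up, 2 down.
returns = [0.1, 0.2, -0.1, 0.3, 0.1, 0.2, -0.2, -0.1]
runs = signed_run_lengths(returns)
print(runs)
print(sorted(Counter(runs).items()))  # histogram: run length -> count
```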

That said, my own belief and experience suggest that I can consistently press
a small edge, which is why I gravitate towards options and futures (highly
leveraged, high notional value). I don't think that would be possible without
a "Portfolio Margin" account.

~~~
imaginenore
Then how do you explain the success of companies like Jane Street? Their
business is built on predicting the near-future stock price. They are pretty
open about it, they even have tech talks on YouTube.

------
swframe
Part of the stock's future price is based on many features that are not in the
dataset. In addition, a stock's current price is based on predictors which
interact chaotically. Your predictor has to take into account that other
predictors are watching in real-time and are trying to take advantage of it.
It seems that attempts to predict the stock market make it random.

------
alphaBetaGamma
The reason why they get good and enticing but unactionable predictions is
that they assume that they can trade instantaneously at the quoted price. They
are neglecting the time their order would get to the matching engine, by which
time the opportunity would be gone often enough to make their trading strategy
unprofitable.

In fact the authors of the original paper seem completely clueless about this
point: "For example, since the prediction time of AAPL, 0.0311ms, is less than
0.0612ms, which is the time difference between the upward spread crossing
events from the Row k −1 to Row k + 4 in Table 1, the model could in principle
perform fast enough to influence corresponding trading decisions"

As if their order could reach the matching engine in 30 us...
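
The arithmetic, as a small sketch. The two constants are the figures from the
quote above; the transit times passed in at the bottom are hypothetical, and
the function names are made up for illustration.

```python
PREDICT_MS = 0.0311  # prediction time for AAPL, from the quoted passage
WINDOW_MS = 0.0612   # time between the quoted spread-crossing events


def transit_budget_ms(predict_ms, window_ms):
    """Time left for the order to travel to the matching engine."""
    return window_ms - predict_ms


def order_arrives_in_time(predict_ms, transit_ms, window_ms):
    """True only if prediction plus transit fits inside the window."""
    return predict_ms + transit_ms < window_ms


print(transit_budget_ms(PREDICT_MS, WINDOW_MS))
# Zero transit time is the paper's implicit assumption.
print(order_arrives_in_time(PREDICT_MS, 0.0, WINDOW_MS))
# Hypothetical 0.5 ms of transit: the window is long gone.
print(order_arrives_in_time(PREDICT_MS, 0.5, WINDOW_MS))
```

The budget left for the wire is roughly 0.03 ms, i.e. about 30 us.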

------
btbuildem
Once again - if it worked, it would not be a scientific paper / freely
available on github. The creators would be using it to make money, keeping mum
all the while.

~~~
firebones
While I 100% agree with the sentiment, there are freely published strategies
for beating 80% of market participants (namely, buy and hold low cost index
funds with rebalancing) that exist in the wild and appear to be resilient to
disclosure. Yet many, many people ignore this edge over their peers.

------
raverbashing
It will be another failed experiment.

Stock price is a stochastic process. It's unpredictable for the most part (of
course, if there is news with a big impact on a company, the stock price
usually reflects that).

It's certainly a nice experiment, but don't expect to get rich with it (the
opposite is most likely)

~~~
throwaway283719
Out of curiosity, what percentage of out-of-sample variance explained would
convince you that a predictive model had some power? 0.1%? 1%? 2%?
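
For reference, a minimal sketch of one common way to compute that number,
assuming the benchmark is the mean of the held-out sample (other choices, such
as a zero forecast or the training mean, give different values). The function
name is hypothetical.

```python
def out_of_sample_r2(actual, predicted):
    """Fraction of held-out variance explained: 1 - SS_res / SS_tot.

    Can be negative when the model does worse than predicting the mean.
    Undefined if the actual values are all identical.
    """
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


held_out = [1.0, 2.0, 3.0, 4.0]
print(out_of_sample_r2(held_out, held_out))    # perfect forecast
print(out_of_sample_r2(held_out, [2.5] * 4))   # always predict the mean
```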

------
mikkom
If I understand this article (I only skimmed it), he hasn't normalized the
data. So if the training data and validation data have a similar trend, the
result will be invalid.
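
A rough sketch of the kind of normalization meant here, under my own
assumptions (this is not what the article does): turn price levels into log
returns so a shared trend in the level drops out, then scale using statistics
from the training set only. Both helper names are made up.

```python
import math


def log_returns(prices):
    """Convert a price series into per-step log returns."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def zscore_with_train_stats(train, test):
    """Scale both sets with the training mean and standard deviation.

    Using only training statistics avoids leaking information from the
    validation data. Undefined if the training values are all identical.
    """
    mu = sum(train) / len(train)
    sd = (sum((x - mu) ** 2 for x in train) / len(train)) ** 0.5
    return [(x - mu) / sd for x in train], [(x - mu) / sd for x in test]


# A steady 10% rise per step becomes a constant return, whatever the level.
print(log_returns([100.0, 110.0, 121.0]))
print(zscore_with_train_stats([1.0, 2.0, 3.0], [2.0, 4.0]))
```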

edit: It's _two_ days of data? For one stock? If you are interested in this
stuff, this is not an article worth reading.

