
Predicting Stock Performance with Natural Language Deep Learning - prostoalex
https://www.microsoft.com/developerblog/2017/12/04/predicting-stock-performance-deep-learning/
======
mlthoughts2018
This is not very convincing. It reads like a semester project for an undergrad
or a summer intern. In particular, the prediction task used for evaluation is
not very meaningful, and the intended use case (predicting biotech
underperformers) is not very actionable without predicting _by how much_ they
would underperform, i.e. just using the output of this model as yet another
regression factor with questionable significance in the end.

But in that case, it’s very unlikely a heavy CNN text model would be better
suited than simpler methods like a classifier on LSA vectors, or even just
extracting the GloVe vectors and doing no convolution at all. Especially given
all the hyperparameters they mention needing to tweak.
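For illustration, a minimal sketch of the kind of simpler baseline meant here (not from the original post; the documents and labels are toy examples): TF-IDF features reduced by truncated SVD (i.e. LSA) feeding a plain linear classifier.

```python
# A minimal sketch of the simpler baseline suggested above: LSA vectors
# (TF-IDF + truncated SVD) feeding a plain linear classifier, instead of
# a convolutional text model. Documents and labels are illustrative.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression

docs = [
    "revenue increased and margins improved this quarter",
    "the company expects significant litigation risk and impairment charges",
    "strong cash flow and record earnings per share",
    "restructuring costs and a going concern qualification",
] * 5  # repeat so this tiny demo has enough samples
labels = [1, 0, 1, 0] * 5  # 1 = outperform, 0 = underperform (toy labels)

lsa_clf = make_pipeline(
    TfidfVectorizer(),
    TruncatedSVD(n_components=5, random_state=0),  # LSA: low-rank topic space
    LogisticRegression(),
)
lsa_clf.fit(docs, labels)
print(lsa_clf.predict(["record earnings and improved margins"])[0])
```

The whole pipeline is a few lines, has almost nothing to tune, and makes a far less flattering comparison target than "chance".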

For example, an ablation study to figure out that you need LeCun
initialization seems like overkill, and requires far more specialization in
deep learning than a typical firm would be looking to hire for, all just to
squeeze out what _might_ be some slight efficacy in one industry group.

When I previously worked in quant finance, I used to be very passionate about
trying to apply the latest & greatest methods.

But over time my feeling is most of it is inapplicable to finance, because it
ends up being a lot of work for pretty much no efficacy, like in this post. I
found much more value in simpler regression and tree models, and using simple
bag of words models with text. The only “advanced” stuff that ever seemed to
add significant efficacy was Bayesian hierarchical regression, but only for
helping overcome limitations of classical random effects models, not for
adding any greater complexity.

The post author deserves plenty of credit for skills in deep learning, but the
_analysis_ seems unconvincing if it’s supposed to be marketing for using MSFT
cloud for deep learning in finance.

~~~
RyEgswuCsn
Agreed. Also noted three potential flaws:

1. They applied pandas.qcut over the entire returns (i.e. training and
validation set) when generating the target performance labels, which could
compromise the validation.

2. The ordering of news is actually important when it comes to forecasting
the movements of the market; their training/validation split was done after
shuffling the entire dataset, which means the model will have access to
information that shouldn't be available to it during training.

3. It makes more sense to use a training-validation-testing setting rather
than a training-validation split when reporting the model performance in order
to avoid inflated results due to all the hyper-parameter tuning.

~~~
rpedela
What do you mean by training-validation-testing vs training-validation split?
Also why would pandas.qcut compromise the validation?

~~~
mlthoughts2018
If the quantile assignments took into account data from both the training set
and the test set, then it means the training set quantile label itself
contains lookahead information about what the distribution of test set values
is.

This can lead to artificially increased test set accuracy since by optimizing
the model during training, you are partially optimizing based on information
you gained by knowing about the test set in advance.
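The leak can be sketched with synthetic data (this is an illustration, not the post's actual pipeline): fit the quantile edges on the training returns only, then apply those fixed edges to the test returns.

```python
# Sketch of the lookahead leak described above, and the fix: derive the
# quantile bin edges from the training returns only, then apply those
# fixed edges to the test returns. Data is synthetic for illustration.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0, 0.05, size=1000))
train, test = returns[:800], returns[800:]

# Leaky: qcut over the full sample lets train labels encode test quantiles.
leaky_labels = pd.qcut(returns, q=3, labels=["low", "mid", "high"])

# Leak-free: the edges come from the training set alone.
_, edges = pd.qcut(train, q=3, retbins=True, labels=["low", "mid", "high"])
edges[0], edges[-1] = -np.inf, np.inf  # so test values outside the train range still bin
train_labels = pd.cut(train, bins=edges, labels=["low", "mid", "high"])
test_labels = pd.cut(test, bins=edges, labels=["low", "mid", "high"])
```

The outer edges are widened to infinity because the test set may contain returns outside the training range; without that, those rows would silently become NaN.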

Training-validation-testing generally means keeping a portion of the training
set held out for evaluating accuracy and convergence after each full iteration
of training updates. So with each training update you get some information
about whether the model is overfitting, whether it performs unrealistically
well on the validation set (which it’s not trained on), or whether early
stopping criteria are met.

Then, after all that, you move on to checking the final accuracy and metrics
on a fully held out test set.

This validation approach is common in deep learning because of the many
diagnostics you need during training. Without some information from outside
the training set itself, it can be hard to understand how the learning rate is
affecting you, how likely overfitting is, or whether there is a vanishing
gradient problem.

Waiting until the very end of training to get any feedback on these is
sometimes just too inefficient, or too risky: you may discover an issue only
after a huge time sink.
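As a toy sketch of the three-way split (the sizes are illustrative, and a real time-series split would be chronological rather than random):

```python
# Minimal sketch of the train/validation/test protocol described above:
# the validation set guides tuning and early stopping during training;
# the test set is touched exactly once at the end. Sizes are illustrative,
# and a real time-series split would be chronological, not random.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)
y = np.arange(100) % 2

# First carve off the final test set, then split the remainder.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```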

------
a008t
Methodology seems to be poor, like with most publications on finance.

1. How did they split the data for train/validation/test sets? If all of the
sets contain data from the same time period, or the same companies end up in
multiple sets, that is a major flaw. For example, if outperformance was
persistent for the companies over the time period considered, the model may
simply learn to identify specific companies by their filings.

2. What is the variance of the out-of-sample performance? Given that their
dataset is very small, and the model performed badly at predicting high
returns and reasonably well at predicting low returns, what are the chances
of getting those results by luck alone?

3. How has the model performed since then, on the most recent filings?

4. Why use a convnet? Would gradient boosted trees not perform just as
well/better but be more interpretable? Methods like the ones in the eli5
package can help get an idea of why the model makes a particular prediction,
which could help sanity check the model.

~~~
nunya213
1. Are you really assuming that researchers at MSFT screwed up the training
and validation sets??

4. In what way is GBT more interpretable than an ANN??

~~~
a008t
I don't think that was MS researchers - I think they just publicised this bit
of research.

With GBT, you can check why a particular prediction was made - roughly - by
navigating down each tree for a particular sample and summing up the influence
of each feature on the final score. Then you can see if something weird is
having a large effect on your score. Can you do something similar with ANNs?
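The tree walk described above can be sketched with a single sklearn decision tree on synthetic data (a GBT sums many such trees). This crude version only lists which features the sample's path split on; packages like eli5 compute proper per-feature contribution scores.

```python
# Rough sketch of the per-prediction walk described above: follow one
# sample down a fitted tree and record which feature each split used.
# Data is synthetic; a GBT would aggregate this over many trees.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

sample = X[:1]
node_path = tree.decision_path(sample).indices  # nodes visited by this sample
used = [
    tree.tree_.feature[n]
    for n in node_path
    if tree.tree_.feature[n] >= 0  # leaf nodes are marked with -2
]
print("features used on this sample's path:", used)
```

Spotting a weird feature dominating these paths is exactly the kind of sanity check that is much harder to do with a convnet.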

------
maxander
They're attempting something fantastically difficult to do effectively
(arguably beyond human capability) and arguing that their slightly-better-
than-chance results indicate that their approach is promising. More likely,
the system is picking up on some low-hanging-fruit indicators in the data
(perhaps that corporate filings with certain buzzwords are likely to fail, or
ones that reference projects beyond a certain age, or similar) which would
already be easily picked up by human analysts, and anything much beyond that
would be impossible with anything remotely resembling the present method.

I want to posit a general hypothesis (perhaps it's already been said): better-
than-chance performance by a classifier on some dataset is _not evidence_ that
a similar (or any) classifier can perform substantially better on the same
data.

~~~
pasta
I was thinking about weather prediction.

Now that the climate is changing, you see that all the models are becoming
less and less accurate.

So yeah some models might be performing better but overall they are all
performing worse.

~~~
tephra
Has there been any research into the decline of weather prediction models?

Sounds really interesting.

~~~
starpilot
Weather models are _constantly_ being validated and improved. They are getting
better as computation power and grid resolution have increased. The climate is
changing but the laws of physics are not. Also, climate is different from
weather.

------
jacques_chester
A few years ago I spoke to a data scientist who, after a bunch of NLP work,
had hit on a very simple technique for quickly identifying good news and bad
news.

How to tell that it's good news: there are numbers near the top.

How to tell that it's bad news: the figures are buried as deeply as legally
permissible.
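That heuristic is easy to caricature in a few lines (the cutoff here is made up, purely for illustration):

```python
# A toy version of the heuristic above: score a press release by how early
# its first figure appears. The cutoff is arbitrary and purely illustrative.
import re

def numbers_near_top(text, cutoff=0.25):
    """Return True if the first digit appears in the top `cutoff` of the text."""
    m = re.search(r"\d", text)
    if m is None:
        return False
    return m.start() / max(len(text), 1) < cutoff

good = "Q3 revenue rose 12% to $4.2B. " + "Management discussion follows. " * 10
bad = "Management discussion. " * 10 + "Net loss was $1.3B."
print(numbers_near_top(good), numbers_near_top(bad))  # True False
```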

~~~
rebuilder
Ironically, I figured the authors of the article weren't very impressed with
their results since they weren't outlined in the summary...

------
asmithmd1
I have found that if the earnings comments are easy to understand, the stock
is going to go up. If there are lots of qualifiers and they invent new ways to
measure sales, costs and profit, then the stock is going down.

If they mention the weather during the last quarter - run!

~~~
danieltillett
Weather really is the last refuge of scoundrels.

------
et2o
They lose a lot of information by artificially classifying a continuous
variable (% change) into 3 bins (high/medium/low) instead of trying to do
regression on percent change. This incorrect practice is surprisingly
prevalent in deep learning.

~~~
jjn2009
With financial data you might be better off separating out the side of the bet
from the size of the bet and thus it makes sense to just make three bins. The
features to use for direction and volatility are different.

~~~
dx034
Choosing bins for stock returns isn't easy. You could use two bins for
positive vs. negative returns but any other thresholds are arbitrary and
results could depend on the knowledge available when the threshold was chosen.

~~~
jjn2009
It's arbitrary if you make it arbitrary. Depending on the data, you might
target various time intervals to find the optimal way to trade on that
information, just like any hyperparameter, and the bins are a function of that
time. On average, in a 30-minute period, bitcoin moves by 0.2% on high-
liquidity exchanges.

~~~
dx034
But as I understand they used the whole sample to determine bins, including
validation sets. That alone is a big no-go. I didn't mean that you can't find
a proper way to choose bins but there is a lot of potential for errors.

And in the end you want to invest based on the model. Arbitrary bins aren't
that helpful if some of them combine desirable and undesirable (e.g. positive
and negative) results.

~~~
jjn2009
>whole sample to determine bins, including validation sets.

Well, that's not random or arbitrary, just wrong.

>And in the end you want to invest based on the model.

I'm not quite sure what you mean here. Either you determine, via testing on
out-of-sample or even synthetic data, that a model and its hyperparameters
(which include the bins) are correct often enough; or you are talking about
deciding what to do given a single observation. In the latter case, you give
the observation to the model and ask it what it thinks, and the bins are part
of that determination (as done here in the article with a softmax output).
Given you've done the testing, you should have a level of confidence about the
outcome of acting on the model's output. The bins aren't a post-processing
step; the post-processing you might do to trade recall for accuracy might be
to require the bin to have a stronger signal (class probability > 0.5 or
something, and otherwise ignore), but all of this is "the model".

>Arbitrary bins aren't that helpful if some of them combine desirable and
undesirable results

Correct me if I'm wrong but it sounds like you are making the case that there
are models and bins which would have only good outcomes (and those are the
only useful ones)? Am I misunderstanding something here?

------
fernly
So if something like this came into common use, the obvious follow-on would be
"earnings writers" who specialize in phrasing and vocabulary that tickles the
AI. (It would be easy to practice, just feed your draft into the system, get a
score, tweak some words, repeat.) So, reproducing the "SEO" frenzies of a
decade ago.

~~~
mlthoughts2018
This technique generally (applying NLP to 10K’s and other corporate filings)
has been in practice for many years. One common, highly commoditized text
signal is earnings surprise: measuring sentiment in earnings call transcripts,
and comparing sentiment scores with similar metrics from analyst publications
in reaction to the earnings announcement.

The arms race effect you mention absolutely exists. Corporate officers are
often coached extensively on key phrases to repeat in earnings calls, and
phrases to avoid, and how to steer caller questions away from SEO-like speech
patterns they don’t want.

------
Roritharr
I truly wonder when we'll get some deep learning models running on enterprise
Exchange servers, trying to find meaning in the deluge of emails sent across
all large corporates, building fancy dashboards with "actionable" warnings.

I bet someone is going to make a fortune based on the pitch alone.

~~~
stcredzero
Weren't there some bots doing sentiment analysis on Twitter? Also, these were
discovered, then exploited at one point to cause a rapid drop in the stock
markets.

~~~
funfunfunction
Any links with more info would be appreciated :)

~~~
stcredzero
[https://www.forbes.com/sites/davidleinweber/2013/04/24/so-much-for-fund-mining-twitter-sentiment-for-picking-stocks-but-ok-at-the-sec/#590bdc2580df](https://www.forbes.com/sites/davidleinweber/2013/04/24/so-much-for-fund-mining-twitter-sentiment-for-picking-stocks-but-ok-at-the-sec/#590bdc2580df)

[http://www.slate.com/articles/business/moneybox/2015/04/bot_...](http://www.slate.com/articles/business/moneybox/2015/04/bot_makes_2_4_million_reading_twitter_meet_the_guy_it_cost_a_fortune.html)

[http://fortune.com/2015/12/07/dataminr-hedge-funds-twitter-data/](http://fortune.com/2015/12/07/dataminr-hedge-funds-twitter-data/)

------
acou_nPlusOne_t
The training set includes old stock data? How is that correlated with
anything? Steam engines were spectacular stocks, until the car and electricity
came along.

I wouldn't train an AI on stocks - I would train it on history books / old
newspapers to find hidden needs and gaps. People not using half of the day,
due to darkness? People fighting, due to starvation, or reduced chances to
mate and procreate?

An NN could find these needs, and hand them to a different NN that is trained
to find solutions / companies likely to develop solutions.

~~~
daveguy
Except an NN could not find these needs. They are very poor at the
understanding part of language.

------
bitL
It would be interesting, alongside estimating the general "consensus" on stock
performance using NLP, sentiment analysis, etc., to use game theory to find a
contrarian strategy, employed by only a few players, that has the highest
probability of yield: staying one step ahead of the moving average of dominant
expectations, and mastering the equilibrium where the transacted amount
doesn't affect the dominant trend, to keep surfing the wave as long as
possible.

------
itronitron
Related work from 2009 for those interested:

"Financial forecasting using character n-gram analysis and readability scores
of annual reports" by Matthew Butler and Vlado Keselj
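As a rough illustration of the character n-gram half of that approach (the paper's exact pipeline may differ), scikit-learn can extract such features directly:

```python
# Hedged sketch of character n-gram features over filing text, the kind
# of input the cited paper uses; the sentences here are made up, and the
# resulting matrix would feed any downstream classifier.
from sklearn.feature_extraction.text import CountVectorizer

filings = [
    "revenues grew substantially year over year",
    "material weakness in internal controls identified",
]
vec = CountVectorizer(analyzer="char_wb", ngram_range=(3, 3))
X = vec.fit_transform(filings)
print(X.shape)  # (2 documents, number of distinct character 3-grams)
```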

------
djrogers
This feels like the type of thing that, if widespread, could be easily abused -
at least in the short term. Something along the lines of SEO, but for SEC
filings - used to boost a stock price in the short term.

~~~
throwaway2016a
Black Hat SEC?

There was a model to predict upvotes on Hacker News based on the title that
made its rounds a while back. Fortunately we don't have a bunch of submissions
with the title "YC YC YC YC YC YC golang is better than Rust" [1]

[1]
[https://news.ycombinator.com/item?id=14400603](https://news.ycombinator.com/item?id=14400603)

------
yedawg
This is really only my 2c, and I've only really dabbled in BTC/altcoin
trading. In my experience, there is too much entropy generated by the
collective masses transacting to reliably predict most ups and downs without
cheating or using some external "special" knowledge of the market in question.
We will need to wait until the days of strong quantum computers before we can
achieve that level of precision with AI.

~~~
OscarCunningham
By that point the other traders will also have quantum computers, so they'll
be just as hard to predict.

------
noso
This person was on this years ago!
[https://medium.com/@symbols/mayweather-vs-pacquiao-the-fight-from-twitter-and-facebook-77bc547020b9?source=linkShare-4e17da25b976-1527807605](https://medium.com/@symbols/mayweather-vs-pacquiao-the-fight-from-twitter-and-facebook-77bc547020b9?source=linkShare-4e17da25b976-1527807605)

------
thewhitetulip
JP Morgan had a nice quote for these things: "The market fluctuates."

I don't really think we can predict the stock market, because the market
fluctuates every day based on news and things like that.

The stock market is like legal gambling; in the words of Warren Buffett, "The
intelligent earn money off the fools."

~~~
Erlich_Bachman
Warren Buffett has literally made billions by exactly "predicting the stock
market" - the thing that you write is "impossible". So what do you mean?

~~~
thewhitetulip
Well, that isn't "prediction". He invests in sound businesses which are poised
to grow. His investments used to take decades to give a good return.

He did not spit out "100 stocks which will go up tomorrow using my fancy
algorithm".

He researched the people behind the company, the finances, etc.

Wildly different from what algorithms do.

------
kusmi
I started doing this with 10-K reports, but the different formatting between
different years was too much of a pain in the ass. They started embedding
Excel documents in what I think is base64 into text files? I don't know. In
the 90s the tables were in plain text.

~~~
rpedela
Yes, XBRL (XML) files. However they only give you the financials. What OP
wanted was management summary, risk factors, etc which still needs to be
parsed. Luckily many companies now will put anchors to those sections in the
HTML.

------
1024core
Here's a meta question about all such "predicting stock market performance
using X"-type publications: if you have a method that does better than
average, why are you publishing it, and not, say, investing all your money and
making a killing?

~~~
philipodonnell
This is a pretty common response, so I think it's worth pointing out that
there are a lot of reasons why you might have a model you think might be able
to predict the stock market but are not actively using it to make money.

* You may not be eligible to trade that instrument because of where you live or who you are

* You may have found an edge based on historical data in one instrument and expect it could work for another but lack the data necessary to confirm

* You may not have the capital necessary to trade at sufficient scale

* You may not have the programming skills to turn a mathematical model into a high-frequency trading bot with adequate risk controls and then maintain it (waaaaaaaaaay different skillset)

* You may have found a prediction whose error rate is within the transaction costs and thus needs to be traded at higher volume (lower tx costs) to confirm

And before anyone thinks "why not just negotiate to sell it to someone who
does" they get hundreds of random people doing that every day so your chances
of getting noticed are between slim and none.

Now, what _is_ an effective way to benefit from a market prediction model is
to publish it. Those people may hire you to continue developing it further or
to refine it for other markets you didn't think of. No model lasts forever, so
having demonstrated the ability to find one makes it more likely you will do
so in the future, and thus should attract big-money salaries from people with
the capital for your next one.

Not to say that these are legit, just pointing out that they don't deserve
hand-wavy dismissal, is all. I wish authors would start a paper with something
like this just to address this inevitable comment. "We aren't trading this
because we can't trade SLURM futures" would really help readers out if they
also can't.

------
arwhatever
The Rockwell Retroencabulator ...

~~~
martin1975
which one, the original or the turbo?

------
AdamM12
I don't think this is very useful outside of short-term fluctuations. At the
end of the day, analysis of the 3 financial statements (balance sheet, cash
flow, profit/loss) is what drives price.

~~~
cycrutchfield
It turns out you can make a lot of money off those short term fluctuations.

~~~
AdamM12
Yeah I mean that is true. Almost had an edit to mention that.

------
hellofunk
Anyone have experience with Azure? Nearly everyone I see personally in
industry is doing their research on AWS. This article is obviously trying to
market Azure, which makes me curious.

~~~
pasta
Azure is almost as big as AWS. I think Azure is more used for enterprise SaaS.

~~~
origami777
According to whom? And what is included? Microsoft has a history of blurring
the lines here. I think the only way you get them being equal is by combining
internal use and/or their SaaS offerings. I don't think it's a fair comparison
if you do that.

~~~
pasta
Maybe you like this article: [https://www.redpixie.com/blog/microsoft-azure-aws-guide](https://www.redpixie.com/blog/microsoft-azure-aws-guide)

I think this is a fair and in depth comparison.

~~~
hellofunk
> I think this is a fair and in depth comparison.

redpixie is a MS partner, so I don’t expect it to be fair.

[https://www.redpixie.com/blog/iaas-paas-saas](https://www.redpixie.com/blog/iaas-paas-saas)

“As a Microsoft Partner, we focus on Microsoft Azure IaaS solutions”

------
thrrrowwy
Nothing like an article written 6 months ago about a method with 50% accuracy.
The dime in my pocket does just as well and is much simpler :)

------
tshanmu
This was being promoted to me an insane amount on Twitter - now on HN as well?

------
egusa
Really interesting article. I imagine the financial markets are one industry
a lot of people apply deep learning models to.

------
cup-of-tea
If you learn how to predict the stock market and reveal your methods then you
no longer know how to predict the stock market.

------
dogruck
This is from December 2017.

~~~
hbcondo714
Yeah, the GitHub repo on this looks quiet; no responses to the issues posted.

------
mark212
This sounds like the plot of a mediocre thriller. Oh wait, it is.[1]

[1] [https://www.theguardian.com/books/2011/sep/30/fear-index-robert-harris-review](https://www.theguardian.com/books/2011/sep/30/fear-index-robert-harris-review)

~~~
mdellavo
also
[https://en.wikipedia.org/wiki/Pi_(film)](https://en.wikipedia.org/wiki/Pi_\(film\))

