
Using Reinforcement Learning in the Algorithmic Trading Problem - godelmachine
https://arxiv.org/abs/2002.11523
======
traK6Dcm
As someone who has written about this previously [0], worked briefly in HFT
before, and read dozens of papers on the subject, I can say with very high
confidence that the results are not to be trusted. This paper, just like
pretty much any academic paper on the subject, ends with a backtest on
historical data, not a real system.

Not only is it (very!) easy to overfit backtests (especially with as little
data as they are using here), but backtests are nothing like the real world. In
the real world there are HFT traders front-running you, latency, jitter, fees,
hidden order types, slippage, and a lot of other complexities that don't fit
into a short HN post. Whenever you see a paper ending with a backtest you can
already assume it's BS.
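To make the cost point concrete, here's a toy sketch (every number is invented for illustration, not taken from the paper) of how a gross edge that looks healthy in a backtest can vanish once per-trade fees and slippage are charged:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up numbers for illustration: noisy per-trade gross PnL with a tiny
# average edge, measured in basis points, minus assumed fees and slippage.
n_trades = 10_000
edge_bps = 0.5        # assumed average gross edge per trade
fee_bps = 0.3         # assumed exchange/broker fee per trade
slippage_bps = 0.4    # assumed average slippage per trade

gross = rng.normal(loc=edge_bps, scale=10.0, size=n_trades)
net = gross - fee_bps - slippage_bps  # costs are subtracted from every trade

print(f"gross PnL: {gross.sum():9.1f} bps")
print(f"net PnL:   {net.sum():9.1f} bps")
```

With these made-up parameters the per-trade costs (0.7 bps) exceed the assumed edge (0.5 bps), so the net total will usually come out negative even though the gross total looks great. Papers that backtest at the touch price are effectively reporting only the first line.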

It's similar to training a robot in an extremely simplified 2D simulation
environment without physics or other interactions, and then claiming one has
built a real robot. A mistake many people make is believing that trading is
all about AI. But in reality, the model often matters less than
infrastructure/latency/system/data issues.

In addition to that, people who are actually "good" at trading don't publish
papers, they silently make money. Papers are typically published by academics
or students who have never built anything profitable but would like to put a
paper on their resume. I have yet to see a single good academic paper about
trading.

[0] [https://www.tradientblog.com/2019/11/lessons-learned-buildin...](https://www.tradientblog.com/2019/11/lessons-learned-building-an-ml-trading-system-that-turned-5k-into-200k/)

~~~
jawns
> people who are actually "good" at trading don't publish papers, they
> silently make money

I've long understood that this was true. It makes intuitive sense.

But are there any cases where it is not true?

Is it possible to "spread the wealth" when it comes to trading, or any
money-making endeavor?

Or does it always reduce down to "I win only because you lose"?

~~~
amiga_500
> Or does it always reduce down to "I win only because you lose"?

It is a zero-sum game. Nobody is producing anything, so for one participant to
win, another must lose.

~~~
traderthrow454
I think that's an overly simplistic view of things. The market is big and many
participants trade at different frequencies. Large pension funds need
liquidity to move big blocks of stock for their quarterly and monthly
rebalances, and the big medium term statistical arbitrage traders provide
liquidity for them to do so. HFT players provide liquidity for the stat arb
players. The classes of participants with different frequencies actually help
one another, while there is competition for alpha within strategies with
similar holding periods. Overall this creates an extremely efficient and
liquid system for valuing and exchanging equity - the very system that
empowers YCombinator and other Venture investors to make VC investments
knowing that their winners will eventually IPO or be bought by public
companies.

~~~
amiga_500
I knew someone would come in with "liquidity".

Many HFT firms pull out when things get volatile, which is exactly when
liquidity is actually required.

Ultimately HFT is doing nothing of societal value; the race down to zero
latency is never-ending and we are wasting huge amounts of resources on a
totally pointless march towards zero. Exchanges should introduce random delays
to accommodate market participants who really want to hedge / buy / sell, and
then we can shift some of those resources to the real world. The costs
required to compete at the lowest latencies are large, forcing small and
medium players out of the game, which is also bad.

The system is hugely inefficient. As latencies get lower, the costs get ever
higher, for an essentially identical end result. The law of diminishing returns.

~~~
traderthrow454
My initial comment was discussing speculative trading in general, but since
you mostly brought up some common anti-HFT tropes I might as well address
them.

> Many HFT jump out when things get volatile, when liquidity is actually
> required.

Do you have a citation on that? If you look at the preliminary Q1 results of
Virtu Financial [0] (the only publicly traded HFT firm), they seem to be doing
more trading than ever in these volatile markets.

> Ultimately HFT is doing nothing of societal value, the race down to zero is
> never-ending and we are wasting huge amounts of resources on a totally
> pointless march towards zero.

HFT is a mature industry. Latencies have mostly stabilized, and profitability
is way down in the last few years. Many firms are merging or consolidating. So
in the past few years society has actually been spending fewer resources on
HFT - both financially and in terms of human capital - than it did in the
past.

> Exchanges should introduce random delays to allow market participants who
> really want to hedge / buy / sell, then we can shift some of the resources
> to the real world.

IEX has been doing something relatively similar to that for a few years now.
They have ~3% of US equities market share. People have the option of trading
there, but they mostly choose not to.

> The system is hugely inefficient. The costs as latencies get lower are ever
> higher, for an extremely similar end result. The law of diminishing returns.

Due to consolidation, costs are actually decreasing. Could it be that the
market is... working?

[0] - [https://ir.virtu.com/press-releases/press-release-details/20...](https://ir.virtu.com/press-releases/press-release-details/2020/Virtu-Announces-450-Million-of-Additional-Broker-Dealer-Borrowing-Capacity-and-Preliminary-Quarter-to-Date-Results/default.aspx)

~~~
twic
> If you look at the preliminary Q1 results of Virtu Financial [0] (the only
> publicly traded HFT firm), they seem to be doing more trading than ever in
> these volatile markets.

Similar story from Flow Traders:

[https://www.flowtraders.com/sites/flow-traders/files/quarter...](https://www.flowtraders.com/sites/flow-traders/files/quarterly_results/2020/Flow%20Traders%20Q120%20Trading%20Update.pdf)

~~~
amiga_500
Everyone is; volumes are hugely up. The point about liquidity concerns sudden
market shifts, not a whole quarter!

~~~
auntienomen
Perhaps you don't follow the news? This was a quarter rich in sudden market
shifts.

~~~
amiga_500
Thanks, I do follow the news. If you read the threads above again you will see
that both posters are fully aware of elevated volume; the distinction was
between HFT melting away during short periods of vol and wider "liquidity"
from HFT.

So what seemed like a quick drive-by wasn't actually correct.

------
lordnacho
Former hedge fund and HFT quant trader here. There are a lot of papers to be
found claiming some sort of strategy. I don't want to go to cynicism
immediately. But we'll get there:

\- Trading isn't just about deciding what to buy and sell, the sexy part that
everyone thinks is great. I even had colleagues who thought they were special
because they worked closer to the strategies, which meant that certain less
glamourous parts were neglected.

\- Less glamourous parts like coding the software to read in the market data
and send out orders.

\- Less glamourous parts like schmoozing with brokers to get them to lower
your costs.

\- And maintaining infrastructure, which somehow people think should come as
part of coding.

Now I'm not saying that RL won't help you. It's just that focusing on the
"intelligent" part of the trading system tends to lead to disappointment, as
you discover restrictions on your model that you hadn't thought of. Things
like finding out that short selling was prohibited during the period in which
your backtest was shorting.

My main red flags when reading papers are:

\- Choosing a dataset from a small market. Basically any market that isn't the
US or Western Europe large caps. You'll discover both price impact and high
fees quite late in the game.

\- Choosing a very small subset of the market. Smaller n, more noise and
overfitting.

\- Short periods. N again.

\- Long intervals between decision making. N again again.

That's not to say there's nothing useful to be read though. You might be
inspired by something you come across.

~~~
conformist
Yes, this.

The small subsets and super high Sharpe ratios look suspicious.

Further red flags in this particular case:

\- Completely unclear what kind of data they're using. Are they assuming they
can buy and sell one individual contract at the bid price each minute? Or did
I miss some crucial information about bid-ask spreads?

\- Abstract mentions a profit, not an information ratio/Sharpe ratio or
anything similar.

\- During training they need to tweak the reward function in order to not end
up with "buy and hold"? How good is their strategy compared to buy and hold?

\- Plots without proper labels.

------
fnbr
An additional problem is that they use A3C for trading. A3C is known to be
unsuitable for adversarial environments (e.g. board games, like Chess).

I wrote a paper that demonstrated that A3C is as exploitable as a uniform
random strategy in board games (specifically, some poker variants):
[https://arxiv.org/abs/2004.09677](https://arxiv.org/abs/2004.09677)

(Exploitable is a technical term that is defined in the paper; basically, it's
"how much can someone who knows everything about your strategy beat you by?")

So I would be very surprised if this survives contact with other traders.

~~~
MasterScrat
> A3C is known to not be suitable for adversarial environments

Interesting! What are the main papers in this area?

Any intuition why this is the case? is it because A2C generally results in
brittle policies?

~~~
fnbr
It’s mostly an issue that A2C isn’t designed for adversarial environments. It
also doesn’t have any notion of hidden information, while other algorithms (eg
CFR) explicitly handle this. There’s a well-known phenomenon of cycling, where
agent A will beat agent B, which beats agent C, which beats agent A; A2C can
exhibit this. Think of rock/paper/scissors: AlwaysRock beats AlwaysScissors,
which beats AlwaysPaper, which in turn beats AlwaysRock. To avoid this, you
typically need to do some sort of averaging.
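The cycling can be sketched in a few lines of Python (a toy illustration, not from the paper):

```python
import itertools

# Rock-paper-scissors payoff for player 1: +1 win, 0 tie, -1 loss.
MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(a, b):
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1

def vs(strategy_a, strategy_b):
    """Expected payoff of mixed strategy a against mixed strategy b."""
    return sum(pa * pb * payoff(ma, mb)
               for (ma, pa), (mb, pb)
               in itertools.product(strategy_a.items(), strategy_b.items()))

always_rock = {"rock": 1.0}
always_paper = {"paper": 1.0}
always_scissors = {"scissors": 1.0}
uniform = {m: 1 / 3 for m in MOVES}

# Pure strategies cycle: each one beats one and loses to another.
assert vs(always_rock, always_scissors) == 1
assert vs(always_scissors, always_paper) == 1
assert vs(always_paper, always_rock) == 1

# The averaged (uniform) strategy is unexploitable: expected payoff 0 vs anything.
for pure in (always_rock, always_paper, always_scissors):
    assert abs(vs(uniform, pure)) < 1e-12
```

A best-response learner like A2C can chase itself around that cycle indefinitely; averaging schemes converge to the uniform mix, which nothing can exploit.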

The AlphaStar paper and blog post do a good job of discussing these issues, as
they had similar problems. I’d say that’s a great starting point (and then
follow their references).

Blog post:

[https://deepmind.com/blog/article/alphastar-mastering-real-t...](https://deepmind.com/blog/article/alphastar-mastering-real-time-strategy-game-starcraft-ii)

------
1024core
Every time a paper like this comes out, I have to ask myself: if they knew how
to make money like this, why would they tell anyone?

~~~
cultus
It's a weird subject for academic study in the first place. Trying to do
trading strategies that beat the market is one of the few things that really
is a zero-sum game. In the absence of independent scientific interest in
optimizing these strategies, what's the point? You might as well study how to
optimize strategies for ultimate frisbee or something.

~~~
auntienomen
It's probably best understood as either a job audition, fund-raising, or
mathematical entertainment.

~~~
cultus
Sure, I find mathematical finance absolutely fascinating. I just don't think
it's worth putting a lot of research energy into. It's usually the boring
stuff that matters.

------
henning
It looks like their code is assuming a fixed commission of 20 rubles, which is
apparently equivalent to US$0.27. Does anyone know if that's realistic?
[https://github.com/evgps/a3c_trading/blob/master/configs.py#...](https://github.com/evgps/a3c_trading/blob/master/configs.py#L24)

Depending on how liquid the market they studied is, code that assumes there is
never any slippage may not be very realistic.

There's no comparison to a simple buy-and-hold strategy, which may be less
interesting from a computer science perspective but is a good way to avoid
spending lots of money on transaction costs.

(I once posted my own algorithmic trading project to HN that had very flawed,
naive assumptions about what trades could be executed.)

~~~
hhmc
IIRC MOEX is particularly expensive to trade on, and costs can be non-linear
-- but something like 1bps of comms is a reasonable approximation as an upper
bound. In the paper they claim a cost of 2.5 RUB per transaction, not the 20
in the config file.

Edit: It looks like the comms are indeed around 10 RUB per side (which is
approx 1bps).
[https://www.moex.com/en/contract.aspx](https://www.moex.com/en/contract.aspx)

If the model is trading single lots then this is a reasonable cost assumption;
otherwise it isn't. The paper using 2.5 RUB as costs is unreasonable.

------
starpilot
This is just fitting on noise. The vast majority of movements are random and
no more predictable than a coin flip. Before training, your job is to extract
that extremely weak signal, _then_ train.

Try generating a time series in Excel with Brownian noise and watch as it
becomes indistinguishable from a price chart.
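The same experiment in Python, for anyone who prefers it to Excel (parameters invented):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate a "price" as a geometric random walk: pure noise, zero drift.
n_minutes = 5 * 24 * 60                       # about 5 days of minute bars
returns = rng.normal(0.0, 0.0005, n_minutes)  # 5 bps stdev per bar, no signal
prices = 100.0 * np.exp(np.cumsum(returns))   # start at 100

print(f"start {prices[0]:.2f}, end {prices[-1]:.2f}, "
      f"high {prices.max():.2f}, low {prices.min():.2f}")
```

Plot `prices` and it will show trends, "support levels", and apparent patterns, despite being generated with zero predictable structure.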

~~~
stainforth
What if someone did find a pattern in Brownian noise...

------
outlace
Anyone interested in delving into reinforcement learning should check out our
book [https://www.manning.com/books/deep-reinforcement-learning-in...](https://www.manning.com/books/deep-reinforcement-learning-in-action) \- we
cover everything from actor-critic (used in this paper) to multi-agent to
relational/attention models.

~~~
jmeister
Thanks, looks very useful.

Since you seem to be industry practitioners: I moved away from RL 10 years
back, disillusioned with its lack of real-world applicability. Has that
changed significantly? The only major name I’ve heard of is Vowpal Wabbit.
Maybe there are more applications being done in stealth. Any insight? Thanks

~~~
johnmoberg
You might be interested in the recently launched Covariant
([https://covariant.ai/](https://covariant.ai/)); they apparently have systems
in production. Pieter Abbeel is one of the founders and they have some pretty
"heavy" investors, like Jeff Dean, Geoffrey Hinton, and Yann LeCun.

~~~
jmeister
Thanks! OpenAI has made some progress on this problem too:
[https://news.ycombinator.com/item?id=21259765](https://news.ycombinator.com/item?id=21259765)

Looks like these two places are on the cusp of a major breakthrough in
RL/robotics!

------
iliicit
i've been trading crypto in large volumes at high frequencies for quite some
time now. my models were plain as yogurt feed-forward neural nets. i would
engineer some dumb features, sample the data at random, assign the labels
(that translate into trading decisions), and train the model. then push to
prod, sit back, and relax while the balance grows like a mushroom cloud. just
kidding, before that i would grow gray hair while backtesting, debugging
issues, etc.

one of the hard problems was labeling the data. knowing that the price is
going up 10 bps one minute from now, should i buy? maybe. but what if it's
going to crash 100 bps right after this? probably should sell instead.
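a toy version of the labeling dilemma (thresholds and horizons invented, nothing like my real setup):

```python
def label(prices, i, short_h=1, long_h=5, up_bps=10, down_bps=100):
    """toy label: 'buy' only if the short horizon is up AND no crash follows;
    'sell' if the longer horizon crashes; else 'hold'."""
    short_ret = (prices[i + short_h] / prices[i] - 1) * 1e4  # return in bps
    long_ret = (prices[i + long_h] / prices[i] - 1) * 1e4
    if long_ret < -down_bps:
        return "sell"
    if short_ret > up_bps:
        return "buy"
    return "hold"

# up 15 bps after one bar, but down 120 bps after five bars
prices = [100.0, 100.15, 100.2, 100.1, 99.5, 98.8]
print(label(prices, 0))  # → sell: the crash dominates the short-term pop
```

even this toy version bakes in a bunch of arbitrary choices (horizons, thresholds, which rule wins), which is exactly the problem.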

reinforcement learning promises to eliminate the need to assign labels in the
training data. the agent will try a bunch of different variants at random and
eventually will choose the optimal one given the state of the world, i.e. the
state of the markets. at training time i only need to feed it the features
data. another benefit is that backtesting and model training are sort of fused
into a single process. the rl model is optimizing pnl, not the label
classification score (as in the nn model). with a proper train-test-validation
split, the most performant rl model can go straight into production (helping
me keep some of my hair brown).

while all the bits and pieces seem straightforward, i never managed to tune an
rl model to work better in the backtest than the good old nn models. maybe i
have never been closer to the gold vein, but for now i have abandoned my
efforts to build a performant rl agent in favor of nn models.

amen.

------
anjc
Many people are criticising their backtest but I don't understand why. Their
test data is sequential to their training data, respects time ordering, and
doesn't overlap. They can't overfit to their test data. In any other area of
ML this would be an acceptable scheme; why is it unacceptable here?

~~~
traK6Dcm
What people are complaining about is not the overfitting, but the unrealistic
assumptions in the backtest. In the real world there is slippage,
latencies/jitter, special market open regimes, hidden orders, market impact,
front-running, variable fees, and all kinds of other complexities. Their
transaction costs are apparently also an unreasonable assumption.
Sophisticated simulators used in professional trading firms can account for
such things to some extent, but most academic papers conveniently ignore these
complexities and just assume they can trade at whatever price the data tells
them. It's completely unrealistic.

To answer your original question about overfitting, they can still overfit to
test data by running a lot of experiments with different hyperparameters,
architectures and parts of the data, and only report what has worked. There
are also more complex ways that test data can leak into training data (see the
book Advances in Financial Machine Learning for a good overview). You can already see this
is likely the case just from the variance in their results and trades. They
also don't compare to baselines. It's not unlikely that the results are just
random and they fail to report those experiments that didn't work. Of course,
you cannot prove this without having an exact log of all things they ever did
to the data. But again, that's not the main issue here.
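A toy demonstration of this selection effect (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(7)

# Selection bias sketch: 1000 strategies that are pure coin flips on the SAME
# test data. None has any true edge; report only the best and it looks like skill.
n_strategies, n_days = 1000, 250
daily_pnl = rng.normal(0.0, 0.01, size=(n_strategies, n_days))  # zero true edge

# Annualized Sharpe ratio of each strategy on the shared test set.
sharpe = daily_pnl.mean(axis=1) / daily_pnl.std(axis=1) * np.sqrt(252)

print("true edge of every strategy: 0")
print(f"best reported Sharpe out of {n_strategies}: {sharpe.max():.2f}")
```

Every strategy is noise, yet the best of the 1000 will typically report an annualized Sharpe around 3, which most funds would brag about. Publish only that one and you have a paper.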

~~~
anjc
> They also don't compare to baselines

This is certainly worth criticising

> they can still overfit to test data by running a lot of experiments with
> different hyperparameters, architectures and parts of the data, and only
> report what has worked.

but this is a different accusation from accidentally overfitting or leaking,
i.e. it would mean that they're dishonest and cherrypicked their data in such
a way that it hides overfitting and leakage. This criticism can be levelled at
every ML paper, but in this case they detail their architecture, provide the
code, and provide a Jupyter notebook to let people try it themselves.

> just assume they can trade at whatever price the data tells them. It's
> completely unrealistic.

I think that this is a fair assumption for highly liquid markets and
relatively small trades, and if it's a fair assumption then all of your
criticisms (slippage etc) don't apply to the extent that they'll break the
approach. Also, if the approach works, then trade size (fees aside) and being
front-run also won't apply, because presumably large HFT firms can use it.

Overall I think your criticisms are valid, but imo they don't invalidate a
promising approach, they're just the next thing to test.

------
mD5pPxMcS6fVWKE
Problem is, there is competition, so any "feature" your ML has discovered will
fade away as other MLs discover it too. So the market has a tendency to become
a completely featureless stochastic system. Temporary features can exist,
though.

------
nycdatasci
All profitable automated trading strategies that I'm aware of target a
specific inefficiency in the market. What is the inefficiency here? If you
can't articulate the inefficiency, it's probably best not to employ the
strategy.

~~~
phobar
Interesting, could you describe one inefficiency that was exploited in the
past? I could imagine buying/selling due to spreads between exchanges, but is
there another, less obvious example?

~~~
IMAYousaf
I can describe one inefficiency from sports gambling. There is a famous NBA
Gambler named Haralabos Voulgaris. He realized that the points total
prediction for a game, let's say 100 points scored for Team A, was merely
sliced in half to represent the half-time score. However, the pace of the
first half is markedly different from the pace of the second half, thus points
are scored at an uneven clip. He exploited that inefficiency for a while to
great success.

Like sports gambling, a lot of the financial products we trade are built by
humans using rules, and there is money in arbitraging the intrinsic rules and
regulations around those products. Think about Forex trading, where you
convert currency into currency. One of the key strategies is to identify brief
negative cycles - for example, in the hope that converting US Dollars to Euros
to Yen back to US Dollars leaves you with more dollars than you started with.
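The negative-cycle idea can be sketched with made-up rates (real quotes stay consistent to within the spread, so discrepancies like this one are tiny and fleeting):

```python
from itertools import permutations

# Hypothetical, deliberately inconsistent rates: rates[a][b] = units of b per unit of a.
rates = {
    "USD": {"EUR": 0.92, "JPY": 151.0},
    "EUR": {"USD": 1.08, "JPY": 166.0},
    "JPY": {"USD": 1 / 152.0, "EUR": 1 / 167.0},
}

def cycle_return(path):
    """Multiply rates around a cycle; > 1.0 means free money (before costs)."""
    total = 1.0
    for a, b in zip(path, path[1:] + path[:1]):
        total *= rates[a][b]
    return total

# Brute-force every 3-currency cycle. (Bellman-Ford on -log(rate) edge
# weights generalizes this to longer cycles in bigger graphs.)
for path in permutations(["USD", "EUR", "JPY"]):
    r = cycle_return(list(path))
    print(" -> ".join(path + path[:1]), f"{r:.6f}", "(arb!)" if r > 1.0 else "")
```

With these invented numbers, USD -> EUR -> JPY -> USD returns slightly more than 1.0; in practice fees, spreads, and latency usually eat a discrepancy that small before you can complete the loop.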

------
formalsystem
So many papers by "smart" people predicting the past

~~~
anjc
How do you propose that anyone evaluate their algorithm on the future?

~~~
formalsystem
start a fund with your ML approach, then evaluate how much money you've made
after 1m, 6m, 1y, 2y, 10y, etc.

------
person_of_color
I’d like to know what Rentec does. Or whether Rentec even does HFT?

------
hasa
Human greed is endless. Using AI for stock trading is unethical and a waste of
resources. We are sick.

~~~
mardifoufs
AI has been used in stock trading for decades now. Also, why does it even
matter? Why is it unethical to use AI for trading to begin with?

