
Ask HN: How credible is reinforcement learning in finance (stocks)? - critiq
Recently came across a blog post on reinforcement learning applied to stock data for trading. However, in a reinforcement learning setting the agent should be able to play with a simulated environment to explore and learn. Here there is no way to simulate the price: all the post does is take a historical fragment of prices and replay it.

In my mind this is equivalent to historical data, labeled and resampled with redundancy after shuffling. Am I missing something here?

The blog I was reading was: https://towardsdatascience.com/deep-reinforcement-learning-for-automated-stock-trading-f1dad0126a02
======
shoo
> However there is no way to simulate the price

Compared to other real-world problem domains, which are far less computerised,
it seems relatively simple to get fresh data for a trading system: let the
system execute actual trades and measure the actual response.

It might be difficult to do this without a budget to burn on running
experiments: money that will be lost while the system makes poor trades,
controls to ensure that not too much can be lost in any single experiment, and
subscriptions to data feeds that give a richer view of how the market responds
(e.g. order book info).
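The loss-limit controls can be a very thin layer around the trading loop; a minimal sketch (the `BudgetGuard` class and the PnL numbers are hypothetical, purely for illustration):

```python
class BudgetGuard:
    """Halts a live trading experiment once cumulative losses hit a cap."""

    def __init__(self, max_loss: float):
        self.max_loss = max_loss
        self.pnl = 0.0
        self.halted = False

    def record_fill(self, pnl_change: float) -> None:
        self.pnl += pnl_change
        if self.pnl <= -self.max_loss:
            self.halted = True  # stop submitting new orders

    def may_trade(self) -> bool:
        return not self.halted


guard = BudgetGuard(max_loss=1000.0)
for pnl in [-300.0, 150.0, -600.0, -400.0]:  # simulated per-trade PnL
    if not guard.may_trade():
        break
    guard.record_fill(pnl)

print(guard.halted)  # True: the cumulative loss of 1150 breached the 1000 cap
```

In a real system the same check would sit between the strategy and the order gateway, so a runaway model physically cannot keep losing money.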

In contrast, consider trying to get fresh data in materials science, where you
may need to manufacture small batches of materials and then test them. It
might cost $10k in materials and weeks of work with expensive machinery and
skilled technicians to generate a dozen or so fresh data points.

------
alexmingoia
The problem I see with ML on stock prices is that the stock price is not a
function of the various available numerical indicators. Markets are
non-deterministic, chaotic systems. Contrast that with driving, where the
decision to turn left, turn right, throttle, etc. is largely a function of the
features (surroundings, current speed, etc.).

That said, people are using ML to predict future prices, although no hedge
fund claiming success has published its returns.

With regards to reinforcement learning, why not just try it? Numer.ai is an ML
competition that gives you free obfuscated stock data and rewards you if your
model is successful.

~~~
critiq
Yeah thanks, I have tried. I was comparing and found that reinforcement
learning is not really _reinforcement_ in this setting, as it does not allow
exploring "what if" scenarios. It is closer to a supervised learning setting,
as when we label large datasets and try to classify, etc.

In contrast, if we take
[https://gym.openai.com/envs/Pendulum-v0/](https://gym.openai.com/envs/Pendulum-v0/),
the environment allows the algorithm to play with it and learn from new
conditions and actions, so it is reinforcement. To me, finance is like a
teacher trying to tell the same story no matter what the student does or
learns, which is a bit weird.
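The distinction can be made concrete: in a Gym-style environment the next observation depends on the chosen action. A toy sketch of that feedback loop (the dynamics here are invented for illustration and much simpler than the real Pendulum):

```python
class ToyPendulum:
    """Gym-style toy environment: the action (torque) feeds back into the state."""

    def __init__(self):
        self.angle = 1.0
        self.velocity = 0.0

    def step(self, torque: float):
        # The action changes the velocity, which changes the angle:
        # exploring different actions reveals different futures.
        self.velocity += 0.1 * torque - 0.05 * self.angle
        self.angle += self.velocity
        reward = -abs(self.angle)  # reward for staying near upright
        return self.angle, reward


env_a, env_b = ToyPendulum(), ToyPendulum()
state_a, _ = env_a.step(torque=1.0)
state_b, _ = env_b.step(torque=-1.0)
print(state_a != state_b)  # True: different actions lead to different states
```

A historical price replay fails exactly this test: whatever the agent does, the next observation is the same.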

------
shoo
> In my mind it is equivalent to historical data labeled and resampled with
> redundancy after shuffling

It is even more constrained than that: there can be no shuffling along the
time dimension, as that would destroy the relationship between time and the
price trajectory (e.g. momentum). It is possible they could have sampled
subsets of the full universe of 30 stocks to generate different training
scenarios, but the blog post doesn't talk about anything related to that.
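Had the authors wanted such scenarios, sampling stock subsets is straightforward; a sketch (the ticker names and scenario counts are hypothetical stand-ins for the Dow 30 universe):

```python
import random

# Stand-in universe for the 30 stocks in the blog post.
universe = [f"STOCK_{i:02d}" for i in range(30)]


def sample_scenarios(universe, n_scenarios=5, subset_size=20, seed=0):
    """Draw random stock subsets, each defining one training scenario."""
    rng = random.Random(seed)  # seeded so scenarios are reproducible
    return [sorted(rng.sample(universe, subset_size)) for _ in range(n_scenarios)]


scenarios = sample_scenarios(universe)
print(len(scenarios), len(scenarios[0]))  # 5 20
```

Each subset would then be replayed as its own training trajectory, giving the agent some variety beyond the single full-universe path.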

> agent should be able to play with simulated environment to explore

The work makes the assumption that the environment is not impacted by the
agent's actions, see section 3.3:

> Market liquidity: The orders can be rapidly executed at the close price. We
> assume that stock market will not be affected by our reinforcement trading
> agent.
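Under that assumption the "environment" reduces to a replay of the recorded price path: the agent's action changes its own reward (PnL) but never the next observation. A minimal sketch of such a replay environment (class and variable names are my own, not from the paper):

```python
class ReplayEnv:
    """Replays a fixed price series; actions affect reward (PnL), never prices."""

    def __init__(self, prices):
        self.prices = prices
        self.t = 0

    def step(self, position: float):
        # Reward is position * price change; the next price is predetermined.
        reward = position * (self.prices[self.t + 1] - self.prices[self.t])
        self.t += 1
        return self.prices[self.t], reward


prices = [100.0, 101.0, 99.5]
env_long, env_short = ReplayEnv(prices), ReplayEnv(prices)
obs_long, _ = env_long.step(position=1.0)
obs_short, _ = env_short.step(position=-1.0)
print(obs_long == obs_short)  # True: the action cannot influence the price path
```

This is the structural difference the OP is pointing at: exploration here can only vary the reward signal, never the state transitions.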

It's interesting to try to roughly estimate how many non-overlapping windows
of market data were used in training the ensemble strategy. What I really want
is to estimate how many independent samples of input data there are to train
on, but there are probably no truly independent samples, since we're talking
about historical trajectories of stock prices over time. For the window size
we can try to figure out how much time series data the ensemble model needs to
consume to output a single prediction.

The ensemble strategy is described as:

> Step 1. We use a growing window of 𝑛 months to retrain our three agents
> concurrently. In this paper, we retrain our three agents at every three
> months.

> Step 2. We validate all three agents by using a 3-month validation rolling
> window followed by training to pick the best performing agent which has the
> highest Sharpe ratio. We also adjust risk-aversion by using turbulence index
> in our validation stage.

> Step 3. After validation, we only use the best model with the highest Sharpe
> ratio to predict and trade for the next quarter.
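As I read it, steps 1-3 amount to the following rolling loop. This is a paraphrase, not the paper's code: the window sizes and Sharpe-based selection are as quoted, but the `agents` interface (stand-ins for the paper's three RL agents) is hypothetical:

```python
def rolling_ensemble(data, agents, train_end, val_months=3, trade_months=3):
    """Sketch of steps 1-3: growing-window retrain, validate, trade best agent.

    `data` is a sequence indexed by month; each agent exposes hypothetical
    train/sharpe/trade callables.
    """
    results = []
    t = train_end
    while t + val_months + trade_months <= len(data):
        train_window = data[:t]                       # step 1: growing window
        val_window = data[t:t + val_months]
        for agent in agents.values():
            agent.train(train_window)
        best = max(agents.values(),                   # step 2: highest Sharpe
                   key=lambda a: a.sharpe(val_window))
        trade_window = data[t + val_months:t + val_months + trade_months]
        results.append(best.trade(trade_window))      # step 3: trade a quarter
        t += trade_months                             # retrain every 3 months
    return results
```

Written this way it is visible that every trading quarter consumes a validation quarter plus the entire growing training window before it, which is what makes the effective number of independent samples so small.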

Steps 1, 2 and 3 depend on a number of numerical and structural parameters,
such as the sizes of the windows in steps 1 and 2, the choice of metric used
to pick the best agent, the "turbulence index" adjustment, and the use of
argmax instead of some other approach in step 3. It is possible that the
researchers wrote down and pre-committed to exactly these parameters before
looking at any data, and never changed them, but it is more likely that these
parameters were chosen out of a huge space of alternatives based on what was
observed while running experiments, before finally measuring performance on
the hold-out data set.

When thinking about automated ensemble strategies, the above 3 steps all need
to be executed before the ensemble model can output a single prediction. I
don't quite understand the explanation of step 2, but it suggests that the
ensemble model needs to consume at least 3 months + 3 months = 6 months of
trailing data before it can output a single prediction. In reality it would be
worse: many of the per-stock input features appear to be technical indicators,
i.e. some fancy form of moving average, so they too need to consume a trailing
window of price data -- these details aren't described in the blog. If we
assume that each of these features needs at most 1 month of trailing price
data, then we need 1 month + 3 months + 3 months = 7 months of trailing price
data before the ensemble model can output a single prediction.

> Data from 01/01/2009 to 12/31/2014 is used for training, and the data from
> 10/01/2015 to 12/31/2015 is used for validation and tuning of parameters.
> [...] we test our agent’s performance on trading data, which is the unseen
> out-of-sample data from 01/01/2016 to 05/08/2020

So data from 01/01/2009 to 12/31/2014 is used for training, and data from
10/01/2015 to 12/31/2015 was also used to tune parameters, i.e. also used for
training. That is 6 years plus 3 months, i.e. 75 months of data for training,
which is enough to generate about 75 months / 6 months ≈ 12 non-overlapping
windows of data that could be used to generate outputs from the ensemble
model. Then there's enough out-of-sample data (about 52 months) for about
52 months / 6 months ≈ 8 non-overlapping windows of data to evaluate the
model performance.

It seems like this work involves fitting and tuning a model with a very large
number of parameters using a dataset that only offers a single trajectory
(containing all the stocks) of roughly 12 non-overlapping periods of input
data to use when training, and roughly 8 non-overlapping periods of data to
evaluate.
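As a sanity check, assuming 6-month windows, the window counts follow directly from the quoted date ranges:

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole months from start to end, ignoring leftover days."""
    return (end.year - start.year) * 12 + (end.month - start.month)


# Training: 01/01/2009-12/31/2014, plus the 3-month tuning period in late 2015.
train_months = months_between(date(2009, 1, 1), date(2015, 1, 1)) + 3
# Out-of-sample: 01/01/2016-05/08/2020.
test_months = months_between(date(2016, 1, 1), date(2020, 5, 8))

print(train_months // 6, test_months // 6)  # 12 8
```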

~~~
critiq
Thanks for your reply

