
Stock Price Prediction with LSTMs - hoaphumanoid
https://miguelgfierro.com/blog/2018/stock-price-prediction-with-lstms/
======
highd
This model is showing predictions one day into the future. The "test set" plot
is all predictions made with data from 1 day ago. The input sequences have
size 1, so no recurrence is happening (see to_1dimension in
[https://github.com/miguelgfierro/sciblog_support/blob/master...](https://github.com/miguelgfierro/sciblog_support/blob/master/Time_Series_Forecasting_of_Stock_Price/utils.py)).
If you predict that tomorrow will have the same price as today you'll get
better plots under the same operating conditions.

In[2]: TIME_AHEAD = 1

Train set has ~1e-6 MSE, Test set has ~0.8 (0.94^2).
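
The naive baseline described here (tomorrow's price equals today's) can be sketched in a few lines. This is only an illustration: the prices are made-up numbers, not data from the notebook, and `persistence_forecast` and `mse` are hypothetical helper names.

```python
def persistence_forecast(prices):
    """Predict each next-day price as the current day's price."""
    # The forecast for day t+1 is simply the observed price on day t.
    return prices[:-1]


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up closing prices, for illustration only.
prices = [100.0, 101.0, 100.5, 102.0, 101.5]

predicted = persistence_forecast(prices)  # forecasts for days 1..4
actual = prices[1:]                       # observed prices on days 1..4
baseline_mse = mse(actual, predicted)
print(baseline_mse)
```

A model evaluated with TIME_AHEAD = 1 would need to beat this number on the same test split before its plots say anything about learning.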

EDIT: I should say this person is probably learning, and a lot of this looks
like honest mistakes.

------
antirez
The way this experiment should actually work: feed the recurrent NN stock
market data from the last 10 years _except_ the latest year. Then display the
prediction for the latest year alongside the actual data. I want to see if it
will match like that...

~~~
turingcompeteme
Predicting for only the last year is good, but there are two even better ways:

- An expanding window where you train on the first year of data and predict
the second. Then train on the first two years and predict the third, etc.

- A rolling window where you train on years 1 and 2 and predict 3. Then train
on 2 and 3 and predict 4, etc.

You need to show that your predictions work for any time period, not just the
past year.
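
As a rough sketch of the two schemes, the generators below yield train/test period indices only; no model is trained. The function names and the `min_train` and `train_size` parameters are hypothetical, chosen just for this illustration.

```python
def expanding_window_splits(n_periods, min_train=1):
    """Yield (train, test) period indices; the training set grows each step."""
    for test in range(min_train, n_periods):
        yield list(range(0, test)), [test]


def rolling_window_splits(n_periods, train_size=2):
    """Yield (train, test) period indices; the training window has fixed length."""
    for test in range(train_size, n_periods):
        yield list(range(test - train_size, test)), [test]


# Five periods (e.g. years), indexed 0..4.
expanding = list(expanding_window_splits(5))
rolling = list(rolling_window_splits(5))

for train, test in expanding:
    print("expanding: train", train, "-> test", test)
for train, test in rolling:
    print("rolling:   train", train, "-> test", test)
```

In both schemes every test period lies strictly after its training periods, so no future data leaks into training. scikit-learn's `TimeSeriesSplit` provides a similar expanding-window split if you would rather not roll your own.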

------
bcheung
Cool to see that you created a notebook and are publishing this like a paper.
I was thinking of doing some projects and this looks like a good format to
follow.

However, this looks to be grossly overfitting. You can't just randomly drop
out samples in a time series and use those for the test dataset. You need to
cut out larger contiguous sequential time ranges and reserve those for your
test set. Probably a single contiguous time window.

Anyone can predict with a high degree of accuracy what a stock price is, given
the previous 5 days and the next 5 days.

Predicting a few randomly dropped out pixels in that graph is really easy due
to the nature of time series. You can just interpolate it.

~~~
highd
It's actually using just one day's data (one point) to predict the next day's
data, let alone predicting 5 days.

------
m3kw9
The model could be trained in such a way that it only works with historical
data and only passes backtests accurately. This is because you are training
against the “answer”, which is already known.

~~~
m3kw9
Training with trends and patterns is like a technical trader looking for
patterns. When things go right they attribute it to the pattern, but when it’s
wrong, they attribute it to unforeseen events. The unforeseen events are
exactly the thing that, if you could predict them, would make you rich lol

------
p1necone
According to the graph this seems to basically perfectly predict future stock
price. I don't believe it's possible to be that accurate at all. What am I
missing here?

~~~
bcheung
It's because the test dataset involves randomly choosing samples spread
throughout the time series.

Imagine dropping 10% of the pixels randomly spaced out in a historical graph
and then trying to fill them in. It's trivial because you can just average the
previous and next sample and have an extremely high accuracy.
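
A minimal sketch of that neighbor-averaging trick, assuming the held-out points are interior and non-adjacent. The series is made up for illustration, and `neighbor_average_fill` is a hypothetical helper name.

```python
def neighbor_average_fill(series, held_out):
    """Estimate each held-out point as the mean of its two neighbors."""
    # This uses the *next* value as well as the previous one, which a real
    # forecaster would not have: that is the leak being described.
    return {i: (series[i - 1] + series[i + 1]) / 2 for i in held_out}


# Made-up price series, for illustration only.
series = [10.0, 10.5, 11.0, 10.8, 11.4, 12.0, 11.7, 12.3]
held_out = [2, 5]  # interior points removed as a scattered "test set"

estimates = neighbor_average_fill(series, held_out)
errors = {i: abs(series[i] - estimates[i]) for i in held_out}
print(estimates)
print(errors)
```

No model is involved, yet the fill-ins land close to the removed values simply because neighboring prices are close to each other.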

~~~
claytonjy
I don't think that's true, because of the `shuffle = False` argument to
`train_test_split`; it's just chopping the tail off for the test set.

------
godelmachine
Please forgive my ignorance, since I'm a neophyte in this field.

But since you are taking the true values and the training data, both, from the
past, won't your "prediction value" be prone to lots of biases?

~~~
vajrabum
I'm the farthest thing from an expert but here are some factors to consider.
You're going to train on the past data. Typically at least to begin with you
pick some time period and train on part of the data and keep back some to test
on. And yes, there's definitely a risk of overfitting. So some of the real
data will be in sample and some out of sample for testing. If you want to get
more sophisticated, there are a number of resampling techniques in which
you systematically choose different subsets of the data to train on and then
look at the variation of the predictions. That can give you some idea of the
error you'll get. Not sure how that works on data with high dimensionality, but
if you're doing an autoregression/autocorrelation (i.e. predicting tomorrow's
price based on its past behavior) you'll only have the single time series.
The first thing you'll likely notice if you try and apply that sort of model
to stocks is that stocks are at least as highly correlated and likely much
more so to the market than they are to their own past behavior.
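
One way to check that last claim on your own data is to compare a stock's lag-1 autocorrelation with its correlation to the market. The sketch below shows the computation only: the return series are made-up numbers, so the printed values say nothing about real markets, and `pearson` and `lag1_autocorrelation` are hypothetical helper names.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lag1_autocorrelation(returns):
    """Correlation between each return and the following day's return."""
    return pearson(returns[:-1], returns[1:])


# Made-up daily returns, for illustration only.
market_returns = [0.010, -0.020, 0.015, -0.005, 0.020, -0.010]
stock_returns = [0.012, -0.018, 0.020, -0.010, 0.025, -0.008]

corr_with_market = pearson(stock_returns, market_returns)
corr_with_own_past = lag1_autocorrelation(stock_returns)
print("correlation with market:", corr_with_market)
print("lag-1 autocorrelation:  ", corr_with_own_past)
```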

