
Algorithmic Trading Using Logistic Regression - cshad
https://handsoffinvesting.com/an-algorithmic-trading-strategy-using-logistic-regression/
======
lalaland1125
This article is horribly incorrect. All of the numbers and analysis are wrong
and nobody should be using it as a guide to anything.

The key problem with their analysis is that they don't do a proper train-test
split. All of their results are probably just overfitting on the training
data. You know the split is wrong because they are using train_test_split from
the sklearn library, which shuffles rows at random by default and so only
makes sense for independent data. However, consecutive stock prices are highly
correlated with each other. In order to get a truly independent test set you
have to do a time split, where all the test set values are from the future and
all of the training set values are from the past.
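
For anyone wondering what a time split looks like in practice, here is a minimal sketch. It assumes the rows are already sorted oldest to newest; `chronological_split` and the toy `observations` list are hypothetical names for illustration, not anything from the article.

```python
from typing import Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(
    rows: Sequence[T], test_fraction: float = 0.2
) -> Tuple[Sequence[T], Sequence[T]]:
    """Split time-ordered rows so every test row comes after every training row.

    Assumes `rows` is already sorted from oldest to newest.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    cutoff = int(len(rows) * (1.0 - test_fraction))
    # No shuffling: the past goes to training, the future goes to testing.
    return rows[:cutoff], rows[cutoff:]


# Hypothetical data: (day index, closing price), sorted by day.
observations = [(day, 100.0 + day) for day in range(10)]
train, test = chronological_split(observations, test_fraction=0.2)
```

If you want to stay inside sklearn, passing `shuffle=False` to `train_test_split` also keeps the rows in order, and `TimeSeriesSplit` gives you several expanding past/future folds instead of a single cut.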

~~~
fractionalhare
_> The key problem with their analysis is that they don't do a proper train-
test split._

Yes, that's bad. But the enterprise is doomed even earlier than that: the
author is using a naive regression technique on timeseries data. Stock price
data is typically log-normally distributed and non-stationary. If the author
is not going to use a robust timeseries technique like ARIMA, they need to
transform and constrain the data to account for non-stationarity. Otherwise
it's trivial to find spurious correlations which appear to show a strong
linear relationship between some fundamental financial metric and some other
technical indicator, because both may have a temporal component (e.g. stocks
tend to go up over time, and a lot of technical indicators have some implicit
component which also increases or decreases with time).

This is also why it's important to have a strong fundamental thesis to explain
what's going on.
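
To make the shared-trend point concrete, here is a small sketch on synthetic data. Nothing in it comes from the article: the drift, the shock sizes, and the helpers `pearson`, `price_path`, and `log_returns` are all made up for illustration. The two series get shocks that are uncorrelated by construction, but both drift upward, so the price levels look strongly related while the log returns do not.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def price_path(shocks, drift=0.01, start=100.0):
    """Build a price series whose log return at each step is drift + shock."""
    prices = [start]
    for shock in shocks:
        prices.append(prices[-1] * math.exp(drift + shock))
    return prices


def log_returns(prices):
    """Log return between each pair of consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


# Deterministic toy shocks: period 2 for series A, period 4 for series B.
# Over 200 steps their products sum to zero, so they are uncorrelated.
shocks_a = [0.02 if t % 2 == 0 else -0.02 for t in range(200)]
shocks_b = [0.02 if t % 4 < 2 else -0.02 for t in range(200)]

prices_a = price_path(shocks_a)
prices_b = price_path(shocks_b)

# Levels share the upward drift, so this comes out close to 1.
level_corr = pearson(prices_a, prices_b)
# Returns strip out the drift, so this comes out close to 0.
return_corr = pearson(log_returns(prices_a), log_returns(prices_b))
```

Real prices are noisier than this, but the mechanism is the same: regress levels on levels and the common trend does most of the work.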

~~~
ulucs
I don't think throwing ARIMA at it without the proper knowledge would be
enough. AR is evident, but I and MA would be unjustified assumptions that take
away from the meat of the subject.

Also, you might need to get into panel data methods when you are evaluating
more than one stock.

------
Res563
He also has an article up called

How I Lost All of My Money in the Stock Market

He is going to lose it all again if the rest of his algos look like this one.

https://handsoffinvesting.com/how-i-lost-all-of-my-money-in-the-stock-market/

------
smabie
Okay, so I got the page to load (in a previous comment, I said it would not
load). This guy is doing nothing correctly, and his code is an absolute mess.
He's not using pandas except for displaying data and reading a CSV and his
code is unnecessarily ugly and verbose. You really shouldn't be using very
many loops at all when doing quant research. And he capitalizes all variable
names, but whatever I guess.

He's using raw price data (not sure if Yahoo adjusts for splits and dividends,
so that might be another problem), not log returns. Also, as lalaland1125
pointed out, he's literally splitting his data into training and testing sets
randomly, not keeping the time series contiguous. This makes no sense, and no
quant researcher would _ever_ do this.

Because of these problems, his results are completely and totally invalid and
readers should completely ignore this article.

Looking at his LinkedIn, he has no experience in finance, as I expected. You
can't just take your textbook ML models and apply them to finance. It doesn't
work like that.

I've seen a lot of these ML stock market witchcraft Medium posts lately, and I
hope the authors are making money off of Medium, because they definitely
aren't developing successful strategies.

Edit: I wasn't going to mention this, but I run a no-bullshit quantitative
finance blog: [https://cryptm.org/](https://cryptm.org/)

I keep it mathematically rigorous and honest. Some of it is original research,
but most of it is backtesting/analysis of well-known academic research.

Edit2: Take a look at this code:

    
    
        while x < (len(data)):
            if x < (len(data)-5):
                if ((close_prices[x+1] + close_prices[x+2] + close_prices[x+3] + close_prices[x+4] + close_prices[x+5])/5) > close_prices[x]:
                    Five_Day_Obs.append(1)
                else:
                    Five_Day_Obs.append(0)
            else:
                Five_Day_Obs.append(0)
            x+=1
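
For comparison, here is roughly how that loop could look without the loop. This is a sketch, not the author's code: `five_day_label` is a hypothetical name, and it assumes `close` is a pandas Series of closing prices in date order. One deliberate difference from the quoted code: the last five rows have no five-day future, so they are marked NaN here instead of being labelled 0.

```python
import pandas as pd


def five_day_label(close: pd.Series) -> pd.Series:
    """1.0 where the mean of the next five closes exceeds today's close, else 0.0.

    Rows that do not have five future closes get NaN.
    """
    # rolling(5).mean() at row i averages rows i-4..i, so shifting it back
    # by five rows lines row x up with the mean of rows x+1..x+5.
    forward_mean = close.rolling(window=5).mean().shift(-5)
    label = (forward_mean > close).astype(float)
    # Keep the label only where a full five-day forward window exists.
    return label.where(forward_mean.notna())


# Hypothetical closing prices for illustration.
close = pd.Series([10, 11, 12, 13, 14, 15, 14, 13, 12, 11, 10, 9], dtype=float)
labels = five_day_label(close)
```

Calling `labels.fillna(0)` reproduces what the original loop does with the tail, though dropping those rows before fitting is probably the safer choice since their outcome is unknown rather than "down".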

