Machine learning predictions for QM eMini Crude Oil (dutchess.ai)
40 points by madchops1 on Sept 26, 2017 | 61 comments



If you glance over the whitepaper, it's obvious this is a completely ridiculous model. Trying to predict intraday price movements with previous day volume numbers and open interest as features? Please. Also, selling the "model" predictions for $365 a year is the real scam -- if he had any confidence in the model, he'd get a private backer and be trading size out of his own account rather than making $50 bets.


It's an indicator to assist in trading decisions not a crystal ball. And your analysis of my analysis is incorrect. I do have private backers using this. But I don't see why I can't share as well and have multiple streams of revenue. I post all the results so you can see for yourself. I'm not rich and have only been day trading for a year or two. This is for the little guy not the big guy. So I trade and show examples as someone who may not have lots of liquidity available to show how you too could learn to do this.


You should change the following in your whitepaper though:

"..It contains information that is confidential and privileged. If you have received this document in error, please notify the sender and delete this file."


Ya I should remove that. Ty


I mean it is possible that the model could work for someone with access to more capital and better fee structure than OP making $50 bets.

Still, this seems like a model that will likely work about 95% of the time and then fail on outliers, i.e. a "picking up pennies in front of a bulldozer" model.


The nice thing about QM is that 1 tick is a $12.50 move, so it's easy to cover fees. Today those were $50 profits: 2 shares moved 2 ticks. I use stops and limits, of course, so my risk was about $100. It's not the same as options.
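The arithmetic in this comment can be checked with a short sketch. The $12.50 tick value, the 2-tick move and the position size of 2 come from the comment; the 4-tick stop distance is an assumption, chosen only because it reproduces the "about $100" risk figure.

```python
TICK_VALUE_USD = 12.50  # value of one QM tick per contract, as stated in the comment


def pnl_usd(ticks: float, contracts: int, tick_value: float = TICK_VALUE_USD) -> float:
    """Dollar profit (or loss, if ticks is negative) for a move of `ticks` ticks."""
    return ticks * contracts * tick_value


# 2 contracts moving 2 ticks in the trader's favor.
profit = pnl_usd(ticks=2, contracts=2)

# Hypothetical stop 4 ticks away (an assumption, not stated in the thread).
risk = pnl_usd(ticks=4, contracts=2)

print(f"profit: ${profit:.2f}, risk at stop: ${risk:.2f}")
```

Fees and slippage are not included here, so the net figure would be lower.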


A strategy that goes long or short QM doesn't have an asymmetrical return profile, i.e., it isn't "picking up pennies". An example of that is selling options.


Thanks HockeyPlayer spot on.


I think you misunderstand. What you're doing is even worse than picking up pennies: your strategy just boils down to punting futures with no hedge or any form of risk management on your open position. You're just gambling on coin flips. If you backtest your "model", you'll find that crude oil moves induced by external macro events will completely wipe you out, because you have no hedge against them.


You are correct about there being no hedge against outlying macro events. But I'm not suggesting using this without any other variables in your trading decision. It may help make a trading decision.


Also, you should always place a stop and a limit order so you don't get completely wiped out. It's very common.


I do appreciate the criticism though I honestly thought I'd get more. Results, results, results


Have a look at Carver's book 'Systematic Trading' and maybe 'Expected Returns' by Ilmanen. You also need to get a proper backtest going. A black box and a diary won't get you very far.


I am doing my training/evaluation with a data split of 70/30. Doesn't that qualify as a proper backtest?


I don't really know what you mean by evaluation. But you need to be able to (faithfully) generate all the positions your system would take through time, and also to generate all the returns you would have made through time.

Aside from pure P&L, you should be looking at how much risk your system is taking, and under what conditions it's doing badly. All backtests are overfit: their use is mostly in identifying problems with your strategy, rather than predicting how much money you'll make.

One question you'd get asked if you were proposing this in a real trading environment is this: what is it about the QM emini contract that makes this work? Does it work for other energy contracts? For other commodities? For bonds, or equities? If not, why not?
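A minimal sketch of what "generate all the positions and all the returns through time" could look like. The function names, the +1/-1/0 signal convention and the sample numbers are illustrative assumptions, not the OP's system; a real backtest would also account for fees, slippage and contract rolls.

```python
from math import sqrt


def backtest(prices, signals):
    """Walk forward through time.

    prices[t]  : settlement price on day t
    signals[t] : position decided at the close of day t using only data up to t
                 (+1 long, -1 short, 0 flat); it earns the move from t to t+1.
    Returns the list of daily P&L in price points per contract.
    """
    if len(prices) != len(signals):
        raise ValueError("prices and signals must have the same length")
    return [signals[t] * (prices[t + 1] - prices[t]) for t in range(len(prices) - 1)]


def summarize(daily_pnl):
    """Total P&L, volatility of daily P&L, and worst peak-to-trough drawdown."""
    n = len(daily_pnl)
    mean = sum(daily_pnl) / n
    vol = sqrt(sum((x - mean) ** 2 for x in daily_pnl) / n)
    equity = peak = max_dd = 0.0
    for x in daily_pnl:
        equity += x
        peak = max(peak, equity)
        max_dd = max(max_dd, peak - equity)
    return {"total": sum(daily_pnl), "vol": vol, "max_drawdown": max_dd}


# Made-up prices and signals, purely for illustration.
prices = [50.0, 50.5, 50.25, 51.0, 50.0]
signals = [1, 1, 1, 1, 0]
daily = backtest(prices, signals)
stats = summarize(daily)
print(daily, stats)
```

The per-day P&L list is the useful part: it shows when the system loses money and how deep the drawdowns get, which a single accuracy number cannot show.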


Basically I have a dataset and I train my model with 70% and then evaluate its guesses against the remaining 30%. Hence a baseline is created and I can see if my model performs better.

It took some doing to get this model to perform well. I did this by adding features that help recognize patterns in the time series data.

The features I created are not specific to QM, as they are technical (e.g. numbers, not news) and time-series related. So the models should work with any historical dataset with the same fields.

My goal is to add another future at some point.


I don't understand your baseline.

I feel like you're talking past me a little. The first thing you need to do is generate all the positions your system would have taken over as many years as possible, and figure out at what times you make and lose money. Otherwise you don't have a backtest.


I apologize. I can do that. I'm going to generate that backtest you described.

Right now I have residual data from the AWS machine learning data that tells me whether there is any structure to the times it does guess wrong. And a value below baseline is a better than 50/50 guess, according to what I have learned about how AWS does its ML. Knowing that, I use this personally as a supporting indicator for my trade decisions. Since it's so new and I really don't want people to think I'm scamming or something, I'm just releasing my results free for now, not trying to be a douche ;)

AWS defines the baseline as follows

Baseline RMSE Amazon ML provides a baseline metric for regression models. It is the RMSE for a hypothetical regression model that would always predict the mean of the target as the answer. For example, if you were predicting the age of a house buyer and the mean age for the observations in your training data was 35, the baseline model would always predict the answer as 35. You would compare your ML model against this baseline to validate if your ML model is better than a ML model that predicts this constant answer.
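A rough sketch of the comparison that definition describes: the RMSE of a constant "always predict the training mean" model next to the RMSE of an actual model. The helper names and the numbers are made up for illustration, and scoring the baseline on the held-out evaluation rows is an assumption about how the comparison is set up.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def baseline_rmse(train_target, eval_target):
    """RMSE of a model that always predicts the mean of the training target."""
    mean = sum(train_target) / len(train_target)
    return rmse(eval_target, [mean] * len(eval_target))


# Made-up numbers, purely for illustration.
train_target = [1.0, 2.0, 3.0]   # mean is 2.0
eval_actual = [2.0, 4.0]
model_pred = [2.5, 3.5]          # hypothetical model output on the eval rows

baseline = baseline_rmse(train_target, eval_actual)
model = rmse(eval_actual, model_pred)
print(f"baseline RMSE: {baseline:.3f}, model RMSE: {model:.3f}")
```

A model RMSE below the baseline means the model beats a constant prediction on squared error. On its own it does not say how often the predicted direction of a price move is right.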


I will. Thank you for the advice.


A statistical model based on bad/dependent features is essentially just random guessing; in this case the model just happens to make favorable guesses.


I experimented with the features until I got good results in my evaluations. If I am getting favorable guesses, how does that point to bad features? Favorable guesses are what I was going for in order to assist in trading decisions.

I use a 70/30 split of training/eval


Just look for feedback. I use many historical price values other than volume. Plus some specific things I have added for pattern recognition in time-series data. That's the special part.


@madchops1 This comment reveals just how seriously we should take your work. This article should be flagged.


touche. Sorry for my negative reaction.


Note to other readers: the comment was changed.


I reacted that way because dude said he read the white-paper and somehow took this as a historical analysis of volume. Which it is not.

I use historical open, high, change, last, settle, prev. day open interest, plus several other fields I use to help recognize patterns and properly weight time-series data.


It was changed. I said "your just jelly". Sorry. I changed it.


The OP says "I don't see why I can't trade it AND share it as well to get multiple streams of revenue?"

Maybe because the capacity of day-hold futures (especially e-micro crude) is so small that there is almost no way the revenue earned from your $365-a-year subscribers is going to be greater than the capacity your strategy loses from having all those people trading it.

As someone who works at a quant fund, this kind of shit pisses me off to no end. It makes a legit industry look like Herbalife


I don't really have a strategy to copy. I just provide indicators that may affect how you decide to make a trade or not.

Also the point of this project is to get machine models that evaluate below baseline by improving the analysis of time-series data. I spent a lot of time improving the quality until they became better than baseline. Like I said I do not have a crystal ball or strategy for sale. Just insights from pattern recognition of historical time-series data that has helped me trade so I'm putting it out there.


Entirely agree. This site is like looking at a caricature and passing it off as fine portrait art.


yup


OP: how can you expect to be taken seriously if you don't publish backtests, and overfitting analysis?!

It is _extremely_ likely that your model's guesses are just as bad as coin flips.


I will do this, thank you for your feedback. I didn't release my results until my models were better than baseline in evaluation, hence better than flipping a coin. I published the results of my evaluations so you can see that. It's not a crystal ball. It's an indicator to help in trading decisions.


I published the evaluation results in the whitepaper. I am doing my training/evaluation with a data split of 70/30. Doesn't that qualify as a proper backtest?


That's how Simons did it at Renaissance. He sold his model for $365 a year.


I did not know that. Thanks for the info.


This is all a bit naive, OP is not up to speed with the state of the art. This looks like any number of student projects that are created every semester.

The best thing about this is that it looks like OP prototyped and released v1 for sale in under a month. That's respectable.


It may be a bit naive. I'm going to keep working on this, though. I have found that my indicators greatly improve my own trading results, and making the models' evaluation numbers improve is very exciting and interesting to me, and maybe to others.

Thanks for the one compliment :)


Also my models beat baseline in evaluations so I'm stoked on that. It took some doing to produce results with time-series data.


It's been about a decade since I worked with machine learning, and even then I hardly scratched the surface. (I mostly wrote the data access and glue code for someone else's machine learning system.)

Given what I see, it's really hard (for me) to understand what's going on and what's novel. I'm not very active in the financial or machine learning area. All I can understand is that someone wrote an investment program that makes money.

Even with my limited knowledge of machine learning, I know that it's very easy to confuse luck and success. How do you know that you're not just lucky? How do you know that your computer program is really investing, and not just "good timing"?


I am not trading purely on these numbers. They are indicators that help me make my trading decisions, and I think they may help others too.

My models' evaluations are performing better than baseline when trained with 70% of the data and evaluated against the remaining 30%, so I take that as value. As someone else put it, a potentially "favorable guess". At this point I'm using the predictions regularly, and I guess I'll know more the longer I keep track of daily results.


Updated White Paper. Generated more evaluations and back tests. https://s3.amazonaws.com/karlcdn/Dutchess.ai+White+Paper.pdf


I know a lot of you hate this, but the model has predicted the correct direction of movement at end of day on 9 out of 11 days since my model got under baseline and I began publishing my results. I know I've said this a few times, but it's not a crystal ball; it's a value that can assist in trading decisions.


Crude oil has been going up all month. Is your system biased long (i.e. towards buying)?

Edit: looking through your trade history, it does appear to have a long bias. It's predicted a rise in price on all but two days.


11 days is a very short time. Assume that you're lucky.
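One way to put a number on "lucky" is a simple binomial calculation: the chance of getting at least 9 of 11 directions right if each call were an independent guess with hit rate p. The helper name is illustrative, and both the independence assumption and the choice of p are assumptions. In particular, p=0.75 below is a hypothetical hit rate for an always-long guess in a month when prices mostly rose, not a measured value.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): at least k correct calls out of n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Fair coin: every daily call is an independent 50/50 guess.
p_fair = prob_at_least(9, 11, p=0.5)

# Hypothetical always-long guess in a rising market (p is an assumption).
p_long_bias = prob_at_least(9, 11, p=0.75)

print(f"fair coin: {p_fair:.4f}, long-biased guess: {p_long_bias:.4f}")
```

Against a fair coin, 9 of 11 looks unusual. Against a long-biased guess in a rising market it is much less so, which is why the choice of comparison matters on a sample this small.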


That's also what the evaluation of an ML model is for.


That's true. I'm going to keep going.


I did another back test with a randomly selected 70%/30% training to evaluation ratio for evaluating time-series models. Adding results to whitepaper. The results are still under baseline.


Couple questions:

1. Why QM?

2. Was this tried on any other future and if yes what was the outcome?


I have been trading QM for a year or two now. It has low margin and good volatility for day trading. The tick value is $12.50, so a movement of one tick will cover your trade costs. That makes it possible for a beginner to start with ~$3,000-5,000 and still be able to realistically make money.


Trading costs are more than just commission. Slippage for QM is higher than CL because of thinner volume and can easily be $10-25+ per order ($20-50 round trip). You might say that you don’t worry about slippage because you use limit orders, but limit orders have their own problems, like not getting filled which can easily screw up your returns vs predicted returns. Be careful, there are far more ways to lose money trading than you seem to realize still.


Also doing random not sequential training and evaluation. The proper way to eval time-series models.


I am doing my training/evaluation with a data split of 70/30. Doesn't that qualify as a proper backtest?


What is this I don't even


Working on a tutorial...


I may not be reading this right, but does the white paper contain 5 days of result data?


ya it contains the evaluation results of the model and 5 days of result data.


The ever-relevant xkcd - https://xkcd.com/1570/


Not just relevant; that comic and this website are nearly indistinguishable.


Lol.


I love xkcd.



