I've been playing around with using LSTM recurrent nets to find patterns in forex trades, with no real expectation of anything other than learning about recurrent nets (and TD convolutional nets). I was able to access 15 years of historical tick data. I would imagine lack of historical pricing data would be an issue for any machine learning approach to crypto trading. Even with 15 years of daily prices I only have ~5500 samples per major currency pair. I've toyed with learning off hourly prices rather than daily, and I've also thought about creating more samples by shifting prices up or down, since perhaps the general patterns would be the same.
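For what it's worth, here's a minimal sketch of the kind of shifting/augmentation I had in mind, assuming a plain NumPy array of daily closes. The window length and shift sizes are arbitrary placeholders, and whether level-shifted copies actually preserve the patterns is exactly the open question:

```python
import numpy as np

def augment_windows(prices, window=30, shifts=(-0.05, 0.0, 0.05)):
    """Cut overlapping windows from a daily price series and create
    vertically shifted copies, on the (untested) assumption that the
    patterns of interest are roughly level-independent."""
    samples = []
    for start in range(len(prices) - window):
        base = prices[start:start + window]
        for s in shifts:
            samples.append(base * (1.0 + s))  # shift the whole window up/down
    return np.array(samples)

# ~5500 daily closes -> (5470 * 3) windows of length 30
daily_closes = np.cumsum(np.random.randn(5500)) + 100.0  # placeholder data, not real prices
print(augment_windows(daily_closes).shape)
```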
You could make any decently clever algorithm work in backtests as long as you don't account for spread, order failures and data delays. Years ago I used to test algorithms in a "harsh environment" where all your trades are essentially 10% worse.
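Something along these lines, roughly. The 10% haircut and the flat per-trade cost are just illustrative numbers standing in for spread, slippage and failed orders, not calibrated values:

```python
def harsh_pnl(trade_returns, haircut=0.10, per_trade_cost=0.0005):
    """Pessimistically re-score backtest trades: every trade's return is
    made 10% worse and pays a flat cost standing in for spread, slippage
    and failed/delayed orders."""
    return [r - abs(r) * haircut - per_trade_cost for r in trade_returns]

# a strategy that barely survives this treatment is at least not obviously fooling itself
print(sum(harsh_pnl([0.02, -0.01, 0.015])))
```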
That's true, of course. If the technique worked here then it would be because there really are location-independent features that can be learned. I view it as similar to translating and transforming images in the MNIST digit set to account for the various ways the feature you're searching for can be spatially oriented. Of course I have no idea if this holds true and will work for pricing data.
Generating data this way is extremely difficult. The true population characteristics of prices are unobservable; we can only work with samples of the population. This means that any attempt at generating data is highly likely to add errors that are difficult to quantify. This is a fundamental difference from the disciplines where ML is most successful: in finance you can’t meaningfully generate new data, you can only work with what has been historically recorded, and that is often not relevant to how the market will behave in the future. You can always generate more cat images for a ConvNet to learn from, you can feed it photos from multiple angles or even 3D imagery. None of this is available for market data, unfortunately.
Right, the thing about applications like the digit set and similar OCR problems etc., is that we can independently generate a model of "acceptable" translations/rotations and validate it reasonably easily because we understand the domain well (not that you can't cause trouble this way). This certainly isn't true across data sets.
Ouch. Out of all the possible subjects to learn NNs on, you have picked by far the most difficult one possible. Seriously. If you think of an analogue to rocketry, with the easiest being launching fireworks from a bottle and the hardest being a mission to Mars, you have picked a Moon landing.
I don’t even know where to begin. Financial data has an extremely low signal-to-noise ratio and is fraught with pitfalls. It is highly non-normal, heteroscedastic, non-stationary and frequently changes behavioural regimes. It is irregular, the information content is itself irregular, and the prices sold by vendors often have difficult-to-detect issues that will taint your results until you actually start trading and realise that a fundamental assumption was wrong. You may train a model on one period, and find that the market behaviour has changed and your model is rubbish. Cross-validation and backtesting of black-box algorithms with heavy parameter tuning is a field of study on its own, with so many issues that endless papers have been written on each specific nuance.
Successfully building ML models for trading is an extremely difficult discipline that requires a deep understanding of the markets, the idiosyncrasies of market data, statistics and programming. Most quant shops that run successful ML algos (they are quite rare) have dedicated data teams whose entire remit is to source and clean data. The saying “rubbish in, rubbish out” is very true. Even data providers like Reuters or Bloomberg frequently have crap data. We pay nearly 500k a year to Reuters, and find errors in their tick data every week. Data like spot forex is a special beast because the market is decentralized. There is no exchange which could provide an authoritative price feed. Trades have been rolled back in the past, and if your data feed does not reflect this, you are effectively analysing junk data.
I don’t even want to get started on the fact that trying to train an RNN on 5500 observations is folly. Did you treat the data in any way? The common way to regularise market data for ML is to resample it into information bars. That is not going to work with daily data, so you should start off with actual tick data.
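To give a flavour of what I mean by information bars, here is a rough sketch of dollar bars (one of several variants). The column names and threshold are assumptions for illustration, and a real implementation would need to handle the ragged last bucket, bad ticks, session breaks and so on:

```python
import pandas as pd

def dollar_bars(ticks, bar_value=1_000_000):
    """Group raw ticks into 'dollar bars': a new bar closes whenever the
    cumulative traded value crosses a threshold, so each bar carries a
    roughly comparable amount of activity instead of a fixed amount of
    wall-clock time. `ticks` is assumed to have 'price' and 'size' columns."""
    bars, bucket, value = [], [], 0.0
    for price, size in zip(ticks['price'], ticks['size']):
        bucket.append(price)
        value += price * size
        if value >= bar_value:
            bars.append({'open': bucket[0], 'high': max(bucket),
                         'low': min(bucket), 'close': bucket[-1]})
            bucket, value = [], 0.0
    return pd.DataFrame(bars)
```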
Nearly every starry-eyed junior quant goes in with the notion that you can just run some fancy ML models on some market data and you’ll get a star trading algo, and that a small handful of statistical tests will tell you whether your results are meaningful, whether your data has autocorrelation or mean-reverting properties. In reality, ML models are very difficult to train on financial data. Most statistical forecasting tools fail to find relationships, and blindly training models on past data very rarely results in more than spurious performance in a backtest.
I don’t want to discourage you by any means, but I’d start off with something easier than what you are proposing. Finance firms have entire teams dedicated to what you are trying to do and even they often fail to find anything.
Seconding this. I'm in touch with a bunch of smart coder/traders trying this and nobody (to my knowledge) is making backtests match forward tests. To me, ML isn't optimised for this kind of problem. It might be possible to kludge it, but you won't know what it's doing.
My bot that tracks and trades momentum isn't as sexy, but it works.
Thanks for the great feedback. I have no expectations for this other than the learning, and it's already been successful on that front. Just seemed like a fun thing to poke at when most other hobbyists seem to be doing image analysis and language modeling. I've crawled a couple of forums and I get that there are a lot of people out there who think they can readily use these techniques to make money. I doubt very much that this will be the outcome in my case :).
Where I am now, I am just trying to figure out how to treat the data: whether to normalize or stationarize, how to encode inputs, etc. The reason I am working with daily prices is that the fantasy output of this would be a model that can inform a one-day grid trading strategy. It may very well be that daily prices won't work for this.
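As a concrete example of the kind of treatment I'm weighing (just a sketch; which transform is actually right is exactly what I'm unsure about):

```python
import numpy as np

def to_log_returns(closes):
    """Log returns are the usual first step toward stationarity:
    scale-free and roughly additive, unlike raw price levels."""
    closes = np.asarray(closes, dtype=float)
    return np.diff(np.log(closes))

def zscore(x, eps=1e-8):
    """Normalize network inputs; in practice the mean/std should come from
    the training split only, to avoid lookahead bias."""
    return (x - x.mean()) / (x.std() + eps)

returns = to_log_returns([100.0, 101.5, 100.8, 102.2])  # toy closes
print(zscore(returns))
```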
Whether there's anything like an equilibrium in cryptoasset markets, where there are no underlying fundamentals, is debatable. While there's no book price, PoW coin prices might be rationally describable in terms of (average estimated cost of energy + cost per GH/s + 'speculative value')
A proxy for energy costs, chip costs, and speculative information
Are there standard symbols for this?
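There don't seem to be agreed-upon symbols, but a back-of-the-envelope cost-floor version of that decomposition might look like the sketch below. Every name in it is made up for illustration, and the 'speculative value' term is deliberately left out because it isn't derivable from costs:

```python
def pow_cost_floor(energy_j_per_hash, electricity_usd_per_kwh,
                   hardware_usd_per_hash_per_s, hardware_life_s,
                   network_hashes_per_coin):
    """Rough production-cost floor for one PoW coin: energy spent plus
    amortized hardware cost over the hashes needed to mine a coin at the
    current network difficulty. Hypothetical symbols, not standard ones."""
    joules_per_kwh = 3.6e6
    energy_cost = (energy_j_per_hash / joules_per_kwh) \
        * electricity_usd_per_kwh * network_hashes_per_coin
    hardware_cost = (hardware_usd_per_hash_per_s / hardware_life_s) \
        * network_hashes_per_coin
    return energy_cost + hardware_cost
```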
Can cryptoasset market returns be predicted with quantum harmonic oscillators as well?
What NN topology can learn a quantum harmonic model?
https://news.ycombinator.com/item?id=19214650
"The Carbon Footprint of Bitcoin" (2019) defines a number of symbols that could be standard in [crypto]economics texts. Figure 2 shows the "profitable efficiency" (which says nothing of investor confidence and speculative information and how we maybe overvalue teh security (in 2007-2009)). Figure 5 lists upper and lower estimates for the BTC network's electricity use.
https://www.cell.com/joule/fulltext/S2542-4351(19)30255-7
Here's a cautionary dialogue about correlative and causal models that may also be relevant to a cryptoasset price NN learning experiment:
https://news.ycombinator.com/item?id=20163734
Cool stuff, and I didn't mean to discourage you at all. Some of the most interesting challenges in data science arise in finance.
Forex is perhaps just a pathologically tricky beast to trade well, even though it is the easiest to access. I think cryptos would be an easier start, in terms of there being more inefficiencies and autocorrelation in the market.
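If you want to sanity-check that autocorrelation claim yourself, a quick look at lag-1 serial correlation of returns is enough to get a feel (toy data below; swap in actual daily log returns for the coin you care about):

```python
import numpy as np

def lag1_autocorr(returns):
    """Correlation between each return and the previous one.
    Values near zero suggest little to exploit at that horizon."""
    r = np.asarray(returns, dtype=float)
    r = r - r.mean()
    return float(np.dot(r[1:], r[:-1]) / np.dot(r, r))

print(lag1_autocorr(np.random.randn(1000)))  # placeholder noise: expect ~0
```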
In terms of data treatment, I recommend starting with Marcos López de Prado's Advances in Financial Machine Learning. I don't agree with some of his methods, but it is a practical book that highlights a lot of the issues you'll face. You can then draw your own conclusions about how to treat them.
All of this is true, but I'll point out that for crypto, your competition is "I heard it was hot on Reddit/Telegram", and if it's not that it's often "I buy $X worth of Bitcoin on the first of every month". I suspect that a random algorithm (like, literally buys & sells random amounts of cryptos at random times) would do better than the average crypto trader, simply because the "I heard it was hot" strategy inherently leads towards buy-high-sell-low behavior.