I don’t even know where to begin. Financial data has an extremely low signal-to-noise ratio and is fraught with pitfalls. It is highly non-normal, heteroscedastic, non-stationary and frequently changes behavioural regimes. It is irregular, the information content is itself irregular, and the prices sold by vendors often have difficult-to-detect issues that will taint your results until you actually start trading and realise that a fundamental assumption was wrong. You may train a model on one period, then find that the market behaviour has changed and your model is rubbish. Cross validation and backtesting of black-box algorithms with heavy parameter tuning is a field of study on its own, with so many issues that endless papers have been written on each specific nuance.
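To make the cross-validation point concrete: ordinary shuffled k-fold leaks future information into the training set, so evaluation on market data is usually done with walk-forward splits plus a small embargo gap. A minimal sketch, assuming an expanding-window scheme; the function name, fold count and gap size are my own illustration, not anything from the thread:

```python
import numpy as np

def walk_forward_splits(n, n_folds=5, min_train=100, gap=5):
    """Expanding-window splits: each fold trains on everything up to a
    cutoff and tests on the next block, with a small gap (embargo) so
    overlapping labels near the boundary don't leak into training."""
    test_size = (n - min_train) // n_folds
    splits = []
    for k in range(n_folds):
        train_end = min_train + k * test_size
        test_start = train_end + gap
        test_end = min(test_start + test_size, n)
        if test_start >= n:
            break
        splits.append((np.arange(0, train_end),
                       np.arange(test_start, test_end)))
    return splits

# Usage: 1000 daily observations, four sequential evaluation folds.
for train_idx, test_idx in walk_forward_splits(1000, n_folds=4):
    pass  # fit on train_idx, evaluate on test_idx, never the reverse
```

Even this only addresses one leakage channel; de Prado spends several chapters on others (overlapping labels, selection bias from repeated backtests).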
Successfully building ML models for trading is an extremely difficult discipline that requires a deep understanding of the markets, the idiosyncrasies of market data, statistics and programming. Most quant shops that run successful ML algos (they are quite rare) have dedicated data teams whose entire remit is to source and clean data. The saying "rubbish in, rubbish out" is very true. Even data providers like Reuters or Bloomberg frequently have crap data. We pay nearly 500k a year to Reuters, and find errors in their tick data every week. Data like spot forex is a special beast because the market is decentralized: there is no exchange that could provide an authoritative price feed. Trades have been rolled back in the past, and if your data feed does not reflect this, you are effectively analysing junk data.
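For a flavour of what "cleaning tick data" means in practice, one crude first pass is to flag prints that deviate sharply from a rolling median (fat fingers, vendor glitches, off-market prints). This is a toy sketch of my own; the window and threshold are arbitrary and a production filter does far more:

```python
import numpy as np

def flag_bad_ticks(prices, window=50, max_dev=0.01):
    """Flag ticks deviating more than max_dev (fractional) from the
    rolling median of the surrounding window. The median is robust to
    a single bad print, so a spike stands out against it."""
    prices = np.asarray(prices, dtype=float)
    bad = np.zeros(len(prices), dtype=bool)
    half = window // 2
    for i in range(len(prices)):
        lo, hi = max(0, i - half), min(len(prices), i + half + 1)
        med = np.median(prices[lo:hi])
        if abs(prices[i] / med - 1.0) > max_dev:
            bad[i] = True
    return bad
```

Note this only catches isolated spikes; rolled-back trades, stale quotes and crossed books need entirely different checks.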
I don’t even want to get started on the fact that trying to train an RNN on 5,500 observations is folly. Did you treat the data in any way? The common way to regularise market data for ML is to resample it into information bars. That won’t work with daily data, so you should start from actual tick data.
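By "information bars" I mean bars sampled by activity rather than by the clock, e.g. dollar bars that close whenever a fixed notional has traded, so you sample more often when the market is busy. A rough sketch of the idea, assuming plain tick arrays; the helper name and the threshold are illustrative:

```python
import numpy as np

def dollar_bars(prices, sizes, bar_value=1_000_000):
    """Aggregate ticks into dollar bars: a new bar closes every time
    roughly bar_value of notional (price * size) has traded.
    Leftover ticks that don't complete a bar are dropped."""
    bars, acc = [], 0.0
    o, hi, lo = None, -np.inf, np.inf
    for p, s in zip(prices, sizes):
        if o is None:
            o = p
        hi, lo = max(hi, p), min(lo, p)
        acc += p * s
        if acc >= bar_value:
            bars.append({"open": o, "high": hi, "low": lo, "close": p})
            o, acc, hi, lo = None, 0.0, -np.inf, np.inf
    return bars
```

Volume bars and tick bars follow the same pattern, just accumulating size or tick count instead of notional.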
Nearly every starry-eyed junior quant goes in with the notion that you can just run some fancy ML models on some market data and you’ll get a star trading algo, and that a small handful of statistical tests will tell you whether your results are meaningful, or whether your data has autocorrelation or mean-reverting properties. In reality, ML models are very difficult to train on financial data. Most statistical forecasting tools fail to find relationships, and blindly training models on past data very rarely results in more than spurious performance in a backtest.
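To be clear about what that "handful of tests" looks like and why it is not enough: a lag-1 autocorrelation and a Lo–MacKinlay-style variance ratio are trivial to compute, but on noisy returns they mostly come back indistinguishable from white noise. An illustrative sketch, with function names and the lag choice being my own:

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a return series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

def variance_ratio(x, q=5):
    """Lo-MacKinlay style variance ratio on returns: values well below 1
    suggest mean reversion, well above 1 suggest trending; near 1 is
    consistent with a random walk."""
    x = np.asarray(x, dtype=float)
    n = (len(x) // q) * q
    var1 = np.var(x[:n], ddof=1)
    xq = x[:n].reshape(-1, q).sum(axis=1)  # non-overlapping q-period returns
    varq = np.var(xq, ddof=1)
    return float(varq / (q * var1))
```

On real daily returns both numbers usually sit within sampling error of the random-walk values, which is exactly the point: passing or failing these tests tells you very little about whether a model will trade profitably.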
I don’t want to discourage you by any means, but I’d start off with something easier than what you are proposing. Finance firms have entire teams dedicated to what you are trying to do and even they often fail to find anything.
My bot that tracks and trades momentum isn't as sexy, but it works.
Where I am now, I am just trying to figure out how to treat the data: whether to normalize or stationarize it, how to encode inputs, etc. The reason I am working with daily prices is that the fantasy output of this would be a model that can inform a one-day grid trading strategy. It may very well be that daily prices won't work for this.
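For what it's worth, the usual first steps on daily prices are log returns for stationarity and some rolling normalization to put NN inputs on a comparable scale. A hedged sketch of both, assuming a plain price array; the names and the 60-day window are just my assumptions:

```python
import numpy as np

def to_log_returns(prices):
    """Log returns (differences of log prices) are the standard first
    step toward stationarity; roughly percentage changes."""
    p = np.asarray(prices, dtype=float)
    return np.diff(np.log(p))

def rolling_zscore(x, window=60):
    """Z-score each point against the trailing window only (no lookahead),
    a common way to scale inputs for an NN. First `window` values are NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full(len(x), np.nan)
    for i in range(window, len(x)):
        w = x[i - window:i]
        sd = w.std()
        out[i] = (x[i] - w.mean()) / sd if sd > 0 else 0.0
    return out
```

The key design point is using only trailing data in the normalization; scaling with full-sample statistics quietly leaks the future into every training example.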
A proxy for energy costs, chip costs, and speculative information
Are there standard symbols for this?
Can cryptoasset market returns be predicted with quantum harmonic oscillators as well?
What NN topology can learn a quantum harmonic model?
Here's a cautionary dialogue about correlative and causal models that may also be relevant to a cryptoasset price NN learning experiment:
Forex is perhaps just a pathologically tricky beast to trade well, even though it is the easiest market to access. I think cryptos would be an easier start, in that there are more inefficiencies and autocorrelation in the market.
In terms of data treatment, I recommend starting with Marcos López de Prado's Advances in Financial Machine Learning. I don't agree with some of his methods, but it is a practical book that highlights a lot of the issues you'll face. You can then draw your own conclusions about how to treat them.