
Ouch. Out of all the subjects you could learn NNs on, you have picked one of the most difficult. Seriously. If you think of it in terms of rocketry, with the easiest being launching fireworks from a bottle and the hardest being a mission to Mars, you have picked a Moon landing.

I don’t even know where to begin. Financial data has an extremely low signal-to-noise ratio and is fraught with pitfalls. It is highly non-normal, heteroscedastic and non-stationary, and it frequently changes behavioural regimes. It is irregular, its information content is itself irregular, and the prices sold by vendors often have difficult-to-detect issues that will taint your results until you actually start trading and realise that a fundamental assumption was wrong. You may train a model on one period, only to find that market behaviour has changed and your model is rubbish. Cross-validation and backtesting of black-box algorithms with heavy parameter tuning is a field of study in its own right, with so many issues that endless papers have been written on each specific nuance.
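
One common way to respect that non-stationarity when evaluating a model is walk-forward validation: always train on a window of the past, test on the block that follows, and leave a gap between the two so labels spanning several bars cannot leak across the boundary. This is a minimal, dependency-free sketch; the function name and the `gap` parameter are my own illustrative assumptions, not a standard API, and it is a simplification of the purging/embargo schemes discussed in the literature.

```python
def walk_forward_splits(n_obs, train_size, test_size, gap=0):
    """Yield (train_idx, test_idx) lists where training data always precedes test data.

    `gap` bars are skipped between the end of training and the start of testing,
    so labels computed over several bars cannot straddle the boundary.
    """
    start = 0
    while start + train_size + gap + test_size <= n_obs:
        train_idx = list(range(start, start + train_size))
        test_start = start + train_size + gap
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx
        start += test_size  # roll the whole window forward by one test block


if __name__ == "__main__":
    for train_idx, test_idx in walk_forward_splits(20, train_size=10, test_size=3, gap=2):
        print(f"train {train_idx[0]}-{train_idx[-1]}  test {test_idx[0]}-{test_idx[-1]}")
```

With 20 observations, a 10-bar training window, a 2-bar gap and 3-bar test blocks, this yields two folds: train 0-9 / test 12-14 and train 3-12 / test 15-17. A fixed-length (rolling) training window is used here so old regimes eventually drop out; an expanding window is the obvious alternative.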

Successfully building ML models for trading is an extremely difficult discipline that requires a deep understanding of the markets, the idiosyncrasies of market data, statistics and programming. Most quant shops that run successful ML algos (they are quite rare) have dedicated data teams whose entire remit is to source and clean data. The saying “rubbish in, rubbish out” is very true. Even data providers like Reuters or Bloomberg frequently have crap data. We pay nearly 500k a year to Reuters and find errors in their tick data every week. Spot forex is a special beast because the market is decentralized: there is no exchange that could provide an authoritative price feed. Trades have been rolled back in the past, and if your data feed does not reflect this, you are effectively analysing junk data.
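
Most of that cleaning is domain work, but as a flavour of the very simplest automated check, here is a hypothetical filter that flags ticks deviating too far from the median of the preceding ticks. The function name, window and threshold are illustrative assumptions; a filter like this catches obvious spikes but would not detect subtler vendor errors or trades that were later rolled back.

```python
from statistics import median


def flag_bad_ticks(prices, window=5, max_rel_dev=0.02):
    """Return indices of ticks whose relative deviation from the median of the
    preceding `window` ticks exceeds `max_rel_dev` (e.g. 0.02 = 2%).

    The median is used because a single bad tick barely moves it, so one spike
    does not cause the following good ticks to be flagged as well.
    """
    flagged = []
    for i in range(window, len(prices)):
        ref = median(prices[i - window:i])
        if abs(prices[i] - ref) / ref > max_rel_dev:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    ticks = [1.1000, 1.1002, 1.1001, 1.1003, 1.1002, 1.2002, 1.1004, 1.1003]
    print(flag_bad_ticks(ticks))  # the 1.2002 spike at index 5
```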

I don’t even want to get started on the fact that trying to train an RNN on 5500 observations is folly. Did you treat the data in any way? The common way to regularise market data for ML is to resample it into information bars. That won’t work with daily data, so you should start off with actual tick data.
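
Information bars sample the market by activity rather than by clock time. As a minimal sketch of one variant, dollar bars, the function below closes a bar whenever the cumulative traded value (price × size) reaches a threshold. The `(price, size)` tick format, the `Bar` type and the threshold are assumptions for illustration. For spot FX, where reliable traded volume may not be available, a tick-count bar (close every N ticks) is a simpler variant of the same idea.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float
    volume: float
    dollar_value: float


def dollar_bars(ticks, threshold):
    """Group (price, size) ticks into bars that close once traded value >= threshold.

    A trailing partial bar that never reaches the threshold is dropped.
    """
    bars = []
    o = hi = lo = None
    vol = dollars = 0.0
    for price, size in ticks:
        if o is None:  # first tick of a new bar
            o = hi = lo = price
        hi = max(hi, price)
        lo = min(lo, price)
        vol += size
        dollars += price * size
        if dollars >= threshold:
            bars.append(Bar(o, hi, lo, price, vol, dollars))
            o = hi = lo = None
            vol = dollars = 0.0
    return bars


if __name__ == "__main__":
    ticks = [(100, 5), (101, 3), (99, 4), (102, 10), (100, 2)]
    for bar in dollar_bars(ticks, threshold=1000):
        print(bar)
```

Here the first three ticks add up to 1199 in traded value and form one bar, the single 1020 tick forms a second, and the last tick is left over as an incomplete bar.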

Nearly every starry-eyed junior quant goes in with the notion that you can just run some fancy ML models on some market data and get a star trading algo, and that a small handful of statistical tests will tell you whether your results are meaningful and whether your data has autocorrelation or mean-reverting properties. In reality, ML models are very difficult to train on financial data. Most statistical forecasting tools fail to find relationships, and blindly training models on past data very rarely results in more than spurious performance in a backtest.
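
For reference, the lag-k sample autocorrelation mentioned above takes only a few lines. The ±1.96/√n band used here is the usual approximate 95% bound under an i.i.d. null, which is exactly the kind of assumption that heavy-tailed, heteroscedastic returns violate, so treat it as a rough screen rather than a verdict. The function names are illustrative.

```python
import math


def autocorrelation(xs, lag=1):
    """Sample autocorrelation of xs at the given lag (standard biased estimator)."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs)
    cov = sum((xs[i] - mu) * (xs[i + lag] - mu) for i in range(n - lag))
    return cov / var


def looks_significant(xs, lag=1):
    """Rough screen: |rho_k| outside the approximate 95% band +/-1.96/sqrt(n) for i.i.d. data."""
    return abs(autocorrelation(xs, lag)) > 1.96 / math.sqrt(len(xs))
```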

I don’t want to discourage you by any means, but I’d start off with something easier than what you are proposing. Finance firms have entire teams dedicated to what you are trying to do and even they often fail to find anything.




Seconding this. I'm in touch with a bunch of smart coder/traders trying this, and nobody (to my knowledge) is getting their backtests to match their forward tests. To me, ML isn't optimised for this kind of problem. It might be possible to kludge it, but you won't know what it's doing.

My bot that tracks and trades momentum isn't as sexy, but it works.
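
The bot itself isn't described, so the following is only a generic illustration of a time-series momentum rule: go long when the trailing return over a lookback window is positive, short when it is negative. The function name and the 20-bar default lookback are assumptions. The signal at bar t uses prices up to and including t, so it should only be acted on from bar t+1 to avoid lookahead.

```python
def momentum_signals(prices, lookback=20):
    """Return +1 (long), -1 (short) or 0 for each bar from index `lookback` onward,
    based only on the sign of the trailing `lookback`-bar return."""
    signals = []
    for t in range(lookback, len(prices)):
        past_return = prices[t] / prices[t - lookback] - 1.0
        if past_return > 0:
            signals.append(1)
        elif past_return < 0:
            signals.append(-1)
        else:
            signals.append(0)
    return signals
```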


Thanks for the great feedback. I have no expectations for this other than the learning, and it's already been successful on that front. Just seemed like a fun thing to poke at when most other hobbyists seem to be doing image analysis and language modeling. I've crawled a couple of forums and I get that there are a lot of people out there who think they can readily use these techniques to make money. I doubt very much that this will be the outcome in my case :).

Right now I'm just trying to figure out how to treat the data: whether to normalize or stationarize, how to encode inputs, etc. The reason I'm working with daily prices is that the fantasy output of this would be a model that can inform a one-day grid trading strategy. It may very well be that daily prices won't work for this.
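
A common starting point for both questions is to stationarize by converting prices to log returns, then normalize each value using statistics from a trailing window only, so no future information leaks into the features. This is a minimal sketch under those assumptions; the function names and window are illustrative, and López de Prado's book also discusses fractional differencing as a way to keep more memory than plain returns do.

```python
import math
from statistics import mean, stdev


def log_returns(prices):
    """Convert a price series into log returns, a (roughly) stationary transform."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def trailing_zscores(xs, window):
    """Standardise xs[i] with the mean/std of the preceding `window` values only.

    `window` must be >= 2 (stdev needs two points). The first `window` values
    have no full history, so they produce no output.
    """
    out = []
    for i in range(window, len(xs)):
        past = xs[i - window:i]
        sd = stdev(past)
        out.append((xs[i] - mean(past)) / sd if sd > 0 else 0.0)
    return out
```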


Whether there's anything like an equilibrium in cryptoasset markets, where there are no underlying fundamentals, is debatable. While there's no book price, PoW coin prices might be rationally describable in terms of (average estimated cost of energy + cost per GH/s + 'speculative value').

A proxy for energy costs, chip costs, and speculative information
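
To make that decomposition concrete, one way to write it per coin is sketched below. None of these symbols are standard; they are ad hoc labels for illustration: P_t is the coin price at time t, c_{E,t} the average electricity price per kWh, E_t the network's estimated energy use per period (kWh), c_{H,t} the amortised hardware cost per GH/s per period, R_t the network hash rate (GH/s), B_t the number of coins issued per period, and S_t the residual "speculative value".

```latex
P_t \approx \underbrace{\frac{c_{E,t}\,E_t}{B_t}}_{\text{energy cost per coin}}
      + \underbrace{\frac{c_{H,t}\,R_t}{B_t}}_{\text{hardware cost per coin}}
      + S_t
```

Each of the first two terms is (cost per period) / (coins per period), i.e. currency per coin, so the terms add consistently; S_t simply absorbs whatever the cost terms do not explain.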

Are there standard symbols for this?

Can cryptoasset market returns be predicted with quantum harmonic oscillators as well? What NN topology can learn a quantum harmonic model? https://news.ycombinator.com/item?id=19214650


"The Carbon Footprint of Bitcoin" (2019) defines a number of symbols that could be standard in [crypto]economics texts. Figure 2 shows the "profitable efficiency" (which says nothing of investor confidence and speculative information, or of how we may overvalue the security (in 2007-2009)). Figure 5 lists upper and lower estimates for the BTC network's electricity use. https://www.cell.com/joule/fulltext/S2542-4351(19)30255-7

Here's a cautionary dialogue about correlative and causal models that may also be relevant to a cryptoasset price NN learning experiment: https://news.ycombinator.com/item?id=20163734


Cool stuff, and I didn't mean to discourage you at all. Some of the most interesting challenges in datascience arise in finance.

Forex perhaps is just a pathologically tricky beast to trade well, even though it is the easiest to access. I think perhaps cryptos would be an easier start in terms of there being more inefficiencies and autocorrelation in the market.

In terms of data treatment, I recommend starting with Marcos López de Prado's Advances in Financial Machine Learning. I don't agree with some of his methods, but it is a practical book that highlights a lot of the issues you'll face. You can then draw your own conclusions on how to treat them.


Thanks for the recommendation, it's a great book and I've already gotten some value, as well as some perspective, out of it.


All of this is true, but I'll point out that for crypto, your competition is "I heard it was hot on Reddit/Telegram", and if it's not that it's often "I buy $X worth of Bitcoin on the first of every month". I suspect that a random algorithm (like, literally buys & sells random amounts of cryptos at random times) would do better than the average crypto trader, simply because the "I heard it was hot" strategy inherently leads towards buy-high-sell-low behavior.
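
For concreteness, the "fixed amount on the first of every month" approach is dollar-cost averaging. This small sketch (function name and inputs are illustrative assumptions) only shows its arithmetic and makes no claim about how it compares with other strategies.

```python
def dollar_cost_average(prices, amount_per_buy):
    """Spend a fixed cash amount at each price; return (units_bought, average_cost_per_unit)."""
    units = sum(amount_per_buy / p for p in prices)
    total_spent = amount_per_buy * len(prices)
    return units, total_spent / units


if __name__ == "__main__":
    units, avg_cost = dollar_cost_average([10.0, 20.0], 100.0)
    print(units, round(avg_cost, 2))
```

Because a fixed cash amount buys more units when the price is low, the average cost per unit works out to the harmonic mean of the purchase prices, which is never above their arithmetic mean (13.33 vs 15 for prices of 10 and 20).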



