Multi-Horizon Forecasting for Limit Order Books (arxiv.org)
32 points by ArtWomb 28 days ago | 9 comments

The problem with using neural networks in market microstructure is the latency at inference time. Market makers and HFTs need to compute decisions on the order of microseconds. That’s not feasible with large, deep networks.

With specialized hardware you can get close. But you’re still talking about a mid-single digit number of microseconds on inference alone. The competitor using linear models can get down to hundreds of nanoseconds. If you’re in FPGA world, that kind of latency advantage is worth way more than a 30% accuracy improvement from using a complex ML model.

This describes one extreme of the spectrum: go fast but be dumb. As far as I know this works well for many people. There are other groups of people going a bit slower but making more informed decisions. I think of it as a scatter plot with time on one axis and smartness on the other. As long as you are sitting on the Pareto front, you can make money.
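To make the scatter-plot picture concrete, here is a minimal sketch of picking out the Pareto front from a set of (latency, edge) points. The strategy names and numbers are invented purely for illustration, not measurements, and `pareto_front` is a hypothetical helper. A strategy is treated as dominated if some other strategy is at least as fast and at least as smart, and strictly better on one of the two.

```python
def pareto_front(strategies):
    """Return names of strategies that no other strategy dominates.

    Each entry is (name, latency_us, edge). Lower latency is better,
    higher edge is better.
    """
    front = []
    for name, latency, edge in strategies:
        dominated = any(
            other_latency <= latency
            and other_edge >= edge
            and (other_latency < latency or other_edge > edge)
            for _, other_latency, other_edge in strategies
        )
        if not dominated:
            front.append(name)
    return front


# Made-up points, for illustration only.
EXAMPLE_STRATEGIES = [
    ("linear_fpga", 0.3, 1.0),
    ("nn_fpga", 5.0, 1.3),
    ("nn_software", 15.0, 1.2),  # slower AND less edge than nn_fpga
    ("big_model", 500.0, 2.0),
]

if __name__ == "__main__":
    print(pareto_front(EXAMPLE_STRATEGIES))
```

With these invented numbers, `nn_software` drops off the front because `nn_fpga` beats it on both axes; everything else survives.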

Such a front exists, but the problem is there's a big discontinuity at one round trip. If you add 2-4 extra us, you need a sizeable edge advantage to justify that.

Furthermore... the HFT market participants are not using CPU-intensive calculations to win consistently. They are using simple calculations (e.g. a 6-period SMA) and extremely low latency to win. They are competing with other HFT participants to get their order on the inside bid/ask before everyone else.
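For a sense of how cheap that kind of calculation is, here is a rough sketch of a 6-period simple moving average kept as a running sum, so each new price costs one add, one subtract and one divide. `RollingSMA` is a hypothetical name, and this is Python for readability only; a real low-latency system would not be written this way.

```python
from collections import deque
from typing import Optional


class RollingSMA:
    """Fixed-window simple moving average with O(1) work per update."""

    def __init__(self, window: int = 6):
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self.values = deque()
        self.total = 0.0

    def update(self, price: float) -> Optional[float]:
        """Add a price; return the SMA once the window is full, else None."""
        self.values.append(price)
        self.total += price
        if len(self.values) > self.window:
            # Drop the oldest price from the running sum.
            self.total -= self.values.popleft()
        if len(self.values) < self.window:
            return None
        return self.total / self.window


if __name__ == "__main__":
    sma = RollingSMA(window=6)
    for p in [100.0, 100.5, 101.0, 100.5, 100.0, 99.5, 99.0]:
        print(p, sma.update(p))
```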

At its core, macro-level algorithmic trading is answering a question with only two possible answers at any point in time: will the next tick be "up" or "down"?

What is the special hardware/setup you referred to that achieves a mid-single-digit number of microseconds of latency for deep learning inference?

My understanding is that O(5 uS) is achievable on optimized FPGAs with reasonably large networks. Because of the parallelization, large networks don’t add that much more latency as long as you have enough gates. But I have little experience on FPGA stacks, so can’t say for sure.

Even in software, I’ve been able to hit O(15 uS) using optimized FANN libraries. But the nets are far from deep, and pretty ruthlessly pruned and compressed. Another trick that helps is pre-differentiating across all the variables you don’t expect to change on a latency-critical event. E.g. if you’re running a liquidity-taking strategy, you can pre-differentiate assuming the opposite touch size and deep book stay constant, because you’re only going to act following an aggressor trade at the touch.
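One possible reading of the pre-differentiation trick, sketched below under loud assumptions: off the critical path, run the full network at the current book state and take its partial derivative with respect to the one input you expect to move; on the event, replace the forward pass with a first-order update. The tiny tanh network, its weights, and the helper names (`forward`, `precompute`, `fast_predict`) are all hypothetical and not taken from FANN or from the comment above.

```python
import math

# Hypothetical tiny network: 3 inputs -> 2 tanh hidden units -> 1 output.
W1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
B1 = [0.0, 0.1]
W2 = [1.0, -1.5]
B2 = 0.05


def forward(x):
    """Full forward pass (the slow path)."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, B1)
    ]
    return sum(w * h for w, h in zip(W2, hidden)) + B2


def precompute(x0, fast_idx):
    """Off the critical path: output at x0 and d(output)/d(x[fast_idx]) at x0."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x0)) + b)
        for row, b in zip(W1, B1)
    ]
    y0 = sum(w * h for w, h in zip(W2, hidden)) + B2
    # Chain rule through tanh: d tanh(z)/dz = 1 - tanh(z)^2.
    grad = sum(
        W2[k] * (1.0 - hidden[k] ** 2) * W1[k][fast_idx]
        for k in range(len(W2))
    )
    return y0, grad


def fast_predict(y0, grad, x0_fast, x_fast):
    """On the event: one multiply-add, assuming every other input is unchanged."""
    return y0 + grad * (x_fast - x0_fast)


if __name__ == "__main__":
    x0 = [1.0, 0.5, 0.2]  # made-up feature vector; index 0 is the "fast" input
    y0, grad = precompute(x0, fast_idx=0)
    print(fast_predict(y0, grad, x0[0], 1.01), forward([1.01, 0.5, 0.2]))
```

The approximation is only as good as the linearization: it holds for small moves in the fast input and only while the inputs assumed constant really do stay put.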

> My understanding is that O(5 uS) is achievable on optimized FPGAs with reasonably large networks.

Putting aside whether it's technically possible, do you know if any groups are actually having good success with this approach (NNs on microstructure) in live trading?

What about use over longer time horizons? The paper seems to be geared for longer predictions.

The longest time horizon discussed in the paper is 100 ticks, or trades. Likely less than a few minutes.
