Choosing Java instead of C++ for low-latency systems (stackoverflow.blog)
40 points by caution 12 days ago | 88 comments

> any excess latency that Java introduces into your software is likely to be much smaller than existing latency sinks, such as network communication delays, in (at least)

In a co-located datacenter with high performance network equipment this isn't true at all. The time from my CPU to a kernel bypass NIC to a cut-through switch to the exchange's FPGA gateway is in the low single-digit microseconds.[1][2]

[1] https://www.eurexchange.com/resource/blob/48918/ba7e2c5900f1...
[2] https://www.arista.com/en/products/7150-series-network-switc...
[3] https://technologyevangelist.co/2015/11/04/1-44-microseconds...

> The time from my CPU to a kernel bypass NIC to a cut-through switch to the exchange's FPGA gateway is in the low single-digit microseconds

Single digit microsecond latency is also what you should expect from a well written market data gateway or a fix engine. Java definitely competes with C++ in this space.

I'm not saying it's impossible for the JVM to compete at that level. But you definitely don't get O(5 uSec) performance out of the box. The author is trying to argue that language considerations are irrelevant, because network latencies are orders of magnitude higher. That may be true for web networking, but it's definitely not true for HFT networks.

If you hit a GC pause or have to do heap allocation or don't vectorize a calculation because you're not running native code, it all costs way more than the network stack. Again, I'm not saying that there aren't techniques to deal with that in Java. But what I am saying is that you can't just ignore those issues, like the author seems to suggest.

> But you definitely don't get O(5 uSec) performance out of the box.

Tick-to-trade? The first version of our trading system, written in Java, had this level of performance.

> If you hit a GC pause or have to do heap allocation

We ended up running the code through a profiler on our CI system, which would fail the build if it detected any code paths that could trigger GC. Object pools and flyweights everywhere.
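A minimal sketch of the kind of preallocated object pool described above (class and field names are made up for illustration):

```java
import java.util.ArrayDeque;

// Minimal preallocated object pool: all Order objects are created at
// startup, so the hot path never allocates and never produces garbage.
final class OrderPool {
    static final class Order {
        long price;
        long quantity;
        void reset() { price = 0; quantity = 0; }
    }

    private final ArrayDeque<Order> free = new ArrayDeque<>();

    OrderPool(int capacity) {
        for (int i = 0; i < capacity; i++) {
            free.push(new Order());          // all allocation happens up front
        }
    }

    Order acquire() { return free.pop(); }   // hot path: no allocation

    void release(Order o) {
        o.reset();                           // scrub state before reuse
        free.push(o);
    }
}
```

Real systems add exhaustion handling and often make pools single-threaded or lock-free, but the core idea is just this: the `new` keyword never appears on the hot path.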

> We ended up running the code through a profiler on our CI system, which would fail the build if it detected any code paths that could trigger GC. Object pools and flyweights everywhere

Pretty standard practice in HFT Java, but the GP correctly points out this is not idiomatic Java.

Agreed. From the article:

> In other words, it’s possible to write Java, from the machine level on up, for low latency. You just need to write it like C++, with memory management in mind at each stage of development.

That's the tl;dr for the whole article.

Agreed, low latency Java is very far from standard Java programming.

> or have to do heap allocation

FYI, heap allocation in Java is much cheaper (~11 cycles) than in C++ as it's just a bump-the-pointer in TLAB.


Well, my point is that in C++ objects can be allocated on the stack.

Java can allocate on the stack too or perform partial allocation of exploded objects thanks to escape analysis.
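As a rough illustration of the escape-analysis point (names are hypothetical, and whether the allocation is actually eliminated depends on JIT inlining decisions; HotSpot has diagnostic flags and profilers to verify it):

```java
// Sketch: the Quote below never escapes midOf(), so HotSpot's escape
// analysis can scalar-replace it -- no heap allocation despite `new`.
final class Quote {
    final double bid, ask;
    Quote(double bid, double ask) { this.bid = bid; this.ask = ask; }
    double mid() { return (bid + ask) / 2.0; }
}

final class Pricer {
    static double midOf(double bid, double ask) {
        Quote q = new Quote(bid, ask); // candidate for scalar replacement
        return q.mid();                // q never leaves this stack frame
    }
}
```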

Thanks, didn't know that. Will have to look into it.

FWIW, there's actually a large amount of industry experience in ultra-low-latency Java kicking around the ATS/exchange/HFT space. Island's ECN was Java, and that incubated a lot of this kind of talent.

It's absolutely not your everyday java (I've heard the answer to "how do you handle gc's" is just "don't allocate memory") but it's a Thing.

Yup, Java is common among low-latency-ish market makers like Goldman Sachs, UBS, HSBC, Morgan Stanley and others.

Edit: since I'm being down-voted for this comment - I worked in all the above banks on low-latency Java projects. That you don't like the language/runtime doesn't change the facts.

The question is, once you are no longer writing anything close to idiomatic Java, does the (sketchy) argument that Java is easier to write than C++ still hold?

It still is, as there's plenty of non-low-latency-related stuff that you still need from the ecosystem, such as portability, dependency management, common libraries and utilities, debuggers, profilers, etc.

but if any of those common libraries allocate short lived objects, doesn't that ruin things?

Thanks, this makes sense.

Furthermore, if all your competitors have to incur the same fixed latency, then this is irrelevant. Just because one system is 200mics and another is 201mics doesn’t mean the 201mic one is good enough. HFTs are competing on what they can control, because if that trade comes with 199mics of network latency, it means the first firm will make a good trade every time, and the second will be shut out.

This is true, but also misses a bigger point: in the particular case of low latency trading, once everyone has similarly fast network infrastructure, the only place left to improve is the time spent executing the algorithm on the CPU. Certainly not all strategies are so latency-sensitive, but for those that are, even a small speed difference there can have a significant economic impact.

> Well, C++ might be “low latency” when it comes to executing code, but it’s definitely not low latency when it comes to rolling out new features or even finding devs who can write it.

Every time one of these articles roll around it's always the same point. At the end of the day some systems actually do need low-latency in real time and you will be steamrolled by competitors if it is not.

And every time someone makes this point I ask them what the latency of their destructor chains are and no one seems to have measured.

If you don't live and die by a profiler, then you're flying blind.

I've seen a lot of claims that "our system absolutely requires low latency" and then when you dig, it turns out that there is a whole lot of praying under there.

That doesn't invalidate your point, to be fair. But we keep talking about all these mythical hard-real time applications and every time I explore, that actual space gets smaller and smaller.

edit: just to be clear, I brought up destructor chains because they are a potentially large, potentially unpredictable cost in the code that may profoundly depend on the heap layout that is not immediately visible looking at the code. They are something that you need to carefully think about and measure to understand their true cost.

Is C++ that unusual these days? It's the world I live in, so I wouldn't know. I wouldn't expect it to be in web stuff, backend or client side, but I figured anything embedded big enough to have an OS instead of just a BSP would be using C++. All those AAA game titles use it, I'd assume. And most operating systems and drivers. And probably at least a quarter of the native applications for desktop or mobile. The college I went to doesn't teach languages, but they have favored languages for assignments. When I was there it was C or Java, but starting a decade ago when Bjarne joined the department C++ became the favored one, so all those grads are at least familiar with it.

The problem is that low latency is not absolutely defined, and it's a bit of a buzzword, so people who don't know the specific niches just don't get why Java won't cut it and you need C++, Rust, or similar to meet targets there.

The most realistic definition of a real-time system that I have heard is: it is an RT system when your results are ready when they need to be.

That's it. That's what an RT system is. I don't care if your latency is 1us, 1ms, 1s or 1h. If you can't miss the deadline then your system is a real-time one.

Low latency then depends on how your strategy works and your particular needs. It's useless to have sub-ms timing of stock market data if your results take seconds to calculate.

Of course, your strategy that takes 1s might be more lucrative than one that takes 10ms to calculate. Then it's up to you and your algos.

That’s correct: a real-time system can be slower, if that’s what it takes to be deterministic. And usually that is one of the things it takes.

I have never understood why people might think it ought to be faster.

It is another classic confusion between a definition in technical jargon (maybe even a formal definition) and the vernacular. Real-time for most people means "instantaneous," and definitions of latency are not considered. So it is shorthand for "low-latency" without any rooted understanding of latency at all.

I don't understand all this confusion about what RT means; this is the definition of an RT system. If you can't complete the computation within the given deadline, any partial result is discarded and the computation starts from scratch.

Decoding video and interactive rendering is a great example of an RT system. If you can't construct a frame within (usually) 16 ms you skip it and start with the next one.

There is a difference between hard real time and soft real time. In a hard real-time system, missing a single deadline may be a crash, whereas in a soft real-time system, missing a deadline just causes degradation. Video is definitely soft real time.

Hard and soft real time depend entirely on the context. An example of hard real time is flight control. The whole sensor - actuator - decision process must happen within the given threshold.

But decoding video in practice usually isn't real-time, because you still have to do it without real-time kernel support, at the whim of your kernel's scheduler, which is basically always the case.

It's a good example of a problem where a realtime system would be useful but it's not a good example of a real-life realtime application (usually).

You can write low latency Java but by the time you do, the C++ will have taken a similar amount of time.

The time to market argument is really stupid imo.

I really was hopeful that the article would move beyond this point. It even teased with a "In fact, there is good reason to question the idea that C++ is genuinely “faster” or has a “lower latency” than Java at all."

But then it just restates the development time point! (at least acknowledging that is absurd this time)

> First, there’s the (slightly absurd) point that if you have two developers, one writing in C++ and one in Java, and you ask them to write a platform for high-speed trading from scratch, the Java developer is going to be trading long before the C++ developer.

For my, possibly limited, personal experience writing trading systems, fighting UB in C++ is not even in the top 10 of the issues you have to deal with. The vast majority of the problems are business level issues.

C++ also allows expressing complex concepts in types without overhead (can you have a price abstraction in Java with no overhead? Can you write an efficient fixed-point numeric type? Can you post lambdas between threads without memory allocations?).

> Well, C++ might be “low latency” when it comes to executing code, but it’s definitely not low latency when it comes to rolling out new features or even finding devs who can write it.

Yikes, this article is terrible.

Turns out javascript was the lowest latency language all along!

"In the grim darkness of the far future... there is only javascript"

In the joyful light of the distant future... there is only WASM!

Not an exaggeration. I expected some technical reason why if you carefully write your code to avoid garbage collection then GraalVM can beat C++ in some cases (or something like that).

But no. Literally just "actually, C++ is better, but Java is easier so you should use that instead".

> You just need to write it like C++, with memory management in mind at each stage of development.

Having worked on a 'low latency' trading system written in Java, this is true. Ultimately you end up using a flyweight pattern over shared memory, and you have to be aware of alignment and cache sizes. Then there are other tricks like zero GC, or pushing the collection back to once a day.
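The flyweight-over-shared-memory approach described above can be sketched roughly like this (the message layout and names here are made up for illustration; real systems typically use a codec such as SBE over a memory-mapped buffer):

```java
import java.nio.ByteBuffer;

// Flyweight sketch: one long-lived object is re-pointed at successive
// fixed-layout messages in a (possibly shared/memory-mapped) buffer,
// so decoding a message allocates nothing.
final class TradeFlyweight {
    private static final int PRICE_OFFSET = 0;   // int64
    private static final int QTY_OFFSET   = 8;   // int32
    static final int LENGTH               = 12;

    private ByteBuffer buffer;
    private int offset;

    TradeFlyweight wrap(ByteBuffer buffer, int offset) {
        this.buffer = buffer;                    // no copy, just re-point
        this.offset = offset;
        return this;
    }

    long price()    { return buffer.getLong(offset + PRICE_OFFSET); }
    int  quantity() { return buffer.getInt(offset + QTY_OFFSET); }
}
```

The same flyweight instance is reused for every message in the stream, which is where the zero-allocation property comes from; alignment and byte order of the layout have to match the producer, per the comment above.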

You also spend a fair amount of time analyzing performance to the nanosecond looking for, and eliminating, jitter.

I've also written the same in C++, and it's a similar exercise, along with dashes of [[unlikely]], template specialization, and far too much time inside godbolt.org looking at assembler to see if gcc/clang has "optimized" something in the latest release.

In your experience, which one was more pleasant to write? Did java allow for iterating quicker?

> which one was more pleasant to write? Did java allow for iterating quicker?

Java. And I'm a career C++ programmer.

The whole system was built to be event driven (messaging) from the ground up too. It had a lot of benefits like performance testing code without instrumentation, the whole junit/mockito ecosystem, code completion in Intellij that actually worked. You only had to worry about code errors, not linker errors. We had over 100,000 unit tests.

We leveraged the whole JVM and ecosystem too. Outside of the hot path code, we could just go and use JSON libraries, webservers, and so on. In C++ we'd be struggling (not so much these days) to throw together a JSON/REST/ws webserver in the same code base.

Though Java always feels like trying to unwrap a present whilst wearing mittens. It's excessively verbose. We didn't pick Kotlin, as at the time it was known to emit various extra bytecode for seemingly trivial calls and was only on v1.0.

Thank you (and the sibling as well) for the comment. As a non-Java programmer, I'm probably underestimating the value of the java ecosystem.

Not the OP but we seem to have very similar experience developing low latency HFT systems in both languages.

Java is much more comfortable to work with when you need to rapidly deliver some latency-sensitive functionality, doesn't blow up with segfaults if you do something stupid and has awesome profilers, debuggers, IDEs, dependency management, etc.

Where it loses compared to C/C++ is when you need to have absolute control over the emitted native code (this is getting very close to solved with Graal) or when you need to prevent deoptimization storms when hitting an uncommon branch in the middle of the trading day (there are ways to prevent or limit it but it's ugly).
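One common mitigation for the deoptimization problem is to deliberately exercise the uncommon branches during pre-open warmup, so the JIT compiles both paths before the trading day starts. A hedged sketch (interface and names are invented for illustration; this reduces the risk rather than eliminating it):

```java
// Sketch of warming up an uncommon branch before trading starts, so the
// JIT compiles both paths and a mid-day deoptimization is less likely.
final class Warmup {
    interface Handler { long onEvent(boolean halted, long px); }

    static void warm(Handler h) {
        for (int i = 0; i < 20_000; i++) {
            h.onEvent(false, i);   // common path
            h.onEvent(true, i);    // rare "halted" path, forced here
        }
    }
}
```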

Almost every point of significance in this article was subjective. You can go low-level and faster in Java as easily as you can go high-level, safer and slower in modern C++.

Well I did trading systems in both of these, and I think it's much more natural to do it in C++.

The thing about banks doing systems in Java doesn't really convince me, the times I was facing a Java engine it didn't really matter how fast the thing was, due to the particular thing we were doing. (Not everything is a speed race).

The thing about C++ is you get ultimate control, but you have to use it. For instance if you just write a dummy application where you populate a hashmap and pull out some values, it's easy to do this slower than a managed solution, because C++ allows you to leave out the allocator, meaning you end up using a default one that isn't so great. On the surface it will look like any old hashmap, because if you've looked at a few languages that kind of thing will tend to look like `map<mytype>` in most languages, but there's actually a way to say `map<mytype, myallocator>` in C++.

I find it's actually easier to perf debug c++ than managed languages. With managed, you end up having to think about how the GC works, and the GC is actually not a very simple piece of software. You can sidestep the GC, but then you're not in a very idiomatic Java. There's a bunch of extra rules you have to adhere to.

> In other words, it’s possible to write Java, from the machine level on up, for low latency. You just need to write it like C++, with memory management in mind at each stage of development.

If you do that, you lose on a lot of stuff in C++ that supports that. For example, destructors/RAII make managing memory resources a lot better. Tooling like Valgrind, ASan, etc make it easier to find issues in your memory management. With Java, you don’t have those. The vast majority of Java is written with GC in mind, and when you go off the beaten path and do memory management on your own, you won’t have the language features and tools to support you, making it harder to do than in C++.

> People who enjoy coding in C++ (all three of them)

I have coded both in Java and C++. I enjoy coding in C++, and I detest with a passion coding in Java.

I'm the opposite. When I code in C++ I have to spend far too much time thinking about the language rather than the problem at hand.

In C++ you spend time wondering "who owns this allocation?". In Java you spend time wondering "what exactly is ConfigurationGeneratorFactoryInterface?".

Clearly the solution is Rust. :-P

Knowing who owns an object is a solved problem in C++, most codebases I've seen use unique_ptr. Rust isn't the only language with ownership.

> a solved problem

Quite an exaggeration. I've never seen a C++ codebase that doesn't resort to bare references or pointers at some point. None of them exclusively use unique_ptr and shared_ptr.

Raw pointers and raw references are non-owning, shared_ptr and unique_ptr are owning. Ownership is solved in modern C++.


Writing stuff like that isn't mandatory.

No, but most of the Java ecosystem is written like that. Are you going to write your own logging system, HTTP server, GUI framework, etc? Of course not. So you're stuck with `ConfigurationGeneratorFactoryInterface`.

I've read so many of these "Java is actually better/faster/lighter than you think" articles; I now always expect Java to be the right choice whenever I see a headline like that.

Because of the domain name, I expected this to be a quality blog post from the stackoverflow engineering team--they always have good content.

But this website appears to be a place for freelance writers to post articles of somewhat lesser quality. I wonder what happened? Appears the Engineering Blog is now here: https://stackoverflow.blog/engineering/

This is called "content marketing".

What the author is describing is not a “low latency system” with regards to finance. That term is used to describe a system that will successfully make a trade that is regarded as a good trade at the time of a signal. For example, if an order comes in to sell XYZ, and Optiver/Citadel/Virtu all want to buy it, a low latency system is one where you can buy that order at least some of the time.

In one sense, the author is right that C++ based systems do not suffice to trade that order, but he is very wrong that Java ones do.

His justification that banks use Java is not compelling. Morgan Stanley is the only bank that is remotely competitive technologically. And his justification that dev time is faster on Java is just ridiculous. It’s even faster on Python, but we aren’t talking about dev time. We’re talking about building systems that can reliably execute ingress-egress in less than a microsecond. Even using kernel bypass cards and assembly doesn’t get you there. So, no, Java isn’t a good choice for low latency systems.

Reads like "I cannot code C++, only Java. Therefore Java is better!". Okay, probably I'm one of the three developers who enjoy coding in C++ and therefore biased?

The JVM and especially the GC are the definition of undefined behavior. You don't know what happens, when it happens, or how it happens. When GC is working fine you're likely doing okay; if not - good luck.

Are there numbers on this topic? Without them, an article is just fluff. And the previous article on C4 was the same.

The articles from Azul are supposed to be a "goldmine". I've randomly opened two of them:

- https://www.azul.com/low-latency-effect-application-performa...

- https://www.azul.com/garbage-collection-application-performa...

and laughably, they recycle the same (stock) diagram, which, by the way, has no unit on the Y axis.

I'd love to see a GC that is as high-performing as manual memory management. But all these articles are meaningless without numbers (and a rigorous methodology).

Maybe someone has more "real" experience than me on this, but the article says :

> Since IDE support for Java is much more advanced than for C++, most environments (Eclipse, IntelliJ, IDEA) will be able to refactor Java. This means that most IDEs will allow you to optimize code to run with low latency, a capability that is still limited when working with C++.

He cites IntelliJ IDEA as a powerful Java IDE, but JetBrains does have a C++ IDE which is quite advanced (CLion) and whose refactor function always impressed me. I'm sure Visual Studio has some pretty impressive refactoring capabilities by itself or with JetBrains plugins. Is it really that bad, and I've just not worked on large enough projects?

I thought this related thread was interesting with discussion about Java in the HFT space https://news.ycombinator.com/item?id=12051442

This is a terrible article, no actual data, lots of qualifying adjectives.

However, even though at heart I still feel like a C++ programmer since I started my career before Java existed, I actually end up using C or C++ less and less. I started transitioning to C# (which I equate somewhat with Java) and Python for small projects but I'm finding that I'm spending most of my time there.

Things I still do in C or C++:

- embedded systems (small memory footprint, no MMU)

- OS-like work

- numerical algorithms, especially anything with pixel data or large matrices (the good stuff stitched together with Python code)

Unexpectedly, I don't miss manual memory management at all but still sometimes want something closer to RAII.

These HFT folks should really write a book on this. Writing in Java the traditional OO way doesn't work for extreme-performance. And there are way too many tricks that the everyday programmer simply doesn't know.

C++ and Java are like the "both kinds of music" in Blues Brothers.

I thought they were worlds apart, particularly the way you think when you write them.

C# and Java are more like the pair.

Sure, just like the bar keeper in BB would also say, country and western are worlds apart.

I think C++ is like western (cowboys and open ranges) and Java is like country (the setting is closer to civilization but still far from progressive, was originally aimed for western music audience).

Where do we throw the bottles?

This blog post is full of passive aggressive language, and my perception is that the author just can't grok C++

There are a lot of cases where java is faster or lower latency than C++ and this article doesn't list any.

A bit off topic but...

Why is sub 1ms latency so important in trading systems? For automated trading?

Because their business model is your little sister listening at the door, then running to tell everyone your news before anyone else can, so she can have the social capital gain of being first in the know.

You can't be the first to react to price changes unless you can hear about them sooner, process them faster, and issue responses sooner than everyone else. If your business model is "buying just before everyone else buys and drives the price up" and "selling just before everyone else sells and drives the price down", you need to turn yourself into a paperclip maximiser using ever more resources to try to out-compete the other companies doing the same.

Order priority. Some exchanges use pro-rata trade matching (i.e. you'll get a percentage of your order filled); others just use a FIFO queue. So if the price changes to something that you want to buy/sell at, and you get in first, you get matched, your order is filled, you win.

Speed of light is roughly 1 ft per nanosecond (in vacuum; slower in fiber or copper). So 1000 ft of network cable between you and the exchange server is on the order of 1 microsecond of latency, just on the wire.

Hence straight-line microwave towers, or co-located servers "in the same cabinet".

Then you have the latency from the wire to machine memory, so there's a whole industry of Solarflare-esque network cards that work entirely in userland to further reduce latency.
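The back-of-the-envelope wire numbers above can be sanity-checked; this sketch uses the vacuum figure (~0.98 ft/ns) and an assumed fiber velocity factor of about two-thirds:

```java
// Back-of-the-envelope wire latency: light travels roughly 1 ft/ns in
// vacuum and about two-thirds of that in optical fiber.
final class WireLatency {
    static final double FT_PER_NS_VACUUM = 0.9836;
    static final double FIBER_FACTOR     = 0.67;  // refractive index ~1.5

    static double fiberNanos(double feet) {
        return feet / (FT_PER_NS_VACUUM * FIBER_FACTOR);
    }
}
```

So 1000 ft of fiber is closer to 1.5 us than 1 us, which is part of why straight-line microwave (propagating at near vacuum speed) beats fiber between datacenters.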

High Frequency Traders are in a race with each other to react to news. The competition does sub 1ms so if you want to be competitive you need to be sub 1ms as well.

Yes. This sort of "trading" is a game, and the second finisher is the first loser.

Thankfully there are more honorable examples for low-latency systems, like medicine, games, planes, rockets, communication systems :)

More like sub-1 usec for high speed trading.

Yes. In front-running or market making your algo is either the fastest to act or might as well not even try because it would incur a net-loss.

A lot has changed since LMAX's first version. Nowadays this would have been written in Rust.

No numbers.

Java can be written with no garbage collection:

1. Pool the business objects and create them all ahead of time. Restore them to the pool after a thread uses them.

2. Create a String pool and use == for equality checks.

3. Avoid Java constructs that create temporary iterators. An indexed loop

    for (int i = 0, size = myList.size(); i < size; i++)

creates no iterator, whereas the enhanced for loop

    for (Object obj : myList)

does. You employ a profiling tool to find the garbage-creating sections of Java code and rewrite them.

I know of a Wall Street trade matching engine that operates with no pauses for garbage collection that uses these techniques.
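Point 2 above (a String pool plus ==) can be sketched like this; once every symbol flowing through the system comes from the pool, hot-path code compares references with == instead of char-by-char equals() (class name is made up):

```java
import java.util.HashMap;
import java.util.Map;

// Application-level String pool: intern symbols once at startup or on
// session open, never on the hot path. Afterwards, identity comparison
// (==) is valid and costs a single reference compare.
final class SymbolPool {
    private final Map<String, String> pool = new HashMap<>();

    String intern(String symbol) {
        String existing = pool.get(symbol);
        if (existing != null) return existing;  // return canonical instance
        pool.put(symbol, symbol);               // first sighting: canonize
        return symbol;
    }
}
```

An application-level pool is often preferred over String.intern() because the JVM-wide intern table's behavior and cost are outside your control.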

>Well, C++ might be “low latency” when it comes to executing code, but it’s definitely not low latency when it comes to rolling out new features or even finding devs who can write it.

It's certainly fast when adding more useless template trickery to the language instead of fixing real problems with it. I wish people never found out that templates are turing-complete.

What template trickery have they added in C++20 that you dislike? I thought the stuff added in C++11 was very useful for querying the type of the thing you are dealing with in a template, eg. is_same.

What "real" problems do you think it has, and how do you propose fixing them?

I don't have as negative sentiment as grandparent comment but I was disappointed by some parts.

C++20 added niebloids. Not exactly template trickery, but still another concept introduced to fix problems caused by another complicated feature, itself introduced to deal with yet other features' problems.

I guess niebloids are a reference to Eric Niebler. What does the term refer to exactly? The customization points?

edit: yes, first google hit. Apparently it is now a term of art :)

Good point!

niebloids sound like a medical disorder.

IIRC some safety guidelines for C++ forbid/restrict the use of templates (usual exception is for STD libs)

Edit: checked the JSF C++ standard and templates are allowed but not really "encouraged"

Why would I ever use Java for high-performance anything when C# is a better feeling language with native memory management support?

Edit: the author’s entire point is his claim (never backed up) that Java is better at optimizing away less-used branches in code. OpenMP lets the programmer do exactly that with C++, if the C++ compiler isn’t already superior.

What does OpenMP (the shared memory parallel runtime) have to do with optimizing away less-used branches? Am I missing something here or did you mean something else instead of OpenMP?
