
I agree: the disruptor is more about low latency. And the cost is very high: a 100% utilized core. This is a great trade-off if you can make money by being faster, such as in e-trading.



High-throughput networking does the same thing: it polls the network adapter rather than waiting for interrupts.

The cost is not high: it's much cheaper to have a CPU working efficiently than to have it process nothing because it's syncing caches and context switching to handle an interrupt.

These libraries are for busy systems, not systems waiting 30 minutes for the next request to come in.

Basically, in an under-utilized system, most polls find nothing, so the CPU spent on polling is wasted. In a high-throughput system there is almost ALWAYS data ready when you poll, so interrupts become the less efficient option once utilization is high.
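
To make the trade-off concrete, here is a minimal std-only Rust sketch (a generic illustration, not this library's API) contrasting a blocking, interrupt-style consumer with a busy-polling one:

    use std::sync::mpsc::{channel, TryRecvError};
    use std::thread;

    fn main() {
        let (tx, rx) = channel::<u64>();

        // Producer: a burst of messages, then the sender is dropped.
        thread::spawn(move || {
            for i in 0..1_000u64 {
                tx.send(i).unwrap();
            }
        });

        // Interrupt-style consumer would be: `for msg in rx.iter() { ... }`
        // -- the thread parks when the queue is empty and pays a wake-up
        // (scheduling) cost each time data arrives.

        // Poll-style consumer: spin on try_recv(). On a busy system almost
        // every poll finds data ready; on an idle one it burns a core
        // finding nothing.
        let mut sum = 0u64;
        loop {
            match rx.try_recv() {
                Ok(msg) => sum += msg, // stand-in for real message handling
                Err(TryRecvError::Empty) => std::hint::spin_loop(),
                Err(TryRecvError::Disconnected) => break, // producer gone, queue drained
            }
        }
        println!("processed sum = {sum}");
    }

On a loaded system the try_recv almost always returns data, so the spin costs essentially nothing extra; on an idle system it pins a core at 100% finding nothing.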


Running half the cores of an industrial Xeon or Zen under 100% load implies very serious cooling. I suspect that running them all at 100% load for hours is just infeasible without e.g. water cooling.


Nah, it will just clock down. Server CPUs are designed to support all cores at 100% utilization indefinitely.

Of course you can get different numbers if you invent a nonstandard definition of utilization.


Of course server CPUs can run all cores at 100% indefinitely, as long as the cooling can handle it.

With 300W to 400W TDP (Xeon Sapphire 9200) and two CPUs per typical 2U case, cooling is a real challenge, hence my mention of water cooling.


I disagree. Air cooling 1 kW per U is a commodity now. It's nothing special. (Whether your data center can handle it is another topic.)


Suppose I have a trading system built on Tokio. How would I go about using this instead? What parts need replacing?

Actually, looking at the code a bit, it seems like you could replace the select! statements with the various handlers and hook up some threads to them. It would indeed cook your CPU, but that's OK for certain use cases.
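
For illustration only, here is a rough std-only sketch of what "hook up some threads to the handlers" could look like; the Handler trait and event type are hypothetical placeholders, not this crate's actual API:

    use std::sync::mpsc::{channel, Receiver, TryRecvError};
    use std::thread::{self, JoinHandle};

    // Hypothetical event type and handler trait -- placeholders only.
    struct MarketEvent {
        price: f64,
    }

    trait Handler: Send + 'static {
        fn on_event(&mut self, event: &MarketEvent);
    }

    struct Logger;
    impl Handler for Logger {
        fn on_event(&mut self, event: &MarketEvent) {
            println!("saw price {}", event.price);
        }
    }

    // Hook a dedicated thread to a handler: it busy-spins on its input
    // queue, which is what "cooks" a core but keeps latency down.
    fn spawn_spinning<H: Handler>(mut handler: H, rx: Receiver<MarketEvent>) -> JoinHandle<()> {
        thread::spawn(move || loop {
            match rx.try_recv() {
                Ok(event) => handler.on_event(&event),
                Err(TryRecvError::Empty) => std::hint::spin_loop(),
                Err(TryRecvError::Disconnected) => break,
            }
        })
    }

    fn main() {
        let (tx, rx) = channel();
        let worker = spawn_spinning(Logger, rx);

        // Where an async design would select! over sources and await,
        // here the producer just pushes and the spinning thread picks the
        // event up immediately.
        for i in 0..10 {
            tx.send(MarketEvent { price: 100.0 + i as f64 }).unwrap();
        }
        drop(tx); // close the channel so the worker exits
        worker.join().unwrap();
    }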


I would love to give you a good answer, but I've been working on low-latency trading systems for a decade, so I have never used async/actors/fibers/etc. I would think it implies a rewrite, as async is fundamentally baked into your code if you use Tokio.


Depends on what "fundamental" means. If we're talking about how stuff is scheduled, then yes, of course you're right: either we suspend tasks and take a hit on when they resume, or we hot-loop and minimize latency at the cost of cooking a CPU.

But there's a bunch of the trading system that isn't that part. All the code that deals with the incoming exchange's message format might still be useful, and the internal messages might well keep the same format. The logic of putting events on some sort of queue for another worker (task/thread) to process seems pretty similar to me. You're just handling the messages immediately rather than waking up a thread for them, and that seems to be the trade-off.


These libs are more about hot paths, cache coherency, and allowing single-CPU processing (no cache-coherency traffic or lock contention) than anything else. That is where the performance comes from, referred to as "mechanical sympathy" in the original LMAX paper.
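
As a concrete example of that mechanical sympathy, one common trick is keeping the producer and consumer cursors on separate cache lines so the two cores don't fight over the same line. A minimal sketch (my own illustration, not the LMAX or this library's code):

    use std::sync::atomic::{AtomicU64, Ordering};

    // Pad each cursor out to its own 64-byte cache line. If the producer's
    // and consumer's counters shared a line, every write by one core would
    // invalidate the other core's cached copy (false sharing), causing
    // constant coherency traffic.
    #[repr(align(64))]
    struct PaddedCursor(AtomicU64);

    struct RingCursors {
        producer: PaddedCursor, // written only by the single producer core
        consumer: PaddedCursor, // written only by the single consumer core
    }

    fn main() {
        let cursors = RingCursors {
            producer: PaddedCursor(AtomicU64::new(0)),
            consumer: PaddedCursor(AtomicU64::new(0)),
        };

        // Single-writer principle: only one thread ever stores to each
        // cursor, so release/acquire ordering is enough and no lock is needed.
        cursors.producer.0.store(1, Ordering::Release);
        println!(
            "producer at {}, consumer at {}",
            cursors.producer.0.load(Ordering::Acquire),
            cursors.consumer.0.load(Ordering::Acquire)
        );
    }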

Originally computers were expensive and lots of users wanted to share a system, so a lot of OS design effort went into time-sharing. LMAX flips the script: computers are cheap, and you want one computer doing one thing as fast as possible, which isn't a good fit for modern OSes that were designed around the exact opposite idea. This is also why bare metal is many times faster than VMs in practice: you aren't sharing someone else's computer with a bunch of other programs polluting the cache.


Yeah, I agree. But the ideas of mechanical sympathy carry over into more than one kind of design. You can still be thinking about caches and branch prediction while writing async code. It's that awareness that allows you to make the trade-offs you care about.


Eh... not really. The main problem is that it becomes incredibly hard to reason about the exact sequencing of things (which matters a lot for mechanical sympathy) in the async world.



