Low Latency Performance Tuning for Red Hat Enterprise Linux 7

chollida1 · on Feb 11, 2015

I'd appreciate any tips, papers, websites, etc. that anyone can provide on tuning Linux for low latency.

I've already done the usual: moved the packet handling to userland and used InfiniBand hardware to connect machines.

I'm looking for the last few percent reduction in my latency; even hardware (switch/router) recommendations would be welcome.

My particular case is algo trading if it really matters.

And how do people actually profile at the nanosecond level?

The more I learn about low latency development, the more I realize how much I don't really know :(

I'm also open to chatting privately with anyone experienced in this area; my email is in my profile. The alternative is pulling someone out of Getco or Virtu, and that's always a tough task :)

EDIT: In response to the comment that people are unlikely to help: I've received 3 private emails in the span of 10 minutes, so someone out there is willing to do it.

In general I think people are better than you are giving them credit for.

I'd rather it be via this forum so others can learn but I'll take what I can get, and your point is well taken:)

Slartibreakfast · on Feb 11, 2015

You're probably already aware of it, but Martin Thompson of LMAX has done a lot of great work on HFT systems and his blog "Mechanical Sympathy" is excellent:

http://mechanical-sympathy.blogspot.com/

Although a lot of people think HFT is just a race to the bottom (and they are probably right) it's still great fun to try and wring every last nanosecond of performance out of your code.

SEJeff · on Feb 11, 2015

You're unlikely to find someone who would openly help you since you mentioned you work in the electronic trading industry. I speak from experience as a former employee of Virtu (and previously for 4 years when it was EWT / Madison Tyler). Learning is sometimes best through lots of research and experience.

stuntprogrammer · on Feb 11, 2015

Yes, exactly.

(I've built things for a few Chicago/New York HFT shops).

tw001 · on Feb 11, 2015

https://access.redhat.com/sites/default/files/attachments/20...

smm_latency · on Feb 11, 2015

In my experience, it's very difficult to achieve low latency on modern Intel processors (after Sandy Bridge) because of SMIs (System Management Interrupts):

https://en.wikipedia.org/wiki/System_Management_Mode

Some SMIs can be disabled with smictrl, but there usually remains an interrupt every 10-20 seconds that adds 100+ microseconds of latency. See the plot here:

http://wiki.linuxcnc.org/cgi-bin/wiki.pl?FixingSMIIssues

SMIs are used for fan / thermal region control and cannot be fully disabled.

jeremyeder · on Feb 11, 2015

"cannot be fully disabled"

...depends on the gear. SMIs used to be (4-5 years ago) a much larger problem than they are now.

I agree with you in that context, and it's why so few systems are certified for Red Hat's Realtime kernel. They are simply not all created equal.

But I'd encourage you to review the results of any of the 25+ benchmarks we did with STAC over the last few years.

We didn't see much (if any) SMI interference on the gear we had, which was regular off-the-shelf servers with WSM (Westmere), SNB (Sandy Bridge), IVB (Ivy Bridge) and HSW (Haswell) CPUs. All the hardware, software and config is disclosed within those benchmark write-ups.

There is some tooling called hwlat that can detect and report SMIs. It's in the rt-tests package.
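For readers who want a feel for how this kind of detection works, here is a rough sketch in Python. It is not the rt-tests tool and not how that tool is implemented internally; it only illustrates the general idea of spinning on a clock and flagging unexpectedly large gaps between consecutive reads. The function name `find_gaps` and the threshold are made up for illustration. In userspace the gaps will also include ordinary scheduler preemption and interpreter pauses, so treat the output as a noisy upper bound, not as an SMI count.

```python
import time

def find_gaps(duration_s, threshold_ns):
    """Spin on a monotonic clock and return every gap (in ns) between
    consecutive reads that exceeds threshold_ns.

    Hypothetical helper for illustration only; a real detector runs in
    the kernel so that normal interrupts and scheduling do not pollute
    the measurement.
    """
    gaps = []
    prev = time.perf_counter_ns()
    deadline = prev + int(duration_s * 1e9)
    while prev < deadline:
        now = time.perf_counter_ns()
        delta = now - prev
        if delta > threshold_ns:
            gaps.append(delta)
        prev = now
    return gaps
```

Calling something like `find_gaps(10.0, 100_000)` on an otherwise idle, isolated core would list stalls longer than 100 microseconds over ten seconds. A stall that shows up at a regular interval regardless of what the OS is doing is at least consistent with the periodic SMI behaviour described above.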

Happy tuning!

iand675 · on Feb 11, 2015

They neglect to mention that you have to have a Red Hat subscription to view the paper.

jeremyeder · on Feb 11, 2015

Really sorry about that, out of my hands :-(

vosper · on Feb 11, 2015

We're in the process of moving from Ubuntu to CentOS 7, with which I am largely unfamiliar. We're not in the finance sector, but latency does matter in our systems (probably not so much as it matters to the HFT people, though).

Can anyone tell me whether this guidance for RHEL 7 is likely to apply to CentOS 7?

jeremyeder · on Feb 11, 2015

Yes it will.

Although some of the tuning therein can be used to improve performance on any workload (handling NUMA, for example), it's probably not necessary for the majority of environments because it requires intimate knowledge of the hardware, software and application stack.
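As one small illustration of what "handling NUMA" can mean in practice, the sketch below restricts a process to the CPUs of a single NUMA node on Linux. This is an assumption about one common approach, not something taken from the paper. The helper names `parse_cpulist` and `pin_to_node` are hypothetical; the sysfs path and `os.sched_setaffinity` are standard on Linux. It only sets CPU affinity; an explicit memory policy would need a tool such as numactl.

```python
import os

def parse_cpulist(text):
    """Parse a kernel cpulist string such as '0-3,8-11' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        # A lone number like '5' has no upper bound, so reuse the lower one.
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus

def pin_to_node(node=0):
    """Restrict the calling process to the CPUs of one NUMA node (Linux only).

    Hypothetical helper: reads the node's CPU list from sysfs and applies
    it as the affinity mask of the current process.
    """
    path = f"/sys/devices/system/node/node{node}/cpulist"
    with open(path) as f:
        cpus = parse_cpulist(f.read())
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return cpus
```

Pinning before the application allocates its working set matters here, because Linux by default tends to place newly touched pages on the node of the CPU that first touches them.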

It also talks about disabling a bunch of power management which is really only necessary when you're chasing microseconds.
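To give one concrete example of that kind of power-management tuning, here is a hedged sketch using the Linux PM QoS interface at /dev/cpu_dma_latency. Whether the paper recommends exactly this mechanism is an assumption on my part. Writing a 32-bit latency bound in microseconds to that device and keeping the file descriptor open asks the kernel to avoid CPU idle states with a longer wakeup latency. The helper name `hold_cpu_dma_latency` is made up, and opening the real device normally requires root.

```python
import os
import struct

def hold_cpu_dma_latency(max_latency_us=0, path="/dev/cpu_dma_latency"):
    """Request a wakeup-latency bound from the kernel's PM QoS interface.

    Hypothetical helper. The request is only in effect while the returned
    file descriptor stays open; closing it (or exiting) drops the request.
    """
    fd = os.open(path, os.O_WRONLY)
    # The device accepts a native 32-bit signed integer, in microseconds.
    os.write(fd, struct.pack("=i", max_latency_us))
    return fd
```

A latency-sensitive process would call this once at startup and hold on to the descriptor for its whole lifetime. The cost is higher idle power draw, which is why it is usually only worth it when chasing microseconds.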

BTW, if you're interested in NUMA and memory management on RHEL7, my teammate Bill Gray wrote an awesome whitepaper:

http://rhelblog.redhat.com/2015/01/12/mysteries-of-numa-memo... (sorry, again login required)

OP: please consider RHEL :-)

FireBeyond · on Feb 11, 2015

Flagged for paywall (in this case an active RHN subscription).