Linux implementation of Homa, a protocol to replace TCP for low-latency RPC (micahlerner.com)
118 points by mlerner on Sept 7, 2021 | 34 comments



A couple of years ago I had this idea that most of the REST calls in my app could easily fit in a single packet if encoded in binary.

I rolled my own framework for fun that did REST-style communication but sans TCP, HTTP and JSON.

This improved latency and efficiency by a couple of orders of magnitude. I'd like something that did this that I could use directly, without having to write my own framework.
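The parent's framework isn't public, but the idea is easy to sketch. Below is a minimal, hypothetical version of a binary REST-style call over raw UDP; the wire format, field sizes, and method numbering are invented for illustration:

```python
import socket
import struct

# Hypothetical wire format (not from the comment above): method id,
# resource id, and payload length packed into a 7-byte header, followed
# by the payload -- no TCP, no HTTP, no JSON, one datagram per call.
REQ = struct.Struct("!BIH")  # method (1B), resource id (4B), payload len (2B)

def encode_request(method: int, resource: int, payload: bytes) -> bytes:
    return REQ.pack(method, resource, len(payload)) + payload

def decode_request(datagram: bytes):
    method, resource, n = REQ.unpack_from(datagram)
    return method, resource, datagram[REQ.size:REQ.size + n]

if __name__ == "__main__":
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    # The moral equivalent of "GET /users/42": 7 bytes of header,
    # comfortably inside a single MTU.
    client.sendto(encode_request(1, 42, b""), server.getsockname())
    datagram, addr = server.recvfrom(1500)
    method, resource, _ = decode_request(datagram)
    server.sendto(struct.pack("!I", resource) + b'{"id":42}', addr)

    reply, _ = client.recvfrom(1500)
    print(reply)
```

The request/response here is one datagram each way, which is where the latency win over a fresh TCP+HTTP exchange comes from; what you give up is the loss handling and congestion control the sibling comment mentions.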



Did the app before that make use of connection pooling and reuse? If yes, then the latency gap is typically not high. You can still send an HTTP request in a single IP packet as long as your framework doesn't do anything wrong. Obviously there will be a latency hit for new connections. But in return you get handling of packet loss.


This sounds really interesting, could you provide a bit more detail on the application? Mostly interested in:

- was it over a LAN or WAN?

- how were the latency improvements measured?

- was there a UI on the front end, or was this strictly a backend-to-backend scenario?


It was a DC environment.

I measured latency on the client, on the service, and on some very expensive cut-through switches. This shotgun method let me lazily detect where the latency comes from at a single glance. I understand not everybody has access to cut-through switching, but for most uses port mirroring is available on a lot of hardware.

Strictly backend.


This is about the fourth time something like this has been built.

QNX native networking over IP (protocol 106) is probably the closest to Homa. It's an RPC system, intended primarily for local networks, although it will extend across the Internet. Message based. No head of line blocking between different RPC calls. It's still used in Blackberry/QNX.

Is there a spec for Homa? Did they get a new protocol number from IANA? [1] This needs an RFC before it gets out into the wild.

[1] https://www.iana.org/assignments/protocol-numbers/protocol-n...


Either QUIC or RDMA already delivers this, depending on your definition of low latency.


I don't think the design goals of QUIC and this protocol overlap too much. QUIC achieves lower latency for connection establishment than TCP+TLS by merging the TCP and TLS handshakes and avoids additional handshakes by being able to multiplex multiple streams over a connection. But apart from that it behaves a lot like "multiple TCP streams". It isn't designed purely for short-payload RPCs - streams can carry gigabytes of data. It's also not tailored to the datacenter use-case, and actually might shine more outside of it. And obviously QUIC is encrypted by default, while this protocol doesn't seem to be.

That said, QUIC is rather flexible since implementations can be done in userspace, and a lot of what this protocol is about can probably be achieved by running a custom QUIC library. The GRANT messages can be emulated with window-size increments on streams, and the retransmissions just look like different ways of handling ACKs - so this might be emulatable by having some custom ACK behavior and an infinite congestion window on the sender (bandwidth is limited by the receiver via window-size increments). The usage of hardware-supported priorities might be a bit more tricky to map onto QUIC - but it might be doable too if stream data frames can be sent in different UDP datagrams using different priorities.

Whether that really works better than the default setup? I'm not convinced. The nice thing about standard TCP and QUIC is that it works with all kinds of traffic. Congestion controllers will allow coexistence with other traffic on the links ("oh, there was a 5Gbit/s log upload or software update download which used 50% of NIC capacity?"), and the congestion window adjusting itself over time will also mean that "small RPCs can be sent without waiting for any confirmation". It even means "medium RPCs can be sent at once, once we learned about the link capacity (congestion window)".
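The GRANT emulation described in this comment can be sketched as a toy credit loop. This is neither real Homa nor real QUIC - the constants and the message flow are invented for illustration of the receiver-driven idea:

```python
# Toy sketch of Homa-style receiver-driven flow control, the mechanism
# the comment suggests emulating with QUIC window-size increments.
RTT_BYTES = 4   # "unscheduled" bytes a sender may blind-fire without permission
GRANT_SIZE = 2  # credit the receiver hands out per GRANT

def transfer(message_len: int):
    """Return the sequence of (event, offset) steps for one message."""
    log = []
    sent = min(message_len, RTT_BYTES)  # unscheduled portion: no waiting
    log.append(("unscheduled", sent))
    granted = sent
    while sent < message_len:
        # Receiver paces the rest by extending credit (the GRANT /
        # window-increment role); sender then fills the new credit.
        granted = min(message_len, granted + GRANT_SIZE)
        log.append(("grant", granted))
        sent = granted
        log.append(("data", sent))
    return log

# A message that fits in the unscheduled budget needs no grants at all,
# which is the "small RPCs can be sent without waiting" property.
print(transfer(3))
print(transfer(9))
```

The key property this shows: small messages complete in one step, while larger ones are paced entirely by the receiver rather than by a sender-side congestion controller.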


Depends on your definition of RDMA, too -- if you mean the Infiniband kind, it includes message passing, which is much faster for short messages.


Not directly related to Homa, but I'd like to point out that the paper's author and presenter is none other than John Ousterhout, the original designer of the Tcl programming language [1].

[1] https://web.stanford.edu/~ouster/cgi-bin/home.php


Ousterhout also came up with RAMCloud, which, alas, never caught on. But you can find it on GitHub, and about 4 guys got their PhDs on it. He was an early proponent (mid-late 2000s) of cheap PC hardware at scale yielding a distributed K/V store with impressive timings.


I feel like Homa's competition isn't TCP, but instead is SCTP.

I know that SCTP isn't very popular, but it is an old standard and solves the head-of-line blocking issue. Linux does support SCTP, and maybe more expensive routers support it as well (honestly, I haven't experimented much with SCTP on networks...).

SCTP is built directly on top of IP. So the main issue with SCTP is firewalls / NAT / etc., which are built around TCP / UDP instead. Homa, being an alternative protocol, would be in a very similar position to SCTP (in that middleboxes wouldn't handle it), except it's decades younger.


Didn't IPv6 basically eliminate the NAT problem in DCs? Plus SCTP has actual router support in respectable routers. What SCTP lacks is userland support, something I dearly want too...


This heavily reminds me of Plan 9's IL[1] which was used to increase the performance of the 9p protocol, a file based RPC protocol, over local networks.

[1] http://doc.cat-v.org/plan_9/4th_edition/papers/il/


It would be really nice if articles about protocols, even if talking about implementation details, would include or link to a representative traffic sample / .pcap of said protocol.


How does this compare to the QUIC transport protocol? They both let you avoid head-of-line blocking.


Does any distro actually ship this protocol, or is it just an optional kernel module?


And if it's a kernel module, why is it done that way instead of putting it in DPDK which is faster anyway?


Switching over to DPDK is waaaaaaaaaaay harder than switching to a new type of socket.


Buying a Solarflare card, using LD_PRELOAD, and leaving the code as-is is relatively cheap, though. But that completely sidesteps the question of whether Homa should be implemented in userspace, of course.


The original version, at Stanford, was in DPDK.[1] This is a re-implementation at Google, right?

[1] https://github.com/PlatformLab/Homa


Moving to user-space is mentioned in the paper.


Can this be tunneled inside UDP?


Basically everything can be tunneled across UDP.

UDP is just a pair of port numbers, a length, and an optional checksum tacked on top of IP.

So any protocol designed on top of IP can almost certainly be used on top of UDP instead.
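To make that concrete, here is the entire UDP header per RFC 768, sketched in Python. The "Homa-ish" payload bytes are a made-up stand-in for whatever a raw-over-IP protocol would carry:

```python
import struct

# The whole UDP header is 8 bytes: source port, destination port,
# length, checksum (RFC 768). Tunneling "X over UDP" amounts to
# prepending these 8 bytes to whatever X would have put directly on IP.
def udp_encapsulate(src_port: int, dst_port: int, payload: bytes) -> bytes:
    length = 8 + len(payload)
    checksum = 0  # 0 means "no checksum" for UDP over IPv4
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload

# Stand-in for a raw-IP protocol's bytes (purely illustrative).
homa_ish_packet = b"\x10GRANT..."
datagram = udp_encapsulate(4000, 4000, homa_ish_packet)
assert len(datagram) == 8 + len(homa_ish_packet)
```

Eight bytes of overhead per packet is why "run it over UDP" is the standard escape hatch for new protocols facing NAT and firewall traversal (QUIC being the most prominent example).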


Nowhere on this site is encryption mentioned, and that probably makes it a non-starter: cryptographic data integrity is a must for modern RPC, if not also confidentiality.


This is a transport layer optimized for RPC, not a spec for RPC. TCP doesn’t specify encryption either. We already have means to apply encryption at higher or lower layers.


> TCP doesn’t specify encryption either.

TCP is very old. You would not do that today, and indeed they didn't: QUIC's cryptography is built in.

There are two reasons to want cryptography built in. One is in some sense political: "Pervasive Monitoring Is an Attack" is BCP 188, and we must defeat such attacks on the network by encrypting everything possible. The other is pragmatic: middleboxes induce protocol ossification, making innovation difficult or impossible. By encrypting everything we remove a key source of ossification - if the middleboxes can't understand the protocol, they also can't rust shut its extension points.


Ossification isn't really a problem for Homa's intended use case IMO; it's meant to be used on intra-DC networks with latencies already as low as 1-2us, and is optimized exclusively for this design space (lots of short messages, high load, etc). Everything on the path is likely going to be controlled and on your network already and won't be going near the WAN - probably only hitting the ToR switch / same aisle at worst, if I had to guess (I don't have any measurements on intra-rack tail latencies in DCs or anything, I'm just spitballing here). Third-party gear is a huge problem in the last mile, but not in the tightly controlled networks that something like Homa targets; to reap the benefits, you not only have to adopt a new socket type but also the RPC design to go with it. I suspect most people aren't considering this unless they already have strong network control with their own hardware/SDN.

Similarly, you could make an argument that because the tolerances on this interconnect are so tight, pervasive monitoring has a different risk profile. Two servers communicating via a ToR switch through a 40G NIC using Homa don't even have an opportunity to pass the packets elsewhere in the hierarchy to be snooped by someone else, unless it's just streaming packets directly into the three-letter-agency van in the parking lot, I guess. For all practical purposes, in a system like this the requirements are so tight you can only afford direct links between systems (or something close to that) if you want to maintain the desired operational behavior, so third parties appearing between two points might not be possible, much less likely.

That said I think including encryption would be good. But the reasons would be very pedestrian IMO: for one, it ticks off the annoying compliance checkboxes (both political and social) and means there's a single design for anyone who implements Homa to follow, so people don't have to painstakingly re-create it. And a corollary of that checkbox tick is that it nips threads like this one in the bud regardless of all the above. :P


> Two servers communicating via a ToR switch through a 40G NIC using Homa don't even have an opportunity to pass the packets elsewhere in the hierarchy to be snooped by someone else, unless it's just streaming packets directly into the three-letter-agency van in the parking lot, I guess.

Something like FASHIONCLEFT: your smart managed switch's firmware squirrels away summaries (e.g. it sees 400GB of data about Project Smith and notes [400GB, Project Smith]) and then later squirts those summaries over legitimate links to distant nodes, passing through another device with compromised firmware, which removes the extra data and gives it to the NSA.

The NSA values this because plausible deniability is essential to their work: if you realise you were compromised, you are likely to blame the destination, not them.


Technically true, but the GP is right in that someone is going to layer encryption on top of this anyway. And the performance of that pairing is what's going to matter eventually.


One of my distributed computing exercises back in the day was to add encryption to the stubs generated by Sun RPC tooling, the upper layers remained unmodified.


Encryption is typically layered on top of TCP, and less commonly on top of UDP, for good reason: connection state and transport ordering are a huge boon for cipher speed. CBC ciphers are by far the standard, because they're much easier to compute.

Comparatively, QUIC layers encryption directly into the protocol. I can't say how it deals with the compute cost because I'm not familiar with the protocol at that level, but it's a clear design decision - since handshakes are expensive, including encryption in session establishment has significant benefits. Perhaps encryption configuration is expected to be handled in a side channel, but it's odd that in a discussion so centered on what the next layer up is doing, this critical aspect is completely forgotten.


CBC is pretty much obsolete. Modern TLS uses CTR-based ciphers (AES-GCM or ChaCha20-Poly1305). CTR can be more efficient than CBC because encryption can be parallelized, while CBC has no advantages over CTR for this use-case.
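To illustrate why CTR parallelizes where CBC can't, here's a toy counter-mode construction using a hash as the keystream generator. This is illustrative only (real TLS uses AES-GCM or ChaCha20-Poly1305, and a real scheme also needs a nonce and authentication): the point is that block i's keystream depends only on (key, i), not on any previous ciphertext.

```python
import hashlib

BLOCK = 32  # one SHA-256 digest worth of keystream per block

def keystream_block(key: bytes, i: int) -> bytes:
    # Keystream for block i depends only on (key, i): blocks are
    # independent, so they can be computed in parallel. CBC, by contrast,
    # needs ciphertext block i-1 before it can encrypt block i.
    return hashlib.sha256(key + i.to_bytes(8, "big")).digest()

def ctr_xor(key: bytes, data: bytes) -> bytes:
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        chunk = data[i:i + BLOCK]
        ks = keystream_block(key, i // BLOCK)
        out += bytes(a ^ b for a, b in zip(chunk, ks))
    return bytes(out)

msg = b"counter mode blocks are independent" * 3
ct = ctr_xor(b"key", msg)
assert ctr_xor(b"key", ct) == msg  # XOR stream: decrypt == encrypt

# Block 2 can be decrypted without touching blocks 0 and 1:
block2 = bytes(a ^ b for a, b in zip(ct[2*BLOCK:3*BLOCK],
                                     keystream_block(b"key", 2)))
assert block2 == msg[2*BLOCK:3*BLOCK]
```

The same independence property is what lets CTR-based transports decrypt out-of-order packets, which matters for protocols like QUIC where datagrams can arrive in any order.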


Encryption also would likely change independently of the protocol issues.

Layers of communication exist for a reason: they allow each piece of the communication stack to upgrade independently. Instead of writing encryption into the spec, it's more useful to layer encryption on top of the spec.



