Homa, a transport protocol to replace TCP for low-latency RPC in data centers

ggm · on Aug 17, 2021

What's missing is a 2021 perspective on why this is or is not useful at scale. Note that Google does transform edge IP into some mythic internal protocol, which in part explains why Google won't do IPv6 direct to some things like GCP: they couldn't, for a given generation of hardware.

Basically: yes, within a DC which is heading to some kind of traefik or HAproxy or other redirection/sharder, this could make sense. So... how does this 2018 approach stack up in 2021?

ithkuil · on Aug 17, 2021

Fwiw IPv6 for Google cloud instances is finally GA (https://cloud.google.com/compute/docs/ip-addresses/configure...)

ggm · on Aug 17, 2021

Forgive me if I am wrong but I believe this is only binding v6 to the outside edge of VMs. Not GKE. And the document notes you cannot connect to Google API services over v6.

I don't want to overdo the curmudgeon thing: I'm really glad they've started to deploy dual stack, it's long overdue. And remember that Google has been a strong proponent of v6 in Android and in IETF standards across the board.

ithkuil · on Aug 17, 2021

Oh yeah, the IPv6 saga at Google Cloud is a long-running joke. However, this is indeed a decent step forward. In particular, this is relevant to the comment I replied to: in order to reach a VM, traffic has to traverse the network fabric, which clearly now supports IPv6.

As for GKE that's another pair of shoes, probably more related to configuring calico and friends and less about limitations in the low level network fabric.

As for Google API, I read somewhere that they disabled it because billing exemption wasn't ready.

hardwaresofton · on Aug 17, 2021

> traefik or HAproxy

A meta note, but it struck me that seeing "traefik" where NGINX usually is is pretty fascinating. I rate it highly because of its support for k8s (I've written before about how wonderful I think it is) but am somewhat unaware of how widely it's known. Guess it's pretty well known at this point if people casually mention it (then again, the audience is HN after all)!

ggm · on Aug 17, 2021

It certainly wasn't up-sold to me when I was building things out. Maybe hiding the nginx light under the bushel because other tools were explicitly written up

Horusiath · on Aug 17, 2021

It targets the properties that Aeron protocol aims for too (https://github.com/real-logic/aeron).

Ericson2314 · on Aug 17, 2021

I have a feeling the vast majority of traffic should be pub sub / love query not request response, and yet is currently the latter, and this might mean this is shooting for the wrong goalposts.

cprecioso · on Aug 17, 2021

What is love query?

jcelerier · on Aug 17, 2021

I'd guess it's a typo for "live query"

Ericson2314 · on Aug 17, 2021

Yes it is, sorry!

cowvin · on Aug 17, 2021

"do you love me?"

hkt · on Aug 17, 2021

Baby, don't hurt me

MisterTea · on Aug 17, 2021

In a similar vein, Bell Labs developed the IL protocol in the 90's to facilitate better 9p performance on plan 9. It did not work well over the internet due to latency but was beneficial on local networks. It was most useful for disk servers serving root fs to CPU servers and terminals/workstations.

http://doc.cat-v.org/plan_9/4th_edition/papers/il/

(edit: forgot to mention IL is still usable on plan 9)

mosseater · on Aug 17, 2021

Can you summarize why this is better than just using UDP?

wmf · on Aug 17, 2021

UDP doesn't have reliability, flow control, congestion control, etc. Blasting RPCs over UDP can cause poor performance due to congestion.

bcrl · on Aug 17, 2021

The data center bridging extensions ensure that packets won't get lost due to congestion or flow control. They were created in part because fibre channel over ethernet couldn't handle any packet loss in the fabric.

wmf · on Aug 17, 2021

Running lossless with no congestion control leads to high tail latency though.

bcrl · on Aug 18, 2021

Quite true. However, in the case of FCoE, given the performance issues with the Fibre Channel SANs I worked on a few years ago, tail latency in the fabric was the least of the concerns we ran into. Customers would wonder why their persistent messaging rates tanked when the system started pushing messages to disk. Meanwhile their SAN can't even sustain 100MB/s of sequential writes.

bullen · on Aug 17, 2021

On a switch you could probably trust UDP no?

magicalhippo · on Aug 17, 2021

Well, if by "probably trust" you mean you are happy if that prediction fails every now and then, then sure.

If a link is saturated, say via TCP, then UDP data can still get dropped AFAIK.

Searching I found this blog post[1] which takes a small stab at it. Would be interesting to try other setups with more data.

Regardless, using UDP one should always be prepared to handle dropped and out-of-order packets.

[1]: https://openmymind.net/How-Unreliable-Is-UDP/
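
To make "be prepared to handle dropped and out-of-order packets" concrete, here is a minimal receiver-side sketch. It assumes the sender stamps every datagram with an increasing sequence number starting at 0; the `ReorderBuffer` class and its method names are hypothetical, not taken from any library, and socket handling and retransmission are left out.

```python
class ReorderBuffer:
    """Delivers payloads in sequence order; tolerates duplicates and reordering."""

    def __init__(self):
        self.next_seq = 0   # next sequence number we can deliver
        self.pending = {}   # seq -> payload, received ahead of order

    def receive(self, seq, payload):
        """Record one datagram; return the payloads that are now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return []       # duplicate, ignore it
        self.pending[seq] = payload
        delivered = []
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered

    def missing(self):
        """Sequence numbers we are still waiting for below the highest one buffered."""
        if not self.pending:
            return []
        highest = max(self.pending)
        return [s for s in range(self.next_seq, highest) if s not in self.pending]


buf = ReorderBuffer()
print(buf.receive(0, "a"))   # in order, delivered immediately
print(buf.receive(2, "c"))   # arrived early, held back
print(buf.missing())         # gap the application may ask to have resent
print(buf.receive(1, "b"))   # fills the gap, releases both held payloads
```

A real application would pair `missing()` with a timer, so that a gap which persists is treated as a drop and triggers a retransmission request or is simply skipped, depending on whether the data is still useful.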

bullen · on Aug 17, 2021

That link actually proves my point, thx!

Nothing is perfect, so I'll keep using UDP on a switch and TCP on the internet.

Hikikomori · on Aug 17, 2021

Has nothing to do with using a switch or not.

If you send a 1Gbit UDP stream over a switch to another machine, there will be no drops if both are connected with 1Gbit. The same is true if you use TCP, and over a router, assuming it's capable of forwarding that amount of traffic. If you have a third machine sending UDP or TCP traffic towards the one receiving the 1Gbit UDP stream, you'll have drops on both streams. It doesn't matter what protocol you use: if you have congestion, you'll have drops. You typically use UDP if your application is real time, or if you want to create your own reliability mechanism and/or avoid issues with devices in the middle that do things with TCP.
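
The scenario above can be sketched as a toy model of one switch output port. Everything here is an assumption for illustration: time is sliced into ticks, rates are in packets per tick, the port has a fixed-size queue, and `simulate_port` is a hypothetical helper. The model is protocol-agnostic on purpose, since the port only sees packets.

```python
def simulate_port(offered_per_tick, capacity_per_tick, queue_limit, ticks):
    """Count packets dropped at an output port with a bounded queue.

    offered_per_tick: packets each sender offers per tick (one entry per sender)
    capacity_per_tick: packets the outgoing link can carry per tick
    queue_limit: packets the port can hold at the end of a tick
    """
    queue = 0
    dropped = 0
    for _ in range(ticks):
        queue += sum(offered_per_tick)           # arrivals from all senders
        queue -= min(queue, capacity_per_tick)   # the link drains what it can
        if queue > queue_limit:                  # overflow is tail-dropped
            dropped += queue - queue_limit
            queue = queue_limit
    return dropped


# One sender matching the link rate: the queue never builds up.
print(simulate_port([10], capacity_per_tick=10, queue_limit=50, ticks=100))
# A second sender towards the same port: the queue fills, then drops begin.
print(simulate_port([10, 10], capacity_per_tick=10, queue_limit=50, ticks=100))
```

The first call reports no drops and the second reports many, even though nothing about the packets themselves changed. What TCP adds is a sender that reacts to those drops by slowing down; plain UDP keeps sending at the same rate.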

magicalhippo · on Aug 17, 2021

That was my point. If you design your system so that the links stay well below saturation, then you would not expect to see drops.

However they can still occur, maybe due to some unexpected congestion or other issues. So long as your application can deal with that you should be good.

nitrogen · on Aug 17, 2021

> other issues

Like intermittent EMI causing bit flips and checksum failures, which happened to me once in an IoT application where the Ethernet cable to an outbuilding was buried next to the power line, and the network would die whenever the furnace kicked on.

bullen · on Aug 17, 2021

So where can one buy consumer 10Gb/s switches?

magicalhippo · on Aug 17, 2021

MikroTik makes some, like the CRS305-1G-4S+IN[1] which is a 5-port variant at a very reasonable price.

Nice review by ServeTheHome here[2].

[1]: https://mikrotik.com/product/crs305_1g_4s_in

[2]: https://www.servethehome.com/mikrotik-crs305-1g-4sin-review-...

bullen · on Aug 17, 2021

Thx, say I have 4x 1Gb/s old-school eth-port machines that want to saturate their capacity at the same time on one of these, should I just buy 4x 10GBase-T SFP+ transceivers for it?

Edit: Apparently LGS105/LGS108 has 10Gb/s switching capacity already so I'm good!

vlovich123 · on Aug 17, 2021

Doesn’t SRTP mean that a continuous stream of small packets will DOS any larger packets? I’m sure I’m missing something about how it works.

tyingq · on Aug 17, 2021

I would guess "SRTP" is just the shorthand for something that's less simple than that sounds. It likely has some "starvation protection" in the same way that a typical priority queue does. Like having in-queue age time bump up the priority.

FrancoisBosun · on Aug 17, 2021

From my understanding, this is total message size, not packet size. If a receiver has two incoming messages, a batch upload and a query for a list, the batch upload may have 10MiB to transmit while the other message may only have 1KiB. Thus, to improve latency, the smaller message should be prioritized.
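
A rough sketch of that idea, serving whichever message has the least left to send (SRPT, shortest remaining processing time, in the Homa paper's terms). This is an idealized single-link model, not Homa's actual grant and priority mechanism: sizes are counted in packets, one packet is sent per tick, and `srpt_schedule` is a hypothetical name.

```python
import heapq


def srpt_schedule(arrivals):
    """Simulate a link that always serves the message with the fewest packets left.

    arrivals: list of (arrival_tick, name, size_in_packets), sizes >= 1
    Returns a dict mapping each message name to the tick at which it completed.
    """
    pending = sorted(arrivals)   # not yet arrived, ordered by arrival tick
    ready = []                   # min-heap of (packets_remaining, name)
    done = {}
    tick = 0
    i = 0
    while i < len(pending) or ready:
        # Admit every message that has arrived by now.
        while i < len(pending) and pending[i][0] <= tick:
            _, name, size = pending[i]
            heapq.heappush(ready, (size, name))
            i += 1
        if ready:
            remaining, name = heapq.heappop(ready)
            remaining -= 1       # send one packet of the shortest message
            if remaining <= 0:
                done[name] = tick + 1
            else:
                heapq.heappush(ready, (remaining, name))
        tick += 1
    return done


# A 10-packet upload is already in flight when a 1-packet query shows up at tick 3.
print(srpt_schedule([(0, "batch_upload", 10), (3, "list_query", 1)]))
```

In this run the query completes at tick 4, one tick after it arrives, and the upload completes at tick 11 instead of tick 10: the short message's latency drops sharply at the cost of one packet time for the long one.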

wmf · on Aug 17, 2021

You can avoid starvation by increasing the priority of a job (an RPC in this case) when it hasn't made progress for a while. Or you can do weighted scheduling where the shortest job has a higher probability of being scheduled (e.g. WDRR) instead of absolute priority.
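
The first option (raising priority when a job has not made progress) can be sketched like this. It is a toy model under stated assumptions, not how Homa or any real scheduler implements it: one packet is sent per tick, a new 1-packet message arrives every tick, and `pick_next`, `simulate`, and the `aging` factor are hypothetical names.

```python
def pick_next(jobs, now, aging):
    """Pick the job with the lowest effective key.

    The key is packets remaining minus a credit that grows with the time
    since the job last made progress. With aging=0 this is pure
    shortest-remaining-first.
    """
    return min(jobs, key=lambda j: j["remaining"] - aging * (now - j["last_progress"]))


def simulate(ticks, big_size, aging):
    """One big message competes with a fresh 1-packet message every tick.

    Returns how many packets of the big message were sent within `ticks`.
    """
    big = {"name": "big", "remaining": big_size, "last_progress": 0}
    jobs = [big]
    sent_big = 0
    for now in range(ticks):
        jobs.append({"name": f"small{now}", "remaining": 1, "last_progress": now})
        job = pick_next(jobs, now, aging)
        job["remaining"] -= 1
        job["last_progress"] = now
        if job is big:
            sent_big += 1
        if job["remaining"] == 0:
            jobs.remove(job)
    return sent_big


print(simulate(ticks=1000, big_size=100, aging=0.0))  # pure SRPT: the big message starves
print(simulate(ticks=1000, big_size=100, aging=1.0))  # with aging: it makes some progress
```

Without aging the big message sends nothing for as long as small messages keep arriving. With aging its waiting time eventually outweighs its size and it gets a turn. The `aging` factor trades tail latency of large messages against latency of small ones, and the weighted-scheduling alternative mentioned above makes a similar trade probabilistically.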

ngcc_hk · on Aug 17, 2021

If one uses internal IPs (128., 10., etc.), why can't one use another protocol for internal transfer? That means the internal traffic is cut off not just by IP but also by a different protocol from the internet's.

The ospf vs Bgp and within organisation why not …

gardaani · on Aug 17, 2021

Quic is similar and standardized by IETF. https://en.wikipedia.org/wiki/QUIC

KaiserPro · on Aug 17, 2021

Quic is not designed for internal low latency point to point.

Quic is basically the weird semantics of http2 transmuted to UDP with some retransmission logic. But everything is hilariously complex.

zozbot234 · on Aug 17, 2021

Quic is not that similar. Although SCTP could be given many of the same properties with a slightly customized implementation.

touisteur · on Aug 17, 2021

Please can you say a bit more? I keep reading about quic as an sctp replacement? Apart from the 'whole thing encrypted' thing I'm wondering what differences there are.

baybal2 · on Aug 17, 2021

Deterministic Ethernet varieties have been around for a very long time.

If you have fabric determinism like Infiniband, and capacity reservation on the receiving side, you can just dispose of the connection paradigm and flow control, and thus gain a great deal of performance while simplifying everything at the same time.

I do not see much use of it though unless you are building something like an airplane.

The uncounted PhD hours spent on getting networks to work well do amount to something.

Dealing with RDMA aware networking is far beyond the ability of typical web developers.

Deterministic Ethernet switches cost a fortune, and are lagging behind the broader Ethernet standard by many years.

Making a working capacity reservation setup takes years to perfect as well.

99.9999...% of web software will most likely *lose* performance if blindly ported to RDMA enabled database, message queue server, or caching.

If you don't know how epoll or io_uring-like mechanisms work, you cannot get any benefit out of RDMA whatsoever.

I once worked for a subcontractor for Alibaba's first RDMA enabled datacentre.

ergl · on Aug 17, 2021

> Dealing with RDMA aware networking is far beyond the ability of typical web developers.

Homa is designed for traffic inside a DC, same as RDMA or Infiniband. I don't think anyone is proposing to use it for normal web traffic.

eternalban · on Aug 17, 2021

Which partly (besides the Iranians on the Homa team) explains the name of the protocol. Homa is the Persian ~phoenix, a mythical bird that never touches the ground and is always hovering above:

https://en.wikipedia.org/wiki/Homa_(mythology)

touisteur · on Aug 17, 2021

Even inside DCs I thought everything was Web services upon Web services.