
This seems like a review of John Ousterhout's work w/ Homa. I highly recommend reading the original. https://arxiv.org/abs/2210.00714

For those who don't know, Ousterhout created the first log-structured filesystem and created Tcl (used heavily in hardware verification, but also forming the backbone of some of the first large web servers: AOLserver). I was actually surprised to find out he co-founded a company with the current CTO of Cloudflare. https://en.wikipedia.org/wiki/John_Ousterhout

He has both a candidate replacement and benchmarks showing a 40% performance improvement with gRPC on Homa compared to gRPC on TCP. https://github.com/PlatformLab/grpc_homa

With that in mind, I think nobody will replace TCP, and I doubt anything that isn't IP-compatible will be able to get off the ground. His argument is essentially that TCP is a bad choice for low-latency RPC protocols.

We've already seen people build a number of similar systems on UDP including HTTP replacements that have delivered value for clients doing lots of parallel requests on the WAN.

I think many big tech companies are essentially already choosing to bypass TCP. I recall Facebook doing a lot of work with memcache over UDP. I can't find any public docs on whether or not Google's internal RPC uses TCP.

I wouldn't be surprised at all if in the near future something like gRPC/Cap'n Proto/Twirp/etc. had an in-datacenter, TCP-free fast path. It would be cool if it were built on Homa.




> With that in mind I think nobody will replace TCP

Within the data center? AWS uses SRD with custom-built network cards [0]. I'd be surprised if Microsoft, Facebook, and Google aren't doing something similar.

[0] https://ieeexplore.ieee.org/document/9167399 / https://archive.is/qZGdC


Considering how slow Azure caching is, maybe they use Token Ring?

Edit: seriously though, does anyone know what they are up to? We get like 30 megabits from cache to CI.


Looking at the Ousterhout paper, I share your skepticism. I'm perfectly willing to believe that a parallel universe with datacenters running Homa would be a better world.

But if I were to make a list of the actual reasons too much money is getting burned on AWS bills, I suspect TCP inefficiencies wouldn't make the top 10. And worse, some of the things that do make the list, like bad managers, rushed schedules, under-trained developers, changing technical fashions, and "enterprise" development culture, are all huge barriers to replacing something so deep in the stack.


The problem is not just TCP vs not-TCP. It’s using TCP/HTTP to transmit data in JSON, vs picking a stack that is more optimized for service-to-service communication in a datacenter.

I am willing to wager that most organizations’ microservices spend the majority of their CPU usage doing serialization and deserialization of JSON.


Inefficient RPC compounds the common error of “SELECT * FROM BigTable”. I regularly see multi-hundred-megabyte result sets bloat out to a gigabyte on the wire.

Bizarrely, this takes just a couple of seconds to transfer over 10 GbE, so many devs simply don't notice, or chalk it up to “needing more capacity”.
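
A rough, self-contained sketch of the kind of bloat being described, plus the transfer-time arithmetic that hides it (row shape and counts are made up, Python stdlib only):

    # Hypothetical result set: 1M rows of (id, score, status).
    import json
    import struct

    rows = [(i, i * 0.001, "status_ok") for i in range(1_000_000)]

    # Shipped as JSON, field names are repeated in every row.
    json_bytes = json.dumps(
        [{"id": r[0], "score": r[1], "status": r[2]} for r in rows]
    ).encode()

    # The same rows packed as fixed-width binary: 8-byte int, 8-byte float, 16-byte string.
    binary_bytes = b"".join(
        struct.pack("<qd16s", r[0], r[1], r[2].encode()) for r in rows
    )

    print(f"JSON on the wire:   {len(json_bytes) / 1e6:.0f} MB")
    print(f"binary on the wire: {len(binary_bytes) / 1e6:.0f} MB")

    # 10 GbE moves roughly 1.25 GB/s, so even a bloated gigabyte response
    # is "only" about a second on the wire -- easy not to notice.
    print(f"time on 10 GbE: ~{len(json_bytes) * 8 / 10e9:.2f} s")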

Yes, yes, it’s the stingy sysops hoarding the precious compute that’s to blame…


I know people who've tried to fix this from time to time but it always seems to go wrong.

We could track expected response sizes, but then every feature launch triggers a bunch of alerts, which either costs social capital, or results in alert fatigue that causes us to miss real problems, or both.

This is a place where telemetry does particularly well. I don't need to be vigilant about regressions every CD cycle, every day, or even every week. Most of the time, if I catch a problem within a couple of weeks and can trace it back to the source, that's sufficient to keep the wheels on.
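
As a sketch of what that looks like in practice (hypothetical data shapes, not any particular telemetry stack): compare each endpoint's recent p95 response size against a trailing baseline and only flag large drifts, rather than paging on every launch.

    # Hypothetical data shapes: endpoint -> list of sampled response sizes (bytes).
    def p95(samples):
        s = sorted(samples)
        return s[int(0.95 * (len(s) - 1))]

    def size_regressions(baseline, current, threshold=1.5):
        """Return endpoints whose p95 response size grew past `threshold` x baseline."""
        flagged = {}
        for endpoint, sizes in current.items():
            base = baseline.get(endpoint)
            if base and sizes:
                ratio = p95(sizes) / p95(base)
                if ratio > threshold:
                    flagged[endpoint] = round(ratio, 2)
        return flagged

    # /report roughly tripled at the 95th percentile -- worth tracing to a change,
    # but on a weekly review cadence rather than a per-deploy alert.
    baseline = {"/report": [120_000, 130_000, 125_000, 128_000, 122_000]}
    current = {"/report": [360_000, 400_000, 390_000, 410_000, 370_000]}
    print(size_regressions(baseline, current))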


> I am willing to wager that most organizations’ microservices spend the majority of their CPU usage doing serialization and deserialization of JSON.

I'd say in the majority of cases the service was made "too small".

If you waste the majority of your CPU serializing, deserializing, and sending to the network, you should probably just "do the job" right then and there (aside from load balancers and such, for obvious reasons).


In my experience, microservices/backends are almost always I/O bound, and the CPU usage is almost pure overhead (meaning it may exist, but you could reduce it with effort - it’s not intrinsic to the problem), unless the service is doing some particularly domain-specific compute-intensive task.

For your typical website backend sitting between the frontend and the db, you are doing some async conversion of JSON to a db call and back to JSON. For an HTTP microservice you are also typically converting some JSON request body to a JSON response body with some kind of I/O call in between.

So that’s a roundabout way of saying I think the case where the majority of CPU is spent on SerDe is more common than you think. And it’s not necessarily a problem if the effort to improve it isn’t worth the savings.


But if it's not JSON it's still going to be some SerDe. Is there another format that's faster by a big enough factor that it means we should rewrite all of our stacks? In other words, developer time vs throwing more hardware at it.


Yes, any binary format is going to be dramatically faster than JSON. IIRC JSON was the bottleneck in k8s at one point, so it was replaced with protobufs.
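
A rough, stdlib-only micro-benchmark of that gap (not protobuf itself, just a fixed-layout binary encoding; the exact numbers depend heavily on payload shape and library):

    import json
    import struct
    import timeit

    # Hypothetical records: (id, score, name), 10k of them.
    RECORD = struct.Struct("<qd32s")
    rows = [(i, i * 1.5, f"user-{i}".encode()) for i in range(10_000)]
    as_dicts = [{"id": i, "score": s, "name": n.decode()} for i, s, n in rows]

    def json_roundtrip():
        blob = json.dumps(as_dicts).encode()
        json.loads(blob)

    def binary_roundtrip():
        blob = b"".join(RECORD.pack(*r) for r in rows)
        [RECORD.unpack_from(blob, off) for off in range(0, len(blob), RECORD.size)]

    for name, fn in [("json", json_roundtrip), ("binary", binary_roundtrip)]:
        per_call_ms = timeit.timeit(fn, number=20) / 20 * 1000
        print(f"{name:>6}: {per_call_ms:6.1f} ms per serialize+parse round trip")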


I think you mean system-call bound, which is really another way of saying CPU bound. The hardware underneath is incredibly capable, but the interfaces to it are inefficient. I’m skeptical that JSON-over-HTTP applications are saturating 10-gig+ network interfaces.


N.B. And they are quite happy doing so, because it makes app development a breeze.

Being able to use tools like tcpdump to debug applications is important for fast problem resolution.

Unless everything you do is "web scale" and development costs are insignificant, simple paradigms like stream-oriented text protocols will have their place.


This would be my guess as well.

I work on Pyroscope, which is a continuous profiling platform and so I see a lot of profiles from various organizations.

If you want to save the world some CPU cycles, I would look into optimizing deserialization. And it’s not just JSON; binary formats like protobuf are not much better.

It comes down to the overhead associated with allocating and tracking (GC) many, many small objects, which is unfortunately very common in modern systems.
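
A small way to see that effect (the payload shape is made up; the point is the object count and allocation volume the collector then has to track):

    import gc
    import json
    import tracemalloc

    # One hypothetical JSON document with lots of nested structure.
    payload = json.dumps(
        [{"id": i, "tags": ["a", "b", "c"], "attrs": {"x": i, "y": i * 2}}
         for i in range(50_000)]
    )

    gc.collect()
    containers_before = len(gc.get_objects())
    tracemalloc.start()

    decoded = json.loads(payload)  # keep a reference so nothing is freed early

    current_bytes, peak_bytes = tracemalloc.get_traced_memory()
    containers_after = len(gc.get_objects())
    tracemalloc.stop()

    print(f"JSON text:           {len(payload) / 1e6:.1f} MB")
    print(f"new GC-tracked objs: {containers_after - containers_before:,}")
    print(f"allocated:           {current_bytes / 1e6:.1f} MB (peak {peak_bytes / 1e6:.1f} MB)")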


I really love looking at systems where the life of a message is one of constant transformation - from JSON to internal to JSON to internal to JSON to internal to JSON to internal. I'm not even talking about the reply path.

Even if you're naive and don't care about performance - which is a common sentiment for modern developers who have spent the last decade working for companies where the cost of AWS didn't matter - chains of transformations like this are a good place to switch to a format, any format, less atrociously expensive than JSON.

The example above, btw, comes from a very large unicorn that burns 2 complete cores per outstanding request, on a continuous basis. To someone who lived through the dotcom era, that's so outrageous it's comical, and of course they have years of negative cashflow because of their insane costs.


I work in cloud and that doesn’t even faze me. A lot of people want to write applications in Node, or don’t understand/want to deal with concurrency, leave cores stranded, etc. You pay a premium for it, but it’s their decision.

Anyway, for people who pick relatively more sane application languages, yeah, deserialization is pretty much all their CPU does. It’s just such a shame, because JSON really is a godawful format, just like HTTP/1; basically its only benefit is that it’s easily human-readable.


You must not be very experienced with other formats. The primary reason why fixed schema formats haven't been widely adopted is that schema evolution is more important.


I’m quite experienced with protobufs and they handle it well enough. Having no schema doesn’t address the root problem of API producers/consumers needing to manage schema changes together.


That's why Google developed and uses protobufs.


> We've already seen people build a number of similar systems on UDP including HTTP replacements that have delivered value for clients doing lots of parallel requests on the WAN.

HTTP isn't low latency. Spending all that effort porting everything to a discrete, message-based network, only to have HTTP semantics running over the top, is a massive own goal.

As for the WAN, that's a whole different kettle of fish. You need to deal with huge latencies, whole-integer percentages of packet loss, and lots of other weird shit.

In that instance you need to create a protocol tailored to your needs. Are you after bandwidth efficiency, or raw speed? Or do you need to minimise latency? All three require completely different layouts and tradeoffs.

I used to work for a company that shipped TBs from NZ & Aus to LA via London. We used to use Aspera, but that's expensive. So we made a tool that used hundreds of TCP connections to max out our network link.
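
The back-of-the-envelope for why it takes that many connections (the numbers below are hypothetical, but the bound is the standard one: per-connection throughput is capped at roughly window/RTT, and lower still once loss kicks in):

    # Hypothetical long-haul path: high bandwidth, high RTT.
    link_gbps = 10.0                 # provisioned link
    rtt_s = 0.150                    # ~150 ms round trip, e.g. AU/NZ -> US west coast
    window_bytes = 2 * 1024 * 1024   # ~2 MB effective TCP window per connection

    # Steady-state ceiling per connection: window / RTT.
    per_conn_gbps = window_bytes * 8 / rtt_s / 1e9
    print(f"per-connection ceiling: {per_conn_gbps * 1000:.0f} Mbps")
    print(f"connections needed to fill the link: ~{link_gbps / per_conn_gbps:.0f}")

    # Loss makes it worse: the Mathis et al. approximation caps throughput at
    # about 1.22 * MSS / (RTT * sqrt(loss)), which is why WAN transfer tools
    # either retune windows/congestion control or fan out over many connections.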

For specialist applications, I can see a need for something like Homa. But for 95% of datacenter traffic, it's just not worth the effort.

That paper is also fundamentally flawed, as the OP has rightly pointed out. Just because Ousterhout is clever, doesn't make him right.


What do you see in HTTP/2 or HTTP/3 that poses large latency problems? Both do away with HTTP HOL blocking (not TCP HOL blocking) and allow parallel streams. Homa proposes introducing a transport-level protocol with request/response semantics specifically to accommodate protocols like HTTP which operate using requests and responses.


HTTP/2 is the king of HOL blocking. It's a single TCP connection in which everything is multiplexed at the app level. It was almost like it was designed to exploit the worst parts of TCP. It was and is a stupid design; they were clearly told about the drawbacks, both on mobile and for transfers of larger websites. They chose to ignore them.

The key thing is that HTTP is not designed to be low latency. If latency is important to you, then you need to make your protocol bespoke for that application.

HTTP/3 has a much better TLS connection process, which means that the cost of creating a connection is much lower. QUIC is much more configurable in terms of per-stream or per-connection flow control.

Yes, Homa has request/reply semantics, but not in the way that HTTP needs. Homa is optimised for low-latency small messages in a short-hop, transparent network. HTTP is a file-access protocol with a general data channel hammered in. Sure, it'll work in a DC-type network, but it also has to deal with lossy, high-latency networks.


Ousterhout was also one of the co-authors of the Raft consensus paper.


Large amounts of Google's internal RPC traffic go over a custom internal framework called Pony Express, which is optimized to work on the internal cluster fabric.



Well that's accidentally hella unfortunate.

I've been assuming that gRPC over HTTP/3 [1] would eventually get some traction from the big camps. But if Google is using its own internal transport, it seems highly unlikely that these main authors of gRPC will ever care.

This blog post talks about how many of the features of Homa are already present in existing specs like QUIC/HTTP/3. I'd expect many of the wins in Ousterhout's Homa benchmarks could probably be replicated elsewhere, with better transports.

[1] https://github.com/grpc/grpc/issues/19126


I don't have enough time to inform myself about every obscure network protocol that won't be widely adopted.

I don't know if Homa has multihoming, but QUIC and SCTP do, and if Homa doesn't, then I think it's a huge step backwards.


They would still care about it for the hop from the client to the first Google server handling the request.


Alas, the web standards world has been a trashfire useless pile of nothing at giving webdevs any use of http2 or http3. Push support for fetch was one of the first issues opened [1], and summarily ignored by nearly every browser & spec author & never made at all accessible, right up until modern times where the browsers said, oh hey, we never really made this feature available to webdevs, but since you all haven't used it anyways, we're removing Push from the browser. SMDH. Monstrous shit.

In short, the browser failed to make basic modern http usable by anyone so there's no hope of getting off the janky ugly grpc-web trashfire & making grpc from the client just work, like it always should have been able to do.

Pathetic, pathetic showing by the browsers here. Beyond neglect, how they mishandled http2 and http3 and gave webdevs access to almost none of what the most exciting new thing on the www was. Huge huge miss. Incredible bullshit.

Some day we'll be running http3 over webtransport & fixing their wrongs. This is such a bullshit long workaround, such a pathetic way to handle the browser getting nowhere with new http standards, end-running around their utter, unmoving, complete inability to advance at all. But we'll start to actually use http intensively again, soon, in spite of the browser & standards community being such ridiculous & farcical impediments against using http2 and http3 at all. What a tragic shit show of uselessness it's been, trying to actually enjoy the new http standards; resistance on all fronts to real usage.

[1] https://github.com/whatwg/fetch/issues/51


You’re conflating very different things, I think. HTTP push requires browser and server support. Multiple experiments by multiple parties have shown it’s difficult to make work well regardless of a user-facing API. It’s not just Google either - I recall multiple failed attempts to get it to work well. It was an interesting idea that never panned out. I think the summary by my peers on the cache team at Cloudflare outlines why server push failed [1], and I know those people first hand to be very thoughtful and principled people without a hidden agenda other than making the web very fast. To be clear, I have no agenda or strong opinion either way and I could be misinformed; I just didn’t see any strong evidence of benefit and saw lots of evidence that it is a net harm (+ maintenance effort).

grpc-web will require very little work to adopt http/3, and the benefit is pretty obvious. The hard part will be if there are http/1 servers in the way that have an impedance mismatch. Still, I don’t see why grpc, which only requires a JS library and a matching deployed server, all of which is open source and doesn’t really require buy-in from multiple stakeholders, will struggle here.

Your emotional reaction to this seems out of place, as there’s no grand conspiracy here. It seemed like a plausible idea; it just never panned out enough to be worth it. Certainly server push as an API wouldn’t buy you that much in terms of performance, because you can emulate it via long poll / websockets, unless I’m missing something?

[1] https://blog.cloudflare.com/early-hints/


The people who had any power/control over push (standards & browser folks) seemed to focus exclusively on using it for initial page load. As you do. Which is just one sliver of what Push could have been used for. The community had begged for some ability to actually increase developer capability, to start using Push as a reactive system, where we could start to replace unstructured websocket shenanigans with a holistic, resourceful way of asserting resources.

There were a couple of very brief moments during fetch's addition of progress where push got a brief bit of attention from big enough names that it seemed like maybe, after years, there'd be some real chance of using http2 push interestingly on the web, but that moment flickered out & died. In general there has been a callous treatment of Push, with blinders on, thinking only of the narrowest desires & uses. It's been a completely squandered technological capability that was never opened for use in any interesting form, and it's a shame this sad, small vision of http2 push & its so-called failure obstructs us from even considering how many more interesting & powerful uses it could have had.

My understanding is that grpc requires push and trailers support, and that the browser still has no designs on offering either capability to developers. It's been some years since I've looked, but http3 in the browser once again seems to give developers absolutely no new capabilities, even though http added new stuff like Push & Trailers nearly a decade ago.

My emotional response is because there is such a small & narrow vision choking how we might be using http & growing the web. Techniques like long-poll & websocket exist, but there's such a clearer, better-fitting match for sending http resources as they are generated: Push. The lack of browser exposure of new (decade-old) http capabilities is pushing us towards a stupid point where we end up running http3-over-webtransport, and it's absurd & dispiriting to see such a lack of follow-through in the deepest, most core heart of the web being given a chance to get used, to do the amazing things it could be doing... as opposed to radically non-web, non-resourceful hacks like websockets. This harkens back to the HyBi mailing list, and the sad inability of the web & our exchange of resources to be more bidirectional & asynchronous, and that's not a decade of stagnation, it's now two decades of stagnation, stagnation that we almost got a chance to improve past, were it not for the sad, silly, limited pretense that Early Hints gives us even a thousandth of what Push gave us in terms of capabilities.


If you want to watch Ousterhout in talk form:

Netdev 0x16 - Keynote: It's time to replace TCP in the datacenter: https://www.youtube.com/watch?v=o2HBHckrdQc

Netdev is an amazing conference, btw; it's insane that we get free access, on places like YT, to the trailblazing work shown off at conferences.


He also wrote one of my favorite software books https://web.stanford.edu/~ouster/cgi-bin/book.php.

Presentation https://youtu.be/bmSAYlu0NcY


TCP is rarely used in HFT data centers except as required by exchanges.



