
The problem is not just TCP vs not-TCP. It’s using TCP/HTTP to transmit data in JSON, vs picking a stack that is more optimized for service-to-service communication in a datacenter.

I am willing to wager that most organizations’ microservices spend the majority of their CPU usage doing serialization and deserialization of JSON.
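
If you want to check that wager on your own fleet, it is cheap to do. A minimal sketch, assuming a Go service (route, port, and payload type are made up): expose the built-in profiler next to your handlers and look at where the CPU profile lands.

    package main

    import (
        "encoding/json"
        "log"
        "net/http"
        _ "net/http/pprof" // registers /debug/pprof/* on the default mux
    )

    type payload struct {
        ID   int      `json:"id"`
        Name string   `json:"name"`
        Tags []string `json:"tags"`
    }

    // handler is a stand-in for a typical microservice endpoint:
    // decode JSON in, do a bit of work, encode JSON out.
    func handler(w http.ResponseWriter, r *http.Request) {
        var in payload
        if err := json.NewDecoder(r.Body).Decode(&in); err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        // ... actual business logic would go here ...
        _ = json.NewEncoder(w).Encode(in)
    }

    func main() {
        http.HandleFunc("/echo", handler)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }

Point go tool pprof at /debug/pprof/profile while the service is under load and see how much of the flame graph is encoding/json versus your actual logic.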




Inefficient RPC compounds the common error of “SELECT * FROM BigTable”. I regularly see multi-hundred-megabyte result sets bloat out to a gigabyte on the wire.

Bizarrely, this takes just a couple of seconds to transfer over 10 GbE, so many devs simply don’t notice or chalk it up to “needing more capacity”.

Yes, yes, it’s the stingy sysops hoarding the precious compute that’s to blame…
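
For what it’s worth, the fix is usually as boring as asking only for the columns and rows the caller actually needs. A sketch in Go with database/sql (table, columns, and placeholder style are made up):

    package store

    import (
        "database/sql"
        "time"
    )

    // fetchRecent asks for exactly the columns the caller renders, bounded by a
    // LIMIT, instead of dragging every column of "SELECT * FROM big_table"
    // across the wire.
    func fetchRecent(db *sql.DB, tenantID int64) error {
        rows, err := db.Query(
            "SELECT id, name, updated_at FROM big_table WHERE tenant_id = $1 LIMIT 1000",
            tenantID,
        )
        if err != nil {
            return err
        }
        defer rows.Close()

        for rows.Next() {
            var (
                id        int64
                name      string
                updatedAt time.Time
            )
            if err := rows.Scan(&id, &name, &updatedAt); err != nil {
                return err
            }
            // ... hand the row to whatever needed it ...
        }
        return rows.Err()
    }

Same endpoint, a small fraction of the bytes on the wire.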


I know people who've tried to fix this from time to time but it always seems to go wrong.

We could track expected response size, but then every feature launch triggers a bunch of alerts, which either burns social capital, or breeds alert fatigue that makes us miss real problems, or both.

This is a place where telemetry does particularly well. I don't need to be vigilant about regressions every CD cycle, every day, or even every week. Most of the time, if I catch a problem within a couple of weeks and can trace it back to the source, that's enough to keep the wheels on.
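
Concretely (assuming Prometheus-style metrics; the metric and label names are made up), recording response size per route as a histogram is enough to spot a slow bloat on a dashboard weeks later, without paging anyone on launch day:

    package metrics

    import (
        "net/http"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promauto"
    )

    // responseBytes tracks payload size per route; a gradual drift shows up on a
    // dashboard without any per-launch alert thresholds.
    var responseBytes = promauto.NewHistogramVec(prometheus.HistogramOpts{
        Name:    "http_response_size_bytes",
        Help:    "Size of HTTP response bodies.",
        Buckets: prometheus.ExponentialBuckets(256, 4, 10), // 256 B .. ~64 MB
    }, []string{"route"})

    // countingWriter wraps http.ResponseWriter to count bytes actually written.
    type countingWriter struct {
        http.ResponseWriter
        n int
    }

    func (c *countingWriter) Write(p []byte) (int, error) {
        n, err := c.ResponseWriter.Write(p)
        c.n += n
        return n, err
    }

    // Instrument observes the response size of the wrapped handler under route.
    func Instrument(route string, next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            cw := &countingWriter{ResponseWriter: w}
            next.ServeHTTP(cw, r)
            responseBytes.WithLabelValues(route).Observe(float64(cw.n))
        })
    }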


> I am willing to wager that most organizations’ microservices spend the majority of their CPU usage doing serialization and deserialization of JSON.

I'd say in the majority of cases the service was made "too small".

If you waste the majority of your CPU on serialize/deserialize/send-to-network, you should probably just "do the job" right there and then (aside from load balancers and such, for obvious reasons).


In my experience, microservices/backends are almost always I/O bound, and the CPU usage is almost pure overhead (meaning it may exist, but you could reduce it with effort - it’s not intrinsic to the problem), unless the service is doing some particularly domain-specific compute-intensive task.

For your typical website backend sitting between a frontend and a db, you are doing some async conversion of JSON to a db call and back to JSON. For an HTTP microservice, you are also typically converting some JSON request body to a JSON response body with some kind of I/O call in between.

So that’s a roundabout way of saying I think the case where the majority of CPU is spent on SerDe is more common than you think. And it’s not necessarily a problem if the effort to improve it is not worth the savings.


But if it's not JSON, it's still going to be some SerDe. Is there another format that's enough faster to justify rewriting all of our stacks? In other words, developer time vs. throwing more hardware at it.


Yes, any binary format is going to be dramatically faster than JSON. IIRC JSON was the bottleneck in k8s at one point, so it was replaced with protobufs.
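
Easy enough to quantify for your own payloads with a micro-benchmark. A sketch below, using stdlib gob purely as a stand-in binary codec (a real comparison would use protobuf with generated types, and the gap depends a lot on payload shape):

    package codec

    import (
        "bytes"
        "encoding/gob"
        "encoding/json"
        "testing"
    )

    // Item stands in for whatever message your services actually exchange.
    type Item struct {
        ID    int64
        Name  string
        Tags  []string
        Price float64
    }

    var sample = Item{ID: 42, Name: "widget", Tags: []string{"a", "b", "c"}, Price: 9.99}

    func BenchmarkJSON(b *testing.B) {
        for i := 0; i < b.N; i++ {
            buf, _ := json.Marshal(sample)
            var out Item
            _ = json.Unmarshal(buf, &out)
        }
    }

    func BenchmarkGob(b *testing.B) {
        for i := 0; i < b.N; i++ {
            var buf bytes.Buffer
            _ = gob.NewEncoder(&buf).Encode(sample)
            var out Item
            _ = gob.NewDecoder(&buf).Decode(&out)
        }
    }

Run it with go test -bench . -benchmem against messages shaped like yours; the allocation counts are usually a more interesting column than ns/op.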


I think you mean system-call bound, which is really another way of saying CPU bound. The hardware underneath is incredibly capable, but the interfaces to it are inefficient. I’m skeptical that applications pushing JSON payloads over HTTP are saturating 10gig+ network interfaces.


N.B. And they're quite happy doing so, because it makes app development a breeze.

Being able to use tools like tcpdump to debug applications is important for fast problem resolution.

Unless everything you do is "Web scale" and development costs are insignificant, simple paradigms like stream-oriented text protocols will have their place.


This would be my guess as well.

I work on Pyroscope, which is a continuous profiling platform, so I see a lot of profiles from various organizations.

If you want to save the world some CPU cycles, I would look into optimizing deserialization. And it’s not just JSON: binary formats like protobuf are not much better.

It comes down to the overhead of allocating and tracking (GC) many, many small objects, which is unfortunately very common in modern systems.
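
Some of that is avoidable without even changing the wire format. A hedged sketch in Go (type and function names are made up): stream-decode into a pooled, reused value instead of materializing a fresh batch of small objects per message.

    package decode

    import (
        "encoding/json"
        "errors"
        "io"
        "sync"
    )

    // Event stands in for whatever small message type dominates your profiles.
    type Event struct {
        ID   int64    `json:"id"`
        Kind string   `json:"kind"`
        Tags []string `json:"tags"`
    }

    // eventPool recycles Event values so each message doesn't hand the GC a
    // fresh batch of small objects to track.
    var eventPool = sync.Pool{New: func() any { return new(Event) }}

    // Consume stream-decodes concatenated JSON events, reusing one pooled Event
    // per message. handle must not retain the pointer after it returns.
    func Consume(r io.Reader, handle func(*Event) error) error {
        dec := json.NewDecoder(r)
        for {
            ev := eventPool.Get().(*Event)
            // Reset fields but keep Tags' backing array so the decoder can reuse it.
            ev.ID, ev.Kind, ev.Tags = 0, "", ev.Tags[:0]

            if err := dec.Decode(ev); err != nil {
                eventPool.Put(ev)
                if errors.Is(err, io.EOF) {
                    return nil
                }
                return err
            }
            err := handle(ev)
            eventPool.Put(ev)
            if err != nil {
                return err
            }
        }
    }

It doesn't fix the parsing cost, but it takes a lot of pressure off the allocator and GC.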


I really love looking at systems where the life of a message is one of constant transformation - from JSON to internal to JSON to internal to JSON to internal to JSON to internal. I'm not even talking about the reply path.

Even if you're naive and don't care about performance - which is a common sentiment for modern developers who have spent the last decade working for companies where the cost of AWS didn't matter - chains of transformations like this are a good place to switch to a format, any format, less atrociously expensive than JSON.

The example above, btw, comes from a very large unicorn that burns _2 complete cores per outstanding request_ on a continuous basis. To someone who lived through the dotcom era, that’s so outrageous it’s comical, and of course they have years of negative cashflow because of their insane costs.


I work in cloud and that doesn’t even faze me. A lot of people want to write applications in Node, or don’t understand/want to deal with concurrency, leave cores stranded, etc. You pay a premium for it, but it’s their decision.

Anyway, for people who pick relatively more sane application languages, yeah, deserialization is pretty much all their CPU does. It’s just such a shame, because it really is a godawful format, just like HTTP/1; basically its only benefit is that it’s easily human-readable.


You must not be very experienced with other formats. The primary reason why fixed schema formats haven't been widely adopted is that schema evolution is more important.


I’m quite experienced with protobufs and they handle it well enough. Having no schema doesn’t address the root problem of API producers/consumers needing to manage schema changes together.


That's why Google developed and uses protobufs.



