Hacker News
Request Coalescing in Async Rust (fasterthanli.me)
142 points by ingve on March 6, 2022 | hide | past | favorite | 38 comments



> Which is all well and good, until we start hitting some limits. Like, maybe we have SO MANY CONNECTIONS that we run out of memory, because each of these threads has its own stack, and that's not free. It's not a lot, but a lot of "not a lot" quickly becomes a lot, as we all learn sooner or later.

I understand this was more of a side point as an introduction, but it's a very quick dismissal of blocking I/O that I've been seeing a lot. Modern threads + blocking I/O is much more performant than most people realize, often yielding better throughput than async due to reduced syscalls and other factors. Async is important when you are in an extremely latency constrained environment and/or need to prioritize tasks, but in other places, not so much. How much does increased stack memory usage really matter to a real server?
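To make the comparison concrete, here is roughly what the thread-per-connection blocking model looks like in std-only Rust. This is a sketch, not a production server: there's no HTTP parsing, no thread cap, and the "hello" handler is illustrative.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

// One OS thread per connection: each blocking read/write parks only that
// thread, and the kernel scheduler multiplexes them. The per-thread cost
// is mostly the stack, which is lazily committed on Linux.
fn handle(mut stream: TcpStream) {
    let mut buf = [0u8; 1024];
    // Read whatever the client sent; we don't parse HTTP here.
    let _ = stream.read(&mut buf);
    let body = "hello";
    let resp = format!(
        "HTTP/1.1 200 OK\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        body.len(),
        body
    );
    let _ = stream.write_all(resp.as_bytes());
    // Dropping `stream` closes the connection.
}

fn serve(listener: TcpListener) {
    for stream in listener.incoming() {
        if let Ok(stream) = stream {
            // Spawn-per-connection; a production server would bound this
            // with a thread pool.
            thread::spawn(|| handle(stream));
        }
    }
}
```

The syscall pattern is simple (one read, one write per request), which is part of why throughput can be surprisingly good compared to a readiness-based event loop.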


Both my current and previous day jobs involve building an edge network, which involves doing TLS termination: that means dealing with a lot of crap that comes your way - much more traffic than is valid/legitimate traffic, because on the public internet, anything goes. So I may be biased here.

The situation for a purely backend service (behind an existing reverse proxy) might very well be different — but as others have mentioned, the Rust http ecosystem has solidified around async, so even if you could get away with 1 connection = 1 thread, you might not want to, just because of where the ecosystem is.

One thing I didn't mention is that epoll isn't even state of the art: there's a lot of work going on around io-uring and thread-per-core runtimes nowadays, which I'm following rather closely because again, at my day job it does matter!


> One thing I didn't mention is that epoll isn't even state of the art: there's a lot of work going on around io-uring and thread-per-core runtimes nowadays, which I'm following rather closely because again, at my day job it does matter!

io-uring is definitely a game changer, but it isn't async specific!


Rust 'async' frameworks allow you to issue blocking calls, either by auto-spawning a separate thread or via a trivial executor that blocks on the current thread.
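Both halves of that are real: tokio::task::spawn_blocking moves a blocking call onto a dedicated worker thread, and a "trivial executor" is small enough to write by hand. Here is a std-only sketch of the latter, roughly what futures::executor::block_on does:

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// A waker that unparks the thread that is polling the future.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// A trivial executor: poll the future on the current thread, parking
// until the waker fires. No reactor, no task queue.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}
```

Note that a future blocked on this executor only makes progress if something else (a timer thread, an I/O reactor) eventually calls the waker, which is why it's only suitable for self-contained futures.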


I have done lots of big-corp projects, and have come to accept that using async I/O or reactive streams or whatever technology is currently en vogue is only very loosely related to actual requirements, because, let's be honest, those are pointless for an internal backend with at most 10 concurrent connections. This is usually due to two factors:

- people lack understanding of fundamentals, thus buying into the hype, and are bad at weighing the trade-offs because they often can't see the downsides, especially regarding complexity

- engineers are curious and want to play with $latest_tech, and then construct a post-hoc justification for it to which they are sometimes unaware themselves

Or put differently, it's either incompetence or mismatched incentives (the principal-agent problem). There is some justification for playing with new tech, as in "it helps motivation and retains talent", but sometimes the resulting complexity is far out of proportion.

That being said, tech ecosystems have different cultures, and I can understand why Rust is the way it is. It attracts idealists who tolerate complexity in the pursuit of the optimal solution. They have already produced more useful software than, e.g., Haskell, and this is ultimately the yardstick of success. Let's see how Zig will do in that regard.


I'm not sure that tolerance for accidental complexity is all that idiomatic in Rust. Rust async is a lot less complex than what Go does under the hood. Rust as a whole is even less complex than C++, with essentially no loss in useful featureset.


The lang team is very careful in designing new features, and I agree that they don't introduce accidental complexity lightly. Rust is an impressive achievement and a step forward over C++.


That's a good overview similar to my experience.

> those are pointless for an internal backend with at most 10 concurrent connections

I'm also arguing that the benefit of async over threads at 10 _thousand_ concurrent connections is still unclear. Yes memory usage will be less, but memory is cheap. Context switching overhead may be less, but a lot of it is still there because of readiness-based I/O. We don't see new, high-scale systems built on blocking I/O perhaps not because they can't be, but because async has become the norm, at the cost of _a lot_ of complexity.


My experience as well with embedded Rust, in two areas: async, and typestates. Typestates are popular in OSS embedded Rust. The idea is that the compiler catches errors, and your program is safer. In practice, it leads to verbose type signatures and inflexibility. I think the reasons for this are both the idealism you mentioned, and the libraries being used to build infrastructure without being applied to practical problems (i.e., the examples you'll see for these libs are "hello world" style, for a given peripheral).


I like the RTIC programming model, but jesus christ the typestates for accessing peripherals etc are annoying. I agree they went overboard here.


The typestates are an issue with the libs that implement them, not with RTIC directly.


I attempted to mimic Go's singleflight package myself [0]. It doesn't have support for the asynchronous world, but I suppose that could be made possible. I also have to mention bradfitz for providing the initial implementation (I think?) to learn from. It's really quite elegant!

Does anyone else think the Rust HTTP ecosystem is becoming increasingly fragmented? I can't keep track of which library is best suited for common operations. Is it a web framework built on hyper? Should I pay attention to tower middleware? Where does tracing fit in?

[0]: https://github.com/gsquire/singleflight


I actually think it's gone through a period of fragmentation, and is now heading back towards a more unified ecosystem:

- There was a split between tokio and async-std asynchronous executors, but the ecosystem now seems to be coalescing back around tokio.

- There was a weird split where Hyper was the de facto HTTP library, but the best web framework was actix-web, which wasn't based on hyper. But now there is Axum, an official Tokio project that is good enough to be generally recommended for all web projects, and it looks to be the project with momentum going forward.

- Tracing is really the only game in town when it comes to asynchronous logging, and again is part of the Tokio project.

Tower is a bit of a weird one. It's a very general middleware layer which I think does actually provide a very good abstraction. But that abstraction is quite complex, and learning it is in many cases trickier than doing an implementation from scratch. I suspect it might come to play a bigger part in the Rust ecosystem at some point, but for now I'd ignore it.
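To illustrate the layering idea Tower is built on, here is a simplified synchronous rendition. This is NOT the real tower API, which is async: the real `tower::Service` has a `poll_ready` step and a `call` that returns a Future. But the "a middleware is itself a Service wrapping another Service" shape is the same.

```rust
// Simplified, synchronous stand-in for the Service abstraction.
trait Service<Request> {
    type Response;
    fn call(&mut self, req: Request) -> Self::Response;
}

// A leaf service: just echoes the request back.
struct Echo;

impl Service<String> for Echo {
    type Response = String;
    fn call(&mut self, req: String) -> String {
        req
    }
}

// A middleware: a Service that wraps another Service and post-processes
// its response. Layers compose like an onion.
struct Uppercase<S> {
    inner: S,
}

impl<S: Service<String, Response = String>> Service<String> for Uppercase<S> {
    type Response = String;
    fn call(&mut self, req: String) -> String {
        self.inner.call(req).to_uppercase()
    }
}
```

The complexity the parent mentions comes from doing this generically over Futures and error types; the shape itself is small.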


Coming from the Ruby ecosystem, a lot of this played out similarly to how the Rack[1] middleware conventions developed in the early Rails v1 and v2 days. Prior to Rack there was a lot of fragmentation in HTTP server libraries, post-Rack everything more or less played nicely as long as libraries implemented Rack interfaces.

I don't write Rust professionally, but it was a bummer seeing that this seems to be a problem that was already figured out (painfully) in ecosystems used heavily for web development; JavaScript and Elixir have their own Rack equivalents[2][3]. I hope that Tower plays a similar role to unify the library ecosystem in Rust.

1. https://github.com/rack/rack

2. http://expressjs.com/en/guide/writing-middleware.html

3. https://github.com/elixir-plug/plug


JavaScript's “Rack” equivalent is actually a library called “connect”, which Express used to use under the hood as its middleware layer. The fact that it doesn't anymore is why I'm a little pessimistic about Tower, but it's good to hear this has worked in other ecosystems.


> There was weird split where Hyper was the de facto http library, but the best web framework was actix-web which wasn't based on hyper.

actix-web, like hyper, is built on tokio, so I'm not sure why you see this as a weird split. The underlying HTTP server is not something you generally interact with. actix-web even _uses_ hyper (the h2 crate) for HTTP2.


For me the weird part was when I needed to use an http client within my http server and I was pushed towards actix-http rather than reqwest.


I'm building a somewhat ambitious web client utility in Rust, and am struggling with this now. I'm using Hyper... and the deps you need to make it work (tokio, futures, hyper-tls).

And it's raising the question: "Is this what I want, or should I ditch these and build a new one using threads?" I am not sold on the async paradigm. Some of the Rust OSS community uses it on embedded too, but to me, code using interrupt handlers (or RTIC) is more intuitive.

Immediately on viewing the Hyper guide and hello world, the program structure is messier than a normal Rust program. Colors propagate through the program. I am giving this a shot, but am not optimistic this will be cleaner than threading.


The point of async isn't that it is cleaner than threading. It can be, but by no means is that guaranteed.

The point of async is that at scale threads start becoming expensive. If you don't have high performance requirements under heavy load, async is largely unnecessary.


Async rust currently isn't well-designed for pushing throughput to its limits, though. You really need a lot of batching and very few uses of atomic ops on your CPU to get to extreme throughputs, and Tokio doesn't generally give you great batching. I have also found that good concurrent data structures for Rust are a lot harder to find than other languages.


Stock Tokio is tuned towards the general set of applications, attempting to make things work well out of the box, while being ergonomic to use. But, this isn't set in stone. There are knobs and patterns that can be used to really squeeze out performance, as seen with Actix Web, which is based on Tokio [1].

[1] https://www.techempower.com/benchmarks/


I'm not sure why you can't make a framework tuned towards a "general set" of applications that offers batching and run-to-completion semantics. As far as I can tell, the biggest problem with this is the semantics: async/await and futures models of async programming make it hard to figure out how to produce batches.


> As the popular saying goes, there are only two hard problems in computer science: caching, off-by-one errors, and getting a Rust job that isn't cryptocurrency-related.

For whatever it's worth, we are hiring for a non cryptocurrency Rust job, haha.

https://www.svix.com/careers/


Excellent tour through this topic, here's my best attempt at a tl;dr summary:

1. write a 'Hello World' HTTP server to show how Tokio's async-Rust primitives work for epoll-based I/O;

2. show how the tracing crate ecosystem works to inspect and debug async-runtime tasks;

3. play with converting traces to OpenTelemetry, tokio-console, etc. for different views of the collected data;

4. play with hyper and axum to build a simple HTTP service that makes an upstream HTTP request to the YouTube API;

5. cache results by ID so they can be shared across all request-handlers;

6. add 'request coalescing' to prevent multiple concurrent requests from making upstream requests in parallel;

7. fix a bug in the request-coalescing implementation that caused coalesced requests to hang when the in-flight request raised an error.

One word of caution about using any request-coalescing pattern like this on a highly-loaded server: in the case where every request fails to populate the cache (e.g., timeout / not found), you end up in a situation where all requests end up _serialized_ and can quickly lock up a server under constant load. You need to either add a circuit-breaker, or otherwise ensure that _something_ always gets cached in any otherwise-uncacheable edge case (e.g., hit-for-pass in Varnish [1].)

[1] https://info.varnish-software.com/blog/hit-for-pass-varnish-...
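A minimal sketch of the "always cache something" idea: failures get a short-TTL negative entry, so a flood of requests for a failing key doesn't serialize behind one in-flight upstream call each time. All names here are illustrative, and coalescing itself is elided.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// A cache that stores both successes and failures, with a much shorter
// TTL for failures (analogous to Varnish's hit-for-pass objects).
struct Cache {
    entries: HashMap<String, (Result<String, String>, Instant)>,
    ok_ttl: Duration,
    err_ttl: Duration,
}

impl Cache {
    fn get_or_fetch(
        &mut self,
        key: &str,
        fetch: impl FnOnce() -> Result<String, String>,
    ) -> Result<String, String> {
        // Serve any unexpired entry, including a cached failure.
        if let Some((val, expires)) = self.entries.get(key) {
            if Instant::now() < *expires {
                return val.clone();
            }
        }
        // Cache miss (or expired): hit upstream and cache the outcome
        // either way, so the next caller never goes upstream for a
        // known-bad key within err_ttl.
        let val = fetch();
        let ttl = if val.is_ok() { self.ok_ttl } else { self.err_ttl };
        self.entries
            .insert(key.to_string(), (val.clone(), Instant::now() + ttl));
        val
    }
}
```

The err_ttl is the knob: long enough to shed load during an upstream outage, short enough that recovery is noticed quickly.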


Hey amos, if you read this, could you consider adding a table of contents to your articles?

I enjoy your long reads but picking a post back up after pausing can be tricky.


This gets requested often, it's still on my TODO list. I've touched the markdown pipeline code recently (to fix alt text support) so it doesn't feel as scary anymore, but I've got a lot of balls in the air so no ETA!


This is a well written article, if long. The overviews of tracing/open telemetry and tokio are excellent.


> This is a well written article, if long.

That's Amos' brand, alright! ;)

(<3 Amos)


As soon as I saw "fasterthanli.me", I thought, "Well, there goes my afternoon."

Amos writes some amazingly readable, incredibly long posts. Such a treasure.


Cool bear is back, with knobs on! Enjoyed the article, found it useful.


Does anyone know why Axum is so much faster than the basic "spawn a new task to accept the connection and write the response" async webserver? I'm curious what they're doing under the hood. They're both using the Tokio runtime, but Axum is based on Hyper. So I guess it boils down to how Hyper uses Tokio to efficiently accept/serve TCP connections?

EDIT: Actually I'm guessing a lot of it comes down to encoding the HTTP responses, now that I think about it.


A lot of the difference likely comes from the fact that hyper implements HTTP keep-alive, meaning that multiple requests can be handled on the same connection, vs. having to create and terminate a connection for every individual request.


That's a very good question — I feel like the article is set up in a way that you could easily take the TCP-based, hand-written server and try to match Axum/hyper's performance — implement keep-alive, be smarter about buffering, etc, and see what makes an actual difference.

Of course if you reach a point where the only thing left to do is implement http/2, that... is no longer an evening project


Cool! I suspect that you can do some reshuffling to avoid using any locks when serving cache hits. Also, if you want to serve large files or live video, it may be cool to be able to have the coalesced requests send body bytes to the client before the upstream request is finished. I implemented this feature at one (non-varnish-based) CDN, and other (varnish-based) CDNs seem to support it for some subset of configurations.


Why would you want to write a custom web server instead of using the existing ones? Performance reasons? Or is there really functionality that is missing from the existing ones?


I've written a custom web server and the reason was performance. Web servers have the potential to do quite a lot and each of those activities adds latency, so it made sense to write my own web server to implement exactly what I needed and nothing more.

A minimalist, custom web server can approach 10m req/s as seen on the Techempower Benchmarks.


Sorry to sidetrack, but since the article led with such an awesome opportunity ,

> As the popular saying goes, there are only two hard problems in computer science: caching, off-by-one errors, and getting a Rust job that isn't cryptocurrency-related.

I'm hiring Rust engineers for the future of virtual production. Kids at home are going to leapfrog Disney and make their own Star Wars.

We have some toys that might look silly now, but so much in the pipeline once all of this begins to mature.

- https://FakeYou.com is viral on social media. (Please excuse the service failures as we just reached 1M unique DAUs. Creating an account and logging in boosts queue priority such that your requests will render in under a few seconds.)

- https://create.Storyteller.io does Twitch TTS and deepfaked cheer emotes, has a robust rule based system, and helps creators monetize and engage their audience.

- We also have motion capture, volumetric capture, voice conversion, virtual sets, etc. But I'm so tapped out with the growth that these haven't been spun off yet apart from internal use and close work with a few creators.

Silly toys that don't look like much now, but incredible promise, incredible traction, and signs of so many monetization channels. (Custom voice cloning, paid services for Twitch creators, broadcast studios, hosted rendering, etc. etc. etc.)

I'll hire contract, part time, remote. Post funding (working on this now), huge equity, salary, etc. are in line. But I'm more than happy to draw down my last startup exit to fund this.

Unlike working on plumbing and glue code, this is actually a ton of fun and touches on so many of the most exciting parts of CS. ML, computer vision, audio processing, graphics, ...

Sorry if hustle posts like this break the rules, dang! (The article led so brilliantly I had to say something)

Contact info in my profile, or jump in our Discord.


> > As the popular saying goes, there are only two hard problems in computer science: caching, off-by-one errors, and getting a Rust job that isn't cryptocurrency-related.

Great line. I've been thinking of using Rust (in addition to its merits) as a hiring carrot, to help attract some of the best multi-skilled software engineers.

I've seen this work very well with Common Lisp and Scheme in the past.



