I'm not sure I agree with most of this post. Firstly, a userland socket mux/demux implementation isn't such an insurmountable challenge. It's essentially the core of a good HTTP/2 implementation: if you've got a good HTTP/2 implementation, you've necessarily got a good mux/demux solution.
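To make that concrete, here's a minimal sketch of what that demux core amounts to: route each incoming frame to per-stream state keyed by stream ID. The names here (`Frame`, `Stream`, `Demuxer`) are illustrative, not from any particular library:

```cpp
// Minimal sketch of a demux core: one connection carries many streams,
// and each frame is dispatched to its stream's state by stream ID.
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Frame {
    uint32_t stream_id;            // which logical stream this frame belongs to
    std::vector<uint8_t> payload;  // frame body (headers/data/etc. elided)
};

struct Stream {
    std::vector<uint8_t> body;     // reassembled per-stream payload
};

class Demuxer {
public:
    void on_frame(const Frame& f) {
        // operator[] creates the stream's state on its first frame
        auto& s = streams_[f.stream_id];
        s.body.insert(s.body.end(), f.payload.begin(), f.payload.end());
    }
private:
    std::unordered_map<uint32_t, Stream> streams_;
};
```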
> Now if HAProxy makes multiple connections to the backend each one gets served by its own thread+socket and that's going to load much faster because at the very least it's going to get more attention from the OS.
I'm not sure what you're basing this on. What is the technical definition of "more attention from the OS"? If anything, limiting things down to a single process over one connection will improve latency. It can also help minimize memory copies under volume, because you'll get more than one frame per read (and you certainly aren't serving an L>4 protocol out of your NIC). Most importantly, it removes connection establishment and teardown costs. Those aren't free.
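For illustration, here's roughly what "more than one frame per read" means: on a busy multiplexed connection a single read() typically drains several complete frames, amortizing the syscall. The 4-byte length prefix below is a hypothetical framing (HTTP/2's real frame header is 9 bytes), and partial-frame carryover between reads is omitted:

```cpp
// Sketch: one read() on a busy multiplexed connection often returns several
// complete frames. Framing is a hypothetical 4-byte length prefix;
// carrying a partial trailing frame over to the next read is omitted.
#include <cstdint>
#include <cstring>
#include <unistd.h>

void drain_once(int fd) {
    char buf[64 * 1024];
    ssize_t n = read(fd, buf, sizeof(buf));  // one syscall...
    if (n <= 0) return;
    size_t off = 0;
    while (off + 4 <= static_cast<size_t>(n)) {
        uint32_t len;
        std::memcpy(&len, buf + off, sizeof(len));
        if (off + 4 + len > static_cast<size_t>(n)) break;  // incomplete frame
        // ...but possibly many complete frames handed to the demuxer here
        off += 4 + len;
    }
}
```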
Now, if you really are loading backends heavily enough that responsiveness is a problem, you'll need to appeal to whatever load balancing solution you have on hand. But most folks agree that a single multiplexed connection per backend is the faster approach.
> Furthermore, if you use the keepAlive header, and set long timeouts, then you don’t even have to pay the connection penalty.
But you still have head-of-line blocking, so you're still spamming N connections per client to get that concurrency factor. Connections aren't free: they carry state and take up system resources. You'll serve more clients with one backend if you use fewer resources per connection.
> So essentially you’ve shifted thread management to HAProxy by virtue of its parallel connections which keeps the webserver code pretty simple.
I don't think this is true at all. If you want to serve connections in parallel or access resources relevant to your service in parallel, you're going to have to manage that concurrency in your own code somewhere. You can of course choose to NOT do this and spam a billion processes rather than use concurrency.
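Concretely, the alternative to process-per-connection is something like a single event loop multiplexing many sockets in one process. A rough Linux-specific sketch, with error handling, nonblocking setup, and connection teardown omitted for brevity:

```cpp
// Sketch: one process, one epoll loop, many connections.
// Linux-specific; error handling and nonblocking I/O setup omitted.
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

void serve(int listen_fd) {
    int ep = epoll_create1(0);

    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);

    epoll_event ready[64];
    for (;;) {
        int n = epoll_wait(ep, ready, 64, -1);
        for (int i = 0; i < n; ++i) {
            if (ready[i].data.fd == listen_fd) {
                // new connection: register it with the same loop
                int client = accept(listen_fd, nullptr, nullptr);
                epoll_event cev{};
                cev.events = EPOLLIN;
                cev.data.fd = client;
                epoll_ctl(ep, EPOLL_CTL_ADD, client, &cev);
            } else {
                // re-entrant request handling for this connection goes here
            }
        }
    }
}
```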
> And in a C++ program simplicity is key to correctness
I don't think this is unique to C++, but I'm also not sure it's really relevant here. From an application backend's perspective it's very much the same model either way: the backend still needs to support re-entrant request serving.