
Actix Web: Optimization Amongst Optimizations - brandur
https://brandur.org/nanoglyphs/008-actix
======
drenvuk
The reason the Rust framework is winning on the Fortunes benchmark is the
ecosystem more so than the framework itself.

For instance, any database benchmark is going to favor frameworks whose
database library implements pipelining and asynchronicity. The fact that
you're not allowed to modify the library unless you want your framework to be
labeled a "raw" benchmark increases the benchmark's worth, but doesn't come
without caveats. The implementer of h2o, the C server framework/library just
below actix in that chart, has mentioned something related to this:

[https://github.com/TechEmpower/FrameworkBenchmarks/tree/mast...](https://github.com/TechEmpower/FrameworkBenchmarks/tree/master/frameworks/C/h2o#database-tests)

So Rust is "production grade" because its official library handles it for the
framework separately. I LOVE these benchmarks because they point out the
differences in each ecosystem when you're trying to find the fastest thing. I
wouldn't have found a bunch of interesting projects without them.

If you really want to know about raw speed, just look at the plaintext bench.
The top frameworks there are all being limited by the 40(!) Gbps connection
these guys are running. After picking any one of them, the code _you_ write
would be the limiting factor.

------
bhauer
Great write-up, brandur!

Actix has definitely deployed a broad suite of optimizations. When we posted
Round 17 of our Framework Benchmarks, we wrote [1] about the stratified
Fortunes results and attributed it in large part to our agreement to support
the pipelining feature of PostgreSQL's protocol.

We had a conversation with the community about pipelined Postgres [2] when it
was initially brought to our attention. Initially, we were opposed because it
had a smell of cheating. However, after the dialog on the mailing list and
several internal conversations, we remembered that one of the reasons we
created the project was to encourage friendly competition, yield higher-
performance platforms and frameworks, and ultimately benefit the application
developers who use those frameworks. In that light, this form of optimization
is exactly what we wanted to see (whether or not its genesis actually traces
to this project). We _want_ to see such creative ways to reduce the overhead
of frameworks, leaving more CPU capacity to the discretion of the application
developer.

As long as they continue to live up to the spirit of our tests, of course. We
believed this did live up to the spirit of our tests because these
optimizations did not require the application developer to do anything to reap
the benefit--they just run queries as usual.

And then there's that SIMD HTML escaping, which I admit kinda blew my mind.
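
For context, HTML escaping boils down to replacing five special characters. A minimal scalar sketch of the idea (not the actual SIMD implementation, which scans a whole register's worth of bytes per instruction instead of one character at a time) might look like:

```rust
// Scalar HTML escaping: replace the five HTML-significant characters.
// A SIMD version does the same substitution but searches 16+ bytes at
// once for any of these characters before falling back to per-byte work.
fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '&' => out.push_str("&amp;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#x27;"),
            _ => out.push(c),
        }
    }
    out
}

fn main() {
    assert_eq!(
        escape_html("<b>\"hi\" & 'bye'</b>"),
        "&lt;b&gt;&quot;hi&quot; &amp; &#x27;bye&#x27;&lt;/b&gt;"
    );
}
```

The win in the vectorized version comes from the common case: most template output contains no special characters at all, so long clean runs can be copied through after a handful of SIMD comparisons.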

[1] [https://www.techempower.com/blog/2018/10/30/framework-benchm...](https://www.techempower.com/blog/2018/10/30/framework-benchmarks-round-17/)

[2] [https://groups.google.com/d/topic/framework-benchmarks/Kbd2N...](https://groups.google.com/d/topic/framework-benchmarks/Kbd2No7xrv8/discussion)

------
pornel
I see it also as a success story for Cargo. Actix can be optimized from top to
bottom, without being a monolithic framework. It doesn't have to reinvent and
optimize every component itself.

The SSE-accelerated escaping, postgres driver, and state-of-the-art hash
tables are all packages that can be used without Actix. And Actix can take
advantage of them without undue runtime overhead (thanks to generics+inlining)
and without burdening users with build/install complexity (it depends on over
140 packages, and Cargo seamlessly takes care of that).
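
As a hypothetical illustration of that generics+inlining point (these type and trait names are made up, not Actix's actual service abstractions): a component generic over a trait is monomorphized per concrete type, so the compiler can inline the call rather than dispatch through a vtable at runtime.

```rust
// Hypothetical trait standing in for a pluggable component.
trait Escaper {
    fn escape(&self, s: &str) -> String;
}

struct Identity;
impl Escaper for Identity {
    fn escape(&self, s: &str) -> String {
        s.to_string()
    }
}

// Static dispatch: `render::<Identity>` is a specialized copy in which
// `esc.escape(..)` is a direct, inlinable call. A `&dyn Escaper` argument
// would instead force an indirect call through a vtable on every request.
fn render<E: Escaper>(esc: &E, body: &str) -> String {
    format!("<p>{}</p>", esc.escape(body))
}

fn main() {
    assert_eq!(render(&Identity, "hi"), "<p>hi</p>");
}
```

This is why pulling in a fast third-party crate costs essentially nothing at the call boundary: the generic glue disappears at compile time.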

------
JoshMcguigan
> On startup, 128 request and response objects are pre-allocated in per-thread
> pools. As a request comes in, one of them is checked out, populated, and
> handed off to user code. When that code finishes with it and the object is
> going out of scope, Drop kicks in and checks it back into the pool for
> reuse. New objects are allocated only when pools are exhausted, thereby
> saving many needless memory allocations and deallocations as a web server
> handles normal traffic load.

As someone not very familiar with systems programming, I would have expected
this to be an optimization that already happens automatically within the
system memory allocator. Is that not the case?

~~~
fafhrd91
You still need to initialize inner structures and buffers. The cache lets you
avoid that.

~~~
JoshMcguigan
Thanks for the response.

So it sounds like it is not so much the allocation/de-allocation savings that
matters, but rather that you get to skip initializing the memory because it is
already known to be a valid instance of a given struct type?

~~~
jlokier
Yes, that.

See also SLAB allocation, used in many OS kernels:
[https://en.wikipedia.org/wiki/Slab_allocation](https://en.wikipedia.org/wiki/Slab_allocation)
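
A minimal sketch of the checkout/Drop pattern described in the quoted passage (not Actix's actual pool, which pre-allocates 128 request/response objects per thread; the names here are illustrative) could look like:

```rust
use std::cell::RefCell;

// Thread-local free list: no locking needed, since each worker thread
// owns its own pool.
thread_local! {
    static POOL: RefCell<Vec<Vec<u8>>> = RefCell::new(Vec::new());
}

// Wrapper whose Drop impl checks the buffer back into the pool instead
// of freeing it, so the next checkout reuses the allocation (including
// any capacity it grew to while in use).
struct PooledBuf {
    buf: Vec<u8>,
}

impl PooledBuf {
    fn checkout() -> PooledBuf {
        // Reuse a pooled buffer if one exists; allocate only when the
        // pool is exhausted.
        let buf = POOL.with(|p| p.borrow_mut().pop()).unwrap_or_default();
        PooledBuf { buf }
    }
}

impl Drop for PooledBuf {
    fn drop(&mut self) {
        let mut buf = std::mem::take(&mut self.buf);
        buf.clear(); // discard contents, keep capacity
        POOL.with(|p| p.borrow_mut().push(buf));
    }
}

fn main() {
    let cap_after_growth;
    {
        let mut b = PooledBuf::checkout();
        b.buf.extend_from_slice(b"hello world");
        cap_after_growth = b.buf.capacity();
    } // Drop runs here and checks the buffer back into the pool
    let b2 = PooledBuf::checkout();
    assert!(b2.buf.is_empty()); // contents cleared...
    assert_eq!(b2.buf.capacity(), cap_after_growth); // ...allocation reused
}
```

The point made upthread applies here: `clear()` is cheap compared to re-running whatever growth and initialization the buffer went through on first use.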

------
anp
I enjoyed this quite a bit. I had read Nikolai's comment back when it was
linked in a few discussions and felt familiar with the techniques, but it's
really nice to have a detailed catalog with explanations and links :D.

nit re:

    Although designed to help speed up the parsing of
    XML documents, it turns out they’re also useful for
    optimizing web-related features.

HTML is syntactically very close to XML in a bunch of ways (cf XHTML's attempt
to unify them IIRC?), so it seems almost exactly like the intended use of the
instructions based on this description.

------
iknowstuff
Nice write-up. Glad to see 2.0 performing just as well.

------
darksaints
Is anyone else just completely flabbergasted at the performance levels we're
seeing at the top end of these benchmarks? 880k requests per second, with each
request handling a database query? 7 million requests per second without
database access? All on a single 16 core server? I don't even know what to
think anymore.

------
The_rationalist
Beware, actix web has been caught cheating on this benchmark.

Cf: [https://64.github.io/actix/](https://64.github.io/actix/)

~~~
dpc_pw
Can you point out in more detail where the cheating was?

What you link to talks about the great `unsafe` drama, but that's not
cheating. `unsafe` is in Rust for a good reason. Every C and C++ codebase is
effectively one big `unsafe` block from the perspective of a Rust developer,
and we don't think C/C++ are cheating. Rust devs frown upon overusing `unsafe`
for a good reason: it leads to obscure bugs and security issues. But if you
use it correctly, in a place that is critical for your performance - go for
it.

Having said that - a lot of frameworks in that benchmark are over-optimizing
in ways that no real practical app would, and arguably are cheating. If
there's anything in particular about actix here, please let me know - I'm
honestly interested in it.

~~~
mwcampbell
The post linked in the GP discusses cheating in this section:
[https://64.github.io/actix/#blazingly-fast-or-
not](https://64.github.io/actix/#blazingly-fast-or-not)

Note that this is for the "raw" version of the plaintext and JSON benchmarks,
not the Fortunes benchmark covered in the OP. But I wonder why that raw
version is even there. It certainly isn't "realistic", as it claims to be in
benchmark_config.json.

~~~
gpm
Also from that section

> In fairness, it seems other benchmarks are doing these things

Personally, I think it's pretty reasonable to sink to the same level as the
competition.

