
Rwasa – A high-performance web server in x86_64 assembly - jonathonf
https://2ton.com.au/rwasa/
======
bsaul
Question to the author: Seeing the benchmark really made me wonder, how can
you be twice as fast as nginx?

I've always thought writing assembly manually was for some very specific edge
cases, or to talk to some very specific hardware, but that it was just a waste
of time for anything else, especially compared to C (and especially with all
the progress in compilers).

Is it the death by a thousand cuts scenario, or are there some big chunks of
performance gained thanks to some specific tricks (and if so, could you give
some examples?). I'm thinking maybe cryptographic functions?

~~~
2ton_jeff
Hey thanks! Contrary to popular belief, x86_64 assembler isn't really that bad
to deal with. An eagle-eye perspective is simply that regardless of how good
optimising compilers get, they can never really know my intent. My zlib
implementation for example is consistently 25% faster than the reference
version, despite me simply "hand compiling" it straight from the C source.
There are lots of contributing factors that all add up to the end result as
seen in the benchmarks.

The short answer is: there is no single reason, it is the culmination of all
of the underlying bits of the library that made it what it is.

I added code profiling to rwasa so that you can run load tests against it and
watch call graphs and individual function timings, which makes for interesting
inspection of the library itself for specific "tricks" that were employed.
(the library's page contains rwasa-specific profiling examples:
[https://2ton.com.au/HeavyThing/](https://2ton.com.au/HeavyThing/) )

~~~
JoshTriplett
> My zlib implementation for example is consistently 25% faster than the
> reference version, despite me simply "hand compiling" it straight from the C
> source.

Have you compared it to Intel's optimized implementation of zlib, at
[https://github.com/jtkukunas/zlib/](https://github.com/jtkukunas/zlib/) ?

If you can improve on that implementation, please consider submitting a patch.

~~~
anotherangrydev
Congrats on your work 2ton_jeff!

Regarding zlib, is that the fastest implementation that's currently available?

I remember stumbling into a guy that claimed his implementation was way
faster (2x or more) than the original zlib, and it was a drop-in replacement
and scaled fairly well. Unfortunately, I can't find it right now (it should be
bookmarked on another computer).

~~~
JoshTriplett
Various people have attempted to speed up zlib; it's not that high a bar, if
you're OK with not producing output binary-identical to the original zlib.
Deflate compression algorithms keep references to possible LZ backreferences
in a hash table, and depending on your choice of hash algorithm, and how much
you prune your table versus spending memory and time storing and searching it,
you'll end up emitting different backreferences. The implementation at
[https://github.com/jtkukunas/zlib/](https://github.com/jtkukunas/zlib/)
replaces the hash algorithm with something much faster.
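
As a rough illustration of the point above (a minimal sketch, not code from zlib or from the Intel fork): the toy tokenizer below keys a table on 3-byte prefixes and searches at most `max_chain` earlier positions. The names `tokenize`, `expand`, `max_chain` and `SAMPLE` are made up for this example, and a Python dict stands in for a real fixed-size hash table.

```python
MIN_MATCH = 3      # Deflate's shortest backreference
MAX_MATCH = 258    # Deflate's longest backreference
WINDOW = 32768     # Deflate's maximum backreference distance


def tokenize(data: bytes, max_chain: int) -> list:
    """Greedy LZ77 pass: literal byte values and (distance, length) pairs.

    max_chain must be >= 1; it caps how many earlier positions are searched.
    """
    table = {}   # 3-byte prefix -> earlier positions, oldest first
    tokens = []
    i, n = 0, len(data)
    while i < n:
        best_len, best_dist = 0, 0
        # Pruning: look only at the max_chain most recent candidates.
        candidates = table.get(data[i:i + MIN_MATCH], [])[-max_chain:]
        for pos in reversed(candidates):
            if i - pos > WINDOW:
                break
            length = 0
            while (i + length < n and length < MAX_MATCH
                   and data[pos + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - pos
        if best_len >= MIN_MATCH:
            tokens.append((best_dist, best_len))
            step = best_len
        else:
            tokens.append(data[i])
            step = 1
        # Record every consumed position that still has a full 3-byte prefix.
        for j in range(i, min(i + step, n - MIN_MATCH + 1)):
            table.setdefault(data[j:j + MIN_MATCH], []).append(j)
        i += step
    return tokens


def expand(tokens: list) -> bytes:
    """Inverse of tokenize: replay literals and backreferences."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            for _ in range(length):
                out.append(out[-dist])
        else:
            out.append(tok)
    return bytes(out)


SAMPLE = b"abcdefg" + b"abcxyz" + b"abcdefg"
deep = tokenize(SAMPLE, max_chain=8)      # ends with (13, 7)
shallow = tokenize(SAMPLE, max_chain=1)   # ends with (6, 3), (13, 4)
```

Both token streams expand back to `SAMPLE`, but the shallow search only sees the most recent "abc" and so emits two short backreferences where the deeper search emits one long one. That is the sense in which a faster match finder can be correct without being binary-identical.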

------
jonathonf
Submitting this so I can make a feature request. :)

The -logpath option is fine, but it would be nice if it could create
subdirectories too (e.g. $LOGPATH/YYYY/mm/access.log.YYYYmmdd ). Otherwise,
over time the log dir is going to get unwieldy.
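
For concreteness, the requested layout could be computed roughly like this. It is only a sketch of the naming scheme, not an rwasa feature; `dated_log_path` and the `/var/log/rwasa` root are hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath


def dated_log_path(log_root: str, day: date) -> PurePosixPath:
    """Build $LOGPATH/YYYY/mm/access.log.YYYYmmdd for the given day."""
    return (PurePosixPath(log_root)
            / f"{day:%Y}"
            / f"{day:%m}"
            / f"access.log.{day:%Y%m%d}")


# Hypothetical log root, used purely for illustration.
example = dated_log_path("/var/log/rwasa", date(2015, 2, 1))
# str(example) == "/var/log/rwasa/2015/02/access.log.20150201"
```

A server writing to such a path would also need to create the year and month directories the first time it rolls over into them.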

I'm currently running several sites in alpine+rwasa Docker containers; I'm
liking having a set of entirely isolated web servers based on a 10MB container
image, each apparently consuming ~6KB RAM while idle.

~~~
2ton_jeff
Author here: Feature request noted, although the one-logfile-per-day for me
doesn't seem too unwieldy. I have not seen webserver logs stored in the way
you describe, is that a common practice (each month's worth in its own
separate directory?)

~~~
jonathonf
Depending on what you read [1] cronolog is a decent way of doing things (e.g.
unlike logrotate it doesn't require a reload of web server). Without exporting
to something like ELK, it also gives you a decent audit/accountancy trail.
Limiting logs is not something I can really do in production (think six-years-
plus of mandated data retention).

[1]
[https://startpage.com/do/search?q=cronolog+vs+logrotate](https://startpage.com/do/search?q=cronolog+vs+logrotate)

------
faizshah
Great project, I have been watching another similar project pretty closely:
[https://github.com/h2o/h2o](https://github.com/h2o/h2o)

I've always wondered, how many man hours does writing a small web server like
that take?

~~~
2ton_jeff
h2o is cool too, I contributed to that project early this year for their
OpenSSL/DHE parameter settings.

Re: How much time, hard to say from my perspective since a fair amount of
rwasa's functionality resides in the library itself. Start to finish for all
of the showcase pieces and the library (from 0 lines of code to release) took
13 months of my life :-)

~~~
srean
I almost downvoted you by accident. Thankfully when I reloaded the arrows were
still there. I really think this "arrows too small" problem needs to be fixed.
Although the refrain seems to be that it is not a problem: the community would
vote such accidental downvotes back up

~~~
Narishma
They should just put one arrow to the left and the other to the right of the
name.

------
rubyn00bie
I've been wondering when something like this would emerge... I'm excited to
try it out on some side projects; seems simple and fast-- hard combo to beat
:)

I'd also like to say, as someone who is quite ignorant of writing x86
assembly, the function hook example is incredibly readable and clear. I'm
looking forward to grokking the rest of the code base in attempt to learn
more.

Thanks for the hard work!

------
faragon
That's a huge effort, kudos. Certainly, it is harder to get maximum
performance from C, as sometimes you have to reorder things in non-trivial
loops in order to "help" the C compiler to get it "right" in the performance
sense (many times involving function inlining and other things that usually
increase code size/bloat). Although being assembly-only ties it to just one
platform, that is not currently an issue in this case, with most web servers
running on x86_64 CPUs. If it allows increasing performance while reducing
memory usage, it makes a lot of sense for a super-low-cost architecture
(tiny memory and CPU usage per active connection).

It would be interesting to see how x86_64 implementations of specific elements
compare to equivalent C code, in terms of instructions-per-clock efficiency,
cache miss ratio, etc. (e.g. using the "perf" tool in Linux, or any other tool
that uses the CPU hardware counters).
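
A sketch of what such a comparison might compute, assuming the counts come from `perf stat` with the `cycles`, `instructions`, `cache-references` and `cache-misses` events. The helper names, the `./server_under_test` binary and the counter values below are made up for illustration and are not measurements of rwasa or anything else.

```python
EVENTS = "cycles,instructions,cache-references,cache-misses"


def perf_stat_command(program_args: list) -> list:
    """Argument list for running a program under `perf stat` with EVENTS."""
    return ["perf", "stat", "-e", EVENTS, "--", *program_args]


def summarize(counters: dict) -> tuple:
    """Return (instructions per cycle, cache miss ratio) from raw counts."""
    ipc = counters["instructions"] / counters["cycles"]
    miss_ratio = counters["cache-misses"] / counters["cache-references"]
    return ipc, miss_ratio


# Hypothetical binary name; pass the result to subprocess.run() on Linux.
cmd = perf_stat_command(["./server_under_test"])

# Placeholder counts, only to show the arithmetic.
ipc, miss_ratio = summarize({
    "cycles": 1000,
    "instructions": 2500,
    "cache-references": 200,
    "cache-misses": 10,
})
```

Running the same workload against the assembly and the C build and comparing the two pairs of ratios would give the kind of side-by-side view described above.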

------
tux
A few questions for the author.

1. Is there a config file? Like "nginx.conf", for example.

2. Is "reverse proxy" possible? Like the nginx reverse proxy.

Some examples would help.

~~~
jonathonf
You should really just read the configuration examples on the rwasa page.

1. There is no configuration file.

2. Yes, you can set an upstream (-backpath).

------
jjoe
Is that Rwasa as in Russians? Because in one non-Russian dialect I'm familiar
with, it translates exactly to that. If so, I find the reference interesting
considering Nginx author's country of origin. I'm curious how you came up with
the name Rwasa.

Good luck with the project!

~~~
2ton_jeff
Late Sunday night here, but before I sign off for the night, thought you'd be
amused to know that the name rwasa really, truly was an acronym from "Rapid
Web Application Server in Assembler". I have historically sucked at coming up
with decent names, and at the time pre-release this seemed like as good a name
as any :-) I assure you any relation to Russians, African politicians, etc is
wholly coincidence.

------
tyho
From their contact page:

    $ ssh 2ton.com.au

------
jrk_
If only there were a way to write a high level description of a web server
which somehow gets translated into assembly.. :)

Just kidding, nice effort!

------
Sir_Cmpwn
A very cool project, but they seem to think people will run it in production.
Probably a bad idea to run an assembly TLS implementation in production.

EDIT: Here's the header of the file that contains their TLS implementation.
I'll let you be the judge:
[https://gist.github.com/SirCmpwn/ec8aaec128aa3e47ddda](https://gist.github.com/SirCmpwn/ec8aaec128aa3e47ddda)

~~~
2ton_jeff
Do you have specific criticisms of my work and/or attention to detail re: the
TLS specification, and/or my library's application of modular arithmetic et
al?

~~~
Sir_Cmpwn
I'd be happy to give some more detailed thoughts. I write a lot of assembly
myself [1], so hopefully that lends me some credibility.

I haven't read much of your TLS implementation, but I'm not a security
researcher and I don't think I'm qualified to give an opinion on your
particular implementation. However, there are some points to be made here.
First of all, almost no one ships their own crypto for a good reason. To trust
that something is secure, you need to have lots of people working on it and
lots of projects invested in it (something that OpenSSL and such have). "Many
eyes makes bugs shallow" is the common phrase, but it holds truer when large
companies with highly skilled engineers are putting their secrets and their
customers' secrets on the line. Your implementation has the unfortunate
problem of being written in assembly. While I don't think there's anything
inherently wrong with assembly, many people don't share the same opinion.
People will be reluctant to contribute to it (how do people contribute,
anyway?). Also, it is easier to make mistakes in assembly, and I would be
surprised if there weren't several mistakes in your TLS implementation and in
the rest of the server. This doesn't reflect poorly on you as a developer, but
is instead a consequence of choosing assembly.

Assembly also inherits a lot of the common security problems C has (like
buffer overflows), but makes them harder to identify. I would feel very
uncomfortable exposing anything written in assembly to the public net, and
doubly so if it used an unproven TLS stack written in the same. Other projects
avoid the problem of untested crypto by using tested crypto from an external
module like OpenSSL.

[1] [https://github.com/KnightOS/kernel](https://github.com/KnightOS/kernel)

~~~
2ton_jeff
Agreed regarding general trust in any crypto stack. I've been doing commercial
software development for 28 years now, and my company's products all reflect
this. Whether I expect high-value security sites to use my software in
production or not, well I certainly do not. Hardened stacks are few and far
between, and OpenSSL can by no measure be deemed hardened (though certainly
getting better of late thanks to all of the bug releases). Do I expect that my
entire stack is 100% bug-free? No, but one of the niceties IMO of doing
assembly language programming is that it is far less error tolerant in the
ways you describe. Reading up on all of the nasties re: security-related code,
and then applying the commonly accepted mitigation strategies, was done
throughout.

Re: how do people contribute, it is on my list of things to do for github's
linguist x86_64 support (which is why I didn't put it all on github to begin
with).

At the end of the day, trust is a function of time and perceived scrutiny of
the stacks at hand. We are getting there slowly but surely :-) Cheers!

~~~
e12e
I'm sorry, you've held off on publishing on github due to missing source code
highlighting? Or am I completely misunderstanding what you're saying (I think
I am...)?

~~~
2ton_jeff
Admittedly I haven't checked recently, but before I released 2 Ton Digital I
did a few test github projects and they all looked horrific so yeah I left it
out on purpose. It's been on my "someday when I am bored" list since then (to
fix up linguist so it all looks half-decent), and also why all the "library as
HTML" on 2ton.com.au is self-highlighted.

