
Show HN: ZeroHTTPd: A web server to teach Linux performance, with benchmarks - shuss
https://unixism.net/2019/04/linux-applications-performance-introduction/
======
marios
Had a quick glance and while the content seems worthwhile I am a little
saddened by the code.

There are numerous calls to strcpy(), which is a buffer overflow waiting to
happen. I know this is not production code, but that makes it no less
important: the code is meant to be a learning tool, so it should use correct
idioms.

Apologies for nitpicking, but code gets copied, often in a hurry and you get
bugs all over the place.
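
For anyone copying from this thread, here is a minimal sketch of a bounded copy that could stand in for strcpy(). The helper name copy_bounded is made up for illustration and is not taken from ZeroHTTPd; it assumes the caller knows the destination buffer size:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Copy src into dst without ever writing more than dst_size bytes.
 * Returns 0 on success, -1 if src did not fit (dst then holds a
 * NUL-terminated, truncated copy).
 * copy_bounded is a hypothetical helper, not a ZeroHTTPd function.
 */
static int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    /* snprintf NUL-terminates whenever dst_size > 0 and returns the
     * length it would have written given unlimited space. */
    int needed = snprintf(dst, dst_size, "%s", src);
    if (needed < 0 || (size_t)needed >= dst_size)
        return -1;
    return 0;
}
```

The point of returning -1 on truncation is that the caller can reject an over-long request path instead of silently serving a shortened one.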

One more comment: in the intro you mention that while the code is mostly
tested on Linux, it's likely it works on other Unixes such as Free/OpenBSD.
You might want to remind the reader that epoll() is Linux-only (bonus points
if you mention the BSDs have something similar called kqueue() :)).
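
To make the portability point concrete, a rough sketch of waiting for readability with the epoll calls (epoll_create1, epoll_ctl, epoll_wait). It only builds on Linux; a BSD version would use kqueue()/kevent() instead. The function name wait_readable is an assumption for illustration, and a real server would create the epoll instance once rather than per call:

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Wait until fd is readable or timeout_ms milliseconds elapse.
 * Returns 1 if readable, 0 on timeout, -1 on error. Linux only.
 * wait_readable is a hypothetical helper for illustration.
 */
static int wait_readable(int fd, int timeout_ms)
{
    assert(fd >= 0);

    int ep = epoll_create1(0);
    if (ep < 0)
        return -1;

    /* Register interest in "data available to read" on fd. */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(ep);
        return -1;
    }

    /* Ask for at most one ready event. */
    struct epoll_event ready;
    int n = epoll_wait(ep, &ready, 1, timeout_ms);
    close(ep);
    return n;
}
```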

~~~
shuss
Points well taken, marios. Will try and fix strcpy(). Should certainly mention
kqueue(), too. Will fix. Thanks.

~~~
cyberpunk
OpenBSD doesn't have sendfile as far as I know, so maybe s/Open// there too.

~~~
shuss
OpenBSD and FreeBSD do have sendfile(), but the arguments differ from the
interface presented by Linux. It should be trivial to write a wrapper, though:

[https://man.openbsd.org/FreeBSD-11.1/sendfile.2](https://man.openbsd.org/FreeBSD-11.1/sendfile.2)
[https://linux.die.net/man/2/sendfile](https://linux.die.net/man/2/sendfile)
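
A sketch of what such a wrapper might look like, covering only Linux and FreeBSD as documented in the two man pages above. The name portable_sendfile is hypothetical, and any other platform simply gets ENOSYS here:

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

#if defined(__linux__)
#include <sys/sendfile.h>
#elif defined(__FreeBSD__)
#include <sys/uio.h>
#endif

/*
 * Send up to count bytes of file_fd, starting at offset, to sock_fd.
 * Returns the number of bytes sent, or -1 with errno set.
 * portable_sendfile is a hypothetical name for illustration.
 */
static ssize_t portable_sendfile(int sock_fd, int file_fd,
                                 off_t offset, size_t count)
{
    assert(sock_fd >= 0 && file_fd >= 0);
#if defined(__linux__)
    /* Linux: sendfile(out_fd, in_fd, &offset, count) returns bytes sent. */
    return sendfile(sock_fd, file_fd, &offset, count);
#elif defined(__FreeBSD__)
    /* FreeBSD: sendfile(fd, s, offset, nbytes, hdtr, &sbytes, flags)
     * returns 0 or -1 and reports the byte count through sbytes.
     * Partial sends on non-blocking sockets are not handled here. */
    off_t sent = 0;
    if (sendfile(file_fd, sock_fd, offset, count, NULL, &sent, 0) == -1)
        return -1;
    return (ssize_t)sent;
#else
    /* No known sendfile() on this platform in this sketch. */
    (void)offset;
    (void)count;
    errno = ENOSYS;
    return -1;
#endif
}
```

Note the swapped order of the two descriptors between the platforms, which is the easiest thing to get wrong when porting.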

~~~
cyberpunk
That man page is an import from FreeBSD, not part of base. As far as I can
see, it's not there on my system:

OpenBSD:

    $ man 2 sendfile
    man: No entry for sendfile in section 2 of the manual.

    $ grep -rni sendfile /usr/include | wc -l
    0

FreeBSD:

    # grep -rni sendfile /usr/include | wc -l
    30

edit: typo

~~~
dallbee
On OpenBSD you have sosplice, which can be used much like sendfile.

~~~
cyberpunk
Oh neat, I had no idea that existed. Will have a play, thanks for the tip!

------
pjc50
Interesting. Reminds me of when I worked on the Zeus Web Server:
[https://en.wikipedia.org/wiki/Zeus_Web_Server](https://en.wikipedia.org/wiki/Zeus_Web_Server)

It was a static-content-focused, poll-based (it might have had epoll too) web
server written in non-STL C++ that dramatically outperformed multi-threaded
Apache. I remember it had a "stat cache" to reduce the number of syscalls
made, and a nice set of string classes for passing around substrings of e.g.
HTTP headers.

~~~
joosters
I don't think the epoll/kqueue version ever got an official release :(

The syscall reduction was pretty extreme, even down to keeping a shared-memory
copy of time() to share across processes. IIRC that was only a benefit on
HP-UX/IRIX or something like that...

------
daurnimator
Needs an io_uring-based one!
[http://kernel.dk/io_uring.pdf](http://kernel.dk/io_uring.pdf)

~~~
mistrial9
That looks interesting. It would make the project clearer to others if the doc
carried a title, date, institution, or other mark of the project's backing at
the top, with the same kind of details in the endnotes.

~~~
the8472
The author, Jens Axboe, is a full-time kernel developer and the maintainer of
the block layer subsystem.

------
_pmf_
Another performance-oriented Linux HTTP server is
[https://lwan.ws/](https://lwan.ws/) (using pervasive zero copy), but as far
as I know it does not cover a lot of features yet.

~~~
shuss
Never seen this before. Very interesting. Thanks for sharing!

------
okket
Great work! But it should say somewhere which version of HTTP it implements. I
guess it is 1.1 and will likely stay so?

~~~
masklinn
> I guess it is 1.1

Negative ghost rider, it's hard-coded to 1.0:
[https://github.com/shuveb/zerohttpd/blob/master/01_iterative...](https://github.com/shuveb/zerohttpd/blob/master/01_iterative/main.c#L468)

(In reality it's barely even that: e.g. it only handles GET and POST methods,
discards every header, …, so it's an HTTP server in the sense that it kinda
sorta will respond to HTTP requests.)

~~~
tele_ski
So there is no keep-alive in these tests, including from the load balancer to
the application? That explains why the QPS is so low on a single core for all
versions tested, if nginx has to open a new socket to zerohttpd for every
request. Not sure how useful this is, as keeping the connection to your load
balancer alive is important for throughput.

~~~
derefr
Not in all use-cases. If your backend is serving long-lived HTTP streams (big
downloads; chunked SSE streams; websocket sessions), it may make more sense to
close and re-open those sockets between sessions, since they live long enough
to establish TCP window characteristics that may not apply to the session
succeeding them (e.g. an interactive-RPC websocket session, reusing a TCP
connection previously used to stream a GB of data using huge packets, will
start off quite a bit slower for its use-case than a “fresh” TCP session
would.)

~~~
tele_ski
That's fair, you always need to optimize for your use case. I run thousands of
small transactions per second, so no keep-alive would absolutely kill us.

------
bvinc
Correct me if I'm wrong, but don't modern web servers often use a handful of
threads, and use epoll for each thread? If so, could this be a Part VIII?

~~~
shuss
Yes, IIRC, even Nginx uses the general structure you're describing.

~~~
xorcist
So does Apache with mpm_event, which has been around for close to 15 years
now.

The fact that you _can_ run Apache in a process pool model doesn't mean you
_should_. That mode was mostly kept to support CGI scripts and old style
mod_php.

------
the8472
Needs peak (not average) latency measurements.

~~~
kim0
Maybe p95 makes more sense than both

~~~
the8472
Not really. With pages loading hundreds of assets (especially when relying on
HTTP/2 instead of asset packing/image sprites), your single-request 99.5th
percentile will become the floor for your mean page load time.
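
A back-of-the-envelope sketch of the arithmetic behind this, under the simplifying assumption that request latencies are independent: the chance that a page making N requests sees at least one request slower than the p-th percentile is 1 - p^N. The function name is made up for illustration:

```c
#include <assert.h>

/*
 * Probability that at least one of `requests` independent requests is
 * slower than the given percentile (expressed as a fraction, e.g. 0.995).
 * Independence is an assumption; real asset loads are often correlated.
 */
static double prob_page_hits_tail(double percentile, int requests)
{
    assert(percentile >= 0.0 && percentile <= 1.0 && requests >= 0);

    /* Probability that every single request stays below the percentile. */
    double none_slow = 1.0;
    for (int i = 0; i < requests; i++)
        none_slow *= percentile;

    return 1.0 - none_slow;
}
```

With the 99.5th percentile and 100 requests per page this gives roughly 0.39, and it passes one half at around 140 requests. With p95 and 100 requests it is above 0.99, so nearly every page load is held up by a request at or beyond p95.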

------
navinsylvester
Very insightful article. Thanks for putting it together. Looking forward to an
update with io_uring.

~~~
shuss
Thanks, navinsylvester :)

------
antisemiotic
Heh, you beat me to publishing my own simple HTTP server. Though the one I'm
working on is even more dumbed down, only supporting GET to serve prepared
HTTP messages from a hash table (this requires a creative interpretation of
RFC 7230 to avoid including the _Date_ header which would have to be updated
each time).

I very much approve of this tutorial. I remember when, a long time ago, I tried
to understand how a web server works, and learned that I needed to install
Apache, then put stuff in a CGI directory, or maybe just use a framework and
not have a separate server at all... It was quite confusing for me back then.
But the core functionality of an HTTP server is just: listen on a port, accept
connections, read text, write text.
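
Those four steps can be sketched in a few dozen lines of C. This is a deliberately naive illustration, not code from ZeroHTTPd: listen_on and serve_one are hypothetical names, the request is read once and ignored, and the reply is a fixed string:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Listen on 127.0.0.1:port (port 0 lets the kernel pick a free port). */
static int listen_on(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Accept one connection, read the request, write a fixed reply. */
static int serve_one(int listen_fd)
{
    static const char reply[] =
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: text/plain\r\n"
        "Content-Length: 6\r\n"
        "\r\n"
        "hello\n";

    assert(listen_fd >= 0);
    int client = accept(listen_fd, NULL, NULL);          /* accept */
    if (client < 0)
        return -1;

    char request[4096];
    ssize_t n = read(client, request, sizeof request);   /* read text */
    if (n > 0)
        n = write(client, reply, sizeof reply - 1);      /* write text */
    close(client);
    return n > 0 ? 0 : -1;
}
```

Calling serve_one(listen_on(8080)) in a loop would give the iterative, one-client-at-a-time shape; everything else in a real server is about doing this for many clients at once.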

For anyone interested, I also recommend reading the "Unix Network Programming"
books; the first part is about actual network programming, the second about
inter-process communication on a single computer. For example, the old art of
using Unix-domain sockets instead of TCP or UDP on a single computer (harder to
hack by shady javascript in your browser!) seems unjustly forgotten.

~~~
bgilroy26
For a more introductory audience, Ruslan Spivak's Let's Build a Web Server
series of blog posts is a great hands-on learning resource

[https://ruslanspivak.com/lsbaws-part1/](https://ruslanspivak.com/lsbaws-part1/)

~~~
shuss
Thanks for sharing this. This is such a great resource.

------
cadamangue
Great article, explains things better than most O'Reilly books.

~~~
shuss
LOL! Thanks :)

------
megous
It would be nice to also compare memory use, which will vary between
implementations.

~~~
shuss
That should be pretty trivial to implement with getrusage()

[http://man7.org/linux/man-pages/man2/getrusage.2.html](http://man7.org/linux/man-pages/man2/getrusage.2.html)
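
A minimal sketch of how that could look. The helper name peak_rss_kb is an assumption for illustration; it relies on Linux reporting ru_maxrss in kilobytes (other systems may use different units):

```c
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>

/*
 * Peak resident set size in kilobytes (Linux units), or -1 on failure.
 * Pass RUSAGE_SELF for the calling process, or RUSAGE_CHILDREN for
 * children that have terminated and been waited for.
 * peak_rss_kb is a hypothetical helper for illustration.
 */
static long peak_rss_kb(int who)
{
    assert(who == RUSAGE_SELF || who == RUSAGE_CHILDREN);

    struct rusage usage;
    if (getrusage(who, &usage) != 0)
        return -1;
    return usage.ru_maxrss;
}
```

For the forking and pre-forked variants, the RUSAGE_CHILDREN figure is the largest single waited-for child rather than a sum, so comparing architectures would still need some care.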

------
dominicl
Awesome comparison, super curious how this would scale out on the current
generation of big CPU machines. E.g. Epyc with 64 cores, would threads still
perform that well?

~~~
shuss
Would be lovely to get my hands on such metal :) I wonder how Linux thread
scheduling scales on multi-CPU machines. To keep things simple, I specifically
chose to go with a single-core machine to benchmark all architectures.

~~~
knotty66
This might be of interest?
[https://www.amd.com/en/campaigns/packet-epyc-challenge](https://www.amd.com/en/campaigns/packet-epyc-challenge)

~~~
shuss
Thanks for sharing!

------
cbsmith
Needs async IO variant as well...

~~~
shuss
Unfortunately, the current POSIX AIO implementation is done in user space by
glibc. That's the reason why I covered poll and epoll. The next logical
variant to add would be io_uring.

~~~
snaky
There's kernel AIO as well.

[https://blog.cloudflare.com/io_submit-the-epoll-alternative-...](https://blog.cloudflare.com/io_submit-the-epoll-alternative-youve-never-heard-about/)

~~~
shuss
Oh, wow. Never knew about this. This is most certainly worth checking out.

~~~
cbsmith
Not to mention the new stuff being hacked on:
[https://lwn.net/Articles/776703/](https://lwn.net/Articles/776703/)

------
snvzz
Notably lacks part VIII, kqueue-based server.

This is because Linux doesn't support kqueue. Got to make do with their
pointless, NIH-syndrome-fueled epoll().

~~~
shuss
I've never really done any serious work on anything other than Linux. But I
always wanted to try out kqueue(). Even if I don't add a separate section on
kqueue(), I think it warrants a clear mention.

------
layoutIfNeeded
Very nice writeup!

Although I’ve noticed that it’s written in C. I would suggest that educational
materials be written in Rust, unless the topic is very low-level
optimization.

That way you can be sure that even if some novice blindly copies your example
code, it won’t cause any security issues, thus saving you from the liability
:)

~~~
shuss
(Glances at the "Programming Rust" book which has been sitting on his desk for
months and thinks about writing an indemnification clause in the LICENSE file)

