There are numerous calls to strcpy() which is a buffer overflow waiting to happen. I know this is not production code but it's just as important. This code is to be used as a learning tool so I think it's important to use correct idioms.
Apologies for nitpicking, but code gets copied, often in a hurry, and you end up with bugs all over the place.
One more comment: in the intro you mention that while the code is mostly tested on Linux, it's likely it works on other Unixes such as Free/OpenBSD. You might want to remind the reader that epoll() is Linux-only (bonus points if you mention the BSDs have something similar called kqueue() :)).
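To make the epoll()/kqueue() point concrete, here is a rough sketch of how a portable program might pick its event queue at compile time. The function name create_event_queue() is made up for illustration and is not from ZeroHTTPd; the platform macros are the usual predefined compiler macros.

```c
#include <errno.h>

#if defined(__linux__)
#include <sys/epoll.h>
#elif defined(__FreeBSD__) || defined(__OpenBSD__) || \
      defined(__NetBSD__) || defined(__APPLE__)
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#endif

/*
 * Hypothetical helper: returns a descriptor for the platform's event
 * queue, or -1 with errno set. On Linux this is an epoll instance, on
 * the BSDs (and macOS) a kqueue. Anything else gets ENOSYS.
 */
int create_event_queue(void)
{
#if defined(__linux__)
    return epoll_create1(0);
#elif defined(__FreeBSD__) || defined(__OpenBSD__) || \
      defined(__NetBSD__) || defined(__APPLE__)
    return kqueue();
#else
    errno = ENOSYS;
    return -1;
#endif
}
```

Creating the queue is the easy part. Registering descriptors and waiting for events use different structures on each side (struct epoll_event vs. struct kevent), so a real port needs more than this.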
However, using strcpy() with arbitrary strings is a bad idea, and even the safer variants (strncpy, strlcpy, ...) are not ideal. In the case of ZeroHTTPd, the length of the path should be checked during request parsing and a 414 error returned if it is too long. Once you know the maximum size of your path, you should be able to size your buffers to do all your manipulation safely, and that includes using strcpy/strcat. I wouldn't recommend it, but it can be safe.
Done right, string manipulation in C can be very efficient, but it is also very tricky because you need to be aware of your buffer sizes at all times. Ideally, you should avoid copies.
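Something like the following is what I mean by checking the length during parsing. It is only a sketch: MAX_PATH_LEN and copy_request_path() are names I made up, and the limit is arbitrary, not ZeroHTTPd's actual buffer size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical limit; the real buffer sizes in ZeroHTTPd may differ. */
#define MAX_PATH_LEN 1024

/*
 * Copies the NUL-terminated request path into dst (capacity dst_size)
 * only if it fits. Returns 0 on success, or 414 (URI Too Long) when the
 * path exceeds MAX_PATH_LEN or would not fit in dst.
 */
int copy_request_path(char *dst, size_t dst_size, const char *path)
{
    size_t len = strlen(path);

    if (len > MAX_PATH_LEN || len >= dst_size)
        return 414;

    memcpy(dst, path, len + 1);   /* +1 copies the terminating NUL */
    return 0;
}
```

Once this check has passed, every later buffer that is sized from MAX_PATH_LEN can be reasoned about without guessing.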
All the string manipulation trickery may be a little too much for such a simple project, and it isn't the point, so I suggest maybe just adding a few comments along the lines of "this is unsafe". strncpy() is not the right way to do it; it is just a band-aid so that instead of buffer overflows you get truncated data, which is usually a less severe bug.
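If you do want something better than a comment, one option (my suggestion, not what ZeroHTTPd does) is to build the path with snprintf() and treat truncation as an error instead of silently accepting it. build_file_path() and the "public" root directory in the usage below are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: joins root and path into dst (capacity dst_size).
 * Returns 0 if the result fit, -1 if it was truncated or formatting
 * failed. snprintf() returns the length it wanted to write, so a value
 * >= dst_size means the output was cut short.
 */
int build_file_path(char *dst, size_t dst_size,
                    const char *root, const char *path)
{
    int n = snprintf(dst, dst_size, "%s%s", root, path);

    if (n < 0 || (size_t)n >= dst_size)
        return -1;
    return 0;
}
```

Calling build_file_path(buf, sizeof buf, "public", "/a.html") with a 16-byte buf yields "public/a.html". A longer request path makes it return -1, and the caller can answer with an error rather than open a truncated file name.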
Since we are nitpicking, the return values of system calls like recv() are not properly checked. EINTR and EAGAIN have to be taken into account. It is rarely a problem except when the server is overloaded. And considering that the point of ZeroHTTPd is to measure performance, being robust to such conditions matters.
BTW, flooding your server is very educational. I wrote a small, performance-oriented HTTP server myself (with epoll()), and it's at 100% CPU with all queues full that you realize you really need to read the man pages. The Linux network stack is pretty stable under load, but you need to do your part.
Finally, thank you for your work, I had a feeling that epoll() was "the right way" but I didn't test it. You did ;)
$ man 2 sendfile
man: No entry for sendfile in section 2 of the manual.
$ grep -rni sendfile /usr/include | wc -l
# grep -rni sendfile /usr/include | wc -l
It was a static content focused poll-based (might have had epoll too) web server written in non-STL C++ which dramatically outperformed multi-threaded Apache. I remember it had a "stat cache" to reduce the number of syscalls made, and a nice set of string classes for passing around substrings of e.g. HTTP headers.
The syscall reduction was pretty extreme, even down to keeping a shared-memory copy of time() to share across processes. IIRC that was only a benefit for HP-UX/IRIX or something like that...
Negative ghost rider, it's hard-coded to 1.0: https://github.com/shuveb/zerohttpd/blob/master/01_iterative...
(in reality it's barely even that e.g. it only handles GET and POST methods, discards every header, …, so it's an HTTP server in the sense that it kinda sorta will respond to HTTP requests).
Its purpose is not to show how to implement an HTTP server, but to show how different architectures of Linux network servers are written and perform.
The fact that you can run Apache in a process pool model doesn't mean you should. That mode was mostly kept to support CGI scripts and old style mod_php.
I very much approve of this tutorial. I remember when, a long time ago, I tried to understand how a web server works and learned that I needed to install Apache, then put stuff in a CGI directory, or maybe just use a framework and not have a separate server at all... It was quite confusing for me back then. But the core functionality of an HTTP server is just: listen on a port, accept connections, read text, write text.
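To show how little that core is, here is a bare-bones sketch of listen/accept/read/write in C. This is my own illustration, not code from ZeroHTTPd: make_listener() and serve_one() are made-up names, the reply is fixed, and the request is read once and ignored.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Creates a TCP socket listening on 127.0.0.1:port; returns fd or -1. */
int make_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Accepts one connection, reads the request text, writes a fixed reply. */
int serve_one(int listen_fd)
{
    static const char reply[] =
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: text/plain\r\n"
        "Content-Length: 6\r\n"
        "\r\n"
        "hello\n";
    char request[4096];

    int client = accept(listen_fd, NULL, NULL);
    if (client < 0)
        return -1;

    /* Simplification: one read(); a real server loops until the
     * end of the headers and actually parses them. */
    ssize_t n = read(client, request, sizeof request);
    ssize_t want = (ssize_t)(sizeof reply - 1);
    int ok = n > 0 && write(client, reply, sizeof reply - 1) == want;

    close(client);
    return ok ? 0 : -1;
}
```

A main() that calls make_listener(8000) and then serve_one() in a loop gives you the iterative server. Everything else (forking, threads, poll, epoll) is about how to run that loop for many clients at once.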
This is because Linux doesn't support kqueue. Got to make do with their pointless NiH syndrome fueled epoll().
Although I’ve noticed that it’s written in C. I would suggest that educational materials be written in Rust, unless the topic is very low-level optimization.
That way you can be sure that even if some novice blindly copies your example code, it won’t cause any security issues, thus saving you from the liability :)