First of all, development is still in its early stages, so please forgive me for the lack of documentation.
That said, below are some high-level design issues that are not addressed by the README.
- it is designed to be pluggable (though not as flexible as Apache), by providing virtual-host-based configuration of generators, output filters and loggers
- if you are interested, lib/file.c and lib/chunked.c are good places to start for understanding their interface
- I am (at the moment) not interested in providing support for input filters
That said, this library is very clean and easy to use. I just wish it had high-level functions to create sockets, but that's a minor detail.
Good job by Kazuho, as usual.
nginx does a lot of optimization at many levels that ab can't figure out.
In short, it only profiles the speed of the HTTP parser and certainly not the network stack.
There are a lot of things that an HTTP server does in order to keep the connection stack "steady": disabling the Nagle algorithm at the right moment, gracefully handling failure, managing slow clients the right way, and so on.
FWIW, Tsung is an awesome benchmarking tool written in Erlang: http://tsung.erlang-projects.org/
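To make one item on that list concrete: disabling Nagle's algorithm is a single setsockopt() call with TCP_NODELAY. Below is a minimal sketch, assuming a POSIX system. The helper name set_tcp_nodelay is made up for illustration and is not taken from nginx or H2O. Deciding when to flip the option is the hard part, and that is not shown here.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: disable Nagle's algorithm when `on` is non-zero,
 * re-enable it when `on` is zero.
 * Returns 0 on success, -1 on error (errno is set by setsockopt).
 */
static int set_tcp_nodelay(int fd, int on)
{
    int flag = on ? 1 : 0;

    assert(fd >= 0);
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
}
```

A server would typically call something like this on each accepted connection, or right before writing a small response that should not wait to be coalesced.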
But the nginx http parser is one of the things that is fast, so a faster one is interesting.
I think having a library implementation is very useful as well.
Regarding the performance of the HTTP parser, I have heard that picohttpparser (the HTTP/1 parser used by H2O) is much faster than the HTTP parser used by nginx.
If that is true, it is likely due to the difference between the approaches the parsers take. Unlike most parsers, picohttpparser does not have a callback-based API. Instead it uses a loop for parsing the HTTP request and header lines.
note: This comment might be biased since I am the author of H2O and picohttpparser.
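For readers who have not seen the two styles side by side, here is a rough sketch of what "a loop instead of callbacks" can look like. It is not picohttpparser's actual code or API; the struct header_view type and the parse_headers function are hypothetical names, and the return codes (-1 for malformed, -2 for incomplete) are an assumption chosen for this example. The caller gets back an array of pointers into its own buffer, with no callbacks fired and no copying.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type for illustration: a header as two slices of the input. */
struct header_view {
    const char *name;
    size_t name_len;
    const char *value;
    size_t value_len;
};

/*
 * Parse "Name: value\r\n" lines from buf until an empty line is reached.
 * On entry *num_headers is the capacity of `headers`; on success it is the
 * number of headers stored.
 * Returns bytes consumed (including the final CRLF), -1 if the input is
 * malformed or has too many headers, -2 if the input is incomplete.
 */
static int parse_headers(const char *buf, size_t len,
                         struct header_view *headers, size_t *num_headers)
{
    size_t max, count = 0, pos = 0;

    assert(buf != NULL && headers != NULL && num_headers != NULL);
    max = *num_headers;
    *num_headers = 0;

    for (;;) {
        const char *line = buf + pos;
        const char *eol = NULL;
        const char *colon, *value;
        size_t i;

        /* Find the CRLF that ends the current line. */
        for (i = pos; i + 1 < len; i++) {
            if (buf[i] == '\r' && buf[i + 1] == '\n') {
                eol = buf + i;
                break;
            }
        }
        if (eol == NULL)
            return -2;                  /* need more data */
        if (eol == line) {              /* empty line: end of headers */
            *num_headers = count;
            return (int)(pos + 2);
        }
        if (count == max)
            return -1;                  /* no room left in the array */

        colon = memchr(line, ':', (size_t)(eol - line));
        if (colon == NULL || colon == line)
            return -1;                  /* missing name or missing colon */

        /* Skip optional whitespace after the colon. */
        value = colon + 1;
        while (value < eol && (*value == ' ' || *value == '\t'))
            value++;

        headers[count].name = line;
        headers[count].name_len = (size_t)(colon - line);
        headers[count].value = value;
        headers[count].value_len = (size_t)(eol - value);
        count++;
        pos = (size_t)(eol - buf) + 2;
    }
}
```

The point of the design is that everything stays in one function with plain local state, which leaves the compiler free to keep it in registers; a callback-based parser has to hand control back to the caller for every token it finds.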
picohttpparser is the core of HTTP::Parser::XS which is used by many HTTP application servers for Perl (see http://plackperl.org). There are many deployments using it.
The database is still gonna be a bottleneck. The application is still gonna be a bottleneck. The disks are still gonna be a bottleneck. The network is still gonna be a bottleneck (Apache can saturate a Gbit NIC, so can nginx, so can H2O).
I'm not saying this is useless. It looks like a cool project, built by someone really clever (unco is hilariously clever), with lots of good uses (an easily embedded HTTP server library is nothing to sneeze at). I'm saying I think it's weird and unfortunate that so many people focus on performance of the web server, as though it will make a difference for end users. In the vast majority of web server deployments any of the major web servers will do the job and will perform well enough to not be the bottleneck in the system.
Given that a single web server (Apache or otherwise) can serve millions of pages per hour, it's pretty rarefied air to be talking about 50 or 100 or more web servers. There just aren't a lot of people working on sites with that kind of traffic.
I'm really not saying better performing, more efficient, web servers aren't a good thing. It's great that web servers (including Apache) continue to get faster and more efficient. I just don't think it should be the primary thing we're talking about when comparing new servers to tested and proven existing web servers. There are so many other factors, and performance doesn't matter at all if the web server doesn't do what you need it to do, or is insecure, or is unreliable.
The idea that Apache was ever good enough was wrong then and is wrong now. We always needed a more efficient design.
I'm sure someone could write a somewhat more efficient version of nginx, but I'm not sure they could do it with as many features as nginx has. Which means you probably would just end up switching to nginx at some point. I think Cloudflare uses nginx. They probably could save on machines if they used a more minimal server.
Further, "forking processes" is actually not incredibly expensive on Linux, and in fact, pthreads on Linux are implemented by the same code (clone() with varying levels of sharing). Forked applications on Linux are pretty much just as fast as threaded applications. It is a myth based on extremely outdated knowledge (fork on Solaris, for instance, and some other UNIX variants, had a history of being slow; but, Linux has always had a very fast fork).
It is true that event-based thread (or process) pool concurrency implementations can provide superior performance to thread- or process-per-connection implementations, for a variety of reasons, but Apache has that covered. I'm gonna guess you've never even used or seen an Apache installation that forked a process for every request (because it's been so long since that was a thing Apache did), so I'm not sure how you could believe it works that way.
Where did you get all of these assertions from? Are there sites out there propagating these crazy claims about Apache? And, if so, why? What does one gain by trash-talking a project that was instrumental in helping build the open web and still powers more websites than any other web server in the world? And, does it well, I might add. There are some good reasons a reasonable admin might choose nginx over Apache. But, they aren't because Apache is a terrible piece of software written by incompetent people.
In short, your comment has negative value, by providing misleading and outright incorrect information.
Edit: And, this is why I hate it when performance is the measuring stick people use to discuss web servers. It begins to seem like it is a useful metric for comparing web servers, when it really is not for 99% (or more) of deployments. Apache is fast enough. nginx is fast enough. Pick your web server based on other characteristics, because otherwise you're almost certainly making decisions based on the wrong things.
I think you are referring to vfork vs. fork. While not terribly expensive, forking certainly is more expensive than the alternatives. (There is a reason why none of the Apache MPMs do that unless you give them a very messed up config.)
However, you are right that the real issue isn't the cost of the fork (which, duh, if it were Apache would totally have you covered!!). It's more about the address space used up by each process/thread. That becomes a limiting factor for high levels of concurrency, though below levels where it may be a limiting factor the model tends to execute very efficiently (arguably more efficiently).
Apache's "event" MPM isn't really quite the same as engines like nginx & lighttpd... it works quite well, but it even describes itself as a "hybrid multi-process/multi-threaded server".
As for sites "propagating" this information, there is certainly the ol' traditional: http://www.kegel.com/c10k.html
You can also find some pretty well established companies in the web hosting business that really ought to know whereof they speak, like say DreamHost: http://wiki.dreamhost.com/Web_Server_Performance_Comparison
Several other things you say are equally wrong. Claiming that Apache has been away from a process model for decades is dishonest. It's not even decades old and the Apache project itself contradicts you.
Preforking is still recommended for "sites requiring stability" and is in (most?) common usage: http://httpd.apache.org/docs/current/mpm.html
> The server can be better customized for the needs of the particular site. For example, sites that need a great deal of scalability can choose to use a threaded MPM like worker or event, while sites requiring stability or compatibility with older software can use a prefork.
Your religious devotion to Apache and claim that "Apache is fast enough" ignores the reality of what nginx can do with so much less CPU and memory. It's not a technical argument but a religious one. Nginx is good enough. Apache is not. It never was, people have lived with it for too long because people like yourself buried their heads in the sand. You're still doing it.
No, that's not strictly true. If you have min & max servers fixed at the same value and the max requests per child set absurdly high, you won't fork much if at all under heavy load. Pre-forking can result in a lot of forking and requests being blocked while you fork if a) you have a surge in traffic and b) your min servers isn't set high enough to cover the surge or alternatively c) your max requests per child is low enough that you are constantly having processes exit.
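To illustrate the distinction being drawn here (forking a fixed pool once at startup versus forking in response to load), here is a toy sketch in C. It is not Apache's implementation. The names run_prefork_pool and worker_main are hypothetical, and the worker body is a placeholder that exits immediately, where a real worker would loop on accept() against the shared listening socket.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker body; a real server would loop on accept() here. */
static int worker_main(int listen_fd)
{
    (void)listen_fd;
    return 0;
}

/*
 * Fork `n` workers up front, then wait for them to exit.
 * Returns the number of workers that exited cleanly, or -1 if not all
 * `n` workers could be started. No fork happens after startup, which is
 * the situation described above (fixed pool size, workers never recycled).
 */
static int run_prefork_pool(int listen_fd, int n)
{
    int i, started = 0, clean = 0, status;

    assert(n > 0);
    for (i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                          /* fork failed; stop spawning */
        if (pid == 0)
            _exit(worker_main(listen_fd));  /* child never returns */
        started++;
    }
    /* Reap every child that was started. */
    for (i = 0; i < started; i++) {
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            clean++;
    }
    return started == n ? clean : -1;
}
```

The surge scenario in (a) through (c) corresponds to a supervisor that calls fork() again while requests are already queued; this sketch deliberately leaves that part out.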
> I never claimed forking is expensive. I said it's expensive compared to an event loop.
Agreed, although that's kind of a meaningless statement (an event loop is something you go through on a per-request basis), and misses the real problem with the multi-process model: virtual address space for each process.
> It's not a technical argument but a religious one. Nginx is good enough. Apache is not.
Umm... that sounds like a religious argument in its own right. Apache is certainly good enough for plenty of people, and more importantly with all the dynamic content on sites, the web server tends to be a pretty unimportant factor in the performance of many sites. Apache brings other things to the table which are often valued for projects, and there is no reason that needs to be considered a "religious" decision.
Though, while we're on the subject, it's a little too easy to configure nginx in insecure ways, and several configuration examples on the web exhibit pretty bad practices. But, on the whole, I agree that nginx is pretty easy to set up and maintain, and it is a great piece of software.
It's interesting that folks have interpreted my comments to mean people shouldn't use nginx and should always use Apache. I've never suggested that (and, I find it funny, because I was the biggest proponent of adding nginx support to the control panel software I work on). All I've said is that I recommend people not choose their webserver based on performance, because they're all (Apache and nginx in particular, and the one OP posted about) fast enough for the vast majority of websites and environments.
(And god help you if you try to do it with Passenger.)
Certainly performance is nice. I'm not saying it isn't. But, the webserver is going to spend most of its time waiting on your application and your database. I'd like to see folks more focused on standards-compliant behavior, secure behavior, etc. That's not as easy as slapping up an ab benchmark, but it's more useful in helping me decide if a new server is appropriate for my needs, and it's helpful in moving the state of the art forward on fronts that are far more important than squeezing another bajillion requests out of hardware that can already serve a bajillion requests.
Another interesting angle is memory usage. Apache does require more memory than nginx or, presumably, H2O. It's not a huge difference, if Apache is configured as minimally as nginx is, by default, but it's notable on a very high concurrency system. In the "Internet of things", small embeddable web servers will be important. If H2O uses less memory than nginx, that'd be interesting (I think more interesting than performance). But, memory usage isn't really mentioned anywhere that I see...if a memory graph were beside the ab benchmarks, I probably wouldn't have even complained about the benchmarks being so prominent. It would have added a useful and maybe even predictive piece of data to the page.
conf->fd = socket(AF_INET, SOCK_STREAM, 0);
Or is that the point? If you don't need all the configuration and features of an off-the-shelf web server, you can more easily custom-build an H2O HTTP server for your specific needs that is blazing fast?
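For what it's worth, the kind of high-level socket helper wished for earlier in the thread is only a few lines on top of that socket() call. This is a sketch under assumptions: open_listener is a made-up name, not part of H2O, and it binds to the loopback address only to keep the example safe to run. A real server would more likely bind to INADDR_ANY or a configured address.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical convenience wrapper: create a TCP socket, bind it to
 * 127.0.0.1:port and start listening. Passing port 0 lets the kernel
 * pick a free port. Returns the listening fd, or -1 on error.
 */
static int open_listener(unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int fd, reuse = 1;

    assert(backlog > 0);

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Allow quick restarts without waiting for old sockets to time out. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(reuse)) != 0)
        goto fail;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0)
        goto fail;
    if (listen(fd, backlog) != 0)
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}
```

With a helper like this, the line above would become something like conf->fd = open_listener(port, backlog), with the caller still responsible for checking the result.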
I wrote it for an iOS game, the idea being that configuration of the game would be done in real time by connecting to your iThing via a web browser and tweaking it using a bunch of HTML widgets or whatever. But its intended users ended up preferring to do their tweaking using some on-screen Cocoa Touch stuff that I'd also put in, so the HTML idea never ended up being taken further.
My library has a few good points, I think: no dependencies, no threading, easy integration, simple API, builds as C89/C99/C++, doesn't hate Windows. (I've often wished other libraries did all of those.) But I don't currently have any plans to work on it further, and you can probably find other options that have been more thoroughly battle-tested, were more carefully planned, and are actively maintained.
It has quite a lot of neat features for C web development IMHO; then again, I wrote it, so of course I like it.
The Poco Net libraries include a more traditional thread-per-request server which is very easy to use, with a clean OO approach.
I think ACE includes an HTTP server as well, but I have no experience with it.
Big list of alternative libraries at the bottom also.
It also has C and C++ support with a very clean API, Django-like templates, initial Python and PHP bindings, WebSocket, WebDAV, several threading modes (poll, pool, one thread...), and many other goodies.
Disclaimer: I'm the author.