However, it seems worth mentioning that webservers haven't been a bottleneck for a long time now. Your bottleneck is always disk I/O, the network, or the slow application server that you're proxying to.
For reference: Wikipedia serves roughly 8k pageviews/sec on average for a total of ~20 billion pageviews/month.
Assuming each pageview consists of ~10 webserver hits we're looking at ~80k requests/sec.
This is within the realm of a single instance of either nginx or h2o on a beefy machine [on a very beefy network].
So, unless you plan to serve Wikipedia or Facebook from a single server, you're probably fine picking your webserver software on the basis of features rather than benchmarks.
Wikipedia uses 15,000 CPUs on 750 hosts.
These are not all webservers.
Also, even though they could run their load on a fraction of the hardware it would probably not make a lot of sense to optimize for that, as the potential cost savings are relatively small (servers are cheap).
And, again, 750 hosts is not too bad for an operation the size of Wikipedia.
A running server doesn't cost much in the grand scheme of things. You can infer that from the fact that ISPs will rent you one for under $30/mo and still make a profit on you.
Consequently the potential savings may not be trivial, but they're hardly big enough to justify tying up large parts of the team in an effort to squeeze out the last x%.
Is there already support for configuration files? Because for me the performance isn't the most important issue, in fact the main reason I'm using nginx over Apache is that I don't want to deal with .htaccess any more.
I think that if you added support for the nginx config file format to H2O, making it a drop-in replacement (provided all the features in use are actually supported), you could give the project a huge boost.
The configuration file format is YAML, and the directives can be seen by running `h2o --help` (the output of version 0.9.0 is: https://gist.github.com/kazuho/f15b79211ea76f1bf6e5).
Unfortunately, it is not compatible with nginx's format. I do not think it is possible to take such an approach, considering the differences in the internals of the two servers.
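For reference, a minimal H2O config sketch in that YAML format (directive names as listed by `h2o --help`; the host and path values here are illustrative, check the linked gist for the exact directive set in 0.9.0):

```yaml
# Serve static files for one host on port 8080.
listen: 8080
hosts:
  "example.com":
    paths:
      "/":
        file.dir: /var/www/htdocs
```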
Wait, you don't use Apache because you don't like .htaccess files?
But of course I don't mind all the other awesome features you get by using nginx.
There are also if directives in there, but they don't really work the way you think. You need a deep understanding of its parsing rules to do anything remotely complicated with it. It's certainly possible to do better.
(Please don't mention Apache here and its steaming pile of faux-xml. Existence of worse does not make better.)
Could someone elaborate on what this is referring to?
http://php.net/manual/en/ini.core.php#ini.cgi.fix-pathinfo - the docs on cgi.fix_pathinfo don't mention that it affects which PHP file gets executed at all.
This has nothing to do with nginx's config format at all. It's simply a vulnerability in FastCGI for PHP and the way PHP file paths are passed.
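For context, the usual mitigation on the nginx side is to refuse to hand non-existent scripts to PHP-FPM; a common sketch (standard nginx directives, but the socket path is an assumption):

```nginx
location ~ \.php$ {
    # Without this, a request like /uploads/avatar.jpg/x.php can end up
    # executed as PHP when cgi.fix_pathinfo=1 walks back along the path.
    try_files     $uri =404;
    fastcgi_pass  unix:/run/php-fpm.sock;
    include       fastcgi_params;
}
```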
Not a fan of nginx's config but I'm a huge fan of nginx.
Here's one. Look at the example request loop on <https://github.com/h2o/picohttpparser/>. It reads from a socket, appending to an initially-empty buffer. Then it tries to parse the buffer contents as an HTTP request. If the request is incomplete, the loop repeats. (h2o's lib/http1.c:handle_incoming_request appears to do the same thing.)
In particular, phr_parse_request doesn't retain any state between attempts. Each time, it goes through the whole buffer. In the degenerate case in which a client sends a large (n-byte) request one byte at a time, it uses O(n^2) CPU for parsing. That extreme should be rare when clients are not malicious, but the benchmark is probably testing the other extreme, where each request arrives in a single read. Typical conditions are probably somewhere in between.
Hmm. Not sure if that's universally true. I've heard that in some cases HTTP requests can be quite large due to having several cookies, long strings of Accept: garbage, etc. My impression was that this could make them not only exceed the MSS (thus be in multiple packets) but also the congestion window (thus be in multiple round trips). It doesn't matter then how fast the client code is.
On looking a little now, my information may be old. Apparently Linux increased the default initial congestion window back in 2011: https://www.igvita.com/2011/10/20/faster-web-vs-tcp-slow-sta... . Is the same true for all the widely-deployed versions of iOS, Windows, and OS X?
> Instead, accept filters were used and to this day are advised in such situations by nginx people.
Hadn't seen that before: http://www.freebsd.org/cgi/man.cgi?query=accf_http&sektion=9...
Interesting feature, and it certainly would solve the initial congestion window problem, but there doesn't seem to be an equivalent on Linux.
Sure. I might have slightly overgeneralized some things to illustrate the point.
> equivalent on Linux
TCP_DEFER_ACCEPT is something similar on Linux
Not similar enough. In the case I was describing, there are some bytes to read but not a full HTTP request. TCP_DEFER_ACCEPT prevents accept() from returning until there are some bytes to read. That doesn't help.
Sorry, this is a bit off-topic (and doesn't apply to H2O, as it's been in the works for a while, looking at the commits), but I wonder: today, with a language like Rust (1.0 is at the door), as performant as its safe C equivalent but modern and safe by design (and with an escape hatch to C/C++ if needed), what would be the advantages of starting a long-term project of this type in C?
Edit: why the downvotes?
And even if it was, it would take 3-5 years until it gets any decent adoption (if that happens, which remains to be seen). It doesn't even have Go level adoption yet, and Go's adoption is not something to write home about either.
C is something people know very well; it has tons of tooling, plays well on all platforms, and has all the libraries in the world available for it.
I wouldn't say so... Go may not be as widespread as other older languages, but the speed at which it's taking over new developments (and sometimes re-writes) can't be glossed over that easily.
Regarding GP's comment, I believe the number one argument for starting that project in C is performance. Even nginx, which is also written in C, can't match H2O's speed; I doubt Rust, with all the micro-management it allows, could beat this level of dedication to performance.
It has a decent following for a new-ish (5 years) language, but the HN echo-chamber makes it seem even larger than it actually is.
In the general industry it's nearly a statistical error, especially in the enterprise.
Because under seemingly every C project discussed here someone asks this question or claims that it is "stupid to do something like this in C" and always gets the same answers. Some users might have felt like you were trolling.
You could just as easily have asked why it wasn't written in Lisp. It's just not relevant.
1.0 Rust gives an option for people wanting to use it in production, and as far as comparing it with C goes it has a lot more functionality, but there is still a long way to go before the "stable" Rust has all the awesomeness that Rust nightlies have right now.
- No content-encoding handling on the receiving side
- No HTTP 100-continue support
- No regex routing support
- No header rewrites
to name a few.
It's certainly not full-featured, but I don't think any of the omissions you mentioned should invalidate a performance comparison. I'd expect them to have little cost when not used, and I assume he's not using them for nginx in these tests.
Just because features are not used doesn't mean they don't have a cost. Features drive design choices that can affect overall performance; without those choices the feature could not be implemented at all. Adding more code to your program can also produce a different code layout, which can affect memory caching and branch prediction.
It's not fair to blindly compare webserver performance without knowing the bottlenecks of each and how the faster webserver overcame them.
And if you needed those features, then you wouldn't care if h2o is faster or not.
h2o doesn't even use sendfile(2), which means file data is read into userland just to be copied back into the kernel's socket buffer. Turn this on in nginx and you'll see a significant performance improvement.
nginx can be tuned a lot to improve performance and the author didn't bother doing that.
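For instance, a typical nginx static-file tuning sketch (these are standard nginx directives; whether they help depends on the workload):

```nginx
sendfile    on;  # kernel copies file -> socket, no userland round-trip
tcp_nopush  on;  # fill packets with headers + file data before sending
```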
This doesn't solve the other side of the problem that spritesheets are meant to solve, namely that an individual image will not be loaded yet when the first UI element using it is displayed (e.g. in a CSS rollover, or new section of a SPA appears). I can't see a way that new protocols are going to solve this, unless I'm missing something in how HTTP2 is going to be handled by the browser?
I assume that once you're forced to preload everything you might need for the page, it's no longer more efficient to break up into multiple tiny requests.
HTTP/2 server push. Your server can proactively deliver things like rollover state graphics knowing that the client will need them.
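As a sketch, one common way to trigger a push is a preload Link header on the page response, which some HTTP/2 servers turn into a push of the referenced asset (the exact handling is server- and version-specific, and the image path here is made up):

```http
HTTP/1.1 200 OK
Content-Type: text/html
Link: </img/button-hover.png>; rel=preload; as=image
```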
Also, I realized shortly after commenting that I missed the obvious benefit of avoiding downloading global spritesheets and other compiled assets for individual pages that only use a subset of the content.
I've always seen the "image is already loaded" effect as a nice side benefit, but spritesheets can have issues in mobile contexts, as the whole image must be decoded just to access one sprite. It's also unclear how effectively browsers cache the individual sprites in memory, compared to individual images.
Still, an exciting time when we can combine files based on what is the most logical grouping, rather than what is the most efficient. I look forward to the day when HTTP2 rules the world.
The OCaml community, and probably others, have noted that printf is an embedded DSL and treat it as something to be compiled rather than interpreted.
http://caml.inria.fr/pub/docs/manual-ocaml/libref/Printf.htm... (I have a memory of this being type safe and doing stuff at compile time but I don't see it now)
Rust borrows heavily from OCaml, and uses compile time macros for printf and regex, i.e. format! and regex! (the trailing ! means it's a macro that can be further compiled by the compiler).
Also, http://www.ciselant.de/projects/gcc_printf/gcc_printf.html#p... shows gcc did something in this area 9 years ago.
http://www.cygwin.com/ml/libc-hacker/2001-08/msg00003.html indicates gcc had printf optimizations in 2001.
I expect there are older examples.
qrintf seems to be 'just' a more aggressive version of this.
I guess being more aggressive makes sense in applications that do a lot of simple string formatting.
Is it getting commercial support/funds ?
For myself, developing H2O is part of my job at DeNA (one of the largest smartphone game providers in Japan).
I am not so much into web-servers (yet), but I found this in the feature list:
HTTP/1 only (no HTTPS)
It also puzzled me that HTTPS is not supported, but in the benchmarks I found a section labeled "HTTPS/2 (reverse-proxy)". As I said, I'm not that deep into web servers and HTTP/2, but that was a little confusing.
HTTP and HTTPS (both version 1 and 2) are supported for downstream connections (i.e. connections between H2O and web browsers).
Only plain-text HTTP/1 is supported for upstream connections (connections between H2O and web application servers).
There are plenty of kitchen sinks out there.
To clarify for my mind:
Can I then use H2O to connect to a web browser via HTTPS while H2O routes the same request upstream via HTTP to a web application server? (That would of course suffice for me.)
Any of the servers/clients that support spdy currently will eventually make the minor changes, and call it http 2.0.
However, in the real world, the number of requests per second an HTTP daemon can serve is the last thing to worry about. If the web is slow, it's not because Apache used to be bloated with threads. It's because of bad architecture: centralization of services, latency in page build times, size of static components, data store bottlenecks, etc.
Nevertheless, a very cool project. One I'll follow closely.
I'm not seeing that there is a portable version yet, however.
1) So, you're saying that the fact that you can hire college freshman to complete your project outweighs the issues introduced by using the wrong tool for the job?
1) Not me, the market. I think it is silly, too.