
Show HN: Initial release of H2O, and why HTTPD performance will matter in 2015 - kazuho
http://blog.kazuhooku.com/2014/12/ann-initial-release-of-h2o-and-why.html
======
moe
Very nice work, competition is always good.

However, it seems worth mentioning that webservers haven't been a bottleneck
for a long time now. Your bottleneck is always disk I/O, the network, or the
slow application server that you're proxying to.

For reference: Wikipedia[1] serves roughly 8k pageviews/sec on average for a
total of ~20 billion pageviews/month.

Assuming each pageview consists of ~10 webserver hits we're looking at ~80k
requests/sec.

This is within the realm of a _single_ instance of either nginx or h2o on a
beefy machine [on a _very_ beefy network].

So, unless you plan to serve Wikipedia or Facebook from a single server,
you're probably fine picking your webserver software on the basis of features
rather than benchmarks.

[1]
[http://reportcard.wmflabs.org/graphs/pageviews](http://reportcard.wmflabs.org/graphs/pageviews)

~~~
sparkzilla
You may be a bit off with your calculation. Wikipedia uses 15,000 cpus on 750
hosts.
[http://ganglia.wikimedia.org/latest/](http://ganglia.wikimedia.org/latest/)

~~~
moe
_You may be a bit off with your calculation_

Where?

 _Wikipedia uses 15,000 cpus on 750 hosts._

These are not all webservers.

Also, even though they _could_ run their load on a fraction of the hardware it
would probably not make a lot of sense to optimize for that, as the potential
cost savings are relatively small (servers are cheap).

~~~
jaytaylor
The costs incurred for a system comprising 750 hosts or 15,000 CPUs are
most certainly nontrivial.

~~~
moe
First off, 750 hosts don't have 15k CPUs. The ganglia metric likely refers to
_cores_ and doesn't account for hyperthreading.

And, again, 750 hosts is not too bad for an operation the size of Wikipedia.

A running server doesn't cost much in the grand scheme of things. You can
infer that from the fact that ISPs will rent you one for under $30/mo and
still make a profit on you.

Consequently the potential savings may not be trivial, but they're hardly big
enough to justify tying up large parts of the team in an effort to squeeze out
the last x%.

------
rkrzr
Congrats on shipping! This project looks very interesting already and will
hopefully pick up more contributors.

Is there already support for configuration files? For me, performance isn't
the most important issue; in fact, the main reason I'm using nginx over
Apache is that I don't want to deal with .htaccess any more.

I think if you added support for the nginx config file format to H2O, making
it a drop-in replacement (assuming all the features in use are actually
supported), you could give the project a huge boost.

~~~
xorcist
nginx' configuration format leaves a lot to be desired, as evidenced by the
(former, hopefully) widespread use of exploitable php calls.

There are also if directives in there, but they don't really work the way you
think. You really need a deep understanding of its parsing rules in order to
do anything remotely complicated with it. It's certainly possible to do
better.

(Please don't mention Apache here and its steaming pile of faux-xml. Existence
of worse does not make better.)

~~~
lonnyk
> as evidenced by the (former, hopefully) widespread use of exploitable php
> calls

Could someone elaborate on what this is referring to?

~~~
meowface
Read through [https://nealpoole.com/blog/2011/04/setting-up-php-fastcgi-
an...](https://nealpoole.com/blog/2011/04/setting-up-php-fastcgi-and-nginx-
dont-trust-the-tutorials-check-your-configuration/)

~~~
makomk
That's not really nginx's fault, its behaviour is quite sensible. The main
problem is that PHP does some poorly-documented magic behind the scenes[1]
that modifies the information nginx gives it in a way that causes security
issues. The solution is not to do that; if you really need the path-splitting
functionality that cgi.fix_pathinfo provides, it's better and safer to set
fastcgi_split_path_info in the nginx configuration instead.

[1] [http://php.net/manual/en/ini.core.php#ini.cgi.fix-
pathinfo](http://php.net/manual/en/ini.core.php#ini.cgi.fix-pathinfo) \- the
docs on cgi.fix_pathinfo don't mention that it affects which PHP file gets
executed at all.
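For reference, a sketch of the safer nginx-side setup makomk describes, using
the standard fastcgi module directives (the backend socket path and the
try_files guard are my assumptions, not anything from the linked post):

```nginx
# Let nginx, not PHP's cgi.fix_pathinfo, split the script name from PATH_INFO.
location ~ [^/]\.php(/|$) {
    fastcgi_split_path_info ^(.+?\.php)(/.*)$;

    # Refuse to pass non-existent scripts to PHP, which blocks the
    # "/uploads/evil.jpg/foo.php" style of attack.
    try_files $fastcgi_script_name =404;

    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_param PATH_INFO       $fastcgi_path_info;
    fastcgi_pass unix:/var/run/php-fpm.sock;  # assumed backend socket
}
```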

------
scottlamb
I'm skeptical of the performance numbers. First, like others here I don't
believe nginx's performance will be a bottleneck for HTTP/2. Beyond that, I
suspect there are cases in which this code is much worse than nginx.

Here's one. Look at the example request loop on
[https://github.com/h2o/picohttpparser/](https://github.com/h2o/picohttpparser/).
It reads from a socket, appending to an initially-empty buffer. Then it tries
to parse the buffer contents as an HTTP request. If the request is incomplete,
the loop repeats. (h2o's lib/http1.c:handle_incoming_request appears to do the
same thing.)

In particular, phr_parse_request doesn't retain any state between attempts.
Each time, it goes through the whole buffer. In the degenerate case in which a
client sends a large (n-byte) request one byte at a time, it uses O(n^2) CPU
for parsing. That extreme should be rare when clients are not malicious, but
the benchmark is probably testing the other extreme where all requests are in
a single read. Typical conditions are probably somewhere between.
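To make the cost concrete, here's a minimal sketch of the stateless-reparse
pattern, with a toy end-of-headers scanner standing in for phr_parse_request
(the function names and the work counter are mine, for illustration only):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a stateless request parser: like phr_parse_request, it
 * keeps no state between calls and rescans the buffer from the start every
 * time. Returns bytes consumed on success, -1 if the request is incomplete.
 * *work counts buffer positions examined, to make the rescanning visible. */
static long parse_request(const char *buf, size_t len, size_t *work)
{
    for (size_t i = 0; i + 4 <= len; ++i) {
        ++*work;
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return (long)(i + 4);
    }
    return -1; /* incomplete: caller appends more bytes and re-parses */
}

/* Simulate a client that delivers an n-byte request one byte per read: the
 * parser re-runs over the whole buffer after every byte, so the total work
 * is on the order of n^2/2 positions examined instead of n. */
static size_t trickle_work(const char *req, size_t n)
{
    size_t work = 0;
    long ret = -1;
    for (size_t got = 1; got <= n && ret < 0; ++got)
        ret = parse_request(req, got, &work);
    assert(ret == (long)n); /* the full request eventually parses */
    return work;
}
```

Feeding the same request in one read versus byte-by-byte makes the quadratic
blowup easy to measure.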

~~~
zzzcpan
You are incorrect: modern clients are fast, and requests typically already
reside in kernel buffers by the time event-driven webservers decide to read
them. Nginx retained parser state between reads because of the simple idea of
handling large numbers of slow clients, which was a problem back when nginx
was born. As it turned out later, that didn't help against malicious clients
at all, because the cost of retaining the clients' connections and processing
each new portion of data was still very high. Instead, accept filters were
used, and to this day they are what nginx people advise in such situations.

~~~
scottlamb
> You are incorrect, modern clients are fast and requests typically reside in
> buffers by the time event driven webservers decide to read them.

Hmm. Not sure if that's universally true. I've heard that in some cases HTTP
requests can be quite large due to having several cookies, long strings of
Accept: garbage, etc. My impression was that this could make them not only
exceed the MSS (thus be in multiple packets) but also the congestion window
(thus be in multiple round trips). It doesn't matter then how fast the client
code is.

On looking a little now, my information may be old. Apparently Linux increased
the default initial congestion window back in 2011:
[https://www.igvita.com/2011/10/20/faster-web-vs-tcp-slow-
sta...](https://www.igvita.com/2011/10/20/faster-web-vs-tcp-slow-start/) . Is
the same true for all the widely-deployed versions of iOS, Windows, and OS X?

> Instead, accept filters were used and to this day are advised in such
> situations by nginx people.

Hadn't seen that before:
[http://www.freebsd.org/cgi/man.cgi?query=accf_http&sektion=9...](http://www.freebsd.org/cgi/man.cgi?query=accf_http&sektion=9&apropos=0&manpath=FreeBSD+10.1-RELEASE)

Interesting feature, and it certainly would solve the initial congestion
window problem, but there doesn't seem to be an equivalent on Linux.

~~~
zzzcpan
> in some cases HTTP requests can be quite large

Sure. I might have slightly overgeneralized some things to illustrate the
point.

> equivalent on Linux

TCP_DEFER_ACCEPT is something similar on Linux
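For the record, a minimal sketch of setting it up (Linux-specific; the
wrapper function name and the 5-second timeout are arbitrary choices of mine):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Create a listening socket that the kernel will not report as ready for
 * accept() until the client has actually sent data (or the timeout expires).
 * Returns the listening fd, or -1 on error. */
int listen_deferred(int port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int secs = 5; /* give the client up to ~5s to send its first bytes */
    if (setsockopt(fd, IPPROTO_TCP, TCP_DEFER_ACCEPT, &secs, sizeof(secs)) < 0)
        return -1;

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons((unsigned short)port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 128) < 0)
        return -1;
    return fd;
}
```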

~~~
scottlamb
> TCP_DEFER_ACCEPT is something similar on Linux

Not similar enough. In the case I was describing, there are some bytes to read
but not a full HTTP request. TCP_DEFER_ACCEPT prevents accept() from returning
until there are some bytes to read. That doesn't help.

------
stephth
Interesting article. And congratulations for the release!

Sorry this is a bit off-topic (and doesn't apply to H2O, which has been in
the works for a while, judging by the commits), but I wonder: today, with a
language like Rust (1.0 is at the door [1]) that is as performant as its safe
C equivalent but modern and safe by design (and with an escape hatch to C/C++
if needed), what would be the advantages of starting a long-term project of
this type in C?

[1] [http://blog.rust-lang.org/2014/12/12/1.0-Timeline.html](http://blog.rust-
lang.org/2014/12/12/1.0-Timeline.html)

Edit: why the downvotes?

~~~
coldtea
Rust is not even 1.0.

And even if it was, it would take 3-5 years until it gets any decent adoption
(if that happens, which remains to be seen). It doesn't even have Go level
adoption yet, and Go's adoption is not something to write home about either.

C, people know very well, has tons of tooling, plays well in all platforms and
has all the libraries in the world available for it.

~~~
rakoo
> Go's adoption is not something to write home about either.

I wouldn't say so... Go may not be as widespread as other older languages, but
the speed at which it's taking over new developments (and sometimes re-writes)
can't be glossed over that easily.

Regarding GP's comment, I believe the number one reason this project was
started in C is performance. Even nginx, which is also written in C, can't
match H2O's speed; I doubt Rust, with all the micro-management it allows,
could beat this level of dedication to performance.

~~~
coldtea
> _I wouldn't say so... Go may not be as widespread as other older languages,
> but the speed at which it's taking over new developments (and sometimes
> re-writes) can't be glossed over that easily._

It has a decent following for a new-ish (5-year-old) language, but the HN
echo-chamber makes it seem even larger than it actually is.

In the general industry it's nearly a statistical error, especially in the
enterprise.

------
halayli
This doesn't look like a complete HTTP server; comparing it with nginx is not
fair.

. It's missing content-encoding handling on the receiving side

. No HTTP 100-continue support

. No regex routing support

. No header rewrites

to name a few.

~~~
scottlamb
> This doesn't look like a complete HTTP server, comparing it with nginx is
> not fair.

It's certainly not full-featured, but I don't think any of the omissions you
mentioned should invalidate a performance comparison. I'd expect them to have
little cost when not used, and I assume he's not using them for nginx in these
tests.

~~~
halayli
Omitting one feature might not, but things add up. Compare HTTP parsing in
nginx with this one and you'll notice how much more rigorous nginx is about
it. Parsing Transfer-Encoding by itself is costly, for example, and nginx
does it whether you ask for it or not, because it has to populate the
structure after all.

Just because features are not used doesn't mean they don't have a cost.
Features drive you to make design choices that can affect overall
performance; otherwise the feature couldn't be implemented. Adding more code
to your program can also change the code layout, which can affect memory
caching and branch prediction.

It's not fair to blindly compare webserver performance without knowing the
bottlenecks of each and how the faster webserver overcame them.

~~~
rgbrenner
It's perfectly fair because those using h2o won't be using any of those
features (because they can't). So the fact that nginx provides these unwanted
features is irrelevant.

And if you needed those features, then you wouldn't care if h2o is faster or
not.

~~~
halayli
It's not fair because you are not comparing apples to apples. I can write a
poll loop that accepts and sends back http responses and call it a webserver.

h2o doesn't even use sendfile(2), which means file data is read and copied
into userland just to be copied back into the kernel and sent to the socket
buffer. Turn this on in nginx and you'll see a significant performance
improvement.
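A sketch of the zero-copy path being described, using Linux's sendfile(2)
signature (the wrapper function name is mine, and real server code would
retry on EINTR/EAGAIN rather than bail out of the loop):

```c
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/* Push a whole file to out_fd without bouncing the data through userland
 * buffers. Returns bytes sent, or -1 on error. */
ssize_t send_whole_file(int out_fd, const char *path)
{
    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0)
        return -1;

    struct stat st;
    if (fstat(in_fd, &st) < 0) {
        close(in_fd);
        return -1;
    }

    off_t off = 0;
    ssize_t total = 0;
    while (off < st.st_size) {
        /* The kernel copies straight from the page cache to the socket
         * buffer; `off` is advanced by the number of bytes sent. */
        ssize_t n = sendfile(out_fd, in_fd, &off, (size_t)(st.st_size - off));
        if (n <= 0)
            break;
        total += n;
    }
    close(in_fd);
    return total;
}
```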

nginx can be tuned a lot to improve performance and the author didn't bother
doing that.

------
robbles
> Instead, switching back to sending small asset files for every required
> element composing the webpage being requested becomes an ideal approach

This doesn't solve the other side of the problem that spritesheets are meant
to solve, namely that an individual image will not be loaded yet when the
first UI element using it is displayed (e.g. in a CSS rollover, or when a new
section of an SPA appears). I can't see a way that new protocols are going to
solve
this, unless I'm missing something in how HTTP2 is going to be handled by the
browser?

I assume that once you're forced to preload everything you might need for the
page, it's no longer more efficient to break up into multiple tiny requests.

~~~
zub
> I can't see a way that new protocols are going to solve this, unless I'm
> missing something in how HTTP2 is going to be handled by the browser?

HTTP/2 server push. Your server can proactively deliver things like rollover
state graphics knowing that the client will need them.

~~~
robbles
good point. I imagine determining when to push these assets will become a
complex choice though.

Also, I realized shortly after commenting that I missed the obvious benefit of
avoiding downloading global spritesheets and other compiled assets for
individual pages that only use a subset of the content.

------
Shish2k
Looking at the tangentially linked qrintf project that H2O uses (
[https://github.com/h2o/qrintf](https://github.com/h2o/qrintf) ), replacing
generic sprintf calls with specialised versions for a 10x speed boost: that
seems like a brilliant idea, and I wonder why it took so long for somebody to
think of it?
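A rough sketch of the transformation a qrintf-style preprocessor performs:
a call like sprintf(buf, "port=%d", n) becomes a chain of direct append
calls, so no format string is parsed and no locale machinery runs at request
time. The helper names below are illustrative, not qrintf's actual API:

```c
#include <string.h>

/* Append a literal string; returns the new end-of-buffer pointer. */
static char *append_str(char *p, const char *s)
{
    size_t l = strlen(s);
    memcpy(p, s, l);
    return p + l;
}

/* Append a decimal int directly, with no format parsing or locale lookup. */
static char *append_int(char *p, int v)
{
    char tmp[12]; /* enough for a 32-bit int and sign */
    char *t = tmp + sizeof(tmp);
    unsigned u = v < 0 ? 0u - (unsigned)v : (unsigned)v; /* INT_MIN-safe */
    do {
        *--t = (char)('0' + u % 10);
        u /= 10;
    } while (u != 0);
    if (v < 0)
        *--t = '-';
    size_t l = (size_t)(tmp + sizeof(tmp) - t);
    memcpy(p, t, l);
    return p + l;
}
```

So the rewritten form of sprintf(buf, "port=%d", 8080) would be
append_int(append_str(buf, "port="), 8080), followed by NUL-termination.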

~~~
nly
One of the problems with compiling printf/scanf is that a lot of the overhead
comes from locale handling, which is a runtime variable. Parsing is fairly
negligible for short format strings.

~~~
justincormack
Few applications want locale support. Mostly you want reproducible output.

------
zzzcpan
Socket API is a bottleneck now, right? So, next step: roll your own http-
friendly tcp stack on top of netmap/dpdk and get 10x performance increase over
nginx.

~~~
wmf
IX is something like that:
[https://www.usenix.org/conference/osdi14/technical-
sessions/...](https://www.usenix.org/conference/osdi14/technical-
sessions/presentation/belay)

------
jarnix
Obviously it's great software. Does Kazuho work alone on this? If it's meant
to replace nginx, it needs a lot of other
options/functions/extensions/modules/...

Is it getting commercial support/funds?

~~~
kazuho
Contributors are highly welcome, obviously!

For myself, developing H2O is part of my job at DeNA (one of the largest
smartphone game providers in Japan).

------
PythonicAlpha
Looks very promising!

I am not so much into web-servers (yet), but I found this in the feature list:

    reverse proxy
        HTTP/1 only (no HTTPS)

Are there any plans to also add HTTPS support for the reverse proxy?
Currently I have to include a secondary (Tornado) web-server in my stack for
dynamic pages.

It also puzzled me that HTTPS is not supported, yet in the benchmarks I found
a part: "HTTPS/2 (reverse-proxy)". As I said, I am not so much into
web-servers and HTTP/2, but that was a little confusing.

~~~
kazuho
Sorry for the confusion.

HTTP and HTTPS (both versions 1 and 2) are supported for downstream
connections (i.e. connections between H2O and web browsers). Only plain-text
HTTP/1 is supported for upstream connections (connections between H2O and web
application servers).

~~~
PythonicAlpha
Thank you very much for the answer!

To clarify for my mind:

Can I then use H2O to connect to a web-browser via HTTPS and H2O is routing
the same request upstream via HTTP to a web application server? (that of
course would suffice for me).

~~~
kazuho
Yes, assuming that you wanted to say: "web-browser connect to H2O via HTTPS".

~~~
PythonicAlpha
Of course! Excuse my bad English!

------
jvehent
That's a cool project. Performance is a fascinating topic.

However, in the real world, the number of requests per second an HTTP daemon
can serve is the last thing to worry about. If the web is slow, it's not
because Apache used to be bloated with threads. It's because of bad
architecture: centralization of services, latency in page build times, the
size of static components, data store bottlenecks, etc...

Nevertheless, a very cool project. One I'll follow closely.

------
ams6110
Another one to keep an eye on might be the new httpd in OpenBSD.
[http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-
current/man8/...](http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-
current/man8/httpd.8)

I'm not seeing that there is yet a portable version however.

~~~
justincormack
It is not intended to be anything other than a minimal, secure HTTP/1 server;
performance is not a goal.

------
bkeroack
If you're relying on HTTP for all your microservices, you're doing it wrong.

~~~
nnx
Why? What's wrong with this approach?

~~~
bkeroack
Latency, for one. "Microservice" means reaching out over the network to
perform an action, not necessarily: make an HTTP connection, transfer JSON,
receive JSON back and deserialize. Microservices can just as readily be
performed via protocol buffers or Thrift over ZeroMQ, raw TCP/UDP, etc.

~~~
Iftheshoefits
You are correct, but missing a larger point: the ecosystem backing JSON-over-
HTTP is large and expansive, and as a result development for apps built on
that technology has effectively been commoditized.

~~~
simoncion
0) In the late 1990's to early 2000's, exactly the same thing would have been
said about XML. And yet, here we are...

1) So, you're saying that the fact that you can hire college freshman to
complete your project outweighs the issues introduced by using the wrong tool
for the job?

~~~
Iftheshoefits
0) technologies evolve over time. JSON-over-HTTP will be supplanted by
something, eventually, which almost certainly will also be something that acts
to commoditize labor

1) Not me, the market. I think it is silly, too.

------
dschiptsov
So, it has better strings, pool allocators, zero-copy buffers and syscall
support than nginx/core/*.ch? That would be a miracle.

------
Aldo_MX
I love projects like this, which people receive with much skepticism but
which after some years bring interesting improvements to us all.

------
thresh
Hello there, can you share the performance test details? The configurations
of both servers, the client software, the hardware setups.

Thanks!

------
haosdent
I can't understand why it would be faster than Nginx. Maybe the way nginx was
benchmarked in this case is wrong?

------
huhtenberg
That's very good code. Succinct and readable. You clearly know your C well :)

------
xfalcox
Any plans to add script support, like nginx's access_by_lua?

------
okpatil
It seems that everything mentioned in the library could be done easily with
golang. I am interested to see how H2O benchmarks against pure golang
binaries.

------
caycep
whoa, and here i was thinking nginx was the be all end all of sweet sweet
blistering speed...

