
H2O – an optimized HTTP server and library implementation - dochtman
https://github.com/kazuho/h2o
======
kazuho
As the author of H2O, I would like to leave a comment here.

First of all, development is still in its early stages, so please forgive me
for the lack of documentation.

That said, below are some high-level design notes that are not covered by the
README.

- it is designed to be pluggable (though not as flexible as Apache), by
providing virtual-host-based configuration of generators, output filters and
loggers

      - lib/file.c and lib/chunked.c are a good place to understand their interface if you are interested (there is also a rough sketch at the end of this comment)

      - I am (at the moment) not interested in providing support for input filters

- the performance of the HTTP parser is likely to become important, since the
ratio of small files being served is expected to increase with HTTP/2; H2O is
designed with that in mind
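
To give a rough feel for the generator interface, here is a simplified
"hello world" handler; this is a sketch with the names paraphrased, not
exact code, so please check lib/file.c and the headers for the real
definitions:

    /* simplified sketch; see lib/file.c for the actual interface */
    static void on_req(h2o_req_t *req)
    {
        /* a generator exposes proceed/stop callbacks; NULL is acceptable
           here because the whole response is sent in one shot */
        static h2o_generator_t generator = { NULL, NULL };
        h2o_iovec_t body = h2o_iovec_init(H2O_STRLIT("hello world\n"));
        req->res.status = 200;
        req->res.reason = "OK";
        h2o_start_response(req, &generator);
        h2o_send(req, &body, 1, 1 /* is final */);
    }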

~~~
tinco
Why is H2O faster than nginx? I've read the source of nginx and it does
attempt to be very clean and tightly optimized, which makes me suspicious of
H2O. Is there some big architectural advantage or is it less robust in some
way?

~~~
kazuho
Please read my other comment here
[https://news.ycombinator.com/item?id=8342684](https://news.ycombinator.com/item?id=8342684)

------
jedisct1
I don't trust any benchmark using ab. Use Tsung or Siege.

That said, this library is very clean and easy to use. I just wish it had
high-level functions to create sockets, but that's a minor detail.

Good job by Kazuho, as usual.

~~~
escaped_hn
It uses libuv, which you can use to create sockets.
[http://nikhilm.github.io/uvbook/networking.html#tcp](http://nikhilm.github.io/uvbook/networking.html#tcp)
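
For instance, following that guide, a minimal TCP listener looks roughly
like this (libuv 1.x-style API as shown in the uvbook; error handling
omitted, and on_new_connection is a callback you supply):

    #include <uv.h>

    static void on_new_connection(uv_stream_t *server, int status) {
        /* accept the connection and start reading here */
    }

    int main(void) {
        uv_tcp_t server;
        struct sockaddr_in addr;

        uv_tcp_init(uv_default_loop(), &server);
        uv_ip4_addr("0.0.0.0", 7000, &addr);
        uv_tcp_bind(&server, (const struct sockaddr *)&addr, 0);
        uv_listen((uv_stream_t *)&server, 128, on_new_connection);
        return uv_run(uv_default_loop(), UV_RUN_DEFAULT);
    }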

------
paraboul
Looks impressive, though benchmarking with "ab" should be taken with a grain
of salt.

nginx does a lot of optimization at many levels that ab can't capture.

In short, it only profiles the speed of the HTTP parser and certainly not the
network stack.

There are a lot of things an HTTP server does to keep its connections
healthy: disabling the Nagle algorithm at the right moment, gracefully
handling failures, managing slow clients the right way, and so on.
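
As a concrete example, turning Nagle off is a single setsockopt() call; the
part a server has to get right is when to apply it:

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* disable Nagle so small writes go out immediately instead of being
       coalesced; fd is an accepted TCP socket */
    static int set_nodelay(int fd)
    {
        int one = 1;
        return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
    }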

FWIW, Tsung is an awesome benchmarking tool written in Erlang:
[http://tsung.erlang-projects.org/](http://tsung.erlang-projects.org/)

~~~
justincormack
ab is also very slow; it can be the limiting factor in a benchmark.

But the nginx HTTP parser is one of the things that is fast, so a faster one
is interesting.

I think having a library implementation is very useful as well.

~~~
kazuho
ab was not the bottleneck in these benchmarks, since both servers were
pinned to a single core. It is true that ab is too slow for benchmarking an
HTTP server running on multiple CPU cores, but it is at least faster than a
server using only one core.

Regarding the performance of the HTTP parser, I have heard that picohttpparser
(the HTTP/1 parser used by H2O) is much faster than the HTTP parser used by
nginx.
[https://github.com/kazuho/picohttpparser](https://github.com/kazuho/picohttpparser)

If that is true, it is likely due to the difference between the approaches the
parsers take. Unlike most parsers, picohttpparser does not have a callback-
based API. Instead it uses a loop for parsing the HTTP request and header
lines.
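
To illustrate, here is roughly how picohttpparser is driven (using
phr_parse_request() from its public header; it fills in pointers into the
buffer you pass, instead of invoking callbacks):

    #include <stdio.h>
    #include <string.h>
    #include "picohttpparser.h"

    int main(void)
    {
        const char *buf = "GET /hello HTTP/1.1\r\nHost: example.com\r\n\r\n";
        const char *method, *path;
        size_t method_len, path_len, num_headers = 16;
        int minor_version;
        struct phr_header headers[16];

        /* returns bytes consumed, -1 on error, -2 if the request is
           incomplete and more data should be read */
        int ret = phr_parse_request(buf, strlen(buf), &method, &method_len,
                                    &path, &path_len, &minor_version,
                                    headers, &num_headers, 0);
        if (ret > 0)
            printf("%.*s %.*s HTTP/1.%d (%zu headers)\n",
                   (int)method_len, method, (int)path_len, path,
                   minor_version, num_headers);
        return 0;
    }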

note: This comment might be biased since I am the author of H2O and
picohttpparser.

~~~
FooBarWidget
I believe you when you say that picohttpparser is faster. But is it also
_correct_ and _secure_? Is it fully RFC 7230 compliant? Are there no parsing
bugs that would allow a security exploit?

~~~
kazuho
It is hard to argue for the security of any piece of software, but I would
say that if problems were found in the library they would be taken seriously.

picohttpparser is the core of HTTP::Parser::XS which is used by many HTTP
application servers for Perl (see
[http://plackperl.org](http://plackperl.org)). There are many deployments
using it.

------
SwellJoe
I think it's interesting that performance is mentioned as a primary feature of
this. I haven't deployed a web service in nearly 15 years that had the HTTP
server as the limiting factor for the performance of a website. Serving small
files from RAM really fast just isn't that interesting. For this kind of
workload Apache is fast enough for the vast majority of deployments (such a
large majority that I've never worked on a deployment where it wasn't, and
I've worked on some very large deployments), nginx is fast enough, and H2O is
fast
enough. Going from "fast enough" to "2x fast enough" isn't going to matter to
end users who are waiting on other parts of the system.

The database is still gonna be a bottleneck. The application is still gonna be
a bottleneck. The disks are still gonna be a bottleneck. The network is still
gonna be a bottleneck (Apache can saturate a Gbit NIC, so can nginx, so can
H2O).

I'm not saying this is useless. It looks like a cool project, built by someone
really clever (unco is hilariously clever), with lots of good uses (an easily
embedded HTTP server library is nothing to sneeze at). I'm saying I think it's
weird and unfortunate that so many people focus on performance of the web
server, as though it will make a difference for end users. In the vast
majority of web server deployments any of the major web servers will do the
job and will perform well enough to not be the bottleneck in the system.

~~~
lpgauth
Really depends on the scale you're at, but sometimes a couple CPU percent can
equal saving a bunch of servers. Obviously the rest of the stack (application,
db) has to also be fined tuned or else it's just pre-optimization.

~~~
SwellJoe
Yes, if you have 50 web servers, saving a couple of CPU % can save a server.
That said, I've never worked on a deployment that had 50 web servers, even
though I've worked on a couple of major large city newspaper sites, a support
site for the #2 or #3 most popular personal computer manufacturer (I don't
remember what their rank was at the time), a free hosting provider that had
~12 million websites, and a handful of other interestingly large sites.

Given that a single web server (Apache or otherwise) can serve millions of
pages per hour, it's pretty rarefied air to be talking about 50 or 100 or more
web servers. There just aren't a lot of people working on sites with that kind
of traffic.

I'm really not saying better-performing, more efficient web servers aren't a
good thing. It's great that web servers (including Apache) continue to get
faster and more efficient. I just don't think it should be the primary thing
we're talking about when comparing new servers to tested and proven existing
web servers. There are so many other factors, and performance doesn't matter
at all if the web server doesn't do what you need it to do, or is insecure, or
is unreliable.

~~~
staunch
Apache was always a badly designed HTTP server. Forking processes for every
request was a stupid idea for any public web server (see also: CGI). It's the
only reason web servers got Slashdotted back then and why they rarely do now.
Forking processes is incredibly expensive compared to an event loop. Event-
based HTTP servers inevitably won; it just took a while. Using select()
wasn't ideal, but then epoll() made things optimal.
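
For anyone who hasn't written one, the heart of an event-based server is a
single loop multiplexing many sockets. A bare-bones epoll accept loop looks
something like this (error handling and non-blocking setup omitted):

    #include <sys/epoll.h>
    #include <sys/socket.h>

    /* one process watches many sockets instead of dedicating a
       process or thread to each connection */
    static void event_loop(int listen_fd)
    {
        int epfd = epoll_create1(0);
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
        epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

        for (;;) {
            struct epoll_event events[64];
            int n = epoll_wait(epfd, events, 64, -1);
            for (int i = 0; i < n; i++) {
                if (events[i].data.fd == listen_fd) {
                    int conn = accept(listen_fd, NULL, NULL);
                    struct epoll_event cev = { .events = EPOLLIN,
                                               .data.fd = conn };
                    epoll_ctl(epfd, EPOLL_CTL_ADD, conn, &cev);
                } else {
                    /* read the request and write the response here */
                }
            }
        }
    }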

The idea that Apache was ever good enough was wrong then and is wrong now. We
always needed a more efficient design.

I'm sure someone could write a somewhat more efficient version of nginx, but
I'm not sure they could do it with as many features as nginx has. Which means
you would probably just end up switching to nginx at some point. I think
Cloudflare uses nginx. They could probably save on machines if they used a
more minimal server.

~~~
SwellJoe
So, you don't know enough to know that Apache is not tied to any particular
concurrency model, nor has it used the concurrency implementation you've
described (process-per-connection) in decades, yet you feel you know enough to
make recommendations for web servers? Even when Apache is run with the now
quite old prefork MPM, it uses a process pool rather than the
process-per-connection model you've described, and it is not subject to the
performance problems you're alleging (memory usage can be high when using
that MPM for high-concurrency workloads, but its performance in most
deployments is not all that bad).

Further, "forking processes" is actually _not_ incredibly expensive on Linux,
and in fact, pthreads on Linux are implemented by the same code (clone() with
varying levels of sharing). Forked applications on Linux are pretty much just
as fast as threaded applications. It is a myth based on extremely outdated
knowledge (fork on Solaris, for instance, and some other UNIX variants, had a
history of being slow; but, Linux has always had a very fast fork).
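
To make that concrete: both fork() and pthread_create() bottom out in
clone(2) on Linux, and the difference is just which resources the flags ask
the kernel to share. A sketch (see clone(2) for the details):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static int child_fn(void *arg)
    {
        printf("child running, pid=%d\n", getpid());
        return 0;
    }

    int main(void)
    {
        const size_t stack_size = 1024 * 1024;
        char *stack = malloc(stack_size);
        /* thread-like child: share the address space, fs info, file
           descriptors and signal handlers; drop these CLONE_* flags and
           the same call behaves like fork() */
        pid_t pid = clone(child_fn, stack + stack_size,
                          CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND
                              | SIGCHLD,
                          NULL);
        waitpid(pid, NULL, 0);
        free(stack);
        return 0;
    }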

It is true that event-based thread (or process) pool concurrency
implementations can provide superior performance to thread- or process-per-
connection implementations, for a variety of reasons, but Apache has that
covered. I'm gonna guess you've never even used or seen an Apache installation
that forked a process for every request (because it's been so long since that
was a thing Apache did), so I'm not sure how you could believe it works that
way.

Where did you get all of these assertions from? Are there sites out there
propagating these crazy claims about Apache? And, if so, why? What does one
gain by trash-talking a project that was instrumental in helping build the
open web and still powers more websites than any other web server in the
world? And, does it well, I might add. There are some good reasons a
reasonable admin might choose nginx over Apache. But, they aren't because
Apache is a terrible piece of software written by incompetent people.

In short, your comment has negative value, by providing misleading and
outright incorrect information.

Edit: And, this is why I hate it when performance is the measuring stick
people use to discuss web servers. It begins to seem like it is a useful
metric for comparing web servers, when it really is not for 99% (or more) of
deployments. Apache is fast enough. nginx is fast enough. Pick your web server
based on other characteristics, because otherwise you're almost certainly
making decisions based on the wrong things.

~~~
staunch
You're arguing against several strawmen of your own invention. Preforking is
still forking, for one. Under heavy load you're forking to keep up with new
connections while existing processes are tied up very slowly serving responses
to bandwidth constrained clients. Also, I never claimed forking is expensive.
I said it's expensive compared to an event loop. Correcting things I didn't
say might feel good, but you're just lying to yourself. Regurgitating what
you've read about how Linux processes and clone() work is stereotypical
sysadmin bloviating.

Several other things you say are equally wrong. Claiming that Apache moved
away from the process model decades ago is dishonest. It's not even decades
old, and the Apache project itself contradicts you.

Preforking is still recommended for "sites requiring stability" and is in
(most?) common usage
[http://httpd.apache.org/docs/current/mpm.html](http://httpd.apache.org/docs/current/mpm.html)

> _The server can be better customized for the needs of the particular site.
> For example, sites that need a great deal of scalability can choose to use a
> threaded MPM like worker or event, while sites requiring stability or
> compatibility with older software can use a prefork._

Your religious devotion to Apache and your claim that "Apache is fast enough"
ignore the reality of what nginx can do with so much less CPU and memory.
It's not a technical argument but a religious one. Nginx is good enough.
Apache is not. It never was; people have lived with it for too long because
people like you buried their heads in the sand. You're still doing it.

~~~
cbsmith
> Preforking is still forking, for one. Under heavy load you're forking to
> keep up with new connections while existing processes are tied up very
> slowly serving responses to bandwidth constrained clients.

No, that's not strictly true. If you have min & max servers fixed at the same
value and the max requests per child set absurdly high, you won't fork much,
if at all, under heavy load. Pre-forking _can_ result in a lot of forking and
requests being blocked while you fork if a) you have a surge in traffic and
b) your min servers isn't set high enough to cover the surge, or
alternatively c) your max requests per child is low enough that you are
constantly having processes exit.
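
For instance, something along these lines fixes the pool size and stops
children from being recycled (a sketch using the standard prefork
directives; the numbers need tuning to your memory budget):

    <IfModule mpm_prefork_module>
        # keep the pool at a fixed size so no forking happens under load
        StartServers          256
        MinSpareServers       256
        MaxSpareServers       256
        ServerLimit           256
        # spelled MaxRequestWorkers in Apache 2.4
        MaxClients            256
        # 0 = children are never recycled
        MaxRequestsPerChild   0
    </IfModule>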

> I never claimed forking is expensive. I said it's expensive compared to an
> event loop.

Agreed, although that's kind of a meaningless statement (an event loop is
something you go through on a per-request basis), and misses the real problem
with the multi-process model: virtual address space for each process.

> It's not a technical argument but a religious one. Nginx is good enough.
> Apache is not.

Umm... that sounds like a religious argument in its own right. Apache is
certainly good enough for plenty of people, and more importantly with all the
dynamic content on sites, the web server tends to be a pretty unimportant
factor in the performance of many sites. Apache brings other things to the
table which are often valued for projects, and there is no reason that needs
to be considered a "religious" decision.

------
p1mrx
Looks like it doesn't support IPv6.

src/main.c: conf->fd = socket(AF_INET, SOCK_STREAM, 0)
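
For reference, a dual-stack listener is only a few lines more; a sketch
(clearing IPV6_V6ONLY so the socket also accepts IPv4 connections as
v4-mapped addresses, where the OS allows it):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>

    static int listen_dual_stack(uint16_t port)
    {
        int fd = socket(AF_INET6, SOCK_STREAM, 0);
        int off = 0;
        struct sockaddr_in6 addr;

        /* 0 = accept IPv4 as well, as v4-mapped IPv6 addresses */
        setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &off, sizeof(off));

        memset(&addr, 0, sizeof(addr));
        addr.sin6_family = AF_INET6;
        addr.sin6_addr = in6addr_any;
        addr.sin6_port = htons(port);
        bind(fd, (struct sockaddr *)&addr, sizeof(addr));
        listen(fd, 128);
        return fd;
    }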

~~~
justincormack
Although that is the example server, maybe the library does?

------
billyhoffman
As happy as I am to see open-source libraries that can support SPDY/HTTP2,
these "2x faster than nginx" stats are a joke. This dubious claim is based on
how fast H2O and nginx can serve a 6-byte and a 4 KB response.

------
zokier
I hope to see it in TechEmpower's benchmarks:
[http://www.techempower.com/benchmarks/](http://www.techempower.com/benchmarks/)

------
sbarre
Does this server/library allow the same kind of hooks and configuration that a
traditional web server allows?

Or is that the point? If you don't need all the configuration and features of
an off-the-shelf web server, you can more easily custom-build an H2O HTTP
server for your specific needs that is blazing fast?

------
marktangotango
Very nice, I've often wondered at the lack of embeddable HTTP servers in the
C/C++ world. Are there any other libraries that do the same thing? What is
the status of this project? Is it being used in production anywhere?

~~~
fredliu
Another example is Mongoose. It's small, easy to configure, and embeddable
(I've used Mongoose on Android via the NDK with no problems). Unlike H2O,
Mongoose uses an old-school one-thread-per-request approach, so I would
expect its throughput won't be as high as H2O's, which seems to use an
event-based approach similar to nginx. Be aware of Mongoose's license though:
it used to be MIT (or Apache?), but now it's GPLv2.

~~~
nodivbyzero
[https://code.google.com/p/mongoose/](https://code.google.com/p/mongoose/)

