New version of Apache HTTP Server released (apache.org)
246 points by wyclif on Feb 21, 2012 | 35 comments




From the announcement: "Performance on par, or better, than pure event-driven Web servers."

Is there any more information on how they arrived at this conclusion?


The second "Core Enhancement" is: "The Event MPM is no longer experimental but is now fully supported.". I would imagine it is because that module's inherent design offers the same underlying system overhead benefits that people are claiming with their event-driven servers.

The idea is that Apache features "multi-processing modules", which control how incoming connections are eventually translated into the state of the running program. These are not just simple tweaks to a fixed pipeline: it is possible to drastically alter the mechanism. As an example, mpm_winnt is quite different from the Unix implementations.

The original implementation was mpm_prefork, which maintains a pool of fork()'d processes to handle incoming connections. However, the more reasonable installations have used mpm_worker for years now, which keeps a pool of threads in each process (developed back when thread pools were all the rage).

However, a while back someone built mpm_event, which uses the same underlying system calls that people use to build their "purely event-driven servers" to handle the sockets. This has been marked "experimental" for a while, initially due to bad interactions with other modules that weren't quite prepared to be evented (such as mod_ssl), but honestly it has been stable for years.
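
If "evented" is new to you, the rough idea is a single thread watching many sockets at once via the kernel's readiness-notification calls, instead of dedicating a process or thread to each connection. Here's a toy sketch of that model in Python (using plain select() for brevity; real event-driven servers and mpm_event use epoll/kqueue). Purely illustrative, not how Apache itself is written:

    import select
    import socket

    # Toy echo server: one thread multiplexes all connections by asking
    # the kernel which sockets are ready, instead of blocking per client.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("localhost", 8080))
    server.listen(128)
    server.setblocking(False)

    sockets = [server]
    while True:
        readable, _, _ = select.select(sockets, [], [])  # wait for activity
        for s in readable:
            if s is server:
                conn, _ = s.accept()          # new client connection
                conn.setblocking(False)
                sockets.append(conn)
            else:
                data = s.recv(4096)
                if data:
                    s.send(data)              # echo the bytes back
                else:
                    sockets.remove(s)         # client hung up
                    s.close()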

(I've been using it in production on a site with tens of millions of users, with SSL, and if anything mpm_event made things run better by working around a really irritating issue in Python: specifically, that until very very very recently the Python libraries simply failed to handle EINTR in a reasonable way; mpm_event seems to set up the signal and thread environment in a way that stopped entire outgoing HTTP API requests from failing with EINTR ;P.)


Can you go into more detail about the right way to handle EINTR? When I wrote a server I wound up using the "if at first you get EINTR'd, try, try again" approach.


I'm not entirely certain at what level you are asking, but if I understand your question: yes, "try, try again". There are some legitimate reasons why you might want to "handle" an EINTR (such as for canceling threads), but honestly they are super-advanced usage (and normally something abstracted by your threading library): 99.9999% of the time you called a blocking function, knew it might block, and intended it to block until it did something.

The problem with Python was that it was not retrying: it would just throw an EINTR i/o exception from its lowest-level read() bridge, and then most of the code that called that would not catch the exception. The result would be that you'd make an HTTP request and the top-level GET call would just throw "EINTR": it is not reasonable (or even semantically valid) to have to retry the entire HTTP request because a call to read() happened to fail while streaming the response.
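
To make it concrete, the sane behavior is just a retry loop at the lowest level, something like this rough sketch (my own illustration with a made-up helper name, not the actual patch from the bug tracker):

    import errno
    import os

    def read_retrying(fd, nbytes):
        # Retry the low-level read() instead of letting EINTR escape to
        # callers that have no sane way to resume a half-finished request.
        while True:
            try:
                return os.read(fd, nbytes)
            except OSError as e:
                if e.errno == errno.EINTR:
                    continue   # interrupted before reading anything; try again
                raise          # any other error is real, let it propagate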

This was sufficiently ludicrous (and sufficiently simple to fix) that I had a patch I'd apply to Python's core libraries every time I installed or upgraded my system. ... and, before anyone tries to call me on "patch or GTFO": someone filed a bug, with a working patch (although people tried to quibble with it; I believe incorrectly, and it certainly didn't change the result), in January of 2007, and it wasn't resolved until mid-2010. I didn't come up with the fix: I just pulled it out of the bug tracker.

http://bugs.python.org/issue1628205


Thanks for the history lesson and the thorough response. I'm going to sleep a little better tonight.


With mod_lua, embedded lua interpreter.

I just couldn't resist, I had to try it at once. It works! (no pun intended, for those who get the reference).

edit: fixed typo


It would be interesting to see benchmarks comparing it to nginx.


Those would be pretty much meaningless...

Apache can be tweaked in so many different ways depending on what your traffic patterns look like, and how you're processing your requests. MaxClients / KeepAlive / MaxRequestsPerChild / etc...

e.g. you would have a completely different config for serving wordpress vs static images for a photo album.


> Apache can be tweaked in so many different ways

I can't stress this enough. Besides all the configuration settings, you can really dig into the internals.

Our production deployment of Apache is custom-compiled with something like 90% of the modules disabled. Only the basic stuff we use.

Each deployment is unique, making benchmarks pretty useless.

If you need to choose which httpd to use, decide based on how easily it'll fit in your stack. I promise you, if you hit the performance ceiling on any of the httpd suites out there, you'll have bigger fish to fry.


> Each deployment is unique, making benchmarks pretty useless.

Not that unique, as in snowflake-unique. There are several classes of deployments, that's all. A lot of things are similar.

That said, benchmarks still say a lot once the performance difference gets past a certain point. A 10-50% speed difference could be made up with a different config. Even 100%. I doubt a 10x one could. Or 1/5 the memory usage. Or hitting 20,000 rps on an otherwise identical setup while the other server struggles past 4,000.


You've got to think about all of the specifics though...

Here's apache tuned to serve static files of 50kb in length, with 20 requests per client, 500 uniques per hour...

vs.

Here's apache tuned to serve a wordpress blog that gets 500 uniques per hour and each client makes 50 requests (average time on site is 5 minutes)...

and then...

apache serving wordpress on a VPS vs apache serving wordpress on a single core vs apache serving wordpress on a quad-core..

and oh-wait... if we're talking php then let's consider prefork/FastCGI/yada yada yada

The "problem" (or as I prefer to look at it: feature) with Apache, is that it's been around for so long, that it can accommodate all of the above specific issues.


No matter how many people stress this very important point, there will still be people crying over performance during sleepless nights spent thinking about the one-in-a-million chance that their server will be burdened by some ungodly amount of traffic. I've even been guilty of it on servers that get 14 hits a day (and that's on a good day)!

I disagree that each deployment is unique though. In theory each deployment should be unique, but most people just go with the defaults their hosting provider suggests, others don't have control over those settings (shared hosts), and still others just use the htaccess file that comes with html5BP and leave it at that. There's also a ton of people that have a majority of the modules enabled but neither use them nor know what they do. I've been guilty of that before too. Maybe around these parts it's safe to say each deployment is unique, but out in the wild you tend to see a lot of the same.


Plus it's pretty easy to saturate a 1 Gbit NIC with even a pretty close to OOTB Apache setup. You might burn 5-10% more CPU than if you were using nginx, but it's not that big a deal in 99% of the use cases.


Having a couple different common cases and configurations benchmarked wouldn't hurt...


Benchmarks are like porn. The real thing is always better. Figure out what your stack is, try using all the options available, and go with the one you like best.


I wouldn't base a decision on benchmarks alone, but can imagine myself using them to narrow down my own set of tools to benchmark against my workload.


Between this and PHP 5.4, LAMP performance is finally progressing.


Although not untrue, this is also typical of the somewhat myopic HN-crowd way of looking at it.

For 99% of all LAMP implementations, the performance of the stack is not an issue.


Indeed, the performance bottleneck of a webapp is often the database. But isn't that what the 'M' in 'LAMP' stands for? MySQL? Now, if the 'M' would stand for memcached, things would be better. Maybe it's time for the 'LAMMP' stack ;-)


The performance bottleneck for virtually every webapp is the programmer. In other words, doing stupid shit. You could write a performant webapp using Perl and flat files if you wanted to.

The most important thing is to know how to use your stack. It doesn't matter what the stack actually is.


If you're webscale the M stands for MongoDB duh. ;)


The last time I tried mpm_event, PHP shat all over the place because of the threading model. Has this improved?


Yes. On PHP's end. Use PHP-FPM.


So no SPDY implementation then? Too bad.


There's http://code.google.com/p/mod-spdy/ -- not sure if it works with 2.4.


Micro Benchmark of Apache 2.4 vs Nginx 1.0: http://blog.causal.ch/2012/02/micro-benchmark-apache-24-vs-n...

Event MPM seems unreliable :-(


I thought I saw a squadron of pigs soar by this morning. Too little, too late, I think, though, as nginx & friends have devoured their market share over the years, and it's going to take a good year or two for distros to start supporting 2.4 - probably 7 or 8 years for RHEL!


Apache continues to hover around 65% of the market while its next nearest competitor, Microsoft, hovers around 15%. nginx is below 10% and climbing very slowly.

http://news.netcraft.com/archives/2012/02/07/february-2012-w...


It may be climbing slowly, but it is the only one climbing in all the categories.


Why wouldn't distros support it? I'm not sure what you mean by that. I don't know, your comment sounds more like you have an irrational preference for Nginx because it's what the cool kids use rather than having any real issues with Apache. I do hope that distros start including 2.4 right away in their package managers but even if they don't that doesn't mean we can't install it with just a few extra keystrokes.


Who cares about distros? Lots of people employ custom stacks.

You run the distro-provided Nginx? Or the distro Ruby/Python/etc?

Really?


Agreed. I compile and package just about every piece of server software by hand. There's almost never a case where my ./configure options match the default options.


No, we package up all of our own debs, as waiting for distro maintainers is like waiting for Godot. My point was that I honestly believe Apache's days are numbered, as their release cycle is practically moribund.


Brilliant. Shouldn't you be catching up on your Codecademy classes?



