Mongrel2 1.0 released. (sheddingbikes.com)
145 points by ryan-allen on Sept 2, 2010 | 76 comments

I've put up a GitHub repository of mongrel2 with its dependencies included. Compiling and installing should be as simple as typing:

  make all install PREFIX=$HOME/opt/mongrel2

First, thanks!

From your commit log:

> Disable target install-py for now, it requires sudo.

Agreed re: disabling an app that needs root unnecessarily. I don't want anything fucking with my OS Python.

Have you considered having it simply call pip instead, so it installs to the current virtualenv?

I hadn't but I probably should have. Reading through the pip documentation as we speak - if you have some quick pointers, don't hesitate to post them.

We should revisit this. It's mostly for convenience since lots of people just put the m2sh stuff wherever everything else goes. Problem is, that screws up virtualenv folks.

Any ideas on it other than disabling?

everything else goes into virtualenvs or user bin/pythonpath; python stuff is unfortunately too fragile to put everything in one place.

It's not so much about Python being fragile, it's more that the OS - Linux in particular - uses Python itself, and modules can modify the behavior and break it. (Same thing if you used Puppet to manage your configs and broke your OS Ruby.)

Keeping everything in one dir also makes deployment way simpler.

How will Zed merge your changes back to his fossil repo, and how will you merge his changes back to git?

I'm not sure I understand the motivation for putting the code on github unless changes can flow both ways.

It'd be nice if someone wrote a fossil<->git bridge, since there are (multiple) foo<->git bridges for most other values of foo: https://git.wiki.kernel.org/index.php/Interfaces,_frontends,...

Actually there are already a few git mirrors I'll pull from if people tell me to. I then have a few branches from the fossil repo that have the git stuff in them. Works pretty easily actually.

But, I really should just follow this so I can get the changes as an email.

I am simply impressed at how much sh*t he gets done. I honestly want to be more like him.

you should probably spend less time here ;)

I actually think it's fine to spend time on HN, you just can't also spend tons of time on tons of other sites as well.

Based on anecdotal evidence only, it seems the rate at which articles move up and down the HN frontpage pretty much ensures you'll see everything that hits the front page per day if you check HN twice - once in the morning and once in the late afternoon or evening.

I don't think that's too much HN browsing to prevent you from being uber productive, as long as you're not also browsing tons of other sites on the net too.

or having discussions in the comments

I've read the introduction and I still don't quite know what the fuss is all about.

Perhaps some examples of its benefits right in the introduction?

First of all, there's a fuss because it's written by zedshaw. Secondly, zedshaw also authored mongrel, a well-known web server that many people have taken the ragel-based parser from to build their own stuff. Third, everyone's been talking about this whole 'sqlite as configuration' thing, which could be awesome or horrible, the jury's still out. Finally, it's always interesting to see new projects in 'solved' areas. How long has Apache been around? How well is it tested? It obviously works, but could it be better? Zed's trying out new ideas, has a crazy amount of test coverage, and is starting out fresh.

I haven't tried it yet. But it's certainly interesting.

What you're saying is that there isn't yet a compelling reason for most people to try it.

Sounds like there are a ton of compelling reasons to try it.

As someone who runs multiple sites and who doesn't have a major problem with apache, what here should compel me to try it?

Ragel parser? Um, why should I care if I'm not going to dig into the code myself?

because zedshaw is so fucking awesome? Not a compelling reason to try a web server to me, is it to you?

sqlite configuration? Maybe, but that's not super compelling to me at the moment.

despite my sarcasm, my question is sincere. I'm curious how mongrel2 could be an improvement over what is currently used, which for me is mostly apache

Any other reasons?

I'm a firm believer that if it ain't broke don't fix it. If you've got multiple sites working with Apache and you know how to keep that going smooth, then stick with that. Especially if you're doing small to medium sites in say PHP. PHP and Apache are kind of the king and queen of the realm.

What Mongrel2 has over Apache is its ability to run async jssockets, HTTP long-poll style work, and regular HTTP at the same time. If you hit an application where you want to do some async socket type stuff, that's when you should look at Mongrel2.

Of course, that's not all it does, but if you already love your Apache and its crazy config file format and weirdo 1995-style syntax, then Mongrel2's only addition is the extra protocols.

thanks, that makes perfect sense. No, I don't love apache's crazy config file format, but I don't update the config often and it's mostly copy and paste when I need to add something new. It's Rails with Passenger, which is pretty easy.

I'll definitely check out the async stuff, that is compelling!

The name of the game is flexibility.

Configure your web server in any language you want (so long as it has sqlite bindings).

Serve your web pages from any back-end you want (so long as it has zeromq).

As it turns out, zeromq and sqlite are low barriers to entry for any language.

You're right though, in that you can do all of this with apache, if you're willing to write the right config generator and install (or create) mod_*. Mongrel2 just makes it easy.

edit: Also, what zed said :)
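To make the "any language with sqlite bindings" point concrete, here is a minimal Python sketch that writes routing config into SQLite using only the standard library. The two-table schema (`handler`, `route`) and the endpoint addresses are hypothetical and much simpler than Mongrel2's real config database; normally the stock tooling (m2sh) builds that database for you.

```python
import sqlite3

# Hypothetical, heavily simplified schema for illustration only; it is NOT
# Mongrel2's actual config.sqlite layout.
SCHEMA = """
CREATE TABLE handler (
    id INTEGER PRIMARY KEY,
    send_spec TEXT NOT NULL,   -- where the server pushes requests
    recv_spec TEXT NOT NULL    -- where the handler publishes responses
);
CREATE TABLE route (
    id INTEGER PRIMARY KEY,
    path TEXT NOT NULL UNIQUE,
    handler_id INTEGER NOT NULL REFERENCES handler(id)
);
"""


def build_config(conn: sqlite3.Connection,
                 routes: dict[str, tuple[str, str]]) -> None:
    """Create the tables, then add one handler row and one route row per path."""
    conn.executescript(SCHEMA)
    for path, (send_spec, recv_spec) in routes.items():
        cur = conn.execute(
            "INSERT INTO handler (send_spec, recv_spec) VALUES (?, ?)",
            (send_spec, recv_spec),
        )
        conn.execute(
            "INSERT INTO route (path, handler_id) VALUES (?, ?)",
            (path, cur.lastrowid),
        )
    conn.commit()


# Example: route /chat/ to a (hypothetical) handler on local ports.
conn = sqlite3.connect(":memory:")
build_config(conn, {"/chat/": ("tcp://127.0.0.1:9999", "tcp://127.0.0.1:9998")})
```

The point is only that the config is plain relational data: anything that can open a SQLite file can generate, inspect, or diff it.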

Hmm, I'll think about that one too. Truth is, I just built it, so kind of a cobbler's shoes problem in that I've had little time to actually use it other than to run mongrel2.org with it. So far the people who are playing with it talk about how easy it is to work with and deploy one of the 10 programming language platforms using ZeroMQ.

I saw a demo of Mongrel2, and it's really slick. Lots of new concepts to get behind, which is both scary and exciting.

There's no way I can use it for anything remotely close to production but I have some hobby ideas baking...

Thanks. Let me know what you do and shoot me bug reports on anything you find.

According to the Mongrel2 Book (http://mongrel2.org/doc/tip/docs/manual/book.wiki#x1-560005....), there is no pipelining of HTTP requests. Without testing anything for myself, I'm really surprised at this; are the perf gains from pipelining really as negligible as the book says they are?

I have read many blog posts from Zed and seen some talks about using statistical evidence (the R project) to measure performance (http://zedshaw.com/essays/programmer_stats.html). If he says they are negligible he probably measured them, and they probably are.

There's no pipelining, there are keep-alives though. It turns out pipelining is really bad for servers that talk to backend applications, because it allows clients to send a large number of requests and then close the connection. By then the server has either already sent all the requests to backends or has to buffer them. Either way, even though the client has closed the connection, the server(s) still need to process all the requests.

In fact, there's been discussions on the httpbis list to downplay or remove HTTP pipelining since it's ambiguous and causes performance headaches.
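A rough illustration of the buffering problem: with pipelining, a single read from the client socket can hold several complete requests, and the server has to split and queue all of them before it knows whether the client will stick around. This is a simplified sketch, not Mongrel2 code: it assumes body-less requests (e.g. GET) and splits on the blank line that ends the headers, ignoring `Content-Length` and chunked bodies.

```python
def split_pipelined(buffer: bytes) -> tuple[list[bytes], bytes]:
    """Split every complete request head off the front of buffer.

    Simplification: assumes body-less requests, so a request ends at the
    first blank line (CRLF CRLF). Real servers must also honour
    Content-Length and chunked bodies.
    """
    requests = []
    while True:
        end = buffer.find(b"\r\n\r\n")
        if end == -1:
            # Whatever is left is an incomplete request; keep it buffered.
            return requests, buffer
        requests.append(buffer[:end + 4])
        buffer = buffer[end + 4:]


# One read() from a pipelining client: three full requests plus the start of a fourth.
data = b"GET /a HTTP/1.1\r\nHost: example.com\r\n\r\n" * 3 + b"GET /partial"
queued, leftover = split_pipelined(data)
# len(queued) == 3, leftover == b"GET /partial"
```

In a naive design every entry in `queued` is work the server has already taken on, so if the client hangs up now, each one must still be processed or explicitly cancelled.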

That whole 'Note 8' sounds pretty stupid.

"In Mongrel2 we use a parser that rejects invalid requests from first basic principles using technology that's 30 years old and backed by solid mathematics."

The mind boggles... (Unless he's just being satirical here).

Personally, my advice to anyone thinking of using mongrel2 would be to write your own webserver. You'll learn far more than you ever dreamt. If a webserver isn't that important to your success/failure, use a battle-worn webserver - apache etc

Well, it is true. He's validating it with a grammar, not some adhoc parser implementation.

Can you explain why 'validating it with grammar' is better than 'some adhoc parser'?

IMHO a webserver is one of the things you want to be relaxed, laid back, and basically not care if the client gets things wrong. Just serve up what they look like they wanted.

Many of the security attacks on Apache (and other servers) were based on invalid requests.

The original Mongrel was known for its powerful request handler that didn't let many of the same security attacks through. In fact, in the Ruby world, many other non-mongrel web servers used the mongrel handler for that very reason.

As long as you don't get into the algorithms it's pretty simple.

A hand-written HTTP parser is kind of like writing a "black-list" of what the server rejects. Since there's no algorithm backing it the only thing you can do is list out all the things you can think of or have run into that are "wrong".

Using a parser (well lexer really) like Ragel I can make something that's relaxed, but it's more of a white-list of what it accepts. The algorithm explicitly says this particular set of characters in this grammar is all that I'll answer to.

If you then write the grammar so that it handles 99% of the requests you run into in the wild, you get the same relaxed quality as a hand written one, but it explicitly drops the 1% that are invalid or usually hacks.

This is also the same parser that powers a large number of web servers in multiple languages, so it's proven to work.
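A toy contrast between the two approaches, in Python. The accept-list side describes the only request lines it will answer to, using a drastically simplified request-line grammar (an assumption, nowhere near the grammar Mongrel's Ragel parser encodes); the reject-list side enumerates bad inputs someone happened to think of. Python's `re` is a backtracking engine rather than a Ragel-generated state machine, but the pattern describes a regular language, the same class of grammar Ragel compiles.

```python
import re

# Accept-list: a hypothetical, much simplified request-line grammar.
#   request-line = method SP request-uri SP "HTTP/" DIGIT "." DIGIT
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"         # method name characters
URI = r"[A-Za-z0-9\-._~:/?\[\]@!$&'()*+,;=%]+"  # no spaces, no control chars
REQUEST_LINE = re.compile(rf"({TOKEN}) ({URI}) HTTP/(\d)\.(\d)")


def parse_request_line(line: str):
    """Return (method, uri, (major, minor)), or None if the line is outside the grammar."""
    m = REQUEST_LINE.fullmatch(line)
    if m is None:
        return None
    method, uri, major, minor = m.groups()
    return method, uri, (int(major), int(minor))


# Reject-list: every bad pattern someone has run into so far. Anything unlisted passes.
KNOWN_BAD = ("../", "\x00", "<script")


def reject_list_ok(line: str) -> bool:
    return not any(bad in line for bad in KNOWN_BAD)


# A header-injection attempt that nobody put on the reject-list:
smuggled = "GET /x\r\nHost: evil.example HTTP/1.1"
print(parse_request_line("GET /index.html HTTP/1.1"))  # ('GET', '/index.html', (1, 1))
print(parse_request_line(smuggled))                    # None: CR/LF is not in the grammar
print(reject_list_ok(smuggled))                        # True: the reject-list lets it through
```

Note that the accept-list still accepts `GET /../etc/passwd HTTP/1.1`, since it is syntactically valid: checking syntax doesn't replace checking what a request means.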

My mind is still boggling over how you make a simple HTTP request parser so complex sounding.

Take a look at the Mongrel/Mongrel2 Ragel grammar and compare it to a hand-written request parser. You might be surprised which is complex.

Yeah I just did. Mongrel2 looks overly complex. But then it is C...

Yeah, that is what Microsoft did (does). The result is most of the requests "look like" a desire to serve up viruses or spam.

A grammar is theoretically provable (yes, that is a double entendre). An ad-hoc implementation is not provable and exhaustively testing its validity is unrealistic for anything but trivial grammars.

And in layman's terms?...

Sorry but I'm even more confused now. What are we proving?

"The result is most of the requests "look like" a desire to serve up viruses or spam."

I have no idea what you mean by that.

HTTP is a trivial grammar. The parser is the simple bit. What you do with the headers and how you respond to them is the more interesting bit.

Why would rejecting invalid requests be desirable? Why not just serve up what we think they want? (Of course there's levels of 'invalid'. Reject the crazies, but allow some).

With a parser that implements a grammar, you can prove that (a) it accepts every string that is valid as defined by the grammar and (b) it rejects every string that is invalid. The specifying of a grammar is relatively straight-forward (hopefully). Proving that an ad-hoc parser does (a) and (b) is nearly impossible.

Ad-hoc parsers can be shown to accept all "OK" strings that somebody used to test the parser and can be shown to reject all "not OK" strings that somebody used to test the parser.[1] "The problem with idiots (and black-hats) is that they are so ingenious." The only way to prove that an ad-hoc parser is truly correct is to run all possible strings through it, complete with a priori knowledge of which strings are OK and which are to be rejected. This is an O(infinite) problem (i.e. the halting problem http://en.wikipedia.org/wiki/Halting_problem).

Guessing intent is a wormhole: how close does the request need to be? What if you guess wrong?

The combination of ad-hoc parsers with guessing intent is a potent way to introduce security flaws in your program. In the case of a web server, the "attack surface" is the whole internet, i.e. there is a huge number of idiots and black-hats that could potentially attack your program.

[1] War story: in a previous life, the company decided they needed to have a custom code standards checker program (a result of a chain of four or five decisions, all of them really stupid, but that is a different war story). They contracted out the creation of the program, complete with a requirement that the contractee company write the test cases (fox in the hen house). The program was a POS (how did you know that was coming???).

When I looked at the test cases: they had one "positive" (i.e. catches a "bad" construct) test case and NO "negative" (i.e. does not have false positive) test cases. As a result, when run on real code, the "standards checking" program was actively sabotaging good code!
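To put a number on "run all possible strings through it": even counting only byte strings up to 100 bytes long (an arbitrary cap, far shorter than a typical full HTTP request), the number of distinct inputs has 241 digits. A quick check of that arithmetic:

```python
def inputs_up_to(max_len: int, alphabet: int = 256) -> int:
    """Number of distinct byte strings of length 0..max_len (a geometric series)."""
    return sum(alphabet ** n for n in range(max_len + 1))


print(len(str(inputs_up_to(100))))  # 241
```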

Here's why I dislike it. You shouldn't rely on the HTTP request parser to save you if you have security issues elsewhere.

The HTTP parser is simple enough to not have any concerns in itself if written properly.

You should fix the security issues.

> You shouldn't rely on the HTTP request parser to save you if you have security issues elsewhere

This doesn't make sense. Why should a particular piece of the application not be coded with security in mind?

> You should fix the security issues.

One part of this is sanitizing user input. Why would you not do this as early as possible?

Because it's inefficient unnecessary overhead for static requests.

The place to block application specific hacky looking requests isn't in the general HTTP request parser. It's in the 'application specific' stuff.

The headers and such for even the most static requests still get used all over — dispatch, caches, logging, etc. The overhead is minuscule, especially compared to a hand-rolled parser that's literate enough to be maintainable.

And the purpose isn't to "block application specific hacky looking requests", it only does that as a side-effect — this isn't some inane IDS bullshit sold to PHBs. It's not looking for exploit signatures, it just sanitizes all input as a consequence of correctness.

It's quite simple really. Do you like your compiler? Or would you rather write code and hope for the best? Compilers work because they have a formal grammar of what is and is not acceptable in the programming language. This same principle is being applied to handling web requests. We have a standard - HTTP - and any requests that don't conform to the protocol are immediately rejected by Mongrel2. Since many attacks against web servers involve sending improper web requests, this sort of approach simply rejects those requests and doesn't even begin to process them. This certainly doesn't prevent Mongrel2 from implementing proper security at other appropriate places in the code. It simply stops a whole lot of potential exploits before they start.

Yeah that's not really comparable.

HTTP requests come from millions of different browsers. Some with bugs, some with idiot creators, etc etc.

My point was that an HTTP request parser is trivial to write correctly. What you do with the headers and request later on are where sometimes you need to be careful.

TBH though, I think I'm just in a different world to all of this mongrel stuff.

Mongrel2 uses a Ragel-based parser (Zed already used that technique with the first Mongrel).

very few proxies support pipelining. You shouldn't be surprised at all. Try implementing pipelining in proxies and you'll figure that it isn't worth the effort.

I'm missing the boat on how it's more language agnostic than other web servers (I'm most familiar with Apache)?

EDIT: I see now that it's "language agnostic" based on using ZeroMQ to shuttle request/responses between Mongrel2 and a language with a ZeroMQ library. Sounds cool, but also makes me wonder exactly how that's different from someone writing, say, mod_zeromq for Apache, and attaching handlers to ZeroMQ in the same way Mongrel2 does? Am I missing something?
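To make the ZeroMQ part concrete: as we read the Mongrel2 manual, the server pushes each request to a handler as one message of the form `UUID ID PATH HEADERS BODY`, where HEADERS (JSON) and BODY are netstrings (`LEN:payload,`), and the handler replies with `UUID LEN:ID ID ...,` followed by a space and the raw response, so one reply can target several connections. The sketch below only parses and builds those byte strings with the standard library; the sockets themselves (a PULL socket for requests and a PUB socket for replies in the stock examples) would come from whatever ZeroMQ binding your language has. Treat the exact layout as an assumption and check it against the manual for your version.

```python
import json
from dataclasses import dataclass


def netstring(payload: bytes) -> bytes:
    """Encode bytes as a netstring: b'LEN:payload,' (LEN counts bytes)."""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def parse_netstring(data: bytes) -> tuple[bytes, bytes]:
    """Split one netstring off the front of data; return (payload, remainder)."""
    length_str, rest = data.split(b":", 1)
    length = int(length_str)
    if rest[length:length + 1] != b",":
        raise ValueError("netstring did not end in ','")
    return rest[:length], rest[length + 1:]


@dataclass
class Request:
    sender: bytes    # UUID of the server that sent the request
    conn_id: bytes   # connection ID to address the reply to
    path: bytes
    headers: dict
    body: bytes


def parse_request(msg: bytes) -> Request:
    """Parse 'UUID ID PATH HEADERS_NETSTRING BODY_NETSTRING' (assumed layout)."""
    sender, conn_id, path, rest = msg.split(b" ", 3)
    raw_headers, rest = parse_netstring(rest)
    body, _ = parse_netstring(rest)
    return Request(sender, conn_id, path, json.loads(raw_headers), body)


def build_response(sender: bytes, conn_ids: list[bytes], payload: bytes) -> bytes:
    """Build 'UUID LEN:ID ID ..., PAYLOAD'; one reply can target many connections."""
    return sender + b" " + netstring(b" ".join(conn_ids)) + b" " + payload


# Round trip with a hand-built request (all values are made up for illustration).
msg = b"UUID-1 42 /chat " + netstring(b'{"METHOD":"GET"}') + netstring(b"")
req = parse_request(msg)
reply = build_response(req.sender, [req.conn_id],
                       b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi")
```

Because the wire format is just strings and netstrings, any language with a ZeroMQ binding and a JSON parser can play the handler role.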

Apache's architecture is such that it's very bound to "files" and a strict request/response message pattern. That makes things like long polling, async sockets, and even streaming say ICY (check our mp3stream demo) really difficult.

In Mongrel2 it's kind of like everything's long poll or an async socket. That lets you do a ton of very cool things you can't do easily in Apache or other web servers. Sure you could hack them in, but it's a nightmare.

I think I get it, thanks. Mongrel2 is better at being language agnostic, by virtue of intentional design. Looking forward to trying it out!

Mongrel2 specializes. This means way less cruft. It should also be much more performant in this role than Apache [citation needed], because Apache has sooooo much other stuff.

Congrats on the release, Zed!

Thanks man. Can't wait to get to the next level with it.

I believe the potential of Mongrel2 lies in real-time web applications.

Want to write a backend in PHP for a real-time chat? That is very difficult with Apache and mod_php. With Mongrel2 and ZeroMQ, it is almost trivial.

Mongrel2's asynchronous pub/sub networking paradigm opens up possibilities for real-time communication to browsers. When websockets get real, Mongrel2 may very well be the way we all start using them.

Where Mongrel2 has some ground to cover is handling traditional server-generated pages. Want to run an existing PHP application with a framework like CodeIgniter? You can't do it from Mongrel2 without proxying to nginx or apache (yet). Hopefully this is something that will come along in Mongrel2 v2.0.

I have an old idea regarding webservers: peer-to-peer file transfer. The server would not store the file on disk, only the last transferred block in memory. Rate control would be done at TCP level: while the block is not transferred by the receiver, the server would not read again from the sender.

I have written several HTTP servers myself, and I know it's doable, but the world does not need another webserver that only has this distinguishing feature.

My question is: can all of this be done with Mongrel2?
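The core of the relay idea doesn't depend on Mongrel2 at all. A minimal Python sketch with plain blocking sockets (the 64 KiB block size is an arbitrary assumption): it never reads the next block from the uploader until the current one has been handed to the downloader's socket, so a slow downloader ends up stalling the uploader through ordinary TCP flow control, with nothing written to disk. Kernel socket buffers add some slack on top of the single in-memory block, so the coupling is looser than "one block" suggests.

```python
import socket

BLOCK_SIZE = 64 * 1024  # assumption: size of the single in-memory block


def relay(sender: socket.socket, receiver: socket.socket,
          block_size: int = BLOCK_SIZE) -> int:
    """Copy bytes from sender to receiver, holding at most one block in memory.

    sendall() blocks while the receiver's socket buffer is full; while it
    blocks we stop calling recv(), so the sender's TCP window fills up and
    the sender is throttled to the receiver's pace. Returns bytes relayed.
    """
    total = 0
    while True:
        block = sender.recv(block_size)
        if not block:  # sender closed its side: transfer finished
            return total
        receiver.sendall(block)
        total += len(block)
```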

If you're not immediately grokking the significance of Mongrel2 (as I did not), be sure to check out the documentation's Introduction:


The features sound cool and significantly evolved past current webservers. And the documentation is extensive and so far, an enjoyable read - now that's impressive. :)

So is this designed to replace apache/nginx, or should I throw it behind an nginx upstream still and let nginx serve the files?

it can do either, but I think it's designed to do the former.

Sweet stuff. Arnor's got a new toy! Good job Zed.

Good work Zed!

I have no doubt that for what it is this is a well designed piece of software, but I'm going to stick my neck out and say that purely for reasons of scalability a web server written in Ruby just doesn't seem like a sensible choice.

Mongrel2 isn't written in Ruby. It's written in C: http://mongrel2.org/doc/tip/docs/manual/book.wiki#x1-110002

In which case I am remiss, although at time of writing it says here http://en.wikipedia.org/wiki/Mongrel_(web_server) "Mongrel is an open-source HTTP library and web server written in Ruby by Zed Shaw."

Mongrel and Mongrel2 are two very different projects that have very little beyond their name, author and the fact that they both are servers in common. They share no code and have very different designs and goals.

I believe they share some code (http parser)

which is written in Ragel and compiled to a C state machine, if I remember correctly

Well Ragel supports Ruby code output as well (it can generate any of C, C++, Objective-C, D, Java and Ruby). But I'd be surprised if Mongrel used that.

Some people do that. It actually was used in jruby, and I think a couple projects generated the raw ruby rather than the C version. Uhm, I want to say Rubinius?

i don't doubt that, why would ragel generate Ruby code if nobody used it? I just doubt mongrel used ruby ragel-generated code.

Mongrel 1.x is not related to Mongrel2.

Mongrel was written by Zed in Ruby. Mongrel2 is a new project.

Mongrel2 is a rewrite to be a scalable webserver that's language neutral. Think Apache (without all the pain).

/this note is me sharing what I've learned from zed's tweets, I've not played with the software yet, please correct if wrong.

Twitter used to use Mongrel, but replaced it with Unicorn (uh, another Ruby based server :))


Unicorn is based on the mongrel source

Well they're using Scala in production now, so if they wanted to use Mongrel2 messaging, they probably can.

They're using it for back end stuff, the front is still Ruby. To my understanding.

Is this a quote from someone else?

Also, "does seem like a sensible choice"?

Apart from that, it's completely written in C and uses language-agnostic ZeroMQ to pass "requests" back to handlers that can be written in any language with a ZeroMQ library (pretty much all of them).

I thought it was mainly C?
