Hacker News
Diesel: How Python Does Comet (dieselweb.org)
44 points by z8000 on Sept 23, 2009 | 29 comments

Let me be the first to get out of the way all of the complaints:

1. Why didn't they just use Twisted?

2. Why didn't they predict tornado and make it better?

3. Yeah, but (tornado|twisted|my half-baked implementation) is still faster at serving 'hello world' than diesel.


I for one welcome this new breed of choice in this space. I'm interested in "real-timing" an application I'm currently working on, so having more choices as to which way to implement it is great.

Actually, "Why not Twisted?" is still a good question. If they have a good reason, let's hear it. If you want to choose between this and Twisted, that's information you need.

One huge Twisted advantage is that your async comet code can run in the same process as other Twisted services, which can be very useful for many applications you might want Comet for. What's the payoff for using this instead?

BTW: These are straight questions, not sarcastic questions. I am perfectly willing to extend benefit of the doubt and assume there are good answers. I just don't know what they are, and I'd like to know, as I am potentially looking to use some stuff like this myself in the near future.

It's a styles thing.

Twisted is a very capable system, very well written, with all kinds of capabilities that diesel doesn't have and never will. So it's not a capabilities thing.

It's written by a team of guys who are crazy smart, and who know async better than most people on the planet, me included. So it's not a competence thing.

The performance difference between twisted and most other Python async libraries isn't going to be significant. We're all mostly benchmarking epoll or select or whathaveyou, plus a thin layer of frames to get from there to your application code. So it's really not a performance thing either, though hackers love to talk about this anyway.

It's a style thing. Twisted doesn't feel "Pythonic" to me. It doesn't have that succinctness that makes you say "shit, I'm done already?" It's got lots of big-A architecture that's very Correct, but in practice, leaves you wondering why you're being burdened with it when that rainy day when it pays dividends never comes. Purity is allowed to triumph over practicality at every turn.

That's one man's take. I don't claim it's the universal truth, and Twisted has many very happy users. But I suspect I'm not alone in this assessment.

That being said, I will relish the day when someone can do _something_ in the Python async space and not immediately be thrust into the ring as the latest combatant in the twisted-vs-the-new-guy debate. New guys never stop coming, and that's a wonderful thing for the evolution of computing and programming.

Twisted creates confusing, bloated code.

Every time you have to do asynchronous work, you have to mentally follow the chain of deferreds to ensure you understand exactly what code will be called at what time. Coupled with its unpythonic, deep hierarchical API, Twisted almost literally twists your code into a jumbled blob.

Unless you need the existing protocol-specific APIs twisted provides, stay far, far away. Find something more lightweight, like this, or Eventlet, or something else that lets your code remain concise and clear.

This comes from my experience writing a Twisted server for my startup, and subsequently rewriting it when the Twisted version became far too difficult and frustrating to maintain.

I'm sorry, but this post is just packed full of FUD.

I hear this all the time: "argh, Deferreds are so hard to understand," as if there's some kind of magic going on in a Deferred. Deferreds are just a way to encapsulate two callback chains, one for errors and one for successful results.

If you're writing any kind of significantly complicated asynchronous process, what is the difference between following a bunch of Deferreds and following a bunch of callbacks??
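For readers who haven't used Twisted, here's a minimal sketch of the two-callback idea. This is an illustration only, not Twisted's actual (much richer) Deferred class, and the names are hypothetical:

```python
# A Deferred, at heart: a placeholder for a result that routes each
# value through a chain of success callbacks, falling over to an
# error callback when something raises. Toy sketch, not Twisted's API.

class TinyDeferred:
    def __init__(self):
        self.callbacks = []   # list of (on_success, on_error) pairs

    def add_callbacks(self, on_success, on_error=None):
        self.callbacks.append((on_success, on_error))
        return self

    def callback(self, result):
        # Fire the chain: each success callback's return value feeds the next.
        for on_success, on_error in self.callbacks:
            try:
                result = on_success(result)
            except Exception as e:
                if on_error:
                    result = on_error(e)
                else:
                    raise
        return result

d = TinyDeferred()
d.add_callbacks(lambda r: r.upper())
d.add_callbacks(lambda r: "you said: " + r)
print(d.callback("hello"))  # -> you said: HELLO
```

Twisted's real Deferred adds chaining of Deferreds, error propagation rules, and more, but the core really is just this pair of chains.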

Asynchronous coding is hard. There's no framework that is going to make that go away. Diesel looks nice enough, but although it shows promise, there's not enough examples of hardcore usage to make a sufficient comparison. An asynchronous IO framework that has nothing but HTTP and echo examples doesn't even scratch the surface of the kinds of apps that require the advanced asynchrony available in Twisted.

If you're writing any kind of significantly complicated asynchronous process, what is the difference between following a bunch of Deferreds and following a bunch of callbacks??

The difference is that in co-routine based code (such as concurrence or eventlet) there are no callbacks. Co-routines are all about message passing, which leads to much cleaner code and much less of it in most applications.

It's not uncommon to see 40%+ boilerplate callback-handling code in a twisted app. Those are not needed in a coro-environment.
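To make the contrast concrete, here is the same "read a line, then reply" flow in both styles. This is toy code with hypothetical names, not diesel's or concurrence's real API:

```python
# Callback style: the flow is inverted -- you register a continuation
# and the framework calls it later. State must be threaded by hand.
def echo_callback_style(write):
    def on_line(their_message):                  # invoked when a line arrives
        write("you said: %s\r\n" % their_message.strip())
    return on_line

# Coroutine style: the same logic reads top to bottom; yield hands
# control back to the scheduler until the I/O completes.
def echo_coro():
    their_message = yield "read_line"            # "wait" for a line
    yield "you said: %s\r\n" % their_message.strip()

# A stand-in for the scheduler: drive the generator with one line.
def run_echo(coro, line):
    g = coro()
    next(g)                  # run to the first yield (the read request)
    return g.send(line)      # resume the coroutine with the "I/O result"

print(run_echo(echo_coro, "hello\r\n"))  # -> you said: hello
```

The coroutine version keeps the request/response logic in one straight-line function, which is where the claimed boilerplate savings come from.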

Asynchronous coding is hard.

Actually, the established patterns for implementing asynchronous code are hard. You get to choose between threading hell (deadlocks, races) and callback hell (code bloat). I have no idea why these abstractions grew so popular; my guess would be because they are "closest to the metal".

But they are not without alternatives: co-routines, actor-based concurrency, and erlang all exist, and all enable fairly straightforward "do what I mean" development.

Eh. I disagree.

I've been mentioning Eventlet in a lot of my comments lately, not because I have any affiliation with it (I don't), but because it has made asynchronous coding so much easier that going back to deferreds or callbacks is a huge inconvenience.

My comments previously reflect equally on deferreds and callback-based asynchronous programming. Of course those make async programming hard.

It doesn't have to be that hard.

Interesting, this is the first Python async/comet library I've seen to take advantage of generator coroutines:

    def echo(remote_addr):
        their_message = yield until_eol()
        yield "you said: %s\r\n" % their_message.strip()

There are actually quite a number of (yes, relatively dead) projects that do this or something very similar: eventlet, gevent, cogen, concurrence, weightless, ...

Just for the record, eventlet and concurrence are not dead, I even used the latter in production.

They both have a fairly small audience but when you look at their mailing lists you'll see activity.

Twisted does it with inlineCallbacks.

When I think of "comet" I think of "something that gives my apps the ability 'push' to browsers [even if it's really long polling under the hood]." Did I miss the browser integration story here, or are we calling all asynchronous apis "comet" now?

Yeah, we didn't get a chance to change that headline before the project leaked out there. The next two phases of the project will get higher-level and build out our "real" goal: a bitchin' comet framework.

But you're right, it doesn't have much to do with comet at this point.

I just think it's funny how they compare it to the 1 week old tornado rather than twisted. Way to jump on the bandwagon. I predict a few more asynch libraries going open source in the not so distant future.

It does seem a bit messy to read, at least compared to eventlet/gevent, which do some nice monkey-patching of the socket API.

A lot of pythonistas consider monkey-patching to be in exceptionally poor taste.

Only if it's used to unnecessarily change language or API behavior - not for a justified performance improvement.

When will people start to understand that req/s is not the full picture of how well a Comet webserver scales :/

Hi, author of diesel here.

I can only assume you're probably referring to some of these things:

1. Response latency, esp. at the 99%-ish mark, under load

2. Memory usage per connection under many idle connections

3. Scalability wrt. data sharing, backing persistent data, replication/redundancy strategies, etc

I'm not sure if you were referring to us, the diesel authors, when you indicated that someone didn't understand something about Comet scaling, but I assure you that diesel does 1 and 2 quite nicely, as do most sensibly-written things based on epoll, kqueue, etc. The benchmark page is with 1k concurrent connections, and it does well with more than that, too.

If you're referring to item 3, I'm afraid diesel doesn't tackle that element of scalability yet. It's more of an I/O library, not really a framework with aspirations of providing high availability. We have some plans and quite a bit of mostly-working code that implements a paxos-based framework for achieving those goals as well, but release of that part of the framework is some months out.

Writing good unit tests for this stuff is a higher priority--unit testing async code is a PITA. We're probably going to need to steal ideas from twisted.trial or something.

Thanks for checking out diesel.

3. Scalability wrt. data sharing, backing persistent data, replication/redundancy strategies, etc


We have some plans and quite a bit of mostly-working code that implements a paxos-based framework

Don't. Seriously, don't go that route; it's a huge waste of time. Instead: keep your server shared-nothing and make it interface with as many different breeds of message queues on the backend as possible (AMQP, STOMP, etc.).

That's what any non-trivial comet app needs to do anyways and any kind of intelligence inside the comet-server beyond dispatching between the backend-queue and the browser only gets in the way.

The canonical setup is:

    Browser <-> Diesel <-> Message Queue
Diesel should be able to maintain connections to multiple Brokers on the backend in parallel, for failover and load distribution - and that's it.
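The failover part of that setup is simple to sketch: try each broker in turn until one accepts. Hypothetical names; a real deployment would use an actual AMQP/STOMP client library:

```python
# Connect to the first reachable broker in a configured list.
def connect_with_failover(brokers, connect):
    last_error = None
    for host, port in brokers:
        try:
            return connect(host, port)
        except OSError as e:
            last_error = e        # remember why, try the next broker
    raise RuntimeError("no broker reachable: %s" % last_error)

# Simulated connect for illustration: the first broker is down.
def fake_connect(host, port):
    if host == "mq1.example.com":
        raise OSError("connection refused")
    return (host, port)

conn = connect_with_failover(
    [("mq1.example.com", 61613), ("mq2.example.com", 61613)],
    fake_connect,
)
print(conn)  # -> ('mq2.example.com', 61613)
```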

Right, but those message queues themselves implement Paxos or have some master election scheme, etc (if they promise replication/failover). You have to get that durability somewhere.

So, we're internalizing that queueing behavior into diesel.

(You can tell us not to do it, but it's pretty much done. :-)

You have to get that durability somewhere.

Yes, and the message broker is pretty much the only place where it makes sense.

So, we're internalizing that queueing behavior into diesel.

An exercise in futility.

The main application dealing with the actual messages will still need to know which Diesel instance to talk to, in order to reach a particular subscriber.

How do you solve that?

I'm not sure what main application you're referring to.

Every diesel node is an instance of an application that is willing to provide a service acting as the master/router for a certain class of messages. Paxos ensures there is only one master elected for every message class, and a new router is elected should the current one go down. The routing table of who is master for which class of messages is kept in sync across all nodes.

It's almost exactly like registered processes in an erlang cluster--reserving the exclusive right to handle certain messages. Simpler master/slave relationships can take over from there.
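The "registered process" idea boils down to a routing table mapping each message class to exactly one master node. A toy sketch with hypothetical names; the hard part (Paxos-style agreement on this table across nodes) is elided entirely:

```python
# One master per message class, with naive claim and failover logic.
# A real cluster would reach consensus on every change to this table.

class RoutingTable:
    def __init__(self):
        self.master_for = {}    # message class -> node name

    def claim(self, msg_class, node):
        # First claimant wins; later claims are ignored.
        return self.master_for.setdefault(msg_class, node)

    def failover(self, dead_node, replacement):
        # Reassign every class the dead node was mastering.
        for msg_class, node in self.master_for.items():
            if node == dead_node:
                self.master_for[msg_class] = replacement

table = RoutingTable()
table.claim("chat.room.1", "node-a")
table.claim("chat.room.1", "node-b")      # too late, node-a keeps it
table.failover("node-a", "node-b")        # node-a dies, node-b takes over
print(table.master_for["chat.room.1"])    # -> node-b
```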

Every diesel node is an instance of an application

You mean one that the user writes (starting with "import diesel")? If so then I'd think such a tight coupling is a bad idea. Why prevent non-python users from using Diesel as a comet-broker? Why even force python-users to tightly couple their app with Diesel when it'd be so much easier to abstract out the interface they need?

provide a service that acts as the master/router for a certain class of messages. Paxos ensures there is only one master elected for every message class, and this router is re-elected should the router go down. The routing table of who is master for what class of messages is kept in sync across all nodes.

If you're going to all these lengths then why on earth couple it to a comet server? All these features belong in a message broker, not in a protocol endpoint. You'd make many people happy by building a STOMP or AMQP broker with these features, even people that are not interested in comet at all.

Wrt your other reply: No, we don't agree. But at least I have a better idea of what you have in mind now, thanks for that. Also, this is all of course just my humble opinion. It's your project and you're free to overengineer at your peril ;-)

Okay, the "comet" bit, I can see that--why conflate those?

To clarify--we're going to build a comet framework _on_ diesel, but diesel itself is more of a general async I/O system with cluster messaging features. We intend it to be applicable for building arbitrary networked applications using patterns similar to what you'd do in erlang--message passing.

We just _happen_ to be focusing on building a comet framework first and foremost on it. That will probably be called "dieselweb".

So, the "comet server" portion of our framework may not, in fact, utilize any of this message-broker stuff. But other components might. We're actually going to try to build something fairly unique here, but I don't have all the details ready to put out there yet.

Good luck :-)

So, maybe we agree? Because I'm not disputing (and never have) that the message broker is the place to do that. But you seem to be conceptualizing the message broker as a separate _service_ or application, and I'm saying it's just a function, a role. It doesn't matter so much what particular process (or group of processes) it runs in. That's what I meant by "internalize".

Thanks for the response, and good to see some of the other issues understood. It wasn't anything personal, I haven't checked out Diesel fully yet, I just saw the page at http://dieselweb.org/lib/benchmarks/ and thought it could do with exploring all the other scalability issues...

I guess I just have "Here's a graph showing req/s for twisted/tornado/etc etc" overload lately.

It's definitely nice to see more open source options in the Comet arena :)

Oh and http://shoptalkapp.com/ looks very interesting also

Writing unit tests should be your highest priority. I am far from a TDD advocate, and compared to many I am terrible at maintaining my own tests, but I could never justify using an IO package that contains zero tests.

IO errors are some of the most annoying errors to debug, and the whole point of using a preexisting IO package is to not have to worry about errors in it. I would say you need 100% test coverage of your core library before I could consider using Diesel for a major project.

Just my $0.02...
