

Diesel: How Python Does Comet - z8000
http://dieselweb.org
- async I/O using epoll on Linux
- for Python 2.6
- uses Python's generators 
- is not Twisted nor a twister or a Tornado
======
apgwoz
Let me be the first to get out of the way all of the complaints:

1. Why didn't they just use Twisted?

2. Why didn't they predict Tornado and make it better?

3. Yeah, but (tornado|twisted|my half-baked implementation) is still faster at serving 'hello world' than diesel.

---

I for one welcome this new breed of choice in this space. I'm interested in
"real-timing" an application I'm currently working on, so having more choices
as to which way to implement it is great.

~~~
jerf
Actually, "Why not Twisted?" is still a good question. If they have a good
reason, let's hear it. If you want to choose between this and Twisted, that's
information you need.

One huge Twisted advantage is that you can have your async comet code
running in the same process as other Twisted services, which can be very
useful for many applications you might want Comet for. What's the payoff for
using this instead?

BTW: These are straight questions, not sarcastic questions. I am perfectly
willing to extend benefit of the doubt and assume there are good answers. I
just don't know what they are, and I'd like to know, as I am potentially
looking to use some stuff like this myself in the near future.

~~~
mcav
Twisted creates confusing, bloated code.

Every time you have to do asynchronous work, you have to mentally follow the
chain of deferreds to ensure you understand exactly what code will be called
at what time. Coupled with its unpythonic, deep hierarchical API, Twisted
almost literally _twists_ your code into a jumbled blob.

Unless you need the existing protocol-specific APIs Twisted provides, stay
far, far away. Find something more lightweight, like this, or Eventlet, or
something else that lets your code remain concise and clear.

This comes from my experience writing a Twisted server for my startup, and
subsequently rewriting it when the Twisted version became far too difficult
and frustrating to maintain.

~~~
PhilChristensen
I'm sorry, but this post is just packed full of FUD.

I hear this all the time, "argh, Deferreds are so hard to understand," as if
there's some kind of magic going on in a Deferred. A Deferred is just a way to
encapsulate two callback chains, one for errors and one for successful
results.

If you're writing any kind of significantly complicated asynchronous process,
what is the difference between following a bunch of Deferreds and following a
bunch of callbacks??
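
To make the point concrete, here is a toy sketch of the idea, not Twisted's actual implementation (the names `MiniDeferred`, `add_callbacks`, etc. are mine): a Deferred just holds a pending result plus a chain of success/error handlers, and runs the chain when the result arrives.

```python
class MiniDeferred:
    """Toy illustration of the Deferred idea (NOT Twisted's real code):
    hold pairs of (success, error) handlers until a result or an error
    eventually arrives, then run the chain."""

    def __init__(self):
        self._chain = []        # list of (on_success, on_error) pairs
        self._result = None
        self._is_error = False
        self._fired = False

    def add_callbacks(self, on_success, on_error=None):
        self._chain.append((on_success, on_error))
        if self._fired:
            self._run()
        return self

    def callback(self, result):      # success path
        self._result, self._fired = result, True
        self._run()

    def errback(self, error):        # error path
        self._result, self._is_error, self._fired = error, True, True
        self._run()

    def _run(self):
        while self._chain:
            on_success, on_error = self._chain.pop(0)
            handler = on_error if self._is_error else on_success
            if handler is None:
                continue
            try:
                self._result = handler(self._result)
                self._is_error = False
            except Exception as e:
                self._result, self._is_error = e, True


d = MiniDeferred()
d.add_callbacks(lambda v: v.upper())
d.callback("hello")     # later, when the async result arrives
print(d._result)        # HELLO
```

Whether that counts as "magic" or not is exactly the disagreement in this thread.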

Asynchronous coding is _hard_. There's no framework that is going to make that
go away. Diesel looks nice enough, but although it shows promise, there aren't
enough examples of hardcore usage to make a meaningful comparison. An
asynchronous I/O framework that has nothing but HTTP and echo examples doesn't
even scratch the surface of the kinds of apps that require the advanced
asynchrony available in Twisted.

~~~
moe
_If you're writing any kind of significantly complicated asynchronous process,
what is the difference between following a bunch of Deferreds and following a
bunch of callbacks??_

The difference is that in co-routine based code (such as concurrence or
eventlet) there are no callbacks. Co-routines are all about message passing
which leads to much cleaner and much _less_ code in most applications.

It's not uncommon to see 40%+ boilerplate callback-handling code in a Twisted
app. None of that is needed in a coro environment.

 _Asynchronous coding is hard._

Actually, the established patterns for implementing asynchronous code are hard.
You get to choose between threading hell (deadlocks, races) and callback hell
(code bloat). I have no idea why these abstractions grew so popular; my guess
would be because they are "closest to the metal".

But they are not without alternatives; co-routines, actor-based concurrency
and erlang exist - all of which enable fairly straightforward "Do what I mean"
development.
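
The contrast moe describes can be sketched in plain Python. This is a generic illustration, not any particular framework's API; `FakeTransport`, `read_line`, and the tiny `run` trampoline are all invented names.

```python
class FakeTransport:
    """Hypothetical stand-in for a socket wrapper (not a real framework API)."""
    def __init__(self, incoming):
        self.incoming = list(incoming)
        self.sent = []

    def read_line(self, callback=None):
        if callback is not None:
            callback(self.incoming.pop(0))   # "later", invoked inline here
        else:
            return ("read_line",)            # wait-token for the scheduler

    def write(self, data):
        self.sent.append(data)


# Callback style: the logic is split across nested handlers.
def echo_cb(t, on_done):
    def on_line(line):
        t.write("you said: %s" % line)
        on_done()
    t.read_line(callback=on_line)


# Coroutine style: the same logic reads top to bottom.
def echo_coro(t):
    line = yield t.read_line()
    t.write("you said: %s" % line)


def run(coro, t):
    """Minimal trampoline: resume the generator with each 'I/O' result."""
    try:
        coro.send(None)                      # start; yields the read request
        while True:
            coro.send(t.incoming.pop(0))     # feed it the next line
    except StopIteration:
        pass


t = FakeTransport(["hello\n"])
run(echo_coro(t), t)
print(t.sent)   # ['you said: hello\n']
```

With one read this looks like a wash; chain three or four dependent reads and the callback version nests (or fragments into named handlers) while the coroutine version stays linear.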

------
simonw
Interesting, this is the first Python async/comet library I've seen to take
advantage of generator coroutines:

    
    
        def echo(remote_addr):
            their_message = yield until_eol()
            yield "you said: %s\r\n" % their_message.strip()
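
What makes this style work is that `yield` hands a value (like the object returned by `until_eol()`) to a scheduler, which resumes the generator when the corresponding I/O completes. Here is a rough sketch of that dispatch idea with invented token names; diesel's real internals will differ.

```python
class UntilEol:
    """Hypothetical wait-token: 'resume me when a full line arrives'."""
    pass


def until_eol():
    return UntilEol()


def echo(remote_addr):
    their_message = yield until_eol()
    yield "you said: %s\r\n" % their_message.strip()


def drive(gen, lines, out):
    """Toy scheduler: interpret each yielded value as either a wait-token
    (feed the next complete line back in) or as bytes to send."""
    value = gen.send(None)                    # start the generator
    try:
        while True:
            if isinstance(value, UntilEol):
                value = gen.send(lines.pop(0))   # the 'I/O' completed
            else:
                out.append(value)                # data to write to the socket
                value = gen.send(None)
    except StopIteration:
        pass


out = []
drive(echo(("127.0.0.1", 1234)), ["hello\n"], out)
print(out)   # ['you said: hello\r\n']
```

The appeal is that `echo` itself contains no callbacks at all; suspension points are just `yield` expressions.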

~~~
z8000
There are actually quite a number of (yes, relatively dead) projects that do
this or something very similar: eventlet, gevent, cogen, concurrence,
weightless, ...

~~~
moe
Just for the record, eventlet and concurrence are not dead, I even used the
latter in production.

They both have a fairly small audience but when you look at their mailing
lists you'll see activity.

------
jbellis
When I think of "comet" I think of "something that gives my apps the ability
'push' to browsers [even if it's really long polling under the hood]." Did I
miss the browser integration story here, or are we calling all asynchronous
apis "comet" now?

~~~
jamwt
Yeah, we didn't get a chance to change that headline before the project leaked
out there. The next two phases of the project will get higher-level and build
out our "real" goal: a bitchin' comet framework.

But you're right, it doesn't have much to do with comet at this point.

------
cvg
I just think it's funny how they compare it to the one-week-old Tornado rather
than Twisted. Way to jump on the bandwagon. I predict a few more async
libraries going open source in the not-so-distant future.

------
z8000
It does seem a bit messy to read, at least compared to eventlet/gevent, which
do some nice monkey-patching of the socket API.
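
For context: eventlet's `eventlet.monkey_patch()` swaps the blocking stdlib networking functions for cooperative versions, so unmodified code becomes green-thread-aware. A generic, pure-Python sketch of the mechanism (the module and function names here are made up, not eventlet's internals):

```python
import types

# A stand-in "stdlib" module with a blocking call.
fake_socket = types.ModuleType("fake_socket")
fake_socket.recv = lambda: "blocking recv"


def green_recv():
    """Cooperative replacement that would yield to the event loop."""
    return "green recv"


def monkey_patch(module):
    """Replace blocking functions with cooperative ones, in place.
    Code that already did 'import fake_socket' and calls
    fake_socket.recv() is redirected without any change on its side."""
    module.recv = green_recv


print(fake_socket.recv())   # blocking recv
monkey_patch(fake_socket)
print(fake_socket.recv())   # green recv
```

The upside is that existing libraries work unchanged; the downside is exactly what the next comment objects to.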

~~~
japherwocky
A lot of Pythonistas consider monkey-patching to be in exceptionally poor taste.

~~~
deno
Only if it's used to unnecessarily change language or API behavior, not when
it's a justified performance improvement.

------
axod
When will people start to understand that req/s is not the full picture of how
well a Comet webserver scales :/

~~~
jamwt
Hi, author of diesel here.

I can only assume you're probably referring to some of these things:

1. Response latency, especially at the ~99th percentile, under load

2. Memory usage per connection with many idle connections

3. Scalability wrt. data sharing, backing persistent data, replication/redundancy strategies, etc.

I'm not sure if you were referring to us, the diesel authors, when you
indicated that someone didn't understand something about Comet scaling, but I
assure you that diesel handles 1 and 2 quite nicely, as do most sensibly
written things based on epoll, kqueue, etc. The benchmark page uses 1k
concurrent connections, and diesel does well with more than that, too.

If you're referring to item 3, I'm afraid diesel doesn't tackle that element
of scalability yet. It's more of an I/O library, not really a framework with
aspirations of providing high availability. We have some plans and quite a bit
of mostly-working code that implements a paxos-based framework for achieving
those goals as well, but release of that part of the framework is some months
out.

Writing good unit tests for this stuff is a higher priority--unit testing
async code is a PITA. We're probably going to need to steal ideas from
twisted.trial or something.

Thanks for checking out diesel.

~~~
moe
_3. Scalability wrt. data sharing, backing persistent data,
replication/redundancy strategies, etc_

...

 _We have some plans and quite a bit of mostly-working code that implements a
paxos-based framework_

Don't. Seriously, don't go that route, it's a huge waste of time. Instead:
Keep your server shared-nothing and make it interface with as many different
breeds of Message Queues on the backend as possible (AMQP, Stomp etc.).

That's what any non-trivial comet app needs to do anyway, and any kind of
intelligence inside the comet server beyond dispatching between the backend
queue and the browser only gets in the way.

The canonical setup is:

    
    
        Browser <-> Diesel <-> Message Queue
    

Diesel should be able to maintain connections to multiple Brokers on the
backend in parallel, for failover and load distribution - and that's it.
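
A rough sketch of the failover behavior moe is describing. No real AMQP/STOMP client is used; `FakeBroker`, `BrokerDown`, and `next_message` are hypothetical names standing in for broker connections.

```python
class BrokerDown(Exception):
    pass


class FakeBroker:
    """Hypothetical stand-in for an AMQP/STOMP broker connection."""
    def __init__(self, name, alive=True):
        self.name, self.alive = name, alive

    def get_message(self):
        if not self.alive:
            raise BrokerDown(self.name)
        return "msg-from-%s" % self.name


def next_message(brokers):
    """Try brokers in order and skip dead ones. The comet server stays
    shared-nothing: it only dispatches between broker and browser."""
    for b in brokers:
        try:
            return b.get_message()
        except BrokerDown:
            continue            # fail over to the next broker
    raise RuntimeError("all brokers down")


brokers = [FakeBroker("a", alive=False), FakeBroker("b")]
print(next_message(brokers))    # msg-from-b
```

Load distribution would rotate the starting broker instead of always trying them in the same order, but the shape is the same.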

~~~
jamwt
Right, but those message queues themselves implement Paxos or have some master
election scheme, etc (if they promise replication/failover). You have to get
that durability somewhere.

So, we're internalizing that queueing behavior into diesel.

(You can tell us not to do it, but it's pretty much done. :-)

~~~
moe
_You have to get that durability somewhere._

Yes, and the message broker is pretty much the only place where it makes
sense.

 _So, we're internalizing that queueing behavior into diesel._

An exercise in futility.

The main application dealing with the actual messages will still need to know
which Diesel instance to talk to, in order to reach a particular subscriber.

How do you solve that?

~~~
jamwt
I'm not sure what main application you're referring to.

Every diesel node is an instance of an application that is willing to provide
a service that acts as the master/router for a certain class of messages.
Paxos ensures there is only one master elected for every message class, and
this router is re-elected should it go down. The routing table of who is
master for which class of messages is kept in sync across all nodes.

It's almost exactly like registered processes in an erlang cluster--reserving
the exclusive right to handle certain messages. Simpler master/slave
relationships can take over from there.
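
The registered-process analogy can be sketched as a routing table that maintains one master per message class. Paxos itself is omitted here; this toy registry, with invented names, only shows the invariant being kept when a node dies.

```python
class Registry:
    """Toy cluster-wide routing table: at most one master node per
    message class, with re-election when the master disappears."""

    def __init__(self):
        self.masters = {}           # message class -> node name

    def register(self, msg_class, node):
        if msg_class in self.masters:
            raise ValueError("%s already has a master" % msg_class)
        self.masters[msg_class] = node

    def node_down(self, node, candidates):
        """Re-elect a master for every class the dead node owned.
        In the real system this election is where Paxos would run,
        so that every node agrees on the same new master."""
        for msg_class, owner in list(self.masters.items()):
            if owner == node:
                self.masters[msg_class] = candidates[0]


r = Registry()
r.register("chat", "node-1")
r.register("presence", "node-2")
r.node_down("node-1", candidates=["node-3"])
print(r.masters["chat"])    # node-3
```

The hard part, of course, is not this table but keeping every node's copy of it consistent, which is what the consensus machinery is for.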

~~~
moe
_Every diesel node is an instance of an application_

You mean one that the user writes (starting with "import diesel")? If so,
then I'd think such tight coupling is a bad idea. Why prevent non-Python
users from using Diesel as a comet broker? Why even force Python users to
tightly couple their app with Diesel when it'd be so much easier to abstract
out the interface they need?

 _provide a service that acts as the master/router for a certain class of
messages. Paxos ensures there is only one master elected for every message
class, and this router is re-elected should the router go down. The routing
table of who is master for what class of messages is kept in sync across all
nodes._

If you're going to all these lengths then why on earth couple it to a comet
server? All these features belong in a message broker, not in a protocol
endpoint. You'd make many people happy by building a STOMP or AMQP broker with
these features, even people that are not interested in comet at all.

Wrt your other reply: no, we don't agree. But at least I have a better idea of
what you have in mind now, thanks for that. Also, this is all of course just
my humble opinion. It's your project and you're free to overengineer at your
peril ;-)

~~~
jamwt
Okay, the "comet" bit, I can see that--why conflate those?

To clarify--we're going to build a comet framework _on_ diesel, but diesel
itself is more of a general async I/O system with cluster messaging features.
We intend it to be applicable for building arbitrary networked applications,
using patterns similar to what you'd do in erlang--message passing.

We just _happen_ to be focusing on building a comet framework first and
foremost on it. That will probably be called "dieselweb".

So, the "comet server" portion of our framework may not, in fact, utilize any
of this message-broker stuff. But other components might. We're actually going
to try to build something fairly unique here, but I don't have all the details
ready to put out there yet.

~~~
moe
Good luck :-)

