I haven't tried Celluloid yet, but I've learned one lesson building things with PHP, Ruby, Python, Node, Java, Scala, etc. That lesson is to use tools that have a community of people around them who are using those tools to solve the same problems as you. The larger the community around a project, the more likely it is that someone else has already stumbled upon whatever bug or problem you have hit and found a solution, if one exists. Also, documentation tends to be better and easier to find for more popular languages and frameworks.
So, if you like busting out CRUD web apps, you probably should look at PHP and Zend/Symfony/etc. framework or Ruby on Rails or Python Django or Java Play. If you like doing single page JS apps you should look at Knockout, Backbone, and Ember. If you want to do more evented/parallel networked apps you should look at node, scala, clojure, go, and erlang because those communities care a lot about threading, evented, actor pattern type programming.
Evented/multicore/multithreaded programming is just not something that say PHP, Ruby, and Python have embraced as much as a community because it's not for the most part a problem that the average PHP, Ruby, and Python dev is trying to solve most of the time.
Node does not own evented I/O.
I didn't mean to say node owns evented I/O, just that their whole community embraces it.
Check out Akka, it's awesome: http://akka.io/
It's well documented, straightforward to use, robust and works seamlessly on top of many event loops (libev, libevent, Glib, Tk and more).
And I just need to look at AnyEvent::* namespace on CPAN to see what can be run with it - http://search.cpan.org/search?query=anyevent%3A%3A*&mode... (currently lists 630 modules)
I think DRb is pretty cool and has long been underutilized. That said, there are a few fundamental design problems with the way it works that I think DCell solves:
1. Distributed systems really need to be built on top of asynchronous protocols, and DRb is a direct mapping of a synchronous method dispatch protocol onto a distributed systems protocol. Similar attempts at this include: CORBA and SOAP. If anyone disagrees with this I can go into more detail but I think you will find ample distributed systems literature condemning synchronous protocols. DCell is fully asynchronous (but also provides synchronous calls over an underlying asynchronous protocol)
2. DRb is multithreaded but does not provide the user with any assistance in building multithreaded programs. This becomes particularly confusing when you have to deal with a proxy object (DRb::DRbObject) in a remote scenario but not in a local scenario
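To make point 1 concrete, here is a minimal in-process sketch of a synchronous call layered over an asynchronous message protocol. It is not DCell's API: the `request` type and the `startDoubler`, `callAsync`, and `callSync` names are made up for illustration, and a Go channel stands in for the network.

```go
package main

import "fmt"

// request is a hypothetical message: a payload plus a private reply channel.
type request struct {
	n     int
	reply chan int
}

// startDoubler launches an actor-like goroutine that handles one message at a
// time from its mailbox and answers on each message's reply channel.
func startDoubler() chan<- request {
	mailbox := make(chan request, 16)
	go func() {
		for req := range mailbox {
			req.reply <- req.n * 2
		}
	}()
	return mailbox
}

// callAsync sends the message and returns right away with a channel that
// will eventually carry the answer (a simple future).
func callAsync(mailbox chan<- request, n int) <-chan int {
	reply := make(chan int, 1) // buffered so the actor never blocks on a slow caller
	mailbox <- request{n: n, reply: reply}
	return reply
}

// callSync is nothing more than the async call plus a blocking wait.
func callSync(mailbox chan<- request, n int) int {
	return <-callAsync(mailbox, n)
}

func main() {
	mailbox := startDoubler()
	future := callAsync(mailbox, 20) // does not wait for the result
	fmt.Println(callSync(mailbox, 1))
	fmt.Println(<-future)
}
```

The design point is the direction of the layering: a blocking call is easy to build on top of an asynchronous send, while recovering asynchrony from a protocol that is synchronous at its core is much harder.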
EventMachine is:
* A frankenstein, guts the Ruby internals
* Not in active development
* Makes non-blocking IO block
* Requires special code from Ruby libraries
* Hard to use in an OOP way
* Is really difficult to work with
* Poorly documented
I'm not sure I understand how EventMachine "guts the Ruby internals" (I didn't watch the talk). It's true that EventMachine's internals are C++, not Ruby; there was originally a reason (again, I think it had to do with green threads) that it was designed this way. I'm not sure I can think of the Ruby functionality that EventMachine changes, or the manner in which EventMachine mucks with the interpreter or its runtime. I'm obviously ready to be corrected, but I'm missing how this impacts me as a programmer. Maybe he's talking about exception handling?
I also wasn't aware EventMachine "wasn't under active development". Because it's just an IO loop. Is libevent in active development? Do I need to be aware of that to use it? The underlying OS capabilities EventMachine maps haven't changed in over a decade. I think I'm actually happy they aren't constantly changing it.
I also don't understand how EventMachine "makes non-blocking block". All EventMachine I/O is nonblocking; it's essentially a select loop.
I also don't understand what special code EventMachine demands from libraries. Maybe he means database libraries? That is, maybe he's referring to the fact that you can't use standard Ruby database libraries that rely on blocking I/O inside an EventMachine loop? I'm wondering, then, what he expected. We wrote a little library (a small part of it is on Github) to do evented Mysql, but we stopped doing that when we realized that Redis evented naturally, and we just hook Mysql up through Redis.
"Hard to use in OOP way" just seems wrong, given that the ~30 evented programs I can find in my codebase directory all seem to be pretty object-oriented. So, that's not so much a question on my part.
Really difficult to work with? I've taught 7 different people EventMachine, in a few hours each. EventMachine is easier than Ruby's native sockets interface, in several specific ways.
I think maybe the issue here isn't so much EventMachine, but the idea of using EventMachine as a substrate for frameworks like Sinatra and Rails. That idea is whack, I agree. Trying to retrofit a full-featured web framework onto an event loop seems like an exercise in futility.
But on the other hand, I've been writing Golang code for the past 2 months, and Golang is militantly anti-event; it doesn't even offer a select primitive! Just alternating read/write on two different sockets seems to demand threads! And what I find is, my programs tend to decompose into handler functions naturally anyways. I try to force myself to write socket code like I did when I was 13, reading a line, parsing it, and writing its response, but that code is brittle and harder to follow than a sane set of handler functions.
So, long story short: I'm not arguing that evented code is the best answer to every problem, or that web frameworks should all be evented, or that actor frameworks aren't useful. It's probably true that a lot of people rushed to event frameworks who shouldn't have done that. But there are problems --- like, backend processing, or proxies, or routers and transformers, or feed processors --- where event loops are the most natural way to express a performant solution.
EventMachine does not use the Ruby IO primitives (e.g. TCPSocket, UDPSocket, etc) as the basis of its IO abstraction, and instead has reimplemented its own set of primitives for doing IO.
Because of this it can't take advantage of work being done in Ruby core to advance Ruby's socket layer. For this reason IPv6 support languished, among other problems.
This also severely complicates making multiple implementations of the EventMachine API, such as its JRuby backend (which maps onto Java NIO)
I think the real problem is EventMachine's original goal was to be a cross-language I/O backend similar to libevent or libev, but since it wasn't a particularly good one, the only language that wound up using it was Ruby. Compare to Twisted, which is built on libevent, or to Node, which is built on libev/libuv
* select() call used to segfault.
* the non blocking read/write behaved very differently under different OSes and I am not talking Windows. There were differences between Linux and OSX for example.
* Again frequent crashes when reading/writing when nonblock flag is set.
I am sure the situation is a lot better now, but for building a reactor library, native Ruby IO primitives fell short. There is always the lack of advanced selectors (Epoll/KQueue) as well.
Now, as I replied to thomas(?) below, and since you yourself have wrapped libevent for Ruby 1.9: there were severe limitations in the interpreter back then that kept even a libevent wrapper from working. So I guess, given the historical reasons, it made sense that EventMachine did not use native Ruby IO primitives or libevent.
Again, this is apocryphal, but I remember someone trying to wrap Ruby around libevent and failing; I remember there being a reason this had to be done bespoke. And having fallen into this particular NIH trap many times before: there's just not a whole lot to a simple socket I/O loop.
What's been your experience with JRuby/EventMachine?
I am sure you could write a libevent-based reactor, but Twisted was started a long time ago (10 years? The Twisted book from O'Reilly was published in 2005), before libevent existed, I think.
Or compared to AnyEvent which works with [m]any event loop - https://metacpan.org/module/AnyEvent
s/anti-event/anti-callback/. They are not the same thing. Under the hood, Golang is doing the eventing and select() for you, while letting you write simple procedural code.
> I try to force myself to write socket code like I did when I was 13, reading a line, parsing it, and writing its response, but that code is brittle and harder to follow than a sane set of handler functions.
How is simple, procedural code hard to follow? Even the way you describe it sounds simple: "read, process, write". Is that really harder to follow than a series of onRead, onWrite callbacks?
> I'm not arguing that evented code is the best answer to every problem
No, but like many others you seem to be confusing and conflating call-back driven code with event-loop based code. Everyone agrees on the benefits of event loops over kernel threads. Not everyone agrees that call-backs are the best interface to event loops - more and more people are switching on to green threads (which look just like normal threaded code) as the best and simplest interface to the event-loop - as seen by Golang, gevent, Eventmachine, Coro (Perl) etc etc.
I don't want to turn this into "events work for every program", because like I said above: I don't think event interfaces work for all kinds of programs. I've found them poorly suited to full-featured web frameworks, for instance.
I did not predict this incredibly boring "evented runtime vs. evented API" controversy, but will dispense with it by saying that it is incredibly boring, so you win it in advance. :|
Using non-blocking APIs requires you to turn your application "inside out", putting state that would normally have been on the stack into a state machine. You wind up in a maze of callbacks and low-level details.
This kind of thing might be appropriate for writing high performance code in C, but it's a mystery to me why anyone would want to do it in a higher-level language.
That list is just screaming for someone to write a comment mentioning that you forgot Erlang.
The lock conditions it has are for particular workloads at particular throughput/utilization, so 99% of the people using it won't hit those conditions initially (and a lot of people aren't writing systems that will ever reach those limits). However, we have a particular service that acts as a TCP multiplexer/router/load balancer to provide high availability for some of our mission critical applications. A little over a year ago, I initially wrote it using EventMachine in a couple of weeks, then spent almost a month trying to find what I thought was a bug in my code where a full deadlock would occur if 3 or more connections were established in a small enough time frame. Turned out to be a bug in EventMachine. After fixing that one, I found another where if 5 or more open connections fired the same event within a small enough time frame, all 5 would hit livelock and given enough connections hitting the condition, the whole process would deadlock. Once I hit the second bug in EventMachine itself directly related to concurrency handling, I switched to Netty and rewrote the whole thing in Scala in about a week and it's been rock solid since.
I've been doing some development on the side in Go because its particular flavor of types and concurrency model is fascinating. It's not that Go is anti-event, it's that it's overwhelmingly stream oriented (not low-level streams, but data streams). Once I finally hit that moment of clarity that goroutines/channels was all about connecting streams of data and not connecting raw streams, they became a much more natural solution. However, don't get me started on the difference between a non-blocking read on a channel and a blocking read on a channel.
One thing we did have was a large majority of the connections were coming from the same IP (but different source port) and a peak connection rate of ~200 connections per second per server and would last a few seconds (so at any given moment, roughly 1000 connections being established or with data in flight). I'd be really interested in hearing what your traffic patterns are roughly like (and hand-wavy what you have event machine doing, like proxying requests, building/returning its own responses, etc.) if you're willing to share at all.
> Golang is militantly anti-event; it doesn't even offer a select primitive! Just alternating read/write on two different sockets seems to demand threads!
So... can you be more specific about what you were talking about in that statement?
Obviously, the whole of Golang's concurrency model is lightweight threads scheduled on I/O events. But that's not exposed to the programmer; in fact, it's hermetically sealed away from the programmer from what I can tell.
So now you know what I meant by that statement.
If you want to manually yield to the scheduler, you call runtime.Gosched() and the calling goroutine yields. If you want to lock a goroutine into a specific thread, you call runtime.LockOSThread(). Everything is single-threaded to start until you call runtime.GOMAXPROCS(), but that's intended to go away in the very near future. So... I'm not sure what you mean by sealed away; the Go runtime does the sensible things that it can do automatically, but you still have the ability to change the scheduling behavior if you really want to.
What do you mean by "doesn't offer a select primitive"? I'm not familiar with your definition of select, because Go has a select keyword for concurrency control, and I'm under the impression you're talking about select as it's defined in EventMachine. Could you elaborate?
Go provides concurrency primitives (scheduler yielding i/o, channels, goroutines, etc) upon which you can build your own "events". I agree (and this seems to be your point) that Go doesn't provide a canned event mechanism that you can just hook into for callbacks on IO.
For example in the Go http server a goroutine accepts in a loop and kicks off goroutines to process and handle new requests.
Take the example of a simple proxy to see where I'm coming from. Sure, I'm happy to have Go manage all the connections and sockets and I'm happy to spawn new goroutines for each connection and all that. But a proxy accepts an inbound connection, makes an outbound connection, and then monitors the outbound and inbound sides for data. The loop to do this with sockets could be a simple two-descriptor read select(2) call. But instead, I have to spawn two more goroutines, one to "monitor" (really, read) from inbound, and one for outbound.
What does Golang's select/case not do that you want?
I agree that would be a nice feature though.
I'm not criticizing Go for being anti-event; I'm just observing that it is. Idiomatic Go --- like, the code in the standard library --- has a strong bias towards straight-line code.
I think I understand where you are coming from now - I think you are saying Go is "anti-event-based-callback-driven" rather than "anti-event-loop-implementation" - which is absolutely true. Go's concurrency model is built on CSP (Hoare's Communicating Sequential Processes), which seems to advocate procedural threads rather than callbacks.
I know this isn't the same as posix select, but it does let you have one goroutine coordinate the hot potato..
If you really do just need to take data from one Reader and send it to a Writer, you can make a new Pipe.
Do like the http library does and separate out socket handling to be in terms of net.Listeners/net.Conns, and create a handler abstraction for yourself. Your socket code will be testable, because it's trivial to write fake net.Conns that are backed with byte buffers. Your handler code will be testable because the handlers are written in terms of parsed objects.
I tend to write socket code once per server project, and then leave it alone for months, so this isn't a big deal for me. What makes Go nice for me is that I can block in client code, and retain the efficiency of using a select loop.
I don't have access to the source for the proxy I wrote for work, but here's one I whipped up quickly: http://play.golang.org/p/Fz19qSehCg
Go doesn't expose select, and has the runtime do that for you; but this allows them to make all Go libraries share a select loop, which has nice performance characteristics. Although, now that I think about it, it is probably possible to have a userspace implementation of select that works atop the runtime's shared select loop. Hmmm...
What EventMachine does (or at least what it did a year or so ago when I was debugging this) is this: sub processes are equated to popen. When the input side of that pipe closes (that is, when the sub process closes STDOUT) the process will finally be waited upon, and if it doesn't terminate within a hard-coded timeout (which, by the way, blocks the rest of your program), then it will be forcibly gunned down, with SIGKILL if necessary.
Among the problems that arise here, note that unless your daemon script is written to unconditionally drop STDOUT upon forking (uncommon) and you attempt to launch a daemon from within a sub process you are managing using EventMachine, the subprocess itself will terminate quickly, the daemon will go on its merry way, and your driver program will never, ever tell you it has finished running until that daemon is dead and anything it has spawned that might possibly use STDOUT is also dead. And god forbid it close that stream and then dare to continue running, for EventMachine will shoot it dead within IIRC 20 seconds, and lock up your driver program for the duration to boot.
Programmers who understand how the Unix process model works will write a very small signal handler for SIGCHLD that writes a byte on a pipe or some similar method of notifying the main event loop and call wait on the child immediately and then close its end of those pipes. I am reliably informed by those who understand the Windows process model that what EventMachine does is even more wrong there. This is a subsystem that was not written by anyone who knew what popen does, could not be bothered (or was perhaps incompetent to read) what any of a dozen standard implementations of it do, and appears to have debugged the code into some form of submission and then released it upon an unsuspecting public.
This is the only colossally wrong decision they made that I can list off the top of my head, but that's because it was so stupid I stopped looking for trouble after that. EventMachine does not handle anything but a very straightforward select loop very well, and I am sufficiently terrified of what lives under the covers in that system that I would rather write the select by hand (massive pain though it may be) than let this system anywhere near it. The thing that really alarms me is that people build walls of cardboard like NeverBlock (which reaches deeply into the guts of the Ruby software I/O and replaces it with EventMachine driven coroutines) atop this foundation of sand and then wonder when it falls over sideways in an impenetrable and impossible to debug fashion.
Coroutine programming (for that is, essentially, what we are talking about) can be a very elegant way to solve certain problems, but it works best when it is simple, or at least localized (e.g. samefringe). In an event driven server, every little piece must be audited carefully to ensure that it does not block. You get all the same problems any preemptive concurrency model does, with some added nastiness; in exchange you get some slightly better scalability numbers. It is at its best in a fairly simple program such as, say, nginx in its proxy configuration, where it speaks streams and SSL and talks to some application server on the other end of a different stream for anything sophisticated.
And this isn't "arbitrary use cases"; this is an explicitly supported function that is completely contrary to good practice and sane behavior and, to boot, has the ability to arbitrarily kill programs for impenetrable reasons and block for significant periods of time (the central sin of event-driven programming). You can't tell me that if you saw something like this in a random crypto library you wouldn't immediately tell everybody to stop using it; why should EM's developers get a pass for their, yes, incompetently written popen? I would actually be considerably happier if it wasn't in the library at all; at least then it wouldn't be wrong.
I was using Adam Langley's net/ssl code in Golang to build an HTTPS proxy, and only after several hours of hair-pulling did I discover that Langley hadn't implemented the compat SSL2 handshake that Firefox uses with proxies. net/ssl in Go was, for no good reason other than an omission, unsuited for use as an HTTPS proxy. Should I say net/ssl was incompetently written? That seems like a bad idea to me.
Your backend processing apparently fits in one machine, since you "hook Mysql up through Redis." I'm personally astonished you get so much done without a distributed environment tolerant of the relevant failures I see when I read that sentence.
We've been happily using Go for a few internal systems and are slowly galvanising most of our backend with it too. Life is happier.
I have yet to test it in production, but it looks pretty promising.
I had a lot of problems with EM blocking network connections if an event loop was tying up CPU, but I suppose that's to be expected.
I knew a guy who chose Scala for a project so he could use Actors for concurrency.
The system never gave the same answers twice and wouldn't peg all the cores on a 4-way machine.
I spent two days trying to fix it, then I got wise and switched back to Java and got it working in 20 minutes with ExecutorService with (i) no race conditions, and (ii) nearly perfect scaling up to eight cores.
Anyway, just wanted to throw it out there that I've deployed several production apps (gaming/messaging servers) using EventMachine, and combined with em-synchrony, I'm pretty comfortable with its performance and limitations. There seems to be good community support (thanks to Ilya Grigorik's articles and em-synchrony) with plenty of examples and documentation.
Just wondering, are there any people out there with experience running a Celluloid app in production? Like any other geek, I love programming "the right way", but I know nothing about Celluloid and its real-world benefits/drawbacks (performance, API, code maintainability, code documentation, community support, blog articles, etc).