

I like Unicorn (Rack HTTP server) because it's Unix - pjhyett
http://tomayko.com/writings/unicorn-is-unix

======
mrshoe
There's a good reason most Ruby and Python projects don't rely heavily on
system calls: portability. If you care at all about portability, it's just
easier not to hit the system calls directly. Otherwise you'll have to detect
the host OS and make sure you're using the system calls properly.

If threads are "out", then pre-fork is _way out_. Just look at the history of
the Apache project. I realize this all happened before the RoR era, but Apache
used to use a pre-fork MPM almost exclusively. In more recent years it has
added the threaded MPM and the async MPM. They have also put in the work I
mentioned above to achieve cross-platform compatibility.

I'm just using Apache as an example here; I'm not suggesting that we should
all use Apache. It's just funny to me to see a post that basically says, "All
the stuff we've been doing for the last 5 years is _out_. We should be doing
the same stuff they were doing _15 years_ ago, but in Ruby this time around
instead of C."

Maybe software trends are like music trends? Everything from 5 years ago is
lame, but the stuff from 20 years ago is _super groovy, man_.

~~~
defunkt
Counter example: nginx, which uses fork(), and seems to smoke Apache while
offering features like binary reloading without dropping connections (not sure
if Apache supports this but I seem to remember no - please correct me if I'm
wrong).

Portability is not always necessary - when talking about fork() and friends
you're talking about trade-offs.

When I'm only ever deploying to Unix environments, I accept the lack of
portability in exchange for features I value.

~~~
mrshoe
That's not really a counter example. nginx does indeed fork off child
processes, but it uses async I/O in each child process. It doesn't use the
traditional pre-fork process-per-connection model that Apache's pre-fork MPM
uses or that Unicorn is using.
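
For readers who haven't seen the traditional pre-fork model, here is a minimal
sketch in Python. It is not Unicorn's or Apache's actual code; the function
names (`handle`, `worker`, `prefork`) and the `max_requests` cap are made up
for illustration. The parent binds the listening socket once, then forks
workers that each block in accept() and serve one connection at a time.

```python
import os
import socket


def handle(conn):
    """Serve one connection: read a request, send a tiny reply."""
    conn.recv(1024)
    conn.sendall(b"hello from pid %d\n" % os.getpid())
    conn.close()


def worker(listener, max_requests):
    """Blocking accept loop: one connection at a time per process."""
    for _ in range(max_requests):
        conn, _addr = listener.accept()
        handle(conn)


def prefork(listener, num_workers=2, max_requests=1):
    """Fork workers that all share the already-bound listening socket.

    max_requests is a hypothetical cap so the sketch terminates; a real
    server would loop until told to stop.
    """
    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Child: serve requests, then exit without returning to the caller.
            try:
                worker(listener, max_requests)
            finally:
                os._exit(0)
        pids.append(pid)
    return pids
```

The kernel decides which idle worker gets each new connection, so the parent
never has to dispatch anything itself.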

~~~
defunkt
nginx relies heavily on system calls at the expense of portability.

~~~
sovande
Hello!? nginx is a POSIX network application written in C. Of course it uses
syscalls. As the previous poster pointed out, you are also wrong to imply that
nginx uses anything like a prefork model. It _may_ fork off one or more
processes if it detects that it is running on an SMP system, to take advantage
of more than one CPU. But each process has a strictly async I/O architecture,
handling requests/responses within a big loop.
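
To make the "big loop" concrete, here is a rough sketch of a single-process
readiness loop in Python using the standard `selectors` module. This is an
echo server for illustration only, not how nginx is implemented; `make_loop`
and `step` are hypothetical names, and a real server would also buffer writes
instead of calling sendall() on a non-blocking socket.

```python
import selectors
import socket


def make_loop(listener):
    """Register a listening socket with a fresh selector."""
    sel = selectors.DefaultSelector()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, data="listener")
    return sel


def step(sel, timeout=1.0):
    """Run one iteration of the loop: handle every socket that is ready."""
    for key, _mask in sel.select(timeout):
        sock = key.fileobj
        if key.data == "listener":
            # New connection: accept it and watch it for incoming data.
            conn, _addr = sock.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ, data="client")
        else:
            data = sock.recv(4096)
            if data:
                sock.sendall(data)  # echo back; fine for small payloads only
            else:
                # Empty read means the peer closed the connection.
                sel.unregister(sock)
                sock.close()
```

One process can keep many connections open this way, because it only touches
a socket when the kernel reports it as ready.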

~~~
gloob
The _point_ of the post you are responding to is that it uses syscalls. I'm
not entirely certain what about that warrants a "!?"; could you enlighten me?

~~~
sovande
The '!?' was aimed at Mr. defunkt, who said that

    nginx relies heavily on system calls at the expense of portability

It's an oxymoron, as you cannot expect a C application that mostly does network
and disk I/O not to use syscalls. That said, the author of nginx does an
admirable job of reducing the number of syscalls and making the application as
efficient as possible. In this context a syscall, i.e. a kernel call, is heavy
and something one wants to minimize.

The blog post on the other hand was talking about using syscalls from _Ruby_.
I can understand Ruby programmers who wrinkle their nose at this. If you want
your Ruby application to be portable then using, for instance, fork is not the
best idea. In fact, programming anything long-lived such as a server in Ruby is
not the best idea. The GC in 1.8.x leaks memory over time, and it is common
for Ruby servers to need frequent restarts because they consume all memory on
the machine they run on. Adding fork on top of this is bad. A fork will copy
the whole Ruby interpreter into the new process and you will end up with a lot
of top-heavy processes - not a good idea unless you sell RAM. Basically, using
Ruby as a systems programming language and for applications that are long
lived is a bad idea.

~~~
tyler
Nonsense. My multitudes of small web services, many of which have been running
for months without a restart while taking up no more memory than they did at
the start, say you're wrong. Even if there are some obscure bugs, and there
undoubtedly are, that doesn't make Ruby an unsuitable language for systems
programming.

------
josephruscio
This book (<http://www.unpbook.com>) and this book
(<http://www.kohala.com/start/apue.html>) are both very good. Not to mention
keeping abreast of new stuff in the Linux Kernel at <http://www.lwn.net> or
<http://kernelnewbies.org/LinuxChanges>.

As productive as dynamic languages make us, I think we sometimes forget that
this is all typically built on Linux/C, and that's not changing anytime soon.
A _good_ hacker should at least have a basic knowledge of what's under the
hood.

~~~
jsrn
> [...] and this book (<http://www.kohala.com/start/apue.html>)

> are both very good.

This refers to the first edition of APUE. There is also a very good (IMHO)
second edition co-authored by S. Rago (a former colleague of W. Richard
Stevens):

<http://apuebook.com/>

(first edition: 1992, second edition: 2005)

The second edition mainly adds better coverage of POSIX (much of which was
developed after the first edition was published) and current UNIX variants
(Linux, FreeBSD, Solaris, MacOS X) while leaving out obsolete stuff.

~~~
josephruscio
Thanks! Rago is the copy I actually have on my bookshelf; I didn't notice the
site I linked was only the first edition. I guess with Stevens's unfortunate
passing they made a new site.

------
blasdel
fork() in Ruby would be much better if MRI's garbage collector wasn't so awful
-- because it marks every reachable object in each collection cycle, and the
mark bit lives in the object itself, every page holding live objects gets
written to. That makes it impossible for MRI processes to take advantage of the
kernel's copy-on-write memory sharing post-fork. You can't even just let the
processes gobble up space and let the kernel's VM sort it out -- anything that
gets swapped out will have to be paged back in to be marked by the GC. _Churn._

~~~
boulderdash
almost all GCs are awful

~~~
stcredzero
_almost all GCs are awful_

_Wrong_ -- almost all of the _most visible_ GCs in the most popular
languages are either 1) still awful or 2) were formerly awful for such a long
time, they're still living it down.

It's a vicious cycle.

    
    
        - GCs have a bad rep.
        - Precocious programmer implements their own dynamic language.
        - They settle for Mark/Sweep or ref counts to "get it done"
           (Hey, GCs are all awful anyhow, yeah?)
        - Many people experience the awfulness.
        - GCs have a bad rep -- REPEAT
    

Chicken & egg? GCs were bad. Experts have since figured out how to make them
good. The programmer culture in general is slowly getting this knowledge by
diffusion.

The VisualWorks GC is so good, as a lark, I once put an infinite loop into the
app I was working on that did nothing but instantiate new objects. I could
barely tell it was there!

~~~
boulderdash
so the 'almost all' in my above is wrong? you basically said what I said, but
with your favorite smalltalk GC.

~~~
stcredzero
Yes, the GCs you've heard of constitute an encyclopedic listing of them.
</sarcasm>

Hmmm, you just gave me an idea. Interview question to see if prospect _knows
what he doesn't know_. Does she/he even have the order of magnitude right on
that?

------
kmavm
I'm sorry I'm coming to this with the thread dead. I was on a boat for a
team-building offsite most of yesterday.

People also overlook the great benefit of fork(2) for static languages: it's
like GC for your address space. In a long-enough-lived multi-threaded C/C++
program, heap fragmentation will eventually eat you just as badly as a memory
leak would have. Since there's no GC to compact the heap, the only real
solution for memory-intensive servers is scheduled restarts.

A good, old-fashioned fork(2) resets the address space to a known-ok state;
after the client connection is done with, whatever fragmentation the request
has introduced disappears with its container process. The canonical, ancient
structure of UNIX servers (fork after accept) was what enabled those servers
to stay up for months and years at a time, but very few people made that
connection. When processes were the only concurrency primitive available, we
saw only their costs, and assumed threads would be better, since their costs
were lower. In some ways, processes were the devil we knew, and threads an
unfamiliar devil; in addition to all the usual complaints (e.g., about how
hard it is to synchronize), threads mean that the global heap lives forever.
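
A minimal Python sketch of that fork-after-accept structure, for anyone who
hasn't written one. The name `serve_forking`, the `max_connections` cap, and
the 10 MB `scratch` buffer are all invented for illustration; the point is
only that whatever the request does to the heap happens in a child that exits
afterwards.

```python
import os
import socket


def serve_forking(listener, max_connections):
    """Accept connections and hand each one to a short-lived child process.

    Returns the child pids. max_connections is a hypothetical cap so the
    sketch terminates; a classic server would loop forever.
    """
    pids = []
    for _ in range(max_connections):
        conn, _addr = listener.accept()
        pid = os.fork()
        if pid == 0:
            # Child: any memory allocated or fragmented while serving this
            # request goes back to the kernel when the process exits.
            try:
                listener.close()
                scratch = bytearray(10_000_000)  # stand-in for per-request garbage
                conn.sendall(b"served by pid %d\n" % os.getpid())
                conn.close()
            finally:
                os._exit(0)
        conn.close()        # parent keeps only the listening socket
        os.waitpid(pid, 0)  # reap inline; a real server would not block here
        pids.append(pid)
    return pids
```

The parent's address space never sees the request's allocations, which is the
"GC for your address space" effect described above.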

------
blasdel
Jacob Kaplan-Moss thought it’d be an interesting exercise to port Ryan’s code
to Python: <http://jacobian.org/writing/python-is-unix/>

------
dschobel
_Threads are out. You can use processes, or async/events, or both processes
and async/events, but definitely not threads. Threads are out._

Can anyone expound on this? Because the author certainly doesn't.

~~~
wmf
It's not clear whether he's saying that threads are bad in Ruby, which is true
because of the GIL, or that threads are bad in general, which is just
unfounded bashing.

~~~
ryah
> which is just unfounded bashing.

i'm always surprised to hear things like this. it's certainly not unfounded -
it's just so obvious and universally accepted that it doesn't require
explanation - one would think.

~~~
dschobel
I think it may be obvious to people with a C/C++ background, but when I was in
school in the early 2000s, they taught you concurrent programming in Java via
threads.

And the concurrency primitives and frameworks in languages like Java and C#
are so easy to use there's a whole generation of us who think that threading
is the easy way to do concurrency.

~~~
jshen
threading is not easy in java. Check out
<http://www.javaconcurrencyinpractice.com/>

------
wsprague
The master has passed away, but here is the real deal (tm) in Unix:

<http://www.kohala.com/start/>

There is nothing more I can say -- I am so in awe...

------
boulderdash
When I read stuff like this, I should go... Yay, they are back on the path.
However, I'm stuck on the... why are they even off the path in the first
place? Why should rediscovery of processes and the 'select' call be news?

... I should just write a big blog post about this ... (which I'll never do...
too busy working :-)

------
colbyolson
I don't know much about Ruby or 'threads vs processes', but his blog sure is
refreshing and simple!

~~~
cschneid
At least last time I checked, he was running a custom written blog engine that
several of us in #sinatra made a bit more generic. It's probably been heavily
modified or even replaced since then. Link: <http://github.com/rtomayko/wink>

------
jherdman
That's fine and dandy, but when I want to use your libraries with JRuby or
MacRuby, I'm SOL. I kind of like the portability of avoiding fork() and exec()
for those reasons.

~~~
jeremymcanally
Well, it's a Rack web server. Just use one that works on those, problem
solved. That's the whole point of Rack.

~~~
jherdman
Point taken. In the general case, though, (i.e. we're talking about more than
Rack awesomeness) I think it's important that people keep this in the back of
their minds. I was burned on this recently with the use of the Daemon Kit gem
in a project.

It's definitely a "do-better" on my part to more closely examine the libs I'm
working with, but as an author in a language wherein numerous OSes and
implementations of the interpreter are used, it's something to keep in the
back of your mind as well.

------
gsiener
Why didn't they use Twisted?

~~~
oldmoe
Good question. Maybe even build on Tornado, though it seems that Unicorn is
written in Ruby.

