If threads are "out", then pre-fork is way out. Just look at the history of the Apache project. I realize this all happened before the RoR era, but Apache used to use a pre-fork MPM almost exclusively. In more recent years it has added the threaded MPM and the async MPM. They have also put in the work I mentioned above to achieve cross-platform compatibility.
I'm just using Apache as an example here; I'm not suggesting that we should all use Apache. It's just funny to me to see a post that basically says, "All the stuff we've been doing for the last 5 years is out. We should be doing the same stuff they were doing 15 years ago, but in Ruby this time around instead of C."
Maybe software trends are like music trends? Everything from 5 years ago is lame, but the stuff from 20 years ago is super groovy, man.
Portability is not always necessary - when you're talking about fork() and friends, you're talking about trade-offs.
When I'm only ever deploying to Unix environments, I accept the lack of portability in exchange for features I value.
The point that I was more upset with was the "threads are dead, pre-fork is the way to go" section of the article.
However, thinking about it more now, even the syscalls vs. portability arguments presented here are another example of forgetting the past (or just never being aware of it). After all the work from the Python devs to encapsulate syscalls in the standard library and provide a portable API to them, Jacob Kaplan-Moss says, "I’m a bit dismayed to see [syscalls] relegated to the dusty corners of our shiny dynamic languages."
nginx relies heavily on system calls at the expense of portability
The blog post, on the other hand, was talking about using syscalls from Ruby. I can understand Ruby programmers who wrinkle their noses at this. If you want your Ruby application to be portable, then using, for instance, fork is not the best idea. In fact, programming anything long-lived, such as a server, in Ruby is not the best idea. The GC in 1.8.x leaks memory over time, and it is common for Ruby servers to need frequent restarts because they consume all the memory on the machine they run on. Adding fork on top of this is bad: even though fork is nominally copy-on-write, MRI 1.8's mark-and-sweep GC writes into every object during a collection, dirtying nearly every page, so each child effectively ends up with its own copy of the whole interpreter heap. You end up with a lot of top-heavy processes - not a good idea unless you sell RAM. Basically, using Ruby as a systems programming language, or for applications that are long-lived, is a bad idea.
Nonsense; Unix is quite portable.
Making nginx portable means using more syscalls -- the ones specific to the kernel you're calling. Across Unixes that just means using the appropriate epoll/kqueue/etc.; Windows support would mean a total refactoring to use NT's completion ports.
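For the curious, here's roughly what the Linux-specific half of that looks like - a minimal sketch of an epoll echo loop. The listening-socket setup is assumed, and the echo "handler" and buffer size are placeholders; the BSD version would use kqueue()/kevent() and share almost none of this code:

    /* A minimal sketch of a Linux epoll event loop -- the kind of
       kernel-specific code in question. On the BSDs this whole block
       becomes kqueue()/kevent(); on Windows, completion ports. */
    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #define MAX_EVENTS 64

    void event_loop(int listen_fd) {  /* assumes a bound, listening socket */
        int ep = epoll_create1(0);
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
        epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);

        struct epoll_event events[MAX_EVENTS];
        for (;;) {
            int n = epoll_wait(ep, events, MAX_EVENTS, -1);
            for (int i = 0; i < n; i++) {
                int fd = events[i].data.fd;
                if (fd == listen_fd) {             /* new connection */
                    int c = accept(listen_fd, NULL, NULL);
                    struct epoll_event cev = { .events = EPOLLIN, .data.fd = c };
                    epoll_ctl(ep, EPOLL_CTL_ADD, c, &cev);
                } else {                           /* readable client */
                    char buf[4096];
                    ssize_t r = read(fd, buf, sizeof(buf));
                    if (r <= 0) close(fd);         /* close drops it from ep */
                    else write(fd, buf, r);        /* echo it back */
                }
            }
        }
    }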
That said, you're right about portability. The attraction of Ruby's Thread is that it works identically across operating systems.
As productive as dynamic languages make us, I think we sometimes forget that this is all typically built on Linux/C, and that's not changing anytime soon. A _good_ hacker should at least have a basic knowledge of what's under the hood.
> are both very good.
This refers to the first edition of APUE - there is also a very good (IMHO) second edition, co-authored by S. Rago (a former colleague of W. Richard Stevens): first edition 1992, second edition 2005. The second edition mainly adds better coverage of POSIX (much of which was developed after the first edition was published) and of current UNIX variants (Linux, FreeBSD, Solaris, Mac OS X), while leaving out obsolete material.
So even if it took an afternoon, there'd be no reason to use it, as MRI is the past anyway. (and for those of us still using 1.8.x, there's REE)
Wrong -- almost all of the most visible GCs in the most popular languages are either 1) still awful or 2) were formerly awful for such a long time, they're still living it down.
It's a vicious cycle.
- GCs have a bad rep.
- Precocious programmer implements their own dynamic language.
- They settle for Mark/Sweep or ref counts to "get it done"
  (Hey, GCs are all awful anyhow, yeah? See the sketch after this list.)
- Many people experience the awfulness.
- GCs have a bad rep -- REPEAT
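For anyone who hasn't watched this movie before, here's a minimal C sketch of why the "get it done" approach earns the bad rep - naive reference counting can never free a cycle. (The Obj type and the counts are made up purely for illustration.)

    #include <stdlib.h>

    /* A naive reference-counted object: the classic "get it done" GC. */
    typedef struct Obj {
        int refcount;
        struct Obj *child;   /* one outgoing reference */
    } Obj;

    Obj *obj_new(void) {
        Obj *o = calloc(1, sizeof(Obj));
        o->refcount = 1;
        return o;
    }

    void obj_release(Obj *o) {
        if (o && --o->refcount == 0) {
            obj_release(o->child);
            free(o);
        }
    }

    int main(void) {
        Obj *a = obj_new();
        Obj *b = obj_new();
        a->child = b; b->refcount++;   /* a -> b */
        b->child = a; a->refcount++;   /* b -> a: a cycle */
        obj_release(a);
        obj_release(b);
        /* Both counts are now stuck at 1, never 0: the pair leaks
           forever. Plain ref counting can't collect cycles without
           extra machinery, and naive mark/sweep has its own pauses. */
        return 0;
    }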
The VisualWorks GC is so good, as a lark, I once put an infinite loop into the app I was working on that did nothing but instantiate new objects. I could barely tell it was there!
Hmmm, you just gave me an idea: an interview question to see if a prospect knows what they don't know. Do they even have the order of magnitude right on that?
People also overlook the great benefit of fork(2) for static languages: it's like GC for your address space. In a long-enough-lived multi-threaded C/C++ program, heap fragmentation will eventually eat you just as badly as a memory leak would have. Since there's no GC to compact the heap, the only real solution for memory-intensive servers is scheduled restarts.
A good, old-fashioned fork(2) resets the address space to a known-ok state; after the client connection is done with, whatever fragmentation the request has introduced disappears with its container process. The canonical, ancient structure of UNIX servers (fork after accept) was what enabled those servers to stay up for months and years at a time, but very few people made that connection. When processes were the only concurrency primitive available, we saw only their costs, and assumed threads would be better, since their costs were lower. In some ways, processes were the devil we knew, and threads an unfamiliar devil; in addition to all the usual complaints (e.g., about how hard it is to synchronize), threads mean that the global heap lives forever.
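To make the canonical structure concrete, here's a minimal sketch of fork-after-accept - the port number and the trivial echo "handler" are placeholders, and error handling is omitted for brevity:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <signal.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void) {
        signal(SIGCHLD, SIG_IGN);             /* kernel reaps exited children */

        int srv = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8080);          /* arbitrary example port */
        bind(srv, (struct sockaddr *)&addr, sizeof(addr));
        listen(srv, 128);

        for (;;) {
            int client = accept(srv, NULL, NULL);
            if (client < 0) continue;
            if (fork() == 0) {                /* child: handle one connection */
                close(srv);
                char buf[4096];
                ssize_t n;
                while ((n = read(client, buf, sizeof(buf))) > 0)
                    write(client, buf, n);    /* stand-in "request handler" */
                _exit(0);                     /* fragmentation dies with us */
            }
            close(client);                    /* parent: straight back to accept */
        }
    }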
Can anyone expound on this? Because the author certainly doesn't.
This means that, if you're not careful, when one thread is reading data, that data can be corrupted by another thread writing into the same data structure. This in turn means that threads need to lock the data structure while they are accessing it so that other threads can't get to it.
This leads to a whole host of potential problems that can be very difficult to debug because many of them end up depending on subtle timing differences.
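A minimal pthreads sketch of what that locking looks like - the shared counter and iteration count are arbitrary; build with cc -pthread:

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *bump(void *arg) {
        (void)arg;
        for (int i = 0; i < 1000000; i++) {
            pthread_mutex_lock(&lock);    /* without this lock, the two    */
            counter++;                    /* increments interleave and the */
            pthread_mutex_unlock(&lock);  /* final count is unpredictable  */
        }
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, bump, NULL);
        pthread_create(&b, NULL, bump, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("%ld\n", counter);         /* always 2000000 with the lock */
        return 0;
    }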
Processes avoid this because they don't share mutable data. So instead the problem is broken up between processes, and message passing is used to share data between them. It's something that generally makes life easier to understand and steers clear of problems that can be very difficult to reproduce and debug.
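And here's the process-and-message-passing counterpart, sketched with a plain pipe - nothing is shared, so there's nothing to lock (the payload string is just a placeholder):

    #include <stdio.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int fds[2];
        pipe(fds);

        if (fork() == 0) {                   /* child: does the work */
            close(fds[0]);
            const char *msg = "result: 42";  /* placeholder payload */
            write(fds[1], msg, strlen(msg) + 1);
            _exit(0);
        }

        close(fds[1]);                       /* parent: just reads messages */
        char buf[64];
        ssize_t n = read(fds[0], buf, sizeof(buf));
        if (n > 0) printf("child said: %s\n", buf);
        wait(NULL);
        return 0;
    }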
In nearly all Unixes, both processes share their entire address space after the fork(), copy-on-write: pages are only copied when one of the processes writes to them; otherwise they remain shared until exit. Copy-on-write is what makes fork() tractable performance-wise.
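You can watch the observable side of this with a few lines of C - the child's write triggers a private page copy, so the parent's value is untouched:

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int value = 1;

        if (fork() == 0) {          /* child shares all pages, read-only */
            value = 2;              /* this write triggers the page copy */
            printf("child sees  %d\n", value);   /* prints 2 */
            _exit(0);
        }

        wait(NULL);
        printf("parent sees %d\n", value);       /* still prints 1 */
        return 0;
    }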
I would agree with the author. Unless you have a compelling reason to use threads, processes and async tend to make for more correct programs. However, they're often more difficult to get started with. Trade-offs, it's what we do :).
There are still problem domains where you need to obsessively and manually handle memory, there are still people who don't work in such domains but have convinced themselves they do, and there are still people who feel, for whatever reason, that C or C++ are the only tool for Real Programmers(TM). But the trend is and for many years has been away from that and toward managed runtimes, because they're far less complex to work with and far less susceptible to the sorts of easy errors which plague C/C++.
I think that a few years down the line we'll be in a similar situation with threads: there will still be problem domains where you absolutely need them, and people who still believe for whatever reason that manually managing a thread pool and shared resources will make their penis bigger, but most of the world will be moving on to something that's less complex and less error-prone, and probably managed automatically by a language runtime or something similar.
I mainly work on server-side code where, for the most part, the overhead of a separate process is not an issue, and neither is the overhead of not sharing memory.
I'm always surprised to hear things like this. It's certainly not unfounded - it's just so obvious and universally accepted that, one would think, it doesn't require explanation.
And the concurrency primitives and frameworks in languages like Java and C# are so easy to use there's a whole generation of us who think that threading is the easy way to do concurrency.
There is nothing more I can say -- I am so in awe...
... I should just write a big blog post about this ... (which I'll never do... too busy working :-)
It's definitely a "do better" on my part to examine the libs I'm working with more closely, but as a library author in a language that runs on numerous OSes and interpreter implementations, it's something to keep in the back of your mind as well.