*If you fork inside an app server, such as mod_python, you will fork the entire ...

lincolnq · on Oct 20, 2009

Four forks per second is basically nothing. This article is blowing it all out of proportion. You can't sustain forking per web request on a really large site but at this scale it's not going to matter.

The author is being stupid: the size of the process that you're forking doesn't really matter. (It might start to matter if you didn't call exec() or exit() right after you forked, but that's not the case here: you're just exec'ing another program, which replaces the current process image in memory.) VERY little is copied; on modern Unix systems fork is implemented with copy-on-write semantics for the process's address space, so the up-front cost is mostly duplicating the page tables.
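
For anyone who wants to see the pattern, here's a minimal sketch of fork followed immediately by exec, written in Python rather than Ruby only because the system calls map one-to-one. The helper name `run_in_child` is made up for illustration, and it assumes a Unix-like OS where `os.fork` is available.

```python
import os
import sys

def run_in_child(argv):
    """Fork, exec argv in the child, and return the child's exit code.

    Hypothetical helper for illustration; assumes a Unix-like OS.
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace the process image right away. The parent's pages
        # are shared copy-on-write and are not copied unless written to.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            # exec failed (e.g. binary not found); exit without running
            # the parent's cleanup handlers.
            os._exit(127)
    # Parent: block until the child finishes, then decode its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -1  # child was killed by a signal

if __name__ == "__main__":
    # Stand-in for an external tool such as ImageMagick.
    print(run_in_child([sys.executable, "-c", "raise SystemExit(0)"]))
```

The child does no work between the fork and the exec, which is the case where the parent's size barely matters.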

swombat · on Oct 20, 2009

I'd use something like DelayedJob and send_later the call to your image processing stuff; that way the forking happens out of the request path, at least.

patio11 · on Oct 20, 2009

You just described my exact setup. However, my understanding is that Delayed::Job's worker processes have a full Rails environment in them, and if this blog post is correct and I am indeed forking that entire Rails process for every call out to ImageMagick, my vague recollections of what a fork entails suggest to me that the Ghosts of C Programmers Past are going to visit a terrible vengeance upon me.

jeremyw · on Oct 20, 2009

The fork+exec is efficient. The blog post compares things without units. Forks (principally page table copies, with copy-on-write in effect) are measured in microseconds, and the exec is your standard binary startup time. While you don't want to put a synchronous fork/exec in the way of 5,000 reqs/sec, it will be a trivial part of your asynchronous ImageMagick processing.

At scale, you might care about the ImageMagick startup latency, but not the forking.
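
If you'd rather measure than take my word for it, a rough sketch like the one below times a bare fork where the child exits immediately. The function name `time_forks` is invented here, it assumes a Unix-like OS, and the number you get depends heavily on your machine and on how large the parent process is, so treat it as a ballpark only.

```python
import os
import time

def time_forks(n=50):
    """Return mean seconds per fork + immediate child exit, over n rounds.

    Hypothetical helper for illustration; assumes a Unix-like OS.
    """
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            # Child: exit at once so only the fork itself is measured.
            os._exit(0)
        # Parent: reap the child before the next round.
        os.waitpid(pid, 0)
    return (time.perf_counter() - start) / n

if __name__ == "__main__":
    print(f"{time_forks() * 1e6:.0f} microseconds per fork (rough)")
```

This times the fork and the reap together, so it slightly overstates the cost of the fork alone.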

swombat · on Oct 20, 2009

Only if you run out of memory. But with DJ at least you should be forking only one call at a time, rather than several at once, as you might from the controller itself. So although you'll end up using more memory, it'll only be one extra Rails process, not 4.

May still not be ideal... interested to hear other people's ideas.

simonw · on Oct 20, 2009

Use http://github.com/documentcloud/cloud-crowd

boundlessdreamz · on Oct 20, 2009

Use a queue. You should never be doing time consuming method calls inside a controller anyway.
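
A bare-bones sketch of the idea, in Python with an in-process queue and a single worker thread. This is only a stand-in to show the shape; it is not the API of DelayedJob, Beanstalkd, or any other real job queue, and the names `jobs`, `results`, `worker`, and `start_worker` are made up for the example.

```python
import queue
import subprocess
import sys
import threading

jobs = queue.Queue()   # pending commands, each a list of argv strings
results = []           # exit codes, in the order jobs finished

def worker():
    """Run queued commands one at a time, outside the request path."""
    while True:
        argv = jobs.get()
        if argv is None:          # sentinel: shut the worker down
            jobs.task_done()
            break
        # Shell out here, e.g. to something like
        # ["convert", "in.jpg", "-resize", "200x200", "thumb.jpg"]
        # (file names are placeholders).
        proc = subprocess.run(argv, capture_output=True)
        results.append(proc.returncode)
        jobs.task_done()

def start_worker():
    """Start the single background worker and return its thread."""
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread

if __name__ == "__main__":
    start_worker()
    # The "controller" only enqueues and returns immediately.
    jobs.put([sys.executable, "-c", "print('pretend resize')"])
    jobs.join()                   # demo only: wait for the job to finish
    print(results)
```

With one worker, at most one external process runs at a time, however many requests enqueue work.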

idlewords · on Oct 20, 2009

Yes, yes, yes. Beanstalkd is an easy one to set up, for example, with good Rails integration.

polvi · on Oct 20, 2009

The solution is to use an image processing library such as RMagick: http://rmagick.rubyforge.org/

tptacek · on Oct 20, 2009

Calling into RMagick/ImageMagick from inside the request/response cycle is probably even worse than shelling out, because ImageMagick does grievous damage to your runtime.

polvi · on Oct 20, 2009

I guess it all depends on how you design it and what you are doing. I would have to agree with the others: the out-of-request-cycle image processing solutions are definitely the right way to go overall.

Pistos2 · on Oct 21, 2009

Last I tried RMagick, it leaked significantly. Definitely not something I want to use in a long-lived process. I remember having to fork to use it, to work around the memory leak. If you don't need the fancier operations, there are lighter image manipulation gems out there that do just the basics, but without leaking, e.g. ImageScience: http://seattlerb.rubyforge.org/ImageScience.html