
If you fork inside an app server, such as mod_python, you will fork the entire parent process (Apache!). This could happen by calling something like os.system("mv foo bar") from a Python application.
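For the specific "mv foo bar" case, one way to sidestep the fork entirely is to do the move in-process. This is only a sketch: the move_file helper and the temporary directory are made up for illustration, and shutil.move from the standard library stands in for the shell command.

```python
import os
import shutil
import tempfile


def move_file(src, dst):
    """Move src to dst inside this process, without spawning a shell or `mv`."""
    return shutil.move(src, dst)


# Illustrative setup: create a throwaway file named "foo" in a temp directory.
workdir = tempfile.mkdtemp()
foo = os.path.join(workdir, "foo")
bar = os.path.join(workdir, "bar")
with open(foo, "w") as handle:
    handle.write("example")

# Equivalent in effect to os.system("mv foo bar"), but no child process is created.
move_file(foo, bar)
```

This only helps when the standard library already does what the external command would do; it is not a replacement for something like ImageMagick.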

I nominate this post as the most distressingly important bit of information I've ever received at 2:43 AM in the morning.

Now the question: what can I do in Ruby to avoid the four calls a second or so I'm currently making to system(big_command_to_invoke_imagemagick) ?



Four forks per second is basically nothing. This article is blowing it all out of proportion. You can't sustain forking per web request on a really large site but at this scale it's not going to matter.

The author is being stupid: the size of the process that you're forking doesn't really matter (it might start to matter if you didn't call exec() or exit() right after you forked, but that's not the case: you're just execing another program, which replaces the current process in memory). VERY little is copied; fork is defined to have copy-on-write semantics for the process's address space.
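To make the fork-then-exec pattern concrete, here is a minimal Python sketch of roughly what a call like system() does under the hood. The run_command name is invented for this example, and the command being run is just the Python interpreter as a stand-in for a real program.

```python
import os
import sys


def run_command(argv):
    """Fork, exec argv in the child right away, and return its exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image immediately, so almost none of
        # the parent's copy-on-write pages are ever touched or copied.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            pass
        # Only reached if exec failed.
        os._exit(127)
    # Parent: wait for the child and translate its wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


# Stand-in command: a child interpreter that exits with status 3.
status = run_command([sys.executable, "-c", "raise SystemExit(3)"])
```

The child does nothing between fork and exec, which is the case described above where the size of the parent barely matters.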


I'd use something like DelayedJob and send_later the call to your image processing stuff, that way the forking happens out of the request path, at least.


You just described my exact setup. However, my understanding is that Delayed::Job's worker threads have a full Rails environment in them, and if this blog post is correct and I am indeed forking that entire Rails process for every call out to ImageMagick, my vague recollections of what a fork entails suggest to me that the Ghosts of C Programmers Past are going to visit a terrible vengeance upon me.


The fork+exec is efficient. The blog post compares things without units. Forks (principally page table copies w/copy-on-write in effect) are measured in microseconds and the exec is your standard binary startup time. While you don't want to put a synchronous fork/exec in the way of 5,000 reqs/sec, it will be a trivial part of your asynchronous imagemagick processing.
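If you want numbers with units for your own box, a rough measurement is easy to sketch. The average_fork_seconds helper below is hypothetical, and it times fork plus an immediate child exit plus the parent's wait, so it overstates the cost of the fork call alone. Results will vary with process size and kernel.

```python
import os
import time


def average_fork_seconds(iterations=50):
    """Average wall-clock seconds for one fork + child exit + wait cycle."""
    start = time.perf_counter()
    for _ in range(iterations):
        pid = os.fork()
        if pid == 0:
            # Child: exit at once without running any cleanup handlers.
            os._exit(0)
        os.waitpid(pid, 0)
    return (time.perf_counter() - start) / iterations
```

Call it from the process you actually care about (for example, one with your whole app loaded) and compare the result to how long the ImageMagick run itself takes.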

At scale, you might care about the imagemagick startup latency, but not the forking.


Only if you run out of memory. But with DJ at least you should be forking only one call at a time, rather than multiple, like you might from the controller itself. So although you'll end up using more memory, it'll only be one extra rails process, not 4.

Still may not be ideal... I'm interested to hear other people's ideas.



Use a queue. You should never be doing time-consuming method calls inside a controller anyway.
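A minimal in-process sketch of the idea, assuming a single background worker: the request handler only enqueues the command and returns, and the worker runs jobs one at a time. The jobs, results, worker, and enqueue names are invented here, and the standard library queue stands in for a real job system (which would also survive restarts).

```python
import queue
import subprocess
import sys
import threading

jobs = queue.Queue()
results = []


def worker():
    """Run queued commands one at a time, outside the request path."""
    while True:
        argv = jobs.get()
        if argv is None:
            # Sentinel value: stop the worker.
            jobs.task_done()
            break
        completed = subprocess.run(argv, check=False)
        results.append((argv, completed.returncode))
        jobs.task_done()


def enqueue(argv):
    """What a controller would call: returns as soon as the job is queued."""
    jobs.put(argv)


thread = threading.Thread(target=worker, daemon=True)
thread.start()

# Stand-in for the real image-processing command line.
enqueue([sys.executable, "-c", "pass"])

jobs.join()      # Wait for the queue to drain (only for this demo).
jobs.put(None)   # Ask the worker to stop.
thread.join()
```

With one worker, only one external command runs at a time, no matter how many requests come in.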


Yes, yes, yes. Beanstalkd is an easy one to set up, for example, with good Rails integration.


The solution is to use an image processing library such as RMagick, http://rmagick.rubyforge.org/


Calling into RMagick/ImageMagick from inside the request/response cycle is probably even worse than shelling out, because ImageMagick does grievous damage to your runtime.


I guess it all depends on how you design it and what you are doing. I would have to agree with the others: the out-of-request-cycle image processing solutions are definitely the right way to go overall.


Last I tried RMagick, it leaked significantly. Definitely not something I want to use in a long-lived process. I remember having to fork to use it, to work around the memory leak. If you don't need the fancier operations, there are lighter image manipulation gems out there that do just the basics, but without leaking, e.g. ImageScience http://seattlerb.rubyforge.org/ImageScience.html
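The fork workaround looks roughly like this in Python terms (a sketch, not what I actually ran): do the leaky work in a short-lived child and ship the result back over a pipe, so whatever the library leaks is reclaimed when the child exits. The run_in_child helper is made up for illustration, and the result has to be picklable.

```python
import os
import pickle


def run_in_child(func, *args):
    """Call func(*args) in a forked child and return its pickled result."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: do the work, send the result back, then exit so that any
        # memory leaked by func is returned to the OS.
        os.close(read_fd)
        status = 1
        try:
            with os.fdopen(write_fd, "wb") as out:
                pickle.dump(func(*args), out)
            status = 0
        finally:
            os._exit(status)
    # Parent: read until EOF before waiting, so large results cannot deadlock.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as inp:
        payload = inp.read()
    _, wait_status = os.waitpid(pid, 0)
    if not (os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0):
        raise RuntimeError("child process failed")
    return pickle.loads(payload)
```

For example, run_in_child(resize, path) would keep a hypothetical leaky resize function out of the long-lived process.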



