Nginx image processing server with OpenResty and Lua (leafo.net)
106 points by fcambus on Sept 20, 2013 | 31 comments



I have done something similar at a small scale. Under light load, it works great. As you grow, real problems start to arise.

The bottleneck with this implementation is ImageMagick. ImageMagick leaks a lot of memory and is generally very inefficient with resizing operations. GraphicsMagick is not much better. Under high load, this will crush your CPUs and exhaust your Nginx workers long before the setup pays for itself. You will almost definitely need to use something like OpenCV on the GPU for this to scale.

Although caching is referenced briefly in this article, it is crucial for this system to work. A good CDN with fast invalidation and a low eviction rate would be ideal.


I have to disagree with that. We've been using GraphicsMagick via cgo on memecrunch.com (Alexa rank 30K, 8K in the US) for 9 months and we're serving around 500K images per day on a single server without any issues. In fact, I've just logged in and the app is currently using 0.3% of the memory (out of 64GB) after running for a couple of weeks, since the last update I pushed to production. I don't believe GraphicsMagick leaks memory in any significant way.

Of course, you need to cache the thumbnails for a reasonable time once they're generated, otherwise the CPU usage will skyrocket, but that's pretty easy to achieve with nginx's proxy_cache directive.


VIPS can be a great replacement for gd and magick; it's not as user friendly but it's ridiculously fast.

http://www.vips.ecs.soton.ac.uk/index.php?title=Libvips


With OpenCV you can also do face detection and feature recognition for smarter crops. Thumbor does that and is used at large scale at globo.com: https://github.com/globocom/thumbor.


Figure 1 second of CPU time per image and 12 cores (2x 6-core CPUs). You can handle 12 x 60 = 720 images per minute, or more than 40,000 images per hour.
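
For anyone who wants to plug in their own numbers, here is that back-of-the-envelope estimate as a tiny Go sketch. It assumes every resize occupies one core for a fixed time and that the work parallelizes perfectly across cores, which ignores I/O, memory pressure, and cache hits. The function name `imagesPerHour` is made up for illustration.

```go
package main

import "fmt"

// imagesPerHour estimates hourly throughput, assuming each resize keeps one
// core busy for secondsPerImage seconds and cores never wait on each other.
func imagesPerHour(cores int, secondsPerImage float64) float64 {
	perCorePerHour := 3600.0 / secondsPerImage
	return float64(cores) * perCorePerHour
}

func main() {
	// 12 cores at 1 second per image, the figures from the comment above.
	fmt.Printf("%.0f images/hour\n", imagesPerHour(12, 1.0))
}
```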


I've probably done this ten times in ten different ways over the years. Most recently with ~150 lines of Go via nginx proxy_pass (or fastcgi_pass). I needed to be able to control how to resize images while maintaining aspect ratio on a fixed size canvas. The first request for an image is disappointingly slow for larger images, but it's not terrible. Writing out the generated images to a cache directory and using try_files, so subsequent requests are static, is definitely key.


Go's image module is painfully slow, supports a limited set of formats and fails on lots of optimized images.

I have also implemented this in Go, but using cgo and GraphicsMagick, which is way faster and decodes almost any image you throw at it (there are some issues with very optimized GIFs, but I fall back to gifsoup for "deoptimizing" them in those cases). In fact, I even added a function for cropping and resizing an image to a given size in the module itself (since I think it's a very common need), while keeping the aspect ratio and also giving the option to just center the result or grab the part of the image with higher entropy (e.g. suppose you have an image with a person on one side and then a lot of blue sky, you probably don't want to crop the image and end up with just sky in the thumbnail). This is just one of a few benchmarks I wrote:

BenchmarkResizePngMagick 20 80665091 ns/op 689 B/op 3 allocs/op

BenchmarkResizePngNative 1 9689016519 ns/op 351200 B/op 27 allocs/op

(yup, that's 120x faster)

The bindings are mostly documented, but I haven't gotten around to releasing the code yet, although I do hope to publish it soon. If you're interested, send me an email and I'll let you know when I put the code on GitHub.


Sent you an email. Thanks. I actually kind of like using the native Go libraries, because they're so simple I feel like I can actually trust them. But, you're right, they're really slow and strict in what they parse. I'll probably need to do something different in the future, so your code would certainly be interesting to check out.


Sent an email as well.


Did it with the perl module in Nginx with imagemagick.

Much more powerful than Nginx's built-in image handling, which uses the ancient and no longer updated GD library.


Oh and regarding "creating a system that deletes unused cached entries"

Just use tmpwatch in a cron job. You can use mtime or atime depending on your linux and capabilities to expire the resized images and delete them every so many hours (or days depending on your space).


Would you be willing to let us take a peek at that code? It could be interesting to compare and contrast to what Lua offers.


Ha, my code is ugly as Perl is not my primary coding language, would be embarrassing.

Could never get sendfile to work under perl through nginx either for some reason and they never answered my question on the nginx forums, so I had to just inefficiently dump the image directly to nginx in a buffered loop. The image is cached just like this lua code so the next read goes directly through nginx so wasn't too worried about the sendfile problem.

Code is just 50 lines though, if I can figure it out, most any coder should be able to. Just compile nginx with http://wiki.nginx.org/HttpPerlModule and then find any perl example code for imagemagick.


Haha! Ok, fair enough :)

Thanks for the details, though. That actually gave me a few ideas.


I highly recommend http://www.imgix.com/


I can recommend thumbor: https://github.com/globocom/thumbor


Since disk space is not much of a problem these days, why don't we just pre-process the images? I know this only applies if you know in advance which sizes you want, but do you really need like 20 different sizes of an image?


Because when your design requirements change, you need to reprocess your entire corpus, among other reasons.


Since the author is talking about security a bit, I'll add: be careful. If there is a bug somewhere in ImageMagick or it runs out of memory, it could easily take down an entire worker process and abort every connection there.


You have to be especially careful with ImageMagick because it calls through to format-specific libraries (libjpeg, libpng, etc.) and the implementation of THOSE libraries can have a huge impact on your application.

For instance, say you are generating thumbnails for JPEG images. Most operating systems ship with the IJG libjpeg, but a few have switched or are considering switching to libjpeg-turbo, a forked binary-compatible library that has several performance enhancements. One thing libjpeg-turbo doesn't do, though, is implement the DCT scaling functionality of libjpeg, which is a way of efficiently downscaling JPEG images without fully decoding the image (and of course has an impact on image quality as well).

The most important benefit of using DCT scaling for generating thumbnails is that it has much lower memory overhead. Since you don't need to decompress the entire image first, it can be done block by block, which means full-image sized buffers don't need to be allocated (which is what ImageMagick will try to do by default). Generating a small thumbnail of a large (10,000 x 20,000 pixel) image will allocate large amounts of memory, whereas using the DCT scaling option will allocate only small working buffers and complete much faster. If you're running an image processing server, these considerations are vital.

Long story short, if you stop thinking about your code at the level of the ImageMagick API (or whatever graphics library you choose), you can end up with more problems than you might realize.
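
To put rough numbers on the DCT scaling point: the classic libjpeg decoder can scale by 1/2, 1/4, or 1/8 during decode, so the first step is picking the largest reduction that still leaves at least the thumbnail size. The sketch below only does that arithmetic and does not call libjpeg. `dctScaleDenom` and `rgbBytes` are hypothetical helpers, and the memory figure assumes a plain 8-bit RGB buffer (ImageMagick may use more bytes per pixel depending on how it was built).

```go
package main

import "fmt"

// ceilDiv returns a/b rounded up, for positive a and b.
func ceilDiv(a, b int) int { return (a + b - 1) / b }

// dctScaleDenom returns the largest denominator d in {1, 2, 4, 8} such that
// decoding at 1/d scale still yields at least dstW x dstH pixels.
func dctScaleDenom(srcW, srcH, dstW, dstH int) int {
	denom := 1
	for _, d := range []int{2, 4, 8} {
		if ceilDiv(srcW, d) >= dstW && ceilDiv(srcH, d) >= dstH {
			denom = d
		}
	}
	return denom
}

// rgbBytes is the size of an uncompressed 8-bit RGB buffer of w x h pixels.
func rgbBytes(w, h int) int64 { return int64(w) * int64(h) * 3 }

func main() {
	srcW, srcH := 10000, 20000 // the large image from the comment above
	d := dctScaleDenom(srcW, srcH, 200, 400)
	fmt.Println("scale: 1 /", d)
	fmt.Println("full decode bytes:  ", rgbBytes(srcW, srcH))
	fmt.Println("scaled decode bytes:", rgbBytes(ceilDiv(srcW, d), ceilDiv(srcH, d)))
}
```

Under those assumptions a full decode of the 10,000 x 20,000 image needs about 600 MB for the pixel buffer alone, while a 1/8 decode needs under 10 MB before the final resize to thumbnail size.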


Agreed.


Another concern is parsing user supplied images, which opens you up to security problems in your image library.


Yes, this too. So, to sum up: it should only run isolated from everything else and be used as a service.


Has anybody done something like this with support for uploading and resizing images via a REST API with OpenResty? Do you know of any good examples or tutorials for that, or for upload support in general?


for true hipster web scale power, one must use node.js https://github.com/saml/nodejs-resize-image


Actually, OpenResty is both faster and more obscure.


...or you could just use Pixtulate (http://www.pixtulate.com)


Looks like it's free. Really?


Yes, we'll be providing a free beta coming up soon. If you are interested, I'd love to reach out to you when we launch the beta.


Thanks for posting this. Do those Lua scripts run concurrently in any way?


To enable concurrency you'll have to spawn multiple workers using the worker_processes directive http://wiki.nginx.org/CoreModule#worker_processes



