
Nginx: a caching, thumbnailing, reverse proxying image server - coleifer
http://charlesleifer.com/blog/nginx-a-caching-thumbnailing-reverse-proxying-image-server-/
======
mrb
His setup is vulnerable to hash length extension attacks:

      secure_link_md5 "my awesome secret $uri";

This means anyone can extend the URI (e.g. by appending "/../../../some/other/file"
to the filename) and compute a valid key, forcing the server to access an
arbitrary image file on the backend. If the backend is a private server that is
not publicly accessible, this may be a security issue.

This is why, in the ngx_http_secure_link_module documentation, the secret is
appended (instead of prepended); see
[http://nginx.org/en/docs/http/ngx_http_secure_link_module.ht...](http://nginx.org/en/docs/http/ngx_http_secure_link_module.html):

      secure_link_md5 "$secure_link_expires$uri$remote_addr secret";

The nginx documentation is also at fault, because it fails to explain why it is
important to append the secret rather than prepend it. I just emailed
security-alert@nginx.org to let them know.

~~~
nly
Length extension attacks work for both append _and_ prepend. I don't know if
you're right, or whether either of them is vulnerable, but the correct
solution to this problem is to use the HMAC[0] construction.

[0] [https://en.wikipedia.org/wiki/Hash-based_message_authenticat...](https://en.wikipedia.org/wiki/Hash-based_message_authentication_code)

~~~
mrb
Yes, some other attacks can be performed if the secret is appended (we don't
call them "length extension attacks"), but they do not apply in this guy's
case because the attacker does not control the URI that is hashed.

But yes, when in doubt, and to future-proof your code, it is good practice to
use an HMAC anyway. Or use SHA-3, which resists length extension whether the
secret is prepended or appended.
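
The HMAC construction is a few lines in most languages. A minimal sketch in
Python (the key and the URIs are hypothetical, not taken from the article):

```python
import hashlib
import hmac

SECRET = b"my-awesome-secret"  # hypothetical key; keep it out of URLs

def sign_uri(uri: str) -> str:
    """Return a hex HMAC-SHA256 tag binding the secret to the exact URI."""
    return hmac.new(SECRET, uri.encode("utf-8"), hashlib.sha256).hexdigest()

def verify_uri(uri: str, tag: str) -> bool:
    """Constant-time comparison; an extended URI will not verify."""
    return hmac.compare_digest(sign_uri(uri), tag)
```

Unlike a bare `md5(secret + uri)`, an attacker cannot append to the message
and recompute a valid tag without knowing the key.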

------
merlincorey
This is a cool little project showcasing some of nginx's lesser-known modules
in action.

However, I have one nitpick.

The caching server's main location is a bit of a cargo-cult unoptimization.
They have specified:

        location ^/(.+)$ {

When the equivalent:

        location / {

Is shorter, clearer, and does not incur, on every request, a regular
expression capture whose result is never used.

~~~
reipahb
The difference is that the latter also matches requests to "/", while the
regex ensures that there is at least one character after the "/".

That allows "/" to be used for other purposes, e.g. redirecting to the main
page of the site, or displaying a nice error page. (However, in this case it
simply shows the default Debian Nginx installation page:
[http://m.charlesleifer.com/](http://m.charlesleifer.com/) )

~~~
merlincorey
For that use case, I would recommend two locations like so:

        location = / {
          # Just whatever happens for "/" requests goes here
        }
        
        location / {
          # Every other request ends up here
        }

------
markdown
One of the great benefits of Google App Engine and Google Cloud Storage is that
you get this for free. Merely changing the size param at the end of the URL
resizes the pic. And since this all happens on the edge cache, it never
touches your app, and you don't get charged for bandwidth.

[http://lh4.ggpht.com/sTP-3BqnirkHm40qfb496w85A1bf7BpeXthFJ92...](http://lh4.ggpht.com/sTP-3BqnirkHm40qfb496w85A1bf7BpeXthFJ924I3ktYZoP5pD3diZqjsqob1ZXNfsw3kHe8lULgQaeN7AB9o04Ow=s200)

[http://lh4.ggpht.com/sTP-3BqnirkHm40qfb496w85A1bf7BpeXthFJ92...](http://lh4.ggpht.com/sTP-3BqnirkHm40qfb496w85A1bf7BpeXthFJ924I3ktYZoP5pD3diZqjsqob1ZXNfsw3kHe8lULgQaeN7AB9o04Ow=s960)

------
ck2
I found nginx's native image library completely inadequate a few years ago,
so I had to use the Perl module to connect it to the ImageMagick libraries
instead. Much higher quality results.

Not sure if they have improved the native library but I doubt it has the
flexibility.

The important part is caching the results, because of the expensive CPU time.

P.S. Googling about the current state of this turned up an interesting nginx
module which talks to ImageMagick (or GD) directly:
[https://github.com/cubicdaiya/ngx_small_light](https://github.com/cubicdaiya/ngx_small_light)

~~~
quicksilver03
I second this; for a recent e-commerce project we started using nginx's
image_filter, but the quality of the resized images was unacceptable.

We ended up using Thumbor, which gave us much better-looking images (but at
the cost of being extremely difficult to deploy on the CentOS 6.x servers we
had at that time).

------
jhgg
We use a similar set-up at work, but we don't do the image resizing in nginx;
instead, nginx proxies to a tiny Python WSGI app that uses requests and PIL to
do the resizing and transforming. It uses proxy_cache_lock to dedupe requests
to the backend.
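
For illustration, the dimension math such a resizer performs might look like
this (a sketch; the function name and the no-upscaling policy are my
assumptions, not details from the comment):

```python
def fit_within(width: int, height: int, max_w: int, max_h: int) -> tuple:
    """Scale (width, height) to fit inside (max_w, max_h),
    preserving aspect ratio and never upscaling."""
    scale = min(max_w / width, max_h / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```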

~~~
coleifer
If all you need is crop or resize, might as well use nginx in my opinion.

~~~
jhgg
There was one annoyance we had to fix that I don't think the nginx module
supports, and that's rotating images based on their EXIF orientation (mobile
devices love to upload landscape images with an EXIF orientation of portrait).
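
For reference, the EXIF orientation tag (0x0112) takes values 1-8, and a
common way to normalize is to map each value to "mirror horizontally first,
then rotate clockwise". A sketch of that mapping (mine, not from the comment;
worth double-checking against your EXIF reader's conventions):

```python
# value -> (clockwise rotation in degrees, mirror horizontally first?)
EXIF_ORIENTATION = {
    1: (0, False),    # normal
    2: (0, True),     # mirrored
    3: (180, False),  # upside down
    4: (180, True),
    5: (270, True),   # transpose
    6: (90, False),   # camera was rotated at capture
    7: (90, True),    # transverse
    8: (270, False),
}

def upright_ops(tag: int):
    """Operations needed to display the stored pixels upright.
    Unknown or absent tags are treated as 'normal'."""
    return EXIF_ORIENTATION.get(tag, (0, False))
```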

Python is definitely an OK solution for this, as it turns out. We resize a few
million images a day on-demand on two n1-highcpu-8 GCE instances, although
they could easily be a fourth the size, as CPU generally peaks at 20-25%
during peak hours.

We've actually tried more specialized services for this, like sharp plus a
threadpool in node.js, and it actually performed terribly. As it turns out,
blocking the gunicorn event loop while processing image resizes created a good
amount of back-pressure that load-balanced incoming requests across the worker
processes.

~~~
coleifer
That's a good point. The nginx module can rotate, but I don't know of any
built-in support for rotating with respect to EXIF. You could also use Lua and
the Lua ImageMagick bindings, though, and probably see much better performance.

------
escobar
I'd love to see some performance benchmarks for nginx's caching vs. Varnish
(memory usage, requests served). Even if nginx can do the same thing, if it
needs more RAM to do it, or handles fewer requests, I'd rather keep Varnish.

------
stephenr
A few years ago everyone was jumping on nginx, saying it was the second coming
of Jesus and that being a simple web server, without dynamic config or modules
to do processing, made it better than Apache because it's "fast".

How many of those same people are now going on about how nginx has finally got
_some_ support for loadable modules and built in processing like this?

~~~
true_religion
> How many of those same people are now going on about how nginx has finally
> got some support for loadable modules and built in processing like this?

I would doubt many of them are because early adopters tend to also be the kind
of people who don't mind compiling their own webserver for whatever reason:
time, patience, availability of dedicated hardware to do it on, etc.

~~~
stephenr
The sort of people I saw jumping to nginx were largely _not_ the sort of
people who would compile it themselves.

These are often people who would compare Apache + mod_php to nginx + PHP-FPM,
and not understand why that's not a realistic comparison of the web servers.

------
jarnix
We do something similar: when nginx doesn't find the file for a given format
(width and height are in the URL), it's a 404, and we use the 404 handler to
proxy to a PHP app that does the resize/crop (we keep a list of the formats we
use most often, and store the generated thumbnails on disk). The resize/crop
operation is centered on coordinates that we determine with OpenCV (it detects
faces and/or focus points) when the user uploads the picture.
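
The crop-around-a-focus-point step described above is simple arithmetic. A
sketch (the function name and the clamping behavior are my own assumptions,
not details from the comment):

```python
def centered_crop(img_w, img_h, crop_w, crop_h, fx, fy):
    """Return a (left, top, right, bottom) crop box of size crop_w x crop_h,
    centered on the focus point (fx, fy) but shifted to stay inside the image.
    Assumes crop_w <= img_w and crop_h <= img_h."""
    left = min(max(fx - crop_w // 2, 0), img_w - crop_w)
    top = min(max(fy - crop_h // 2, 0), img_h - crop_h)
    return left, top, left + crop_w, top + crop_h
```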

We use Varnish in front of nginx.

------
skrebbel
If you use a CDN such as Cloudflare, you can do a similar trick but forget
about the proxy cache (and its associated storage needs): as long as every
image at every size gets a unique URL and you set the HTTP cache headers
correctly, Cloudflare will store copies of every image size ever requested
across their CDN network.

In terms of this blog post, you'd only need the "resizing server". This is a
very simple and stateless way to get image thumbnailing to work. I'd also
personally not bother with all the API key stuff - how 'malicious' can it be
to generate a thumbnail of a public image?

This will even work with _third-party_ images (in the case of the app I used
this for, it was product icons from Apple's App Store). If you somehow want to
use images hosted elsewhere inside your app, but worry about speed or about
hitting their servers too much, proxy them with nginx and let Cloudflare
handle the rest.

~~~
rgbrenner
That's not quite true. Cloudflare (like virtually every CDN) only caches files
at the edge server that handled the request. And they purge files after a
short period of time, even if your cache-control headers ask for a longer
time. Most won't tell you how short (including Cloudflare), but some CDNs will
delete files after as little as two hours; 24 hours is very common. Most CDNs
also give you no visibility into what is cached and where.

In Cloudflare's case, they operate 76 data centers. So every "short period of
time" x up to 76, your server will need to regenerate each image.

So while a CDN is useful, a proxy cache can still be helpful.

(None of this applies to NuevoCloud though.. NuevoCloud has a global cache,
with dedicated caches for each customer. So it's entirely possible to keep an
image at the edge indefinitely)

~~~
true_religion
I'm curious. $99/TB seems like an incredibly high price to charge for a CDN
that only has 10 points of presence globally.

Maybe this is typical with 'website accelerators' as opposed to CDNs that
focus on caching truly static content, but even so features like dynamic
acceleration and global caching are par for the course with any traditional
CDN.

Cloudflare, which I use mind you, is non-traditional and their business model
seems more suited to DDoS prevention and edge SSL than anything else.

So what sets NuevoCloud apart?

~~~
rgbrenner
Dynamic acceleration at Cloudflare (Railgun) is $200/website. Origin shield is
$200/website at MaxCDN (could be wrong, but I don't think Cloudflare has an
origin shield equivalent). Both are included with NuevoCloud, and you can use
them on multiple websites.

Global Cache does not exist at any CDN that I'm aware of. We spent a lot of
time developing this. It's true other CDNs will cache files at each POP, when
that POP sees a request for it... but that isn't what we mean here. (I
understand the confusion.. I get a lot of questions about this, and we should
probably rename it.)

Global Cache means we have a single cache that is used/managed globally. Let's
say a visitor in France gets a cache miss for a new file on your website.
Every PoP at NuevoCloud now knows of that file and has it cached. So if the
next visitor to your website is in Tokyo, they'll get a cache hit and the file
will be served from the Tokyo PoP, even though it's the first time the Tokyo
PoP has seen a request for that file. A cache hit at a traditional CDN means
that PoP has the file; a cache hit at NuevoCloud means every PoP in the world
has that file.

The cache is managed for each customer individually with guaranteed space at
each PoP. Which means, even if you have a small website that receives no
traffic.. you can still keep your files at the edge. The only other way to get
this is to pay for a dedicated CDN.

Finally traditional CDNs send requests from the edge directly to your server
(client > edge node > your server) which is a long distance connection.
Connections through NuevoCloud are routed through our network: client > edge
node > edge node > server. So both the client and server are talking to an
edge node near them. This speeds up SSL and connection negotiation between the
edge and your server. We also use this setup to dynamically route requests
around high latency, network partitions, etc.

I put our 10 points against other CDNs daily... and it's faster. You would be
surprised at how dumb the average CDN is when handling requests. This is the
reason they scatter edge nodes all over the place.

~~~
skrebbel
Hey, thanks for the detailed comment spam. I might become a customer in the
foreseeable future!

------
frik
I was wondering whether an additional cookie-based security token is possible
with the secure_link module (show the image only if a session cookie is present).

Apparently, it is possible, great! One of the google results:
[https://gist.github.com/hilbix/5921589](https://gist.github.com/hilbix/5921589)

~~~
zzzcpan
The secure_link module shouldn't be used for anything security-related. I
think it uses plain MD5, without even an HMAC.

------
leafo
I wrote a similar article a few years back that employs Lua and imagemagick
bindings:
[http://leafo.net/posts/creating_an_image_server.html](http://leafo.net/posts/creating_an_image_server.html)

~~~
coleifer
Yeah, I'm pretty sure I linked to your post. I was going to use your script
before I realized I could do everything with plain old nginx.

------
m_mueller
Could someone here explain to me how it's possible that a bug like [1],
concerning the main functionality of nginx (static file routing), has been
open for many years, and yet nginx keeps its popularity? Is that really the
best we've got?

[1]
[https://trac.nginx.org/nginx/ticket/97](https://trac.nginx.org/nginx/ticket/97)

~~~
DevOpsTiger
[https://trac.nginx.org/nginx/changeset/2797b4347a2af4e8fd46d...](https://trac.nginx.org/nginx/changeset/2797b4347a2af4e8fd46dc5d1def5083813f38f1/nginx)

~~~
m_mueller
So, do I get this right: the bug has been fixed since mid-2015, but the ticket
wasn't updated?

------
mozumder
Does the image_filter module use the GPU?

Image filtering is an expensive operation for the CPU, with large latencies as
well.

~~~
vardump
Actually, GPU latencies and transfer costs dominate most _simple_ image
filtering operations. A GPU might be fast _once_ the data is in GPU-local RAM,
but if just transferring the data back and forth takes 3x the time of running
the filter locally, what's the point?

You're better off performing it on a CPU, SIMD-optimized of course.

Besides, web servers probably won't have GPUs anyway.

~~~
mozumder
Intel is building entire Xeon lines with embedded GPUs for serving web
imagery.

And these Xeons with embedded GPUs don't need to transfer data to the
coprocessor, since they operate out of system memory, so transfer latencies
are a non-issue.

------
hcarvalhoalves
Similar idea, but more flexible image manipulation:
[https://github.com/hcarvalhoalves/django-rest-thumbnails](https://github.com/hcarvalhoalves/django-rest-thumbnails)

------
jemfinch
I wish the author had gone into more detail about why they chose to stop using
Varnish.

~~~
coleifer
I stopped using Varnish because, after all, my site is just a blog and nginx
is just fine for my needs. It has a single backend server, so there's no
load-balancing. Basically, I was already using nginx to serve static files,
and my caching logic was so simple it was actually easier to move it into
nginx.

~~~
dwightgunning
Perhaps better a better question is why you were using Varnish in the first
place?

I'm genuinely curious because I've never stood up Varnish before so maybe
there's an anti-pattern / gotcha to learn from.

~~~
frankwiles
Well, without some sort of caching, his site could very well fall over when it
hits the front page of something like HN. Putting Varnish in front of a site,
even with something simple like a 1-minute cache on everything, makes you
pretty much immune to "large amounts of traffic" being a real problem.

~~~
coleifer
Exactly, thanks frank! :P

~~~
dwightgunning
By removing Varnish, aren't you now exposed to that risk once again?

I feel like I am missing something.

~~~
mfjordvald
He would have been if he hadn't replaced the Varnish functionality with nginx.

He's basically just moving functionality from Varnish to nginx and thus
simplifying his stack a bit.

------
leesalminen
Very cool! I've been playing around with nginx-lua to handle our shortened
URLs without having to invoke the upstream application.

------
wildmXranat
Clever and thanks for sharing. It was a good read.

