Hacker News

Heads up: a simple yet production-ready NGINX location block to proxy to a public S3 bucket looks like:

    # matches /s3/*
    location ~* /s3/(.+)$ {
        set $s3_host 's3-us-west-2.amazonaws.com';
        set $s3_bucket 'somebucketname';

        proxy_http_version 1.1;
        proxy_ssl_verify on;
        proxy_ssl_session_reuse on;
        proxy_set_header Connection '';
        proxy_set_header Host $s3_host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Authorization '';
        proxy_hide_header x-amz-id-2;
        proxy_hide_header x-amz-request-id;
        proxy_buffering on;
        proxy_intercept_errors on;
        # proxy_pass with a variable requires an explicit resolver;
        # the IP here is an example, use your environment's DNS server
        resolver 8.8.8.8 valid=30s;
        resolver_timeout 10s;
        proxy_pass https://$s3_host/$s3_bucket/$1;
    }

Adding NGINX caching on top of this is pretty trivial.
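To sketch what that might look like (the zone name s3_cache, cache path, and TTLs here are illustrative, not from the config above):

    # http {} context
    proxy_cache_path /var/cache/nginx/s3 levels=1:2 keys_zone=s3_cache:10m
                     max_size=1g inactive=60m;

    # inside the location block above
    proxy_cache s3_cache;
    proxy_cache_valid 200 301 302 60m;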

Also, heads up: in the proxy_cache_path directive, they should consider the "use_temp_path" parameter. Setting it to off instructs NGINX to write temporary files to the same directories where they will be cached; we recommend setting it to off to avoid unnecessary copying of data between file systems. use_temp_path was introduced in NGINX 1.7.10 and NGINX Plus R6.
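For example (the cache path and zone name are placeholders):

    proxy_cache_path /var/cache/nginx/s3 levels=1:2 keys_zone=s3_cache:10m
                     max_size=1g inactive=60m use_temp_path=off;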

Also, they should enable "proxy_cache_revalidate". When a cached item expires, NGINX revalidates it with a conditional GET (If-Modified-Since), so the server sends the full item only if it has been modified since the time recorded in the Last-Modified header. This saves on bandwidth.

    proxy_cache_revalidate on;


This is vulnerable to path traversal attacks. If someone passes a URL such as yoursite/s3/../EVIL_BUCKET/EVIL.js, all of a sudden your site is serving someone else's content. Bad idea. Use virtual-hosted-style buckets instead, i.e. s3_bucket.s3_host/content.
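A virtual-hosted-style variant of the block above might look like this (bucket and region names are the same placeholders as before):

    location ~* /s3/(.+)$ {
        set $s3_host 'somebucketname.s3-us-west-2.amazonaws.com';
        proxy_set_header Host $s3_host;
        # the bucket no longer appears in the path, so a crafted
        # /s3/../other-bucket/... request cannot escape into another bucket
        proxy_pass https://$s3_host/$1;
    }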

In our case, we don't need to ever revalidate. We store things forever since our file blobs are immutable.

Immutable blobs are really the right choice with S3, as it's eventually consistent (when using it as a blob store, anyway; if you're hosting a static site or similar, it's a bit tricky to make everything immutable and not necessarily worth the effort).

We even go a step further, and our blobs are 100% content addressable. :) So caching is super easy for us.

Yep, file SHAs are a great choice. UUIDs are typically fine too.

One sort of weird case is if I have an image key (sha-based) and want to store thumbnail sizes: 'bae6ff187e4c491e5de9cfa3b039ce7da8255798' makes sense as a base key, but really I want bae6ff187e4c491e5de9cfa3b039ce7da8255798/400x400 for thumbnails rather than storing individual thumbnail shas, hah.

Or... use CloudFront. It will probably be much cheaper to use CloudFront than to pay for the instance scaling required as your traffic increases.

Is CloudFront expected to have better uptime than s3?

One argument for self hosting the proxy is that I don't care if s3 is working when my server is down anyway.

CloudFront, as with most CDNs, has very good uptime.

Remove the C in CAP and you can go far.

Good config, but if you're not defining the proxy in an upstream {} block, you can't make use of the keepalive directive, which keeps a number of connections to the backend open at any time, reducing the RTT for an actual request.
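A sketch of that upstream form (names are the same placeholders as the original config; note that keepalive also requires HTTP/1.1 and a cleared Connection header, which the config above already sets):

    upstream s3_backend {
        server s3-us-west-2.amazonaws.com:443;
        keepalive 16;
    }

    location ~* /s3/(.+)$ {
        proxy_http_version 1.1;
        proxy_set_header Connection '';
        proxy_set_header Host 's3-us-west-2.amazonaws.com';
        proxy_pass https://s3_backend/somebucketname/$1;
    }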

This is bad for stuff like this because nginx doesn't re-resolve DNS records after process startup when the hostname is written literally into the config. So if an IP address behind the hostname changes, things will just hard stop working. Using the hostname as a variable coerces nginx into actually resolving DNS regularly to pick up changes like a normal client.
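Concretely, the variable form needs an explicit resolver to work (the resolver IP and TTL override here are examples):

    resolver 8.8.8.8 valid=300s;
    set $s3_host 's3-us-west-2.amazonaws.com';
    # because $s3_host is a variable, nginx resolves it at request time,
    # honoring the DNS TTL (or the valid= override above)
    proxy_pass https://$s3_host/somebucketname/$1;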

