S3 bucket replication is a bit flawed and wouldn't buy us anything on top of what we implemented. It would also double our storage costs, plus require more complex application logic, which I wanted to avoid. Tools like HAProxy are pretty good at this.
With S3 replication, you still have a primary/replica setup in which only one of them can accept writes, though you can read from both. So we'd gain HA between multiple regions, but we wouldn't solve our original goal: speed. The round trip to S3 was too slow for us.
Very cool. I did read the article but I missed that speed was important, so the custom solution was an excellent choice.
Another idea might be to use Varnish for the caching layer, but I haven't compared Varnish to nginx in many years, so the gap has probably closed by now?
Good work. I've stuffed this one in the back pocket for future use.
I have tons of experience with Varnish and a long history there. Varnish is really bad for this since it's memory only. I wanted to use 750GB of disk space, not the 32GB of RAM we had. 750GB of RAM is significantly more expensive.
And in our case, the performance difference between reading some bytes from disk vs memory isn't significant. A disk seek is still orders of magnitude faster than a round trip to Amazon.
With that said, Varnish does offer the ability to use mmap'd files, but the performance is really appalling out of the box, and just not worth it. Varnish is way better if you want a strictly in-memory cache.
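For anyone curious, the file-backed storage I'm referring to is selected with varnishd's `-s` flag at startup. A rough sketch, with placeholder path and size (not a recommendation, for the reasons above):

```shell
# Hypothetical invocation: back the cache with an mmap'd file
# instead of RAM. Path and size here are placeholders.
varnishd -a :6081 -f /etc/varnish/default.vcl -s file,/var/varnish/cache.bin,750G
```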
Another benefit of nginx is the cache won't be dumped if the process restarts, unlike Varnish.
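To make the comparison concrete, a disk-backed nginx cache in front of S3 can be sketched roughly like this. The paths, sizes, timings, and bucket hostname below are all placeholders, not the actual setup from the article:

```nginx
# Minimal sketch of a disk-backed cache in front of S3.
# All names and numbers here are illustrative placeholders.
proxy_cache_path /var/nginx/cache levels=1:2 keys_zone=s3cache:100m
                 max_size=750g inactive=30d use_temp_path=off;

server {
    listen 80;

    location / {
        proxy_cache s3cache;
        proxy_cache_valid 200 30d;
        # Serve stale entries rather than hitting S3 on errors.
        proxy_cache_use_stale error timeout updating;
        proxy_pass https://example-bucket.s3.amazonaws.com;
    }
}
```

Because the cache lives in `proxy_cache_path` on disk, entries survive a process restart, which is the benefit mentioned above.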
I'll have to look into this. I wasn't aware. Either way, I don't think that'd replace our current setup since the original intent wasn't to increase our availability. But good to know!