For the past week, I've been working on writing a little static site generator and putting it on S3, and I thought I was pretty damn clever for finally getting around to doing that... except now it turns out I'm clueless yet again. I'm looking at Cloudfront now, but I'm still not sure if it has all the features I was expecting out of S3 alone (someone already mentioned Route 53 integration).
S3 is for storing files, Cloudfront is for serving cached versions of them really quickly out of edge locations (i.e. the closest CDN datacentre to the user that requests it).
In your use case, your best bet is to use them in combination. Set up a Cloudfront distribution to point to your S3 bucket, then set up DNS to point to your Cloudfront distribution.
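If it helps, here's a rough sketch of that part with the AWS CLI. The bucket name is made up, and newer versions of the CLI have a shorthand for this whose exact flags may differ from what's below, so treat it as an illustration rather than gospel:

# create a distribution whose origin is the S3 bucket (bucket name is made up)
$ aws cloudfront create-distribution \
    --origin-domain-name my-site-bucket.s3.amazonaws.com \
    --default-root-object index.html
# then point a CNAME (or a Route 53 ALIAS) at the dXXXXXXXXXXXX.cloudfront.net
# domain name the command returns

The DNS part is just a record, so you can do that from whichever DNS provider you already use.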
One big point to note is that you need to either:
* Configure Cloudfront to expire objects after a TTL (time-to-live) that is reasonable to you (e.g. 1 hour, 1 day etc). You can do this from the Cloudfront 'new distribution' wizard.
* Let Cloudfront respect HTTP headers and then make S3 (or whatever your custom origin is) set Cache-Control headers that make sense for how often you update your site. Not sure if/how you can do this with S3 (a rough sketch is below); with a custom origin it's your app, so you can set whatever HTTP headers you like.
To be clear: if you don't do one of these, Cloudfront falls back to a default TTL and caches things for a long time (24 hours, last time I checked).
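On the "not sure if/how you can do this with S3" bit: you can set Cache-Control on the objects as you upload them, and CloudFront will pass those headers through. A rough sketch with the AWS CLI (bucket name and TTL are made up, adjust to taste):

# upload the generated site, attaching a Cache-Control header to every object
$ aws s3 sync ./public s3://my-site-bucket/ \
    --cache-control "max-age=3600, public"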
Personally, I think triggering cache invalidations should only be for emergencies (e.g. someone has uploaded questionable content to serve to other users and it's cached at an edge). Rather than screwing around with that, save yourself some headaches: pick a sensible TTL and wait a little longer to have things up to date at your edges.
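For completeness, if you do hit one of those emergencies, an invalidation is a one-liner with the AWS CLI; the distribution ID and path below are placeholders:

# purge a single object from all edge locations
$ aws cloudfront create-invalidation \
    --distribution-id E1234EXAMPLE \
    --paths "/images/offending-object.jpg"

But with a sensible TTL you should rarely need it.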
Note that by using Cloudfront in this manner, the performance benefit comes from the edge cache rather than from the files being static; you'd get much the same benefit with a dynamically generated origin behind Cloudfront. If performance at the expense of convenience was your main reason for going with static site generation, you might want to rethink that decision (there are other perfectly good reasons for wanting to use a static site generator, security being my favourite).
Do feel free to ping me over email if you have any questions on the above.
Great comment. I didn't get this part though. What would be a better alternative?
Performance is one of the main reasons I considered this (uptime is another). Let's just say I've had the same shared hosting for over 5 years and the speed/uptime have been a disappointment for a long time. When I was working on a site and noticed a 500 KB background image taking 2 seconds to load, and around the same time saw that the Spotify homepage was streaming a fullscreen video instantly, that was kind of the last straw.
So I thought the idea was that skipping dynamic generation, getting some distribution (well, I think my assumption was that S3 did have edge locations), and just having a better host would be a big win.
To clarify: I'm not saying that static assets behind an edge cache are not performant, just that a dynamically generated site behind an edge cache is effectively the same performance-wise. It's probably not worth the sacrifice in convenience if performance was your main reason for going the static generated route.
I can't speak to the argument from uptime as I haven't used shared hosting in a while. Using an edge cache (without S3) might give you a little help there, as it only needs to hit your shared hosting on cache expiry, but that obviously won't be as safe as statically generating the files and making CF read them out of S3.
I think S3 behind CF is a perfectly good approach. I was just saying that if you've currently got a dynamically generated site and are considering moving to static generation because of performance alone, the trade-off probably isn't worth the effort.
I wouldn't advise you personally to go back on that decision at all, especially because of the issues you've seen with uptime.
> I think my assumption was S3 did have edge locations
I think you get this from the rest of my comments but just to be clear: I've only ever experienced bad download times from S3 and would not feel comfortable recommending that you use it to serve traffic to the internet directly. It's not what S3 is for, so you shouldn't expect good performance from it in that use case.
I don't think that's a fair comparison - it seems unlikely to me that the video requires much data to get started, or a sustained 250kB/s (about 2 megabits/s) connection to play.
In that case, you get the best of both worlds.
Taking Vine as an example, I picked a random vine (https://vine.co/v/bE3YI365gxd) and a popular vine (https://vine.co/v/bEFFxdwjK9x) from Twitter. The popular video and thumbnails are served from a CDN, while the new one with no traffic is served directly from S3.
YouTube did the same thing in its early days. CDNs are not required for all traffic, and blindly recommending them is a bad precedent. They're really great for content that is frequently accessed, but their value greatly decreases on long-tail content.
While I can't compare CloudFront to other CDNs, I do know it works well for my clients that have been using it (and certainly better than serving directly from S3).
On paper the Rackspace one looks like a great performance/price alternative.
Depending on the nature of the content, Cloudfront may not be usable for video, particularly the kind where people seek around a lot, like instructional videos and tutorials. Also, people with slow connections expect the video to buffer while paused. That doesn't happen if you serve the videos from Cloudfront.
If that is the only problem, flipping one bit in every response seems like a really simple solution. Why hasn't Cloudfront fixed it yet, do they know about this?
- - -
While we are aware of the issue with range request HTTP/1.0 206 responses and Chrome, we cannot provide an ETA for a fix. Since this issue is specific to range requests, an immediate workaround is to disable range requests on your origin server if this is possible for your use case.
It is also worth mentioning that multiple web proxy and cache application vendors have been using HTTP/1.0 as a de facto standard for many years, so you will probably sporadically get similar reports from your end users using Chrome, but not other browsers such as Firefox or Safari. For example, here is a discussion with a Chrome developer on the mailing list for the popular Squid web cache about a similar report:
I am not saying that always returning HTTP/1.0 will stick around forever, but it is fairly common in real world situations today.
To see this in action, play a video with chrome://media-internals/ open.
A video served from S3 will get saved to the media cache, and the full video can buffer when paused. A video served from cloudfront will only buffer a few seconds of video.
It is probably just easier to see for yourself:
Here is a video served from S3. Pause the video in chrome, and you will see the whole video gets buffered:
Here is the same video served from Cloudfront. Start the video, and pause it, and notice how it doesn't buffer more than a few seconds:
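If you'd rather check from the command line than in Chrome, ask for a byte range and look at the status line. The hostname and path here are made up; per the reports above, CloudFront answers with HTTP/1.0 while S3 answers with HTTP/1.1:

# request the first 1 KB and print only the response status line
$ curl -s -D - -o /dev/null -H "Range: bytes=0-1023" \
    "http://d111111abcdef8.cloudfront.net/videos/example.mp4" | head -n 1
# expect something like: HTTP/1.0 206 Partial Content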
Do you know of any workarounds client-side?
I don't think there is one. You can't even compile a custom Chromium because it doesn't have H.264 codecs. I even spent a day reading the Chrome source code, hoping there was some combination of headers that could trick Chrome into using the media cache, but I didn't find anything. I probably should have just spent the day finding a better CDN.
I'm not a big fan of Akamai's control panel -- setting up new distributions is way too complicated (albeit more flexible), and configuration changes were taking almost a day to propagate.
Amazon promised a fix, but then backtracked. Their explanation was something along the lines of "an old version of Squid has the same problem, thus it is okay". I cannot comprehend the logic they employ to think that is a good reason to live with the bug.
/co-founder at Advection.NET
A CDN that supports Range Requests correctly. I want to use S3 as the origin. I need an edge location in Australia. Also, there can be no 301/302 HTTP redirects, which is why I have eliminated Google as a potential offering. My current spend is around $1000 a month, so it is a tiny account.
malatortsev at advection dot net
Furthermore, your profile suggests you work for a pump-and-dump penny stock company (basically a scam). If your employer is paying you in something other than cash, you need to walk away ASAP.
There are lots of video CDN "solutions", and it's almost always cheapest (even after labor and support costs) to DIY with bare metal at very large scale. If it were me, I would evaluate video CDN shops using tsung test cases wired up as Nagios checks. Gotta make sure their stuff stays working.
A payment gateway once mistakenly deployed API changes to production without notice. Trust no one.
Has anyone evaluated http://live.bittorrent.com?
If you simply need to deliver files or live streams, without needing to provide complex functionality at the edge (various kinds of protection, geo blocking, or pay-per-minute), and your traffic patterns are predictable - it's often cheaper to build your own solution. Once you start thinking about backbone and colo redundancy, deploy in different countries with contract commits - things get expensive very quickly.
The beauty of using a massive third party delivery service isn't performance, it's elasticity. Just like with the web apps (frequently hosted on DIY systems) that go down as soon as the link goes up on HN - being able to absorb traffic spikes without failing (and without forcing you to commit to a higher tier for a year) can be very valuable.
I sign my own paycheck, so the notion of whether he is overpaying is a bit confusing.
Edit: Also, I don't appreciate you posting that. It's completely off-topic. Keep it classy.
You are involved in a stock fraud. The company you work for is a sham.
If you live in the US, then you have a plausible defense that you have no understanding of the underlying business. In this case, you likely can't afford the lawyer to present this case.
If you don't live in the US, then be careful. Imagine, ten years down the road, you are a successful engineer, and want to take your family to Disney world. Unfortunately, there is an outstanding bench warrant for your arrest, and rather than a nice family vacation that your wife wanted, you end up in a US prison.
Set up enough identical boxes with each of Squid, Nginx, Varnish, trafficserver, etc., and evaluate each with basically the same traffic and however much tweaking.
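A quick-and-dirty way to run that comparison (hostnames and the URL list are made up; tsung or similar is the proper tool, this is just the sanity-check version):

# replay the same list of paths against each candidate cache box and
# compare status codes and total transfer times
$ for box in squid-box nginx-box varnish-box ats-box; do
    echo "== $box =="
    while read -r path; do
      curl -s -o /dev/null -w "%{http_code} %{time_total}s  $path\n" "http://$box$path"
    done < urls.txt
  done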
Simple caching works for images, but doesn't work for large video files, for example (look at latest financials from public CDNs - they are all bleeding cash).
It's really not as simple as testing a box to see which setup works best.
"Rackspace uses 213 of Akamai's edge locations, selected especially for our customers' typical usage patterns, and designed to cover all major areas of the globe." http://www.rackspace.com/cloud/files/
Akamai has approximately 5 gazillion edge locations, so yes, it is a cut down version. It is still a lot more POPs than just about every other CDN, though this doesn't necessarily translate into performance.
"though this doesn't necessarily translate into performance" - Yes, even when the files were already cached, I wasn't always satisfied with the performance.
A detail that bugged me, by the way, was the high number of CNAME requests, although this should at worst affect the first view.
1) POST (as well as PUT, DELETE, OPTIONS and CONNECT) is not yet supported on CloudFront.
2) HTTPS/SSL for your own domain is not yet supported
$ curl -I http://phaven-prod.posthaven.netdna-cdn.com/uploads%2F2013-05-17%2F20%2F3128%2FErQE0vKlNMIeNvaxbneY75nWy
HTTP/1.1 403 Forbidden
Date: Sat, 18 May 2013 20:31:08 GMT
PostHaven cut off the URI:
curl -I http://phaven-prod.posthaven.netdna-cdn.com/uploads%2F2013-0...
HTTP/1.1 200 OK
Date: Sat, 18 May 2013 20:34:13 GMT
Last-Modified: Sat, 18 May 2013 00:47:10 GMT
You're not making the web faster, you're shilling for your employer by comparing their apples to a competitor's oranges and proclaiming "our competitor's oranges make bad apple sauce!"
Failure to mention CloudFront is disingenuous.
To be honest, your post feels like spam for MaxCDN.
Pro-tip: If you're using Rails, just create a distribution with your app as the origin server and, in production.rb, set your asset host to your distribution's host. You get the asset cache without having to do the precompile step. Tastes great with Heroku.
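In case it's useful, the gist of it: in config/environments/production.rb you point config.action_controller.asset_host at the distribution's domain (e.g. "d111111abcdef8.cloudfront.net", a made-up placeholder), and then you can sanity-check that assets really are coming off the edge with something like this (the asset path is just illustrative):

# CloudFront adds an X-Cache header saying whether the request was a hit or a miss
$ curl -sI "http://d111111abcdef8.cloudfront.net/assets/application.css" | grep -i '^x-cache'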
I used to do the same until cloudfront rolled out custom origins.
> Since Heroku will precompile by default, what do you do to disable it?
I misspoke in my comment; what I meant was you can skip the sync-with-S3 step. I've never actually bothered to stop Heroku precompiling assets (though I may as well). This question on SO looks promising:
If you do that though, as you mentioned, you will definitely need to flip serve_static_assets on.
I switched to CloudFront and, of course, downloads improved dramatically. I had to go with CF because we needed signed downloads. It would be nice to have alternatives but I'm happy with CF.
I have my static assets on a sub-domain, so I just set Cloudflare to cache everything on that subdomain (and left it off on everything else).