
CDN vs S3 - mclarke
http://jdorfman.posthaven.com/medium-bitcoin-660x493-dot-jpg-cdn-vs-s3
======
jere
For everyone saying this is obvious: I'm one of those idiots that didn't quite
get it and I appreciate this post a lot.

For the past week, I've been working on writing a little static site generator
and putting it on S3 and I thought I was pretty damn clever for finally
getting around to doing that... except now it turns out I'm clueless yet
again. I'm looking at Cloudfront now, but I'm still not sure if it has all the
features that I was expecting out of S3 alone (someone already mentioned Route
53 integration).

~~~
alinajaf
I don't think you're an idiot, it's more likely that you were just unaware of
cloudfront or perhaps CDNs in general.

S3 is for storing files, Cloudfront is for serving cached versions of them
really quickly out of edge locations (i.e. the closest CDN datacentre to the
user that requests it).

In your use case, your best bet is to use them in combination. Set up a
Cloudfront distribution to point to your S3 bucket, then set up DNS to point to
your Cloudfront distribution.

One big point to note is that you need to either:

* Configure Cloudfront to expire objects after a TTL (time-to-live) that is reasonable to you (e.g. 1 hour, 1 day etc). You can do this from the Cloudfront 'new distribution' wizard.

OR

* Let Cloudfront respect HTTP headers and then make S3 (or whatever your custom origin is) set cache-control headers that make sense for how often you update your site. Not sure if/how you can do this with S3; with a custom origin it's your app, so you can set whatever HTTP headers you like.

To be clear: if you don't do this, I'm pretty sure cloudfront caches things
forever, or at least a very long time.
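
For the second option, S3 can store a Cache-Control value as object metadata at upload time (boto3's `put_object`, for instance, accepts a `CacheControl` argument). Below is a minimal sketch of choosing that value per file type. The TTL table, the default, and the helper names are assumptions for illustration, not anything from Cloudfront or S3 themselves.

```python
from pathlib import PurePosixPath

# Hypothetical TTL policy (seconds) keyed by file extension. Tune it to how
# often each kind of file actually changes on your site.
TTL_BY_SUFFIX = {
    ".html": 3600,      # pages: 1 hour
    ".css": 86400,      # stylesheets: 1 day
    ".js": 86400,       # scripts: 1 day
    ".jpg": 604800,     # images: 1 week
    ".png": 604800,
}
DEFAULT_TTL = 3600


def cache_control_for(key: str) -> str:
    """Return a Cache-Control header value for an object key."""
    suffix = PurePosixPath(key).suffix.lower()
    ttl = TTL_BY_SUFFIX.get(suffix, DEFAULT_TTL)
    return f"public, max-age={ttl}"


def upload_args(bucket: str, key: str) -> dict:
    """Build keyword arguments for an upload call that sets the header."""
    return {"Bucket": bucket, "Key": key, "CacheControl": cache_control_for(key)}
```

The dict from `upload_args` is shaped so it could be merged into an upload call along with the file body. Whether the edge honours the header still depends on how the distribution is configured.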

Personally, I think triggering cache invalidations should only be for
emergencies (e.g. someone has uploaded questionable content to serve to other
users and it's cached at an edge). Rather than screwing around with that,
save yourself some headaches: pick a sensible TTL and wait a little longer to
have things up to date at your edges.

Note that by using Cloudfront in this manner, you get the performance benefit
of serving static files. If performance at the expense of convenience was your
main reason for going with static site generation, you might want to rethink
that decision (there are other perfectly good reasons for wanting to use a
static site generator, security being my favourite).

Do feel free to ping me over email if you have any questions on the above.

~~~
jere
> If performance at the expense of convenience was your main reason for going
> with static site generation, you might want to rethink that decision (there
> are other perfectly good reasons for wanting to use a static site generator,
> security being my favourite).

Great comment. I didn't get this part though. What would be a better
alternative?

Performance _is_ one of the main reasons I considered this (uptime is
another). Let's just say I've had the same shared hosting for over 5 years and
the speed/uptime have been a disappointment for a long time. When I was
working on a site and noticed a 500kb background image was taking 2 seconds to
load and around the same time I saw that the spotify homepage was streaming a
fullscreen video instantly, that was kind of the last straw.

So I thought the idea was that skipping dynamic generation, using distribution
(well, I think my assumption was S3 _did_ have edge locations), and just
having a better host was a big win.

~~~
alinajaf
> I didn't get this part though. What would be a better alternative?

To clarify: I'm not saying that static assets behind an edge cache aren't
performant, just saying that a dynamically generated site behind an edge cache
is effectively the same performance-wise. It's probably not worth the
sacrifice in convenience _if_ performance was your main reason for going the
static generated route.

I can't speak to the argument from uptime as I haven't used shared hosting in
a while. Using an edge cache (without S3) might give you a little help there,
as it only needs to hit your shared hosting on cache expiry, but that
obviously won't be as safe as statically generating the files and making CF
read them out of S3.

I think S3 behind CF is a perfectly good approach. I was just saying that if
you've currently got a dynamically generated site and are considering moving
to static generation because of performance alone, the trade-off probably
isn't worth the effort.

I wouldn't advise you personally to go back on that decision at all,
especially because of the issues you've seen with uptime.

> I think my assumption was S3 did have edge locations

I think you get this from the rest of my comments but just to be clear: I've
only ever experienced bad download times from S3 and would not feel
comfortable recommending that you use it to serve traffic directly from the
internet. It's not what S3 is for and so you shouldn't expect good performance
from it in that use case.

------
Icer5k
True, S3 is not a CDN, but for a lot of use cases, serving directly from S3 is
fine.

Taking Vine for example, I picked a random vine from twitter
(<https://vine.co/v/bE3YI365gxd>) and a popular vine from twitter
(<https://vine.co/v/bEFFxdwjK9x>). The popular video and thumbnails are served
from a CDN, where the new one with no traffic is served directly from S3.

YouTube did the same thing in its early days. CDNs are not required for all
traffic, and blindly recommending them is a bad precedent. They're really
great for content that is frequently accessed, but their value greatly
decreases on long-tail content.
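
A rough way to see the long-tail point is to simulate a single edge cache with a fixed TTL and compare hit ratios. This is only a sketch: the traffic below is made up, and it ignores eviction, multiple edge locations, and anything else a real CDN does.

```python
def hit_ratio(requests, ttl):
    """Simulate one edge cache with a fixed TTL.

    requests: iterable of (timestamp_seconds, key), sorted by time.
    Returns the fraction of requests served from cache.
    """
    expires_at = {}  # key -> time at which the cached copy expires
    hits = total = 0
    for ts, key in requests:
        total += 1
        if key in expires_at and ts < expires_at[key]:
            hits += 1
        else:
            # Miss: the edge fetches from the origin and caches the object.
            expires_at[key] = ts + ttl
    return hits / total if total else 0.0


# Made-up traffic over one hour: one popular video requested every 10 seconds,
# and 360 long-tail videos requested exactly once each.
popular = [(t, "popular") for t in range(0, 3600, 10)]
long_tail = [(t, f"rare-{t}") for t in range(0, 3600, 10)]
```

With a one-hour TTL, the popular object misses only on its first request, while every long-tail request is a miss and goes back to the origin anyway.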

~~~
obviouslygreen
When S3 first came out, people used it for this without considering its
latency issues, and it was a disaster. S3 _is not a CDN_ and shouldn't be used
as one; this is the reason for CloudFront.

While I can't compare CloudFront to other CDNs, I do know it works well for
my clients that have been using it (and certainly better than serving directly
from S3).

------
jcastro
Since this thread will likely turn into people asking about Cloudfront
performance, does anyone have any real-world experience with CloudFront vs.
Rackspace with Akamai CDN?

On paper the Rackspace one looks like a great performance/price alternative.

~~~
jcampbell1
I am in the process of abandoning Cloudfront, because they have a serious bug
when serving video files. They serve HTTP 206 (Range-Get) as HTTP 1.0 but 206
didn't exist in HTTP 1.0. Chrome and Firefox treat this as "uncacheable", thus
media assets bypass the local media cache.
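
For context, seeking in a video relies on range requests: the player asks for a byte range and the server answers 206 Partial Content with a Content-Range header. The sketch below shows only that mechanism for a single range. It does not reproduce the Cloudfront behaviour described above, and the function names are made up for illustration.

```python
def parse_range(header: str, size: int):
    """Parse a single-range 'bytes=' header into (start, end), inclusive.

    Returns None if the header is malformed, multi-range, or unsatisfiable.
    """
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes" or "," in spec or "-" not in spec:
        return None
    first, _, last = spec.strip().partition("-")
    try:
        if first == "":
            # Suffix form, e.g. "bytes=-200": the last N bytes of the file.
            length = int(last)
            if length <= 0:
                return None
            start, end = max(size - length, 0), size - 1
        else:
            start = int(first)
            end = int(last) if last else size - 1
            end = min(end, size - 1)  # clamp to the end of the file
    except ValueError:
        return None
    if start > end or start >= size:
        return None
    return start, end


def partial_response_headers(start: int, end: int, size: int) -> dict:
    """Headers a server would send alongside a 206 Partial Content reply."""
    return {
        "Content-Range": f"bytes {start}-{end}/{size}",
        "Content-Length": str(end - start + 1),
        "Accept-Ranges": "bytes",
    }
```

A player seeking to the middle of a 1000-byte file might send `Range: bytes=500-`, which this parses to `(500, 999)`.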

Depending on the nature of the content, Cloudfront is not usable for video,
particularly the kind where people seek around a bunch, like instructional
videos and tutorials. Also, people with slow connections expect the video to
buffer while paused. This doesn't happen if you serve the videos from
cloudfront.

~~~
jefe78
You're doing something horribly wrong. I work for a live streaming company and
we make extensive use of Varnish. It can probably solve the problem you're
describing.

~~~
jcampbell1
I'm not doing a damn thing wrong other than using Cloudfront. The problem is
on their end, not mine. Thinking Varnish could solve this problem is utterly
confused. Do you know what a CDN does? CDNs have servers located around the
world so files are loaded quickly and with low latency.

Furthermore, your profile suggests you work for a pump and dump penny stock
company (basically a scam). If your employer is paying you in something other
than cash, you need to walk away asap.

~~~
ballard
Sounds like troll bait, recall the recent article about PG's modding
algorithms. Life's too short.

There's lots of video CDN "solutions", and it's almost always cheapest (even
after labor support) to DIY with bare metal at very large scale. If it were
me, I would eval video CDN shops using tsung test cases wired up as nagios
checks. Gotta make sure their stuff stays working.*

*A payment gateway once mistakenly deployed API changes to production
without notice. Trust no one.

Anyone evaluated? <http://live.bittorrent.com>

~~~
photorized
_and it's almost always cheapest (even after labor support) to DIY with bare
metal at very large scale_

If you simply need to deliver files or live streams, without needing to
provide complex functionality at the edge (various kinds of protection, geo
blocking, or pay-per-minute), and your traffic patterns are predictable - it's
often cheaper to build your own solution. Once you start thinking about
backbone and colo redundancy, deploy in different countries with contract
commits - things get expensive very quickly.

The beauty of using a massive third party delivery service isn't performance,
it's elasticity. Just like with the web apps (frequently hosted on DIY
systems) that go down as soon as the link goes up on HN - being able to absorb
traffic spikes without failing (and without forcing you to commit to a higher
tier for a year) can be very valuable.

------
geoffhill
Amazon Web Services does offer a worldwide CDN, CloudFront.
<http://aws.amazon.com/cloudfront/>

~~~
philip1209
Cloudfront doesn't offer some services that I find necessary, such as nested
directory indices (e.g. example.com/folder/ instead of
example.com/folder/index.html) and it doesn't return a 404 header on missing
pages. I just emailed MaxCDN to see if they provide these.

~~~
Icer5k
Cloudfront as a CDN supports directory indexes and 404s, it's just S3 that
doesn't. If you point a CF distribution at your own server with directory
indexes enabled, CF will send those through to the user.

~~~
zwily
S3 supports both of those, via its "bucket as a website" feature.

------
davidandgoliath
I thought this was obvious? Amazon's Cloudfront on the other hand is a CDN and
works great :)

~~~
asb
If only Route53 allowed you to point the apex domain at Cloudfront (as I
understand it, it's currently S3 or ELB).

~~~
tzury
You may not want to do so as:

    
    
      1) POST (and so are PUT, DELETE, OPTIONS and CONNECT) 
         are not yet supported on CloudFront.
      2) HTTPS/SSL for your own domain is not yet supported 
         on CloudFront.

------
fs111
What is next? A blogpost saying "Hammers are terrible screwdrivers. Don't use
a hammer with a screw!"?

------
kmfrk

        $ curl -I http://phaven-prod.posthaven.netdna-cdn.com/uploads%2F2013-05-17%2F20%2F3128%2FErQE0vKlNMIeNvaxbneY75nWy
    
        HTTP/1.1 403 Forbidden
        Date: Sat, 18 May 2013 20:31:08 GMT
        Content-Type: application/xml
        Connection: keep-alive
        x-amz-request-id: 41706FB9149898AF
        x-amz-id-2: d5F1JMIBLaQzNG5A
    

Boo. :)

~~~
jdorfman
@kmfrk

PostHaven cut off the URI:

    $ curl -I http://phaven-prod.posthaven.netdna-cdn.com/uploads%2F2013-05-17%2F20%2F3128%2FErQE0vKlNMIeNvaxbneY75nWyy4%2Fs3ul27%2Fposthaven-loves-maxcdn.png

    HTTP/1.1 200 OK
    Date: Sat, 18 May 2013 20:34:13 GMT
    Content-Type: binary/octet-stream
    Content-Length: 52958
    Connection: keep-alive
    x-amz-id-2: NO6o51/19JsQJN9YHc+T/sraZSGNT+f3R+1GWl2QL3aD4SubqazjbMURb4VYaZyS
    x-amz-request-id: E640348D2D6EDA7B
    Last-Modified: Sat, 18 May 2013 00:47:10 GMT
    ETag: "f95534e9752b560f4acdda20228f90ba"
    Server: NetDNA-cache/2.2
    X-Cache: HIT
    Accept-Ranges: bytes
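
To run this kind of check across many URLs, the `curl -I` output can be parsed and the X-Cache header inspected. A minimal sketch follows, assuming the CDN reports hits in an `X-Cache` header that starts with `HIT`, as in the response above. That header is not standardised, so other CDNs may name or format it differently. The sample text is a trimmed copy of that response.

```python
def parse_head_response(raw: str):
    """Split `curl -I` output into (status_code, headers dict).

    Header names are lowercased because HTTP header names are case-insensitive.
    """
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    status_code = int(lines[0].split()[1])  # e.g. "HTTP/1.1 200 OK"
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers


def served_from_edge_cache(raw: str) -> bool:
    """True if the response is a 200 and X-Cache starts with HIT."""
    status, headers = parse_head_response(raw)
    return status == 200 and headers.get("x-cache", "").upper().startswith("HIT")


SAMPLE = """\
HTTP/1.1 200 OK
Content-Type: binary/octet-stream
Content-Length: 52958
Server: NetDNA-cache/2.2
X-Cache: HIT
"""
```

Calling `served_from_edge_cache(SAMPLE)` returns `True`. A 403 like the one in the parent comment returns `False`.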

------
jefe78
As the sysadmin to a company that does use both S3 and Cloudfront, I'm a
little shocked anyone would think to use S3 for distribution. A little testing
will reveal just how slow S3 can be.

------
molecule
obvious: Amazon product isn't that great @ a service that's optimally provided
by another Amazon product.

<http://aws.amazon.com/cloudfront/>

~~~
jdorfman
@molecule obvious to you and me. I wrote this to inform those who think it is a
good idea to use S3 as a CDN, that it isn't. If we can educate a few
developers then we (this awesome community of hackers) are making the web
faster.

~~~
molecule
> If we can educate a few developers then we (this awesome community of
> hackers) are making the web faster.

You're not making the web faster, you're shilling for your employer by
comparing their apples to a competitor's oranges and proclaiming "our
competitor's oranges make bad apple sauce!"

Failure to mention CloudFront is disingenuous.

~~~
jdorfman
CloudFront

------
iambibhas
Who said S3 is a CDN in the first place!?

~~~
obviouslygreen
Certainly not Amazon, or they wouldn't have provided a service that acts as a
CDN based on an S3 bucket.

------
alinajaf
> I think S3 is a great origin server for static assets

Pro-tip: If you're using Rails, just create a distribution with your app as
the origin server and in production.rb, set your asset host to your
distributions host. You get the asset cache without having to do the
precompile step. Tastes great with Heroku.

~~~
amalag
This is smart, I usually use the precompile with upload to S3 via asset_sync
gem. Since Heroku will precompile by default, what do you do to disable it?
Just turning on config.serve_static_assets = true is likely not enough.

~~~
alinajaf
> This is smart, I usually use the precompile with upload to S3 via asset_sync
> gem

I used to do the same until cloudfront rolled out custom origins.

> Since Heroku will precompile by default, what do you do to disable it.

I misspoke in my comment; what I meant was you can skip the sync-with-S3
step. I've never actually bothered to stop Heroku precompiling assets (though
I may as well). This question on SO looks promising:

<http://stackoverflow.com/questions/8953360/preventing-heroku-from-using-precompiled-assets-in-development-mode>

If you do that though, as you mentioned, you will definitely need to flip
serve_static_assets on.

------
MichaelApproved
I was using S3 to deliver secure signed downloads to customers. It worked well
enough for a long time but eventually customers started having major
connectivity issues and dead slow downloads.

I switched to CloudFront and, of course, downloads improved dramatically. I
had to go with CF because we needed signed downloads. Would be nice to have
alternatives but I'm happy with CF.
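
For anyone unfamiliar with signed downloads, the general idea is that the URL carries an expiry time and a signature that only the server can produce. The sketch below is a generic HMAC scheme for illustration only. It is not the S3 or CloudFront signing format, and the secret, query parameter names, and example host are all made up.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"example-secret"  # hypothetical shared secret, never sent to clients


def _signature(path: str, expires: int, secret: bytes) -> str:
    """HMAC-SHA256 over the path and expiry, hex encoded."""
    message = f"{path}\n{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(url: str, expires: int, secret: bytes = SECRET) -> str:
    """Append expiry and signature to a URL that has no query string yet."""
    sig = _signature(urlsplit(url).path, expires, secret)
    return f"{url}?{urlencode({'expires': expires, 'sig': sig})}"


def verify_url(signed: str, now: int, secret: bytes = SECRET) -> bool:
    """Accept the URL only if it is unexpired and the signature matches."""
    parts = urlsplit(signed)
    query = parse_qs(parts.query)
    try:
        expires = int(query["expires"][0])
        sig = query["sig"][0]
    except (KeyError, ValueError):
        return False
    expected = _signature(parts.path, expires, secret)
    return now < expires and hmac.compare_digest(sig, expected)
```

Because the expiry is part of the signed message, a customer cannot extend a link by editing the `expires` value, and the same applies to changing the path.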

------
getdavidhiggins
<https://twitter.com/zeeg/status/297888975463542784>

------
chj
What do you think CloudFront is for?

------
hybrid11
Just put Cloudflare in front of it, it's free :)

~~~
xur17
This is what I ended up doing. I've been using Cloudfront for a while now, but
Cloudflare is free, and people seem to indicate it is just as fast if not
faster.

I have my static assets on a sub-domain, so I just set cloudflare to cache
everything on that subdomain (and left it off on everything else).

