No Server Required - Jekyll & Amazon S3 (allthingsdistributed.com)
137 points by werner on Aug 17, 2011 | 35 comments

While S3 might be "extremely durable" from a technology standpoint, its unceremonious dumping of Wikileaks as a customer shows it to be politically fragile.

You might think of Wikileaks as "extreme," but this is an organization that was neither convicted of nor even charged with breaking any laws, and which Amazon dumped as a customer on very vague TOS grounds following pressure from Sen. Joe Lieberman.

This could be an issue for reasons like...

- You make a Web app that Hollywood deems to somehow encourage or abet piracy

- You provide a service used by a customer deemed to be politically controversial

- You facilitate financial transactions deemed to be potentially helpful to "terrorists" or the wrong sort of activists (e.g. Wikileaks).

Werner, since you submitted this entry from your personal blog, maybe you could clarify what safeguards Amazon has put in place to prevent a repeat of the Wikileaks situation. Many companies will stand behind a customer barring a court order, but for Amazon this clearly is not the case. How do you decide when to abandon a customer?

I think you nailed the TINY niche of things that Amazon could take issue with and pull your site down for. So now back to 99.9999999% of content producers on the internet: S3 is "extremely durable".

I've heard of other (large) providers pulling sites that contained "objectionable" content down. Wikileaks just had the media's attention at the time.

At least one customer had issues, with data removed without a court order. So in order to satisfy your 99.9999999% estimate, Amazon would need at least 1B S3 customers, which is way too optimistic.

I think your point is worth making.

The technology is good, but sometimes it's not about the technology.

Are those really issues for a static site? (Heck, would any of your examples be a static site in the first place?) And if a static site is dumped by Amazon S3, is that really so bad? You're an rsync and a DNS edit away from the site being back up - that's the beauty of a static site: it's just files in directories.

Are those really issues for a static site?

For most people, Wikileaks is a static site. They are unable to interact with it at all, ergo it could just be a static site.

Static vs. dynamic is orthogonal; you can still run afoul of the rules with a static site.

I think the point is that a static site is trivial to move.

If you go to the bottom of the page at http://allthingsdistributed.com/, you'll see that the blog actually requires "Movable Type Pro".

AWS doesn't provide a way to serve from S3 without the help of a CNAME redirect, which means that you're out of luck if you want to use the Jekyll+S3 setup with a naked domain name (naked, as in no "www" or "blog" subdomain). It also means that you're going to have to get some other server (Google Apps can do it) to redirect your domain.com queries to www.domain.com. And then your users' DNS is running all over the place, incurring, in my opinion, unneeded delay.

Actually that was my fault. I had just switched off the redirect of allthingsdistributed.com to www.allthingsdistributed.com and as such you ended up at the old MT installation. That is now corrected.

You are correct; to map to an S3 bucket you need a CNAME. But DNS doesn't allow the apex to be a CNAME so you will need to redirect that. Route53 solves that for EC2 with the help of ELB. But there is no such solution for S3 (yet).
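Since the apex can't be a CNAME, something else has to answer for the naked domain and bounce visitors to the bucket-mapped subdomain. A minimal sketch of such a redirect service, assuming a hypothetical www.example.com subdomain pointed at the bucket (any small host, or the Google Apps trick mentioned above, fills the same role):

```python
# Minimal apex-to-www redirect service. The apex A record points at
# whatever small host runs this, while the www CNAME points at the
# S3 bucket. TARGET is a hypothetical placeholder.
from http.server import BaseHTTPRequestHandler, HTTPServer

TARGET = "www.example.com"  # hypothetical bucket-mapped subdomain

class ApexRedirect(BaseHTTPRequestHandler):
    def do_GET(self):
        # Permanent redirect, preserving the requested path.
        self.send_response(301)
        self.send_header("Location", "http://%s%s" % (TARGET, self.path))
        self.end_headers()

    do_HEAD = do_GET

    def log_message(self, *args):  # keep the example quiet
        pass

# To run it for real: HTTPServer(("", 80), ApexRedirect).serve_forever()
```

Visitors who type the apex name pay one extra round trip; everyone arriving via www goes straight to S3.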

I am using the www subdomain as much as possible, so the redirect only happens if a visitor actually types in the apex name, in all other cases they will get where they need to be directly. But I agree that it would be better to solve this at a different level.

Launch and then iterate...

Do you even really need S3 if you're just serving up static pages?

A $5/mo web host or a VPS slice would probably be overkill - you're not hitting a database at all.

S3 is less than $0.10 per GB per month. How big would your static pages have to be before hitting the equivalent of that $5 cost?
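As a rough worked example of that break-even point, assuming the ~$0.10/GB-month storage price quoted above and ignoring request and bandwidth charges (which usually dominate for a busy site):

```python
# Rough S3-vs-flat-fee break-even on storage alone, using the
# prices from the comment above. Request and bandwidth charges are
# ignored; for a high-traffic site those usually dominate.
S3_STORAGE_PER_GB_MONTH = 0.10  # dollars, price quoted above
FLAT_HOST_PER_MONTH = 5.00      # dollars, the $5/mo web host

def breakeven_gb(flat=FLAT_HOST_PER_MONTH, per_gb=S3_STORAGE_PER_GB_MONTH):
    """GB of static files at which S3 storage matches the flat fee."""
    return flat / per_gb

print(breakeven_gb())  # ~50 GB of HTML before the flat fee wins
```

Fifty gigabytes of rendered HTML is far beyond any realistic blog, so on storage alone S3 undercuts the flat-fee host by a wide margin.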

If you want to get really exotic, you can check out the Haskell version of this tool. My site (http://dave.fayr.am) does the same thing using Hakyll and S3. You can see the code here: https://github.com/KirinDave/public-website

http://www.gwern.net is built on Hakyll as well; I'm currently hosting the static files on NFSN, but they're noticeably more expensive than Amazon S3 and I've been thinking of doing the same thing. What did you have to do to get S3 working?

Basically nothing. I drag-drop the produced site via S3 Browser.app. I need to fix that part of the workflow; it's a little awkward here.

I'm also using Hakyll for my site (http://www.wunki.org). Used Jekyll before, but I didn't think the markdown libraries were up to par with Pandoc. You can browse the source code of my site here: https://github.com/wunki/www.wunki.org.

I also threw together a script (./publish) that first gzips the static files and then uploads them to S3 with the correct headers (gzip and cache-control). Finally, it invalidates the old files on CloudFront. Combined, I get a very fast site while keeping the cost low. Again, you can find it all in the GitHub repository.
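A sketch of the gzip-and-headers half of such a publish step. The actual bucket upload and CloudFront invalidation are left as comments, since they need credentials and a client library; the paths and cache policy here are hypothetical, not the script from the repository above:

```python
# Sketch of a publish step like the one described above: gzip the
# generated site into a staging directory and record the headers
# each object should carry. Upload and CloudFront invalidation are
# left as comments; CACHE_CONTROL is an example policy.
import gzip
import shutil
from pathlib import Path

CACHE_CONTROL = "max-age=3600"  # tune per asset type

def stage_site(site_dir, stage_dir):
    """Gzip every file under site_dir into stage_dir, returning the
    per-object headers to send alongside each upload."""
    headers = {}
    src_root, dst_root = Path(site_dir), Path(stage_dir)
    for src in src_root.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        headers[str(rel)] = {
            "Content-Encoding": "gzip",
            "Cache-Control": CACHE_CONTROL,
        }
    # Next steps (omitted): upload each staged file with its headers
    # via an S3 client, then invalidate the changed paths on
    # CloudFront so edge caches pick up the new content.
    return headers
```

Serving pre-gzipped objects matters on S3 because, unlike a normal web server, it won't compress on the fly; the Content-Encoding header tells browsers what they're getting.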

Except for the servers at Disqus, which are running the comment system. And the servers serving up the static HTML pages.

Aha! That's how he did it. I couldn't figure out how commenting worked. He does have a comment count link at the top of the page. Is that fragment from Disqus also?

A simple inspection of the source reveals that it is also an embedded JavaScript file from Disqus.

Any open source alternatives to Disqus that I could host myself? I don't want to use Disqus - apart from wanting to "own" my comments, it is also incompatible with my browser configuration (blocking 3rd party cookies).

Coming from the CTO of Amazon...

It is always a pleasure to read Werner (his is one of the few blogs I subscribe to). It's a shame that more of his posts don't make it to the front page on HN.

Do you mean that in a "he's clearly just promoting S3" or a "woah, the CTO of Amazon has time to code neat tools" kind of way?

It doesn't sound like he coded all that much from the post. He even mentions that Cactus is a little bit too much work since there's not much of an existing community surrounding it. I was just surprised that it was the CTO of Amazon after I read the post.

S3 is kind of his "servers", no? ;)

Actually, I did do some coding for the conversion, etc. :-) But when doing something new, I like to be able to look at how other people solved similar problems. It is a bit early for Cactus in that respect, and Liquid feels much simpler than Django templates.

The extension and plugin mechanisms will make it easier for me to start adding my own code without having to modify the core framework. But it is always more fun to add these kinds of things if there is a community to give you feedback.

This is a pretty over-used subject for blog submissions here on HN, see previous ones at https://encrypted.google.com/search?q=site%3Anews.ycombinato...

To be fair, the submitter/author runs AWS, which makes this a little less typical to me.

I remember running this type of setup back in 2003 with text files and Blosxom. What's old is new again?

This is more about the S3 part than the Jekyll one.

But yes, I guess there is some movement back in time. I'd say the two main reasons are a return to more minimal blogging (i.e. something like tumblr as opposed to blogrolls, widgets, plugins etc.), and the fact that you can do some dynamic stuff in the client now via JavaScript (e.g. comments with Disqus).

I started using a method really similar to this to host a blog a few months ago, shortly after the S3 static website feature was released. However, shortly after a post ended up on the front page of Hacker News, requests to anything on the S3 bucket started responding with 503 errors.

Not entirely sure what the issue was, since I use S3 to host static assets for other sites that see similar traffic levels, and haven't gotten any 503 errors. And clearly ATD seems to be handling the HN traffic just fine.

I actually made a Ruby gem to do exactly this on a repeatable basis for my own site.

Here's the code: https://github.com/ohrite/vacation

Here's the gem: http://rubygems.org/gems/vacation

Oh, I've been looking for something like this forever. Thanks.

It's an interesting use case. However, if you have almost-static content you can also make use of heavy caching: a low-end dynamic site behind a powerful caching/delivery layer. I guess this is also doable with Amazon CloudFront. It is all about how comfortable it is to update your site.

This is awesome. I was going to put my blog onto GitHub (you know, being a hacker it just makes sense since I already pay for it anyway), but it is intriguing to be able to put it on S3, especially with CloudFront.

I'd be more likely to put it behind CloudFlare than CloudFront.

Is it just me or is the site down?
