How I Made my Blog 2.3x Faster (alexbrowne.info)
50 points by polymathist 1576 days ago | 45 comments

"Ruby/Rails" <- There's the problem

I know HN loves these frameworks, but seriously, the JVM is waaay faster... Too bad there wasn't a dead simple JVM or Java based blog engine.

Did you catch that afterwards he still wasn't using the JVM?

Any time you're serving up gzipped static pages instead of loading any language runtime, JVM or not, you're going to see huge speedups.
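For what it's worth, the "gzipped static pages" part can be done ahead of time so the server never compresses anything per request. A rough sketch, assuming a generated site directory on disk; `precompress` is a made-up helper name, and whether your server or CDN actually picks up the `.gz` siblings depends on how it is configured:

```python
import gzip
from pathlib import Path


def precompress(site_dir: str, suffixes=(".html", ".css", ".js")) -> list:
    """Write a .gz sibling next to each matching file and return the new paths."""
    written = []
    # sorted() materializes the listing before any .gz files are created
    for path in sorted(Path(site_dir).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            gz_path = path.with_name(path.name + ".gz")
            # mtime=0 keeps the output byte-identical across runs
            data = gzip.compress(path.read_bytes(), compresslevel=9, mtime=0)
            gz_path.write_bytes(data)
            written.append(gz_path)
    return written
```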

My apologies for "implying" that. I facepalmed when I saw "slow" and "ruby/rails". So I digressed into a rant about the lack of a simple blog engine for Java, which would likely run much faster than a Rails solution. I'm fully aware his solution is to just push static files, and I'd even bet he's having Apache use Linux sendfile, which would pretty much idle the CPU.


What do you find lacking in the countless existing blog engines for the JVM? It seems to me that hosting blogs on the JVM is about as well-supported as hosting them with Ruby.

As you may know, JRuby runs on the JVM. In fact Torquebox is JBoss specifically for JRuby.

Everybody says it's the bees' knees, but Ruby on Rails is terrible. I speak from experience: I administer non-public installations of Redmine and GitLab.

First of all, the language syntax is awful and incomprehensible.

Then there's the Byzantine deployment process (although, to be fair, it seems endemic to modern web development; it's just more obvious in Ruby since most Ruby programs are web apps).

Finally, the startup for a Ruby application is incredibly slow -- worse than Java ever was, even in the Bad Old Days running 1.2 on a 300-ish MHz Pentium II (which was a state-of-the-art desktop at the time).

Hint: If someone makes a post to the Github issue tracker for your web application saying "Increase timeout from 30 seconds to 300 seconds" [1], a one-line patch changing a number from 30 to 300 does not count as solving the problem. There is no reason that a web app should ever have an expected load time greater than 30 seconds.

[1] https://github.com/gitlabhq/gitlabhq/issues/694

While I personally like Jekyll, and the added simplicity when it comes to hosting, there should be almost no performance difference between a "dynamically" generated blog based on Rails/Sinatra behind Cloudflare and static hosting on S3.

The only user-generated content on a blog is usually the comments. With just a few added caching headers, I'd say that 99+% of all page hits on a blog can be served out of Varnish. You can even use rack-cache if you don't mind the performance penalty. If you're OK with using e.g. Disqus (as Octopress does) for comments, you can serve 100% of them from the cache and just purge it when adding new content. This would probably also work when doing a purge for new comments. This way, you have to do the dynamic computation exactly once per page and you're done.
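To make the "exactly once per page" point concrete, here is a toy in-memory model of render-once-then-purge. It is not Varnish or rack-cache, just the idea; `PageCache` and the `render` callback are made-up names:

```python
from typing import Callable, Dict, Optional


class PageCache:
    """Render each path at most once until it is purged."""

    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render            # the expensive "dynamic" step
        self._pages: Dict[str, str] = {}
        self.renders = 0                 # how often the backend was actually hit

    def get(self, path: str) -> str:
        if path not in self._pages:
            self._pages[path] = self._render(path)
            self.renders += 1
        return self._pages[path]

    def purge(self, path: Optional[str] = None) -> None:
        """Drop one page, or everything when no path is given."""
        if path is None:
            self._pages.clear()
        else:
            self._pages.pop(path, None)
```

Publishing a post would call `purge()` (or purge just the index and the new post), and the next hit pays the rendering cost once.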

Adding the CDN properties of CloudFront to S3 is nice, but Cloudflare seems to provide a very similar solution for "regular" sites.

I'd say this is similar to comparing AOT compilation (Jekyll) vs JIT (dynamic generation + caching).

I converted my blog[1] to be static this past weekend using all AWS (S3+CF+Route53) and it went relatively painlessly. But I decided to use Middleman instead, and after using that I don't think I'll be going back to Jekyll.

I still have a Github.com pages account that uses Jekyll because it's free, but since it's so damn cheap I may convert this to the AWS stack as well.

The largest problem I was running into was permissions and invalidating the CF CDN. I am thinking of rolling a simple Markdown editor using node-webkit and scripting up some things to automate everything.

[1]: http://thoughtlessbanter.com

I wrote a deployment script for s3/cf/r53 in Python last week. It's pretty janky right now and is very opinionated on a lot of things, but it handles invalidation requests for you and the code is simple enough to easily modify to your liking.
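This is not that script, just a sketch of the part that decides what to invalidate, assuming you keep a manifest of content hashes from the previous deploy. `build_manifest`, `changed_paths`, and the manifest layout are hypothetical, and the actual upload and invalidation calls are left out:

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def build_manifest(site_dir: str) -> Dict[str, str]:
    """Map each file's URL path (e.g. /index.html) to the SHA-256 of its bytes."""
    root = Path(site_dir)
    return {
        "/" + p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def changed_paths(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Paths that are new, modified, or deleted since the previous deploy."""
    touched = {path for path, digest in new.items() if old.get(path) != digest}
    touched |= old.keys() - new.keys()   # deleted files should be purged too
    return sorted(touched)
```

Only the paths returned by `changed_paths` would need to go into the invalidation request, which keeps the request small when a deploy touches one or two posts.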


That's what I do for my blog, http://www.korokithakis.net/. It's a small, custom blog system I wrote in half an hour that caches everything using Django's cache library. I just purge things whenever I add or edit a post. Generation is pretty fast, but the static media is a bit heavy for that theme...

Why not push them to a CDN? You could also extract the inline images from the CSS and combine all of them into a sprite.

Yep, I just enabled CloudFlare on it, I'll see if this makes it any faster. It's hosted on AppEngine, so I'm guessing (/hoping) the static files are already pushed to a CDN by Google.

I'm curious about what changes you have noticed since enabling CloudFlare, and whether you considered PageSpeed as well. Do tell?

I haven't noticed many changes, it looks like the site loads a bit faster, but overall I haven't seen much. PageSpeed is different from CloudFlare, it just tells you how to make your page more lightweight. I had run PageSpeed before switching to CloudFlare.

Cool, please do drop an update if you come to any conclusions about its efficacy.

BTW, I meant the PageSpeed Service for GAE [1], which seems like it's basically CloudFlare running mod_pagespeed on all requests.

[1] https://developers.google.com/appengine/docs/adminconsole/pe...

Oh, from what I saw, that just applied the best practices in the PageSpeed plugin to every request, which I had already done, so I didn't think it was necessary to run again. Please let me know if it does something I haven't seen!

My main use case was a CDN, as I'm not sure if Google pushes the GAE files to various locations.

Yup that's all it does, automate those best practices, so there's probably no need for you to use it over CloudFlare.

GAE edge caching is poorly documented, here's what I've found so far:

[1] https://groups.google.com/forum/#!topic/google-appengine/6xA...

[2] https://code.google.com/p/googleappengine/issues/detail?id=2...

[3] https://groups.google.com/forum/#!msg/google-appengine/8QgEU...

[4] https://groups.google.com/d/msg/google-appengine/8QgEUBOiNFw...

Damn, those links are very useful, thank you. I'll make sure I follow the practices, thanks again!

I'd be curious to do a comparison with a dynamic site behind a CDN. I suspect that from a performance standpoint it would be almost identical. I guess I chose Jekyll because it seemed more convenient and I was interested in trying something new. Also curious to see CloudFlare vs. CloudFront (and maybe some other offerings in the mix as well).

And that pretty much concludes the article? Serving a cached dynamic site from a CDN and a static site from a CDN should give the same speed, assuming the CDN is the same. Which makes this speed test a little pointless, no?

Why would you need to clear the cache with Disqus? Isn't it a JS-served comment system?

Yeah, I was thinking something similar. A bonus of doing it this way is that you learn a ton while hacking for near-static-page performance, and you get to keep your CMS.

I don't keep up on what various platforms offer but wasn't the original movabletype a static site generator? You entered a new post, clicked generate, it made a bunch of pages. It was only later that they added a dynamic option.

Blogger was static too. It was just a cloud based DB. You gave it an FTP name/pass and it logged into your site and uploaded the new files.

Web 3.0 - static content? ;)

That or its antithesis of sorts, i.e. websites that are rendered almost entirely on the client side.

Websites are already rendered entirely client side.

I think he is referring to having the clients download a copy of the database and a big chunk of javascript to create views and forms to use the database. Syncing the database with the server at specified intervals, etc.

Maybe that can be Web 4.0 - Any ideas for Web 5.0? ;)

Perhaps. It would be pretty ironic, no? In the early days of the web everything was static content. Would be interesting to make a move back in that direction. Of course, this only works for a small subset of websites that don't need dynamic content. I believe most major sites cache their static assets (images, css, js) in a cdn while relying on a dynamic webserver for changing content.

I recently wrote a blog post about how Rack::Cache and ETag work http://blog.craz8.com/articles/2012/12/19/rack-cache-and-eta..., which is particularly important to know for Heroku based apps.

The key part is that, for public content (and blog posts tend to be public), the Rack Cache can be used to serve these pages directly from the cache store (usually Memcache) with minor database traffic needed, even for people who have never seen this content.

That last part was the surprise for me - surely ETags are only used by return visitors! Rack Cache makes ETags work for new visitors too.
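The basic ETag handshake, sketched in Python rather than Rack. This is not how Rack::Cache is implemented, and `make_etag` and `respond` are hypothetical names. The point is that a shared cache sitting in front of the app can send `If-None-Match` itself using the ETag it stored, get a cheap 304 back, and then hand its stored copy to a visitor who has never seen the page:

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    # Validator derived from the content; the quotes are part of the header value
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET of a public page.

    Simplified: a real If-None-Match header may carry a list of tags or "*".
    """
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "public, max-age=0, must-revalidate"}
    if if_none_match == etag:
        return 304, headers, b""     # the cache's stored copy is still good
    return 200, headers, body
```

In a real app the win comes from computing the ETag from something cheap (say, a post's updated-at timestamp) so the 304 path skips rendering entirely; hashing the full body here is only to keep the sketch self-contained.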

I think I can get my blog to run almost as fast as a static site, and I'm working towards getting that done and documenting it as I go.

(In addition, Heroku seems to be adding Varnish headers to my responses. They say they don't use Varnish in Cedar apps, but this is clearly not correct)

I was looking to move my Drupal site to Jekyll but quickly got bored. I therefore moved the site to Sinatra and now Rails. Yes, it's slower than Jekyll, and the Rails site is slower than the Sinatra one, but it's more fun.

re: "more fun". Yes, having a faster site that uses less resources is great, and very probably important for those seeking audiences as large as possible. But coding my own CMS and making it just like I want it (minus where my abilities limit me in that) is something I never would go back on. It's simply so much more fun than, say, just using wordpress (not to mention something even more basic).

Sure, my site is just "my stuff", quotes and links I collect, maybe a rambling here and there. Therefore my comment is not quite on-topic, it's not the same kind/league of site. But I just can't find anything exciting in pure speed, I want something that I like when I look at it and use it, not something I like because I know it's very popular. I like having variable and filterable views on content, user preferences, etc. too much.

I don't want to pretend I'm some kind of artist; I'm a shoddy coder, and not a great designer either. My CMS is unusable for anyone except me I'm sure. But still I take pride and enjoyment in whatever it is I'm doing, and at the very least I would encourage everyone to make something that is about ideas and features more than performance, even if as a secondary/private site. Minimalism is sometimes overrated. It may be more effective when dealing with many people, but it's also a bit sterile. So maybe do both, one for business, the other for inspiration.

Agree. My first reaction when reading the article was "your blog was massively overengineered". And while it was (after all, it got replaced with static HTML pages), I'm sure it was fun measuring and tweaking the performance and learning about building a scalable infrastructure.

Isn't it obvious that static HTML is served faster than dynamically generated HTML? What's so new about that? Please let me know if I am missing something.

It's actual data from one actual (and very common) use case.

Also, the document itself is just one fairly small piece of the whole puzzle.

Ummm... not to be the wet blanket, but what business advantage is there to your blog loading 2.3x (or even 5x) faster? I understand you may occasionally be adding content that helps plug your work, but in general I am less concerned about the speed of a long article loading than I am the content being worth the much longer time it will take to read.

Maybe it's just me?

There's not really a tangible business advantage. Sure, I put a brief plug in some of my posts, but that's not really the point. For me the whole point was just to explore new technology and push it as fast as it can go. I learned a lot from the process, I found it personally intriguing, and maybe I can even use some of what I learned in future business ventures. Other people might be able to take away something from it as well, however small.

(I know it's a pretty hefty post, hence the tl;dr at the top. No one is forcing you or even asking you to read the whole thing)

Depends on what you do. The company I work for does outsourced datacenter operations and application monitoring. Our site and our blog absolutely must be up at all times, period. If one of our blog posts made it to the front page of HN or reddit and we weren't able to handle the load, or God forbid we went entirely offline, there would be some serious egg on our faces. Why should potential or current clients trust our advice about how they can stay online if we can't do it for ourselves?

I'm sure this was a fun exercise, but I see this just as a limitation of Heroku. Normally you could've switched to full page caching in your Rails app and gotten the same results.

True, but there's still an advantage to using a cdn in terms of worldwide performance.

Wow. I'm actually impressed by the load speed of that page.

As someone who lives in South America I've gotten used to sites loading in 2+ seconds, but this one feels almost instantaneous.

From previous HN threads on Static Site Generators I have seen people recommend Punch and Nanoc. How do those compare with Jekyll?

Jekyll is slow as molasses on anything over the trivial level, I haven't used Punch or Nanoc so I can't comment on those.

I guess the price difference between the two setups is huge. I'd love to hear more about that.

Actually the price will end up being pretty similar. It was more or less free when I was hosting through Heroku (pennies a month, and that was just for images on CF). It's hard to predict exactly (depends on traffic) but I'd be surprised if it costs me more than $1 per month with the new setup. Will find out for sure at the end of the billing cycle.

Update: For the month of December, Amazon charged me $1.06 for S3 and CloudFront combined. This includes the ~million requests I performed during benchmarking.
