
It's called experience.

Which perhaps sounds rude, but it's not meant to be.

This stuff isn't taught per se, you learn it bit by bit as you solve each problem that you face.

I learned about HAProxy when my site load exceeded that which a single web server could manage.
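At its very simplest, a load balancer like HAProxy spreads incoming requests across several web servers. The sketch below models only that choice of backend, using round-robin. The class name and the backend addresses are hypothetical. HAProxy itself also does health checks, connection handling and a lot more.

```python
import itertools
from typing import Iterable, List


class RoundRobinBalancer:
    """Hands out backends in turn: the simplest balancing strategy."""

    def __init__(self, backends: Iterable[str]) -> None:
        self._backends: List[str] = list(backends)
        if not self._backends:
            raise ValueError("need at least one backend")
        # itertools.cycle repeats the list forever: a, b, c, a, b, c, ...
        self._cycle = itertools.cycle(self._backends)

    def next_backend(self) -> str:
        return next(self._cycle)


# Hypothetical backend addresses, for illustration only.
balancer = RoundRobinBalancer(["10.0.0.1:8000", "10.0.0.2:8000", "10.0.0.3:8000"])
picks = [balancer.next_backend() for _ in range(5)]
print(picks)
# ['10.0.0.1:8000', '10.0.0.2:8000', '10.0.0.3:8000', '10.0.0.1:8000', '10.0.0.2:8000']
```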

I learned about heartbeat when I had to update my HAProxy and it knocked the site offline.
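Heartbeat-style failover works roughly like this. A standby box listens for regular "I'm alive" messages from the active one, and it takes over the service if those messages stop. The sketch below models only the standby's decision. Timestamps are passed in explicitly so the logic is easy to test. `FailoverMonitor` is a hypothetical name and does not reflect the heartbeat project's own API.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Standby-side view of the primary's heartbeats (hypothetical sketch)."""

    timeout: float            # seconds of silence before the primary is presumed dead
    last_seen: float = 0.0    # timestamp of the most recent heartbeat
    active: bool = False      # True once this standby has taken over

    def heartbeat(self, now: float) -> None:
        # Called whenever a heartbeat message arrives from the primary.
        self.last_seen = now

    def check(self, now: float) -> bool:
        # Mark this node active once the primary has been silent too long;
        # a real setup would then claim the shared virtual IP.
        if not self.active and now - self.last_seen > self.timeout:
            self.active = True
        return self.active


monitor = FailoverMonitor(timeout=3.0)
monitor.heartbeat(now=0.0)
print(monitor.check(now=2.0))  # False: primary still within its timeout
print(monitor.check(now=5.0))  # True: 5 seconds of silence, standby takes over
```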

I learned about master/slave replication of databases when a site I worked on had considerably more reads than writes and scaling vertically (buying a bigger box) cost more than scaling horizontally (adding cheap read slaves).
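With one master and several read slaves, the application has to send writes to the master and can spread reads across the slaves. The sketch below has the same shape as a Django database router (`db_for_read` / `db_for_write`), loosely modelled on the kind of router Django's multiple-databases documentation describes. It has no Django dependency, though. The aliases `primary`, `replica1` and `replica2` are hypothetical and would need to match the keys in your `DATABASES` setting. The class would then be listed in `DATABASE_ROUTERS`.

```python
import random


class PrimaryReplicaRouter:
    """Send writes to the master and spread reads across read slaves."""

    primary = "primary"                  # hypothetical alias for the master
    replicas = ["replica1", "replica2"]  # hypothetical aliases for read slaves

    def db_for_read(self, model, **hints):
        # Assumes reads can tolerate a little replication lag,
        # so any slave will do.
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        # All writes must go to the single master.
        return self.primary

    def allow_relation(self, obj1, obj2, **hints):
        # Every alias holds the same data, so relations across them are fine.
        return True


router = PrimaryReplicaRouter()
print(router.db_for_write(model=None))  # primary
print(router.db_for_read(model=None))   # replica1 or replica2, chosen at random
```

Random choice is the simplest policy. Because reads may hit a slave that hasn't caught up yet, code that reads its own writes immediately may need to be pinned to the master.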

I learned of sharding when I worked on a graph stored in an Oracle database and performing calculations on the whole graph exceeded the capacity of that one physical box.
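One common way to shard is to hash each key to pick which box owns it. The sketch below assumes simple hash-modulo sharding. `shard_for` and the key format are hypothetical.

```python
import zlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number using a stable hash.

    zlib.crc32 is used instead of Python's built-in hash(), because string
    hashing is randomised per process and so would not be stable.
    """
    return zlib.crc32(key.encode("utf-8")) % num_shards


# Hypothetical: four database boxes, each holding roughly a quarter of the nodes.
for node_id in ["node:1", "node:2", "node:42"]:
    print(node_id, "-> shard", shard_for(node_id, 4))
```

The trade-off with plain modulo is that changing `num_shards` remaps most keys. Schemes like consistent hashing exist to reduce that movement.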

I learned of one-hop replication to solve the problem of a sharded graph.
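The general idea of one-hop replication is that each shard also keeps copies of every node one edge away from the nodes it owns. A "who are my neighbours?" query can then be answered without leaving the shard. The sketch below assumes an undirected graph and a precomputed node-to-shard assignment. `one_hop_placement` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def one_hop_placement(
    edges: List[Tuple[str, str]], owner: Dict[str, int]
) -> Dict[int, Set[str]]:
    """Return the set of nodes each shard must store.

    Each shard keeps the nodes it owns plus a replica of every node that is
    one edge away from them.
    """
    placement: Dict[int, Set[str]] = defaultdict(set)
    for node, shard in owner.items():
        placement[shard].add(node)
    for a, b in edges:
        # Undirected edge: each endpoint's shard gets a copy of the other end.
        placement[owner[a]].add(b)
        placement[owner[b]].add(a)
    return dict(placement)


edges = [("a", "b"), ("b", "c"), ("c", "d")]
owner = {"a": 0, "b": 0, "c": 1, "d": 1}
print(one_hop_placement(edges, owner))
```

The cost is extra storage, plus every update to a node has to be pushed to each shard holding a replica of it.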

I learned of partitioning to solve the problem of having one big database whose compute wasn't maxed out but whose storage was.
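One common form of partitioning is by range, for example one partition per month. Old months can then be moved to other storage, or archived and dropped as a whole. The sketch below shows only the routing decision. The `partition_for` function and its naming scheme are hypothetical.

```python
from datetime import date


def partition_for(table: str, day: date) -> str:
    """Pick a monthly partition name for a row, e.g. comments_2010_11."""
    return f"{table}_{day.year:04d}_{day.month:02d}"


print(partition_for("comments", date(2010, 11, 18)))  # comments_2010_11
```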

I learned of memcached when I wanted to reduce page generation times and realised going to the database was more expensive than keeping it in cheap RAM elsewhere on the network: http://www.buro9.com/blog/2010/11/18/numbers-every-developer...
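The usual way to put memcached in front of a database is "cache-aside": check the cache first, and only on a miss run the query and store the result. In the sketch below a small dict-backed class stands in for memcached, so it runs without a server. Memcached clients expose similar get/set-with-expiry operations. `TTLCache`, `cached` and `load_user_from_db` are hypothetical names.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Dict-backed stand-in for memcached: get/set with an expiry time."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (time.monotonic() + ttl, value)


def cached(cache: TTLCache, key: str, ttl: float, compute: Callable[[], Any]) -> Any:
    """Cache-aside: try the cache first, fall back to the expensive query."""
    value = cache.get(key)
    if value is None:  # note: a cached None is indistinguishable from a miss
        value = compute()
        cache.set(key, value, ttl)
    return value


calls = []


def load_user_from_db() -> dict:
    calls.append(1)  # count how often we actually "hit the database"
    return {"id": 1, "name": "alice"}


cache = TTLCache()
cached(cache, "user:1", 60, load_user_from_db)
cached(cache, "user:1", 60, load_user_from_db)
print(len(calls))  # 1: the second call was served from the cache
```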

I learned of reverse proxy caches when I wanted to make sure requests for things already served never reached the web layer again.
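For a reverse proxy cache to keep requests away from the web layer, the application has to mark its responses as cacheable. The sketch below is a tiny WSGI app that sends standard HTTP `Cache-Control` headers. It assumes a reverse proxy that honours them. The page body and the 300-second lifetime are arbitrary choices for illustration.

```python
from typing import Callable, Iterable, List, Tuple


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """A tiny WSGI app whose responses a shared cache may store for 5 minutes."""
    body = b"<h1>Front page</h1>"
    headers: List[Tuple[str, str]] = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # "public" allows shared caches (reverse proxies) to store the response;
        # "max-age=300" lets them reuse it for 300 seconds without asking us.
        ("Cache-Control", "public, max-age=300"),
    ]
    start_response("200 OK", headers)
    return [body]


# Call the app directly, the way a WSGI server would, and inspect the headers.
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


body = b"".join(app({}, fake_start_response))
print(captured["headers"]["Cache-Control"])  # public, max-age=300
```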

I learned about Varnish when I considered that most reverse proxies use disk storage for their cache.

We could go on and on here, but the message is that you learn these things one at a time, solving real problems as you come up against them. There is always another hurdle to clear, and when you get there you too will learn how to get past it.

I'd emphasise that you shouldn't attempt to do this prematurely; the premature optimisation quote applies just as well to architecture. Keep things as simple as they can be, and just know that when you reach a hurdle, someone else has already solved it and you've just got to find out where it's written down (if anywhere), what they used, how they approached it, the upsides, the downsides, what they'd do differently, etc.

You could try sites like http://highscalability.com/ but I would urge you not to implement things without knowing why you're implementing them. Don't cargo cult ( http://en.wikipedia.org/wiki/Cargo_cult ) this stuff; it's really key to do only what you need to do, when you need to do it.

Seriously dude... write that book...

'Cause I can't find it anywhere...

This is one of those books that the market is basically incapable of publishing, because it's like "Hey, why don't you do 6 months of hard work and then we'll pay you a $10,000 ~ $20,000 advance which you will never earn out" when Plan B looks like "Save some large company whose system being down is costing $X00,000 a day, make client look like hero, get compensated accordingly."

And as I was trying to emphasise in my post, there's not a "right way" to do things; some problems just make some approaches more right than others in those instances.

It's just like Bentley's Programming Pearls, where he underlines again and again that knowing your problem is more important than knowing the best algorithms: the one that will work for you depends on your problem, and only you know that.

Highscalability.com and shared slidedecks act as a community generated set of architectural patterns, but no-one should implement them without knowing what their problems will be.

What I don't understand - and perhaps will be incapable of understanding until I face these issues myself (and hopefully I will in the future) - is why there's this view that scaling is, in a sense, undocumentable...

What is it about scaling an application that can't be reduced to an algorithm, at some level of abstraction at least, so that newbies can get an idea of how they should start thinking about it?

To put the question another way: could a framework like Django ever come to provide scaling tools out of the box? Or is it just something that fundamentally can't be reduced? Might it be that not enough people have faced scaling problems yet for repeatable patterns to have become obvious?

It's undocumentable because as strange as it seems, no two problems are the same. There is no one-size-fits-all approach, not even a one-size-fits-many. At best we have a series of one-size-may-fit-you-if-you're-lucky options. Disqus isn't using NoSQL or Nginx, something a good number of scaled web applications have switched to. Why? It doesn't solve their problem. Why is their problem different from others? That's a long and complex answer that revolves around almost every aspect of how their applications run, access data, what types of data they're accessing, and so on, and so forth.

Is the problem their algorithm? Does it spend a long time working away on the CPU? Could it be written a different way? Is it data access: is the lag caused by queries taking too long? Is that down to badly formatted queries, an inefficient schema, server problems or something else? Is it even a single problem, or a multitude of minor little niggles that combine into a big headache? Do you do thousands of little queries, or smaller numbers of big ones? Are your tables narrow or wide? What size and types of data do you store in the fields?
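The "thousands of little queries, or smaller numbers of big ones" question can be answered by measuring. The sketch below uses Python's built-in `sqlite3` and its `set_trace_callback` hook to count the statements each approach runs. It compares one query per thread (the classic "N+1" pattern) with a single `LEFT JOIN`. The threads/comments schema and the function names are hypothetical.

```python
import sqlite3

# Hypothetical schema: threads with comments, as in a commenting system.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE threads (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, thread_id INTEGER, body TEXT);
    INSERT INTO threads (id, title) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO comments (thread_id, body) VALUES (1, 'x'), (1, 'y'), (3, 'z');
    """
)

statements = []
conn.set_trace_callback(statements.append)  # called with each SQL statement run


def many_small_queries():
    """One query for the threads, then one more per thread ("N+1")."""
    result = {}
    for (thread_id,) in conn.execute("SELECT id FROM threads ORDER BY id").fetchall():
        rows = conn.execute(
            "SELECT body FROM comments WHERE thread_id = ? ORDER BY id", (thread_id,)
        ).fetchall()
        result[thread_id] = [body for (body,) in rows]
    return result


def one_big_query():
    """The same data fetched with a single LEFT JOIN."""
    result = {}
    rows = conn.execute(
        "SELECT t.id, c.body FROM threads t "
        "LEFT JOIN comments c ON c.thread_id = t.id ORDER BY t.id, c.id"
    ).fetchall()
    for thread_id, body in rows:
        bodies = result.setdefault(thread_id, [])
        if body is not None:  # a thread with no comments yields a NULL body
            bodies.append(body)
    return result


def count_queries(fn):
    statements.clear()
    result = fn()
    return result, len(statements)


small, n_small = count_queries(many_small_queries)
big, n_big = count_queries(one_big_query)
print(n_small, n_big)
```

Neither approach is always right. Many small indexed queries can beat one huge join, which is exactly why it has to be measured on your own data.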

A number of the things that Disqus have done to scale out aren't appropriate for other environments, by nature of the fundamentals of the app. All they can advise on is how to scale your Python/Django/MySQL-based commenting system, but even then your approach to writing one might be different to theirs.

Quite simply, no one can tell you how to scale your application and its infrastructure, because every application and infrastructure is unique by the very nature of every problem being unique, and every solution more so.

That's not to say there is no value in the information that Disqus has provided. Quite the contrary, there is every bit of value there, and I greatly appreciate them posting it. There is a good chance that whilst some of what they've done won't be of use to you, some of it may be. It may even be of use for other reasons that are entirely different from those that benefit Disqus.

Quick example. You want to load balance web traffic; what do you choose for it? Is software or hardware best? Do you do lots of SSL? (Would an SSL accelerator be of use?) Do you want the servers to respond directly to the client, or respond through the load balancers?

Apache httpd + mod_proxy, Nginx, HAProxy, ldirectord, lighttpd

You want to add in caching: Varnish, Apache Traffic Server, Squid, Polipo.

And so on! Each software package has its own particular strengths and weaknesses, and finding the right way to scale is more a matter of gut instinct and intimate knowledge of the way your code and site work than of anything else.

Thanks for that in-depth reply... I get ya in the abstract... but yeah, I guess I have to go through it myself to really see it.

Can anyone speak to how close these books come to this? Or recommend other books?

Building Scalable Web Sites http://oreilly.com/catalog/9780596102357

The Art of Capacity Planning http://oreilly.com/catalog/9780596518585

Web Operations http://oreilly.com/catalog/0636920000136

John Allspaw's books are great. I'd recommend these also:

The Art of Scalability http://www.amazon.com/Art-Scalability-Architecture-Organizat...

Scalable Internet Architectures http://www.amazon.com/Scalable-Internet-Architectures-Theo-S...

Enterprise Cloud Computing http://www.amazon.com/Enterprise-Cloud-Computing-Architectur...

Both of John Allspaw's books (the latter two on your list) look good from their table of contents.

And if you're in doubt, John is now VP of Ops at Etsy and came from Flickr before that: http://www.kitchensoap.com/about-me/

His blog is interesting too: http://www.kitchensoap.com/

So without having read the books, I would shoot for the latter 2 if I wanted to have hard copies around to introduce me to this kind of stuff.

I've found "Scalable Internet Architectures" by Theo Schlossnagle to also be quite valuable. It contains general advice on how to approach problem solving when it comes to building, uh, scalable architectures.


I'm a big fan of Building Scalable Web Sites - it's a bit old now (it predates cloud-computing-for-everything) but still very relevant if you're just starting to learn about this stuff. It's basically everything the author learned scaling Flickr from a tiny site to several hundred million photos.

This stuff isn't taught per se, you learn it bit by bit as you solve each problem that you face.

Some of it is taught. I learned about HAProxy second-hand, before I had any problem to solve with it.

I do agree that much of this has to be learned initially through experience, but I also believe much of it can be taught. Teaching it in a classroom wouldn't work, though, as Ops only exists when there's something to operate.

It's my perpetual dilemma: few startups have enough scale [1] to warrant a sysadmin senior enough to be a mentor, let alone one or two more junior ones, but large companies are awkward environments for experimentation.

I try to dump some bits of wisdom/experience to my blog. Shameless plug: http://blog.maxkalashnikov.com

[1] Even Disqus is borderline in terms of traffic. Moore's law and even its more linear corollary for I/O can take one a long way.

I find in practice that even more important than figuring out which tools to use to solve your problem is figuring out what your problem is in the first place. The importance of measuring tools is often vastly under-represented in this discussion. Doing so properly almost always involves writing some custom profiling code, and having a good understanding of where resource bottlenecks are likely to be in your systems (and then ignoring that when the numbers tell you otherwise).
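As a small example of the kind of custom measurement code meant here, the sketch below is a decorator that records wall-clock time per function and prints a summary sorted by total time. The names `timed`, `report` and `render_page` are hypothetical. A production setup would more likely ship these numbers to a metrics system than print them.

```python
import functools
import time
from collections import defaultdict
from typing import Callable, Dict, List

# Wall-clock durations per function name, in seconds.
timings: Dict[str, List[float]] = defaultdict(list)


def timed(fn: Callable) -> Callable:
    """Record how long each call to fn takes (a minimal profiling hook)."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            timings[fn.__name__].append(time.perf_counter() - start)
    return wrapper


def report() -> None:
    # Largest total time first: often, but not always, the place to look.
    for name, samples in sorted(timings.items(), key=lambda kv: -sum(kv[1])):
        print(f"{name}: {len(samples)} calls, total {sum(samples):.4f}s, "
              f"max {max(samples):.4f}s")


@timed
def render_page() -> str:
    time.sleep(0.01)  # stand-in for real work
    return "ok"


for _ in range(3):
    render_page()
report()
```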
