Ask HN: How did you learn about the design/architecture of backend systems?
58 points by yr on July 13, 2010 | 14 comments
I'm interested in learning about the backend systems of Facebook, Amazon, Google, Twitter, Loopt, or any other startups. I'm mainly interested in design/architecture and scaling.


For web applications, understanding the concept of state and managing it takes you a long way towards building a scalable system (on a related note, any time spent really understanding HTTP is time well spent). It's very useful to read books, presentations and war stories, but for me, reading code in applications designed for scale was far more important. I strongly recommend the Amazon Dynamo paper. Read and think about the design decisions they made. Then go read the Cassandra code for an actual implementation of the same big ideas (at least as the code stood back in 2008).
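
If it helps, the core partitioning idea from the Dynamo paper fits in a few lines of Python. This is only a toy consistent-hash ring (node names and virtual-node count are made up), not Cassandra's actual code, but it shows why adding or removing a node only remaps one slice of the key space:

    # Toy consistent-hash ring: keys and nodes hash onto the same ring, and a
    # key belongs to the first node clockwise from its position.
    import bisect
    import hashlib

    class HashRing:
        def __init__(self, nodes, replicas=100):
            # "replicas" = virtual nodes per physical node, to smooth the distribution
            self.ring = {}          # ring position -> node name
            self.sorted_keys = []   # sorted ring positions
            for node in nodes:
                for i in range(replicas):
                    point = self._hash("%s:%d" % (node, i))
                    self.ring[point] = node
                    bisect.insort(self.sorted_keys, point)

        def _hash(self, key):
            return int(hashlib.md5(key.encode()).hexdigest(), 16)

        def get_node(self, key):
            point = self._hash(key)
            idx = bisect.bisect(self.sorted_keys, point) % len(self.sorted_keys)
            return self.ring[self.sorted_keys[idx]]

    ring = HashRing(["node-a", "node-b", "node-c"])
    print(ring.get_node("user:42"))   # the node responsible for this key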

As with any learning, it's very hard to truly grok the importance of loose coupling, service-oriented architecture, embedding statelessness in the architecture, etc., unless you try to build something, get it wrong and then fix it. So the very best way to learn would be to put yourself in a place where you get an opportunity to do that.


My answer comes from the perspective of a recent grad who has spent a year at a mid-sized SF social gaming company and has only recently started working on the back-end (i.e., I'm not a scaling expert).

A few things I've learned w/respect to scaling in my context:

- I/O is likely to be your bottleneck, so design your db well and anticipate splitting it across multiple machines (and what that means for your access logic)

- don't spawn a new thread for every request (a la apache)

- keep your services simple (perhaps: one for handling web requests, one for data access, one for caching, one for your payment system) and their relationships even simpler (one hop max from RPC caller to callee)

- cache like a hoarder (a rough cache-aside sketch follows this list)

- find/write an efficient serializer for RPCs between services
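
To make the caching bullet concrete, here's a rough cache-aside sketch. The dict-based cache and "database" below are stand-ins so it runs on its own; swap in memcached and your real data layer:

    # Cache-aside: check the cache first, fall back to the database on a miss,
    # then populate the cache with a TTL so stale entries eventually expire.
    import json
    import time

    CACHE_TTL_SECONDS = 300

    cache = {}                               # key -> (expires_at, serialized value)
    db = {1: {"id": 1, "name": "alice"}}     # pretend user table

    def cache_get(key):
        entry = cache.get(key)
        if entry and entry[0] > time.time():
            return entry[1]
        return None

    def cache_set(key, value, ttl):
        cache[key] = (time.time() + ttl, value)

    def get_user(user_id):
        key = "user:%d" % user_id
        cached = cache_get(key)
        if cached is not None:
            return json.loads(cached)        # cache hit
        row = db.get(user_id)                # cache miss: hit the database
        if row is not None:
            cache_set(key, json.dumps(row), CACHE_TTL_SECONDS)
        return row

    print(get_user(1))   # first call loads from the "db" and fills the cache
    print(get_user(1))   # second call is served from the cache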

What helped me get a grip on this stuff was sitting down with an architect who has done it successfully multiple times. I asked him for an elevator-pitch description of scaled web architectures, and then asked about his failures. Voila: the bullet points above.


Apache does not spawn a new thread for every request.


Not every request requires a new thread to be spawned (it depends on configuration/load), but every connection does require its own thread (or, under prefork, its own process).

"The worker MPM uses multiple child processes with many threads each. Each thread handles one connection at a time. Worker generally is a good choice for high-traffic servers because it has a smaller memory footprint than the prefork MPM.

The prefork MPM uses multiple child processes with one thread each. Each process handles one connection at a time. On many systems, prefork is comparable in speed to worker, but it uses more memory. Prefork's threadless design has advantages over worker in some situations: it can be used with non-thread-safe third-party modules, and it is easier to debug on platforms with poor thread debugging support."

http://httpd.apache.org/docs/2.0/misc/perf-tuning.html


thanks for the explanation :)


"- cache like a hoarder"

This isn't necessarily true. If the data is likely to be very volatile, it will probably be better to just hit the database than to bother trying to cache it.


Here's an article about the various techniques used to make GitHub (a Rails app) fast:

http://github.com/blog/530-how-we-made-github-fast

Very interesting, but also full of intimidating stuff:

"[...] For requests to the main website, the load balancer ships your request off to one of the four frontend machines. Each of these is an 8 core, 16GB RAM bare metal server. Their names are fe1, …, fe4. Nginx accepts the connection and sends it to a Unix domain socket upon which sixteen Unicorn worker processes are selecting. One of these workers grabs the request and runs the Rails code necessary to fulfill it. [...]"

I think they know what they are doing ...


I'm still not the world's foremost expert, but what I do know I've learned through a combination of trial and error, reading books (I'll edit this later and put in a couple of specific titles), reading stuff on the 'Net and classes I took in school (I did a degree in "High Performance Computing" which had some useful aspects to it).

A good place to start, if you're not already familiar with it, is High Scalability: http://highscalability.com/

Edit: book recommendations:

Scalable Internet Architectures - http://www.amazon.com/Scalable-Internet-Architectures-Theo-S...

Linux Clustering - Building and Maintaining Linux Clusters - http://www.amazon.com/Linux-Clustering-Building-Maintaining-...

High Performance Linux Clusters - http://www.amazon.com/Performance-Clusters-OpenMosix-Nutshel...

Linux Enterprise Cluster - http://www.amazon.com/Linux-Enterprise-Cluster-Available-Com...

Java Message Service - http://www.amazon.com/Java-Message-Service-Mark-Richards/dp/...

Java Message Service API Tutorial and Reference - http://www.amazon.com/Java-Message-Service-Tutorial-Referenc...

Enterprise JMS Programming - http://www.amazon.com/Enterprise-JMS-Programming-Professiona...

Hadoop: The Definitive Guide - http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/0...

Pro Hadoop - http://www.amazon.com/Pro-Hadoop-Jason-Venner/dp/1430219424/...

Wikipedia:

http://en.wikipedia.org/wiki/Shared_nothing_architecture

http://en.wikipedia.org/wiki/Shard_%28database_architecture%...

http://en.wikipedia.org/wiki/Publish/subscribe

It's important to understand the difference between vertical scaling and horizontal scaling. Horizontal is very en vogue these days, especially with commodity hardware. Why? Because you can add power incrementally without spending tons of money upfront, and without requiring a "forklift upgrade" (that is a reference to needing a forklift to bring in a new mainframe or minicomputer). This is a pretty good article on the topic:

http://www.scalingout.com/2007/10/vertical-scaling-vs-horizo...
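
To make the horizontal approach concrete: the crudest form of sharding just hashes a key to pick a database (hostnames here are made up). One caveat: adding a shard remaps most keys, which is why people reach for consistent hashing or a lookup directory instead:

    # Naive hash-based shard routing: pick a database by hashing the shard key.
    import zlib

    SHARDS = [
        "db-shard-0.internal",   # hypothetical hostnames
        "db-shard-1.internal",
        "db-shard-2.internal",
    ]

    def shard_for(user_id):
        return SHARDS[zlib.crc32(str(user_id).encode()) % len(SHARDS)]

    print(shard_for(12345))   # every query for this user goes to the same shard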

As popular as horizontal scaling is, don't ignore the possibility of going to bigger hardware. It has its own advantages, especially when you start talking about the physical floor space needed to house servers.

Of course "cloud computing" changes some of this, both by making it cheap and easy to add VPS's to scale horizontally, or by making it possible (sometimes) to easily add more processing power, RAM, etc. to your "server." Read up on Xen, KVM, EC2, etc. for more on that whole deal.


Forgot to include it originally, but it would probably be good to study Map/Reduce as well:

http://en.wikipedia.org/wiki/MapReduce

http://labs.google.com/papers/mapreduce.html
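
If you want the flavor of the model without standing up Hadoop, word count in the map/shuffle/reduce shape fits in a few lines. Everything runs in one process here; the frameworks spread the same three phases across machines:

    # Word count in the map/shuffle/reduce shape.
    from collections import defaultdict

    def map_phase(document):
        for word in document.split():
            yield word.lower(), 1

    def reduce_phase(word, counts):
        return word, sum(counts)

    documents = ["the quick brown fox", "the lazy dog", "the fox"]

    # shuffle: group intermediate (key, value) pairs by key
    grouped = defaultdict(list)
    for doc in documents:
        for word, count in map_phase(doc):
            grouped[word].append(count)

    results = dict(reduce_phase(w, c) for w, c in grouped.items())
    print(results)   # {'the': 3, 'quick': 1, 'brown': 1, ...}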

Caching is huge too... IO is expensive, RAM access is cheap. The more you can pre-load, pre-calculate, and/or pre-sort stuff and store it in memory, the better (in terms of avoiding expensive IO, anyway). Caching has its own issues though: if you cache so aggressively that you exhaust physical RAM and cause more swapping, you can actually hurt yourself. Also, you have to deal with the possibility of stale data in the cache, and with determining when and how to expire and reload items in the cache. But still, caching is essential; it's just not necessarily easy.

Also, for perspective if nothing else, read the papers and stuff on SEDA (Staged Event Driven Architecture). There's still debate about how effective the SEDA approach is, but reading the discussion(s) will help you appreciate the issues involved. http://www.eecs.harvard.edu/~mdw/proj/seda/
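
For a rough feel for the staged idea, here's a toy two-stage pipeline with bounded queues between the stages. Stage logic and queue sizes are made up, and real SEDA adds per-stage controllers that resize thread pools and shed load, but the skeleton is just this:

    # Each stage owns a bounded queue and its own worker thread(s), so a slow
    # stage applies back-pressure instead of bringing the whole process down.
    import queue
    import threading

    parse_q = queue.Queue(maxsize=100)
    respond_q = queue.Queue(maxsize=100)

    def parse_stage():
        while True:
            raw = parse_q.get()
            respond_q.put(raw.upper())     # stand-in for real parsing work
            parse_q.task_done()

    def respond_stage():
        while True:
            msg = respond_q.get()
            print("response:", msg)        # stand-in for writing to a socket
            respond_q.task_done()

    for target in (parse_stage, respond_stage):
        threading.Thread(target=target, daemon=True).start()

    for request in ("get /a", "get /b", "get /c"):
        parse_q.put(request)

    parse_q.join()
    respond_q.join()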


bump for this awesome list, especially highscalability

I don't know where OP's experience level is at, but I think there's no substitute for direct experience. Running, debugging, and performance-testing a remote server is an art all its own.

If you don't have much industrial experience with this, I'd suggest getting hold of the source code of some site that you know scales ( http://code.reddit.com/ ), then going on EC2 and using a performance-testing tool ( http://jakarta.apache.org/jmeter/ ) to check it out.
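
Even a crude homemade load generator teaches you something before you reach for JMeter. Something like this, with the target URL and request counts as placeholders:

    # Fire N concurrent GET requests at a server and report rough latencies.
    import time
    from concurrent.futures import ThreadPoolExecutor
    from urllib.request import urlopen

    URL = "http://localhost:8080/"   # placeholder target
    REQUESTS = 200
    CONCURRENCY = 20

    def hit(_):
        start = time.time()
        urlopen(URL).read()
        return time.time() - start

    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        latencies = sorted(pool.map(hit, range(REQUESTS)))

    print("median: %.3fs  p95: %.3fs" % (
        latencies[len(latencies) // 2],
        latencies[int(len(latencies) * 0.95)],
    ))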


Cloud computing can change 'some of this' or 'all of this' depending on the nature of the work. For many applications/requirements it could change every core design decision (when you can pay for computing by the literal hour). Well worth spending some time learning about (for the OP).


awesome list, thanks!


I found Cal Henderson's book "Building Scalable Websites" (which describes pretty much everything he learnt while scaling Flickr) incredibly useful.


Go work at Google. Even if you have to pay for it.



