
Ask HN: How did you become a hardcore back-end developer? - andywood
In the course of reading lots of tech news, and occasionally hanging out with other hackers, I've come under the impression that there is some kind of intense, difficult black art to building large services that scale to 10^N users, for any impressive value of N.

I'm a dev of 20+ years who has worked on everything from games to desktop apps to web apps. I worked as a front-end web development lead at Microsoft on several _small_ web apps, and as the back-end dev on a _small_ web service.

I think I could get interviews at any number of interesting places, but if I were going for back-end work (the part that interests me), I feel like I would simply be out of my depth. How do I get there? I'm sure there are obvious answers like "read books", "learn on the job", or "try to build YouTube/Facebook/Twitter, and succeed". So specifically, my question is for those who have already learned to build medium-to-large services: what was _your_ path to acquiring these skills, and do you have any advice for people like me?
======
SpikeGronim
I'll share my experience, which may differ from other people's. The largest
system that I've worked on was Amazon S3. At the time that I worked there we
were doing 100,000+ requests per second (peak), storing 100+ billion objects
(aka files), and more than doubling our stored object count every
year. The most important skills for that job were distributed systems theory,
managing complexity, and operations. I can't explain all of these skills in
depth, but I will try to give you enough pointers to learn on your own.

For distributed systems there are two main things to learn from: good papers
and good deployed systems. A researcher named Leslie Lamport invented a number
of key ideas such as Lamport timestamps and Byzantine failure models. Some
other basic ideas include quorums for replicated data storage and the
linearizability consistency model. Google has published some good papers about
their systems like MapReduce, BigTable, Dapper, and Percolator. Amazon's
Dynamo paper was very influential. The Facebook engineering "notes" blog also
has good content. Netflix has been blogging about their move to AWS.

Every software engineer needs to manage complexity, but there are some kinds
of complexity that only show up in big systems. First, your system's modules
will be running on many different machines. The most important advice I can
give is to have your modules separated by very simple APIs. Joshua Bloch has
written a great presentation on how to do that. Think about what happens when
you do a rolling upgrade of a 1,000-node system. It might take days to
complete. All the systems have to interoperate correctly during the upgrade.
The fewer, simpler interactions between components the better.
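One common way to keep components interoperating while old and new versions run side by side during a rolling upgrade is the "tolerant reader" pattern: ignore fields you don't recognize, and supply defaults for ones an older sender doesn't include. A minimal sketch, assuming JSON messages; the message fields (`key`, `size`, `priority`) are hypothetical.

```python
import json

# Fields this (older) version understands, with defaults for ones a sender omits.
KNOWN_FIELDS = {"key": None, "size": 0, "priority": "normal"}


def parse_put_request(raw: str) -> dict:
    msg = json.loads(raw)
    # Unknown fields (e.g. added by newer nodes mid-upgrade) are silently dropped;
    # missing optional fields fall back to their defaults.
    parsed = {name: msg.get(name, default) for name, default in KNOWN_FIELDS.items()}
    if parsed["key"] is None:
        raise ValueError("missing required field 'key'")
    return parsed
```

The flip side is that new fields have to start out optional. A field can only become required once every node in the fleet understands it.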

The best advice I know of about operating a big distributed system is this
paper[1] by James Hamilton. I won't repeat its contents, but I can tell you
that every time that we didn't follow its guidelines we ended up regretting
it. The other important thing is to get really good with the Unix command
line. You'll need to run ad-hoc commands on many machines, slice and dice log
files, etc.
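Most of this is done with shell pipelines. For example, assuming a hypothetical `access.log` whose first field is the host and second is the HTTP status, counting 5xx responses per host could be `awk '$2 ~ /^5/ {print $1}' access.log | sort | uniq -c | sort -rn | head -3`. The same slice, sketched in Python for when a one-liner outgrows itself (the log format is an assumption):

```python
from collections import Counter


def top_error_hosts(lines, n=3):
    """Count 5xx responses per host; assumes lines like 'host status latency_ms path'."""
    errors = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 4 or not parts[1].isdigit():
            continue  # skip malformed lines instead of crashing mid-investigation
        if parts[1].startswith("5"):
            errors[parts[0]] += 1
    return errors.most_common(n)
```

Because it takes any iterable of lines, you can pass an open file object directly and it streams through a large log without loading it into memory.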

How did I learn these skills? The usual mix of how people learn anything -
independent study, school, and building both experimental and production
systems.

1\. <http://www.usenix.org/event/lisa07/tech/full_papers/hamilton/hamilton_html/>

~~~
tptacek
Logical timestamps are an extremely simple idea that knocked me on my ass when
I first worked them into a system. Also a great thing to look up to get a
"flavor" of how distributed systems work.

I feel like if you walk into a job interview knowing the corner cases of a
two-phase commit and being able to solve a problem using Lamport timestamps,
you're probably above the 90th percentile of dev applicants.
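For anyone who hasn't seen it, here is a minimal in-memory simulation of a two-phase commit coordinator that shows two of those corner cases. First, a timeout during the prepare phase has to be treated as a NO vote. Second, a participant that is unreachable in phase 2 is left not knowing the outcome until it recovers and asks. The `Participant` class and its flags are hypothetical. A real implementation also needs durable logs on both sides, and it still blocks if the coordinator dies after participants have prepared.

```python
class Participant:
    """Hypothetical participant; flags simulate its vote and reachability."""

    def __init__(self, name, votes_yes=True, reachable=True):
        self.name = name
        self.votes_yes = votes_yes
        self.reachable = reachable
        self.state = "init"

    def prepare(self) -> bool:
        if not self.reachable:
            raise TimeoutError(self.name)
        # A YES vote promises to commit if told to (a real participant would
        # persist this promise and keep holding its locks).
        self.state = "prepared" if self.votes_yes else "aborted"
        return self.votes_yes

    def finish(self, decision: str) -> None:
        if self.reachable:
            self.state = decision
        # An unreachable participant keeps its old state until it learns the outcome.


def two_phase_commit(participants, decision_log) -> str:
    # Phase 1: collect votes; any NO or timeout forces an abort.
    decision = "committed"
    for p in participants:
        try:
            vote = p.prepare()
        except TimeoutError:
            vote = False
        if not vote:
            decision = "aborted"
            break
    # The decision must be recorded (durably, in a real system) before phase 2.
    decision_log.append(decision)
    # Phase 2: tell every participant the outcome.
    for p in participants:
        p.finish(decision)
    return decision
```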

~~~
palish
_90th_ percentile?

Maybe ~99th.

The author has been developing software for 20 years. He is likely a fine
applicant for a significant number of software dev positions, since he can
learn and apply many different technologies very quickly, from what he has
said. And also, from what he has said... he doesn't come close to what you
described.

I've been developing software professionally for five years. I've been
programming C/C++ for 10. (I'm 23; my passion has been for gamedev.) And I
don't come close to what you described.

If I devoted myself to learning what you just described, I could probably
achieve a thorough understanding (deep knowledge, an important distinction
from superficial knowledge) inside a month. But at the end of that, it seems
doubtful I'd be much closer to accomplishing the author's stated goal... I
would only know two essentially random corner cases.

All of that said, thank you (and SpikeGronim) for mentioning Lamport
timestamps; time to go a-wikipedia'n.

<http://en.wikipedia.org/wiki/Lamport_timestamps>
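For reference, the core of Lamport timestamps fits in a few lines. Each process keeps a counter and increments it on every local event or send. On receiving a message, it sets the counter to one more than the maximum of its own value and the message's timestamp. This guarantees that if event a happened-before event b, a's timestamp is smaller; the converse does not hold. A minimal sketch; the class name is illustrative.

```python
class LamportClock:
    """Logical clock for one process (illustrative sketch)."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local event or message send: advance and return the new timestamp."""
        self.time += 1
        return self.time

    def receive(self, msg_time: int) -> int:
        """Message receipt: jump past the sender's timestamp, then advance."""
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
sent = a.tick()              # a sends a message stamped 1
b.tick()
b.tick()                     # b has done two local events; b.time == 2
received = b.receive(sent)   # max(2, 1) + 1 == 3
```

To get a total order, for example to break ties between concurrent events, compare `(timestamp, process_id)` pairs.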

~~~
SpikeGronim
Maybe tptacek just has a really great applicant pool ;). You also have to
account for specialization. I have no clue how to do optimized gamedev.

If you want more Lamport goodies: "Paxos Made Simple" (distributed
consensus done right):
<http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69.3093&rep=rep1&type=pdf>

"The Byzantine Generals Problem" (harshest failure model known and how to cope
with it): <http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#byz>

An interview where he talks about his approach to systems, particularly formal
reasoning and specification: <http://www.budiu.info/blog/2007/05/03/an-interview-with-leslie-lamport/>

His publication list: <http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html>

------
jerf
1\. Find bottleneck.

2\. Remove bottleneck.

3\. Repeat.

4\. Every once in a while, make a bold move to throw something out that can no
longer work that way and replace it with something more scalable. But while
this is important, it comes up less often than you might think.
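Step 1 is mostly measurement. A crude but serviceable way to find the bottleneck in a request pipeline is to time each stage separately over many runs and look at where the time goes. A minimal sketch: the stage names and functions are hypothetical, and in practice you would use a real profiler or per-stage metrics from production.

```python
import time


def profile_stages(stages, payload, repeats=100):
    """Run payload through (name, fn) stages `repeats` times; return the slowest stage and per-stage totals."""
    totals = {name: 0.0 for name, _ in stages}
    for _ in range(repeats):
        data = payload
        for name, fn in stages:
            start = time.perf_counter()
            data = fn(data)
            totals[name] += time.perf_counter() - start
    bottleneck = max(totals, key=totals.get)
    return bottleneck, totals


# Hypothetical pipeline: the "lookup" stage simulates a slow backend call.
def parse(raw):
    return raw.strip()


def lookup(key):
    time.sleep(0.002)
    return key


def render(value):
    return value.upper()


slowest, totals = profile_stages(
    [("parse", parse), ("lookup", lookup), ("render", render)], " user:42 ", repeats=20
)
print(slowest, {name: round(t * 1000, 2) for name, t in totals.items()})
```

Once the slow stage is fixed, a different stage becomes the maximum, which is exactly the loop described above.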

The difference is that you spend a lot more time in that loop than a desktop
dev, but if you understand programming it isn't a special black art until the
very, very top end.

The other thing to get is that it's always about _buying time_ rather than
_solving the problem forever_. The goal is to have bought enough time that you
don't have to be stuck in a local optimum or make panicked decisions.
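The "buying time" framing can be put in numbers. If load grows by a constant factor g per year and you are currently at capacity, a k-fold capacity improvement buys 12 * ln(k) / ln(g) months before you are back at the limit. Taking the doubling-every-year growth mentioned upthread as an example, a 10x improvement buys about 40 months, and a 2x improvement exactly a year. A small helper, assuming steady exponential growth (which real traffic rarely follows exactly):

```python
import math


def months_bought(capacity_gain: float, annual_growth: float) -> float:
    """Months until load, growing by `annual_growth`x per year, uses up a
    `capacity_gain`x capacity increase (assumes you start at full capacity)."""
    if capacity_gain <= 1 or annual_growth <= 1:
        raise ValueError("both factors must be greater than 1")
    return 12 * math.log(capacity_gain) / math.log(annual_growth)


print(round(months_bought(10, 2), 1))  # 39.9
print(round(months_bought(2, 2), 1))   # 12.0
```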

~~~
justin_vanw
This is a good point, to which I would add:

Architect your system so that you have visibility into where it is breaking,
and so that no piece has more than one simple job. Otherwise you will end up
spending all your time trying to figure out where the bottlenecks are, and
every bug will take a day or more to track down. Any part of your system that
is complex will basically not be fixable, since nobody will know what the
consequences of a change are until it breaks something else, which will then
take another day to fix; and yes, this logic does lead to an infinite chain of
days spent fixing bugs caused by the previous day's fixes.

~~~
petervandijck
Small pieces, loosely connected.

------
ismarc
Ride on others' coattails, stand on others' shoulders. It's not that it's any
harder; it's that the skills used day to day are different. The single skill I
picked up that served me best was being able to reason about what complex,
highly concurrent code was doing and the performance implications of it. And
I got this by reading code, and not just little programs, but things like the
UDP packet handling in the Linux kernel, or the storage and firewall rule
insertion mechanisms for iptables.

But, nothing beats working directly with geniuses. Earlier this year I made a
change (at my last company) that increased the number of simultaneous users by
well over an order of magnitude. The change was known and had been tried by
others in the group, but was deemed infeasible. I didn't come up with the
magic change needed; I found out how to apply it. And what I learned in the
process is applicable outside of that. Without working directly on solving the
problems, it's hard to learn how.

------
CyberFonic
For me the path to heavy-duty back-ends was Unix and C. Most of the work for
large corporations, in addition to the mainframes, involves big Unix systems:
IBM AIX, HP HP-UX, Sun Solaris. It helps to know a bit about storage (EMC,
Hitachi, NetApp, etc.) and, of course, databases (DB2, Oracle).

The best news is that these days, you can build up these skills using a $1k
box with Linux or BSD. Years ago, you needed to get a job first, because
systems cost on the order of millions of dollars and wouldn't fit in your
average spare room.

You'll also need to demonstrate some CS/SE chops, because mucking up a big
back-end system is not like a web page that occasionally crashes: it can cost
tens of thousands of dollars per hour while it's down.

------
davidhollander
I would start by viewing it as a tree-structure optimization problem. Draw a
tree where each node is a physical server and the root node is the domain name
server. Now try to maximize throughput of random lookups while minimizing
height (complexity). For each level of the tree, come up with a list of
everything you can think of that might affect the traversal
(processing/lookup) time when a node (server) in that level is entered. Also
create a list of everything you can think of that might affect the lines
(connections) between nodes. This exercise should give you a good idea of what
you need to learn and help generate more specific questions.
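Two back-of-the-envelope pieces of that exercise are easy to compute. With a fanout of f children per node, reaching N leaf servers needs a tree of height ceil(log_f(N)), and a lookup's latency is roughly the processing time at each level entered plus the network cost of each hop between levels. A minimal sketch; the per-level timings in the usage lines are made-up placeholders, not measurements.

```python
def min_height(leaves: int, fanout: int) -> int:
    """Smallest tree height whose bottom level can hold `leaves` nodes at the given fanout."""
    if leaves < 1 or fanout < 2:
        raise ValueError("need at least one leaf and a fanout of at least 2")
    height, capacity = 0, 1
    while capacity < leaves:  # integer loop avoids floating-point log rounding
        capacity *= fanout
        height += 1
    return height


def lookup_latency_ms(per_level_ms, per_hop_ms):
    """Processing time at every level entered, plus one network hop between each pair of levels."""
    return sum(per_level_ms) + per_hop_ms * (len(per_level_ms) - 1)


# Hypothetical numbers: DNS -> load balancer -> app server -> database.
print(min_height(1000, 10))                          # 3
print(lookup_latency_ms([1.0, 0.5, 5.0, 3.0], 0.5))  # 11.0
```

The tension the exercise is meant to expose shows up directly: a larger fanout lowers the height, but each node then has more children to route among and more connections to maintain.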

------
justin_vanw
There are maybe 20 people in the world who 'know' how to scale a website up to
millions of users. There are lots of teams of hundreds of people who actually
do it.

Don't get worried that you won't be able to go in and run the show on the
first day. There isn't any secret sauce, and sites that scale to this level
are so rare that they probably each have their own arcane and complex way of
doing it that has evolved over years of people trying different approaches and
failing.

Anywhere that is worth working isn't looking for someone who knows how to
scale a website to millions of users; they are looking for smart people who
can contribute. Their development budget is probably in the millions of
dollars per year, and they will be more than happy if you can help.

TLDR; Nobody is going to write a book on this, since only 500 people in the
world would benefit from reading it. There is no single answer.

To address the specifics of what you are asking, it is basically a balancing
act between consistency and performance. You need to find the exact balance
that is 'good enough' for every problem. The oft-quoted 'there are two hard
problems in CS: cache invalidation and naming things' pretty much sums it
up.
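A time-to-live cache is about the simplest place to see the consistency/performance trade-off. Reads are served from memory, at the cost of returning data that may be up to `ttl_seconds` stale unless something explicitly invalidates the entry. A minimal sketch: the `loader` callback stands in for whatever slow backend you are caching, and the injectable clock is only there to make the behavior easy to demonstrate.

```python
import time


class TTLCache:
    """Cache entries for ttl_seconds; a longer TTL means fewer backend reads but staler data."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self.loader = loader      # function key -> value (the slow backend)
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}        # key -> (value, time loaded)

    def get(self, key):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]       # fast path: possibly stale
        value = self.loader(key)
        self._entries[key] = (value, now)
        return value

    def invalidate(self, key):
        """Drop an entry so the next read goes to the backend (the hard part is knowing when)."""
        self._entries.pop(key, None)
```

If the backend changes after a read, `get` keeps returning the old value until the TTL expires or someone calls `invalidate`. Making that window smaller costs more backend load, and that is the balance being described.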

------
mathgladiator
The simplest way is to just do it.

You are fortunate that you live in the age of cloud computing. For instance,
you can spend $10 for a day and get access to more compute resources than most
people could hope for after months of budget proposals.

Find a problem, solve it, launch it, test it, find the bottleneck, kill it.
Repeat this enough times and you start to get a feel for where bottlenecks will
happen and how failure happens.

~~~
chintan
I second this.

I wrote a distributed crawler from scratch with 10 EC2 machines. It was one of
the best learning experiences ever!
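One standard way to split a crawl across a fixed set of machines (not necessarily what was done here) is to hash each URL's hostname to a node. That way all pages from one site are fetched by the same machine, which makes per-site politeness limits easy to enforce. A minimal sketch, assuming 10 nodes numbered 0 to 9; `assign_node` is a hypothetical helper.

```python
import hashlib
from urllib.parse import urlsplit


def assign_node(url: str, num_nodes: int = 10) -> int:
    """Map a URL to a crawler node by hashing its (lowercased) hostname."""
    host = urlsplit(url).hostname or ""
    # Use a stable hash: Python's built-in hash() of str is randomized per process,
    # so different machines would disagree about the assignment.
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes
```

Plain modulo reshuffles almost every host when you add an eleventh machine. Consistent hashing is the usual fix if the cluster size changes.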

------
fingerprinter
I was mostly a web guy, riding the internet from '94 until about '06, when I
started to get into more serious stuff...up until that point it was C, Perl,
Java, etc., but it was mostly pushing business data around, which is what I
think 90% of all commercial programming is these days (so don't knock it...it
pays the bills).

In '06 I joined a startup and we needed to scale. I hadn't had experience with
this stuff and neither had most people on my team...so here is what we did.

* Try new things, but basically find out what most people who have already gone down this path are doing (stand on the shoulders of giants, as someone mentioned)

* Read, read, more reading...talk to other devs...network...DO NOT REINVENT SOMETHING (I also call this the Kiss of Death). Unless you are Google, Amazon or Facebook, use off-the-shelf if you can.

* Use technologies that will work for your problem. We chose Erlang for ours because of what we were doing. Something like Java would have worked, but would have made the job 10x harder. C would have been ideal, but we would have had to reinvent nearly all of Erlang, so we just chose Erlang.

* LEARN about things like good architecture design, SOA and failure (when a system goes down, what happens...).

* Invest in a good test suite or test infrastructure, but realize that it will be nearly impossible to test at scale.
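On the failure point, one of the first patterns most people adopt is retrying transient errors with capped exponential backoff plus random jitter. This keeps a brief outage from turning into every client hammering the recovering service in lockstep. A minimal sketch, assuming the failure you care about surfaces as Python's `ConnectionError`; the `sleep` and `rng` parameters exist only so the behavior can be tested.

```python
import random
import time


def call_with_retries(fn, max_attempts=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on ConnectionError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the failure
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * rng())  # "full jitter": wait a random fraction of the backoff
```

Only retry operations that are safe to repeat. Retrying a non-idempotent write can apply it twice.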

During that time I felt like I was constantly reading every paper and blog
post on scaling and back-end systems I could find, and talking to every dev who
had ever done it. It was work, but not the type normally associated w/ dev...but
it was 100% worth it.

------
diego
I started writing my story but it became too long so I posted it here.

<http://dbasch.posterous.com/how-did-you-become-a-hardcore-back-end-develo>

Tl;dr: in 1998 I created an mp3 search engine that got significant traffic,
had to learn on the fly, ended up going to Inktomi where I joined a team
tackling much bigger problems. We all learned a lot over the next four years.

------
jsarch
@andywood,

Can you take a moment tomorrow and add an edit to your post giving a summary
of whether you felt the comments answered your questions?

I ask simply because my first read of your post focused on "How do I get
there?" and not "what was your path?" As such, I was surprised to be reading
life stories of fellow HN'ers. Since we all absorb info differently, I'm
curious to know if the stories helped and what you gleaned from them.

All the best in your endeavor. -- A fellow large-scale enthusiast.

~~~
andywood
For some reason, I don't have an edit link for this post anymore, but I can
answer this right now. All of these responses are exactly what I was looking
for, and then some. As far as the phrasing, I'm equally interested in direct
advice like "read this paper", and personal stories. I've always been able to
intuit how to go about learning any given topic in computing, whether
languages, game programming, HTTP, Win32, or what have you. I don't know
exactly why this subject in particular seems more esoteric to me - probably a
product of my background - but it does. I wanted to know how others learned.
Before this thread, my best answer would have been "Get a job at Amazon or
Google as a front-end dev, and try to work my way into the back end." Now I
have papers to read, algorithms to learn, topics to explore, and ideas about
setting up a toy environment for learning. So yes, all of the answers are
definitely helping, and I hope a few more people will add their stories. A big
thank you to everyone.

------
petervandijck
To boost your resume, you could work on some of the large-scale open source
systems (NoSQL, etc.). That'll look good, and get you some good experience too.

You can run 1000 servers for an hour on Amazon fairly cheaply. If you use
that to do some testing/benchmarks etc. of popular NoSQL systems, for example,
and then write about it, you can build a reputation in the big-systems world
fairly fast.
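If you do publish benchmarks, report latency percentiles (p50, p99, p99.9) rather than just averages, since at scale the tail is what users and upstream services feel. A minimal nearest-rank percentile helper for summarizing raw latency samples. This is one of several percentile definitions, and libraries may interpolate differently; the sample data is made up.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


latencies_ms = [10] * 98 + [1000] * 2  # made-up data: two slow requests in a hundred
print(sum(latencies_ms) / len(latencies_ms))                       # 29.8
print(percentile(latencies_ms, 50), percentile(latencies_ms, 99))  # 10 1000
```

The mean suggests everything is a bit slow, while the percentiles show that most requests are fast and a few are very slow. Those are different problems with different fixes.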

Good luck!

------
Pahalial
<Obvious answers here>

When you discount "learn on the job" and "read books", I'm really not sure
what's left, or what you expect the people who have achieved success by doing
these things to tell you (while omitting those things).

~~~
andywood
My intention was not to discount them at all. In fact, I've learned everything
I know about web development on the job. I just didn't want the existence of
obvious answers to deter anybody from sharing the details of their individual
experiences. And all of the responses so far have been _exactly_ the kinds of
things I'm looking for.

------
known
Try developing using <http://www.stoneridgetechnology.com/products/pci-e-development-boards/hft-development-kit/>

------
ochekurishvili
By mastering technologies I didn't know.

