
Design Lessons & Advice from Building Large Scale Distributed Systems at Google - paulsb
http://perspectives.mvdirona.com/2009/10/17/JeffDeanDesignLessonsAndAdviceFromBuildingLargeScaleDistributedSystems.aspx
======
neilk
Jeff Dean is well known as a superstar within Google, although not so much to
the outside world. With the exception of PageRank, he created, or had a hand
in, almost every technology you've heard of as a major Google innovation.

I have a Googler friend (genius coder in his own right) who sometimes wondered
if he wouldn't be more productive by just devoting his workday to ensuring
that Jeff Dean was properly caffeinated.

I find this kind of fascinating because Jeff Dean's academic background was in
compiler optimization research. Not the obvious choice to build infrastructure
for a large website. But perhaps compiler people know how every nanosecond
counts, and can see the network as just another high latency part of a big
computation.

~~~
strlen
Designing and implementing a compiler involves _many_ aspects of computer
science -- as does distributed systems development.

If you've never written a compiler from scratch, I highly recommend doing so
(1): there are data structures (the symbol table -- a hash table, parse
trees, tries), algorithms (converting an NDFSM to a DFSM (2), parsing,
register allocation, and many others), and computer organization /
architecture (outputting assembly, optimizing the code with pipelining and
the CPU cache in mind). Not to mention it's also a substantial project that
teaches you a lot about software engineering: it's not something you can
write in a single session, but something that has to be frequently extended
and tested (I can't think of a cleaner candidate for test-driven development:
you specify the source code you want to feed in and test whether the
generated IR/assembly is what you're looking for -- see the sketch below).
It's also really fun :-)
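
A minimal sketch of that golden-file testing style, in Python, assuming a
hypothetical compile_to_asm entry point standing in for your own compiler's
front end and code generator:

    import unittest

    def compile_to_asm(source: str) -> str:
        # Placeholder for a real front end + code generator; it only
        # handles the one program the test below feeds in.
        if source == "return 2 + 3;":
            return "mov eax, 5\nret"
        raise NotImplementedError(source)

    class CodegenTest(unittest.TestCase):
        def test_constant_folding(self):
            # Specify the input program and the exact assembly expected back.
            self.assertEqual(compile_to_asm("return 2 + 3;"),
                             "mov eax, 5\nret")

    if __name__ == "__main__":
        unittest.main()

Each new language feature gets a new input/expected-output pair, which is
why the test suite grows so naturally alongside the compiler.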

Custom compilers/languages are also very useful in distributed systems: look
at Thrift and Protocol Buffers (languages for describing the serialization
of data structures for RPC). Google also has a custom language that compiles
into state machines, which they used to implement their Chubby lock service
(and probably other distributed systems) -- see the "Paxos Made Live" paper.
At a previous employer, which (when I was there) was deploying distributed
systems with tens of thousands of nodes, we created a custom language for
describing/querying groups of machines: this language and the systems built
on top of it (for monitoring, configuration, provisioning, and deployment)
helped reduce the admin:machine ratio by orders of magnitude (think a team
of three handling ~10,000 machines).

(1) I suggest picking up an _older_ (not the newest) edition of the Dragon
Book and targeting x86/amd64/MIPS: while for a practical compiler I'd almost
_certainly_ suggest targeting LLVM or the JVM/CLR, you'll learn more
targeting bare metal.

(2) Incidentally, Lamport's landmark paper suggested treating processes in
distributed systems as state machines.

~~~
sandGorgon
That is a fascinating account - could you elaborate on how you leveraged a
compiler/custom language for managing your 10K machines?

I am not able to wrap my head around what it was, specifically, that you
needed a language for - rather than, say, building it on top of
Perl/Python/Lua - and whether you used the yacc/bison toolchain to build it.

~~~
strlen
First, I didn't build the language/compiler myself (it was written in my group
before I arrived). It would be embedded in other languages, much as SQL is:
you execute queries (first against data on a local disk distributed via
version control, later against a web service).

The queries you'd make would be like: "Give me the difference of (the union
of (all machines running 64-bit Linux, all machines in cluster "match", all
machines in the west coast datacenter)) and the machines whose names start
with "box10", then show me the VLANs these machines are on". Of course this
would be in a much more terse (but very easy to learn) language.
Perl/Python/C code (and, through a CLI tool, shell scripts) would then
operate on the returned values (e.g. parallel ssh to execute a command,
parallel TCP calls).
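
A hypothetical Python sketch of the set algebra such a query performs (the
machine names, clusters, and VLAN data here are invented for illustration;
the real language and its data source were internal):

    # Inventories the query draws from.
    linux64 = {"box101", "box102", "web205"}    # machines on 64-bit Linux
    match = {"box101", "db301"}                 # machines in cluster "match"
    west_coast = {"box102", "box110", "db301"}  # machines in a west coast DC

    # Union of the three groups, then difference with the "box10*" machines.
    union = linux64 | match | west_coast
    result = union - {m for m in union if m.startswith("box10")}

    # A real query would then join each machine to its VLAN; faked here.
    vlan = {"web205": 12, "db301": 7, "box110": 12}
    print({m: vlan[m] for m in sorted(result)})

The point of the terse custom language is that a query like this is one
short line instead of a page of code, and the same notation can be embedded
in Perl/Python/C or a CLI tool, as described above.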

------
pmorici
Hm, there are some interesting numbers in there.

"Map Reduce Usage at Google: 3.5m jobs/year averaging 488 machines each &
taking ~8 min ... Big Table Usage at Google: 500 clusters with largest having
70PB, 30+ GB/s I/O"

So to run 3.5 million jobs a year, each taking 8 minutes on 488 machines,
they would need at least roughly 26,000 machines running continuously to
complete those MapReduce jobs in a year.
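
A quick back-of-the-envelope check of that figure (machine-minutes of work
divided by the minutes in a year):

    jobs_per_year = 3.5e6
    machines_per_job = 488
    minutes_per_job = 8                  # "~8 min" average

    machine_minutes = jobs_per_year * machines_per_job * minutes_per_job
    minutes_per_year = 365 * 24 * 60     # 525,600

    print(f"{machine_minutes / minutes_per_year:,.0f} machines")  # ~26,000

That assumes the machines do nothing but MapReduce around the clock, so the
real fleet running these jobs would be larger.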

Similarly, if you assume that their largest storage cluster uses their
previously described commodity hardware approach, and that at the moment the
sweet spot for drives is about 1 TB with one or two drives per machine, that
is somewhere between roughly 35,000 and 70,000 machines in their largest
storage cluster. That seems like a lot.
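
And the storage estimate, assuming decimal units (70 PB = 70,000 TB) and the
stated one or two 1 TB drives per machine:

    capacity_tb = 70 * 1000      # 70 PB cluster, decimal units
    drive_tb = 1                 # ~1 TB "sweet spot" drives

    drives = capacity_tb / drive_tb                         # 70,000 drives
    print(f"{drives / 2:,.0f} to {drives:,.0f} machines")   # 35,000-70,000

Binary unit conversions (PiB/TiB) or formatted-capacity overhead would push
the count a few percent higher, and none of this accounts for replication.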

------
wallflower
> Working on next generation Big Table system called Spanner

o Similar to BigTable in that Spanner has tables, families, groups,
coprocessors, etc.

o But has _hierarchical_ _directories_ rather than rows, fine-grained
replication (at the directory level), ACLs

~~~
xal
It's funny. All these abstractions build on top of each other to provide
fault-tolerant failover at Google scale, and we are coming full circle back
to the file system metaphor.

Finally :-)

~~~
kls
If there is one thing I have seen come to pass time and time again in
technology, it is that the old becomes new again. You can literally make a
fortune by identifying the next big thing and then implementing an old idea
on top of it. Look at mobile: we started out with simple systems that took
assembly or C (Newton, Palm), graduated to Java, and now we are seeing the
latest platforms move to web technologies. Take any facet of technology and
you can draw a parallel to legacy technology.

------
seymourz
Who ignites Google's inspiration for infrastructure innovation? That man is
certainly Urs Hoelzle. All the star work done by Jeff Dean and others at
Google stems from his thoughtful planning ...

