In general, the maxim of "cache everything, everywhere" is the cheapest way to gain both speed and availability. Of course, you need to have results already in order to cache them, which is where all that big data processing comes into play, but there's still way more optimization going on here than is necessary to get results like this. You could make a tradeoff: serve newly-processed results more slowly, still give people mostly what they want, and avoid maintaining so much custom software and hardware infrastructure.
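To make that tradeoff concrete, here's a minimal cache-aside sketch (all names and numbers here are mine, just for illustration): serve whatever is cached immediately, even if stale, and kick off a recompute in the background, so freshness arrives eventually instead of blocking the user.

```python
import time
import threading

CACHE = {}          # key -> (value, timestamp)
TTL_SECONDS = 300   # how stale we're willing to serve; arbitrary number

def recompute_result(key):
    """Stand-in for the expensive big-data pipeline."""
    return f"result for {key}"

def get(key):
    entry = CACHE.get(key)
    if entry is None:
        # Cold miss: the user pays the full recompute cost once.
        value = recompute_result(key)
        CACHE[key] = (value, time.time())
        return value
    value, ts = entry
    if time.time() - ts > TTL_SECONDS:
        # Stale hit: serve the old value now, refresh in the background.
        threading.Thread(target=_refresh, args=(key,), daemon=True).start()
    return value

def _refresh(key):
    CACHE[key] = (recompute_result(key), time.time())
```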
Sometimes I feel like Batty from Blade Runner when I think about how well old crappy software used to work. "I've seen things you people wouldn't believe. Attack ships on fire off the shoulder of Orion. I watched mod_perl apps serve 50 thousand dynamic requests per second with nothing but Apache and MySQL. All those moments will be lost in time... like tears in rain..."
Can't wait for someone to clone this. Come to think of it, has anyone cloned Spanner? I recall some calling GPS a hack but I thought it was engineering brilliance. Spanner was some good work, too.
Still gotta wait a while for a full Spanner replacement with GPS tricks and all. Or something even better hopefully.
I think it is crazy that something as simple as keeping time can have such a complicated solution at that sort of scale.
Every review I hear about working at Google makes me want to stay away, just like the recent conversations about working at Amazon, but crazy stuff like this always piques my interest.
Google was Fortune Magazine's #1 best place to work for six years in a row. The reason you occasionally hear a story about how Google isn't perfect to work at is that those stories are news. In general, it's a fantastic workplace.
I grew up in a tiny midwest town and I love it here. I would not enjoy living on either one of the coasts.
Many of my classmates in college couldn't wait to get out of the midwest. I have friends at Google, Amazon, Microsoft, and other large names in the tech industry, but more often than not, when I hear them talk about their jobs, even when they talk about them positively, I'm glad I stayed in the midwest, because that's what fits me best.
(I work for Google Seattle, but I grew up in Pittsburgh -- Seattle is definitely more my speed, but I do enjoy a trip home now and then)
I have lived in a lot of different places (New Jersey, North Carolina, Texas, Southern California, Northern California (don't laugh re: me splitting those!), and briefly Illinois, Connecticut, and the Dominican Republic) and find value in their differentness.
To be clear, I'll say again that it's not like I hate the Valley, but the reality is that day-to-day life for my Valley coworkers and for me just isn't that different, though theirs sure is more expensive. If you find a Silicon Valley job from an SV company in a remote office... and there are rather a lot of them, just not all in one place... there's not much advantage left, unless you really love something about SV specifically, which is of course a totally reasonable and sensible thing.
Whereas my friends in Houston get to go to one conference a year (if that). But I totally agree: if you want a house with land to raise kids in, I personally think it sucks to live here.
However, I do enjoy living in Chicago, and I do consider that a huge difference from the coasts.
But like I said, we're all very different.
This is true of almost everyone, of course.
Every year I read the incoming intern abstracts, and they all literally say the same thing: "I really would like to work on <whatever really popular crazy project was in the news lately>". Literally all of them.
That said, often you can work on them if you are good enough at what you do.
(But yes, often you have to prove that first, either internally or externally)
I spent about half my time at Google working on mundane improvements to search - visual redesigns, feature unification, infrastructure improvements - and half working on crazy green-field projects. Most of the crazy stuff was eventually canceled, and the stuff that did launch (e.g. Google Authorship) ended up being a lot more toned-down than we initially envisioned. Ultimately I think I learned more from the crazy projects, but it's a very different kind of learning, much more experiential than factual.
The other thing you learn when you actually succeed at a crazy new idea is that people build up a tolerance to them really quickly. The first time we did an interactive doodle on the home page (PacMan...actually technically that was the second, but it was the first people noticed), everybody went wild, it was in all the newspapers, and we calculated people spent 4.82 million hours playing it. Now when an interactive doodle comes out, most people don't even notice. Remember that Google Docs, GMail, etc. were revolutionary in their day; it's only because they've become successful that you don't want to work on them.
But it has other strengths.
> working on a giant campus that's isolated from the outside world
Actually, I am indistinguishable from everybody else.
(Edit to my comment above: yes, different elevations cause a gravitational time dilation difference. For Earth's gravitational field and the elevation difference between different Google servers, I doubt it's an issue at the time resolution that Google needs to maintain.)
...you can't generally guarantee that they are (even approximately) stationary with respect to each other, because points on the earth's surface (in general) are not stationary with respect to each other in an inertial frame of reference.
> ... because points on the earth's surface (in general) are not stationary with respect to each other in an inertial frame of reference.
True. There is both the earth's rotation, and the relativistic difference due to differing elevations. But given earth's angular velocity and gravitational gradient, points on the surface are still approximately stationary with respect to each other, where "approximately" is defined by the amount of difference it will make compared to the time precision that Google cares about.
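Back-of-envelope, with my own numbers just to gauge the magnitude: the weak-field rate difference between two clocks is roughly g*Δh/c², so even a generous 1 km elevation spread between servers gives a drift of nanoseconds per day, orders of magnitude below the millisecond-scale uncertainty TrueTime is reported to tolerate.

```python
g = 9.8          # m/s^2
c = 3.0e8        # m/s
delta_h = 1000.0 # m; generous guess at elevation spread between datacenters

fractional_rate = g * delta_h / c**2   # weak-field gravitational time dilation
drift_per_day = fractional_rate * 86400

print(f"fractional rate difference: {fractional_rate:.2e}")  # ~1.1e-13
print(f"drift per day: {drift_per_day * 1e9:.1f} ns")        # ~9.4 ns/day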
And there is no relativistic funny business involved, because the clocks are stationary with respect to each other. There's no difference of viewpoint as to whether the projectiles met at the halfway point, or where the halfway point was, or even how far off from the halfway point they met (and therefore how far off the clocks are from each other).
This is the argument used in my relativity class to show that you can synchronize clocks that are stationary with respect to each other. (You have to be able to do that to construct an inertial frame of reference, that is, to be able to determine what time coordinate some event occurs at, no matter what spatial location it occurred at.)
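For the curious, here's the arithmetic of that argument in toy form (Newtonian, with made-up numbers): if the far clock is off by dt, the projectiles meet v*dt/2 away from the midpoint, so the displacement of the meeting point tells you the offset exactly.

```python
def clock_offset_from_meeting_point(L, v, x_meet):
    """If two stations a distance L apart each fire a projectile at speed v
    toward the other when their local clock reads zero, a clock offset dt
    shifts the meeting point off the midpoint by v*dt/2. Solve for dt."""
    delta = x_meet - L / 2.0
    return 2.0 * delta / v

# Example: stations 1000 m apart, 10 m/s projectiles, meeting 5 m past the
# midpoint implies the far clock is running 1 second behind.
print(clock_offset_from_meeting_point(1000.0, 10.0, 505.0))  # 1.0
```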
And how would either end-point know this exactly?
"Watch" may mean using something like a phased-array radar to measure it more precisely, if you wish...
And for that matter, why would anyone build any process/system/software that requires a distributed system's machines to all have their clocks in sync? I am baffled.
People have mentioned distributed transactions and security; another area is synchronizing modeling with (hard or soft) real-time inputs from separate hardware. There are a bunch of ways to get yourself tied up in knots once at least 2 physical bits of hardware are involved.
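For anyone baffled by the "why": the canonical example is Spanner-style commit-wait, where a known clock error bound lets a transaction pick a timestamp and then sit out the uncertainty window before becoming visible, so no machine anywhere can later assign an earlier timestamp. A toy sketch (epsilon is a made-up constant; real TrueTime derives it from GPS and atomic clock references):

```python
import time

EPSILON = 0.007  # assumed worst-case clock error bound, in seconds

def truetime_now():
    """TrueTime-style interval: true time lies within [earliest, latest]."""
    t = time.time()
    return (t - EPSILON, t + EPSILON)

def commit(write):
    _, latest = truetime_now()
    commit_ts = latest  # a timestamp guaranteed >= the true current time
    # Commit-wait: block until the true time has definitely passed commit_ts.
    while truetime_now()[0] < commit_ts:
        time.sleep(0.001)
    make_visible(write, commit_ts)

def make_visible(write, ts):
    print(f"committed {write!r} at {ts}")

commit("row-update")
```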
I loved working there. Sure, who your boss is has a huge impact on your happiness. I ended up with a boss I enjoyed working for.
Does that review make you want to stay away, too?
A couple years later, a new hire got assigned to a team I was on. He was a little bummed because he'd really had his eye on another project. So we talked to our manager about it and he was allowed to transfer to the team he'd been hoping for.
I have more anecdotes like this, but the long and short of it is that in my experience Google is a lot less capricious and uncaring an organization than you imagine.
I think the problem is that their interviewing process is. That's the first thing people encounter and it's the only thing rejected people encounter. That harms people's views of the company, even if they understand the explanation about false positives being so expensive.
And frankly, I've never once worked for a company where I was immune from re-orgs.
Well put. There were a lot of reasons I turned down my Google offer out of college and have been unenthusiastic about re-applying, but the "We'll find something for you" process was a huge part of it. (The recruiter wouldn't even listen to something as simple as wanting SWE over SRE.) I ended up at a much smaller company where I could know my job and product and meet my boss before I signed.
I worked in the Mountain View office. Transferring is indeed easy, and people did transfer frequently. As a result turnover was high, and it was hard to build friendships, or gel as a team.
Every team after that is your choice, made with full access to information. Google has a thriving internal job market.
Personally, I get paid pretty good money and have tons of resources to build models that serve 100s of millions of people.
I've interviewed a couple of ex-Googlers (not fired, still employed there) at a startup and we had to turn them down. They could handle a simple coding exercise fine, but then fell over in the component where they had to debug code and extend it with a feature.
That was when I realized a big chunk of Google's employees are going to be a reflection of their interview process (puzzle solvers but not software engineers). This was even reflected in the creation of the Go language: it's stupid simple, so that Google employees who don't know how to use more advanced language features can't create something unmaintainable.
You can get 20 years of six-figure salary in most of the locations where Google operates offices by working for other companies. Especially in the Bay.
He wanted the solution, and Google interviews are more algo-heavy than most. There is also this:
I reviewed lots of CS papers while in uni and it only took a quick search to confirm the algorithm was what I had in mind. Completely unnecessary to memorize each one.
Keeping time has been a difficult problem for... well, as long as we've tried keeping time. For an interesting historical account of past difficulties, you might check out the book "Longitude".
Check out this talk: https://archive.fosdem.org/2015/schedule/event/ntimed_ntpd_r... . It's more difficult than it appears.
Timekeeping is such a basic problem that Albert Einstein invented the theory of relativity because of it.
Hopefully the irony isn't lost on the readers of this document :)
To me, this isn't irony at all.
It shows that Google can't defeat the algorithmic complexity of things like consensus and shared-state editing, and so gracefully degrades functionality.
That shows a very good understanding of scaling - realizing any solution you choose has limits and tradeoffs, and thinking about and handling those limit cases sanely ahead-of-time, rather than waiting for it to fall over and hoping for the best.
Incidentally, OT is another example of the value of in-sync clocks, for those asking about that elsewhere in this thread.
The talk for which I put the slide deck together was given at a summer school and unfortunately not recorded, but if there is sufficient interest, I might tape a re-run and upload it. (Though unlikely to have time in the next month, so it might be a while.)
In the meantime, the original papers (listed in the bibliographies at http://malteschwarzkopf.de/research/assets/google-stack.pdf and http://malteschwarzkopf.de/research/assets/facebook-stack.pd...) have a lot more detail than my (very condensed) slides.
I hope you do it!
 - http://www.tested.com/tech/1926-why-google-uses-tape-to-back...
 - http://www.theregister.co.uk/2013/12/29/a_year_of_tape_tittl...
The GFS paper describes replication in a bit more detail:
> Users can specify different replication levels for different parts of the file namespace. The default is three. The master clones existing replicas as needed to keep each chunk fully replicated as chunkservers go offline or detect corrupted replicas through checksum verification.
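Here's a toy sketch of the master-side logic that quote describes (all names invented; nothing like GFS's real interfaces): track live replicas per chunk and clone from a surviving replica whenever the count drops below the chunk's target.

```python
# Toy model of master-driven re-replication; names are made up.
chunks = {
    "chunk-a": {"target": 3, "replicas": {"cs1", "cs2", "cs3"}},
    "chunk-b": {"target": 5, "replicas": {"cs1", "cs4"}},  # higher replication level
}

def handle_chunkserver_failure(dead_server, live_servers):
    for chunk_id, info in chunks.items():
        info["replicas"].discard(dead_server)
        while len(info["replicas"]) < info["target"]:
            source = next(iter(info["replicas"]))     # clone from a survivor
            candidates = live_servers - info["replicas"]
            if not candidates:
                break                                  # nowhere to put a new copy
            dest = candidates.pop()
            info["replicas"].add(dest)
            print(f"clone {chunk_id}: {source} -> {dest}")

handle_chunkserver_failure("cs2", {"cs1", "cs3", "cs4", "cs5", "cs6"})
```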
Some other reasons you wouldn't back up a distributed file system (not saying there are no reasons ever):
(1) It's difficult to add another layer of backup without impacting performance unpredictably at backup time. It's more predictable to implement much of the replication synchronously within the request (while optionally letting some replicas catch up out-of-band).
(2) Files are differently important - some may warrant a greater degree of redundancy than others. The file system can understand this and take advantage of it; a separate backup system on top of the file system probably can't.
(3) A standard backup/restore process often implies downtime during recovery. One goal of distributed systems is to avoid downtime by handling faults transparently. They continuously repair themselves. See: recovery-oriented computing.
(4) A backup-and-restore process that's in any way intrusive on the operation of the system will not be easy to test on an ongoing basis the way that failure recovery is tested constantly within the distributed file system. (In a big server fleet, drives fail all the time, giving you no end of opportunities to exercise your recovery process.)
 One reason might be a defense against "unknown unknown" faults in the file system itself that cause it to irrecoverably lose track of data.
As the slides mentioned, the difference between this web-services stuff and HPC is that HPC has a much higher compute/data ratio. Said another way, this web-services stuff is all about moving data, collating it a bit, and less about intensive mathematical processing of it.
Most of the boxes in the diagrams are datastores. Block stores, object stores, columnar stores, caches. I hate to say it but... big data. Not (as) big compute.
At least the build system was recently open-sourced, so I won't have to build that from scratch. But things like Borg and D are both elegant and easy to use, and I would hate to have to go back and care about deploying software or configuring RAID arrays again. Unfun and uninteresting. Totally solved at Google for the kinds of problems I work on.
It's scary being at the bleeding edge.
...And I haven't found it to be a major problem, being an ex-Googler a little over a year out. Tech skills are easy to pick up on your own. My current startup is based on Node.js, a native Android client, and AWS; the one before it was Django & Heroku. Haven't tried looking for jobs yet - I made enough at Google to not have to worry about that for a while - but I occasionally get in-bound interest from big-name, fast-growing startups. Most clueful hiring managers look for experience with problems, not with solutions, and Google lets you face problems that the rest of the industry won't deal with for a while.
GFS, MapReduce, and BigTable were the key Google inventions that became the sparks that ignited the modern-day big data/analytics revolution.
And unlike Google, they actually make it available to the public and then follow up by supporting the projects.
Another example - one that I've heard second-hand - is how Facebook gets consistency on the news feed. Apparently your own writes to Facebook are sent separately to a write-aside cache, and then the webserver merges them back in whenever you view a page. As a result, Facebook is always strongly-consistent when it comes to your comments (you'll never post something and then fail to see it show up), but it's only eventually-consistent with respect to other peoples' comments. But then, you won't know about or care about the latter, because how would you know that you're not seeing something they posted?
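A rough reconstruction of that pattern (second-hand, so the names and structure are mine, not Facebook's): writes land in a per-user write-aside cache as well as the replicated feed store, and reads merge the user's own recent writes back into whatever the store returns.

```python
import time

class LaggyFeedBackend:
    """Stand-in for a replicated feed store that applies writes asynchronously."""
    def __init__(self):
        self.pending, self.feed = [], []
    def enqueue_write(self, user_id, text):
        self.pending.append((time.time(), text))  # not yet readable
    def read_feed(self, user_id):
        return list(self.feed)                    # may miss recent writes

write_aside = {}  # user_id -> [(timestamp, post), ...]

def post(user_id, text, backend):
    write_aside.setdefault(user_id, []).append((time.time(), text))
    backend.enqueue_write(user_id, text)

def render_feed(user_id, backend):
    feed = backend.read_feed(user_id)
    seen = {text for _, text in feed}
    # Merge the user's own recent writes, so they always read their own
    # posts immediately, even though replication hasn't caught up.
    feed += [(ts, t) for ts, t in write_aside.get(user_id, []) if t not in seen]
    return sorted(feed, reverse=True)

backend = LaggyFeedBackend()
post("alice", "hello world", backend)
print(render_feed("alice", backend))  # alice sees her post despite the lag
```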
As it is they are just demonstrating competence at scaling.