
I love reading about Google's systems, but I wish I could work on those problems at scale, that is my dream really. I wonder what more systems Google has that we don't know about.

I know Borg has become what we know as k8s but surely there must be more things that Google has made internally that are not open source.

Curious about this and would like to know more about it from anyone in the trenches at Google.

> I love reading about Google's systems, but I wish I could work on those problems at scale, that is my dream really. I wonder what more systems Google has that we don't know about.

I work for Google and I used to have this exact thought too. I think the reality is not quite as rosy, though far from bad!

You have to realize that there are hundreds of people who work on systems like this, and as a consequence, your day to day work is more or less the same as what you would do on systems of a smaller scale.

Before I joined Google I always wondered what things they did differently and what magical knowledge Googlers must have possessed. After joining I realized that while on average the engineers are definitely more capable than other places I've worked, there's no special wisdom and instead they just have more powerful primitives/tools to work with.

Of course, maybe I am mistaken and just don't know of the magic?

> there's no special wisdom and instead they just have more powerful primitives/tools to work with

Reminds me of compound interest. Google operates at a scale where the company has enough brainpower to design systems like GFS/Colossus and Borg, which enable systems like Spanner, which enable systems like Zanzibar, and so on.

this is correct

There's still plenty of opportunity to do things at scale and change or replace major systems entirely.

The harsh truth of working at Google is that in the end you are moving protobufs from one place to another. They have the most talented people in the world but those people still have to do some boring engineering work.

But you can reduce any job to this can't you? Pretty much all engineering is just moving some strings around.

Work in finance and you can move integers around!

Work in the news industry and you can flip booleans! (too subtle?)

> (too subtle?)

It was for me! :-)

(Of course, because it's after 5pm somewhere, I'm unfortunately already lit)

Can you believe that people at Google still have to, like, eat lunch and stuff? And talk to each other? The coffee is the same damn color every day too.

Maybe there's a place, somewhere, for the purest-of-the-pure non-boringest thoughts.

"Larry&Sergey Protobuf Moving Co."

Sandstorm.io kentonv protobufs?

Oh hai, you rang?

Sandstorm.io doesn't use protobuf, it uses Cap'n Proto, which was designed to replace protobuf.

Fun fact: The Zanzibar project was started in ~2011 specifically to replace my main project at the time, which was trying to solve the same problems. Apparently, some senior engineers felt letting me work on core infrastructure was too dangerous. They succeeded in turning my project into a lame duck and making me quit, which is when I then started working on Cap'n Proto and Sandstorm.io. In retrospect I'm glad it happened.

Yeah... Google is not always the most fun place to work on big infrastructure projects.

What is the right data format to move around? JSON?

The point is you're writing mostly business logic and glue. You get a server request, you transform it with some logic, call some other servers, combine the responses and run some more logic, and return a response.
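To make that request flow concrete, here is a minimal sketch of such a "glue" handler. The backend functions `fetch_profile` and `fetch_permissions` are hypothetical stand-ins for RPC stubs, and the request/response shapes are invented for illustration; nothing here reflects an actual Google service.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical backends; in a real service these would be RPC stubs.
def fetch_profile(user_id):
    return {"user_id": user_id, "name": f"user-{user_id}"}


def fetch_permissions(user_id):
    return {"user_id": user_id, "roles": ["reader"]}


def handle_request(request):
    # 1. Transform the incoming request with some logic.
    user_id = int(request["user_id"])

    # 2. Call the other servers concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        profile_future = pool.submit(fetch_profile, user_id)
        perms_future = pool.submit(fetch_permissions, user_id)
        profile = profile_future.result()
        perms = perms_future.result()

    # 3. Combine the responses, run more logic, and return a response.
    return {"name": profile["name"], "can_write": "writer" in perms["roles"]}
```

Everything hard (retries, load balancing, auth, storage) would live behind those two stub calls, which is the point being made above.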

The scalability and interesting work has been factored out and handed off to infrastructure teams that build stuff like this auth framework, load balancers, highly scalable databases, data center cluster management tools, etc.

Which really is the smart way to do it. To the extent that you can stand on the shoulders of giants who've basically made scalability the default, you are free to focus on what you're actually trying to build. The only downside is that if all the interesting engineering challenges are already solved for you, the remainder might not be that interesting to people who enjoy engineering challenges.

It's just a saying. All we do is move protos from one service to another.

JSON is definitely not the right stuff.

The encoding/decoding cost is painful :(

I mean in this context if you're doing that level of scale. For a lot of purposes json is totally fine.

It's really not - compared to the wire cost/static type checks and loads of other stuff you give up.

The most impressive part about Google is how its emphasis on internal standards has allowed it to build some really impressive stuff.

E.g., you can do a SQL join on any dataset, in any datacenter. You can turn any query into a hosted visualization.

Every test invocation is streamed to a central server and results can be shared with a url.

There’s more, but those are my two favorites.

Is the join a Spanner query, or is there a system on top of a Spanner that federates/aggregates databases?

Most ad hoc analysis doesn't even need Spanner. With Dremel, you can simply define a table on a bunch of sharded files and run a SQL query on them.

yeah the TAP/Forge testing infra is pretty gnarly

Wait until you get a glimpse into the exciting world of real-time ad bidding. It's every engineer's dream!

Adtech is interesting because of the scale, complexity, and timing required compared to many other software projects. It gets a bad look but the engineering involved is not boring.

Borg is very different from k8s!

There are open source projects you can work on. You already mentioned one: Kubernetes. But there are also others.

The worst part about any job is politics, the extremely competitive nature of Googlers makes it a less than fun place to work.

The competitive nature of Googlers is what makes Google a very fun place to work.

Run in the rat race for a decade or two and report back.

The rat race typically describes the cycle of working to live and living to work without much else going on.

Competition at work is something different, and you're always in competition with other companies and people to survive anyway.

To each their own I guess.

Colossus is their data center scale filesystem. They don't talk about it.

Colossus is actually the only project I can think of for which they had one of the leaders sit down with Kirk McKusick and have a chat for ACM Queue, instead of a paper. https://queue.acm.org/detail.cfm?id=1594206

And they reveal exactly zero details. I know a bit about it, but not enough to say exactly what semantics it offers to file system clients. I believe it is not POSIX-like, hence the need to layer Spanner and GCS over it.

There's been a teensy bit more detail than that, e.g. [1]. If you think about exactly the file semantics that Bigtable would require (append, pread), that's exactly what is provided. Note that Colossus and D are two separate things. Google systems can use D without Colossus, and a long time ago people used Colossus without D, although today Colossus/D is implied. The presentation gives the broad strokes of how Colossus is able to bootstrap itself from Chubby. It helps if you've also read the Bigtable paper [2].

1: http://www.pdsw.org/pdsw-discs17/slides/PDSW-DISCS-Google-Ke...

2: https://static.googleusercontent.com/media/research.google.c...
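For anyone unfamiliar with what "append, pread" means as an interface, here is a rough sketch using ordinary POSIX calls on a local file. This only illustrates the two operations named above; it says nothing about how Colossus implements them, and the helper names `append_record` and `read_at` are made up. `os.pread` is available on Unix-like systems only.

```python
import os
import tempfile


def append_record(path, payload: bytes) -> int:
    """Append payload to the file and return the offset where it starts.

    Assumes a single writer, so the end-of-file offset cannot move
    between the lseek and the write.
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        offset = os.lseek(fd, 0, os.SEEK_END)
        os.write(fd, payload)
        return offset
    finally:
        os.close(fd)


def read_at(path, offset: int, length: int) -> bytes:
    """Positional read: fetch `length` bytes at `offset` without seeking."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, length, offset)
    finally:
        os.close(fd)


def demo():
    """Write two records to a scratch file and read the second one back."""
    path = os.path.join(tempfile.mkdtemp(), "log")
    first = append_record(path, b"hello")
    second = append_record(path, b"world!")
    return first, second, read_at(path, second, 6)
```

A writer that only ever appends plus readers that fetch byte ranges by offset is enough for log-structured storage such as the SSTables and commit logs described in the Bigtable paper.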

??? The original GFS paper, which the chat references repeatedly, was clear about the semantics not being POSIX-like. The interview mentions that, too, along with stuff like snapshots. Colossus is basically the same, with increased scalability.

On the other hand Google engineers are vendor-locked-in using Google specific tech one way or another.

Yeah the whole time I was there every time I had to use bigtable or Spanner I always muttered to myself “I wish I could be using some free software garbage instead of this proprietary Google stuff right now.” Every Googler secretly yearns for the performance, reliability, and elegance of MySQL.

Uncool, given that Google only exists thanks to free software.

Google started as a C, Java and Python shop. Android was born out of the Linux kernel.

If I remember well, long ago, the initial database they used was actually an in-shop fork of MySQL.

Easy to say bad things about free software now that you have billions and hundreds of geniuses to work on your own stuff.

Actually, this kind of comment reminds me of the "linux is cancer" era of Microsoft. Funny how Google is becoming the new MS now that they've got all the markets, while MS pretends to be nice now that they are not the top dog anymore.

I agree with you both! I don't think Google (or probably even the parent) meant to disrespect or discredit the value and contributions of open source. But internal, non-open platforms aren't really "vendor lock-in" or just NIH. They're often of much higher quality (if you have the resources) simply by solving the exact problems you have directly.

Disclaimer: I've been bitter about MySQL-for-everything-ism lately too but think Java is pure heaven.

To be fair, quite a few engineers at Google did yearn for MySQL and a single instance at that. Not for any of the traits you list°, but because it would have let them no longer worry about HA, replication, request hedging, key hotspots, etc. It would have also meant not having a product that works when there are more than a handful of users, but that's another story. BT was a lot of work to write for and that's why Spanner evolved the way it did.

°I sense some irony

You should add a /s to make this more obvious.

Um, not really. Care to elaborate? If something is available open-sourced we're typically free to use it, as long as we are abiding by the license conditions.

Not strictly true. Most software would probably require at least some modifications to run internally but as far as I know there’s no policy preventing open source software in production, quite the opposite.

For example, here’s some information about memcache: https://www.quora.com/Does-Google-use-memcached-or-does-it-u...

There’s more, but if I can’t find a reference to them on Google Search I’ll assume it's not my place to discuss it publicly.

Using protobufs as a base layer may seem like lock-in, but it very much is the opposite. Protobufs are surprisingly simple and maybe even elegant once you get past the ugly parts, but most importantly it decouples software from arbitrary protocols and makes it much easier to deal with changing implementations. (Not to mention the potential for rich backwards and forwards compatibility.)
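A toy illustration of why numbered fields give you that compatibility. To be clear, this is NOT the protobuf wire format or API, just a made-up one-byte tag/length encoding (`encode` and `decode` are hypothetical helpers) that shows the idea: readers skip field numbers they don't know and apply defaults for fields that are absent.

```python
def encode(fields: dict[int, bytes]) -> bytes:
    """Serialize {field_number: value} as repeated (number, length, value)."""
    out = bytearray()
    for number, value in sorted(fields.items()):
        if not (0 <= number < 256 and len(value) < 256):
            raise ValueError("toy format: number and length must fit in one byte")
        out += bytes([number, len(value)]) + value
    return bytes(out)


def decode(data: bytes, known: set[int]):
    """Split a message into fields this reader knows and fields it doesn't.

    Unknown fields are kept rather than rejected, so an old reader can
    pass through data written by a newer writer.
    """
    known_fields, unknown_fields = {}, {}
    i = 0
    while i < len(data):
        number, length = data[i], data[i + 1]
        value = data[i + 2 : i + 2 + length]
        target = known_fields if number in known else unknown_fields
        target[number] = value
        i += 2 + length
    return known_fields, unknown_fields
```

Forward compatibility: an old reader that knows only field 1 can still decode `encode({1: b"alice", 2: b"admin"})` and simply sets field 2 aside. Backward compatibility: a new reader that knows fields 1 and 2 decodes an old message containing only field 1 and falls back to a default for field 2.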

Why? The build-or-buy trade-off is very different at Google scale and this is one of the few organizations that can build everything in-house for their specific needs.
