Google Infrastructure Security Design Overview (cloud.google.com)
317 points by emilburzo on Jan 13, 2017 | 52 comments

Many of these solutions are unavailable below a certain scale, and there is currently little commercial utility or pressure in offering these features in a wholly-owned-and-operated fashion to small businesses or individuals. The new deal (e.g. DDoS resistance) is to rent an implementation or go without. Basically, the gap between everyone else and the Googles of the world is large and growing.

On the other hand, I wonder how useful some of them are. Boot-level security sounds fantastic, but given the cost of engineering it and the rate at which they probably cycle hardware, with decent service-level signatures it is probably largely wasted money (e.g. unexpected behavior like comms from service X to service Y is default-denied at multiple levels, logged, and triggers a hard shutdown/reset of the system). While performance is cited as a concern, you'd save a lot of money by removing the design/deployment/maintenance of all that complexity and could afford a little extra (more standard) hardware.
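
For anyone who wants the "default-denied, logged" idea in concrete form, here is a minimal sketch in Python. The service names and the `ALLOWED_CALLS` table are hypothetical, and this is not how Google's infrastructure does it; it only illustrates the default-deny pattern described above.

```python
import logging

logger = logging.getLogger("service_acl")

# Hypothetical allowlist of (caller, callee) pairs. Anything not listed is denied.
ALLOWED_CALLS = {
    ("frontend", "auth"),
    ("frontend", "storage"),
}


def is_call_allowed(caller: str, callee: str) -> bool:
    """Default-deny check: permit only explicitly listed service pairs."""
    allowed = (caller, callee) in ALLOWED_CALLS
    if not allowed:
        # Unexpected traffic is logged so it can feed an alert or a reset.
        logger.warning("denied call %s -> %s", caller, callee)
    return allowed
```

A real deployment would enforce this at several layers (network, RPC, host) rather than in one function, which is the "multiple levels" part of the comment.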

I find that thoroughness to be what's most impressive about this stack, considering every layer and securing it both independently and in relation to the others: it's as close to a textbook example as I can imagine. After all, what's the point of securing up the stack if you can't trust the bottom? Here's hoping AWS and Microsoft get there too.

Edit: Just trawling through, seems like quite a few of the tools are on github.com/google

> Many of these solutions are unavailable below a certain scale

This was true before Google Cloud. With Google Cloud, you can enjoy these benefits whether you are an individual developer with a sub-$100 monthly budget, a mom-and-pop shop with a $1,000 budget, or an SMB or startup with $10K to $100K to spend on your infrastructure.

That's exactly what he's saying. Small startups, who don't want to use Google, would have a lot of difficulty implementing some of these designs on their own.

Without boot level security, it's easy for the NSAs of the world to slip in a hard drive or two with extra "surprise" software on it, later engaging in active/passive surveillance or credential theft. Always assume that any one single employee could be compromised.

The NSA's of the world don't need to hack Google's infrastructure. They can just ask.

This is protection from rogue employees acting independently, assuming it's not just marketing and ego-stroking for the engineers.

Why don't you google "NSA google smiley face".

Yes, and that happened before many of the security measures described in this doc were in place. It's one of the reasons behind Google's current and ongoing investments in security. Knowing that yes, the NSA is going after you is a wake-up call.

In particular, the doc says all data on the WAN (between data centers) is now encrypted.

I don't get it....search results returned only your comment.

First result for that search: http://www.slate.com/blogs/future_tense/2013/10/30/nsa_smile...

OP's comment:

> The NSA's of the world don't need to hack Google's infrastructure. They can just ask.

NSA doesn't just ask; they found ways to MITM Google.

First of all, NSA hardware attacks of this ilk are supposed to occur through mail. Operations the scale of Google can acquire hardware in a secure/monitored fashion that bypasses public shipping facilities, which would largely frustrate this type of attack. Also, I would hazard a guess that Google building their own hardware makes attacks on their boards far more difficult than for the rest of us. As for disks, they would be acquired in serial-numbered batches from known suppliers and could be quickly tested to match known performance and sensor (e.g. heat) metrics at the time of ingress. This is not very difficult, and assists in protection against tampering. In addition, the use of commercial-grade disk hardware acquired in large batches means that the ultimate internal destination of a given disk in the organization is very difficult to ascertain, therefore the workload would be unidentifiable. Careful internal distribution processes would add stronger protections. Regardless of a compromised disk, proper architecture in a large-scale system mitigates the impact and data exfiltration capacity of individual compromised machines. Removed hardware would always be destroyed.

With the NSA's budget, I don't see why they would limit themselves to mail-only attacks. They could compromise any level in the supply chain, especially for targets which are worth the effort. They, or, more likely, the Brits tapped Google's DC-to-DC fiber and reverse engineered all sorts of internal protocols, as seen in Snowden's leaks.

Yes - or they rigged _all_ commercially available HSMs in use for encrypting a DC-to-DC fiber.

I think it was Niels Provos who said on stage that Google does not trust link encryption, but rather prefers end-to-end, even though that's a much greater problem in terms of key management.

> Boot-level security

You can get pretty far with commodity hardware. Even Secure Boot with custom keys prevents most threats.

IMHO the biggest problem with commodity hardware is IPMI BMCs, a problem so insidious and widespread as to limit the utility of implementing trusted boot. (I designed datacenters for a major bitcoin exchange.) I would hazard a guess that Google's custom hardware has a more intelligent/limited/secure (and crypto-validated firmware based) IPMI implementation, and this contributes far more to security versus commodity hardware than cryptographically secured main processor / system boot.

I agree. Is there any serious effort at making an open source BMC firmware?

At least Intel AMT improves the situation a bit.

> We have started rolling out automatic encryption for the WAN traversal hop of customer VM to VM traffic. ... all control plane WAN traffic within the infrastructure is already encrypted. In the future we plan to ... also encrypt inter-VM LAN traffic within the data center.

It would be nice if this was more explicit. For example, is traffic that is TLS-terminated at their LB reencrypted all the way to the back end VM? At what point is it decrypted again? Are those keys unique to us or are they used for whatever traffic happens to traverse the same network paths? (I assume shared but with software-defined networking maybe it's practical for them to be unique.) What does the "control plane" encompass?

In any case, I'm curious what people think about trusting the service provider for inter-service and inter-VM encryption. Do you use the LB's TLS termination? Do you still enable encryption for your DB connections even if it is (or will soon be) redundant with their network encryption?

Anyone with access to the hypervisor at the service provider will have access to plaintext. TLS protects you from service provider network compromise within whatever scopes that covers. If you're in the cloud, you do have to have some basic trust in your service provider as compute is always in plaintext (barring homomorphic encryption).

> Anyone with access to the hypervisor at the service provider will have access to plaintext.

This is mostly true with today's state of the industry, but with upcoming technologies like Intel SGX[1], the hypervisor will not be able to access the plaintext anymore.

[1] - https://software.intel.com/en-us/blogs/2013/09/26/protecting...

It's not really an issue of trust but rather defense in depth. You want to protect against rogue employees who can tap into the network, for example.
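
As a rough illustration of the defense-in-depth point: keeping application-level TLS on for DB or inter-service connections, even when the provider encrypts the network underneath, can be as small as building a verifying client context. This sketch uses Python's standard `ssl` module; the function name is made up, and choosing TLS 1.2 as the floor is an assumption rather than anything from the Google doc.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a TLS client context that verifies the server certificate and hostname."""
    # create_default_context() enables CERT_REQUIRED and hostname checking.
    ctx = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2 (an assumed policy choice).
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

You would then wrap the connection with something like `ctx.wrap_socket(sock, server_hostname="db.internal.example")`, where the hostname is a placeholder. As noted above, none of this helps against someone with hypervisor access, since the plaintext still exists in memory on both ends.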

In the CIO summary they mention every service uses KeyCZAR.

First line on KeyCZAR repo:

"Important note: KeyCzar has some known security issues which may influence your decision to use it."


I work at Google. The final bullet in the CIO Summary on Keyczar was a typographical error, taken from our paper on encryption at rest (https://goo.gl/hSordh). It's since been removed from this Security Design Overview. The encryption at rest paper goes into additional detail and includes the important clarification that while a very old version of Keyczar was open-sourced, the open-sourced version has not been updated to reflect internal developments.

Thanks for the reply and follow-on information. Wondering why those internal changes didn't get rolled into the public release, especially if they were security-focused updates? Lack of adoption of the library maybe?

They could be design decisions that are tailored for Google's use or issues for which Google has other compensating controls.

The first listed issue is "Use of SHA 1 and 1024 bit DSA", which they admit are "considered weak by current security standards".

Not sure why the OP has been downvoted. Definitely something interesting to note.

Like "Use of SHA 1 and 1024 bit DSA". Ouch.

Keyczar is no longer being maintained and should probably be deprecated.

Either that or someone can take the reins and update it to use modern algorithms.

This is great to see. For those who don't know, this is an "assurance case" (definition: "a body of evidence organized into an argument demonstrating that some claim about a system holds, i.e., is assured") - https://www.us-cert.gov/bsi/articles/knowledge/assurance-cas...

I'm glad to see more assurance cases. You can't just do one thing and have a secure system. And if you want people to trust you, you need to give them a reason to trust.

The CII best practices badge ( https://bestpractices.coreinfrastructure.org ) also has an assurance case; details at https://github.com/linuxfoundation/cii-best-practices-badge/... . If you want to help us make that better, let us know!

> ... and laser-based intrusion detection systems

Huh? I thought that was exclusive to movies like Entrapment and Mission Impossible.

It's a fancy term for motion detectors. If my neighbour can afford one for his yard, it's not that crazy to put some in a datacenter :)

Edit: I obviously wasn't implying they're using the same ones. Come on, now >.>

Motion detectors are usually passive infra-red (PIR) sensors - no lasers involved. Unless you can cite a consumer-grade laser-based motion detector, I think this means Google's data centers are protected by slightly higher level gear.


Homesafe Safety Beam Laser Motion Detector Sensor & Alert

Only $39.99!

> This high tech device creates an invisible infrared beam up to 60 feet long and sounds a loud alarm, pleasant chime, or multiple chimes when the beam is crossed.

From my understanding (after doing a few weeks of research on this for my own home security) the current "top of the line" tech is the Tomographic motion detectors which build up a mesh.

It has been commercialized by a security company named Xandem, some info on it:



I'll be purchasing a Xandem system soon

Nope, real-world solutions to monitor air ducts and other spaces that need to be open but which shouldn't have people in them.

I wonder what their data deletion policies really are for something like Photos. I deleted all my old photos weeks ago but when I pull down the archive of my Google data, they're still there. With such a policy, I could see that data sitting around for years while Google claims that it's in the process of deletion, something that is not actual deletion. Then again, I doubt they actually ever delete anything.

In which case, you likely didn't delete them correctly.

If you delete them from your device, it doesn't delete the cloud copy.

If you delete from an album, it removes the image from the album, but not from your account.

Google's privacy policy sets time limits for deleting user data, and I can assure you they are very strict about that. (Lots of data is deleted within hours, but the limit spans multiple days to ensure all backups of it are gone too.)

See http://blog.tech-and-law.com/2010/11/google-data-retention-p...

This is correct. A deletion should be effective/visible immediately, but it can take some time before all backups are guaranteed to be gone.

No, it isn't. Google also writes your data to tapes and to offline long-term storage.


They could be deleting encryption keys to the tapes? All speculation.
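
To make the "delete the keys instead of the tapes" idea concrete (sometimes called crypto-shredding), here is a toy sketch. It is speculation in code form, just like the comment above: nothing here is confirmed as Google's approach, the `KeyStore` class is hypothetical, and a one-time pad stands in for a real cipher such as AES-GCM purely so the example runs on the standard library. Do not use it for real data.

```python
import secrets


class KeyStore:
    """Hypothetical in-memory key store holding one random pad per object."""

    def __init__(self) -> None:
        self._keys = {}  # maps object_id -> key bytes

    def encrypt(self, object_id: str, plaintext: bytes) -> bytes:
        # One-time pad: a fresh random key as long as the data (illustration only).
        key = secrets.token_bytes(len(plaintext))
        self._keys[object_id] = key
        return bytes(p ^ k for p, k in zip(plaintext, key))

    def decrypt(self, object_id: str, ciphertext: bytes) -> bytes:
        key = self._keys[object_id]  # raises KeyError once the key is shredded
        return bytes(c ^ k for c, k in zip(ciphertext, key))

    def shred(self, object_id: str) -> None:
        # Dropping the key makes every backup copy of the ciphertext unreadable.
        self._keys.pop(object_id, None)
```

The point is that the ciphertext can sit on tape indefinitely; once `shred()` removes the only copy of the key, the backup is effectively deleted without touching the tape. That only holds if the keys themselves are never backed up somewhere longer-lived.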

There was a talk about the backup infrastructure. The speaker talked about the issue of keys, but didn't provide specific details:


There are whole teams and pipelines dedicated to making sure data is deleted on all media, tapes included. The long tail can be affected by things such as a machine holding a bunch of GFS chunks from your files that went to the hardware repair queue in the meantime. Those chunks might not even be that useful without the others stored on other machines, but in the general case you can't make guarantees that e.g. they don't hold information that a skilled person could use to identify you.

Also, I'm not sure if Photos has this, but you also need to "empty the trash" in your Drive account after you delete something before it will actually be deleted.

IIRC it gets auto-deleted after 30 days or something.

do you see the actual photos or just the folder thumbnails?

so they've reinvented kerberos, presumably in a way that works. interesting.

(and there are many other things)

Why do you think that Kerberos doesn't work?

I have a question about Step 5 in the post. It states:

"Step 5: Add '1' to the end"

Is this a delimiter for the beginning of the padding, or does it serve some other purpose?

Did you mean for this to be somewhere else?

Yeah I did, thanks. That's what I get for multitasking. Unfortunately it's too late to delete :(
