Confidential VMs (cloud.google.com)
181 points by caution 27 days ago | 168 comments

Does anyone know what mode of AES that SEV (or SME) uses?

I have been reading through all of AMD's documents, and I cannot find what mode of AES SEV (or SME) uses. I find it extremely odd that this is not called out in any of AMD's documents, and frankly a bit worrisome.

For the record, "A Comparison Study of Intel SGX and AMD Memory Encryption" [1] claims SEV uses a modified version of AES-ECB, BUT their reference links to AMD's whitepaper [2], which does NOT say anything about the mode, so I do not consider [1] to be a trustworthy resource.

[1] https://caslab.csl.yale.edu/workshops/hasp2018/HASP18_a9-mof...

[2] https://developer.amd.com/wordpress/media/2013/12/AMD_Memory...

AMD SEV's mode has changed over time (as AMD moved from SEV, to SEV-ES, to SEV-SNP). Each time, some academic has managed to reverse engineer the mode. This paper (published 2020) has documented the developments and reverse engineered the mode in SEV-ES. https://arxiv.org/pdf/2004.11071.pdf

Another paper that mentions AES in the ECB mode is "Secure Encrypted Virtualization is Unsecure!" [0]

[0] https://arxiv.org/pdf/1712.05090.pdf

Simple picture to demonstrate: [1]. Simply put, ECB does not use a different salt for each block, so identical plaintext blocks encrypt to identical ciphertext blocks.

[1] https://ctf-wiki.github.io/ctf-wiki/crypto/blockcipher/mode/...
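If it helps to see the leak concretely, here's a minimal sketch. It does NOT use real AES (the block cipher below is a keyed-hash stand-in invented purely for illustration), but the structural problem it shows is exactly ECB's: equal plaintext blocks give equal ciphertext blocks.

```python
import hashlib

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # Stand-in for a 16-byte block cipher (NOT real AES, not invertible):
    # a keyed hash truncated to the block size. It is enough to show
    # ECB's structural leak, because it is deterministic per (key, block).
    return hashlib.sha256(key + block).digest()[:16]

def ecb_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # ECB: each 16-byte block is encrypted independently, with nothing
    # (no IV, counter, or tweak) to distinguish one position from another.
    assert len(plaintext) % 16 == 0
    blocks = [plaintext[i:i + 16] for i in range(0, len(plaintext), 16)]
    return b"".join(toy_block_encrypt(key, b) for b in blocks)

key = b"sixteen byte key"
# Two identical 16-byte plaintext blocks...
ct = ecb_encrypt(key, b"A" * 16 + b"A" * 16)
# ...produce two identical ciphertext blocks: an attacker who can read
# encrypted memory learns which regions hold equal contents.
print(ct[:16] == ct[16:32])  # True
```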

They document a "physical address based tweak". My understanding is that this is xoring a function of the physical address into each 128-bit block before it goes into AES-ECB (and that function is the same for every VM on your system).

The XTS/XEX modes would be a proper way to implement this. They use the same concept of xoring an address based tweak into the message.

Two subtle but important features of these modes are:

1. the tweak is xored both before and after the permutation

2. the tweak is derived from a secret and the address in a secure way

(I don't know if AMD chose such a secure mode, or used some insecure homebrew tweaking scheme)
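A rough sketch of the XEX data flow described above, with a keyed hash standing in for the AES permutation. Everything here (the tweak construction, names, constants) is illustrative, not a description of AMD's actual scheme:

```python
import hashlib

BLOCK = 16

def prp(key: bytes, block: bytes) -> bytes:
    # Toy stand-in for the AES block permutation. (Not invertible, which
    # is fine for illustrating the data flow, but disqualifies it as a
    # real cipher.)
    return hashlib.sha256(b"prp" + key + block).digest()[:BLOCK]

def tweak(secret: bytes, phys_addr: int) -> bytes:
    # Feature 2: the tweak is derived from a per-boot secret AND the
    # address, so an attacker can't compute it from the address alone.
    material = b"twk" + secret + phys_addr.to_bytes(8, "little")
    return hashlib.sha256(material).digest()[:BLOCK]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def xex_encrypt_block(key: bytes, secret: bytes, phys_addr: int, pt: bytes) -> bytes:
    t = tweak(secret, phys_addr)
    # Feature 1: the tweak is xored both before AND after the permutation.
    return xor(prp(key, xor(pt, t)), t)

key, secret = b"k" * 16, b"s" * 16
# The same plaintext at two different physical addresses now encrypts to
# two different ciphertexts, unlike plain ECB.
c0 = xex_encrypt_block(key, secret, 0x1000, b"A" * 16)
c1 = xex_encrypt_block(key, secret, 0x2000, b"A" * 16)
print(c0 != c1)  # True
```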

I only saw that in "A Comparison Study of Intel SGX and AMD Memory Encryption", which per my grandparent comment, is not a trustworthy source (they reference the whitepaper, and the whitepaper doesn't say anything about it).

Is that documented in an actual AMD source, or even a talk?

As far as I know, the whitepaper is all you will find publicly. It does mention the tweak (on page 4), but if you want the details and don't trust the third-party papers, you will have to reverse-engineer that yourself =]

Fair enough. That's what I assumed, but the HN crowd is pretty resourceful too.

SEV is targeted at VM instances, with a single key per VM. SME applies to the entire server, similar to total memory encryption, with a single key for the whole host kernel/machine. SME is not very applicable to the cloud; it is more suitable for single-server environments.

That doesn't tell me what mode of AES it is, that's key management.

AES-128

Respectfully, I know that. That doesn't say whether it's ECB, CTR, XTS, etc. What mode AMD uses has huge implications for what it is and isn't good for.

The answer to your question: The AES mode AMD SEV uses is similar to XEX. AMD computes a tweak value based on the physical memory address and performs an xor-encrypt-xor operation. The tweak algorithm used on Rome (2nd gen EPYC) is based on GF math and uses a random value that is changed on each boot.
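For what "GF math" plausibly means here: XTS-style tweak schedules multiply the tweak by x in GF(2^128) as they advance through memory. Here's a sketch of that doubling step; this is the standard XTS construction, offered as a guess at the flavor of math involved, not a confirmed description of AMD's hardware:

```python
def xts_mul_x(t: bytes) -> bytes:
    # One step of an XTS-style tweak schedule: multiply the 16-byte tweak
    # by x in GF(2^128), little-endian convention, with the reduction
    # constant 0x87 (from the polynomial x^128 + x^7 + x^2 + x + 1).
    n = int.from_bytes(t, "little")
    carry = n >> 127
    n = (n << 1) & ((1 << 128) - 1)
    if carry:
        n ^= 0x87
    return n.to_bytes(16, "little")

# Successive tweaks for consecutive blocks are all distinct, which is
# what prevents the ECB-style equal-block leakage.
t = bytes([1] + [0] * 15)  # initial tweak (in XTS, an encrypted address)
tweaks = [t]
for _ in range(3):
    tweaks.append(xts_mul_x(tweaks[-1]))
print(tweaks[1][0])  # 2 (the tweak doubled)
```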

What is the attack vector that this solution prevents?

Am I missing something obvious ?

Will it prevent Google from being able to have root access to the VM?

From my understanding, it does not seem to protect from Google. If they are still able to have root access to the VM, it does not matter whether the memory is encrypted or not.

The only thing that I see is the case of a Spectre/Meltdown-style vulnerability where the isolation of the RAM fails...

GCP does not have root access to customers' VMs. Confidential VMs with AMD SEV create an additional cryptographic isolation layer (on top of the virtualization one) between tenants and Google infra, mitigate 0-day guest escapes, make observability attacks harder, protect against some classes of DMA attacks, and mitigate physical memory access attacks. On top of that, not all Spectre variants are applicable to AMD SEV; e.g. L1TF (Foreshadow) is not.

> GCP does not have root access to the customers VMs

I mean, they have physical access to the hypervisor host machines, where they could do anything they like to them, e.g. tap the JTAG pins of the CPU.

Insofar as you assume that the attack here is “the NSA compels Google to gather evidence against you”, the lack of just being able to log into the VM doesn’t really change much.

This protects against remote rogue employees and intruders.

As for physical attacks, Google is ultra paranoid about physical access to DCs, and I think we can quickly agree that rogue employees and outsiders would have little chance of successful attack given the outrageous (and secret) methods that Google employs. Remember, this is one of the most-attacked organizations in the world, they've had decades (plural) to enact defenses and test them, and a successful attack would cost them over $10 billion - there's a virtually unlimited budget for physical defense. Circa 2020, I'd put Google's physical intrusion defenses up against most military installations.

Is there something special that happens when a legit remote cloud user logs in that doesn't happen when a rogue remote Google admin logs in? Something that prevents a modified web client from stealing the creds during a user's web console session?

SEV protects against physical attacks. A rogue admin logging in through a management console would have to go through whatever controls and auditing the console's access control functionality provides.

> decades (plural)

Namely, two at most.

Is there a point of diminishing returns on experience for security? Two might be enough.

> Insofar as you assume that the attack here is “the NSA compels Google to gather evidence against you”, the lack of just being able to log into the VM doesn’t really change much.

For most people (and especially businesses) this is a totally unrealistic security aim. If what you're worried about is the government using its legal ability to compel people and businesses to provide evidence, then there's ~no product or service from that country that you could realistically use.

I mean, yes, there's no service where the security/privacy is contractual, that is safe from the state overriding the contract.

But the promise of things like homomorphic encryption is that you can do a computation on a truly untrusted substrate, i.e. you can trust computations performed by an untrusted adversary, and also know that they didn't learn anything about those computations. It's a technical solution to security/privacy, not a contractual one.

The ideal that everyone's hoping for, is that there's a way to get that same kind of technical guarantee from cloud compute providers, without needing a layer of maths that makes Monte Carlo quantum simulation look fast.

All commercial activities are at the core protected by contractual protections and good faith. Cloud is not as different as you may think.

Any other expectation of protection from the state is limited, based on probability, the seriousness of the matter, and your potential culpability. Your employees, service providers and others can be required to provide information without informing you. In extreme cases, agents will pose as utility, security or building management.

> Your employees, service providers and others can be required to provide information without informing you. In extreme cases, agents will pose as utility, security or building management.

...and homomorphic encryption would stop all of those attacks. Presuming it's a homomorphically-encrypted substrate for an autonomous agent, making its own "evaluations" of the data it can perceive from the outside world (a la a smart contract with access to an oracle) rather than simply trusting data from the insecure domain that happens to be signed with the right key.

This is also, y'know, the security architecture that allows nuclear submarines to avoid being subverted by an enemy nation that has temporarily gained control of the White House. The sub's commander needs to know not only that they've received the order, but also that the world really looks like one where such an order would be legitimately given. The isolated secure agent, speaking to an insecure principal, needs not only proof of their credentials, but also needs to independently verify their claims about the state of the world. (And, if they can do that, the system is often architected such that the principal won't even communicate in the moment, but instead has just left flowchart-like orders in advance, involving various dead-man's-switch timers and so forth.)

> ...and homomorphic encryption would stop all of those attacks.

It may, but how many business processes are as well thought out as nuclear missile submarines?

I think that's the reality, that most/all consumer and enterprise services can provide only a "reasonable" and compromised level of security and privacy.

If a nation state and/or corporation that runs the infrastructure decides to exfiltrate your data, legally or not - the power dynamic is totally asymmetrical.

Individuals and smaller businesses can only do so much within their power to protect themselves, and at some point accept the compromise that "perfect" security and privacy are not possible.

(Although, I can't help but think that there may be technical solutions as yet unimagined, which could level the playing field.)

Right. And now that this is a realistic and regular worry for normal businesses doing normal things the value proposition of the cloud is looking worse.

Are you suggesting that US companies should fear the US government may... steal their IP? I'm not being facetious, I just don't understand what the "realistic and regular worries" you cite are supposed to be.

You can log in to your instances from the admin console, according to this: https://cloud.google.com/compute/docs/instances/connecting-t... So Google has a way to log in to your instances...

That's probably because there's an agent running on the default image. (Yes, duh, but if you disable that then they don't have "root".)

Of course the question is basically moot, because unless they have some sort of third party append-only log of actions they perform on the hypervisors, how would anyone ever know? Yes, AMD this and Intel that, but since Google is building their own computers, and since it's close to impossible to verify that you are in fact running in the secure enclave ... then you should assume you aren't.

Naturally, if you come up with a structure where you can incentivize Google to remain honest (eg. somehow make it evident to the world if they access or tamper with your stuff), then it becomes safe to delegate running VMs to them.

Again, of course, the same problem comes up if you try to do it in-house. How do you verify the security staff at your data center is honest? CCTV? Who watches the watchers?

You could imagine a scheme in which the company running the data centers ran hypervisor software made by intel on chips made by intel. You trust intel, and you trust their software and hardware does as advertised, but you don't need to trust the hosting company (Google).

The software would run your VM, and provide some kind of API which your VM could query to be sure it was running in a secure enclave, managed by Intel's signed software. The result of the API could be signed with Intel's key.

> You trust intel, and you trust their software and hardware does as advertised, but you don't need to trust the hosting company (Google).

I should note that this by itself is a fairly hilarious proposition.

That's pretty much the idea of SGX.

But SGX has been broken multiple times, and because people love breaking SGX (it's such an "all the security eggs in one basket" design), it will virtually certainly be broken again in the future.

Furthermore, SGX is widely disliked, inasmuch as people consider it to have security and boundary implications equivalent to the Management Engine's.

They could make the "create confidential VM" checkbox do absolutely nothing if they wanted!

This provides an additional layer of confidentiality against physical attacks, as well as cross-VM or VM-to-hypervisor attacks.

This gets you close to the security of a physically dedicated server, without the expense of actually buying/leasing a full server.

The host (hypervisor) can usually dump a guest's (VM's) memory, or tamper with it. This removes that attack vector, whether it's from a rogue Google sysadmin, or from another user who escalated to the hypervisor.

Google probably has root by default via an agent, but you can remove that. Google can probably run single user mode to change your root password, but you can change your bootloader/kernel to forbid that. Google can probably mount your disk images and just read them, but you can use full disk encryption to avoid that.

Google does not have an agent on confidential VMs; that would defeat the purpose. The confidential VM runs a shielded OS, which includes vTPM-based attestation to protect against bootkits/rootkits, so you know if someone has tampered with your boot chain up through the kernel.

Protection from Spectre/Meltdown or other memory isolation vulnerabilities is pretty significant.

Technology, technology, blah blah blah.

Tell me this: will Google indemnify you against all your losses proportional to the amount they are to blame?

i.e. if you lose $50 million because you relied on Google's "confidential VM" and an investigation shows it's 100% because Google didn't protect the VM, do you get a year's worth of fees back or $50MM?

You raise a very, very important (and debated) point.

(disclosure: I was at AWS 2008-2014, and VMware's cloud business 2014-2016; opinions my own, no disclosure of secrets, etc etc)

Most SLAs only indemnify you for the paltry infrastructure costs in relation to the downtime or accident.

Few SLAs are more extensive, covering for example data loss.

AFAIK, no SLA exists that extensively covers that type of loss. (Perhaps with the exception of heavily customized contracts, sometimes part of a military contractor deal, where the numbers are out of this world and there are huge margins for clauses and exceptions everywhere.)

They don't exist for two reasons, IMHO:

1) it's very complicated to define metrics related to the amount to be indemnified, and

2) it's hard to keep that amount current, based on how the data changes, or the architecture changes.

It would be interesting to see "insurance" for these kinds of things. I once thought of such a product, just after I left AWS, and for a little while I entertained the idea of launching a startup to go after that market.

I have long felt that almost all SLAs are completely toothless and thus certainly not worth paying extra for.

A few comments on this topic from 44 days ago: https://news.ycombinator.com/item?id=23362073

Now insurance, that sounds like a much more promising facet for protection. But still I wonder about the incentives: SLAs are offered by the original company and mean that they have skin in the game. (Well, they already have skin in the game as you’re a customer and if they mess things up you will look less favourably upon them as a provider, but the SLAs should in theory provide more direct financial incentive to avoid breaking things.) Insurance would be offered by a disinterested third party¹, and thus lose that SLA incentive.

¹ Insurance in general relies upon this for scale and safety; I can’t imagine trusting any provider offering first-party insurance: they’d likely go bankrupt the first time anything went wrong.

UL started that way: insurers grew tired of exploding steam boilers and asked their owners to have their boilers inspected and only insured owners who adhered to safety rules. I'd imagine insurance companies would hold the providers to some security standard.

The SLA you mentioned there was problematic because it didn't actually cover how much you were losing if the service went down. A SLA that would cover that would certainly not be useless, and likely not be toothless either, even if insurance was involved.

I wasn’t speaking about any one SLA; I was speaking about the contents of pretty much every SLA I’ve ever looked at. Seriously. Coincidentally, that very day, I happened to find what is I think the best SLA I’ve found, which went as far as refunding 5× what you paid them for the month after a certain amount of downtime, and almost added it to that comment as the first decent one I’d come across, but as I read further, the amount of downtime it took to get to any meaningful refund still seemed unreasonably long, so I ended up deciding it was still mostly toothless, though perhaps its gums were just a mite tougher than most.

Negotiated SLAs may be able to take into account customer business losses (I don’t know, I’ve never seen a negotiated one), but standard SLAs never include that.

Interesting. I wonder if a $1000/month insurance premium linked to a customer's subscription would be enough to provide 1000x cover? Essentially proving that the risk quotient is < 1/1000 subscriptions. Such an offering by a cloud provider might make a huge statement.

The huge problem with this is that you are giving an economic incentive to the customer to "go down" for hours or days.

Yeah, but unplugging your servers does not count as "my provider is down".

Hard to prove bad behavior, though.

Does business insurance cover this?

A normal business insurance policy covers losses to your business due to your own accidental actions, or an employee's willful illegal action, but generally does not cover a loss due to a contracted service not performing properly. So the cloud-service company might have it covered for their own losses, but not for their customers.

And a company the size of Amazon or Google probably doesn't actually buy business insurance; they are larger than many insurance companies and will find it cheaper to just accept most losses in their current regulatory regime.

I've been thinking about this in regard to AWS. The encryption at rest for most things is completely transparent, so the only thing you're really protected against is someone walking into a data center and grabbing your drive somehow. Or improperly disposed drives. Maybe some kind of hypervisor or SAN exploit, but I don't know much about that.

AWS seems to have turned part of the cloud operating model they are supposed to be responsible for back onto the user and no one questions it.

There are other use-cases. For example, you can crypto-shred an unlimited amount of data just by deleting the key associated with it.

You can also set up workflows such as your client owning the encryption key that encrypts data held by you and they can revoke it at any time. Slack has a similar system and I was asked by a large financial institution about the same. I expect to see this more in future.
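A toy sketch of why crypto-shredding works: envelope encryption wraps every per-object data key under a master key, so destroying the one master key orphans everything beneath it. (The XOR keystream below is an invented stand-in for a real cipher like AES-GCM; it exists only to make the key relationships concrete, never use it for actual crypto.)

```python
import os, hashlib

def keystream_xor(key: bytes, data: bytes) -> bytes:
    # Toy counter-mode stream cipher built from SHA-256. XORing is its
    # own inverse, so the same function encrypts and decrypts.
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        chunk = data[i:i + 32]
        out += bytes(a ^ b for a, b in zip(chunk, block))
    return bytes(out)

# Envelope encryption: each object gets its own data key, and every
# data key is wrapped (encrypted) under one master key.
master_key = os.urandom(32)
data_key = os.urandom(32)
ciphertext = keystream_xor(data_key, b"petabytes of customer data...")
wrapped_key = keystream_xor(master_key, data_key)

# Normal read path: unwrap the data key, then decrypt the object.
plaintext = keystream_xor(keystream_xor(master_key, wrapped_key), ciphertext)
print(plaintext)  # b'petabytes of customer data...'

# Crypto-shredding: destroy only the master key. The wrapped data key
# (and therefore all data under it) is now unrecoverable, no matter how
# many copies of the ciphertext survive on disks and backups.
del master_key
```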

> For example, you can crypto-shred an unlimited amount of data just by deleting the key associated with it.

Sounds like a great target for ransomware crews!

Yes indeed! But you mark the key for deletion and there's a minimum time of 7 days before it is deleted and during that time you can't use it. You've got quite a while to realise there's a problem and fix it.

How many critical domains and TLS certs expire with many months of notice? In a proper shop, someone will be alerted. Most places are not on the ball, and that alert is going to go entirely unnoticed.

You'll probably notice your disks not mounting and your VMs not starting up and your users not being able to access data, etc. Or not, depending on how frequently it's accessed.

We'd definitely notice.

This sounds reasonable.

AWS encryption at rest comes in two flavors: their managed keys protect against the threat you mentioned (which is still important for some compliance targets), but if you use customer-managed keys you can go further and protect against compromises in your accounts - server A only has access to encrypt, server B can only decrypt, role C in account D can encrypt data before transferring it, even the account root user can't update the policy to break that, etc. It's considerably more work but also more benefit.

Not sure what the benefits are in the scheme you described. Which threat exactly is mitigated by using this tech?

Simple example: say I use S3 with the Amazon-managed key. Anyone who has a policy granting access to a bucket — and lots of people love to write bucket policies with Resource:* — can read or write objects in that bucket using that key.

If you use a CMK, you can write custom policies which are both far more restrictive and shouldn't change as much as other resource types. That means that I could, say, have a policy which says the key-admin role/group/user is the only principal in the account that can update the key settings at all (so even Administrator can't), the writer role is the only one with kms:Encrypt, and the reader role is the only one with kms:Decrypt. No matter what S3 access you have, if you aren't one of those roles you won't be able to use the encrypted data. This would probably be a scenario something like “central group A provisions the KMS key, devops group B creates lots of other resources using that key”.

You can add conditions, too — “anyone in our account can encrypt, decrypt requests can only come from this IP address or VPC”, “only requests from this AWS service are accepted” (i.e. that compromised EC2 instance can't use it), “access to data encrypted with this key can only happen in the two regions we approve of”, “this particular encryption context must be used on all requests”, etc.
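As a sketch of what such a policy looks like, here is a hypothetical key policy document (the account ID, role names, and VPC endpoint ID are all made up; the statement shape follows the AWS key-policy format, but check the KMS docs before relying on any of it):

```python
# Hypothetical CMK key policy illustrating the role separation described
# above: key-admin manages the key, writer can only encrypt, reader can
# only decrypt (and only via one approved VPC endpoint).
key_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Only the key-admin role may change the key itself.
            "Sid": "KeyAdminOnly",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/key-admin"},
            "Action": ["kms:PutKeyPolicy", "kms:ScheduleKeyDeletion"],
            "Resource": "*",
        },
        {   # The writer role can only encrypt...
            "Sid": "WriterEncryptOnly",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/writer"},
            "Action": "kms:Encrypt",
            "Resource": "*",
        },
        {   # ...and the reader role can only decrypt, and only from an
            # approved VPC endpoint (made-up endpoint ID).
            "Sid": "ReaderDecryptOnly",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/reader"},
            "Action": "kms:Decrypt",
            "Resource": "*",
            "Condition": {
                "StringEquals": {"aws:SourceVpce": "vpce-0123456789abcdef0"}
            },
        },
    ],
}

# Sanity check: only one principal in this policy is granted kms:Decrypt.
decrypters = [s["Principal"]["AWS"] for s in key_policy["Statement"]
              if "kms:Decrypt" in s["Action"]]
print(decrypters)
```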



That adds an extra layer of defense: if I compromise a user, even one with some level of administrative access, who doesn't have CMK access, all I can get is errors or encrypted data rather than the raw data. If you're careful, you can architect environments where a person can deploy code without direct access to secrets, or where a system can stream data through to an encrypted store (if you are storing PII, this can be a huge difference between “all of our users” and “only the ones who used the system during this time period”). In some cases these can be bypassed (i.e. a CMK might not allow Administrator to access it directly, but they could possibly issue credentials for a user who does have access), but you're preventing generic attacks which just scrape up everything that compromised credentials have access to, and hopefully increasing both the level of access required and the likelihood of producing an audit alert.

Sorry, here's your cloud fees back.

Self hosting (remember that) seems best for really confidential things.

Self hosting is only best if you want maximum control. But most people can't actually handle that imagined level of control in reality, at least not on the same level as a major cloud platform. AWS (or GCP or Azure) infrastructure is overwhelmingly more likely to do security better, from a maintenance, intrusion-handling or physical-security perspective, than any single organization can, especially organizations smaller than AWS. The tradeoff is that despite how much AWS can claim not to touch your data, and all the feature, contract and compliance documentation they have to show for it, you can never be sure they're not touching it (deliberately or by accident).

Self hosting is not about confidentiality. For nearly all categories of "confidential data", I would much much rather have it in a major cloud platform than running in some closet somewhere or in some random colocation center somewhere, all other circumstances being equal.

Self hosting is about how much you want to be in control, regardless of your capabilities to actually be in control.

> Self hosting is only best if you want maximum control.

Not necessarily.

Beyond a certain scale you can build your own datacenter (or, smaller: rent a whole rack cabinet in a datacenter) and start exploiting economies of scale.

A lot of people don't realize that nowadays you can pack tens of cores and literally terabytes of ram in a 2u server.

That's what I don't quite understand about the current state of cloud computing. We're seeing huge advances in hardware/network technologies this decade but there's an ever increasing push to centralize hosting with cloud providers. Will this ever swing the other way?

Look at what's driving the shift: data centers are a major capital investment up-front plus a significant amount of staffing to operate and secure them. If you have enough proven need to justify that, you can easily beat a cloud provider — especially if you can simplify the problem in some ways that a generic service cannot.

For most organizations, however, it's hard to justify investing millions of dollars up-front in the hope that at some point you'll be saving enough to make that pay off. If that's not your core business it's often easier and safer to outsource it so, for example, you don't end up with a data center full of 50% utilized hardware which you bought to have capacity for growth which wasn't quite what you expected — or a big crunch when you have more demand than capacity and now need to double that investment to handle [currently] 10% of your usage.

> For most organizations, however, it's hard to justify investing millions of dollars up-front in the hope that at some point you'll be saving enough to make that pay off.

Well, if you have your bills and an estimate of how much building and operating a datacenter would cost, it could be very easy to do the calculation.

Btw, one should not dismiss the work of datacenter companies so easily. They often have very high security standards and practices.

And this means that you don't necessarily have to build a datacenter from the ground up. You can start saving by just renting one or two rack cabinets and start putting your own hardware in there.

Oh, I’m not being dismissive of their work - it’s just multiple lines of skilled work which you have to complete. The building, hardware, and software management all require 24x7 operations and security, work with vendors and capacity planning, etc. and overseeing all of that work.

At some level of usage those costs are lower than the savings but that line has been going up for years, especially for anyone who needs PCI, HIPAA, FEDRAMP, etc. where there’s a ready package available covering a lot of it.

Yeah, especially if your company has more than one location - for redundancy.

Dell PowerEdge r6525 - 1U server

- CPU: 128 cores, 256 threads (2 sockets)

- RAM: Up to 2TB RDIMM or 4TB LRDIMM (16 channels)

- Avg. power at 100% load: 750W

Standard rack size is 45U:

- CPU: 5760 cores, 11520 threads

- RAM: 180 TB

- Power: 33kW

You might need to sacrifice 1U or 2U for switches.
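The arithmetic behind those rack totals, taking the quoted per-server numbers at face value and assuming all 45U are filled with these 1U servers (no switches):

```python
# Per-rack totals for 45 x 1U servers at the quoted r6525-class specs.
units = 45
cores, threads, ram_tb, watts = 128, 256, 4, 750  # per server

rack_cores = units * cores        # 5760
rack_threads = units * threads    # 11520
rack_ram_tb = units * ram_tb      # 180 TB (with LRDIMMs)
rack_kw = units * watts / 1000    # 33.75 kW at full load
print(rack_cores, rack_threads, rack_ram_tb, rack_kw)
```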

The current generation is so "cloud-happy" they don't appreciate the cost of being cloud-based... (10x? 100x? More?)

Large companies have been announcing HUGE savings, and small companies would be able to save a LOT too... such a pity. All the cloud abstraction creates lazy teams IMO, and lazy companies... (again IMO, I know this won't be a popular view, because this audience is exactly the cloud-happy audience, but if you can manage some self-criticism, self-hosting / colo etc. is probably a better fit for 99% of cases)

It seems you're talking more about running costs, and not about any of the security aspects I was talking about?

I'll happily accept that you can pay less money for the same amount of power, but security isn't free. You don't only outsource a considerable amount of performance and reliability engineering to $MAJOR_CLOUD_PROVIDER, but also a lot of security engineering. Doing a lot less of that is cheaper for sure, but is that worth the cost? I'd argue that for most (not all: most), it isn't.

At ever-growing scales the equation will eventually tip in your favor, but you have to either be working at a very substantial scale for that, or you simply must not care about an important portion of the tasks that the major cloud provider picks up for you. That is fine, by the way, but you have to be sure that it is actually a conscious decision and that you're not simply forgetting to do that work, or doing it poorly.

I hope I don't sound combative, but "most people can't handle that imagined level of control in reality" doesn't seem very fact-based. All companies used to self-host, because there was no cloud. Many organisations still self-host: shared hosting providers, governmental institutions, banks. I have two counter-arguments against "big cloud can do better":

1) There is often an assumption that the people behind cloud services are smarter than anyone else and don't make mistakes. In reality, they are still humans. Big names attract some bright people, but not everyone is a genius with good teamwork skills.

2) Cloud companies have a much harder problem to solve. They have two hostile fronts: the outside world and their clients. They need to protect themselves from malicious clients and keep clients separated. They offer generic services for everyone, so there is functionality your use-case doesn't need. In your self-hosted setup you can disable/uninstall things (in the infra as a whole, not just inside your virtual machine), filter aggressively at the network perimeter, and so on.

Self-hosting doesn't only mean "running in some closet somewhere or in some random colocation center". I can't say how things are done in the US, but in my country the government has several DCs/server rooms for governmental agencies. There is on-premises hosting too, sometimes with very good physical security.

Bear in mind that while cloud providers can be more technically sophisticated with their security, they are inherently less secure from the get-go than a box on premise: Because by default, the cloud provider must be configurable across the Internet from anywhere in the world, and my self-hosted box, by default, can only be configured by a mouse, keyboard, and monitor physically plugged into it.

From that point, yes, you open up your self-hosting to the world in a (hopefully) limited fashion and restrict access to your cloud management (hopefully) to a much narrower scope. But by default, a box in your building starts completely secure, and your AWS box starts accessible to anyone on the planet with your AWS password.

While what you say is true, the underlying assumption is the general approach of "my network is secure, and keeping track of what goes in and out is something I control". I won't comment on how it applies to your situation, but I do think this assumption is outdated castle-and-moat security thinking. It also falls apart on closer inspection in 99% of situations, in my experience especially so in the cases where people bring it up.

Large companies with very decentralized infrastructure (who also profit off selling clouds in many cases) promote zero-trust infrastructure models. This is predominantly based on what works for them (having hundreds of offices or large amounts of remote staff), and of course, a must to sell people on if you want to sell cloud services.

Zero-trust is not without merit, by any means. It is good to not assume there are no cracks in your walls, and you should indeed use as much internal security as possible wherever you can.

But you know what's really quite silly? Deciding to fill your moat in with dirt and knock over your castle wall because you think it's possible for someone to get in anyways.

You had better believe I'm going to use the latest authentication and encryption tools between machines that I can to ensure nobody can listen in from a stray network connection... and that I'm also going to put all of it behind a firewall.

Yeah, lock your doors inside your castle, but for heaven's sake, the moat and the castle walls still help. Defense-in-depth is a concept I swear everyone forgot when clouds became a thing.

Ah yes, but now you're describing requiring you to set up engineering regiments within your castle, some portion of which you could outsource to your cloud provider in the alternative situation. My point is that this security boundary that you posit as an advantage in your original post is simply not very interesting (and often harmful) on its own.

Cloud providers aren't very intelligent security layers.

My favorite example is my Google Voice account. It has a different area code (out of state) than my real phone number. I get a lot of spam calls, almost all through Google Voice, and I know not to answer them, because nobody legitimate calls me from the area code my Voice number is from.

Google has state-of-the-art artificial intelligence and spam-filtering capabilities; they are arguably Google's two most sophisticated advantages. And they are completely ineffective at blocking spam calls. If Google Voice gave me the ability to create my own filter rules, I could write a one-line rule that would drop any call from that one area code, and I would have perfect spam filtering for my account.

This isn't an example about Google Voice, but about the difference between generalized technologies that cloud providers use versus configurations you can apply yourself that are custom tailored. Obviously, Google can't block everyone in that area code as a spam filtering method... many people legitimately have that area code. But for my phone, it would be a good rule and would be nearly 100% effective.

Which is to say, my engineering regiments will always be more capable than my cloud provider's engineering regiments, because mine know my system and my customers and my use cases. I'm paying engineering regiments either way, so I might as well pay my own.

> Cloud providers aren't very intelligent security layers.

I think I lost track of what you're trying to discuss now. I'm not arguing cloud providers are a security "layer" in any sense, just that they take responsibility for some things you otherwise need to do yourself. If you got that from my post I apologize. Even if I said something like this, I don't know how your Google Voice example (which is an application/service) applies to cloud infrastructure.

> Which is to say, my engineering regiments will always be more capable than my cloud provider's engineering regiments, because mine know my system and my customers and my use cases.

Good for you if true, but I've personally never seen an environment where such confidence on the part of infrastructure engineers has held up. At least not from a security perspective.

> I'm paying engineering regiments either way, so I might as well pay my own.

If it turns out the equation favors you, then great, those companies exist. But I don't think the equation favors many, at least not when including all the items you need to have for self hosting.

> I don't know how your Google Voice example (which is an application/service) applies to cloud infrastructure

I tried to explain the concept above: whether it's an application/service or a cloud platform, its tooling has to be designed for the entire customer base. Often, a far stupider solution can be far more effective if it only has to apply to one use case.

> such confidence

Don't get me wrong: Nobody's perfect and everyone has security holes. But things like all of the public S3 bucket fiascos should remind you that the cloud is, by default, open to everyone, and people become incredibly overconfident that Amazon or Google or Microsoft will keep them safe.

> If it turns out the equation favors you

It almost always does. When I do something in house, I am paying for hardware, software, and engineers. When I do something on the cloud, I am paying for hardware I don't own, software I don't own, engineers who work for someone else, and a healthy profit margin for one of the five most valuable companies on the planet.

Cloud is a narrowly-effective solution for startups which can't size out their solution themselves fast enough, and short-time peak loads. For everything else, you should probably not cloud.

Self hosting is no longer realistic because all the data you actually want to compute on is now in the cloud. The egress costs of that data into your self hosted environment would destroy you.

Imagine a world where the cloud produces more data at a rate that exceeds the pipes leaving the cloud. You are quite literally locked into computing within that same cloud.

It's very realistic! It's being done at this moment! (Amazing, isn't it?)

lmao... but jokes aside, the "cloud" is just a series of datacenters with less choice of brand, you do realize this? There is no such "cloud" as you describe... the Internet allows exactly for decentralized data (traffic among cloud-1a-b1-c2 and cloud-2b-5x-3h is not all "in the cloud"; it's the same as datacenterA to datacenterB... proximity still has an effect, as do all other network conditions...)

Don't let the marketing sandcastle trap you!

Except Google has a much much bigger budget for security engineers than you have...

In this threat model that's a downside, not an upside.

SEV is not, to me, a convincing security model. It was tried a long time ago, it doesn't work. SGX uses small enclaves for a reason. Hacking your average Linux box is quite easy already, which is why compromised passwords flow like water.

With the SEV threat model the cloud provider is no longer on your side. They are no longer defending you against threats, they are a threat. That's why you want encrypted RAM. But you're being threatened by one of the world's most advanced security organisations, a company that literally pays a large team to do nothing but locate zero day exploits all day, every day. And they swap zero days with other major tech firms too. So they have access to bugs in your OS before you do, and this is structural, there's no fix for it.

Worse, they recommend you use Google-controlled, 'hardened' OS images. But that makes no sense because you're trying to defend yourself against Google.

Finally, it's very unclear to me that the Linux kernel is going to accept bug reports of the form "the hardware behaved in arbitrary incorrect ways because it was redefined by a malicious hypervisor on the fly". The kernel isn't designed to deal with a malicious hypervisor. It's not going to check the results of operations that might be hypercalls to ensure the hypervisor didn't hand back invalid results. In the past this caused big problems with userspace apps trying to run on malicious kernels: the kernel was able to break into the encrypted memory space immediately, because it could manipulate syscall returns in ways the app didn't expect.

But let's put OS hacking to one side.

SEV has a very poor track record of security. There were dozens of bugs in its firmware in past revisions, ordinary C-style buffer overflows. Then there were basic crypto bugs: you could send the firmware an invalid point on the elliptic curve and it wasn't checking for that, things like this. Perhaps Google has audited AMD's firmware now and has knocked out all these problems. Perhaps not. Who knows — they mention working with AMD on performance but not security.

Moreover, SEV doesn't have anything to say on the topic of side-channel attacks. And unlike with SGX, because the whole point of SEV is that you use existing software and operating systems, there's also no fix for this problem. Normal kernels and apps aren't designed to resist a compromised hypervisor. SGX enclaves are purpose-built, so you can argue that they're the smallest piece of app logic that needs to process your data and then go to town on securing it, whilst the bulk of the app handling resource management, connections, scheduling, etc., is blinded by cryptography. Enclaves expect the host to attack them because they were written that way. They can do things in a less efficient but more side-channel-resistant way. With SEV this is theoretically an option (you could run some specially written hardened OS), but it's not how it's being advertised, so nobody will do it.

> But that makes no sense because you're trying to defend yourself against Google.

If you're using a cloud provider, you ultimately have to trust that provider is doing what they claim. After all, you have to use their management control plane to configure SEV, and that control plane could always be lying about whether SEV is actually working.

So SEV isn't intended as a defense against Google as an organization. What it can do, is provide a layer of defense against rogue hardware administrators, as well as other tenants that might be sharing the physical machine.

Actually SEV is meant to protect you against Google as an organisation (which would be the same thing as rogue administrators in this case). They don't mention it in the announcement but SEV is meant to be used with a little client side tool that does a remote attestation with the remote hardware. It handshakes with the firmware and you get back a hash as part of the VM boot process. You check that against an OS image you trust, and that's how you know what booted.

If you don't do this then it provides no protection. The host can break in by just telling you SEV is in use when it's really not.
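A sketch of what that client-side check amounts to. This is conceptual only; the function names and the "report" format are made up, not AMD's actual SEV attestation API, but the shape is the same: compare the attested launch measurement against the hash of an image you trust, and refuse to provision secrets on mismatch.

```python
import hashlib
import hmac

def expected_measurement(os_image: bytes) -> bytes:
    """Hash of the OS image we intend to boot (hypothetical measurement scheme)."""
    return hashlib.sha256(os_image).digest()

def verify_launch(report_measurement: bytes, os_image: bytes) -> bool:
    """Accept the VM only if the attested measurement matches the image we trust.

    Uses a constant-time comparison so the check itself isn't a side channel.
    """
    return hmac.compare_digest(report_measurement, expected_measurement(os_image))

# A host that never enabled SEV (or booted a tampered image) cannot produce a
# valid measurement for our image, so the check fails and we withhold secrets.
image = b"trusted-os-image-bytes"
good_report = hashlib.sha256(image).digest()
bad_report = hashlib.sha256(b"tampered-os-image").digest()
assert verify_launch(good_report, image)
assert not verify_launch(bad_report, image)
```

In the real protocol the report is signed by the AMD Secure Processor and chains to AMD's root keys, which is what stops the host from simply replaying a hash it computed itself.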

You are aware that SGX is also extremely broken, right?

Kinda when compared against perfection. Not when compared against SEV.

Consider that most of the side channel attacks being exploited against SGX also work across process and across VM domains. They tend to get advertised as SGX specific because that's the juiciest, newest and coolest target to hack. But they can break arbitrary CPU enforced protection domains.

Some of these side channel attacks are Intel specific. Many of them aren't: they're to do with how CPUs are designed, which is why Spectre et al affect AMD as well.

Whilst SGX gets a lot of focused attention from researchers exploring side channels, it has turned out to be pretty robust against the more ordinary kinds of attacks that felled a lot of prior systems, including multiple generations of SEV. Nobody has ever found basic cryptography or C programming bugs in the system enclaves, for example. I thought that would happen at least once - never did. All the bugs have been people reverse engineering CPU internals to a much greater degree than ever done before.

One reason they do this is that SGX is patchable in the field. A remote client can tell if the CPU microcode and SGX stack were updated to close vulnerabilities; Intel calls this TCB recovery. So it's kinda 'ethical' to research SGX bugs, because you aren't breaking anyone's equipment.

AMD SEV has sadly not had a working equivalent of TCB recovery in prior versions. There was an attempt at such a mechanism but it can't stop downgrade attacks, so doesn't really work. Researchers have managed to totally break SEV such that the CPU generation itself had to be discarded and replaced, not just once, but multiple times. That's the worst case scenario for hardware roots of trust. Hopefully the new gen chips won't suffer any similar fate.

Given the fact that SGX has always been renewable/patchable, that all bugs found in it were really hard-core low level CPU design bugs of the type that AMD have also had, and that it has a stronger security posture to begin with (less code in enclaves than a whole OS), I'd say overall it's doing well. Now SEV is playing in the major leagues I expect to see more research on AMD chips: it'll be interesting to see what they come up with.

Any contract worth its salt is going to limit its liability to the extent feasible.

Unless Google Cloud specifically offers an insurance policy against security breaches (spoiler: they won't), the most you can expect is to be refunded your hosting fees.

That being said, you can always buy your own insurance to protect against that kind of risk, and _the insurance provider_ may mandate using a technology like Confidential VM in order to qualify for insurance coverage.

Nobody does that -- it makes absolutely no business sense to a service provider. If you want your $50 million back, you buy insurance. :-)

The best you can get from vendors/service providers are SLAs, where damages are meant to be mildly punitive (amount x number of affected customers can add up and hurt the provider.)

The SLA normally covers only the cost of running the VM, or a small multiple thereof. You can negotiate a better SLA, but you’re unlikely to get it.

A better alternative is insurance. There are many insurers that offer contingent business interruption coverage for failure of cloud infrastructure, among other causes.

I think what you are thinking of is called insurance... it’s different from an SLA... you can get insurance to help protect your business but it can be expensive

Will the bank indemnify you against all your losses if a thief breaks into your safe deposit box?

Nice, time to launch Silk Road 4.0

I'm just kidding

The Empire brand is so much bigger and more valuable

May as well note: SEV relies on AMD-signed vendor firmware blobs. This means that AMD, or anyone who can get their keys, can compromise the security of SEV.

Actually, the integrity of the AMD microcode can be verified using Google Stackdriver as part of the VM audit logs, together with the integrity of the VM kernel.

They state in the press release:

> With the beta launch of Confidential VMs, we’re the first major cloud provider to offer this level of security and isolation while giving customers a simple, easy-to-use option for newly built as well as “lift and shift” applications.

How is Google's offering different from the Confidential Compute Microsoft already offers?[1]

[1] https://azure.microsoft.com/en-us/solutions/confidential-com...

The Microsoft solution uses Intel SGX, and all it gives you is access to a machine that has an SGX-enabled processor with some SDK tools pre-installed to use it. The SDK is C-based and reimplements parts of the C standard library, with non-standard arguments and return types in some cases (e.g., unsigned instead of signed). In this case you have to write all the code yourself and manage the secure processor features directly. It's application-level, same host/OS.

Google's offering is more intuitive. Google Cloud already stores VM data encrypted on disk and handles decryption to be able to start the VM. But once the data is in memory it's unencrypted and could be read by other processes. On a bare-metal machine there may be more than one VM using portions of the same processor, with physical access to the same memory range, so compromising this at a higher level would affect other customers. The new offering supports encrypted memory isolation for a virtualized VM. It works at the VM level, or 'whole operating system', meaning there is no need to write any special code to take advantage of it. You just tick a box. The tech is by AMD, not Intel.

Both of these options support different use cases, IMO. Microsoft's confidential compute allows you to manage untrusted applications on the same host with a high level of control. You can prove to other machines running the same app that you're doing this 'securely.' The Google solution doesn't give you the same level of granularity but is much, much easier to use for those who just want to take advantage of better memory protection and integrity checks.

My thoughts on this are mixed though because:

1. While the products are clearly very different -- Intel's SGX tech has already had numerous security vulnerabilities and that doesn't make me very optimistic AMD will have magically solved those issues.

2. The general advice in finance for highly sensitive data is not to use VMs, period, since privilege escalation on one VM could potentially lead to access to the bare metal and hence to the other VMs. Some of these risks still seem relevant even if memory protection is being used. I.e., it's better not to use VMs if you care about security. Trying to attract more highly sensitive data to 'the cloud' makes me nervous, to be honest.

3. I like the concept in general. Even though it's not a silver bullet it's nice to be able to have access to this option.

SGX has a stronger security model, but targets enclaves instead of VMs.

If you want something to run in SGX, you will likely need to rewrite the software (you can’t call syscalls directly in SGX since you cannot trust their results anyway).

SEV’s security model is weaker (no integrity), but lets you use essentially normal VM images.

Disclaimer: I work at Google in this space.

This uses AMD's SEV, which as I understand it is more akin to Intel's MPX than SGX (which is what Microsoft is using).

The equivalent Intel technology is TME, which is not present in any commercially-available processor yet.

MPX was an instruction extension that was never widely adopted and that Intel has deprecated.

If your data is sensitive you should not be sharing resources (cores/memory) with other users, IMO.

I am the first person to agree with you, without hesitation.

But it’s worth looking at exactly what it means to attempt isolation of programs on shared hardware. SEV is an interesting way of working on it.

It seems pretty clear that your data could never be modified or read, but there’s nothing preventing starvation of resources or side-channel attacks to leak encrypted values.

Of course I also always argue against using the cloud for anything sensitive as “the cloud is really, just someone else’s computers”. Albeit with a fancy provisioning api and some proprietary services adjoining it.

I question the limits of security in cloud services, but there's more to it than the technical aspects. There's a component of CYA for regulatory compliance. If you're a financial or medical company, you've got to demonstrate a good-faith effort to implement various thresholds of security to pass e.g. SOC audits, and this helps. Nothing is 100% secure. This is a component of an overall security plan. I don't want to overstate the benefit, but it's more than just PR.

> Of course I also always argue against using the cloud for anything sensitive as “the cloud is really, just someone else’s computers”.

This is too simplistic: employing that argument obligates you to show how you’re mitigating the same threats on your own, especially with regards to ops and security staffing. I have considerably more confidence in any major cloud provider having robust internal monitoring than the typical corporate VMware deployment, and that even extends to bare metal unless you can air-gap it — if you get a bare metal server from AWS, Azure, Google, etc. they’ve still put more work into the firmware, management interfaces, etc. than most IT groups do and those are very juicy attack surfaces.

This conversation will become quickly fruitless because everyone is different levels of risk averse.

For me, I can say plainly: "this piece of equipment has these access controls, both physical and virtual, and we have various radio-frequency dampening systems," etc.

For you, you can think about outsourcing that responsibility.

There's no "right" answer, some cloud providers may indeed have much stricter access controls than I could ever have (for instance, budgets may require my servers to exist in a physically shared space, albeit in my own racks; those racks being porous to allow airflow). But ultimately you will never have more control than if you have complete ownership and audit capability of all systems.

I'm sure many people have lived in the same regulatory hell that I have, and I wouldn't argue that the regulatory hell is easier in the cloud or otherwise. I would instead argue that if I were the CIO, I would sleep better knowing I had done my job and not attempted to outsource the responsibility and wash my hands of it, which is what you're effectively doing. Even if you trust the cloud provider, even if they've shown good faith, it's no longer your domain to oversee.

Right: my point is simply that you have to start with a threat model and make reasoned decisions based on that and your budget. “always argue” is the same as “wrong for a significant number of people” even if it's right for your particular circumstances.

"Always argue against" does not equal "never give in".

But I can see how you read it that way.

I would definitely challenge you on 'wrong for a significant number of people' because if you're focusing on security then it's likely a core principle; and therefore you need to understand and be able to effectively argue your case.

And that doesn't matter if you agree with my position or not for that last point to be true.

And that's why I colocate my servers.

>And that's why I colocate my servers.

And it's often cheaper, has better performance, and carries no lock-in to a cloud provider.

There's an entire ecosystem of supported services in GCP (or whatever cloud provider you'd be passing on) that you would have to find an alternative for if you want to co-locate and manage your own physical equipment.

The more logical alternative (without abandoning the cloud entirely) would be to use the cloud provider's dedicated instance functionality (GCP's terminology for this is "sole-tenant nodes"), but these are much more expensive than virtual machine instances, especially if you don't need the capacity of a dedicated node. At some point, you or your bosses are going to be asking if the security is _really_ worth the premium.

SEV-enabled VMs can provide a convenient middle ground -- more protection than just a hosted VM instance, but since you're still sharing physical resources, the cost is closer to a VM than a dedicated instance.

If SEV VMs are considered the equivalent of dedicated instances from a compliance perspective, this could open the door to cloud hosting for a variety of industries that were unable to do shared hosting before. However, that "if" remains to be seen.

Having worked at companies that colo'd and ran on AWS I think the difference isn't as big as you're making it seem. Many of services you "need" in the cloud are only needed because you're in a cloud in the first place.

The good news about colo'd equipment is that it's dirt cheap. You can have millions of customers running on a few poweredge nodes with full redundancy and capacity to spare.

> You can have millions of customers running on a few poweredge nodes with full redundancy and capacity to spare.

As someone that actually manages "a few PowerEdge nodes," you're overstating their capability and oversimplifying what it takes to run a production-grade system with millions of users.

I mean my current company got our first million users with four used poweredge nodes and two database servers. Since we were using bargain basement equipment we were extremely paranoid about hardware failure and so every little aspect of our app was built to handle 2 whole nodes failing simultaneously and every service running on those boxes in VMs was two-failure redundant as well.

I mean ops isn't simple no matter where you're hosting it but it's not any harder than when I worked at AWS shops.

You can't use the ecosystem of services if you use SEV. Think about it: if your encrypted web server VM stores data in a Google-provided SQL database, your data is no longer encrypted. Or if you're using Google-provided SSL termination, or Google provided key hosting, or Google managed operating systems, or Google-provided AI, or Google-provided log analytics.

If you need to protect your data, all of those services are no longer usable.

None of these things are incompatible with a VM running SEV. In fact, since many of Google's managed services themselves use Compute Engine, they may already support Confidential VMs running SEV.

It's irrelevant. You can't audit those services because they're Google-proprietary software, and you aren't the ones with administrator access to them.

This stuff is new, so I think there's going to be a lot of confusion about what it means and how it works. But the basic concept is easy. SEV and SGX don't mean anything except that you can remotely audit a piece of software that's running by checking the hash of the code that was loaded, and that code can derive private encryption keys unavailable (in theory) to any other piece of software on the system, and that its memory is encrypted.

In SEV that piece of code is effectively the entire OS. In SGX it's a much smaller piece of code, more like a single shared library.

With a Google-managed database system, you don't know precisely what software it is. These clouds aren't open source. Even the hypervisors aren't open source, as far as I know. Even when based on an open source product there are proprietary patches. And they change constantly.

That means if your VM starts up and you get a hash back saying it's the OS you wanted to start, that's great, but the moment you open a connection to some other server, even if it's all running on AMD hardware, all you get back is a hash, and moreover (iirc with SEV) a random hash that's only useful if you're actually the one who started the remote VM. With SGX you get the actual code hash and/or code-signing key even if you didn't start it, which is a bit better.

What does that hash actually mean? Unless you can reproduce the build of whatever you're talking to and audit the code to look for back doors, and keep auditing it as it changes, it means basically nothing. That's why SGX focuses on really tiny pieces of code - less attack surface, but also less audit churn and easier reproducibility.

Ultimately to have guarantees your data is private in a cloud, you need either to audit the software stacks and rely on hardware roots of trust, or you can use clever forms of encryption like FHE. But just ticking a check box by itself does nothing at all.
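The "derive private encryption keys" part can be pictured like this. It's a toy model, not the real SEV/SGX key hierarchy: the hardware mixes a device secret (which never leaves the chip) with the measured code hash, so different code derives a different key and can never unseal data sealed by the code you audited.

```python
import hashlib
import hmac

# Hypothetical: a secret fused into the CPU at manufacture, never exported.
DEVICE_SECRET = b"fused-into-the-cpu-at-manufacture"

def sealing_key(code_measurement: bytes) -> bytes:
    """Toy model: key = HMAC(device_secret, hash_of_loaded_code).

    Because the measurement is an input, modified code derives a different
    key and cannot decrypt data sealed by the original, audited code.
    """
    return hmac.new(DEVICE_SECRET, code_measurement, hashlib.sha256).digest()

original = hashlib.sha256(b"enclave-code-v1").digest()
patched = hashlib.sha256(b"enclave-code-v1-with-backdoor").digest()
assert sealing_key(original) != sealing_key(patched)
```

This is why the audit matters: the hardware binds keys to a hash, but only a human (or a reproducible build) can say whether the code behind that hash is trustworthy.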

> This stuff is new, so I think there's going to be a lot of confusion about what it means and how it works.

SEV is a hardware function that provides real-time memory encryption. That's all. Maintaining full confidentiality beyond the boundaries of the processor's memory controller is not within the scope of SEV.

Google Cloud's Confidential Computing platform _may_ offer a broader solution with some type of cryptographic guarantee, using SEV as a component of the solution. I don't know how far they are, or if that's the direction that they're looking to take the platform.

However, if you're looking for a platform with remote attestation that the computing environment is fully trusted, SEV is not sufficient. You would need a traditional TEE for that.

> just ticking a check box by itself does nothing at all.

Taken directly from Google's blog post:

"Confidential VMs can help all our customers protect sensitive data, but we think it will be _especially interesting to those in regulated industries._"

(Emphasis mine)

It provides additional hardware-backed protection when sharing a physical machine with other tenants, which among other things, can make shared hosting a possibility for security-conscious environments that previously prohibited it.

If that doesn't sound like a useful feature, or you feel that it's theater, then you're probably not the target market for the feature.

> SEV is a hardware function that provides real-time memory encryption. That's all.

You're demonstrating my point for me. That isn't all, by any means. SEV is primarily implemented in firmware, and provides a form of measured boot and remote attestation. Don't take my word for it:


"AMD Secure Processor. Provides cryptographic functionality for secure key generation and key management."

This is literally the second feature of two that it advertises as part of SEV.

Those parts are critical, and SEV doesn't really mean anything without them. RAM encryption is only useful if you don't trust the owner of the host hardware. But if you don't trust the host, you can't assume they switched on RAM encryption or booted the OS you asked for into the VM; you have to check it. That's what the remote attestation lets you do.

> If that doesn't sound like a useful feature, or you feel that it's theater, then you're probably not the target market for the feature.

I work in regulated markets! And yes, it's true, there are a lot of regulators that can be satisfied with security theatre. The weakness of regulator understanding of technology isn't, by itself, a reason to consider SEV without RA useful.

Most things fall on a spectrum - I view this as an iterative enhancement.

> Confidential VMs leverage the Secure Encrypted Virtualization (SEV) feature of 2nd Gen AMD EPYC™ CPUs

Powered by AMD. I wonder who will leverage this next.

So what does SEV actually protect against?

Something like Heartbleed would still happily read and transmit confidential data.

Something like speculative side channel attacks would still speculate on the unencrypted memory right?

Rowhammer would still flip bits, but now one flipped bit would turn an entire 128-bit block into garbage when decrypted? It seems like that would at least make Rowhammer a lot harder to exploit into a privilege escalation. ECC memory already gave some limited protection here.
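A toy illustration of that last point. This is a 4-round Feistel cipher built from SHA-256 purely for demonstration (SEV uses hardware AES, not anything like this): flipping a single ciphertext bit scrambles essentially the whole 128-bit block on decryption, so a Rowhammer flip in DRAM yields garbage rather than a controlled one-bit change.

```python
import hashlib

def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Round function: hash the key, round number, and one half of the block.
    return hashlib.sha256(key + bytes([i]) + half).digest()[:8]

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt(key: bytes, block: bytes) -> bytes:
    """Toy 16-byte block cipher: 4 Feistel rounds over two 8-byte halves."""
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right

def decrypt(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right

key = b"per-VM memory encryption key"
plain = b"sixteen byte blk"            # one 128-bit block
cipher = encrypt(key, plain)
assert decrypt(key, cipher) == plain   # round-trips normally

# Simulate a Rowhammer flip of one ciphertext bit while it sits in DRAM:
flipped = bytes([cipher[0] ^ 0x01]) + cipher[1:]
garbled = decrypt(key, flipped)
# The block comes back as unrelated garbage, not a one-bit change.
assert garbled != plain
```

With a stream-cipher-style scheme the flip would only change one plaintext bit, which is exactly why block-wide diffusion makes bit-flip attacks harder to weaponize.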

> Something like speculative side channel attacks would still speculate on the unencrypted memory right?

This wouldn't prevent a Spectre attack or similar cache-based attack, as the memory would be decrypted at that point. However, it would mitigate attacks like Meltdown or Rowhammer.

I don't see how it protects against meltdown, wouldn't the memory still get decrypted by the escalated access?

Edit: If each VM has its own key, then they couldn't read each other's memory with Meltdown. I think that must be the angle.

> I don't see how it protects against meltdown, wouldn't the memory still get decrypted by the escalated access?

No. A particular VM's memory is decrypted by that VM's key.

Assuming that AMD's CPUs were vulnerable to a hypothetical attack similar to Meltdown, another guest (or the host) would be able to dump the machine's memory, but the memory contents belonging to other VMs would be encrypted and unintelligible.
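A crude model of why that helps (toy code, not AMD's implementation — real SEV does AES with a physical-address tweak in the memory controller, keyed per VM): DRAM only ever holds ciphertext, and a cross-domain read gets decrypted with the *wrong* key, yielding garbage.

```python
import hashlib

def keystream(key: bytes, addr: int, length: int) -> bytes:
    """Toy per-key, per-address keystream standing in for hardware AES."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(
            key + addr.to_bytes(8, "big") + counter.to_bytes(4, "big")
        ).digest()
        counter += 1
    return out[:length]

class EncryptedRAM:
    """DRAM holds only ciphertext; each VM's key is applied on load/store."""
    def __init__(self):
        self.dram = {}

    def store(self, vm_key: bytes, addr: int, data: bytes):
        ks = keystream(vm_key, addr, len(data))
        self.dram[addr] = bytes(a ^ b for a, b in zip(data, ks))

    def load(self, vm_key: bytes, addr: int) -> bytes:
        raw = self.dram[addr]
        ks = keystream(vm_key, addr, len(raw))
        return bytes(a ^ b for a, b in zip(raw, ks))

ram = EncryptedRAM()
victim_key, attacker_key = b"vm-A-key", b"vm-B-key"
ram.store(victim_key, 0x1000, b"db password: hunter2")

# A Meltdown-style cross-domain read applies the attacker's key, not the victim's:
leaked = ram.load(attacker_key, 0x1000)
assert leaked != b"db password: hunter2"                   # attacker sees garbage
assert ram.load(victim_key, 0x1000) == b"db password: hunter2"
```

Note the limits of the protection: anything that leaks data *after* the controller has decrypted it (cache timing, Spectre-style speculation within the victim's own domain) is out of scope, which matches the thread's point about side channels.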

I think my edit and your comment passed each other on the wire.

That makes sense, it gives hardware protection from privilege escalation between VMs on the same host. Be it through a hardware exploit or hypervisor vulnerability.

This seems like a move to give people handling sensitive data (e.g. healthcare and insurance) peace of mind so they can tick the "security and privacy" box? It even neutralises the potential issue of being linked with the omniscient Google? How MS and AWS respond will be interesting.

Confidential VM is essentially productizing AMD SEV technology. Intel will have something similar on the market in the (hopefully near) future.

Given that AWS and Azure also use AMD and Intel equipment, I'd expect them to introduce similar functionality. (AWS is probably closer as SEV support is built into Linux, whereas Windows has no support for it as far as I can tell.)

This is a terrible name. Assume everything else is not confidential!

But everything else is not actually confidential... Confidentiality means limited visibility: when you process your data, the memory is in the clear without this tech (or Intel SGX, which offers confidentiality and integrity).

This is using SEV-ES (SEV2), which is vulnerable to the severe attack described last year in [1], and unfixable due to the lack of anti-rollback functionality.

Only SEV-SNP [2] is supposed to address it, but only on new silicon which doesn't exist yet, and that probably not even Google has.

So why is Google releasing this feature if it is so flawed?

[1] https://arxiv.org/pdf/1908.11680.pdf

[2] https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthen...

Even worse: according to a source of The Register [1], it is based on simple SEV (SEV1), not even SEV-ES.

If true, it is even more disappointing.

[1] https://www.theregister.com/2020/07/14/google_amd_secure_vm/

Is SEV really a "breakthrough technology"? AMD was far from the first to do this, and you have to trust AMD to have implemented this correctly and not be backdoored or cooperating with the US government to believe it's really secure.

> Is SEV really a "breakthrough technology"?

SEV is a breakthrough in the sense that it's essentially transparent to the guest environment. Earlier technologies like Intel SGX or ARM TrustZone have a lot of performance limitations, and applications need to be explicitly developed to support them.

Intel is working on a similar technology -- Total Memory Encryption (TME) -- but they haven't released it to market yet.

Intel actually had a similar tech without the RAM encryption some time ago, called TXT.

TXT never really took off. There were a few problems, and I don't think SEV has actually solved them, beyond adding the memory encryption that TXT lacked.

One is that the hypervisor is ... well it's still a hypervisor. It can play a lot of games on the operating system, and hardware is limited in what it can do to stop that. TXT's solution was to "measure" the hypervisor, so you could audit it. That doesn't work for google/aws/azure who all use proprietary hypervisors, so you need to place a lot of trust in your chip and kernel that they can resist arbitrary malicious behaviour by the most privileged piece of software on the system - one that controls all hardware access. That's very difficult. For instance, the hypervisor controls access to the system clock.

Another is that it was very hard to make the operating system secure. Heartbleed being just one example of what can go wrong. So the trusted computing community concluded around this time that placing an entire operating system into your 'trusted computing base' doesn't really work. It's trying to run before you can walk. If you can't make the operating system reliably secure against remote attackers then trying to make it secure against the far harder adversary of someone who controls your hardware stack seems futile.

That's why Intel's equivalent isn't transparent. It's basically like loading a shared library, where the library gets encrypted RAM. But when you try to write an enclave that's really secure, you realise that there's a lot the host machine can do to make a mess of things. I don't think that changes much if it's "just" the hypervisor that's malicious instead of the hypervisor and the kernel. The solutions end up looking the same: you want to minimise your attack surface, and you need to think carefully about clocks and time sequencing, side channel attacks, etc.

> Intel actually had a similar tech without the RAM encryption some time ago, called TXT.

Given that RAM encryption is literally the core function of SEV, any functionality that lacks it is by definition dissimilar.

TXT had RAM protection, as in, other software and hardware devices couldn't read the protected memory areas. RAM encryption itself is primarily about stopping bus sniffing or cold boot attacks. Useful, but by no means the only kind of protection you need, especially because combining encryption with authentication is very hard. It is easy to forget that encryption alone doesn't stop someone flipping bits, and you can corrupt the plaintext in ways useful to an attacker by doing so, hence the rise of AES-GCM. But I don't think SEV uses an AEAD?
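To make the bit-flipping point concrete, here's a toy Python sketch. It uses a SHA-256-based keystream as a stand-in for an XOR-style cipher mode (this is not real crypto, and not SEV's actual mode): an attacker who can modify ciphertext can flip chosen plaintext bits without knowing the key, which is exactly the kind of tampering an AEAD like AES-GCM would detect.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256 in counter mode. Illustration only, not real crypto.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

key = b"per-vm-secret"
plaintext = b"pay $0000100 to alice"
ciphertext = xor(plaintext, keystream(key, len(plaintext)))

# Attacker flips ciphertext bits without knowing the key:
# turn the '1' in the amount into a '9' by xoring in the difference.
forged = bytearray(ciphertext)
forged[plaintext.index(b"1")] ^= ord("1") ^ ord("9")

print(xor(bytes(forged), keystream(key, len(plaintext))))
# -> b'pay $0000900 to alice'
```

Unauthenticated encryption keeps the data secret but says nothing about its integrity, which is why the AEAD question matters.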

But the core technology is basically the same concept. You get a protected memory space (to some degree of protection), you can derive keys linked to the loaded code hash, and you can do remote attestation to set up a Diffie-Hellman handshake with the remote protected domain. All that stuff is identical between TXT and SEV.

They are ahead of Intel, at least. In the server/VM market, that is roughly the same as being first...at least for now.

I kind of assumed all my cloud computing resources were already private and confidential.

Not so?

You are right that data is private and confidential when it is ingested to the cloud and/or stored in the cloud, but not when it is processed. Encryption of "data-in-use" is the third leg of protecting sensitive data, and it became possible with hardware capabilities in new CPUs from AMD and Intel, as it has to be hardware based (for better security and performance).

As storage and ingestion requires processing, that's the same as saying it's not private at any point when in the cloud.

And that's not necessarily a big deal. If you trust Google or AWS to hold all your business and customer data, no problem (and if your customers transitively have that trust). But I think there's a lot of denial about this fact: the cloud has all your data and all your customers data. Fixing that is really, really hard. It's not anywhere near as simple as Google are claiming in this announcement, certainly not "tick a box and it's switched on".

Do you mean those resources physically located with you? That's a good assumption.

Or do you mean those resources physically located in someone else's data center running on their hardware and software platforms among their infrastructure and only assigned to your task as-needed? That's a bad assumption.

Data in the cloud, and probably on the device you're using right now, is mainly encrypted only while sitting on the disk and while bouncing around the net. The difference here is that the data is also encrypted throughout its life: while in memory and while it is being processed on the machine.

Confidential Computing environments keep data encrypted in memory and elsewhere outside the central processing unit (CPU).

Aren't Amazon's Graviton 2 processors specified to do this too?

My knowledge of Graviton is limited, but based on Amazon's description of Graviton 2's encryption capabilities, it seems more like an equivalent to AMD's Secure Memory Encryption (SME), rather than SEV.

It's a lot more than just an SME equivalent. They've designed the management plane in a really clever way that makes it more similar to SGX/SEV than it appears at first glance. I recommend watching this video: https://www.youtube.com/watch?v=kN9XcFp5vUM

Is this homomorphic encryption or something else?

Homomorphic encryption enables computation to be performed on encrypted data without ever decrypting it on the CPU. Compared to Confidential Computing approaches, the processing overhead of FHE is quite high, especially for tasks that require executing complicated algorithms, making it hard to scale. Confidential VMs with AMD SEV instead decrypt data within the VM and keep it encrypted "in-use" by encrypting each VM's memory with a non-extractable per-VM key generated by the AMD secure processor. After processing, data and code can be encrypted again to keep them protected at rest.
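As a toy illustration of "computing on encrypted data": textbook RSA (tiny hand-picked parameters, completely insecure, and only multiplicatively homomorphic, unlike FHE, which supports arbitrary computation) lets you multiply two ciphertexts and decrypt the product without ever decrypting the inputs:

```python
# Textbook RSA is multiplicatively homomorphic:
# Enc(a) * Enc(b) mod n decrypts to a * b. Toy parameters, not secure.
p, q = 61, 53
n = p * q                 # 3233
phi = (p - 1) * (q - 1)
e = 17
d = pow(e, -1, phi)       # modular inverse (Python 3.8+)

enc = lambda m: pow(m, e, n)
dec = lambda c: pow(c, d, n)

a, b = 7, 6
product_ct = (enc(a) * enc(b)) % n   # only ciphertexts are multiplied
print(dec(product_ct))               # -> 42
```

The gap the comment describes is real: doing this for general programs (FHE) is orders of magnitude slower than SEV's approach of decrypting only inside the CPU package.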

I don't understand, and couldn't get any information from the article either. If the data are decrypted within the VM, then it is still decrypted at that point, and the host machine can read it.

The data is transparently encrypted and decrypted specifically within the processor. The OS kernel on the host machine doesn't have access to the unencrypted contents of the guest VM's memory.

> I don't understand, and couldn't get any information from the article either.

See this wiki article for more info on this class of technology: https://en.wikipedia.org/wiki/Data_in_use

You can access memory from within a VM, not from outside it. The host machine with the hypervisor is not inside a VM instance, so it cannot read your VM's memory. The memory is encrypted all the time; when an instruction has to be executed, the memory controller (which alone has access to this VM's keys) decrypts it so the CPU can execute it in the clear. With FHE, by contrast, computation is performed directly on encrypted data, which takes significant time, so it is not very practical today. Does it make sense?
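A toy Python model of the idea (nothing here resembles SEV's actual cipher; XOR with a SHA-256-derived, address-dependent keystream just stands in for per-VM memory encryption): a raw DRAM dump yields only ciphertext, while the path holding that VM's key recovers the plaintext.

```python
import hashlib

def encrypt_page(vm_key: bytes, addr: int, data: bytes) -> bytes:
    # XOR with a key- and address-derived stream; XOR is its own inverse,
    # so the same function also decrypts. Illustration only.
    stream = hashlib.sha256(vm_key + addr.to_bytes(8, "big")).digest()
    return bytes(d ^ s for d, s in zip(data, stream))

vm_a_key = b"key-A"   # in SEV, generated per VM by the secure processor
vm_b_key = b"key-B"

dram = {
    0x1000: encrypt_page(vm_a_key, 0x1000, b"vm-a secret data"),
    0x2000: encrypt_page(vm_b_key, 0x2000, b"vm-b secret data"),
}

# A host dumping raw DRAM sees only ciphertext:
print(dram[0x1000])

# The memory-controller path holding VM A's key recovers the plaintext:
print(encrypt_page(vm_a_key, 0x1000, dram[0x1000]))  # -> b'vm-a secret data'
```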

Here's AMD's documentation about it: https://developer.amd.com/sev/

It just encrypts memory pages and registers with a key that the host and other guests can't get.

How is SEV compared to SGX?

SEV: no changes are required to the apps, better performance, but a bigger TCB. GCP mitigates this with Shielded VMs, in particular integrity of the kernel in your trusted boundary and notifications to users if the integrity state changes from the baseline, made default and free. https://cloud.google.com/blog/products/identity-security/sec...

SGX: smaller TCB, but limited scale, and you have to partition your app into secure and non-secure parts using one of the available SDKs: Intel SGX SDK, Microsoft Open Enclave, or Google Asylo.

SEV is atm considered 'secure'; SGX is not:


The two attack vectors are page table integrity and unencrypted VM state.

Modifying the page tables to establish a cryptographic Merkle tree would fix the first attack, and SEV-ES fixes the secrecy attack from the second paper. Unfortunately, a change to the page table structure may make it impossible to run unmodified kernels in the VM.
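To illustrate the Merkle-tree idea: a toy Python sketch (purely illustrative, not how any real hardware integrity tree is laid out) that hashes "pages" pairwise up to a single root, so any remapping or swapping of pages by a hypervisor changes the root:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(pages: list[bytes]) -> bytes:
    # Hash each page, then hash pairs level by level up to one root.
    level = [h(p) for p in pages]
    while len(level) > 1:
        if len(level) % 2:            # duplicate the last node on odd levels
            level.append(level[-1])
        level = [h(a + b) for a, b in zip(level[::2], level[1::2])]
    return level[0]

pages = [b"page0", b"page1", b"page2", b"page3"]
root = merkle_root(pages)

tampered = pages.copy()
tampered[1], tampered[2] = tampered[2], tampered[1]  # hypervisor swaps pages
print(merkle_root(tampered) != root)  # -> True
```

Verifying the current root against a trusted copy is what would let the guest detect this class of remapping attack.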

I think it is impossible to prevent a hypervisor from fingerprinting what's running in a child VM; there are too many timing and power attacks to ever mitigate them all, which was the attack vector on SEV-ES.

I guess a database as a service instance at Google will still be accessible for Google in a decrypted way?

Not unless you run the database yourself inside a Confidential VM; you can run MySQL, PostgreSQL, MariaDB, etc. Google-managed database services, e.g. GCP Cloud SQL, are not supported...

It is funny that Kubernetes, Istio, Asylo, etc. are transliterations of Greek words and Google has trademarks on them.

It's interesting that there's no mention of Intel SGX in this blog post.

It doesn't use Intel SGX, nor would SGX work with such a service.

Intel is working on a competing technology -- Total Memory Encryption -- but AMD beat them to market by quite a bit.

Is there demand for such a thing? I mean what is the use case where one would want this level of security.

Quite a few: protecting sensitive data in the cloud from other tenants and the cloud provider, protecting your clients' sensitive data, addressing some requirements of regulated markets, satisfying some privacy regulations, and a few new ones, like collaborative computing between untrusted parties.

I don't like this technology. If it works as claimed, it could be used for almost unbreakable DRM.

It already has been in a sense, as the underlying technology for SEV was developed by AMD to protect the DRM running on game consoles.

That being said, AMD SEV is transparent to the underlying applications and users, so anyone with an interest in protecting memory from certain classes of attacks (e.g., Meltdown, Rowhammer) can benefit.

Other technology in this space (e.g., Intel SGX, Arm TrustZone) requires the application to explicitly support the secure enclave, so their usefulness to a typical end-user is much more limited, and as such, they aren't really used much other than to enable DRM.
