This is the primary purpose of encryption at rest, and the scenario is not as unlikely as you might think. Have you ever worked in an average datacenter? Some of them have very little physical security and don't monitor employees' access to cages. And if you have a cage on the same floor as another customer, all it takes is a fake name badge and a clipboard to walk up to someone with their cage open and say you're doing a routine inspection. Walk into the cage, pop out a drive, put it under your clipboard, walk away. Ask physical security pentesters how "difficult" it is to steal from a datacenter. And let's not even get into dumpster diving, where clients regularly toss entire servers into dumpsters without erasing the disks.
Like someone else said, it's also good practice in multi-tenant configurations, where a misconfigured storage or compute backplane could expose data to the wrong tenant.
Even if all it protects against is the scenario of the cloud provider forgetting to wipe a disk, that's worth it.
But still, in general, I agree. I fill in these security questionnaires maybe once every couple of months, and I'm starting to see a clear shift from encryption of data-at-rest to encryption of data-in-use, which mostly just feels like an even more tedious form of security theater.
All I can do is support your position. Our customers are interested in encryption-at-rest for any sensitive business data that we have on disk at any time.
I work in banking, so I am ethically bound to provide the most brutally honest takes regarding security models to our customers. When developing proposals for encryption-at-rest, we have to cast everything in the worst possible light.
We started by looking at things like DPAPI, but that is almost a joke for our use cases.
We are more worried about something along the lines of: a senior IT administrator has access to all application servers with full admin/root and can trivially dump customer databases to the dark net for crypto. When modeling for this scenario, we have to consider both physical and digital controls. Digital controls alone will never provide a comprehensive solution for this kind of adversary. Our approach looks something like this:
The customer has to install HSM(s) in their environment. This would typically be something living in a nested secure network, or a USB dongle directly attached to each server. These secure environments/items must never be physically accessible unless at least 2 employees are present at the same time. Each of our application servers that transacts PII must be configured to talk to the HSMs. Multiple HSMs provide redundancy in a production environment, mitigating downtime risk.

The cryptoscheme is a key-wrapping approach, where working keys are managed per user work item. When a work item is requested for the first time, the HSM has to unwrap its working key (which is stored encrypted on the application server). There is a mild tradeoff between security and performance here: hypothetically, we could require the HSM to encrypt the payload itself on every I/O, but for our product that would quickly saturate the capacity of even the most expensive HSMs.

The policy is that an untouched work item has its key zeroed out in memory after X seconds. Working keys that timed out of memory are marked for rekey on next retrieval.
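To make that flow concrete, here is a heavily simplified sketch of the key-wrapping and timeout logic, not our actual implementation. `HsmClient` is a stand-in for a real HSM (which you'd reach over PKCS#11 or a vendor API, and which would never expose the KEK), and the class names and TTL are invented for illustration:

```python
# Hypothetical sketch of a per-work-item key-wrapping scheme.
# Uses AES key wrap (RFC 3394) and AES-GCM from the 'cryptography' package.
import os
import time
from cryptography.hazmat.primitives.keywrap import aes_key_wrap, aes_key_unwrap
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

KEY_TTL_SECONDS = 30  # the "X seconds" policy; value is arbitrary here


class HsmClient:
    """Stand-in for the HSM. In production the KEK lives inside the device
    and wrap/unwrap happen there; this stub just mimics the interface."""
    def __init__(self):
        self._kek = os.urandom(32)  # device-resident key-encryption key

    def wrap(self, working_key: bytes) -> bytes:
        return aes_key_wrap(self._kek, working_key)

    def unwrap(self, wrapped_key: bytes) -> bytes:
        return aes_key_unwrap(self._kek, wrapped_key)


class WorkItemKeyCache:
    """Working keys are unwrapped on first use, cached in memory, and
    zeroed after KEY_TTL_SECONDS of inactivity (best-effort in Python)."""
    def __init__(self, hsm: HsmClient):
        self._hsm = hsm
        self._cache = {}  # work_item_id -> [bytearray key, last_used]

    def get_key(self, work_item_id: str, wrapped_key: bytes) -> bytes:
        self.evict_stale()
        entry = self._cache.get(work_item_id)
        if entry is None:
            # One HSM round-trip per work item, not per I/O.
            key = bytearray(self._hsm.unwrap(wrapped_key))
            entry = [key, time.monotonic()]
            self._cache[work_item_id] = entry
        else:
            entry[1] = time.monotonic()
        return bytes(entry[0])

    def evict_stale(self):
        now = time.monotonic()
        for item_id, (key, last_used) in list(self._cache.items()):
            if now - last_used > KEY_TTL_SECONDS:
                for i in range(len(key)):  # best-effort zeroization
                    key[i] = 0
                del self._cache[item_id]
                # A real system would mark item_id for rekey on next retrieval.


# Payload encryption happens on the app server with the working key,
# which is why the HSM isn't saturated by per-I/O traffic.
hsm = HsmClient()
working_key = AESGCM.generate_key(bit_length=256)
wrapped = hsm.wrap(working_key)  # this ciphertext is what sits on disk
cache = WorkItemKeyCache(hsm)
key = cache.get_key("work-item-42", wrapped)
nonce = os.urandom(12)
ciphertext = AESGCM(key).encrypt(nonce, b"PII payload", None)
```

The design choice being illustrated: the HSM sees one unwrap per work item rather than one operation per read/write, which is the whole security/performance tradeoff mentioned above.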
We have also modeled for remote debugger scenarios, and believe that simple firewall and other access controls can mitigate most of those vectors. This is very much a defense-in-depth strategy, so I am only presenting a small piece of that puzzle here.
Now, I would hope Amazon shreds drives that are declared dead, but I wouldn't want to risk my business on it.
It has almost nothing to do with protection against a targeted attack and everything to do with chain of custody.
Encryption at rest is protection against physical theft, that's it.
Encrypting a hot relational database is madness, but I've seen several bad attempts at it anyway.
Anything "stateful" like a database breaks this paradigm.
I have nothing against databases used as caches that can be re-filled upon re-creation, but I believe anything holding business-critical data should be held outside of a Kubernetes cluster. Why? Because being one command away from deleting your StatefulSet, Helm release, etc. scares the shit out of me.
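To show what "one command away" means in practice, here's a sketch using the official kubernetes Python client; "mydb" and "prod" are placeholders, not a real cluster:

```python
# Illustration only: with admin credentials, deleting a StatefulSet
# is a single call.
from kubernetes import client, config

config.load_kube_config()  # whatever kubeconfig happens to be active
client.AppsV1Api().delete_namespaced_stateful_set(name="mydb", namespace="prod")
# By default the PersistentVolumeClaims survive this, but nothing stops
# the same credentials from deleting those too.
```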
You can of course minimize the risk with correct RBAC and ensure proper backup/restore procedures, but that requires a lot of staff and effort I can't spare.
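For reference, here's roughly what that RBAC looks like as a sketch with the official kubernetes Python client (role name and namespace are placeholders). RBAC is allow-list based, so "protection" is just never granting the dangerous verb:

```python
# Sketch: a Role that can read StatefulSets but never delete them,
# simply by omitting "delete" from the allowed verbs.
from kubernetes import client, config

config.load_kube_config()

role = client.V1Role(
    metadata=client.V1ObjectMeta(name="db-readonly", namespace="prod"),
    rules=[
        client.V1PolicyRule(
            api_groups=["apps"],
            resources=["statefulsets"],
            verbs=["get", "list", "watch"],  # no "delete" here
        )
    ],
)
client.RbacAuthorizationV1Api().create_namespaced_role(namespace="prod", body=role)
```

Of course, writing the role is the easy part; keeping every binding in the cluster consistent with it is where the staffing cost actually lands.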
So until I can be reassured that I have all the tooling needed to recover rapidly from any catastrophic failure or mishap, and that all of this tooling is tested monthly, I enforce the use of managed database services.
regardless, it’s the wrong thing to fear. this is at the level of logging every user in as root on your servers and databases because proper user management would require extra staff and effort you can’t spare.
Deploying databases in Kubernetes is fine for many applications; I've done both. Not every application that uses a database is data-intensive.
Linux is specifically designed to support this type of usage. The necessary syscalls were added decades ago, originally to support databases. Kubernetes' storage abstractions sit between the application and those syscalls; while the calls appear to behave like the underlying kernel primitives, the resulting behavior is not the same and is generally unsuitable for these database architectures. The practical effect is degraded and unpredictable performance, because it violates invariants that core optimizations rely on.
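For a flavor of what's meant here, a minimal, Linux-only sketch of the direct-I/O path databases rely on (the filename and block size are arbitrary, and the file must live on a filesystem that actually supports O_DIRECT, so not tmpfs):

```python
# Minimal sketch of database-style direct I/O. Linux-only: O_DIRECT
# bypasses the page cache and requires block-aligned buffers.
import mmap
import os

BLOCK = 4096  # must be a multiple of the device's logical block size

buf = mmap.mmap(-1, BLOCK)   # anonymous mmap gives a page-aligned buffer
buf.write(b"x" * BLOCK)

fd = os.open("direct_io_demo.bin", os.O_RDWR | os.O_CREAT | os.O_DIRECT, 0o600)
try:
    os.write(fd, buf)   # goes straight to the device, bypassing the page cache
    os.fsync(fd)        # the durability barrier the database itself controls
finally:
    os.close(fd)
    os.unlink("direct_io_demo.bin")
```

On a plain ext4/xfs mount this behaves predictably; route the same calls through an overlay filesystem or a network-backed volume and the alignment and durability semantics are no longer what the database was tuned for, which is the failure mode being described.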
This has been kicked around by Kubernetes people for years, including within my own orgs, because we use a lot of Kubernetes. No one has ever been able to make this type of software achieve comparable performance, even with a lot of hacky workarounds. Kubernetes was not designed to allow software to interact with the Linux kernel in this way. Consequently, this type of software is deployed on VMs or bare metal in practice, even if everything else is on Kubernetes.
The Kubernetes world has changed a lot in the past few years in ways that make databases-in-k8s more appealing. Such as:
- Kubernetes "eating the world", meaning some teams may not even have good options for databases outside k8s (particularly onprem).
- Infrastructure-as-code being more prevalent. Since you already have to use k8s manifests for the rest of your app, adding another IaC tool to set up RDS may be undesirable.
- The rise of microservices, where companies may have hundreds of services that need their own separate data stores (many of which don't see high enough traffic to justify the cost of a managed database service).
- Excellent options like the bitnami helm charts: https://github.com/bitnami/charts or apparently Vitess (haven't used it myself): https://vitess.io/
Obviously if the use-case is a few huge, highly-tuned, super-critical databases, managed database services are perfect for that. But IMO a blanket ban might restrict adoption of some more modern development practices.
How database-as-a-service vendors run their services is none of my business, as long as they deliver the performance I need and working backup/recovery procedures.
(the answer back in the day, and perhaps still, was just "they don't really worry about it at all, and hope nothing goes wrong")