The author thinks that this is a security issue because this option should be enabled by default. However, (I assume) it's not in DigitalOcean's interest to do a full disk scrub because it reduces the lifespan of their SSDs.
If a user forgets to log out of Facebook on a public computer, is it Facebook's responsibility? Similarly, if a user does not correctly delete data on a budget host, is it the host's fault?
"Oh, there's an option for that" to stop default dangerous behavior should not be excusable. This is further proof on why DO isn't enterprise ready and still a toy for barely stable dotcoms.
Thank god for EU privacy laws. If these guys have any EU presence, they'll be forced to clean up their act via regulation. Clearly, the invisible hand of the market and "there's an option for that" is the fail it's always been in regards to security.
Their growth and offerings are great, but they were going to run into problems sooner or later. As we've seen with many other hosts in the past, this is probably just the beginning.
The data for your virtual machine's virtual disk(s) will not always be in the same place: it may get moved around the storage volume or moved between storage volumes, and this will be completely transparent to you. It may also get stored on backup volumes. When you "secure wipe" the disk as you kill the VM, all you are doing is wiping the data that is in the current location, not any latent copies that may be sat elsewhere (as backups or "ghost" data sat in currently unallocated bits of physical media).
The only way to guarantee your data is stored securely, and is gone securely when you want it gone, is to use full-volume encryption from the start and make sure that the keys are never stored at the provider side (this does mean that you need a mechanism by which you can provide the key(s) to the VM whenever it reboots for some reason). That way there is no need to wipe the current store or worry about ghost data elsewhere: just destroy all copies of the keys for those volumes. Of course there is still some risk, as the keys need to be in RAM somewhere so the encrypted volumes can be accessed at all, but once you get to that level of concern you can only be sure you are secure by having your own physical kit.
Of course not all current hardware sharing solutions can support full volume encryption as you don't really have a proper volume(s) that you can encrypt and put filesystems in, just a part of a larger filesystem pretending to be one (and you can't mount things so you can't use a file-based volume instead)...
tl;dr: wiping the VM's disks does NOT protect you from this sort of thing, using properly implemented full volume encryption will (as much as is possible in a shared virtualisation host, iff your host's solution can support FVE in the guest VMs/containers at all).
If you need to be sure your instructions to wipe data do protect you from that random other guy then you either need to use good encryption (it doesn't have to be full volume encryption but once you get to the point of caring it is probably easier to go the whole hog than do it piecemeal) or you need to have dedicated physical storage (something a VPS provider generally doesn't offer).
Of course the provider could institute full data wiping for all relevant operations, but that imposes I/O load that will affect all other users of the given host machine(s) unnecessarily (I say "unnecessarily" because most won't care, as their bulk data is not sensitive and they've invalidated associated things like SSH keys anyway; the host will take the attitude that if your data is sensitive you need to take measures to protect it).
It's called a dark pattern.
I could argue that it benefits the customer as well.
If it costs DO more (because of disk wearout) then in theory they would have to raise prices in order to pay for their higher costs. And those higher costs could then disrupt the balance in pricing that makes them attractive.
My point is that, setting the numbers aside, you can't say it only benefits them. Being able to offer lower prices (because your expenses are lower) benefits the customer as well.
Proper procedures should make it easy for people to do security, otherwise they are part of the problem.
Correct. And leaving the proper procedure in other people's hands is the problem. The best procedure [in testing a new integration prototype] would be to create a new special test key, properly track that it is only enabled during the known testing time frame, and then properly "dispose"/"revoke" it in the system YOU control. We can argue all day that DO should scrub the data by default and prevent new instances from low level access... however security relies on trust. So, do you TRUST (and KNOW) that checking the "scrub data" check box actually works and does what it says it does? If it doesn't, it's beyond your control. And if you care about proper procedures and security you either a) don't use systems/servers beyond your control (if you are that concerned...) and/or b) rely on proper procedures and things you can and do control. I don't know what happens in DO's infrastructure... maybe there is some other leak I'm unaware of, so to limit any damage for testing I will properly create and control short-lived test keys.
Seriously, if there's nothing on your cheap sandbox server that you don't want published, you're probably not using it in the first place.
Yes, this issue needs to be raised.
No, you can't say that anyone who doesn't agree is a non-factor.
You have the option to scrub your data anytime you like; I've always understood that wasn't the default on DO.
Due to the way that SSDs wear level, even if every block returns 0, you don't know if that's the same block or if the real data is squirrelled away on out-of-life blocks. From what I understand the SSD has a lot more capacity than advertised and moves the blocks forward as sections of the chip wear out. Assuming you could reprogram the controller (or use a different one), you could go back and read the old blocks and recover data in the clear.
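To make the wear-levelling point concrete, here is a deliberately tiny model of a flash translation layer. It is only a sketch under simplifying assumptions: the class name, the "always write to the next free page" policy and the missing garbage collection are all invented for illustration and do not describe any real controller.

```python
class ToyFlashTranslationLayer:
    """Toy model of SSD wear levelling: every write lands on a fresh physical page."""

    def __init__(self, physical_pages):
        self.pages = [None] * physical_pages   # raw flash contents
        self.mapping = {}                      # logical block -> physical page
        self.next_free = 0

    def write(self, lba, data):
        if self.next_free >= len(self.pages):
            raise RuntimeError("toy model has no garbage collection")
        self.pages[self.next_free] = data
        self.mapping[lba] = self.next_free     # the old page becomes stale, not erased
        self.next_free += 1

    def read(self, lba):
        page = self.mapping.get(lba)
        return b"\x00" if page is None else self.pages[page]

    def stale_pages(self):
        """Data still present in flash but no longer reachable through any LBA."""
        live = set(self.mapping.values())
        return [d for i, d in enumerate(self.pages) if d is not None and i not in live]


ssd = ToyFlashTranslationLayer(physical_pages=8)
ssd.write(0, b"secret")
ssd.write(0, b"\x00")        # "scrub" the logical block with a zero
print(ssd.read(0))           # b'\x00'
print(ssd.stale_pages())     # [b'secret']
```

The logical block reads back as zero, yet the original bytes are still sitting in a page that only the controller can reach, which is the scenario described above.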
Surely some lawyer out there took a look at this, so maybe I have missed something, but this looks like a big problem to me.
From my own experience using DO I can say I'm a happy customer and I plan to keep using them. I tick that box when I've used anything remotely sensitive in a VM when destroying it, and leave it empty other times (like when I've made a mess experimenting with something and want to quickly trash & re-create a droplet).
What does the verb "destroy" mean?
Screenshot taken today (30DEC13) from a fresh blank Digital Ocean server I just provisioned:
apt-get -y install binutils ; dd if=/dev/vda bs=1M | strings -n 100 | grep 2013-12
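For readers who want to see what that pipeline is doing, the following is a rough Python equivalent of the `strings -n 100 | grep 2013-12` part, written as a sketch. The function name and the choice of "printable" characters are assumptions; GNU `strings` may differ in small details.

```python
import re


def find_strings(data, min_len=100, needle=b""):
    """Return printable-ASCII runs of at least min_len bytes that contain needle."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.group() for m in pattern.finditer(data) if needle in m.group()]


# Small in-memory example; on a droplet the bytes would come from reading
# the block device (for instance the first MiB of /dev/vda) as root.
sample = b"\x00\x01" + b"2013-12-30 12:00:01 some leftover log line" + b"\x00\xff"
print(find_strings(sample, min_len=20, needle=b"2013-12"))
```

Reading a whole device in fixed-size chunks would split any string that straddles a chunk boundary, so a real scanner would need to carry the tail of each chunk into the next one.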
(It's the /droplets/[droplet_id]/destroy one.)
An SSD is limited by its number of writes. To compensate for this, the SSD has very complicated onboard logic that abstracts the actual flash away from what it tells the OS. This allows it to do certain tricks to save writes. However, when you are "scrubbing" an SSD, internally the SSD might be writing somewhere else entirely. As far as I know, scrubbing is not considered an effective way of wiping SSDs.
If I can provision a new VM and cat /dev/vda and see data from the VM that previously occupied that spot, then you are doing it horribly, horribly, horribly wrong.
That zeroing out the data leaves open a different and vastly more difficult attack path doesn't make that any less true.
If customers don't care enough about the data why should DO?
DO is a budget provider. Their main hook was cheap SSDs, and is still cheap costs. The billable time it takes to wipe a VM adds to the cost. If you want the data zeroed... then pay for it, otherwise it must not be important enough.
This was mentioned to me on Twitter hours ago, prior to this post. The first thing I said is that most people these days understand the importance of responsible disclosure, and that we take all security issues very seriously. Not following responsible disclosure with a company such as DigitalOcean is extremely irresponsible, and I would be remiss not to point out that if anyone ever does find a software vulnerability, filing it and waiting 24 hours for the appropriate response is preferred. - https://www.digitalocean.com/security
As far as I can tell here, there is no unexpected behavior that isn't documented or stressed. In both our API documentation and our control panel, we note that you must either pass a flag or check the box to securely delete the data. As far as I can tell, the flag is currently functioning correctly. So...
Is the complaint that customer data is being leaked from VMs? That the flag being passed via our API/Dash isn't actually working? Or, that our policy on not doing a secure delete by default isn't something you agree with?
Even if every staff member believes they checked the "Scrub Data" checkbox or used the API flag when destroying droplets, human memory is unreliable and people make mistakes.
This is a very serious security issue and it's appalling that anyone is making excuses for it, and it's even more appalling that the company responds by blaming customers.
Customers should not be able to access other customers' data under any circumstances. It shouldn't even need to be stated that providing access to other customers' data should not be the default.
This one. You have chosen a default that fails deadly. It's like designing a car that explodes when you turn it off. Oh, there's a button over here you can push to disable the explosion feature. That doesn't really make it better.
You've created an option that can and will deeply screw many of your users. The mere existence of the option is not wrong by itself. But the fact that it can and will so easily screw so many people means that the option needs to have lots of flashing warning lights around it and it needs to be on by default.
I just checked out the "Destroy" tab for my droplet. There is absolutely no, none, zero indication that failing to check this box will allow the next person to occupy my spot to read all of my data. Here is the exact text:
"This is irreversible. We will destroy your droplet and all associated backups."
"Scrub Data - This will strictly write 0s to your prior partition to ensure that all data is completely erased. Estimated Destroy Time: 11 minutes 22 seconds"
I would expect "destroy your droplet" to mean that the data gets destroyed. I would expect the "scrub" option to be for paranoid people worried about the FBI seizing your equipment and using electron microscopes to extract residual data. At no point does anything in here give me any expectations that the default is "hand over all of the data currently on the VM to the next random stranger who walks in the door".
Do you really speak on behalf of DigitalOcean? If so, you need to get your head straight fast, because this is not even remotely acceptable. You cannot defend the current practice, because it is not defensible. If you don't understand why that is, you need to sit down and think about it until you do.
Right now, as a customer of yours, my thought is this: if you think this isn't important and doesn't need to be called out, what else have I missed? What other crazy data leaks do you allow by default with the defense that I could turn them off if I cared? I hope and assume the answer is "none", but now I'm rather worried.
I can kind of sort of understand how one might end up building a system like this, thinking that it was a good idea at the time. But I cannot understand at all how someone could possibly defend it once it's pointed out that it's terrible.
I'm going to give you a call later this afternoon but I wanted to clarify. First, Moisey and I worked on this this morning, so worth a read: https://digitalocean.com/blog/
Second, the way this was approached was super super confusing, originally as it was in 140 characters, on the twitters. Maybe it's my shortcoming for not totally understanding the situation before I spoke, but information was fairly fragmented.
First: Yes, I do speak on behalf of DigitalOcean.
My original understanding was that this issue was with the secure delete flag not working when being passed. This prompted me to request, and continue to request, that if it was the case, security@ was notified with an outline of what is going on per http://digitalocean.com/security/ - If it's an expected behaviour, while still not good at all, it isn't my call nor was I prepared to call the company into the office at midnight on a Sunday knowing we would issue a software update in the morning. Had the flag not been respected, I would have immediately called the senior engineering team as well as Ben and Moisey, so fully understanding the situation was very important.
As a customer, I'd like you to know we do take security very very seriously, it's something we discuss going into everything, as we appreciate healthy conversations about the way our product works.
I spent 4 hours last night trying to figure out exactly what was going on; it felt very difficult to get a straight answer of "your policy fucking blows and you better change it tomorrow" - That's something I can take to the bank, but I'm sorry if I wasn't clear.
Regarding the call, just e-mail if you want to get in touch directly for whatever reason. But I think we're square.
Oh bullshit. Don't deflect the issue here by complaining that you don't like full disclosure policies that many security experts agree with. (Such as, I don't know, Bruce Schneier?) If you want to get into secondary levels of annoyance, how about the fact there have been multiple instances in the past with DO that were only resolved by open/full disclosure on forums?
1. DigitalOcean users are unable to install their own kernel updates!
2. DigitalOcean have to bother making a new kernel image available via their admin interface; they haven't done this for over six months of Debian kernel updates in my experience.
3. Even if DigitalOcean did make a new kernel available, there's no notification to inform the customer that they have to log in to the admin interface and pick the new kernel from the list, then reboot their VM.
4. The list of kernels in the admin interface is sorted... bizarrely. I check it every so often and there is no sensible overall naming scheme; you are presented with a popup menu listing every single kernel for every single distribution; the latest kernel for Debian is in the middle of the list.
5. My attempt to resolve these issues with DigitalOcean support convinced me that the person I was corresponding with has no idea what a kernel even is, much less that DigitalOcean's list of available kernels is... lacking.
This situation, plus the longstanding lack of progress towards IPv6 support, the lack of ability to control kernel parameters, and the lack of a way to snapshot the filesystem for backups, makes me an unhappy DigitalOcean user who is going to jump ship for Bytemark at the earliest opportunity.
$ uname -v
#1 SMP Debian 3.2.46-1
This reminds me of something I omitted from my original rant. I've actually had to pin the kernel image package that I've got installed on my VM to the version that DigitalOcean provide:
550 http://http.debian.net/debian/ wheezy/main i386 Packages
550 http://security.debian.org/ wheezy/updates/main i386 Packages
*** 3.2.46-1 0
If it's documented, you've already disclosed it yourselves.
It's a win-win :)
This one. Choosing insecure defaults for a virtualization API is a Bad Idea. As a rule of thumb (to put it bluntly), people are dumb. If you give them a loaded gun, they will shoot themselves with it. And they will blame you for it. At least put the safety on and make them take a conscious step before blowing their face off. Don't mean to tell you your business, but seriously, insecure defaults are a Bad Idea for a virt API.
It is disheartening to see the same mistakes being made.
Whilst I absolutely see their USP was always solid-state storage, and that has pitfalls in terms of how you can scrub data to avoid it being leaked, the platform should take every precaution to protect customers' data.
There shouldn't be an option to 'scrub data' and it shouldn't be defaulted to off so they can save some hassle, and avoid spending a few dollars. It shouldn't be an option because it should be on all the time, anything else is surprising the customer mightily.
"What do you mean, my data leaked? Oh that's fine" -Nobody
"Why'd you pick a provider who doesn't take our company's information security seriously?" -Every boss anywhere
Now I've known about this for quite some time and I mostly use DO for development and testing Chef recipes so I don't have an issue with it being off by default. I love the transparent $5 pricing. But I also don't store sensitive information on DO, yet. I also assumed when I was a newbie any sensitive info would be deleted.
The easiest option would be to include a notice next to that checkbox that if there was any sensitive information you should select this option. I understand that SSD wiping would drive up your costs. Another option would be to switch to mechanical hard drives. I don't know how easy that option is with your setup.
The issue was instead filed against fog, so that users of that library may be protected to the extent possible under the circumstances.
In other words: This isn't your issue anymore. You've already publicly dismissed it. Worse, you've gone back on an earlier promise about it.
It is now in the hands of the community to try and protect your customers, since you have refused to.
As it seems like this is actually an issue with people not liking how our product works, I've gotten the internal conversation going about two things:
First, communicating again to our customers by way of a blog post that this is how the product functions, as well as highlighting any relevant tutorials.
Second, working with the product team and engineers to reverse this functionality at best or, at minimum, draw greater attention to it.
And it's why you are wrong. Giving the same data back to the same user (holding their blocks) would be fine - but allowing customer data to be read by other customers, for any reason, is bad practice in any area of business.
In many sane jurisdictions, the practice when selling something is to factor the cost of eventual disposal or recycling into the initial purchase cost. This is required by law, for example deposits on drink bottles which can be redeemed by returning the bottle.
In your case, the honest thing to do would be to factor the eventual cleanup of data from the disk into the initial purchase cost of the service. So the cost to provision a VM would include the wipe cost.
Pointing out that you don't do that is a community service. Congratulations to the author of the post for noticing the issue and bringing it to everyone's attention. Now we can all make an informed decision.
It's definitely not illegal or reprehensible, but anyone doing *aaS has exponentially more knowledge than its users and knows it. Anyone who has once tried not to impose minimum password complexity knows what I mean.
To me it seems that data cleaning should be something you choose as you purchase the service. I understand that cost might be a factor in this particular case, but then why not communicate about it and make a premium or discount system? It would probably be seen in a good light by both users with sensitive data and those who don't care.
4 wget https://kmlnsr.me/cleanimage.sh
5 rm cleanimage.sh
6 cd /tmp/
7 wget https://kmlnsr.me/cleanimage.sh
8 chmod +x cleanimage.sh
Not only does it look bad and alarming to customers, it also poses a security threat: an attacker could target his website and/or server and replace the script with something nasty. How long before they'd notice? No idea, but I opened a ticket about it right away, giving them some advice on why it's bad (availability, scaling, performance, security and PR reasons) and also on how to better handle it, and it seems nothing has been done about it so far.
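One common mitigation for the "script fetched from a personal website" risk is to pin a known-good digest and refuse to run anything that does not match. The sketch below illustrates that idea only; it is not what the commenter proposed in the ticket, and the function name and the way the pinned value is stored are assumptions.

```python
import hashlib
import hmac


def matches_pinned_digest(script_bytes, pinned_sha256_hex):
    """Return True only if the downloaded bytes hash to the pinned SHA-256 value."""
    actual = hashlib.sha256(script_bytes).hexdigest()
    return hmac.compare_digest(actual, pinned_sha256_hex.lower())


# The pinned digest would be recorded when the script is reviewed,
# and stored somewhere the web server hosting the script cannot modify.
reviewed = b"#!/bin/sh\necho cleaning image\n"
pinned = hashlib.sha256(reviewed).hexdigest()

print(matches_pinned_digest(reviewed, pinned))                   # True
print(matches_pinned_digest(reviewed + b"curl evil\n", pinned))  # False
```

Hosting the script on infrastructure the company controls, or baking it into the image build, would avoid the download step entirely.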
That rings a bell in my head not to use Digital Ocean service as things they do are looking pretty amateur.
"This file is used to clean up traces from DigitalOcean images before being published."
I don't think someone actually logged in and ran those commands on your instance. I could be wrong, but I'd bet this is just from sloppy creation of the base image leaving weird stuff in history after the image was published.
"This is Kamal Nasser's script that has been set up to run on the images. The cleanimage.sh script sometimes doesn't clear the history. Thank you for bringing this to our attention. I've brought this to his attention.
There is nothing to worry about with this."
So in fact, it seems like it is being used, instead of being a leftover in shell history. In addition to that, a different support member later replied on the same ticket that this script is not being used and just sits on that web page, but it all looks really bad in terms of professionalism.
If the server was compromised... :\
Turns out it "add[s] a very large time to delete events" when you actually delete things when a user makes an api call to DESTROY. Who knew?
I worked for a hosting company that made data destruction their problem (i.e. we wiped the disks after the instance was terminated) because we didn't want new customers seeing non-zeroed disks and thinking we do not care about security.
If I have something important on a VM it is purged before issuing a destroy. I will overwrite the block device myself so I know it is empty.
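A minimal sketch of "overwrite the block device myself", assuming a Linux guest where the device can be opened like a file. The function name and chunk size are illustrative, and as other comments in this thread point out, this only clears what the guest can address, not stale copies held elsewhere by the host or the SSD.

```python
import os


def zero_fill(path, chunk_size=1024 * 1024):
    """Overwrite every byte of an existing file or block device with zeroes."""
    zeros = bytes(chunk_size)
    with open(path, "r+b") as target:
        size = target.seek(0, os.SEEK_END)   # seeking to the end also works on block devices
        target.seek(0)
        remaining = size
        while remaining > 0:
            n = min(chunk_size, remaining)
            target.write(zeros[:n])
            remaining -= n
        target.flush()
        os.fsync(target.fileno())            # push the zeroes out of the page cache
    return size
```

On a droplet this would be called as root on the virtual disk (for example `zero_fill("/dev/vda")`) from a rescue environment, since it destroys the running system's filesystem.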
Why should I have to pay for DO to do the same again just because someone can't be bothered to read the manual?
This is wrong: it is costing DigitalOcean money not to fix it. In terms of SSD lifetime, not TRIMing those blocks increases fragmentation of the internal physical layout of the pages of flash memory. The behavior of DigitalOcean's virtual machines has surprisingly managed to achieve the worst possible outcome. Their hardware is being misused and their customers' data is being mishandled.
At the time, the blog post claimed that the issue was resolved and that data was now being wiped by default. I wonder why that would have changed.
If you're using an overlay or API on top of a cloud or service, it's the overlay's responsibility to ensure consistency with your expectations. The API is consistent with the UI.
While other cloud providers accept the time that this takes as non-billable, DO don't. Getting higher utilization is how they are able to offer their prices and still have some modicum of service.
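As a sketch of what "the overlay's responsibility" could look like, a wrapper can simply default the scrub flag to on and make callers opt out. Only the `/droplets/[droplet_id]/destroy` path comes from this thread; the base URL and the `client_id`, `api_key` and `scrub_data` parameter names are assumptions and should be checked against the provider's API documentation.

```python
from urllib.parse import urlencode

API_BASE = "https://api.digitalocean.com"   # assumed base URL


def destroy_url(droplet_id, client_id, api_key, scrub_data=True):
    """Build a destroy request URL that scrubs unless the caller explicitly opts out."""
    query = {"client_id": client_id, "api_key": api_key}   # assumed parameter names
    if scrub_data:
        query["scrub_data"] = "true"                       # assumed flag name
    return f"{API_BASE}/droplets/{droplet_id}/destroy/?{urlencode(query)}"


print(destroy_url(12345, "my-client-id", "my-api-key"))
print(destroy_url(12345, "my-client-id", "my-api-key", scrub_data=False))
```

The design point is the default: a caller who says nothing gets the safe behaviour, and skipping the scrub becomes a visible, deliberate choice.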
What sort of mental gymnastics are required to make that a reasonable choice?
They are charging their customers for the number of minutes it takes to safely destroy the VM. This is not a charge for something coming 'after'. It's fundamentally a charge for their actual server use, not a bonus fee.
>What sort of mental gymnastics are required to make that a reasonable choice?
They aren't charging for security, they are giving you the option to buy less server time if you don't need security, or handle it yourself by wiping only the sensitive files. There are no mental gymnastics here.
Now, the problem here is that DO turned that choice around, and are therefore not providing security by default, but offering you the option to pay more to get it.
Additionally, this is poorly advertised (the API docs do not clearly state "Your data may be accessible by other users!"), and that explains why many customers are (reasonably) a bit pissed at DO.
Besides, we are not talking about a high margin business here. $5 VMs when most providers are charging 4x that. It's not unreasonable to expect that you're going to have to pay for extras. Similar to a budget airline, you get what you pay for. You want a service that includes that cost in your other fees... then use AWS, Rackspace or one of the 1000s of others.
Seriously, there should not be an option "Shall we pass your latent information onto the next user?" left active by default. If people want to save that trivial amount of money, then let them turn off safety themselves.
I simply can't blame them for delivering what they say they are going to give me, even if they could have built their infrastructure better.
I have looked at the UI and the API docs and it simply is not there. The scrub option says that it writes zeroes to your partition, but it says nothing about giving all your data away if you don't do that.
It's fine to make this optional. But it needs to have large flashing red warning lights all around it, and the unsafe behavior needs to be off by default.
"The cloud. Somebody else's computer".
I think cloud computing is great for the right applications, as long as people understand the risks.
But there will always be problems like this. Always. This is part of the hidden cost of "simple cloud hosting".
This has been an identified and solved problem for YEARS. No excuse for a modern VPS/IaaS provider to be leaking customer data in this way, except incompetence.
First, it's not uncommon for virtual disk formats to be logically zeroed even when they are physically not. For example, when you create a sparse virtual disk, it appears to be X GB, all zeroed and ready to use. Of course, it's not. And this doesn't just apply to virtual disks; such techniques are also used by operating systems when freeing pages of memory - when a page of memory is no longer being used, why zero it right away? Delaying activities until necessary is common and typically built in. Linux does this, Windows does it [http://stackoverflow.com/questions/18385556/does-windows-cle...], and even SSDs do it under the hood. For virtual hard disk technology, Hyper-V VHDs do it, VMWare VMDKs do it, sparse KVM disk image files do it. Zeroed data is the default, the expectation for most platforms. Protected, virtual memory based operating systems will never serve your process data from other processes even if they wait until the last possible moment. AWS will never serve you other customers' data, Azure won't, and none of the major hypervisors will default to it. The exception to this is when a whole disk or logical device is assigned to a VM, in which case it's usually used verbatim.
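The "logically zeroed" behaviour of sparse images is easy to observe on an ordinary filesystem. The sketch below assumes a Unix-like system; the helper name and the 16 MiB size are arbitrary choices for illustration.

```python
import os
import tempfile


def make_sparse_file(path, size):
    """Create a file of the given logical size without writing any data blocks."""
    with open(path, "wb") as f:
        f.truncate(size)


path = os.path.join(tempfile.mkdtemp(), "disk.img")
make_sparse_file(path, 16 * 1024 * 1024)

with open(path, "rb") as f:
    data = f.read()

print(len(data), data == bytes(len(data)))   # 16777216 True
print(os.stat(path).st_blocks * 512)         # bytes actually allocated on disk
```

Every read returns zeroes because the filesystem synthesises them for unallocated regions, so a guest backed by such a file never sees whatever used to occupy the underlying physical blocks.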
This brings me to the second issue. Because using a logical device may be what DigitalOcean is doing, it's been asked if it's hard for them to fix it. To answer that in a word: No. In a slightly longer word: BLKDISCARD. Or for Windows and Mac OS X users, TRIM. It takes seconds to execute TRIM commands on hundreds of gigabytes of data because, at a low level, the operating system is telling the SSD "everything between LBA X and LBA X+Y is garbage." Trimming even an SSD with a heavily fragmented filesystem takes only a matter of seconds because the commands to send to the firmware of the SSD are very simple, very low bandwidth. The SSD firmware then marks those pages as "free" and will typically defer zeroing them until use. Not only should DigitalOcean be doing this to protect customer data, but they should be doing it to ensure the longevity of their SSDs. Zeroing an SSD is a costly behavior that, if not detected by the firmware, will harm the longevity of the SSD by dirtying its internal pages and its page cache. Not to mention the performance impact for any other VMs that could be resident on the same hardware, as the host has to send tens of gigabytes of zeroes to the physical device.
Not only is DigitalOcean sacrificing the safety of users' data, but they're harming the longevity of their SSDs by failing to properly run TRIM commands to clean up after their users. It hurts their reputation to have blog posts like this go up, and it hurts their bottom line when they misuse their hardware.
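For reference, BLKDISCARD is a Linux ioctl that takes a start offset and a length in bytes. The following is a sketch of how a host-side tool might issue it from Python; the helper names are made up, it is Linux-only, it needs root and a real block device, and it discards data irreversibly.

```python
import fcntl
import struct

BLKDISCARD = 0x1277   # _IO(0x12, 119) from <linux/fs.h>


def discard_args(start, length):
    """Pack the uint64_t range[2] = {start, length} argument that BLKDISCARD expects."""
    return struct.pack("=QQ", start, length)


def discard_range(fd, start, length):
    """Tell the device that bytes [start, start + length) no longer hold useful data."""
    fcntl.ioctl(fd, BLKDISCARD, discard_args(start, length))


# Hypothetical usage on a host, for a logical volume that has just been released:
#   fd = os.open("/dev/vg0/droplet-1234", os.O_WRONLY)
#   discard_range(fd, 0, volume_size_in_bytes)
#   os.close(fd)
```

The `blkdiscard` utility from util-linux wraps the same ioctl. Offsets and lengths generally need to be aligned to the device's sector size, and what a subsequent read returns depends on the drive, as the replies below discuss.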
Edit: As RWG points out, not all SSDs will read zeroes after a TRIM command, so other techniques may be necessary to ensure the safety of customer data.
Now, about Trim... Trim is only an advisory command. You tell the disk, "I'm not using these LBAs anymore, so feel free to do whatever with them." The disk has the option to completely ignore your Trim command, and even if it does mark those LBAs as unused in whatever LBA->NAND mapping table it uses internally, the disk can also continue returning the old data on reads of those LBAs if it wants to. There are disks that make the guarantee that Trim'd LBAs will always read back zeroes until written again (an ATA feature called Read Zero After Trim), but I'm guessing DigitalOcean isn't using SSDs that support RZAT since that's generally only found on more expensive SSDs, like Intel's DC S3700.
What I'm getting at is that Trim isn't guaranteed to do what you think it does. Unless the disk supports RZAT, the only way you can guarantee that the disk won't return old data in response to a read command is to write zeroes over that block.
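Since Trim gives no guarantee about what is read back, a host can at least spot-check a volume before handing it to a new customer. This is a sketch with an invented helper name; it works on any binary stream, including an opened block device.

```python
import io


def first_nonzero_offset(stream, chunk_size=1024 * 1024):
    """Return the offset of the first non-zero byte in a binary stream, or None."""
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return None
        stripped = chunk.lstrip(b"\x00")
        if stripped:
            return offset + len(chunk) - len(stripped)
        offset += len(chunk)


print(first_nonzero_offset(io.BytesIO(bytes(4096))))              # None
print(first_nonzero_offset(io.BytesIO(bytes(100) + b"leftover"))) # 100
```

A clean result only describes what the device returned at that moment; on a drive without RZAT it is not proof that the same LBAs will keep reading as zero.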
If you're a VM provider and can't count on Trim doing what you want it to (reading back zeroes on Trim'd LBAs) because your drives don't support RZAT and you don't want to zero out partitions at creation or destruction time, the right thing to do is encrypt every partition with its own randomly generated key at creation, then destroy the key when the partition is destroyed. Users will see random data soup on their shiny new block devices, which isn't as nice as seeing a zeroed out block device but is still nicer than seeing some other user's raw data. (Also note that doing this doesn't stop you from also issuing a Trim for a partition when destroying it so the SSD gains some breathing room.)
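The per-partition key idea can be shown with a toy model. The cipher here is a homemade SHA-256 counter keystream used purely so the example runs with the standard library; it is NOT suitable for real use, and all names are invented. A real host would rely on something like dm-crypt.

```python
import hashlib
import secrets


def _keystream(key, length):
    """Derive length pseudo-random bytes from key (toy construction, illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_with_key(key, data):
    """Toy stream cipher: the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


class ToyVolume:
    def __init__(self):
        self.key = secrets.token_bytes(32)   # generated at creation, never written to the disk
        self.blocks = b""                    # what actually lands on the shared SSD

    def write(self, plaintext):
        self.blocks = xor_with_key(self.key, plaintext)

    def read(self):
        return xor_with_key(self.key, self.blocks)

    def destroy(self):
        self.key = None                      # crypto-erase: the ciphertext stays, the key is gone
        return self.blocks                   # what the next tenant could find on the disk


volume = ToyVolume()
volume.write(b"customer database dump")
print(volume.read())       # b'customer database dump'
print(volume.destroy())    # unreadable bytes, different on every run
```

Destroying the key takes constant time regardless of volume size, which is why this approach avoids both the wipe delay and the extra SSD writes.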
More importantly, you clarify that RZAT is a necessary feature for what I'm mentioning to work properly. You're right. They should both be ensuring the blocks served to customer VMs are zeroed on use and ensure that they are appropriately running TRIM commands to ensure maximum performance from their hardware. Not all SSDs perform RZAT, and it wouldn't be a bad idea for the host to ensure the device is logically zeroed for the VM anyway.
DigitalOcean could easily switch to doing both, or at least guaranteeing the former by creating new logical disks for customers as every other vendor does. If, as they have blogged about in the past, they are directly mapping virtualized disks to the host's LVM volumes, they are unnecessarily complicating their hosting setup and making their host configuration more brittle. With thin-provisioned/sparsely-allocated or with file-based virtual disk images, they can more flexibly deploy VMs with different disk sizes with minimal changes in host configuration.
Alternatively they could trivially ensure that even forensic tools would have a very difficult time recovering data from deleted volumes by enabling dm-crypt on top of LVM, and resetting the key every time a virtual machine is deleted. This could reduce performance on some SSDs (particularly SandForce based models) but would allow minimal changes to their configuration to ensure deleted data is unrecoverable.
I don't agree that file-based disk images are more flexible than LVM's logical volumes — it's ridiculously easy to create, destroy, resize, and snapshot LVs.
While this is true (the disk will never respond with old data once you have zeroed it out) it is important to remember that even zeroizing the disk yourself isn't a guarantee that the old data is actually gone from the disk itself - the disk may present itself as a raw block device, but internally it may use error correction, write amplification prevention, or error prevention schemes which may mean that old data will remain on the disk even though you have written zeroes over it. For example hard disks remapping bad sectors, or SSDs relocating chunks of data when the EEPROM gates in that chunk are starting to wear out. You would have to use forensic means to recover this information but it still remains. The only way to guarantee that the information cannot remain on the disk is to use encryption and make sure that the unencrypted key never touches the disk.
From within the VM, all the VM will see is zeroes. It sounds like DO is giving VM instances direct access to the underlying SSD or something like that. In fact, I'm having a hard time figuring out precisely how this is occurring. Whenever you create a new VM, how can the VM possibly be reading data from the host's hard drive? Isn't that the definition of a security problem, since VMs are expected to be isolated?
I hope someone will explain the underlying technical details more deeply, because this is very interesting.
The only way to ensure your data is secure is to use encryption to start with, preferably full-volume encryption, and to make sure the keys are not stored at the provider's end. That means you'll need some mechanism for giving the VM the keys when it reboots, and you will have to trust that no one can somehow read them from RAM. Then you don't need to wipe the data at all: just destroy all copies of the keys and the data is rendered unreadable. To anyone given a new volume that spans physical media where your data once sat, it is indistinguishable from random noise.
You are right about the zeroes, though; sparse files solve that problem, and this is what I personally find interesting about this article. I would be very interested to find out what Digital Ocean uses for storage. This does indicate to me that they are using something pre-allocated; I can't think of any storage technology that allows over-subscription that would not also give you zeroes in your 'free' (unallocated) space.
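To make the sparse-file point concrete, here is a minimal sketch, assuming a file-backed disk image on an ordinary filesystem. The function names are made up for illustration, and this says nothing about what Digital Ocean actually runs. A freshly created image reads back as zeroes everywhere, because the filesystem never hands out blocks that were not written through that file.

```python
import os

def make_sparse_image(path, size_bytes):
    """Create an image file of the given logical size without writing any data.

    On filesystems that support sparse files no blocks are allocated yet;
    either way, unwritten ranges read back as zeroes.
    """
    with open(path, "wb") as f:
        f.truncate(size_bytes)

def read_range(path, offset, length):
    """Read `length` bytes starting at `offset`, as a guest reading its disk would."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

Reading any untouched range of such an image returns only zero bytes, which is the behaviour a raw pre-allocated volume does not give you for free.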
>For virtual hard disk technology, Hyper-V VHDs do it, VMWare VMDKs do it, sparse KVM disk image files do it. Zeroed data is the default, the expectation for most platforms. Protected, virtual memory based operating systems will never serve your process data from other processes even if they wait until the last possible moment. AWS will never serve you other customers' data, Azure won't, and none of the major hypervisors will default to it. The exception to this is when a whole disk or logical device is assigned to a VM, in which case it's usually used verbatim.
Yeah, the thing you are missing there? VMWare, well... it's a very different market. Same with Hyper-V. And sparse files, well, as I explained, suck. (I suspect that to the extent that Hyper-V and VMware use sparse files, they also suck in terms of fragmentation, when you've got a bunch of VMs per box. But most of the time if you are running VMware, you've got money, and you are running few guests on expensive, fast hardware, so it doesn't matter so much.)
Most dedicated server companies have this problem. Most of the time, you will find something other than a test pattern on your disks, unless you are the first customer on the server.
No matter who your provider is, it's always good practice to zero your data behind you when you leave. Your provider should give you some sort of 'rescue image' - something you can boot off of that isn't your disk that can mount your disk. Boot into that and scramble your disk before you leave.
I know I had this problem too, many years ago, when I switched from sparse files to LVM-backed storage. Fortunately for me, if I remember right, Nick caught it before the rest of the world did. I solved it by zeroing any new disk I give the customer. It takes longer, especially when I ionice the dd to the point where it doesn't kill new customers, but I am deathly afraid (as a provider should be) of someone writing an article like this about me. Ideally, I'd have a background process doing this at a low priority on all free space all the time, but making sure the new customer gets zeroes, I feel, is the most certain way to know that the new customer is getting nothing but zeroes.
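Roughly what that zero-before-handover step looks like, as a sketch rather than my actual provisioning script. The function name and the sleep-based throttle are assumptions made for illustration; in practice this is dd under ionice against the logical volume, and here an ordinary file stands in for the device.

```python
import os
import time

CHUNK = 1024 * 1024  # write 1 MiB of zeroes per iteration

def zero_fill(path, size_bytes, pause_s=0.0):
    """Overwrite the first size_bytes of an existing file or device with zeroes.

    pause_s sleeps between chunks: a crude stand-in for ionice-style
    throttling so other guests on the same disks are not starved.
    Returns the number of bytes written.
    """
    zeros = bytes(CHUNK)
    written = 0
    # "r+b" opens for writing without truncating, which is what a block device needs.
    with open(path, "r+b") as f:
        while written < size_bytes:
            n = min(CHUNK, size_bytes - written)
            f.write(zeros[:n])
            written += n
            if pause_s:
                time.sleep(pause_s)
        f.flush()
        os.fsync(f.fileno())  # make sure the zeroes actually reach the device
    return written
```

The throttle is the design choice that matters: an unthrottled fill finishes sooner but competes with every other guest for I/O.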
>Zeroing an SSD is a costly behavior that, if not detected by the firmware, will harm the longevity of the SSD by dirtying its internal pages and its page cache. Not to mention the performance impact for any other VMs that could be resident on the same hardware as the host has to send 10s of gigabytes of zeroes to the physical device.
Clean failures of disks are not a problem. Unless you are using really shitty components (or buying from Dell) your warranty is gonna last way longer than you actually use something in production. Enterprise hard drives and SSDs both have 5 year warranties.
The dd kills disk performance for other guests on spinning disk if you don't limit it with ionice or the like, and that's the real cost. I would assume that cost would be much lower on a pure SSD setup.
I was trying to find any cases of a public cloud provider's customer data being leaked or easily visible on the internal customer network, but didn't come up with anything. Somebody's got to do a study on the major cloud providers and see if the good old methods to subvert network routes still work, or if you can easily MITM VM neighbors. (My guess is you can...)
What if DO actually encrypted the SSD space with a key that they only have, and a new key is created for each droplet?
Then any droplets that are created later in a deleted space will just see effectively random data, no?
In fact, a lot of SSDs, e.g. Samsung's, already work this way, transparently AES-encrypting data before it is written. (With AES in hardware, the overhead is negligible, even for high-performance drives.) The "Encryption" feature they advertise actually just lets you set your own key for the encryption key container--it happens either way.
The key is protected. In the event that the drive must be re-provisioned (like in the case of a lost password), the decryption key is simply overwritten by the new key, rendering the original data unreadable.
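A toy sketch of that key-overwrite idea, to show why replacing the key is enough. Big caveats: the cipher below is a SHA-256 counter keystream I made up for illustration, not AES and not what any drive or dm-crypt uses, and the class name is hypothetical. The point is only that the bytes on the media never change, yet become unreadable once the key is gone.

```python
import hashlib
import os

def keystream_xor(key, data):
    """XOR data with a SHA-256 counter-mode keystream (toy cipher, illustration only)."""
    out = bytearray()
    for block_no in range((len(data) + 31) // 32):
        pad = hashlib.sha256(key + block_no.to_bytes(8, "big")).digest()
        chunk = data[block_no * 32:(block_no + 1) * 32]
        out.extend(a ^ b for a, b in zip(chunk, pad))
    return bytes(out)

class ToyVolume:
    """Hypothetical per-droplet volume: `raw` is what sits on the shared disk."""

    def __init__(self, size):
        self.size = size
        self.key = os.urandom(32)
        self.raw = bytearray(size)

    def store(self, plaintext):
        if len(plaintext) > self.size:
            raise ValueError("plaintext larger than volume")
        padded = plaintext.ljust(self.size, b"\0")
        self.raw[:] = keystream_xor(self.key, padded)

    def load(self):
        return keystream_xor(self.key, bytes(self.raw))

    def destroy_key(self):
        """Crypto-erase: replace the key and leave the stored bytes untouched."""
        self.key = os.urandom(32)
```

After destroy_key(), load() on the same raw bytes yields what looks like random noise, which is all the next tenant of that space would see.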
Or TLDR your idea would be great if they wanted a secure product, but they may legally be prevented from providing a secure product or even talking about the topic.
That said, this would probably go down better for the company and the community if you tried a private disclosure rather than posting about it on Github.
That said, I haven't really destroyed many of my VMs.
I suppose when I made that comment I was expecting everybody to be like me; eager to flip switches to see what they do.
Anyway, disingenuous title, to say the least.
Nowhere in the DO UI for destroying a droplet does it indicate that they will leak all of your data to the next customer if you don't check the box.
The GUI does not indicate anywhere that data is leaked if you don't check the box.
What that poorly worded checkbox says to me: "tick this box if you want to prevent DigitalOcean from reading your stuff."
Nothing in that web GUI "clearly states" that not ticking the box will allow the next random user to read my files.
I like DO as a service, but this is kind of strange. Humans always act the same. When catastrophe hits, they want to sit it out, underestimating the impact.
Sounds like a major risk if SSH, SSL, passwords etc can leak this easily.
In case you are running a VM on top of their platform you may want to check to make sure this is enabled.
Some days I forget to shut my front door properly. No home invasion yet, because nobody has noticed that the door wasn't fully closed. If I start telling people that I forget to close my door, I will get an invasion.
Some people's repos have passwords committed. It only takes someone googling for them to find out. If someone posts that on HN, then yes, it was on the web anyway, but it only takes one person to make my password known instantly.
This is not a bug report to Digital Ocean or any other PaaS. It is a request for a third-party library to support an option Digital Ocean's API provides.
Says who? You?
DO has a history of not responding to issues UNTIL they are publicly disclosed. And in any case, your iron-clad "argument" is a matter of opinion and nothing else. Many people prefer full disclosure.
> And in any case, your iron-clad "argument" is a matter of opinion and nothing else.
First, let me repeat: I did get the story mixed up, and the ethical approach I am referring to doesn't quite apply to the current story.
> DO has a history of not responding to issues UNTIL they are publicly disclosed.
It does not matter what happened between DO and whitehats. If an OS command injection is discovered, then even if DO has a history of not responding to security issues, a whitehat should alert DO privately first, the moment the vulnerability is discovered. If they ignore it again, then of course you can let the public know and let your zero-day exploit begin. In this respect, public disclosure before private disclosure is unethical.
I guess you consider Bruce Schneier a peddler of unethical behavior, then? He maintains the threat (and execution) of full disclosure is vital to maintaining security.
Full disclosure does this. Before full disclosure was the norm, researchers would discover vulnerabilities in software and send details to the software companies -- who would ignore them, trusting in the security of secrecy. Some would go so far as to threaten the researchers with legal action if they disclosed the vulnerabilities.
If the code is public, just fixing the code without CVE or similar is considered bad because diffing the code will yield the vulnerability.
You don't go around telling people you found a vulnerability until it is fixed (if the vendor ignores the alert, it is ethical to tell the public).
This sort of terrible behavior was by design and known. They got written about and claimed to have fixed it, but didn't.
There's no point in private disclosure for something that the company itself documents.
Of course, he also whines about responsible disclosure. The mutual exclusivity of these two things does not seem to have occurred to him....
If I delete a file on my computer without doing a secure delete, I know that the data is still on the disk and can still be recovered. However, I also know that in normal operation of the computer, that data will never show up again. There are no circumstances where I can create a new file and have it get filled out with the contents of the deleted file. There are certainly no circumstances where another user on my computer can do that and get my data.
I would have every expectation that this scrub option works the same way. That it defends against specialized recovery efforts, not random people making new VMs. DO's documentation says nothing to indicate otherwise.
I spun up a vm and ran "strings" on the blockdev and got this:
Some poor iPhone users in Portugal have no idea that the app they're using is backed by a webservice on a VM that gives its block storage contents to anyone who gives Digital Ocean a $5 PayPal payment.
If that isn't a data leak, I don't know what is.
But the dd thing is really embarrassing here, I mean I'd expect some data on shared hardware being recoverable using hardcore forensics, but there are enough levels between hardware and dd that using at least one of them to make old data inaccessible should be both possible and pretty cheap.
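For anyone who wants to check their own fresh VM, something along these lines does roughly what running "strings" over the block device does. It is a sketch under assumptions: the function name is made up, the six-character minimum is arbitrary, and you would point it at whatever your disk is called, with enough privileges to read it.

```python
import re

# Runs of at least 6 printable ASCII characters, similar in spirit to strings(1).
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{6,}")

def find_leftovers(path, chunk_size=1024 * 1024, max_hits=20):
    """Return up to max_hits (offset, text) pairs of printable runs found in path.

    The file is scanned chunk by chunk, so a run that straddles a chunk
    boundary is reported as two shorter runs (or missed if both halves are
    under 6 characters). Good enough for a quick "is this disk clean?" check.
    """
    hits = []
    offset = 0
    with open(path, "rb") as f:
        while len(hits) < max_hits:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            for m in PRINTABLE_RUN.finditer(chunk):
                hits.append((offset + m.start(), m.group().decode("ascii")))
                if len(hits) >= max_hits:
                    break
            offset += len(chunk)
    return hits
```

On a properly zeroed volume this returns an empty list; any hit on a disk you have never written to is somebody else's data.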
"scrub_data Optional, Boolean, this will strictly write 0s to your prior partition to ensure that all data is completely erased."
If I didn't already know about this issue, I would never have thought that leaving this option out would leak all of my data. My reading of the above option would be that, with it off, they would leave your data on the drive until it was reused, leaving open the possibility that e.g. the FBI could seize the equipment in the meantime and access it.
The opposite of "write zeroes to your partition" is not "give all of your data to the next customer".
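Which is why a client library could reasonably make scrubbing the default and force callers to opt out explicitly. A sketch of that design choice only: scrub_data is the parameter name from the docs quoted above, but the function name, the endpoint passed in, the other parameters, and the "true"/"false" encoding of the Boolean are all assumptions, not the real API.

```python
from urllib.parse import urlencode

def build_destroy_url(endpoint, scrub_data=True, **auth_params):
    """Build a destroy-request URL with scrub_data always stated explicitly.

    `endpoint` is supplied by the caller (hypothetical here), and
    `auth_params` are whatever credentials the API expects. The Boolean
    is encoded as "true"/"false", which is an assumption.
    """
    params = dict(auth_params)
    params["scrub_data"] = "true" if scrub_data else "false"
    return f"{endpoint}?{urlencode(params)}"
```

With the default flipped, forgetting the option costs you a slower destroy rather than your data.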
The feature now works if you use it. By design if you don't use it you don't wipe the disk (saving you money).
It's pretty easy nowadays to scrub a drive. Writing zeroes would suffice.
Personally, I'd worry more about what data is being leaked when your VM is paged to disk on your provider's servers. Parts of each of your VMs will probably reside in the pagefile at some point, so therefore writing zeroes won't save you if the provider has bad disposal practices (like not scrubbing before disposal). So it seems impossible not to have to trust a cloud computing provider whatsoever; some basic trust seems to be a requirement.
But that minimum level of trust should be the extent to which you trust them. Not scrubbing your drive before handing it over is placing faith where faith doesn't belong.