
DigitalHax – Allows you to recover data from "Destroyed" Digital Ocean VM - gregimba
https://github.com/gregimba/DigitalHax
======
zagi
Hi, Ben from DigitalOcean here - just to give you guys an update. This method
will no longer work on a newly created droplet.

We've now defaulted scrub_data to ON for both the web interface and the API as
we look at making this change permanent. Additionally, we've re-engineered the way
we're provisioning disks and access to previously written data is no longer
possible.

For now we've taken every step in favor of security, and we will build a
permanent solution that favors security and caution moving forward.

~~~
flexd
I am a bit surprised that I have heard absolutely nothing about this via email
(I'm a customer). From what I remember of the last security incident, I did not
get an email until well after it had become publicly known.

You guys really need to get better at communicating with your customers: I can
look at the front page of HN and see some issue with your services, with DO
people commenting on it, and still have no mail in my inbox.

The priority should be to alert customers there is a problem, and most
importantly to fix the problem.

And sending a mail a few days or a week later is really not okay; a rapid
response on your end is needed if we are to take the necessary precautions
quickly.

------
sneak
So, this is going to overwrite lots of data on the block device you're trying
to recover data from, resulting in a lot of repeated information and the
erasure of recoverable stuff. The correct answer is to redirect the output:
make find.sh emit the gzipped data so you can pipe it to your local disk,
never touching the remote end.

Edit: Here's the code.
[https://github.com/gregimba/DigitalHax/pull/1](https://github.com/gregimba/DigitalHax/pull/1)
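The shape of the pipeline sneak is proposing, sketched against a throwaway disk image (all paths and the host are placeholders, not taken from the PR):

```shell
# Sketch of the pipeline sneak describes, with /tmp/disk.img standing in for
# /dev/vda so it is safe to run. On a real droplet you would run it over SSH
# so the recovered data never touches the remote disk, e.g. (host is a
# placeholder):
#   ssh root@droplet 'dd if=/dev/vda bs=1M | strings -n 100 | gzip -c' > out.gz

# build a stand-in "disk": binary noise followed by one long printable run
{ head -c 4096 /dev/urandom; printf '%0120d' 0; } > /tmp/disk.img

# read the device, keep printable runs of 100+ characters, compress to stdout
dd if=/tmp/disk.img bs=1M 2>/dev/null | strings -n 100 | gzip -c > /tmp/recovered.gz

gzip -dc /tmp/recovered.gz | grep -c '0\{100\}'   # the long run was captured
```

Because gzip writes to stdout, nothing lands on the disk being examined.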

~~~
gregimba
Ok. I really only know enough to be dangerous; I'm 17. Looks like I have some
changes to make, though.

~~~
sneak
Stick with it! I didn't know about shell builtins like 'read' when I was 17 -
you're already ahead of millions in the game out there.

Read a lot of other people's code to see how stuff is done, it helps a lot.

------
FiloSottile
This is a cute PoC of how easy it is, but with freely available forensic
tools like, say, PhotoRec, it is possible to extract much more meaningful and
diverse data (entire files, images, database files...) than by simply running
strings.

So don't take this as the maximum damage one can do.

------
sukaka
How long does dd take? I could use an estimate. I ran dd for around 10 minutes
this morning and got 500,000 lines, and it was still running.

update: finished in around 12 minutes. out.txt is around 10 GB.

update: out.txt is around 54 million lines per wc -l out.txt. I'm using less
with the command [line number]G to poke around. I have an NYC1 droplet, and
there's a lot of junk that isn't mine: text in other languages, and Python,
which I don't use.

~~~
sillysaurus2
Thanks for the followup about how long dd takes. I was wondering as well.

May I ask, which droplet type were you running dd on? Micro?

~~~
sukaka
No problem. The command is "dd if=/dev/vda bs=1M | strings -n 100 > out.txt" in
find.sh, which is the same as the one first mentioned this morning
[https://github.com/fog/fog/issues/2525](https://github.com/fog/fog/issues/2525).
It's a $20/month droplet.

~~~
sillysaurus2
Thanks! I wonder how long it would take for someone to scrub their VM manually
using dd before terminating it? Maybe the same length of time? 12 minutes
seems a pretty reasonable amount of time, but since it's an SSD the writes
could take >3x longer than the reads.
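The manual scrub in question is just a zero-fill of the block device before destroying the droplet. A sketch, with a small file standing in for /dev/vda so the commands are safe to run as-is:

```shell
# Sketch of a manual pre-destroy scrub. On a real droplet this would be
#   dd if=/dev/zero of=/dev/vda bs=1M
# which destroys the running system, so it has to be the very last step.
# Here a 1 MiB file stands in for /dev/vda.
truncate -s 1M /tmp/vda.img                                   # stand-in disk
printf 'db_password=hunter2\n' | dd of=/tmp/vda.img conv=notrunc 2>/dev/null
strings -n 4 /tmp/vda.img        # before the scrub: the secret is readable
dd if=/dev/zero of=/tmp/vda.img bs=1M count=1 conv=notrunc 2>/dev/null
strings -n 4 /tmp/vda.img        # after the scrub: prints nothing
```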

~~~
gregimba
The other thing you could do, instead of trusting DigitalOcean, is use shred
on your sensitive files before you destroy the droplet.
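A minimal sketch of that (the filename and contents are placeholders):

```shell
# Sketch of using shred on a sensitive file before destroying the droplet.
# shred overwrites the file's blocks in place; -u unlinks it afterwards.
printf 'db_password=hunter2\n' > /tmp/secrets.env
shred -u /tmp/secrets.env
ls /tmp/secrets.env 2>/dev/null || echo "gone"
```

One caveat: the GNU coreutils documentation notes that shred relies on the filesystem overwriting data in place, so it isn't guaranteed on journaling or copy-on-write filesystems.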

~~~
sillysaurus2
That's true, but if you've already rm'd a bunch of sensitive files, their data
unfortunately can't be shredded. So you'll have to make sure you've always
used shred for everything since the beginning, which is good practice but
probably rare.

------
jamesbrownuhh
As a user of Digital Ocean (amongst others) I find it hard to get too excited
about this. When I destroy a droplet (VM) I already have the option to scrub
the discs before deletion.

If I choose not to use that (and I never have on any of the hundreds of
machines I've created and later torn down) it's because there is nothing of
any sensitivity on them. If someone wants to resurrect gigabytes of entirely
boring and transient log data from what I was last doing, they're welcome to!

I can only really see this being a concern for people who were storing
sensitive information on a cloud instance which they then removed and chose
NOT to scrub. In which case, they already have larger issues than this one.
"Problem with user, not with cloud."

~~~
bradleyland
If you're in the business of designing systems for users, I hope you'll
reconsider your viewpoint. I'm a Rails developer, and the attitude you've
adopted is very familiar to me. It's a kind of "trust your users" attitude
that was prominent in the Rails community for a long time. Unfortunately, that
led to several very, very ugly security issues.

What you're saying is that insecure defaults are OK, so long as they're
obvious. The problem is that things are rarely obvious to 100% of the people,
100% of the time. A company the size of Digital Ocean has enough customers
that if even 5% of their customers misinterpreted this option, a "significant"
number of people would be affected.

Consider, for example, a user who sees the destroy page and assumes that
"scrub" data is some extra, optional precaution, because "destroy" can't
possibly mean "erased but recoverable". I mean, it says _destroy_ , right?

Just a few days ago, someone posted a link to a blog post titled Toyota
Manufacturing Principles. One of the principles mentioned in that post was
poka-yoke[1], otherwise known as mistake-proofing. A tenet so important that
Toyota -- one of the most successful industrial companies in the world -- has
made it a core principle.

One of my business partners has a favorite catch phrase: it's never a problem
until it becomes a problem. It's his way of pointing out that just because
something hasn't happened to you yet doesn't mean it won't be a problem if it
does. Speaking as someone who has made their fair share of mistakes, I'd
caution you to consider that advice carefully.

1: [http://en.wikipedia.org/wiki/Poka-yoke](http://en.wikipedia.org/wiki/Poka-yoke)

~~~
jamesbrownuhh
Very fair point, I do see what you're saying. I had a conversation just
recently about some software that a third party had developed for us, which in
one use case would basically present the users with a question saying, in
effect, "Do you want to reflect the change you've just made in all places
where it matters, or leave it wrong in some of them?" - complete with option
buttons Yes, No (!), or "Always No" (!!!)

As I said at the time, "that's a stupid option and a user should never be
given that choice". So, I see where you're coming from, your point is well
made. :)

------
rdl
This is a case where only "aggressive full disclosure" got a company to
respond, which is why I'm generally only willing to go through "responsible
disclosure" with companies that have shown themselves to be reasonable in the
past, or in exceptional cases where the vulnerability is impossible for end
users to mitigate and/or causes exceptionally grave harm.

------
jonahx
In what circumstances will this work? Are you recovering data from other
customers? If so, will this work even if the other customer has deleted their
VM using the recommended procedure?

~~~
cmircea
It only works if you're lucky enough to land on a disk that hadn't been zeroed
when the previous VM was deleted.

~~~
gregimba
Yep. Hopefully this will encourage digital ocean to change the default or at
least warn people.

~~~
neom
[https://digitalocean.com/blog_posts/transparency-regarding-data-security](https://digitalocean.com/blog_posts/transparency-regarding-data-security)

~~~
xorgar831
Well that just made my brain explode. This is going to make it that much
harder to argue that public clouds take security seriously. I'd love to see an
example of what data they think is fine to leak, since that seems to be their
performance strategy.

~~~
rallison
Indeed. The non-scrub option simply should not exist. Are there use cases for
non-scrub? Yes. Are the risks worth it? No, at least in my opinion.

Forget to check that box? Oh well, better hope the next droplet doesn't go and
read your data.

Moderately competent developer doesn't realize the implications of not
checking that box? Oh well, better hope that developer didn't have too much
sensitive data on the droplet.

Etc etc. Security is the big area where the default should be to err on the
side of caution - often removing choices that are simply too dangerous (when,
for example, the tradeoff is a tiny amount of performance gain).

I say this all as someone who likes and is a customer of DO. I am
disappointed.

------
kanzure
Approximately how much would it cost in DigitalOcean time to cover 10%, 50%,
90% of their data?

------
revelation
Instead of installing apache2 you might consider just using scp.

~~~
gregimba
I figured scp would be slower, but now that I think about it, you're probably
right.

~~~
dsl
scp will always be slow because of some built-in limitations. It's not your
data anyway, so toss the encryption and just use netcat.

