
Oracle Wallet Master Key Lost - quicksilver03
https://www.reddit.com/r/sysadmin/comments/bn38pp/oracle_wallet_master_key_lost/
======
marktangotango
For those of us who reject the new ui

[https://old.reddit.com/r/sysadmin/comments/bn38pp/oracle_wal...](https://old.reddit.com/r/sysadmin/comments/bn38pp/oracle_wallet_master_key_lost/)

~~~
_asummers
My worry is that this trick will go away soon. I really dislike the default
Reddit UI.

~~~
monsieurbanana
I hope that this trick will go away. I'd be much more productive.

~~~
willio58
It will, so why not start now? No more excuses!

------
technion
I've been in a number of these situations where the basic workflow was
"encryption = good" with no consideration for these scenarios.

I've seen several organisations decide they need to encrypt backups. The first
thing I ask is "where is that key saved" and the answer has always been "in a
file, written to those backups". It's amazing how many times "encryption"
leaves someone one ransomware away from losing everything.

~~~
dpwm
Encrypting backups is more-or-less essential for many types of data – but you
absolutely need to know what is at stake (complete data loss) and how to
mitigate that (redundant hard-copies of keys stored securely both on- and off-
site).

~~~
LoSboccacc
> stored securely

print the keys and give them to the cfo to store as asset, they have the
systems in place to keep paper around for decades.

~~~
newnewpdro
Do you really want finance involved in operational chores as mundane as
restoring backups?

I see no reason for this, whoever is head of the department responsible for
disaster recovery should just stick a backup printout of the keys in a local
safe deposit box and get the right names on the account.

~~~
Xylakant
Why not hand a printout to finance? It’s a last-ditch safety anchor in case
everything else fails. Keep another copy for day-to-day operations in digital
form. Nobody wants to type a 4096 bit key for every backup restore. But if
millions are at stake, motivation to do so will rise tremendously.

~~~
newnewpdro
I already said it's a backup copy of the key, obviously nobody is expected to
type in from a printout in day-to-day operations.

Our experiences must differ substantially here. I've worked at a few startups
over the past two decades, and I would not trust any of the finance people
from any of those companies to practice good operational security or be
reliably available as a cog in a critical real-time operational path.

They were consistently the kinds of people who would treat a locked filing
cabinet behind a locked office door as the height of security, and in many
cases left important private documents like checks revealing salaries
completely insecure, face down on recipients' unattended desks.

They have often been the staff least reliably in the office, and least
accessible at any given moment - especially after hours or on weekends.

In my view it's just exposing the key to more risk of getting left out or
forgotten in a poorly secured, often unattended filing cabinet behind
trivially picked locks. Just to make it available through someone frequently
absent on vacation. Good thing the security will likely suck, because you'll
probably have to pick those locks when you need the copy and the CFO is skiing
down a mountain.

This is an issue for the department in charge of performing backups and
restores. They will already have access to the key to do the job, and they
need to be accessible in times of crisis. Why wouldn't the COO be the right
person if operations is in charge of disaster recovery? It's their ass on the
line when they can't do a restore on the weekend. The CFO and the entire
finance department have zero expectation of being available after hours or on
holidays.

------
perlgeek
This once again reinforces the point that a backup is worthless if you haven't
tested restore.

~~~
cptskippy
That's one of those idioms that rings hollow until it rings very true.

~~~
SlowRobotAhead
_“If you haven’t tested your backups, you don’t have backups.”_

LOTS of people don’t have backups.

------
crispyambulance
I don't do this type of work, but come on, don't people practice doing such
recoveries before they commit to such a system?

Wouldn't performing dry-run with a checklist of actions have kept this from
happening or at least alerted them to the deficiencies?

~~~
marktangotango
Hey man, it’s agile, cloud, devops; we don’t value sys admins, dbas, or
business analysts! We hire inexperienced devs and expect them to do it all!
Experience? Bah! Too expensive!

This is the world we’ve got today.

~~~
zaphar
If you read the thread it wasn't agile devops who caused the problem. It was a
DBA following specific instructions from Oracle support. Which is like the
polar opposite of what you describe.

This company was screwed as soon as they did business with Oracle. A company
whose entire business model is making things as inscrutable as possible so
you'll pay them more money in support contracts.

~~~
Jare
Oracle never told the company to NOT test the backups. That was the company's
own doing (and demise).

So the question is, why did the company not think of testing their backups,
and generally doing a full validation of the system before committing to it
for months and millions of $ in business? What sort of workflows, processes
and culture in the company allowed this to happen? Probably the cheap sort.

~~~
fencepost
What I got from reading the thread last night was that due to some quirk of
Oracle a test restore would have been a destructive overwrite of the active
data. Not sure if that's due to Oracle hosting or if that was an incorrect
impression.

~~~
dfrage
If you can't do a full end-to-end backup and bare metal restore, you don't
really have a backup. If this is indeed a "quirk" of Oracle Cloud, it's not
fit for purpose.

------
village-idiot
For some reason I read this as _the_ Oracle Master Key was lost, and I was
expecting much more fireworks.

~~~
djsumdog
Yea, I was suspecting some major Oracle breach or a CA failure or something.
But nope, localized to one company (but it still seems like Oracle's hosted
software is at fault)

------
tomxor
Always make multiple hard copies of encryption keys, it's the one thing your
mirrored incremental offsite backup will not help you with.

~~~
josu
The problem is the wallet was empty.

~~~
tomxor
That would have been noticed if someone had at least attempted to make hard
copies immediately. The second I encrypt a DB (which I've only done once so
far, so I'm far from an expert), I get a copy of, verify and duplicate the
key, otherwise all backups from that point onward will be useless...

The point you add encryption and fuck up the keys is the point you are able to
recover and try again, instead of waiting for days, weeks or months like these
guys - you check it immediately - if you didn't get a copy of the key or it's
invalid, that's ok, your old unencrypted backup isn't out of date yet.
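
A minimal sketch of that "get a copy, verify and duplicate" check, in Python. It assumes the key material can be exported as raw bytes; the helper names `fingerprint` and `copies_match` are hypothetical and not part of any Oracle tooling. An empty export is treated as a failure, since an empty wallet is exactly the case to catch on day one.

```python
import hashlib
import hmac
from typing import Sequence


def fingerprint(key_bytes: bytes) -> str:
    """Return a SHA-256 hex digest that can be written next to each copy."""
    return hashlib.sha256(key_bytes).hexdigest()


def copies_match(original: bytes, copies: Sequence[bytes]) -> bool:
    """True only if the key is non-empty and every copy is byte-identical."""
    if not original or not copies:
        # An empty key export or no copies at all means nothing is protected.
        return False
    expected = fingerprint(original)
    return all(
        hmac.compare_digest(expected, fingerprint(copy)) for copy in copies
    )
```

Run it right after enabling encryption, while the last unencrypted backup is still current: `copies_match(exported_key, [usb_copy, retyped_printout])` should be `True` before anyone relies on the encrypted backups.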

------
Cieplak
If the key was in RAM, they might be able to find it in a core dump or in swap
memory. Obviously a proper key mgmt system should never let that key be
written to disk without encrypting it first with a KEK, but that's farther up
their hierarchy of needs.

~~~
vbezhenar
How can you realistically find it? Probably requires writing very
sophisticated software with questionable legal grounds. It might cost a lot in
the end without guaranteed results.

~~~
SlowRobotAhead
So I did the math quick, but say you had 256GB of RAM, and the key was in
there somewhere (pretending it’s nicely lined up in some known endian and
format), assuming you can visually cut 50% of RAM out by excluding areas the
key definitely isn’t in (we know the key doesn’t have 4 repeating chars or is
even 1/4 all zero), if you just cycle through each keysize, you need to be in
the range of 10,000 guesses per second to spend less than a year on it.

I have no idea the keysize or speed that is practical. But it’s hardly
impossible to just dictionary attack using your old RAM as the “list”.
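
A rough check of that arithmetic, as a Python sketch. The assumptions are mine, not the commenter's: one candidate key per byte offset in the searchable RAM, and the number of key sizes to cycle through left as a parameter. The function name `search_days` is hypothetical.

```python
def search_days(ram_bytes: int, searchable_fraction: float,
                key_sizes: int, guesses_per_second: float) -> float:
    """Days needed to try every byte offset once per candidate key size."""
    offsets = int(ram_bytes * searchable_fraction)  # one guess per byte offset
    total_guesses = offsets * key_sizes
    return total_guesses / guesses_per_second / 86_400  # seconds per day


RAM_BYTES = 256 * 2**30  # 256 GiB, as in the comment above

one_size = search_days(RAM_BYTES, 0.5, 1, 10_000)
two_sizes = search_days(RAM_BYTES, 0.5, 2, 10_000)
```

Under these assumptions a single key size takes about 159 days and two key sizes about 318 days, so "less than a year at 10,000 guesses per second" holds for one or two sizes; a third size pushes it past a year unless the offsets can be restricted further (for example to aligned addresses).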

~~~
jnwatson
You could significantly narrow it down by only trying sequences of very high
entropy.
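
A sketch of that filter in Python, assuming the memory dump is available as a `bytes` object and the key is stored contiguously. The 32-byte window, the 4.5-bit threshold and the function names are illustrative assumptions; nothing here reflects how Oracle actually lays out keys in memory.

```python
import math
from collections import Counter
from typing import Iterator


def shannon_entropy(window: bytes) -> float:
    """Shannon entropy of the byte values in the window, in bits per byte."""
    total = len(window)
    counts = Counter(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def high_entropy_offsets(dump: bytes, key_len: int = 32,
                         threshold: float = 4.5,
                         step: int = 1) -> Iterator[int]:
    """Yield offsets whose key_len-byte window looks random enough to be a key.

    A 32-byte window can reach at most log2(32) = 5 bits per byte, so the
    threshold has to be chosen relative to the window length.
    """
    for offset in range(0, len(dump) - key_len + 1, step):
        if shannon_entropy(dump[offset:offset + key_len]) >= threshold:
            yield offset
```

Only the offsets this yields would be handed to the expensive step (trying to decrypt with the candidate), which is where the real savings come from. Recomputing the entropy from scratch per window is slow; a sliding histogram would be the obvious optimisation for a dump of realistic size.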

~~~
SlowRobotAhead
That’s a much better way to describe what I was getting at.

------
LoSboccacc
I don't know the tech involved so I'll ask as future reference: was it
possible, before the reboot, to have the database write the key out, or was
this database doomed forever once the key went missing?

~~~
scarmig
If the database was using the key to write encrypted data, it had the key in
memory at least.

~~~
bufferoverflow
But not necessarily in one place, we don't know how Oracle stores encryption
keys in RAM. So even if you magically have the whole RAM dump after the
reboot, it wouldn't be a trivial task to find the key.

------
gdsdfe
It's interesting to read the comments on anything related to Oracle, the
general sentiment is people hate it ... a lot! I wonder how long it will stay
in business?

~~~
snarfy
The US government is their largest customer. Considering the glacial pace of
the government, I suspect it will be a really long time.

~~~
curiousgal
3rd world governments as well because they don't know any better.

~~~
GhostVII
Just because a government is in charge of a poor country, doesn't mean the
people in that government are unable to make informed decisions.

~~~
viraptor
No, but it does mean that the marketing budget of Oracle is insane relative to
how much influence it can buy in a poor country.

------
neop1x
So the data was not encrypted and you thought it was? And you pay for this
product? Aha...

------
nytesky
I almost feel like encryption systems like this need physical keys. They can
be secured in a safe, maybe create 2 even.

~~~
vbezhenar
Physical keys can break. Just use a password, print it and store a few copies
in different places.

~~~
dfrage
Exactly. Print out the whole thing, doesn't matter how many characters if
you're trying to recover from something this drastic, and follow the general
backup rule of "If you have 1 copy, you have 0, if you have 2, you have 1...."
Put copies in different flood plains, ideally different parts of your country
or the world.

While it's not always possible, it's a good idea to use at least two completely
different backup methods, avoiding single points of failure like in this case.
For example, use a logical volume manager to make frozen copies of your
filesystems, and back those up using something low level.

------
justinclift
Wonder how many other Oracle cloud environments will turn out to be bitten by
the same bug?

------
bec123
Key Management.

------
4ad
The comments are terrible, blaming Oracle (I guess that's trendy these days),
some people even advise suing Oracle.

This is a system administration failure, not an Oracle failure. Basically they
didn't test their recovery strategy.

~~~
jodrellblank
From the comments, Oracle told them to enable encryption to solve a problem
with backups. They did, and they kept the master key to the Oracle wallet. The
Oracle wallet is read only, they could never put something in it or fail to
put something in it. The encryption system could and should have put
encryption keys in it, but didn't.

That's the Oracle failure. "Enabling Oracle encryption according to their
instructions, failed to do what it should, and we lost access to all our
servers and all our backups because of it".

There's also a separate failure of administration on the client side of not
practising working with encryption before using it in production, and not
testing recovery and noticing the wallet was empty and the backups were
unusable before it became a disaster, but the fact that it did become a
disaster is a failure of Oracle's system.

~~~
hotsauceror
I'd be very curious to know what drove that decision to recommend enabling
encryption in the first place. That seems like an odd way to "solve a problem
with backups", particularly given the huge risks inherent to irresponsible key
management. It almost sounds like their original support engineer proposed
this as a fix during a ticket. Normally something like this is something you'd
plan out, test, and do risk analysis with the business rather than have your
on-call deploy it in the middle of the night because John Q. Engineer said you
should.

