Oracle Wallet Master Key Lost (reddit.com)
193 points by quicksilver03 9 days ago | 79 comments

My worry is that this trick will go away soon. I really dislike the default Reddit UI.

I hope that this trick will go away. I'd be much more productive.

It will, so why not start now? No more excuses!

You can also set it in preferences if you have a user account

Reddit tends to force the new UI even when it is disabled in preferences, with a prompt asking whether one wants to have another go at it.

I've been in a number of these situations where the basic workflow was "encryption = good" with no consideration for these scenarios.

I've seen several organisations decide they need to encrypt backups. The first thing I ask is "where is that key saved" and the answer has always been "in a file, written to those backups". It's amazing how many times "encryption" leaves someone one ransomware away from losing everything.

Encrypting backups is more-or-less essential for many types of data – but you absolutely need to know what is at stake (complete data loss) and how to mitigate that (redundant hard-copies of keys stored securely both on- and off-site).

> stored securely

Print the keys and give them to the CFO to store as an asset; they have the systems in place to keep paper around for decades.

Do you really want finance involved in operational chores as mundane as restoring backups?

I see no reason for this, whoever is head of the department responsible for disaster recovery should just stick a backup printout of the keys in a local safe deposit box and get the right names on the account.

Why not hand a printout to finance? It’s a last-ditch safety anchor in case everything else fails. Keep another copy for day-to-day operations in digital form. Nobody wants to type a 4096 bit key for every backup restore. But if millions are at stake, motivation to do so will rise tremendously.

I already said it's a backup copy of the key, obviously nobody is expected to type in from a printout in day-to-day operations.

Our experiences must differ substantially here. I've worked at a few startups over the past two decades, and I would not trust any of the finance people from any of those companies to practice good operational security or be reliably available as a cog in a critical real-time operational path.

They were consistently the kinds of people who would treat a locked filing cabinet behind a locked office door as the height of security, and in many cases left important private documents like checks revealing salaries completely insecure, face down on recipients' unattended desks.

They have often been the staff least reliably in the office, and least accessible at any given moment - especially after hours or on weekends.

In my view it's just exposing the key to more risk of getting left out or forgotten in a poorly secured, often unattended filing cabinet behind trivially picked locks. Just to make it available through someone frequently absent on vacation. Good thing the security will likely suck, because you'll probably have to pick those locks when you need the copy and the CFO is skiing down a mountain.

This is an issue for the department in charge of performing backups and restores. They will already have access to the key to do the job, and they need to be accessible in times of crisis. Why wouldn't the COO be the right person if operations is in charge of disaster recovery? It's their ass on the line when they can't do a restore on the weekend. The CFO and the entire finance department have zero expectation of being available after hours or on holidays.

And who has the keys to that deposit box? It's keys all the way down, unless you are finance. I never saw finance delegate their document storage to a third party. It's their nature. That and legal, but since the master key is quite like a title to your data, finance made more sense.

> Do you really want finance involved in operational chores as mundane as restoring backups?

Not as a day to day occurrence. But as a last line of defence, absolutely. No other department has better routines in place to keep papers safe and secure for years through rounds of layoffs, restructurings, buyouts and everything else a company might go through.

Why not just burn them to a CD? This way they are safe from hackers' attacks, and still in digital form. There are special types of CDs that will last decades. There is even such a thing as a "1000 years DVD" :)

Paper is proven to last a few hundred years without issues, if stored properly, and we know all the kinks of storing it properly.

It's easy to read, contrary to a DVD (you need a DVD drive).

It's also cheap.

Why make things complex when they could be simple?

Those special optical discs are traditionally pressed and gold sputtered. Burnable discs are not suitable for long term archives.

M-DISC claims a 1000 year lifespan. LG has some consumer writers that support M-DISC I believe.

There are Sony BD-XL 128GB discs that claim 50 years of shelf life, though I'm not sure whether they are suitable for archives. As long as they are regularly tested, they could be decent enough for cold storage.


This once again reinforces the point that a backup is worthless if you haven't tested restore.

From the thread:

> It's Oracle cloud - you can't really test them without restoring a snapshot (overwriting your current data)

Jesus, if that's true, what a shit product. AWS RDS doesn't even let you restore a backup to an already running database, it always creates a new one.

I think about it this way: you don't need a backup system - you need a restore system.

It also reinforces the point that there is a trade-off between reliability and security. When you encrypt something you are slightly increasing the chance that you will lose access to that data.

Our accounting software offers end-to-end encryption, but it's not enabled by default. When the user tries to enable it they see a bold message saying, "Warning: In this mode we cannot reset your password. If you forget your password you will lose access to your data. Are you sure you want to continue?"

That's one of those idioms that rings hollow until it rings very true.

“If you haven’t tested your backups, you don’t have backups.”

LOTS of people don’t have backups.

Restore would have been successful while the key was still in memory.

You could say if that happens it's a badly designed test, and I agree. But it goes to show that testing entire infrastructures partly run by service providers is not easy.

Restore to the same system isn’t actually a restore test.
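To make the "restore somewhere else" point concrete, here is a minimal sketch of a restore drill. It uses SQLite from the Python standard library purely as a stand-in (the thread is about Oracle, and nothing here reflects Oracle's tooling); the `invoices` table and the function names are hypothetical. The idea is only that the restore target is a fresh file, never the live database, and that the drill reads real data back instead of trusting a "backup OK" status.

```python
import os
import sqlite3
import tempfile


def make_backup(live_path: str, backup_path: str) -> None:
    """Copy the live database into a standalone backup file."""
    src = sqlite3.connect(live_path)
    dst = sqlite3.connect(backup_path)
    try:
        src.backup(dst)  # sqlite3's online backup API
    finally:
        dst.close()
        src.close()


def restore_test(backup_path: str, scratch_dir: str, expected_rows: int) -> bool:
    """Restore the backup into a brand-new file and check the data is readable.

    The live database is never opened here, so the drill cannot overwrite it.
    """
    restored_path = os.path.join(scratch_dir, "restored.db")
    src = sqlite3.connect(backup_path)
    dst = sqlite3.connect(restored_path)
    try:
        src.backup(dst)
        (count,) = dst.execute("SELECT COUNT(*) FROM invoices").fetchone()
    finally:
        dst.close()
        src.close()
    return count == expected_rows


def demo() -> bool:
    """Build a tiny live database, back it up, and run the restore drill."""
    with tempfile.TemporaryDirectory() as workdir:
        live = os.path.join(workdir, "live.db")
        conn = sqlite3.connect(live)
        conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, total REAL)")
        conn.executemany(
            "INSERT INTO invoices (total) VALUES (?)", [(10.0,), (20.5,), (7.25,)]
        )
        conn.commit()
        conn.close()

        backup = os.path.join(workdir, "backup.db")
        make_backup(live, backup)
        return restore_test(backup, workdir, expected_rows=3)
```

For an encrypted database the same drill would have to run on a machine that has never held the key in memory, which is exactly the case that would have exposed the empty wallet.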

I don't do this type of work, but come on, don't people practice doing such recoveries before they commit to such a system?

Wouldn't performing a dry run with a checklist of actions have kept this from happening or at least alerted them to the deficiencies?

Hey man, it’s agile, cloud, devops; we don’t value sys admins, DBAs, or business analysts! We hire inexperienced devs and expect them to do it all! Experience? Bah! Too expensive!

This is the world we’ve got today.

If you read the thread it wasn't agile devops who caused the problem. It was a DBA following specific instructions from Oracle support. Which is like the polar opposite of what you describe.

This company was screwed as soon as they did business with Oracle. A company whose entire business model is making things as inscrutable as possible so you'll pay them more money in support contracts.

Oracle never told the company to NOT test the backups. That was the company's own doing (and demise).

So the question is, why did the company not think of testing their backups, and generally doing a full validation of the system before committing to it for months and millions of $ in business? What sort of workflows, processes and culture in the company allowed this to happen? Probably the cheap sort.

This is an outsourced cloud database. Oracle provides the backup systems for them. The kind of people who buy Oracle do it because they assume that because Oracle is charging them millions of dollars that Oracle is doing the right thing for them. I don't know all the details here but it is very easy for me to imagine that the Oracle cloud system was indicating all systems go on the backups front and they didn't know that a reboot would make the backup/restore completely fail when the key was lost.

This is in fact the very thing that a good agile devops shop will prevent by not relying on Oracle to handle it for them. That's why the comment I was replying to looked kneejerk to me, and not the result of having read the article.

What I got from reading the thread last night was that due to some quirk of Oracle a test restore would have been a destructive overwrite of the active data. Not sure if that's due to Oracle hosting or if that was an incorrect impression.

If you can't do a full end-to-end backup and bare metal restore, you don't really have a backup. If this is indeed a "quirk" of Oracle Cloud, it's not fit for purpose.

They are using Oracle so calling them cheap doesn't seem fitting.

As an old boss used to say, "cheap, but not as in 'inexpensive'". Too often, bad companies overspend in the wrong places and skimp on actually valuable stuff. It's amazing how a company doing this can still survive on appearances and inertia for years.

For some reason I read this as the Oracle Master Key was lost, and I was expecting much more fireworks.

Yea, I was suspecting some major Oracle breach or a CA failure or something. But nope, localized to one company (but it still seems like Oracle's hosted software is at fault)

I thought the same!

Always make multiple hard copies of encryption keys, it's the one thing your mirrored incremental offsite backup will not help you with.

The problem is the wallet was empty.

That would have been noticed if someone had at least attempted to make hard copies immediately. The second I encrypt a DB (which I've only done once so far, so I'm far from an expert), I get a copy of the key, verify it and duplicate it, otherwise all backups from that point onward will be useless...

The moment you add encryption is the moment you can still recover from fucking up the keys and try again. Instead of waiting for days, weeks or months like these guys, you check immediately: if you didn't get a copy of the key or it's invalid, that's OK, your old unencrypted backup isn't out of date yet.
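A rough sketch of the "get a copy, verify, duplicate" step, assuming the key (or wallet) has been exported to a file. The file names and the `verify_key_copies` helper are hypothetical. It only checks that every copy is byte-identical to a non-empty original; it does not prove the key can decrypt anything, so it complements a real restore test and does not replace one. A wallet that holds no keys but is not a zero-byte file would pass this check.

```python
import hashlib
import os
import tempfile


def fingerprint(path: str) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def verify_key_copies(original: str, copies: list[str]) -> list[str]:
    """Return a list of problems; an empty list means every copy checks out."""
    problems = []
    if os.path.getsize(original) == 0:
        # Nothing was exported, so there is nothing worth copying.
        problems.append(f"{original}: exported key file is empty")
        return problems
    expected = fingerprint(original)
    for copy in copies:
        if not os.path.exists(copy):
            problems.append(f"{copy}: missing")
        elif fingerprint(copy) != expected:
            problems.append(f"{copy}: does not match the original")
    return problems


def demo() -> list[str]:
    """One good copy, one truncated copy, and one copy that was never made."""
    with tempfile.TemporaryDirectory() as workdir:
        original = os.path.join(workdir, "master.key")
        good = os.path.join(workdir, "copy_offsite.key")
        bad = os.path.join(workdir, "copy_truncated.key")
        missing = os.path.join(workdir, "copy_never_made.key")
        key = os.urandom(32)  # stand-in for real key material
        for path, data in ((original, key), (good, key), (bad, key[:16])):
            with open(path, "wb") as handle:
                handle.write(data)
        return verify_key_copies(original, [good, bad, missing])
```

Running `demo()` reports two problems, the truncated copy and the missing one. Done on the day encryption is switched on, a check like this fails while the old unencrypted backup is still current.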

But you need more than one wallet. Just like you need more than one house key, in case you lose one.

If the key was in RAM, they might be able to find it in a core dump or in swap memory. Obviously a proper key mgmt system should never let that key be written to disk without encrypting it first with a KEK, but that's farther up their hierarchy of needs.

How can you realistically find it? Probably requires writing very sophisticated software with questionable legal grounds. It might cost a lot in the end without guaranteed results.

If the keys were just plainly kept in RAM they should be easy to find in the dump. The reason is that key material has certain characteristics (high entropy) that make it easy to find. Tools that can do this are freely available.
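For anyone curious what "high entropy" means in practice, here is a minimal sketch of the idea, not a reimplementation of any particular tool. It slides a window over a byte buffer and keeps the offsets whose Shannon entropy is close to the maximum for that window size. The 32-byte window (a 256-bit key), the 4.5 bits-per-byte threshold and the 8-byte step are all assumptions for illustration; nothing in the thread says how Oracle lays keys out in memory.

```python
import math
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not chunk:
        return 0.0
    total = len(chunk)
    counts = Counter(chunk)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def high_entropy_offsets(
    dump: bytes, window: int = 32, threshold: float = 4.5, step: int = 8
) -> list[int]:
    """Offsets of windows that look random enough to be key material.

    A 32-byte window can reach at most log2(32) = 5 bits per byte, so the
    threshold has to be chosen relative to the window size.
    """
    offsets = []
    for start in range(0, len(dump) - window + 1, step):
        if shannon_entropy(dump[start:start + window]) >= threshold:
            offsets.append(start)
    return offsets


if __name__ == "__main__":
    # Toy "memory dump": zero padding around 32 distinct bytes standing in for a key.
    fake_key = bytes((i * 37 + 11) % 256 for i in range(32))
    dump = bytes(64) + fake_key + bytes(64)
    print(high_entropy_offsets(dump))  # prints [64]
```

On a real dump, compressed and encrypted data also score high, so a scan like this only produces candidates; each one still has to be tried against the encrypted data.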

So I did the math quick: say you had 256GB of RAM and the key was in there somewhere (pretending it’s nicely lined up in some known endianness and format). Assuming you can cut 50% of RAM out by excluding areas the key definitely isn’t in (we know the key doesn’t have 4 repeating chars and isn’t 1/4 zeros), if you just cycle through each key size, you need to be in the range of 10,000 guesses per second to spend less than a year on it.

I have no idea what key size or speed is practical. But it’s hardly impossible to just dictionary attack using your old RAM as the “list”.
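The back-of-the-envelope numbers above can be reproduced with a few lines. This sketch assumes one candidate key per byte offset in the remaining RAM, which is my reading of "nicely lined up"; stricter alignment would cut the count further. The function name and parameters are made up for illustration.

```python
def scan_days(
    ram_gib: float,
    excluded_fraction: float,
    key_sizes: int,
    guesses_per_second: float,
) -> float:
    """Days needed to try every byte offset as the start of a candidate key."""
    candidate_offsets = ram_gib * 2**30 * (1 - excluded_fraction)
    total_guesses = candidate_offsets * key_sizes  # one pass per key size
    return total_guesses / guesses_per_second / 86_400  # seconds per day


if __name__ == "__main__":
    for sizes in (1, 2, 3):
        print(sizes, round(scan_days(256, 0.5, sizes, 10_000)))
```

Under these assumptions one key size takes roughly 159 days at 10,000 guesses per second, and two key sizes roughly 318 days, so "less than a year" holds for up to two key sizes. A third key size pushes it past a year unless the guess rate goes up.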

You could significantly narrow it down by only trying sequences of very high entropy.

That’s a much better way to describe what I was getting at.

I don't know the tech involved so I'll ask as future reference: was it possible, before the reboot, to have the database write the key out, or was this database doomed forever once the key went missing?

If the database was using the key to write encrypted data, it had the key in memory at least.

But not necessarily in one place, we don't know how Oracle stores encryption keys in RAM. So even if you magically have the whole RAM dump after the reboot, it wouldn't be a trivial task to find the key.

It's interesting to read the comments on anything related to Oracle, the general sentiment is people hate it ... a lot! I wonder how long it will stay in business?

The US government is their largest customer. Considering the glacial pace of the government, I suspect it will be a really long time.

3rd world governments as well because they don't know any better.

Just because a government is in charge of a poor country, doesn't mean the people in that government are unable to make informed decisions.

No, but it does mean that the marketing budget of Oracle is insane relative to how much influence it can buy in a poor country.

Oracle DB is deeply embedded in some mission critical "octopus" systems that touch large swaths of an enterprise (PeopleSoft, SAP, TIBCO ESB). It's disruptive, expensive, and time consuming to rip out the guts of your enterprise, particularly if you're not an IT business. Inertia is a hell of a drug.

So the data was not encrypted and you thought it was? And you pay for this product? Aha...

I almost feel like encryption systems like this need physical keys. They can be secured in a safe, maybe create 2 even.

Physical keys can break. Just use a password, print it and store a few copies in different places.

Exactly. Print out the whole thing, doesn't matter how many characters if you're trying to recover from something this drastic, and follow the general backup rule of "If you have 1 copy, you have 0, if you have 2, you have 1...." Put copies in different flood plains, ideally different parts of your country or the world.

While it's not always possible, it's a good idea to use at least two completely different backup methods, avoiding single points of failure like in this case. For example, use a logical volume manager to make frozen copies of your filesystems, and back those up using something low level.

Wonder how many other Oracle cloud environments will turn out to be bitten by the same bug?

Key Management.

The comments are terrible, blaming Oracle (I guess that's trendy these days), some people even advise suing Oracle.

This is a system administration failure, not an Oracle failure. Basically they didn't test their recovery strategy.

If I pay you for a service, say database hosting and management, that says it's encrypted at rest, and your system fails to complete the required steps for this process, namely fails to write the decryption key to a place where it can be recovered, it's your fault. Not the customer's.

From the comments, Oracle told them to enable encryption to solve a problem with backups. They did, and they kept the master key to the Oracle wallet. The Oracle wallet is read only, they could never put something in it or fail to put something in it. The encryption system could and should have put encryption keys in it, but didn't.

That's the Oracle failure. "Enabling Oracle encryption according to their instructions, failed to do what it should, and we lost access to all our servers and all our backups because of it".

There's also a separate failure of administration on the client side of not practising working with encryption before using it in production, and not testing recovery and noticing the wallet was empty and the backups were unusable before it became a disaster, but the fact that it did become a disaster is a failure of Oracle's system.

I'd be very curious to know what drove that decision to recommend enabling encryption in the first place. That seems like an odd way to "solve a problem with backups", particularly given the huge risks inherent to irresponsible key management. It almost sounds like their original support engineer proposed this as a fix during a ticket. Normally something like this is something you'd plan out, test, and do risk analysis with the business rather than have your on-call deploy it in the middle of the night because John Q. Engineer said you should.

I'm not so sure why everyone is on-board with blaming Oracle out of the gate.

From one of the OP's comments, "Oracle support told him, step by step, to encrypt the database, which happened ages ago and is the root of all of this."

That wallet is the responsibility of whoever created it. Just because it's empty now, doesn't mean it's always been empty. In addition to that, what if the person who initially encrypted everything with TDE used a local-only wallet? This would explain why the ewallet.p12 file is empty. Oracle recommends that you store your master key in the ewallet.p12 file and not the cwallet.sso file, which is the auto-open file, but only on the workstation it was created on.

There could have been multiple failures here with multiple parties and not just Oracle.

I think some of that comes from this occurring in Oracle's cloud environment.

It's not an on-prem setup where the customer has full control to manage/test backups, it's at least some level of managed service from Oracle.

So at the least the commands used should have offered/enforced creation of a key backup as part of the encryption process (in the way that BitLocker does for Windows disk encryption) to reduce the risk of something like this happening.

The key phrase in the posting is "Oracle is stumped".

This is way more than enough to turn this over to lawyers and let the court decide who is to blame.

Yes, the company could have checked their recovery strategy but if Oracle claims the key gets written to disk and it wasn't, there is a good chance for at least some compensation from Oracle.

There must be something that I don't understand fully about this because it would seem odd to me that data would be encrypted at rest, and at the same time have the plain text key available to decrypt the data.

He says he has the password for the key wallet. But the wallet is empty.

I'm not familiar with the system in question but it is not stated if reading the key would need a password or not.

But being a cloud service, it would be odd if the encryption key would be available without any interaction by the owners.

This is Oracle's hosted solution (like Amazon RDS) and their software failed. The OP even mentioned how the backups of their wallet were empty and this was never flagged somehow.

If this was something they ran locally, yes, it'd probably be their fault. But this is a hosted solution with a major bug. I hope some Oracle people are searching through seeing how many other customers have this empty wallet bug, and fixing it now.

Thinking it through, if they find other wallets with this problem, I wonder what the fix is?

They could generate a new key + wallet, which could probably then be made to work for the running instance.

All the backups though, would probably need to be opened with the current (missing) key+wallet in order to re-encrypt them.

Might be doable with the current key+wallet that's in memory, but also it might not be. E.g. they might not be able to change the software on the running system, in which case ouch... places that need historical backups (e.g. for legal purposes) would be in a bad place.

Clearly something Oracle-side fucked up here too, so it's also an Oracle failure. Although the "support talked us through..." bits seem like a sign of "should probably have gone for something hosted", but depending on the contract could also spin to "so clearly Oracle must have told us something wrong".

Yep. Oracle probably has a standard "no warranties or obligations" clause in their license.

Sounds like they are using a managed database service. If that is the case, there is no system to administer and Oracle is at fault. Welcome to the cloud.
