Don’t Wanna Pay Ransom Gangs? Test Your Backups (krebsonsecurity.com)
599 points by picture on July 19, 2021 | 316 comments

"Test your backups" is so easy to say, but quite difficult for many to do. There are a lot of shops that probably don't know how to recreate a machine from scratch. How many systems are developed as balls of clay. Little bits added and smeared in over time until the ball just gets bigger, but each piece lost in the process. How many folks can go through their local config files and explain all of entries, how many can even tell which ones they have changed, or why? Especially when they were changed by Frank, but he left 2 years ago.

You'd like to think you can just restore the system from backup and it'll just light back up. But how do you test this without cratering your existing system? Like a boat in a basement, many systems are built in-situ and can be very rigid.

Modern environments like cloud computing and creation scripts can mitigate this a bit organically, but how many of these systems are just a tower running Windows w/SQL Server and who knows what else? Plus whatever client software is on the client machines.

How do you test that in isolation?

At least read the media to see if it can be read (who doesn't love watching a backup tape fail halfway through the restore).
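A minimal sketch of that "at least read the media" check, assuming the backup is an ordinary file and that a SHA-256 hash was recorded when it was written. The function names and the recorded-hash workflow are illustrative assumptions, not any particular backup tool's interface.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Read the whole file in chunks; an unreadable region raises OSError."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_readable(path: Path, expected_sha256: str) -> bool:
    """True only if the file reads end to end and matches the recorded hash."""
    try:
        return sha256_of(path) == expected_sha256
    except OSError:
        # Missing file, failed read, bad media: all count as "not readable".
        return False
```

This only shows that the bytes can still be read and have not changed. It says nothing about whether a system rebuilt from them would actually work.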

Simply, it takes a lot of engineering to make a system that can be reliably restored, much less on a frequent basis. And this is all engineering that doesn't impact the actual project -- getting features to users and empowering the business. Which can make the task even more difficult.

God these "we're so ignorant!" responses are sooooo tiring. If people don't want to hire the hardcore nerds that know this shit, if they don't want to pay for the expertise to work the technology they build their business on, then maybe they deserve it.

This modern attitude of compartmentalization, extremely localized expertise, and outsourcing everything to the cloud is going to be our downfall. Stop hiring muggles!

1. Couldn't you argue the same thing about brick and mortar stores that get robbed? "If they don't want to hire hardcore commandos to protect their property, maybe they deserve to get robbed/looted?"

(My point is that incompetence doesn't make it morally ok that a criminal thing happens to someone)

2. There's a risk/reward component. Nobody likes to buy insurance. Resource constrained organizations will almost always choose to invest their resources to get MORE resources, not protect against the chance that something bad will happen. A rational organization should only invest in protection when the risk is so great that it's likely to interfere with its primary business (beyond their legal/moral obligation to protect information they're trusted with).

2a. If a $2m ransomware attack hits your organization every 5 years, and it would cost you $1m/year in talent & resources to harden against this, you SHOULD let it happen because it's cheaper. Just patch the vulnerability each time it happens and try to stretch the next ransomware attack to more than 5 years away.

3. Of course there are many irrational organizations that don't protect against ransomware for irrational reasons (e.g. due to internal politics). There's not much to say here except that at some level of management (including the CEO & board) people are not paying attention to what's happening, and they should go hire those hardcore nerds and pay them what they need to.
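The arithmetic behind point 2a can be sketched roughly as follows. The figures are the ones from the comment; the function name and the simple "loss divided by interval" model are assumptions made for illustration.

```python
def annualized_loss(loss_per_incident: float, years_between_incidents: float) -> float:
    """Spread the cost of one incident evenly over the years between incidents."""
    return loss_per_incident / years_between_incidents


ransom_loss = 2_000_000        # $2m per attack, from the comment
attack_interval_years = 5      # one attack every 5 years, from the comment
hardening_cost_per_year = 1_000_000

expected_annual_loss = annualized_loss(ransom_loss, attack_interval_years)
cheaper_to_absorb = expected_annual_loss < hardening_cost_per_year

# Attack interval at which absorbing and hardening cost the same per year.
break_even_years = ransom_loss / hardening_cost_per_year

print(expected_annual_loss)  # 400000.0
print(cheaper_to_absorb)     # True
print(break_even_years)      # 2.0
```

Under these numbers, absorbing the attack costs $400k/year against $1m/year for hardening, and the comparison only flips if attacks arrive more often than once every 2 years. The model counts nothing beyond the $2m figure, so any additional cost per incident would shift the result.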

Resource constrained organizations will almost always choose to invest their resources to get MORE resources, not protect against the chance that something bad will happen.

This is a good point but it points to something some might not like. Resource constrained warehouses might skimp on covering the risk of fire, resource constrained restaurants might skimp on sanitation, resource constrained power companies (PG&E) might skimp on line maintenance and let whole towns burn to the ground (Paradise, 80+ people, Berry Creek 30+ etc) and so forth (up to every company being too "resource constrained" to pay to stop global warming). In cases of this sort, you have companies risking both their capital and the life and limb of average people.

We really have companies following this resource constrained logic, and horrible things have happened and are happening. Economists describe this dynamic in terms of "externalities" and letting it run rampant pretty literally has the world on fire (and drowned under water, etc).

You bring up a very good point, and personally I attribute this to the tyranny of shareholder primacy. At least here in the states, nothing is going to change until there is an alternative that places shareholders at a level that is at most equal to other considerations, and more preferably below more important business considerations.

And yet we mitigated Y2K. All with publicly traded companies. The reward was simply being open for business on 1/1/2001 without skipping a beat.

Sometimes I think as an industry we hyped too much and over delivered giving the impression that it was no big deal.

The difference is that we knew that the Y2K bug was coming. It was a given that we were going to end up reaching 1/1/2000. Here, it's not a given that backups will ever need to be touched.

Spending a lot of money to fix something that is known will happen is easy to justify. If you don't do it, you will end up losing significant amounts of money.

For something that isn't guaranteed to happen, it's harder to justify putting a lot of money into it. What is the value of doing thorough testing of backups if they're never actually accessed? One could argue that it's purely a waste of money.

"One could argue that it's purely a waste of money. "

A counterargument would be any of the frequent news articles about company X having to pay Y millions or losing such and such money.

* 1/1/2000

Actually 3 dates, that was one. 2000 was a leap year and lots of systems got that wrong. 9/9/1999 was sometimes used as a can't happen date as well.

There are lots of human endeavors that have zero shareholders, involve computers, and the people doing them think about security exactly 0 minutes out of the day and everyone thrives. This is to say that the opinion about how it’s the bottom line blah blah blah is probably totally wrong, and maybe the security blowhards are just talking about shit that is irrelevant like 99% of the time.

And then you have huge corporations with zillions of dollars passing through their systems, employing hundreds or thousands of people, and storing terabytes of mission-critical data, who treat the people who try to help them make wise IT decisions like mindless code-monkeys who don't know anything, despite having been ostensibly hired for their expertise on the topic.

Stop blaming the victims. If every corporation had to enforce their own physical security with a private army to prevent theft or killing of their employees, then no business could be a going concern.

We have a social contract where businesses just need to put forth some minimum effort (door locks, alarm) and police and the military do the rest.

We need to enforce this social contract online and against global criminals now. If the criminals are in a state that harbors them, use 10x retaliation to that state to give them an incentive to fix the problem. Dictators don't respect anything but force.

Yeah, I've come to realize we don't have any "social contracts" on this planet. We have a planetary cancer called "humanity" that barely manages to cooperate even in their common desire for destruction of everything good in the universe.

That's not how anything works. The shareholders are the ones asking the company to do a good job with the important business decisions so that the company will do well and the stock will go up. How else would it work...?

More like, the shareholders are asking the company to do whatever it takes to turn the biggest possible profit. That might mean solid business decisions, or unscrupulous externalization of costs.

As some other comments point out, security hardening doesn't directly raise company value. It's hard to justify in the short-term.

Attitudes about it are much better now than they have ever been, but there are still a lot of folks out there that don't prioritize it.

This argument is super lazy. Businesses do tons of things which don't raise short-term value. R&D, buying insurance, hiring junior people, laying out money for reserved instances with AWS.

Guess what all of these have in common? The exact thing that security has; eliminating long-term risk. Don't blame the stock holders because your arguments didn't persuade management...

Your point of mixing security risks with fire, sanitation or environmental risks is interesting.

Those externalities are historically managed by regulation and laws, so the rules are equal for every company: you are not theoretically in competition with companies that don’t protect themselves against fire because it’s mandatory to do so (in many countries).

Maybe we should start to make companies responsible for not somehow mitigating the risk of being attacked. It makes even more sense when customer privacy is concerned. I know GDPR started in that direction though.

>you are not theoretically in competition with companies that don’t protect themselves against fire because it’s mandatory to do so (in many countries).

But in practice you can be. Companies can get away with not following regulations for a long time. This isn't all too uncommon in the restaurant business.

Regulations need to be cheap and easy to follow. Otherwise they'll be skirted here and there. Yes, businesses will get in trouble for not following them, but if enforcement isn't on top of most of the violations then you'll just create a system where everyone ends up skirting the rules.

I do agree with your point. However, I feel like regulations & norms tend to work over time.

If security became a mandatory thing, single actors would of course be slow to invest, but the industry itself would adapt by making it easier to comply with the regulation.

For example, in fire safety, it’s now harder to build a non-fireproof building because nobody will build that, since building techniques now fully integrate this issue.

Your local web agency may not make a lot more effort to back up its data due to regulation, but you can imagine that its hosting provider would be able to provide a cheap and friendly backup solution thanks to this becoming mandatory.

It’s just my thoughts.

This is very true! And that's part of what I was going for - if it's cheap enough then people will use them.

To the first point... some stores do. Hell, even fast-food restaurants in some big cities hire armed guards to keep the peace. Have you ever seen a McDonald's with armed security? I have.

And in any case the threat surface is way different. If I hold up the local Best Buy at gunpoint I'm not walking out with their entire customer roll. But if I hack their POS, there's a pretty good chance that I am.

> If they don't want to hire hardcore commandos to protect their property, maybe they deserve to get robbed/looted?

In places where there is sufficient law enforcement, this is not required but even then some extra-high-risk shops like banks still do. In places where there is not sufficient law enforcement, like failed states, favelas and the internet, almost all shops should hire guards (or their digital equivalent) or risk being robbed. In this particular case, the long term fix is to extend the rule of law onto the internet but until that time having good security is not optional.

> 2a. If a $2m ransomware attack hits your organization every 5 years, and it would cost you $1m/year in talent & resources to harden against this, you SHOULD let it happen because it's cheaper. Just patch the vulnerability each time it happens and try to stretch the next ransomware attack to more than 5 years away.

No, that's just the kind of brain cancer that bank board directors can opt for.

You simply skipped over customers' data being lost. Oh, insurance gives you the money, court settlements will be paid, it's cheap! For you.

And by not having that "talent & resources" you stretch your own business beyond sanity. Like that "new telecoms" breed that does not own any cables or antennas. Just marketing, HR and finances. And a call center lacking any competence (because with no infra, what could they possibly do? :> ). And then they outsource their finance. And they go down or sell themselves when a zephyr blows...

Your thinking style promotes an eggshell type of business. But it's cheap!

I would consider these irrational reasons.

1. Yea if you don't lock your store you will get robbed and be held liable for the loss, your insurance won't pay out because you didn't sufficiently protect your liabilities.

2. Businesses buy insurance and secure their equipment because they are held liable for this, the cost is included in the cost of the product.

2a. If a ransomware attack hits your organisation it will destroy your reputation because customers will notice that you don't take their protection seriously, they will leave for a more expensive but reliable product that takes their business and clients seriously and actively works to protect them.

I get your arguments they just don't make much sense from the perspective of a business and its liabilities.

"If a ransomware attack hits your organisation it will destroy your reputation because customers will notice that you don't take their protection seriously, they will leave for a more expensive but reliable product that takes their business and clients seriously and actively works to protect them."

In many cases, you cannot really switch.


Ransomware just hit ticket machines on a UK railway. I do not think that this will make people who regularly take that railway seek alternatives.

I think buying insurance would be a better analog to having good backups. You save some money by not buying insurance, just like when you skimp on IT infra. Warehouse fires / ransomware gangs will cost more if they happen, but if they don't you came out ahead (I guess).

I think having working backups is closer to locking your store and having security cameras than it is to hiring a team of soldiers.

"If a $2m ransomware attack hits your organization every 5 years"

The question is - how do you know that it is going to hit you every 5 years?

This isn't a completely random event. If you build up a reputation of being a soft target, other hackers will try to dip their beaks, too. And there are a lot of them out there.

Paying even one Danegeld attracts more Vikings to your shore.

> they don't want to hire hardcore commandos to protect their property

Brick and mortar stores don't need hardcore commandos. The recent surge in ransomware is actually a good thing as people slowly start to care about the obvious: that if they build their business on something that has a weak link, breaking this link will compromise their business, so it's their job to make sure it never happens.

This means asking awkward questions to people who are in charge of your IT infrastructure, whether in house or outsourced. What happens if this computer room is set on fire? What is our strategy of dealing with ransomware attacks? How long will it take to rebuild the systems after they are compromised? These are valid questions to ask as a business owner, and if you don't know the answer to them, you are to blame when the worst happens.

On Point 1....Yes; and besides that most Brick and mortar stores have insurance for catastrophic loss...so essentially they have a working backup.

And most physical items in a store can easily be replaced. If a criminal is holding your shop ransom, it's not for intellectual property.

> (My point is that incompetence doesn't make it morally ok that a criminal thing happens to someone)

There is negligence, however. If you leave the door wide open and unsupervised and something gets stolen your insurance company will be much less understanding. That does not say anything about how theft is "morally okay", just that negligence is not okay.

It strikes me that companies need to look back at covid and consider the value IT provided to their business during that period, and the extent to which it has grown their business since the 90s.

Then take that and allocate a reasonable chunk of that growth (say 15%) to ensuring that can continue, through ongoing investment in IT. Their alternative is to abandon the internet, go back to paper, and dump their IT costs (and boost to business).

Unfortunately though, as long as there are bean counters trying to cut every outgoing and outsource every cost, and play accounting bingo to turn everything into monthly opex, it's unlikely to change.

Most companies need skilled technical people because the sheer aggregation of risk in a few outsourced providers (see Kaseya recently) shows that they won't be top priority when something hits the fan. If they want to be top priority, they need the people on payroll and on site. Not everyone needs a group of rockstar 10x-types, but we do definitely need better fundamental IT knowledge and ability to solve problems. And business needs to make clear this is what's needed in terms of job adverts and compensation - supply tends to rapidly learn to deliver what's valued... If you can convince bean counters to pay anything for it, that is...

Companies for whom a data loss will cause significant distress/damage to the public should be penalized for that loss, up to and including jail time for their officers.

Companies for whom data loss doesn't impact the public should be able to screw around however they want.

The very nature of a limited-liability corporation encourages picking up nickels in front of a steamroller. Even holding the companies liable for the damages they do isn't enough, because they'll take risks that they can't ever repay (and jailing their officers might make us feel better but it can't un-steal your identity).

> they'll take risks that they can't ever repay (and jailing their officers might make us feel better but it can't un-steal your identity).

I'm not sure exactly who the officers of a company are (genuinely) but if we're talking about the decision makers - the board and CEOs and whoever - not the employees, then most of them aren't going to take risks that make them genuinely likely to go to jail. Creating that genuine risk is probably the only way of manipulating their behavior, especially when they haven't got any real skin in the game (e.g. a CEO paid a few million and some irrelevant stock holdings in the company that make their net worth high, but which are safe to lose).

The main issue is that such people are often well enough connected that they can spin a story that it's completely unreasonable to hold them responsible for their decisions. Personal responsibility is something for poor people. Someone will find a way to make the law that says legalese for "If personal data from a company is stolen because they chose to interpret the risks in a way that unreasonably exaggerated the mitigation costs and downplayed the restoration costs, or because they failed to consider the risks to the personal data they ought to have known they were collecting, then the CEO should go to prison for eight to eighteen months" mean "If the CEO personally steals personal data from a company and sells it to foreign agents, then they get twenty days in a luxury resort - but if they just use it to increase the cost of customers' insurance, then they get saddled with a new high paying job, poor fellow".

I do wonder if the workplace safety/health and safety approach can be used here to good effect - even if your company's activities are nothing to do with safety, your workplace has to, by law, be safe, and company officers are responsible legally.

The common message I hear about security is "it's not part of our core business". Safety was made (at least in some countries) to be part of your core business, as an unavoidable obligation. Nobody can use lack of information, capability, skill or awareness as a get-out for poor safety practices - you just have to do better. If we had the same with security, it might get the attention of the board and its members.

I think people are afraid to take the first step. For example, you really want to continuously test your backups, and that's not a straightforward engineering problem. But, before you invest time in that, you could just test them manually every month or so. A lot of people let great become the enemy of good -- if you restored one backup successfully, you're way ahead of most of the industry. Sure stuff can break in between the manual runs, and disaster can easily strike while they're in the broken state. But, that's less likely than "oh, our backups missed a critical table" or something.
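A manual check like that can still be scripted so that "our backups missed a critical table" shows up right away. Here is a minimal sketch, assuming the backup has already been restored into a scratch SQLite file; SQLite, the function name, and the table names are stand-ins chosen for illustration, and a real system would point the same idea at its own database.

```python
import sqlite3


def missing_or_empty_tables(restored_db_path: str, expected_tables: set[str]) -> set[str]:
    """Return the expected tables that are absent or have zero rows in the restored copy."""
    connection = sqlite3.connect(restored_db_path)
    try:
        present = {
            row[0]
            for row in connection.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        }
        problems = expected_tables - present
        for table in expected_tables & present:
            # Table names come from our own hard-coded list, not from user input.
            row_count = connection.execute(
                f'SELECT COUNT(*) FROM "{table}"'
            ).fetchone()[0]
            if row_count == 0:
                problems.add(table)
        return problems
    finally:
        connection.close()
```

Called as `missing_or_empty_tables("restored.db", {"customers", "orders"})` (hypothetical names), an empty result means every expected table came back with at least one row. It is a smoke test, not proof that the data is complete or current.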

I think doing a disaster recovery exercise every few months is also highly valuable. You might think you know how everything works, and that you've covered everything, but remove permission from staging for everyone on your team and have them build it from scratch, and you'll figure out all the things that you forgot about that silently work in the background. (Last time we did this, we realized we didn't back up our sealed secret master keys -- they get auto-rotated out from under us. So we had to provision new passwords and recreate the sealed secrets to recover from a disaster. Now we back those up ;)

(A corollary: if you've had really good uptime in the last year, your customers probably think that you offer 100% uptime. But you don't, and they're going to be mad when you have an actual outage that the SLO covers. So it might be worth breaking them for 5 minutes a quarter or something so that they know that downtime is a possibility.)

One more point that I want to make... sometimes the cost isn't worth it. If you're a startup, you live or die by making the right product at the right time. If your attention is focused on continuously testing your backups, your time is taken away from that core challenge. While a hack where you don't have a backup is likely to kill your company, so is having a bad product. So, like everything, it's a tradeoff.

Or...you know....as with all technology, just make doing the _right thing_ easier and cheaper.

I don't want to hire the "hardcore nerds" that gatekeep expertise behind ridiculous rates and tribal knowledge, fwiw. I'd much rather pay for services, incrementally adoptable technology and clear roadmaps.

> I don't want to hire the "hardcore nerds" that gatekeep expertise behind ridiculous rates and tribal knowledge, fwiw.

Seriously? Most of the Archlinux/Linux (where nerds congregate) automation and customization stuff is open source. Lots of people sharing their configs, and you can copy from them.

Also, why is it not okay for nerds to command high rates? Doctors do it. Lawyers do it. Politicians do it. And most of them suck at their job, by the way. So if a "hacker" could deliver on his promise, I think paying him a high white-collar rate is quite fair.

> Seriously? Most of the Archlinux/Linux (where nerds congregate) automation and customization stuff is open source. Lots of people sharing their configs, and you can copy from them.

Yep! So tell me again why I should employ what the previous poster referred to as "hardcore nerds"?

> Also, why is it not okay for nerds to command high rates? Doctors do it. Lawyers do it. Politicians do it. And most of them suck at their job, by the way. So if a "hacker" could deliver on his promise, I think paying him a high white-collar rate is quite fair.

I'm responding almost entirely to the idea that "hardcore nerds" are the people to employ, and "muggles" are not. Paying for expertise is great! But really, as with all technology, I want to pay for people to make things easier for those around them. It strikes me as painfully obvious that, if you're crying out to employ "hardcore nerds" because "muggles" can't handle the work, you're also probably not the type to employ at all, either way. This is just my take, however.

The point is you want hardcore nerds relative to your organization. At the coffee shop level that might just be paying someone $15 an hour rather than winging it. It’s little different than calling an electrician rather than stringing extension cords everywhere.

At the other end, Fortune 500 companies have unique needs which require significant expertise. At that level trying to outsource most stuff is perfectly reasonable, but they still need internal talent to avoid being ripped off or suffering major disasters.

I am again speaking purely about the elitism and arrogance of the phrasing from the original post. In my experience, the absolute best people in a given field aren't even close to what I would call a "hardcore nerd", and those same people would also certainly not condescend to those with different skillsets via referring to them as "muggles".

Pay people that have a skillset to do a job. But note that, for most orgs, the job _isn't_ "Do a job and be a dick about it", which is what we've been talking about. Being successful in a role means working with others for a common goal. That's literally what companies are (or, I should say, were meant to be).

You are letting your own prejudices against your idea of "nerds" get in the way of logical thinking here. Your biases are showing. "Hardcore nerd" as used here is not what you seem to think it means.

It seems more straightforward that it’s a difference in the understanding of the semantics, rather than someone’s biases against the concept of a “nerd”. Words and phrases mean different things to different people, especially in a global context like here, so it’s best to be cautious and generous when trying to interpret others.

I don’t think you can be at the top of any technical or competitive field without a level of obsession beyond the norm. Athletes eventually need to go beyond the weekend warrior basics. The bare minimum for doctors is to study for recertification, but you don’t hit the top without voluntarily keeping up with the latest research etc. And as easy as it is to coast as a developer, sysadmin etc, actually being at the top requires unusual levels of dedication.

Remember many extremely gifted athletes never make it onto a top college team, let alone the NBA etc. Similarly someone can be unusually intelligent and well educated, but that alone isn’t enough. Many people talk a good game, but being the best isn’t about having a large ego, it’s about actually solving problems and improving things.

You should employ them, because you run a business that depends on IT that you'd like to keep running without paying protection racket money. Or so I presume. If you enjoy paying a mafia, then by all means, do so.

I usually employ people that are both technically competent and capable of working on a team and communicating.

You probably have some negative experience with nerds, but what the poster meant I think is just people with interest and hard skills in computers. It doesn't mean they shouldn't be able to communicate, it means they should be able to also do other things than communicating.

Given your lack of communication skills displayed here you send up more red flags than a matador. The problem is you, not the people you're hiring.

Thanks for the feedback I guess.

Edit: it wasn’t really feedback. Closer to a complaint.

> Yep! So tell me again why I should employ what the previous poster referred to as "hardcore nerds"?

You employ who can get the job done. I don't really care what they are called.

I answered you on the "gatekeeping" part. If there is something I like about this community (IT), it is that there is way less gatekeeping than in any other profession.

> Yep! So tell me again why I should employ what the previous poster referred to as "hardcore nerds"?

Because the referred "hardcore nerds" are the only ones who actually bother[1] to figure out what piece of config to copy where?

[1] most of what most people think is hardcore nerdiness is just this: https://xkcd.com/627/

> Or...you know....as with all technology, just make doing the _right thing_ easier and cheaper.

This has proven, over and over, to make things worse. Important levers get removed because Joe Sixpack can't look at more than a label and a few buttons without freaking out. I'm sick of watching wonderful technology decay to uselessness or annoyance because some rent-seeking wantrepreneur thought he could build a business off expanding the audience for everything.

It's been proven, over and over, that making things easier and cheaper harms adoption? I'd love a source on that.

Adoption wasn't mentioned. Things can get worse and become more popular.

It increases adoption at the cost of everything turning to shit

The argument, "Everything being harder and more expensive is better", seems like an obvious troll take. Don't think I'll keep down this thread. Good luck!

Everything harder and more expensive is better... in complex, complicated, and intricate systems. We improve tools for brain surgery, yet expertise is still required to do that job. If we dumb that down too much, so that an understanding of the brain is no longer required, there will be a point of outcomes becoming poorer instead of better. The same is true of complex networked systems, storage, and data.

The origins of this post are about the lack of availability/viability/adoption/success with regard to backups. The gripe is that companies don't care enough about it. How would you go about increasing the adoption and success in data management/disaster recovery? Would you make it harder and more expensive, or would you make it cheaper and easier?

I don't think the answer to that question is as obvious as you seem to. Many companies care a lot about whether they're complying with the law, perhaps because complying with the law is often hard and expensive. They'd never dream of hiring someone's nephew to do all their legal stuff on the cheap. Perhaps if sysadmins were a similarly exclusive guild, companies would take those responsibilities equally seriously.

What does nepotism have anything to do with the question I asked?

You're being deliberately obtuse. The fact that nepotism can outweigh other considerations in who gets hired for IT duties (where it would not for e.g. legal duties) is an indicator of the lack of seriousness with which IT is regarded.

What are you even trying to argue against?

That was the whole point of the previous post. It is already cheap and easy and there is no gatekeeping, ridiculous rates or tribal knowledge. Just hire anyone mildly knowledgeable and do more than nothing. In 2021 there is no excuse anymore.

All your vitriol is just based on your projections of your misunderstanding of the phrase "hardcore nerd".

Well if companies have no special needs they can get Chromebooks and gsuite.

> I don't want to hire the "hardcore nerds" that gatekeep expertise behind ridiculous rates and tribal knowledge, fwiw.

I guess you never worked with lawyers (the top ones) or surgeons.

Easy to whine online, harder to raise money, hire the "hardcore nerds" and ship something!

>If people don't want to hire the hardcore nerds that know this shit

Sorry, all those people had to get jobs at AWS, Digital Ocean, Heroku, etc. years ago if they didn't want to be Puppet jockeys for the next mumble years. Frankly I wouldn't be surprised if the shared-hosting "cloud" companies didn't actively push DevOps as a way of reconfiguring their hiring landscape.

I think cloud-based businesses are way more likely than homespun solutions to actually be able to stand up a system from backups quickly, so I'm not sure what you mean about the cloud being "our downfall." Specialization is a sign of maturity, isn't it? A few hundred years ago you could more or less know what there was to know in multiple fields of science, but this is no longer possible.

Quite, when you build a system in AWS you are hiring a bunch of hard core nerds to do all this shit for you. That's the deal. You don't need to set up your own restore server to verify your tape backups every night, you just archive your snapshots to Glacier. You can even set up a restore environment in a new VPC for zero capital costs and just spin it up once a quarter to verify the process. It makes all of this stuff an order of magnitude easier and cheaper.

> If people don't want to hire the hardcore nerds that know this shit

Put a cost to this then perhaps your argument will be believable, or you will see it's not a viable option.

This is what companies like Accenture try to do. They are not cheap and I'm not sure if they have withstood a ransomware attack.

Butch Cassidy: If he'd just pay me what he's paying them to stop me robbing him, I'd stop robbing him.


Says a million finance people that prefer armies of compliant muggles over a few self-important wizards.

You can’t just man every brick and mortar out there with “hardcore nerds”. There simply aren’t enough of them.

"Test your backups" is so easy to say, but quite difficult for many to do.

It's difficult to do, yes. But it's difficult in a "dig a tunnel through that mountain" way, not in a "solve this mysterious math problem" way. It certainly could be done. It would just take time and money (money including hiring people).

People constantly point to the difficulty of backups and the difficulty of hardware level separations between systems. But these are merely difficult and costly. "Not being hacked" and "writing always secure code" are impossible and so they won't protect from backup failure.

And yes, companies would rather spend money on other things. That's what it comes down to.

It's interesting that you make that comparison, because I would choose the mysterious math problem every time. Probability of success is about equal, but it's far less dangerous to do math than to cut through a mountain.

Nitpicking an analogy goes against the point - it simplifies some ideas at the cost of modifying the details.

I think OP meant to compare a known hard task vs an unknown hard task. Both are hard, but first is known to be possible to finish, but the other not so.

I agree that's what he meant, but it's just interesting to me that for the known hard task he took something that also comes with danger. Exactly the feeling that I get when I think about restoring backups.

Well, one of the things about testing backups is that (if you are doing it remotely right) it reduces the danger when you need to actually restore.

A widely accepted method for judging at least subjective risk when there are insufficient facts for truly objective measure is to ask how much you would bet on various outcomes - such as whether P=NP is solved before New York's 2nd. Ave. subway is completed.

Digging through a mountain actually doesn't have to be dangerous. Assuming modern methods, how dangerous it is depends on how many resources are spent on the process.

So, in a sense I think that part captures the problem. People object to testing backups using the logic of "with our shitty processes, it will be dangerous". I suppose you have companies with a logic akin to "due to being cheap and arrogant, we have shitty processes" -> "due to shitty processes, our systems are fragile" -> "due to being fragile, we avoid anything that would stress our system" -> "due to our system not being able to take stress, we just pay the ransom instead of stressing the system with a backup restore".

It seems like we've reached the "throw-away enterprise" level. Build it cheap until it breaks, then walk away when the fix cost is too high. There's a cost-benefit to this. Reminds me of bandcamp and other declining sites that just vanished one day with information some would consider valuable.

No, it requires baseline professional competence.

I’ve worked with 9-figure turnover entities broken by this sort of thing and the first recommendation is always fire or manage out the CIO, risk/audit officer and/or CFO.

Everyone cries about having no money. What is lacking is an ability to identify and manage risk that puts the existence of the company as a going concern at risk.

Honestly, this is one of the problems the cloud is great at solving. We keep things in separate projects, and restore a backup from one production project (Kubernetes, database servers, etc.) to a special DR project we have set aside. The only step we do not do is update the front end DNS, and then we run our tests after our Infrastructure as Code deployment is complete.

> Honestly, this is one of the problems the cloud is great at solving

It's also one of the problems the cloud is great at causing. It's certainly possible to architect maintainable (and repeatable) infrastructure in the cloud. But it's just as easy (or easier) to deploy a mess of unorganized VM's that were launched and configured by whoever needed one.

So a complete hot standby?

Sounds like a DR test; spin up an entire new thing to make sure the backups worked, then turn it off. Costs a few hours of extra resources.

I do something similar with my home mail server (not sure I'd go with self-hosting if I started fresh today, but it has run without issue other than maintenance for upgrades & such for well over a decade). I have a second copy running in a low spec VM. Every day it wipes itself down and restores the latest backup. I check regularly (a couple of times per week) to see that it is running and has copies of recent mail (that part could be more automated but I've never got round to it...). It wasn't set up for DR specifically, but it is on a separate network so if push came to shove I could ramp up its resources, change a few DNS settings, open up relevant bits of its firewalling, and have it take over, if the main server failed in a way it hadn't copied yet.
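The "has copies of recent mail" check is the part that lends itself to automation. A minimal sketch in Python, assuming the restored mail sits in a directory tree of ordinary files (Maildir-style); the 48-hour window and the function names are made up for illustration, not anything from the setup described above:

```python
import time
from pathlib import Path
from typing import Optional


def newest_mtime(root: Path) -> Optional[float]:
    """Return the latest modification time of any file under root, or None if there are no files."""
    mtimes = [p.stat().st_mtime for p in root.rglob("*") if p.is_file()]
    return max(mtimes) if mtimes else None


def restore_is_fresh(root: Path, max_age_hours: float = 48.0,
                     now: Optional[float] = None) -> bool:
    """True if the restored tree holds at least one file newer than max_age_hours."""
    newest = newest_mtime(root)
    if newest is None:
        # An empty restore is treated as a failed restore.
        return False
    now = time.time() if now is None else now
    return (now - newest) <= max_age_hours * 3600
```

Run from cron after the daily wipe-and-restore, a False result would be the cue to send an alert. It only proves that something recent arrived, not that every mailbox is complete.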

Could be; it's ambiguous. But yeah, not updating DNS could be just declining to test operation in the wilds of outside traffic before tearing it all down.

That said, in this way, maybe a periodic "DR" that actually replaces the current operations would be a helpful...well, not test, but..."resilience practice" maybe? It could be a new twist on continuous deployment: continuous recovery.

> There are a lot of shops that probably don't know how to recreate a machine from scratch.

You can't fix already broken processes. VMware solved this 20 years ago. It is pretty simple to restore VMs on different systems, you don't need to worry about the ball of mud when you can duplicate it.

You can't really serialize something like an ami. So how are you going to make an offsite backup? Things need to be relatively simple & reproducible otherwise you will get bitten in many different ways due to strategies like this.

If you want to backup an individual AMI you're probably doing it wrong. That probability goes to near certainty when you're talking about serializing it for off-site backup. Backup the deployment automation and the data, sure.

I agree, that was kind of my point. He was talking about taking VM's but in a cloud environment this is a bit more awkward. Copying and storing VM's securely is not hard, but transferring AMI's is the only rough equivalent I know of in the cloud world. Ideally, you don't have to do this. But for one part of my current stack, the configuration of a specially configured Windows box has been lost for a while. Rebuilding from scratch has failed each time it has been tried.

> You can't really serialize something like an ami.

Copy AMI to separate AWS account, not in your Org, and keep keys to that account offline.

oooo true, that does invalidate my point. AMI's are very easy to copy-over across accounts, i.e. to a potentially firewalled account.

Every business in the world hires accountants to balance their books, lawyers when they need to file paperwork or respond to a problem, mechanics, electricians, plumbers and other professionals to fix physical stuff. Yet when it comes to software the most they will consider is an online monthly fee for something cheap and out of the box, and when things get complicated or go wrong "this is out of our area of expertise" is always the excuse. It's really time to treat software professionals, especially when related to security, as a core requirement for EVERY organization, starting from a 2 person mom-and-pop shop.

> It's really time to treat software professionals, especially when related to security, as a core requirement for EVERY organization

Some organizations do, some don't. Long-term valuation seems to agree with the former, short-term with the latter!

Those other groups have professional organizations with licensing. Plumbers will almost always refuse to do maintenance work on any work done by an unlicensed person.

.... software/IT just hasn't had people willing to do that.

If all you need is "person who is able to properly administer Windows systems", there are things like Microsoft's certifications.

Many places have a separate "test" or "beta" environment, with less resources and maybe a small database with spoofed entries.

Maybe we should get into the habit of going all Shiva/Brahma on that environment every week or month. Burn it to the ground, and recreate it with an automated process. Sort of like a meta CI test.

Beautiful resilience.

This is all solved, it just takes money and typically bringing in outside experts. Occasionally it will require changes to apps but most of the time it can be retrofit.

No it isn’t easy, but it’s also not an impossible task.

I think easy or hard are not even appropriate terms. As a business owner it is not easy or hard, it just costs an amount of money. If you pay enough money, you can get the result. You just have to decide how much a disaster costs and how much preventing a disaster costs to work out if it’s worth it.
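That comparison can be written down as a back-of-the-envelope calculation. A sketch in Python, assuming you can put rough numbers on the cost of an incident and its yearly likelihood; the function names are invented here and any figures you feed in are your own estimates:

```python
def expected_annual_loss(incident_cost: float, annual_probability: float) -> float:
    """Expected yearly loss: cost of one incident times its chance of happening in a year."""
    return incident_cost * annual_probability


def prevention_is_worth_it(incident_cost: float,
                           annual_probability: float,
                           annual_prevention_cost: float,
                           residual_probability: float = 0.0) -> bool:
    """Compare doing nothing against paying for prevention that lowers the incident probability."""
    do_nothing = expected_annual_loss(incident_cost, annual_probability)
    with_prevention = (expected_annual_loss(incident_cost, residual_probability)
                       + annual_prevention_cost)
    return with_prevention < do_nothing
```

With placeholder inputs of a 2,000,000 incident at 5% per year, spending 60,000 a year to bring that down to 1% comes out ahead, while spending 200,000 a year does not. The hard part is estimating the inputs honestly, not the arithmetic.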

It doesn't even require outside experts, just one or two competent sysadmins on staff.

> How do you test that in isolation?

Is it easier to test after the ransomware attack?

Actually it is easier because you get to "test" it by reloading on the actual hardware, not separate hardware.

To do a solid test, you need to restore the system. It's difficult to restore a running system because, well, it's running.

That typically means a parallel environment. A single box represents a bunch of "magic values" that are stuffed in some config somewhere. Imagine several of those. "We need to restore SQL Server, the AD Server, the Application Server..." Reloading on a new environment is an easy way to find out those magic numbers, typically the hard way. Restoring on your existing hardware, with existing networks, existing IPs, etc. you're laying your software and configs over a known, working environment.

How do you test a recovery of licensed software that's bound to the machine that you have running in production, for example? "Oh, sorry, you have a duplicate license detected" and it shuts down your production system for a license violation. "Sorry, we detect the wrong number of cores", or whatever other horrors modern licensing systems make you jump through. You DO have another dongle, right? (Do they still use dongles?)

It can be far easier to do an image restore to your already working system than trying to load it up on something else. Since your production box is horked anyway, an image restore should "just work". But testing it, that's another story completely.

Who is configuring physical servers by hand? Are you a part of some kind of museum exhibit? In all seriousness, you're talking about a business which is carrying a truly staggering amount of risk. If the margins are so tight they're running hand-rolled physical servers then restoration is a moot point. They'll go out of business if attacked.

If you're not testing your backups, then they're broken with close to 100% certainty.

Yes, actually.

Before the attack, all machines are active and being used. If you screw one of them up that happens to be running something obscure but vital, the screwup is your fault.

After the attack, all machines are dead so the screwup is blamed on somebody else. If you have the email/memo (you did print it out and file it, did you not?) showing that you informed the CTO/CIO, blame for the screwup will get buried.

All this "but it's so hard" whining should be balanced against one simple fact: If you get hit by a ransomware attack, you will be trusting the attackers to restore your system.

Yep. And if it is so hard to do, how come the attackers can restore your systems when you pay them?

I think what you are describing is a perfect advert for declarative/immutable infrastructure. Yes, it may require work and talent. But that's the price to pay for resiliency. Invest and modernize your tech stack or go bankrupt. Cloud tech can help. As can Guix/Nix.

You should definitely try.

Bare metal backups are a good place to start if you have a complicated system.

A lot of environments aren't as complicated, and you COULD just pull your hard drive out, put in a spare, and test how your backup works. At least, test it while airgapped to see if it comes up at all.

Testing the backups is harder than you think... It's not like you are going to double up your entire server fleet just to see if you can restore everything from backup. You maybe test restoration of one or two servers and then assume the rest will also work. And you probably have some redundancy so that the data is saved on different machines plus a backup plus an additional backup. Then if things go down, which is very, very unlikely, like a coordinated nuclear attack on several data-centers, but of course can happen, you assume you will figure things out... And then someone will ask: so what happens if your team also dies in the nuclear attack? Then you add one zero to the backup price/cost estimation.

Finding something reasonable between "we're at risk for a nuclear attack" (a black swan event) and "we're at risk for having our root credentials exfiltrated" (a daily occurrence) is not hard. I feel like these kinds of dramatizing hyperboles are nowhere near the nuts & bolts of the situation.

And honestly, why not double up your entire server fleet for a temporary build-from-scratch rehearsal? Many shops could quintuple their infrastructural costs and still sit far above being in red. Most software enterprises in the modern day don't reap economic value as a function of how well they can convert hardware resources, but human resources. Optimizing on infra costs is not a main priority for any shop I've seen. I imagine not even for an IaaS provider these days.

That's the point of running these tests though. Take a system down (by turning it off, and putting it safely in the corner). Then bring up a new machine to replace it. Go through the steps you think would be the proper procedure, then document the shit out of what didn't work and what needed to be done. Write that up so that it becomes the new procedure. The next time you run this quarterly test, you start with the latest procedures. Updating accordingly, rinse, repeat.

When you work with the old systems, you are always afraid to shutdown the machine, because more often than not, it will not boot up. And if you petition to replace it with a new machine, you may very likely hear the phrase "We are planning to move the app to cloud, so no need to build a backup in the meanwhile".

Back in the day, we had purchased a new, bigger machine, and transferred over, and everything was just peachy.

Months later, we had a power outage (scheduled I think, I don't recall).

Anyway, at some point during the transition and such, I managed to have the machines hard mount NFS across each other.

As long as one of the machines is up, everything is rosy. But cold start? They were both hanging while restoring the mount (which, they couldn't because neither was "up").

Took us about 1/2 hr to suss out what was happening and get single user in to tweak it, but...yea...exciting!

"Smart" people do this kind of innocent stuff all the time.

"What if it shuts down tomorrow?"

I think a big part of the issue is that backup processes are designed for a small number of machines failing. Then the IT department restores data from backup, and manually adjusts the config and software till stuff is working.

That process works well till ransomware comes in and destroys every server and client machine at once, and suddenly you've just given the IT department multiple years worth of work to all do at an emergency pace.

This is like a sales pitch for just paying the ransom.

It probably works more often than not.

It works until it doesn't, such as in a hardware failure.

[1] https://www.anandtech.com/show/15673/dell-hpe-updates-for-40...

Seriously, I started running through all those points he made and started thinking about other things he didn't mention and then halfway I was just like fuck it.

It's a prisoner's dilemma.

> There are a lot of shops that probably don't know how to recreate a machine from scratch. How many systems are developed as balls of clay. Little bits added and smeared in over time until the ball just gets bigger, but each piece lost in the process. How many folks can go through their local config files and explain all of entries, how many can even tell which ones they have changed, or why? Especially when they were changed by Frank, but he left 2 years ago.

This is a non-issue with Git + Ansible, even without getting into tools like Terraform. At my dayjob, I set it up so that there's an Ansible playbook that does around 200 administrative tasks for each of the servers for a particular environment - all of the configuration is within Git repositories. Changes are applied by CI, and the whole process is also thoroughly documented for local development, if necessary.

Everything from installing packages, creating or removing user accounts, setting up firewall rules, setting up directories and sending the necessary configuration, systemd services for legacy stuff, container clusters, container deployments, monitoring, container registries, analytics, APM and so on are handled this way.

No one has write access on the server itself (unless explicitly given in the playbook, or the admins) and even the servers themselves can be wiped and reinstalled with a newer OS version (mostly thanks to Docker, in which the apps reside), plus all of the changes are auditable, since they coincide with the Git repo history.

It took me maybe 2 weeks to get that up and running, another 2 to handle all of the containerization and utility software aspects. I'm not even that knowledgeable or well paid, there's very few excuses for running setups that don't let you do things "the right way" in a somewhat easy manner, like Windows Server. That's like picking a hammer when you need to screw in a screw - it'll do the job but chances are that there will be serious drawbacks.

So you do it anyway! The less sure you are that you can do this, the more important it is to do it now: you are more likely to remember the change that was made (and that person is more likely to be around).

30 years ago mainframe companies started realizing that their mainframes couldn't restart anymore - after many years of uptime all the on the fly configuration changes wouldn't be reapplied and so the whole couldn't restart. (all hardware had redundancies and backup power supplies, so any individual component could be replaced and most had been over time) So they started scheduling twice a year restarts to test that the whole could come back up. The mainframe itself is fully able to run for years without the restart, but the configuration wasn't.

A first step would be restore a backup, then check md5s of your executables\data, after all you are testing the backups not system functionality. A second step would be to run automated component level tests, or data integrity tests on the restore to verify that hashes aren't lying to you.

The primary problem you are protecting against is that backups are broken, or corrupted. A broken, as in failure to write, backup won't restore. A corrupted backup probably wouldn't restore either, even if it did it would fail the MD5 checks. No one is expecting you to destroy and repair Prod on a weekly basis.
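The hash check is easy to script. A rough sketch in Python, assuming you record a manifest of digests when the backup is taken and compare it against the restored tree afterwards; it defaults to SHA-256 rather than the MD5 mentioned above (pass algorithm="md5" if that's what you already have), and the function names are just for illustration:

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Hash a file in 1 MiB chunks so large files never have to fit in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path, algorithm: str = "sha256") -> Dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p, algorithm)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(expected: Dict[str, str], actual: Dict[str, str]) -> Dict[str, List[str]]:
    """Report files that are missing from, extra in, or different in the restore."""
    common = set(expected) & set(actual)
    return {
        "missing": sorted(set(expected) - set(actual)),
        "extra": sorted(set(actual) - set(expected)),
        "changed": sorted(k for k in common if expected[k] != actual[k]),
    }
```

A clean restore gives three empty lists. Files that legitimately changed between the backup and the manifest run will show up as "changed", so take the manifest from the same point in time as the backup.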

A failure to invest in business continuity and maintaining your systems will cost you one way or the other. Hard drive failure, fire, flood, lightning strike, or ransomware. These aren’t new problems facing businesses.

I provide a test system in a vm. I have an installation script for the db. Dump the db and replace paths, import, start application, done. This can then be used to test the new release before deployment.

bonus: the vm can be used for debugging or used as playground

Be mindful of what customer data you’re handling though.

Underrated problem of testing backups, generally. Making it easy/common can also mean greater exposure to risk of data leaks, if you don't take great care.

You don't even need system images. Just the data. So SQL backups and just file directories.

However for a company of our size, it's not really possible. I was talking to my lead about this last year and we have about 50TB of our entire system I believe stored in our databases. All he said was "we have one, but hopefully I'm retired before needing to find out how to restore from it."

Ransomware gangs go after large organizations, those that can afford testing backups, not after mom and pop coffee shops.

Is this true? I think big organizations just end up in the news more often.

Third-hand evidence, mom n' pop shops don't have the expertise to pay in bitcoin. Or perhaps, more charitably, they've been inoculated by previous low-effort scams and smartly assume bitcoin payments are another.

Honestly I'm not sure many large IT depts have it either, but for $1M+ the attackers can afford good customer service.

There was a school group featured in a report on BBC Radio 4 recently. They did handle quite large budgets, but they were also an educational charity. One of the governors said (paraphrasing) "they must be totally immoral to target us"; I very much doubt they were targeted beyond 'can we exploit this box'.

If your systems are in this state better pray you never experience hardware failure.

Well, that's why you pay more for PaaS than hosting it yourself, right? You can make it someone else's problem, to an extent.

Also, ransom gangs can take over your backups.

Keep offline, airgapped backups.

If only virtual machines and networks were a thing.

One thing I've always wondered: How do you prevent ransomware from ruining your backups, too?

Lots of ransomware tends to be "delayed" – it can't encrypt everything instantly, so it encrypts a little bit each minute. During those minutes, isn't it conceivable that the now-encrypted files replace your old backups?

I suppose this isn't really a "backup" but rather a "mirroring strategy." But for certain kinds of data -- photos, video, large media -- can you really afford to do any other kind of strategy?

The other question I have is related to that first point: since ransomware can't encrypt everything all at once, how is it possible for your system to continue to function? As you can tell, I'm a ransomware noob, but it's quite interesting to me from an engineering standpoint. Does the system get into a "half encrypted" state where, if you rebooted it, it would fail to boot at all? Or does ransomware targeted at businesses tend to be more of a surgical strike, where it targets and wipes out specific datastores before anyone notices?

(It's the "before anyone notices" part that I'm especially curious about. Isn't there some kind of alarm that could be raised more or less instantly, because something detects unexpected binary blobs being created on your disk?)

Other replies have backed into this, but the best solution is to

1. Use a COW (copy-on-write) filesystem like btrfs or ZFS

2. Set up snapshots to be taken periodically (hourly/daily) and sent to a different host or volume.

3. Monitor disk usage: if you get hit by a cryptolocker, your disk usage will approximately double as it rewrites all your files.

4. Manually backup snapshots or the full volume to offline storage every N days/weeks/months.

In case you missed it, I wrote this up a while back: https://photostructure.com/faq/how-do-i-safely-store-files/

TL;DR: Lots of copies keeps stuff safe!
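Step 3 (watching for disk usage suddenly ballooning) is simple to script. A minimal sketch in Python using only the standard library; the 1.5x growth threshold and the function names are arbitrary choices for illustration, and you'd want to record the baseline somewhere the monitored machine can't rewrite:

```python
import shutil


def usage_alarm(baseline_used: int, current_used: int,
                growth_threshold: float = 1.5) -> bool:
    """True if used bytes have grown to growth_threshold times the baseline or more."""
    if baseline_used <= 0:
        # No meaningful baseline yet, so don't raise an alarm.
        return False
    return current_used / baseline_used >= growth_threshold


def check_volume(path: str, baseline_used: int,
                 growth_threshold: float = 1.5) -> bool:
    """Compare the volume's current used bytes against a previously recorded baseline."""
    current_used = shutil.disk_usage(path).used
    return usage_alarm(baseline_used, current_used, growth_threshold)
```

This only fires on COW filesystems with snapshots holding the old blocks; on a plain filesystem an in-place rewrite may not change usage much at all.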

Another important step - make the offsite backup 'pull' based - so the credentials to access the data already there do not exist on the system being backed up.


My homelab, admittedly low complexity, uses a NAS device that powers on at a scheduled time, mounts the shares it needs to, runs the 'pull' backup, unmounts, and powers itself off.

The intention being that, in the event of intrusion, its presence and accessibility are limited to the window of time in which it's performing the backup.

In addition to that, a rotating set of removable HDDs as backups of backups that also get spread off-site amongst family members' houses.

I really should go into offering backup solutions to local small business...

5. Don’t use (only) direct disk access to backups

I think all of that is lost if the cryptolocker just formats any volumes named Backup.

The copy-on-write can’t just be enforced by the filesystem. If this computer can permanently delete content on the backup system, then so can the locker.

> 2. Set up snapshots to be taken periodically (hourly/daily) and sent to a different host or volume.

Does btrfs or ZFS have a way to pull snapshots in a way that they are encrypted on the client side ?

Ideally you could hire a third party to pull these backups from you, have them warn you when the process fails or doubles in size (the data is being encrypted) and still be able to prove that there's no way they can access the data. And then the private master key(s) go into a safe.

2. Always pull the snapshots from another host with tools like syncoid. This host must be inaccessible from network so it can’t be infected.

What's the point of COW? There are 0 tools that restore the "originals" of the copies on write.

In this context COW typically comes with cheap snapshots. And restoring from snapshots is trivial.

> I suppose this isn't really a "backup" but rather a "mirroring strategy."


> But for certain kinds of data -- photos, video, large media -- can you really afford to do any other kind of strategy?

Yes. Make the backup system for that big slow-changing data a moderate amount bigger than the primary data store, and then you can have months of retention at low cost.

If too much data changes at once then it should go read-only and send out a barrage of alerts.

You just use cold backups.

For home, I have two USB disks I use for backups and I alternate which I use. Neither is plugged in at the same time. At least one is always "cold".

For larger scale, you can do the same thing with tape. One tape backup goes off-site (perhaps) or at least cold.

The cost isn't that high. A USB spinning disk may cost a third to a 5th that of your SSD hard drive. And you can get hard drives up to 18TB now. But even a portable 2TB USB-powered 2.5in external hard drive is only $60, so this is a cheap and robust strategy.

> For home, I have two USB disks I use for backups and I alternate which I use. Neither is plugged in at the same time. At least one is always "cold".

Why not have one of those drives off site, and rotate every so often? Carry the drive with you when you swap, so that the original and all backups are not in the same place at the same time.

I have three external drives. Originally I planned to keep two offsite, but I don't have an offsite office anymore.

That mitigates the risk, but it relies on the assumption that you'll notice when files get encrypted. It's not a guarantee; malware can hide itself long enough for you to plug in both disks before you notice.

True, but I don’t rotate them every day. Maybe once a week or month. Unlikely to not notice by that time.

With tape backup, you might keep a tape in cold storage for years.

Your general backup strategy should follow something like doing full/incremental and/or snapshot based backups. So in the case of your media, if you do a daily snapshot then your daily backups would be very small. And if the media doesn't change that often you can keep weekly copies for several weeks, monthly copies for many months, and several years of yearly snapshots.

The other strategy is with tape rotation. You need to have about 30x the tape storage as you have online storage, so you can keep 7 yearly backups, 12 monthly, 6 weekly (all full backups), and 7 - 14 daily incremental backups.
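The retention schedule above (a handful of dailies, then weeklies, monthlies, yearlies) boils down to "keep the newest snapshot in each recent bucket". A sketch in Python, assuming one snapshot per date; the defaults mirror the counts mentioned above and the function name is made up:

```python
from datetime import date
from typing import Iterable, Set


def snapshots_to_keep(snapshot_dates: Iterable[date], daily: int = 7, weekly: int = 6,
                      monthly: int = 12, yearly: int = 7) -> Set[date]:
    """Keep the newest snapshot in each of the most recent N days, ISO weeks, months and years."""
    ordered = sorted(set(snapshot_dates), reverse=True)  # newest first
    rules = [
        (daily, lambda d: d),
        (weekly, lambda d: tuple(d.isocalendar())[:2]),  # (ISO year, ISO week)
        (monthly, lambda d: (d.year, d.month)),
        (yearly, lambda d: d.year),
    ]
    keep: Set[date] = set()
    for limit, bucket_of in rules:
        seen = []
        for d in ordered:
            bucket = bucket_of(d)
            if bucket in seen:
                continue  # a newer snapshot already represents this bucket
            if len(seen) == limit:
                break  # enough buckets for this rule
            seen.append(bucket)
            keep.add(d)
    return keep
```

Anything not in the returned set is a candidate for pruning. The rules overlap (the newest daily is also the newest weekly, monthly and yearly), so the total kept is usually less than the sum of the four limits.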

I use Amazon S3 buckets, with the web server having an IAM user key with very limited access that only allows writes.

No reads/listing.

So if the web server gets hacked, the hacker can only write to the bucket, but has no way to know what is already there or access anything in it.
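For anyone wondering what such a policy looks like, here is a sketch that builds one as a Python dict and prints it as JSON. The bucket name and prefix are placeholders, not the poster's real setup; the only thing granted is s3:PutObject, and since IAM denies anything not explicitly allowed, reads and listing stay off:

```python
import json


def write_only_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy document that allows uploading objects and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowBackupUploadsOnly",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                # Object-level ARN; no bucket-level permission such as s3:ListBucket is granted.
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }


if __name__ == "__main__":
    # "example-backup-bucket" and "webserver/" are hypothetical names.
    print(json.dumps(write_only_policy("example-backup-bucket", "webserver/"), indent=2))
```

Note that s3:PutObject still lets the holder overwrite an existing key if they can guess its name, so this is not append-only on its own.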

Make sure you have versioning turned on as well. Even if attackers can figure out your naming conventions and overwrite, you just go back to the first version and everything is good.

Would be better if there were a “create/write to new file” permission which doesn’t allow overwriting existing files.

> I suppose this isn't really a "backup" but rather a "mirroring strategy." But for certain kinds of data -- photos, video, large media -- can you really afford to do any other kind of strategy?

Saving diffs/snapshots will solve the issue. As long as the file doesn't change the cost is almost 0.

If you decided to use incremental backups to mitigate against this, what are your favourite tools or providers? Backblaze with duplicati? Duplicity with s3? Rsync and rclone without it being incremental?

Granted I'm a complete amateur here, but still I wonder if my approach is helpful. My home backup drives are running on Raspberry Pi, so I have control over what other software runs on them. I've been writing Python programs that run on the Pi and monitor for changes. My hypothesis is that if I do the right analysis, I will notice changes to the files unless the ransomware is capable of actually infecting the software running on the Pi itself.

I believe that I would detect unexpected binary blobs. Of course this depends on me writing the programs correctly, and a lot of other assumptions, but it might suggest a way to protect backups.

My backup "drive" is anything but passive.
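One common way to spot "unexpected binary blobs" is byte entropy: encrypted output looks close to uniformly random. A small sketch in Python of that idea; this is a guess at one possible analysis, not necessarily what the programs described above do, and the 7.5 bits-per-byte threshold is an arbitrary starting point:

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    """Flag data whose byte entropy is at or above the threshold."""
    return shannon_entropy(data) >= threshold
```

The catch is that JPEGs, video and zip files are already compressed and score nearly as high, so on a media archive you'd compare against each file's previous entropy or type instead of using a fixed cutoff.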

The change you detect would be your files being encrypted. This seems pointless unless you automatically roll back changes that don't seem okay to you (which means you need backups for your backups to be compared to...).

Indeed, this assumes the correct files are somewhere but not accessible to the family computers.

Linux executables can be modified while still running; this is what differentiates Windows updates, many of which require rebooting, from Linux updates. Only once you reboot do you realize your executables are tainted.

This really hits home, decades ago - I was working at this place that did daily tape backups. I remember thinking, this is unreal - there's literally a room filled with tapes.

One day, I asked if they ever had performed a recovery off of the tapes, as I questioned if the tapes were even being written to. (NOTE: Backups was not my job at all. )

Why had I brought this up? I would be in the server room and never saw the blinky lights on the tape...well.. blink. Everyone literally laughed at me, thought I was a grade A moron.

A year later, servers died... Pop'ed in the tape... Blank. No worries, they had thousands more of these tapes. Sadly, they were all MT. They had to ship hard drives to a recovery shop, and it was rather expensive.

I left shortly after this.

> Everyone literally laughed at me, thought I was a grade A moron.

A note for anyone else in a similar situation - a good team doesn't ridicule someone for questions like these. A responsible leader should have cited a time in the past that they did a restore or a spot check, and no one should have laughed. The laughter sounds like masked fear or embarrassment.

This goes for any team. "How do we know this function of our job does what we think it does?" You should have an answer. Now, I've only worked in R&D software and not in IT. But IMO IT teams should work the same way in this regard.

> a good team doesn't ridicule someone for questions like these

A good team won't ridicule any questions. If you're on a team that ridicules your questions, that's a huge red flag. Get out as soon as possible!

> A responsible leader should have cited a time in the past that they did a restore or a spot check, and no one should have laughed. The laughter sounds like masked fear or embarrassment.

... or assigned the engineer asking questions the task of figuring it out!

Right? I regularly ask seemingly-rhetorical questions "just to make sure", and this approach helps me catch tons of otherwise-unnoticed issues. Being curious and vocal is a valuable approach in any technical business, IMO.

Yeah, they didn't even doubt themselves for a second, instead of challenging their own beliefs or at least showing the person that the backups were working before laughing.

Pop'ed in the tape... Blank.

Modern tape drives (like LTO) will at least do a read after write so you should never end up with blank tapes after a backup. But still no excuse not to do restore tests.

And make sure you're not storing your backup decryption key in the same backups that are encrypted with that key. Likewise, make sure you're doing restore tests on a "cold" system that doesn't already have that decryption key (or other magic settings) loaded, otherwise you may find out in a disaster that your decryption key is inaccessible.

That assumes that you're even doing the write in the first place, and not just logging a million "Error device not found" on your backup task. Speaking from personal experience, haha.

OP implies something like that is what was going on:

> as I questioned if the tapes were even being written to.

> I would be in the server room and never saw the blinky lights on the tape...well.. blink.

To be fair, it seems like a lot of backup systems were (properly) designed to recover data for when a single computer or drive or database fails or gets overwritten or specifically attacked -- but not for a wide-ranging attack where every networked computer gets wiped.

All the stuff in this article are great scenarios to think about (recovery time, key location, required tools), but it's still all at the backup design phase. The headline of "test your backups" seems misleading -- you need to design all these things in before you even try to test them.

It seems like a real problem here is simply that backup strategies were often designed before Bitcoin ransomware became prevalent, and execs have been told "we have backups" without probing deeper into whether they're the right kind of backup.

In other words, there's no such single thing as "having backups", but rather different types of backup+recovery scenarios that either are or aren't covered. (And then yes, test them.)

IIRC in the Maersk NotPetya disaster they had to look worldwide for a domain controller in Africa that happened to be off at the time, but fix and patch it before bringing it online. Restoring from backups would leave you vulnerable if a worm is still bouncing around. It takes a big coordinated effort for larger companies.

Also the article doesn't seem to consider the fact that some hackers are now threatening release, not just destruction. Embarrassing emails, source code, and trade secrets. Backups won't help at all.

Yes, test your backups regularly.

When I worked for a large insurance firm, we would run drills every 6 months to perform off-site disaster recovery and operational recovery tests to validate our recovery processes. Everything was tested, from WAN links, domain controllers, and file backups to mainframe recovery and so much more. We were more or less ready for a nuke to drop.

Obviously this costs money, but if you're an insurance firm, not being able to recover would cost way more than running DR and OR recovery drills every 6-12 months.

Why would some companies be so diligent while others get caught with their pants down? Can we tell which is which? Might be a good etf to invest in.

Typically only companies that had a disaster happen to them or their customers (like that insurance) will have the institutional awareness. All the rest will file the risk somewhere with alien abductions and toilet paper shortages. When you tell them what could and will happen they will just shrug it off like you are trying to sell them useless bs.

> When you tell them what could and will happen they will just shrug it off like you are trying to sell them useless bs.

Exactly. It's that mentality which drove me to small-scale contract IT work for smaller "mom and pop" organizations. Give them a fair price and do good work and most of them are happy to have your services, treat you with respect, and are often more than happy to trade knowledge and services for equivalent exchange of same. This can lead to much "win/win/everybody wins!"

And if you take a contract that plays out in an unsatisfactory way, it's easy to simply turn down further contracts from the one problematic customer. More time to give your loyal customers, or hunt down a better customer to replace the bad one. ;)

I think there are 3 stages:

Inexperienced, will buy anything that sounds good and trustworthy, no matter whether snakeoil or real deal, because they don't know better. That is most mom&pop shops.

Burned, will buy/do nothing, because when they were inexperienced they were sold/told crap. Now they trust no-one and also think they can save money.

Experienced, when they had a real disaster in the burned stage, recognized their lack of proper tools and manpower as a reason. Now they try to evaluate suggestions properly through inhouse expertise. Only possible if large enough.

> Inexperienced, will buy anything that sounds good and trustworthy, no matter whether snakeoil or real deal, because they don't know better. That is most mom&pop shops.

Since switching to contract IT work and coming in much more direct contact with "mom & pop" shops than I did in prior years, I've come to realize that most "mom & pop" shops are far more business savvy than they're often given credit for. They mostly just don't have access to any sort of fair and reasonably priced IT folk who ain't tryin' to scam them outta house and home.

I've found that by offering that fair price and quality work, I can gain a level of loyalty that results in me not even needing to advertise my services to have more than enough work and profit to keep me goin' and happy with my career choice. "Word of mouth" is by far the best advertising you could ever ask for anyhow… Nothin' beats trust for generating "brand loyalty" and return business.

> Experienced, when they had a real disaster in the burned stage, recognized their lack of proper tools and manpower as a reason. Now they try to evaluate suggestions properly through inhouse expertise. Only possible if large enough.

I've come across these folk as well. They also tend to be able to recognize instantly when they're not bein' taken advantage of. This type has always been a good loyal customer type worth putting in a bit of extra effort for, too. Having been "burned" before, they recognize the value of payin' a fair price to an honest hardworkin' tech.

> Burned, will buy/do nothing, because when they were inexperienced they were sold/told crap. Now they trust no-one and also think they can save money.

The saddest example of the three, because they'll continue to suffer because their trust had been abused.

Money and time. Throughout my career, there's never been a moment where we're like "All right, let's sit down and assess where we are", or a "Ok we're finished with software engineering, let's do some chaos testing". There's always something that seems more important to do.

I'm now convinced most people are overworked and most SWE projects are overcommitting. I mean I'm currently the sole responsible for two codebases of nearly 300K LOC total, rebuilding the one into the other. At my previous jobs this would involve a fully staffed team of 4+ engineers, tester, product owner, etc - and they could probably use more.

Considering it was almost 20 years ago, just as the Internet was starting to take off and it certainly pre-dates things like Cloudflare, things like this were pretty mandatory. Couldn't tell you if it's still the case, but it did make me appreciate having good DR and OR plans if the nukes did drop.

> Might be a good etf to invest in.

Nah, it gets way outperformed by the "too big to fail bailout-monkey" ETF.

Unfortunately you need political connections to know the composition of that ETF.

Just go through Cloudflare's list of customers.

You would hope that an insurance company would be good at assessing risk.

Not really just a backup and restore. You need to be able to rebuild from zero. I think of it more as a disaster recovery exercise, and for those… you are only as good as your last _real_ rehearsal. That may mean a suitcase of tapes, a sheet of paper, and a rack of blank servers. Then you have the problem of release of confidential information. For this reason, the sweetest target for ransomware is the company that can neither recover its data nor afford to have it publicly posted or monetised by the gang. Oh, and you do store those backups offline, don't you? Ransomware gangs have been known to loiter and observe their target for weeks to learn how to sabotage backups when the time comes.

One thing that has irked me about everyone's flippant comments about moving to the cloud is that the "devops as a recovery mechanism" generally only works for single-app startups or small shops with only a few dozen simple VMs at most.

Some of my customers have thousands of VMs in their cloud, and they aren't cloned cattle! They're pets. Thousands upon thousands of named pets. Each with their individual, special recovery requirements. This then has a nice thick crust of PaaS and SaaS layered on top in a tangle of interdependencies that no human can unravel.

Some resources were built using ARM templates. Some with PowerShell scripts. Some with Terraform. A handful with Bicep. Most with click-ops. These are kept in any one of a dozen source control systems, and deployed mostly manually by some consultant that has quit his consulting company and can't be reached.

Most cloud vendors "solve" this by providing snapshots of virtual machines as a backup product.

Congratulations big vendors! We can now recover exactly one type of resource out of hundreds of IaaS, PaaS, and SaaS offerings. Well done.

For everything else:

Fantastic. No worries though, I can... err... export the definition, right? Wrong. That doesn't work for something like 50% of all resource types. Even if it "works", good luck restoring inter-dependent resources in the right order.

Okay, you got things restored! Good job! Except now your DNS Zones have picked a different random pool of name servers and are inaccessible for days. Your PaaS systems are now on different random IP addresses and no one can access them because legacy firewall systems don't like the cloud. All your managed identities have reset their GUIDs and lost their role assignments. The dynamically assigned NIC IP addresses have been scrambled. Your certificates have evaporated.

"But, but, the cloud is redundant! And replicated!" you're just itching to say.

Repeat after me:

    A synchronous replica is not a backup.

    A synchronous replica is not a backup.

    A synchronous replica is not a backup.
Repeat it.

Do you know what it takes to obliterate a cloud-only business, permanently and irreparably?

Two commands.

I won't repeat them here, because like Voldemort's name it simply invites trouble to speak them out loud.

This is a post written by a person who's been at this a while, and has spent at least a portion of that time as Cassandra.

Yeah, this guy has worked in an enterprise. Reminds me of the experience in big non-tech corporations. Years and years of terribly executed legacy SaaS and PaaS building into a permalayer of crap mostly created by contractors that will be gone in 6 months. The top level management don't understand tech, so they pay for cheaper and cheaper contractors to "maintain" the permalayer of crap. It's a never ending spiral of pain, hiring turnover, and bad code.

I guess the only thing we can do is to pray. In those enterprises probably no one knows everything and if one of them breaks who knows what happens...

“Your PaaS systems are now on different random IP addresses and no one can access them because legacy firewall systems don't like the cloud.”

The whole post should be printed out and pinned to the Kanban/Scrum/whatever board of every infrastructure/DevOps team, but this sentence in particular. This property of Azure (and I imagine every other cloud provider) was one of the nastier fights we had with the guys who run the on-prem firewalls.

Some companies just do Cloud so their yearly report contains the word Cloud you know?

Mostly they seem to do it to move CapEx into OpEx.

I never quite understood why spending more money is better if it comes from a different bucket. I'm sure there's some explanation that only makes sense if you don't look too closely.

I always assumed it had to do with taxes... CapEx can't be written off in its entirety, but rather has to be calculated as depreciation over time. OpEx, however, is a business expense that can be written off completely.

I'm not sure how that really benefits an organization with a time horizon that is longer than the time it takes to depreciate a server. You still get to write off the full price of a server, it just takes a few years longer (and it was probably cheaper!). But then again, I'm not in finance...

This is a work of literature.

I’m a novice and am dealing with data that isn’t too complicated, large, or important. My approach is to build restore directly into the normal workflow. I test my backups by using them each week.

A stack is spawned from a database backup and once it passes tests, replaces the previous one.

Not sure how smart this all is but my goal is to learn through application.

The main reason I think this normally isn't done is that it requires downtime to do safely most of the time.

In order to not lose data, you can't have any writes between the time when the backup was taken and the present, or you need code which reconciles additional state and adds it onto the backup before switching over.

Normally, backup restoration is done during a maintenance window where the site is disabled so no writes can happen, and then usually a window of writes are lost anyway (i.e. 'last X hours, since the backup was taken')

For your use-case, do you just have very few writes? Do you lose writes? Do you have some other clever strategy to deal with it?

It should be noted that not everyone is a global company.

A typical bank / credit union may only serve one town. As such, it would be socially acceptable to designate 3am to 4am as a regular maintenance window where services are shutdown.

Good point. The 5 minutes of downtime is simply tolerated. My captive audience are dozens of humans and thousands of robots all willing to try again.


> A stack is spawned from a database backup and once it passes tests, replaces the previous one.

I like this approach, although risky if you mean you routinely replace the production db.

My preferred setup is to automate restores into a pre-prod environment, apply data masking and run tests there. It's not a replacement for full DR exercises, but at least it automates the backup verification process as part of your build system.
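A minimal sketch of what an automated restore check could look like, using SQLite from the Python standard library as a stand-in for whatever database is actually in play. The function names and the "expected tables with minimum row counts" convention are assumptions for illustration; the data-masking step is left out entirely.

```python
import sqlite3
from typing import Dict, List


def backup_database(source: sqlite3.Connection, backup_path: str) -> None:
    """Write a consistent copy of the source database to backup_path."""
    dest = sqlite3.connect(backup_path)
    try:
        source.backup(dest)
    finally:
        dest.close()


def verify_restore(backup_path: str, expected_tables: Dict[str, int]) -> List[str]:
    """Restore the backup into a scratch in-memory database and return a
    list of problems found (an empty list means the restore looks usable)."""
    problems: List[str] = []
    backup = sqlite3.connect(backup_path)
    scratch = sqlite3.connect(":memory:")
    try:
        backup.backup(scratch)  # the actual "restore" step
        status = scratch.execute("PRAGMA integrity_check").fetchone()[0]
        if status != "ok":
            problems.append(f"integrity_check: {status}")
        for table, min_rows in expected_tables.items():
            try:
                # Table names come from trusted config, not user input.
                count = scratch.execute(
                    f'SELECT COUNT(*) FROM "{table}"'
                ).fetchone()[0]
            except sqlite3.OperationalError as exc:
                problems.append(f"{table}: {exc}")
                continue
            if count < min_rows:
                problems.append(f"{table}: {count} rows, expected at least {min_rows}")
    finally:
        backup.close()
        scratch.close()
    return problems
```

Wired into a build pipeline, a non-empty result from `verify_restore` would fail the job, so a backup that cannot be restored gets noticed the same day rather than during a disaster.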

It puts you 95% of the way there while providing many side benefits. This is what my team is targeting. Prod deployments are 0-downtime, but all the other deployments are fresh.

> A stack is spawned from a database backup and once it passes tests, replaces the previous one.

The stack replaces the previous one or the backup replaces the previous one? While having a single backup is a good start, you might want to consider keeping several backups so you can restore from, say, a data entry error that you discover two months after it happens.

The new cloud stack replaces the old which is then decommissioned.

Database images are immutable and a history of them are kept.

3-2-1 Backup Rule:

Three copies of your data. Two "local" but on different mediums (disk/tape, disk/object storage), and at least one copy offsite.

Then yes, absolutely perform a recovery and see how long it takes. RTOs need to be really low. Recovering from object storage is going to take at least an order of magnitude more time than on-prem.
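The rule as stated above is simple enough to check mechanically against an inventory of copies. A rough sketch, where the `BackupCopy` record and its fields are hypothetical and the inventory itself would have to be maintained by hand or pulled from a backup tool:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BackupCopy:
    name: str
    medium: str    # e.g. "disk", "tape", "object-storage"
    offsite: bool


def check_3_2_1(copies: List[BackupCopy]) -> List[str]:
    """Return the ways an inventory falls short of the 3-2-1 rule as
    described above: three copies, local copies on at least two
    different media, and at least one copy offsite."""
    problems: List[str] = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies, need at least 3")
    local_media = {c.medium for c in copies if not c.offsite}
    if len(local_media) < 2:
        problems.append("local copies are not on two different media")
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy")
    return problems
```

This only validates the inventory; it says nothing about whether any of those copies can actually be restored or how long that takes.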

Also, storage snapshots/replications are not backups, stop using them as such. Replicating is good for instant failover, but if your environment is hacked they are probably going to be destroyed as well.

Don't just test your backups. Make sure your automation can't clobber or tamper with your backups. This includes both local and disaster recovery sites. Give your pen-test team super-user privs on your automation and give them Amazon gift cards if they can tamper with your backups. If they can't mess with the backups, give the gift cards to whoever designed and hardened your infrastructure.

Why not actual money? Amazon gift cards leak metadata to Amazon, and can only be used to buy stuff from Amazon.

I think logistically it's easier for a team within an org to spend "their" money on gift cards for intermittent activities and hand them out as necessary. Getting stuff onto the actual payroll is probably more complicated.

Hey Payroll, edoceo needs an off-cycle bonus of $$$.

Your manager should be able to write a similar email.

At least at the company I recently left, this kicks off an approval process within both the HR and accounting departments. Meanwhile an Amazon purchase (and thus an Amazon gift card) is something I could put on my card and expense, or approve someone else doing myself.

I get it doesn't make sense, but that's corporate America for you.

That said, be careful of the gift card route. Depending on the amount, you can find yourself on the wrong side of the IRS that way.

I wouldn't want to embezzle funds and commit income/payroll tax fraud just to bypass paperwork

It doesn’t work like that in every place in the world. Those gift cards also aren’t that easy. It’s all taxable benefit to the employee and a cost for the company to put on the books. That needs to be justified and tax office may really slap you on the wrist for doing so.

Forget the backups - the pen test team can just produce fake emails requesting they get the $$$.

I work in a large company and a manager cannot do that. The most he can do is argue to assign you a bigger yearly bonus (at the expense of your coworkers).

It used to be that you could give employees gift cards up to a certain amount as awards and it would not be considered taxable income (but I believe that's no longer the case).

Any gift(s) up to a total value of ... $13k? -ish? I don't know what the limit is now. Google's cafeteria is (was? depending on that limit...) an example of how to benefit employees without causing the employee additional tax.

Setting aside the gift card bit (addressed in above comment), $13k sounds way too high. Like two orders of magnitude too high.

From irs.gov

> Whether an item or service is de minimis depends on all the facts and circumstances. In addition, if a benefit is too large to be considered de minimis, the entire value of the benefit is taxable to the employee, not just the excess over a designated de minimis amount. The IRS has ruled previously in a particular case that items with a value exceeding $100 could not be considered de minimis, even under unusual circumstances.

Which about matches with what I've seen at BigCo.

$40 box of tools as a gift? Did not show up on my paycheck.

$150 electronic device as a gift? Showed up on my paycheck.

There's another about other fringe benefits being taxed - with a prime example being tech cafeterias.


In the past few years, guidance has shifted toward accounting for employer-provided food with employee income as well.

I don't think that's actually enforced yet though. I would bet the IRS wins that particular fight, but it'll take awhile.

That's a different issue - IRS clamped down on gift cards and non-cash compensation that used to be considered de minimis. Now most employers gross up and report any gift card type gift over ~$5.


And they support Amazon.

Oh, of course. Can't believe I forgot the biggest reason.

Good point. Cash bonus and maybe RSUs if the company is public.

Which organizations currently do this?

There is another approach. Scrub old data you don't need.

2-3 year email retention on corp email.

Paper files for sensitive client info (or don't keep it).

We can reinstall Office / Windows / Active Directory etc.

Mandatory 2FA on google suite?

Git codebases on github etc for LOB apps (we can rebuild and redeploy).

We use the lock features in S3 for copies of data that must be kept. Not sure I can even unlock to delete as account owner without waiting for timeouts.

> 2-3 year email retention on corp email.

I work at a big company that does this, but with 6 months. While I understand why they would do that, often some knowledge is lost with these emails. And usually I don't know what's knowledge and what's junk before I need it.

On the other hand, it's a good way to make sure that your processes are written somewhere and people don't rely too much on email archiving. Sadly, that's something I didn't realize until it was too late.

I also work at a company with a shortish email retention time. The explicit goal is to force people to move important information to places where it's accessible for others.

A lot of sensitive data is legally required to be kept around for much longer than that (email is up to 15 years if you're a publicly traded company).

Most of the remaining suggestions aren't relevant to ransomware even if they're otherwise mostly fine recommendations. 2FA won't stop ransomware, or data destruction. Redeploying code and reinstalling Active Directory doesn't restore customer or user databases. Paper files are not accessible when they're needed, are easily damaged or misplaced, and cost a lot to store and index (if you're referring to keeping them as backups then yes, you're making a form of the argument of the post, just in a very expensive and inconvenient way). Read-only S3 copies almost certainly fall in the realm of backups... but are also a relatively expensive way to do it for most organizations larger than a start-up.

Offline and offsite backups are the cheapest, most effective tool for keeping copies of your data in your company's possession despite unforeseen circumstances, and they protect against a huge number of potential events beyond just ransomware. It's negligent IMO for executive officers of a company to not have invested in a solid, tested, and comprehensive backup solution.

Most attack campaigns start with compromised credentials, so MFA absolutely helps prevent ransomware.

The two most common attack campaigns are drive-by malware infections done through phishing links and infections via compromised documents. Neither of which involve credentials.

Attacks involving credential stealing almost never involve malware.

Source: The 2020 Microsoft Digital Defense Report https://www.microsoft.com/en-us/security/business/security-i...

Only about half. The rest is through emailed trojans and RCE bugs.

Offsite works fine, but for many people offsite is still accessible by current credentials.

The S3 options around this work well with object lock or similar.

I remember the old school tape methods (we had a rotation of who took the tape home). This was truly offline

You can also enable versioning and deny all s3:DeleteObject actions via the bucket policy.

It won’t stop a root account compromise unless you’ve got a multi-account setup going (as they could edit the bucket policy).

But if you’ve not got any monitoring, they could also just remove the lock on the objects without you noticing and wait for the timeout themselves.
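For reference, a deny-delete bucket policy along those lines might look roughly like the document this sketch generates. The bucket name and `Sid` are placeholders, and this only builds the JSON; applying it (console, CLI, or SDK) is left out. With versioning enabled, overwrites still succeed but create new versions, so the older versions stay recoverable. As noted above, anyone who can edit the bucket policy can simply remove this statement.

```python
import json


def deny_delete_policy(bucket_name: str) -> dict:
    """Build a bucket policy that denies deleting objects or object
    versions for every principal. bucket_name is a placeholder."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyObjectDeletion",  # hypothetical statement id
                "Effect": "Deny",
                "Principal": "*",
                "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


print(json.dumps(deny_delete_policy("example-backup-bucket"), indent=2))
```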

They also threaten to leak your data if you don't pay. They know a lot of orgs can restore the data and won't need it decrypted. There's no real defense against this (other than good security practices).

You can never prove that they won't leak your data after paying though. I'm not a CEO/CTO and haven't had to make these decisions, but from my perspective it's an empty promise that by paying them they will actually keep their word and not leak your data.

If they do leak your data after you pay, then it'll ruin their reputation and make it less likely for other victims to pay in the future.

Which is why we should do „fake“ ransomware attacks where a company „pays“ and gets „betrayed“ when the attacker still leaks some „super important“ data after the payment.

What are the hackers gonna do? Sue you?

I mean, that's a whole nother kettle of fish. At that point they have you on the hook for indefinite blackmail, because after you pay they still have the data. Is that actually common, though? I think most of these ransomware cases are simply pay-to-decrypt.

Almost all of the ransomware gangs now also exfil data and use that as additional leverage.

yes, double extortions are getting common because people rely on backups

There's no glamour in back ups.

Even less glamour in great back up.

Even less in testing back-ups.

And there's a lot of glory in "slashing the IT budget with no disruptions in operations, cutting the fat is good for business".

This is a good nutshell of the evolution in internet systems operations over the past 10 years. The boss didn't go to GSB just to watch some unfungible college-dropout make their own schedule every day.

Isn't this more of a "We don't want our client / customer information released to The World At Large" question? I would think most business entities have backups of some kind (Scripps being the only exception I can think of), and will pay the ransom to keep any sensitive information off the market.

Edit: Should have added that I find it hard to believe that companies have PB of data backed up. I could believe GB, and maybe even TB, but PB is pretty hard to swallow. The past three companies I've worked for (25 year span) had, at most, a couple of gigs of sensitive information that couldn't be easily replicated.

I also find it hard to believe that a ransomware gang could encrypt 50 Petabytes without anyone noticing it. It would also take some time to decrypt 50 petabytes if you paid the criminals and got the key.

And would you trust your data after criminals had access to it?

Ransomware attacks rarely indicate any data leakage; all they usually do is prevent you from accessing your own data (by encrypting your drive with a key you don't have access to).

The current trend is double-extortion ransomware attacks - encrypt your copy of your data, and threaten to release it publicly as well.

[1] https://www.cybereason.com/blog/rise-of-double-extortion-shi...

These days attacks labeled “ransomware” in the news seem to be hybrid attacks. There usually is sensitive data exfiltration in addition to encrypt-in-place.

Reminiscing - I once worked on a "backup server" that ran NetWare 3.11 and a QIC-80 drive.

The customer was quite convinced their backup was fine. They could hear the QIC going with the standard weeeeee, tick, tick, tick, weeeeeee, tick, tick, tick sound every night as they were leaving.

When I ejected the drive, the tape was nearly clear. All the material had been wiped clean off of it, and was sitting at the bottom of the server in a neat gray, brown pile (at least that is how I remember it, & I am sticking to it).

Since they never had to restore, they never checked.

How does one learn how to do proper backups? Using my throwaway as I suspect my company doesn't do them (and even if they do, I don't know where they are or what to do with them as the main engineer left on my piece of software).

That entirely depends on how your infrastructure is set up. If it's in AWS, just set up rolling snapshots of your EC2's and databases. If you can store everything in Github, that works too. If it's on local machines, something like Backblaze is cheap and easy to set up.

I'm not a backup specialist, but do run my own backup infrastructure, so a few starting points. Not saying they will work in all environments, or even any scale environment though.

1. Figure out what's valuable and what needs backed up. Some will say "back up everything", but then you don't know what you need to restore and how to restore it! You need to know what's valuable. Is that source code? A complex application environment? Files in S3 storage? User generated files stored locally on a server? Data in a database?

2. Figure out where it's stored. Not always straightforward, but again necessary. Otherwise you won't know how to back it up, or recover it. Remember the cloud isn't a backup - when someone steals your production S3 keys, absent some careful configuration, they can dump and delete all your bucket contents...

Understand what you need to backup and how you'll recover it. If you are dealing with Windows based environments, consider the practicalities of getting these up again - windows isn't as nice as Linux (which can boot on any hardware within reason, modulo some bootloader settings in a few complex configurations)

3. Figure out how you're going to protect the data you back up. Now you've identified the crown jewels, don't just store them in plaintext and unencrypted on your laptop (!) If you're going to use encrypted backups, understand where the keys are stored and how they're used. More importantly, understand how you'll have the keys to decrypt the backups after a disaster.

4. Figure out how you'll stop someone who gets in from deleting or accessing your backups. Assume they'll be attacked. Understand what the backup solutions offer in terms of security. Can you use asymmetric crypto so only public keys get stored on your servers? Can you use protection on an S3 storage bucket to prevent deletion, even with the access key? Etc.

5. Test doing backups. Then practice restoring to a non production environment to see what you missed. Document it all!

6. Set up monitoring so you know when something goes wrong with your backups. Gather and expose enough information to yourself that you know what's going on! This could be as simple as an automated email to yourself every morning, telling you the duration of the backup job, and the amount of data written, and amount changed since last time. But you need to monitor this. Don't just go for a failure alarm (have one of those, and make it fail-safe so it alerts if the backup didn't succeed, even if the backup script didn't run at all!) - also notify yourself on success so you are aware of the backups and what goes on.

7. Ensure you have enough copies. 1 isn't enough - what if the payment card on your storage account expires, or you get locked out? Or the company goes under? People normally say 3 backups in 2 locations, at least 1 off-premises.

8. Return to 1, because by now something has changed and you need to add more new things to the backup system.

There's all sorts of other things to consider like data consistency and whether operations on your systems are atomic (what happens if the backup snapshot is taken mid-operation in your app?) that you'll want to think about too.
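To make point 6 a bit more concrete, here is a rough sketch of the fail-safe check. The idea is that it runs on its own schedule, separately from the backup job, and looks at when the last success was recorded, so a job that never ran at all still trips the alarm. The function name, the 26-hour default (a daily job plus some slack), and the way the last-success timestamp and byte count get recorded are all assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional


def backup_alert(
    last_success: Optional[datetime],
    bytes_written: int,
    now: datetime,
    max_age: timedelta = timedelta(hours=26),
) -> Optional[str]:
    """Return an alert message, or None if the last backup looks healthy.

    Alerts on the absence of a recent success rather than on a reported
    failure, so a backup script that silently stopped running is caught."""
    if last_success is None:
        return "no successful backup has ever been recorded"
    age = now - last_success
    if age > max_age:
        return f"last successful backup finished {age} ago, limit is {max_age}"
    if bytes_written <= 0:
        return "last backup reported success but wrote no data"
    return None
```

The same values (finish time, bytes written) are what you would put in the daily success email, so both the alarm and the summary can come from one small status record.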

That's not how modern ransomware works.

It's now common to see data extracted and for the ransom to cover not disclosing your corporate data.

But yes, agreed on backups in terms of restoring operations.

"Test your backups" is very good advice, but it will do almost nothing to protect you against ransom attacks.

A ransom attack works because one of the first things attackers do when they gain entry to the system is locate and encrypt the backups.

Having tested backups is great, but it will not protect you from ransom attacks.

If your backup is writeable, it's not a backup, it's a mirror.

Correct. Backups cannot “protect” you. However proper backups can help you recover from the effects of ransomware.

I heard from companies using ZFS that suddenly saw a massive increase in disk space usage... a result of copy-on-write for all the new encrypted files. Then they restored to a snapshot before the encryption started and were able to resume business (after a purge and decoupling of all suspicious machines).

Could it be that simple? Sure snapshots are not backups, but this is not a hardware failure.
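That sudden jump in space usage is itself a usable signal. A rough sketch, assuming you already collect periodic pool usage readings (say, in GB) from somewhere; the function name and the factor of 5 are arbitrary choices of mine, not tuned values:

```python
from statistics import median

def usage_spike(samples, factor=5.0):
    """Return True if the latest growth exceeds factor times the median earlier growth.

    samples is a list of usage readings, oldest first (units assumed to be GB).
    """
    if len(samples) < 3:
        return False  # not enough history to form a baseline
    deltas = [b - a for a, b in zip(samples, samples[1:])]
    baseline = median(deltas[:-1])
    # Treat flat or shrinking history as a baseline of 1 unit.
    return deltas[-1] > factor * max(baseline, 1)
```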

In my home situation I do a periodic rsync over ssh to a Pi3 at my parents' place. It's super simple but I can just browse the sshfs to check if stuff is there and working. And when I need it, I drive there and get the disk itself. Sure, this is not relevant for a large business, but for us self-hosters it is a nice solution. The backup is manual, by choice, so I never overwrite anything accidentally.

Fourth, lack of backups may not even be the threat. What if the hackers demand a ransom to prevent them from releasing sensitive data to the public?

Those are much rarer, and in several cases the attackers didn't actually have the data when those threats were made. That aside, what you're talking about is a substantively different crime and one that is much easier to prosecute internationally than computer based crime.

Espionage, and blackmail are both crimes that have been around for a very long time and have international treaties around them. That doesn't stop it from happening, but when someone is caught it's much more likely they're going to be extradited and prosecuted than for a crippling attack against a computer system.

Either way, if an attacker is threatening to release your internal confidential data, would you rather have a copy of that data as well or not? Your course of action is also the same in either case, notify the FBI if you're in the US.

Lots of good info here - it's also worth pointing out that if you're compromised, you may not have all the backups you think you do.

A lot of the attackers out there are adding the step of disabling and deleting local snapshot-style backups as part of their attack, because they don't want all their hard work to get thrown out the window with a simple OS-level rollback (side note - if your endpoint security vendor tries to sell you rollback as a ransomware protection feature, run).

For this reason, data backed up to tape or some other physical media that gets removed is much more likely to survive a breach than volume shadow copies and snapshots. Test the hard stuff!

With that point said, wouldn’t immutable snapshots in the cloud like what rsync.net offers be quite valuable in terms of a rollback strategy?

Depressingly few organisations I've worked in or with have a clearly defined set of RPOs and RTOs for their essential systems, and similarly few have regular processes to test their backups and archives to either confirm the process works, or to determine how long a recovery would take.

This stuff is all conceptually very easy to do - but politically extremely difficult to agree on the definitions, and then obtain the resources on the ops side, presumably because it's yet another of those things that fits in the category of being hidden and non-urgent, and therefore a low priority, right up until the moment it isn't.

If your disaster recovery process isn't tested, you actually don't have any disaster recovery. It's not only about 'how long it takes', it's also about whether or not it works at all. Can you rebuild from scratch? What happens if your entire infrastructure goes down at the same time? What happens if a datacenter you rely on just disappears? What happens if you lose access to your systems? Can you lose access to your systems? IMHO one of the only silver linings of these attacks is that organizations are starting to ask these questions more often.

There's been a lot of good advice here about backups and disaster recovery.

But there's also a lot of other stuff to consider:

Compartmentalization. Finance and Engineering and Sales only need to interact in limited ways. How about some firewalls between them, limiting types of access?

Location isolation. Why does something that happens in Peoria affect Tuscaloosa? Once a ransomware gang breaches a perimeter, why is it allowed countrywide (or worldwide) access to a company?

Monitoring. Aren't there tools that can alert on various anomalous patterns? All of a sudden, gigabytes of data start being exfiltrated? All of a sudden, processes fire up on a multitude of servers? Monitoring these things is hard to do at scale, but surely possible?

Microsoft. In 2002, Bill Gates "Finally Discovers Security". How much longer will Microsoft be given a free pass? How many more "critical" vulnerabilities will their software have? https://www.wired.com/2002/01/gates-finally-discovers-securi...

I could go on and on. But why should I? Why can't MBA-type CEOs take IT seriously? Why can't they hire competent people and fund them and listen to them?

> … "and listen to them?"

That's the part I've always had trouble gettin' out of most "management" types. They hire you for your expertise, and then undermine it at every opportunity to "save money" or to exert their "authoritah".

There's significant misalignment of incentives at play here (irrespective of company, generally).

Many "management types" are measured solely and almost entirely by the bottom line - the share price in a traded company, or profits in a private company. Share price is based usually on profits. Usually they're delivering reporting to their boss or the board quarterly. Every dime they save this quarter is "profit" in their eyes.

Obviously there's a line here - you don't want to save the dime that causes the factory to burst into flames and result in 4 months of production downtime. But if that dime can be saved this quarter when times are tough, you'll be seen as a well-performing hero, and next quarter you'll spend a few dimes getting the sprinkler system inspected... Until you need the boost to profits next quarter too(!)

In management and generally in business it's hard to go backwards. Nobody wants to increase their spend on IT unless they are making more money. If they could sign a new deal this quarter, they'll probably give you a decent percentage of that deal as IT budget to get the deal signed (factoring in the risk of the customer not signing).

But in an environment focused almost entirely on pursuit of unrelenting growth of profit or revenue or share price, security simply won't become a priority until you can convince manager-types that the issue is commercial and measurable and that it impacts the bottom line. Even if the issue is unexpected costs of recovery from a breach, the first questions they'll ask are "How likely is it to happen? What will it cost? Can we insure against it? What do our competitors do?" - it's not about preventing the breach, it's about defraying the potential loss of profits without compromising on growth or profits when it isn't a problem.

Hence you'll see people hired to fill roles (new or existing) then being hamstrung by a total and outright refusal to act, because it isn't a commercial problem. Skilled tech leaders are good at turning the problems into language leaders can understand, but there is a limit and a line, and the solution in my view is clear regulation the leaders can "get", involving individual penalties and obligations of competency - even if your core business isn't safety, you have safety obligations as a business to your employees, contractors and the public. It's expected and required that you become suitably competent to do this, or get the expertise to do so, but with you still liable. We need the same for security (in my view).

It's not all bad news - I'm a "management type", but from an almost entirely technical background. Bean counting isn't my style... Maybe we can infiltrate more organisations and bring some basic engineering understanding to decision making? It's frustrating to see most hide behind MBA-waffle though rather than try to actually do real things that make a difference.

> I'm a "management type", but from an almost entirely technical background.

These are the management folks I've historically had the most luck working with. Almost always easy to work out a system of security and backup that actually works (while not wasting massive money to get the job done) with these types of people. I truly wish more "management types" actually had enough technical background to make those wise decisions about the business they're supposed to care about so much. Sadly, it's like "common sense". It's just not all that common.

One of the common complaints I hear from MBA-type CEOs is they don't understand what to look for in a security person. This means they often end up with a similar MBA-style smooth-talker who says they're good at security, and talks the talk.

Assuming you do get some capable security people in, they're part of a "cost centre" - most organisations still see IT as a cost to the business they'd love to eradicate, rather than as a key enabler that allows the organisation to exist. I had hoped covid would cause a shift in mindset as companies realise the enabling effect their IT teams had, but old habits die hard, and it looks good to recharge IT to lower your perceived overheads of doing business by billing other departments internally for IT. That leads to cost cutting and the other issues you pointed out.

Even then, on your final point about listening to them, I share your frustration. Again the common complaint I get is that the security people don't speak the same language, so neither understands the other, and the conversation ends. The security team expect the suits to know why it's bad that the office printer is 15 years old; the suit feels that's prudent cost cutting and assumes it must be fine because it came from a reputable brand.

Ultimately security people need to better communicate to stakeholders that the starting point is for everything to be insecure, and that security is needed to make it secure. And left untouched, it will eventually end up insecure again, through not being patched. Unfortunately this message is just perceived to be self serving, as it's exactly the same message every other department is giving - "our team is really important, give us more money to...."

Some other thoughts in relation to your points:

- the continued insistence on flat network structures with file shares and similar is a huge issue. Same for the security posture of a Windows server in a corporate environment - it's almost entirely based around the idea of a trusted LAN. That's an outdated set of assumptions, but is very often how malware spreads. There's zero reason for workstation to workstation traffic originating from any part of the organisation, irrespective of protocol. Give Devs a separate environment without restrictions, and let IT use a secured jump environment to do their remote connections. Preventing end user devices talking to each other at all would be a good first step.

- Next up would be getting rid of large network shares that half the organisation has read+write access to. Something HTTPS based, with proper logging and 2FA would be a better starting point. Rate limit requests and monitor the logs on the rate limiter. Convince Microsoft somehow to move AD towards a zero trust architecture and run it over HTTPS like a modern service, rather than legacy protocols, or preferably move to something that doesn't require multiple gigabytes of other likely vulnerable services running (DNS, print spooler, file shares, etc) just to give you AAA.

- Security isn't something anyone wants to pay for until it's too late. Businesses often see cyber as another risk on the risk register, and they try to treat the risk through insurance. In the longer term this won't work, because it is becoming a near certainty that the average organisation will be compromised. Insurers don't like to cover for certainties(!) If businesses just see cyber as a financial risk that happens once in a blue moon, expect them to extrapolate the costs per breach and set your budget based on the cost of a breach split across 5 or 10 years. Defenders' dilemma.

- Snake oil security sales pitches very effectively target the MBA suits directly and sell them over hyped claims. You'll then end up pressured to use your finite security budget on their ineffective snake oil, which doesn't actually achieve anything much (and likely slows down systems). This leaves you without budget to develop internal bespoke tools for network monitoring. It's always entertaining to see how many companies can tell if their users iPhones were affected by (for example) NSO Group - can they actually check DNS logs for presence of IOC domain resolution, or do they lack even that level of visibility? But the basics aren't exciting, and the big vendors send well-heeled sales people in with dark backgrounded slide decks to inspire MBA-laden confidence in their snake oil.
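To show how basic that DNS check is, here is a minimal sketch. It assumes you can already export the list of queried names from your resolver logs and that you have a list of published IOC domains; the function names are mine, and the domains in the usage note are placeholders, not real indicators.

```python
def normalize(domain):
    """Lowercase a domain name and drop surrounding whitespace and any trailing dot."""
    return domain.strip().rstrip(".").lower()

def find_ioc_hits(queried_domains, ioc_domains):
    """Return queried domains that equal, or are subdomains of, an IOC domain."""
    iocs = {normalize(d) for d in ioc_domains}
    hits = []
    for raw in queried_domains:
        name = normalize(raw)
        labels = name.split(".")
        # Check the name itself and each parent domain against the IOC set.
        if any(".".join(labels[i:]) in iocs for i in range(len(labels))):
            hits.append(name)
    return hits
```

For example, find_ioc_hits(["cdn.bad-domain.example."], ["bad-domain.example"]) flags the subdomain, while "notbad-domain.example" would not match because the comparison works on whole labels.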

"A backup not restored is a backup not made." Should be on the wall in every IT department. Together with "Snapshots are NOT backups".

My very first mentor when I started my IT career 30 years ago told me “your job is to make sure the backups are working. Everything else is icing on the cake.” Still true.

But one caveat: your back ups can get screwed up if you back up data that has already been encrypted by ransomware. One easy way to defend against this is to have a tier 2 back up with a time delay, eg it backs up the backup files from a week ago.

Tier 1 backup is just normal back up. You test it frequently of course to make sure that you can restore it but you don’t have to do any extraordinary work to detect ransomware in tier 1.

Tier 2 backs up the back up, but only after a certain number of days have passed. That number of days is the window that you have to detect that ransomware has infected you. If you ever find a ransomware infection, you isolate and turn off tier 2 immediately to preserve a known clean state, and once you have everything rebuilt clean and patched, you restore from tier 2.

You use tier 1 for restores in non ransomware situations because it’s necessarily more up-to-date.
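The tier 2 selection logic is tiny. A sketch, assuming you track when each tier 1 backup was taken; the function name and the 7-day default are illustrative (the week comes from the example above), and the actual copy step is left out:

```python
from datetime import datetime, timedelta

def eligible_for_tier2(backups, now, delay_days=7):
    """Return names of tier 1 backups that are at least delay_days old.

    backups maps a backup name to the datetime it was taken.
    """
    cutoff = now - timedelta(days=delay_days)
    return sorted(name for name, taken in backups.items() if taken <= cutoff)
```

The delay is your detection window: anything newer than the cutoff never reaches tier 2, so a freshly encrypted tier 1 backup can't pollute it until that many days have passed.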

A huge step forward was using zfs with snapshots for my system and backup devices... zfs as a local fs might still be a bit dangerous, because theoretically ransomware could delete snapshots, but if you use a server / share backed by zfs, you are pretty safe.

There are some decent guides for Arch:

Official: https://wiki.archlinux.org/title/Install_Arch_Linux_on_ZFS

Encrypted: https://forum.level1techs.com/t/manjaro-root-on-zfs-with-enc...

Plainly the best defense against ransomware. Had two attacks that probably could not have been avoided. Some employee will at some point open that funny looking mail. Scammers know our names and mail signatures (the colorful text, not the crypt sig for mails). Maybe they get some detail wrong, like department or a contact address, but otherwise the mails looked genuine.

A sensible backup solution made us come back within 30 minutes of severe attacks. The police tried to negotiate with the attackers but sadly couldn't get them. But at least they didn't get any money.

We have a bought backup solution that is quite expensive that does snapshots every 15 minutes I believe. Worth it.

I always thought paying the Ransom was about the customer, PII, financial, and HR records/systems that are breached, less about getting the business back online. What a sorry state of affairs that it's both.

What if the ransomware has been hiding for a few months before it activated? You'd either have overwritten the last clean backup by then or you'd restore the ransomware too. Or am I missing something?

The article addresses this:

“That is still somewhat rare,” Wosar said. “It does happen but it’s more the exception than the rule.”

It's something that very high-value targets will eventually have to worry about. But figuring out how to corrupt the backups without disrupting the production systems (and being discovered) is not likely to ever be a fully automatic process for the ransomware goons. They'll have to invest serious time in understanding how the victim's systems work. For high-value targets, that will happen.

I just walked through a telecom colo last week and saw a weird new box in somebody else's cage I'd never seen before. Later, I did a web search for the brand name. Apparently this company sells a network filer that keeps local daily/monthly/annual backups and doesn't let you delete them. Probably a good idea for an organization that just wants to throw money at the problem. Of course you can build this yourself with "btrfs sub snap" except for the part about not letting yourself (even as root) delete them.

Ah, I interpreted "corrupt backups as well" as encrypting them as well.

Don't test your backups. Test your recovery.

This may seem like the same thing, but one of the reasons why those ransomware gangs are so successful is because paying the ransom promises to get you back into business now. You probably still want to do a full remediation afterwards, but paying means that you can do that while your business is running.

To make it unattractive to pay the ransom, you don't have to only be able to recover, you have to be able to do so quickly.

That's what the article is about, you should check it out!

You're completely right, sorry about that. I've read it now.

To add, it's not just about the ability to get the data back from the backup. The effort required to re-setup every client machine, and possibly replace all of them on short notice, is a significant obstacle.

It's also really hard to exercise without significant expenditure.

I see the point of testing backups and it's good advice. But the real problem is that preparing for an emergency that never comes is very expensive in both productivity and costs. You can be 99.99% ready for a ransomware attack that would cost a lot but it would hit your organization's productivity hard. Yet there's a large possibility that the preparedness will go to waste because it will never be used.

We need to find solutions that are very inexpensive but effective. I can only think of it being a cloud based solution where it will be trivial to reset and start over. I suspect that disaster recovery as a service (RaaS) should be part of any cloud based service. I get that some companies are so large and complicated that it would be impossible to provide that service for them but there are plenty of small to mid-size companies for which the service would be possible. So it's possible to offer it as part of any service package.

RaaS has the great advantage that the costs can be shared among many companies so no one company needs to deal with the large costs that may never be used. It will also solve the problem associated with the constant up keep. It's hard to prepare for a disaster but it's even harder to keep it going for as long as it's needed. In addition, the increase in complexity for bad actors would decrease the incentives for ransomware in general.

This won't happen now but given time it would largely fix the current situation for all.

All a bit bs much? You can get a Volvo or you can get a Suzuki swift. A swift is cheaper, and most of the time it runs fine. The day your car gets pancaked you’re gonna know the difference if that day comes. But don’t for a moment try and request anyone to see the plight of the dead in the swift, because they literally reaped what they sowed.

Backing up your system for disaster recovery is something that has been known as expected for ages. If you’re too stingy or dumb to realize and do such processes (especially years after the first ransomware attacks) your business definitely deserves to die a horrible death. Only hope is innocent people don’t get affected by it. But they probably will.

Human nature is what it is. We know the solution to disaster recovery is a backup plan, and yet ransomware continues to happen and is getting worse. Your solution of doing the same is not a fix. What can you offer that helps, rather than continuing to offer the same solution that does not solve the overall problem? I'm sure network admins have talked themselves blue telling management what needs to happen to protect their IT assets, but ransomware continues to be a profitable business.

> Yet there's a large possibility that the preparedness will go to waste because it will never be used.

Human mistakes, hardware errors, and the increase of ransomware. That, and many companies rely on their IT infrastructure for their existence. I think backups are well worth it.

> Yet there's a large possibility that the preparedness will go to waste because it will never be used.

Is this something companies want to gamble their future on?

Is it something that is easy to do by ignoring the possibility of it happening - and therefore is the default?

Sure seems like it!

Even with successful tests and recovery, you may still have to pay the ransom, because hackers often threaten to publish or sell data.

If you have a cloud, it's not "test your backup".

It's "have an automated restore" ready. Maybe on a different cloud. With clouds, you can test standing up your entire infrastructure/system stack across even a couple hundred machines or more in automated fashion, and then tear the whole thing down.

What sucks for HIPAA is that you can get fined for the breach itself, regardless of your backup management.

Not really a problem with HIPAA is it?

Seems appropriate.

I think we need to treat such threats as a rare disease which could happen whether you're ready or not. We need to check for it but be prepared to have it anyway.

Industry should start thinking more about insurance, negotiation and investigation services.

Is this the death of ops?

Common ops mantra used to be “A backup does not exist until you’ve restored it”. Having a blob of data means nothing- being able to continually restore it and integrity check it is everything.
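The integrity-check half of that can be as simple as comparing checksums between the source and a test restore. A minimal sketch, assuming plain files on disk and SHA-256 as the hash; the function names are made up for illustration and this is not tied to any particular backup product:

```python
import hashlib
from pathlib import Path

def sha256_of(path):
    """Hash a file in 1 MiB chunks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

def verify_restore(manifest, restored_root):
    """Return relative paths that are missing or differ in the restored tree."""
    restored = build_manifest(restored_root)
    return sorted(rel for rel, digest in manifest.items()
                  if restored.get(rel) != digest)
```

Build the manifest at backup time, store it alongside the backup, and run verify_restore against a scratch restore. An empty list means every file came back byte-for-byte.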

This is why I have my own tape backup system in addition to cloud (someone else's computer) backup. Ready to restore everything quickly.

By the way, that includes the "ransom" paid to the good guys who provide hard drive recovery services.

It runs into the hundreds of dollars.

Thousands if you have to go to someone like DriveSavers because you really messed something up.

Paying the ransom and then plugging the hole is probably far cheaper than spinning up and maintaining a hardcore backup-perfect stack.

We recommend creating an entire playbook so you know what you need to do to recover from a ransomware attack.

Unfortunately this doesn’t cover the case where the ransomware group is threatening to leak your data unless you pay :/

It’s also good to consider—if ransomware allows attackers to access your network—whether there’s anything stopping them from accessing (and encrypting/overwriting/deleting) your backups.

I believe we are unfortunately at the “have backups” stage still civilizationally.

Hot segregated onsite backup with a different authentication mechanism

I didn't realise backups were normally kept off site.

I love disaster recovery disaster stories. When all the good intentions go wrong.

Two from the '89 quake.

1) The smart thinking operator who equipped their machines with UPSes to keep them up through a power failure. The quake cut the power, the UPSes kicked in, the systems stayed up, the drives kept spinning, the quake kept going and buried the drive heads in to the platters.

2) System crashed, backup tapes were kept in the basement. Quake set off the sprinkler system in the storage room, soaking the tapes with rusty water.

"I'm sorry, did I break your concentration? I didn't mean to do that. Please, continue, you were saying something about best intentions." - Jules

You mean, my cloud provider isn't doing it for me?

It's so easy to backup (and test!) these days. There's no excuse other than incompetence.
