
“Catastrophic” hack wipes out email provider’s entire infrastructure - ToFab123
https://arstechnica.com/information-technology/2019/02/catastrophic-hack-on-email-provider-destroys-almost-two-decades-of-data/
======
EvanAnderson
I will continue to stand on my soapbox and proclaim that backup _has_ to
include an offline component (preferably one that's verified independently).
An attacker has to be willing to bring "kinetic" means to bear if you have
offline backup media.

~~~
arethuza
Also, a backup that you don't fully restore somewhere to check it actually
_fully_ works isn't really much of a backup at all.
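
A minimal sketch of what "check it actually fully works" could mean for a file-level backup: restore to a scratch location and compare checksums of every file. The function names here are hypothetical, and comparing against the live source is a simplification. For data that changes after the backup runs, the comparison would be against a manifest recorded at backup time.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large dumps do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map every file under root (by relative path) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_file(path)
        for path in root.rglob("*")
        if path.is_file()
    }


def verify_restore(source: Path, restored: Path) -> list[str]:
    """Return relative paths that are missing, extra, or different after a restore."""
    want, got = tree_digests(source), tree_digests(restored)
    return sorted(p for p in want.keys() | got.keys() if want.get(p) != got.get(p))


if __name__ == "__main__":
    # Simulate a restore in which one file came back corrupted.
    with tempfile.TemporaryDirectory() as tmp:
        source, restored = Path(tmp, "source"), Path(tmp, "restored")
        for directory in (source, restored):
            directory.mkdir()
            (directory / "mail.db").write_bytes(b"messages")
        (restored / "mail.db").write_bytes(b"truncated")
        print(verify_restore(source, restored))  # prints ['mail.db']
```

An empty list means every file came back byte-for-byte. It says nothing about whether the application can actually start from the restored data, which is a separate check.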

~~~
yellowapple
I feel like these are self-contradictory unless you're really going to go
through the effort of periodically restoring each of your offline tapes to a
server (which I'm sure some organizations end up doing, to be fair, but that
doesn't make it any less of a pain in the ass).

~~~
AznHisoka
I also hear people say that all the time. Usually they're managing measly
small 1-100 GB databases, not multiple ones > 1 TB in size. Restoring them on
a regular basis is _easy_ in that case.

But I'd love to hear how people practice restoring _all_ their backups on a
regular basis when it requires having an enormous amount of space.
Practically, I just restore a few backups every month, and if it works, I just
assume the others work as well (which I know is a false assumption, but what
are you going to do?)

~~~
Dylan16807
I'm confused. Why does the number of restores affect how much space you need?

~~~
AznHisoka
You're right, we could restore each, one by one.

However, we have around 30 different databases, each of which takes more than a
day to restore. We back up databases every X days, upload them to S3, and
delete the previous ones, keeping only the last 3 or so.

So if it takes us 30 days to restore all the databases, those backups might
already be gone from S3 as part of the cleanup operation anyway.

It just gets complicated and non-trivial when you have multiple databases,
each terabytes in size.

~~~
cheald
If you already have the DBs in S3, it seems obvious that you could bring up
EC2 instance(s) with gigantic EBS drives attached, restore a DB to it, and run
verification measures on it. Parallelize as your budget allows.
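
As a rough back-of-the-envelope sketch of "parallelize as your budget allows": the number of restore hosts needed so that every backup is tested before it is rotated out. The 1.5-day restore time and 21-day retention window are assumptions for illustration (upthread only says "more than a day" and "every X days, keeping the last 3"), and the function name is hypothetical.

```python
import math


def workers_needed(num_databases: int, restore_days: float, retention_days: float) -> int:
    """Parallel restore hosts needed to test every backup within the retention window."""
    if restore_days <= 0 or retention_days < restore_days:
        raise ValueError("a single restore must fit inside the retention window")
    # Each host can complete this many sequential restores before backups expire.
    restores_per_worker = math.floor(retention_days / restore_days)
    return math.ceil(num_databases / restores_per_worker)


if __name__ == "__main__":
    # 30 databases, 1.5 days per restore, backups kept for 21 days (all assumed).
    print(workers_needed(30, 1.5, 21))  # prints 3
```

With those assumed numbers, one host would need 45 days to get through all 30 restores, while three hosts finish inside the 21-day window.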

~~~
AstralStorm
EC2 is not an airgapped machine. That's known as cloud backup not offline
backup.

The solution is to have a tape or disk library and have the other machine take
over for restore. If you have TBs of data you can afford a library and another
PC to run the recovery test.

~~~
Dylan16807
> EC2 is not an airgapped machine. That's known as cloud backup not offline
> backup.

Nobody said it was. This subthread is about the importance of testing backups
_in general_.

> If you have TBs of data you can afford a library and another PC to run the
> recovery test.

They already explained how that's not enough. Yes, they can add more, but
you're missing some info there.

------
eu
From their FAQ page[0]:

> What is your backup strategy / data retention policy?

> VFEmail feels it's important to provide a long-term, stable, environment for
> our users. In that effort, we perform nightly backups to an offsite host
> from all on-site and off-site mail storage locations. This backup runs at
> 12am CST (-0600) and contains all user data.

> 3rd party storage of user
> data is generally not wanted by privacy-conscious users. If you fall into
> that category, you will want to use POP3 and download your mail daily. Our
> backup is on a daily/weekly rotation, initiated by a snapshot. If you do
> receive mail between your last POP and the snapshot at 12am, it will exist
> on backup for a week - unless it's on Saturday night, then it's a year. You
> should set your POP program to download every 5-10 minutes in order to avoid
> having your mail caught on backup.

[0] [https://www.vfemail.net/faq.php](https://www.vfemail.net/faq.php)

~~~
meowface
Good chance the SSH private key for that offsite host was on one of the
servers the attacker compromised.

------
peterwwillis
You should have an offsite backup. If for some reason you don't, you have
options to secure a backup host.

You can harden a backup host to prevent most every conceivable method to
destroy the backup. You can lock down network access to prevent any connection
but one to the file retrieving service itself, and prevent operations such as
deletes or overwrites. You can use an append-only filesystem. You can use a
SAN or NAS to allow a backup host access to one particular mount to dump
files, and a different host can then move files off that mount onto a
different mount that the backup host has no access to. Or you could even have
"rotating backup hosts" that went effectively offline for a week at a time
each, giving you several weeks worth of offline backups in case one backup
server got compromised.
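
To make "prevent operations such as deletes or overwrites" concrete, here is a toy sketch of a receiving service that only ever creates new files. The class and method names are hypothetical. It only illustrates the policy: the check lives inside the process, so it protects against a compromised client talking to the service, not against an attacker with a shell on the backup host itself.

```python
import tempfile
from pathlib import Path


class AppendOnlyStore:
    """Toy backup drop directory that accepts new files but never overwrites or deletes."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, name: str, data: bytes) -> Path:
        # Reject anything that could escape the drop directory.
        if name in {".", ".."} or Path(name).name != name:
            raise ValueError("name must be a plain file name, not a path")
        target = self.root / name
        # Mode "xb" creates the file exclusively and raises FileExistsError
        # if it already exists, so an existing backup cannot be overwritten.
        with target.open("xb") as handle:
            handle.write(data)
        return target

    def delete(self, name: str) -> None:
        # The service simply does not offer deletion.
        raise PermissionError("deletes are not permitted on this store")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = AppendOnlyStore(Path(tmp) / "drop")
        store.put("mail-monday.tar", b"first copy")
        try:
            store.put("mail-monday.tar", b"attacker overwrite")
        except FileExistsError:
            print("overwrite refused")
```

Pruning old backups then has to happen through some other, separately secured path, which is the point of the "different host moves files off the mount" idea above.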

But probably you should just do offsite backups.

~~~
pavon
A nit, but it is important to differentiate between offsite and offline
backups. VFEmail did have offsite backups, but they were online, and the
attacker reformatted those disks as well.

offsite: protects against physical damage to a single site.

offline: protects against network-based attacks.

You should have offline backups and offsite backups, which can be the same or
separate. E.g., back up to disk/tape and send it somewhere for storage, or back
up to disk/tape kept offline locally, combined with online backups to a server
offsite.

~~~
peterwwillis
Good point! I used to assume offsite implied offline, since I'm used to
offsite tape backups.

------
ChuckMcM
"A tale of woe and destruction where our hero learns the value of offline tape
backups..."

Ok, so that is pretty harsh, but it's part of any disaster recovery scenario
where the datacenter explodes (or burns to the ground, or gets flooded, etc).
And while LTO tapes are slow and nobody likes maintaining those cranky tape
library machines, they will save your bacon when the unthinkable happens. Of
course if it is a really nefarious kind of thing you might find that your
storage unit where the tapes were stored is also torched. No good financial
reason to protect against that; it's an unlikely scenario.

That said, when you "catch someone reformatting drives" you turn off the
switches that give them access to your machines. Or at least you start logging
into machines and halting them until you can be physically present to preserve
them.

~~~
herpderperator
Turning off a switch isn't going to kill any existing sessions or dd commands
that were in progress - at least not until the SSH session times out and hangs
up the commands it was running.

~~~
notacoward
I know some people might only ever think about "switch" as meaning a piece of
network hardware, but there are other kinds. Remote power off is a thing, and
damn well _will_ kill any existing sessions etc.

~~~
ChuckMcM
Exactly. When you've got active destruction going on, power down everything
first.

------
synaesthesisx
Just speculating here, but perhaps there was a party _very_ interested in
destroying data on their infrastructure. Maybe to erase evidence of
wrongdoing, or something else entirely.

~~~
jawns
That appears to be the case. The post says that hacking into all of these
systems required access to multiple passwords, and because of the speed with
which the systems were destroyed, I would guess the attacker had them all in
hand to begin with.

------
newnewpdro
There's always a risk of disgruntled employees doing things like this.

I'm not 100% certain offline backups would prevent going out of business
either. A mass exodus of customers from the extensive down time plus
inevitable loss of recent data is catastrophic for a mature business in more
maintenance than growth mode.

At one of my early sysadmin jobs decades ago, a few weeks into the position an
ex-employee broke into the network and wiped the partition tables and whatever
else was in the first megabyte of the disks - usually some of the root fs, on
all the ~1000 shared hosting servers.

That was a very long week of headaches, and the business never fully
recovered. There were offline tape backups, but they were of filesystems not
disk images so we didn't have a backup of the MBR/partition tables. There also
wasn't a high throughput restore mechanism. It was a minimal single
enterprise-class tape drive with a small robot changer holding 10 tapes at a
time. Sufficient for asynchronous backups and selective restores over NFS but
not at all sufficient for restoring the entire datacenter from tapes in a
timely fashion.

~~~
AstralStorm
Restoring even a huge offline backup should take no longer than 24 to 48
hours... The only situation that can take longer is a police seizure, since they
will seize offline backups.

Worst case you buy a remote host for a temporary runtime and if needed contact
your DNS or other host and CA face to face to reset or sign new keys. Given
airplanes, this is relatively fast but tedious.

It is quite cheap... As long as you have the data backed up offline. They
didn't.

------
alekseynyc
The service owner is sloppy enough with his online presence that his personal
website advertised in the Twitter profile is pointing to an SEO landing page.
Would make me think twice before entrusting any of my data to his company.

------
shabble
I've occasionally wondered what the best way to secure your backup host from
the individual clients, and possibly clients from a compromised backup host
would be.

The most promising option I've come across so far is Borg running in append-
only mode[1] with a client-push type model.

I imagine it wouldn't help in this case, if the attacker has the creds and
access to run `dd` on the backup machine directly.

Anyone had any good/bad experiences with Borg append-only (or have other
suggestions?)

[1]
[https://borgbackup.readthedocs.io/en/stable/usage/notes.html...](https://borgbackup.readthedocs.io/en/stable/usage/notes.html#append-only-mode)
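
For reference, a hedged sketch of how the client-push setup is usually wired up, as far as the Borg 1.x documentation describes it: each client's SSH key on the backup host is pinned to a forced `borg serve --append-only` command. The helper below only builds that `authorized_keys` line. The repository path and public key are placeholders, and the function name is hypothetical.

```python
import shlex


def borg_authorized_keys_line(repo_path: str, public_key: str) -> str:
    """Build an authorized_keys entry that pins a client key to append-only borg serve."""
    if '"' in repo_path:
        raise ValueError("repo_path must not contain double quotes")
    serve = f"borg serve --append-only --restrict-to-path {shlex.quote(repo_path)}"
    # "restrict" is the OpenSSH option that disables forwarding, PTY allocation, etc.
    return f'command="{serve}",restrict {public_key}'


if __name__ == "__main__":
    # Placeholder path and key, not real values.
    print(borg_authorized_keys_line("/srv/borg/client1", "ssh-ed25519 AAAAEXAMPLE client1"))
```

This constrains what a compromised client can do over that key. As noted above, it does nothing against someone who can log in to the backup machine directly.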

~~~
megous
What about the backup server being a client? That should do it.

~~~
shabble
as in, the backup host connects to the target machines and pulls them back to
itself?

Biggest concern I have with that is that to get around file permissions, the
backup user needs to be effectively root.

It's much more of a problem if the backup server gets compromised and now they
also have root level creds to every target machine.

I'm not sure if something like apparmor or selinux could allow for some sort
of 'read-only root' type user, and if that would actually be safe in the
circumstances.

~~~
ars
It only needs root _read_ not write.

There are a number of ways to do this, including a backup client that only
sends data but doesn't write, a read only bind remount, or an lvm read only
snapshot.

The last one is best I think. Make the new lv device readable to the backup
user and have it mount it, then copy the data.
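
A sketch of the LVM read-only snapshot approach as a command sequence. The script only builds and prints the commands rather than running them. The volume group, volume names, snapshot size, and mount point are all assumptions for illustration, and some filesystems need extra mount options before they will mount from a read-only device, so check that for yours.

```python
import shlex


def snapshot_backup_commands(
    vg: str,
    lv: str,
    snap_name: str = "backup_snap",
    snap_size: str = "5G",
    mountpoint: str = "/mnt/backup_snap",
) -> list[list[str]]:
    """Return the commands to snapshot, mount read-only, and clean up afterwards."""
    origin = f"/dev/{vg}/{lv}"
    snapshot = f"/dev/{vg}/{snap_name}"
    return [
        # Create a snapshot whose block device is itself read-only.
        ["lvcreate", "--snapshot", "--permission", "r",
         "--size", snap_size, "--name", snap_name, origin],
        # Mount it read-only; the backup job copies data out of the mount point.
        ["mount", "-o", "ro", snapshot, mountpoint],
        # After the copy finishes, unmount and drop the snapshot.
        ["umount", mountpoint],
        ["lvremove", "--yes", snapshot],
    ]


if __name__ == "__main__":
    for command in snapshot_backup_commands("vg0", "data"):
        print(shlex.join(command))
```

The backup user then needs read access only to the snapshot device or mount, not write access to the origin volume.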

------
Felz
Wow, as someone working on an email service provider startup… this is
basically my worst nightmare.

On the upside, the tooling around infrastructure has improved so much since
vfemail launched in 2001 that there's a lot more I can do. With AWS, I can do
automatic database backups, S3 bucket delete versioning, IAM auditing,
whitelist firewalls, etc.

~~~
wcoenen
> With AWS, I can do...

Can you remotely delete all the data and backups? (It's an honest question;
I'm not intimately familiar with AWS but from glancing at the documentation it
seems that even glacier archives can be deleted without delay)

Because if you can, then so can an attacker who has sufficiently compromised
your business.

 _edit_ : I see some stuff about "vault locking" which might do the trick. Are
you using that to protect your data?

~~~
Felz
Yep, AWS doesn't seem to have any time-lock delete mechanisms as far as I
know, which is a shame. I still have to research this, but as far as I can
tell the best practice seems to be using MFA delete:

[https://docs.aws.amazon.com/AmazonS3/latest/dev/Versioning.h...](https://docs.aws.amazon.com/AmazonS3/latest/dev/Versioning.html#MultiFactorAuthenticationDelete)

And then keeping root user credentials on a cold storage laptop.

Vault lock seems to be for Glacier, but it'd still be worth looking into for
cold storage backups. More layers of defense are always good.

~~~
xfitm3
You can also use S3 Object Lock in compliance mode, although I'm not sure how
you would eventually delete it.

"S3 Object Lock can be configured in one of two modes. When deployed in
Governance mode, AWS accounts with specific IAM permissions are able to remove
object locks from objects. If you require stronger immutability to comply with
regulations, you can use Compliance Mode. In Compliance Mode, the protection
cannot be removed by any user, including the root account."

[https://aws.amazon.com/about-aws/whats-new/2018/11/s3-object...](https://aws.amazon.com/about-aws/whats-new/2018/11/s3-object-lock/)
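
A minimal sketch of a default Object Lock retention rule, written as the parameter dictionary that boto3's `put_object_lock_configuration` accepts. boto3 is deliberately not imported so the snippet runs without AWS access. The bucket name and the 30-day retention are assumptions, the helper name is hypothetical, and the bucket must already have Object Lock enabled.

```python
def object_lock_request(bucket: str, retention_days: int, mode: str = "COMPLIANCE") -> dict:
    """Build keyword arguments for an S3 default-retention Object Lock rule."""
    if mode not in {"GOVERNANCE", "COMPLIANCE"}:
        raise ValueError("mode must be GOVERNANCE or COMPLIANCE")
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    return {
        "Bucket": bucket,
        "ObjectLockConfiguration": {
            "ObjectLockEnabled": "Enabled",
            "Rule": {"DefaultRetention": {"Mode": mode, "Days": retention_days}},
        },
    }


if __name__ == "__main__":
    # Hypothetical bucket name. With boto3 installed this would be passed as:
    #   boto3.client("s3").put_object_lock_configuration(**params)
    params = object_lock_request("example-backup-bucket", 30)
    print(params)
```

On the deletion question: the lock applies per object version for the retention period, so a locked version becomes deletable again once that period has expired.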

------
heyjudy
I've run efficient email hosting before. They didn't have backups, they had
snapshots. Anything that isn't physically offline with a vaulted, off-site
provider isn't a backup.

------
arcticwombat
So many duplicated posts about this..

~~~
dang
I haven't seen one that got significant attention on HN yet, which is the
standard we apply for dupes
([https://news.ycombinator.com/newsfaq.html](https://news.ycombinator.com/newsfaq.html)).
Did we miss it?

~~~
arcticwombat
I've seen at least 4 or 5 posts about the VFEMail thing just today so I dinged
this one when I saw it.

No idea what's considered significant.

~~~
dang
Significant is just a question of whether it got many points and/or comments.

------
GrumpyNl
If you are a target, you will be hacked.

~~~
notacoward
...and sooner or later you will be a target.

