
Speeding up Linux disk encryption - jgrahamc
https://blog.cloudflare.com/speeding-up-linux-disk-encryption/
======
nullc
> otherwise, we just forward the encryption request to the slower, generic
> C-based xts(ecb(aes-generic)) implementation

This seems like at least something of a bad idea, because that implementation
(if my search-fu is correct) is:

[https://github.com/torvalds/linux/blob/master/crypto/aes_gen...](https://github.com/torvalds/linux/blob/master/crypto/aes_generic.c)

Which is obviously not constant time, and will leak information through
cache/timing sidechannels.

AES lends itself to a table based implementation which is simple, fairly fast,
and-- unfortunately-- not secure if sidechannels matter. Fortunately, AES-NI
eliminated most of the motivation for using such implementations on a vast
collection of popular desktop hardware which has had AES-NI for quite a few
years now.

For the sake of also being constructive, here is a constant time
implementation in naive C for both AES encryption and decryption (the latter
being somewhat hard to find, because stream modes only use the former):

[https://github.com/bitcoin-core/ctaes](https://github.com/bitcoin-core/ctaes)

(sadly, being single-block-at-a-time and constant time without hardware
acceleration has a significant performance cost! ... better could be done for
XTS mode, as the above algorithm could run SIMD using SSE2-- it isn't
implemented in that implementation because the intended use was CBC mode which
can't be parallelized like that)

Can't the kernel AES-NI implementation just be set up to save the FPU registers
itself on the stack, if necessary?

~~~
harikb
Curious why CF needs to worry about side-channel attacks when all the software
running on those machines belongs to / was written by them. They do have a
“workers” product with 3rd-party code, but they can easily keep storage servers
out of that pool. Typically storage encryption is all about what happens when a
machine is physically stolen, a hard disk is discarded on failure, or other
such scenarios beyond network security. Please correct me if I am wrong.

~~~
dependenttypes
You could measure the timing over the internet.

~~~
starfallg
Users aren't interacting directly with the storage layer, so any timing attack
via the network is going to be once or twice removed. Can attackers really
glean useful information and mount a successful attack in this type of setup?

~~~
oconnor663
This is almost certainly true in practice, but it's a big risk, compared to
the risk tolerance that we usually engineer into crypto. For comparison,
suppose someone was suggesting: "Why not use 80-bit keys instead of 128-bit
keys? No one in the real world can brute force 80 bits, and we'll save on
storage." Yes, that's true, but it's taking a relatively large risk for
relatively little benefit. Hardware will get faster over time, and an
extremely high value target might justify an extremely expensive attack, etc
etc. We prefer 128-bit keys because then we don't even have to consider those
questions. I think timing attacks are similar: Yes they're very difficult in
practice, but they raise questions that we'd rather not have to think about.
(And which, realistically, no one will ever revisit in the future, as hardware
evolves and new APIs are exposed.)

~~~
necovek
I always imagined key size to relate to computation cost and not storage —
what algorithm are you referring to?

~~~
joshuaissac
The point is that you need more bits to store a longer key, but the storage
space saved is very little in this case compared to how much easier it is to
crack.

~~~
necovek
Sure, but a difference of 0.1% of storage to go from an 80-bit key to a
1024-bit key for 1 Megabit of data (that's 118 bytes out of 128 KiB), or about
0.0001% for 1 Gbit (128 MiB), seems not worth raising as a concern.

(I've chosen example numbers just to make the calculation trivial.)

So I can't ever imagine storage size being the driver for choosing the key
size, though from the other threads, it seems that there are algorithms that
do have a storage overhead that might be related to key sizes.
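
The percentages above can be sanity-checked quickly (figures as in the example):

```python
# Extra storage needed to go from an 80-bit key to a 1024-bit key,
# relative to the amount of data it protects.
extra_bits = 1024 - 80            # 944 bits
extra_bytes = extra_bits / 8      # 118 bytes

megabit = 2**20 / 8               # 1 Mibit of data = 128 KiB
gigabit = 2**30 / 8               # 1 Gibit of data = 128 MiB

print(f"{extra_bytes / megabit:.4%}")  # about 0.09%, i.e. roughly 0.1%
print(f"{extra_bytes / gigabit:.6%}")  # about 0.0001%
```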

------
convivialdingo
Did this commercially for 15 years. Always the same problems.

We ended up with several solutions- but all of them generally work the same
conceptually.

First off, separation of I/O layers. System calls into the FS stack should be
reading and writing only to memory cache.

Middle layer to schedule, synchronize and prioritize process I/O. This layer
fills the file system cache with cleartext and schedules writes back to disk
using queues or journals.

You also need a way to convert data without downtime. A simple block or file
kernel thread to lock, encrypt, mark and writeback works well.

Another beneficial technique is to increase block sizes on disk. User processes
usually work in 4K blocks, but writing back blocks at such small sizes is
expensive. Better to schedule those writebacks later as 64K blocks, so that
hopefully the application is done with that particular stretch of data.
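
As a toy sketch of that write-back batching (hypothetical helper, not kernel code): collect dirty 4 KiB block numbers and flush them as contiguous extents of up to 64 KiB.

```python
BLOCK = 4096
EXTENT_BLOCKS = 64 * 1024 // BLOCK  # 16 x 4 KiB blocks per 64 KiB writeback

def coalesce(dirty_blocks):
    """Group dirty block numbers into (start, count) extents,
    merging contiguous runs and capping each extent at 64 KiB."""
    extents = []
    for b in sorted(set(dirty_blocks)):
        if (extents
                and b == extents[-1][0] + extents[-1][1]
                and extents[-1][1] < EXTENT_BLOCKS):
            extents[-1] = (extents[-1][0], extents[-1][1] + 1)
        else:
            extents.append((b, 1))
    return extents

# 20 contiguous dirty blocks become one full 64 KiB extent plus a 16 KiB
# tail; the isolated block 100 is written on its own.
print(coalesce(list(range(20)) + [100]))  # [(0, 16), (16, 4), (100, 1)]
```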

Anyway, my 2 pennies.

------
tyingq
The blog post reads like this all happened recently, but their linked post to
the dm-crypt mailing list is from September 2017[1]. I'm curious if they've
interacted with the dm-crypt people more recently.

[1][https://www.spinics.net/lists/dm-crypt/msg07516.html](https://www.spinics.net/lists/dm-crypt/msg07516.html)

~~~
mattst88
Yeah, the time frame is somewhat unclear. The patch they link to in their tree
is dated December 2019 however [1], so I assume this blog post is about stuff
they've completed recently.

[1]
[https://github.com/cloudflare/linux/blob/master/patches/0023...](https://github.com/cloudflare/linux/blob/master/patches/0023-Add-DM_CRYPT_FORCE_INLINE-flag-to-dm-crypt-target.patch)

------
unixhero
Did they reach out to the Linux kernel mailing list, or just the dm-crypt
team? I found the answer they received rather arrogant and useless, to be
honest.

~~~
jlgaddis
I'm a huge "fan" of F/OSS but, unfortunately, such condescending answers are
all too common in this "community".

------
beagle3
Ages ago I benchmarked TrueCrypt overhead on my machine at the time (2006, I
think?) and it was about 3%; I assumed that was a reasonable and
still-applicable number, also for dm-crypt and modern VeraCrypt. Guess I was
getting gradually more wrong through those years, according to the git
archaeology....

~~~
singlow
Also, disk speed in 2006 was probably much slower. Disks have gotten faster at
a greater pace than processors during the last 10 years.

------
ggregoire
> Many companies, however, don't encrypt their disks, because they fear the
> potential performance penalty caused by encryption overhead.

There is also the overhead of automatically unlocking a remote server during
an unattended reboot. Reading the encryption password from a USB stick or
fetching it over the internet is a no from me. I think there are solutions
involving storing the password in RAM or in an unencrypted partition, but
that's the overhead I'm talking about. I wonder how companies deal with that.

~~~
r1ch
Debian offers a dropbear shell in the initramfs which you can use to SSH in
and provide keys. I only have a handful of servers, so currently I do this
manually on reboot, but it would not be difficult to automate using, for
example, SSH keys unlocking the key material. The downside of this is that
your initramfs and kernel are on an unencrypted disk, so a physical attacker
could feasibly backdoor them. I'm sure there's some UEFI Secure Boot / TPM
solution here.
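
For reference, the manual Debian flow looks roughly like this (host name, port, and key path are placeholders; assumes the dropbear-initramfs and cryptsetup-initramfs packages are installed and configured):

```shell
# From an admin machine, after the server reboots into its initramfs:
ssh -i ~/.ssh/unlock_key -p 2222 root@server.example.com

# In the dropbear shell inside the initramfs:
cryptroot-unlock
# Type the LUKS passphrase; the boot then continues from the encrypted root.
```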

~~~
zzzcpan
You are missing an integrity-checking step. You can do it by sending some sort
of ephemeral binary over SSH that performs the integrity check and requests a
key using the resulting hash of the check; don't blindly trust an sshd running
from an unencrypted partition. But still, at the end of the day it's all
obscurity and obfuscation; you can't make it provably secure. You can go far
and make that binary one-time and randomly generated, obfuscate it, bound its
running time, use a TPM and whatnot, but it probably won't matter for pretty
much any realistic threat model.

------
est31
Wow those speed improvements are very neat. And an awesome blog post
accompanying them. Prior to reading this, I've considered Linux disk
encryption adding negligible latency because no HDDs/SSD can be fast enough
for a CPU equipped with AES-NI, but that view has changed. Two questions: 1.
are there any efforts to upstream them? 2. Invoking non-hw-accelerated AES
decryption routines sounds quite expensive. Has it been tried out to save the
FPU registers only if there is the need for decryption?

~~~
andyjpb
The existing Linux system is useful for hardware that does less than 200MB/s,
so you should be fine with HDDs.

Cloudflare is optimising for SSDs.

They don't talk about latency: all their crypto benchmarks measure throughput.
Near the end they hint at response time for their overall cache system but
there's no detailed discussion of latency issues.

The takeaway for me is that I'm OK with what's currently in Linux for the HDDs
I use for my backups but I'd probably lose out if I encrypted my main SSD with
LUKS.

At the end of the article they say that they're not going to upstream the
patches as they are because they've only tested them with this one workload.

I'd also be interested to see a benchmark comparing SW AES with FPU-saving +
HW AES. Unfortunately their post does not include stats for how often their
proxy falls into the HW or SW implementations. Whatever those numbers are, I'd
expect FPU-saving + HW AES to be somewhere in the middle.

~~~
necovek
You can easily achieve more than 200 MB/s with HDDs in RAID, but then the
bottleneck might be somewhere else altogether — I think it is an important
distinction.

While I applaud their wins, they basically profiled the wrong thing: they
established the full overhead with disk speed/latency essentially removed, and
only went to the actual production workload at the very end. In the worst
case, their improvements could have been for naught. They were "lucky" (not
really: they were smart, and they optimised the heck out of the system), but
the profiles did not really guide them, and they could have gained nothing if
the bottleneck had sat in a place unaffected by their analysis.

It's great that Cloudflare allows this kind of engineering to happen
(investigative, explorative, and not necessarily RoI focused), but it's rare
to find a company that does.

------
vletal
Has anyone already tried to compile the kernel with these patches for their
desktop/laptop with encrypted drive?
[https://github.com/cloudflare/linux/tree/master/patches](https://github.com/cloudflare/linux/tree/master/patches)

~~~
asymptotically2
Yes, I'm running them on kernel 5.5.13 (which came out today)

~~~
3r8riacz
Wow, I'd be so happy if you could share the steps you took to achieve this.
Let's say I have a Debian machine, how could I try it out?

~~~
asymptotically2
I use Gentoo Linux so it was as easy as putting the patches in a directory,
and then rebuilding the kernel with the option to enable the synchronous
cipher.

For Debian it's almost as easy: [https://passthroughpo.st/patch-kernel-debian/](https://passthroughpo.st/patch-kernel-debian/)

------
pmorici
Interesting. One other thing they don't mention, which I found interesting
when doing my own digging into dm-crypt speeds a while back, is that the
'cryptsetup benchmark' command only shows the single-core performance of each
of those encryption algorithms. You can verify this by watching the processor
load as it performs the benchmark. That led me to discover that if you have
Linux software RAID, you can get much better performance by having one
dm-crypt volume per disk and then software-RAIDing the dm devices, instead of
putting a single dm-crypt volume on top of the software RAID. Curious whether
that would stack performance-wise with what they found here, or whether it
just happened to help with the queuing issue they identified.
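
The layering described above would look roughly like this (device names are placeholders; an untested sketch requiring root, shown for a two-disk mirror):

```shell
# One dm-crypt volume per physical disk...
cryptsetup luksFormat /dev/sda1
cryptsetup luksFormat /dev/sdb1
cryptsetup open /dev/sda1 crypt0
cryptsetup open /dev/sdb1 crypt1

# ...then software RAID across the dm devices, instead of a single
# dm-crypt volume on top of /dev/md0:
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      /dev/mapper/crypt0 /dev/mapper/crypt1
mkfs.ext4 /dev/md0
```

Each disk then gets its own dm-crypt queue and cipher state, which is where the extra parallelism comes from.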

~~~
mercora
I remember efforts to parallelize the work of dm-crypt where applicable being
merged somewhat recently. However, I guess having multiple separate encryption
parameters and states (read: disks) leaves more opportunity for
parallelization of the work, especially if disk access patterns are not spread
wide enough.

------
thereyougo
>Being desperate we decided to seek support from the Internet and posted our
findings to the dm-crypt mailing list

When I see a company such as CloudFlare being so transparent about their
difficulties, and trying to find an answer using their community members, it
makes me love them even more.

No ego, pure professionalism.

~~~
sneak
Correspondingly, the response they received reflects just as strongly on the
community itself.

~~~
hyper_reality
Yep, the response they received was incredibly condescending. The follow-up
from Cloudflare remained polite and added a lot more data, and was ignored.

It's a shame because I've seen this condescending attitude quite frequently in
the crypto open source community, and am not really sure how it arises. At
least in this case it seems to have had the good outcome of motivating
Cloudflare to dig in deeper and solve the problem by themselves.

~~~
megous
Full response: [https://www.spinics.net/lists/dm-crypt/msg07517.html](https://www.spinics.net/lists/dm-crypt/msg07517.html)

From the PoV of the person who responded, they didn't provide any relevant
information that would indicate what platform they run on, what speed they
expect, or why 800 MiB/s seems slow to them. On many platforms this would be a
pretty good result. At first look, it seems they expected the speed of
unencrypted storage, because that's what they tried to compare against.

So the response seems reasonable at first glance to me. They got the answer to
their main question (which they omitted from their blog article).

~~~
gravitas
I disagree with your assessment.

> If the numbers disturb you, then this is from lack of understanding on your
> side.

This is arrogance on the part of the person replying, who hand-waved away
their problem with "you just don't understand" when in fact they (Cloudflare)
do/did understand. They then went on to prove that it was due to queuing
within the kernel, not the hardware, as claimed by this person in their
flippant reply.

~~~
megous
Cloudflare did not understand at the time. Anyway, I'm not disputing that the
reply was unhelpful; I just don't see it as unreasonable. I liked the
technical parts of the CF writeup overall.

~~~
marcinzm
>Cloudflare did not understand at the time

And neither did the people who responded to them it seems.

------
herpderperator
> Unlike file system level encryption it encrypts all data on the disk
> including file metadata and even free space.

Anyone have a source on how full-disk (i.e. block-level) encryption encrypts
free space? The only way I can imagine this could happen is by overwriting the
entire disk initially with random data, so that you can't distinguish between
encrypted data and true "free space" on a brand-new clean disk. Then, when a
file (which, when written, would have been encrypted) is deleted (which by any
conventional meaning of the word 'deleted' means the encrypted data is still
present but unallocated, and thus indistinguishable from the random data in
step 1), does it get overwritten again with random data?

I would argue that overwriting an encrypted file with random data isn't really
encrypting free space, but rather just overwriting data which already appeared
random/encrypted. It is hardly any different from having a cleartext disk and
overwriting deleted files with zeros, making them indistinguishable from
actual free space.

~~~
koala_man
The point of encrypting free space is just so you can't tell how full the
drive is.

This way, an attacker can't focus cracking effort on the fullest disk, match
stolen backup disks to hosts based on non-sensitive health metrics, etc.

>The only way I can imagine this could happen is by overwriting the entire
disk initially with random data

Traditionally, for speed, you'd write all zeroes to the encrypted volume
(causing the physical volume to appear random), but yes

>Then, when a file (which, when written, would have been encrypted) is deleted

You'd just leave it. Crucially, you don't TRIM it.

>I would argue that overwriting an encrypted file with random data isn't
really encrypting free space

Yup, that's why it's not done
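
A toy illustration of why zero-filling the mapped device randomizes the underlying disk (using a throwaway SHA-256 counter keystream as a stand-in for the real cipher; this is not how dm-crypt encrypts, just a demo that the encryption of zeros is indistinguishable from random data):

```python
import hashlib

def toy_keystream(key: bytes, nbytes: int) -> bytes:
    """Stand-in stream cipher: SHA-256 over (key, counter)."""
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:nbytes])

# "Encrypt" 64 KiB of zeros: ciphertext = plaintext XOR keystream.
zeros = bytes(64 * 1024)
ciphertext = bytes(p ^ k
                   for p, k in zip(zeros, toy_keystream(b"disk key", len(zeros))))

# The physical bytes look uniformly random: essentially every byte value
# shows up, with no run of zeros betraying the empty space.
print(len(set(ciphertext)))
```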

------
floatboth
> Data encryption at rest is a must-have for any modern Internet company

What is it protecting against — data recovery from discarded old disks? Very
stupid criminals breaking into the datacenter, powering servers off and
stealing disks?

A breach in some web app would give the attacker access to a live system that
has the encrypted disks already mounted…

~~~
mercora
Being able to confidently purge old disks in a secure manner is an upside huge
enough to make this statement true, in my opinion. There have been numerous
incidents, even involving companies specializing in securely purging disks. If
your data is encrypted there is basically nothing to do; you could even
outright sell the disks from your DC or something. Just delete the
keys/headers from the disk and you are safe.

It's also not possible to get data injected offline into your filesystem
without having the keys. Without encryption, an attacker could just get the
disk of the targeted server running somewhere and set their implants or what
have you. When the server sees the disk come back up, it looks just like a
hiccup or something.

~~~
brobinson
> It's also not possible to get data injected offline into your filesystem
> without having the keys.

This is, in theory, possible against volumes encrypted using AES-XTS (which
seems to be how the majority of FDE systems work), as the ciphertext is indeed
malleable.
~~~
mercora
I am no expert on this, but I was thinking it is only possible to inject
noise, which will likely corrupt the filesystem in the process. Copying/moving
valid blocks should be prevented by XTS, as far as I understood (which might
not be that much). I guess using a filesystem with integrity checks helps a
bit, although it's still not authenticated or anything.

~~~
brobinson
There's some more details/links here (I'm also not an expert):
[https://en.wikipedia.org/wiki/Disk_encryption_theory#XTS_wea...](https://en.wikipedia.org/wiki/Disk_encryption_theory#XTS_weaknesses)

------
dependenttypes
> one can only encrypt the whole disk with a single key

You can still use partitions.

> not all cryptographic algorithms can be used as the block layer doesn't have
> a high-level overview of the data anymore

I do not really understand this. Which cryptographic algorithms can't be used?

> Most common algorithms require some sort of block chaining to be secure

Nowadays I would say that, of these, only CTR is common, and it does not
require chaining.

> Application and file system level encryption are usually the preferred
> choice for client systems because of the flexibility

One big issue with "Application and file system level encryption" is that you
often end up leaking metadata (such as the date edited, file name, file size,
etc).

Regardless I think that this is a really nice article. I can't wait to try
their patches on my laptop.

~~~
richardwhiuk
> I do not really understand this. Which cryptographic algorithms can't be
> used?

CBC - which is one of the most common block cipher modes.

It's not clear to me whether GCM would work or not.
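
To see why chaining is awkward at the block layer, here is a toy sketch (XOR stands in for the block cipher; this is not real CBC with AES): each ciphertext block feeds into the next, so an in-place rewrite of one sector would ripple through everything after it unless each sector is chained independently with its own IV, which is what dm-crypt actually does.

```python
def xor_cbc_encrypt(key: int, iv: int, blocks):
    """Toy CBC with 1-byte blocks and XOR as the 'cipher'."""
    out, prev = [], iv
    for p in blocks:
        c = (p ^ prev) ^ key   # chaining: each block depends on the last
        out.append(c)
        prev = c
    return out

data = [1, 2, 3, 4]
c1 = xor_cbc_encrypt(0x5A, 0, data)

# Changing only the first plaintext block changes *every* ciphertext block:
data2 = [9, 2, 3, 4]
c2 = xor_cbc_encrypt(0x5A, 0, data2)
print([a != b for a, b in zip(c1, c2)])  # [True, True, True, True]
```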

~~~
brandmeyer
GCM requires somewhere to put the nonces and authentication tags. In
principle, you could use a layer of indirection not entirely unlike a page
table to store that information. For example, a 64-bit nonce, 64-bit block
pointer, and 128-bit authentication tag could pack together in a radix tree
for the job, retiring 7 bits of the virtual-to-physical mapping per level for
4 kB blocks.
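
The arithmetic behind that 7-bits-per-level figure (my reading of the scheme above): each per-block record is 64 + 64 + 128 = 256 bits = 32 bytes, so a 4 kB tree node holds exactly 2^7 = 128 records, and each level of the tree therefore resolves 7 bits of the mapping.

```python
nonce_bits, ptr_bits, tag_bits = 64, 64, 128
record_bytes = (nonce_bits + ptr_bits + tag_bits) // 8  # 32 bytes per block
node_bytes = 4096                                       # one 4 kB block per node

fanout = node_bytes // record_bytes
print(fanout)                    # 128 records per node
print(fanout.bit_length() - 1)   # 7 bits of the mapping resolved per level
```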

Of course, the downside is that now the block layer must tackle all of the
write ordering issues that a filesystem does when updating the tree. The block
layer would find itself greenspunning up a filesystem inside itself, even if
it was a filesystem of only one file.

~~~
wahern
The 128-bit tag length, which offers less than 128-bit strength depending on
the nonce size, makes GCM and similar AEAD constructions poorly suited for
archival storage. If you want to store more data without rekeying you need to
reduce the authentication security. GCM makes perfect sense for ephemeral,
message-based network traffic. Traditional, separate, keyed MACs still seem
preferable for archival storage, especially with tree-based modes--native as
with BLAKE3 or KangarooTwelve, or constructed like SHA-3-based ParallelHash.

~~~
brandmeyer
The tag's strength doesn't depend on the nonce size in cases where you can use
sequential nonces. Longer nonce sizes are valuable only when using randomly
allocated nonces and you need to avoid the birthday paradox. 64 bits is
considerably longer than the total write lifetime of modern disks. Even if you
used a nonce per 512-byte block, you'd need well over a yottabyte of writes to
roll through that counter.
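
Checking the counter-exhaustion arithmetic (one sequential nonce per 512-byte sector):

```python
sector = 512               # bytes written per nonce
total = 2**64 * sector     # bytes written before a 64-bit counter wraps

print(total)               # 2**73 bytes
print(total / 10**21)      # ~9.44 zettabytes of writes
```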

The profile that authenticated encryption defends against is an attacker who
is attempting to feed the victim specially crafted bad blocks. 128-bit tags
are good enough that the disk will be completely trashed long before the
victim executes something of the attacker's choosing.

------
gok
That response from the dm-crypt mailing list is unreal.

------
vbezhenar
Offtopic, but why am I getting two scrollbars on this website? This is weird.

~~~
tyingq
There is a scrollable div, the one that leads with:

grep -A 11 'xts(aes)' /proc/crypto

Is that what you mean?

~~~
zzzcpan
I can confirm: they broke scrolling with the CSS setting overflow-x on
#main-body, which for me also shows two scrollbars.

------
gautamcgoel
Does anyone know what the picture is like on FreeBSD? Is it faster?

------
LinuxBender
Does CloudFlare plan to get their kernel patches merged upstream?

~~~
yalooze
Second to last paragraph:

> We are going to submit this work for inclusion in the main kernel source
> tree, but most likely not in its current form. Although the results look
> encouraging we have to remember that Linux is a highly portable operating
> system: it runs on powerful servers as well as small resource constrained
> IoT devices and on many other CPU architectures as well. The current version
> of the patches just optimises disk encryption for a particular workload on a
> particular architecture, but Linux needs a solution which runs smoothly
> everywhere.

~~~
LinuxBender
I missed that, thank you.

------
tbrock
Any chance of this patch making it to the mainline kernel?

~~~
saagarjha
Not this one, specifically, but they've mentioned that they're working on
upstreaming some derivative patches.

------
justlexi93
Neat. Poorly optimized queues can have a significant impact on performance;
doubling disk-encryption throughput with some queue tweaks is quite the
result.

------
thedance
All this seems to me a series of very strong arguments for doing the crypto in
your application.

~~~
saagarjha
That would be even slower and more complex.

~~~
thedance
Why? The slowness in this article comes from architectural brain damage inside
the kernel. Doing the encryption and IO on your threads, when and where you
choose to do it, is the solution. As your performance requirements increase,
you are less and less likely to want kernel features.

