This seems like at least something of a bad idea, because that implementation (if my search-fu is correct) is:
Which is obviously not constant time, and will leak information through cache/timing sidechannels.
AES lends itself to a table based implementation which is simple, fairly fast, and-- unfortunately-- not secure if sidechannels matter. Fortunately, AES-NI eliminated most of the motivation for using such implementations on a vast collection of popular desktop hardware which has had AES-NI for quite a few years now.
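To illustrate the difference in miniature (a toy sketch only: real constant-time AES is bitsliced C or assembly, and Python gives no actual timing guarantees), compare a table lookup indexed by a secret against a masked scan of the whole table:

```python
def leaky_lookup(table, secret_index):
    # The memory address touched depends on secret_index, so cache
    # timing can reveal it. This is how table-based AES leaks.
    return table[secret_index]

def masked_lookup(table, secret_index):
    # Touch every entry; build a mask that is 0xFF only at the wanted
    # index, using arithmetic instead of a secret-dependent branch.
    result = 0
    for i, entry in enumerate(table):
        diff = i ^ secret_index
        mask = ((diff - 1) >> 8) & 0xFF  # 0xFF iff diff == 0 (diff < 256)
        result |= entry & mask
    return result
```

The masked version does 256 times the work of the leaky one, which is exactly the performance cost the parent comment is describing.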
For the sake of also being constructive, here is a constant time implementation in naive C for both AES encryption and decryption (the latter being somewhat hard to find, because stream modes only use the former):
(sadly, being single-block-at-a-time and constant time without hardware acceleration has a significant performance cost! ... better could be done for XTS mode, as the above algorithm could run SIMD using SSE2-- it isn't implemented in that implementation because the intended use was CBC mode which can't be parallelized like that)
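The CBC limitation mentioned above is visible directly in the mode's definition: each ciphertext block feeds the next block's encryption, so encryption cannot be parallelised across blocks. A sketch, with a stand-in block function rather than real AES:

```python
def cbc_encrypt(blocks, iv, encrypt_block):
    # encrypt_block stands in for a 16-byte block cipher such as AES.
    out, prev = [], iv
    for p in blocks:
        c = encrypt_block(bytes(x ^ y for x, y in zip(p, prev)))
        out.append(c)
        prev = c  # the next block cannot start until this one finishes
    return out
```

XTS (and CTR) have no such chain: each block's transformation depends only on its position, which is why those modes vectorise well with SIMD.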
Can't the kernel aes-ni just be set up to save the fpu registers itself on the stack, if necessary?
(I've chosen example numbers just to make calculation trivial)
So I can't ever imagine storage size being the driver for choosing the key size, though from the other threads, it seems that there are algorithms that do have a storage overhead that might be related to key sizes.
It requires statistical techniques to remove the noise, making the attack harder, but not necessarily infeasible.
These are called timing attacks, and they're less common now because professional cryptographers know how to deal with them. But this is very much a perfect example of one.
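The canonical small-scale example is byte-string comparison: an early-exit compare runs longer the more leading bytes match, which an attacker can measure and exploit one byte at a time. Python's standard library ships a constant-time alternative:

```python
import hmac

def naive_compare(a, b):
    # Returns at the first mismatch: running time reveals how many
    # leading bytes of an attacker's guess were correct.
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True

def safe_compare(a, b):
    # hmac.compare_digest examines every byte regardless of where the
    # first mismatch is, so timing doesn't depend on the contents.
    return hmac.compare_digest(a, b)
```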
This confuses me. Why is it in the kernel if it's not constant time? Isn't that a security risk? (Is there any context where it would be safe to invoke this?)
I agree that there can be some cases where it doesn't matter but it's extremely expensive to be sure that it doesn't matter-- making it usually cheaper, when you consider the total costs, to deploy code that doesn't have the sidechannels.
There are other options for non-AES FDE too: most infamously Speck (suspected to be compromised by the NSA), but also Adiantum, which is now in Linux 5.0.
Not as fast. Chacha20 uses 32-bit additions, which are fast in software but expensive and slow in hardware. In addition, protecting Chacha20 from power-analysis attacks is more difficult than it is for AES.
> just generic SIMD
Constant-time AES with SSE2 is actually faster than the naive variable-time AES. See https://www.bearssl.org/constanttime.html#aes
In addition Chacha20 is not nearly as fast as AES when using the AVX-512 Vector AES instructions.
> but also Adiantum
Which uses AES (once per sector).
IIRC (I can't find it right now), when NIST ran the AES competition in the late 90s/early 2000s, candidates had to run on low-power hardware. This required things like everything being fast on an 8-bit microcontroller.
Meanwhile, + and bitwise AND tend to take the same number of cycles to be processed, and each cycle takes the same amount of time; see https://gmplib.org/~tege/x86-timing.pdf
Chacha20 in hardware would not be any slower than chacha20 in software, but it would be slower than other algorithms which do not use 32-bit +.
This is not how CPUs typically implement addition, or other ALU operations. Carry-lookahead adders have existed since the 1950s: https://en.wikipedia.org/wiki/Carry-lookahead_adder
> Charles Babbage recognized the performance penalty imposed by ripple-carry and developed mechanisms for anticipating carriage in his computing engines.
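The lookahead trick is that every carry can be written directly in terms of per-bit generate (a AND b) and propagate (a OR b) signals, so all carries settle in a fixed, small number of gate delays instead of rippling bit by bit. A 4-bit sketch (illustrative only; a real adder evaluates these terms in parallel hardware):

```python
def cla_add4(a, b, c0=0):
    # Per-bit generate and propagate signals.
    g = [(a >> i & 1) & (b >> i & 1) for i in range(4)]
    p = [(a >> i & 1) | (b >> i & 1) for i in range(4)]
    # Every carry expanded directly in terms of the inputs, so none
    # has to wait for the previous carry to "arrive":
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    s = sum(((a >> i & 1) ^ (b >> i & 1) ^ carries[i]) << i for i in range(4))
    return s | (c4 << 4)
```

This is why a 32-bit add on a modern CPU takes the same single cycle regardless of the operand values.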
Note that Cloudflare opted for Xeon Silver chips that aren't good at AVX-512, unless doing pure AVX-512 operations.
What's the threat model here? I can't think of a plausible scenario where side channel attacks can be used to gain unauthorized access to FDE contents.
We ended up with several solutions, but all of them generally work the same way conceptually.
First off, separation of I/O layers. System calls into the FS stack should be reading and writing only to memory cache.
A middle layer to schedule, synchronize and prioritize process I/O. This layer fills the file system cache with cleartext and schedules writes back to disk using queues or journals.
You also need a way to convert data without downtime. A simple block or file kernel thread to lock, encrypt, mark and writeback works well.
Another beneficial technique is to increase block sizes on disk. User processes usually work in 4K blocks, but writing back blocks at small sizes is expensive. It's better to schedule those writebacks later as 64K blocks, so that hopefully the application is done with that particular stretch of data.
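A toy sketch of that coalescing idea, assuming dirty 4 KiB block numbers are tracked and flushed as aligned 64 KiB spans (purely illustrative, not how any particular kernel does it):

```python
BLOCK = 4096          # size of a dirty block from a user process
SPAN = 64 * 1024      # write back in 64 KiB chunks

def coalesce(dirty_blocks):
    # Map each dirty 4 KiB block to its aligned 64 KiB span, dedupe,
    # and return (offset, length) writebacks in order.
    spans = sorted({(b * BLOCK) // SPAN for b in dirty_blocks})
    return [(s * SPAN, SPAN) for s in spans]
```

Sixteen adjacent dirty 4K blocks collapse into a single 64K writeback, which is the whole point.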
Anyway, my 2 pennies.
There is also the overhead of automatically unlocking a remote server during an unattended reboot. Reading the encryption password from a USB stick or fetching it over the internet is a no from me. I think there are solutions that store the password in RAM or in an unencrypted partition, but that's the overhead I'm talking about. I wonder how companies deal with that.
> The Network-Bound Disk Encryption (NBDE) allows the user to encrypt root volumes of hard drives on physical and virtual machines without requiring to manually enter a password when systems are restarted. 
A warm reboot using kexec does not need any intervention from my side and directly boots into the already decrypted initramfs with the key already present and thus able to mount the encrypted volumes including the root volume.
Cloudflare is optimising for SSDs.
They don't talk about latency: all their crypto benchmarks measure throughput. Near the end they hint at response time for their overall cache system but there's no detailed discussion of latency issues.
The takeaway for me is that I'm OK with what's currently in Linux for the HDDs I use for my backups but I'd probably lose out if I encrypted my main SSD with LUKS.
At the end of the article they say that they're not going to upstream the patches as they are because they've only tested them with this one workload.
I'd also be interested to see a benchmark comparing SW AES with FPU-saving + HW AES. Unfortunately their post does not include stats for how often their proxy falls into the HW or SW implementations. Whatever those numbers are, I'd expect FPU-saving + HW AES to be somewhere in the middle.
While I applaud their wins, they have basically profiled the wrong thing: they established the full overhead with disk speed/latency essentially removed, and only went to an actual production workload at the very end. In the worst case, their improvements could have been for naught. They were "lucky" (not really, they were smart), but the profiles did not truly guide them; they just optimised the heck out of the system, and could have gained nothing if the bottleneck had been somewhere unaffected by their code analysis.
It's great that Cloudflare allows this kind of engineering to happen (investigative, explorative, and not necessarily RoI focused), but it's rare to find a company that does.
Yep, when building my latest workstation, I went with a pair of ("regular") SSDs (RAID1) for my data. Later, I decided to add an NVMe for the OS for the additional speed.
I then went and encrypted all of the drives (via LUKS), however, which basically killed any additional performance I would've gotten from the NVMe drive. I would have been just as well off with only the SSDs and without the NVMe drive.
For Debian it's almost as easy: https://passthroughpo.st/patch-kernel-debian/
When I see a company such as CloudFlare being so transparent about their difficulties, and trying to find an answer using their community members, it makes me love them even more.
No ego, pure professionalism.
It's a shame because I've seen this condescending attitude quite frequently in the crypto open source community, and am not really sure how it arises. At least in this case it seems to have had the good outcome of motivating Cloudflare to dig in deeper and solve the problem by themselves.
OTOH, perhaps those that answered from atop the tower had little idea of the mechanisms that the author(s?) dug out and changed. So double shame on them, for being condescending and not knowing.
And it also falls on ourselves to be mindful of this behaviour, which can creep up on us without our knowing. We sometimes think our time is super valuable and that we don't have to spend it on some "newbie question" or on this guy who doesn't understand. The past year I've been mentoring grad students in the lab I work at, and found myself once or twice going this route. I luckily caught it early, took a deep breath and gave them the time and explanations they needed. In the end I got a few nice surprises out of two amazing students, who were seeing a bit beyond what was evident.
From the PoV of the person who responded they didn't provide any relevant information that would indicate what platform they run, or what speed they expect, or why they think 800MiB/s seems slow to them. On many platforms this would be a pretty good result. At first look, it looks like they expected the speed of unencrypted storage, because that's what they tried to compare against.
So the response seems reasonable at first glance to me. They got the answer to their main question (which they omitted from their blog article).
> If the numbers disturb you, then this is from lack of understanding on your side.
This is arrogance on the part of the person replying, who hand-waved away their problem with "you just don't understand" when in fact they (Cloudflare) did understand. They then went on to prove that it was due to queuing within the kernel, not the hardware, as this person had claimed in their flippant reply.
And neither did the people who responded to them it seems.
You can't have your cake and eat it too.
"Without LUKS we are getting 450MB/s write, with LUKS we are twice as low at 225MB/s"
They showed work in a vacuum: they demonstrated that dm-crypt has costs over raw device access (I would hope so!) on some unknown hardware, and then asked "does this look right to you?"
Well, yeah, that looks like it looks elsewhere, and by the way, there's a built-in command that also would have told you this.
People whine about technical mailing lists, I think because they don't get the context. Think of them as sort of like water coolers at an office that specializes in whatever the list is about. You get a short slice of expert attention in between doing whatever they actually have to get done.
Throwing a bunch of data on the floor and saying "hey, is this expected?" is not going to work well. Seriously, what were they expecting?
It's entirely possible to say both these things in a much more constructive and less condescending tone than was used.
Context matters. If you don't take the time to understand the context you're walking into and don't follow local rules, don't be surprised if people are rude back to you. Not that I even think what they said was all that rude.
Do you also think you can slide in to a gam3r chat and expect business etiquette?
It was the tone of "they don't understand", when in fact Cloudflare understands crypto and performance very well, and went so far as to dive into kernel code and submit patches that fixed a problem others didn't even realize existed. Even so, I agree this isn't worth such a big discussion.
They're not sensitive when it's a "big company", they're sensitive when they're trying to get work done and they receive a flippant response.
I'm on a number of public mailing lists and there's often a participant who tends to be both available/communicative and callous in their communication style. My assumption is there's a filter effect going on here where some folks who have very poor social abilities wind up at their computer alone all the time and public mailing lists become part of their few remaining human interactions.
What I'd take away from this particular dm-crypt interaction isn't that the community is assholes, but that the community is small and the mailing list poorly attended/inactive.
In the past I've reported my own dm-crypt problems upstream and it took years to get a bisected regression reverted. Just getting relevant people to pay attention was a challenge.
It's like having a high PageRank in Google because you actually write meaningful, useful, well-written blog posts which Google happens (happened) to value vs link factory blog posts.
Anyone have a source on how full disk (i.e. block-level) encryption encrypts free space? The only way I can imagine this working is to initially overwrite the entire disk with random data, so that even on a brand new clean disk you can't distinguish encrypted data from true "free space". Then, when a file (which was encrypted when written) is deleted, the encrypted data is still present, just unallocated (by any conventional meaning of the word "deleted"), and thus indistinguishable from the random data from the first step. Does it then get overwritten again with random data?
I would argue that overwriting an encrypted file with random data isn't really encrypting free space, but rather just overwriting the data, which already appeared random/encrypted. It is hardly any different from having a cleartext disk and overwriting deleted files with zeros, making them indistinguishable from actual free space.
This way, an attacker can't focus cracking on the fullest disk, match stolen backup disks to hosts based on non-sensitive health metrics, etc.
>The only way I can imagine this could happen is by overwriting the entire disk initially with random data
Traditionally, for speed, you'd write all zeroes to the encrypted volume (causing the physical volume to appear random), but yes
>Then, when a file (which, when written, would have been encrypted) is deleted
You'd just leave it. Crucially, you don't TRIM it.
>I would argue that overwriting an encrypted file with random data isn't really encrypting free space
Yup, that's why it's not done
What is it protecting against — data recovery from discarded old disks? Very stupid criminals breaking into the datacenter, powering servers off and stealing disks?
A breach in some web app would give the attacker access to a live system that has the encrypted disks already mounted…
Yes, exactly. A company I worked for had a hard drive pulled from a running server in a (third party) data center that contained their game server binaries. Shortly afterwards, a pirate company set up a business running "gray shards", with, no surprise, lower prices.
It's also not possible to get data injected offline into your filesystem without having the keys. Without encryption, you could just get the disk of the targeted server running somewhere and set your implants or what have you. When the server sees the disk come back up, it looks just like a hiccup or something.
This is, in theory, possible against volumes encrypted using AES-XTS (which seems to be how the majority of FDE systems work), as the ciphertext is indeed malleable.
It's all about layers of defenses.
You can still use partitions.
> not all cryptographic algorithms can be used as the block layer doesn't have a high-level overview of the data anymore
I do not really understand this. Which cryptographic algorithms can't be used?
> Most common algorithms require some sort of block chaining to be secure
Nowadays I would say that of these only CTR is common, and it does not require chaining.
> Application and file system level encryption are usually the preferred choice for client systems because of the flexibility
One big issue with "Application and file system level encryption" is that you often end up leaking metadata (such as the date edited, file name, file size, etc).
Regardless I think that this is a really nice article. I can't wait to try their patches on my laptop.
You can't use any algorithm that requires O(n) IVs (e.g. a separate IV per disk sector), because there's nowhere to store the IVs. (Another consequence of this is that you can't store checksums anywhere, so you can't provide integrity checks.)
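This is why dm-crypt derives each sector's IV from its position rather than storing it; the ESSIV scheme encrypts the sector number under a hash of the key. A rough stdlib-only sketch of the shape (ESSIV proper uses AES for the inner step, not a second hash; this is purely to show the IV is a deterministic function of key and sector, needing no on-disk storage):

```python
import hashlib

def essiv_like_iv(key: bytes, sector: int) -> bytes:
    # ESSIV: IV = E_{H(key)}(sector_number). A hash stands in for the
    # block cipher here, for illustration only.
    inner_key = hashlib.sha256(key).digest()
    return hashlib.sha256(inner_key + sector.to_bytes(8, "little")).digest()[:16]
```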
You can't use CTR mode either, because you'll end up reusing counter values. What do you do when you need to overwrite a block with new data?
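The failure mode is easy to demonstrate: CTR ciphertext is plaintext XOR keystream, so two ciphertexts produced with the same counter stream XOR together into the XOR of the two plaintexts, no key required (random bytes stand in for the AES-CTR keystream here):

```python
import secrets

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

keystream = secrets.token_bytes(16)  # stands in for AES-CTR output at one counter
p1, p2 = b"old sector data.", b"new sector data."
c1, c2 = xor(p1, keystream), xor(p2, keystream)
# The keystream cancels out: an attacker holding both ciphertexts
# learns the XOR of the plaintexts without ever touching the key.
leaked = xor(c1, c2)
```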
XTS mode solves this, at least partially. It's a tweaked block-cipher mode rather than a stream cipher: the sector number is encrypted to produce a "tweak", which is mixed into the encryption of every 16-byte block according to its position within the sector. Since no keystream is involved, overwriting a block with new data is safe in a way that CTR counter reuse is not.
This isn't perfect, though, because it's still deterministic. If an attacker can see multiple states of the disk, they can tell when you revert a block to a previous state. But it's much better than other modes, especially since the main threat you want to protect against is your laptop getting stolen (in which case the attacker only sees a single state).
Certainly you can. You just have to reduce the effective sector size that the file system can use.
> What do you do when you need to overwrite a block with new data?
You generate a new random nonce (as per XChacha) and you store it in the sector.
get back to me when you find a high-performance (FAT doesn't count) Linux filesystem that supports sector sizes of 496.
But even if that was the case, you could just pretend to the OS that you have 7 sectors of 512 bytes each rather than a single sector of 4032 bytes. (or if that was not possible you could just take the hit)
If you are talking about using reserved sectors for book keeping at the end of the disk that is possible and commonly done.
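The bookkeeping arithmetic behind both options is easy to put numbers on (a 16-byte per-sector IV is assumed here, matching the 496-byte figure above; the layout is illustrative, not any real scheme):

```python
LOGICAL = 512     # logical sector size filesystems expect
IV = 16           # assumed per-sector IV/nonce
PHYSICAL = 4096   # physical sector size

# Option 1: shrink each sector's payload to make room for its IV.
payload = LOGICAL - IV                               # 496 bytes

# Option 2: pack whole 512-byte logical sectors plus their IVs into
# each 4096-byte physical sector and expose only those to the OS.
per_sector_cost = LOGICAL + IV                       # 528 bytes on disk
logical_per_physical = PHYSICAL // per_sector_cost   # 7 sectors fit
overhead = PHYSICAL - logical_per_physical * LOGICAL # bytes lost per 4 KiB
```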
CBC, which is one of the most common stream cipher algorithms.
It's not clear to me whether GCM would work or not.
Of course, the downside is that now the block layer must tackle all of the write ordering issues that a filesystem does when updating the tree. The block layer would find itself greenspunning up a filesystem inside itself, even if it was a filesystem of only one file.
The profile that authenticated encryption defends against is an attacker who is attempting to feed the victim specially crafted bad blocks. 128-bit tags are good enough that the disk will be completely trashed long before the victim executes something of the attacker's choosing.
It might not be ideal but it still can be used. Though, I would not call CBC common at all. Pretty much everyone has switched to CTR or some variant of it (such as GCM).
Also, CBC is not a stream cipher algorithm.
I wonder how cryfs stacks up in this regard.
grep -A 11 'xts(aes)' /proc/crypto
Is that what you mean?
> We are going to submit this work for inclusion in the main kernel source tree, but most likely not in its current form. Although the results look encouraging we have to remember that Linux is a highly portable operating system: it runs on powerful servers as well as small resource constrained IoT devices and on many other CPU architectures as well. The current version of the patches just optimises disk encryption for a particular workload on a particular architecture, but Linux needs a solution which runs smoothly everywhere.
That is, they think their current patch is too specialized for their own use-case to warrant inclusion in the mainline kernel without significant adaptation.
Well when they reached out to the community they were told they're idiots and should f* off in only somewhat nicer language. Then they were simply ignored.
When your community is toxic don't complain that people don't want to be part of it.