> It will use some kind of symmetric algo (generally AES).
AES-GCM, you mean. Let's not forget the authentication in "authenticated encryption". I'm nitpicking, but if a beginner comes here it's better to make it clear that in general, encryption alone is not enough. Ciphertext malleability and all that.
Plenty of stuff still uses CBC (or other modes) with another authentication method. AES-GCM is nice in that it combines both explicitly, but a lot of stuff just combines other methods and it’s fine.
AES-GCM has the annoying property of output size > input size for instance.
I was just trying to mention the most widespread method. Sure you can use AES-CBC or AES-CTR, and combine it with HMAC or keyed BLAKE2…
but as tptaceck pointed out, all authentication methods are going to increase your message size. It's unavoidable: to get authentication you need some redundancy, and the only general way to get that redundancy is to have a message bigger than the plaintext. We do have attempts at length preserving authenticated encryption, but as far as I know they're not as well studied as the classical "encrypt-then-mac" methods such as AES-CBC + HMAC or AES-GCM. https://security.googleblog.com/2019/02/introducing-adiantum...
How so? All symmetric crypto algorithms at their basic level that I am aware of do not change the message size at all. If you have an example, that would be helpful. I’m not referring to padding. If you’re referring to IV, then I see what you’re saying, but most algorithms derive that from positional data or treat it like a semi-public part of the key, which I’m not referring to.
AES-GCM (as a method) is unusual this way, because it combines encryption and validation at the same time, in each block. They’re two steps - you have the cipher text and the validation data separate.
It’s encrypting + signing everything, essentially, for each block. It stores the data for it directly in each block, which is why the inflation.
For why this is both great, and terrible depending on the use case - for problem cases, imagine full disk encryption. If you naively encrypt the block using AES-GCM, any block you encrypt will no longer fit in the device. If you encrypt a file (like a database file) which relies on offsets or similar hard coded byte wise locations to data, those no longer work.
In both cases you’d need a virtualization layer which would map logical offsets to physical ones. Definitely not impossible. Not as straightforward as replacing your read/write_blk method with read/write_encrypted_blk though.
As for why it’s awesome, it greatly simplifies and strengthens the real world process of encrypting or decrypting data where the size of the input and output are not fixed by some hardware constraint or fixed constant, where you have a virtualization layer, or where you don’t need to care as much (or can remap) offsets. Which is often.
> How so? All symmetric crypto algorithms at their basic level that I am aware of do not change the message size at all.
That's because you are not aware of the importance of authentication.
Without authentication, your system is not secure: an attacker might intercept messages, and modify them undetected. The key word here is "ciphertext malleability". And once they can do that, they can cause the recipient to react in ways it should not, and in some cases the recipient might even leak secrets.
Sometimes (like disk encryption) the size overhead is really really really inconvenient, and the risk of interception is lower, so you break the rule and skip it anyway. But unless you are in a similar situation (you probably aren't), you must use authentication. It's only professional.
In practice, that means you should use authenticated encryption. Authenticated encryption is used everywhere, including HTTPS. And yes, it has a small size overhead. Usually 16 bytes per message, like AES-GCM and RFC 8439 (ChaPoly). Per message. Not per block. So the actual overhead is very low in practice. And again, it's the price you have to pay to get a secure system.
You do not seem to be aware of the practical constraints around an actual attack like ciphertext malleability in this context, or have thought through how you would implement direct disk encryption on a block device with AES-GCM without, you know, doing block based AES-GCM for individual blocks?
Which is exactly what I was referring to?
For block based, the best way is simple to use a validating filesystem like ZFS on top of whatever block based crypto is being used, if you need random IO. If you don't, a simple fixed size signature (seperate from the data) is sufficient, and out of band is fine.
In either case, including AES-GCM, the validation and authentication is not, itself, the symmetric encryption algorithm. They wrap approved block ciphers which do that.
> You do not seem to be aware of the practical constraints around an actual attack like ciphertext malleability in this context
Show me a peer reviewed paper demonstrating, or at least convincingly arguing, of the soundness of a particular technique you are trying to advocate, and I’ll believe you.
Otherwise it’s pretty simple: either your authentication method has been validated (as are HMAC and polynomial hashes), or there’s a good chance it’s broken even if you don’t know it yet.
> For block based, the best way is simple to use a validating filesystem like ZFS on top of whatever block based crypto is being used,
File system blocks are typically 4KiB or so. AES blocks are 16 bytes. I’m not sure what you mean by "block based crypto" here, the length of AES blocks has nothing to do with the file system blocks you’re trying to encrypt.
> In either case, including AES-GCM, the validation and authentication is not, itself, the symmetric encryption algorithm. They wrap approved block ciphers which do that.
I have implemented a cryptographic library, so I’m well aware. I insist on authenticated encryption as if it was a monolithic block because it makes much safer APIs. You really really don’t want to let your average time pressured programmer to implement their polynomial hash based authentication protocol by hand, there are too many footguns to watch out for. Believe me, I’ve walked that minefield, and blew my leg off once.
> I'm not against AES-GCM, not at all. It's awesome! I'm pointing out that it has implementation tradeoffs.
Compared to what? All the example you cite in your other comment (SSL to PGP/GPG, S/MIME), make the exact same trade-offs!! They all add an authentication tag to each message, effectively expanding it size.
No matter how you're encrypting, if you're authenticating, you have to store the authenticator. Which is why GCM "expands" the size of the message. If you're not authenticating, you're not encrypting securely.
The fact that XTS isn't authenticated is a huge problem with full-disk encryption.
You can (and folks do) authenticate in ways that don’t make individual blocks bigger.
And any decent structural validation of the data still makes it reasonably secure even without per-block validation.
GCM is also opened up to different types of attacks due to it’s structure. Such as if the data is gone, it may be impossible to figure that out without additional signature or metadata.
Without the correct key for AES, it is exceedingly difficult to construct a value that can result in a successful attack after decryption even for the simplest file systems (as compared to a very visible crash or disk corruption issue even without validation), and that blog post way oversimplifies the actual process. It also makes numerous flat out false statements about many encryption modes.
a trivial answer that solves every one of the attacks mentioned in that blog is using ZFS on top of a encrypted block device.
In each of these cases, for a successful attack, you’d need to generate a new block, or identify an existing block to replace a known block with, that would produce the attackers desired outcome. All GCM does is make it more detectable in the encrypted data if that happens.
Some modes mentioned, if watching the actual disk activity and doing chosen plaintext attacks, it could be possible to shorten the time to recover the underlying volume keys, but that is not helped immensely by GCM (necessarily).
It is going to be obvious in the system itself without the right key if someone tries to swap in a bogus block, because it will be gibberish/corrupt, if it is data used by anything or checked by anything.
AES-GCM just means you can tell when you pick something up, vs when you look at it if it’s damaged. And it does it at the trade off of adding a signature on everything. Sometimes that’s worth it, sometimes it’s not.
> You can (and folks do) authenticate in ways that don’t make individual blocks bigger.
First, name one example.
Second, what do you mean by "individual blocks"?
AES-GCM adds one authentication tag per message. A single message may contain millions of AES blocks, and the total overhead of AES-GCM over it will still be a single authentication tag (16 bytes). That makes it very similar to pretty much any authenticated encryption scheme out there.
Ah, your second question is a good one, and probably gets to the root of the disagreements.
I was specifically referring to the context of things like block devices. There is no single message (in a sane way, anyway) for the device. Each low level block is the message, in the sense you are referring to. That's when inputsize != outputsize is a problem, as that 'message' is also fixed size.
When I am referring to authenticating in a way that doesn't make individual blocks bigger, I'm referring to a HMAC signature in filesystem metadata or similar in this type of scenario. Out of band information. Practically speaking, even a basic CRC of metadata and file contents would make most attacks impractical.
Which you could do with AES-GCM of course, by storing the tag separately. I currently know of no implementations that do so however, but I'm sure there are ones out there. It would require storing the tag per block, which doesn't sound fun or performant.
To answer your second question in that context - everything from SSL to PGP/GPG, S/MIME, etc.
> Practically speaking, even a basic CRC of metadata and file contents would make most attacks impractical.
If I recall correctly, CRC of plaintext-then-encrypt scheme have been defeated in the past. With practical attacks.
---
> Which you could do with AES-GCM of course, by storing the tag separately. I currently know of no implementations that do so however, but I'm sure there are ones out there. It would require storing the tag per block, which doesn't sound fun or performant.
As for storing the tag "per block", I’m not sure what you mean. Sure you need one tag per block, but with the above API you can store that tag anywhere you want. If for instance you pack them into dedicated blocks, a single 4KiB blocks can store 256 authentication tags. The loss of storage capacity would be a whooping 0.4%.
> When I am referring to authenticating in a way that doesn't make individual blocks bigger, I'm referring to a HMAC signature in filesystem metadata or similar in this type of scenario. Out of band information
Then just store the authentication tag from AES-GCM out of band!! Surely your meta-data can handle a 0.4% size overhead?
---
> To answer your second question in that context - everything from SSL to PGP/GPG, S/MIME, etc.
Thought so. They’re all just like AES-GCM. One of them (TLS 1.3, a.k.a. SSL) can even use AES-GCM for its symmetric crypto.
If you store all your tags (which update every time a block updates) in one location, then you’ll burn it out on a SSD, or thrash your disk with seeks on spinning rust (you’ll multiply your writes 2x, minimum). And unless you are adding some kind of virtualizing layer, you don’t have .4% of blocks to play with. You have 0% of blocks.
I tried to read your pointer, but the link goes no where explaining it. Mind giving a more useful link? It
Could be because I’m on mobile.
We weren’t talking about CRC of plaintext anyway - we were talking about block encryption. So it would be CRC (as validation) of on-disk filesystem structures as part of parsing. Aka an actual attack.
Standard AES-GCM appends the tag to the encrypted message directly. None of those I name do it that way. Using AES-GCM as a transport is layering their stuff inside it, which of course is fine as I’m describing it - because they don’t have fixed size structures in their protocols! It doesn’t mean they aren’t doing the additional validation and authentication.
> And unless you are adding some kind of virtualizing layer, you don’t have .4% of blocks to play with. You have 0% of blocks.
That is a shitty problem to have, there is no perfect solution. If you at all can, change the problem. If that means you need a virtualization layer, use it if possible.
---
> I tried to read your pointer, but the link goes no where explaining it. Mind giving a more useful link? It Could be because I’m on mobile.
The first sentence of the link I gave you reads as follows: "Some applications may need to store the authentication tag and the encrypted message at different locations."
Then it shows you the following function that achieves that separation (with zero performance overhead I might add):
int crypto_aead_aes256gcm_encrypt_detached(
unsigned char *ciphertext,
unsigned char *mac,
unsigned long long *mac_size_p,
const unsigned char *message,
unsigned long long message_size,
const unsigned char *additional_data,
unsigned long long additional_data_size,
const unsigned char *always_NULL,
const unsigned char *nonce,
const unsigned char *key);
If that does not help you, you need more basic training. I recommend Dan Boneh's standford cryptographic course or crypto1O1. And if you need to understand the severity of various attacks at a gut level, you might want to take a look at the cryptopals challenges as well:
People have been saying stuff like "even a CRC would make attacks impractical" for decades, and what all they've managed to accomplish is an obstacle course for early-career academic cryptographers. Which, by all means, carry on: it produces great papers, and it's a great way to get new people into the field. But if you care about security, your ciphertext needs to be authenticated.
Which brings me back to: all secure encryption expands the size of the ciphertext. If you're using XTS in a new design, you are doing something very wrong.
I don't understand your actual answer. You gave a list of cryptosystems that all suffered devastating attacks due to not properly authenticating ciphertexts; in the TLS case, several years worth of devastating attacks. PGP is an especially funny example. The only surprise is you didn't manage to work SSHv1 into the list.
AES-GCM, you mean. Let's not forget the authentication in "authenticated encryption". I'm nitpicking, but if a beginner comes here it's better to make it clear that in general, encryption alone is not enough. Ciphertext malleability and all that.