Hacker News
How to Subvert Backdoored Encryption [pdf] (iacr.org)
72 points by tlamponi 9 months ago | 15 comments

This reminds me of the 1998 paper by Ron Rivest titled "Chaffing and Winnowing: Confidentiality without Encryption". [0] It's a lot more approachable than this.

[0] https://web.eecs.umich.edu/~zmao/eecs589/papers/chaffing.txt

Thank you! I couldn't understand the paper at all, but wanted to learn more about steganography techniques, so this is perfect.

This is over my head, no question.

It'd be really cool if someone could ELI5 how such an encoding method could be set up without first sharing something analogous to private keys or one-time pads.

Note that the described scheme relies on a key exchange protocol (such as Diffie-Hellman). Remember that a key exchange protocol allows two parties to agree on a shared key; the key will be secret even in the presence of a passive eavesdropper who can observe the messages sent during the key exchange. In that respect, the fact that the two parties can communicate without having first shared a secret key is no more surprising than the fact that a key exchange protocol exists. The difficulty tackled by the paper is to "embed" the key exchange messages in the ciphertexts of the "innocent-looking" conversation. It is true that doing this embedding would be much easier if both parties had a pre-established secret, but the main result of this paper is to show how to do the embedding without a pre-established secret.
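For a concrete picture of what a key exchange buys you, here is a toy Diffie-Hellman sketch in Python. The prime and generator are illustrative assumptions (far too small for real use, where you'd want a standardized 2048+-bit group); the point is only that Alice and Bob arrive at the same secret while an eavesdropper sees only A and B.

```python
import secrets

# Toy parameters (assumption): a Mersenne prime, nowhere near real-world size.
p = 2**127 - 1
g = 3

a = secrets.randbelow(p - 2) + 2   # Alice's secret exponent, never transmitted
b = secrets.randbelow(p - 2) + 2   # Bob's secret exponent, never transmitted

A = pow(g, a, p)                   # Alice sends A over the monitored channel
B = pow(g, b, p)                   # Bob sends B over the monitored channel

# Each side raises the other's public value to its own secret exponent.
alice_key = pow(B, a, p)
bob_key = pow(A, b, p)
assert alice_key == bob_key        # same shared secret; the eavesdropper saw only A and B
```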

OK, thanks.

That last part is what boggled my mind. I'll read it again.

Their notation is unfamiliar to me, so it's very hard to follow. The scheme I think they're describing relies on sequences of ciphertexts being independent, but surely the government-mandated backdoored protocol could enforce a rule like 'a stream of messages taken two at a time must be first greater than, then less than, one another.' That probably means I'm missing something.

You are right: the scheme relies on a sequence of ciphertexts being independent (or at least, appearing to be independent to someone who doesn't know the decryption key). However, this property is automatically guaranteed if the government wants their scheme to be secure against an adversary who doesn't know the decryption keys (for example, another nation-state). This follows from the definition of "semantic security", which is the standard security definition for encryption schemes.

Wouldn't this scheme still be semantically secure, while defending against the steganography in the paper?

1. Encrypt plaintext | 128-bit random value.

2. If the last ciphertext was greater than its predecessor but less than this ciphertext, go to step 1.

3. If the last ciphertext was less than its predecessor but greater than this ciphertext, go to step 1.
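For concreteness, that stateful counter-scheme could be sketched like this, with a placeholder `encrypt` that just appends the 128-bit random value (an illustrative assumption, not real encryption) and byte-wise comparison standing in for the greater-than relation:

```python
import os

def encrypt(plaintext: bytes) -> bytes:
    # Placeholder, NOT real encryption: the random tail makes repeated
    # encryptions of the same plaintext differ.
    return plaintext + os.urandom(16)

def next_ciphertext(plaintext: bytes, prev2: bytes, prev1: bytes) -> bytes:
    # Resample until the new ciphertext avoids the forbidden patterns.
    while True:
        c = encrypt(plaintext)
        if prev2 is None or prev1 is None:
            return c
        if prev2 < prev1 < c:      # rising run: resample (step 2)
            continue
        if prev2 > prev1 > c:      # falling run: resample (step 3)
            continue
        return c

# Usage: the produced stream never contains a monotone triple.
seq, prev2, prev1 = [], None, None
for _ in range(12):
    c = next_ciphertext(b"hello", prev2, prev1)
    seq.append(c)
    prev2, prev1 = prev1, c
```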

This is an interesting point, but note that the scheme you are suggesting is "stateful" (you need to remember the last ciphertext to be able to generate the next one). The standard definition of an encryption scheme (and the one used in this paper) is stateless (once you have generated the key, you can produce ciphertexts without keeping track of any state). Stateful encryption schemes lead to all kinds of complications and for this reason tend to be studied far less from a theoretical perspective. But it would be an interesting question to see whether the results from this paper extend to the stateful setting (and I agree that, because of the example you provide, some adaptation would have to be made).

Drat, I must have misunderstood the paper, then, because it seemed (to me) to require keeping the old ciphertext in order to determine the greater-than relation.

Any chance you have a more-plain-English description (e.g. like in a NIST pub) of the paper's proposed system?

I skimmed the paper. Find my hand-wavy explanation below. I think I understand the overview, though if you see a mistake/misconception/error, let me know.

Suppose Alice and Bob want to "secretly" communicate a message, we'll call the message LL for love-letter, but the government mandates an encryption scheme, we'll call it GE for government encryption. This encryption scheme allows the government to decrypt your correspondence as they know the keys that are necessary for its use.

Suppose also that GE is "secure": only Alice, Bob, and the government can decrypt an encrypted message. We consider this secure as the government already has the keys and hasn't fundamentally broken the algorithm.

The paper proposes that Alice and Bob can still communicate their LL and that the government won't understand it, even over a link that the government can decrypt. Further, the conversation between Alice and Bob will be encoded in such a way that it does not appear to the government that encryption on top of the mandated GE is taking place. In addition, even if the government knew that the conversation had subliminal meaning it wouldn't be capable of decrypting it/understanding it: the LL was encrypted using a method as hard as the GE. Thus the paper gives a good security guarantee for the subliminal messages.

I thought the important point they make in the paper is that the proposed scheme should not generate messages that are clearly encrypted: the government should see a normal conversation between Alice and Bob unrelated to the LL. They reference steganography as an example of sending encrypted information that doesn't appear to be encrypted (e.g. encoding a message in the color bits of a picture such that the picture looks the same before and after encoding your message into it). We'll call this the normality constraint (NC). They then give an impossibility result for something known as local decoding. Metaphorically, I think local decoding translates to using the same picture to carry a conversation steganographically. So using your favorite meme picture repeatedly for the steganographic conversation isn't secure (again, my understanding is shallow here and the constraints may be even stricter). It would appear, though, that you could randomly select from a meme archive to have your private steganographic-like conversation (treating the pictures as strings of bits and randomly selecting among them for the purposes of the proposed algorithm; I explain more below).

The NC motivates them to use a probabilistic function over a set of strings as the alphabet/symbols of their encryption.

I think this means that the government would see a conversation that is syntactically correct and not outrageously meaningless, though I didn't see this quantified/addressed directly in the paper (correct me if I'm wrong).

Thus, as I didn't see the semantics of the encrypted subliminal conversation addressed, I felt motivated to attempt a quick answer myself (again, my shallow understanding of the paper may have overlooked how they addressed this).

Specifically: Wouldn't it become obvious if the government saw a syntactically wrong and semantically meaningless conversation?

My hare-brained scheme to address the question goes like this:

Use phrase indexes or phrasal pairs. A phrase index would be a string like "hey, how are you?". A phrasal pair might be a pair of related phrase indexes. I could see these being mechanically generated via machine learning. Although this solution has its own pitfalls, as you can probably guess, and would need further fleshing out.

The semantic authenticity required by the NC seems hard to satisfy, based on my reading. Would anyone care to enlighten me?

TLDR: You can formally encrypt conversations using strings (or language phrases) such that even if they're communicated over a "compromised" channel, your encrypted conversation retains its confidentiality assuming that the compromised channel hasn't been fundamentally broken (e.g. the snooping party has the keys to the channel, not a universal lock pick that breaks the algorithm).

If I understood correctly, as long as Alice and Bob can arrange some meeting and agree to some method, they'll be able to communicate over any channel, be it encrypted or not.

Example: Alice and Bob agree that the 5th letter of the n-th phrase in a conversation will mean '1' if it is in the set ones={A, C, E} and '0' if it is in {B, D, F}, and that the sets will be generated based on some characteristic of some headline from a newspaper.

That will be secure even when used over the government's GE encryption, and won't attract attention unless someone knows what to look for.

Isn't that it? So, a kind of steganography: hide information in plain sight, as long as the method was agreed beforehand and was not leaked.
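The decoding side of such a pre-agreed scheme could be sketched like this, with the example sets hard-coded (how they would actually be derived from a newspaper headline is left abstract here):

```python
# Bit sets from the example above; deriving them from a headline is omitted.
ONES = set("ACE")
ZEROS = set("BDF")

def decode_bits(phrases):
    # Read the 5th letter of each phrase and map it to a bit, if possible.
    bits = []
    for phrase in phrases:
        letters = [ch for ch in phrase.upper() if ch.isalpha()]
        if len(letters) < 5:
            continue                  # phrase too short to carry a bit
        fifth = letters[4]
        if fifth in ONES:
            bits.append(1)
        elif fifth in ZEROS:
            bits.append(0)            # letters outside both sets carry nothing
    return bits
```

For instance, phrases whose 5th letters are A, B, and C decode to the bits 1, 0, 1.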

The paper targets an adversarially selected encryption scheme, so perhaps any method would work, though I sort of doubt this based on my reading. In addition, the paper supposes that the two parties should be able to communicate using a normal-ish public/private key method.

Based on the quotes below, I believe the paper relies explicitly on the hardness of breaking the GE channel:

"On Our Modeling Assumptions. Our model considers a relatively powerful adversary that, for example, has the ability to choose the encryption scheme using which all parties must communicate, and to decrypt all such communications. We believe that this can be very realistic in certain scenarios, but it is also important to note the limitations that our model places on the adversary.

The most obvious limitation is that the encryption scheme chosen by the adversary must be semantically secure (against third parties that do not have the ability to decrypt)."


"All known constructions of such undetectable random string embedding rely on the sampling of a public random seed after the adversarial strategy is fixed. In this paper, however, we are interested in bootstrapping hidden communications from the very ground up, and we are not willing to assume that the parties start from a state where such a seed is already present."

"We begin with the following simple idea: for each consecutive pair of ciphertexts c and c′, a single hidden (random) bit b is defined by b = f(c, c′) where f is some two-source extractor. It is initially unclear why this should work because (1) c and c′ are encryptions of messages m and m′ which are potentially dependent, and two-source extractors are not guaranteed to work without independence; ..."

"We overcome difficulty (1) by relying on the semantic security of the ciphertexts of the adversarially chosen encryption scheme. Paradoxically, even though the adversary knows the decryption key, we exploit the fact that semantic security still holds against the extractor, which does not have the decryption key."

This is correct: if both parties have some pre-agreed secret, then it is much easier to secretly communicate over any channel and the method you are describing works and is similar in spirit to standard steganographic methods.

However, deriving the "secret" from "some headline from a newspaper" will not work because the adversary can also perform the same derivation and will be able to extract the secret conversation as easily as Alice or Bob.

I gave a first pass summary in another comment but had a question based on my reading of the paper. I feel I understand it better now and can provide a better summary.

TLDR: You can formally encrypt conversations using strings (or language phrases) such that even if they're communicated over a "compromised" channel, your encrypted conversation retains its confidentiality, assuming that the compromised channel hasn't been fundamentally broken (e.g. the snooping party has the keys to the channel, not a universal lock pick that breaks the algorithm). The proposed scheme requires a channel with "strong" encryption capabilities; it won't work otherwise.

For those wondering why that's a big deal: the paper is using public key encryption to enable subliminal messaging using normal everyday phrasing/conversations. You could have any conversation you wanted (i.e. one not forced or agreed upon previously) and still communicate covertly (i.e. your talk carries extra meaning that an observer with access to the plaintext wouldn't be able to discern).

The two parties don't have to plan or strategize before the meeting takes place. In fact, as long as they both understand what public key cryptography is, they can use the technique proposed in this paper to communicate without ever having met before! I feel like that statement is what throws a lot of people off, or makes them wonder why they should care.

For those wondering how it works who know some cryptographic stuff: the paper uses rejection sampling. They assume that the encryption of any given message will produce a string with desirable entropy, meaning that encrypting "hello" five times should result in 5 different ciphertexts. They then select a desirable encryption of "hello" according to their extraction function f (which is defined in their subliminal messaging scheme and extracts the subliminal message, but is also an entropy extractor). What the paper does is actually fairly straightforward scheme-wise (though the math takes a fair background to follow rigorously). Hand-waving again:

Algorithm steps:

1. Establish a random key seed and perform a key exchange; you must be communicating over a semantically secure channel for this. Seed generation works by producing d ciphertexts for each party, exchanging them, then using the greater-than function on each ciphertext to create a shared seed. They acknowledge that the seed is public to anyone who has decryption access to the government-mandated encryption channel.

1.1. Do a key exchange, described as: "Let Ext be a strong seeded extractor, and let S serve as its seed. By rejection-sampling ciphertexts c until Ext_S(c) = str, either party can embed a random string str of their choice in the conversation. By embedding in this manner the messages of a pseudorandom key-exchange protocol, both parties establish a shared secret sk∗."
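A rough sketch of the rejection-sampling embedding in step 1.1, with a hash standing in for the seeded extractor and a random tail standing in for randomized encryption (both are toy assumptions, not the paper's actual constructions):

```python
import hashlib
import os

def encrypt(msg: bytes) -> bytes:
    return msg + os.urandom(16)                 # placeholder randomized encryption

def ext(seed: bytes, c: bytes, nbits: int = 4) -> int:
    # Hash of (seed, ciphertext) as a toy seeded extractor; return the
    # top `nbits` bits of the output.
    digest = hashlib.sha256(seed + c).digest()
    return digest[0] >> (8 - nbits)

def embed(seed: bytes, cover: bytes, target: int, nbits: int = 4) -> bytes:
    # Re-encrypt the same cover message until the extractor output equals
    # the bits we want to embed; expected 2**nbits attempts.
    while True:
        c = encrypt(cover)
        if ext(seed, c, nbits) == target:
            return c

seed = b"public seed established in step 1"
c = embed(seed, b"hello", target=0b1010)
assert ext(seed, c) == 0b1010   # anyone who knows the seed can read the bits
```

Note the ciphertext still decrypts to the innocuous cover message "hello"; the embedded bits ride along in which encryption of "hello" was chosen.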

So they essentially use the rejection sampling to provide a cover for the key-exchange protocol. Pretty neat!

2. Using the seed and keys, do public key crypto like normal, with the exception that ciphertexts will be transmitted one bit at a time by selectively choosing which ciphertexts of cover messages are actually sent (the efficiency of this is very poor, and I think they provide better techniques). The encrypted messages will thus be communicated by using an extractor function over a set of encrypted strings.

The encrypted strings from step 2 can be generated from any conversation, without restriction, as long as the encryption function is strong. You do this by using a rejection-sampling strategy to produce ciphertexts with desirable characteristics.

Example of step 2:

Suppose you want to communicate "hello" as your cover text, but you want to send the bit value 1 as your subliminal message. One scheme to encrypt the single bit would be to ensure that whatever ciphertext you produce has an integer value greater than the last one you received. So, you encrypt your "hello" message repeatedly until it gives a value that, represented as an integer, is numerically larger than the previous ciphertext. You then transmit this ciphertext. Similarly, if you want to transmit 0, you generate a ciphertext that will be less than the integer representation of the previous ciphertext.
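That greater-than channel can be sketched like so. `encrypt` is again a placeholder assumption; it keeps all ciphertexts for a given cover the same length, so the integer comparison depends only on the random part:

```python
import os

def encrypt(msg: bytes) -> bytes:
    return msg + os.urandom(16)           # placeholder randomized encryption

def as_int(c: bytes) -> int:
    return int.from_bytes(c, "big")

def send_bit(cover: bytes, bit: int, prev: bytes) -> bytes:
    # Resample until the new ciphertext compares to the previous one the
    # way the subliminal bit demands (~2 attempts on average).
    while True:
        c = encrypt(cover)
        if (as_int(c) > as_int(prev)) == (bit == 1):
            return c

def read_bit(c: bytes, prev: bytes) -> int:
    return 1 if as_int(c) > as_int(prev) else 0

# Usage: every transmitted ciphertext still decrypts to the cover "hello";
# the observer sees only a normal repeated greeting.
prev = encrypt(b"hello")
sent, received = [1, 0, 0, 1], []
for bit in sent:
    c = send_bit(b"hello", bit, prev)
    received.append(read_bit(c, prev))
    prev = c
assert received == sent
```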

Mathematically this scheme is as strong as the underlying government mandated encryption scheme (it uses the underlying cryptographic scheme as a source of entropy).

Hope you found that intelligible and correct me if you see any mistakes!
