
How to Subvert Backdoored Encryption [pdf] - tlamponi
https://eprint.iacr.org/2018/212.pdf
======
pmorici
This reminds me of the 1998 paper by Ron Rivest titled "Chaffing and Winnowing:
Confidentiality without Encryption". [0] It's a lot more approachable than
this.

[0]
[https://web.eecs.umich.edu/~zmao/eecs589/papers/chaffing.txt](https://web.eecs.umich.edu/~zmao/eecs589/papers/chaffing.txt)
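For the curious, the core of Rivest's idea fits in a few lines: authenticate each real ("wheat") packet with a MAC, and mix in "chaff" packets with bogus MACs; only someone holding the MAC key can winnow wheat from chaff. A minimal Python sketch (the packet format and one-chaff-per-bit interleaving here are simplifying assumptions, not Rivest's exact construction):

```python
import hashlib
import hmac
import secrets

def mac(key: bytes, seq: int, bit: int) -> bytes:
    return hmac.new(key, f"{seq}:{bit}".encode(), hashlib.sha256).digest()

def chaff_and_send(key: bytes, bits: list[int]) -> list[tuple[int, int, bytes]]:
    """For each message bit, emit the real (wheat) packet plus a chaff
    packet carrying the opposite bit with a random, invalid MAC."""
    packets = []
    for seq, bit in enumerate(bits):
        packets.append((seq, bit, mac(key, seq, bit)))           # wheat
        packets.append((seq, 1 - bit, secrets.token_bytes(32)))  # chaff
    return packets

def winnow(key: bytes, packets) -> list[int]:
    """The receiver keeps only packets whose MAC verifies."""
    wheat = [(seq, bit) for seq, bit, tag in packets
             if hmac.compare_digest(tag, mac(key, seq, bit))]
    return [bit for seq, bit in sorted(wheat)]

key = secrets.token_bytes(32)
msg = [1, 0, 1, 1]
assert winnow(key, chaff_and_send(key, msg)) == msg
```

Note that no packet is ever encrypted; confidentiality comes entirely from the eavesdropper's inability to tell wheat from chaff.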

~~~
prophesi
Thank you! I couldn't understand the paper at all, but wanted to learn more
about steganography techniques, so this is perfect.

------
mirimir
This is over my head, no question.

It'd be really cool if someone could ELI5 how such an encoding method could be
set up without first sharing something analogous to private keys or one-time
pads.

~~~
thorel
Note that the described scheme relies on a key exchange protocol (such as
Diffie-Hellman). Remember that a key exchange protocol allows two parties to
agree on a shared key; the key will be secret even in the presence of a
passive eavesdropper who can observe the messages sent during the key
exchange. In that respect, the fact that the two parties can communicate
without having first shared a secret key is no more surprising than the fact
that a key exchange protocol exists. The difficulty tackled by the paper is
to "embed" the key exchange messages in the ciphertexts of the "innocent-
looking" conversation. It is true that doing this embedding would be much
easier if both parties had a pre-established secret, but the main result of
this paper is to show how to do the embedding without a pre-established
secret.
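To make the "no pre-shared secret" part concrete, here is textbook Diffie-Hellman with toy parameters (p = 23, g = 5, purely for illustration; real deployments use 2048-bit+ groups or elliptic curves):

```python
import secrets

# Toy Diffie-Hellman parameters -- far too small for real use.
p, g = 23, 5

a = secrets.randbelow(p - 2) + 1   # Alice's secret exponent
b = secrets.randbelow(p - 2) + 1   # Bob's secret exponent
A = pow(g, a, p)                   # Alice -> Bob, sent in the clear
B = pow(g, b, p)                   # Bob -> Alice, sent in the clear

# Both sides derive the same key; a passive eavesdropper sees only A and B.
shared_alice = pow(B, a, p)
shared_bob = pow(A, b, p)
assert shared_alice == shared_bob
```

The paper's contribution, as described above, is hiding exactly these exchanged values inside the ciphertexts of an innocent-looking conversation.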

~~~
mirimir
OK, thanks.

That last part is what boggled my mind. I'll read it again.

------
eadmund
Their notation is unfamiliar to me, so it's very hard to follow. The scheme I
_think_ they're describing relies on sequences of cyphertexts being
independent, but surely the government-mandated backdoored protocol could
enforce a rule like 'a stream of messages taken two at a time must be first
greater than, then less than, one another.' That probably means I'm missing
something.

~~~
thorel
You are right: the scheme relies on a sequence of cyphertexts being
independent (or at least, appearing to be independent to someone who doesn't
know the decryption key). However, this property is automatically guaranteed
if the government wants their scheme to be secure against an adversary who
doesn't know the decryption keys (for example, another nation-state). This
follows from the definition of "semantic security" which is the standard
security definition of encryption schemes.
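One consequence of semantic security worth spelling out: encryption must be randomized, so encrypting the same message twice yields unrelated-looking ciphertexts. A toy illustration (a hash-based stream construction, an assumption for demonstration only, not production crypto):

```python
import hashlib
import secrets

def encrypt(key: bytes, pt: bytes) -> bytes:
    """Toy randomized encryption: a fresh nonce makes every ciphertext
    of the same plaintext look different. Plaintext must be <= 32 bytes."""
    nonce = secrets.token_bytes(16)
    stream = hashlib.sha256(key + nonce).digest()[:len(pt)]
    return nonce + bytes(a ^ b for a, b in zip(pt, stream))

def decrypt(key: bytes, ct: bytes) -> bytes:
    nonce, body = ct[:16], ct[16:]
    stream = hashlib.sha256(key + nonce).digest()[:len(body)]
    return bytes(a ^ b for a, b in zip(body, stream))

key = secrets.token_bytes(32)
c1, c2 = encrypt(key, b"hello"), encrypt(key, b"hello")
assert c1 != c2                      # repeat encryptions look unrelated
assert decrypt(key, c1) == b"hello"  # but still decrypt correctly
```

It is this "ciphertexts look like fresh randomness to anyone without the key" property that the paper leans on.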

~~~
eadmund
Wouldn't this scheme still be semantically secure, while defending against the
steganography in the paper?

1. Encrypt plaintext | 128-bit random value.

2. If the last cyphertext was greater than its predecessor but less than this
cyphertext, go to step 1.

3. If the last cyphertext was less than its predecessor but greater than this
cyphertext, go to step 1.
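As a sketch, this proposed restriction amounts to a stateful filter that rejects strictly monotone ciphertext triples (toy integer "ciphertexts" below stand in for real ones; the random-bytes-first layout is an assumption so comparisons behave uniformly):

```python
import secrets

def encrypt_toy(msg: bytes) -> int:
    # Stand-in for "encrypt plaintext | 128-bit random value", viewed
    # as an integer. Random bytes come first so the comparison between
    # any two ciphertexts is essentially a coin flip.
    return int.from_bytes(secrets.token_bytes(16) + msg, "big")

def compliant(c2: int, c1: int, c0: int) -> bool:
    """Reject strictly rising or strictly falling triples, forcing the
    ciphertext stream to alternate up/down."""
    return not (c2 < c1 < c0 or c2 > c1 > c0)

def send(msg: bytes, history: list[int]) -> int:
    while True:
        c = encrypt_toy(msg)
        if len(history) < 2 or compliant(history[-2], history[-1], c):
            history.append(c)
            return c

history: list[int] = []
for _ in range(50):
    send(b"hello", history)
# The mandated stream never contains a monotone triple.
assert all(not (a < b < c or a > b > c)
           for a, b, c in zip(history, history[1:], history[2:]))
```

Such a rule would indeed destroy the greater-than relation the paper's embedding relies on, which is the crux of the stateful-vs-stateless point discussed below.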

~~~
thorel
This is an interesting point, but note that the scheme you are suggesting is
"stateful" (you need to remember the last ciphertext to be able to generate
the next one). The standard definition of an encryption scheme (and the one
used in this paper) is stateless (once you have generated the key, you can
produce ciphertexts without keeping track of any state). Stateful encryption
schemes lead to all kind of complications and for this reason tend to be
studied way less from the theoretical perspective. But it would be an
interesting question to see if the results from this paper extend to the
stateful setting (and I agree that because of the example you provide, some
adaptation would have to be made).

~~~
eadmund
Drat, I must have misunderstood the paper, then, because it seemed (to me) to
require keeping the old cyphertext in order to determine the GT relation.

Any chance you have a more-plain-English (e.g. like in a NIST pub) description
of the paper's proposed system?

------
tmpmov
I skimmed the paper. Find my hand-wavy explanation below. I think I
understand the overview, though if you see a mistake/misconception/error, let
me know.

Suppose Alice and Bob want to "secretly" communicate a message, we'll call the
message LL for love-letter, but the government mandates an encryption scheme,
we'll call it GE for government encryption. This encryption scheme allows the
government to decrypt your correspondence as they know the keys that are
necessary for its use.

Suppose also that GE is "secure": only Alice, Bob, and the government can
decrypt an encrypted message. We consider this secure as the government
already has the keys and hasn't fundamentally broken the algorithm.

The paper proposes that Alice and Bob can still communicate their LL and that
the government won't understand it, even over a link that the government can
decrypt. Further, the conversation between Alice and Bob will be encoded in
such a way that it does not appear to the government that encryption on top of
the mandated GE is taking place. In addition, even if the government knew that
the conversation had subliminal meaning it wouldn't be capable of decrypting
it/understanding it: the LL was encrypted using a method as hard as the GE.
Thus the paper gives a good security guarantee for the subliminal messages.

I thought the important point they make in the paper is that the proposed
scheme should not generate messages that are clearly encrypted: the government
should see a normal conversation between Alice and Bob unrelated to the LL.
They reference steganography as an example of sending encrypted information
that doesn't appear to be encrypted (e.g. via a message encoding in the color
bits of a picture such that the picture looks the same before and after
encoding your message into it). We'll call this the normality constraint (NC).
They then give an impossibility result for something known as local decoding.
Metaphorically, I think local decoding translates to using the same picture to
carry a conversation steganographically. So using your favorite meme picture
repeatedly for the steganographic conversation isn't secure (again, my
understanding is shallow here and the constraints may be even stricter). It
would appear, though, that you could randomly select from a meme archive to
have your private steganography-like conversation (treating the pictures as
strings of bits and randomly selecting among them for the purposes of the
proposed algorithm; I explain more below).

The NC motivates them to use a probabilistic function over a set of strings as
the alphabet/symbols of their encryption.

I _think_ this means that the government would see a conversation that is
syntactically correct and not outrageously meaningless, though I didn't see
this quantified/addressed directly in the paper (correct me if I'm wrong).

Thus, as I didn't see the semantics of the encrypted subliminal conversation
addressed, I felt motivated to attempt a quick answer myself (again, my
shallow understanding of the paper may have overlooked how they addressed
this).

Specifically: Wouldn't it become obvious if the government saw a syntactically
wrong and semantically meaningless conversation?

My harebrained scheme to address the question goes like this:

Use phrase indexes or phrasal pairs. A phrase index would be a string like
"hey, how are you?". A phrasal pair might be a pair of related phrase indexes.
I could see these being mechanically generated via machine learning. Although
this solution has its own pitfalls, as you can probably guess, and would need
further fleshing out.

The semantic authenticity required by the NC seems hard to satisfy based on my
reading. Would anyone care to enlighten me?

TLDR: You can formally encrypt conversations using strings (or language
phrases) such that even if they're communicated over a "compromised" channel,
your encrypted conversation retains its confidentiality assuming that the
compromised channel hasn't been fundamentally broken (e.g. the snooping party
has the keys to the channel, not a universal lock pick that breaks the
algorithm).

~~~
woliveirajr
If I understood correctly, as long as Alice and Bob can arrange some meeting
and agree to some method, they'll be able to communicate over any channel, be
it encrypted or not.

Example: Alice and Bob agree that the 5th letter of the n-th phrase of a
conversation will mean '1' if it is in the set ones={A, C, E} and '0' if it is
in the set zeros={B, D, F}. And that the sets will be generated based on some
characteristic of some headline from a newspaper.
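That decoding rule can be sketched directly (the sets below are arbitrary stand-ins for whatever the newspaper headline would generate):

```python
ONES = set("ACE")   # agreed beforehand, e.g. derived from a headline
ZEROS = set("BDF")

def hidden_bit(phrase: str):
    """Read the covert bit from the 5th letter of a phrase; returns
    None when the phrase carries no bit under this convention."""
    letters = [ch for ch in phrase.upper() if ch.isalpha()]
    if len(letters) < 5:
        return None
    ch = letters[4]
    if ch in ONES:
        return 1
    if ch in ZEROS:
        return 0
    return None

assert hidden_bit("Hi there") == 1   # 5th letter 'E' -> 1
assert hidden_bit("Well done") == 0  # 5th letter 'D' -> 0
assert hidden_bit("Hi") is None      # too short to carry a bit
```

The weakness, as noted above, is that this requires the prior agreement; the paper's point is to get the same effect with no prior meeting at all.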

That will be secure even if used over the GE-encryption method from the
government, and won't attract attention, except if someone knows how to deal
with it.

Isn't that it? So, a kind of steganography: hide information in plain sight,
as long as the method was agreed on beforehand and was not leaked.

~~~
tmpmov
The paper targets an adversarially selected encryption scheme. Thus perhaps
any would work, though I sort of doubt this based on my reading. In addition,
the paper supposes that the two parties should be able to communicate using a
normal-ish public/private key method.

Based on the quotes below I believe the paper relies explicitly on the GE
channel's hardness to break:

"On Our Modeling Assumptions. Our model considers a relatively powerful
adversary that, for example, has the ability to choose the encryption scheme
using which all parties must communicate, and to decrypt all such
communications. We believe that this can be very realistic in certain
scenarios, but it is also important to note the limitations that our model
places on the adversary.

The most obvious limitation is that the encryption scheme chosen by the
adversary must be semantically secure (against third parties that do not have
the ability to decrypt)."

Later:

"All known constructions of such undetectable random string embedding rely on
the sampling of a public random seed after the adversarial strategy is fixed.
In this paper, however, we are interested in bootstrapping hidden
communications from the very ground up, and we are not willing to assume that
the parties start from a state where such a seed is already present."

" We begin with the following simple idea: for each consecutive pair of
ciphertexts c and c0, a single hidden (random) bit b is defined by b = f(c,
c0) where f is some two-source extractor. It is initially unclear why this
should work because (1) c and c0 are encryptions of messages m and m0 which
are potentially dependent, and two-source extractors are not guaranteed to
work without independence; ..."

" __We overcome difficulty (1) by relying on the semantic security of the
ciphertexts of the adversarially chosen encryption scheme. __Paradoxically,
even though the adversary knows the decryption key, we exploit the fact that
semantic security still holds against the extractor, which does not have the
decryption key. "

------
tmpmov
I gave a first pass summary in another comment but had a question based on my
reading of the paper. I feel I understand it better now and can provide a
better summary.

TLDR: You can formally encrypt conversations using strings (or language
phrases) such that even if they're communicated over a "compromised" channel,
your encrypted conversation retains its confidentiality assuming that the
compromised channel hasn't been fundamentally broken (e.g. the snooping party
has the keys to the channel, not a universal lock pick that breaks the
algorithm). The proposed scheme requires a channel with "strong" encryption
capabilities; it won't work otherwise.

For those wondering why that's a big deal: The paper is using public key
encryption to enable subliminal messaging using normal everyday
phrasing/conversations. You could have any conversation you wanted (e.g. not
a forced or previously-agreed-upon conversation) and still communicate
covertly (e.g. your talk carries extra meaning that an observer with access
to the plain text wouldn't be able to discern).

The two parties don't have to plan or strategize before the meeting takes
place. In fact, as long as they both understand what public key cryptography
is, they can use the technique proposed in this paper to communicate without
ever having met before! I feel like that statement is what throws a lot of
people off, or makes them wonder why they should care.

For those who know some cryptographic stuff and are wondering how it works:
The paper uses rejection sampling. They assume that the encryption of any
given message
will produce a string with desirable entropy. Meaning encrypting "hello" five
times should result in 5 different encrypted cipher texts. They then select a
desirable encryption of "hello" according to their extraction function f
(which they defined in their subliminal messaging scheme and extracts the
subliminal message, but is also an entropy extractor). What the paper does is
actually fairly straightforward scheme-wise (though the math is another matter
and takes a fair background to follow rigorously). Hand-waving again:

Algorithm steps:

1. Establish a random key seed and perform a key exchange; you must be
communicating over a semantically secure channel for this. Seed generation
works by producing d cipher texts for each party, exchanging them, then using
the greater-than function on each cipher text to create a shared seed. They
acknowledge that the seed is public to anyone that has decryption access to
the government-mandated encryption channel.

1.1. Do a key exchange, described as: "Let Ext be a strong seeded extractor,
and let S serve as its seed. By rejection-sampling ciphertexts c until
ExtS(c) = str, either party can embed a random string str of their choice in
the conversation. By embedding in this manner the messages of a pseudorandom
key-exchange protocol, both parties establish a shared secret sk∗."

So they essentially use the rejection sampling to provide a cover for the key-
exchange protocol. Pretty neat!
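A toy version of that rejection-sampling embedding, with a hash standing in for the strong seeded extractor Ext_S (the "encryption" here is just cover text plus random bytes, an assumption purely for illustration):

```python
import hashlib
import secrets

def ext(seed: bytes, ct: bytes, nbits: int) -> int:
    """Toy seeded extractor: hash seed || ciphertext and keep the top
    nbits (a stand-in for the strong seeded extractor in the quote)."""
    h = hashlib.sha256(seed + ct).digest()
    return int.from_bytes(h, "big") >> (256 - nbits)

def embed(seed: bytes, cover: bytes, target: int, nbits: int) -> bytes:
    """Rejection-sample 'ciphertexts' of the cover message until the
    extractor output equals the target hidden string."""
    while True:
        ct = cover + secrets.token_bytes(16)  # stand-in for randomized encryption
        if ext(seed, ct, nbits) == target:
            return ct

seed = secrets.token_bytes(16)
target = 0b101                     # 3 hidden bits -> ~8 tries expected
ct = embed(seed, b"hello", target, 3)
assert ext(seed, ct, 3) == target  # anyone with the seed reads the bits
```

Each extra hidden bit doubles the expected number of re-encryptions, which is why the paper embeds only a few bits per ciphertext.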

2. Using the seed and keys, do public key crypto like normal, with the
exception that cipher texts will be transmitted a single bit at a time by
selectively choosing which cipher texts of cover messages are actually sent
(the efficiency of that is very poor, and I think they provide better
techniques). The encrypted messages will thus be communicated by using an
extractor function over a set of encrypted strings.

The encrypted strings from step 2 can be generated from any conversation,
without restriction, as long as the encryption function is strong. You do this
by using a rejection sampling strategy to produce cipher texts with desirable
characteristics.

Example of step 2:

Suppose you want to communicate "hello" as your cover text but you want to
send the bit value 1 as your subliminal message. To do so, a scheme to encrypt
the single bit could be to ensure that whatever cipher text you produce has an
integer value greater than the last one you received. So, you encrypt your
"hello" message repeatedly until it gives a value that, represented as an
integer, is numerically larger than the previous cipher text. You then
transmit this cipher text. Similarly, if you want to transmit 0 you generate a
cipher text that is less than the integer representation of the previous
cipher text.
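That greater-than/less-than trick can be sketched end to end (the toy randomized "encryption" below is an assumption; real ciphertexts would come from the mandated scheme):

```python
import secrets

def encrypt_toy(msg: bytes) -> int:
    """Stand-in for the mandated encryption: randomized, so re-encrypting
    the same message yields a fresh, unpredictable integer."""
    return int.from_bytes(secrets.token_bytes(16) + msg, "big")

def embed_bit(cover: bytes, bit: int, prev_ct: int) -> int:
    """Rejection-sample ciphertexts of the cover message until the
    greater-than relation with the previous ciphertext encodes `bit`."""
    while True:
        c = encrypt_toy(cover)
        if (c > prev_ct) == bool(bit):
            return c

def read_bit(prev_ct: int, ct: int) -> int:
    return int(ct > prev_ct)

# Same-length cover messages keep the integer comparison fair.
prev = encrypt_toy(b"howdy")
hidden = [1, 0, 1, 1, 0]
received = []
for b in hidden:
    c = embed_bit(b"hello", b, prev)
    received.append(read_bit(prev, c))
    prev = c
assert received == hidden
```

On average each bit costs two encryptions of the cover message, and the transmitted stream is just ordinary-looking ciphertexts of an ordinary conversation.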

Mathematically this scheme is as strong as the underlying government mandated
encryption scheme (it uses the underlying cryptographic scheme as a source of
entropy).

Hope you found that intelligible and correct me if you see any mistakes!

