
Encryption is less secure than we thought - qubitsam
http://www.mit.edu/newsoffice/2013/encryption-is-less-secure-than-we-thought-0814.html
======
timtadh
If you really want to understand this you should read the paper. However, I
will try my best to summarize my understanding of what is going on.
(Disclaimer: I am not a professional cryptographer, but I know something about
crypto-complexity).

Many proofs in cryptography start with the following assumption: assume x is a
bitstring of length N drawn from a uniformly random distribution of bit
strings. This assumption is then used as the basis of the proof of the
cryptographic construct.

For instance, in a provably secure stream cipher you start with that
assumption, then proceed to show that if weak one-way functions exist, you can
construct a secure pseudorandom number generator that is polynomially
indistinguishable from a true random source. You then use your assumed random
bit string, x, as the key to the pseudorandom number generator to generate a
one time pad to XOR with your message.

Since the secure pseudorandom number generator is indistinguishable from a
true random number generator (by a polynomial attacker), the message is secure
(from a polynomial attacker). However, this is all predicated on the key being
drawn from a uniform distribution. If the key is NOT drawn from a uniform
distribution, then there may be a potential speed-up available to the attacker.
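
A toy sketch of that construction in Python (illustrative only: SHA-256 in
counter mode stands in for the provably secure PRG, which the real proof
builds from one-way functions; this is not a real cipher):

    import hashlib

    def prg(key: bytes, n: int) -> bytes:
        # Stand-in pseudorandom generator: SHA-256 in counter mode.
        out, counter = b"", 0
        while len(out) < n:
            out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
            counter += 1
        return out[:n]

    def stream_encrypt(key: bytes, message: bytes) -> bytes:
        # XOR the message with the generated "one time pad"; decryption is
        # the same operation. Security rests entirely on the key being drawn
        # from a uniform distribution -- the assumption at issue here.
        pad = prg(key, len(message))
        return bytes(m ^ p for m, p in zip(message, pad))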

What the paper shows is that it is often assumed that words drawn from an IID
distribution of letters can be treated as if the distribution of words were
uniform (given a sufficiently large word size). However, this distribution,
while very flat, is not quite uniform. They show how this can be exploited
using the "Guesswork" framework to come up with a tighter bound on the time it
would take to guess a chosen word. This bound is much better than the
traditional bound used.
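
To make the gap concrete, here's a small numerical experiment (my own toy
setup, not the paper's): for words from a biased i.i.d. bit source, the
expected number of guesses for an optimal guesser, conditioned on the word
being typical, comes out well below the equipartition estimate of
|typical set| / 2:

    import math

    p, n, eps = 0.7, 16, 0.1  # bias P(bit=1), word length, typicality tolerance
    H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # Shannon entropy/bit

    # A word's probability depends only on its number of 1s, so group by count.
    blocks = []
    for k in range(n + 1):
        prob = p ** k * (1 - p) ** (n - k)
        if abs(-math.log2(prob) / n - H) <= eps:  # word is in the typical set
            blocks.append((prob, math.comb(n, k)))

    t_size = sum(c for _, c in blocks)
    t_prob = sum(pr * c for pr, c in blocks)

    # Optimal guesser tries words in decreasing order of probability.
    blocks.sort(reverse=True)
    rank, expected = 1, 0.0
    for pr, count in blocks:
        expected += (rank + (count - 1) / 2) * count * pr / t_prob
        rank += count

    print("equipartition estimate :", t_size / 2)    # ~7098 guesses
    print("actual expected guesses:", round(expected))  # noticeably fewer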

What this means is that there needs to be more margin given than previously
thought when dealing with keys derived from non-uniform distributions. Such
keys include passwords.

~~~
viraptor
What does "a polynomial attacker" mean? I think I get the idea, but the term
is not very well known by google.

~~~
chrismonsanto
Generally in theoretical cryptography the adversary is a probabilistic Turing
machine that can take K steps to break your encryption, where K is
asymptotically bounded by a polynomial function of the length of the input it
is trying to break.

You will see this called a 'PPT' in the literature, where PPT = Probabilistic
Polynomial Time.

------
signed0
TLDR: _Bloch doubts that the failure of the uniformity assumption means that
cryptographic systems in wide use today are fundamentally insecure. “My guess
is that it will show that some of them are slightly less secure than we had
hoped, but usually in the process, we’ll also figure out a way of patching
them.”_

~~~
conformal
the article title is a total troll, thx mit.

for a real world example of failed uniformity assumptions, see cryptocat.

~~~
moocowduckquack
How is "Less secure than we thought" a troll title, given that the article is
about how one of the fundamental assumptions of cryptography is less secure
than we thought? It doesn't say insecure.

~~~
pessimizer
Maybe he's looking for "Possibly Encryption Might be Somewhat Less Secure Than
We Personally May or May Not Have Thought At One Time, Technically."

~~~
moocowduckquack
Hmm, seems a bit less snappy, somehow.

------
api
I wonder about the practical effect of this in the Real World(tm).

When cryptographers talk about a "break," they mean anything faster than brute
force search. So a break of a 128-bit algorithm that allows key recovery in
2^115 is a "break," but still completely impractical.
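
Back-of-the-envelope (the guess rate here is made up for illustration):

    # Even at a trillion guesses per second, a 2^115 attack is out of reach.
    guesses = 2 ** 115
    rate = 1e12                          # guesses per second (optimistic)
    years = guesses / rate / 3.15e7      # 3.15e7 seconds per year
    print(f"{years:.2e} years")          # on the order of 10^15 years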

But as Bruce Schneier says: "attacks only get better." So a break is a good
reason to look for a fix or a better algorithm _now_ before the crack widens.

~~~
driverdan
This is my question too. How much less secure does it make encryption? Enough
to be breakable or not enough to matter?

------
Anderkent
They quote three papers that supposedly make the mistake of using Shannon
entropy to reason about secrets. They all seem to relate to the same
algorithm, so it sounds to me (and our local crypto guy) more likely that
there's one particular field that is making this mistake, but it's not a
common mistake in crypto.

In my understanding Shannon entropy is not very useful for crypto reasoning
for more trivial reasons. A password that has a 50% chance of being "foo" and
a 50% chance of being completely random has good Shannon entropy, but it's
clearly not a good password.
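
You can put numbers on that (quick sketch; the 2^128 figure for the random
branch is just an assumption for illustration):

    import math

    # Password: "foo" with probability 1/2, otherwise one of 2^128
    # equally likely random strings.
    p_foo, n_rand = 0.5, 2 ** 128

    shannon = -p_foo * math.log2(p_foo) \
              - (1 - p_foo) * math.log2((1 - p_foo) / n_rand)
    min_entropy = -math.log2(p_foo)  # "foo" is the most likely outcome

    print(shannon)       # 65.0 bits -- looks like a strong password
    print(min_entropy)   # 1.0 bit -- guessing "foo" first wins half the time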

------
JonnieCache
Paper: [http://arxiv.org/abs/1301.6356](http://arxiv.org/abs/1301.6356)

~~~
mjn
Thanks, that actually looks interesting.

The MIT press office has a habit of writing press releases that go beyond the
usual hype to be so over-the-top they actually make the work sound crackpot.
Happens most often in AI, but I guess in crypto now too. Not the scientists'
fault, so I usually try to read the original paper to see their own claims,
and invariably they are much more specific and legitimate.

~~~
epistasis
I think it's not just MIT's press office, but every single one that I've seen.
Or maybe there's a selection effect; the more sensational the particular
article, the more people see it, so it just seems that all press offices are
off their rocker.

~~~
JonnieCache
[http://www.phdcomics.com/comics/archive/phd051809s.gif](http://www.phdcomics.com/comics/archive/phd051809s.gif)

~~~
andreif
This would fit better as a comment for the original submission

------
tveita
> We establish that for source words originally constructed from an i.i.d.
> sequence of letters, as a function of word length it is exponentially easier
> to guess a word conditioned to be in the source’s typical set in comparison
> to the corresponding equipartition approximation.

Maybe I'm missing something, but this seems like a known and fairly trivial
result.

I read that as basically saying that if you don't pick the letters in your
passwords from a uniform distribution, they are easier to guess.

I don't see what that adds when people are already using Markov chains and
dictionary attacks to brute force passwords, and I see no implication at all
for secure keys chosen by good PRNGs.

I guess it could have some use in entropy estimation for PRNG state, but
assuming uniform distribution of input seems like a naive mistake to make. Is
there some application of this that I'm missing?

------
Tloewald
If your input is "perfectly" random then the codebreaker has no chance of
determining when he/she has successfully decrypted it (regardless of how weak
your encryption is). So the more random your unencrypted data, the harder it
is to decrypt it once it is encrypted. It follows that less random data will
be easier to decrypt. Now I can't prove the shape of the intermediate function
(e.g. might there be a sweet spot in the middle? "Common sense" suggests the
relationship would be monotonic, but common sense is often wrong).

Incidentally, it's also clear that if the encryption key is large relative
to the compressed data then it will also be harder to decrypt (of course if
the key is public, e.g. a public key, then this is decidedly not
true).

So, none of this should be terribly surprising. What it does suggest is that
you should _compress_ data before (or while) encrypting it, and the better
your compression algorithm, the more entropy there will be in your data, and
the more secure your encryption.

~~~
nialo
This is the obvious intuitive response, but it turns out that it's broken:
Compressing data before encryption can give attackers a new side channel to
recover the plaintext, by just watching the message length, and seeing when
the compression is more or less effective. See, for example, the CRIME attack
on TLS.
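
A toy version of the idea (my own sketch, not the actual CRIME exploit; real
attacks work at the TLS compression layer and average over padding to resolve
ties):

    import zlib

    SECRET = "session=s3cretvalue"

    def oracle(attacker_input: str) -> int:
        # The "server" compresses secret + attacker input before encrypting
        # with a length-preserving (stream) cipher, so ciphertext length
        # leaks the compressed length.
        return len(zlib.compress((SECRET + "&" + attacker_input).encode()))

    # Extend a known prefix one character at a time: the correct guess
    # matches more of the secret, so it usually compresses a byte shorter.
    known = "session=s3cretv"
    guess = min("abcdefghijklmnopqrstuvwxyz0123456789",
                key=lambda c: oracle(known + c))
    print("next char is probably:", guess)  # hopefully 'a'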

~~~
p0ckets
Obvious next step: compress and pad to original length.

~~~
rmidthun
Obvious? Didn't that word already cause enough trouble?

What are you going to pad it with? Random values? In that case, I'll just make
two requests for the same information and clip it where they differ. Some sort
of value based on the message? That also has issues, see link below. So far it
doesn't seem like there are any obvious crypto ideas that are not also wrong.

[http://en.wikipedia.org/wiki/Padding_oracle_attack](http://en.wikipedia.org/wiki/Padding_oracle_attack)

~~~
nknighthb
> _I'll just make two requests for the same information and clip it where
> they differ._

So, byte 0? You're not really using ECB, are you?

~~~
rmidthun
D'oh. Good point.

At least I can point to it as another example of someone not expert in the
field making a stupid statement about crypto...

I think the general problem of padding with random values resulting in non-
determinism would still matter in some cases, but IANAC.

------
stcredzero
What I've seen, watching security and cryptography from the sidelines for the
past 18 years, is that we programmers are still in the myopia of the early
days of technology.

It took decades for programmers' awareness of the importance of human factors
and user interfaces to reach today's levels. (Really, how useful is computing
if only a very few want to use it?) Yet we're really just beginning to truly
get it as a field. (Or as a "not-quite-a-field," to give a nod to Alan Kay.)

It's probably going to take just as long for us to figure out how to account
for human factors and economics when it comes to security. Yet, if you are
really savvy about security, it should leap out at you that _human factors and
economics_ are just about the biggest two determining factors in security.

The big problem is not what algorithm you use to encrypt A or B. It's
employees' and programmers' lack of knowledge about security and key
management. It's the secretary who clicks on the phishing link. It's
management's misconception that a dozen of BigCo's employees with insufficient
peer review can take on the reverse engineering resources of the entire
Internet.

The big problems in security have to do with organizational awareness and
economics. If you're not accounting for those, you are misallocating your
resources, probably by an order of magnitude.

------
Tichy
The things they mention sound to me just like the standard things
cryptographers worry about on a daily basis. I am not sure there is anything
new here.

------
BetaCygni
Isn't this just a variant of the known plaintext attack?

~~~
VLM
Maybe a good analogy is if you take a "giant" rainbow table, you can usually
compress the unencrypted side of the table to something quite a bit smaller
than you'd think.

Maybe another way to phrase it is something like: encrypting plain text
English is now mathematically proven to be not as secure as encrypting a zip
(or other format) compressed version of the same English text. (Edited to add:
it's early in the morning, and the specific example of a zip file is a great
known plaintext attack, 0x04034b50 and all that, but "compressing in general"
is a good idea. The mistake I made was like calling all photocopiers "Xerox
machines", or all nose tissues "Kleenex".)

Crypto is very well taught for algorithms, classes of algos, history, and a
couple of other things, but the classes of applications of crypto technology
are (intentionally?) very poorly documented and taught. Something that screws
up digital sigs probably has little to do with the issues in encryption or
authentication or whatever. It's an area ripe for someone to write a book, or
for a startup to do "something", probably something educational.

~~~
api
But be careful when compressing. Never compress something secret together with
something that an attacker can repeatedly influence, or you're possibly
vulnerable to compressed length oracle attacks similar to CRIME.

~~~
rst
BREACH is a variation on the theme (compression at a different protocol layer,
but again pre-encryption). We're going to see more of these.

------
Shorel
This is just theory getting up to speed with current practice.

Password breakers (when they have the hash) do not assume that the passwords
are random strings; they use several assumptions that greatly reduce the time
they need to break the passwords.

------
tehwalrus
This is an interesting and important result. I was fascinated by the Shannon
entropy when I first heard of it in undergrad, and I'm now very tempted to go
read about all the other entropies people came up with too :)

------
Andome
Well, the single-guess success probability is governed by the min-entropy,
not the Shannon entropy (it's 2 to the minus the min-entropy). As for Shannon
entropy, real code generation never quite achieves it; there is always a small
gap from the theoretical limit. As the length of our codes gets bigger, the
Shannon-entropy/uniform-distribution approximation breaks down and the
Guesswork time isn't as long as we once thought.

------
MichaelMoser123
You can send a key for a pseudo random number generator at the start of the
message, use the key to seed it, then use the output of the pseudo random
number generator and XOR it with each byte of the message. That would act as
some kind of 'salt' that makes the message look more random (less redundant).

~~~
MichaelMoser123
I meant sending the seed key as the initial part of the encrypted message; all
further data would be recovered after decryption by XORing with the PRNG
output.
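
A sketch of my reading of the scheme (a hash-counter keystream stands in for
"a pseudo random number generator"; note this whitening adds no real security
on its own, it only randomizes the plaintext before the outer encryption):

    import os, hashlib

    SEED_LEN = 16

    def prng_stream(seed: bytes, n: int) -> bytes:
        out, i = b"", 0
        while len(out) < n:
            out += hashlib.sha256(seed + i.to_bytes(8, "big")).digest()
            i += 1
        return out[:n]

    def whiten(message: bytes) -> bytes:
        # Fresh random seed goes at the start; the rest is message XOR
        # keystream. The whole blob would then be encrypted normally.
        seed = os.urandom(SEED_LEN)
        ks = prng_stream(seed, len(message))
        return seed + bytes(m ^ k for m, k in zip(message, ks))

    def unwhiten(blob: bytes) -> bytes:
        seed, body = blob[:SEED_LEN], blob[SEED_LEN:]
        ks = prng_stream(seed, len(body))
        return bytes(b ^ k for b, k in zip(body, ks))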

------
Retric
When they say maximum entropy means the same number of zeros and ones, that's
not actually true. The obvious counterexample is a single-bit file, or even
just an odd number of bits. Instead, there is no bias in the file, so each bit
is just as likely to be 1 or 0, which prevents effective compression.

~~~
sp332
You need an equal number of 0's and 1's, and you can use compressibility to
see why. If there are more 1's than 0's, your compression tool could start by
assuming that all the bits are 1's, and only storing information about where
the 0's are. To minimize compressibility, you have to make sure there's no
bias toward certain patterns, even patterns of length 1.

~~~
dlubarov
Let's say the string contains 100 0s and 101 1s. How would you encode the
positions of the 0s without using more than 201 bits in total?

Nearly all random strings of sufficient length have different numbers of 0s
and 1s. So if your argument were true, your compression scheme would reduce
the average length of completely random strings, which is impossible.

~~~
Retric
Use 200 bit's to encode the message except the last bit which is determined if
the message had 100 1's or 101 1's. QED knowing the number of 1's buy's you at
least one bit of information.

Edit: You can do better than this by counting the number of 201 bit messages
with 100 0's which is well below 2^201 and then encoding which of those
corresponds to the original message. Aka if you had an 8 bit message with one
zero 11111101 you have a total of 8 options which means you can encode it as 3
bits.
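
In Python, for the curious:

    import math

    # If a 201-bit string is known to have exactly 100 zeros, you only need
    # enough bits to index into the set of such strings.
    print(math.ceil(math.log2(math.comb(201, 100))))  # 197 < 201

    # The 8-bit example above: one zero among 8 positions -> 3 bits.
    print(math.ceil(math.log2(math.comb(8, 1))))      # 3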

------
may
I'd be curious about cperciva's take on this.

------
plg
I bet non-compliance with best practices accounts for 10x the amount of "less
secure" than anything else. (Like in birth control)

