
Why should hard be secure enough? Information and non-invertibility - diogomonicapt
https://diogomonica.com/2017/02/03/why-should-hard-be-secure-enough-information-and-non-invertibility-2/
======
klodolph
The concepts are a bit sloppy here. "Zero information" is a bit of hand waving
which relies on the idea of having a uniform distribution on a set with
infinite measure, which is not mathematically sound nor is it useful as an
approximation.

All messages that you choose to transmit follow some probability distribution
with each message having finite, nonzero probability. This is what the author
gets wrong, on a very fundamental level. If this were not true, your messages
would require an infinite number of bits to transmit.

Studies of the usefulness of hash functions are based on comparisons to an
"ideal" hash function called a random oracle. The idea behind a random oracle
is that you can't reverse it, but you can try to remember what outputs it
gives for known inputs.
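
A minimal sketch of that bookkeeping in Python (hypothetical, just to make the definition concrete): the "remembering" is exactly a memoization table, and every unseen input gets a fresh uniformly random output.

```python
import os

class RandomOracle:
    """Idealized hash function: a fresh uniformly random output for every
    new input, memoized so that repeated queries stay consistent."""

    def __init__(self, out_len: int = 32):
        self.out_len = out_len
        self.table = {}

    def query(self, msg: bytes) -> bytes:
        if msg not in self.table:
            # Unseen input: draw a truly random, independent output.
            self.table[msg] = os.urandom(self.out_len)
        return self.table[msg]

oracle = RandomOracle()
assert oracle.query(b"hello") == oracle.query(b"hello")  # remembered
assert len(oracle.query(b"hello")) == 32                 # fixed length
```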

This "perfect hash function" has been extensively studied and still suffers
from all of the flaws that the author mentions in this article. And yet people
prove rather strong properties of cryptography systems under the random oracle
assumption. So the author's criticisms, fortunately, have no bearing or
relationship to real cryptography, because the good systems are already
designed to be immune to such attacks.

I'm on mobile so I can't really give a play-by-play, but the article is really
based on nothing more than bad math.

~~~
danbruc
I think the article mixes up two things, namely the hash function business and
uninformative priors.

Hash functions. A good hash function erases information about the actual
distribution of the inputs: the fact that the hashes are uniformly distributed
does not imply that the inputs are uniformly distributed, which in turn
invalidates any conclusion about the distribution of the inputs drawn after
applying such a transformation.

Uninformative priors. This seems to be more of a philosophical problem that I
don't really know much about. But as far as I can tell, if one tries to
quantify a lack of information in a naive way using probability distributions,
then one gains information (for example, that the value is uniformly
distributed over some range), and this information has consequences, like
specific distributions for derived values.

So the attempt to quantify a lack of information turns into a self-defeating
endeavor, not necessarily because of any inherent information but because of
the information injected during the modeling process.

~~~
klodolph
That is not a very precise way of describing the uniformity of hash functions,
and it's not actually true. For example, if my input has some element with 50%
probability, then the hashed output is going to have some element with at
least 50% probability.
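
To make that concrete, a small sketch with SHA-256 (the messages here are made up): if half of the inputs are the same message, half of the digests are equal, and any observer sees that immediately.

```python
import hashlib
from collections import Counter

# Half the traffic is one fixed message, half is unique messages.
messages = [b"attack at dawn"] * 500 + [b"msg-%d" % i for i in range(500)]

digests = [hashlib.sha256(m).hexdigest() for m in messages]
top_digest, count = Counter(digests).most_common(1)[0]

# The hash cannot hide the repetition: one digest occurs 50% of the time.
print(count / len(digests))  # → 0.5
```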

But this is well-known, and because it's well-known, no sane cryptographic
system cares that hash outputs leak information that way. For example, look at
PBKDF2, HMAC, or various asymmetric key authentication schemes.

~~~
danbruc
You are right, I did not word what I wanted to say correctly. I was not
thinking about repeated messages, only about subsets of all possible messages.
Say you are hashing HTML documents: then all your inputs will be in the
subspace having <!DOCTYPE html><html as first characters, but that information
will be erased by the hash function.

------
Animats
From the title, I thought that this was going to be about the lack of lower
bounds in cryptography. Factoring is generally thought to be hard, because
many smart people have spent many years trying to make it easier. But there's
no proof of that. Maybe somebody will make a breakthrough. In the early days
of asymmetric crypto, the knapsack function was believed to be hard. Then a
fast way to invert it was found.

As far as I know, there's no publicly known one-way cryptographic function for
which there's a provable minimum level of effort to invert.

On the hash function front, where the author seems concerned about something,
perhaps the lesson is that filling up hash tables to the 90% level before
expansion is pushing too hard.[1] At 70% fill, the hash function doesn't have
to be near-perfect, just not awful.
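
A rough simulation of that effect, assuming simple linear probing (the table size and key choices here are arbitrary):

```python
import random

def avg_probes(load: float, size: int = 10_000) -> float:
    """Average probes per successful lookup in a linear-probing table."""
    table = [None] * size
    keys = random.sample(range(10 * size), int(load * size))
    for k in keys:
        i = hash(k) % size
        while table[i] is not None:   # probe until a free slot
            i = (i + 1) % size
        table[i] = k
    total = 0
    for k in keys:
        i, probes = hash(k) % size, 1
        while table[i] != k:          # probe until the key is found
            i = (i + 1) % size
            probes += 1
        total += probes
    return total / len(keys)

random.seed(0)
print(avg_probes(0.70), avg_probes(0.90))
```

Classic analysis puts successful-search cost near (1 + 1/(1-a))/2 probes at load factor a: roughly 2 probes at 70% fill versus 5-6 at 90%.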

[1]
[http://accidentallyquadratic.tumblr.com/post/153545455987/rust-hash-iteration-reinsertion](http://accidentallyquadratic.tumblr.com/post/153545455987/rust-hash-iteration-reinsertion)

------
Twisol
If we have _any_ probability distribution, that tells us something about x. If
someone tells us that license plate numbers are uniformly distributed, we can
pretend to be a license plate maker by sampling from a uniform distribution,
and nobody else could tell the difference by looking at the license plate
numbers we make.

Zero information is more like not knowing what probability distribution a
variable comes from. Rather than hiding "what value does x take", we can hide
"what probability distribution does x come from". That's one level up. (Then
you could ask what the likelihood of x having a given probability distribution
is, and so on -- zero information is having _none_ of this information, all
the way up.)

~~~
klodolph
Uniform distribution gives minimum information, assuming the range is known.
Any other distribution will give more information. Talking about information
capacity of coded messages, here.

I think what you are saying when you use the phrase "zero information" is
really "zero knowledge".

~~~
Twisol
"Minimum information" is much more accurate than "zero information". A
necessary condition for a good cryptographic hash function F is that F(X) is a
uniform distribution if X is a uniform distribution. This is a property of F,
not of X, which is where the article is a bit muddy.

The article seems to want to say that "zero information" ought to be a
probability distribution such that for all functions f, f(X) is the same zero-
information distribution, i.e. from nothing we get nothing. The point is that
no such X exists, because every probability distribution encodes _some_ amount
of information. What we _want_ is some function F such that for every X in a
large class of probability distributions, F(X) is uniform. Which is exactly
why x^2 is a terrible hash function.
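
One concrete way to see how bad it is (a sketch): modulo a prime, squaring cannot even reach half of the output space, no matter what distribution you feed it.

```python
# Squaring modulo a prime p reaches only the (p+1)/2 quadratic residues,
# so about half of the output space is unreachable for any input.
p = 101
images = {x * x % p for x in range(p)}
print(len(images), p)  # → 51 101
```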

------
lisper
This article takes a very long time to make an elementary point: if x is a
uniformly distributed random variable, f(x) is in general not uniformly
distributed when f is non-linear.
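
A quick numerical sketch of that point, taking squaring as the non-linear f:

```python
import random

random.seed(1)
samples = [random.random() ** 2 for _ in range(100_000)]

# For x uniform on [0,1], P(x^2 < t) = sqrt(t), so mass piles up near 0.
low = sum(s < 0.1 for s in samples) / len(samples)   # sqrt(0.1) ~ 0.316
high = sum(s > 0.9 for s in samples) / len(samples)  # 1 - sqrt(0.9) ~ 0.051
print(low, high)
```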

~~~
doesnotexist
Which is visualized nicely in these 20-second videos.

Non-linear transform of a random variable:

[https://www.youtube.com/watch?v=hQjk4ClpuUk](https://www.youtube.com/watch?v=hQjk4ClpuUk)

Linear transform of a random variable:

[https://www.youtube.com/watch?v=cKo6-DnIxCg](https://www.youtube.com/watch?v=cKo6-DnIxCg)

------
geofft
A good hash function doesn't, and cannot, ensure that you have no information
about the input. It can't take information away from you.

A good hash function ensures that, _whatever_ information you have about the
input, you have _no more_ information upon seeing the hash. If you believe x
is equiprobable over some range, seeing H(x) should cause you to continue to
believe it's equiprobable over that range. If you believe x is the square of
some random variable y which is equiprobable in some range, seeing H(x) won't
convince you that x is equiprobable!

If your prior for x is equiprobable over the space of Microsoft Word documents
confessing to high treason and 0 probability elsewhere, seeing H(x) won't tell
you _which_ Word document it is, but it certainly won't cause you to believe
that it might be an MP3, either.

------
atemerev
The author treats the speed of light as an absolute, irrevocable certainty,
but how does he know that?

Does he know _why_ c is constant? Has he mastered general relativity
thoroughly enough to be 100% sure that it can't vary in time or space? Or that
our current understanding of physics is complete, and there absolutely can't
be something we don't know about the speed of light?

[https://en.m.wikipedia.org/wiki/Time-variation_of_fundamental_constants](https://en.m.wikipedia.org/wiki/Time-variation_of_fundamental_constants)
[https://en.m.wikipedia.org/wiki/Variable_speed_of_light](https://en.m.wikipedia.org/wiki/Variable_speed_of_light)

~~~
boomboomsubban
They know it because it's the best theory we currently have. Nothing is 100%
certain in physics; there are no laws, just the best available theory, which
you should treat as fact while knowing it may not be one.

~~~
atemerev
...which works exactly as well for hash functions.

------
amlozano
"...if we have zero information concerning the value of x, than we are also
clueless in what concerns the value of, say, x^2"

But that is not true: we have -some- information concerning the value of x,
namely that the value of x is equiprobable.

------
kutkloon7
There are some technical issues with the article, but the point the author
makes (hashes are imperfect since they carry some information, but it is
really hard, or even impossible, to do better) still stands.

------
kutkloon7
While I like the style of writing very much, I don't like the pseudo-
scientific take on the 'problem'.

In any probability class, you will learn that you can't have a uniform
distribution on an infinite set.

~~~
obastani
Nitpicking, but you can have a uniform distribution on the unit interval,
which is infinite :). What you cannot have is a uniform distribution on an
unbounded set.

Edit: As pointed out, the correct assumption is not unboundedness, but having
infinite measure (since then no normalization constant exists). I thought they
were equivalent, but a simple counterexample is [0,1] \cup Z.

~~~
klodolph
Mathematician here. You can't have a uniform distribution on a set with
infinite measure. Unbounded sets are fine as long as they have finite measure.

~~~
inimino
What is an unbounded set with finite measure?

~~~
klodolph
Other answers failed to give examples of a set with positive finite measure,
which is what you really need. What we want is a set that has no upper bound,
but isn't really weird or tricky—if it's too tricky, then the uniform
continuous distribution doesn't make any sense—and an easy way to do this is
take a union of a countable set of intervals.

For example, consider a set that contains (1,1+1/2), (2,2+1/4), (3,3+1/8), ...
etc., so interval i has measure 2^-i. It should be obvious that there is no
largest element, and if you sum the measures of all the intervals you will
find that the total measure is 1. A big chunk of "doing mathematics" is having
a library of techniques on tap for coming up with weird objects like this to
attack assumptions.
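
A quick numeric check of the construction (a sketch; the intervals here are (i, i + 2^-i) for i >= 1, matching the measures above):

```python
# Unbounded set with finite measure: the union of (i, i + 2**-i) for i >= 1.
# The lengths form the geometric series 1/2 + 1/4 + 1/8 + ... = 1.
lengths = [2 ** -i for i in range(1, 60)]
total = sum(lengths)
print(total)  # converges to 1, even though the set has no upper bound
```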

~~~
inimino
Thanks. That seems obvious now that you point it out. Just like Zeno's
paradoxes, which are also about stretching a finite measure over a countable
infinity.

------
dmd
I don't understand why the x^2 thing is interesting at all.

If I multiply x by 0, the distribution is now just 0.

If I take abs(x) now it is positive.

Are these confusing to anyone?

~~~
StavrosK
It's confusing to me why the author is confused by these. As you say, he's
literally saying "if I know nothing about x, then I should also know nothing
about 0x". Wtf? No.

~~~
geofft
I think the confusion is between the idea that a hash function conveys zero
information about the input, and the idea that you will have zero information
about the input after seeing the output of the hash.

But if we move it out of math, it's easier to understand: if you have some
guesses about x, and I tell you nothing, you have the same guesses about x.
You don't _stop_ having guesses.

------
caconym_
One random (heh) thing that occurs to me reading this article is that if you
are using hashing for security (e.g. password hashing, key derivation, ...)
then what the author is calling "invertibility", i.e. there are few values in
the input space that map to some value in the output space, is actually a good
thing. All other things being equal, the more such input values there are, the
more feasible collision/preimage attacks become. It seems clear that for these
purposes, the information content of the output should be at least equal to
that of the input.

------
zaptheimpaler
I think information cannot be defined based on the source material alone. Why
don't we define information like this?

An observer X has a prior estimate of the distribution d1 of variable x.

The source material yields a distribution d2 of variable x.

Then for that observer, the information gained from the source is something
like |d1-d2|.
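
Something close to this already exists: relative entropy (the Kullback-Leibler divergence) measures, in bits, the information gained when a prior d1 is updated to d2. A sketch (the distributions are made up; note that KL is not the absolute difference, and it is asymmetric):

```python
from math import log2

def kl_divergence(d2, d1):
    """Bits of information gained when prior d1 is updated to d2."""
    return sum(p * log2(p / q) for p, q in zip(d2, d1) if p > 0)

prior = [0.25, 0.25, 0.25, 0.25]      # observer's initial estimate d1
posterior = [0.70, 0.10, 0.10, 0.10]  # distribution d2 after the source

print(kl_divergence(posterior, prior))  # > 0: the source was informative
print(kl_divergence(prior, prior))      # → 0.0: nothing was learned
```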

~~~
Twisol
Yes, you can think of it like an information game where your function is a
black box that you can feed values and get back values. If you take lots of
samples from a uniform distribution and feed it to the black box, you can
estimate how the function maps one distribution to the other, even though you
might not know where all the individual values in the domain go. A good hash
function produces a uniform distribution of some variety no matter what
distribution you feed it. Bad hash functions deviate from uniform in some way
correlating with the input distribution, i.e. they leak information.

The ideal hash function leaks no information. In practice, we can only make
one leak very little over a long time.
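
A toy version of that game (a sketch; the "bad hash" here, the low byte of x^2, is purely illustrative): feed a biased input distribution, evens only, to both black boxes and compare the outputs.

```python
import hashlib

inputs = range(0, 2000, 2)  # biased input distribution: even numbers only

bad = {(x * x) & 0xFF for x in inputs}                        # low byte of x^2
good = {hashlib.sha256(str(x).encode()).digest()[0] for x in inputs}

# The bad box leaks the bias: even inputs reach only a handful of output
# bytes, while SHA-256 still covers nearly all 256 of them.
print(len(bad), len(good))
```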

------
drallison
diogomonicapt writes "Light does not travel at the speed of light (no pun
intended) because it is hard for it to do otherwise. No. It travels (and does
so at that particular speed), because it is impossible for light to do
otherwise."

Actually, the speed of light in a vacuum is an upper bound; the actual speed
of light can be substantially slower.
[https://en.wikipedia.org/wiki/Slow_light](https://en.wikipedia.org/wiki/Slow_light)

Some cosmological theories do not require the speed of light to be constant
and fixed, and replace that axiom with requirements on the relationship of the
speed of light with other physical parameters.

~~~
eridius
He didn't say "the speed of light in a vacuum", he just said "the speed of
light". And in the next sentence he said

> _… it is simply not possible for light to be still, or even propagate at a
> different speed in the same medium._

So he's already acknowledged the fact that the speed of light depends on the
medium. His point, though, is that light must still travel at this speed (even
though the speed itself depends on the medium).

~~~
danbruc
The speed of light in a medium is still the speed of light in vacuum if you
look carefully enough. The apparent slowdown is just an emergent effect due to
the interaction of photons and the particles making up the medium.

