I love zero knowledge proofs. My professor in college explained the idea in what I thought was an especially clear way:
Say that you and a friend are reading a Where's Waldo book, and you want to prove that you have found Waldo, but you don't want to tell your friend where Waldo is. This seems impossible. However, you could take a large piece of cardboard, cut out a Waldo-shaped hole, and place it over the book. Now you have proven that you can find Waldo.
"But wait!" cries your friend. "How do I know the book is even under there? Let me see." But you can't do that, since he knows roughly the spot Waldo would be if you lifted the cardboard.
So, you get a second piece of cardboard and put it on top of the first. Now, you play a game. You ask your friend whether you should lift one piece of cardboard (to verify the picture of Waldo) or two pieces (to verify that the book is beneath the second piece). You can play this game as many times as required for your partner to gain a reasonable confidence that you're not cheating.
There's a little bit more to it than that. Someone watching that transaction between you and your friend doesn't learn anything either, since your friend could be colluding with you and placing a printed out piece of paper with waldo on it when your friend would say "show me you know where waldo is" and the book when he would say "show me there's a book under there"
Correct me if I'm wrong but that example doesn't seem quite right. If you lift one piece of cardboard the person now sees roughly where Waldo is, right? Sure the cardboard could be HUGE to obscure the size and position of the book but then asking to lift both pieces will show you the position and size of the book making it immediately obvious where Waldo is.
Perhaps a good intuition pump is this: after every guess the person claiming to have found Waldo gets to position three things: the book, the waldo mask, and the covering cardboard.
This then means the subsequent trials don't reveal any side channels about Waldo's location in the book. (And of course in practice the "pieces of cardboard" are big enough to obscure all book information.)
1) It doesn't have to be THAT big. As long as the cardboard has twice the dimensions of the book, having the cutout be in the center ensures that you could reveal a Waldo anywhere on the page without letting the other person see any bit of the book to try to reason which section the book is under.
2) They only get to choose one piece of cardboard to remove per iteration of the game - they don't get to lift one piece and then immediately the other, they only lift one. That means that the person placing the cardboard can't cheat (because they don't know which piece they are going to lift, the one revealing the whole book or the one revealing the waldo), and the other person can't gain any information from the reveal. You can repeat the process many times (moving BOTH pieces of cardboard each time to ensure no gained information), but you can only do one cardboard lift per cycle.
No, that's wrong. The book always has to be under there, otherwise you aren't proving anything (or at least, the odds are worse).
The idea is that your counterparty cannot ever ask for both the book under there and to see Waldo, so they never know where the book is. You can spin the book around under the cardboard to put it in arbitrary locations, therefore just seeing the Waldo doesn't help.
Hmmm, so "knowledge" in this case means having access to the intersection between two sets of information that allows you to hone into the solution? That's an interesting way to think about knowledge (and confidence intervals).
I think that is an acceptable understanding. In a more formal setting, we talk about witnesses [1]. In effect, there is some piece of "knowledge" that will convince someone that you can solve something. With Waldo [2], that would be the actual location (x-y coordinates) of Waldo. If I want to convince you that I know where Waldo is, I can just give you the X-Y coordinates of his location. This has the unfortunate side effect of giving _you_ the ability to convince people you know this knowledge (you now know where Waldo is) - we have revealed the witness to you. In a general sense, a zero knowledge proof is a class of protocols where you convince another party that you know the witness of a problem without having to reveal it to them (they receive zero knowledge about the witness).
This has always been my favorite "intuitive" explanation of zero-knowledge proofs, but I never even knew the second part about two cardboards, which makes this even better!
I find this about basically every Wikipedia article on a math topic. They are very precise - but basically completely useless for learning anything about the topic.
I once edited a Wikipedia article on an Einstein paper to make it much more clear. It was subsequently reverted. I understand the need for brevity, but the domain experts at Wikipedia revert many seemingly irrelevant edits.
The edit I made was to explain how Einstein made the leap to E=mc^2 by including the step that showed how E=mv^2 for particles of a given velocity. Of course this is well known to every engineer and physicist, but the article on the proof was so hard to follow without it.
Note that this explanation is not fair. If Bob stops the protocol when he gets the answer (equal or not) then Alice doesn't learn anything.
In cryptography there are ways to make such a protocol "fair". Basically both learn "bit by bit" the answer, if Bob stops early, he only gets one bit of information more than Alice and if he can bruteforce the rest, so can Alice (except if Bob is the NSA).
Your reasoning behind downvoting is totally unfounded. Collisions may not be the primary reason for not using a hash function, but to discount the fact that it results in an unreliable equality operator is just ignorant.
This fact applies to any hash function even those you don't consider broken.
A hash function with a fixed output length and arbitrary input length necessarily implies that collisions exist. So in a very strict sense, you are correct that equality of hash output doesn't prove equality of input. But that's much too limiting for the real world.
As optimiz3 already pointed out, a critical property of crypto hash functions is extreme difficulty in finding collisions, let alone meaningful ones. So if a hash function passes muster cryptographically, then we can use it to practically assert equality of inputs.
In other words, it's so improbable that we treat it as if it's impossible. When that assumption is not safe to make anymore, that's when we upgrade to a new hash function.
I very strongly disagree with the premise of your thinking. The reason hash collisions do not matter in practice is due to the fact that cryptographically strong hash functions have preimage resistance, making it robust to attacks.
However it is an extremely reasonable point that a small probability of error in determining equality is much more of a concern. Imagine if your '==' operator only worked 99 times out of a hundred. Your emphasis on practicality is unfounded in this context.
You are dramatically undervaluing collision resistance.
> Imagine if your '==' operator only worked 99 times out of a hundred.
Imagine if the probability of your '==' operator failing was 2^(-128). I would be fine with that. And that's the same "gamble" made by any protocol that relies on the collision resistance of SHA-256. You would have to try 1 quadrillion per second for 10 quadrillion years before having a 50% chance of hitting a collision.
Your reasoning is incorrect due to the fact that the probability is simply for the average case.
Just because the chance of it happening is small, does not mean you will have to try 1 quadrillion per second for 10 quadrillion years before a collision occurs.
Granted collisions may not occur frequently, but a collision could occur at any time.
FYI, procedures like these would not pass the FAA code regulations for airlines.
Let's remember the context of this discussion. We are talking about hash functions in cryptographic protocols. No one is proposing that you use a hash function to test the equality of two values when you're not constrained to the parameters of some cryptographic problem. E.g. when the two values you want to check are known.
> "Granted collisions may not occur frequently, but a collision could occur at any time."
Your reasoning sounds something like this: "The risk of a collision is greater than zero, therefore you should worry about collisions."
Which is like saying: "The risk of being hit by a meteorite is greater than zero, therefore you should worry about meteorites."
The problem with such reasoning is that it ignores probability.
> "Just because the chance of it happening is small, does not mean you will have to try 1 quadrillion per second for 10 quadrillion years before a collision occurs."
You're right! It means that you would have to try 1 quadrillion per second for 10 quadrillion years before having a 50% chance of hitting a collision.
If you don't like that 50% number, then let's use 10^(-15); a probability of 1 in 1 quadrillion. According to the birthday problem [0], and using the same collision resistance (2^128) and hash rate (1 quadrillion per second) from my previous example, you would have to work for 475 million years before having a 10^(-15) probability of hitting a collision.
Accidental collisions simply are not a realistic problem to worry about. I don't believe that any such collisions are known to have ever occurred in the wild with modern crypto hash functions.
Denying collision resistance renders hash functions almost useless, even for protocols where pre-image resistance is the key property. Think about that. Even if you were confident that a hash input wasn't crafted to conduct a pre-image attack, but you were concerned about collisions, then you should mistrust the hash result simply because it may have been an accidental collision.
I'm sorry you disagree with my reason for downvoting your initial comment. But I did so (or tried to) because (1) I feel it was not a correct answer to the question, and (2) it contained misinformation by implying that the likelihood of collisions is much greater than it actually is.
Collisions are when two different inputs hash to the same value, which breaks the security of the hash function.
If a hash function is cryptographically secure, it has the property where finding collisions is infeasable, which makes it suitable for evaluating equality.
This is why cryptographic hash functions are used as a proxy for passwords in order to avoid storing the plaintext.
"Collisions are when two different inputs hash to the same value, which breaks the security of the hash function"
This is so wrong. Cryptographically secure hash functions merely claim that the probability of a collision is low. By your definition, it is impossible to find a secure hash function that works on an arbitrary input size.
"If a hash function is cryptographically secure, it has the property where finding collisions is infeasable, which makes it suitable for evaluating equality." This is so wrong.
It has a high probability of evaluating equality but in no way is it suitable. The primary benefit that cryptographic hash functions offer is that it is impractical to conduct a chosen plaintext attack.
"This is why cryptographic hash functions are used as a proxy for passwords in order to avoid storing the plaintext."
This is true, and is due to preimage resistance and the infeasibility of a chosen plaintext attack.
See Enigma [1] and homomorphic encryption for a related concept:
The key new utility Enigma brings to the table is the ability to run computations on data, without
having access to the raw data itself. For example, a group of people can provide access to their
salary, and together compute the average wage of the group. Each participant learns their relative
position in the group, but learns nothing about other members’ salaries. It should be made clear
that this is only a motivating example. In practice, any program can be securely evaluated while
maintaining the inputs a secret.
Note that this explanation is not fair. If Bob stops the protocol when he gets the answer (equal or not) then Alice doesn't learn anything.
In cryptography there are ways to make such a protocol "fair". Basically both learn "bit by bit" the answer, if Bob stops early, he only gets one bit of information more than Alice and if he can bruteforce the rest, so can Alice (except if Bob is the NSA).
no need if the protocol is "fair". He just gave a simplified example on how to solve the millionaire problem. There are many ways to solve it, some are unfair, some are fair.
For example you can use fully homomorphic encryption to trivially solve this problem.
Logjam works like rainbow tables, but for prime fields. Supposing that a way to reverse the function (a hash in rainbow table case, discrete logarithm in prime fields) in time X for a single case, you can instead pre-compute something in time P, which allows computing a specific instance in time Y. P + Y is longer or equal to X, and Y is significantly less than X. To compute n inversions takes nX for the first case, and P + nY for the second. However, the single instance case X still needs to be solvable. For a large enough prime field, X is still incredibly difficult, so P + Y is going to be incredibly difficult as well.
It's much more significant that Logjam changes the field to something weak, than all of the servers use the same prime field.
DarkNetMarket. I apologize for the late reply. I read HN frequently, rarely log in, and have yet to figure out how I can tell if someone has sub-commented me.
> Alice and Bob have secret values x and y, respectively. Alice and Bob wish to learn if x = y without allowing either party to learn anything else about the other's secret value.
This is better put as: without allowing either party, in the event that the equality is false, to learn even so much as whether x < y or x > y.
Of course if the equality is true, each party knows everything about the other's value.
Say that you and a friend are reading a Where's Waldo book, and you want to prove that you have found Waldo, but you don't want to tell your friend where Waldo is. This seems impossible. However, you could take a large piece of cardboard, cut out a Waldo-shaped hole, and place it over the book. Now you have proven that you can find Waldo.
"But wait!" cries your friend. "How do I know the book is even under there? Let me see." But you can't do that, since he knows roughly the spot Waldo would be if you lifted the cardboard.
So, you get a second piece of cardboard and put it on top of the first. Now, you play a game. You ask your friend whether you should lift one piece of cardboard (to verify the picture of Waldo) or two pieces (to verify that the book is beneath the second piece). You can play this game as many times as required for your partner to gain a reasonable confidence that you're not cheating.
And that's a zero knowledge proof.