Socialist millionaire protocol

BrandonY · on Nov 9, 2015

I love zero knowledge proofs. My professor in college explained the idea in what I thought was an especially clear way:

Say that you and a friend are reading a Where's Waldo book, and you want to prove that you have found Waldo, but you don't want to tell your friend where Waldo is. This seems impossible. However, you could take a large piece of cardboard, cut out a Waldo-shaped hole, and place it over the book. Now you have proven that you can find Waldo.

"But wait!" cries your friend. "How do I know the book is even under there? Let me see." But you can't do that, since he knows roughly the spot Waldo would be if you lifted the cardboard.

So, you get a second piece of cardboard and put it on top of the first. Now, you play a game. You ask your friend whether you should lift one piece of cardboard (to verify the picture of Waldo) or two pieces (to verify that the book is beneath the second piece). You can play this game as many times as required for your partner to gain a reasonable confidence that you're not cheating.

And that's a zero knowledge proof.

fryguy · on Nov 9, 2015

There's a little bit more to it than that. Someone watching that transaction between you and your friend doesn't learn anything either, since your friend could be colluding with you: you could place a printed-out picture of Waldo whenever your friend is going to say "show me you know where Waldo is", and the book whenever he is going to say "show me there's a book under there".

BinaryIdiot · on Nov 9, 2015

Correct me if I'm wrong but that example doesn't seem quite right. If you lift one piece of cardboard the person now sees roughly where Waldo is, right? Sure the cardboard could be HUGE to obscure the size and position of the book but then asking to lift both pieces will show you the position and size of the book making it immediately obvious where Waldo is.

gbhn · on Nov 9, 2015

Perhaps a good intuition pump is this: after every guess, the person claiming to have found Waldo gets to reposition three things: the book, the Waldo mask, and the covering cardboard.

This then means the subsequent trials don't reveal any side channels about Waldo's location in the book. (And of course in practice the "pieces of cardboard" are big enough to obscure all book information.)

cortesoft · on Nov 10, 2015

Couple of things:

1) It doesn't have to be THAT big. As long as the cardboard has twice the dimensions of the book, having the cutout be in the center ensures that you could reveal a Waldo anywhere on the page without letting the other person see any bit of the book to try to reason which section the book is under.

2) They only get to choose one piece of cardboard to remove per iteration of the game - they don't get to lift one piece and then immediately the other, they only lift one. That means that the person placing the cardboard can't cheat (because they don't know which piece they are going to lift, the one revealing the whole book or the one revealing the waldo), and the other person can't gain any information from the reveal. You can repeat the process many times (moving BOTH pieces of cardboard each time to ensure no gained information), but you can only do one cardboard lift per cycle.
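
To put a rough number on "many times": here is a minimal sketch, assuming a cheater who can only prepare for one of the two possible requests gets through any single round with probability 1/2, and that rounds are independent. The function names are made up for illustration.

```python
import math

def cheat_success_probability(rounds: int) -> float:
    """Chance that someone who has NOT found Waldo survives every round."""
    return 0.5 ** rounds

def rounds_needed(max_cheat_probability: float) -> int:
    """Smallest number of rounds that pushes a cheater's odds to the target or below."""
    return math.ceil(math.log2(1 / max_cheat_probability))

print(cheat_success_probability(10))  # odds of bluffing through 10 rounds
print(rounds_needed(1e-6))            # rounds for one-in-a-million confidence
```

Under that assumption the cheater's odds halve with every extra round, which is why a modest number of repetitions is enough.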

skj · on Nov 9, 2015

The idea is that some of the time, it's not the book, but just a random small picture of Waldo.

Since you don't know if it's the book under there or not, you don't know if this example is a "control".

The proportion of times that it is the book has to be the same as the odds that you could randomly guess where Waldo is, I think.

edanm · on Nov 10, 2015

No, that's wrong. The book always has to be under there, otherwise you aren't proving anything (or at least, the odds are worse).

The idea is that your counterparty cannot ever ask for both the book under there and to see Waldo, so they never know where the book is. You can spin the book around under the cardboard to put it in arbitrary locations, therefore just seeing the Waldo doesn't help.

caf · on Nov 9, 2015

The book gets to be repositioned somewhere else under the huge piece of cardboard between each trial.

BinaryIdiot · on Nov 10, 2015

Wasn't sure who to reply to but thanks for all the clarifications!

manoDev · on Nov 9, 2015

Hmmm, so "knowledge" in this case means having access to the intersection between two sets of information that allows you to home in on the solution? That's an interesting way to think about knowledge (and confidence intervals).

huntaub · on Nov 9, 2015

I think that is an acceptable understanding. In a more formal setting, we talk about witnesses [1]. In effect, there is some piece of "knowledge" that will convince someone that you can solve something. With Waldo [2], that would be the actual location (x-y coordinates) of Waldo. If I want to convince you that I know where Waldo is, I can just give you the X-Y coordinates of his location. This has the unfortunate side effect of giving _you_ the ability to convince people you know this knowledge (you now know where Waldo is) - we have revealed the witness to you. In a general sense, a zero knowledge proof is a class of protocols where you convince another party that you know the witness of a problem without having to reveal it to them (they receive zero knowledge about the witness).

[1] https://en.wikipedia.org/wiki/Witness_(mathematics)

[2] http://www.anagram.com/jcrap/Volume_0_1/crv0n1-3.pdf
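
As a concrete (non-Waldo) illustration of proving knowledge of a witness without handing it over, here is a minimal sketch of one round of Schnorr's identification protocol. That protocol is not mentioned in this thread; it is swapped in as a standard textbook example. The witness is a secret exponent x, and the public value is y = g^x mod p. The parameters below are toy values assumed for illustration and are far too small for real use.

```python
import secrets

# Toy parameters (assumed): g = 2 generates a subgroup of prime order Q = 11
# in the integers modulo P = 23.
P, Q, G = 23, 11, 2

def commit():
    r = secrets.randbelow(Q)      # prover's one-time random nonce, kept secret
    return r, pow(G, r, P)        # t = g^r is sent to the verifier

def respond(r, challenge, witness):
    # The nonce r masks the witness in the response.
    return (r + challenge * witness) % Q

def verify(public, t, challenge, s):
    # Accept if g^s == t * y^c (mod p), which holds when s = r + c*x.
    return pow(G, s, P) == (t * pow(public, challenge, P)) % P

witness = 7                       # the prover's secret x
public = pow(G, witness, P)       # y = g^x, known to everyone

r, t = commit()
challenge = secrets.randbelow(Q)  # verifier's random challenge
s = respond(r, challenge, witness)
accepted = verify(public, t, challenge, s)
```

The verifier only ever sees t, the challenge, and s, and never x itself.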

edanm · on Nov 10, 2015

Thank you!!

This has always been my favorite "intuitive" explanation of zero-knowledge proofs, but I never even knew the second part about two cardboards, which makes this even better!

icedchai · on Nov 10, 2015

This doesn't make much sense, sorry. Perhaps you could explain it better.

jxm262 · on Nov 9, 2015

Here's another simplistic explanation for this concept - http://twistedoakstudios.com/blog/Post3724_explain-it-like-i...

I've been reading a lot about cryptography (and a lot of bitcoin stuff); there's a lot of interesting material out there.

fredsted · on Nov 9, 2015

Thanks, the Wiki article is complete gibberish to me.

AMcQuarrie · on Nov 9, 2015

I find this about basically every Wikipedia article on a math topic. They are very precise - but basically completely useless for learning anything about the topic.

big_youth · on Nov 9, 2015

While not every wiki page has it enabled, I find that the Simple English version really helps my understanding. For example:

https://en.wikipedia.org/wiki/Probability_theory

https://simple.wikipedia.org/wiki/Probability_theory

https://en.wikipedia.org/wiki/Advanced_Encryption_Standard

https://simple.wikipedia.org/wiki/Advanced_Encryption_Standa...

3pt14159 · on Nov 10, 2015

I once edited a Wikipedia article on an Einstein paper to make it much more clear. It was subsequently reverted. I understand the need for brevity, but the domain experts at Wikipedia revert many seemingly irrelevant edits.

The edit I made was to explain how Einstein made the leap to E=mc^2 by including the step that showed how E=mv^2 for particles of a given velocity. Of course this is well known to every engineer and physicist, but the article on the proof was so hard to follow without it.

0xbadcafebee · on Nov 9, 2015

Sometimes there will be a Simple English translation for math articles. Here's an index of articles on Cryptography: https://simple.wikipedia.org/wiki/Category:Cryptography

baby · on Nov 10, 2015

Note that this explanation is not fair. If Bob stops the protocol when he gets the answer (equal or not) then Alice doesn't learn anything.

In cryptography there are ways to make such a protocol "fair". Basically, both learn the answer "bit by bit": if Bob stops early, he only gets one bit of information more than Alice, and if he can brute-force the rest, so can Alice (except if Bob is the NSA).

pnut · on Nov 9, 2015

[flagged]

griffinmahon · on Nov 9, 2015

How did you know what he meant then?

rix0r · on Nov 9, 2015

Can someone "explain like I'm five" why calculating a hash and comparing those isn't good enough?

Is it to protect against bruteforcing? What if the hash is made expensive to compute?

buzzdenver · on Nov 9, 2015

One of the requirements is to avoid a man-in-the-middle attack. If such an attacker sees the two hashes, then he'll know if the numbers match.

IshKebab · on Nov 9, 2015

Yeah, trivial to brute force.

baby · on Nov 10, 2015

No reason to downvote you, brute force is indeed one of the reasons using hashes is not an elegant solution.
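
A minimal sketch of the brute-force problem, assuming the secret is a net worth rounded to the nearest 100,000 and capped at 10 million (both assumptions are made up for illustration). Anyone who sees the hash can simply hash every plausible value and compare.

```python
import hashlib

def h(value: int) -> str:
    """Plain SHA-256 of the decimal representation of the value."""
    return hashlib.sha256(str(value).encode()).hexdigest()

def brute_force(observed_hash: str, candidates):
    """Return the first candidate whose hash matches, or None."""
    for candidate in candidates:
        if h(candidate) == observed_hash:
            return candidate
    return None

secret = 4_200_000                # the value one party wanted to keep private
observed = h(secret)              # what an eavesdropper (or the other party) sees

# Only 101 plausible values exist, so the search is instant.
recovered = brute_force(observed, range(0, 10_000_001, 100_000))
```

Making the hash expensive slows this down by a constant factor per guess, but it does not help much when the space of plausible secrets is this small.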

machinelearning · on Nov 9, 2015

The possibility of collisions renders this approach invalid. If the hashes match, there is a possibility that it is a false positive.

ademarre · on Nov 9, 2015

I accidentally upvoted your comment, but I meant to downvote. Sorry.

If fear of collisions is the reason you don't use a crypto hash function, then you are using a broken hash function.

A hash doesn't satisfy all the requirements of the problem, but it's not because of collisions.

machinelearning · on Nov 10, 2015

Your reasoning behind downvoting is totally unfounded. Collisions may not be the primary reason for not using a hash function, but to discount the fact that it results in an unreliable equality operator is just ignorant.

This fact applies to any hash function even those you don't consider broken.

alayne · on Nov 9, 2015

I don't understand. Aren't collisions why a hash of any kind isn't sufficient for proving equality of inputs?

ademarre · on Nov 9, 2015

A hash function with a fixed output length and arbitrary input length necessarily implies that collisions exist. So in a very strict sense, you are correct that equality of hash output doesn't prove equality of input. But that's much too limiting for the real world.

As optimiz3 already pointed out, a critical property of crypto hash functions is extreme difficulty in finding collisions, let alone meaningful ones. So if a hash function passes muster cryptographically, then we can use it to practically assert equality of inputs.

In other words, it's so improbable that we treat it as if it's impossible. When that assumption is not safe to make anymore, that's when we upgrade to a new hash function.
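
To make the pigeonhole point concrete, here is a small sketch that deliberately cripples SHA-256 by keeping only the first 2 bytes of its output (an assumption made purely so a collision shows up quickly). With only 65,536 possible outputs, a collision is guaranteed within 65,537 distinct inputs.

```python
import hashlib
from itertools import count

def truncated_sha256(data: bytes, n_bytes: int = 2) -> bytes:
    """SHA-256 cut down to its first n_bytes bytes (intentionally weak)."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_collision(n_bytes: int = 2):
    """Hash the strings "0", "1", "2", ... until two share a truncated digest."""
    seen = {}
    for i in count():
        message = str(i).encode()
        digest = truncated_sha256(message, n_bytes)
        if digest in seen:
            return seen[digest], message
        seen[digest] = message

first, second = find_collision()
```

The same search against the full 256-bit output is the one that is treated as practically impossible.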

machinelearning · on Nov 10, 2015

I very strongly disagree with the premise of your thinking. The reason hash collisions do not matter in practice is due to the fact that cryptographically strong hash functions have preimage resistance, making it robust to attacks. However it is an extremely reasonable point that a small probability of error in determining equality is much more of a concern. Imagine if your '==' operator only worked 99 times out of a hundred. Your emphasis on practicality is unfounded in this context.

ademarre · on Nov 11, 2015

You are dramatically undervaluing collision resistance.

> Imagine if your '==' operator only worked 99 times out of a hundred.

Imagine if the probability of your '==' operator failing was 2^(-128). I would be fine with that. And that's the same "gamble" made by any protocol that relies on the collision resistance of SHA-256. You would have to try 1 quadrillion per second for 10 quadrillion years before having a 50% chance of hitting a collision.

machinelearning · on Nov 17, 2015

Your reasoning is incorrect due to the fact that the probability is simply for the average case.

Just because the chance of it happening is small, does not mean you will have to try 1 quadrillion per second for 10 quadrillion years before a collision occurs.

Granted collisions may not occur frequently, but a collision could occur at any time.

FYI, procedures like these would not pass the FAA code regulations for airlines.

ademarre · on Nov 17, 2015

Let's remember the context of this discussion. We are talking about hash functions in cryptographic protocols. No one is proposing that you use a hash function to test the equality of two values when you're not constrained to the parameters of some cryptographic problem. E.g. when the two values you want to check are known.

> "Granted collisions may not occur frequently, but a collision could occur at any time."

Your reasoning sounds something like this: "The risk of a collision is greater than zero, therefore you should worry about collisions."

Which is like saying: "The risk of being hit by a meteorite is greater than zero, therefore you should worry about meteorites."

The problem with such reasoning is that it ignores probability.

> "Just because the chance of it happening is small, does not mean you will have to try 1 quadrillion per second for 10 quadrillion years before a collision occurs."

You're right! It means that you would have to try 1 quadrillion per second for 10 quadrillion years before having a 50% chance of hitting a collision.

If you don't like that 50% number, then let's use 10^(-15); a probability of 1 in 1 quadrillion. According to the birthday problem [0], and using the same collision resistance (2^128) and hash rate (1 quadrillion per second) from my previous example, you would have to work for 475 million years before having a 10^(-15) probability of hitting a collision.

Accidental collisions simply are not a realistic problem to worry about. I don't believe that any such collisions are known to have ever occurred in the wild with modern crypto hash functions.

Denying collision resistance renders hash functions almost useless, even for protocols where pre-image resistance is the key property. Think about that. Even if you were confident that a hash input wasn't crafted to conduct a pre-image attack, but you were concerned about collisions, then you should mistrust the hash result simply because it may have been an accidental collision.

I'm sorry you disagree with my reason for downvoting your initial comment. But I did so (or tried to) because (1) I feel it was not a correct answer to the question, and (2) it contained misinformation by implying that the likelihood of collisions is much greater than it actually is.

[0] https://en.wikipedia.org/wiki/Birthday_attack#Mathematics - Look at the table. My numbers were taken directly from the 256-bit row.
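
For anyone who wants to check the arithmetic, here is a rough sketch using the usual birthday approximation n ≈ sqrt(2 · 2^bits · ln(1 / (1 − p))). The hash rate of 1 quadrillion per second is the one from the comment above; the 365.25-day year and the function names are assumptions for illustration.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def hashes_for_collision_probability(p: float, bits: int = 256) -> float:
    """Approximate number of hashes before a collision occurs with probability p."""
    return math.sqrt(2 * 2**bits * -math.log1p(-p))

def years_of_work(p: float, hashes_per_second: float = 1e15, bits: int = 256) -> float:
    """Years needed at the given hash rate to reach collision probability p."""
    return hashes_for_collision_probability(p, bits) / hashes_per_second / SECONDS_PER_YEAR

years_half = years_of_work(0.5)    # time to reach a 50% chance
years_tiny = years_of_work(1e-15)  # time to reach a 1-in-a-quadrillion chance
```

With these assumptions the results land on the order of 10^16 years and a few hundred million years respectively, in line with the figures quoted above.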

optimiz3 · on Nov 9, 2015

Collisions are when two different inputs hash to the same value, which breaks the security of the hash function.

If a hash function is cryptographically secure, it has the property where finding collisions is infeasible, which makes it suitable for evaluating equality.

This is why cryptographic hash functions are used as a proxy for passwords in order to avoid storing the plaintext.

machinelearning · on Nov 10, 2015

"Collisions are when two different inputs hash to the same value, which breaks the security of the hash function" This is so wrong. Cryptographically secure hash functions merely claim that the probability of a collision is low. By your definition, it is impossible to find a secure hash function that works on an arbitrary input size.

"If a hash function is cryptographically secure, it has the property where finding collisions is infeasable, which makes it suitable for evaluating equality." This is so wrong.

It has a high probability of evaluating equality but in no way is it suitable. The primary benefit that cryptographic hash functions offer is that it is impractical to conduct a chosen plaintext attack.

"This is why cryptographic hash functions are used as a proxy for passwords in order to avoid storing the plaintext."

This is true, and is due to preimage resistance and the infeasibility of a chosen plaintext attack.

roymurdock · on Nov 9, 2015

See Enigma [1] and homomorphic encryption for a related concept:

The key new utility Enigma brings to the table is the ability to run computations on data, without having access to the raw data itself. For example, a group of people can provide access to their salary, and together compute the average wage of the group. Each participant learns their relative position in the group, but learns nothing about other members’ salaries. It should be made clear that this is only a motivating example. In practice, any program can be securely evaluated while maintaining the inputs a secret.

[1] http://enigma.media.mit.edu/enigma_full.pdf
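
A minimal sketch of the salary example using additive secret sharing. This is a standard textbook technique and is not claimed to be how Enigma itself works; the modulus, the names, and the sample salaries are assumptions for illustration. Each person splits their salary into random-looking shares that add up to it, gives one share to each participant, and only the per-participant totals are published.

```python
import secrets

MODULUS = 2**61 - 1  # any modulus comfortably larger than the true total works

def share(value: int, n_parties: int) -> list[int]:
    """Split value into n_parties shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(values: list[int]) -> int:
    n = len(values)
    all_shares = [share(v, n) for v in values]
    # Party j adds up the j-th share received from every participant.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS
                    for j in range(n)]
    # Only the partial sums are revealed; together they give the total.
    return sum(partial_sums) % MODULUS

salaries = [52_000, 71_500, 64_250]
average = secure_sum(salaries) / len(salaries)
```

Each individual share is uniformly random on its own, so seeing one share says nothing about the salary it came from.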

limelight · on Nov 9, 2015

That's a rather interesting choice of name for an encryption tool.

azernik · on Nov 9, 2015

I find that my interest in algorithms goes up the more outlandish their names get.

christianmann · on Nov 9, 2015

Math works the same way. Have you heard of the Ham Sandwich Theorem[1]?

[1]: https://en.wikipedia.org/wiki/Ham_sandwich_theorem

Rexxar · on Nov 9, 2015

"Hairy ball theorem" is cool too: https://en.wikipedia.org/wiki/Hairy_ball_theorem

Tinyyy · on Nov 10, 2015

Chicken Mcnugget Theorem: http://www.artofproblemsolving.com/wiki/index.php/Chicken_Mc...

barkingcat · on Nov 9, 2015

Is this the one where both millionaires pledge to donate 100% of their wealth to ending world hunger - and in the end seeing who's donated more?

mathgeek · on Nov 9, 2015

It works for comparing any arbitrary x and y, so long as equality and inequality can be determined.

et2o · on Nov 9, 2015

I love encountering a seemingly difficult problem that in fact has an easy solution like this.

emiliobumachar · on Nov 9, 2015

Easy to implement once designed. Who knows how much effort went towards designing it.

tomasien · on Nov 9, 2015

Is anyone familiar with how this differs from a zero knowledge proof, or is this a TYPE of ZKP?

baby · on Nov 10, 2015

In a ZKP, Alice proves something to Bob without telling him what it is. Alice doesn't learn anything.

In the Millionaire problem, both learn the result of the comparison.

tomasien · on Nov 10, 2015

Thanks!

zkhalique · on Nov 9, 2015

Are the millionaires able to trust each other's self-reporting?

This sounds a bit like mental poker.

hissworks · on Nov 9, 2015

From the article:

"Even if one of the parties is dishonest and deviates from the protocol, that person cannot learn anything more than if x = y."

swang · on Nov 9, 2015

No. The two parties trust each other. This is to secure the fact that Alice and Bob are talking to each other and only know that x=y

baby · on Nov 10, 2015

No need if the protocol is "fair". He just gave a simplified example of how to solve the millionaire problem. There are many ways to solve it; some are unfair, some are fair.

For example you can use fully homomorphic encryption to trivially solve this problem.

rwmj · on Nov 9, 2015

Is the use of a fixed choice of prime vulnerable to the Weak DH / Logjam attack? (https://weakdh.org/)

fryguy · on Nov 9, 2015

Logjam works like rainbow tables, but for prime fields. Suppose there is a way to reverse the function (a hash in the rainbow table case, a discrete logarithm in prime fields) in time X for a single case. You can instead pre-compute something in time P, which then allows computing a specific instance in time Y. P + Y is greater than or equal to X, and Y is significantly less than X. Computing n inversions takes nX in the first case, and P + nY in the second. However, the single-instance case X still needs to be solvable. For a large enough prime field, X is still incredibly difficult, so P + Y is going to be incredibly difficult as well.

It's much more significant that Logjam downgrades the field to something weak than that all of the servers use the same prime field.
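
The cost comparison can be sketched as follows. The numbers are arbitrary units invented for illustration, not measurements of any real attack; X, P and Y are the quantities described above.

```python
def naive_cost(n: int, x: int) -> int:
    """Solve each of n instances from scratch at cost X apiece."""
    return n * x

def precomputed_cost(n: int, p: int, y: int) -> int:
    """Pay P once for the shared prime, then Y per instance."""
    return p + n * y

def break_even_instances(x: int, p: int, y: int) -> int:
    """Smallest n for which precomputation is strictly cheaper than the naive approach."""
    if y >= x:
        raise ValueError("precomputation never pays off unless Y < X")
    return p // (x - y) + 1

# Hypothetical units: X = 1000, P = 5000, Y = 1.
n = break_even_instances(1000, 5000, 1)
```

Past the break-even point every additional connection using the same prime costs only Y, which is why a shared prime only matters once X itself is within reach.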

rwmurrayVT · on Nov 9, 2015

OTR messaging is commonly used with Pidgin and is considered a requirement for DNM conversations.

p4bl0 · on Nov 9, 2015

I have never seen "DNM" before. According to the urban dictionary, it means "deep and meaningful". Is that what you meant?

rwmurrayVT · on Nov 18, 2015

DarkNetMarket. I apologize for the late reply. I read HN frequently, rarely log in, and have yet to figure out how I can tell if someone has sub-commented me.

VLM · on Nov 9, 2015

wow, works just as well for dark net markets.

Prove you have a working zero day, for example.

kazinator · on Nov 9, 2015

> Alice and Bob have secret values x and y, respectively. Alice and Bob wish to learn if x = y without allowing either party to learn anything else about the other's secret value.

This is better put as: without allowing either party, in the event that the equality is false, to learn even so much as whether x < y or x > y.

Of course if the equality is true, each party knows everything about the other's value.

neogodless · on Nov 9, 2015

Well, it says "anything else" which covers exactly that. You don't need to be more explicit, because the entire goal is "learn if x = y"

kazinator · on Nov 9, 2015

If you learn that x = y, then there isn't any "anything else".