Zero Knowledge Proofs: An illustrated primer | 185 points by sohkamyung on Nov 28, 2014 | 30 comments

 One of the things that has frustrated me about existing ZKP primers, while educating new Bitcoin engineers on the tools in this space, is that they don't build any intuition for how you can prove something other than some very narrow (usually seemingly toy-like) problem. Even if the problem is NP-complete, the transformation from something you'd actually like to be able to prove to graph colouring or SAT or cycles is not obvious.

So, I described a novel ZKP for boolean circuits as a teaching tool: https://people.xiph.org/~greg/simple_verifyable_execution.tx...

It's explicitly not very efficient (since I eschewed all tools from asymmetric cryptography which could have been used to make it efficient), and it may well not be very secure (it's not peer reviewed), but the people I've shown it to have found it pretty useful for expanding their perspective and giving them some intuitive confidence that a zero-knowledge proof of the output of an arbitrary program on secret inputs is possible. Some people here may find it interesting for the same reasons.
 Can someone help clarify a point about the zero-knowledge part for me?

As far as I can tell, calculation is performed (which I would argue is equivalent to knowledge being created) in the process of first checking the value and then rewinding the program to a previous state. Consider the following program:

``````
import random

def search():
    while True:                         # rewinding time and trying again
        value = random.randint(1, 10)   # take a random number between 1 and 10
        if value == 2:                  # checking whether our "answer" is correct
            return "success"
``````

I would argue that this program calculates the number two, is functionally equivalent to checking a value, and is identical to the scheme outlined in this article. Isn't a calculation a creation of information? If the program were changed so that the if-condition read (value == 2 `times` 2), I would argue that the program had calculated 2 times 2 (albeit in a stupid way) and thus information had been created.

Can someone point out the flaws in my reasoning? (Obviously there are some; I'm just being too obtuse to see them at the moment.)

(EDITS: code syntax, and I was unable to use an asterisk for multiplication in the comment, so I put `times` in instead; hope it's clear.

EDIT #2: I just realised I said "information is created", which is obviously incorrect, as information is conserved. I still think calculation is being performed. I would argue it is the same as performing a random search on the search space, so the information is provided when we check the value.

EDIT #3: Expanding upon the previous point: assuming we control time (which we model as the scientist going back in time in the given example), in the Verifier's timeline it seems that no computation is occurring, but in the scientist's timeline (assuming the scientist must go back in time and pick a new set of random colorings) computation is occurring, and thus we break our tenet.)
 In his example, Google could colour each vertex a completely different colour. That way, when you revealed any random two, it would always look like they had done it correctly. I suppose to get round this you have Google tell you which three colours they are using in advance.
 That wasn't spelled out perfectly in the post. The answer is that the set of crayon colors (e.g., red, blue, purple) is fixed and known to all parties. The only thing Google gets to swap is the ordering of which of the three colors it uses to color in each vertex. If I pull off a hat and find an illegal color -- e.g., hot pink, no color at all -- then I know Google is cheating.
 This is a good time to point out that the solution to GOOG's rainbow-color cheating isn't to reveal 4 vertices instead of 2 and check that only 3 colors are seen. Each trial run would then leak exactly one pair (assuming the trial didn't indicate GOOG cheating, obviously), so you'd end up with a long enough list of known pairs to not need their services, and GOOG would never get paid. This is also why GOOG has to shuffle between each and every sample.

It's interesting to think about acceptable losses in zero knowledge... If GOOG had the computing power to solve a problem 1000x bigger than you can solve, they could leak down to maybe only 10x and it would still economically make sense for you to play fair and pay them.

Another fun mental exercise is to analyze the info leakage rate and truth belief rate of allowing GOOG to falsify 1%, 1/3, or 1/2 of all colors.
 Actually, it's not necessary to build a list of known pairs that you eventually link into a solution; if a challenge reveals at least three vertices you can identify each vertex unambiguously one by one. (That's because there are only three colors, so as long as you include the same two vertices v1, v2 in every challenge, you can classify every vertex into [color of v1], [color of v2], or [color 3] as soon as you see it.) A list of pairs only becomes necessary when there are more than three colors allowed.
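The classification argument above is easy to mechanize. A hedged Python sketch (the `classify` helper and the class labels are made up for illustration; it assumes, as the comment does, that v1 and v2 appear in every challenge):

```python
# Sketch: with only three colors, a verifier who sees v1 and v2 in every
# challenge can label each newly revealed vertex relative to them, even
# though the prover shuffles the actual color names between trials.
def classify(challenges):
    # challenges: list of dicts {vertex: color}, each containing "v1" and "v2"
    classes = {}
    for reveal in challenges:
        c1, c2 = reveal["v1"], reveal["v2"]
        for vertex, color in reveal.items():
            if color == c1:
                classes[vertex] = "same-as-v1"
            elif color == c2:
                classes[vertex] = "same-as-v2"
            else:
                classes[vertex] = "third-color"
    return classes
```

The per-trial color shuffle changes the color names but not a vertex's class relative to v1 and v2, which is why the full coloring (up to renaming) leaks out one vertex at a time.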
 This is explicit in the OP:> Google can now enter enter[sic], shuffle a collection of three crayons to pick a random assignment of the three agreed-upon crayon colors (red/blue/purple, as in the example above), and color in the graph in with their solution.
 Related, from two months ago
 In the example, the vertex colors for a graph edge are requested, but how can one trust that those colors aren't being generated on-the-fly? Effectively, how can I trust that the "warehouse" isn't cheating me?
 The OP glosses over this a bit, but it's mentioned in the footnotes: the "hats" represent a commitment scheme, which is a way for the prover to commit to a specific color assignment before revealing what that assignment actually is.

When the verifying party chooses the two vertices he wants to see, the prover reveals only those two, and the verifier actually has two things to check:

1) the colors revealed are actually two different colors (otherwise the purported solution obviously isn't correct), and

2) the colors revealed are consistent with the prover's overall commitment (in other words, that the prover didn't change his answer "on-the-fly").
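For concreteness, a minimal sketch of what those two checks might look like with a hash-based commitment (the `check_reveal` helper is hypothetical, and bare SHA-256 of salt plus color is illustrative rather than a vetted commitment scheme):

```python
import hashlib

# commitments: {vertex: hex digest} published by the prover before the challenge.
# reveal: {vertex: (color, salt)} openings for the two challenged vertices.
def check_reveal(commitments, u, v, reveal):
    (cu, _), (cv, _) = reveal[u], reveal[v]
    if cu == cv:
        return False  # check 1: the two endpoint colors must differ
    for vertex, (color, salt) in reveal.items():
        digest = hashlib.sha256(salt + color.encode()).hexdigest()
        if digest != commitments[vertex]:
            return False  # check 2: each opening must match the prior commitment
    return True
```

Because the digests were published before the verifier chose the edge, a prover who changes a color after the fact fails check 2.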
 Check out http://en.wikipedia.org/wiki/Shamir's_Secret_Sharing

Encrypt the answer, and maybe some salt, or if you're really brave, some known plaintext provided by the other guy (here be dragons). Give the pub key for every node to everyone, split each private key using SSSS or some other strategy, and give all the SSSS key-halves to the other guy; Google then gives up precisely two of their halves upon request, so you only get to reassemble the private keys for two nodes and thus only learn two nodes. There are some other schemes involving hashing that I feel a little fuzzier about.

Also look into "double spend" algos for digital cash. Go ahead, opfor, make my day: you double-spend a token and you get more data, but GOOG gets the transaction's Swiss bank account number, so the financial transaction auto-completes. That might take a little more fooling around at the protocol level to set up.
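A toy version of the Shamir split/reconstruct step looks like this (illustrative only: small-ish field, non-cryptographic randomness, and the `split`/`combine` names are made up):

```python
import random

P = 2**61 - 1  # a Mersenne prime; a real deployment picks parameters carefully

def split(secret, k, n):
    # Degree-(k-1) polynomial with the secret as the constant term.
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial mod P
            y = (y * x + c) % P
        shares.append((x, y))
    return shares

def combine(shares):
    # Lagrange interpolation at x = 0 recovers the constant term (the secret).
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % P
                den = (den * (xi - xj)) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret
```

Any k of the n shares reconstruct the secret, while fewer than k shares are consistent with every possible secret.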
 You physically can't put crayons under the hat on the fly.

Similarly, in the equivalent digital protocols, the prover commits to a value and must be able to prove that it didn't change it (for instance, it must provide the verifier a hash of the chosen color plus some random salt).
 > the result was something like checking which subgroup a point was in leaking log2(cofactor) bits about the secret. The normal way of avoiding this by multiplying all your values by the cofactor isn't readily available when you're doing some ugly trick of constructing points with no known discrete log from a hash.Ah, a partition attack. Could you fix this with more ugly by just iterating the hashing/munging process until your commitment was in a predetermined subgroup? I'm not at all familiar with EC subgroup theory, but the EC-SRP scheme proposed already does some deterministic munging to select the y-coordinate. Of course, even if you can do this, you're open to another timing side-channel.
 I don't see what the problem is, exactly. If you have a secure hash function taking some string into a point of elliptic curve E, multiplying said point by the cofactor is also a secure hash function into the relevant subgroup of E. See §6.1 of [1].
 Neat. It's also pointed out in Icart's paper [0] that the naive 'try and increment' algorithm, which I believe is utilised in Yongge Wang's draft EC-SRP proposal I linked up above, is vulnerable to timing attacks. Oops.

Interestingly, both Dragonfly and TLS-SRP describe a 'hunting and pecking' algorithm to paper over this problem. Perhaps Icart's solution would be more elegant.
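To make the timing leak concrete, here is a toy try-and-increment hash-to-point over a tiny secp256k1-shaped curve (parameters shrunk for illustration; nothing here is a real curve or a recommended construction):

```python
import hashlib

P = 103  # tiny prime with P % 4 == 3, toy field only
B = 7    # curve y^2 = x^3 + 7, a shrunken secp256k1 shape

def try_and_increment(msg):
    # The number of loop iterations depends on the input: that variable
    # iteration count is the timing side channel discussed above.
    x = int.from_bytes(hashlib.sha256(msg).digest(), "big") % P
    tries = 0
    while True:
        tries += 1
        rhs = (x * x * x + B) % P
        if pow(rhs, (P - 1) // 2, P) <= 1:  # rhs is 0 or a quadratic residue
            y = pow(rhs, (P + 1) // 4, P)   # square root, valid since P % 4 == 3
            return (x, y), tries
        x = (x + 1) % P
```

Roughly half of all x-coordinates lie on the curve, so the loop runs a geometric number of times, and an attacker timing the operation learns something about the (possibly secret-derived) input.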
 Hunting-and-pecking is essentially the same algorithm as try-and-increment, where instead of incrementing an x-coordinate on failure one picks another x (pseudo-)randomly. Therefore both are susceptible to the same kind of timing attacks. Dragonfly went to some pains to try to make the process side-channel-free, but it looks ugly as sin, and it's still unclear whether it is actually secure (cf [1]).
 Thanks for that. The tone of Dan Harkins' reply is fun. There's clearly a lot of ego in crypto ;) Not sure I see the point in balanced PAKE myself. If you turn weak passwords into preshared keys, then why not just use strong preshared keys to begin with and trust in the bits? Worrying about parameter binding to the verifier seems irrelevant to EC as well.
 I'm probably not the best person to argue in favor of one flavor or another of PAKE, but balanced schemes give you forward security whereas a naive preshared-key scheme does not.
 > If you have a secure hash function taking some string into a point of elliptic curve E, multiplying said point by the cofactor is also a secure hash function into the relevant subgroup of E.Fair enough. Though this requires an additional multiplication (at least it's a cheap one). I'm pretty sure some protocols have missed this.
 I don't quite follow this description:> Ultimately what we get is the following theorem. If there exists any Verifier computer program that successfully extracts information by interactively running this protocol with some Prover, then we can simply use the rewinding trick on that program to commit to a random solution, then 'trick' the Verifier by rewinding its execution whenever we can't answer its challenge correctly. The same logic holds as we gave above: if such a Verifier succeeds in extracting information after running the real protocol, then it should be able to extract the same amount of information from the simulated, rewinding-based protocol. But since there's no information going into the simulated protocol, there's no information to extract. Thus the information the Verifier can extract must always be zero.This seems to identify a way of showing that some protocol is zero-knowledge. But, as I read the paragraph, the requirements aren't strict enough.Imagine a closely-related protocol, where I get to challenge two edges at once and the four (or three, I guess) incident vertices are revealed. Obviously, I can successfully extract the full 3-coloring through repeated runs of this protocol (revealing four vertices when there are only three legal colors means I can identify which vertices are the same color as which other vertices; given a bounded number of challenges, I can get them all). Also obviously, you can use a time machine to pass any particular challenge in this protocol almost as easily as you can use one to pass the less leaky one-edge challenges.But I don't see what's making the difference, or how the "theorem" helps me distinguish the protocols. 
The OP identifies three properties that a zero-knowledge protocol must have: 1, if the prover has a solution, I become confident that it's real; 2, if the prover doesn't have a solution, I become confident that they don't; and 3, zero-knowledgeness.

So, if I'm an honest verifier with no memory (except for previous challenge pass/fail results), the two-edge challenge protocol has the first two properties, just like the one-edge protocol: I'll quickly gain confidence that a genuine solution is real, and I'll quickly discover that a sham solution is false. If there's a time machine involved, I'll quickly become convinced that the prover has a solution even though they don't, because it's easy to fake three or four vertices being properly colored.

If I'm a nefarious verifier with a memory, I can use the extra information that leaks from the two-edge protocol to recover the full solution if one is given, and I can also use it to eventually discover that a nefarious prover is using a time machine to lie to me (when my own illicit copy of the graph-coloring hits a contradiction). The two-edge protocol doesn't fail the completeness or soundness tests. So it must fail the zero-knowledgeness test. How can I tell that? Does the paragraph I quoted indicate how the two-edge protocol fails zero-knowledgeness?
 I agree, the quoted paragraph is not clear. It seems to me that the Verifier could not be tricked even using the rewinding trick if it was allowed to use its memory (internal copy of the graph-coloring).
 Well, in the protocol described in the OP, the verifier can't build an internal copy of the graph-coloring, because knowing that two adjacent vertices in the graph are two different colors doesn't help anything; that much is required of any solution. In the protocol I describe, the verifier can do that, because revealing more than two vertices simultaneously leaks information that isn't necessarily true of all possible solutions (at least, it can; if three vertices which connect in a triangle turn out to be three different colors, you haven't learned anything, because any valid solution will have that property).

The problem I'm having is that I agree that the one-edge protocol is zero-knowledge (and that a verifier could be tricked indefinitely by a prover with a time machine), and obviously the two-edge protocol isn't zero-knowledge (and, in probably-related-but-I'm-not-sure news, a verifier using the two-edge protocol cannot be tricked indefinitely by a time machine). I can tell that, but I tell it by looking within my soul and thinking "yes, there is no information leak when one edge is revealed, but there is a leak when two edges are revealed". That's not a reliable method; I'd like to know how to prove that information does not leak through a protocol, and I don't see that anywhere in the OP.
 Roughly speaking, the question here is: can you simulate a multi-edge protocol? That is, can you achieve the following properties using rewinding:

1. The 'fake Prover' simulating the protocol has no knowledge of an actual graph coloring.

2. From the Verifier's PoV, the distribution of the simulated (rewinding-based, fake) protocol transcript is identical to that of a real protocol run where the Prover really does have a solution.

3. The simulation still requires only time polynomial in the problem instance (when you include all the rewinding trips).

I think if you follow this through, you'll find that simulating a multi-edge protocol that 'looks like' a real transcript is highly challenging.

The reason for condition (3) is to rule out an obvious hack wherein the simulation takes so long that the 'fake Prover' can simply solve the problem itself (e.g., by finding a graph coloring in an exponential number of time steps).
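For the one-edge protocol, the rewinding simulator can be sketched directly (the `simulate_transcript` helper is hypothetical, and rewinding is modeled as a plain retry loop):

```python
import random

def simulate_transcript(edges, n_trials, colors=("red", "blue", "purple")):
    # Fake prover with no real coloring: commit to uniformly random colors,
    # then "rewind" (retry the trial) whenever the challenged edge collides.
    transcript = []
    for _ in range(n_trials):
        while True:  # each retry models one trip of the time machine
            assignment = {v: random.choice(colors) for e in edges for v in e}
            u, v = random.choice(edges)  # the verifier's challenge
            if assignment[u] != assignment[v]:
                transcript.append(((u, v), assignment[u], assignment[v]))
                break
    return transcript
```

Each attempt survives with probability 2/3, so the expected number of rewinds per trial is constant and the whole simulation stays polynomial (condition 3), and each kept answer is a uniformly random ordered pair of distinct colors, just like a real run with a fresh color shuffle (condition 2). The multi-edge analogue is where this breaks down, because the verifier can correlate reveals across trials.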
 As I see it, if the transcript consists of a record like the following:

``````
Trial 1: PASS
Trial 2: PASS
Trial 3: PASS
Trial 4: PASS
``````

etc. etc., then simulating a two-edge protocol is, while more time-consuming than simulating a one-edge protocol, still pretty easy. If the record looks like

``````
Trial 1: PASS (v1 red -- v2 green, v3 green -- v4 blue)
Trial 2: PASS (v1 blue -- v2 red, v7 blue -- v9 red)
Trial 3: PASS (v1 green -- v2 red, v7 green -- v4 red)
``````

then after Trial 3 you can tell that the prover is cheating. The fake trials seem to achieve your properties (1) and (3). I guess they violate #2? What is the "distribution" of a simulated transcript?
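One way to mechanize the cheating detection: relative to the always-revealed v1/v2 pair, vertex classes in an honest transcript are stable across trials, and a checker can spot the first contradiction. A hedged sketch (the `find_cheat` helper is hypothetical; it assumes v1 and v2 appear in every trial, as in the record above):

```python
def find_cheat(trials):
    # trials: list of dicts {vertex: color}, each containing "v1" and "v2".
    # Classify each vertex relative to v1/v2 (invariant under the prover's
    # per-trial color shuffle) and report the first trial that contradicts
    # an earlier classification.
    seen = {}
    for n, reveal in enumerate(trials, start=1):
        c1, c2 = reveal["v1"], reveal["v2"]
        for vertex, color in reveal.items():
            cls = 0 if color == c1 else 1 if color == c2 else 2
            if seen.setdefault(vertex, cls) != cls:
                return n  # no single coloring explains the transcript so far
    return None
```

On the record above, v4 has the third color (neither v1's nor v2's) in Trial 1 but matches v2's color in Trial 3, so the checker flags Trial 3.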
 Sorry, my reply wasn't clear; I understood the question you were asking in your original comment. I was actually referring to your 2-edge protocol when I wrote that it seems the Verifier could not be tricked even in the simulated protocol (since it could detect inconsistencies, as you mentioned).

I think the theorem says that if your Prover can successfully trick the Verifier in the simulated protocol (i.e., using the rewinding trick), then your protocol is zero-knowledge. But I'm really not sure... Let me know if you find a more satisfying answer.
