This paper is difficult to follow. The pseudocode in particular contains errors. For example, the insert function looks at left.key and right.key without regard to whether left and right, respectively, are leaves and thus have keys. And insert calls itself recursively with the wrong number of arguments.
It seems to me that the interesting part of the paper is the claim of history independence. If I understand correctly, the basic idea is that, for a given set of keys and values, there exists only one possible tree. If that’s the intent, there should be a proof and probably a description of what the tree is for a given set of keys. If that’s not the intent, then a proof that the membership algorithm actually works is needed.
Author here, thanks for reading through the paper.
I think the confusion is because I wrote the pseudocode in a functional programming style so there are a few keywords being used which are causing confusion. ( Which i mentioned in the paper. )
The insert function uses the left.key and right.key to calculate the minimum distance for the key to be inserted.
And there is a cond statement which will execute only of the conditions.
And the pseudocode I wrote was in a functional programming style so the arguments in both the functions are same, just the name of the last argument is different, i.e root and leaf as I wanted to imply pattern matching happens.
The interesting part of the paper is that it outlines a new way to create a Sparse merkle tree, which inherently becomes ordered due to the insertion algorithm used. This ordered property is exploited to find the non-membership proof in a new way which does not require empty hashes and is compact. History independence also follows from the insertion algorithm used.
I'll be releasing an implementation for this paper soon so the proofs can be checked empirically as well.
There is another method to support compact membership proofs and have history-independence. Ethereum uses a Patricia-Merkle tree structure to store its state snapshots. I'm not sure if it supports compact non-membership proofs though.
There are certain implementations out there which support non-membership proofs but most of them use empty hashes to find the non-membership proofs and the underlying SMT is unordered.
The approach taken in the above paper is novel in the sense that the way in which values in SMT are inserted make it ordered and this property is exploited to find the non-membership proofs through a window based method.
To prove a certain value Y doesn't exist in the tree, the closest two values X and Z which bound Y form the non-membership proof.
This method avoids using empty hashes and the repetitive computation which comes with it.
And Merkle Patricia Trie currently does not support Non-membership proofs.
Interesting ideas. I understand the high-level idea of using X and Z to bound Y.
Questions: Can you clarify what it means to use empty hashes for non-membership proofs? / For the Merkle-Patricia Tree structure, if the branches are ordered, wouldn't it be possible to use bounding for non-membership proofs?
Oof, it takes several seconds to load on Android. I think it's particularly heavy JS rendering the page and the PDF. I do recall iOS JS being particularly bad under certain loads.
CS papers mostly try to explain an idea or an experimental result, describe why it is important, how it relates to existing approaches, what situations it may be useful for, what the drawbacks are, how those might be addressed in future work, etc. Pretty much none of those things are code.
When some algorithm, metric, etc. does need to be presented, it's important to make it as understandable as possible to the reader. This has a few consequences:
- Only the core parts are shown, i.e. the new bits. All of the surrounding infrastructure that would inevitably be required (I/O, error handling, etc.) is omitted, because it's routine stuff for basically any programmer, would distract from understanding the main idea, and probably wouldn't be any good if included (since that's not necessarily what the researchers are good at).
- Programming snippets require a programming language, and these go through fads and fashions over time. I've read papers from many decades ago, with code snippets in MACLISP, Smalltalk72, Algol60, etc. and the ones which remain understandable are those which either limit themselves to a tiny, commonly understood subset of the language (e.g. variables + if/else + loops); or those which have an accompanying description of the language features in English. If we're giving English descriptions anyway, we might as well leave out lots of actual code (especially due to page limits in journals/conference proceedings).
- Since the point of code snippets is to get across the idea, and the notation might also require an accompanying description, it may be better to use a notation specialised to getting across our idea. Usually this is some form of mathematical notation (e.g. superscripts, subscripts, various forms of brackets, bold face, etc.), with an accompanying description of what these things represent.
Perhaps you missed it, in Section 5 I've mentioned that the hash function SHA256 behaves as an ideal hash function, an assumption in cryptography called Random Oracle Model.
By virtue of this model, the proofs for Structure, Space and Max-Proof of SMT are implied.
I'm also working on an implementation for the concepts mentioned in the paper so the proofs can also be verified emperically.
It's not the reader's job to fill in implied proofs. It's your job as the author to give them enough information to check your argument. In some cases "this follows from results A, B, C" is sufficient, but it's often not.
It seems to me that the interesting part of the paper is the claim of history independence. If I understand correctly, the basic idea is that, for a given set of keys and values, there exists only one possible tree. If that’s the intent, there should be a proof and probably a description of what the tree is for a given set of keys. If that’s not the intent, then a proof that the membership algorithm actually works is needed.