Compact Sparse Merkle Trees – Efficient Non-Membership Proofs

amluto · on Oct 9, 2018

This paper is difficult to follow. The pseudocode in particular contains errors. For example, the insert function looks at left.key and right.key without regard to whether left and right, respectively, are leaves and thus have keys. And insert calls itself recursively with the wrong number of arguments.

It seems to me that the interesting part of the paper is the claim of history independence. If I understand correctly, the basic idea is that, for a given set of keys and values, there exists only one possible tree. If that’s the intent, there should be a proof and probably a description of what the tree is for a given set of keys. If that’s not the intent, then a proof that the membership algorithm actually works is needed.

farazhaider · on Oct 9, 2018

Author here, thanks for reading through the paper.

I think the confusion is because I wrote the pseudocode in a functional programming style so there are a few keywords being used which are causing confusion. ( Which i mentioned in the paper. )

The insert function uses the left.key and right.key to calculate the minimum distance for the key to be inserted. And there is a cond statement which will execute only of the conditions.

And the pseudocode I wrote was in a functional programming style so the arguments in both the functions are same, just the name of the last argument is different, i.e root and leaf as I wanted to imply pattern matching happens.

The interesting part of the paper is that it outlines a new way to create a Sparse merkle tree, which inherently becomes ordered due to the insertion algorithm used. This ordered property is exploited to find the non-membership proof in a new way which does not require empty hashes and is compact. History independence also follows from the insertion algorithm used.

I'll be releasing an implementation for this paper soon so the proofs can be checked empirically as well.

farazhaider · on Oct 10, 2018

I have released a framework which has a C-SMT module. https://github.com/ZanjeerPlatform/bargad

nayuki · on Oct 8, 2018

There is another method to support compact membership proofs and have history-independence. Ethereum uses a Patricia-Merkle tree structure to store its state snapshots. I'm not sure if it supports compact non-membership proofs though.

farazhaider · on Oct 8, 2018

Author here.

There are certain implementations out there which support non-membership proofs but most of them use empty hashes to find the non-membership proofs and the underlying SMT is unordered.

The approach taken in the above paper is novel in the sense that the way in which values in SMT are inserted make it ordered and this property is exploited to find the non-membership proofs through a window based method. To prove a certain value Y doesn't exist in the tree, the closest two values X and Z which bound Y form the non-membership proof.

This method avoids using empty hashes and the repetitive computation which comes with it.

And Merkle Patricia Trie currently does not support Non-membership proofs.

nayuki · on Oct 10, 2018

Interesting ideas. I understand the high-level idea of using X and Z to bound Y.

Questions: Can you clarify what it means to use empty hashes for non-membership proofs? / For the Merkle-Patricia Tree structure, if the branches are ordered, wouldn't it be possible to use bounding for non-membership proofs?

farazhaider · on Oct 8, 2018

Hey guys, author here. Seems like the OSF link is failing for some people. I have hosted the paper on github as well now. https://github.com/farazhaider/CSMT/raw/master/CSMT.pdf

mbroncano · on Oct 8, 2018

On Safari/iOS I only get a blank page

eximius · on Oct 8, 2018

Oof, it takes several seconds to load on Android. I think it's particularly heavy JS rendering the page and the PDF. I do recall iOS JS being particularly bad under certain loads.

farazhaider · on Oct 8, 2018

You can download the article directly from this link https://osf.io/8mcnh/download

The reason it's slow is because the OSF site loads the pdf in-browser.

rjzzleep · on Oct 8, 2018

Serious question, can someone explain to me why these things are latex documents with a little bit of code?

chriswarbo · on Oct 8, 2018

CS papers mostly try to explain an idea or an experimental result, describe why it is important, how it relates to existing approaches, what situations it may be useful for, what the drawbacks are, how those might be addressed in future work, etc. Pretty much none of those things are code.

When some algorithm, metric, etc. does need to be presented, it's important to make it as understandable as possible to the reader. This has a few consequences:

- Only the core parts are shown, i.e. the new bits. All of the surrounding infrastructure that would inevitably be required (I/O, error handling, etc.) is omitted, because it's routine stuff for basically any programmer, would distract from understanding the main idea, and probably wouldn't be any good if included (since that's not necessarily what the researchers are good at).

- Programming snippets require a programming language, and these go through fads and fashions over time. I've read papers from many decades ago, with code snippets in MACLISP, Smalltalk72, Algol60, etc. and the ones which remain understandable are those which either limit themselves to a tiny, commonly understood subset of the language (e.g. variables + if/else + loops); or those which have an accompanying description of the language features in English. If we're giving English descriptions anyway, we might as well leave out lots of actual code (especially due to page limits in journals/conference proceedings).

- Since the point of code snippets is to get across the idea, and the notation might also require an accompanying description, it may be better to use a notation specialised to getting across our idea. Usually this is some form of mathematical notation (e.g. superscripts, subscripts, various forms of brackets, bold face, etc.), with an accompanying description of what these things represent.

farazhaider · on Oct 8, 2018

Thanks for explaining it better than I could have. The implementation is also in the works, it'll be released soon.

sitkack · on Oct 9, 2018

Many journals also have a 10 page maximum. This is why one occasionally sees multiple versions of a paper.

farazhaider · on Oct 8, 2018

It's a technical paper which describes the theory behind the data structure/algorithm containing proofs which people can verify independently.

I'll be releasing a project soon which incorporates the concepts mentioned in the framework.

pmiller2 · on Oct 8, 2018

It’s more of a paper describing the concept than an actual, production implementation.

joker3 · on Oct 8, 2018

This is interesting work, but the claims in section 5 need to be proved.

farazhaider · on Oct 8, 2018

Perhaps you missed it, in Section 5 I've mentioned that the hash function SHA256 behaves as an ideal hash function, an assumption in cryptography called Random Oracle Model.

By virtue of this model, the proofs for Structure, Space and Max-Proof of SMT are implied.

I'm also working on an implementation for the concepts mentioned in the paper so the proofs can also be verified emperically.

joker3 · on Oct 8, 2018

It's not the reader's job to fill in implied proofs. It's your job as the author to give them enough information to check your argument. In some cases "this follows from results A, B, C" is sufficient, but it's often not.

kfogel · on Oct 8, 2018

Two-dimensional Bloom Filter?

gtycomb · on Oct 8, 2018

the 'Download preprint' keeps failing (Firefox, Fedora Linux)

farazhaider · on Oct 8, 2018

Use this link: https://github.com/farazhaider/CSMT/raw/master/CSMT.pdf