
Compact Sparse Merkle Trees – Efficient Non-Membership Proofs - farazhaider
https://osf.io/8mcnh/
======
amluto
This paper is difficult to follow. The pseudocode in particular contains
errors. For example, the insert function looks at left.key and right.key
without regard to whether left and right, respectively, are leaves and thus
_have_ keys. And insert calls itself recursively with the wrong number of
arguments.

It seems to me that the interesting part of the paper is the claim of history
independence. If I understand correctly, the basic idea is that, for a given
set of keys and values, there exists only one possible tree. If that’s the
intent, there should be a proof and probably a description of _what_ the tree
is for a given set of keys. If that’s not the intent, then a proof that the
membership algorithm actually works is needed.

~~~
farazhaider
Author here, thanks for reading through the paper.

I think the confusion is because I wrote the pseudocode in a functional
programming style so there are a few keywords being used which are causing
confusion. ( Which i mentioned in the paper. )

The insert function uses the left.key and right.key to calculate the minimum
distance for the key to be inserted. And there is a cond statement which will
execute only of the conditions.

And the pseudocode I wrote was in a functional programming style so the
arguments in both the functions are same, just the name of the last argument
is different, i.e root and leaf as I wanted to imply pattern matching happens.

The interesting part of the paper is that it outlines a new way to create a
Sparse merkle tree, which inherently becomes ordered due to the insertion
algorithm used. This ordered property is exploited to find the non-membership
proof in a new way which does not require empty hashes and is compact. History
independence also follows from the insertion algorithm used.

I'll be releasing an implementation for this paper soon so the proofs can be
checked empirically as well.

------
farazhaider
I have released a framework which has a C-SMT module.
[https://github.com/ZanjeerPlatform/bargad](https://github.com/ZanjeerPlatform/bargad)

------
nayuki
There is another method to support compact membership proofs and have history-
independence. Ethereum uses a Patricia-Merkle tree structure to store its
state snapshots. I'm not sure if it supports compact non-membership proofs
though.

~~~
farazhaider
Author here.

There are certain implementations out there which support non-membership
proofs but most of them use empty hashes to find the non-membership proofs and
the underlying SMT is unordered.

The approach taken in the above paper is novel in the sense that the way in
which values in SMT are inserted make it ordered and this property is
exploited to find the non-membership proofs through a window based method. To
prove a certain value Y doesn't exist in the tree, the closest two values X
and Z which bound Y form the non-membership proof.

This method avoids using empty hashes and the repetitive computation which
comes with it.

And Merkle Patricia Trie currently does not support Non-membership proofs.

~~~
nayuki
Interesting ideas. I understand the high-level idea of using X and Z to bound
Y.

Questions: Can you clarify what it means to use empty hashes for non-
membership proofs? / For the Merkle-Patricia Tree structure, if the branches
are ordered, wouldn't it be possible to use bounding for non-membership
proofs?

------
farazhaider
Hey guys, author here. Seems like the OSF link is failing for some people. I
have hosted the paper on github as well now.
[https://github.com/farazhaider/CSMT/raw/master/CSMT.pdf](https://github.com/farazhaider/CSMT/raw/master/CSMT.pdf)

------
mbroncano
On Safari/iOS I only get a blank page

~~~
eximius
Oof, it takes several seconds to load on Android. I think it's particularly
heavy JS rendering the page and the PDF. I do recall iOS JS being particularly
bad under certain loads.

------
rjzzleep
Serious question, can someone explain to me why these things are latex
documents with a little bit of code?

~~~
chriswarbo
CS papers mostly try to explain an idea or an experimental result, describe
why it is important, how it relates to existing approaches, what situations it
may be useful for, what the drawbacks are, how those might be addressed in
future work, etc. Pretty much none of those things are code.

When some algorithm, metric, etc. _does_ need to be presented, it's important
to make it as understandable as possible to the reader. This has a few
consequences:

\- Only the core parts are shown, i.e. the new bits. All of the surrounding
infrastructure that would inevitably be required (I/O, error handling, etc.)
is omitted, because it's routine stuff for basically any programmer, would
distract from understanding the main idea, and probably wouldn't be any good
if included (since that's not necessarily what the researchers are good at).

\- Programming snippets require a programming language, and these go through
fads and fashions over time. I've read papers from many decades ago, with code
snippets in MACLISP, Smalltalk72, Algol60, etc. and the ones which remain
understandable are those which either limit themselves to a tiny, commonly
understood subset of the language (e.g. variables + if/else + loops); or those
which have an accompanying description of the language features in English. If
we're giving English descriptions anyway, we might as well leave out lots of
actual code (especially due to page limits in journals/conference
proceedings).

\- Since the point of code snippets is to get across the idea, and the
notation might also require an accompanying description, it may be better to
use a notation specialised to getting across our idea. Usually this is some
form of mathematical notation (e.g. superscripts, subscripts, various forms of
brackets, bold face, etc.), with an accompanying description of what these
things represent.

~~~
farazhaider
Thanks for explaining it better than I could have. The implementation is also
in the works, it'll be released soon.

------
joker3
This is interesting work, but the claims in section 5 need to be proved.

~~~
farazhaider
Perhaps you missed it, in Section 5 I've mentioned that the hash function
SHA256 behaves as an ideal hash function, an assumption in cryptography called
Random Oracle Model.

By virtue of this model, the proofs for Structure, Space and Max-Proof of SMT
are implied.

I'm also working on an implementation for the concepts mentioned in the paper
so the proofs can also be verified emperically.

~~~
joker3
It's not the reader's job to fill in implied proofs. It's your job as the
author to give them enough information to check your argument. In some cases
"this follows from results A, B, C" is sufficient, but it's often not.

------
kfogel
Two-dimensional Bloom Filter?

------
gtycomb
the 'Download preprint' keeps failing (Firefox, Fedora Linux)

~~~
farazhaider
Use this link:
[https://github.com/farazhaider/CSMT/raw/master/CSMT.pdf](https://github.com/farazhaider/CSMT/raw/master/CSMT.pdf)

