
Authenticated Data Structures Generically - kushti
https://github.com/adjoint-io/auth-adt
======
petertodd
This stuff is _much_ simpler than it sounds, and that's a good thing.(1)

The intuition here is that with hashed data structures like merkle trees, if
you and I agree on a hash digest, then we also agree on all the data hashed by
that hash digest. Since you can hash a hash digest, this agreement operates
recursively.
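That recursion is easy to see in a few lines of Python (a generic sketch of the idea, not the linked repo's API):

```python
# Sketch: a parent digest commits to its children's digests, so agreeing
# on the root means agreeing on everything below it.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

leaf_a, leaf_b = h(b"foo"), h(b"bar")
parent = h(leaf_a + leaf_b)   # hashing hash digests
root = h(parent + h(b"baz"))  # the agreement recurses all the way up
# If you and I compute the same `root`, we agree on "foo", "bar", and "baz".
```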

When we talk about a "query", all we're really saying is that Alice wants to
prove to Bob that if he runs some function on some data structure, he'll get a
particular answer. For instance, Alice might tell Bob: "Hey, you know that
merkle tree we both agree on? If you run get_index(10) on it, you'll get the
value `foo`."

She can _prove_ that to Bob by running that operation _herself_ on her
computer, and recording _what_ data was accessed during that operation.
Next, she serializes that data, replacing the parts that weren't accessed
with their hash digests. Finally, she sends that partial data structure to
Bob.

When Bob gets that data, he first hashes it to make sure the root hash digest
actually matches what he expected. Next he runs the operation, which will
return the same result as when Alice ran it. This is a proof for a very simple
reason: _both_ sides ultimately did the exact same thing!
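Here's a toy version of that whole round trip. To be clear, this is my own illustration, not the linked repo's code: the names `Leaf`/`Node`/`Pruned`/`get_index` and the tagged-sha256 hashing scheme are hypothetical choices.

```python
# Toy authenticated merkle tree: Alice proves get_index(i) by pruning
# everything the query didn't touch; Bob re-runs the same query on the proof.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class Leaf:
    def __init__(self, value): self.value = value
    def digest(self): return h(b"leaf:" + self.value)

class Node:
    def __init__(self, left, right): self.left, self.right = left, right
    def digest(self): return h(b"node:" + self.left.digest() + self.right.digest())

class Pruned:
    """A subtree the verifier doesn't have: only its digest survives."""
    def __init__(self, d): self._digest = d
    def digest(self): return self._digest

def get_index(tree, i, size):
    """Return (value of leaf i, copy of tree with untouched subtrees pruned)."""
    if isinstance(tree, Leaf):
        return tree.value, tree
    half = size // 2
    if i < half:
        value, sub = get_index(tree.left, i, half)
        return value, Node(sub, Pruned(tree.right.digest()))
    else:
        value, sub = get_index(tree.right, i - half, half)
        return value, Node(Pruned(tree.left.digest()), sub)

# Alice has the full tree; she runs the query and ships (value, proof) to Bob.
tree = Node(Node(Leaf(b"a"), Leaf(b"b")), Node(Leaf(b"c"), Leaf(b"d")))
value, proof = get_index(tree, 2, 4)

# Bob only knows the root digest. He checks the proof hashes to that root,
# then re-runs the *same* operation on the pruned copy; the Pruned subtrees
# are never visited, so he gets the same answer Alice did.
assert proof.digest() == tree.digest()
bobs_value, _ = get_index(proof, 2, 4)
assert bobs_value == value == b"c"
```

Note the symmetry: prover and verifier run the identical `get_index`; the only difference is whether the off-path subtrees are real data or `Pruned` stubs.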

If it helps, think of virtual memory: the data that Bob is missing is similar
to a page in memory that has been flushed to disk. Unless your computation
actually needs the data in that page, it'll run successfully even if your disk
isn't working.

So why does all this matter? Basically, this is _exactly_ what blockchains are
supposed to do: let multiple parties verify that the same thing is true, based
on the same data. So in, say, Certificate Transparency, the operator of the CT
log can extract one of these proofs to convince the web browser - which
doesn't have all the CT log data - that yes, if you run the operation
"fetch cert #12345 from CT log" it will in fact return the cert they're
expecting it to (in other words, CT already does exactly this). Where this
improves on naive blockchains is simple: now you don't have to have the whole
blockchain to verify something.

I actually did a Python implementation of these ideas years ago for a client:
[https://github.com/proofchains/python-proofmarshal](https://github.com/proofchains/python-proofmarshal)
I'm also (slowly) working on a Rust implementation.

FWIW, I think it's an awful example of how a really good idea can be
explained terribly by academia. It actually pre-dates Andrew Miller's paper,
but the earlier work I found never caught on, likely because no one
understood what they were talking about. Heck, Andrew himself tried to explain
it to me years ago, and I didn't get it, then went on to reinvent the same
thing, which I only realized about a year later. We really need to find
better ways to bridge those two worlds. :/

1) EDIT: And to be clear, I don't mean for that to come across in a negative
way! Rather, it's good that academics have thoroughly proven that these
techniques actually work. But for programmers, the mechanics of those proofs
aren't necessarily all that important. Same as how you can use calculus
effectively to solve many problems without necessarily understanding in
detail how it was actually proved to work.

~~~
joshuak
This was a big barrier to learning for me for a decade or more, as books and
papers related to algorithms I needed to implement always seemed to preoccupy
themselves with proving to me that what they were saying was true, or with
illustrating the derivative insights that led them to the discovery.

Reading them I would find myself thinking "Nice, such a lovely history of your
enlightenment, but HOW!!?? How do I do it!?!"

For an embarrassingly long period of time, I misinterpreted these discussions
as a breakdown of a process into its constituent parts. The 'proof' that it
is correct is of passing concern to me. Like headache medicine on the shelf:
have you proven this compound won't kill me (quickly)? Great, I'll take a
bottle, because I have a headache from reading the theoretical validation of
cache-oblivious algorithms when what I need is a practical discussion of how
to implement Funnelsort.

It's like there is a missing level of academic publication in which the
initial proof is supplemented with a paper on implementation: here we prove
this is correct, and here we implement one with Elmer's glue and popsicle
sticks. We see publications on implementation, of course, but they are usually
much later and by different people. I don't mean to devalue anything in the
existing academic process. I just feel there is a missed opportunity to
communicate better by explaining things from the practical (aka outside) side
in, in addition to the theoretical foundations (aka inside) out.

It would be nice to have it from the horse's mouth, so to speak, by making
such explanations a more formal expectation of academic papers.

~~~
petertodd
I think a lot of that problem just stems from mismatched goals(1) and
mismatched skillsets. Academics are rewarded for pushing academia forward,
which means rigorous techniques that others can build on top of. But what
those trying to use those techniques need are good models that they can build
_with_.

Secondly, the skillset required to explain something well doesn't have all
that much to do with the skillset required to rigorously prove things. Sure,
they overlap a bit, because you need a reasonably good understanding of
something to explain it, but that's where it ends. It can also be quite a
lot of work: I personally have made a bit of a career out of explaining this
tech to people, and it's taken me a lot of work to figure out explanations
that actually resonate with people.

Unfortunately I don't really have any easy solutions to this problem.

1) I'd say mismatched incentives, but I don't want to imply anything nefarious
about this problem in general!

~~~
munin
Programming is harder than proving. Well, programming poorly is easier than
proving, but making something that you could release and that won't look like
gum and baling wire is harder than proving. You wind up with a gum-and-baling-
wire implementation that lives in your home directory until your laptop dies,
and then you realize that nothing of value was lost, because it has been years
since you touched that project, and you move on.

Proofs are the ideal artifact for a researcher because they don't require
maintenance and don't have users. Think hard, write it down, publish, and move
on. Academia punishes researchers that create and maintain tools for other
people to use (for an example, look at the Sage people) and rewards
researchers that publish papers.

As a researcher, this feels wrong and frustrating, but there's really nothing
I can do about it from within the system, or they'll kick me out for not
publishing enough papers.

