
Transparent Logs for Skeptical Clients - skybrian
https://research.swtch.com/tlog
======
lachenmayer
The hypercore[1] signed append-only log which is the heart of the Dat
protocol[2] is an implementation of very similar ideas. The data structure
consists of an append-only log, a Merkle tree (stored in the "inorder storage
layout" described in Appendix B), and signatures for each of the root hashes
of the Merkle tree. The public key ("read key") of the key pair which is used
to sign the tree roots can be used as a unique identifier of the contents of
the log, eg. for looking up peers in a DHT which can serve the contents. It
features a replication protocol which is similar to the one described in the
article (if you squint hard enough). A fantastic in-depth explainer of the
data structure and the protocol is available at [3].

The current implementation is in Node.js, but a Rust port is under way. [4]

[1]:
[https://github.com/mafintosh/hypercore](https://github.com/mafintosh/hypercore)
[2]: [https://www.datprotocol.com/](https://www.datprotocol.com/) [3]:
[https://datprotocol.github.io/how-dat-
works/](https://datprotocol.github.io/how-dat-works/) [4]:
[https://github.com/datrs/hypercore](https://github.com/datrs/hypercore)

~~~
rzzzt
I wish other protocols were described in "How Dat Works"-style. Very nicely
done.

------
k_harper
For fun, I tried to find a simpler formula for seq(0,K) in Appendix A, but I
didn't check if it's computationally faster and I doubt it is. The formula I
found was: seq(0,K) = 2(K+1) - Hamming(K+1) - ffs(K+1)

Explanation: When the newest hash is at position K (zero-indexed), there are
K+1 level 0 hashes being stored.

The stored permanent hashes form a set of perfect binary trees. There is
either 0 or 1 of each size of perfect tree. To determine if a tree of a given
size exists when there are K+1 hashes, you can check the corresponding binary
digit in K+1. For instance, if the eights place in K+1 is set, then there is a
perfect tree with eight level 0 hashes.

Each perfect tree contains 2N - 1 hashes, where N is the number of level 0
hashes in it. To calculate the total number of permanent hashes, we can sum
the totals for the individual trees. The sum of N for all of the trees is just
the total number of level 0 hashes, K+1, so the 2N term becomes 2(K+1). The
total number of trees is equal to the number of set bits in the binary
representation of K+1, which is the hamming weight[1]. So -1 becomes
-Hamming(K+1).

We now know that when the last record is at position K (zero-indexed), there
are 2(K+1) - Hamming(K+1) permanent hashes stored.

Unfortunately, the last level 0 hash might not be at the end of the file, so
we need to subtract the number of hashes that come after it. The number of
hashes after it is the height (L, zero-indexed) of the smallest perfect tree,
which is the position of the first set bit (one-indexed)[2] of K+1, less 1:
ffs(K+1) - 1. We then subtract 1 to convert the number of records to the zero-
indexed position. The -(-1) and -1 cancel out.

[1]
[https://en.wikipedia.org/wiki/Hamming_weight#Efficient_imple...](https://en.wikipedia.org/wiki/Hamming_weight#Efficient_implementation)
[2]
[https://en.wikipedia.org/wiki/Find_first_set#CTZ](https://en.wikipedia.org/wiki/Find_first_set#CTZ)

------
yinyinwu
What are the applications for this?

------
flipp3r
Literally just a usecase for a blockchain, reinvented..

~~~
nickpsecurity
It's a hashchain. They predate blockchains. I used them a long time as just
signed, text files full of hashes. Trusted Timestamping used them. Blockchains
were a highly-wasteful modification of hashchains for a different purpose.
Databases, logs, and hashchains are still better for most things. Centralized
services with decentralized checking of a signed log is still more efficient
since centralized tech is itself more efficient.

~~~
darkerside
I was hoping this would be more informative...

[https://en.wikipedia.org/wiki/Hash_chain#Hash_chain_vs._bloc...](https://en.wikipedia.org/wiki/Hash_chain#Hash_chain_vs._blockchain)

~~~
geofft
A blockchain is a modification of a hash chain to make it usable in cases
where different transactions conflict with each other. In the case of TLS
certificate transparency, Go packages, etc., a log entry "This object with
this hash was published" does not conflict with a log entry "That object with
that hash was published." You can append them to the chain in either order and
it's fine. For a financial ledger, "This money was moved from A to B"
conflicts with "This money was moved from A to C," and so one needs to be
applied first so that the other can be rejected. If both transactions are
signed it doesn't matter which one (for the correctness of the system, at
least), but all users have to come to consensus about which of the two is
accepted. Otherwise you have the "double-spend" problem.

Bitcoin's blockchain solves this problem with "proof-of-work," i.e., making it
computationally difficult to add another entry to the hash chain. Because it's
difficult, it's unlikely that both transactions will be accepted at the same
time, which serves as a publicly-verifiable way to arbitrarily pick one. There
is a small financial incentive for the network entity that successfully solved
the computational problem, hopefully offsetting their expense in "mining" the
"block." (As a further means of making sure only one is accepted, even if both
are mined in quick succession, it's common practice in Bitcoin to wait for a
few more blocks to be mined before B or C accept the transaction, which forces
the network to accept one chain or the other.)

Satoshi's innovation was that proof-of-work can be used to allow arbitrary /
anonymous entities to participate in this system without risking someone
creating several thousand anonymous entities voting in their favor, or
creating one view of the hash chain for B and another for C to convince them
both to accept the transaction. (Satoshi's innovation was _not_ the append-
only cryptographically-verifiable store; that's just a hash chain. It was
securing updates to that store without a central trusted coordinator.)

There are other ways to arbitrarily pick one transaction or the other, e.g.,
using a distributed consensus algorithm and admitting participants based on
"proof of stake." Or you can redefine the problem to allow double-spend to a
limited extent and make A responsible for paying out both B and C - this is
roughly Stellar's approach.

But if you don't have a double-spend problem (or if you have a central trusted
coordinator), you just need a boring old hash chain, which can be implemented
much more efficiently - and possibly expanded to a hash tree, as described in
this article.

