Is Git a Block Chain? (domustower.com)
126 points by argon on Apr 25, 2015 | 49 comments

Both git repositories and bitcoin are specialized Merkle trees. Merkle trees are incredibly useful and general; they are used in many kinds of verification, especially of large chunks of data.
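
For anyone who hasn't seen one written out, here is a minimal sketch of computing a Merkle root in Python. The function names are made up for illustration, and the pairing rule (duplicate the last hash when a level has an odd count) is just one common convention; this is not the exact scheme git or bitcoin uses.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(chunks: list[bytes]) -> bytes:
    """Hash every chunk, then hash pairs of hashes until a single root is left."""
    if not chunks:
        raise ValueError("need at least one chunk")
    level = [sha256(c) for c in chunks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: pad odd levels by repeating the last hash
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


chunks = [b"chunk-0", b"chunk-1", b"chunk-2"]
root = merkle_root(chunks)
tampered = merkle_root([b"chunk-0", b"chunk-X", b"chunk-2"])
print(root.hex())
print(root != tampered)  # changing any chunk changes the root
```

The useful property is that the single root commits to every chunk, so a verifier holding only the root can detect a change anywhere in the data.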


The power of the Merkle tree is pretty amazing. IPFS is a good example https://www.ipfs.com.

I thought it was a directed acyclic graph. Care to share the distinction?

The terminology used in #bitcoin-wizards is to call it a merkelized dag, or merkle-dag.

Similarly, we also refer to "Merkelized Abstract Syntax Trees", a way of hashing code originally proposed by Pieter Wuille and Russell O'Connor (1) that will probably be added to Bitcoin's scripting system eventually.

Pretty much any data structure can have hash functions added to it to "merkelize" it, producing an authenticated data structure: http://www.cs.umd.edu/~amiller/gpads/

1) https://download.wpsoftware.net/bitcoin/wizards/2014-12-16.h...

Bitcoin Script is a Forth-like RPN language, so why does it need an AST? Do you mean there are plans to add more languages/DSLs to Bitcoin, with multiple compiler front-ends generating ASTs which are then translated to "machine opcodes"? Something similar to Ethereum?

Or maybe it's a solution for transaction scriptSig malleability? [1]

[1] https://en.bitcoin.it/wiki/Transaction_Malleability

You can see git's DAG as a dedup'd Merkle tree: in other words, if two child pointers in two places in the tree point to nodes (blobs) with the same hash, make them share a single node instead.

This works because the hash "is" the content/entire subtree (modulo hash collisions), so the resulting data structure has the same meaning as before.
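
A rough sketch of the dedup idea, in Python. The blob id below follows git's classic SHA-1 object format (a "blob <size>" header, a NUL byte, then the content); the ContentStore class is a hypothetical toy, not how git actually lays out its object database.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Id git gives a blob: SHA-1 over a 'blob <size>' header, a NUL byte, and the content."""
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


class ContentStore:
    """Toy content-addressed store (hypothetical): the key is the hash of the value."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        oid = git_blob_id(content)
        self.objects[oid] = content  # identical content always lands on the same key
        return oid


store = ContentStore()
a = store.put(b"same bytes\n")   # a file in one directory
b = store.put(b"same bytes\n")   # the same file somewhere else in the tree
c = store.put(b"other bytes\n")
print(a == b, len(store.objects))
```

Two pointers to equal content collapse into one stored node, which is what turns the tree into a DAG.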

Content/hash-addressed stores are related to referential transparency and immutability (and deep equality) in languages like Haskell, FWIW. I've always thought it's sort of beautiful how cleanly the ideas come together like that.

"Blockchain" is generally used to refer to systems that either 1) use the sequence of blocks to model changes in custodianship or 2) (more generally) enforce a set of rules governing the correctness of a given block.

That is, a block in the bitcoin blockchain is valid not only if its hash matches what one would expect given the included transactions, but that those transactions adhere to the rules of bitcoin. (No double-spends, no dust transactions, etc).

While there are data structures in a git commit that must be present and/or follow a particular set of semantics, git does not enforce anything about the _contents_ of those commits.

Another key distinction: blockchains seek consensus, whereas divergent forks in git repos are by design.

EDIT: I should probably not distinguish too much between consensus and rule enforcement, as those two are obviously intertwined. :)

No, it's not. It's just a chain. There's no block. The block in a blockchain is everything to do with distributed / trustless consensus.

Imagine how cool it would be if I could share a guid for my repo - and then your bit client (let's call it gitcoin, or maybe just bit) can fetch new commits from a distributed block chain (essentially the git log). Github is no longer an intermediary or a single point of failure. Private repo? Don't share the guid.

That's a great idea, and some bright folks are working on it! http://ipfs.io/

I want to make sure people don't overlook this IPFS link as just a decentralized GitHub. IPFS is building a content-addressable web. Think about how BitTorrent magnet links work—it's a hash of the actual movie (or Linux distribution) you're trying to download, and using that hash, you can connect with all the people who've already downloaded it to get it from them. On a content-addressable web, pulling up the New York Times would be looking up a certain hash, which would pull the content from your neighbor three doors down who read it earlier this morning. The web won't be servers that you hit with your browser anymore. It'll be content that lives forever, always reachable by its hash.

As long as people want to read what you're publishing, publishing is free. And fast. Around the planet. The web is about to get way better, and IPFS and its competitors are going to be what pushes it forward.

IPFS does look interesting, but how does it provide updateable references? What's the equivalent of "give me today's headlines", which is pointedly not content-addressable because you don't know the content yet?

IPNS is a layer on top of IPFS that is essentially a map from public key to something signed by the corresponding private key. So you can store an IPFS hash in IPNS under your public key. Then whenever you create new content, just update the hash in your IPNS entry. People only need to remember your public key to find your latest content.

I suspect that there's a way to identify the latest content built into IPFS, but if not, it's easy to build that on top of something like Ethereum. If you're new to Ethereum, it's a global, trustless, blockchain-based coordination platform—it gives us the ability to determine what "The New York Times" is, without having to trust someone else to tell you. The New York Times would have an Ethereum contract with storage space that only their private key can write to that would hold the hash of the latest state of the site. A site state would contain a set of hashes that point to the day's articles and perhaps previous states of the paper.

In the very near future, the entire New York Times archives will be fetchable just by asking for it. Your computer will ask your neighbor for the front page for July 21, 1969, and they'll send it right over.

> I suspect that there's a way to identify the latest content built into IPFS

Yeah-- the IPNS records have a notion of recency, as well as being able to write version history datastructures (e.g. git)

Name resolution is the answer - folks go to your /ipns/com.example.blog site, which you've updated with the new reference for your new content ..

There is some documentation about IPNS here:


I think that's what ipns is for. Basically synonymous with git refs; a link to a hash.

Thank you for taking the time to write this explanation :)

Isn't a distributed GitHub just git? The GUID for a repository is the repository URL, the git log is your distributed block chain, and your bit client is just git.

The only reason GitHub is an intermediary and single point of failure is because it's much more convenient than raw git. There's nothing that stops you from dropping your git repo on a webserver, Amazon S3 bucket, or any other data store. Heck, my last startup used Heroku as our master git repository because we didn't want to pay for a private GitHub repository.

That sounds like gittorrent[0]. Never went anywhere, unfortunately.

[0] https://code.google.com/p/gittorrent/

Though it may sound similar, it's very, very different. Take a deeper look at the demo on http://ipfs.io

If it's distributed then I don't think private repos would be possible, unless they were encrypted.

Specifically, the block is an optimization for more transaction throughput in a consensus system. Without blocks, consensus would take longer because every binary decision requires at least a couple rounds of communication amongst every "validator".

Consensus seems like an essential property that distinguishes a blockchain from a Merkle tree, and git does not provide consensus.

Yep! entirely agreed.

The correct answer

Related is the Gitcoin mining challenge from Stripe's Capture The Flag 3.

Here's a writeup: https://github.com/ctfs/write-ups-2014/tree/master/stripe-ct...

A Gitcoin miner updates the ledger (a text file in the repo), and "mines" the next-smaller hash.

It doesn't have consensus, but it does give a neat programming challenge.

Not sure I get it. Appears to be technically correct assuming we view the term "block chain" as a basic combination of the words as opposed to a term of art, but without the PoW it seems to be missing the point.

I'll quote myself (http://blog.oleganza.com/post/85111558553/bitcoin-is-like):

Bitcoin is like Git: in Git (a distributed version control system) all your changes are organized in a chain protected by cryptographic hashes. If you trust the latest hash, you can get all the previous information (or any part of it) from any source and still verify that it is what you expect. Similarly, in Bitcoin, all transactions are organized in a chain (the blockchain) and once validated, no matter where they are stored, you can always trust any piece of blockchain by checking a chain of hashes that link to a hash you already trust. This naturally enables distributed storage and easy integrity checks.
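
To make the "trust the latest hash, fetch the rest from anywhere" point concrete, here is a small Python sketch. The entry layout and function names are assumptions for illustration and match neither git's commit format nor bitcoin's block format.

```python
import hashlib
import json


def entry_hash(parent: str, payload: str) -> str:
    """Hash an entry together with its parent's hash, so it commits to all earlier history."""
    return hashlib.sha256(json.dumps([parent, payload]).encode()).hexdigest()


def build_chain(payloads: list[str]) -> list[dict]:
    chain, parent = [], ""
    for p in payloads:
        h = entry_hash(parent, p)
        chain.append({"parent": parent, "payload": p, "hash": h})
        parent = h
    return chain


def verify_from_head(trusted_head: str, entries: list[dict]) -> bool:
    """Walk back from a head hash we already trust, re-hashing untrusted entries on the way."""
    by_hash = {e["hash"]: e for e in entries}
    h = trusted_head
    while h != "":  # the first entry has an empty parent
        e = by_hash.get(h)
        if e is None or entry_hash(e["parent"], e["payload"]) != h:
            return False
        h = e["parent"]
    return True


chain = build_chain(["first change", "second change", "third change"])
head = chain[-1]["hash"]

forged = [dict(e) for e in chain]
forged[1]["payload"] = "something else"  # an untrusted mirror alters history

print(verify_from_head(head, chain))
print(verify_from_head(head, forged))
```

Only the head hash has to come from a trusted channel; the entries themselves can be served by anyone.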

Bitcoin is unlike Git in that everyone strives to work on a single branch. In Git everyone may have several branches and fork and merge them all day long. In Bitcoin one cannot “merge” forks. Blockchain is actually a tree of transaction histories, but there is always one biggest branch (which has the value) and some accidental mini-branches (no more than one or two blocks long) that have no value at all. In Git content matters (regardless of the branch), in Bitcoin consensus matters (regardless of the content).

Yes they are Merkle trees. But the missing part of a blockchain is the incentive part. Obviously you don't even need git or anything so complicated for a blockchain. All you need is a network of computers that would periodically:

  * receive all the transactions (or their hashes) that occurred
  * sign this with some HMAC, and publish it
  * then we know that the transactions occurred by X time
Of course, the question is: how do we know transaction A really occurred after X time? Perhaps some will claim it occurred before X, but just wasn't included by anyone in the network for X because they are discriminating against that transaction's participants, or something.

And what if it was included by some in the network but not others? How do we know some are not cheating?

That's where proof-of-work comes in. You can also have proof-of-stake and other mechanisms, by which to solve the Byzantine Generals problem. How do we know that the signers aren't cheating? Who watches the watchers?

All of the complexity of bitcoin (including having and paying miners) comes from the requirement to be completely distributed, with no authority that could refuse to include "bad" transactions.

But if you're not signing any black- or grey-hat documents or transactions, then you can just pick a trustworthy central party to receive the hashes and publish the chain.

The startup GuardTime has been doing document timestamping and integrity checks with a blockchain for a bit longer than bitcoin has been around.

* You hash documents that you want to timestamp, and send the hash to them.

* They hash all received hashes and the previous head of the blockchain together, and publish the head hash in the paper copy of Financial Times: https://guardtime.com/photos/ft14-2.png

* The exact contents and "created before" timestamp of the documents can now be fully verified, unless someone can manage to gather and destroy all paper copies of Financial Times published after it was sent to them.
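
The three steps above can be sketched in a few lines of Python. This is a simplified model under my own assumptions (a flat hash over the sorted batch, hypothetical class and function names), not GuardTime's actual protocol or API.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class TimestampService:
    """Toy central timestamper: each round folds the submitted hashes into a new head."""

    def __init__(self) -> None:
        self.head = sha256_hex(b"genesis")  # arbitrary starting value (assumption)
        self.pending: list[str] = []
        self.rounds: list[dict] = []

    def submit(self, document: bytes) -> str:
        digest = sha256_hex(document)  # only the hash is sent, never the document
        self.pending.append(digest)
        return digest

    def close_round(self) -> str:
        prev, batch = self.head, sorted(self.pending)
        self.head = sha256_hex("\n".join([prev] + batch).encode())
        self.rounds.append({"prev": prev, "batch": batch, "head": self.head})
        self.pending = []
        return self.head  # the value that would get printed in the newspaper


def included(document: bytes, record: dict, published_head: str) -> bool:
    """Check a document against a round record and the head taken from the paper."""
    recomputed = sha256_hex("\n".join([record["prev"]] + record["batch"]).encode())
    return recomputed == published_head and sha256_hex(document) in record["batch"]


svc = TimestampService()
svc.submit(b"contract v1")
svc.submit(b"lab notebook page 12")
head = svc.close_round()
print(included(b"contract v1", svc.rounds[0], head))
print(included(b"contract v2", svc.rounds[0], head))
```

Because each head includes the previous head, a published value also pins down every earlier round.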

The incentive for them is the usual freemium payment model.

The problem is, as zkhalique said, that while it's easy to show a transaction happened before time X, it's hard to show it happened after. The single publisher (in this case the Financial Times or GuardTime) could have refused to include the transaction. Or the transaction might not have happened. How do you know which it is? What if I pay you 10,000 bitcoins and promise GuardTime 3,000 to retract that fact later?

This is a more straightforward description of what GuardTime does than anything the company themselves have managed to publish.

The strength of bitcoin is the blockchain. It prevents you from going back and rewriting history--you can only append to the current block.

In Git you can change history. So I don't think Git can be called a blockchain, although it shares the same concepts, like the merkle tree.

I feel like that's not really a fair statement. With git, if you change the history, it effectively tosses out the old tree and creates a new one with new hashes. You're really not modifying the old history, you're saying "Get rid of the current history tree and create a new one with this change in it". You can do the same with Bitcoin, you could modify your copy of the blockchain to insert some new transaction, and then replay all the subsequent transactions that happened on the blockchain (All of these transactions get different hashes).

The result for both is the same: both will throw errors if you attempt to pull data from a git repo or blockchain whose hashes don't match the ones you already have. This is why it's a problem if someone decides to use rebase on a public branch that people are pulling from: the hashes will change and people will get errors when they try to pull because the histories don't match.

The difference is that for the git chains it is not possible for anyone to independently verify what the true chain is. You can rewrite history and it will be just as correct as the first one.

PoW (Proof of Work) makes it impossible to just "replay all the subsequent transactions that happened", because your fake chain will have significantly less proof of work.
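
Here is a toy illustration of why the replay is expensive, in Python. It is heavily simplified and the names are hypothetical: real bitcoin uses double SHA-256 over a binary block header and a numeric target, whereas this sketch just asks for a hash starting with a few zero hex digits.

```python
import hashlib

GENESIS = "0" * 64  # placeholder parent for the first block (assumption)


def block_hash(prev_hash: str, data: str, nonce: int) -> str:
    return hashlib.sha256(f"{prev_hash}|{data}|{nonce}".encode()).hexdigest()


def mine(prev_hash: str, data: str, difficulty: int) -> tuple[int, str]:
    """Brute-force a nonce until the hash starts with `difficulty` zero hex digits."""
    nonce = 0
    while True:
        h = block_hash(prev_hash, data, nonce)
        if h.startswith("0" * difficulty):
            return nonce, h
        nonce += 1


def mine_chain(items: list[str], difficulty: int) -> list[dict]:
    chain, prev = [], GENESIS
    for data in items:
        nonce, h = mine(prev, data, difficulty)
        chain.append({"prev": prev, "data": data, "nonce": nonce, "hash": h})
        prev = h
    return chain


def valid(chain: list[dict], difficulty: int) -> bool:
    prev = GENESIS
    for b in chain:
        if b["prev"] != prev:
            return False
        if block_hash(b["prev"], b["data"], b["nonce"]) != b["hash"]:
            return False
        if not b["hash"].startswith("0" * difficulty):
            return False
        prev = b["hash"]
    return True


chain = mine_chain(["a pays b", "b pays c", "c pays d"], difficulty=3)

forged = [dict(b) for b in chain]
forged[0]["data"] = "a pays mallory"  # edit history without redoing the work

print(valid(chain, 3))
print(valid(forged, 3))
```

To make the forged chain pass, the first block and every block after it would have to be mined again, while the honest network keeps extending the original.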

Only if they're getting the chain for the first time. Git becomes... fairly wrathful if someone rewrites the history on a remote branch.

Here is how a website can implement git in a decentralised blockchain-like consensus.

Launch Gitcoin.com. The company's shares are called gitcoins which is a premined cryptocurrency by gitcoin.com. The aim of the company is to balance the distribution of shares according to their value, to maximize its own value. Parties who support this company's cause or want to gain a monetary incentive out of the value that gitcoin would gain would mine gitcoins or buy them in exchanges.

Authors publish the metadata about their content hashed with any private key they want. Each of these private keys is a separate logical 'view' for showcasing the author's content. For example, one private key each for friends, business and everyone, so that one category cannot view the contents of the other; authors then share the respective public keys with the concerned parties using any medium. The aim would be to earn gitcoins and go and sell them at an exchange to earn a living wage.

The consumers of code would pay for the code they want to view to the public key, which is also a gitcoin address, which they already have! This automatically sends gitcoins to the author.

All code is open to everyone. If you have a public key you can view its associated code and change it, by sending money to it. Gitcoin.com takes a cut in providing this service.

This website also invents two brand new features in a website - no password and no captcha!

- No Password because you can view any code if you send money to it. Authors will keep the rates very high and share a new private category's public key with people they wanna give a discount to. You can have as many identities as you like and can club, fork and terminate them.

- No Captcha because captchas were invented to prevent robots. But robots are not always bad. So we want people to build helpful reddit-like robots but not bad ones. Gitcoin.com solves this by allowing all robots according to an economic incentive motive. For example, a user can build a good robot that shows the issues list corresponding to a gitcoin address. People pay this robot in gitcoins and so this robot has enough gitcoins to open the community desired code. But if a bad robot tries to spam this good robot, people will simply create a new address and copy all previous data and move to it, all of this at a click of the 'block' button.

So basically, changing the behavior of the block button so that it creates a decentralised copy of gitcoin.com itself and then moves to it eliminates the bad actor immediately, while making it expensive for him to keep paying to see the code.

As others have mentioned, the one main distinguishing factor between blockchains and git (at least in practice because bitcoin's POW makes rewriting the chain expensive) is the consensus among its general users and the inability to actually change the history of the blockchain. Mathematically they are very similar, but git repos aren't "secure" in the same way. That's why I would argue against the author that git isn't the most valuable blockchain in use today although it could very well be the biggest.

Is this list of comments a block chain?

Um no, these comments don't have hashes associated with them.

They might! Who knows how the HN architecture actually works.

Not likely, given that you are allowed to edit comments.

Would this qualify ZFS as a blockchain as well?

but SHA-1 is considered not secure, so you can't just trust commit-id

no but wikipedia kind of is
