The blockchain is supposed to be used as an append-only data structure. There are a number of nodes who have copies of what is supposed to be (or at least, converge to) exactly the same data. The nodes are all constantly telling each other what they believe to be the most up-to-date version of the data. The purpose of the blockchain is to support a consensus-finding algorithm by which the nodes may start out disagreeing on the contents of the most recent few blocks in the log, but eventually converge to a consensus, at least for relatively old blocks.
But some of the nodes are evil (or maybe just faulty, or maybe just out-of-touch), and want to try to change past data and get the other nodes to accept their fake history (the "fake chain").
When any node notices that some other nodes are saying different things (proposing different chains), it prefers to believe in whichever chain is longer (this is a slight simplification, actually it's accumulated difficulty).
Many of the nodes are constantly accepting new data and 'mining' new blocks to append to the end of the real chain. They are doing this as fast as they can. The problem of finding a new hash for new blocks is embarrassingly parallel, so if there are 1000 nodes in the network they can mine about 1000x as fast as one node (however, the protocol is constantly adjusting the difficulty (the number of prefix zeros required in the hash) to ensure that on average the blockchain is getting longer at a fixed rate).
If the evil nodes want to change something far back in history, they're going to have to try to mine a whole bunch of new blocks before the fake chain gets as long as the real chain. Recall that the other nodes will reject the fake chain as long as they are aware of another chain which is longer. But while the evil nodes are trying to catch up, the good nodes are also going to be trying to mine new blocks to append to the end of the real chain.
Assuming there are more good nodes than evil ones (or rather, that the total computing power of the good nodes is greater than the total computing power of the evil ones), on average the speed that the evil nodes can mine new blocks is slower than the speed that the real chain is getting longer.
Therefore the rule that the longest chain is the right one works.
Now, through random chance it's always possible for the evil chain to get lucky and mine a block much faster than the good chain. But if it alters something deep in history, then in order to catch up, it would have to get lucky in this way many times in a row; the chance of that happening decreases geometrically with the number of blocks it is behind. Therefore, you can be very confident that a block deep in the chain won't be altered.
To reiterate, the reason that it's important that the further back you change something, the more hashes you need to recompute, is that this leads to the following property: if there are two competing chains of different lengths, the probability that the shorter chain will eventually become the longest decreases geometrically with the initial difference in lengths. This property is why the algorithm converges to a consensus on the data in older blocks.
Thank you very much for that expanded explanation. I now know more, but knowledge is a dangerous thing.
Let's pretend I have the fastest hash generation engine (actually I'd need 2 for this scheme). I would create a real node that uses my engine and becomes part of the node community. Then I create 100 bogus nodes that proxy to my real node. Now I have a large number of nodes that are essentially using my version of "reality" which, in the beginning, is what everyone else says is truth. Meanwhile I'm busy re-writing history to give myself 10 bazillion bitcoins. My other hash engine is recomputing the chain with my bogus history in the background. At some point it catches up with the present. At that point I substitute my bogus chain for the real chain on my main node. My main node is now in disagreement with everyone else's view of reality. My bogus chain also shows up on my 100 other nodes that are proxying my main node. I now have 101 nodes showing my bogus bitcoins. If 101 nodes isn't enough to win the vote then add more bogus nodes until I have 51% of the total nodes.
Also, what's to prevent me from adding many real but zero sum transactions to my chain before I tell the world about those transactions? He who has the biggest chain wins.
Unrelated thought: quantum computing sounds like it could throw a monkey spanner into the wrench works.
> If 101 nodes isn't enough to win the vote then add more bogus nodes until I have 51% of the total nodes.
You could do that, but it would be expensive. It's not the number of nodes that matters, it's the total computing power of those nodes, because they need to hash faster than the rest of the network. Your nodes, combined, would need to have more computing power than all of the other nodes, combined. If the blockchain you are attacking is popular, the cost of this much computing power would be prohibitive (eg for Bitcoin today [https://gobitcoin.io/tools/cost-51-attack/] estimates it would cost around $1 billion for the machines plus $2 million per day for electricity).
> Also, what's to prevent me from adding many real but zero sum transactions to my chain before I tell the world about those transactions? He who has the biggest chain wins.
Since each block must contain the hash of the previous block, these blocks, although empty of data, still have different hashes. So you have to compute just as many hashes, regardless.
> Unrelated thought: quantum computing sounds like it could throw a monkey spanner into the wrench works.
But some of the nodes are evil (or maybe just faulty, or maybe just out-of-touch), and want to try to change past data and get the other nodes to accept their fake history (the "fake chain").
When any node notices that some other nodes are saying different things (proposing different chains), it prefers to believe in whichever chain is longer (this is a slight simplification, actually it's accumulated difficulty).
Many of the nodes are constantly accepting new data and 'mining' new blocks to append to the end of the real chain. They are doing this as fast as they can. The problem of finding a new hash for new blocks is embarrassingly parallel, so if there are 1000 nodes in the network they can mine about 1000x as fast as one node (however, the protocol is constantly adjusting the difficulty (the number of prefix zeros required in the hash) to ensure that on average the blockchain is getting longer at a fixed rate).
If the evil nodes want to change something far back in history, they're going to have to try to mine a whole bunch of new blocks before the fake chain gets as long as the real chain. Recall that the other nodes will reject the fake chain as long as they are aware of another chain which is longer. But while the evil nodes are trying to catch up, the good nodes are also going to be trying to mine new blocks to append to the end of the real chain.
Assuming there are more good nodes than evil ones (or rather, that the total computing power of the good nodes is greater than the total computing power of the evil ones), on average the speed that the evil nodes can mine new blocks is slower than the speed that the real chain is getting longer.
Therefore the rule that the longest chain is the right one works.
Now, through random chance it's always possible for the evil chain to get lucky and mine a block much faster than the good chain. But if it alters something deep in history, then in order to catch up, it would have to get lucky in this way many times in a row; the chance of that happening decreases geometrically with the number of blocks it is behind. Therefore, you can be very confident that a block deep in the chain won't be altered.
To reiterate, the reason that it's important that the further back you change something, the more hashes you need to recompute, is that this leads to the following property: if there are two competing chains of different lengths, the probability that the shorter chain will eventually become the longest decreases geometrically with the initial difference in lengths. This property is why the algorithm converges to a consensus on the data in older blocks.