Interestingly, the main bitcoin developers generally treat other implementations of the protocol as being suspect and more-than-possibly threatening, since if you're not bug-for-bug compatible with the satoshi client that is a security risk. This remains true even if you improve upon the satoshi client with regards to, for example, hewing closer to the published spec.
This might come as something of a surprise to many HNers, who would think "But wait. The Internet would sort of suck if HTML5 were a purely advisory document and failing to match IE6's actual behavior 100% of the time was a fatal flaw in a browser." But, to quote a fairly representative post from their forums:
Any implementation needs to specifically test for uniformity with the network: Bitcoin is a distributed consensus algorithm and differences in what nodes accept or reject in the blockchain (things which would be minor harmless behavioral differences in most software) can often result in fatal security flaws where an attacker can move the nodes in question onto a separate fork and double-spend their funds away or partition the network. This requires an unusual level of care and system level tests.
Many of the most interesting cases are the great many things which must be rejected as no amount of exposure to the live network will trigger those cases (until an attacker exploits them to partition the network). (cite: https://bitcointalk.org/index.php?topic=192880.0)
Many Bitcoin developers are actually quite positive about the idea of alternative implementations; all they're saying is "be careful".
> Gavin wrote to me only days after the BitCoinJ release to tell me how happy he was to see an alternative implementation. Satoshi expressed very similar sentiments. Nobody is against alternative implementations.
> What some people, especially Satoshi, have said is that there’s an unusual amount of risk involved with reimplementing the full system and using that reimplementation to mine. Bitcoin is very complex and if you aren’t skilled and very thorough you are likely to diverge from its behavior in small, hard to detect ways. This can fork the chain and split the economy. It’s one of the few things that could instantly kill Bitcoin beyond legal harassment of its users.
- Mike Hearn 2011
> part of the solution is to encourage alternative implementations that make different trust/convenience tradeoffs than the reference implementation. There has been a lot of behind-the-scenes work on cross-implementation testing (the “testnet3” blockchain contains hundreds of transaction validation test cases, for example), and new features are being added to the protocol to support alternative implementations
-Gavin Andresen, late 2012
If you're not mining, I personally would say it can only be good if you just follow the protocol spec and try to implement Bitcoin from there. If you fall off the main chain and it's not obviously your fault, then either the protocol spec was unclear and can get improved or the original client has a bug, in which case there are ways to carefully phase the bug out of existence with a patch.
I probably haven't been keeping up with the bitcointalk forums as much as yourself, but I haven't gotten the impression that the main bitcoin developers treat alternative implementations as threatening.
There are already several moderately-popular alternative implementations in the wild. Bitcoinj is used by pretty much every Android-based Bitcoin app. Electrum (a popular thin-client) is using a custom implementation on their backends, too.
Every time I ask or see a question from people who are working on alternative clients, the developers have always been supportive and informative.
The biggest problem, and possibly the underlying spirit of the warning you quoted, is that the current version of the protocol is not super-well documented. Some data structures are undefined and need to be reverse-engineered from other implementations (such as net_time), others are ambiguously defined.
My takeaway from the thread you linked is closer to this quote:
"Yes, please, feedback from re-implementors is very helpful."
— Gavin Andresen, lead Bitcoin developer.
How can a bug in the implementation be a security risk? Aren't the artefacts of bitcoin (I don't know the correct terminology) cryptographically unambiguous, completely independent of the implementation used to create them?
The "problem" with HTML incompatibilities is that there is room for plenty of ambiguity in both standard and implementation, without compromising the fundamental mission of the users of the protocol (rendering web content).
How can a bug in the implementation be a security risk?
This gets a little complicated. "Bitcoin" is a commodity, a protocol (series of promises), a protocol (computer code to implement that series of promises), a particular distributed ledger, a network (of computers implementing the computer code implementing the series of promises with regards to one consensus distributed ledger), etc etc.
Consider bitcoin the series of promises. One such promise is under what circumstances opcode 32 causes a transaction to be marked as invalid. (Don't bother looking it up -- you'll have to go down a rabbit hole for a while.) It is critically important that all code implementing opcode 32 comes to the same conclusion regarding every transaction in history and every transaction in the future.
Consider an edge case for opcode 32, occurring in a transaction that I'll call T, two years ago. T is, as of right now, accepted as valid in the history that bitcoin (as you understand that term) accepts as correct. If a malfunctioning bitcoin client were to reject T, they would reject large portions of history after T, because they'd be predicated on an invalid series of transactions. Following me so far?
So the more interesting case isn't when there is a difference of opinions in the past, it is when there is a predictable difference of opinions in the future. In this case, one set of clients starts building a transaction ledger B, and one starts building a transaction ledger B'. They are different transaction ledgers. People subscribing to either ledger, which they think of as The One True Bitcoin Ledger, will be mighty pissed when other people in The One True Bitcoin Economy tell them that their unforgeable, unloseable, ungovernable units of cryptocurrency are actually worthless bits.
This is called a blockchain fork. Bitcoiners are terrified of it happening again. (Yes, it has already happened at least once. Long story. "No harm, no foul, he wasn't trying to crash our entire economy so we dodged a bullet." is pretty much the consensus among Bitcoiners who actually understand what happened there.)
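To make the B versus B' split concrete, here is a minimal sketch. Everything in it is hypothetical: the `op_arg` field, both validity rules, and the block list are made up for illustration and are not Bitcoin's real rules for any opcode. It only shows how two clients that agree on almost everything end up with different ledgers once one edge case appears.

```python
# Toy model of a consensus split. Both rules are made up for illustration;
# neither is Bitcoin's real rule for any opcode.

def reference_rule(tx):
    # Hypothetical reference client: accepts arguments 0..255 inclusive.
    return 0 <= tx["op_arg"] <= 255

def reimplemented_rule(tx):
    # Hypothetical reimplementation: rejects the boundary value 255.
    return 0 <= tx["op_arg"] < 255

def build_ledger(blocks, is_valid):
    """Accept blocks in order and stop at the first block this client rejects.

    Everything after a rejected block builds on it, so it is rejected too.
    """
    ledger = []
    for block in blocks:
        if not all(is_valid(tx) for tx in block):
            break
        ledger.append(block)
    return ledger

blocks = [
    [{"op_arg": 1}],
    [{"op_arg": 200}],
    [{"op_arg": 255}],  # the edge case, playing the role of T
    [{"op_arg": 3}],
]

ledger_b = build_ledger(blocks, reference_rule)
ledger_b_prime = build_ledger(blocks, reimplemented_rule)
print(len(ledger_b), len(ledger_b_prime))  # prints: 4 2
```

The two clients share the first two blocks and then part ways for good, even though the rules differ on a single input value.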
So, when this happens again, and then is discovered, who decides which partition of the tree is the "fork" and which is the main branch?
I presume it happens by consensus? But... where does consensus come from, in the general case? Does it operate like much open-source governance, via a mix of public and private meetings on IRC between the maintainers of the biggest clients? When that "community" decides to invalidate the mobster's Bitcoins, is there an appeals process, and do the judges have bodyguards?
How does one get a seat at the Bitcoin developer summit in the first place? Do the contributors of the most code have the most influence? Which in practice usually translates to "friends and allies of the project founder" and "rich individuals and companies who pay developers to contribute to the project in order to influence its direction?"
Short forks happen all the time; they're a natural consequence of the finite speed of light. So long as any divergence is detected and resolved within the window of typical forks (e.g. <6 blocks), it is mostly a non-issue.
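A rough sketch of what "within the window of typical forks" could mean in code. It assumes chains are just lists of block identifiers and uses the "<6 blocks" figure as the threshold; the function names and the sample chains are hypothetical.

```python
SHALLOW_FORK_LIMIT = 6  # assumption taken from the "<6 blocks" figure above

def fork_depth(our_chain, their_chain):
    """Number of our blocks that are not on the other chain."""
    shared = 0
    for ours, theirs in zip(our_chain, their_chain):
        if ours != theirs:
            break
        shared += 1
    return len(our_chain) - shared

def is_routine_fork(our_chain, their_chain):
    # A fork shallower than the limit is treated as ordinary network noise.
    return fork_depth(our_chain, their_chain) < SHALLOW_FORK_LIMIT

ours = ["genesis", "a1", "a2", "b3", "b4"]
theirs = ["genesis", "a1", "a2", "c3", "c4", "c5"]
print(fork_depth(ours, theirs), is_routine_fork(ours, theirs))  # prints: 2 True
```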
The natural thing to accept is the history acceptable to most distinct makes of software (least common denominator) which would be the automatically successful one with no intervention at all if not for the extreme consolidation of hashing power on a couple overly large pools.
In that particular case it was weird because something like 80% of the public network nodes / users / services were on <0.8 but most of the hash-power was on 0.8.0, simply because the two largest pools were already on 0.8. (0.8's performance improvements could increase miners' income, since less time is needed to process and switch to new blocks.)
If you're a merchant: please stop processing transactions until the chains converge.
Is this real? This is like data processing slapstick comedy. I'm imagining the scene where someone steps into a boardroom and says this to Jeff Bezos. "Don't worry, I'll pay you back out of my salary. Wait, how many transactions per second did you say?"
When a Bitcoin company puts up a job ad asking for devops engineers: flee in terror.
Bitcoin is a distributed consensus algorithm. Software which implements a subtly different algorithm might be able to join the consensus until some case arises which triggers the difference; after that, consensus between the implementations would be impossible.
Unfortunately, it's rather hard to demonstrate that two dissimilar complex programs actually implement exactly the same function.
This is doubly true because performance is very important, so some techniques that might help aren't readily available: e.g. define a kind of abstract Turing machine, write the rules for the distributed algorithm in its language, and then have different implementations use the same rules so they only have to prove their Turing machine implementation is functionally identical.
It can be hard to gain confidence through testing because some of the most important rules are for cases which are forbidden, so these cases will not show up on the production network— they only happen when an attacker produces them. There are an infinite number (subject to memory, and already the function input is gigabytes of data) of valid and invalid cases, so exhaustive testing of the whole function is not possible. They can also arise out of implicit behavior, "what happens when this value overflows?", "what is the maximum size of this structure", etc.
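A small sketch of the kind of differential test this implies: feed the same valid and invalid cases to two implementations and report where they disagree. Both checkers here are hypothetical toys, not code from any real client. One sums in arbitrary precision; the other emulates an unsigned 64-bit accumulator, which is the "what happens when this value overflows?" sort of implicit behavior.

```python
U64_MASK = (1 << 64) - 1

def check_exact(inputs_total, outputs):
    # Hypothetical implementation A: arbitrary-precision sum.
    return sum(outputs) <= inputs_total

def check_wrapping(inputs_total, outputs):
    # Hypothetical implementation B: sum wraps around like a 64-bit unsigned int.
    total = 0
    for value in outputs:
        total = (total + value) & U64_MASK
    return total <= inputs_total

def disagreements(cases, impl_a, impl_b):
    """Return every case on which the two implementations give different answers."""
    return [case for case in cases if impl_a(*case) != impl_b(*case)]

cases = [
    (50, [20, 30]),            # valid: spends exactly the inputs
    (50, [20, 31]),            # invalid: spends more than the inputs
    (50, [1 << 63, 1 << 63]),  # invalid, but the wrapped sum is 0
]

found = disagreements(cases, check_exact, check_wrapping)
print(len(found))  # prints: 1
```

Only the overflow case separates the two, and it is exactly the kind of input that never appears on the live network unless someone crafts it on purpose.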
This is further complicated by the fact that Bitcoin implementations use third party code that wasn't written with this kind of must-have-exact-behavior in mind. The reference implementation uses things like OpenSSL's crypto and bignums, various boost data structures, BDB (previously, now leveldb)... and subtle potentially undocumented and unknown behavior from this third party code may be leaking into the definition of the consensus algorithm. The reference implementation has been slowly disentangling these dependencies, but things like the hidden BDB locking limitations (which depend on the layout of the database on disk) are easy to miss.
(Third party code isn't just a reference implementation issue, e.g. BitcoinJ's full node code has had algorithm inconsistencies arising out of undocumented behavior in a database library it uses)
Satoshi's answer for this was that there should only be one implementation of the full node software.
He should have talked to a release engineer. There are always at least two implementations of a piece of software: the stable version, and the prerelease version.
With Herculean effort and focus one could theoretically arrange a system where every customer on Earth runs version N until midnight on Dec 31 and runs version N+1 one microsecond later. You would have to distribute the new code in advance. But it wouldn't ever be perfect, because networks get partitioned: someone's PC would be unplugged when the patch was pushed. And what's the rollback procedure when you push the unforeseen bug?
I think this paper presents a poor security definition for Bitcoin, at the very least because it assumes a non-adaptive adversary (i.e. the adversary cannot change which nodes it controls over time). That is not how things work in the real world; in the real world, the adversary will keep increasing his computational power (the number of "pawns") until his attack is successful. I am also skeptical of a computational model that assumes the adversary's computation is somehow synchronized with the honest parties'; that is not even remotely realistic, and it is very weak.
In other words, that paper does not tell us anything we did not already know: Bitcoin is secure only if nobody does as much computational work as the rest of the network combined (though there may be other attacks on the protocol that have not yet been discovered). If anything, this paper is a step towards formalizing a security definition for a protocol that is similar to Bitcoin, or a step towards proving that there will always be a polynomial time attack against such protocols (as was the case with Merkle's puzzles).
It assumes an anonymous protocol. The identity of the nodes does not matter.
> a step towards proving that there will always be a polynomial time attack against such protocols
I had considered that to be the definition of a majority consensus. I find it sort of surprising that you'd think otherwise.
Let's just assume that some alternative protocol has a property where an attack with >50% computing power would be ignored. Then it follows that it would also allow an attack with <50% computing power, unless "attack" could be detected as a function of network state, in which case any sane system would just ignore those entirely. Bitcoin does exactly that (e.g. a transaction outputting more coins than it inputs is ignored regardless of the hashpower), so they're not the kind of attacks we're talking about here.
Even if you dispense with all the crypto-computing-power-mumbo-jumbo: A _consensus_ ultimately depends on linear energy applied to an attack. Let's imagine a magical version of Bitcoin that solves the sybil problem completely and counts the consensus of _users_ instead of computing power. China (for example) could reorganize the consensus by spending a lot of energy to manufacture a lot of additional people. So long as the attacker put in more energy mining people than all the honest participants, they'd always eventually win.
"It assumes an anonymous protocol. The identity of the nodes does not matter."
Nor would it matter in a protocol where each node is assigned a unique ID. All that matters is whether or not the nodes are malicious and whether malicious nodes can violate some security property of the protocol.
"I had considered that to be the definition of a majority consensus."
This is kind of like saying that a Merkle puzzles approach is the only key exchange system.
"Lets just assume that some alternative protocol has a property where a an attack with the >50% computing power would be ignored."
Why even bother with "computing power?" Let's make the adversary more powerful by allowing attacks that run in polynomial time, and furthermore allowing the attacker to coordinate parties of its choosing, and to adaptively corrupt more parties in the system. Security against such attackers is not at all unheard of:
"Then it follows that it would also allow an attack with <50% computing power— unless "attack" could be detected as a function of network state,"
Yes, that is how secure multiparty computation is usually approached. That is why secure protocols in the malicious model (or some multiparty variant of it) often involve zero-knowledge proofs, commitment schemes, and so forth.
"A _consensus_ ultimately depends on linear energy applied to an attack"
Again, it sounds like you are arguing against the use of consensus systems. It sounds like you are saying Bitcoin cannot be secure (at least not as a digital cash system), at least not by any cryptographic security definition.
To put it another way, would you want the person or group with the loudest voice to prevent you from spending your money, or to take money they gave you back against your will? It is one thing for a currency issuer to destroy an economy; it is another for anyone who spins their CPU to be able to cheat or engage in targeted attacks.
The only reason an agreement protocol is needed is because nodes are allowed to have different opinions (in Bitcoin's case, on the contents of the block chain) and so the protocol agrees on one of these opinions. If a system could be designed such that there is no room for opinion then it would be obvious whether a block is correct or not and thus having more resources would not benefit an attacker. This might require too much synchronization to be practical, though.
A bitcoin node tries to determine what the consensus is about the bitcoin ledger by looking for the longest block chain.
However, in order to do that it must first collect blocks from other nodes and reject any invalid ones. If a certain implementation disagrees with the rest of the network about the validity of a block, then it will disagree with the rest of the network about the state of the ledger. This can then be exploited by an attacker, e.g. by paying a merchant with a transaction that seems OK to him but is not accepted by the rest of the network.
I don't think that this is a big problem though. A node knows that it is no longer in agreement with the majority of the network when it sees an "invalid" block chain that has the most proof of work done on it. In this case it should shut itself down.
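Roughly, that idea could look like the sketch below. The `Chain` record, its `valid` flag (standing in for the result of this node's own validation), and `choose_chain` are all hypothetical names for illustration, not anything from a real client.

```python
from dataclasses import dataclass

@dataclass
class Chain:
    name: str
    total_work: int  # cumulative proof of work on this chain
    valid: bool      # whether this node's own rules accept every block

def choose_chain(chains):
    """Return (best chain this node accepts, whether the node looks out of consensus)."""
    accepted = [c for c in chains if c.valid]
    best_valid = max(accepted, key=lambda c: c.total_work, default=None)
    heaviest = max(chains, key=lambda c: c.total_work, default=None)
    # If the chain with the most work is one we consider invalid, the rest of
    # the network probably disagrees with our rules.
    out_of_consensus = heaviest is not None and not heaviest.valid
    return best_valid, out_of_consensus

chains = [Chain("ours", 100, True), Chain("theirs", 150, False)]
best, out_of_consensus = choose_chain(chains)
print(best.name, out_of_consensus)  # prints: ours True
```

A node seeing `out_of_consensus` come back true would be the one that stops accepting payments or shuts itself down.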
"The smart bitcoin users will choose the best client available, not just the default one."
What you are saying is that the security of Bitcoin depends on everyone other than the smart users. If the block chain is forked because of an incompatibility between two Bitcoin clients, the fact that smart users choose the "best" client is irrelevant; you need most users to choose that client. You should not rely on most of the users of a cryptosystem to be "smart" about using it:
A split that was near 50/50 (e.g. 'most') would be an utterly pessimal outcome, while a 99.9/0.1 split is probably pretty harmless, at least if you talk in terms weighted by some kind of amorphous economic significance.
> The implementation is not yet entirely done, but most core features such as transaction verification, database interaction and network connectivity are tested as working, and the company has released one component of the system for public review...
I am willing to re-state "I have a different definition of 'implementation'"? It actually occurred to me that that is how "full" might be being used in this context (I spent some time recently learning about bitcoin and even ran a class on it a few days ago, and had run into the distinction), but if I were to say I had "an end-to-end website rendering engine written in Erlang" would you not also feel a little disappointed when you got to my page and found only that "most of the core pieces, such as CSS3 selectors, DOM event bubbling, and network connectivity are done and we've open sourced the HTML parser"?
i think there has been some confusion in links and titles: btcd is a full-node alternative implementation of the bitcoin protocol. this was not meant to be misleading and you are right to point out that the btcwire package is far from the entire piece of software.
in another few weeks all the pieces will be public and it will be closer to a full implementation, per my interpretation of your use of the word full.
Some variety in Bitcoin software could be a good thing. However, I'd just like to leave a reminder here that beta-level software is not something you should trust significant amounts of Bitcoins to, or build a business on, until it stops being beta-level.
Some have said that it fixes potential issues with Bitcoin (quicker confirmation time for instance). However no one really knows how much confidence will be needed for real transactions at this point, so most of the fixes are of questionable importance.