Hacker News new | past | comments | ask | show | jobs | submit login
Bitcoin From Scratch – Part 1 (monokh.com)
293 points by monokh 86 days ago | hide | past | favorite | 69 comments

I wrote a toy (but working) implementation of the Bitcoin algorithm here: https://github.com/tlrobinson/tomcoin

The main logic is a few hundred lines of code: https://github.com/tlrobinson/tomcoin/blob/master/src/node.j...

A somewhat unique aspect is since it's written in JavaScript a node can run on the server or the browser, which makes for easy demos:

1. Open https://tomcoin.herokuapp.com/ and https://tomcoin1.herokuapp.com/ in two browser windows (the private key is stored in localStorage so using different domains ensures each gets a different wallet)

2. Click "Start Mining" on one or both nodes

3. Once you've mined some TomCoin (should just take a few seconds unless more people start mining) copy the public key from one node to the other's "publicKey" field, enter an amount, and click "Transfer"

4. Wait for another block to be mined and you should see the balances transfer

(There's no persistence so the chain will reset when all the Heroku nodes idle out and other nodes close)

(It also just uses WebSockets to connect to the instances on Heroku, or in theory other server instances people are running. I meant to implement WebRTC for P2P between browser nodes but never got around to it)

This is great. I especially like that it follows the protocol's implementation for the most part. i.e. you still have utxos and script verification even if it doesn't actually support the array of script op codes.

There is a fully compatible JavaScript implementation that you could use as reference to extend yours: https://github.com/bcoin-org/bcoin

With my post, I tried to boil down the concepts as much as possible to save some mental bandwidth and make understanding the overall easier. We will see how that turns out

    const {
    } = require("./util");
What does this do?

And will this really run in a browser? I think brosers don't have require()?

Es6 destructuring- should run in most modern browsers although I’m a Es5 transpiler fan still!

Most modern browsers have require()? I thought they only have import?

You can see what your functions do by opening the file called util.js in the same directory. And, yes you can run this in the browser by either polyfilling "require" or transpiling it

By that logic, everything runs in a browser. The principle of computational equivalence tells us that one turing complete language can do what any other can.

So yes, everything can be polyfilled or transpiled.

But that does not really meet my understanding of "Runs in the browser".

When I open https://tomcoin.herokuapp.com/ in a browser I see it running. That meets my definition of "runs in the browser".

Would you have been happier if I wrote "written in a dialect of JavaScript which is transpiled to ES5 using Webpack and Babel so it can run in the browser"?

Computational equivalence doesn't include things like networking and resource access if you run in a sandboxed environment.

If you run browserify on an index.js file, it will usually spit out a single .js file that will run in the browser.

> The PoW is suffice if the hash begins with a certain number of 0s

This is a common explanation of PoW, but is actually incorrect. If you think about it, this would mean that the PoW difficulty could only increase (or decrease) by a factor of two. In reality, the block hash is simply interpreted as a (very large) number, and this number must be less than some other very large number (the "target"). So you do end up with a lot of leading zeros -- but these are just a side effect, not the thing being measured.

(author here) Absolutely. Thanks for pointing this out. If difficulty adjusted in factors of two, the ability of the network to accurately maintain its 10 mins blocktime become impaired.

I try to keep these concepts rather simplified to ease the reading and learning process. Perhaps there should be a note for these situations.

In defense of the explanation, sometimes it's asking for an unnecessary mental leap of the reader to think of a bit array as a number, especially if the reader is not accustomed to thinking in base-2 or base-16 number systems.

The hash is just a number regardless-- surely no one has any problem understanding any other number in the protocol as a number! :)

Even this sentence is just a number.

Really it’s not because it’s a sequence of glyphs first and foremost. It can be mapped to a number though.

Nope, no mapping necessary. 36 octets, so a single 288 bit number in base 2. (Big-endian, unsigned, I guess) Then convert to any base you want.

(I should have probably used 32 octets so that it fits neatly in 256 bits)

Isn’t that a mapping? Because the primary purpose of the string of glyphs is readable by humans. Such representation is mapped to the numerical representation for storage in the computer. It may also be mapped to lines of ink written on a piece of paper for transmission

Oh, in my example, it's just a coincidence that the number was readable by humans! Other numbers may come out as complete gibberish.

The "mapping" you may be referring to is happening outside and subject to how your brain /your browser interpret it. (How do you know that when I posted the comment, that I didn't supply the raw bits of the number?! ;-))

Anyway, it was just a joke for low level-programmers I guess. Have a good day ;-)

You could easily explain the basic principles in human-sized numbers (ie, less than 1024).

TIL. I wonder why that myth is so commonly perpetuated.

It used to work this way, but got changed to a less-than check for performance reasons. The Bitcoin whitepaper [0] even has the original approach:

> The proof-of-work involves scanning for a value that when hashed, such as with SHA-256, the hash begins with a number of zero bits.

[0] https://bitcoin.org/bitcoin.pdf

Bitcoin never worked that way. However the original hashcash proposal was specified as leading zeroes. I presume that’s what you are mixing up with.

> but got changed to a less-than check for performance reasons.

Performance as in "it takes less time to verify the PoW," or something else? Because the "leading zeros" and "less than target" approaches would both take a negligible amount of time.

The story I was told from Bitcoin developers was that it was done for code performance reasons. However, since the less-than check goes back all the way to the code's "Initial commit" [0], I'm not sure if that's fully accurate -- it might have been based on correspondence from the original author. At the very least, the misconception comes from Bitcoin's own whitepaper.

[0] https://github.com/bitcoin/bitcoin/blob/e071a3f6c06f41068ad1...

Weird. That doesn't make any sense to me.

btw, the whitepaper probably used "number of leading zeros" because that's what Hashcash does.

The article mentions the hash is required to begin with a certain number of 0s in hex digits (half byte) whereas the Bitcoin whitepaper refers to it as number of bits. I'm unsure about the comparison (== or >) but the whitepaper seems accurate with regards to the target.

> It used to work this way, but got changed to a less-than check for performance reasons.

I don't know who told you that, but they were weirdly confused.

It not unlikely the case that sometime early in development (long before publication) that it was bits-based, not only is this how the standard hashcash code works-- but the Bitcoin code calls the relevant field that encods the difficulty "bits".

But Bitcoin itself never worked that way, not from the first instant. Changing it would have been a consensus incompatible change, and Bitcoin is still more or less consensus compatible all the way back. (There are outright bugs that make the old software get stuck, but if you fix those, it follows along).

I don't know who told you that, but they were weirdly confused.

The bitcoin whitepaper told them that, in more than one place too, so it's not a simple typo, e.g.

...we implement the proof-of-work by incrementing a nonce in the block until a value is found that gives the block's hash the required zero bits.

I did not read the sentence you are responding to as implying it was changed after final network launch, only that it was changed.

Well there is no evidence of that other than "bits" and that's how hashcash worked. Sounds like an invitation on the meaning of the word "changed".

No bitcoin software was ever released that worked another way.

Gotcha, thanks for the explanation.

There’s a difference between mining bitcoins, and mining for network fees.

What difference would that be? Both the fees and the block reward go to the miner who first published a valid block with hash lower than the current difficulty. The only difference is where those bitcoins came from, but that has no impact on how easy it was to mine.

Yeah as a miner you add the fee and the earned coins to the single coinbase transaction, so it’s a single mining action to get both.

It’s not a myth. It’s a useful simplification.

It's also accurate if you say that hash value H has

-log_2 (H / 2^256) = 256 - log_2 H leading 0s

and allow for fractional number of leading 0s.

The nBits header field closely resembles a 32-bit floating point representation of this number (I say resembles because it's really a floating point representation of the threshold H value).

That doesn't seem like a simplification that misrepresents a relevant, core dynamic of Bitcoin. The exact details of how you ramp up difficulty are a pretty low-level technical issue.

Do you have a source I can look at or do I have to go look at at or can you point to code by chance?

I'm quite surprised I wasn't aware of this if this is actually how it is implemented, the leading zeroes seemed right (and the difficulty by factor of 2 also!)


Someone else linked it below: https://github.com/bitcoin/bitcoin/blob/e071a3f6c06f41068ad1...

Equivalent check in btcd, an alternative Bitcoin implementation: https://github.com/btcsuite/btcd/blob/991d32e72fe84d5fbf9c47...

How is network consensus achieved on the difficulty adjustment?

Each node independently comes to the new difficulty on their own in an entirely deterministic way.

After 2 weeks there should be 2016 blocks mined, 1 block every 10 minutes. After each 2 week period every node recalculates the new difficulty based on the actual number of blocks mined in the last 2 weeks. If the actual number of blocks produced was 25% higher than the target of 2016, then simply increase the difficulty by 25%.

Thanks. Given that this relies on subjective timestamps, is this really entirely deterministic? I can imagine that one node counted 2014 blocks in the past two weeks, another 2015. What happens in such cases?

Difficulty only changes at heights that are a multiple of 2016.

It is computed using only information embedded in the block chain, specifically the timestamps of blocks 2016 * i and 2016 * (i+1).

But aren't blockchain timestamps unreliable? It was my understanding that difficulty was based on subjective node timestamps..

The miner embeds their current (local) timestamp in the header, so there can be small variations. In fact the only requirement on the next timestamp is that it not be less than the median of the past 11 timestamps.

Once embedded in the header, everyone will agree on past timestamps as they agree on the chain with the most accumulated difficulty.

For people that need or want to know (really understand) in depth about Bitcoin without any previous background in Computer Science (for example, in Finance, investors, journalists, historians, curious people in general) I have to recommend Ted Nelson's "How Bitcoin actually works" (2014) from his series "Computers for Cynics" [0]. I enjoyed it. (+)

[0] https://youtu.be/3CMucDjJQ4E

(+) For the people with background in CS: it might be that the video at some point talks about the number of zeros instead of the comparison, but really, it does not detract a 'bit' from its value.

edit: formatting

Here is a simple bitcoin implementation in ruby including the networking for interested, that I did some time ago:


Clear, concise and to the point. Looking forward to reading the rest. If you finish the whole series and fix/improve the posts in this sequence using the feedback from here, I think it might serve as one of the go-to pages for technically inclined people who are interested to learn about Bitcoin and blockchains.

Except they’re polarising as hell. Expect someone to chime in and state how they are a complete waste of time and energy.

That's generally true. The trouble is that your post more or less does to the thread what you're predicting that people will do to the thread—the polarity of the high-order bit doesn't matter much. It's best to abstain.


https://github.com/lhartikk/naivechain is a rather decent introductory implementation as well.

Love this way of learning about tech. : build one yourself (the truly noble way would be to try and build without having seen the specs, just from a description of what it does, but that'd take ages).

However: when it comes to implementations, the real hard part of the whole Bitcoin affair is where the gnarly details hide, namely consensu, and it is missing at this point.

Mining part is confusing for me. Is POW is essentially to achieve difficulty level ? You do in this while loop until you have achieved this difficulty level ? Is this the primary factor that differentiates having various protocol in crypto ?

Yes. Proof of Work is required so that there is a global limit on the rate at which new coins are created.

It also:

- Acts as the primary mechanism to make the data in the chain immutable, as the cumulation of the work (chain of valid blocks) becomes harder and harder to reproduce.

- Anti spam, i.e. you need to put something at stake (electricity and hardware cost) to participate in the issuance and transaction ordering

There are different ways to think about it, but I find this the most illustrative.

Mining is about authorship of the blockchain. The system is simple enough; people broadcast transactions, and they get collected and entered into a shared ledger. But the details are tricky. Order is one major issue; does A or B come first? A and B may be mutually exclusive, so whichever is first is critical. Because the network is imperfect, and network users may be malicious, it is very important that we come up with an extremely robust way of deciding how transactions are entered into the blockchain. Authorship - that is, mining a block - is everything.

The simple solution is to make it random. If it's random, then no individual miner can reliably control authorship, and thus cannot engage in attacks like double spends (unless they control a majority of the mining power, i.e. a 51% attack).

POW is a way to achieve randomness in a robust way. The only way to mine a block is to solve a problem that can only be solved via brute-force hashing. This is costly to do, so incentives are offered to get people to do it via coinbase rewards and transaction fees. This adds to the security model because miners now have strong economic ties to the POW algorithm and blockchain with capital and marginal costs.

Of course, there are other ways to achieve randomness, but can you think of one that will force a decentralized network to achieve robust consensus (i.e. to not fork)? Much of the 'magic' of Bitcoin comes when you realize the incentives under which miners operate. Once a valid block is broadcast on the network, miners have extremely powerful economic incentives to accept this block and begin mining on top of it.

Consider what happens if, instead, the miner continues to mine at the 'old' block height. They may come up with a valid block fairly soon (it's random, so they have no way of knowing), but each moment that passes, the other miner's valid block is being transmitted over the network. Any subsequently transmitted valid solution is at a disadvantage in becoming the consensus, and that grows as time passes. Basically, any time spent mining on the old block once a valid one has been seen is pure waste. Since the entire network operates from these same incentives, consensus grows very strongly as no one wants to mine on a minority chain.


Difficulty is a different issue; this is just a way to keep block times fairly consistent, even as the network grows and more mining power comes online. Because it is calculated directly from the block chain, difficulty rules follow the same strong incentives that build consensus on the majority chain.

Makes bit more sense now than I read originally. But I guess I have lot more questions now than before i came to read. Will have check more on this

Please finish this series! Especially the networking and consensus models it's an interesting read, really like that it's in Rust as I'm also learning Rust so it's nice to see the concepts in that language :)

> the transaction needs to be "Signed" by the `from` public key

Is that so? The code actually signs with the private key. Although I might have misunderstood what that sentence refer to.

Thanks, that was a mistake. I've corrected it.

Thanks for this info Monokh. Just want to raise a minor typo, which is the first line of code in "Signed transactions". I think it's missing two of the map keys.

I really like the intention and good will behind the post but almost every part is incorrect (even on an abstract level not restricting to bitcoin). I feel like OP is yet another someone who needed the keyword "blockchain expert" on his/her linkedin profile and trying to justify it now with misleading and incomplete information.

This is HN; you are supposed to point out (like some people have in this thread about the leading 000 count) the incorrect parts, providing some links or explanation why they are incorrect. Many people probably clicked here to learn something and so it's helpful to point out what they should be learning instead of this if it is 'incorrect'.

(author here) I'd love to know the inaccuracies so I can address it. Generally I tried to simplify things to make it easy to follow in code. Anything worth noting can be added and I've done this with some things brought up in this thread. I'd love to do the same with your suggestions.

I already qualify for the "blockchain expert" tag by working in the space for 2 years, though I don't find it that desirable. Maybe a few years ago.

Could you give some examples of what is incorrect?

It would be nice to be told what programming language this page is using.

Bookmarking this one.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact