Hacker News new | past | comments | ask | show | jobs | submit login
Show HN: Legion, an as-simple-as-possible blockchain server written in Haskell (github.com/aviaviavi)
239 points by aviaviavi on May 29, 2017 | hide | past | favorite | 71 comments



I'm tempted to just make a youtube video going over this code line by line.

It's a really great introduction to a non-trivial bit of Haskell.


I want to chime in that this would be amazing also. However, an actual line-by-line video might be less helpful for those without any Haskell or blockchain experience.

I (and thousands of others at HN) really enjoyed Jeremy Ruten's tutorial of building antirez's 1000-line text editor, Kilo, from scratch (antirez is the author of Redis which is renowned for its code quality and approachability) - http://viewsourcecode.org/snaptoken/kilo/ . Additionally, the author of that walkthrough has published the project used to create that tutorial here - https://github.com/yjerem/leg

I realize this might be less trivial than creating a Youtube video, but if you have the bandwidth I'm certain creating a similar walkthrough for Legion would not go unappreciated here :)


I actually think that if I did it line by line in the traditional haskell style that'd be more approachable for this project.

Kilo is quite a bit bigger.

With respect and praise offered to the author, Legion is a simple project. I think an approach of building it interactively and showing how a developer actually makes the types line up and uses the tooling probably answers more questions about what it's like to use Haskell than you may give it credit for.


Regarding the Kilo tutorial: a text editor is something that most people intuitively understand how to use, but the author took the time to explain the challenges that needed to be solved in order to create a terminal text editor, and built it incrementally while explaining each problem encountered and then how each iteration provided a solution.

I can definitely also see your argument that a smaller and more straightforward project may be easier to explain. As someone at least superficially acquainted with how blockchain technology works, my fear was that a lack of understanding of the thing being built might be a stumbling point for readers/listeners of a line-by-line code walkthrough.

However, you seem confident that a line-by-line will be sufficient to explain both the Haskell and bitcoin sides of things concisely, so I'll defer to your judgment, and look forward to it!


Hah, I think the challenge of explaining blockchains is why anyone thinks they solve distributed systems problems and then working through how they don't.

Anytime anyone asks me how they work I say, "They don't, actually. They're really a data integrity system that's being misused, and the current economically motivated implementations (currency) are motivated by the assumption that a majority of computing power in the system is guaranteed to work with the system and not try to take it over."

I will probably avoid that conversation.


This is a cool idea. I would love to do this, but realistically won't have time for a little while. I can certainly look into it more when I get some spare time!


That would be awesome! Hopefully over time I (or anyone contributing) can spend a little more time making it as idiomatic as possible and beefing up the docs a little bit.

Please let me know if you end up doing that!


Okay.

And your code is just very clean and approachable and doesn't try to do popular gymnastics. I'll ping you via GitHub.


Would it make sense to make the code Literate Haskell?


A Literate Haskell blockchain might just be Peak HN.

(Definitely peak HN if it runs in the browser).


Is GHCJS using WASM yet, because if so...


There's a Summer of Haskell project going this year. I think it's called WebGHC?

Edit: https://webghc.github.io


Haha, when that happens, what will it even mean anymore. What even will software be.


A literate blockchain implementation?? YES please!!!


I'm not familiar with Literate Haskell, what would that involve?


"Programs should be written for people to read, and only incidentally for machines to execute." (Structure and Interpretation of Computer Programs" by Abelson and Sussman)

The idea is that you write a "Book" and the program is extracted, from the book.

Literate Programming Wikipedia: https://en.wikipedia.org/wiki/Literate_programming

A nice talk about it: https://youtu.be/Av0PQDVTP4A


It's just where the majority of the file is a comment instead of a minority.


Thank you. Thank you so much.

It's an old pet peeve of mine. I cringe inside every time someone starts waxing about "writing your thoughts first, and your code only incidentally" or "it's a completely different paradigm of programming" and other such pieces of useless, discouraging new age fluff.


Please do


Great use of distributed-process and distributed-process-p2p! Being able to access service discovery and peer-to-peer communication as simple Haskell packages is kind of amazing.

* https://hackage.haskell.org/package/distributed-process-p2p

* https://hackage.haskell.org/package/distributed-process


Those libraries were essential for this project! High quality. Are you a contributor to them?

I actually had a slightly tough time finding good examples to work off of for p2p, (the cloud haskell website had the best ones), so hopefully this code can also work as another good example for those packages. To be fair, that's really a complaint of the ecosystem as a whole, not distributed-process-* specifically.


I had a brief look through, and it looks like there's no proof-of-work in the mining?

What stops somebody from rewriting history and passing it off as legitimate?


Nothing. It's a minimal blockchain.


A blockchain without a consensus algorithm is just a linked list.


Correct, the goal for this was as minimal as possible.

However, if I (or anyone wanting to contribute) can come up with a very compact way to implement that, we could definitely add it!


A very simple proof of work is adding a token to the data before hashing:

sha256(token + data + previous hash)

i.e. (for simplicity, previous hash = "")

sha256("1" + "example" + PreviousHash) -> ee73735c2355bc4d63319a6638e356ef42d0d56746317c99f52d3e7ac71cbf52

sha256("2" + "example" + PreviousHash) -> 6236cd42286c74a1683eaf394b3b2f40a3a52479e1a3d7e732175fcdfa49b931

...

sha256("10" + "example" + PreviousHash) -> 346fea90403e733fd6e925920eb10be06171b4f77f7cc8c4ef214a12f04d79c1

And only accepting hashes as valid where the first two characters form "34". So the last example (token 10) would be valid. This requires then to do multiple hashing operations until a valid hash emerges, thus the proof of work.


Note that "the first X characters" is kind of misleading, because it is usually more fine grained.

It is better (and IMHO even simpler!) to think of the hash as a large integer, not a string. Then you want that integer to be smaller than a certain value.

With strings, you can only say: "00123" shall start with "00", while wih integers, you would say: 00123 shall be smaller than 300.

(To be that fine grained with strings, you would either need multiple prefixes ("000", "001", "002") regexes ("00[012]"), both more hassle than a simple less-than operation.)


I agree with the gist, but note that even with strings, "00122" is less than "00200" :)


Oh, you are right! However, my point still stands that this is not about prefixes, but about a less-than comparison.


You can implement 0 prefix checking pretty easily in bitstrings:

    0 == hash >> (hash_length - prefix_length)
Or, alternatively:

    !(hash & prefix_mask)
where `prefix_mask` is 1111..000, with `prefix_length` number of 1s.

I think that working in terms of bitstrings generated by the hash gives more context to why those integers are chosen (that they represent a certain portion of the space of possible answers), but it's also a matter of taste/background.


This is still more combersome yet less precise than a straight greater-than check.


Thanks for simple explanation! Maybe between myself and perhaps 18nleung we can get a simple version of this in there


To implement rudimentary proof of work, you could add a nonce field and then force every block hash to start with a certain number of zeroes - I'll see what I can do and submit a pull!


Awesome, please do!


Neat. Just browsing through the source, is there a reason that:

   calculateBlockHash (Block i p t b _)  = concatMap hashString [show i, p, show t, b]
is not

   calculateBlockHash (Block i p t b _)  =  hashString $ concat [show i, p, show t, b]
i.e. why is a block hash a concatenation of 4 SHA-256 hashes instead of just 1? Is there some security benefit of doing it one way vs. the other?

edit: thinking about it a little more, option 2 seems better because in option 1 you can calculate 3/4 of a block hash without knowing the previous block's hash. Maybe that's not a problem in this context though?


You're correct, it really should be the 2nd way. The current form isn't really a problem per se, mostly just that the generated hashes are really long and make the output harder to read. I'll get that updated, great catch.

EDIT: code updated.


Of course there are name collisions everywhere, but, there's a metacomputing/grid computing/distributed OS thingie named "Legion" from the University of Virginia that has a bunch of papers about it in the CS literature.

Oh, and we saw off the American Legion threatening to sue us by saying "We're the State of Virginia, go right ahead..."; what's your strategy? :-)


Heh, well if there are actually problems I can definitely change it... But I only meant this to be a fun, small, open source blockchain example (and to learn more Haskell). So I didn't put too much effort into the naming part :)


You mean this project might be summoning daemons that will claim, "Our name is Legion"?

http://bible.oremus.org/?ql=363093142


Humorously, the professor who named it thought the quote was from Heinlein. Not much of a Bible-reader.

Edit: Apparently I'm not very good at remembering my SciFi, maybe it's this? https://en.wikipedia.org/wiki/My_Name_Is_Legion_(short_story...


Nicely done. It reads like Ikea instructions. I didn't really know how blockchain worked until I read this code. If this was written as a teaching tool or a portfolio piece or even an art project, then I say its a big success.

One question:

    getBlockChain :: (SpockState m ~ BlockChainState, MonadIO m, HasSpock m) => m [Block]
What does the squiggle mean?


If you learned a thing about blockchains from this project, I'd call it a big success too! :D So happy to hear it was helpful


Thanks for making this. It reminds me of the day I fell in love with Haskell. I wanted to learn how a Verilog simulator works. There's a couple good open-source ones but I found the code impenetrable, so I went looking for something like an introductory course and found this obscure, disused code called Hydra which is hard to find. Let me see ... here it is:

https://github.com/garry-sharp/x/tree/master/CA4/HydraLib-0....

After reading for just an hour, the lightbulb came on and I not-only felt I had learned how a digital simulator works but also what a mighty hammer Haskell can be for teaching.

Since then I've read (and unfortunately written) a lot of Haskell that is far less elucidating. It's great to come across something that reminds me how I got turned on in the first place and that I should aspire to produce code that not-only barely gets the job done but also explains itself and embodies a modicum of beauty. Thanks again.



Thanks, but I still don't get it. So SpockState m equals BlockChainState? So getState's type is m BlockChainState?

    (BlockChainState chain _ _) <- getState
But it has to be anyway so, I guess I'm still confused. Is the squiggle for extra documentation or is it a hint to help the type inference? Why would type inference need help? Does it have to do with some of the extensions that are applied in the source (but appear to be otherwise unused)?


Yeah, it's for the compiler, otherwise it can't figure out that a BlockChainState is actually a SpockState m. I'm not really sure how it could otherwise, since the definition is completely decoupled from Spock.

It should be noted that it's entirely possible there's a less confusing way to write that type signature. I'm far from an expert in Haskell.


I'm farther from an expert than you. I tried (and failed) to find a less confusing way...

One thing I see people doing when they have real gnarly type signature is to concentrate the horror in one place and paint a veneer of simplicity elsewhere:

    type Get stateType returnType
      = forall monad.
      (SpockState monad ~ stateType, MonadIO monad, HasSpock monad)
      => monad returnType
Then elsewhere:

    getBlockChain :: Get BlockChainState [Block]
    getLatestBlock :: Get BlockChainState Block
This may only help superficially as type errors will still be hard-to-understand.


Ah, I think this is a better though! If you got this working, feel free to open up a PR. Or I can look into it when I get time.


Also, unless I'm mistaken, TypeFamilies are the relevant GHC extension here. I'm not exactly sure how this would be written differently without that


My takeaway from this was to learn more about Literate Haskell. As somebody points out a Blockchain without a consensus mechanism is simply a linked list ie this is not really about pushing the envelope of Blockchain tech.


In case you're not aware of it, also check out Haskoin, a Bitcoin implementation in Haskell.

https://github.com/haskoin/haskoin


Can someone explain to me in short summary why blockchain without crypto currency suddenly is a thing and why anybody would want it?


ELI5: A blockchain is an »I told you so« store. Future statements/promises are linked to past ones so you can’t mess one without having to mess with all the others. It’s like a black board you can’t wipe clean again, adding more text (and space to write to) is the only thing possible.

For example, you could promise to paint your house red in a blockchain. You later regret that promise and tell your friends you never said that, but since they all have access to all promises made, they have good evidence that what you say is not true.


Thanks, very good explanation. So blockchain enables us to move from "our data warehouse is the single source of truth" to "our data warehouse is the single History of truth".


I'd say more like "this tiny [latest] block is a strong proof of history".


Except without proof-of-work, rewriting all of history is trivial.


A simple counter-example comes to mind: rewriting history is trivial only if you want no central authority.

Even if we lose the "decentralized consensus" aspect with no PoW, it still can be advantageous. E.g.: simply having an authoritative server always responding with the latest block hash. The server is very lightweight (as in 80s microcontroller lightweight) and it can offload the actual blockchain handling to clients.


Thanks for sharing, the code looks very elegant. This alone makes me want to sit down and learn Haskell.


Thanks, you definitely should! It's very fun to write code in.


Haskell is known for that


... had this been written in a language less esoteric than Haskell, it would've been even better for the rest of us :)


Almost exactly my thought every time I see something written in JS.


I suggest you take a look at https://github.com/lhartikk/naivechain, the original version of this work, written in JS


I second the recommendation for Naivechain. It is really easy to understand.

For me, reading the Javascript code of Naivechain made me understand Blockchains in an easier-to-understand way than all the tutorials that try to explain how to implement a blockchain!!


Haskell is only esoteric if you understand programming to be mutating variable contents. Once you let go of this odd way of defining program logic, Haskell is actually a much simpler way to program, since you're writing your logic directly, instead of emulating it using mutation of state.


I think it's too bad we don't have module interoperability. There's never going to be just one programming language.

The situation on the JVM -- where Scala, Java and Kotlin can use one another's modules -- is a positive example. Clang modules and Swift's auto-magic import of them is another.


I'm pretty uninformed about libffi but can it help with kind of module interoperability? I've used it to get Ruby to call C functions. I would guess that it could be used for other dynamic languages to call compiled functions such as JS to Haskell like the OP seems to be desiring. Is this true?


I think the problem is more around signatures, actually. If we can find the code, we know how to call it (libffi makes that easy) but how do we expose the library interface?

Clang modules takes some steps towards that for native code. On the JVM, everything is a Java class in the end, with Java type signatures for its methods (which can point to other Java classes, and so on) so rich cross language interfaces are a solved problem.

Other posters have raised concerns about garbage collection or "losing all of the nice features of whatever languages you're using" but I have some doubts:

* Interface definition languages like Protobuf or Cap'N'Proto actually show us just how much commonality there is across languages. They provide for many features that are fancy relative to C but not really relative to anything else; and the interfaces are definitely useful.

* Many languages with sophisticated type systems work by compiling object code and then appending an interface file to capture the type information. Haskell, Rust, OCaml and many others do this, essentially treating every module as foreign code with additional type information.

A good cross language module system would need those features that make sense across languages -- parameterized array and map types among them -- and this does mean, not every piece of Python (for example) would make sense as a Rust library. Consider the interface of the linked project, though -- only `mineBlockFrom` needs to be rethought to be typable in a way that makes sense across many languages.


Most languages have some degree of interoperability with C, but that means losing all of the nice features of whatever languages you're using, even if they're implemented on both ends.


The hardest part of combining two languages through a foreign function interface (eg Ruby/C using libffi) is making their garbage collectors work together. You basically have to either use a single garbage collector across the language boundary, like the family of languages implemented on the JVM do, or require one or both of the languages to implement GC-less memory management, like in the Ruby/C case.


Well uh, value judgements aside my reading of those code is that it is short, straightforward in its choices and generally quite good.

I'm not sure what there is to be mad about. This seems pretty approachable.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: