
Scaling Blockchains with Apache Kafka - nestlequ1k
https://blog.gridplus.io/scaling-blockchains-with-apache-kafka-814c85781c6
======
galeaspablo
> If you aren’t familiar with the database pattern known as event sourcing
> (don’t worry — it’s relatively new),

It's not relatively new. That “transaction file” thing in your database? Event
Sourcing.

[https://goodenoughsoftware.net/2012/03/02/case-studies/](https://goodenoughsoftware.net/2012/03/02/case-studies/)

> If you’re not looking at the public chain, you’re wasting your time

I disagree. Not having a single point of failure (one place that can get
hacked) is valuable.

> From a trust perspective, it makes no difference if your banking cartel is
> writing to a Quorum, Hyperledger, or Kafka instance.

Of course it does. The protocol of blockchains makes them work with "proof of
X". Appending to any event store, whether in Kafka or SQL does not require
proof of anything.

> Blockchains are built for trust, databases for throughput. Event sourcing
> allows us to achieve a hybrid model with characteristics of both.

No, the reason blockchains can't have high throughput / almost infinite
horizontal scalability... is because there's a logic check. E.g. in Bitcoin,
you can't send more bitcoins than your balance. Event sourcing gives you
high throughput only if there are no logic checks across aggregates --- if there
are, you won't have immediate consistency, and you have to be ready for
compensating events.
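Rough sketch of what I mean (Python, hypothetical names, not any particular framework): per-aggregate appends are fast because nothing is checked globally at write time, so a rule that spans state has to be repaired after the fact with a compensating event.

```python
from dataclasses import dataclass, field

@dataclass
class Account:
    """One aggregate: events are appended with no global check at write time."""
    events: list = field(default_factory=list)

    def apply(self, amount: int) -> None:
        # Append-only: we never mutate past events, only add new ones.
        self.events.append(amount)

    @property
    def balance(self) -> int:
        # State is derived by replaying the event log.
        return sum(self.events)

acct = Account()
acct.apply(+100)
acct.apply(-150)          # accepted immediately -- no balance check on write

# A later (eventually consistent) check notices the violated rule
# and appends a compensating event instead of editing history:
if acct.balance < 0:
    acct.apply(+150)      # compensating event reversing the overdraft

print(acct.balance)       # 100
```

The point being: the overdraft briefly existed. If you can't tolerate that window, you need immediate consistency, and the throughput advantage evaporates.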

I recommend two books that cover event sourcing from a Domain-Driven Design
perspective. The consequences are similar.

[https://www.amazon.co.uk/Domain-driven-Design-Tackling-Compl...](https://www.amazon.co.uk/Domain-driven-Design-Tackling-Complexity-Software/dp/0321125215)

[https://www.amazon.co.uk/Implementing-Domain-Driven-Design-V...](https://www.amazon.co.uk/Implementing-Domain-Driven-Design-Vaughn-Vernon/dp/0321834577)

-----------------

If that doesn't do it for you, please just remember the good old CAP theorem.

[https://en.wikipedia.org/wiki/CAP_theorem](https://en.wikipedia.org/wiki/CAP_theorem)

~~~
hudon
>> From a trust perspective, it makes no difference if your banking cartel is
writing to a Quorum, Hyperledger, or Kafka instance.

> Of course it does. The protocol of blockchains makes them work with "proof
> of X". Appending to any event store, whether in Kafka or SQL does not
> require proof of anything.

The author should have qualified that from a user's perspective, it makes no
difference. If my bank decided to store its users' transactions on a proof of
work database, I wouldn't even know. Which is the author's point: it makes no
difference from a trust perspective, I'm still trusting the bank to store and
settle my transaction either way.

It's not proof of work by itself that makes something like Bitcoin trustless
(again, from the user's perspective). It's the fact that both the proof of
work and blocks are public and verifiable, thus I can validate the blockchain
and make sure the miners are doing the work correctly (my transactions are
there and the proof of work is valid). Proof of work without making the
database public and audit-able by users is pointless. But if it is public and
it's shown that the miners are not settling transactions as they should, then
users can fork or move to a blockchain that doesn't censor transactions.

~~~
galeaspablo
May I refer you back to

>> If you’re not looking at the public chain, you’re wasting your time

> I disagree. Not having a single point of failure (one place that can get
> hacked) is valuable.

I.e. if your chain isn't public there are benefits to using it.

If you suddenly say the same benefits can be obtained with kafka or a
relational database, you will be introducing proof of something... Which means
you'll now have a blockchain / distributed system based on a relational
database. Which comes with the limitations imposed by the CAP theorem.

The most popular version of event sourcing produces such high throughput
because immediate consistency is sacrificed. I'd like to see what the author
proposes in a production system. Global rules would not be enforceable (e.g.
no balance under zero) unless throughput is sacrificed to allow for immediate
consistency.

------
buckie
Preface: lead for Juno & ScalableBFT

First, some additional benchmarks:

* Juno (w/ hardcoded language): 500 tx/s

* TendermintBFT w/ EVM: 1k tx/s

* ScalableBFT w/ Pact: 8k tx/s

The thing about the high-performance private blockchains is that they are
limited by sequential smart contract execution performance. Juno ran an
embedded "rough draft of a language" so it doesn't really count (not a full
language, more like a complicated hardcoded counter). From TendermintBFT's
docs, if memory serves, they say that if you hardcode a counter they hit
+10k/s. For ScalableBFT, it's ~14k/s. This is a minor difference, by the
way, that isn't due to the consensus mechanism but more to the engineering of
the system.

The reason for the non-hardcoded performance difference is that ScalableBFT
runs Pact for smart contracts, which was designed to be a high performance
interpreted language in part because of this bottleneck. Even if/when the EVM
moves to WASM, the performance bump only impacts fat contracts by making them
kill performance less. As in, if your 10k-step contract takes 200ms to execute
and that drops to 2ms you can get 100x perf (not quite but it's fine)... but
that only takes you from 5/s to 500/s and not to 1k/s or 8k/s.
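The back-of-the-envelope version of that, assuming a single sequential executor where contract execution is the only bottleneck:

```python
# Throughput is bounded by 1 second / per-contract execution time.
slow_ms = 200   # per-contract execution before the interpreter speedup
fast_ms = 2     # after a hypothetical 100x speedup

tx_per_sec_slow = 1000 / slow_ms
tx_per_sec_fast = 1000 / fast_ms

print(tx_per_sec_slow, tx_per_sec_fast)  # 5.0 500.0
```

So a 100x execution speedup on a fat contract still caps you at 500/s, below the 1k/s and 8k/s consensus-level numbers above.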

The numbers above are for a simple coin transfer contract, so the performance
is mostly dependent on the read/write performance for keys in the DB. There's
just not much contract-level work to do when you're transferring coins between
accounts, so the WASM move won't bump things up much if any.

More broadly, I think that the article misses the point of private blockchains
when it discusses them:

> I would discourage you from blockchain consortia if your intention is to
> never use the public chain and if you don’t care about Ethereum. I’m going
> to put it bluntly: if you’re not looking at the public chain, you’re wasting
> your time. The benchmarking numbers paint a pretty obvious story — Quorum
> will never give you the speed of Kafka, especially since blockchains get
> less efficient as more participants join (because of that pesky “consensus”
> thing).

They serve a specific purpose: being a multi-administrative DB. Distributed DB
systems (like Kafka, raft-based systems, etc.) can't robustly/safely serve
that end.

I have a longer comment about it here:
[https://news.ycombinator.com/item?id=14853521](https://news.ycombinator.com/item?id=14853521)

------
GordonS
> If used correctly, it is tamper-proof, just like the blockchain

Is tamper proofing typically a feature of event sourcing systems? If so, how
is it implemented?

~~~
olalonde
Yes in the sense that event sourcing systems typically have an "append only"
data store which gives a full log over all state changes. That makes event
sourcing particularly attractive for finance, gambling, etc.
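Minimal sketch of the pattern (Python, names made up): the store only exposes `append`, and current state is always derived by replaying the log, never by overwriting a stored value.

```python
class EventStore:
    """Append-only log: the write path has exactly one operation."""
    def __init__(self):
        self._log = []

    def append(self, event):
        self._log.append(event)

    def replay(self, initial, apply_fn):
        # Derive state by folding over every event ever recorded.
        state = initial
        for event in self._log:
            state = apply_fn(state, event)
        return state

store = EventStore()
store.append(("deposit", 100))
store.append(("withdraw", 40))

balance = store.replay(
    0, lambda s, e: s + e[1] if e[0] == "deposit" else s - e[1]
)
print(balance)  # 60
```

Every historical state change stays in the log, which is why auditing a balance is just replaying a prefix of it.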

~~~
GordonS
But surely the append only property is rather weak, unless it is backed by
physically read-only storage? Otherwise the system itself may not be able to
change existing events, but a rogue admin or other bad actor could.

~~~
olalonde
Yes, event sourcing is just a loosely defined pattern. It does make tamper
proofing a bit easier due to its log like nature. I know some systems do use
physically "append-only" storage.
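One software-only way to at least make tampering *detectable* (a sketch, not how any particular production system does it) is to hash-chain the entries, blockchain-style, so editing any past event invalidates every digest after it:

```python
import hashlib
import json

def chain(events):
    """Return (event, digest) pairs; each digest covers the event
    plus the previous digest."""
    out, prev = [], b""
    for e in events:
        payload = prev + json.dumps(e, sort_keys=True).encode()
        h = hashlib.sha256(payload).digest()
        out.append((e, h))
        prev = h
    return out

def verify(chained):
    """Recompute every digest; any edited entry breaks the chain."""
    prev = b""
    for e, h in chained:
        payload = prev + json.dumps(e, sort_keys=True).encode()
        if h != hashlib.sha256(payload).digest():
            return False
        prev = h
    return True

log = chain([{"op": "deposit", "amt": 100}, {"op": "withdraw", "amt": 40}])
print(verify(log))   # True

# A rogue admin rewrites history but keeps the stored digests:
log[0] = ({"op": "deposit", "amt": 999}, log[0][1])
print(verify(log))   # False
```

It doesn't *prevent* an admin with write access from rewriting everything (digests included); for that you'd still need WORM media or an external anchor for the head digest.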

~~~
GordonS
Do you have any more info on these RO storage systems? What springs to mind
is... tape?!

~~~
antonvs
There's some info here:

[https://en.wikipedia.org/wiki/Write_once_read_many#Current_W...](https://en.wikipedia.org/wiki/Write_once_read_many#Current_WORM_drives)

