
BigchainDB – A scalable blockchain database - jashmenn
https://github.com/bigchaindb/bigchaindb
======
warent
Why use Python rather than a statically-typed language? This thing looks like
it will become a resource-gobbling beast that runs very slowly.

~~~
trentmc
Hi, it's Trent here, CTO at BigchainDB.

Summary: Python isn't the bottleneck yet, and if it becomes one, C will become
the last 1%.

I've been working on production apps in Python since 2002, including ones
doing large-scale compute running 1000+ machines at once. How: 99% Python, 1%
C. But the trick is, you only drop down to C once you've worked out all the
kinks and optimized the big-picture stuff elsewhere. Python is great not only
for connecting things, but also for rapidly iterating on algorithms and
building maintainable code.

The AI / ML community has discovered this too: Python is now the most popular
language in that community, despite the heavy compute. How: most of the
popular libraries have efficient C (etc.) implementations under the hood.

This is exactly the philosophy we've been following at BigchainDB, with
success: Python to connect things, iterate quickly on improving algorithms,
and ship maintainable code. We haven't yet got far enough to resort to building
our own C libraries, though many third-party libraries we use are
implemented in C.

[EDIT] Based on the comments below, I'll now mention here too: BigchainDB
wraps MongoDB, which is written in C++. And, Python 3.5+ (which BDB uses) has
gradual typing, which brings many benefits of static typing to Python.
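To illustrate what gradual typing means in practice, here is a small, hypothetical Python function (not BigchainDB code). The type hints are optional and are not enforced when the program runs; a separate static checker such as mypy reads them and reports mismatches before the code ships.

```python
from typing import Dict, Optional


def current_owner(owners: Dict[str, str], asset_id: str) -> Optional[str]:
    """Return the owner of `asset_id`, or None if the asset is unknown."""
    return owners.get(asset_id)


owners = {"painting-42": "alice"}
print(current_owner(owners, "painting-42"))  # alice

# Hints are not checked at runtime: this call runs and returns None.
# A static checker such as mypy would flag it, because 42 is not a str.
print(current_owner(owners, 42))  # None
```

Because annotations can be added one function at a time, a codebase can move toward static checking gradually instead of all at once.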

~~~
squaredpants
That still doesn't encompass the advantages of a statically-typed language...

~~~
trentmc
I totally acknowledge that there are pros and cons for both dynamically and
statically typed languages. We could start a 10-page discussion here, but
I've been there before, and perhaps you have too. How about we save our energy? ;)

~~~
tluyben2
For the latest and greatest 10 pages about that, just search for Hickey's
10-years-of-Clojure talk (rant-ish in places) and the Haskell community's
responses to it.

~~~
trentmc
OK! :)

------
snissn
It's unclear to me how this is any different from a centralized MongoDB-as-a-service
platform. It doesn't seem to offer any proof-of-work-related security or
consensus building other than a centralized "trust our cluster" policy [1].

[1] [https://docs.bigchaindb.com/en/latest/bft.html](https://docs.bigchaindb.com/en/latest/bft.html)

~~~
trentmc
Hi, it's Trent here, I'm CTO at BigchainDB.

BigchainDB aims to provide the following benefits beyond a traditional database
as a service:

1\. decentralized - no single entity controls it, which means it is tolerant of
malicious / Byzantine faults. Benefit: groups that don't necessarily trust
each other can share infrastructure.

2\. immutable - which, practically speaking, means more tamper resistant, e.g.
append only. Benefit: well-defined provenance for the history of assets,
data, etc.

3\. assets - you can create and issue assets, which you own if you hold
the private key. How: each tx is signed. Benefit: moving assets around on a
substrate that no single entity owns or controls, with lower friction in exchanges.
And the digital signatures give cryptographic proof of who did what.

These are the targets. We're not fully there yet. Most notably, we still need
to address some Byzantine faults as our docs ([1] above) mention. This will
come in an upcoming release. We are also working on improved scalability while
maintaining the security guarantees [2].

Re consensus: BigchainDB has a two-layer consensus, as follows.

* The lower layer directly uses MongoDB's consensus (which it builds on) to agree on whether a transaction should be stored.

* The higher layer uses federation-style voting on whether a transaction is valid or not.

Our documentation describes this further.
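As a rough illustration of the higher, federation-style voting layer, here is a minimal Python sketch. It assumes a simple-majority rule over a fixed federation; the function name `block_status` and the vote format (node id mapped to a yes/no verdict) are hypothetical, and BigchainDB's actual vote structure and thresholds may differ (see its documentation).

```python
from typing import Dict


def block_status(votes: Dict[str, bool], federation_size: int) -> str:
    """Classify a block from the federation's validity votes.

    `votes` maps a federation node's id to its verdict (True = valid).
    Assumes a simple-majority threshold; this is an illustration only.
    """
    majority = federation_size // 2 + 1
    n_valid = sum(1 for ok in votes.values() if ok)
    n_invalid = len(votes) - n_valid
    if n_valid >= majority:
        return "valid"
    if n_invalid >= majority:
        return "invalid"
    return "undecided"  # not enough votes yet either way


votes = {"node-a": True, "node-b": True, "node-c": True, "node-d": False}
print(block_status(votes, federation_size=4))  # valid
print(block_status({"node-a": True}, federation_size=4))  # undecided
```

Requiring a majority of the whole federation (not just of the votes received so far) means a block can stay "undecided" until enough nodes have weighed in.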

BigchainDB is explicitly not trying to do Bitcoin-style proof of work. PoW
solves for an additional problem: (theoretical) anonymity of servers. That
additional goal compromises scale. And in practice you know who's running the
servers anyway (i.e. big Chinese Bitcoin miners), which is why I say
"theoretical". BTW I am a fan of Bitcoin, it just has different goals than
BigchainDB.

[2] [https://blog.bigchaindb.com/bigchaindb-developer-update-2d32...](https://blog.bigchaindb.com/bigchaindb-developer-update-2d329b055d4a)

~~~
willitpamp573
How is this different from MySQL with sharding + encryption?

~~~
trentmc
Neither sharding nor encryption give "decentralized", "immutable", or
"assets". E.g. how many sysadmins can MySQL have? (And the answer is 1. Hence,
not decentralized.)

------
stuxnet79
I find it very amusing that "scalable blockchain database" is the core selling
point here. I think I must be experiencing buzzword fatigue. Why would I want
to chuck my boring MySQL installation for this?

~~~
trentmc
Hi, it's Trent here, CTO of BigchainDB.

FYI we started working on the precursor to this in 2013 (ascribe). Bitcoin
hadn't even hit the mainstream then, let alone blockchain. We didn't build it
to mash together buzzwords. We built it because we saw a clear need for it. We
had been building on Bitcoin, and it (a) didn't scale to meet our needs and (b)
was super hard to use because it didn't act like a database. So we built
BigchainDB to address issues (a) and (b).

Obviously, good old "boring" MySQL is incredibly useful for tons of problems.
If you're already solving a problem with MySQL, then BigchainDB is not a fit.
Don't use it, stick with the thing that's working.

Where it _is_ useful is applications that want at least one of the following
benefits:

1\. decentralized, so that >1 orgs can share resources

2\. immutable / tamper resistance, for provenance of ownership of art, spare
parts, food, etc

3\. assets, so you can exchange digital goods more readily.

More details here: [https://blog.bigchaindb.com/three-blockchain-benefits-ae3a2a...](https://blog.bigchaindb.com/three-blockchain-benefits-ae3a2a5ab102)

~~~
willitpamp573
My Cassandra DB can be "decentralized" too by your definition, because I can
have it distributed between multiple corporations, where multiple people have
read/write access. This is extremely common practice.

~~~
trentmc
I'm not defining "decentralized" as simply "distributed between multiple
corporations, where multiple people have read/write access".

To be "decentralized", it's key that there are >1 sysadmins (i.e. runners of
server nodes); and bad behavior from some sysadmins does not take the whole
system down.

We experimented a bit with decentralizing Cassandra. From what we saw, it
is possible to decentralize it in the way I define it. We also tested other
DBs. We chose RethinkDB because we liked their approach to a global oplog, and
we preferred a document-store interface over a column-store interface. Later
we added MongoDB support because of customer interest and some technical
benefits. (For the record, we added this support before Rethink's woes. Glad to
see it found a foundation. :)

------
yahyaheee
I’m curious about the choice of Python here. If you're building a scalable,
high-throughput data store, Python seems like an odd choice.

~~~
trentmc
See the comment "Why use Python rather than..." and my response to it. Cheers!

------
k__
I never understood how blockchains could scale.

A global list of hashes that is also append only? That just seems bound to blow
up sooner or later.

~~~
orthecreedence
I know. I think if I read "scalable" and "blockchain" in the same sentence
again I'll scream.

Once bitcoin can handle Visa's transaction volume (250M transactions/day) we
can talk scale.

EDIT: after reading the comments more, it seems the project operates without PoW,
which allows it to scale more. I'm curious, then, what differentiates it from
something like CockroachDB. It seems to be an append-only, distributed
database. How does the blockchain fit in?

~~~
trentmc
Hi, it's Trent here, the CTO of BigchainDB.

> I know. I think if I read "scalable" and "blockchain" in the same sentence
> again I'll scream.

I acknowledge the hype out there.

FYI we were working on blockchain in 2013, long before the hype. We started
encountering massive scale problems in 2014 and started working on them in 2015,
long before the "scale + blockchain" hype. So we're not doing this because it's
some fancy combo of buzzwords; we're doing it because we identified a problem
years ago, and have been making steady progress to improve it ever since. It was
a surprise to me to see BigchainDB hit the HN front page today, since we've been
shipping it since Feb 2016!

> Once bitcoin can handle Visa's transaction volume (250M transactions/day) we
> can talk scale.

We're not trying to improve Bitcoin. We're building our own thing. As we
continue to improve the technology, it becomes useful to ever-wider classes of
users.

Re CockroachDB: it's a cool technology. The big difference is that it's
_distributed_ but not _decentralized_. That is, the compute resources (in this
case mainly storage) are spread across many machines; but the control is in
the hands of a single entity / sysadmin. Whereas decentralized means the
control is spread across many sysadmins; and even a few rogue sysadmins won't
take down the system.

~~~
k__
Oh I don't think it's a hype thing, I just think I don't understand it :D

As I understand blockchains, they are basically linked lists of hashes, yes?

And decentralization means that every node has this list, not just parts
of it, so everyone can always check whether the list is consistent.

Also, these lists are append only.

The part where every node has a copy AND the list is append only leads me to
something that doesn't scale well. It will always get bigger with every action
that is appended AND it will always be multiplied by every node in the
network.

I'm probably missing something here, but that is my current understanding of
blockchains, haha.

~~~
trentmc
Many people disagree about what blockchain technology is. To me, it's about the
characteristics, rather than _how_ they're implemented. I see three
characteristics: decentralized, immutable, assets [1].

Under this framing, the "linked list of hashes" is one partial way to achieve
immutability. And "every node has this list" is one partial way to achieve
decentralization. But that's only part of it. E.g., you need to address: what
if a node acts badly? And you want a means to create & issue
assets.
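To make the "linked list of hashes" idea concrete, here is a minimal Python sketch of a hash-chained, append-only log using only the standard library. It shows why such a chain is tamper-evident (editing an old entry breaks its link), not how BigchainDB stores data; the `HashChain` class and its methods are hypothetical. It covers only the immutability part: nothing here handles misbehaving nodes or asset issuance.

```python
import hashlib
import json


class HashChain:
    """Hypothetical append-only log where each entry commits to its predecessor."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []  # each entry: {"data", "prev", "hash"}

    @staticmethod
    def _digest(prev_hash, data):
        # Hash the previous link together with this entry's payload.
        payload = json.dumps({"prev": prev_hash, "data": data}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, data):
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry = {"data": data, "prev": prev, "hash": self._digest(prev, data)}
        self.entries.append(entry)
        return entry["hash"]

    def verify(self):
        """Recompute every link; any edited entry makes this return False."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != self._digest(prev, entry["data"]):
                return False
            prev = entry["hash"]
        return True


chain = HashChain()
chain.append({"asset": "painting-42", "owner": "alice"})
chain.append({"asset": "painting-42", "owner": "bob"})
print(chain.verify())  # True
chain.entries[0]["data"]["owner"] = "mallory"  # tamper with history
print(chain.verify())  # False
```

Note that `entries` only ever grows, which is exactly the scaling concern raised above once every node keeps a full copy.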

> every node has a copy AND the list is append only leads me to something that
> doesn't scale well.

Correct. That's why there is work to scale better, e.g. via sharding by
BigchainDB and by others.

[1] [https://blog.bigchaindb.com/three-blockchain-benefits-ae3a2a...](https://blog.bigchaindb.com/three-blockchain-benefits-ae3a2a5ab102)

~~~
k__
Does sharding still let you know the "whole truth"?

~~~
k__
Also, I had the impression that a blockchain is a specific type of data
structure.

So, what you are describing seems like a way to accomplish behavior of this
kind of data structure with a different one, so you may get better scaling out
of it.

~~~
trentmc
As you'll see in my other comments (and articles on it), "blockchain" is
better described as a field with a set of related goals for technology
artifacts, rather than a specific data structure [1]. I frame it as: it has
blockchain characteristics if it's decentralized, immutable, and supports assets [2].

This is a much healthier framing, because it doesn't constrain the goals to a
particular approach (e.g. a particular data structure).

[1] [https://blog.bigchaindb.com/blockchain-as-a-field-47c9f45894...](https://blog.bigchaindb.com/blockchain-as-a-field-47c9f4589418)

[2] [https://blog.bigchaindb.com/three-blockchain-benefits-ae3a2a...](https://blog.bigchaindb.com/three-blockchain-benefits-ae3a2a5ab102)

------
ronaldmannak
Some types of data need to stay within a certain geographic area. For
example, European PII data cannot be hosted on servers outside of the EU.
That's easy to handle if you're using AWS for example.

How can a company comply with these kinds of laws using a distributed system
like BigChainDB? Is there some kind of geofencing option?

~~~
trentmc
Hi, it's Trent here, CTO of BigchainDB.

That's a great question. It's surprising how few people are aware of the
current German data protection laws (where we're based) and the upcoming EU
data protection laws aka GDPR.

There are a few ways to address the issue:

1\. Don't store any PII on the database; instead, use it only to link to data
that's stored on-premises in many places. The database has permissioning, and
therefore acts as (decentralized) access control logic. Have a TOS with proper
legal teeth so that if a database user does store PII on the database, they
are liable in the real world.

2\. Run an instance of BigchainDB within a region, e.g. within Germany, and
comply with the appropriate laws there. PII can then be on the database, but
each node must follow data protection guidelines, just as a single
centralized entity would.

3\. Force encryption of all PII, and pray.

(3) is really a non-option. I stated it because many people are saying "just
encrypt". But the problem is quantum computing. In 5-15 years quantum
computing will be sufficiently easy to access that any encrypted data that's
publicly available can be decrypted. You might say "well let's migrate to
quantum-tolerant crypto before then" but that doesn't stop a malicious actor
from copying encrypted PII now. You might say "let's use quantum tolerant
crypto now" but we've seen with most crypto algorithms that it takes years to
harden them. Would you trust _your_ PII with untested crypto algorithms? I
wouldn't. In short: putting encrypted PII on public nets is a bad idea.
Please, please don't do it.
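Option (1) above can be sketched in a few lines of Python: the record written to the shared database carries only a random, non-identifying pointer plus an access list, while the PII itself stays in one organization's on-premises store. Everything here is a hypothetical illustration (the `register_person` and `fetch_pii` helpers, the `ON_PREM_PII` dict standing in for an on-premises database, and the field names); it is not BigchainDB's API, and real permissioning would be enforced by the database and by contracts, not by a Python check.

```python
import secrets

# Hypothetical on-premises store: lives inside one organization, never replicated.
ON_PREM_PII = {}


def register_person(pii: dict, custodian: str, allowed_readers: set) -> dict:
    """Keep PII on-premises; return only a non-identifying record for the shared DB."""
    record_id = secrets.token_hex(16)  # random, so it reveals nothing about the person
    ON_PREM_PII[record_id] = pii
    return {
        "ref": f"{custodian}:{record_id}",  # where the data lives
        "custodian": custodian,
        "allowed_readers": sorted(allowed_readers),  # access-control data kept on the shared DB
    }


def fetch_pii(shared_record: dict, requester: str) -> dict:
    """Custodian-side check: release PII only to parties the shared record allows."""
    if requester not in shared_record["allowed_readers"]:
        raise PermissionError(f"{requester} is not allowed to read this record")
    _, record_id = shared_record["ref"].rsplit(":", 1)
    return ON_PREM_PII[record_id]


record = register_person({"name": "Jane Doe", "birthdate": "1980-01-01"}, "acme-gmbh", {"clinic-b"})
print(sorted(record))  # ['allowed_readers', 'custodian', 'ref']
print(fetch_pii(record, "clinic-b")["name"])  # Jane Doe
```

The pointer is random rather than derived from the PII (e.g. a hash of a name), because a value computed from low-entropy personal data could be reversed by guessing.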

~~~
ronaldmannak
Hi Trent, thanks for your answer. Can you elaborate on #2? Is your suggestion
to run a private BigChainDB network of nodes you control?

Also, are you suggesting that no sensitive data should ever be stored in a
BigChainDB, or did I misinterpret #3?

~~~
trentmc
Re (2): this would be a group of people or organizations running nodes
together. (If it was just nodes you controlled it misses the point of being
decentralized.) You could store sensitive data in this setup, _if_ each
person/org had the proper data protection setup. This is not easy, however.

> Also, are you suggesting that no sensitive data should ever be stored in a
> BigChainDB, or did I misinterpret #3?

Actually option (2) shows a way to store PII on BigchainDB. But it's not easy.
My recommendation is to do (1). And, as in my earlier comment, please, please
don't do (3). ;)

------
foxhedgehog
Just a suggestion, but it might be worth adding some technical questions to
your FAQ, like "why would I want to use this over a SQL alternative?" and
"what kind of applications can I build with this?"

~~~
trentmc
Hi, thanks for the thoughts.

For "why would I want to use this over a SQL alternative" see [2].

For "what kind of applications can I build with this?" see [1][2][3][4].

Cheers!

[1] [https://www.bigchaindb.com/usecases/](https://www.bigchaindb.com/usecases/)

[2] [https://blog.bigchaindb.com/three-blockchain-benefits-ae3a2a...](https://blog.bigchaindb.com/three-blockchain-benefits-ae3a2a5ab102)

[3] [https://blog.bigchaindb.com/six-blockchain-application-verti...](https://blog.bigchaindb.com/six-blockchain-application-verticals-1-bonus-aa7caa5764e2)

[4] [https://blog.bigchaindb.com/where-does-blockchain-scalabilit...](https://blog.bigchaindb.com/where-does-blockchain-scalability-matter-specific-use-cases-from-digital-art-to-hr-9cf5ad8f7042)

------
mrguyorama
"Scalable BlockChain" is redundant, isn't it? Same with "BlockChain
Database"?

The abstract also seems to imply that this in fact IS NOT a blockchain?

~~~
Devagamster
Correct me if I'm wrong, but I'm pretty sure blockchains are not super
scalable ATM. Bitcoin is capped at about 6 transactions a second, and future
changes are supposed to improve it to 30? That doesn't seem very scalable to
me. I could very well be wrong about the details, but the point stands:
blockchains aren't all that scalable.

~~~
stale2002
Decentralized blockchains aren't scalable.

Centralized blockchains on the other hand are easy to scale.

Being centralized kinda defeats the point of them, though... It's better to
just have a database.

~~~
trentmc
(Hi, it's Trent here, CTO of BigchainDB.)

We can do a lot better than the scale of Bitcoin. And we are. Scale is part of
the point. And you don't need to centralize to get scale. We did improve upon
the Model T, didn't we? Or: I remember programming a computer with 16K of
memory. Technology improves, and it's improving here too. That's what we do at
BigchainDB.

~~~
stale2002
Hmm, after skimming the white paper, it seems like your consensus algorithm
is that nodes simply vote on what they believe to be the current blockchain.

How does this solution respond to someone spinning up a thousand nodes, and
simply voting for their double spend attack?

Part of the paper states that "In a BigchainDB network, the
governing organization behind the network controls the member list, so Sybil
attacks are not an issue.", which is directly contradictory to your statement
that it is decentralized.

A decentralized network has no "governing organization".

~~~
trentmc
> How does this solution respond to someone spinning up a thousand nodes, and
> simply voting for their double spend attack?

This is the classic "Sybil attack". But I bet you knew that. :)

If you have a member list (i.e. a list of public keys) of who can be server nodes,
then you can control this. Each member (public key) only gets one vote. So
even if that person makes 1000 copies, it's only 1 vote total from that
member.
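The member-list defence described above can be sketched as follows: votes are counted per approved public key, so a member who runs 1000 copies of a node still contributes one vote, and keys outside the list contribute none. This is a hypothetical illustration, not BigchainDB code; `count_votes` and the ballot format are assumptions, and real votes would also carry signatures, which this sketch assumes have already been verified.

```python
from collections import Counter


def count_votes(ballots, member_keys):
    """Tally (public_key, choice) ballots with one vote per listed member.

    Assumptions: ballots are already signature-checked; the first ballot
    seen from a key counts and later duplicates are ignored.
    """
    counted = {}
    for public_key, choice in ballots:
        if public_key in member_keys and public_key not in counted:
            counted[public_key] = choice
    return Counter(counted.values())


members = {"pk-alice", "pk-bob", "pk-carol"}
# 1000 attacker nodes with keys that are not on the member list...
ballots = [("pk-mallory-%d" % i, "double-spend") for i in range(1000)]
ballots += [("pk-alice", "honest"), ("pk-bob", "honest")]
# ...plus a malicious member (carol) running 1000 copies of her node.
ballots += [("pk-carol", "double-spend")] * 1000

result = count_votes(ballots, members)
print(result["honest"], result["double-spend"])  # 2 1
```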

> governing organization behind the network controls the member list, so Sybil
> attacks are not an issue.", which is directly contradictory to your
> statement that it is decentralized. A decentralized network has no
> "governing organization".

Great question. However, the control of this organization is decentralized too.
Here's how. IPDB is both the BigchainDB public network and a foundation to help
govern it. Network: each server node is run by a "caretaker". Foundation: each
caretaker has one vote. They vote to control the member list (the list of
caretakers), as well as the IPDB board. So, it's decentralized: no single entity is controlling it.

There are other ways to curate "member lists" to address Sybil attacks. E.g.
Bitcoin's PoW is basically "one electron one vote" on average (assuming
everyone has a modern ASIC). In search of block rewards, many players work
hard to maximize their electron spend (i.e. big ASIC farms), which of course
eats a lot of power. Or BitShares' PoS is a riff on "one token one vote".
There are more. We simplified the problem for IPDB: start with a great initial
member list of reputable orgs that deeply care about the future of the
internet (Internet Archive, Open Media Foundation, COALA, etc); and give them
control from there. Some heavy lifting up-front to set this up allows great
gains in efficiency.

------
beager
What if I don’t need a scalable blockchain, but I need a tamper-proof
persistent ledger? Are there lighter-weight tools that can help me
there?

~~~
trentmc
Perhaps you're looking for an append-only logging / messaging system like
Apache Kafka? Good to explore what's out there and understand what's possible.

BTW using BigchainDB can feel pretty lightweight: it feels like a DBaaS but
you don't have to set up the back end; you just get going. At the following
link, you'll have a tx on the BigchainDB public net (IPDB) in seconds, and the JS
or Python code to do it yourself is right there too:
[https://www.bigchaindb.com/getstarted/](https://www.bigchaindb.com/getstarted/)

