Summary: Python isn't the bottleneck yet, and if it becomes one, C will become the last 1%.
I've been working on production apps in Python since 2002, including ones doing large-scale compute running on 1000+ machines at once. How: 99% Python, 1% C. But the trick is, you only build in the C once you've worked out all the kinks and optimized the big-picture stuff elsewhere. Python is great not only for connecting things, but also for rapidly iterating on algorithms and building maintainable code.
The AI / ML community has discovered this too: Python is now the most popular language in that community. Despite the heavy compute. How: most of the popular libraries have efficient C (etc) implementations under the hood.
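As a small illustration of C doing the heavy lifting under Python, here is a sketch that compares a hand-written loop with the built-in `sum`, which CPython implements in C. The function names are mine, and timings vary by machine, so I'm not claiming any particular speedup.

```python
import timeit

def total_pure_python(values):
    """Add up values with an explicit Python-level loop."""
    total = 0
    for v in values:
        total += v
    return total

def total_builtin(values):
    """Same result, but the loop runs inside CPython's C implementation of sum()."""
    return sum(values)

data = list(range(100000))

# Both approaches must agree before timing means anything.
assert total_pure_python(data) == total_builtin(data)

# Time each version; the absolute numbers depend on the machine.
t_loop = timeit.timeit(lambda: total_pure_python(data), number=20)
t_builtin = timeit.timeit(lambda: total_builtin(data), number=20)
print("python loop: %.4fs, builtin sum: %.4fs" % (t_loop, t_builtin))
```

The point is the workflow: write the simple version first, measure, and only then push the hot 1% down into C (or into a library that already did).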
This is exactly the philosophy we've been following at BigchainDB, with success. Python to connect things, iterate quickly in improving algorithms, and ship maintainable code. We haven't got far enough to resort to building our own C libraries yet, though many 3rd party libraries we use are implemented in C.
[EDIT] Based on the comments below, I'll now mention here too: BigchainDB wraps MongoDB, which is written in C++. And, Python 3.5+ (which BDB uses) has gradual typing, which brings many benefits of static typing to Python.
I encourage you to give BigchainDB a whirl, it's easy: https://www.bigchaindb.com/getstarted/
How one goes about getting those characteristics is wide open. Most blockchain systems do have a full copy of the database at each server node, i.e. fully replicated. Also, they are "peer to peer" which means there is no distinction between clients and servers. (They do have SPV wallets though which is kinda similar.)
BigchainDB's focus has always been about scale. We're partly there but not fully: we are currently fully replicated but are targeting sharding to address that. Where we do get scale already is in properly distinguishing between clients and servers. Servers are "super peers", decentralized among themselves. They do the heavy lifting, i.e. storage. Apps don't need to run a server node; instead they are simply clients of the network, and of course can query >1 node.
I feel that gradual typing offers the best of both worlds: rapid prototyping, but also the strictness of types where you need it.
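A minimal sketch of what that looks like in practice, using the standard `typing` module (available from Python 3.5). The functions and data are made up for illustration. Note that annotations are not enforced at runtime; you need a separate checker such as mypy to get the strictness.

```python
from typing import Dict, List

def quick_prototype(records):
    # No annotations: fine while the shape of the data is still changing.
    return [r["amount"] for r in records if r.get("amount")]

def total_by_owner(records: List[Dict[str, int]], owner_id: int) -> int:
    """Annotated: a static checker can verify that callers pass the right types."""
    return sum(r["amount"] for r in records if r["owner"] == owner_id)

records = [
    {"owner": 1, "amount": 5},
    {"owner": 2, "amount": 7},
    {"owner": 1, "amount": 3},
]

print(quick_prototype(records))    # amounts of all records
print(total_by_owner(records, 1))  # sum of amounts for owner 1
```

Both styles live happily in the same module, so you can add types only to the parts that have stabilized.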
BigchainDB targets giving the following benefits beyond traditional database as a service:
1. decentralized - no single entity controls it, which means tolerant to malicious / Byzantine faults. Benefit: groups that don't necessarily trust each other can share infrastructure.
2. immutable - which practically speaking means more tamper resistant, e.g. it's append only. Benefit: well-defined provenance for history of assets, data, etc.
3. assets - you can create and issue assets, where you own them if you have the private key. How: each tx is signed. Benefit: moving around assets on a substrate that no single entity owns or controls. Lower friction in exchanges. And the digital signatures give cryptographic proof of who did what.
These are the targets. We're not fully there yet. Most notably, we still need to address some Byzantine faults as our docs (above) mention. This will come in an upcoming release. We are also working on improved scalability while maintaining the security guarantees.
Re consensus: BigchainDB has a two-layer consensus, as follows.
* The lower layer directly uses MongoDB's consensus (which it builds on) to agree on whether a transaction should be stored.
* The higher level has federation-style voting on whether a transaction is valid or not.
Our documentation describes this further.
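To give a feel for the higher (voting) layer only, here is a rough sketch. It assumes a simple-majority threshold over a fixed number of voting nodes; that threshold and the function name `decide` are my assumptions for illustration, and the docs define the actual rule. The lower (storage) layer is not modeled at all.

```python
from collections import Counter

def decide(votes, n_voters):
    """Classify a transaction from the votes received so far.

    votes: list of booleans, True meaning "I consider this transaction valid".
    n_voters: total number of nodes entitled to vote.
    Assumes a simple majority decides (an illustrative assumption).
    """
    counts = Counter(votes)
    majority = n_voters // 2 + 1
    if counts[True] >= majority:
        return "valid"
    if counts[False] >= majority:
        return "invalid"
    return "undecided"  # not enough votes yet either way

print(decide([True, True, False], n_voters=5))
print(decide([True, True, True], n_voters=5))
```

With 5 voters the majority is 3, so the first call is still undecided and the second is decided as valid.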
BigchainDB is explicitly not trying to do Bitcoin-style proof of work. PoW solves for an additional problem: (theoretical) anonymity of servers. That additional goal compromises scale. And, in practice you know who's running the servers anyway (ie big Chinese Bitcoin miners), which is why I say "theoretical". BTW I am a fan of Bitcoin, it just has different goals than BigchainDB.
> Not sure whether BigchainDB implements consensus/PoW
It has a 2-layer consensus algorithm. (See a comment above, or BigchainDB docs.) It does not use PoW because PoW is inefficient; BigchainDB doesn't target applications needing Sybil tolerance.
FYI we started working on the precursor to this in 2013 (ascribe). Bitcoin hadn't even hit the mainstream then, let alone blockchain. We didn't build it to mash together buzzwords. We built it because we saw a clear need for it. We had been building on Bitcoin, and it (a) didn't scale to meet our needs and (b) was super-hard to use because it didn't act like a database. So we built BigchainDB to address issues (a) and (b).
Obviously, good old "boring" MySQL is incredibly useful for tons of problems. If you're already solving a problem with MySQL, then BigchainDB is not a fit. Don't use it, stick with the thing that's working.
Where it is useful is applications that want at least one of the following benefits:
1. decentralized, so that >1 orgs can share resources
2. immutable / tamper resistance, for provenance of ownership of art, spare parts, food, etc
3. assets, so you can exchange digital goods more readily.
More details here:
To be "decentralized", it's key that there are >1 sysadmins (ie runners of server nodes); and bad behavior from some sysadmins does not take the whole system down.
We experimented a bit with decentralizing Cassandra. From what we saw, it is possible to decentralize it in the way I define it. We also tested other DBs. We chose RethinkDB because we liked their approach to a global oplog, and we preferred a document-store interface over a column-store interface. Later we added MongoDB support because of customer interest and some technical benefits. (For the record we added this support before Rethink's woes. Glad to see it found a foundation:)
A global list of hashes that is also append only? This just screams to blow up sooner or later.
Once bitcoin can handle Visa's transaction volume (250M transactions/day) we can talk scale.
EDIT: after reading the comments more, it seems the project operates without PoW, which allows it to scale more. I'm curious, then, what differentiates it from something like CockroachDB. It seems to be an append-only, distributed database. How does the blockchain fit in?
> I know. I think if I read "scalable" and "blockchain" in the same sentence again I'll scream.
I acknowledge the hype out there.
FYI we were working on blockchain in 2013, long before the hype. We started encountering massive scale problems in 2014 and started working on them in 2015, long before the "scale + blockchain" hype. So we're not doing this because it's some fancy combo of buzzwords, we're doing it because we identified a problem years ago, and have been making steady progress on it ever since. It was a surprise to me to see BigchainDB hit the HN front page today, since we've been shipping it since Feb 2016!
> Once bitcoin can handle Visa's transaction volume (250M transactions/day) we can talk scale.
We're not trying to improve Bitcoin. We're building our own thing. As we continue to improve the technology, it becomes useful to ever-wider classes of users.
Re CockroachDB: it's a cool technology. The big difference is that it's distributed but not decentralized. That is, the compute resources (in this case mainly storage) are spread across many machines, but the control is in the hands of a single entity / sysadmin. Whereas decentralized means the control is spread across many sysadmins, and even a few rogue sysadmins won't take down the system.
As I understood blockchains, they are basically linked lists of hashes, yes?
And the decentralization means that every node has this list, not just parts of it, so everyone can always check if the list is consistent.
Also, these lists are append only.
The part where every node has a copy AND the list is append only leads me to something that doesn't scale well. It will always get bigger with every action that is appended AND it will always be multiplied by every node in the network.
I'm probably missing something here, but that is my current state of blockchains, haha.
Under this framing, the "linked list of hashes" is one partial way to achieve immutability. And "every node has this list" is one partial way to achieve decentralization. But that's only part of it. E.g. you need to address: what if a node acts badly? And you want a means to create & issue assets.
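For readers who haven't seen one, here is a toy "linked list of hashes" using only the standard library. It is a generic sketch, not BigchainDB's data model; the field names and the all-zero genesis value are assumptions for illustration. It shows why editing an old entry is detectable, and nothing more: it says nothing about badly behaving nodes or asset issuance.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def entry_hash(prev_hash, payload):
    """Hash the payload together with the previous entry's hash."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def append(chain, payload):
    """Append-only: each new entry commits to the hash of the one before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev_hash, "payload": payload,
                  "hash": entry_hash(prev_hash, payload)})

def is_consistent(chain):
    """Recompute every hash and check each link points at its predecessor."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(entry["prev"], entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True

chain = []
append(chain, {"asset": "painting-1", "owner": "alice"})
append(chain, {"asset": "painting-1", "owner": "bob"})
print(is_consistent(chain))   # True: untouched chain verifies

chain[0]["payload"]["owner"] = "mallory"  # rewrite history
print(is_consistent(chain))   # False: the stored hash no longer matches
```

Anyone holding a copy can run the check, which is the "everyone can verify" property. What the list alone cannot tell you is which copy to believe when nodes disagree.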
> every node has a copy AND the list is append only leads me to something that doesn't scale well.
Correct. That's why there is work to scale better, e.g. via sharding by BigchainDB and by others.
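A back-of-envelope sketch of why full replication hurts and what sharding buys. The numbers and the `replication_factor` parameter are hypothetical, and the model assumes data is spread perfectly evenly; it is not a description of BigchainDB's sharding design.

```python
def total_storage_gb(dataset_gb, n_nodes, replication_factor=None):
    """Storage used across the whole network.

    replication_factor=None models full replication (every node stores everything).
    Otherwise each piece of data is stored on that many nodes (capped at n_nodes).
    """
    copies = n_nodes if replication_factor is None else min(replication_factor, n_nodes)
    return dataset_gb * copies

def per_node_storage_gb(dataset_gb, n_nodes, replication_factor=None):
    """Average storage per node, assuming an even spread of shards."""
    return total_storage_gb(dataset_gb, n_nodes, replication_factor) / n_nodes

dataset_gb, n_nodes = 1000, 50  # hypothetical: 1 TB of data, 50 server nodes

print(total_storage_gb(dataset_gb, n_nodes))        # fully replicated, network-wide
print(per_node_storage_gb(dataset_gb, n_nodes))     # fully replicated, per node
print(total_storage_gb(dataset_gb, n_nodes, 3))     # sharded with 3 copies, network-wide
print(per_node_storage_gb(dataset_gb, n_nodes, 3))  # sharded with 3 copies, per node
```

Under full replication, adding nodes adds cost without adding capacity. With a fixed number of copies, per-node load drops as nodes are added, which is the behavior you want from a database.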
Then to answer your Q: a good sharding approach should let you see all digitally signed claims (including when those claims were made) with probability --> 1.0. I'm framing this probabilistically because many sharding approaches rely on that definition. (And even non-sharded blockchains like Bitcoin itself.)
So, what you are describing seems like a way to accomplish the behavior of this kind of data structure with a different one, so you may get better scaling out of it.
This is a much healthier framing, because it doesn't constrain the goals to a particular approach (e.g. a particular data structure).
blockchain != bitcoin. solutions such as ripple, iota, and others have tech in place that can scale to those levels and beyond. i don't think any of them are ready to replace visa yet, but the hurdles are not fundamental to blockchain.
BTW we work closely with the Ripple folks (especially on Interledger Protocol) and the iota folks (they're also in Berlin:)
How can a company comply with these kinds of laws using a distributed system like BigchainDB? Is there some kind of geofencing possibility?
That's a great question. It's surprising how few people are aware of the current German data protection laws (where we're based) and the upcoming EU data protection laws aka GDPR.
There are a few ways to address the issue:
1. Don't store any PII on the database, rather only use it to link to data that's stored on-premise in many places. The database has permissioning, and therefore acts as (decentralized) access control logic. Have a TOS with proper legal teeth so that if a database user does store PII on the database, they are liable in the real world.
2. Run an instance of BigchainDB within a region, e.g. within Germany, and comply with the appropriate laws there. Let PII be on the database. But, each node must follow data protection guidelines, similar to how a single centralized entity would, but now do it for each node.
3. Force encryption of all PII, and pray.
(3) is really a non-option. I stated it because many people are saying "just encrypt". But the problem is quantum computing. In 5-15 years quantum computing will be sufficiently easy to access that any encrypted data that's publicly available can be decrypted. You might say "well let's migrate to quantum-tolerant crypto before then" but that doesn't stop a malicious actor from copying encrypted PII now. You might say "let's use quantum tolerant crypto now" but we've seen with most crypto algorithms that it takes years to harden them. Would you trust your PII with untested crypto algorithms? I wouldn't. In short: putting encrypted PII on public nets is a bad idea. Please, please don't do it.
Also, are you suggesting that no sensitive data should ever be stored in BigchainDB, or did I misinterpret #3?
> Also, are you suggesting that no sensitive data should ever be stored in BigchainDB, or did I misinterpret #3?
Actually option (2) shows a way to store PII on BigchainDB. But it's not easy. My recommendation is to do (1). And, like my comment before, please please don't do (3) ;)
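Here's a rough sketch of the shape of option (1). The two in-memory containers are stand-ins I made up: one for a private on-premise system, one for the shared database. The shared side only ever sees a random ID and a pointer, never the PII itself. I use a random ID rather than a hash of the PII, because hashes of low-entropy data like names or birthdates can be guessed by brute force.

```python
import uuid

on_premise_store = {}  # stand-in for a private, access-controlled system
shared_ledger = []     # stand-in for the shared, append-only database

def register_person(pii, site):
    """Keep PII on-premise; publish only an opaque ID and where to ask for it."""
    record_id = str(uuid.uuid4())  # random, reveals nothing about the person
    on_premise_store[record_id] = pii
    shared_ledger.append({"record_id": record_id, "site": site})
    return record_id

def erase_person(record_id):
    """Delete the PII on-premise. The ledger entry stays but now points at nothing."""
    on_premise_store.pop(record_id, None)

rid = register_person({"name": "Ada Example", "email": "ada@example.org"}, "berlin-dc")
print(shared_ledger[-1])        # only the opaque ID and the site name
erase_person(rid)
print(rid in on_premise_store)  # False: the PII is gone, the ledger is untouched
```

The append-only part never held anything that needs deleting, so a deletion request is handled entirely on-premise.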
For "why would I want to use this over a SQL alternative" see .
For "what kind of applications can I build with this?" see .
The abstract also seems to imply that this in fact IS NOT a blockchain?
The word "blockchain" is much misunderstood. There is a ton of argument over what it actually is. Just a linked list of hashes? Bitcoin and nothing else? I could go on and on.
To me, that debate is less interesting than building systems that actually work. For this, it's useful to think about compute stacks in the past, from mainframe to desktop, from web to cloud to mobile. In each, there are core building blocks that each have their own way of instantiating the elements of computing (storage, processing, communications). Yes, let's go back to first principles:)
Take cloud, on say AWS. Here are some blocks:
* Storage:blob storage -- S3
* Storage:database -- DynamoDB
* Processing -- EC2
* And so on.
The emerging decentralized stack is no different. There is no single monolithic block called "blockchain" that magically does everything, though much of the rhetoric would have you believe that. Rather, there are emerging building blocks.
* Storage:blob storage -- IPFS + FileCoin (and more)
* Storage:database -- BigchainDB
* Storage:pure-play-token -- Bitcoin (this is specific to decentralized space)
* Processing:business logic (aka "smart contracts") -- Ethereum (and more)
"Blockchain" is best treated as a label for the space of decentralization. Side by side with other fields like "artificial intelligence" or "cloud computing".
I think we can all agree that DynamoDB is not "a cloud". It's just an implementation of the "database" building block for the "cloud computing" field. Similarly, BigchainDB is not "a blockchain". It's just an implementation of the "database" building block for the "blockchain" field.
Centralized blockchains on the other hand are easy to scale.
Being centralized kinda defeats the point of them though.... It's better to just have a database.
We can do a lot better than the scale of Bitcoin. And we are. Scale is part of the point. And you don't need to centralize to get scale. We did improve upon the Model T, didn't we? Or, I remember programming with 16K memory on my computer. Technology improves. And it is here too. That's what we do at BigchainDB.
How does this solution respond to someone spinning up a thousand nodes, and simply voting for their double spend attack?
In part of the paper it is stated that "In a BigchainDB network, the governing organization behind the network controls the member list, so Sybil attacks are not an issue.", which is directly contradictory to your statement that it is decentralized.
A decentralized network has no "governing organization".
If you have a member list (ie list of public keys) of who can be server nodes, then you can control this. Each member (public key) only gets one vote. So even if that person makes 1000 copies, it's only 1 vote total from that member.
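A sketch of that counting rule. The key strings are placeholders, and a real system would verify a signature on each ballot rather than trust the claimed key; that step is left out here. Counting the first ballot per key and ignoring repeats is my choice for the sketch.

```python
def count_votes(member_keys, ballots):
    """Tally ballots with one vote per listed public key.

    member_keys: set of public keys allowed to vote (the member list).
    ballots: iterable of (public_key, vote) pairs, vote being True or False.
    Ballots from unlisted keys are ignored, and only the first ballot per key counts.
    Returns (yes_count, no_count).
    """
    counted = {}
    for key, vote in ballots:
        if key in member_keys and key not in counted:
            counted[key] = vote
    yes = sum(1 for v in counted.values() if v)
    return yes, len(counted) - yes

members = {"key_A", "key_B", "key_C"}

# key_A runs 1000 copies and votes yes from each; key_X is not on the list.
ballots = [("key_A", True)] * 1000 + [
    ("key_B", False),
    ("key_C", False),
    ("key_X", True),
]

print(count_votes(members, ballots))  # (1, 2)
```

A thousand copies still produce one yes vote, so the attack loses 1 to 2. The cost of this approach is that someone has to curate the member list.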
> governing organization behind the network controls the member list, so Sybil attacks are not an issue.", which is directly contradictory to your statement that it is decentralized.
> A decentralized network has no "governing organization".
Great question. However, the control of this organization is decentralized too. Here's how. IPDB is both the BigchainDB public net and the foundation that helps govern it. Net: each server node is run by a "caretaker". Foundation: each caretaker has one vote. They vote to control the member list (list of caretakers), as well as the IPDB board. So, it's decentralized: no single entity is controlling it.
There are other ways to curate "member lists" to address Sybil attacks. E.g. Bitcoin's PoW is basically "one electron one vote" on average (assuming everyone has a modern ASIC). In search of block rewards, many players work hard to maximize their electron spend (ie big ASIC farms), which of course eats a lot of power. Or BitShares' PoS is a riff on "one token one vote". There are more. We simplified the problem for IPDB: start with a great initial member list of reputable orgs that deeply care about the future of the internet (Internet Archive, Open Media Foundation, COALA, etc); and give them control from there. Some heavy lifting up-front to set this up allows great gains in efficiency.
Also, I encourage you to try out BigchainDB. Within just a few seconds you can send your first transaction to IPDB (BigchainDB public net): https://www.bigchaindb.com/getstarted/
Correct, traditional blockchains like Bitcoin aren't scalable.
The whole point of BigchainDB is to bring scale to the database part of the blockchain space, using the learnings from distributed databases, which do scale.
More info at www.bigchaindb.com, I encourage you to have a read:)
BTW using BigchainDB can feel pretty lightweight: it feels like a DBaaS but you don't have to set up the back end, you just get going. In the following, you'll have a tx on the BigchainDB public net (IPDB) in seconds. And the JS or py code to do it yourself is right there too. https://www.bigchaindb.com/getstarted/