FlureeDB, a Practical Decentralized Database (2017) [pdf] (flur.ee)
92 points by tosh 9 months ago | 42 comments

The paper says there aren't practical decentralized solutions, but doesn't mention any prior research/comparisons to existing systems?

How does it compare against:

- IPFS (yes, plenty of people are doing DB stuff on IPFS)

- GUN (full disclosure, I'm the author, https://github.com/amark/gun )

- Scuttlebutt ( http://scuttlebot.io/ )

- Beaker Browser

They go on to talk about "append-only" data structures (this alone does not make a system a blockchain or decentralized, but I don't see any further reasoning on it), because technically, you could do things very similarly with:

- Kafka

- Cassandra

- Couch

And "append-only" is not practical: there are lots of caveats you have to handle (you don't get eventual consistency for free, performance is terrible on reads, etc.), and it often just makes scaling up harder, which is not "practical".

The next section begins to discuss "blockchain" technology, but I don't see code on GitHub:

- https://github.com/fluree/flureedb (empty repo) ?

But their main website is already advertising $1000/month "Enterprise" plans. The whole point of Blockchain and Decentralization is not to be vendor-locked into a proprietary centralized DBaaS. Why isn't the code Open Source?

The point of blockchain is to have something trendy to point to when your investors ask about it.

Agreed, they seem to ignore literally everything that's going on in the open source decentralized storage space.

Adding to the list:

- BigchainDB (https://github.com/bigchaindb), a practical BFT blockchain DB (disclosure, I'm the co-author)

- OrbitDB

And also, it's a very very bad idea to have public storage - one (encrypted) PII entry or illegal datum and it's over.

I really appreciate the approach that BigchainDB and Ocean Protocol take in addressing the extremely serious challenges around removing illegal content from a decentralized database. I don't remember exactly where I read/heard the details, but there is a talk Trent McConaghy gave where he describes the process of forking around a node where illegal content has been discovered and flagged via a takedown notice. I'll see if I can dig up the specific reference. I work on integrating decentralized and distributed database architecture into a platform for cancer drug discovery, and on enriching user data through the incentive mechanisms of decentralized databases that have cryptographic tokens and use cryptographic primitives a la curation markets. The important takeaway for me is that hybrid solutions are starting to show promise in combining traditional centralized databases with decentralized ones that have novel token economics built in. I think BigchainDB and Ocean Protocol do a great job with this, as does OpenMined, and I'm interested to see where Cardstack goes with their approach. For me, it's not a binary set of options; it's about finding projects that are interoperable and that also take seriously their responsibility to stay compliant with the ever-changing legal landscape surrounding data and rights management.

I am very interested in the area and would greatly appreciate any additional links you could dig up.

I really would like to work on something like that, but so far it's only enthusiasm and zero education. I am a pretty senior programmer in my own eyes (16 years of official career, 25 years of tinkering in total, started as a 12-13-year-old teenager), but sadly all that experience does not translate directly to areas like these.

"Fault tolerance and censorship resistance - Due to its decentralized nature, FlureeDB guarantees maximum uptime and no possibility for data regulation".

Censorship is rarely a technological issue, but one of laws.

No possibility for data regulation? Well, data regulation is already there, and they're effectively advertising that they are trying to bypass those regulations. If they actually can do that, it won't be long before someone rules "Use of FlureeDB violates GDPR compliance" and all your nice tech isn't worth squat. The database may prevent data regulation, but you can still put the people using it in jail.

The GDPR question and "right to be forgotten", consent, etc., gets asked around this time in the Epicenter podcast with the founders: https://youtu.be/fdne9EvFNaw?t=3147

* Brian Platz addresses the question

* first goes through benefits of immutability in Fluree in context of auditing

* suggests one approach is storing PII (personally identifiable info) in encrypted manner in public fields, and storing the decryption keys separately -- the act of "forgetting" is the act of destroying the decryption keys

* another approach: on the public Fluree blockchain DB store a UID that has no public ties to PII ... and store the PII in a private corporate-side Fluree blockchain DB ... the Fluree system allows JOINs across Fluree systems, so internal corporate-side queries can do JOINs to consolidate public info with PII stored internally ... now to the "forgetting" part ... in spite of immutability guarantees, Fluree does have one escape, which is the possibility in the private Fluree (for which the corporation completely controls consensus) to "retract" that entity containing the PII, along with a tombstone and a full audit trail of the retraction

Both these approaches seem to address the right-to-be-forgotten pretty well. The Fluree guy also suggests that for the PII part perhaps Fluree isn't the system you want, and perhaps that should be stored in some other system.
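For anyone who wants to see the first approach (encrypt the PII, keep the keys elsewhere, "forget" by destroying the key) in concrete terms, here is a minimal sketch. Everything in it is an assumption for illustration: `KeyVault`, `public_ledger` and `read_pii` are hypothetical names, not Fluree APIs, and the cipher is a toy HMAC-SHA256 keystream standing in for a real authenticated cipher such as AES-GCM so that the example runs on the standard library alone.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256(key, nonce || counter) blocks, concatenated.

    Stand-in for a real cipher; do not use this construction in production.
    """
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)  # fresh nonce per record
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


class KeyVault:
    """Hypothetical private key store, kept apart from the public ledger."""

    def __init__(self):
        self._keys = {}

    def new_key(self, uid: str) -> bytes:
        self._keys[uid] = secrets.token_bytes(32)
        return self._keys[uid]

    def get(self, uid: str):
        return self._keys.get(uid)

    def forget(self, uid: str) -> None:
        # "Forgetting" a person is just destroying their key.
        self._keys.pop(uid, None)


public_ledger = []  # append-only: entries are never edited or removed
vault = KeyVault()

public_ledger.append(("user-1", encrypt(vault.new_key("user-1"), b"Alice, 1 Main St")))


def read_pii(uid: str):
    """Return the decrypted PII for uid, or None once the key is gone."""
    key = vault.get(uid)
    if key is None:
        return None
    for entry_uid, blob in public_ledger:
        if entry_uid == uid:
            return decrypt(key, blob)
    return None
```

After `vault.forget("user-1")` the ciphertext is still sitting in `public_ledger`, but `read_pii("user-1")` has nothing to decrypt it with. Whether that counts as deletion in the legal sense is a separate question that this sketch does not settle.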

That obviously solves the problem. But do you see an application where Fluree adds value over just storing everything in the private database that keeps your PII?

I imagine the most interesting and relevant parts of Fluree-based applications will be in the data apart from the PII, the data visible to anybody who wants to view or update that info.

Yeah, but what's an application where this would be useful? What kind of other data would the database store, and for what purpose?

Hmmm ...

* governments want to improve public health

* parents in a given geographic region can opt-in on birth of a child to enroll their child in a longitudinal data-gathering regime

* parents' and child's PII (personally identifiable information) is stored in a special private Fluree instance (or other DB), and each is assigned UIDs that can be public without disclosing their identity

* the child's UID, along with basic demographic info (that can't be used to resolve their identity) is pushed to the public Fluree DB ...

* some info, such as genetic profile, very specific outcomes of educational tests, etc., might be stored over time in a very secure DB not open to the public

* basic measures of health, education, nutrition, could be stored publicly and available to researchers ... with more and more info stored over time

* more info about a child's "history" that might be useful in correlating with positive / negative outcomes also might be stored publicly

* public health researchers could analyze trends, etc., to determine what education and nutrition programs, environments, etc., contribute positively to good outcomes

* public health and other researchers, such as medical, pharma, nutrition, etc., researchers, could use public data to find "cohorts" of children that might be suitable for studies ... and reach out in the system to families (whose identities will not be disclosed until appropriate) of those children to give them more info about why their child should become part of a study, and give them a chance to opt-in or contact researchers for more info

That's just one very interesting application off the top of my head. Fluree (I'm not associated) is a Public Benefit Corporation (PBC) that says it eventually plans to open-source all or much of its technology, so it might be a good venue in which to deploy such a system ... a system that would require high levels of confidentiality and trust, while still being open to the extent possible and advisable.
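To make the split between private PII and public pseudonymous data a bit more concrete, here is a rough sketch of the pattern described above: a random UID in the public store, the PII in a private one, an internal "join" across the two, and a retraction that leaves a tombstone plus an audit entry. All names (`enroll`, `internal_join`, `retract`, and the in-memory stores) are hypothetical and plain Python dicts and lists stand in for the two databases; this is not Fluree's query or retraction API.

```python
import uuid
from datetime import datetime, timezone

private_pii = {}     # uid -> PII record, held only by the enrolling agency
public_records = []  # append-only (uid, measure, value) facts open to researchers
audit_trail = []     # one entry per retraction


def enroll(name: str, region: str) -> str:
    """Create a random UID, store PII privately and a coarse fact publicly."""
    uid = uuid.uuid4().hex  # random, so the UID itself reveals nothing
    private_pii[uid] = {"name": name, "retracted": False}
    public_records.append((uid, "region", region))
    return uid


def record_measure(uid: str, measure: str, value) -> None:
    public_records.append((uid, measure, value))


def internal_join(uid: str) -> dict:
    """Agency-side view: public facts combined with the private PII record."""
    facts = {m: v for u, m, v in public_records if u == uid}
    return {"uid": uid, "pii": private_pii.get(uid), "facts": facts}


def retract(uid: str, reason: str) -> None:
    """Replace the PII with a tombstone and log the retraction."""
    private_pii[uid] = {"retracted": True}
    audit_trail.append({
        "uid": uid,
        "reason": reason,
        "at": datetime.now(timezone.utc).isoformat(),
    })
```

Researchers only ever see `public_records`; only the agency can call `internal_join`. Note that the sketch does nothing about re-identification from the public facts themselves, which is a real risk with longitudinal data.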

Sorry, but I think this is one of the worst use-cases for a distributed database. Many government databases are, by definition, centralized, no matter what technology they use. The record the government holds is the official record.

As a result, copies of the original document are pretty much valueless unless they are certified by a court/agency as being official. Governments should definitely move towards more modern databases and technology, but given that they "are" the central database, I don't see the benefit of a decentralized one.

Leaving aside any questions of how much data you could actually make public without risk of depseudonymization, what benefit does this have over a 1998-vintage database-backed website run by the government? Like, the government licenses doctors and schools and such, so aren't they the central party controlling who can write to the database regardless?

What about accidental releases in the public network? We know it's just a matter of time before someone makes a mistake.

> the act of "forgetting" is the act of destroying the decryption keys

I'm not sure crypto-shredding is a legit method to "delete consumer data"

Why wouldn't it be? An encrypted file with the appropriate cipher set is synonymous with noise. It is only the encrypted content plus the unlock key that makes the file contents accessible.

If it is irretrievable, and the encrypted data looks like noise, I would argue that the content is gone.

Especially if quantum computing develops enough to break modern encryption.

Unless the encryption uses post-quantum crypto, in theory it’ll all be “un-shredded” if quantum computing becomes feasible. We’ll have bigger problems at that point, but still, it’d make you “unforgotten”.

If I could favorite comments, I'd favorite this one.

You can. That's an option on HN. I think you need to click on it to find the "favorite" option.

People are sleeping on hyperdb: https://github.com/mafintosh/hyperdb

It's built from hypercore, the same library powering the DAT protocol.

Also, someone recently wrote a graph database on top of hyperdb: https://github.com/e-e-e/hyper-graph-db

The more I read white papers, the more I appreciate academia's requirement of a summary of previous related work.

Those sections are the source of a nice chunk of obscure references I post here that people like. They're a gold mine. The other is serendipitous search where you find what you're not looking for directly. Main way is to learn the buzz words of serious researchers in each sub-field plus verbs they use to describe results. Then, you just do permutations of those with quotes, minuses, and places like Citeseerx. You get the diamond mine once you master that. ;)

This looks a lot like Datomic if you add hashing to the Datomic transaction (this would have to be done as a Datomic stored procedure). Given that this is hosted, I am wondering whether this is using Datomic underneath or if it is written from scratch?
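To illustrate what "adding hashing to the transaction" buys you, here is a minimal sketch of a hash-chained append-only log. It is not how Datomic or FlureeDB store transactions; `append_tx`, `verify` and the entry layout are made up for the example. Each entry commits to the hash of the previous one, so editing an old entry breaks every hash after it.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def tx_hash(prev_hash: str, data: dict) -> str:
    """Hash the transaction data together with the previous entry's hash."""
    payload = json.dumps({"prev": prev_hash, "data": data}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_tx(log: list, data: dict) -> None:
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev_hash, "data": data, "hash": tx_hash(prev_hash, data)})


def verify(log: list) -> bool:
    """Recompute the chain from the start; any edited entry is detected."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != tx_hash(prev_hash, entry["data"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Usage: build a log with a couple of `append_tx(log, {...})` calls, check `verify(log)`, then change a field in `log[0]["data"]` and check again. The chain only makes tampering evident; it says nothing about who is allowed to append, which is where consensus comes in.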

Two critiques:

* Given that this is a hosted offering, this seems like an odd definition of decentralized. From my reading, a more accurate description would be: allows cross-partition queries. Edit: response below indicates this aspect is missing from the linked whitepaper but available elsewhere.

* The graph database section is hand-wavy. This seems like it could be similar to Datomic where with a small enough dataset you may have amazing performance, but at a large enough scale you will be lacking index-free adjacency.

If I understand properly, Fluree will partner with various "federated" enterprises to all run a Fluree ecosystem where everyone's data can be replicated, and queries can be served via multiple gateways ... in that sense Fluree will be decentralized.

Note that I think I heard this in the Epicenter podcast (https://www.youtube.com/watch?v=fdne9EvFNaw) featuring the Fluree founders. ... aahh, at this point in the podcast they talk about the federated/distributed Fluree: https://youtu.be/fdne9EvFNaw?t=3770

EDIT: pointer to "federated" / distributed database in podcast

Sybil attacks are real and are the biggest roadblock towards resilient decentralized networks. In FlureeDB, the blockchain is an attempt to solve Sybil attacks by attaching proof of work as reputation to counteract malicious nodes. While it does raise the bar significantly, it's not a panacea -- still more work to be done!
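For readers who have not seen proof of work up close, here is a generic hashcash-style sketch of why it raises the cost of spinning up fake identities: producing a valid nonce takes many hash attempts, while checking one takes a single hash. This is an illustration only, not FlureeDB's actual scheme; `mine`, `check` and the difficulty parameter are assumptions.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the front of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def _digest(payload: bytes, nonce: int) -> bytes:
    return hashlib.sha256(payload + nonce.to_bytes(8, "big")).digest()


def mine(payload: bytes, difficulty_bits: int) -> int:
    """Search for a nonce whose hash has at least difficulty_bits zero bits.

    Expected work is about 2 ** difficulty_bits hash attempts.
    """
    nonce = 0
    while leading_zero_bits(_digest(payload, nonce)) < difficulty_bits:
        nonce += 1
    return nonce


def check(payload: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Verification costs one hash, however long mining took."""
    return leading_zero_bits(_digest(payload, nonce)) >= difficulty_bits
```

For example, `nonce = mine(b"node-42", 12)` followed by `check(b"node-42", nonce, 12)`. Each extra difficulty bit doubles the expected mining cost, so an attacker who wants a thousand identities pays a thousand times the work. An attacker with enough hardware can still pay it, which is the "not a panacea" part.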

> a practical decentralized database did not exist before FlureeDB

It was hard to keep reading after this.

I’m not sure what you misunderstood; describing your skepticism would have added to the conversation. Having read beyond that statement, I’m actually intrigued. What “practical decentralized database” are you thinking of such that you didn’t need to read further?

Haven't had a chance to read the fluree paper, but does hyperledger fabric count?


It was the claim that no other practical decentralized databases exist that turned me off. If you want I'll name a few.

> If you want I'll name a few.

Yep, that’s what I’ve asked above. Might spur a conversation if your claim is legitimate.

Well, the one I use is CouchDB. It is very practical. Others I can think of quickly are CockroachDB, Cassandra, and Riak. And there are many others.

Are you possibly limiting the term decentralized to blockchain DBs? That would be a stretch. Decentralized DBs existed for a long time before blockchains.

I’m not limiting the term, that’s what the paper is about. And why I wanted to know more about your claim. Might be a good idea to finish the paper now ;). The paper is actually very interesting.

> Might be a good idea to finish the paper now ;).

I did read the entire paper and it was interesting.

Your more specific meaning of the word should have been spelled out before the claim. The meaning that I read was perfectly valid and shared by many others.

Again, it wasn’t my use of the term “decentralized”, it was the use of the term in the paper. I read past the phrase for which you claimed to have dismissed the paper and understood their use of “decentralized” pretty clearly, as you should have if you understand distributed computing. Having taken advanced distributed computing courses, I, honestly, never interpreted the use of “decentralized” as you did. But, yeah, in practice, I can see your confusion. Nevertheless, dismissing a paper on such pedantic terms as you originally did is probably not worth commenting unless you really have a good reason for doing so. And, then, it’s probably best to comment only after you have read the paper and then formed an informed response. Anyway, you keep escalating this, so I keep responding in kind to help you. At some point I have to be done with it. I honestly thought that you might have something useful to contribute due to your terse response, but I was clearly wrong ;).

> it wasn’t my use of the term “decentralized”

I apologize. I have a serious problem in forum discussions of not paying attention to who writes each post. I absorb all replies and then reply to the group. My bad. (Edit: And I thought you were the author).

>dismissing a paper on such pedantic terms as you originally did

I never dismissed the paper. Read my first post carefully. I enjoyed it and did read it before commenting. I was just pointing out my shock at the claim, which seemed ridiculous given my understanding.

> you keep escalating this

I thought I was participating in an argument about the meaning of the term "decentralized". I was defending my understanding to multiple people telling me I didn't know what I was talking about.

> Having taken advanced distributed computing courses

I have been working with databases for over 45 years. Most of that time decentralized meant lack of a centralized (single point of failure) computer, not a person or organization. It is only recently (10 to 15 yrs?) that the user meaning has appeared. Please forgive my classic interpretation.

Those are not decentralized.

Conflating distributed with decentralized

A db is either centralized with a master server or not. These definitely work without a master.

The tagline at the github repo at https://github.com/basho/riak is "Riak is a decentralized datastore from Basho Technologies". I don't think they would lie.

By decentralization they mean decentralization of control, which is not the same thing as in Riak, where control is very much centralized. Think distributed hash tables, like in bittorrent or, obviously, bitcoin.

Fun flashback to my Master's research paper where I wrote a system called FlurryDB (unrelated).

