Blockchains for Artificial Intelligence (bigchaindb.com)
121 points by trentmc on Jan 9, 2017 | 60 comments



This really doesn't make any sense.

>blockchains introduced three new characteristics: centralized / shared control, immutable / audit trails, and native assets / exchanges.

Blockchains aren't immutable, they are just expensive to mutate.

Blockchains aren't centralized.

Blockchains didn't introduce audit trails, these have always been possible simply through having a transaction table that is only appended to. This of course does require trust in the central authority.
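To make that concrete, the status quo is just an append-only table; here's a toy Python sketch (all names invented for illustration) where the trust assumption is visible in the code:

```python
import time

class AuditLog:
    """Append-only transaction table: rows are only ever added, never
    updated or deleted -- but readers must trust whoever operates it."""
    def __init__(self):
        self._rows = []

    def append(self, record: dict) -> int:
        row = {"seq": len(self._rows), "ts": time.time(), "data": record}
        self._rows.append(row)
        return row["seq"]

    def rows(self):
        # Read-only view for clients; the operator, however, could
        # rewrite self._rows at will -- that is the trust assumption.
        return list(self._rows)

log = AuditLog()
log.append({"from": "alice", "to": "bob", "amount": 5})
log.append({"from": "bob", "to": "carol", "amount": 2})
assert [r["seq"] for r in log.rows()] == [0, 1]
```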

>(4) Leads to provenance on training/testing data & models, to improve the trustworthiness of the data & models. Data wants reputation too.

Is training set fraud really an issue in training AI?

> (1) Leads to more data, and therefore better models.
> (2) Leads to qualitatively new data, and therefore qualitatively new models.
> (3) Allows for shared control of AI training data & models.

The author has a poor understanding of both AI and blockchain technology[1]. Blockchains are for decentralized consensus, but it seems the author is vaguely proposing using a blockchain as a mass datastore (with ownership labels) for both training data and AI algorithms.

Of course AI is an exciting field so this means you can generate hype by implying the field of AI has yet to solve the problem of sharing data with fellow researchers until now.

[1] https://download.wpsoftware.net/bitcoin/alts.pdf


Hi, it's the author here.

> This really doesn't make any sense.

Disagree. Below, I respond to each of your points.

> Blockchains aren't immutable, they are just expensive to mutate.

Agreed; very little is truly absolutely immutable. It's all shades of grey. I actually prefer the word "tamper-resistant" and I usually say that next to the "immutable" definition, such as in the first paragraph of https://bigchaindb.com/whitepaper. But "immutable" makes for a good shorthand, especially because that's the label that the community uses.
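A toy sketch of why "expensive to mutate" is the right framing: each block commits to the hash of the previous one, so rewriting anything but the tip invalidates every later link (plain Python, illustrative only):

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    # Deterministic serialization so the same block always hashes the same
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def make_chain(payloads):
    chain, prev = [], "0" * 64          # "0"*64 as a genesis placeholder
    for p in payloads:
        block = {"prev": prev, "payload": p}
        chain.append(block)
        prev = block_hash(block)
    return chain

def verify(chain) -> bool:
    prev = "0" * 64
    for block in chain:
        if block["prev"] != prev:
            return False
        prev = block_hash(block)
    return True

chain = make_chain(["tx1", "tx2", "tx3"])
assert verify(chain)
chain[0]["payload"] = "tx1-tampered"    # mutate history...
assert not verify(chain)                # ...and every later link breaks
```

Tampering is still possible, of course; it just requires re-hashing every subsequent block, and in a proof-of-work setting, redoing the work for each.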

> Blockchains aren't centralized.

Oops, that was a typo. I meant to say "decentralized". Fixed it. (That was a pretty big oops!)

> Blockchains didn't introduce audit trails ...

Correct.

> ... [status quo] require[s] trust in the central authority

Exactly. And it's crucial to note that once you don't have to trust a central authority to store your audit trail, you have a way more trustworthy audit trail that improves many applications and unlocks new ones.

> The author has a poor understanding of both AI and blockchain technology ... "treatise on altcoins"

Disagree. First, you don't need an "altcoin" to have a blockchain. To understand what's special about blockchains, you first have to understand what already exists for distributed databases, then what the delta is.

Second, consensus in a distributed database has been around for decades; Satoshi did not invent it. Lamport laid down much of the theory for FT and BFT consensus in 1982. What Bitcoin brought forward, in addition to BFT-ish consensus, was Sybil tolerance (addressing attack-of-the-clones).

Third, as for my understanding of AI: I've been doing it professionally since the late 90s; here are my publications: http://trent.st/publications. Doing AI in the 90s was one of the least popular things one could possibly do, so I certainly didn't do it for the hype.

> vaguely proposing using a blockchain as a mass datastore (with ownership labels)

Obviously I gave much more specific proposals than that. I have other writings that dive into more detail on some of the use cases, such as an IP registry [1] and for AI DAOs [2].

[1] https://medium.com/ipdb-blog/a-decentralized-content-registr...
[2] https://medium.com/@trentmc0/ai-daos-and-three-paths-to-get-...


> First, you don't need an "altcoin" to have a blockchain.

Hmm, you're treading on thin ice here. What exactly do you mean by a "blockchain", then? I tend to work on the assumption that people are talking about something closer to the whole Satoshi consensus stack on top of hash-linked lists (the _actual_ blockchain), rather than just the hash-linked list data structure. Which segues into: Satoshi consensus was designed against a stringent set of requirements, and the security model for the algorithm he devised absolutely demands a miner reward of some sort, denominated in the same currency as that being used in transactions (in those circumstances, double-spend attacks can be proved irrational once a transaction is deep enough inside the chain).

I'm not saying you're flat-out wrong, but you do need to specify which properties of Satoshi consensus you are willing to discard to make that statement true.


Throw a rock in the blockchain space and you'll find a different definition of blockchains, or what it means for one thing to be a blockchain or not. It doesn't really help anyone to argue for hours on end over this, when it's largely a matter of opinion. To me, what's more interesting is to identify what are the new characteristics (in terms of benefits) that blockchains have, above and beyond traditional distributed databases; which in turn unlock new applications or improve existing applications. To me, those characteristics are: decentralized, immutable, assets; I describe them in the article, and also in more detail in https://bigchaindb.com/whitepaper.


> To me, those characteristics are: decentralized, immutable, assets;

How do you achieve fully trustless decentralisation without the currency aspect, while still being resilient to Sybil attacks? Or are you willing to sacrifice trustlessness? In which case, how are you defining decentralisation?


I define "decentralized" as "no single entity owns or controls". It can be further distinguished with "server-based decentralization" and "server-free decentralization" [1].

You only need to be Sybil tolerant if you want your validating nodes to be anonymous. There's certainly some applications where that's useful. But it's not a requirement for being decentralized. (Some will argue otherwise, and that's ok; once again it depends how you define "decentralized"; my approach is about what benefits come to the application.)

[1] https://blog.bigchaindb.com/the-dcs-triangle-5ce0e9e0f1dc


Care to link to sources for server-based and server-free decentralisation that aren't your own blog posts?

(Also, that blog post you linked is flat out wrong -- Bitcoin, Ethereum, et al are all instances of eventually consistent systems that can and do prevent double spends)

Still -- if you're willing to sacrifice anonymity for the sake of avoiding Sybil attacks, then how do your nodes federate without a central authority?

This is why having precise definitions matters: With each question I ask, we're eroding away the guarantees that such a system provides, and the implementation requirements with them. At what point do we just give up on "blockchains", and just adopt a run-of-the-mill distributed log-based db instead, which gives you BFT and immutability, and where the asset layer is easy enough to add on top?


Alternatively this is why definitions don't matter at all -- it only matters what guarantees (and other properties) a system provides, and not at all what we call it.


That's a fair point too.

I think we're in agreement, though: From the beginning, this discussion is about precisely that -- the word "decentralisation" usually entails a certain set of very specific properties so which properties, exactly, are we talking about here?


Ditto on agreement about what properties (including guarantees) a system provides.

To me, definitions are nonetheless useful to help summarize a set of properties, help communication, and more. Even if people have different definitions, the definitions typically have similar themes; they're not totally arbitrary. For example, with "decentralization" you'd find a lot of people agreeing that "no single entity owns or controls" within the same theme as other definitions.


Indeed.


> Care to link to sources for server-based and server-free decentralisation that aren't your own blog posts?

The best precedent is simply the long-standing difference between servers and clients in computing systems.

In blockchain discourse, this difference had not been acknowledged as much; though of course a similar pattern exists between full nodes vs light/SPV clients.

To my knowledge, no one else had made the distinction between "server-based" and "server-free" as different types of decentralization. It is a useful distinction, as the article discusses.

> Also, that blog post you linked is flat out wrong -- Bitcoin, Ethereum, et al are all instances of eventually consistent systems

As the article states, they can and do prevent double spends, so we agree there. But that's not what "consistent" means in a CAP setting. Quoting the article: "they never have a deterministic guarantee of a consistent order; they're only eventually consistent (in a probabilistic sense). But let's be generous and call them consistent, because in practice they are used that way, the workaround being higher latency as one waits for a sufficiently high probability of avoiding inconsistency."
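For intuition on "higher latency as one waits": the Bitcoin whitepaper (section 11) gives the probability that an attacker controlling a fraction q of the hash power ever overtakes the honest chain from z confirmations behind. A direct transcription in Python:

```python
import math

def attacker_success(q: float, z: int) -> float:
    """Probability an attacker with hash-power fraction q ever catches
    up from z blocks behind (Bitcoin whitepaper, section 11)."""
    p = 1.0 - q
    lam = z * q / p
    total = 0.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        total += poisson * (1.0 - (q / p) ** (z - k))
    return 1.0 - total

# Waiting for more confirmations trades latency for (probabilistic)
# consistency: the attacker's odds fall off fast with depth.
assert attacker_success(0.1, 0) == 1.0
assert abs(attacker_success(0.1, 1) - 0.2045873) < 1e-5
assert attacker_success(0.1, 6) < 0.001
```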

> how do your nodes federate without a central authority?

Each node votes on any transaction coming through. The transaction only clears if it gets enough positive votes.
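Roughly, in sketch form (the real BigchainDB voting protocol is more involved; the 2/3 quorum fraction here is just an example value):

```python
import math

def transaction_clears(votes: dict, quorum: float = 2 / 3) -> bool:
    """votes maps node-id -> True/False; the transaction clears only
    if at least a quorum of the federation votes yes."""
    yes = sum(1 for v in votes.values() if v)
    return yes >= math.ceil(quorum * len(votes))

federation = ["n1", "n2", "n3", "n4", "n5"]
assert transaction_clears({n: True for n in federation})
assert not transaction_clears({"n1": True, "n2": True,
                               "n3": False, "n4": False, "n5": False})
```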

> just adapt a run of the mill distributed log-based db instead ...

Well, in some cases that's all that people actually need; I sometimes find myself referring people to Kafka and the like.

But Kafka and the like are still controlled by a single admin; you can do more to decentralize. As for immutability, it's all shades of grey, and certainly being a log-based db (append-only) helps a lot. You can do more with Merkle DAGs, continuous backup to write-once media, etc.

To me it's not about "eroding" guarantees. It's about saying "ok, I have this database, what properties do I want?" The potential results might be blockchain-like or not. If decentralization, immutability, or assets are potentially interesting, then a blockchain technology could be interesting. Otherwise it comes down to other questions to choose among traditional DBs.


> As the article states, they can and do prevent double spends, so we agree there. But that's not what "consistent" means in a CAP setting.

You're the one who asserted that CAP-style consistency is required to prevent double-spends. Eventual consistency is a weakened form of consistency in the CAP sense. Both Bitcoin and Ethereum have eventual consistency as much more than a "theoretical" concern. In your own words: "But let’s be generous and call them consistent, because in practice they are used that way, the workaround being higher latency as one waits for a sufficiently high probability of avoiding inconsistency." The only way this is true is if you accept latencies measured in hours. For real-world applications, you absolutely need to deal with the eventual consistency (and, in fact, I've written several applications that deal with precisely that).

> Each node votes on any transaction coming through. The transaction only clears if it gets enough positive votes.

I didn't ask how you establish consensus. I asked how the nodes federate -- if a node tries to peer with you, how do you decide whether or not to accept the node? You suggested anonymity is out the window, so who controls node identity?


> You're the one who asserted that CAP-style consistency is required to prevent double-spends

Could you point me to where? I like my thinking and expression to be consistent (pun intended:)

> I asked how the nodes federate .. who controls identity?

Each node has a list of the public keys of other nodes. There are various ways to handle key distribution, of course.
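As a sketch of that gatekeeping (stand-in byte strings instead of real keys; in practice you'd use e.g. Ed25519 public keys, and key distribution happens out of band):

```python
import hashlib

# Hypothetical federation config: this allowlist of member public-key
# fingerprints IS the gatekeeper for who may join the federation.
ALLOWED = {hashlib.sha256(k).hexdigest()
           for k in (b"node-a-pubkey", b"node-b-pubkey")}

def accept_peer(claimed_pubkey: bytes) -> bool:
    """Accept a peering request only if the peer's key is on the
    pre-agreed list; no proof-of-work or token is involved."""
    return hashlib.sha256(claimed_pubkey).hexdigest() in ALLOWED

assert accept_peer(b"node-a-pubkey")
assert not accept_peer(b"mallory-pubkey")
```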


From the blog post you linked three comments up or so:

> Big “C” means all nodes see the same data at the same time. Being big-C consistent is a prerequisite to preventing double-spends, and therefore storing tokens of value. There are many models of consistency; we mean “consistent” in the same way that the CAP theorem does, except a little bit more loosely (for pragmatism). Little “c” means strong eventual consistency, such that when data gets merged there are no conflicts, but not consistent enough to prevent double spends.

> there are various ways to handle key distribution, of course

Right. Having the public keys advances the discussion precisely nothing, because public key auth is pretty much a given if we're talking about the servers identifying themselves.

What tells us if we have a decentralised system with no central authority is: who's the gatekeeper? Who controls key distribution, and which keys are accepted into the pool?


> Throw a rock in the blockchain space and you'll find a different definition of blockchains, or what it means for one thing to be a blockchain or not. It doesn't really help anyone to argue for hours on end over this, when it's largely a matter of opinion.

It is true that "blockchain" is increasingly a literally meaningless marketing buzzword (e.g. R3 Corda, a "blockchain" product that brags of not actually containing a blockchain). However, this doesn't really help convince.


> increasingly a literally meaningless marketing buzzword

Agree, "blockchain" is getting more diluted.

To me, it still has enough information content to be useful. Just because different people have different definitions doesn't mean the information content is zero. People may agree on the broad ideas and disagree about the specifics. And as discussed in another comment, definitions are nonetheless useful to help summarize a set of properties, help communication, and more.

For example, I think most people would agree a blockchain needs to have the property of having a consensus mechanism. Versus say a file system which typically doesn't.


Right. So what about your system is "blockchain" that, say, rsynced git repos (Merkle trees! Globally-identified hashes!) wouldn't be by that definition?


IPFS is very close to what you describe (Merkle trees! Globally-identified hashes!) but it doesn't claim to be a blockchain because it doesn't have a consensus mechanism, let alone a way to prevent double spends.

And that's ok! It's designed for different problems, like file system style blob storage. It's not about "whether something is a blockchain", it's about "what's the right tool for the job"? I see value in complementary decentralized pieces of file systems, databases, processing, and more.
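A toy illustration of the difference: in a content-addressed store (IPFS-style), the key is the hash of the value, so reads are self-verifying without trusting the server, but nothing orders writes or prevents double spends (Python, illustrative only):

```python
import hashlib

class ContentStore:
    """Toy content-addressed blob store: the key IS the hash of the
    value, so any reader can verify integrity -- but there is no
    consensus here, hence no transaction ordering."""
    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        cid = hashlib.sha256(data).hexdigest()
        self._blobs[cid] = data
        return cid

    def get(self, cid: str) -> bytes:
        data = self._blobs[cid]
        # Self-verifying read: recompute the hash and compare.
        if hashlib.sha256(data).hexdigest() != cid:
            raise ValueError("content does not match its address")
        return data

store = ContentStore()
cid = store.put(b"training-set-v1")
assert store.get(cid) == b"training-set-v1"
```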


IPFS is very close to BitTorrent entirely using magnet: links, which does indeed have something close to a consensus mechanism (everything is checksummed to the hilt).

> It's not about "whether something is a blockchain"

It arguably is when you call your post "Blockchains for Artificial Intelligence".


> have something close to a consensus mechanism

For this context it's really about whether or not it can prevent double spends. Currently, it can't. Though its protocol stack has a place for consensus algorithms to achieve CAP-style strong consistency; and the IPFS team is working on consistency algorithms.

> It arguably is when ...

Point taken. The post is about things that many people consider to be blockchains or at least blockchain-like; BigchainDB, Ethereum, and others are in that category.


>Disagree. First, you don't need an "altcoin" to have a blockchain.

In a public permissionless blockchain the token is a system-native item of value used to compensate miners/stakers/validators for securing the system. If you remove it from the architecture then you need a compelling answer to the question, how is the system secured? I haven't heard such an answer yet, though it may be out there and I just haven't come across it yet. Have you?

Essentially what I'm looking for is an explanation of a tokenless security-model in similar depth as this explanation of Bitcoin's security model:

http://www.coindesk.com/bitcoins-security-model-deep-dive/

Haven't seen that yet.


To me, incentives can be intrinsic (built into the system) or extrinsic (outside the system). Each has pros and cons. Bitcoin is intrinsic, based on mining reward and tx fees. Traditional private blockchains are extrinsic, where the folks running the network are incentivized to pay for hardware based on cost savings, higher profits, etc, ie something extrinsic to the network itself. IPDB is also extrinsic where the folks running the network participate because they care about the future decentralized internet, and their costs of running a node are covered by SaaS-style transaction fees.


> Agreed; very little is truly absolutely immutable

Dissonance leads to conflicted statements. Parsing conflicted statements often leads to dissonance. Tread carefully.

The only absolute truth is that I exist. That fact, while I'm observing it, is immutable. The cost of this immutability is my life, or my life's suffering in aggregate. The fact my life will end at some point means that, when viewed in aggregate, all existences are mutable at some point.

Which means that nothing is permanent, which means your statement is in conflict. Conflicted statements may be parsed and proved wrong. It is my opinion your thoughts here are valuable and should not be proved wrong until explored fully.

As long as the cost of changing the blockchain is higher than the return on changing it, the chain remains immutable. It may turn mutable someday, but that doesn't mean it's not immutable right now. Anyone arguing contrary to these evident facts is practicing/exploiting inefficiencies in the system.


>As long as the cost of changing the blockchain is higher than the return on changing it, the chain remains immutable

I'm sorry, but that is not a logically correct statement; it's a statement of rational behaviour - I give you that, but there is no bar to an irrational actor altering the blockchain purely out of spite or caprice - possibly incurring mortal harm in the process.

>It may turn mutable someday, but that doesn't mean it's not immutable right now.

Again - this is not a logically correct statement.

canBeMutable(blockchain) -> immutableNow(blockchain)

how so?

>Anyone arguing contrary to these evident facts is practicing/exploiting inefficiencies in the system.

Can you explain what "practicing/exploiting inefficiencies in the system" is and why it's wrong?


> > As long as the cost of changing the blockchain is higher than the return on changing it, the chain remains immutable
> I'm sorry, but that is not a logically correct statement

I'd love to see a logical argument presenting how a hypothetical irrational actor can alter a blockchain of significant value without either a hypothetical attack vector or a few billion years compute time.


Well, what about collusion of the mining pools in China?


"China" is an aggregate of consensus and cannot act "alone" as an irrational actor. If they colluded to change a transaction, there would be a net split and everyone would know. Again, I'm standing by the cost analysis of acting as an irrational actor indicating those actors will not act unless the aggregate itself is irrational, which is a rare and separate issue from the one being discussed.


> They’re Software-as-a-Service on steroids.

I've been writing about this for a while and I'm thankful someone else "gets it" on a level. The key to understanding why blockchain data structures are a big deal for software (including AI) is realizing how they enable change in business models and the more customer focused "business layers" in which those models operate.

It's worth pointing out that SaaS is an evolutionary step up from the older MSP models. Assuming software models do not change over time is dogmatic.

Blockchains, like Bitcoin, enable a means by which suffering/cost of work can be encapsulated in a transaction that cannot be altered later by the consumer of that suffering cost or the producer of that suffering cost. This means the suffering (work units) incurred by the AI (or the human involved) can be preserved in a way that is immutable and used later for efficiency improvements. That's not to say the work done by the AI is necessarily valuable, but by making it immutable the work done by the AI can be judged/measured to have been worth the amount of resources it took from something else (i.e. an other's suffering) to do the work it did.

It is my belief that adversarial machine learning may benefit from blockchain based data stores.

All of this is very important given our current infrastructure, and all software models that go with it, resemble Swiss cheese from a security perspective. Super viral growth models may make VCs and a few scrappy millennials wealthy, but they don't scale long term for customer satisfaction. We certainly don't want an AI going around creating super viral growth models that drain the world's economy either!

Blockchains enable "do better". "Do better" is the first step on the path to enlightenment.

I should note that my use of the term "suffering" relates to the cost of causality, either by one's own choice or by something outside one's own choice. It is not meant to refer to the human emotion of suffering, although the two can certainly be related.


But the definition of work changes and the value of that work also changes. Bitcoin's internal self-referentiality and internally self-transparent genesis sidesteps this issue.

"This means the suffering (work units) incurred by the AI (or the human involved) can be preserved in a way that is immutable and used later for efficiency improvements. That's not to say the work done by the AI is necessarily valuable, but by making it immutable the work done by the AI can be judged/measured to have been worth the amount of resources it took from something else (i.e. an other's suffering) to do the work it did."

I think you are alluding to that fact in that above quote, but I think it still begs the question of preserving a standard, abstract 'value of work' across a constantly changing AI dataset.


As I see it, the main challenge with much of this proposal is that it isn't clear how to incentivize the kind of broad sharing much of it needs (for training). There's some incentive, in the sense that if you participate you reap AI improvements, but traditional (and regulated) players will be hesitant, on pure risk-management terms (and conservatism).


It's the author here. (Hi!)

I agree, for the use cases that involve sharing, how do you incentivize this when many see data as the moat? This is why I wrote the section about "Hoard vs. share?". In some applications, there will be more incentives to hoard; in others, to share.

Note that one thing that softens this impasse is the idea of data exchanges. So rather than "hoard vs share" you can think of all data as potentially "for sale, but with a price" and where the price could vary dramatically. And that's where a data exchange could come in to reduce friction for price discovery.


I don't understand the assertion made regarding blockchains/digital signatures supposedly increasing accuracy or trust in IoT sensor-reading reliability.

Also, how do you respond to the inevitable criticism that advertising your solution as a "shared global registry" invokes the "How Standards Proliferate" XKCD[1]?

Finally, I like the idea of creating a marketplace or exchange for AI training information, but it's strictly not necessary to use a blockchain for this purpose unless a very large portion of your users are hyper-concerned about their anonymity. It's also misleading to try to build a marketplace of signed IoT data, with the assumption that it is more trustworthy than unsigned IoT data, when that is simply false.

[1]: https://xkcd.com/927/


Thanks for the thoughts. I'll respond to each point in turn.

> I don't understand the assertion made regarding blockchains/digital signatures supposedly increasing accuracy or trust in IoT sensor-reading reliability.

If each IoT sensor has a known public key then we can know that the data came from that sensor.

It doesn't solve all reliability problems of course. E.g. if you have a hardware failure, a digital signature ain't gonna help:)
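As a sketch (using HMAC with a per-sensor secret as a stand-in for a real asymmetric signature, purely for illustration; a real deployment would use a keypair so verifiers only need the public half):

```python
import hashlib
import hmac
import json

# Hypothetical registry: each sensor gets a factory-provisioned key.
SENSOR_KEYS = {"sensor-42": b"factory-provisioned-secret"}

def sign_reading(sensor_id: str, reading: dict) -> dict:
    payload = json.dumps(reading, sort_keys=True).encode()
    tag = hmac.new(SENSOR_KEYS[sensor_id], payload, hashlib.sha256).hexdigest()
    return {"sensor": sensor_id, "reading": reading, "sig": tag}

def verify_reading(msg: dict) -> bool:
    payload = json.dumps(msg["reading"], sort_keys=True).encode()
    expect = hmac.new(SENSOR_KEYS[msg["sensor"]], payload,
                      hashlib.sha256).hexdigest()
    return hmac.compare_digest(expect, msg["sig"])

msg = sign_reading("sensor-42", {"temp_c": 21.5, "ts": 1483920000})
assert verify_reading(msg)
msg["reading"]["temp_c"] = 99.9   # tampered in transit
assert not verify_reading(msg)
```

Note this proves where the data came from, not that the reading is correct; as above, a hardware failure signs its garbage just as happily.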

> Also, how do you respond to the inevitable criticism that advertising your solution as a "shared global registry" invokes the "How Standards Proliferate" XKCD[1]?

I get this. What's cool is that IPDB doesn't need to be (and shouldn't be) the "one network to rule them all". The path is to leverage other existing standards and new connectivity standards, so that IPDB (or whatever registry) plays well with other emerging & existing registries.

The main lesson is from the internet itself, which was born by combining disparate networks (ARPANET, NSFNet, etc.) that didn't previously communicate, via the invention of TCP/IP. All the networks had to do was alter their top-level protocol and suddenly they could talk to other nets. There's a similar protocol for blockchain tech, called Interledger (https://www.interledger.org). Whereas TCP/IP connects networks at the data level (you can simply re-send packets, so it doesn't need to account for double-spending), Interledger does account for it. You can view it as a way to connect networks of value.

Interledger has already been used to connect Bitcoin, Ethereum, and many centralized payment networks.

There are other protocols that are closer to the data level. E.g. simply using JSON-LD helps. And IPLD on top of that (roughly, a Merkle-ized JSON-LD). And domain-specific value transfer protocols above that, like COALA IP for intellectual property (which plays well with existing protocols like DDEX for music, PLUS for photography).

> Finally, I like the idea of creating a marketplace or exchange for AI training information, but it's strictly not necessary to use a blockchain for this purpose

Agreed; and I wrote about it: "An exchange could be centralized, like DatastreamX already does for data. But so far, they are really only able to use publicly-available data sources, since many businesses see more risk than reward from sharing. What about a decentralized data & model exchange? ..." and I went on to list some benefits.

> unless a very large portion of your users are hyper-concerned about their anonymity.

To me, there are much greater benefits. I'm hoping that the biggest benefit will be to further catalyze a truly open data market.

> It's also misleading to try to build a marketplace of signed IoT data, with the assumption that it is more trustworthy than unsigned IoT data, when that is simply false.

I'm not sure why you say this; knowing the provenance of the data has clear trust-related benefits in some cases like I described. Obviously just standing on its own, it has no benefit, it's all about usage farther along in the pipeline.


Agree that it's very unlikely for currently closed big datasets (e.g. those that give Google and Facebook their competitive advantage) to be opened for this. But if existing shared-but-siloed datasets (e.g. ImageNet and Kaggle data) were moved (or copied) to the "centralised" decentralised database (BigchainDB) then there might be some benefit.


It's probably one of those things where there will be something disruptive like an Uber or AirBnB that forces their hand.


I'm looking forward to seeing these. It's gonna be fun:)


Just an observation on ML--

If the key to making ML useful has been huge sets of training examples, then that shows that current ML algorithms are much worse than humans at generalizing from a small number of examples. That suggests that current ML algorithms are leaving a lot on the table in terms of how much they are able to learn from each example, and there is a lot of room for improvement.


You're absolutely right - there is a lot of room for improvement. One of the big research threads in ML is better unsupervised learning - to learn from data without labels. Another hot topic is adversarial networks, which generate their own training data by playing against each other. (15 years ago this was "competitive co-evolution";)

> If the key to making ML useful has been huge sets of training examples

BTW this is only for some ML approaches and problems. There are some problems where getting more data is just way too expensive; e.g. building a model from silicon manufacturing where each mfg. run costs $50M; i.e. each datapoint costs $50M. Guess what - there are still useful ML-y things you can do here. Less extreme, there are many problems with only 100-1000 training points available, and you can still do a lot.


Big Data + Blockchain + AI looks quite hand-wavy. Have you found a practical use case for BigchainDB yet?


Articles like the "AI + Blockchain" one are meant to be forward-looking, to help inspire people to build. Call it hand-wavy if you wish; I call it laying out a vision:)

But we're all about real apps getting built. Many companies are building on BigchainDB. For example, I describe some of them here: https://blog.bigchaindb.com/where-does-blockchain-scalabilit....

There are many, many more companies that have not publicly announced what they're building yet. Stay tuned:)


I have also thought about the applications blockchains could have for AI. However I didn't consider blockchains as a means to store data but rather as a way to reliably track the performance/quality of various AI services. You can read about it here: http://jcfrei.com/the-ai-economy-bitcoin/


Oh cool! Thanks for the link. I just added it to the "further reading" section on the bottom.


Just off the top of my head:

* Why use a blockchain rather than a database?

* Why do you require blockchain rather than some other approach?

* Are you positing this as open to the hostile world? If so, why? If not, why use a blockchain?

* If you're open to the hostile world: what's your threat model, and how are you addressing it?

* How do your defences stop this from being the next Tay?


As the article discusses, it's not an either-or. Blockchains are simply databases with specific properties; if some of those properties help your app, consider using one. They are sometimes deployed to the public, sometimes private, with appropriate models. Something like Tay would be higher in the stack - we're gonna see a lot more, some good, some bad, some ugly.


Regarding "hostile world": I recently learned that on private networks where all participating nodes are authenticated, blockchains can replace proof-of-(work|stake|...) with faster and more efficient consensus algorithms [1] (Raft is the one I remember). What's the use case then? At least for older data (in blocks far away from the current one), a blockchain is an immutable database. On the other hand, it's fun operating with buzzwords.

[1] http://ieeexplore.ieee.org/document/7467408/?reload=true


Pretty much none. The very funniest one is Intel's "Proof of Elapsed Time", which might as well be called Proof of Buying An Intel CPU. Rather than have miners compete to produce the next block, a timer running in an environment secured by a DRM mechanism built into your Intel CPU picks if you get to do the next block. The white paper is an extended advertisement for Intel® Software Guard Extensions™ (SGX™). Also, they only have a simulated Proof of Buying An Intel CPU mechanism as yet.

https://intelledger.github.io/introduction.html

This doesn’t provide any security against malicious participants; the excuse is that private blockchains need speed over security. You might think that at that point you don’t need a blockchain at all, but you’re hardly going to sell any consultant hours with that sort of thinking.


This article is confusing and contradictory in so many ways. To start with, I was expecting some kind of trustless, self-contained system whereby everything can be independently verified and concretely recorded in an autonomous system similar to a blockchain. But instead what OP seems to be describing here is more like a distributed database for storing AI training-set data (similar to Storj), where he uses the blockchain more as a metaphor than anything specific.

To give an example: he uses the distributed nature of the blockchain (full copies of the chain stored on multiple nodes) as a model for storing training-set data in a distributed system, and the benefit he states is that, since the infrastructure isn't controlled by any one party, organizations will be more inclined to share.

The author then goes on to describe how the integrity of training-set data can be attested to in the form of ... "I believe this data / model to be good at this point", which is about as far from verifiable as you can get. The problem is that the article is very high-level and it's filled with meaningless buzzwords and double-speak like "DAO" ... which is apparently something that can be applied both to a human-based reputation system and to a blockchain (hence useless).

There are ways to use artificial intelligence as [part of] the reward mechanism for a blockchain, by creating an initial AI to recognize specific types of content and then rewarding miners for finding said content ("datachains" in data mining). And there are also trustless ways of rewarding people for creating the best general-purpose intelligence for understanding a given type of data. But from the author's account it's very hard to tell whether we're talking about anything concrete here, as the article is so vague on technical details.

He does go into detail on several of these ideas. But none of the ideas stated are discrete, trustless, self-contained systems like a blockchain (or even really need a blockchain), which is a shame, because AI, data, and blockchains can actually be unified to create bullet-proof entities (which in my opinion will be much more purpose-specific than any vague Ethereum-style "AI DAO" nonsense).

Edit: The author's treatment of AI DAOs is worth the read.


I noticed that BigchainDB is written entirely in Python. I love Python, but I'm surprised to see it used in the database space!

(if the author is reading this) - what was your reasoning for using Python? Do you have any concerns about its performance/scalability as the service grows?


Hi, this is the author:)

I've been using Python in enterprise-grade, large-scale production systems since the early 2000s. And it works well. Most Python libraries where performance matters have C under the hood, e.g. NumPy matrix manipulation. Parallel computing got way better with async support.
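The "C under the hood" point in a nutshell: a NumPy call and a pure-Python loop compute the same result, but NumPy executes the inner loop in compiled C. This is just an illustrative sketch of that claim, not BigchainDB code:

```python
import numpy as np

# Pure-Python dot product: interpreted, one bytecode dispatch per element.
def dot_py(a, b):
    return sum(x * y for x, y in zip(a, b))

a = list(range(1000))
b = list(range(1000))

# NumPy dot product: same computation, executed by compiled C code.
result_np = int(np.dot(np.array(a), np.array(b)))

assert dot_py(a, b) == result_np  # identical results; NumPy is far faster at scale
```

For large arrays the vectorized version is typically orders of magnitude faster, which is why numeric Python code avoids element-wise Python loops.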

As for BigchainDB itself: it wraps RethinkDB (and soon MongoDB), which is where much of the compute-intensive work happens. We also leverage those fast Python+C libraries, plus parallelization. If and when needed, we can always convert some of our core code to C or another fast language.


cool. thanks for your explanation


BigchainDB's main product is actually a cluster of something like MongoDB. Read the whitepaper; it was disappointing.


Hi, it's Trent here, I'm CTO of BigchainDB and also the author of the blockchains-AI article.

Yep, it's a [database] cluster, and proud to be that:)

But, it's special in that there isn't a single organization that owns or controls it. That is, it's decentralized.

And in fact it draws on traditional distributed DBs by building on them, to get first-class scalability and querying abilities. The first version was built on RethinkDB and we're working on wrapping MongoDB too.

There's also a public version of BigchainDB that's getting rolled out, called IPDB (http://ipdb.foundation). With IPDB, people building apps don't need to roll their own cluster, they can just talk to it via http.

> was disappointing

How? We're always looking for feedback.


Yeah, I get that, but a single "drop tables *;"-style query against the underlying cluster database behind BigchainDB (MongoDB or whatever you're using) would actually delete your so-called blockchain. That's a huge security hole which, last time I checked BigchainDB, would be solved "administratively", i.e. at the operational-security level instead of in the protocol.

Not to sound like an ass though, I do keep an eye on bigchaindb for what it is to become as the people behind it do seem to have great talent.

Has that security issue been resolved? How is IPDB secured?


We acknowledge this attack vector and others, and list them on GitHub. Addressing them is on our roadmap. The current schedule: we're wrapping up a base MongoDB integration (next couple of weeks); on the heels of that we'll improve the MongoDB replication (a couple of months); and after that we'll address the "drop" issue (and other unwanted commands) via a wire-protocol firewall. More security work to follow.

BTW thanks for the kind words about our team. I think they're amazing too:)


It's a project for decentralizing existing big data databases... Not sure where you've set your expectations?


I've set them at the blockchain level, so... meh. PostgreSQL can also be distributed with Paxos.


To clarify: "distributed" and "decentralized" are different. "Distributed" spreads compute among many physical resources, but may still have a single entity controlling it. "Decentralized" has no single entity controlling it. Paxos distributes but does not decentralize.


What about AI for blockchains?


Great question. As the article says: there are many ways that AI can help blockchains, such as mining blockchain data (e.g. Silk Road investigation). That’s for another discussion:)

I'd love to see more suggestions here or elsewhere. It would be my pleasure to write an article on this.



