>blockchains introduced three new characteristics: centralized / shared control, immutable / audit trails, and native assets / exchanges.
Blockchains aren't immutable, they are just expensive to mutate.
Blockchains aren't centralized.
Blockchains didn't introduce audit trails; these have always been possible simply through having a transaction table that is only appended to. This of course does require trust in the central authority.
>(4) Leads to provenance on training/testing data & models, to improve the trustworthiness of the data & models. Data wants reputation too.
Is training set fraud really an issue in training AI?
>(1) Leads to more data, and therefore better models.
>(2) Leads to qualitatively new data, and therefore qualitatively new models.
>(3) Allows for shared control of AI training data & models.
The author has a poor understanding of both AI and blockchain technology. Blockchains are for decentralized consensus, but it seems the author is vaguely proposing using a blockchain as a mass datastore (with ownership labels) for both training data and AI algorithms.
Of course AI is an exciting field so this means you can generate hype by implying the field of AI has yet to solve the problem of sharing data with fellow researchers until now.
> This really doesn't make any sense.
Disagree. Below, I respond to each of your points.
> Blockchains aren't immutable, they are just expensive to mutate.
Agreed; very little is truly absolutely immutable. It's all shades of grey. I actually prefer the word "tamper-resistant" and I usually say that next to the "immutable" definition, such as in the first paragraph of https://bigchaindb.com/whitepaper. But "immutable" makes for a good shorthand, especially because that's the label that the community uses.
> Blockchains aren't centralized.
Oops, that was a typo. I meant to say "decentralized". Fixed it. (That was a pretty big oops!)
> Blockchains didn't introduce audit trails ...
> ... [status quo] require[s] trust in the central authority
Exactly. And it's crucial to note that once you don't have to trust a central authority to store your audit trail, you have a way more trustworthy audit trail that improves many applications and unlocks new ones.
> The author has a poor understanding of both AI and blockchain technology ... "treatise on altcoins"
Disagree. First, you don't need an "altcoin" to have a blockchain. To understand what's special about blockchains, you first have to understand what already exists for distributed databases, and then what the delta is.
Second, consensus in a distributed database has been around for decades; Satoshi did not invent it. Lamport laid down much of the theory for FT and BFT consensus in 1982. What Bitcoin brought forward, in addition to BFT-ish consensus, was Sybil tolerance (addressing attack-of-the-clones).
Third, as for my understanding of AI: I've been doing it professionally since the late 90s; here are my publications: http://trent.st/publications. Doing AI in the 90s was one of the least popular things one could possibly do, so I certainly didn't do it for the hype.
> vaguely proposing using a blockchain as a mass datastore (with ownership labels)
Obviously I gave much more specific proposals than that. I have other writings that dive into more detail on some of the use cases, such as an IP registry and AI DAOs.
Hmm, you're treading on thin ice here. What exactly do you mean by a "blockchain", then? I tend to work on the assumption that people are talking about something closer to the whole Satoshi consensus stack on top of hashed linked lists (the _actual_ blockchain), rather than just the hashed linked list data structure. Which segues into the next point: Satoshi consensus was designed against a stringent set of requirements, and the security model of the algorithm he devised absolutely demands a miner reward of some sort, denominated in the same currency as that being used in transactions (under those circumstances, double-spend attacks can be proved irrational once a transaction is deep enough inside the chain).
I'm not saying you're flat-out wrong, but you do need to specify which properties of Satoshi consensus you are willing to discard to make that statement true.
How do you achieve fully trustless decentralisation without the currency aspect, while still being resilient to Sybil attacks? Or are you willing to sacrifice trustlessness? In which case, how are you defining decentralisation?
You only need to be Sybil tolerant if you want your validating nodes to be anonymous. There's certainly some applications where that's useful. But it's not a requirement for being decentralized. (Some will argue otherwise, and that's ok; once again it depends how you define "decentralized"; my approach is about what benefits come to the application.)
(Also, that blog post you linked is flat out wrong -- Bitcoin, Ethereum, et al are all instances of eventually consistent systems that can and do prevent double spends)
Still -- if you're willing to sacrifice anonymity for the sake of avoiding Sybil attacks, then how do your nodes federate without a central authority?
This is why having precise definitions matters: With each question I ask, we're eroding away the guarantees that such a system provides, and the implementation requirements with them. At what point do we just give up on "blockchains", and just adapt a run of the mill distributed log-based db instead, which gives you BFT and immutability, and where the asset layer is easy enough to add to the top?
I think we're in agreement, though: from the beginning, this discussion is about precisely that -- the word "decentralisation" usually entails a certain set of very specific properties. So which properties, exactly, are we talking about here?
To me, definitions are nonetheless useful to help summarize a set of properties, help communication, and more. Even if people have different definitions, the definitions typically have similar themes; they're not totally arbitrary. For example, with "decentralization" you'd find a lot of people agreeing that "no single entity owns or controls" within the same theme as other definitions.
The best precedent is simply the long-standing difference between servers and clients in computing systems.
In blockchain discourse, this difference has not been acknowledged as much, though of course a similar pattern exists between full nodes and light/SPV clients.
To my knowledge, no one else had made the clarification of "server-free" vs "server-less" as different types of decentralization. It is a useful distinction as the article discusses.
> Also, that blog post you linked is flat out wrong -- Bitcoin, Ethereum, et al are all instances of eventually consistent systems
As the article states, they can and do prevent double spends, so we agree there. But that's not what "consistent" means in a CAP setting. As the article states, "they never have a deterministic guarantee of a consistent order; they’re only eventually consistent (in a probabilistic sense). But let’s be generous and call them consistent, because in practice they are used that way, the workaround being higher latency as one waits for a sufficiently high probability of avoiding inconsistency."
> how do your nodes federate without a central authority?
Each node votes on any transaction coming through. The transaction only clears if it gets enough positive votes.
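To make that concrete, here is a minimal sketch of quorum-style clearing. The simple-majority threshold and the vote format here are assumptions for illustration, not the specification of any particular system:

```python
# Toy sketch of quorum-based transaction clearing (illustrative only;
# real systems layer signatures and transaction validation on top).

def clears(votes, total_nodes, threshold=0.5):
    """A transaction clears once strictly more than `threshold` of all
    validating nodes have voted it valid."""
    positive = sum(1 for v in votes if v)
    return positive > total_nodes * threshold

# Example: 5 validating nodes, 3 vote valid -> the transaction clears.
print(clears([True, True, True, False, False], total_nodes=5))  # True
```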
> just adapt a run of the mill distributed log-based db instead ...
Well, in some cases that's all that people actually need; sometimes I find myself referring people to Kafka and the like.
But Kafka and the like are still controlled by a single admin; you can do more to decentralize. As for immutability, it's all shades of grey, and certainly being a log-based db (read-only) helps a lot. You can do more with Merkle DAGs, continuous backup to write-only media, etc.
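As a rough, stdlib-only illustration of how a hash-linked (Merkle-style) log makes tampering detectable -- the entry format here is made up for the example:

```python
import hashlib
import json

def entry_hash(prev_hash, payload):
    """Hash-link each log entry to its predecessor (a minimal Merkle-style chain)."""
    data = prev_hash + json.dumps(payload, sort_keys=True)
    return hashlib.sha256(data.encode()).hexdigest()

def build_log(payloads):
    log, prev = [], "0" * 64  # genesis hash
    for p in payloads:
        h = entry_hash(prev, p)
        log.append({"payload": p, "hash": h})
        prev = h
    return log

def verify(log):
    prev = "0" * 64
    for e in log:
        if entry_hash(prev, e["payload"]) != e["hash"]:
            return False
        prev = e["hash"]
    return True

log = build_log([{"op": "append", "v": 1}, {"op": "append", "v": 2}])
assert verify(log)
log[0]["payload"]["v"] = 99   # tamper with an earlier entry...
assert not verify(log)        # ...and every later link exposes the change
```

Mutation isn't impossible here (shades of grey again), but any change to history breaks every downstream hash, which is exactly the tamper-resistance being discussed.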
To me it's not about "eroding" guarantees. It's about saying "ok, I have this database, what properties do I want?" The potential results might be blockchain-like or not. If decentralization, immutability, or assets are potentially interesting, then a blockchain technology could be interesting. Otherwise it comes down to other questions to choose among traditional DBs.
You're the one who asserted that CAP-style consistency is required to prevent double-spends. Eventual consistency is a weakened form of consistency in the CAP sense. Both Bitcoin and Ethereum have eventual consistency as much more than a "theoretical" concern. In your own words: "But let’s be generous and call them consistent, because in practice they are used that way, the workaround being higher latency as one waits for a sufficiently high probability of avoiding inconsistency." The only way this is true is if you accept latencies measured in hours. For real-world applications, you absolutely need to deal with the eventual consistency (and, in fact, I've written several applications that deal with precisely that).
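That waiting-for-depth workaround can be quantified. Below is a Python sketch of the attacker-success calculation from section 11 of the Bitcoin whitepaper, where `q` is the attacker's share of hashpower and `z` the number of confirmations:

```python
from math import exp, factorial

def attacker_success(q, z):
    """Probability an attacker with hashpower share q eventually rewrites a
    transaction buried z blocks deep (Bitcoin whitepaper, section 11)."""
    p = 1.0 - q
    lam = z * (q / p)
    prob = 1.0
    for k in range(z + 1):
        poisson = lam ** k * exp(-lam) / factorial(k)
        prob -= poisson * (1.0 - (q / p) ** (z - k))
    return prob

# An unconfirmed transaction offers no guarantee at all:
print(attacker_success(0.10, 0))  # 1.0
# With 10% of hashpower, 6 confirmations push success probability below 0.1%:
print(attacker_success(0.10, 6))
```

The latency cost is plain: "consistent enough" only arrives after enough blocks, which is exactly the eventual-consistency point above.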
> Each node votes on any transaction coming through. The transaction only clears if it gets enough positive votes.
I didn't ask how you establish consensus. I asked how the nodes federate -- if a node tries to peer with you, how do you decide whether or not to accept the node? You suggested anonymity is out the window, so who controls node identity?
Could you point me to where? I like my thinking and expression to be consistent (pun intended:)
> I asked how the nodes federate .. who controls identity?
Each node has a list of the public keys of other nodes. There are various ways to handle key distribution, of course.
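A toy sketch of that admission rule follows. Everything beyond the allowlist check (key format, the distribution mechanism, and the challenge/response proof that a peer actually holds the matching private key) is deliberately left abstract:

```python
class Node:
    """Toy federation gatekeeping: a node admits only peers whose public
    keys appear on its locally configured allowlist. Real deployments must
    also demand a signed challenge to prove possession of the private key."""

    def __init__(self, own_key, allowed_keys):
        self.own_key = own_key
        self.allowed = set(allowed_keys)
        self.peers = set()

    def admit(self, peer_key):
        """Accept a peering request iff the key is on the local allowlist."""
        if peer_key in self.allowed:
            self.peers.add(peer_key)
            return True
        return False

node = Node("pk_A", allowed_keys={"pk_B", "pk_C"})
assert node.admit("pk_B")      # known key: peering accepted
assert not node.admit("pk_X")  # unknown key: rejected, no central registry consulted
```

The gatekeeping question then reduces to: who edits each node's allowlist, and how are those lists kept in sync?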
> Big “C” means all nodes see the same data at the same time. Being big-C consistent is a prerequisite to preventing double-spends, and therefore storing tokens of value. There are many models of consistency; we mean “consistent” in the same way that the CAP theorem does, except a little bit more loosely (for pragmatism). Little “c” means strong eventual consistency, such that when data gets merged there are no conflicts, but not consistent enough to prevent double spends.
> there are various ways to handle key distribution, of course
Right. Having the public keys advances the discussion precisely nothing, because public key auth is pretty much a given if we're talking about the servers identifying themselves.
What tells us if we have a decentralised system with no central authority is: who's the gatekeeper? Who controls key distribution, and which keys are accepted into the pool?
It is true that "blockchain" is increasingly a literally meaningless marketing buzzword (e.g. R3 Corda, a "blockchain" product that brags of not actually containing a blockchain). However, pointing that out doesn't really help convince anyone.
To me, it still has enough information content to be useful. Just because different people have different definitions doesn't mean the information content is zero. People may agree on the broad ideas and disagree about the specifics. And as discussed in another comment, definitions are nonetheless useful to help summarize a set of properties, help communication, and more.
For example, I think most people would agree a blockchain needs to have the property of having a consensus mechanism. Versus say a file system which typically doesn't.
And that's ok! It's designed for different problems, like file system style blob storage. It's not about "whether something is a blockchain", it's about "what's the right tool for the job"? I see value in complementary decentralized pieces of file systems, databases, processing, and more.
> It's not about "whether something is a blockchain"
It arguably is when you call your post "Blockchains for Artificial Intelligence".
For this context it's really about whether or not it can prevent double spends. Currently, it can't. Though its protocol stack has a place for consensus algorithms to achieve CAP-style strong consistency; and the IPFS team is working on consistency algorithms.
> It arguably is when ...
Point taken. The post is about things that many people consider to be blockchains or at least blockchain-like; BigchainDB, Ethereum, and others are in that category.
In a public permissionless blockchain the token is a system-native item of value used to compensate miners/stakers/validators for securing the system. If you remove it from the architecture then you need a compelling answer to the question, how is the system secured? I haven't heard such an answer yet, though it may be out there and I just haven't come across it yet. Have you?
Essentially what I'm looking for is an explanation of a tokenless security-model in similar depth as this explanation of Bitcoin's security model:
Haven't seen that yet.
Dissonance leads to conflicted statements. Parsing conflicted statements often leads to dissonance. Tread carefully.
The only absolute truth is that I exist. That fact, while I'm observing it, is immutable. The cost of this immutability is my life, or my life's suffering in aggregate. The fact my life will end at some point means that, when viewed in aggregate, all existences are mutable at some point.
Which means that nothing is permanent, which means your statement is in conflict. Conflicted statements may be parsed and proved wrong. It is my opinion your thoughts here are valuable and should not be proved wrong until explored fully.
As long as the cost of changing the blockchain is higher than the return on changing it, the chain remains immutable. It may turn mutable someday, but that doesn't mean it's not immutable right now. Anyone arguing contrary to these evident facts is practicing/exploiting inefficiencies in the system.
I'm sorry, but that is not a logically correct statement; it's a statement of rational behaviour - I give you that, but there is no bar to an irrational actor altering the blockchain purely out of spite or caprice - possibly incurring mortal harm in the process.
>It may turn mutable someday, but that doesn't mean it's not immutable right now.
Again - this is not a logically correct statement.
canBeMutable(blockchain) -> immutableNow(blockchain)
>Anyone arguing contrary to these evident facts is practicing/exploiting inefficiencies in the system.
Can you explain what "practicing/exploiting inefficiencies in the system" is and why it's wrong?
I'd love to see a logical argument presenting how a hypothetical irrational actor can alter a blockchain of significant value without either a hypothetical attack vector or a few billion years compute time.
I've been writing about this for a while and I'm thankful someone else "gets it" on a level. The key to understanding why blockchain data structures are a big deal for software (including AI) is realizing how they enable change in business models and the more customer focused "business layers" in which those models operate.
It's worth pointing out that SaaS is an evolutionary step up from the older MSP models. Assuming software models do not change over time is dogmatic.
Blockchains, like Bitcoin, enable a means by which suffering/cost of work can be encapsulated in a transaction that cannot be altered later by the consumer of that suffering cost or the producer of that suffering cost. This means the suffering (work units) incurred by the AI (or the human involved) can be preserved in a way that is immutable and used later for efficiency improvements. That's not to say the work done by the AI is necessarily valuable, but by making it immutable the work done by the AI can be judged/measured to have been worth the amount of resources it took from something else (i.e. an other's suffering) to do the work it did.
It is my belief that adversarial machine learning may benefit from blockchain based data stores.
All of this is very important given our current infrastructure, and all software models that go with it, resemble Swiss cheese from a security perspective. Super viral growth models may make VCs and a few scrappy millennials wealthy, but they don't scale long term for customer satisfaction. We certainly don't want an AI going around creating super viral growth models that drain the world's economy either!
Blockchains enable "do better". "Do better" is the first step on the path to enlightenment.
I should note that my use of the term "suffering" relates to the cost of causality, either by one's own choice or by something outside one's own choice. It is not meant to refer to the human emotion of suffering, although the two can certainly be related.
"This means the suffering (work units) incurred by the AI (or the human involved) can be preserved in a way that is immutable and used later for efficiency improvements. That's not to say the work done by the AI is necessarily valuable, but by making it immutable the work done by the AI can be judged/measured to have been worth the amount of resources it took from something else (i.e. an other's suffering) to do the work it did."
I think you are alluding to that fact in the above quote, but I think it still begs the question of preserving a standard, abstract 'value of work' across a constantly changing AI dataset.
I agree, for the use cases that involve sharing, how do you incentivize this when many see data as the moat? This is why I wrote the section about "Hoard vs. share?". In some applications, there will be more incentives to hoard; in others, to share.
Note that one thing that softens this impasse is the idea of data exchanges. So rather than "hoard vs share" you can think of all data as potentially "for sale, but with a price" and where the price could vary dramatically. And that's where a data exchange could come in to reduce friction for price discovery.
Also, how do you respond to the inevitable criticism that advertising your solution as a "shared global registry" invokes the "How Standards Proliferate" XKCD?
Finally, I like the idea of creating a marketplace or exchange for AI training information, but it's strictly not necessary to use a blockchain for this purpose unless a very large portion of your users are hyper-concerned about their anonymity. It's also misleading to try to build a marketplace of signed IoT data, with the assumption that it is more trustworthy than unsigned IoT data, when that is simply false.
> I don't understand the assertion made regarding blockchains/digital signatures supposedly increasing accuracy or trust in IoT sensor-reading reliability.
If each IoT sensor has a known public key then we can know that the data came from that sensor.
It doesn't solve all reliability problems of course. E.g. if you have a hardware failure, a digital signature ain't gonna help:)
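For illustration, here's a toy, textbook-RSA version of that check in pure Python. The small primes and lack of padding make it wildly insecure (real keys are ~2048 bits; use a vetted library in practice); the point is only that verification ties a reading to the sensor's key, not to the physical accuracy of the reading:

```python
import hashlib

# Textbook-RSA toy signature: INSECURE, purely to illustrate provenance.
p, q = 1000003, 1000033            # small primes, for demonstration only
n = p * q                          # public modulus
e = 65537                          # public exponent
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (stays on the sensor)

def digest(msg: bytes) -> int:
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % n

def sign(msg: bytes) -> int:
    return pow(digest(msg), d, n)

def verify(msg: bytes, sig: int) -> bool:
    return pow(sig, e, n) == digest(msg)

reading = b'{"sensor": "temp-42", "celsius": 21.5}'
sig = sign(reading)
assert verify(reading, sig)                                      # provenance holds
assert not verify(b'{"sensor": "temp-42", "celsius": 99}', sig)  # tampering detected
```

A failed sensor will happily sign garbage, so the signature establishes who said it, never whether it's true.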
> Also, how do you respond to the inevitable criticism that advertising your solution as a "shared global registry" invokes the "How Standards Proliferate" XKCD?
I get this. What's cool is that IPDB doesn't need to be (and shouldn't be) the "one network to rule them all". The path is to leverage other existing standards and new connectivity standards, so that IPDB (or whatever registry) plays well with other emerging & existing registries.
The main lesson is from the internet itself, which was born by combining disparate networks (ARPANET, NSFNet, etc.) that didn't previously communicate, via the invention of TCP/IP. All the networks had to do was adopt the new top-level protocol and suddenly they could talk to other nets. There's a similar protocol for blockchain tech, called Interledger (https://www.interledger.org). Whereas TCP/IP connects networks at the data level -- where you can simply re-send packets, so it doesn't need to account for double-spending -- Interledger does account for it. You can view it as a way to connect networks of value.
Interledger has already been used to connect Bitcoin, Ethereum, and many centralized payment networks.
There are other protocols that are closer to the data level. E.g. simply using JSON-LD helps. And IPLD on top of that (roughly, a Merkle-ized JSON-LD). And domain-specific value-transfer protocols above that, like COALA IP for intellectual property (which plays well with existing protocols like DDEX for music and PLUS for photography).
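As a rough sketch of the content-addressing idea behind a Merkle-ized JSON-LD: serialize canonically, hash, and link documents by hash rather than by location. (Real IPLD uses CIDs and multihashes; this stdlib simplification only shows the shape, and the document fields are made up.)

```python
import hashlib
import json

def cid(doc) -> str:
    """Content identifier: hash of a canonical JSON serialization."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

work = {"title": "Song A", "creator": "Alice"}
license_doc = {"work": {"/": cid(work)}, "terms": "CC-BY"}  # link by hash, not URL

# Identical content -> identical address, regardless of key order or who stores it.
assert cid({"creator": "Alice", "title": "Song A"}) == cid(work)
```

Because links are hashes of content, any registry holding the same documents resolves the same graph, which is what lets separate registries interoperate.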
> Finally, I like the idea of creating a marketplace or exchange for AI training information, but it's strictly not necessary to use a blockchain for this purpose
Agreed; and I wrote about it: "An exchange could be centralized, like DatastreamX already does for data. But so far, they are really only able to use publicly-available data sources, since many businesses see more risk than reward from sharing.
What about a decentralized data & model exchange? ..." and I went on to list some benefits.
> unless a very large portion of your users are hyper-concerned about their anonymity.
To me, there are much greater benefits. I'm hoping that the biggest benefit will be to further catalyze a truly open data market.
> It's also misleading to try to build a marketplace of signed IoT data, with the assumption that it is more trustworthy than unsigned IoT data, when that is simply false.
I'm not sure why you say this; knowing the provenance of the data has clear trust-related benefits in some cases like I described. Obviously just standing on its own, it has no benefit, it's all about usage farther along in the pipeline.
If the key to making ML useful has been huge sets of training examples, then that shows that current ML algorithms are much worse than humans at generalizing from a small number of examples. That suggests that current ML algorithms are leaving a lot on the table in terms of how much they are able to learn from each example, and there is a lot of room for improvement.
> If the key to making ML useful has been huge sets of training examples
BTW this is only for some ML approaches and problems. There are some problems where getting more data is just way too expensive; e.g. building a model from silicon manufacturing where each mfg. run costs $50M; i.e. each datapoint costs $50M. Guess what - there are still useful ML-y things you can do here. Less extreme, there are many problems with only 100-1000 training points available, and you can still do a lot.
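As a tiny illustration that small-n modeling is still useful: a one-dimensional ridge regression in closed form, fit on eight made-up points. The regularizer is what keeps the estimate sane when data is scarce.

```python
def ridge_1d(xs, ys, lam=1.0):
    """Closed-form 1-D ridge regression (no intercept):
    w = (sum x*y) / (sum x^2 + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

# Eight fabricated datapoints, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.2, 15.9]
w = ridge_1d(xs, ys, lam=1.0)
print(round(w, 2))  # recovers a slope close to 2
```

With each datapoint costing $50M you would also lean on priors, surrogate models, and active learning to pick which experiment to run next; the point is simply that n=8 is not n=0.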
But we're all about real apps getting built. Many companies are building on BigchainDB. For example, I describe some of them here:
There are many, many more companies that have not publicly announced what they're building yet. Stay tuned:)
* Why use a blockchain rather than a database?
* Why do you require blockchain rather than some other approach?
* Are you positing this as open to the hostile world? If so, why? If not, why use a blockchain?
* If you're open to the hostile world: what's your threat model, and how are you addressing it?
* How do your defences stop this from being the next Tay?
This doesn’t provide any security against malicious participants; the excuse is that private blockchains need speed over security. You might think that at that point you don’t need a blockchain at all, but you’re hardly going to sell any consultant hours with that sort of thinking.
To give you an example, he uses the distributed nature of the blockchain (as it pertains to storing full copies of the chain on multiple nodes) as a model for storing training-set data within a distributed system, and the benefit he states for this is that since the infrastructure isn't controlled by one person, organizations will be more inclined to share.
The author then goes on to describe how the integrity of training-set data can be attested to in the form of ... "I believe this data / model to be good at this point", which is about as far away from verifiable as you can get. The problem is, the article is very high level and it's filled with meaningless buzzwords and double-speak like "DAO" ... which is apparently something that can be applied to both a human-based reputation system and a blockchain (hence useless).
There are ways to use artificial intelligence as [part of] the reward mechanism for a blockchain, by creating an initial AI to recognize specific types of content and then rewarding miners for finding said content ("datachains" in data mining). And there are also trustless ways of rewarding people for creating the best general-purpose intelligence for understanding a given type of data. But from the author's account it's very hard to see if we're talking about anything concrete here, as the article is so technically vague on details.
He does go into detail for several of these ideas. But of the ideas stated, none of them are discrete, trustless, self-contained systems like a blockchain (or even really need a blockchain), which is a shame because AI, data, and blockchains can actually be unified to create bullet-proof entities (which in my opinion will be much more purpose-specific than any vague Ethereum-type "AI DAO" nonsense).
Edit: The author's treatment of AI DAOs is worth the read.
(if the author is reading this) - what was your interest in using Python? Do you have any concerns about its performance/scalability as the service grows?
I've been using Python in enterprise-grade, large-scale production systems since the early 2000s. And it works well. Most Python libraries where performance matters have C under the hood, e.g. numpy matrix manipulation. Parallel computing got way better with async support.
As for BigchainDB itself: it wraps RethinkDB (and soon MongoDB) which is where much of the compute intensive stuff happens. Also, we leverage these fast Python+C libraries, and parallelization. If/when needed, we can always convert some of our core stuff to C or another fast language.
Yep, it's a [database] cluster, and proud to be that:)
But, it's special in that there isn't a single organization that owns or controls it. That is, it's decentralized.
And in fact it draws on traditional distributed DBs by building on them, to get first-class scalability and querying abilities. The first version was built on RethinkDB and we're working on wrapping MongoDB too.
There's also a public version of BigchainDB that's getting rolled out, called IPDB (http://ipdb.foundation). With IPDB, people building apps don't need to roll their own cluster, they can just talk to it via http.
> was disappointing
How? We're always looking for feedback.
Not to sound like an ass, though - I do keep an eye on BigchainDB for what it is to become, as the people behind it do seem to have great talent.
Has that security issue been resolved? How is the ipdb secured?
BTW thanks for the kind words about our team. I think they're amazing too:)
I'd love to see more suggestions here or elsewhere. It would be my pleasure to write an article on this.