How often are blocks/writes written? Who pays for all that data storage? How "decentralized" would it really be, considering the incredibly onerous requirements to run a validator at the target scale?
Incentive alignment is key. We've known since BitTorrent's tit-for-tat that creating a micro-economy is a hard problem. I hope they get some sort of token and micro-economy going, but this is a known hard problem.
1. Will this scale?
Blockchains vary from the order of ~10^0 tps (PoW) to ~10^4 tps (PoS).
Wikipedia sees 50K HTTP requests/second and 80K SQL queries/second.
But it is also read-heavy. Even if 10% of the workload is writes (probably too conservative), we barely squeak by... and I'm willing to guess the real number is way lower than that. They saw 10M page edits for the entire month of Dec 2018.
My guess is, this might be doable :-D
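The back-of-envelope math above can be written out explicitly (the 10% write share is the comment's assumption, not a measured number):

```python
# Numbers from the comments above; write_fraction is an assumed upper bound.
sql_qps = 80_000                 # Wikipedia SQL queries/second
write_fraction = 0.10            # assumed share of writes (probably too high)
writes_per_sec = sql_qps * write_fraction   # 8,000 writes/s, near the ~10^4 tps PoS ceiling

# Sanity check against actual edit volume: ~10M page edits in Dec 2018.
seconds_in_dec = 31 * 24 * 3600
edits_per_sec = 10_000_000 / seconds_in_dec  # roughly 3.7 edits/s, far below the bound
```

So even the pessimistic estimate sits at the edge of PoS throughput, and the observed edit rate is three orders of magnitude below it.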
2. Who runs this? and who pays?
Great question. The blog completely skips the question of incentives, which in my opinion will vary depending on the app. For a censorship-resistant Wikipedia, it'll be different than for a communications app. Some apps might be better off permissioned (e.g. enterprise settings).
Wikipedia is 43GB of text and 23TB of rich media (images and video).
If we limit ourselves to just Wikipedia text, it might be reasonable to do this permissionless!
Hoping to write a blog post about that soon after we get more built to show.
Because if not, the first time someone uploads something illegal to your decentralized Wikipedia and it gets into the blockchain, you are in trouble.
It could also be possible to broadcast some kind of signed take-down message that could be propagated through all of the nodes. If I remember right, though, Aether said the centrally hosted JSON file was needed for legal compliance.
Like lwalton says, this is very difficult to do without centralizing authority, as a rebase is a fork, and so you need the whole community to rebase/fork simultaneously. This is a big deal if all data is in the same place.
One approach is to break these logs apart, e.g. per-user, like Cabal does it. Instead of replicating one giant log, you replicate many small logs. As a user, you can choose which logs you want to store, replicate, and which contribute to your materialized view. Or, you can delegate this work to a moderator you trust, who in turn may subscribe to other moderators or blacklists to inform their moderation.
In this way, there's no centralized authority - everyone is their own authority. Writing the user-facing moderation tools is important work, but is a relatively well-understood problem.
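The per-user-log model described above can be sketched in a few lines (all types here are hypothetical illustrations, not Cabal's actual API): each author has their own append-only log, and a reader chooses which logs feed their materialized view, optionally importing a trusted moderator's blocklist.

```python
from dataclasses import dataclass, field

@dataclass
class UserLog:
    author: str
    entries: list = field(default_factory=list)  # append-only per-user log

    def append(self, msg: str):
        self.entries.append(msg)

@dataclass
class Reader:
    subscriptions: set = field(default_factory=set)  # logs we replicate
    blocklist: set = field(default_factory=set)      # e.g. imported from a moderator

    def materialize(self, logs: dict) -> list:
        # Merge only the logs we've opted into, skipping blocked authors.
        view = []
        for author in sorted(self.subscriptions - self.blocklist):
            if author in logs:
                view.extend((author, m) for m in logs[author].entries)
        return view

logs = {"alice": UserLog("alice"), "spammer": UserLog("spammer")}
logs["alice"].append("hello")
logs["spammer"].append("buy stuff")

me = Reader(subscriptions={"alice", "spammer"}, blocklist={"spammer"})
view = me.materialize(logs)  # only alice's posts reach the view
```

Moderation here is just set membership on the reader's side, so no single party decides what anyone else stores or sees.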
One mitigation I've used in one project is to hash records by hashing hashes of their fields, permitting record fields to be selectively deleted from storage (e.g. by a blacklist) without breaking data structure validation capability.
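A minimal sketch of that hash-of-field-hashes idea (the record layout here is hypothetical): the record ID commits to the hash of each field, so a field's contents can be dropped from storage while the record still validates against the same ID.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

fields = [b"author:alice", b"body:content a blacklist wants removed"]
field_hashes = [h(f) for f in fields]
record_id = h(b"".join(field_hashes))  # ID = hash of the field hashes

# "Delete" field 1: discard its contents from storage, keep only its hash.
stored = {"field_hashes": field_hashes, "contents": [fields[0], None]}

# Validation needs only the field hashes, so it still succeeds post-deletion.
recomputed = h(b"".join(stored["field_hashes"]))
assert recomputed == record_id
```

The trade-off is that anyone holding the original field contents can prove they belonged to the record, so this redacts storage rather than revoking knowledge.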
The whole world seems to see the indelibility of data as a problem in this context, yet immutability is seen as a generally good thing in plenty of other software settings. Further, we cannot manipulate past events, and there's even that saying about being doomed to repeat the history you don't know.
So why would we really want to delete past data? From this standpoint, wouldn't deleting data be similar to trying to cover up the past? Sure, we might also label it "humankind's childish attempt to assert itself over the unstoppable arrow of time," but that is just too Freudian for me.
Is there some big idea I’m missing that explains the favoring of the “mutable past” view?
It is called law. There are lots of laws that govern circumstances under which data must be destroyed, and others that make certain data illegal to possess.
You can argue that they're bad laws, and I'd agree that some of them are. But whatever you think of them, ignore them at your peril.
Someone uploads pictures of a child being violently sexually abused to a blockchain. It is my honest opinion the majority of people would not want to continue distributing those photos. It is certainly the law of many countries that it is illegal to distribute such pictures.
This assumes that data is always true and accurate, which is not always the case. Say Bob was convicted of pedophilia, but a couple of months later the conviction was retracted because it was an error. Now Bob can't find a job because "the data" says he's a pedophile, even though he's not.
why is extrajudicial punishment allowed?
In some cases it is common sense. For instance: a pedophile shouldn't be allowed to be a school teacher.
What if someone posted false claims about you? Would you like recourse in that event?
yes, we need to be able to erase stuff from the internet.
The ideology is law: most blockchains as-is do not comply with the law, so they are effectively unusable in practice (generally speaking, too).
You just have to sell it. Market it as "this decentralized network has delays to ensure dark and addictive patterns don't overrun the network - our network is more than just trying to see who has the most likes in the shortest amount of time."
And the benefit of doing this is to allow other users to be able to validate the integrity of data and ensure historical changes cannot be overwritten?
So who pays for the transaction storage costs? and how does this validator thing work?
Why not do it for decentralization purposes too?
Blockchains are just another class of consensus protocols. IMO one of the defining aspects of blockchains is the stronger threat model: traditional consensus protocols are designed to be crash-fault tolerant, not Byzantine-fault tolerant.
I wouldn't get tripped up by the word "validator". In the academic world we probably would have used the term "node" or "replica". It has a special meaning in the blockchain world because it conveys a sense of trust and work-checking.
"Who pays" is a fantastic question. The blog completely elides the question about incentives which in my opinion will vary depending on the app. For creating an censorship-resistant Wikipedia, it'll be different than a communications app. Hoping to write a blog post about that soon after we get more built to show.
At a high level you could encrypt the data (URL) you want to send the webhook to, then (let’s say you’re using Ethereum) you could subscribe to an event & make sure you have the decryption key there to decrypt the URL & actually forward on the webhook.
In terms of the user auth question, you can use signed requests or just signatures in general - I love the localcryptos.com implementation of user login, for instance.
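To make the signed-request idea concrete, here is a toy challenge-response login sketch. A shared-secret HMAC stands in for the wallet-key signature a site like localcryptos.com would actually verify; the flow (server issues a nonce, client signs it, server verifies) has the same shape either way.

```python
import hashlib
import hmac
import secrets

def sign(key: bytes, challenge: bytes) -> str:
    # Client side: sign the server-issued challenge with the user's key.
    return hmac.new(key, challenge, hashlib.sha256).hexdigest()

def verify(key: bytes, challenge: bytes, signature: str) -> bool:
    # Server side: constant-time comparison against the expected signature.
    return hmac.compare_digest(sign(key, challenge), signature)

user_key = b"user-private-secret"      # stand-in for the user's signing key
challenge = secrets.token_bytes(16)    # fresh nonce prevents replaying old logins
sig = sign(user_key, challenge)
assert verify(user_key, challenge, sig)
```

With real public-key signatures the server only ever stores the public key, which is what makes this attractive for decentralized auth.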
Happy to answer more questions like this, just email me: auston@quiknode
I don't think this is a useful definition. Blockchains are about trust and control, not Byzantine fault tolerance.
Bitcoin is more centralized (via the mining cartel) than the existing banking system. The wealth gap is greater in crypto than in the analog world, too.
I'd say you identified the key aspect though, which I agree is not explicitly answered in this blog. "Who runs this" is a critical question and I think it will vary depending on the end application. I'm going to guess there will be settings where a permissioned deployment may be appropriate (think enterprise settings). There will be settings where a permissionless deployment will make more sense (think globally censorship-resistant Wikipedia). It all boils down to who wants to control it and how to convey trust. And for each of those, you probably have different access control policies too.
The tl;dr appears to be "we built a blockchain-based query logger," where the consensus is essentially just that a statement is valid SQL, plus an ordering of the statements. Once everyone agrees to apply the change, it is applied... with (AFAICT) no interaction with the DB during the consensus process?
This seems a bit naive about features like locks and transactions that are critical to most DBs: while you may be able to define a global ordering of statements, the execution of those statements may be nonsensical and result in a lot of failed transactions. For example, it seems like you could grind the usability of the system to a halt by flooding the validators with 'valid' SQL designed to create as many conflicts as possible, causing a ton of writes to fail to commit.
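A toy illustration of that concern (the schema is made up for the example): every statement in the log is valid SQL and globally ordered, yet the second one still fails once applied in sequence, because validity-at-consensus-time says nothing about database state at execution time.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
db.execute("INSERT INTO accounts VALUES (1, 100)")

# Both statements are syntactically valid SQL, so both would pass a
# "is it valid SQL?" consensus check and get a slot in the global order.
ordered_log = [
    "UPDATE accounts SET balance = balance - 80 WHERE id = 1",  # ok: 100 -> 20
    "UPDATE accounts SET balance = balance - 80 WHERE id = 1",  # violates CHECK
]

results = []
for stmt in ordered_log:
    try:
        db.execute(stmt)
        results.append("committed")
    except sqlite3.IntegrityError:
        results.append("failed")
# results == ["committed", "failed"]: the flood attack is just many
# statements crafted so that most of them land in the "failed" bucket.
```

Each failed statement still consumed a consensus slot, which is exactly the griefing vector described above.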
I agree with the desire for more services to become decentralized, and this is a novel idea, but (IMHO) more and more the technical hurdles of blockchains often bring more challenges than they solve.
1. Performance: The performance is probably going to be limited, in the sense that most databases have better performance than most blockchain implementations, so you're arbitrarily bottlenecking your writes. There are a ton of really great papers showing you can do WAY better if you are more careful with your distributed design; they usually involve getting into the weeds of redesigning the database internals. Just for fun, I really liked these papers:
So there is a ton of space to improve the performance here, I just wanted to build something quickly.
2. Authentication: Definitely an interesting technical challenge here. Most web apps are written assuming they are one of many servers accessing the database, but that each server is honest and has root authority. So how do we make sure that a validator has the authority to write to a particular table/row? Right now this is still an open challenge, and I would love to engage the community on ideas here. There might be some way to tag/sign all tables/rows with their creator and do write access control based on that? I'm not sure how many web apps would break, though.
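The tag-rows-with-creator idea floated above could look something like this toy sketch (all names hypothetical): each row records its creator, and a validator rejects writes from anyone else. A real deployment would verify a cryptographic signature rather than trust a bare identity string.

```python
rows = {}  # row_id -> {"creator": ..., "data": ...}

def create(row_id: str, creator: str, data):
    # First write tags the row with its creator.
    rows[row_id] = {"creator": creator, "data": data}

def try_update(row_id: str, requester: str, data) -> bool:
    # Validator-side check: only the row's creator may overwrite it.
    row = rows[row_id]
    if row["creator"] != requester:
        return False  # write rejected
    row["data"] = data
    return True

create("page:42", "alice", "v1")
assert try_update("page:42", "alice", "v2")        # creator may write
assert not try_update("page:42", "mallory", "x")   # others are rejected
```

The open question in the comment stands: many existing web apps assume any server can write any row, so a creator-only policy would break them without some delegation mechanism.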
3. Write-conflicts: Even without the access control specified above, most web apps are written assuming they are not the only web server accessing the database concurrently. So I would anticipate honest servers either have good failover logic or wrap operations in SQL transactions if they really need them to be atomic. Either way, I don't see any issues with these being serialized and appended to the log.