AutoDapp: a proposal to decentralize existing web apps (raymondcheng.net)
95 points by jeffreyxdash on April 9, 2020 | 44 comments

So, it's not even that the SQL state is distributed, it's that every change to the SQL state has to be "validated" and appended to a blockchain? At a 1M DAU traffic level?

How often are blocks/writes written? Who pays for all that data storage? How "decentralized" would it really be, considering the incredibly onerous requirements to run a validator at the target scale?

Indeed, as a professor who studies these systems for a living, I had the same questions. Will this scale? It would be a big step forward if we finally got something like this to 1 million people. Who owns the server infrastructure? How are these miners paid? The usual token-based incentives? Where do I buy those with real euros or dollars?

Incentive alignment is key. We've known since BitTorrent's tit-for-tat that creating a micro-economy is a hard problem. I hope they get some sort of token and micro-economy going, but this is a known hard problem.

Hi all! I just wanted to say thanks for the awesome questions. I can't say I have the answers to these questions, but I can lay out some of my intuition and would love to work together with anyone that also finds these questions interesting!

1. Will this scale? Blockchains vary from the order of ~10^0 tps (PoW) to ~10^4 tps (PoS). https://decentralizedthoughts.github.io/2019-06-23-what-is-t...

Wikipedia sees 50K HTTP requests/second and 80K SQL queries/second. https://www.datacenterknowledge.com/uptime/data-center-provi...

But it is also read-heavy. If we say 10% of the workload is writes (probably too conservative), we barely squeak by... and I'm willing to guess the real number is way lower than that. They saw 10M page edits for the entire month of Dec 2018. https://stats.wikimedia.org/EN/TablesDatabaseEdits.htm

My guess is, this might be doable :-D
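To make the arithmetic above concrete, here is a quick back-of-envelope check using the figures already cited (80K SQL queries/second, a 10% write guess, ~10^4 tps for PoS chains, and 10M edits/month); the 10% write fraction is the comment's own pessimistic assumption, not a measured number:

```python
# Back-of-envelope check of the write-throughput numbers above.
SQL_QPS = 80_000          # Wikipedia SQL queries/second (cited above)
WRITE_FRACTION = 0.10     # pessimistic guess from the comment
POS_TPS = 10_000          # upper end of the PoS range cited above

writes_per_sec = SQL_QPS * WRITE_FRACTION
print(writes_per_sec)     # 8000.0 -- just under the ~10^4 tps budget

# The page-edit stats suggest the real write rate is far lower:
EDITS_PER_MONTH = 10_000_000       # Dec 2018 figure cited above
SECONDS_PER_MONTH = 30 * 24 * 3600
print(round(EDITS_PER_MONTH / SECONDS_PER_MONTH, 1))  # 3.9 edits/second
```

Even the pessimistic write fraction fits inside a PoS throughput budget, and the measured edit rate is three orders of magnitude below it.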

2. Who runs this? And who pays? Great question. The blog completely skips the question of incentives, which in my opinion will vary depending on the app. A censorship-resistant Wikipedia will be different than a communications app. Some apps might be better off permissioned (e.g. enterprise settings).

Wikipedia is 43GB of text and 23TB of rich media (images and video). https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia

If we limit ourselves to just Wikipedia text, it might be reasonable to do this permissionless!

Hoping to write a blog post about that soon after we get more built to show.

This sounds similar to Holochain [1], which uses a BitTorrent type DHT for validation.

[1] http://developer.holochain.org

Just what we need, blockchain crowbarred into something else it's entirely unsuited for.

One thing I was trying to find out: is it possible to completely erase things from history?

Because if not, the first time someone uploads something illegal to your decentralized Wikipedia and it gets into the blockchain, you are in trouble.

This is difficult to handle without some centralized authority. One example I can think of is Aether [1], which has a non-decentralized blacklist for illegal content [2] that they can use to handle these instances.

It could also be possible to broadcast some kind of signed take-down message that could be propagated through all of the nodes. If I remember right, Aether said that the centrally hosted json file was needed for legal compliance though.

[1] https://getaether.net/ [2] https://static.getaether.net/Badlist/Latest/badlist.json

Decentralized systems that use append-only logs (which includes but is not limited to blockchain stuff) need the equivalent of a 'git rebase' to purge history.

Like lwalton says, this is very difficult to do without centralizing authority, as a rebase is a fork, and so you need the whole community to rebase/fork simultaneously. This is a big deal if all data is in the same place.
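The reason a purge forces a simultaneous community-wide fork can be shown with a toy hash chain; this is a minimal sketch, not any particular blockchain's format:

```python
# Sketch of why purging one entry from an append-only hash chain forces
# a coordinated "rebase": every hash after the removed entry changes.
import hashlib

def chain(entries):
    """Return the running hash after appending each entry in order."""
    h = b""
    hashes = []
    for e in entries:
        h = hashlib.sha256(h + e.encode()).digest()
        hashes.append(h.hex())
    return hashes

log = ["post A", "illegal content", "post B", "post C"]
original = chain(log)
purged = chain(["post A", "post B", "post C"])  # history rewritten

# The prefix before the purge is intact, but everything after it now has
# a different hash -- so every node must adopt the new chain at once.
print(original[0] == purged[0])   # True
print(original[2] == purged[1])   # False: "post B" now hashes differently
```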

One approach is to break these logs apart, e.g. per-user, like Cabal does it. Instead of replicating one giant log, you replicate many small logs. As a user, you can choose which logs you want to store, replicate, and which contribute to your materialized view. Or, you can delegate this work to a moderator you trust, who in turn may subscribe to other moderators or blacklists to inform their moderation.

In this way, there's no centralized authority - everyone is their own authority. Writing the user-facing moderation tools is important work, but is a relatively well-understood problem.
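The per-user-log model described above can be sketched in a few lines; all the names here are hypothetical, and real systems like Cabal handle replication and verification that this toy view-builder omits:

```python
# Minimal sketch of the per-user-log model: each user has their own
# append-only log, and a reader chooses which logs (and which blocklist)
# feed their materialized view. All names here are hypothetical.

logs = {
    "alice":   ["hello", "spam spam spam"],
    "bob":     ["hi alice"],
    "mallory": ["abuse"],
}

subscriptions = {"alice", "bob"}   # logs this user chose to replicate
blocklist = {("alice", 1)}         # a trusted moderator flagged entry 1

def materialize(logs, subscriptions, blocklist):
    """Build the user-facing view from subscribed, unblocked entries only."""
    view = []
    for author in sorted(subscriptions):
        for i, msg in enumerate(logs[author]):
            if (author, i) not in blocklist:
                view.append((author, msg))
    return view

print(materialize(logs, subscriptions, blocklist))
# [('alice', 'hello'), ('bob', 'hi alice')]
```

Mallory's log is simply never replicated, and the flagged entry is filtered locally; no global authority had to act.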

Generally speaking, this is the right call - though you don't necessarily need to rebase the whole log. The hypercore logs that Cabal uses can partially sync. You can choose not to sync certain chunks without losing verifiability; all that would persist is the hash of that chunk.

I've heard this called a "pool pissing attack," as in peeing in the swimming pool, and it's already been done to some blockchains, which contain CP and other illegal stuff.

One mitigation I've used in one project is to hash records by hashing hashes of their fields, permitting record fields to be selectively deleted from storage (e.g. by a blacklist) without breaking data structure validation capability.
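The field-hashing mitigation described above can be sketched as follows; this is a generic illustration of the technique, not the commenter's actual project code:

```python
# Sketch of field-level hashing: a record's hash is the hash of its
# fields' hashes, so a field's *value* can be purged while keeping its
# hash -- the record still verifies against the committed chain.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def record_hash(field_hashes):
    """Hash of the concatenated per-field hashes."""
    return h(b"".join(field_hashes))

fields = [b"author=alice", b"body=<illegal content>", b"ts=2020-04-09"]
field_hashes = [h(f) for f in fields]
committed = record_hash(field_hashes)   # this is what goes on-chain

# Later, a blacklist tells nodes to purge field 1. They delete the value
# but retain its hash, and the record still validates:
retained_hashes = [h(fields[0]), field_hashes[1], h(fields[2])]
print(record_hash(retained_hashes) == committed)   # True
```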

I have yet to see this issue addressed in a meaningful way.

I am very curious about this ideology.

The whole world seems to see indelibility of data as a problem in this context. Yet immutability is seen as a generally good thing in a lot of other software cases. Further, we cannot manipulate past events, and there's even the saying about being doomed to repeat history when you don't know it.

So why would we really want to delete past data? From this standpoint wouldn’t deleting data be similar to trying to cover up the past? Sure we might also try to label it “humankind’s childish try to assert itself over the unstoppable arrow of time” but that is just too Freudian to me.

Is there some big idea I’m missing that explains the favoring of the “mutable past” view?

What you're missing is that in some domains like healthcare there are legal compliance requirements to delete past data in certain circumstances. If false or embarrassing or misattributed data ends up in my medical record I want it completely removed, not just marked as outdated. This isn't just ideology, it's a fundamental matter of patient privacy.

> Is there some big idea I’m missing that explains the favoring of the “mutable past” view?

It is called law. There are lots of laws that govern circumstances under which data must be destroyed, and others that make certain data illegal to possess.

You can argue that they're bad laws, and I'd agree that some of them are. But whatever you think of them, ignore them at your peril.

I think this is a case where "Think of the children" is the right answer.

Someone uploads pictures of a child being violently sexually abused to a blockchain. It is my honest opinion the majority of people would not want to continue distributing those photos. It is certainly the law of many countries that it is illegal to distribute such pictures.

> So why would we really want to delete past data? From this standpoint wouldn’t deleting data be similar to trying to cover up the past?

This assumes that data is always true and accurate, which is not always the case. Say Bob was convicted for pedophilia but a couple months after the charge was retracted because it was an error. Now Bob can’t find a job because “the data” says he’s a pedophile, even if he’s not.

and why can't bob find a job even if he was a pedophile or a murderer or whatever?

why is extrajudicial punishment allowed?

> why is extrajudicial punishment allowed?

In some cases it is common sense. For instance: a pedophile shouldn't be allowed to be a school teacher.

in that case, there should be a court order that prevents them from exercising such occupations.

Even if, there is still human bias.

What if someone posted false claims about you? Would you like recourse in that event?

well there was that New York Times podcast on child abuse and how the internet made things exponentially worse... that was quite a big moment for me regarding being able to remove online data.

yes, we need to be able to erase stuff from the internet.

HIPAA, GDPR, CCPA, Right to be Forgotten, DMCA

The ideology is law; most blockchains as they stand do not comply with the law. So they are effectively unusable in practice (generally speaking, too).

How does Bitcoin handle this?

It doesn't.

Write actions can take seconds to minutes to validate. This doesn't seem like a trade-off any application I can think of would make.

Let's say you implement a Facebook-like social network (think something like Elgg). If you really need the benefits of decentralization, is waiting a few minutes for your post, profile update, file upload, etc. to go through that big of a deal?

Even if you don't need it, people are going to want their posts to go out "now". Seconds of latency is tolerable, but your average person is not going to be happy waiting minutes.

See, in the 90's when everyone was on dial-up, people really did wait minutes for stuff. I waited HOURS for a single MP3.

You just have to sell it. Market it as "this decentralized network has delays to ensure dark and addictive patterns don't overrun the network - our network is more than just trying to see who has the most likes in the shortest amount of time."

Thing is, I was there in the 90s, on crappy 14/28/56k dialup. I remember it, and it was shit.

The reason people tolerate Bitcoin is that it actually made some money for them.

Trying to understand this one: this is saying we take a standard 3-tier app and, instead of storing data in our normal SQL/NoSQL database, we route it through some kind of proxy called a "validator," which is a blockchain-based data store that uses the same database API.

And the benefit of doing this is to allow other users to be able to validate the integrity of data and ensure historical changes cannot be overwritten?

So who pays for the transaction storage costs? and how does this validator thing work?

That's exactly right! The argument is that if you look at truly reliable cloud services (think Google), they already use consensus protocols to replicate the database globally for fault-tolerance purposes. The Spanner paper does a decent job explaining their architecture. https://static.googleusercontent.com/media/research.google.c...

Why not do it for decentralization purposes too? Blockchains are just another class of consensus protocols. IMO one of the defining aspects of blockchains is the stronger threat model: traditional consensus protocols are designed to be crash-fault tolerant, not Byzantine-fault tolerant.

I wouldn't get tripped up by the word "validator". In the academic world we probably would have used the term "node" or "replica". It has a special meaning in the blockchain world because it conveys a sense of trust and work-checking.

"Who pays" is a fantastic question. The blog completely elides the question about incentives which in my opinion will vary depending on the app. For creating an censorship-resistant Wikipedia, it'll be different than a communications app. Hoping to write a blog post about that soon after we get more built to show.

So I'm not familiar with the internals of dapps like the one described in the article. Does anyone have a quick summary of how you do user authentication and other "secret data" cases if the data is visible to everyone? Let's say there's an app with user-configurable webhooks that need to be used by the server, but shouldn't be announced to everyone.

Cofounder of QuikNode.io here - this is kind of a “it depends” question/answer. What chain are you using? Ethereum? Bitcoin? EOS?

At a high level you could encrypt the data (URL) you want to send the webhook to, then (let’s say you’re using Ethereum) you could subscribe to an event & make sure you have the decryption key there to decrypt the URL & actually forward on the webhook.

In terms of the user auth question, you can use signed requests or just signatures in general - I love the localcryptos.com implementation of user login, for instance.
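The signature-based login mentioned above follows a challenge-response shape. Real wallet logins use public-key signatures over a server-issued challenge; this stdlib-only sketch substitutes an HMAC with a shared secret as a stand-in for the user's private key, just to show the flow:

```python
# Challenge-response login sketch. Real implementations use public-key
# signatures; this sketch substitutes an HMAC over a shared secret.
# The server never sees a password, only a signature over a fresh nonce.
import hmac, hashlib, secrets

SHARED_SECRET = b"demo-secret"   # stand-in for the user's private key

def server_issue_challenge() -> str:
    return secrets.token_hex(16)          # fresh nonce per login attempt

def client_sign(nonce: str) -> str:
    return hmac.new(SHARED_SECRET, nonce.encode(), hashlib.sha256).hexdigest()

def server_verify(nonce: str, signature: str) -> bool:
    expected = hmac.new(SHARED_SECRET, nonce.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

nonce = server_issue_challenge()
print(server_verify(nonce, client_sign(nonce)))                     # True
print(server_verify(server_issue_challenge(), client_sign(nonce)))  # False: stale nonce
```

Because the nonce is fresh per attempt, a captured signature can't be replayed for a later login.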

Happy to answer more questions like this, just email me: auston@quiknode

Maybe when computing gets really cheap this could be done. I mean cheap like downloading and running Google(front/back-end) on my phone.

"For the purposes of this blog, we will narrowly define blockchains as a Byzantine-fault tolerant consensus protocol"

I don't think this is a useful definition. Blockchains are about trust and control, not Byzantine fault tolerance.

And they seem to be doing poorly on the trust element. My lack of trust in the controlling entities is why I left the space.

Bitcoin is more centralized (via the mining cartel) than the existing banking system. The wealth gap is greater in crypto than in the analog world, too.

That's a great point. And it's definitely a tricky term to navigate because "blockchain" means so many different things to different people. (e.g. tokenization, governance, finance).

I'd say you identified the key aspect though, which I agree is not explicitly answered in this blog. "Who runs this" is a critical question and I think it will vary depending on the end application. I'm going to guess there will be settings where a permissioned deployment may be appropriate (think enterprise settings). There will be settings where a permissionless deployment will make more sense (think globally censorship-resistant Wikipedia). It all boils down to who wants to control it and how to convey trust. And for each of those, you probably have different access control policies too.

Wouldn't any such apps used for anything beyond toy experimentation quickly re-centralize around maximal cliques? At scale, the demand for responsiveness tends to drive out the demand for everything else.

Hrm.. interesting idea!

The tl;dr appears to be "we built a blockchain-based query logger," where the consensus essentially just checks that a statement is valid SQL and defines an ordering of the statements. Then, once everyone agrees to apply the change, it is applied... with (AFAICT) no interaction with the DB during the consensus process?

This seems a bit naive about features like locks and transactions that are critical to most DBs. So while maybe you can define a global ordering of statements, the execution of those statements may be nonsensical and result in a lot of failed transactions. For example, it seems like you could grind the usability of the system to a halt by flooding the validators with "valid" SQL designed to create as many conflicts as possible, causing a ton of writes to fail to commit.
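The failure mode described here is easy to demonstrate with a toy replay. In this sketch (my own illustration, not the project's code), consensus fixes only the order of statements, and each node replays them locally against SQLite; a statement that conflicts with earlier state simply fails on every node:

```python
# Toy replay of a globally ordered SQL log. Consensus fixed only the
# *order*; conflicting statements fail deterministically on every node.
import sqlite3

ordered_log = [
    "CREATE TABLE pages (title TEXT PRIMARY KEY, body TEXT)",
    "INSERT INTO pages VALUES ('Home', 'v1')",
    "INSERT INTO pages VALUES ('Home', 'v2')",   # conflict: duplicate key
    "UPDATE pages SET body='v3' WHERE title='Home'",
]

db = sqlite3.connect(":memory:")
failed = 0
for stmt in ordered_log:
    try:
        db.execute(stmt)
    except sqlite3.Error:
        failed += 1   # a flood of such statements is the DoS risk above

print(failed)                                           # 1
print(db.execute("SELECT body FROM pages").fetchone())  # ('v3',)
```

Every statement was "valid SQL" at consensus time, yet one still failed at execution time; an attacker only needs to manufacture such conflicts in bulk.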

I agree with the desire for more services to become decentralized, and this is a novel idea, but (IMHO) more and more, the technical hurdles of blockchains bring more challenges than they solve.

This is naive about any real-world, large-scale use of RDBMS systems, not to mention any actually distributed storage systems.

Agreed. You aren't going to get this to work with Wikipedia. But I could see some utility at small scale (if it weren't easy to break with conflicts). A Discourse install for local discussion in an oppressive regime, run across a few hundred commodity computers, could be pretty hard to shut down and could reasonably work for 50K people. The weakness is that one bad actor could disrupt it just by flooding it with valid but conflicting writes (not to mention being able to issue deletes against all the state if all actors are treated equally).

And don't expect it to support normal transactions or participate in distributed ones

Great point! This was about as simple as it got in order to ship something quickly that was minimally intrusive. The prototype as it stands buys you sequential consistency, but I want to untangle some of the great points you made.

1. Performance: The performance is probably going to be limited, in the sense that most databases have better performance than most blockchain implementations, so you're arbitrarily bottlenecking your writes. There are a ton of really great papers showing you can do WAY better if you are more careful with your distributed design. They usually involve getting into the weeds of redesigning the database internals. Just for fun, I really liked these papers: http://nms.csail.mit.edu/~stavros/pubs/hstore.pdf https://irenezhang.net/papers/tapir-tocs18.pdf https://www.usenix.org/system/files/conference/nsdi17/nsdi17...

So there is a ton of space to improve the performance here, I just wanted to build something quickly.

2. Authentication: Definitely an interesting technical challenge here. Most web apps are written assuming they are one of many servers accessing the database, but that each server is honest and has root authority. So how do we make sure that a validator has the authority to write to a particular table/row? Right now, this is still an open challenge and I would love to engage the community on ideas here. There might be some way to tag/sign all tables/rows with their creator, and do write access control based on that? I'm not sure how many web apps would break, though.
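The tag-rows-with-creator idea floated above could look something like this; the schema, names, and check are all hypothetical, and this sketch assumes the signer's identity has already been verified by a signature check upstream:

```python
# Hypothetical sketch of creator-tagged write access control: every row
# carries its creator, and a validator applies a write only if the
# (already-verified) signer of the statement matches the row's creator.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, creator TEXT, body TEXT)")
db.execute("INSERT INTO posts VALUES (1, 'alice', 'original')")

def validated_update(db, signer, row_id, new_body):
    """Apply the write only if the signer owns the row."""
    row = db.execute("SELECT creator FROM posts WHERE id=?", (row_id,)).fetchone()
    if row is None or row[0] != signer:
        return False   # reject: signer is not the creator
    db.execute("UPDATE posts SET body=? WHERE id=?", (new_body, row_id))
    return True

print(validated_update(db, "mallory", 1, "defaced"))   # False
print(validated_update(db, "alice", 1, "edited"))      # True
print(db.execute("SELECT body FROM posts WHERE id=1").fetchone())  # ('edited',)
```

The open question the comment raises still stands: apps that expect root authority over every table would need their write paths mapped onto checks like this.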

3. Write-conflicts: Even without the access control specified above, most web apps are written assuming they are not the only web server accessing the database concurrently. So I would anticipate honest servers already have good failover logic, or wrap operations in SQL transactions when they really need atomicity. Either way, I don't see any issue with those being serialized and appended to the log.
