Web Applications from the Future: A Database in the Browser (stopa.io)
183 points by w1nter 5 days ago | 73 comments

Having been a dev on EtherPad, Google Wave, Coda, and other real-time collaborative apps with OT, undo, and so on...

I think it's correct that, ideally, there would be a framework that handles real-time collaboration, undo/redo, and offline support for you, and then you build your app with these problems already solved. I will probably create such a framework eventually. I don't see it as a database engineering problem, it's more like a framework or application architecture, which every app like Google Docs or Figma has its own version of. Writing such a framework is not too much harder than writing such an app, it just requires a little more abstraction and some documentation.

If you've never written an undo manager, sync engine, etc., and you aren't writing a complex app, it's hard to arrive at the right design by thinking about pure data sync. Also, storing and querying data are solved problems; it's more a question of coming up with a generic data model for an application and defining its semantics.

I’m exploring the ideas (an easy to use framework to build local-first [1] apps) in my library Reactive-CRDT (https://github.com/yousefed/reactive-crdt). Feedback welcome!

All credit for the underlying tech to YJS, which has been amazing as mentioned by others in this thread.

[1]: https://www.inkandswitch.com/local-first.html

The common abstraction is a shared log. I'm boiling down my current side project ( http://www.adama-lang.org/ ) into reusable components.

I'm also looking at WebAssembly as the way of doing work ( http://www.adama-lang.org/blog/micro-monoliths ) versus generic operations.

Something interesting to consider is how important is offline use these days? If we get to an online-99.9% world, then the solution feels ... simple.

Not long ago I was forced to use Office 365 for work. It has a collaborative editing feature that on the surface looks equivalent to Google Docs, but in practice it isn't: it wasn't reliable, I lost some edits a few times, and even when it worked it was slow and sluggish (I know because I could see another person's shared screen in Zoom). Actually co-editing a file definitely felt like a different experience.

The bit relevant to your comment is: while most people did say that the experience was not as good as Google Docs, it wasn't that terrible for most of them.

Turns out that it really depends how well you're connected.

Being connected 99.9% of the time sucks more if that 0.1% is a crappy connection that drops packets occasionally, as opposed to just saying "I'm online most of the time, and I don't care if I can't edit the doc during that rare 0.1% of the time when my connection is broken for good".

Real-time collab is a figment of coders' imagination. Real business processes are async in nature. What people really want is an easy way to open someone else's work, review it or add to it, and move on without all the emailing. Chatting, while important, also tends to be an async need that's not MVP to the process.

Yeah whatever, I'm not a product guy and I do not pretend to have any clue about what people really want.

The point of my anecdote was that "connected" is not a boolean property.

Offline-first is a major selection criterion for me.

First, it's just faster. Offline apps are an order of magnitude faster and the speed is stable.

Secondly, data is way easier to back up. I don't trust any provider with important data. Your account can be banned, or you could be locked out. There are outages, data corruption, human errors from people you share data with...

Then there is the no-internet part. Internet at home can go off for many reasons; you may be abroad, in a plane, in a tunnel, in the countryside. Or the internet could be slow, unreliable, behind an aggressive proxy, etc.

That's why I use Thunderbird and not webmail, Dynalist and not the competition, MP3s and not Spotify... That's why I keep OsmAnd next to Waze and torrents next to Netflix.

Last week internet went down in accounting. They took the day off, because they used office 365. Great for them, not so much for the work to be done.

But you'd want to allow the client to query a snapshot before doing a full sync on the shared log.

Absolutely, this is a key reason why I strongly believe what goes in the log matters. If the log is a list of commands, then you have two representations to contend with: the differential/patch form and then the aggregated state.

An area that I'm playing around with is a log reducer which transforms a region within a log into a single item. For instance, if the log entries are just JSON (without arrays) then json merge (RFC 7396) is an example reducer. See http://www.adama-lang.org/blog/json-on-the-brain for more detail.
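A minimal sketch of that idea, assuming log entries are plain JSON objects treated as RFC 7396 merge patches. The names `merge_patch` and `reduce_log` are made up for illustration and are not taken from Adama. It folds a region of the log onto a base state; it does not try to compose several patches into a single equivalent patch, which needs more care.

```python
from functools import reduce


def merge_patch(target, patch):
    """Apply one RFC 7396 JSON merge patch to target and return the result."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target wholesale.
        return patch
    if not isinstance(target, dict):
        # Patching a non-object with an object starts from an empty object.
        target = {}
    result = dict(target)
    for key, value in patch.items():
        if value is None:
            # null in a merge patch means "delete this key".
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


def reduce_log(entries, snapshot=None):
    """Left-fold a region of log entries onto a snapshot (hypothetical helper)."""
    base = {} if snapshot is None else snapshot
    return reduce(merge_patch, entries, base)
```

A client could then load a snapshot, fetch only the entries written after it, and call `reduce_log(tail, snapshot)` instead of replaying the whole log.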

Interesting approach to reduce or even eliminate the workload of left folding the logs

I have been thinking about these concepts in a personal app. The primary point of collaboration in this personal app is the file system in real time. The framework I settled on for this problem is a security model.

In this app there are two types of agents: personal devices and users. Collaboration between devices is unrestricted because they are only physically separated segments of the same single user. Collaboration between users requires a share which is a segment of availability. By default a share is read only. Device identity is not exposed to other users so a separate user will have no idea they are collaborating across different computers if they are owned by the same user.
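Roughly, that model could be sketched like this. Every name here (`Share`, `User`, `can_access`) is hypothetical and only illustrates the rules described above, not the actual app.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Share:
    path: str
    read_only: bool = True  # a share is read-only by default


@dataclass
class User:
    name: str
    devices: set = field(default_factory=set)
    # Maps another user's name to the list of Share objects granted to them.
    shares: dict = field(default_factory=dict)


def covers(share_path, path):
    """True if path is the shared path itself or lives underneath it."""
    return path == share_path or path.startswith(share_path.rstrip("/") + "/")


def can_access(owner, requester, device, path, write=False):
    if requester == owner.name:
        # Devices of the same user collaborate without restriction.
        return device in owner.devices
    # Other users only see shares; the owner's device identity is never consulted.
    return any(
        covers(share.path, path) and (not write or not share.read_only)
        for share in owner.shares.get(requester, [])
    )
```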

I have not written undo for the file system yet though. A requested feature is file/directory synchronization and I am considering adding desktop/camera video sharing.

There are a few of those frameworks. The first one that comes to mind is YJS; here is a getting-started guide [1].

[1]: https://www.tag1consulting.com/blog/deep-dive-real-time-coll...

YJS (and Kevin Jahns!) is fantastic, although I'm not sure they're interested in addressing all of these problems. So far, YJS has been very focused on replicating state from one place to another.

And the parent comment may be making light of some of the big issues in the space. You need to be generic, but also performant and simple compared to the competition.

Real-time text editing is not trivial, although there's solid prior art now. Even undo management is a whole problem space (what should a cross-user undo look like if there have been dependent changes?).

Yjs is exactly that. It is a simple abstraction for building any kind of collaborative application. It has ready-to-use solutions for most problems in this space. The selective UndoManager, for example, is generic and configurable and can be reused for all kinds of stuff. It supports many different editors. It supports many different (scalable) backends.

It is much more than state synchronization.

If there is anything missing, then let's work on it. Yjs is extensible and allows for custom features that others can reuse.

Do you know if any of these frameworks support different users being permissioned to see different documents or different views of collections?

Fluid Framework (a project I contribute to/work on through Microsoft) supports this via token in our default service implementation, but we don't have integration with an ACL DB.

Where would one go to follow along if you start building this framework? You’ve got one pending GH star from me :)

You could start by watching the Fluid Framework github.com/microsoft/fluidframework (I'm a dev on the team).

We solve a lot of these problems. This whole article speaks to the longer term vision. Even the article's Title is effectively our internal pitch: a web first database... that enables very low latency collab

Brought a smile to read this comment. Will look deeper into Fluid Framework. Rooting for y'all!

+1. I would even pay to watch you code this.

Didn’t know there was a market for coding session watchers. Let me know if someone’s interested in watching me slow-code my current long-running side project: Focusly.

> Didn’t know there was a market for coding session watchers.

I think this is an actual niche on Twitch.

You might like what I’m working on. A Node.js backend powered by Y.js.


Send me an email to humans@tiptap.dev if you want to take a peek.

I definitely want to check it out. Also, I love the simplicity of your site. Did you use a template or design it from scratch? I have a project that I need a page for and would want something similar to yours. Happy to pay for the work too.

Appears to be compiled using Gridsome, from looking briefly at source (CTRL+U).

Something like Microsoft's Fluid Framework?


Very excited to hear this!

I don't mind the technical discussion at all, it's a fun write-up, but if you "look at the ecosystem of web applications and measure by difficulty", find the most difficult possible app you can, and use that to shape your vision of how to build web apps and to lionize what a web app is, I think you are very liable to end up with a completely non-web monster that ignores many of the greatest strengths of application & site design.

Google Docs is itself moving to canvas-based rendering, which might as well be turning the screen into a giant VNC session into their codebases[1]. The web: dead. Pushing pixels in people's faces: in. All the extensions that extend & enhance Google Docs are about to die, being replaced with a very small, much narrower API provided explicitly by Google.

I see statements like,

> This gets me excited about what's to come, because what's at the edge of difficulty today tends to become the new normal tomorrow.

And think, please, let us not be path dependent on a web-recreation of classic desktop apps. The web is more interesting; it has online content, liveliness & connectivity to it that far surpass other platforms' norms. Let us think of how we might advance the web for good web things. There's worth & value in examining hard problems, but I am worried that this attitude has us set out to build faster horses.

[1] https://news.ycombinator.com/item?id=27129858

I feel like rendering stuff like this is almost a violation of net neutrality. Large companies can spend lots of money on server-side compute and subsidize an experience that can't be replicated sustainably by smaller companies.

Great post. Captured the problem nicely. I went on a similar journey recently.

There is a moment when you step back and realize that all the data-wrangling code we write in apps is essentially what an SQL query planner does. And our frontend is just one big materialized view that we need to update.

I often think that if we had a database that runs in the browser, and we could subscribe to queries, and missing local data would be fetched on-demand, frontend development would be a lot easier. It's of course much more complicated than that.
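The "subscribe to queries" part, at its most naive, looks something like the sketch below. `ReactiveStore` is a made-up name, the store is just an in-memory list, and every query is re-run on every write; a real system would maintain views incrementally and fetch missing data from a server, neither of which is modelled here.

```python
class ReactiveStore:
    """Toy in-memory store whose queries can be subscribed to (illustration only)."""

    def __init__(self):
        self.rows = []
        self.subscriptions = []

    def subscribe(self, query, callback):
        """Run query now, call back with the result, and again whenever it changes."""
        sub = {"query": query, "callback": callback, "last": query(self.rows)}
        self.subscriptions.append(sub)
        callback(sub["last"])
        # Return an unsubscribe function.
        return lambda: self.subscriptions.remove(sub)

    def insert(self, row):
        self.rows.append(row)
        # Naive invalidation: re-run every subscribed query after each write.
        for sub in list(self.subscriptions):
            result = sub["query"](self.rows)
            if result != sub["last"]:
                sub["last"] = result
                sub["callback"](result)
```

The UI would pass its render function as the callback, which is the "one big materialized view" idea in miniature.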

I think the reality though is that backend scaling requirements always end up dwarfing frontend productivity concerns. And there is also a huge amount of glue between backend data sources, preventing the creation of a clean database on the backend to sync with, meaning we are creating API gateways and GraphQL federation layers, and all we can hope for is a good client-side caching layer.

If you look deeper though you will find many projects doing all these kinds of things already. Mobile developers are very familiar with offline techniques and SQLite on the client, and it doesn't feel like anything special to them. For the web/desktop, maybe no one has packaged it in the right way yet, or maybe we are still digesting GraphQL, SSR, and serverless, and then there will be another shift with offline-first, reactive SQL in the frontend.

It's funny though that if you think long enough about these concerns, you always seem to end up wanting some Datalog thing.

Not a single mention of CRDTs or operational transforms in the article. If all you needed to replicate Figma and Google Docs were a client-side database, they would have shut down long ago.

For readers like me who do not know every single term in data:


"as long as we’re okay with having a single leader, and are fine with last-write-wins kind of semantics, we can drastically simplify this and just facts are enough"

It's not a direct reference to CRDTs or operational transforms. They're not wrong that last-write-wins is easiest to implement, but for some applications it doesn't quite cut it.

Most CRDTs (last-write-wins being one of them) will result in a terrible user experience, and nobody would want to use it for collaboration. That's why most of these collab systems use OT, which is not trivial to implement.

Both CRDTs and OTs are broken. If you wrote them on your own, it's always "the next fixed bug will finally make it work!".

Last-write-wins isn't that easy either. It's easy only if you always send the full data on each write, which is not a great system for large data.

You should have a look at RxDB, which does exactly what you have described. Also notice that neither Firebase nor Supabase is really offline-first.
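To illustrate: once you stop sending the whole document, you need a timestamp per field and a deterministic tie-break. A minimal sketch, assuming each write carries a logical timestamp and a replica id (`LWWMap` and its fields are made-up names, and deletes would additionally need tombstones):

```python
from dataclasses import dataclass, field


@dataclass
class LWWMap:
    # key -> (timestamp, replica_id, value)
    entries: dict = field(default_factory=dict)

    def set(self, key, value, timestamp, replica_id):
        current = self.entries.get(key)
        # Compare (timestamp, replica_id) so concurrent writes resolve the same way everywhere.
        if current is None or (timestamp, replica_id) > current[:2]:
            self.entries[key] = (timestamp, replica_id, value)

    def merge(self, other):
        """Fold another replica's entries into this one; merge order does not matter."""
        for key, (timestamp, replica_id, value) in other.entries.items():
            self.set(key, value, timestamp, replica_id)

    def value(self):
        return {key: entry[2] for key, entry in self.entries.items()}
```

Even this small version has to store metadata for every field, which is the cost the "just send everything" approach avoids.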


RxDB + GraphQL (via Hasura) is amazing, and offers all the things the author notes.

We had Meteor with “mini mongo”, where you locally subscribed to streams, and it was all quite a bloated POS IIRC. At some point someone will make a virtual browser that runs in the browser, or maybe a kubernetes that runs in the browser, and the circle of ironic self-referencing will be complete. Just replace the browser and use ports 80 and 443 and be done with it, instead of pretending that CPU cycles and RAM are free. The browser is not the app platform of the future; it’s the current band-aid solution, is all. I also don’t agree that this article is well written. It exclusively addresses the world according to front-end devs, much as GraphQL does. I’m sorry that coding elaborate stuff requires you to keep track of your elaborate stuff. Set your global var equal to the parsed xhr.result and redraw your GUI and get over it, IMO.

Backend devs, meanwhile, wonder where their Frontend as a service is...

"backend as a service" = "can you just, like, do that magic CRUD stuff or whatever you do"

I've always thought of apps like Retool as a Frontend as a service, and even as a mostly-front-end-person I've started using those more

Anyway, PostgREST already existed before GraphQL. Here's a frontend dev's BaaS, replete with soup and dessert. (If you don't mind using db users, you even get user-auth per route...)


But hey, now I'm suspicious... are there Meteor devs here?

IIRC, right on the heels of Meteor we got Apollo from the same folks...

The thing is, I'm puzzled. Given the insane, unnecessary complexity of all this ES2030 and "build toolchains" (who knew JavaScript needed to be compiled? Are we making binary bitstreams? Is this an embedded microcontroller? Is this an FPGA?) and other esoteric "let's replicate all of comp-sci in the browser" frontend tech, why on earth do such smart folks need the backend as a service? Surely folks who can deal with webpack build scripts can write a little PHP or Ruby or Java or something...? What is with this desire to completely neutralize and gut the backend stack? Why on earth invert what a web server is? Request => Response (which might possibly be a rendered webpage). Do folks actually think any of this is necessary and/or good?

Look, I'm all for web APIs, web audio, WebRTC, etc... I just don't see where a database in the browser is anything but 'cruisin for a bruisin'...

it's not like databases get their own dedicated servers or anything..

or do you mean something like a mobile app using pouchdb? "offline first and syncing to a single real database in the cloud as soon as one rejoins the network"

because a browser-based database can only ever be a compromise, such that a compelling reason could force it, but that's something akin to "DB per user" aka the Couch/Pouch ecosystem.

Is Couch more secure than what I've come to expect from other Apache stuff?

> or maybe a kubernetes that runs in the browser and the circle of ironic self referencing will be complete


Wow! It just goes to show... something or other... How many recursive levels of that do you reckon are required to grind your average dev portable workstation to a halt? (Let's say 32 GB RAM, presuming no actual program load, just overhead plus 'hello world'.) I suppose the real question is: how many engineers will a major corp throw at the aforementioned wall/problem in order to make us not care and follow unwise engineering paths?

I'm going to go with a unikernel wrapping a process before this stuff makes its way through my intestines. Color me skeptical of the notion that slippy couplings in between multiple impedance mismatches will be excusable much longer in a world that increasingly cares about power consumption (and battery life) as well as carbon footprints...

> a virtual browser that runs in the browser

I think there's an app for that

> instead of pretending that cpu cycles and RAM are free.

Don’t pretend security is free either.

Is our non-browser (that happens to piggyback on 80 & 443) a single-purpose application? What are the presumed security vulnerabilities of a bespoke non-browser that happens to listen on 2 ports, and either negotiates a transfer from http(s) to our kustom-protocol or utilizes existing protocols starting from http(s) (any version of your choice...)?

I'm saying that we cannot presume the insecurity of a general purpose browser in a custom application that just happens to take advantage of open ports and an existing distribution network (upgrading/crossgrading connection or not...)

How uncommon is it to open a Zoom link from a browser source into the Zoom app, and countless other examples? I'm not presuming any particular tech stack, which would imply its own particular vulnerabilities or advantages...

Just curious...I think the "cram it all into a megabrowser from a megacorp and make it run" approach is rather 2021, is all, not necessarily the way of the still as yet undetermined future.

this comment turned into a bit of a personal rant, apologies.

I really liked this post, because it touches on so many things that I have to build as well at the moment. I'm building a configuration management interface; the front-end is basically authentication and heaps of forms, the back-end transforms it into XML and uses some shell scripts to rsync them to servers and SNMP to trigger a deployment. But the users have worries about overwriting each other's work, they ask for undo / revert support, and there's things like audit logging, user and permissions management, etc involved.

At the moment I'm slowly building all of that with a React/TS front end and a Go backend using a REST API, just trying to be as tidy and complete as possible. But it's a lot of work, and I'm afraid that once I get to things like versioning, locking, undo / revert, auditing and permissions, my fairly straightforward codebase will just explode in complexity, with each endpoint having loads of documentation and code to represent the different concerns. Client- and server-side validation is another concern, importing existing data, migrating user data from a staging to production environment, etc.

It's a lot. It's a project that should be done by a full team of developers, maybe even multiple but ATM I'm doing it on my own.

I'll never finish it. I'm currently trying to plough through and make sure everything is set up, but I'm hoping we'll get a big financial injection and I'd be able to hire a bunch of consultants (that I'm familiar with as being decent if expensive developers).

I don't know how the previous guy managed to get as far as he did on the older version, other than being a mediocre but productive and persistent developer for all that time, seeing the whole thing grow over time instead of trying to reach feature parity with a product nine years in the making.

Empathize with you, and from the writing I am sensing the stress you are in.

One thought that may help:

As an engineer, your job is to communicate risks. You can’t control whether this becomes impossible, but you can control how you work on it and how you communicate.

One way to do this is to keep an up-to-date design document with progress and risks. It looks like there are some issues that are worrying you. If you haven't already, I would write them out and get thoughts from stakeholders.

Rooting for you!

This is an incredibly well-written article and helps track and rationalize some of the trends I've been seeing as well around backends as a service that enable developers to easily spin up modern web applications

Thanks so much for writing it!

Thanks for the kind words!

To me it always comes down to permissions. I've tried Firebase, I've tried GraphQL. I just can't use them for anything other than admin endpoints or things with very simple permissions. I'm glad the author mentioned permissions, but I would need to see some seriously compelling evidence that I could trust a declarative, resource-based permissions system to accomplish what I need.

Facebook’s EntFramework did this _very_ well. I haven’t come across any essays that go deeply on it, but maybe from this comment someone will suggest an essay / if they are ex-fb write it up.

> declarative, resource-based permissions system

Isn't this how AWS does it?

You should also look at OpenDSU (opendsu.com), which covers these aspects in a unique way. Basically, it builds on the vision in which applications run inside digital wallets and you control your data (client-side encryption) while being able to collaborate with others and get security in a decentralised way, using various types of anchoring in ledgers (not necessarily distributed ledgers or your typical blockchains, but of course that could be the case too).

Just use Firebase (firestore) ...

It's got: offline first, latency compensation, pubsub, partial sync, authorization rules and serverside timestamps, and global transactions, and 5 9s availability backed by spanner for umpteen languages. https://tomlarkworthy.endpointservices.net/blogs/firestores-...

But then one day your Google account is closed by AI. Or the product is abandoned by Google.

Firestore is not really offline-first. Your app won't start without an internet connection because the auth handler needs that.

No, it works as long as you have logged in once. You have to ensure auth persistence is on (https://firebase.google.com/docs/auth/web/auth-state-persist...). You cannot log in when offline, but if you were already logged in and expiry is set to indefinite, you can stay logged in while offline.

There is a very handy RDF DL subset that has all of the desired properties described in the article.

Funnily enough we are targeting the browser AND FPGAs, and I think the latter is the much more interesting use case for a distributed reactive CRDT-like database.

Datalog is actually trivial to incrementally materialize in open-world semantics. Which is why Datomic (while a great foundation in theory) turns out to be non-ideal. TxnIDs and retractions are essentially non-monotonic negation in disguise, and CALM (consistency as logical monotonicity), a.k.a. distributedness, doesn't go well with that.

Seeing that we're not the only ones dreaming of this gives me hope though, that we might get out of the tar pit someday.

We already have this in Visual Javascript. When you build an application and export it as HTML it has a full SQL database that runs just in the browser: https://github.com/yazz/visualjavascript

[Automatic Undo/Redo Using SQLite](https://www.sqlite.org/undoredo.html)

Not sure about the scalability of SQLite, but here’s a simple way to get database undo/redo.
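A stripped-down sketch of the trigger idea from that page, using Python's built-in `sqlite3` module. The `notes` table, the trigger names, and the `undo` helper are made up for illustration; this version only does single-step undo with no redo stack, so treat it as a toy rather than the linked page's implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE notes(id INTEGER PRIMARY KEY, body TEXT);
CREATE TABLE undolog(seq INTEGER PRIMARY KEY, sql TEXT);

-- Each trigger records the SQL statement that would reverse the change.
CREATE TRIGGER notes_insert AFTER INSERT ON notes BEGIN
  INSERT INTO undolog(sql) VALUES('DELETE FROM notes WHERE id=' || new.id);
END;
CREATE TRIGGER notes_update AFTER UPDATE ON notes BEGIN
  INSERT INTO undolog(sql) VALUES(
    'UPDATE notes SET body=' || quote(old.body) || ' WHERE id=' || old.id);
END;
CREATE TRIGGER notes_delete BEFORE DELETE ON notes BEGIN
  INSERT INTO undolog(sql) VALUES(
    'INSERT INTO notes(id, body) VALUES(' || old.id || ',' || quote(old.body) || ')');
END;
""")


def undo(conn):
    """Reverse the most recent logged change; return False if there is nothing to undo."""
    row = conn.execute(
        "SELECT seq, sql FROM undolog ORDER BY seq DESC LIMIT 1"
    ).fetchone()
    if row is None:
        return False
    seq, sql = row
    conn.execute(sql)  # running the reverse statement fires the triggers again
    # Drop the undone entry and the entry that the undo itself just logged.
    conn.execute("DELETE FROM undolog WHERE seq >= ?", (seq,))
    return True
```

Keeping the entries that the undo generates, instead of deleting them, is what would give you redo.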

A lot of the principles here echo the same design principles that MeteorJS was built on. Why didn't that work and why now? Is there something about _today_ that makes the web application space different?

People are actually being corralled into using web apps by the big boys, and phones are no longer pieces of shite.

API rate limit exceeded for user ID 984574.

For me, this was related to an adblocker

On it! Sorry about that.

[update] should be back

About storing history rather than current-state: I think Urbit works similarly.

Getting API rate limit exceeded.

What is this hosted on?

This site appears to have exceeded its API rate limit. I wonder if that is a problem that will be solved by 'web applications of the future'. But I guess I'll never know now.
