I think it's correct that, ideally, there would be a framework that handles real-time collaboration, undo/redo, and offline support for you, and then you build your app with these problems already solved. I will probably create such a framework eventually. I don't see it as a database engineering problem, it's more like a framework or application architecture, which every app like Google Docs or Figma has its own version of. Writing such a framework is not too much harder than writing such an app, it just requires a little more abstraction and some documentation.
If you've never written an undo manager, sync engine, etc., and you aren't writing a complex app, it's hard to arrive at the right design by thinking about pure data sync. Also, storing and querying data are solved problems; it's more a question of coming up with a generic data model for an application and defining its semantics.
All credit for the underlying tech goes to Yjs, which has been amazing, as mentioned by others in this thread.
I'm also looking at WebAssembly as the way of doing work ( http://www.adama-lang.org/blog/micro-monoliths ) versus generic operations.
Something interesting to consider is how important is offline use these days? If we get to an online-99.9% world, then the solution feels ... simple.
The bit relevant to your comment is: while most people indeed testified that the experience was not as good as Google Docs, it wasn't that terrible for most of them.
Turns out that it really depends how well you're connected.
Being connected 99.9% of the time sucks more if that 0.1% is a crappy connection that occasionally drops packets, as opposed to just saying "I'm online most of the time, and I don't care if I need to edit the doc in that rare 0.1% of the time when my connection is broken for good."
The point of my anecdote was that "connected" is not a boolean property.
First, it's just faster. Offline apps are an order of magnitude faster and the speed is stable.
Secondly, data is way easier to back up. I don't trust any provider with important data. Your account can be banned, or you could be locked out. There are outages, data corruption, human errors from people you share data with...
Then there is the no-internet part. Internet at home can go off for many reasons; you may be abroad, in a plane, in a tunnel, in the countryside. Or the internet could be slow, unreliable, behind an aggressive proxy, etc.
That's why I use Thunderbird and not webmail, Dynalist and not the competition, MP3s and not Spotify... That's why I keep OsmAnd next to Waze and torrents next to Netflix.
Last week the internet went down in accounting. They took the day off, because they use Office 365. Great for them, not so much for the work to be done.
An area that I'm playing around with is a log reducer which transforms a region within a log into a single item. For instance, if the log entries are just JSON (without arrays) then json merge (RFC 7396) is an example reducer. See http://www.adama-lang.org/blog/json-on-the-brain for more detail.
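To make the reducer idea concrete, here is a minimal sketch of JSON Merge Patch (RFC 7396) used to fold a region of a log into a single item. It assumes the entries are plain JSON objects without arrays, as described above. The names `merge_patch` and `reduce_log` are hypothetical and not Adama's API, and treating the reduced item as a snapshot built on top of a known base state is also an assumption.

```python
from functools import reduce


def merge_patch(target, patch):
    """Apply a JSON Merge Patch (RFC 7396) to target and return a new value."""
    if not isinstance(patch, dict):
        # A non-object patch simply replaces the target.
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # null in a merge patch means "delete this key".
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


def reduce_log(entries, base=None):
    """Collapse a region of the log into one item by merging in order."""
    return reduce(merge_patch, entries, base if base is not None else {})


log_region = [
    {"title": "Draft", "meta": {"author": "ann", "tags": "x"}},
    {"meta": {"tags": None, "rev": 2}},
    {"title": "Final"},
]
print(reduce_log(log_region))
# {'title': 'Final', 'meta': {'author': 'ann', 'rev': 2}}
```

Because each step is just a merge, the region can be replaced by the single reduced item without changing what later entries see.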
In this app there are two types of agents: personal devices and users. Collaboration between devices is unrestricted because they are only physically separated segments of the same single user. Collaboration between users requires a share which is a segment of availability. By default a share is read only. Device identity is not exposed to other users so a separate user will have no idea they are collaborating across different computers if they are owned by the same user.
I have not written undo for the file system yet though. A requested feature is file/directory synchronization and I am considering adding desktop/camera video sharing.
And the parent comment may be making light of some of the big issues in the space. You need to be generic, but also performant and simple compared to the competition.
Real-time text editing is not trivial, although there's solid prior art now. Even undo management is a whole problem space (what should a cross-user undo look like if there have been dependent changes?).
It is much more than state synchronization.
If there is anything missing, then let's work on it. Yjs is extensible and allows for custom features that others can reuse.
We solve a lot of these problems. This whole article speaks to the longer-term vision. Even the article's title is effectively our internal pitch: a web-first database... that enables very low latency collab.
I think this is an actual niche on Twitch.
Send me an email to email@example.com if you want to take a peek.
Google Docs is itself moving to canvas-based rendering, which might as well be turning the screen into a giant VNC session into their codebase. Web, dead; pushing pixels in people's faces, in. All the extensions that extend & enhance Google Docs are about to die, being replaced with a very small, much narrower API provided explicitly by Google.
I see statements like,
> This gets me excited about whats to come, because what's at the edge of difficulty today tends to become the new normal tomorrow.
And think, please, let us not be path dependent on a web-recreation of classic desktop apps. The web is more interesting; it has online content, liveliness & connectivity to it that far surpass other platforms' norms. Let us think of how we might advance the web for good web things. There's worth & value to examining hard problems, but I am worried that this attitude has us set out to build faster horses.
There is a moment when you step back and realize that all the data-wrangling code we write in apps is essentially what an SQL query planner does. And our frontend is just one big materialized view that we need to update.
I often think that if we had a database that runs in the browser, and we could subscribe to queries, and missing local data would be fetched on demand, frontend development would be a lot easier. It's of course much more complicated than that.
I think the reality though is that backend scaling requirements always end up dwarfing frontend productivity concerns. And there is also a huge amount of glue between backend data sources, preventing the creation of a clean database on the backend to sync with, meaning we are creating API gateways and GraphQL federation layers, and all we can hope for is a good client-side caching layer.
If you look deeper though you will find many projects doing all these kinds of things already. Mobile developers are very familiar with offline techniques and SQLite on the client, and it doesn't feel like anything special to them. For the web/desktop, maybe no one has packaged it in the right way yet, or maybe we are still digesting GraphQL, SSR, and serverless, and then there will be another shift with offline-first, reactive SQL in the frontend.
It's funny though that if you think long enough about these concerns, you always seem to end up wanting some Datalog thing.
It's not a direct reference to CRDTs or operational transforms. They're not wrong that last-write-wins is easiest to implement, but for some applications it doesn't quite cut it.
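For anyone who hasn't seen it spelled out, here is a minimal sketch of a last-write-wins register. It assumes every write carries a timestamp plus a replica id as a tie-breaker. `LWWRegister` and its fields are illustrative names, not taken from the article or any particular library.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class LWWRegister:
    value: Any
    timestamp: int   # assumed: a Lamport clock or wall-clock millis
    replica_id: str  # tie-breaker so every replica picks the same winner

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        """Keep whichever write has the larger (timestamp, replica_id)."""
        mine = (self.timestamp, self.replica_id)
        theirs = (other.timestamp, other.replica_id)
        return self if mine >= theirs else other


# Two users edit the same field concurrently.
alice = LWWRegister("Hello world", timestamp=10, replica_id="alice")
bob = LWWRegister("Hello there", timestamp=11, replica_id="bob")

merged = alice.merge(bob)
print(merged.value)  # Hello there
```

The merge is commutative and idempotent, which is why it is so easy to get right. It also shows the limitation: Alice's edit is dropped entirely instead of being combined with Bob's, which is fine for a checkbox and not fine for a paragraph of text.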
"backend as a service" = "can you just, like, do that magic CRUD stuff or whatever you do"
IIRC, right on the heels of Meteor we got Apollo from the same folks...
What is with this desire to completely neutralize and gut the backend stack? why on earth invert what a webserver is?
Request=> Response (which might possibly be a rendered webpage)
do folks actually think any of this is necessary and/or good?
Look, I'm all for web API's, web audio, webrtc, etc...
I just don't see where a database in the browser is anything but 'cruisin for a bruisin'...
it's not like databases get their own dedicated servers or anything..
or do you mean something like a mobile app using pouchdb? "offline first and syncing to a single real database in the cloud as soon as one rejoins the network"
because a browser-based database can only ever be a compromise, such that a compelling reason could force it, but that's something akin to "DB per user" aka the Couch/Pouch ecosystem.
I'm going to go with a unikernel wrapping a process before this stuff makes its way through my intestines, color me skeptical of the notion that slippy couplings in between multiple impedance mismatches will be excusable much longer in a world that increasingly cares about power consumption (and battery life) as well as carbon footprints...
I think there's an app for that.
Don’t pretend security is free either.
I'm saying that we cannot presume the insecurity of a general purpose browser in a custom application that just happens to take advantage of open ports and an existing distribution network (upgrading/crossgrading connection or not...)
How uncommon is it to open a Zoom link from a browser source into the Zoom app, and countless other examples... not presuming any particular tech stack, which would imply its own particular vulnerabilities or advantages...
Just curious...I think the "cram it all into a megabrowser from a megacorp and make it run" approach is rather 2021, is all, not necessarily the way of the still as yet undetermined future.
I really liked this post, because it touches on so many things that I have to build as well at the moment. I'm building a configuration management interface; the front-end is basically authentication and heaps of forms, the back-end transforms it into XML and uses some shell scripts to rsync them to servers and SNMP to trigger a deployment. But the users have worries about overwriting each other's work, they ask for undo / revert support, and there's things like audit logging, user and permissions management, etc involved.
At the moment I'm slowly building all of that with a React/TS front end and a Go backend using a REST API, just trying to be as tidy and complete as possible. But it's a lot of work, and I'm afraid that once I get to things like versioning, locking, undo / revert, auditing and permissions, my fairly straightforward codebase will just explode in complexity, with each endpoint having loads of documentation and code to represent the different concerns. Client- and server-side validation is another concern, importing existing data, migrating user data from a staging to production environment, etc.
It's a lot. It's a project that should be done by a full team of developers, maybe even multiple but ATM I'm doing it on my own.
I'll never finish it. I'm currently trying to plough through and make sure everything is set up, but I'm hoping we'll get a big financial injection and I'd be able to hire a bunch of consultants (that I'm familiar with as being decent if expensive developers).
I don't know how the previous guy managed to get as far as he did on the older version, other than being a mediocre but productive and persistent developer for all that time, seeing the whole thing grow over time instead of trying to reach feature parity with a product nine years in the making.
One thought that may help:
As an engineer, your job is to communicate risks. You can’t control whether this becomes impossible, but you can control how you work on it and how you communicate.
One way to do this is to keep an up-to-date design document with progress and risks. It looks like there are some issues that are worrying you. If you haven't already, I would write them out and get thoughts from stakeholders.
Rooting for you!
Thanks so much for writing it!
Isn't this how AWS does it?
It's got: offline first, latency compensation, pubsub, partial sync, authorization rules and serverside timestamps, and global transactions, and 5 9s availability backed by spanner for umpteen languages. https://tomlarkworthy.endpointservices.net/blogs/firestores-...
Funnily enough we are targeting the browser AND FPGAs, and I think the latter is the much more interesting use case for a distributed reactive CRDT-like database.
Datalog is actually trivial to incrementally materialize in open world semantics.
Which is why Datomic (while a great foundation in theory) turns out to be non-ideal. TxnIDs and retractions are essentially non-monotonic negation in disguise, and CALM (consistency as logical monotonicity), a.k.a. distributedness, doesn't go well with that.
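As a rough illustration of the monotonic case, here is a sketch of incrementally maintaining one recursive Datalog rule, `path(x, z) :- path(x, y), path(y, z)`, when facts are only ever added. Each insertion joins just the new facts (the delta) against what is already materialized, in the style of semi-naive evaluation. The `Reachability` class is a hypothetical toy, not how any particular engine works, and it deliberately has no retraction: that is exactly the part that stops being this simple.

```python
class Reachability:
    """Materialized view of: path(x,y) :- edge(x,y).
                             path(x,z) :- path(x,y), path(y,z)."""

    def __init__(self):
        self.path = set()

    def add_edge(self, a, b):
        """Insert edge(a, b); return only the newly derived path facts."""
        delta = {(a, b)} - self.path
        added = set()
        while delta:
            self.path |= delta
            added |= delta
            new = set()
            for (x, y) in delta:
                for (p, q) in self.path:
                    if p == y:       # delta fact extended on the right
                        new.add((x, q))
                    if q == x:       # delta fact extended on the left
                        new.add((p, y))
            # Only facts not seen before feed the next round.
            delta = new - self.path
        return added


view = Reachability()
view.add_edge(1, 2)
view.add_edge(3, 4)
print(sorted(view.add_edge(2, 3)))
# [(1, 3), (1, 4), (2, 3), (2, 4)]
```

Since facts are never removed, the order in which edges arrive does not change the final view, which is the property that plays well with distribution.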
Seeing that we're not the only ones dreaming of this gives me hope though, that we might get out of the tar pit someday.
Not sure about the scalability of SQLite, but here’s a simple way to get database undo/redo.
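The link did not survive here, so the following is not necessarily the approach the comment had in mind. One common technique is to have triggers record the inverse SQL of every change in a log table and replay it to undo. Below is a minimal sketch under that assumption, using Python's built-in `sqlite3`; the `notes` and `undolog` tables and the `undo`/`redo` helpers are hypothetical names. It handles one row change per step and does not clear the redo stack on new edits.

```python
import sqlite3

SCHEMA = """
CREATE TABLE notes(id INTEGER PRIMARY KEY, body TEXT);
CREATE TABLE undolog(seq INTEGER PRIMARY KEY AUTOINCREMENT, sql TEXT);
-- Each trigger stores the statement that would reverse the change.
CREATE TRIGGER notes_ins AFTER INSERT ON notes BEGIN
  INSERT INTO undolog(sql) VALUES('DELETE FROM notes WHERE id=' || new.id);
END;
CREATE TRIGGER notes_upd AFTER UPDATE ON notes BEGIN
  INSERT INTO undolog(sql) VALUES(
    'UPDATE notes SET body=' || quote(old.body) || ' WHERE id=' || old.id);
END;
CREATE TRIGGER notes_del AFTER DELETE ON notes BEGIN
  INSERT INTO undolog(sql) VALUES(
    'INSERT INTO notes(id, body) VALUES(' || old.id || ',' || quote(old.body) || ')');
END;
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def undo(conn, redo_stack):
    """Reverse the most recent logged change; return False if none is left."""
    row = conn.execute(
        "SELECT seq, sql FROM undolog ORDER BY seq DESC LIMIT 1").fetchone()
    if row is None:
        return False
    seq, inverse_sql = row
    conn.execute("DELETE FROM undolog WHERE seq = ?", (seq,))
    conn.execute(inverse_sql)
    # Running the inverse fired a trigger too; that new entry is the redo step.
    logged = conn.execute(
        "SELECT seq, sql FROM undolog WHERE seq > ?", (seq,)).fetchone()
    if logged is not None:
        conn.execute("DELETE FROM undolog WHERE seq = ?", (logged[0],))
        redo_stack.append(logged[1])
    return True


def redo(conn, redo_stack):
    """Re-apply the last undone change; the triggers log it for undo again."""
    if not redo_stack:
        return False
    conn.execute(redo_stack.pop())
    return True


conn, redo_stack = open_db(), []
conn.execute("INSERT INTO notes(id, body) VALUES(1, 'first')")
conn.execute("UPDATE notes SET body='second' WHERE id=1")
undo(conn, redo_stack)
print(conn.execute("SELECT body FROM notes").fetchall())  # [('first',)]
redo(conn, redo_stack)
print(conn.execute("SELECT body FROM notes").fetchall())  # [('second',)]
```

The application code never has to know how to reverse its own writes, since the database records that as a side effect of each change.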
[update] should be back
What is this hosted on?