Building realtime collaborative offline-first apps with React, Redux, PouchDB (yld.io)
67 points by tilt on Dec 7, 2015 | 14 comments

A couple of comments since I've spent the last month building out this exact architecture.

First, the author doesn't really address security and authorization; in fact, they gloss over it. The typical solution with CouchDB/PouchDB seems to be per-user databases. That sounds horrendous to a SQL guy (me), but it turns out that CouchDB can happily handle hundreds of thousands of databases.

In my current application I have found couchperuser very useful. It's an Erlang CouchDB plugin that creates a database with appropriate permissions every time you create an entry in the _users database. (Access to the _users database is restricted to an admin account, so account creation goes through a separate web service.)
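For anyone wondering how the client finds "its" database: per-user plugins in the couchperuser family derive the database name from the username. A rough sketch, assuming the `userdb-` plus hex-encoded UTF-8 username convention (verify against the README of whichever plugin you install; `userDbName` is a hypothetical helper, not part of any library):

```typescript
// Derive a per-user database name, ASSUMING the "userdb-" + hex(username)
// naming convention used by couchperuser-style plugins.
function userDbName(username: string): string {
  // Encode as UTF-8 first so non-ASCII usernames produce stable names.
  const bytes = new TextEncoder().encode(username);
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  return `userdb-${hex}`;
}

// A client would then replicate against `${couchUrl}/${userDbName(user)}`,
// where couchUrl is a placeholder for your CouchDB endpoint.
console.log(userDbName("alice")); // userdb-616c696365
```

Hex-encoding sidesteps CouchDB's restrictions on which characters a database name may contain, since usernames can include characters that database names cannot.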

If users are accessing CouchDB directly, of course you gotta worry about what horrors they're stuffing into your lovely pristine database. Control over that is provided by JavaScript validation functions that run on every document save. Those are saved into the per-user database at account creation time.
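For reference, CouchDB's hook for this is a `validate_doc_update` function stored in a design document and called as `(newDoc, oldDoc, userCtx, secObj)`; throwing `{ forbidden: "..." }` rejects the write. The sketch below is typed TypeScript for illustration only (CouchDB stores the function as plain JavaScript source in the design doc), and the specific rules (a string `type` field, an immutable `owner`) are hypothetical examples:

```typescript
// Minimal shapes for what CouchDB passes to a validation function.
interface UserCtx { name: string | null; roles: string[]; }
interface Doc { _id: string; _deleted?: boolean; [field: string]: unknown; }

// Sketch of a validate_doc_update function with HYPOTHETICAL rules.
// oldDoc is null when the document is being created.
function validate_doc_update(newDoc: Doc, oldDoc: Doc | null, userCtx: UserCtx): void {
  if (newDoc._deleted) return; // let deletions through
  if (typeof newDoc.type !== "string") {
    throw { forbidden: "Every document needs a string `type` field." };
  }
  if (oldDoc && oldDoc.owner !== newDoc.owner) {
    throw { forbidden: "The `owner` field cannot be changed." };
  }
  if (!oldDoc && newDoc.owner !== userCtx.name) {
    throw { forbidden: "New documents must be owned by the writing user." };
  }
}
```

Because the function runs on every write, including replicated ones, the same rules apply whether the change came from PouchDB sync or a direct HTTP request.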

The biggest downside to this architecture that I've found is that there are two sources of truth on the client: the Redux store and the PouchDB database.

For the client there's only one source of truth: the Redux store.

The PouchDB store is a side effect of changes to the Redux store, so the client doesn't need to worry about it.
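One way to read "PouchDB is a side effect of the Redux store" is as a Redux-style middleware that lets the reducer run first and then writes whatever changed. A minimal sketch, assuming a simple todo state; `persist` is a hypothetical stand-in for a PouchDB write such as `db.put(doc)`, and a tiny store harness replaces the `redux` package so the example runs on its own:

```typescript
interface TodoDoc { _id: string; text: string; }
interface State { todos: Record<string, TodoDoc>; }
interface Action { type: string; id?: string; text?: string; }
type Dispatch = (action: Action) => Action;
interface MiddlewareAPI { getState: () => State; dispatch: Dispatch; }

// HYPOTHETICAL persistence sink standing in for a PouchDB write.
type Persist = (doc: TodoDoc) => void;

function reducer(state: State, action: Action): State {
  if (action.type === "ADD_TODO" && action.id && action.text) {
    const doc: TodoDoc = { _id: action.id, text: action.text };
    return { todos: { ...state.todos, [doc._id]: doc } };
  }
  return state;
}

// Middleware: the reducer updates the store first (the UI's only source of
// truth), then every todo whose object identity changed is persisted.
function persistenceMiddleware(persist: Persist) {
  return (api: MiddlewareAPI) => (next: Dispatch): Dispatch => (action) => {
    const before = api.getState().todos;
    const result = next(action);
    const after = api.getState().todos;
    for (const id of Object.keys(after)) {
      if (before[id] !== after[id]) persist(after[id]);
    }
    return result;
  };
}

// Tiny store harness so the sketch runs without the redux package.
function createTinyStore(persist: Persist): MiddlewareAPI {
  let state: State = { todos: {} };
  const base: Dispatch = (action) => {
    state = reducer(state, action);
    return action;
  };
  const api: MiddlewareAPI = { getState: () => state, dispatch: (a) => dispatch(a) };
  const dispatch = persistenceMiddleware(persist)(api)(base);
  return api;
}
```

Reducers stay pure and the database never feeds the UI directly. A real PouchDB update would also need the document's current `_rev`, which this sketch omits, and deletions are not handled here.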

I don't understand this obsession with offline support for collaborative realtime apps. There is an idea going around that you can magically sync everything using the same algorithm.

Accurate offline collaboration is impossible without some form of user-assisted conflict resolution. The core problem is that automatic sync algorithms cannot capture the collaborative intent of users.

For example, assume that you have a list of items and two users are interacting with this list while offline. Let's say that one user called 'John' coincidentally deletes all items which start with the letter 'A' but, while also offline, the other user 'Steve' renames one of those 'A' items such that it now starts with the letter 'B'.

There is a problem here: if John comes back online first, all items starting with 'A' will get deleted. So when Steve comes online, either his rename operation will fail because the system cannot rename a non-existent item, or it will re-create the item which John explicitly deleted (this time with a new name, which goes against John's intention).
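A toy model of the John/Steve example makes the order dependence concrete. The item names and operation shapes below are hypothetical; the point is only that replaying the same two offline operations in different orders gives different results, and neither matches both users' intent:

```typescript
interface Item { id: string; name: string; }
type Op =
  | { kind: "deleteWherePrefix"; prefix: string }
  | { kind: "rename"; id: string; name: string };

// Apply one operation; `ok` is false when the target no longer exists.
function applyOp(items: Item[], op: Op): { items: Item[]; ok: boolean } {
  if (op.kind === "deleteWherePrefix") {
    const prefix = op.prefix;
    return { items: items.filter((i) => !i.name.startsWith(prefix)), ok: true };
  }
  const { id, name } = op;
  if (!items.some((i) => i.id === id)) return { items, ok: false };
  return { items: items.map((i) => (i.id === id ? { ...i, name } : i)), ok: true };
}

// Replay operations in sync order, collecting the ones that failed.
function replay(items: Item[], ops: Op[]): { items: Item[]; failed: Op[] } {
  const failed: Op[] = [];
  let current = items;
  for (const op of ops) {
    const r = applyOp(current, op);
    current = r.items;
    if (!r.ok) failed.push(op);
  }
  return { items: current, failed };
}

const start: Item[] = [
  { id: "1", name: "Apple" },
  { id: "2", name: "Avocado" },
  { id: "3", name: "Cherry" },
];
const johnOp: Op = { kind: "deleteWherePrefix", prefix: "A" };
const steveOp: Op = { kind: "rename", id: "1", name: "Banana" };

// John syncs first: Steve's rename has nothing left to rename.
const johnFirst = replay(start, [johnOp, steveOp]);
// Steve syncs first: "Banana" escapes John's delete, against his intent.
const steveFirst = replay(start, [steveOp, johnOp]);
```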

The system can never know for sure the intent of both users so it cannot automatically resolve conflicts (at least not accurately). In some cases, the benefits outweigh the disadvantages but it's important to be aware of those disadvantages and that there are no one-size-fits-all solutions which work perfectly with all kinds of data.

Or you design the protocol (or rather, the operations which are applied to the data) so that conflicts can be trivially resolved.

In your example you could have a single operation, SET, which sets a field at a particular address to the given value. User John sends a list of SET operations, all in the form of 'SET "<item X>.deleted" true' for every item X he wants to get rid of. Steve sends a list of SET operations in the form of 'SET "<item X>.name" ???', which don't conflict with what John sent. The key point is to have stable addresses (e.g. never delete objects, so that addresses within them remain valid forever), and a small set of fundamental operations on them whose conflicts can be resolved automatically by the computer (you can get by for a long time with just SET, and if you have just SET then conflict resolution is trivial).
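A minimal sketch of the SET-only approach. The comment doesn't say how two SETs to the *same* address are resolved, so this assumes last-writer-wins by timestamp with a replica id as a deterministic tie-break; the address format and field names are hypothetical:

```typescript
// One SET: an address like "<itemId>.<field>", a value, and ordering metadata.
interface SetOp { address: string; value: unknown; ts: number; replica: string; }

// ASSUMED resolution rule: later timestamp wins; ties broken by replica id
// so every replica picks the same winner regardless of arrival order.
function wins(op: SetOp, prev: SetOp | undefined): boolean {
  if (!prev) return true;
  if (op.ts !== prev.ts) return op.ts > prev.ts;
  return op.replica > prev.replica;
}

// Merge any number of SET logs into one state keyed by address.
function mergeSetLogs(...logs: SetOp[][]): Map<string, SetOp> {
  const state = new Map<string, SetOp>();
  for (const log of logs) {
    for (const op of log) {
      if (wins(op, state.get(op.address))) state.set(op.address, op);
    }
  }
  return state;
}

// John expands "delete everything starting with A" into per-item SETs for
// the items he actually saw; Steve renames item1.
const johnLog: SetOp[] = [
  { address: "item1.deleted", value: true, ts: 5, replica: "john" },
  { address: "item2.deleted", value: true, ts: 5, replica: "john" },
];
const steveLog: SetOp[] = [{ address: "item1.name", value: "Banana", ts: 7, replica: "steve" }];

const merged = mergeSetLogs(johnLog, steveLog);
const mergedOtherOrder = mergeSetLogs(steveLog, johnLog);
```

Both orders yield the same state: item1 is renamed *and* stays deleted, so John's intent holds and Steve's edit is not lost, because the two users wrote to different addresses.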

Note that in that system there is no notion of a "delete all items whose names begin with 'A'" operation. That's a very complex operation with a very narrow use case, and it requires a lot of logic to handle all the potential conflicts with other such complex operations.

That's exactly what CRDTs are all about. The garbage collection problem (deleted entries leave tombstones that can't be purged until every replica has seen them) is a real issue in practice.
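The garbage-collection issue is easy to see in an observed-remove set (OR-Set), a standard state-based CRDT chosen here purely for illustration. Every add gets a unique tag, a remove tombstones the tags it has seen, and merge is a plain union, so it is commutative and idempotent. The cost is that removed elements never actually leave the state. The function names are hypothetical:

```typescript
// State-based OR-Set: element -> tags that added it, plus removed tags.
interface ORSet { adds: Map<string, Set<string>>; tombstones: Set<string>; }

function orEmpty(): ORSet {
  return { adds: new Map(), tombstones: new Set() };
}

function orAdd(s: ORSet, elem: string, tag: string): void {
  if (!s.adds.has(elem)) s.adds.set(elem, new Set());
  s.adds.get(elem)!.add(tag);
}

// Remove tombstones only the tags this replica has observed, so a
// concurrent add (with a tag we haven't seen) survives.
function orRemove(s: ORSet, elem: string): void {
  s.adds.get(elem)?.forEach((tag) => s.tombstones.add(tag));
}

// Merge is a union of adds and of tombstones: order never matters.
function orMerge(a: ORSet, b: ORSet): ORSet {
  const out = orEmpty();
  for (const src of [a, b]) {
    src.adds.forEach((tags, elem) => tags.forEach((t) => orAdd(out, elem, t)));
    src.tombstones.forEach((t) => out.tombstones.add(t));
  }
  return out;
}

// An element is present if at least one of its tags is not tombstoned.
function orValues(s: ORSet): string[] {
  const live: string[] = [];
  s.adds.forEach((tags, elem) => {
    if (Array.from(tags).some((t) => !s.tombstones.has(t))) live.push(elem);
  });
  return live.sort();
}

const base = orEmpty();
orAdd(base, "Apple", "t1");
const johnReplica = orMerge(base, orEmpty()); // independent copies
const steveReplica = orMerge(base, orEmpty());
orRemove(johnReplica, "Apple");
orAdd(steveReplica, "Banana", "t2");
const m1 = orMerge(johnReplica, steveReplica);
const m2 = orMerge(steveReplica, johnReplica);
```

Both merges show only "Banana", but "Apple" and its tombstone are still in the state. Purging them safely requires knowing every replica has seen the removal, which is exactly what's hard when devices can stay offline indefinitely.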

The authors of CouchDB understand this, too. The document would be marked with a conflict, and the application could try to resolve this conflict if it makes sense, or surface a UI for the user to resolve the changes.

"I don't understand this obsession with offline support for collaborative realtime apps."

You're already offline some of the time. Conflicts can always happen; your 3G connection drops frequently. If you want a robust app, and especially if your target audience is often on the road, why wouldn't you be "obsessed" with offline sync for collaboration? And if you really need user-guided conflict resolution, you can just build that.

Yeah, this is right on the money. I've worked on data sync for the last four years, supporting an app that's used in rural parts of the developing world, where offline sync was unfortunately the only available option. What I learned from the experience is that sync is a super hard problem, that conflicts are inevitable, and that you should sync as much information and context as possible. Sync the higher-level business actions, not low-level data, because when there's a conflict you need the extra context to let the user resolve it in a sensible way. If you only sync low-level data, you frequently find it impossible to recover from some kinds of conflicts.
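To illustrate "sync the business actions, not the data": if each synced record carries the user's intent and what they saw, a conflict can be explained to a person instead of being silently dropped or silently applied. A sketch with hypothetical action shapes, reusing the John/Steve example from upthread:

```typescript
// HYPOTHETICAL action records carrying intent and context, not field diffs.
type BusinessAction =
  | { kind: "deleteItemsByPrefix"; user: string; prefix: string; affectedIds: string[] }
  | { kind: "renameItem"; user: string; itemId: string; oldName: string; newName: string };

interface Conflict { message: string; actions: [BusinessAction, BusinessAction]; }

// Check an incoming action against already-applied ones and describe any
// conflict in terms a user can decide on.
function findConflicts(applied: BusinessAction[], incoming: BusinessAction): Conflict[] {
  if (incoming.kind !== "renameItem") return [];
  const conflicts: Conflict[] = [];
  for (const prior of applied) {
    if (prior.kind === "deleteItemsByPrefix" && prior.affectedIds.includes(incoming.itemId)) {
      conflicts.push({
        message:
          `${prior.user} deleted items starting with "${prior.prefix}", including ` +
          `"${incoming.oldName}"; ${incoming.user} renamed it to "${incoming.newName}". ` +
          `Keep the rename or the delete?`,
        actions: [prior, incoming],
      });
    }
  }
  return conflicts;
}

const applied: BusinessAction[] = [
  { kind: "deleteItemsByPrefix", user: "John", prefix: "A", affectedIds: ["1", "2"] },
];
const incoming: BusinessAction = {
  kind: "renameItem", user: "Steve", itemId: "1", oldName: "Apple", newName: "Banana",
};
const found = findConflicts(applied, incoming);
```

With only low-level data (a `deleted` flag here, a `name` field there), the "including Apple" part of that message would not be recoverable.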

PouchDB has great support for getting conflicts and merging them, it doesn't have to be dealt with at the sync layer.

Since the client is the owner of the PouchDB database, it has direct access to conflicts and can fix them on the spot.
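In PouchDB, `db.get(id, { conflicts: true })` returns the winning revision with a `_conflicts` array of losing revision ids, which you can fetch with `db.get(id, { rev })`. A common pattern is to write a merged winner and delete the losers in one `db.bulkDocs` call. The sketch below only builds that batch, so it runs without PouchDB; `mergeFields` is a hypothetical merge policy you would replace with domain logic:

```typescript
interface Rev { _id: string; _rev: string; [field: string]: unknown; }
interface Deletion { _id: string; _rev: string; _deleted: true; }

// HYPOTHETICAL merge policy: keep the winner's fields and fill in any
// field it lacks from the losing revisions.
function mergeFields(winner: Rev, losers: Rev[]): Rev {
  const merged: Rev = { ...winner };
  for (const loser of losers) {
    for (const [key, value] of Object.entries(loser)) {
      if (!(key in merged)) merged[key] = value;
    }
  }
  return merged;
}

// Docs to pass to db.bulkDocs(): the merged doc, keeping the winner's _rev
// so it becomes a new revision on that branch, plus a deletion stub for
// each losing revision so the conflict disappears.
function resolutionBatch(winner: Rev, losers: Rev[]): Array<Rev | Deletion> {
  const merged = mergeFields(winner, losers);
  delete merged._conflicts;
  const deletions = losers.map((l): Deletion => ({ _id: l._id, _rev: l._rev, _deleted: true }));
  return [merged, ...deletions];
}

const winner: Rev = { _id: "todo1", _rev: "2-b", text: "Buy milk", _conflicts: ["2-a"] };
const loser: Rev = { _id: "todo1", _rev: "2-a", text: "Buy oat milk", due: "friday" };
const batch = resolutionBatch(winner, [loser]);
```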

Your data should be designed to be easily mergeable. Some document schemas are more easily mergeable than others, with CRDTs being the most extreme case I know of.
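As an example of a schema that merges trivially (my pick, not one from the article): a grow-only counter (G-Counter) keeps one slot per replica, each replica only increments its own slot, and merging takes the per-slot maximum. Stored as a single document, a conflict between two revisions can always be resolved by merging them this way. Replica ids are hypothetical:

```typescript
// G-Counter: replica id -> count contributed by that replica.
type GCounter = Record<string, number>;

// A replica only ever increments its own slot.
function increment(c: GCounter, replica: string, by = 1): GCounter {
  return { ...c, [replica]: (c[replica] ?? 0) + by };
}

// Per-slot maximum: commutative, associative and idempotent.
function mergeCounters(a: GCounter, b: GCounter): GCounter {
  const out: GCounter = { ...a };
  for (const [replica, n] of Object.entries(b)) {
    out[replica] = Math.max(out[replica] ?? 0, n);
  }
  return out;
}

function total(c: GCounter): number {
  return Object.values(c).reduce((sum, n) => sum + n, 0);
}

// Two devices count offline, then sync in either order.
const phone = increment({}, "phone", 2);
const laptop = increment({}, "laptop", 3);
const m1 = mergeCounters(phone, laptop);
const m2 = mergeCounters(laptop, phone);
```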

"But yes, the server has no control over what data is being written. This is why this type of architecture only serves for personal / shared databases, not for public records."

A solution that doesn't take security into account from the start is pretty limited for any real-world usage beyond a simple personal data app, in which case why over-architect it in the first place?

Yup. I put together a successful proof-of-concept with React and PouchDB / CouchDB, but quickly concluded the lack of granular security was its Achilles heel.

I switched to Meteor with GroundDB for offline support, and haven't looked back.

That generalization isn't totally accurate: CouchDB has a built-in ability to validate data that gets written to it, and PouchDB was built specifically to make that type of control easy and powerful.

This can be addressed in CouchDB by creating a validation function that validates all incoming updates.

Besides that, I'm addressing this via the architecture (there's an aside in the article where I cover this briefly), which will be a subject of a future article:

Central records shouldn't be directly changed by a user. That's not how central record systems should work.

Instead, we should view any document as a request to get or change the central records, to be processed by a clerk. The clerk then changes the state of the document according to the permissions and results.
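A rough sketch of the clerk idea, with all names hypothetical: users only write request documents, and a privileged process checks permissions, applies approved changes to the central records, and writes the outcome back onto the request so the user sees it after the next sync:

```typescript
// HYPOTHETICAL request document a user writes to their own database.
interface ChangeRequest {
  _id: string;
  user: string;
  recordId: string;
  changes: Record<string, unknown>;
  status: "pending" | "approved" | "rejected";
  reason?: string;
}

type Records = Map<string, Record<string, unknown>>;
type CanEdit = (user: string, recordId: string) => boolean;

// The clerk runs with privileges users don't have. It touches the central
// records only when the permission check passes, and always records the
// outcome on the request document.
function processRequest(req: ChangeRequest, records: Records, canEdit: CanEdit): ChangeRequest {
  if (req.status !== "pending") return req; // already handled
  if (!canEdit(req.user, req.recordId)) {
    return { ...req, status: "rejected", reason: "not permitted" };
  }
  const current = records.get(req.recordId) ?? {};
  records.set(req.recordId, { ...current, ...req.changes });
  return { ...req, status: "approved" };
}
```

In this shape, users never write to the central records at all, so the validation question reduces to whether a request is well-formed, and the clerk's permission check can be as elaborate as the domain needs.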

Seems like Meteor can slot in another database next to the current MongoDB+MiniMongo implementation?
