> 1. What to do with stale user data? What happens if a user doesn't open the app for a year? How do you handle migrations?
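version = db.query("select value from config where key='version'").fetch_one()
switch (version) {
  case 1:
    db.migrate_to_version_2()
    fallthrough
  case 2:
    db.migrate_to_version_3()
    // ... and so on
}
// re-read: the local variable still holds the pre-migration value
// (assumes each migrate_* call updates the stored version)
version = db.query("select value from config where key='version'").fetch_one()
assert(version == 3)
start_sync()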
Just don't delete the old cases. Refuse to run sync if the device is not on the latest schema version.
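For anyone who wants something runnable, here is a minimal sketch of the same idea in Python with sqlite3. The `config` table comes from the pseudocode above; the `notes` table, the two migration bodies, and the `can_sync` helper are made up for illustration. Each step runs in its own transaction so a crash mid-migration leaves the device on a known version.

```python
import sqlite3

LATEST_VERSION = 3

# Hypothetical migration steps; each one moves the schema up by exactly one version.
def migrate_1_to_2(db):
    db.execute("ALTER TABLE notes ADD COLUMN updated_at INTEGER NOT NULL DEFAULT 0")

def migrate_2_to_3(db):
    db.execute("CREATE INDEX idx_notes_updated_at ON notes (updated_at)")

# Never delete old entries: a device that was offline for a year still needs them.
MIGRATIONS = {1: migrate_1_to_2, 2: migrate_2_to_3}

def read_version(db):
    row = db.execute("SELECT value FROM config WHERE key = 'version'").fetchone()
    return int(row[0])

def migrate(db):
    """Apply every pending step in order; returns the resulting version.

    Works best on a connection opened with isolation_level=None so that
    the explicit BEGIN/COMMIT below are the only transaction control.
    """
    version = read_version(db)
    while version < LATEST_VERSION:
        next_version = version + 1
        db.execute("BEGIN")
        try:
            MIGRATIONS[version](db)
            db.execute(
                "UPDATE config SET value = ? WHERE key = 'version'",
                (str(next_version),),
            )
            db.execute("COMMIT")
        except Exception:
            db.execute("ROLLBACK")  # schema change and version bump are undone together
            raise
        version = next_version
    return version

def can_sync(db):
    # Refuse to sync unless the local schema is on the latest version.
    return read_version(db) == LATEST_VERSION
```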
One of my Django projects started in 2018 and has over 150 migration files, some involving major schema refactors (including introducing multi-tenancy). I can take a DB dump from 2018, migrate it, and have the app run against master, without any manual fixes. I don't think it's an unsolved problem.
> 2. What about data corruption? What happens if the user has a network interruption during a sync? How do you handle partial states?
Run the sync in a transaction.
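A rough sketch of what I mean, assuming a local SQLite store. The `notes` and `sync_state` tables and the `apply_sync_batch` name are hypothetical. The point is that the downloaded items and the sync cursor are written in the same transaction, so an interruption leaves you either fully before or fully after the batch, never in between.

```python
import sqlite3

def apply_sync_batch(db, items, cursor_after):
    """Write one batch of server changes and advance the sync cursor atomically.

    items: iterable of (id, body) pairs received from the server.
    cursor_after: the server position to resume from once this batch is stored.
    """
    # The connection context manager commits on success and rolls back
    # if anything inside the block raises.
    with db:
        for item_id, body in items:
            db.execute(
                "INSERT OR REPLACE INTO notes (id, body) VALUES (?, ?)",
                (item_id, body),
            )
        db.execute("UPDATE sync_state SET cursor = ?", (cursor_after,))
```

If the connection drops or a row is rejected halfway through, the cursor still points at the previous batch, so the next sync simply retries it.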
> 3. What happens when you have merge conflicts during a sync? CRDT structures are not even close to enough for this.
CRDTs are probably the best we have so far, but what you should do depends on the application. You may have to ask the user to pick one of the possible resolutions. "Keep version A, version B, or both?"
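To make the "A, B, or both" option concrete, here is a minimal sketch for a flat record. The `Note` type, the `Choice` enum, and the "conflicted copy" naming are all assumptions for illustration; a real app would decide how copies get their ids and how the prompt is shown.

```python
from dataclasses import dataclass
from enum import Enum

class Choice(Enum):
    KEEP_LOCAL = "local"
    KEEP_REMOTE = "remote"
    KEEP_BOTH = "both"

@dataclass(frozen=True)
class Note:
    id: str
    title: str
    body: str

def resolve(local, remote, choice):
    """Return the list of notes to store after the user has picked a resolution."""
    if local == remote:
        return [local]  # nothing to resolve
    if choice is Choice.KEEP_LOCAL:
        return [local]
    if choice is Choice.KEEP_REMOTE:
        return [remote]
    # KEEP_BOTH: the remote version keeps the original id, and the local
    # edit is saved next to it as a separate note (hypothetical id scheme).
    copy = Note(
        id=f"{local.id}-conflict",
        title=f"{local.title} (conflicted copy)",
        body=local.body,
    )
    return [remote, copy]
```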
> 4. What happens when the user has millions of items? How do you handle sync and storage for that?
Every system has practical limits. Set a soft limit to warn people about pushing the app too far. Find out who your user with a million items is. Talk to them about their use cases. Figure out if you can improve the product, maybe offer a pro/higher-priced tier.
> Mobiles are really bad with memory. iOS and Android have an insane level of restrictions on how much memory an app can consume, and for good reason, because most consumer mobile phones have 4-6 GB of RAM.
You don't load up your entire DB into memory on the backend either. (Well, your database server is probably powerful enough to keep the entire working set in memory, but you don't start your request handler with "select * from users".)
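The same habit works on the device. A small sketch, assuming a hypothetical `notes` table with an integer `id`: read one page at a time and use the last id seen as the starting point for the next page, so memory use is bounded by the page size rather than the table size.

```python
import sqlite3

def fetch_page(db, after_id=0, page_size=50):
    """Return (rows, last_id) for the next page of notes with id > after_id.

    last_id is None when there are no more rows.
    """
    rows = db.execute(
        "SELECT id, title FROM notes WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()
    last_id = rows[-1][0] if rows else None
    return rows, last_id
```

Filtering on `id > ?` instead of using `OFFSET` keeps each page query cheap even deep into a large table, because the primary key index can seek straight to the start of the page.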
You're asking very broad questions, and I know these are very simplistic answers - every product will be slightly different and face unique trade-offs. But I don't think the solutions are out of reach for an average semi-competent engineer.
> You may have to ask the user to pick one of the possible resolutions. "Keep version A, version B, or both?"
For structured data, with compound entities, linked entities, both, or even both in the same entity, that can be a lot more complicated.
If a user has updated an object and some of its children, is that an atomic change, or might they want the child/descendant/parent/ancestor/linked updates to go through even if the others don't? All of them or some? If you can't decide this automatically (and you possibly can't, in a way that will satisfy a large enough majority of use cases), how do you present the question to the user (bearing in mind this might be a very non-technical user)?
Also, what if another user wants to override an update that invalidates part or all of their own? Or tries to merge them? Depending on your app this might not matter: if the user is always me on different devices, most likely using one at a time, that is much easier to reason about than a user interacting with others who may be making many overlapping updates.
I think you misunderstand. My intention was not to say local-first is bad or impossible; it's not. We have been local-first at Notesnook since the beginning and it has been going alright so far.
But anyone looking to go local-first or build a local-first solution should have a clear idea of what problems can arise. As I said in the original comment: it's not all gardens and roses.
Just a few weeks back a user came to us after failing to migrate GBs of their data off of Evernote. This, of course, included attachments. They had successfully imported & synced 80K items, but when they tried to log in on their iPhone, the sync was really, really slow. They had to wait 5 hours just to get the count up to 20K items. And that's when the app crashed, resetting the whole sync progress to 0.
In short, we had not considered someone syncing 80K items. To be clear, 80K is not a lot of items even for a local-first sync system, but you do have to optimize for it. The solution consisted of extensively utilizing batching & parallelization on both the backend & the user's device.
The result? Now their 80K items sync within 2 minutes.
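To give a feel for the shape of that kind of fix, here is a generic sketch, not our actual implementation. It assumes a hypothetical `fetch_batch(index)` callable that downloads one batch of `(id, body)` pairs, plus made-up `notes` and `synced_batches` tables. Downloads run in parallel, each batch is committed in one transaction, and finished batches are recorded so a crash resumes where it stopped instead of restarting from 0.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

def sync_all(db, fetch_batch, total_batches, workers=4):
    """Download missing batches in parallel and store each one atomically.

    Returns the number of batches that were pending when the call started.
    """
    done = {row[0] for row in db.execute("SELECT batch_index FROM synced_batches")}
    pending = [i for i in range(total_batches) if i not in done]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Worker threads only do the downloads; all database writes stay
        # on the calling thread. pool.map yields results in submission order.
        for index, items in zip(pending, pool.map(fetch_batch, pending)):
            with db:  # one transaction per batch: items and progress marker together
                db.executemany(
                    "INSERT OR REPLACE INTO notes (id, body) VALUES (?, ?)",
                    items,
                )
                db.execute(
                    "INSERT INTO synced_batches (batch_index) VALUES (?)",
                    (index,),
                )
    return len(pending)
```

If a download fails partway through, the batches already committed stay recorded, and calling `sync_all` again only fetches what is still missing.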
The problem wouldn't exist. This was about the phone fetching 80k new items from the server. If the phone just shows the item you're looking at, one at a time, and doesn't try to sync everything, there's no such problem.