Offline First (rxdb.info)
407 points by thunderbong 59 days ago | 246 comments



I still haven't found the "holy grail architecture" for offline-first with backend sync where the backend isn't just a simple data store but also has business logic and interacts with external systems.

Doing offline-first well implies that the app has a (sqlite or similar) local database, does work on that database and periodically syncs the changes to the backend. This means you have N+1 databases to keep in sync (with N=number of clients). Essentially distributed database replication where clients can go offline at any time. Potentially with different tech (for example sqlite on the client and postgres on the backend).

When the backend isn't very smart it's not too hard, you can just encapsulate any state-changing action into an event, when offline process the events locally + cache the chain of events, when online send the chain to the backend and have the backend apply the events one by one. Periodically sync new events from the backend and apply them locally to stay in sync. This is what classic Todo apps like OmniFocus do.
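That pattern can be sketched concretely. Below is a toy Python version (the event names and the `do`/`push` API are invented for illustration, not from any particular framework): every state change is an event, events created offline queue up locally, and the queue is flushed to the backend in order once connectivity returns.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    payload: dict


class OfflineClient:
    def __init__(self):
        self.state = {}    # materialized view built from the event log
        self.pending = []  # events created offline, not yet pushed

    def apply(self, event):
        # Every state change, local or remote, goes through apply().
        if event.name == "task_added":
            self.state[event.payload["id"]] = {
                "title": event.payload["title"], "done": False}
        elif event.name == "task_completed":
            self.state[event.payload["id"]]["done"] = True

    def do(self, name, payload):
        # Offline path: apply locally and queue for the next sync.
        event = Event(name, payload)
        self.apply(event)
        self.pending.append(event)

    def push(self, send):
        # Online path: flush the cached chain to the backend in order.
        for event in self.pending:
            send(event)
        self.pending.clear()
```

The backend replays the received chain against its own store; syncing the other direction is the same `apply()` called with server-originated events.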

The problems start when the backend is smarter and also applies business logic that generates new events (for example enriching new entities with data from external systems, or adding new tasks to a timeline etc). Obviously the new server-generated events are only available later when the client comes back online.

When trying to make the offline experience as feature-rich as possible, I always end up duplicating almost all of the backend logic into the clients as well. And in that case, what is even the point of having a smart backend?


For all the crap the DoD gets, this has been a solved problem forever in the military. Even putting aside technical things like the Blue Force Tracker or ABMS, which present a global view disseminated to users in the field who also feed information back into the distributed data store, it is simply expected and accepted that data will not be consistent in the presence of sparse network connectivity, which you will inevitably sometimes have. That was especially true in the older days of relying on radio mesh networks, and even further back, before this was primarily a technological problem, when headquarters communicated with forward units via horseback messenger.

If the decision point is more critical to get right, you wait until you have reasonable assurance your information is accurate and consistent. If the decision point is more critical to move past quickly, then you act on the last known state even though it may no longer be accurate. If it's most important that all units be on the same page, then you act on the last known agreed-upon plan, even if some units have newer information. They don't update until they get positive confirmation from headquarters that every unit has received the new information.

Heck, as much as I don't like them, even Facebook got this right last I knew when I still used their app years ago. It never required a network connection. If you weren't actively receiving updates, it just showed you the cached last known feed, accepting that it wasn't up to date. And if you tried to post something, it would just cache that too and wait to send it. They didn't invent eventual consistency, either. It's been a basic operating principle of distributed organizations, especially armies, for thousands of years.


Seems like you're describing event sourcing. I'm building an offline-first app and doing pretty much what you're describing.

> I always end up duplicating almost all of the backend logic into the clients as well

Yep, this is a pain I feel acutely. I'm using dotnet (C#/F#) because it allows me to ship & run DLLs in the browser with Blazor, leading to significantly less duplicated code. F# can also transpile to JavaScript with Fable (F# + Babel), so that's another option. I haven't fully vetted Blazor yet, but it seems good. Clojure can likewise run on the server and in the browser with ClojureScript.

The only other option I see is going fullstack Javascript, and I hate Javascript.


> Seems like you're describing event sourcing.

Yes, it looks like event sourcing. But the classic event sourcing implementations I've seen are mostly backend-only.

The project I'm currently hacking on has an MQTT broker that is used by both clients and backend and the events can come from anywhere. For example when a client sends a `location_updated` event, the backend reverse-geocodes this into an address and possibly sends out `place_entered` events, or sends out notifications for Tasks that are now relevant on the new location, or "completes" a "go to location X" task resulting in a `task_completed` event.
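A rough sketch of that server-side enrichment, using the event names from the comment above (the `geocode` callable stands in for the external reverse-geocoding system, `publish` for the broker's publish call, and the open-tasks mapping is a toy stand-in):

```python
# Toy stand-in for the server's view of open "go to location X" tasks,
# keyed by target place name.
OPEN_GOTO_TASKS = {"office": 42}


def handle(event, geocode, publish):
    # Backend-only logic: clients can't run this offline, because it
    # needs the external geocoder and the server's view of open tasks.
    if event["name"] == "location_updated":
        place = geocode(event["lat"], event["lon"])
        publish({"name": "place_entered", "place": place})
        # If the new place completes an open task, emit a derived event.
        task_id = OPEN_GOTO_TASKS.pop(place, None)
        if task_id is not None:
            publish({"name": "task_completed", "task_id": task_id})
```

The derived `place_entered`/`task_completed` events only exist once the server has seen the client's event, which is exactly the "available later" problem described upthread.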

Enabling all this in an offline-first paradigm is hard and requires a lot of duplication; I am very close to just saying screw this and requiring network connectivity for most of the features.


.net is also my choice for this. Xamarin on mobile and hopefully blazor on web in the future. Kotlin is the other contender to keep an eye on in the future (Kotlin Native and Kotlin Multiplatform)


> I still haven't found the "holy grail architecture" for offline-first with backend sync where the backend isn't just a simple data store but also has business logic and interacts with external systems.

you might like the architecture of holo-chain (stupid name, imo), it's still mostly going under the radar (also because they chose to refactor their library from go to rust to improve security):

"Holochain combines ideas from BitTorrent and Git, along with cryptographic signatures, peer validation, and gossip. Holochain is an open source framework for building fully distributed, peer-to-peer applications. [Its] purpose is to enable humans to interact with each other by mutual-consent to a shared set of rules, without relying on any authority to dictate or unilaterally change those rules. Peer-to-peer interaction means you own and control your data, with no intermediary"

good dev intro: https://medium.com/holochain/holochain-reinventing-applicati...

"Holochain is [...] like a decentralized Ruby on Rails, a toolkit that produces stand-alone programs but for serverless high-profile applications."


That sounds like it solves part of the sync problem, but not conflict resolution or offline business rules in a general way.


> does not solve offline business rules in a general way

hmm no i think it does. the 'Resilience and availability' section is relevant for your question: https://developer.holochain.org/concepts/4_dht/

holochain is completely distributed, so there is no 'offline' and 'online' because there is no third party. it sounds weird writing that but i'm hoping it might challenge you to dig a bit deeper into the docs, because it's there. if not, please come back and let me know so i can pass on feedback about the docs that were unclear or unsatisfactory for you.

> does not solve conflict resolution

from the docs: "Holochain DNAs specify validation rules for every type of entry or link. This empowers agents to check the integrity of the data they see. When called upon to validate data, it allows them to identify corrupt peers and publish a warrant against them. [...] (the DNA is simply a collection of functions for creating, accessing, and validating data.)"

https://developer.holochain.org/concepts/7_validation/


Sadly they don't solve conflict resolution. They just push the problem to the application developers. From [1]:

> All the DHT does is accumulate all these [updates] and present them to the application.

1: https://developer.holochain.org/concepts/6_crud_actions/


hey so i messaged someone on the Holo team (Paul d'Aoust @helioscomm) with your question, this was his reply:

Yes, people will want conflict-resolution for contention on scarce resources, and it'd be lovely to have that baked into the system.

Two options (and yes, both still leave conflict resolution in the app dev's hands):

- Have the zome function resolve the conflict after it retrieves conflicting metadata from the DHT

- In the future we'll bake conflict resolution right into the network layer, so DHT authorities can resolve the conflict automatically using CRDTs or manually by pinging the nodes that published conflicting information so they can resolve it themselves.

It's also worth mentioning Syn, a library which uses operational transforms (a precursor to CRDTs) for conflict-free collaborative document editing. https://github.com/holochain/syn. I'm looking forward to seeing someone produce something similar, but with CRDTs.
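For intuition on what "resolve the conflict automatically using CRDTs" buys you: the simplest CRDT is a last-writer-wins register, where a (clock, node id) stamp makes merges deterministic and commutative, so every replica converges no matter the order it sees updates in. A toy sketch (not Holochain or Syn code):

```python
class LWWRegister:
    """Last-writer-wins register: merge() commutes, so replicas converge."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.value = None
        self.stamp = (0, node_id)  # (logical clock, node id); id breaks ties

    def set(self, value, clock):
        self.value = value
        self.stamp = (clock, self.node_id)

    def merge(self, other):
        # Deterministic: the higher stamp wins on every replica.
        if other.stamp > self.stamp:
            self.value, self.stamp = other.value, other.stamp
```

The price is that "wins" is arbitrary from the application's point of view, which is why richer domains still end up needing app-level resolution.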


Hmm I guess I didn't understand it then. Makes sense because I have a head cold : ) I'll take another look


hey James, so you were right that it doesn't (yet) do conflict-resolution out of the box. if you want you could check out the other comment with a detailed response from a Holo team member: https://news.ycombinator.com/item?id=28700602


Haha, didn't think I'd run into you here tonight


hey there friend


Hey, I'm not familiar with the holochain project, but I heard about some HOLOfuel coins. I was afraid it's just another cryptoscam, but the holochain FAQ doesn't mention coins as part of the spec (good) and you have Marx and Engels in your bio, so consider me curious about the economics of that network.

How does it relate or differentiate with the GNU/Net project? What naming schemes does Holochain support and could it theoretically interop with the GNU Name System? [0]

[0] https://tools.ietf.org/id/draft-schanzen-gns-01.html


hey thanks for your question. so transitioning people over to a world of distributed apps is difficult because it essentially asks people to run the distributed apps completely on their own devices. that is a big shift (and responsibility) that we are not used to in the client server reality of today.

so for all the people who might struggle with this (like my mom), but who still want to enjoy these new apps, Holo (again, stupid name, but this time it's the org who stewards the open source holochain library [1]) came up with a strategy that allows people to rent out some spare processing power on their computer/home-server, to host holochain applications for the above-mentioned people (so sort of like an Airbnb for AWS). they even ran a successful crowdfund that sold a million dollars worth of hardware for committed early adopters (plug-and-play home servers) [2].

so essentially the Holo hosting network is IPFS (and Holofuel is like Filecoin), but instead of hosting someone's files, you are running encrypted app code for them so that they can take part in the app/network, without having the required technical chops (the holofuel currency measures/represents processing power). the FAQ does a good job of explaining it a bit more: https://holo.host/faq/

i'm not super interested in the whole holo thing (they did a filecoin-type crowdfund to pre-sell hosting credits), yet i am glad they did it because it meant the team could ramp up development. they are already alpha-testing now.

about marx and engels. i am personally most excited about the potential for http://valueflo.ws on top of holochain (library called hREA) [3], because it will allow us to move away from today's Enterprise Resource Planning (ERP) software, into a new paradigm of Network Resource Planning (NRP) software. i hope it will have a big impact and enable the growth of the democratic and transparent supply chains of the next (socialist) economy.

> How does it relate or differentiate with the GNU/Net project? What naming schemes does Holochain support and could it theoretically interop with the GNU Name System? [0]

i'll take a look at it, i'm not too familiar with GNU/Net.

[1] https://github.com/holochain/holochain

[2] https://www.indiegogo.com/projects/holo-take-back-the-intern...

[3] https://github.com/holo-rea/holo-rea


You should take a look at my project, Replicache: replicache.dev.

While it is true that you have to duplicate the mutations in the basic setup, you do not have to share the querying/reading code as it lives primarily on the client.

Also, if your backend happens to be javascript/typescript, then you can share the majority of the mutation code between client and server and the result is quite sweet.


When implementing something like this I indeed prefer to share as much code as possible between client/server.

My daily environment is mostly JVM-based (desktop/android/server) so your project is probably not a great fit for me but I'm definitely going to look into it for some inspiration.


I've found the same. There is no silver bullet with offline-first. It's extremely product/architecture specific. But I do lean heavily toward offline-first being a _much_ better experience overall.


Check out WatermelonDB: https://nozbe.github.io/WatermelonDB


I've been looking for something similar and found a lot of approaches, spanning CRDTs, reactive databases, etc all of which look really promising. For me one of the missing pieces is situations where some parts of the object graph should not be visible or editable by certain users. That sort of thing comes up the whole time and can be pretty complex yet is often just handwaved away.


> one of the missing pieces is situations where some parts of the object graph should not be visible or editable by certain users. That sort of thing comes up the whole time and can be pretty complex yet is often just handwaved away.

I discovered this problem in my domain about two months ago. My solution was to split my Aggregate Root into smaller pieces, to use DDD terminology. In my domain, each user/client owns their own data. However they may choose to publish that data and have it be publicly visible so others may view/comment/copy/pull-request it. It's basically Github for flashcards. So, I have an Aggregate Root for publicly visible flashcards, and another Aggregate Root for a user's personal flashcards. It helps that there are distinct behaviors for each Aggregate Root - there's no real point to leaving a comment on a personal flashcard, and there's no real point to "studying" a public flashcard (because that implies logging your study history to the public card). It does mean, however, that there needs to be a translation layer - it should be possible to convert a private flashcard to a publicly visible one, and it should also be possible to copy a public flashcard to your own personal collection.

Obviously this is very domain specific.


I have one I've been working on for a few years.

It's based off something I call "source derivation"....

It has data sources where things get stored, repos that handle syncing and pulling data, and ents which represent the data but add functions to it.


What if N clients use N versions of the client software?


I played with rxdb about a year ago; it's really clever and has some great ideas.

It's built on the incredible PouchDB (the original offline-first db), which is a true feet of engineering. However, I feel it is somewhat neglected (not implying that the original maintainers have any responsibility to the users; this is free open source, and the fact they gave it away in the first place is brilliant). When I last used it about 6 months ago there had been very little activity on the GitHub in years, and I am concerned about their practice of automatically marking inactive issues stale after 60 days. There is so little activity that the only open issues are ones from the last 60 days.

I found a sync conflict[1] (basically a hash collision: the attachments are unintentionally stripped from the document before hashing) and submitted a ticket. Unfortunately, as there is so little activity, the issue was marked as stale and closed automatically. It's not a bug that many people will come across, but it is a legitimate issue for anyone using binary attachments.

So, rxdb is really great but I would be cautious about using it when the underlying platform is looking a little neglected. I truly hope someone has the time to take it on as it is an incredible toolkit.

1: https://github.com/pouchdb/pouchdb/issues/8257


Stalebot is awful. Issues don't stop being real if you neglect them!

Use a voting system or explicitly prioritize work instead.


A big part of it is that GitHub Issues has essentially been the same product for the past like, 10 years, and even 10 years ago it was still seriously behind contemporary alternatives in a number of ways. The only reason people tolerate it is basically the social inertia, obviously. I bet if you ask any maintainer of a major GitHub project what pain points they have, Issues will be a top contender.

If you've ever worked on a project with 1,000+ issues, you'll know how big a difference small stuff makes when it comes to efficiency, especially for people who maintain the project. GH Issues really is lacking in so many ways, and as a result people come up with all kinds of crazy automation strategies to help make the issue database more "useful" to the developers, even if it's basically second class automation. Stalebot is one of these. The idea is really (at heart) that you just want to keep the open ticket count low because that's one of the simplest Signal-to-Noise ratios you can use as a search filter or mental criteria. If you have better tools (more powerful search, customized forms, powerful tagging, etc), this isn't true, but you have to play the hand you're dealt.

I don't think it's a good strategy, mind you. But I think understanding these recent trends as an effect of older, more fundamental causes is worth pointing out. This is all based on my experience, mind you. But it helps understand the thought process. And people see these tools being used on their big projects, so they kind of naturally gravitate (or at least try them) out of curiosity.

Two issue trackers that are substantially better than Issues are both Maniphest (Phabricator) and Trac, for curious people. Trac was a bit annoying to run and I think effectively unmaintained now, but as a bug tracker it's actually still really good. (It was also small and easy enough to hack that we were able to make modifications to fit our workflow, and maintain them for a long time.) I still miss both of them a lot every time I open a big project on GitHub and have to start searching for issues... Here's hoping "GitHub Issues 3.0" will get some things right and they won't wait another 10 years before doing major updates.


Sometimes an issue gets fixed in a new release and nobody bothers to close the ticket, it may even go unnoticed altogether. Stalebot is just about the only way to get rid of that clutter. If the issue still applies, it can be reopened and maybe it will get some renewed attention.


Or you can just read through all your open issues once in a while?

It only becomes a problem if you leave so many issues untouched that doing so becomes a chore.


> Issues don't stop being real if you neglect them!

Debatable.

Issues don't stop being real if someone out there runs into them and can't find a solution because the issue has been closed. Issues very much stop being real if the only person in the world who cares about them decides that they don't care or solves it themselves without sharing their wisdom. The former hurts the people whose problems are considered unimportant, while the latter hurts the developers of the project who now have an essentially dead issue in their tracker.

The solution? Close stale issues, but with the possibility of letting people say: "Hey, this is closed but is now relevant to me, so let's reopen it."

That way, there are never open issues that no one cares about, while the ones that someone starts caring about can be reopened, until they're either fixed or no one cares about them yet again.


That's how you get a 4 year old issue comment thread that looks like this

- stalebot: closed due to inactivity

- reporter: still an issue

- stalebot: reopened

- stalebot: closed due to inactivity

- reporter: still an issue

- stalebot: reopened

- stalebot: closed due to inactivity

- reporter: still an issue

- stalebot: reopened

- stalebot: closed due to inactivity

- reporter: still an issue

- stalebot: reopened


In my eyes that's a good thing; it would immediately let you know that:

  - the problem persists and hasn't been solved to date
  - no one actually cares enough to solve it because the discussions just die out until someone runs into it
  - the problem is also unimportant enough for it not to warrant either a resolution, or getting closed by the devs as a "won't fix"
If i saw an issue like that, i'd just reconsider what i'm doing and would look for another technology/library for my needs. Essentially, seeing the above would just be a red flag.


better than a list of stale issues?


I'm happy to see news about RxDB. I really want them to do great but have the same concern about the inactivity of the underlying tech. Both PouchDB and CouchDB have existed for ages and are such a great fit for modern web apps. They just lack the adoption and connection to newer frameworks.

An actively maintained SvelteKit/RxDB starter kit with built-in auth might get RxDB some new fans.


RxDB maintainer here.

I am aware of that problem. PouchDB got some good love in the last weeks, where some people made good PRs with fixes for the indexeddb adapter. But it is still mostly unmaintained, and issues are just closed by the stale bot instead of being fixed.

So in the last major RxDB release [1] I abstracted the underlying PouchDB in a way that it can be swapped out for a different storage engine. This means you could in theory use RxDB directly on SQLite or IndexedDB. In practice, of course, someone first has to create a working implementation of the RxStorage interface.

[1] https://github.com/pubkey/rxdb/blob/master/orga/releases/10....



> feet of engineering

Thanks for the chuckle.


Yeah, I can't believe they haven't switched to the metric system!


I use PouchDB. It works pretty well.


I do this for my mobile app, because the users are often working in remote areas without cell coverage or any network at all.

I have a pretty simple strategy though: I use SQLite on the mobile devices and when the device is back within network coverage, take a copy of the SQLite database, zip it up and throw it up to the server. It ends up in an S3 bucket (all the device database backups end up with a UUID as their name), kick off an automatic process via a Lambda (triggered by S3) that imports the SQLite database into the bigger DB, job is done.

It works pretty well, SQLite databases compress REALLY well. The only tricky bit is having to check the version code of the database in case I have an old version of the app floating around in the wild (it happens).
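Under stated assumptions (the table, column names, and version number below are invented for illustration; the real schema is the app's own), the Lambda-side import with that version check might look roughly like:

```python
import os
import sqlite3
import tempfile
import zipfile

EXPECTED_VERSION = 7  # bumped whenever the app's schema changes


def import_upload(zip_path):
    """Unpack an uploaded zipped SQLite db and pull out the rows we want."""
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmp)
        conn = sqlite3.connect(os.path.join(tmp, "app.db"))
        try:
            # Reject databases written by an old version of the app.
            (version,) = conn.execute("PRAGMA user_version").fetchone()
            if version != EXPECTED_VERSION:
                raise ValueError(f"unsupported schema version {version}")
            # Extract only the tables the importer cares about.
            return conn.execute(
                "SELECT id, payload FROM observations").fetchall()
        finally:
            conn.close()
```

`PRAGMA user_version` is a convenient place to stash the app's schema version, since it rides along inside the database file itself. (As the replies below discuss, opening an untrusted SQLite file like this has real security caveats.)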


This sounds very naive to me. Maybe it works for your specific use case, but it is rarely that easy. What happens if the user has your application on two devices? What happens if you want the server to update this data? What happens if the data involves multiple users?

What you're doing here is not much more than a backup.


So what happens if the client and server have a conflict, e.g. their versions of the data diverged?

These "merge conflicts" seem like something that must be handled in a custom way for many apps.


Is that safe? That sounds like intentional SQL injection


It sounds pretty safe. You're only copying rows from one database into another. Unless you accept arbitrary SQL strings from the user, it's not a significant security risk.

It does open up some possible vulnerabilities, like whether the user can overwrite other people's information, but mitigating that requires the same validation layer you should have anyway.


SQLite is explicitly not safe to use on arbitrary DB files, and there's a nontrivial number of exploits against it, from DoS to heap overflows to remote code execution, that stem from untrusted SQL queries or processing untrusted DB files [1].

At a minimum you have to follow [2], but you don't get to say "it's safe to open malicious files or process unrelated queries" and "SQLite has a good security track record because all our CVEs are only from untrusted queries and malicious input files, and CVEs are useless anyway". Those are facially contradictory positions, likely written by different team members reflecting their individual perspectives rather than a well-thought-out security stance (at least in my opinionated view).

[1] https://www.sqlite.org/cves.html

[2] https://www.sqlite.org/security.html
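In practice, the advice in [2] boils down to a few cheap precautions before trusting the file: open it read-only, keep the schema untrusted, and run an integrity check before issuing real queries. A minimal Python sketch of those steps (note `PRAGMA trusted_schema` exists in SQLite ≥ 3.31; older builds silently ignore unknown pragmas):

```python
import sqlite3


def open_untrusted(path):
    """Open a possibly attacker-controlled SQLite file with basic precautions."""
    # Read-only: the importer never needs to write the uploaded file.
    conn = sqlite3.connect(f"file:{path}?mode=ro&immutable=1", uri=True)
    # Don't let SQL functions referenced by views/triggers in the file run.
    conn.execute("PRAGMA trusted_schema = OFF")
    # Walk the whole file and bail out on structural corruption.
    (result,) = conn.execute("PRAGMA quick_check").fetchone()
    if result != "ok":
        conn.close()
        raise ValueError(f"corrupt database: {result}")
    return conn
```

This reduces the attack surface but does not eliminate it; the page also recommends compile-time options (like `SQLITE_DBCONFIG_DEFENSIVE`) that a stock Python build doesn't expose.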


I do have a few of these in my importer but thanks for the links, will be implementing a few more of these safety measures.


Exactly. It won't overwrite stuff that isn't yours, everything is in a silo in the database.


As another comment pointed out, it's more dealing with carefully crafted db files that trigger issues or exploits, like a zipbomb would for archive processing.


Well, create a zip file with all SQLite tables converted to CSV. This can be done in a streaming fashion, using very little extra space.

Hopefully your CSV parser has fewer vulnerabilities.
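That streaming dump can be sketched with the stdlib alone (`sqlite3` cursors fetch rows incrementally, so memory use stays flat regardless of table size):

```python
import csv
import sqlite3


def export_tables_to_csv(db_path, out_dir):
    """Dump every user table of an SQLite file to one CSV per table."""
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    for table in tables:
        cur = conn.execute(f'SELECT * FROM "{table}"')  # streams rows
        with open(f"{out_dir}/{table}.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow([col[0] for col in cur.description])  # header
            writer.writerows(cur)  # consumes the cursor incrementally
    conn.close()
    return tables  # caller zips out_dir afterwards
```

The server then parses plain CSV instead of mounting an attacker-controlled database file.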


It's safe enough. The user never sees the database they send, and the import routine does an extract-and-conversion that only touches a few tables. None of it goes near the user auth infrastructure except to check that the sender is who they claim to be.


A user can trivially sniff the traffic, realize you are sending a zipped SQLITE database, and craft a malicious binary file, zip it, and send it to your API. What the user sees or can do using your app is irrelevant from a security perspective.

You are opening an untrusted binary file using SQLite on your backend. This is 100% not safe.

You should convert to JSON or some other serialization before you send it, then your API should only accept JSON. Zipping a SQLite database is not a good serialization method... Accepting and opening an arbitrary sqlite binary file is asking for trouble.


In my eyes, the following should be equal:

  - a binary file that's used for storing data, like an SQLite database
  - a text file that's used for storing data, like an XML or JSON file
Someone else linked a list of CVEs and how those could be exploited, which doesn't look that bad: https://news.ycombinator.com/item?id=28690837

Those problems could be addressed in a pretty easy way, plus if you're security conscious, just run the importer in an entirely separate container which will basically be single use (possibly distroless, if you want to go that far), with resource limits in place.

But that's not my point. My point is that both of the data formats should be pretty much equal and them not being so in practice is just a shortcoming of the software that's used - for example, even spreadsheets ask you before executing any macros inside of them. There definitely should be a default mode of addressing such files for just reading data, without handing over any control of the computer to them.

> Zipping a SQLite database is not a good serialization method...

Therefore, with this i disagree. SQLite might be flawed, but zipping an entire dataset and sending it over the network, to be parsed and merged into a larger one is an effective and simple solution. Especially, given that your app can use SQLite directly, but probably won't be as easy to make while storing the state as a large JSON file, which will incur the penalty of having to do conversion somewhere along the way. Here's why i think it's a good approach: https://sqlite.org/appfileformat.html

Who's to say that JSON/XML/... parsers also wouldn't have CVEs, as well as the application server, or back end stack, or web server that would also be necessary? In summary, i think that software should just be locked down more to accommodate simple workflows.


> My point is that both of the data formats should be pretty much equal and them not being so in practice is just a shortcoming of the software that's used - for example, even spreadsheets ask you before executing any macros inside of them.

Okay, but you need to defend against reality, not against what could in theory be possible.

Sandboxing is a pretty good solution, at least.

> Who's to say that JSON/XML/... parsers also wouldn't have CVEs, as well as the application server, or back end stack, or web server that would also be necessary?

Raw SQLite files are a huge attack surface that isn't directly designed to be secure. JSON is an extremely simple format that can be parsed securely by just about anything (though occasionally different parsers will disagree on the output).


(Edit: Strange - this reply was intended to be attached to the grandparent of this post. Not sure why it ended up here.)

XML, a data format explicitly designed for interchange where parsing untrusted input was a design goal of the language.. contains ‘external entities’, which permits the person crafting an XML doc to induce a vulnerable reader of the document to pull in arbitrary additional resources and treat the data from them as if they came from the document creator.

There are all sorts of confused deputy attacks you can perform via this kind of mechanism.

If XML can have that kind of issue, when it ostensibly contains no arbitrary execution instruction mechanism at all, how can you expect an SQLite database file, which can contain VIEW and TRIGGER definitions, to be safe?


It's been repeated a few times here that SQLite is a big attack surface - might be worth taking that discussion to a new submission rather than continuing to hijack this one:

https://news.ycombinator.com/item?id=28691759


Your argument can be extended to claim that shipping executable binaries which output the data you want when executed should be equivalent as well.

It’s unsafe because the attack surface is so large and the use case of an untrusted attacker isn’t something strongly considered.

> In my eyes, the following should be equal:

In an ideal world maybe, but this hasn’t been true for the last 50 years.


These are good points. I do have some security concerns raised above that need addressing, but overall the profile of this app is tiny and the chain of ownership is relatively secure, so the risks seem pretty low to me.


At least for your first point: security by obfuscation is not security. Counting on « the user not realizing » that your app has a massive security hole is not security.

For the second point, I would say that SQLite has a massive attack surface, it would be very difficult to ensure that that technique can’t lead to an exploit of some form.


On a rooted phone the local database copy could be fiddled with I guess, but the user needs to be authenticated to upload a database, the lambda that extracts the data is sandboxed to access only what it needs and nothing in sqlite is run, the extractor does a select on a few tables.

Unless there is some way to introduce a malicious side effect to a select statement in sqlite?


> On a rooted phone the local database copy could be fiddled with I guess

If you depend on users (attackers) not being able to modify their software or environment and poke around at each and every bit of your (publicly accessible) interfaces you are doing something awfully wrong!

> but the user needs to be authenticated to upload a database

Is registration for your service limited to a fixed number of trustworthy people? Otherwise this isn't an obstacle.

> the lambda that extracts the data is sandboxed to access only what it needs

Using a simple serialisation format would be orders of magnitudes safer (and simpler)

> Unless there is some way to introduce a malicious side effect to a select statement in sqlite?

See all the links posted here already


It has nothing to do with fiddling with a phone. The user simply needs to run their traffic through a proxy, observe what kind of requests it's making, and then construct a malicious request from a machine that's well suited to doing so - probably their desktop rather than their phone. They can obtain the auth token either from sniffing the traffic or by extracting it from their phone; the former is easier. You seem to be assuming that the only way of maliciously making a request is by somehow altering the phone which is running the app. That is not how it works.

As for introducing a malicious side effect into the query, that's simple: just add an UPDATE, DELETE, CREATE, or INSERT. When you say that the importer can only run SELECT statements, do you mean that it's only authorised to make SELECT statements, or are you simply assuming that the importer won't be able to mutate any data? Because I suspect it's the latter, and that's not correct. I really truly hope your application is not responsible for anything important.


Can the authenticated user upload a modified database file that identifies him as a different user? (For example by changing a UUID or username in the data before it is sent.) That could be done in JSON format too, so it is not specific to SQLite. Just curious if this is a possibility.


Let’s assume your code does something along the lines of:

   Download SQLite file from S3
   Mount file as a SQLite database
   Execute a statement like SELECT * FROM userData on the mounted database
   Connect to an online database and insert the returned data into an importedData table for later validation and integration
I’m assuming you’re running this in an ephemeral lambda-like context where it only has the data and permissions needed to accomplish those operations.

What can go wrong here, given the user has control over the SQLite file? How could someone who has observed that your system uploads a zipped SQLite file craft a payload to do something malicious?

Well, that code would run just fine even if userData was not a table - it could be a view. That means the data returned to your query doesn’t have to come from data in the SQLite file they uploaded, but could be calculated at SELECT time inside your process. Are you sure there aren’t any SQLite functions that a view could use that can read environment data or access the file system? If there are, they could get your code to import that data into their account - data that might easily include S3 access secrets or database credentials.

Are you also sure there’s nothing in a SQLite file that tells SQLite ‘load the data from this arbitrary external file and make it available inside the schema of this database’? Then a view could pull data from that context.

Maybe that would let someone craft a SQLite database that imports the contents of your AWS profile file as if it were one of their data values.

Now, I did take a look at the SQLite SQL syntax and I will say I don’t see anything that looks immediately exploitable (no ‘readEnvironment()’ built-in function or anything), but that doesn’t mean there’s nothing there (are there any undocumented test features built into specific builds of SQLite, maybe?). But the question you need to consider is: mounting fully untrusted db files just might not really be a vector SQLite is built to defend against, in which case that puts the onus on you to be sure that the file is as you expected.

Also, where are you left if a future version adds a feature like that?

ANY mechanism along these lines that lets a SQLite db pull in environment or file system data would make this system exploitable within the bounds of SQLite, even if SQLite contained no ‘vulnerabilities’ like buffer overruns triggered by maliciously crafted files.

And the crazy thing is, these kinds of vectors have shown up in data-exchange-oriented file formats like XML and YAML, so it’s honestly prudent to assume that in a richer format like SQLite they are almost certainly present until proven otherwise.
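A minimal sketch (in Python, with the hypothetical `userData` table name from the scenario above) of the kind of schema check being discussed: before issuing the SELECT, refuse any database whose schema contains anything other than plain tables and indexes, since a view (or trigger) lets attacker-authored SQL run inside the importer at SELECT time. Note this does nothing against bugs in SQLite's file parser itself.

```python
import sqlite3

def extract_user_data(path):
    """Extract rows from an untrusted SQLite file with basic hardening.

    A sketch only: 'userData' is a hypothetical table name, and this does
    not defend against corrupt-file exploits inside SQLite itself.
    """
    # Open read-only so the extractor cannot be tricked into writing.
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    try:
        # Refuse anything that is not a plain table or index: a VIEW or
        # TRIGGER would execute attacker-authored SQL in our process.
        suspicious = conn.execute(
            "SELECT type, name FROM sqlite_master "
            "WHERE type NOT IN ('table', 'index')"
        ).fetchall()
        if suspicious:
            raise ValueError(f"refusing untrusted schema objects: {suspicious}")
        return conn.execute("SELECT * FROM userData").fetchall()
    finally:
        conn.close()
```

Whitelisting the expected schema (rather than blacklisting known-bad features) is the safer direction here, since it also covers features a future SQLite version might add.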


What do you do with the backups?


Keep them for a while, do an S3 sync to a local machine once a week for a good old tape backup, and use an S3 policy that sends older backups to S3 Glacier. They are relatively small so this is manageable.


> We now have better mobile networks and having no internet becomes a rare case even in remote locations.

That's simply not true. There is no internet on many parts of a train journey. There is no internet in parts of the underground rail network. There is no internet on the plane. There is no internet in nature. There is no mobile internet outside the EU, until you get a new SIM card. There is no usable internet in some hotels.

If you travel a bit, you know that internet is far from guaranteed.


People have been saying this for a decade, and I still sit there realising what a different bubble a lot of commenters here live in.

I visit my parents most months. They live in rural Ireland. Not like a one off build down a back road miles from civilisation, but a small village of a few hundred people. They absolutely do not have reliable internet, and at this point they've cycled through every available provider.

To visit them I take a few hours' train journey. For about half this train journey the internet is not reliable, whether EDGE/3G/4G or onboard service.

Before the pandemic, I visited the US a few times a year; United's onboard internet is very limited and expensive, and its reliability is "it sometimes works".

So I regularly run into cases where I need to prepare ahead of time for no/spotty internet and still get surprises as some app refuses to launch because it decided now is the time it needed to update over a connection that's doing single digit kilobytes per second, or speak to a license server or whatever.


>People have been saying this for a decade, and I still sit there realising how different a bubble a lot of commenters here live.

Nearly every thread involving cars or traffic also makes this abundantly clear.


No kidding. Not to mention the disdain for things that are nearly essential in rural areas: 4wd trucks and gas/diesel instead of electric vehicles, or the right to carry firearms.

I live in rural PNW. Electric is at LEAST 20 years from being a feasible and reliable solution. ...and I say that as a civil engineer with some specialization in creating EV charge stations. 30-50 might even be more plausible.

When I'm in the field, small things can turn into life-and-death situations pretty quickly out here when you're an hour and a half drive from cell reception, followed by another hour and a half drive to town with a small hospital/sheriff station, and you're up a dirt road where a tow-truck simply won't come. And the geology is notoriously unstable and slides/washouts/fallen-trees happen constantly.

My clients LIVE in those places, I only visit. They drive big diesel rigs (They often NEED to transport big heavy stuff, and you can store diesel for long time periods in a tank onsite), and a lot of them also have transfer tanks in the back of their truck for extra range. (Because it's needed!) They also almost all carry food/water/shelter/A chainsaw/tools/a gun/etc in their vehicle for a reason. Self-rescue is all you got a lot of the time.


That's my perspective as well and where I have the most visceral reaction.

But, to be fair, there are a lot of folks who will say 'I'd never go without a car' that haven't lived in a place with great public transportation. I used to think that way as well until I spent six months living in a place that had great bike paths and lots of options for local and regional public transportation. I started off with the intent of trying no-car daily living and it was perfectly fine. In six months I rented a car one time for a weekend trip, that was it.

So to me the lesson is to try to have a bit of empathy for the personal experience of the person making bubble points and focus on expanding their perspective rather than debating their position.

This leaks into the misinformation topic as well but there's another thread for that :)


Em. Half of my relatives live in a rural area. Not a single one of them has a 4WD truck (or any trucks at all). Why is it considered a necessity in the United States?

It's a pretty big country too, with shitty roads and pretty low population density.


Trucks or vans with 4x4 drive are pretty common in Mexico, Brazil, the middle east.

You're falling into the same trap that the article and GP's are highlighting: your experience that a FWD vehicle is enough "for me / my relatives" does not apply to this situation.

1) survival in animal/vehicle collisions with bears, deer, and moose

2) delivery of goods, including building material, animal feed, human food, etc

3) lack of even basic road infrastructure maintenance by government or municipality

4) "over prepared is only prepared", you're it, you're on your own

I lived for 20 years in a semi-rural area; you could certainly live for 99% of the time without a large cargo vehicle or 4x4. The other 1% you were chancing your life. Cargo and deliveries were certainly an issue though. Now just slide that ratio towards the middle.


Watch Matt's Offroad Recovery on YouTube, sometimes you need 4WD and ground clearance to get around reliably. Certainly more people own vehicles in that caliber than are necessary, but that's not always the case.


Do they raise cattle / own farms like most of my clients? I don't know how you'd get by without one for that use case.


Where do your clients live to need guns for protection? I presume it's for wildlife? What kind of wildlife poses that kind of threat? I know it's mandatory to carry firearms outside of settlements on Svalbard for polar bear protection, and friends of friends have found out why the hard way. (One literally woke up with his head inside a polar bear's mouth.)

Also, I don't think "licensed hunting firearm for wildlife protection" is quite relevant to "gun rights". You don't need to buy your fifth AR-15 or full auto Uzi at the grocery store to protect yourself against bears.


To answer your question, the firearms are needed for protection from both wildlife and people, but much more so people. It's useful to have a rifle to put down a cow that's been hit by a vehicle and/or broken a leg or something.

However, I live in an area where stumbling upon foreign cartel marijuana grows is relatively common. They are known to aggressively defend them with firearms which are not legal to own in the jurisdiction they're in. It's also not uncommon for a truck full of guys intent on committing armed robbery to roll up onto a client's property.

An AR-15 would be an ideal defensive weapon for that use-case.


I have to admit, I'm not very experienced in armed conflicts with drug cartels, but the idea that engaging their people with your AR-15 would be better than retreating and/or deescalating seems more like a gun fantasy to me than a realistic assessment of such situations.


I have significantly less experience on the topic than Enginerrrd seems to, but I can confirm that:

> the idea that engaging their people with your AR-15 would be better than retreating and/or deescalating seems more like a gun fantasy to me than a realistic assessment

is more a product of your

> not very experienced in armed conflicts with drug cartels

than it is an accurate assessment of how effective/possible retreating and/or deescalating is.


Thanks! I can definitely see the merit of the cartel argument. Also good point about putting down injured animals.

(I still think gun regulation is a good thing, with some background checks to reduce the chances of people with e.g. psychosis getting their hands on assault rifles.)


You're coming into this pretty hot and it seems like you've fast-forwarded a few exchanges into a conversation that hasn't happened. It's going to be difficult to exhibit empathy in that circumstance.


I don't understand most of your comment (English is my fourth language). I don't understand "coming into this pretty hot" and "fast-forwarded a few exchanges". I also don't understand the part about empathy. What's empathy got to do with anything? I obviously understand the individual words, but not the idioms behind or what you are trying to convey.

In case any parts of my comment were unclear, I'll try to reiterate or clarify.

First I'm genuinely curious about what kind of wildlife would pose a threat to the point where you need to defend yourself by carrying guns in your car, and where you would risk such an encounter. I try to convey that this is curiosity more than criticism by comparing it to the genuine need on Svalbard.

I then address their claim about "gun rights". My point is that protecting yourself from wildlife isn't about gun rights. No one (as far as I know) is looking to ban a licensed hunting rifle or high-caliber handgun where the owner would need it for protection. But many "gun rights" advocates want to have five AR-15s or a fully automatic Uzi—guns that are highly capable of killing a high number of people and not very effective against bears. In other words, I understand their clients' (legitimate) needs but I don't think it's relevant to the concept of "gun rights".


> I visit my parents most months. They live in rural Ireland. Not like a one off build down a back road miles from civilisation, but a small village of a few hundred people. They absolutely do not have reliable internet, and at this point they've cycled through every available provider.

This is why state-run services matter, because they will 'serve' in areas where private businesses cannot turn a profit and so have no incentive to serve.

5-6 years ago, in India only the state-run telecom BSNL's service would be available at remote hill stations. But with 4G it couldn't keep up with the private telecoms and the company is nearly done for. So now, again, there's no connectivity in remote areas and mountains as private players don't bother ~~serving~~ doing business there.

This got especially bad during the pandemic: many in such areas have lost communication with the outside world, and remote education is non-existent for children there.


> So I regularly run into cases where I need to prepare ahead of time for no/spotty internet and still get surprises as some app refuses to launch because it decided now is the time it needed to update over a connection that's doing single digit kilobytes per second, or speak to a license server or whatever.

Recently I was on a flight and prepared a book on my iPad the day before. iBooks decided that it was a good idea to “offload it into the cloud”, a book that wasn’t even 24 hours on my device with plenty of space available. Who knows…


Or when an app decides that it's too out of date spontaneously, and I literally get stuck on an error screen when I just launch the app that used to work up until today... "just install the latest update to enjoy this app"

Yeah, that's a dick move when I'm not on wifi or don't have data access at all on a long hiking trip.


> iBooks decided that it was a good idea to “offload it into the cloud”, a book that wasn’t even 24 hours on my device with plenty of space available

I had the same problem with Google Books on my Galaxy devices. There's a feature that will let you pin the book to be available offline. But you have to do that for every book.

It really takes away the usefulness of the devices.


We were promised rural broadband! 2 billion of rural broadband.


Also: there's no Internet in random buildings, underpasses, corridors - even in the middle of a metropolis.

For example, a grocery market near me somehow manages to attenuate cellular signals so badly there's no mobile connection more than 3 meters inside. Meaning no Internet when I walk between shelves, and no Internet when I stand in a queue for 10 minutes.

(Sure, the building has plenty of wired and wireless Internet connections in it, but I'm not allowed to use any of them.)


There's no Internet in my second bathroom when the cable is down.


My flat is completely shielded somehow, without the microcell I only get cell connection by opening the windows. Windows closed, if more than a meter from the window there’s basically no signal.

And like GP the nearest grocery has basically no signal inside the store, it only picks up beyond the checkout lanes.


Depending on the person, your flat either has an annoying bug or a fantastic feature.


Insulating windows often have metallic films that can block radio signals. Combined with the tendency to use metallic films for insulation and metallic meshes for support, this means that modern buildings are often Faraday cages.

Which is sort of ironic in that modern communications is dependent on radio. Older buildings tend to be better for cell reception.


> Insulating windows often have metallic films that can block radio signals.

Yep that's pretty much what I inferred at the time.

It was pretty frustrating back then though, good thing I moved in in late summer and opening the windows any time I needed to do admin (either online or by phone) was fine, would have been rather annoying in winter.

> Which is sort of ironic in that modern communications is dependent on radio. Older buildings tend to be better for cell reception.

Indeed, and moving from an older building (where reception had never been any issue) is exactly what I was doing.


Of course, when you get to buildings that are old enough, you have several layers of solid material (brick/stone) on the exterior, as well as things like double-thickness brick interior walls as well. This doesn't do RF propagation any favors, either.

My home was built in 1830 and 2.4Ghz wifi is strictly line of sight within the building, and cellular phones must be kept near windows on the side of the building facing the nearest cell tower to function.

And this isn't in a rural area either, this is life in a brownstone in the middle of a large city...


It gets more interesting. There used to be no internet in some corners of a fast food restaurant I frequent. But then I bought a new phone and now all of a sudden there is internet. I don't know what it is, is it the antenna layout, or is it the modem hardware or firmware, or did the carrier do something on their side, or is it something else entirely.


There's a few subtle points here. One is that I don't think the article actually disagrees with you — it just assumes internet access is readily available for the sake of argument, and moves on to explain why offline-first matters even despite that.

The more interesting point is that "internet access is flaky" isn't necessarily a good argument for going offline-first. Rather, it only suggests that you need your client to be more resilient to being knocked offline in an online-first world. Instead, the article argues that offline-first is interesting in itself as a completely different architecture, one that's closer to a desktop application where the binary is delivered through a browser.

Put differently — they're not trying to argue you should make Gmail work offline, but rather that you should consider structuring your application as Thunderbird inside the browser.


Yeah, I did not expect the comments of a pro-offline-first article to be filled with complaints that he didn't consider lots of places don't have reliable internet!


Amen brother! Nothing worse than some JS-based app using 2G/3G in Africa. It's not about speed but latency and packet drops :/ No one optimizes for this stuff. It makes the internet unusable even if you sometimes get nice bursts of 100kb-2mb/s.


You should read the rest of the article and realise the author actually agrees with you. The comment you are replying to is taking the quote out of context.


You can write wonderful JS-based offline-first apps.


The question is not whether you can, but whether you will.


Whether others who wrote the app you need to use did?


He should make a few trips to Africa.

Some MNO base stations are powered by solar and only operate a few hours a day, sometimes a full day and sometimes part of the night (depending on batteries, if they have any).

Not everyone has electricity, so people walk to town to charge their phones at spaza shops for a few minutes/hours. Fiber/POTS is non-existent.

Outside of larger towns and cities, reception is non-existent or spotty at best. You might get reception near a highway.


My understanding is that when a cellphone has weak service it boosts its transmit signal to try to reach a tower. In my experience, it drains the battery incredibly fast. Are there OSes or phones that accommodate this?


That's correct - if the signal is weak the battery is drained quickly. I'm not aware of any OS or phone which can mitigate that - is there any other way than to give up on the connection?


If you're battery constrained and not expecting a signal most of the day I imagine you could put the radio to sleep for 5-30m. I kinda understand a weak signal getting boosted and draining a battery, but when there's none at all? It just seems like a use case they didn't care enough to address. Since it sounds like this is the primary use case, I figured someone might have tried.


An extra phone or device as a hotspot


True. I notice it when out camping in the mountains where there is no reception. On airplane mode the battery can easily last 2 to 3 days, but if you leave airplane mode off, it will die within 1 day.


And even within the EU, it's not guaranteed. I'm currently in Berlin and the internet here is ass.

My current provider dies on me at least once a day and the speeds are atrocious. Mobile network is also weird. Every time I go into a store for shopping I lose my connection. Once I come out I get a cute text message saying "Welcome to Germany, don't forget to get tested"


A fellow Berliner here. Never understood this. I live inside the A ring, yet if I am not standing close to a window, even a regular call doesn't work.


More like a German thing... This happens to me in Stuttgart, too. If I want to make a call when I'm at home, I usually just call with Signal over my wifi, because the normal cellular network is terrible.


Yeah, same. It doesn't work at all in parts of my apartment or the stairwell or courtyard, and also doesn't work in shops for some reason, which can be surprisingly annoying if I quickly need to look up something for a recipe or message my girlfriend to ask if we need something. Data is also absurdly expensive.


Sounds like the building is well shielded.

My flat is the same, essentially no signal inside, just open the window and I immediately get full strength LTE.

Ask your provider for a microcell if you want to change the situation (it often wastes less battery too; unless cellular is disabled, smartphones tend to dislike having no signal or poor signal quite a lot).


I’m curious if this is the location or the building; either way is relevant to me when I next move. Altbau or new?


I'm also in Berlin. I know what you mean. It became a personal joke to note that the most remote corners of the world have better internet than parts of the Ring.


Internet? We don't need that newfangled bullshit, we build the best cars in the world /s


You are technically correct but the quote you are replying to is taken out of context and does not represent what this article is about.


It's the quote used to introduce the premise.


But isn't the rest of the article more true if reliable internet access isn't available?

Their opening remark that poor internet access is now rare is followed by

>So do we even need offline first applications or is it something of the past?

Their premise isn't "now that internet access problems are solved we should use offline first", it's "even if we say that internet access is solved, we should still use offline first".


I think the point is that even if you grant this significant apparent limitation of the premise, it doesn't mean the whole thing is useless. (Nevertheless, the presentation is still incoherent, because it both welcomes and disparages data with multiple current states in semi-independent databases.)

Even if we have perfect and constant connections, you still obtain benefits by writing in this model: for instance, if you assume you have a constant and perfect network connection, you can connect a websocket to the server to ensure you always have the current data for each page and data type. Or, you could follow the offline-first model and have a single update/subscription system to mirror the database locally.

I'm very nervous about their presentation. It says "offline first" and "websites lie because they show you the current state of the data at the last time you had a network connection". If the latter is a problem, a lie, then you can't possibly write offline first. You might write using a model that works equally well online and offline, but you necessarily accept forking data and multiple current representations if you allow a computer to show the data without being connected to the authoritative repository of that data.

I think the author has many good ideas, and might have a very good implementation of a very good set of ideas, but this intro page reads like the sort of thing that gets misinterpreted a dozen times and you end up with something worse than current interpretations of "premature optimisation is the root of all evil".


Exactly.

And even if there is, there may be:

- a small bandwidth

- a limited contract

- a huge ping

- regular network errors

And even with a good, unlimited network, local will always be snappier than doing a round trip.


On my Berlin-Poznan trips, I sometimes had a 10s average ping, and an equally absurd packet loss rate. SSH worked great, but the regular internet was unusable.

I design my websites with those trips in mind.


s/ping/latency/


I've always been unclear of the difference between all the terms. Whats the diff between "ping", "jitter" and "latency"?


ping is what the ping program reports, which, depending on the QoS settings the provider has, can be very different from the roundtrip time of packets you actually care about.


Not at all, ping is just a nickname for an ICMP echo message and the name of the /usr/bin/ping program.

The program's output is the RTT minimum, maximum, average, and mean deviation, plus the percentage of packet loss.

You can also measure latency over UDP and TCP using other tools. For 99% of practical use cases there is actually no meaningful difference.


Can you explain to me why and how this can be different?


Your provider can, for example, filter out all ICMP packets, giving you timeouts for "ping" while you can still reach the host just fine with other protocols. Similarly, your provider probably prioritizes protocols that require low latency, like VoIP, over ICMP packets.


Alternatively, they might prioritize small packets and ping will be misleadingly fast compared to bigger data flows.


Another possibility is that your pings don't experience any packet loss, whereas actual data does, which would increase the real latency.

In short, there are many potential differences between ping and actual latency.


Also: people who just do not have service for accessibility reasons. I meet many people who only use WiFi, either because they can't afford service, can't deal with the paperwork, or for some other reason.


Or because they used up all their data for the month.


What's interesting is if you go to Colombia, South America, you get signal everywhere, even in the jungle, because when the cellular companies came in and asked if they could create a network, Colombia said sure, but it had to work everywhere, even in the mountains and jungle.

We could have internet everywhere if our governments gave two shits and weren't corrupt as all get out.


To be fair, you'd get a surprising amount of the public objecting to building infrastructure for mobile networks in "scenic" areas like jungles or forests


I live in London. There are spots in London where mobile data just randomly does not work. Clapham Junction - the busiest rail station in the country was often a black spot for me just a couple of years ago. The rail corridors in to London often have sections with no 4g and variable 3g.


You don't need to go to a remote area either; there is absolutely zero signal in large parts of the subway in my (European) city. I'd like to see the author enjoying his "better mobile networks" while traveling around there. So yes, offline first is still a very nice thing to have.


Don't even have to travel far. We live in "golden horseshoe", the greater Toronto metropolitan area with 10m people and I am shocked how many towns have horribly poor service (and it has also made me realize how inefficient / data hungry most modern messengers are).


Yep,

Ikea in Ljubljana (Slovenia), Villesse (Italy) and Klagenfurt (Austria)... no mobile reception inside. They have wifi, if you remember to connect to it.


Heck, there is no internet in a corner of my kitchen; a couple of steps to the side, and there you go, full 4G coverage.


Those are country development and political choice issues. Last time I took the train (this weekend) it provided free wifi access the whole way and I didn't have any cuts. I can't remember the last time I took a subway and didn't have 4G connectivity. Oh, actually I do, it was in Paris, but I wouldn't dare use my iPhone there anyway. The technology is there, but Western governments at all administrative levels are more busy bullying their citizens than providing comfort options like East-Asian administrations and companies do.


>[..] but Western governments at all administrative levels are more busy bullying their citizens than providing comfort options like East-Asian administrations and companies are.

I love complaining about the lack of efficiency in the EU and the West as much as the next person. But you're comparing the West's "bullying" with a region of the world that consists of quite a few harsh dictatorships that do a lot worse than bullying. I'll take a sketchy phone signal on a train over systemic human rights "bullying" any day.


That buildings shield reception is not a political issue but a physical limitation. When I'm in my supermarket I know that I will not have reception in the back; same in the underground part of my fitness center.


Apparently it's not. I live in Finland and I've never had my internet connection cut off in a supermarket (or any other large building). This has happened to me sometimes in Southern Europe, though. I don't know if they boost the reception somehow in large buildings here or if the building materials are just somehow different.


It's mostly your random good luck and other people's random bad luck. A lot of building design choices can result in severe signal attenuation in microwave range, and not all buildings will have internal femtocells to cover that - or femtocells that accept your SIM.

Also, some buildings change over time in how they attenuate; especially freshly built ones, where the walls are still "drying", can have close to zero reception inside. When my parents built their current home, I had to keep an informal map of where the signal was strong enough to use GPRS (yay, 7s ping in MUDs), and for voice we usually went outside.


I'm not moving to another country to get better cellphone reception. What's your point?


Their point is probably that improving the reception in your location is a political rather than technical issue. I.e. it can be solved.


In my experience, subway connectivity got markedly better when the city hosted the Olympics and spent time/money attracting tourists. This was in line with more open wifi access or sim card availability.


> Offline-First is a software paradigm where the software must work as well offline as it does online.

I've been building offline-first apps[0] for quite a while in both desktop and mobile space.

I have a different definition[1] of what an offline-first app is:

Offline-first apps are apps which can run and function completely offline or without needing the internet for an indefinite amount of time. To offline-first apps, providing all functionality offline is the "primary objective" and any online functionality such as syncing to cloud is secondary.

Also, I personally don't consider an app using temporary cache to store data to be an offline-first app. It must use a local database. Sometimes the "offline-tolerant" apps are confused with offline-first apps. Offline-tolerant apps often provide partial functionality and eventually need an internet connection to sync data.

[0]https://brisqi.com

[1]https://dev.to/ash_grover/how-i-designed-an-offline-first-ap...


> Also, I personally don't consider an app using temporary cache to store data to be an offline-first app. It must use a local database.

From the website (https://rxdb.info/adapters.html)

> react-native-sqlite

> Uses ReactNative SQLite as storage. Claims to be much faster than the asyncstorage adapter. To use it, you have to do some steps from this tutorial.


> This can be either done with complex caching strategies, or by using an offline first database (like RxDB) that stores the data inside of IndexedDb and replicates it from and to the backend in the background.

This skips completely over the simpler options of not having a server at all. I guess because this is an ad for RxDB.

Edit: to be clear, I'm sure this paradigm is useful in many applications. But it strikes me as odd that something called "offline first" doesn't seem to include the possibility of software that runs entirely on one's own computer.


It doesn't have to replicate. That's optional. It can work 100% offline.

The Rx here means it is reactive, i.e. you can subscribe to state and react to it. This is how it updates the display across independent tabs when the data changes, for example.

THAT'S the main point of RxDB; the sync is just another feature (which you don't have to use and isn't even a default).

EDIT: I'm nothing to do with RxDB, and don't use it, but I have investigated it previously.


That's one type of software, but a lot of software these days is collaborative, with the need to share that data around. Even software for personal use is expected to be able to sync your content between the multitude of devices people now use.


no, i am with you.

i couldn't tell if they are being serious or not. they probably are, which is kinda depressing. "offline-first is a software paradigm where the software must work as well offline as it does online."


I agree it sounds weird, but isn't "offline first" a term specific to web apps? It's practically impossible to make one of those without a server.


Not specific to web apps, but usually only used with respect to them.

“Traditional” applications on desktops/laptops/similar were offline-only, with some later getting online sync as an optional feature, so offline-first doesn't need to be stated: it is the assumed default when something isn't offline-only.

Web based applications started out online-only, only later in their evolution sometimes getting the ability to work properly offline. Offline-first is still an unusual property, and may forever be, so gets mentioned where it is relevant. Many are simply “works offline” where offline operation is bolted on as an afterthought and may not be at all optimal (for instance in the case of two edits to the same object the last to sync automatically wins, clobbering the other with no attempt to merge or branch and no notice that this has happened or is about to happen, and no care given to the consistency of compound entities when this happens).

Apps for phones & tablets fall in between, so the matter is more vague. I have heard offline-first in reference to them, but usually they are either online-only or “works offline” (offering little or nothing more than buffering changes until a connection is available). Some, particularly games, are like traditional apps (offline-only, or offline with some sort of online sync/backup).


I'm not sure, this article is my first introduction to the term and it doesn't say the term is specific to web apps.


There is a lot of movement in the offline-first/multiplayer space at the moment, after apps like Linear [1] & Figma [2] have pushed the paradigms.

[1]: https://linear.app [2]: https://figma.com

Some other projects which will help you implement the pattern that are worth checking out:

Replicache [3] - real-time sync for any backend. Works via simple push and pull endpoints and is built by a small team of 3 devs with deep browser experience (Greasemonkey, Chrome, etc.)

Logux [4] - a client/server framework for collaborative apps. From Evil Martians, well known for: postcss, autoprefixer, browserlist etc.

[3]: https://replicache.dev [4]: https://logux.io

RoomService also used to be in the space but recently left it to pivot to something else.

The largest problem you’ll end up solving is conflict resolution so having a good understanding of the tradeoffs involved with your (or the underlying) implementation is key.


The creator of Logux etc. is also known for forking a package and erasing git history: https://news.ycombinator.com/item?id=28659838


I haven’t used linear can you expand on what they do that pushes the paradigm? Gonna try it this weekend.


I have noticed that what we are missing in the Frontend space is some common solution for caching and reshaping data from the api. In every codebase I have worked on we just implement this ad hoc.

We have agreed-upon solutions for:

- API clients (fetch, axios, react-query, etc.)
- UI state management (redux, mobx, xstate, etc.)
- UI frameworks (vue, react, etc.)

But there's an intermediate piece between the API layer and the UI state that is always implemented ad hoc. We need some common solution for caching the API data, reshaping or querying it, and managing multi-API workflows (for example a synchronous process where the frontend calls multiple microservices). Most answers to this end up being something like “get ur backend team to implement better APIs”, but that's not realistic in many cases.


IndexedDB technically does this rather well, unless I'm crazy. What a WebWorker really needs is an “am I even online” mode; the event handlers are one thing, but browsers don't convey online state very well, is what I run into. If you build a PWA, though, you get up to a gig of caching you wouldn't get with the normal frontend DOM APIs.
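For what it's worth, `navigator.onLine` (with the `online`/`offline` events) only reports that some network interface is up, which is why it feels unreliable. A common workaround is to pair it with an actual reachability probe. In this sketch the environment is injected so it runs anywhere; the `/ping` endpoint is an assumption, not a real API:

```javascript
// Sketch: "am I even online" detection. navigator.onLine only says a
// network interface is up, so pair it with a real reachability probe.
// `env` is injectable so this runs outside a browser; the /ping URL is
// an assumption about your own backend.
async function isReallyOnline(env) {
  if (!env.onLine) return false;           // interface down: definitely offline
  try {
    const res = await env.fetch("/ping", { method: "HEAD" });
    return res.ok;                         // reachable AND the server answered
  } catch {
    return false;                          // fetch threw: captive portal, DNS, ...
  }
}
```

In a browser you would pass `{ onLine: navigator.onLine, fetch }` and re-run the probe whenever the `online`/`offline` events fire.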


Yes! Every. Single. Time. Apollo Client is the closest thing I have used that works decently out of the box.


yes to 'reshaping'

frontend schemas differ from backend schemas because they need to support sync, but it should still be possible to inherit the backend schema and transform it in predictable ways, saving a lot of work

'full stack schemas' would be a 'pit of success' change IMO


GraphQL (coupled with get ur backend team to build it well) addresses these points. If you're a real zealot, you can use GraphQL to access and modify app state so all data operations across the stack have the same shape


I think the modern solution is https://react-query.tanstack.com/

Haven't tested it yet, though.


OK, you've compressed all your problems down into one problem - unfortunately that problem is bigger and more complex. How do you safely merge divergent changes to this shared uber-datastore? By breaking down all the operations my system supports into a set of simple REST endpoints, I (hopefully) simplify the space of things that could happen and get a good sense of which combination of requests I might have to support. If the space of user operations is instead "any database query", that sounds simpler but it's actually much harder to deal with.


Exactly. State management is the hard part. There are huge merits to having it all in one central data storage layer, where some smart backend code can choose the storage method best fit for the type of data (SQL/ACID, or KV-store, or ...).

Having a datastore in every device, that now need to sync is giving me headaches already. Syncing is potentially very hard.


Yeah, and how would you perform validations that shouldn't be client side?


You perform those in bulk, as a batch job when the client side pushes the state up when back online.

After all, if a client is offline, nothing they do can affect anyone else until they go back online.
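A sketch of that batch-validation step, with made-up names for the validator and the apply step. Whatever fails server-side validation is not applied; it comes back to the client instead:

```javascript
// Sketch: when a client reconnects, its queued offline mutations are
// replayed through the server-side validator in one batch. Invalid ones
// are not applied; they are returned so the UI can surface them as tasks
// needing attention. All names (validate, apply, mutation shape) are
// illustrative, not any real library's API.
function processOfflineQueue(queue, { validate, apply }) {
  const rejected = [];
  for (const mutation of queue) {
    const error = validate(mutation);      // server-side business rules
    if (error) rejected.push({ mutation, error });
    else apply(mutation);                  // commit to the server datastore
  }
  return rejected;                         // surface these back to the user
}
```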


And now one needs UI to show that all the forms one thought were saved are merely sent, and come back as "tasks" that still need user attention.

What was simple form validation is in "offline first" something akin to a mandatory content moderation step.


> Single page applications are opened once and then used over the whole day.

oh if only that were true. Everything is an SPA now. imgur is a place where you go to see an image, then close. How many megabytes of JavaScript is it today?


They have two use cases. You can use it to just click a link and close but the use case they go for is a full social media like reddit where the full features of an SPA make sense. Drive by viewers are probably lowest priority as they do not generate much revenue.

They were forced to go down this path as it was clear reddit would add their own image hosting and remove the need for imgur so the site would have to be self sustaining.


I just think that path shows a lack of imagination


If I have to count the number of hard-refreshes I do per day to get a SPA unstuck...


I just bought a home in "rural" Texas only 1 hour outside of downtown Houston.

Cell service is definitely spotty out here. The final selling point on this home was that it's serviced by AT&T Fiber. So I have gigabit internet. If AT&T goes dark for some reason -- I lose power, they lose power, someone cuts the line, whatever -- then I can't access the internet and can't call anyone unless I drive for a few minutes.

Offline first is a good thing to have but we do not have "better mobile networks" and "no internet" is only a rare case if you've stuck your head in a city.

At my folks' place on the other side of Houston there's zero bars of cell service sitting on their couch. But if I stand up then I get full voice and data cell service.


Offline first is a dream to me. I built a big note-taking app (midinote.me) which is 100% offline, but right now the biggest pain point is full-text search. Yes, we can use DBs like this one or PouchDB to store data, but currently there isn't a good solution for full-text search in JavaScript. I tried lunr.js and the performance is poor, and I researched FTS via sqlite, but it doesn't support Chinese. I even considered packaging Lucene (on the JVM) with Electron.js (the desktop wrapper for JS) on desktop, though I'm not sure that's a good idea. Now I'm going to rethink all of this, and I'm considering giving up on offline search and switching to server-side full-text search; it would save huge effort compared to client-side search!
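For scale of the problem: the core of client-side FTS is just an inverted index, sketched below with a whitespace tokenizer, which is precisely what breaks on Chinese (you would plug in a segmenter instead). Real engines like Lucene add ranking, prefix search, and compressed postings on top, which is where the JS options struggle:

```javascript
// Sketch: a naive inverted index. This is nowhere near Lucene, but it shows
// the shape of client-side FTS. The default whitespace tokenizer below is
// exactly what fails on Chinese text, since there are no word boundaries
// to split on; the tokenizer is pluggable for that reason.
function buildIndex(docs, tokenize = (s) => s.toLowerCase().split(/\W+/).filter(Boolean)) {
  const index = new Map();                 // term -> Set of doc ids
  for (const [id, text] of Object.entries(docs)) {
    for (const term of tokenize(text)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(id);
    }
  }
  return (query) => {
    // AND semantics: intersect the posting sets of all query terms
    const sets = tokenize(query).map((t) => index.get(t) ?? new Set());
    if (sets.length === 0) return [];
    return [...sets.reduce((a, b) => new Set([...a].filter((x) => b.has(x))))];
  };
}
```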



Yes, I used it. Generally, on the JS platform the performance is not good enough for FTS, especially when documents are long and the data gets huge.


Kudos on building offline first software!

Just spitballing here: maybe there’s a way to use Redis (and RediSearch) compiled to WebAssembly and then use it on client side?


Thanks, this solution might be better than mine; it doesn't need the JVM.


I’m in the same boat with my app. Have you tried Elasticlunr? It claims to be significantly faster than Lunr.


I researched this lib before. I don't have any data to support my opinion, but I don't have much confidence in it. Also, the repo hasn't been updated in a while.


Yes, I am on the JavaScript stack. Maybe client-side FTS is easier in another language? I don't know.


This article has a lot of good points, but the UX is in my opinion terrible: UI elements are moving across the screen without the user of that screen doing any action.

We've all experienced this: you want to click a button (or select an input field, ...), but right before you do it moves away. Maybe something finally loaded which pushed the content down. Maybe some content was synced in (as is the case in the UX examples of this article).

The solution is, afaik, well known: add a UI element (that obviously does not move other elements) that informs the user that "new information is available".

For instance GMail's yellow drawer informing users (a) new message(s) is available in the thread.


There are a lot of apps that would benefit from being offline first.

Apps like Notion feel quite sluggish to me on a higher-latency 15 Mbps connection. And Figma is pretty bad on decent 3G when loading files on a fresh load.

Building a proper syncing solution isn't that simple, especially when it's multi-device and requires conflict handling. Replicache looks quite good to me, but I haven't found any similar solution that's open source with an MIT/GPL license.


Surprised this article hasn't been shared in these comments yet as a corollary:

https://www.inkandswitch.com/local-first.html#crdts

I regard it as the definitive exploration of local or offline first software. They end up building an offline-first Trello clone which can sync with peers locally or on the internet.


One worry I have is with sensitive data e.g. HIPAA protected information. How do you guarantee that the data is safe at rest. Are there encryption at rest options? With online-first applications, you can just expire the sessions after x minutes and lock users out once their permissions change. How hard is it to do something like this in practice for an offline-capable web-app?


- You can do the encryption at rest on the OS level, e.g. with LUKS. Cloudant does that.

- You can also lock out users in Offline-first apps when a session expires or permissions are removed. Not difficult with CouchDB and certain libraries.


Encryption at rest is generally easier and more secure than encryption in flight. It eliminates whole classes of oracle and downgrade type attacks.


You can do offline first by installing CouchDB on the client side and using that to store user data. This only works on a desktop PC right now but for some apps that's a much better approach.

My own testing had my web browsers choking up when I had a few thousand documents stored in the browser's IndexedDB. This is on a 10 year old Mac Mini with 8gb ram. Could be that newer PCs do better but I doubt they'll do much better.

Using CouchDB with PouchDB.js provides a "Live Sync" option that syncs data both ways and that feature works very well with the apps I've made which do not have 1000s of users accessing the same DB. In my case there are probably not more than a dozen users accessing the same DB. And in my case there is not much chance more than one user is modifying a document at any given time.

Also, in my case, there is no "backend logic" being processed. That's all done in the user's web browser.


Did you try CouchDB's full text search? I read the docs; it's using Lucene underneath.



I experimented with ObjectBox a bit, but I decided against using it because it isn't open source. The language wrappers are apache2 but the core library is only provided as a binary blob. I think this is bad because you can't inspect how it works, fix bugs, port it to new archs, integrate it fully into your build system, etc. Also, every cursor operation must call into their function entry points (as opposed to being inlined in your code for example).

Instead, I've been working on my own library which also uses LMDB and flatbuffers. It's C++ only and still a WIP, but in case anyone else is interested, it's here: https://github.com/hoytech/rasgueadb


Thanks for sharing your experience with it, makes me wary.


> With them you write all this lasagna code to wrap the mutation of data and to make the UI react to all these changes.

Yeah, and allowing developers to just access state from anywhere and mutate it doesn't generate lasagna code at all.

The issue so far with RxDB I see is how silly complex the syncing gets. You just see it doing its thing and hope whatever payloads it sends are optimal for your use case. And while offline-first seems neat in theory, it's not necessary for most web apps IMO. For desktop or mobile apps it's a different thing, but they have other options too that browsers don't allow.


Meteor uses `minimongo` on the client, which syncs with the server, effectively doing the same thing. It's actually amazing how much better the UX is when you have spotty coverage.


Jake Archibald (who is an amazing presenter and you should definitely seek out) has a great PWA-oriented offline-first presentation [1]. It uses service workers but the underlying strategies are universal. Highly recommended if you are just getting started with offline-first.

[1] - https://www.youtube.com/watch?v=cmGr0RszHc8


I think Oracle APEX works on the same premise? Store locally in IndexedDB and sync up when the connection is back online. No need for difficult programming; APEX does this out of the box.

Anyhow, a way to force this behaviour in APEX is to make every user interaction a write action on the DB. This way you either save locally or to the backend (but you don't have to worry about the sync between the two).


I agree that handling offline devices is important. I've tried before an "offline first" database before, and it ended up really difficult to feel like an online and synced app. So I think I'm going to have to go with some caching system.

I'm hoping this thread produces some good choices. Ideally, other than account creation (which seems like requiring internet is fine), everything caches automatically and syncs magically. Offline account creation would be great, until you want to verify people's email addresses or use OAuth


> Ideally, other than account creation (which seems like requiring internet is fine), everything caches automatically and syncs magically.

This is pretty much how an offline-first app using RxDB (or just PouchDB) together with CouchDB works.

Offline account creation? Just let the user use the app locally without creating an account.

I only have two pain points with a CouchDB/PouchDB setup:

1. Technically, a user could post some garbage into his database after obtaining a session token. Design docs can help, but still cause overhead.

2. Only the "one database per user" approach properly ensures that each user can only access his private stuff. But then, querying information across users always requires to write some script that fetches the info from each database and aggregates them - instead of making one simple query.
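The cross-user aggregation script described in point 2 roughly looks like this. The `allDocs({ include_docs: true })` shape loosely mimics PouchDB/CouchDB, but treat it as an assumption rather than the exact API:

```javascript
// Sketch: aggregating one query across "one database per user". Each db
// here is anything with an async allDocs() returning { rows: [{ doc }] };
// the shape loosely mimics PouchDB/CouchDB and is an assumption. Instead
// of one query against one database, you loop, filter, and merge yourself.
async function queryAcrossUserDbs(userDbs, predicate) {
  const results = [];
  for (const [user, db] of Object.entries(userDbs)) {
    const { rows } = await db.allDocs({ include_docs: true });
    for (const { doc } of rows) {
      if (predicate(doc)) results.push({ user, doc });
    }
  }
  return results;                          // one merged result set
}
```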


Give me a free solution to replicate a SQL database between mobile, web and server and I'm in. Something like Couch / Pouch DB but SQL.

IMHO reliable replication is one of the hardest thing to code right.


I guess on iPhones (Safari engine, so all browsers) it doesn't work at all, or the aggressive storage deletion makes it unusable.

Of course, because Apple wants to make the user “want” an app.


I work in the ecommerce/ERP space and I need something like this, but not in JS (I'm using Rust now).

I tried to build my own in the past, but it's truly hard! I made an "event log" database, but it could sometimes become so big that it filled the server disk, and most events become useless after a while. So figuring out what is truly an "event" is not as obvious as it sounds.

Does a paradigm exist that could work here? I'm using sqlite + postgresql (and can't move to any fancy nosql).


Can't you use SQLite in the way that's proposed here? Or Firebase (Firestore / Firebase Realtime) with persistence enabled? [0]

[0] https://stackoverflow.com/questions/61315478/what-is-the-bes...


A bit offtrack - does anybody know how rxdb differs from PouchDB?

https://pouchdb.com


It does not, it builds upon it.


Offline first is interesting, but what's a good solution for syncing with the main database after the device goes back online?


CouchDB / PouchDB(the JS compatible cousin) make use of a global sequence counter to track which documents have changed between sync.

Each database uses an arbitrary byte string to mark a position in a sequence of updates to the database. Each document has the sequence counter of when it was created/updated/deleted. This sequence counter only matters to that particular database (it does not need to be mirrored, it's just a local ref. of WHEN the doc was modified in that particular database).

Syncing is then a process of looking up the last read sequence counter from a checkpoint document (i.e. what was the last modification) and passing that sequence counter to the `changes` endpoint to get a list of all documents with a sequence counter AFTER that, and then pulling/pushing those documents to the local/remote database, and saving the new latest sequence counter.

The official docs [1] give some more info. One key point is if I modify a document several times between syncs, it will only show once in the changes feed with the latest sequence counter for that document. Couch's conflict resolution strategy is a topic for another time, but an interesting one.

1. https://docs.couchdb.org/en/3.1.1/replication/protocol.html
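The checkpoint loop described above can be sketched in a few lines. This mocks the changes feed rather than speaking the real protocol, and it ignores revisions and conflict handling entirely:

```javascript
// Sketch: the pull half of a checkpoint-based sync, as described above.
// `remote.changes(since)` returns docs whose sequence counter is newer
// than the checkpoint; we apply them locally and persist the last seq as
// the new checkpoint. The real CouchDB protocol also tracks revisions and
// conflicts; this deliberately ignores all of that.
function pullChanges(remote, local) {
  const since = local.checkpoint ?? 0;
  const { results, lastSeq } = remote.changes(since);
  for (const { id, doc } of results) {
    local.docs[id] = doc;                  // only the latest version appears in the feed
  }
  local.checkpoint = lastSeq;              // next sync resumes from here
  return results.length;
}
```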


It's an open question and ultimately depends on your app. For our mobile app we try to merge simple changes automatically, and for trickier ones we allow the user to merge or select changes with a UI, a bit like git. We try to keep our objects small and granular enough that this very rarely occurs. For a lot of apps last write wins (e.g. compare and always take the latest timestamp) is enough, and it's probably the most common solution you'll see, but you have to be OK with a certain amount of data loss. Newer, fancier CRDT-based merging solutions are on the horizon, like Automerge for example, but I feel they're a little way off mainstream adoption.
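The last-write-wins strategy mentioned here is small enough to sketch. `updatedAt` is an assumed per-document timestamp; note how the losing write is silently discarded, which is exactly the data loss being accepted:

```javascript
// Sketch: last-write-wins merge, the most common (and lossiest) conflict
// strategy. Each side carries an assumed updatedAt timestamp; ties here
// arbitrarily prefer the local copy. The losing write simply disappears.
function lwwMerge(local, remote) {
  if (!local) return remote;
  if (!remote) return local;
  return remote.updatedAt > local.updatedAt ? remote : local;
}
```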



Get all entities whose updated_at is larger than the max updated_at in your local database.
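A sketch of this suggestion, with a plain array standing in for the server-side query (something like `SELECT * FROM t WHERE updated_at > ?`). Note that it silently misses deletions (you would need tombstones) and trusts the server's clock:

```javascript
// Sketch: delta pull by updated_at, per the parent comment. `server` is a
// plain array standing in for a real query; caveats: deleted rows never
// show up (you need tombstone rows), and clock skew can drop updates.
function pullNewer(server, localRows) {
  const maxLocal = Math.max(0, ...localRows.map((r) => r.updated_at));
  return server.filter((r) => r.updated_at > maxLocal);
}
```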


Then what if the schema changed? Columns renamed, dropped or new columns added that is non-nullable, or new foreign keys with non-null constraints?

The only place I've seen "offline sync" work is where MongoDB was used server-side for the sync, which then gets synced with postgres - in this case the models are forced to be the same.

But then you still have the problem of which side to sync first: postgres to mongo or mongo to postgres? What if a conflict arises? And having a device turned on after being off for a month WILL cause issues.

I don't think there is a real solution here yet. At least not one that is perfectly automatable.


At least for my pet project, I am considering adding a version field to my documents so that I can migrate each document to the most recent version before persisting it to the server's database. When sending documents to the client, you may have to force it to update to the latest application version, though.
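A sketch of that idea, assuming a `schemaVersion` field and a chain of per-version migration functions (the migration contents are made up for illustration):

```javascript
// Sketch: version-field migration. Each document carries a schemaVersion,
// and the server runs it through an ordered chain of migrations before
// persisting. The fields being added (title, tags) are invented examples.
const migrations = {
  1: (doc) => ({ ...doc, title: doc.title ?? "", schemaVersion: 2 }),
  2: (doc) => ({ ...doc, tags: doc.tags ?? [], schemaVersion: 3 }),
};

function migrate(doc, latest = 3) {
  let current = { schemaVersion: 1, ...doc };  // missing version => oldest
  while (current.schemaVersion < latest) {
    current = migrations[current.schemaVersion](current);
  }
  return current;
}
```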


True.

For simple apps with not much logic, you can get away with different front-ends/offline-syncs adding their own rows, and then only respecting the latest row in the main db, usually based on timestamp. But if you have something more complex where multiple workflows get triggered as side effects, then you may run into trouble when the latest row is in fact not the truth of the reality that we wanted, so there is still some risk ending up with states or workflows based on a false reality.

This can become a nightmare if you a have few scheduled events/tasks/crons that does things periodically, thus the only way to mitigate that is to fully embrace eventual consistency and idempotency, and the "easiest" way to get there is to embrace the actor model paradigm (see erlang, akka/akka.net, Orleans framework, F# mailboxes, go-routines, etc).

Point being, comparing two versions of applications against each other is not enough - you may need to version your data too and use timestamps to make a final decision on which version is truth. You may also need to set or build a tolerance system to say it will only sync "old" data if within x amount of days, lets say less than 7 days old. And so on.


That's a really simple solution. It doesn't work for all kinds of data though. For some data you might want to have a more elaborate conflict resolution (e.g. manual merge or using smart data structures like CRDT).
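For a feel of why CRDTs sidestep conflicts, here is about the smallest one, a grow-only counter: each replica increments only its own slot, and merge takes per-replica maxima, so merging is commutative, associative, and idempotent by construction:

```javascript
// Sketch: a grow-only counter (G-counter), the "hello world" of CRDTs.
// State is a map of replica id -> that replica's local count. A replica
// only ever bumps its own slot; merge takes the per-replica max, so no
// order of merging can ever produce a conflict or lose an increment.
const increment = (counter, replica) =>
  ({ ...counter, [replica]: (counter[replica] ?? 0) + 1 });

const merge = (a, b) => {
  const out = { ...a };
  for (const [replica, n] of Object.entries(b)) {
    out[replica] = Math.max(out[replica] ?? 0, n);
  }
  return out;
};

const value = (counter) => Object.values(counter).reduce((s, n) => s + n, 0);
```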


Earlier this year I experimented [1][2] with combining CRDTs using the incredible yJS[3] with PouchDB. It worked really well, completely magic syncing with full automatic handling of sync conflicts! (Although I have concerns about the sustainability of PouchDB, see my other comment[4])

1: https://gist.github.com/samwillis/1465da23194d1ad480a5548458...

2: https://discuss.yjs.dev/t/distributed-offline-editing-with-c...

3: https://yjs.dev

4: https://news.ycombinator.com/item?id=28690886


I think my main problem with PouchDB and by extension CouchDB was that it seemed hard to add validation in the backend (including authentication/authorization). I remember having to build some kind of proxy that hooks into the CouchDB protocol to deny certain requests. I am pretty sure that's solved by now (or I was just asking the wrong questions back then).


That was a problem I always had with replicating directly to CouchDB. They have added more authentication methods now, like proxy auth and JWT, so authorizing on a per-database basis isn't too bad.

However, I gave up on CouchDB after my server kept getting hacked by crypto miners. I'm sure whatever exploit they were using has been patched, but I'm hesitant now to use a DB that's open to the world.


If your CouchDB was open to the world, then that's definitely a configuration problem.

Sure, earlier versions shipped with "admin party" enabled by default but the docs made it very clear to not do that in public.


It's possible to use the same design docs both for client- and serverside validation. They don't look pretty, but maintaining them in readable JS and deploying them via CI works fine.

Apart from Proxy Auth and JWT, just using basic Auth/session + a backend like Superlogin works for simple use cases.

But sure, you'll want to set up rate limits etc using something like HaProxy once you have actual customer data on a CouchDB instance.


I've briefly looked into CRDTs, but I have to ask, beyond a toy-implementation of a TODO list.... how much do they balloon the size of the data stored?

Particularly for a complex document like a report with hundreds of fields and arbitrary sized lists for comments/observations?


I think the most common CRDT libraries try hard to reduce the overhead. I am no expert, but I could imagine that you could also remove some of the historical data if you know that all clients are reasonably up to date; if they aren't, they have to discard changes that are too old.


This is cool, I hadn't seen it until now. For those interested in a similar but more distributed option, hypercore-protocol will soon have multiwriter primitives!

https://www.youtube.com/watch?v=7S-D4yY1H48


So communicate with server exclusively with database sync instead of explicit requests to change server state?


This is a common problem in utility work, like meter readings and replacements. Lots of times there is no good internet: basements and cages. We solved this years ago by putting all business logic at the front end. Load all work orders upfront. Exchange data when possible.


And local second? So my Nest app doesn’t take forever to load if it can just access an API on my Lan?


Tabbed examples work great, but how do I multi-tab the browser on my phone? Okay, it is possible, but the use case seems improbable, as it is hard to share that limited screen real estate, my finger, and a soft keyboard with split-screen browser tabs.


Hey, I'm all for doing away with loading spinners, but IndexedDB is not a catch-all solution: it's terribly slow for frequent simultaneous reads/writes and better suited for other types of data access.


I use Pouchdb local and Couchdb on server and it automagically keeps all clients in sync.


Does anyone know of something similar for native Swift (iOS/macOS), that can work with a standalone backend (not CloudKit/Firebase)?

Last time I did something similar I had to roll my own Rails API and client-side cache.


'lasagna code' as a description of writing redux is a good one


I can't access my Bitwarden account without an Internet connection. What if their servers got wiped or hacked? I need a local store of the passwords so I can export them in that scenario.


git --local-is-everything

The job of the server is syncing clients, not keeping the data away from the clients. Good examples are Git or any reasonable mail client like Evolution or K-9. Bad examples? The GMAIL website and especially the GMAIL app: if you scroll too far, you have to wait and hope that the servers are reachable and working.

Old-school programmers know how to read and write files on the local file system and load them into the free store. Things work fast, stand-alone, and reliably. But these people are more expensive.


After having used RxJS, redux-rxjs, and some other "Rx" related libraries I don't remember their names anymore, I'm not touching Rx[anything] ever again.

I still have nightmares.


> On offline first applications, there is always exactly one state of the data across all tabs.

Perhaps between tabs, but what about multiple devices/browsers?
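Tabs are the easy half of that question: they share one machine, so a single local database plus a broadcast channel (`BroadcastChannel` in browsers) keeps them consistent. Here is a sketch with the channel injected; separate devices and browsers have no such shared medium, which is why they need real replication instead:

```javascript
// Sketch: why "one state across tabs" is the easy half. Tabs on one
// machine can share a single local database and announce writes to each
// other over a broadcast channel (BroadcastChannel in the browser; any
// pub/sub with subscribe/publish conveys the idea). Separate devices have
// no shared medium, so this trick does not extend to them.
function makeTab(channel, store) {
  channel.subscribe((change) => { store[change.id] = change.doc; });
  return {
    write(id, doc) {
      store[id] = doc;                     // local write...
      channel.publish({ id, doc });        // ...announced to sibling tabs
    },
  };
}
```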


True "Offline First" would be an app that I can install, and not just yet another JavaScript framework ~_~


So basically the browser has become a sandbox environment :)


This is the opposite of Elixir's Phoenix LiveView...


RxDB seems very cool and perfect use case for my dayjob.


> To manage this complexity it is common to use state management libraries like Redux or MobX. With them you write all this lasagna code to wrap the mutation of data and to make the UI react to all these changes.

ignorance is wisdom.



