I've found even apps that support downloading content for offline usage don't work properly offline. They do stupid things like sign you out randomly, refuse to work without a connection, put un-dismissable banners about not having connections over controls, or hang in random spots trying to navigate their UI.
I'd settle for offline-second apps that, when ostensibly having the ability to work offline, can work with any sort of designed or considered experience (dismiss banners about no connection, don't hang randomly, etc.)
Yeah, it's weird for Spotify to have all these loading spinners when I'm offline and just want to pick from my downloads folder.
Back when I used Spotify, I would navigate to where I needed to be while I had wifi to prepare for when I had no signal, like before a run or plane flight. Else it would often just fail to load at all.
YouTube and YouTube Music at least have a deliberate "Downloads" tab, which is what I want when I know that I'm offline and want to browse my downloaded media.
You are not alone. I live in the mountains. The primary reason for downloading is to have music in low/no connectivity. It seems worst when I have a bad connection, and it just can't deal, wanting to download the whole thing again. I swear, they don't ever test this stuff in anything but ideal California reception.
Spotify is a Swedish company. And yeah, you just need to go to a lake a bit at the edge of a metro area to feel the bugginess of these apps with intermittent connectivity
Big facts. I don't leave the house much these days. Recently I went on a 30+ min drive to visit a friend, from a super dense urban area to a suburb that is growing into a dense area. And I legit had no internet for like 4 minutes. Obviously it'll be worse in less densely populated areas.
I mean, there is a use case. You have access to WiFi at home/work, and expensive or slow mobile data. So you can transfer the music when using a better connection and just check licensing in realtime.
But it does still suck. I'd imagine the main reason Spotify chose to limit it like that is some requirement put on them by the music labels.
As I said, it does suck. There are use cases where it helps, but it's much rarer than just letting you listen offline and checking once a week or whatever. I doubt it was Spotify, as opposed to the music labels, who thought it was a good idea to check constantly.
That's (mostly) not offline-first. That's offline-resilient. There's a big difference. Offline-first always hits the cache for its data, so it should provide the exact same navigation experience with or without a connection, but the data is sometimes stale. Offline-resilient has a bimodal flow where it usually operates normally, then switches to the cache when the connection is dropped.
The "no connection" banner is usually still present but shouldn't block UI.
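To make the distinction concrete, here's a minimal sketch in Python. Everything in it is made up for illustration (the `NetworkDown` exception, the `fetch` callable, a plain dict as the cache); it's not how any particular app does it.

```python
from typing import Callable, Dict, Optional


class NetworkDown(Exception):
    """Raised by the hypothetical fetch function when there is no connection."""


def read_offline_first(key: str, cache: Dict[str, str]) -> Optional[str]:
    # Offline-first: the UI always reads from the cache, online or not.
    # A separate background sync (not shown) is what refreshes the cache,
    # so the value returned here may be stale.
    return cache.get(key)


def read_offline_resilient(
    key: str, cache: Dict[str, str], fetch: Callable[[str], str]
) -> Optional[str]:
    # Offline-resilient: the normal path goes to the network, and the
    # cache is only a fallback for when the connection drops.
    try:
        value = fetch(key)
    except NetworkDown:
        return cache.get(key)
    cache[key] = value  # keep the fallback copy up to date
    return value
```

The first function has one code path, so navigation behaves the same with or without a connection. The second has two, which is the bimodal flow described above.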
Offline mode adds significant complexity to the service, since you are essentially replicating your entire server side stack (logic, storage, etc) on the client.
Offline mode adds a cost to every new feature, even if it's just deciding if you are going to support said feature offline.
If you're dealing with sensitive data, it also means you need a way to secure that data on the client device.
I'm not saying apps shouldn't have an offline mode, but software engineering resource isn't free and most businesses would need to justify the cost of offline (even if it's just an opportunity cost).
He said "software engineering resource isn't free", so online-only may require a simpler and cheaper solution. From a business point of view, that architecture is quite good, not poor.
As someone who has built offline-first apps, my default will be to make it all online unless it seriously impacts the user experience. The complexity trade-off is not worth it in most contexts. At the end of the day, it really depends on what your apps do.
Yeah, offline-first is absolutely an amazing user experience, especially once you start really taking advantage of preloading/eager loading. But it is far from free. The backend services and data modeling will be a large factor in how much you can do on the frontend. Conflict resolution is potentially an unsolvable problem. I don't think you need to replicate all backend logic on the frontend, but you likely do need some of it, which as you said, adds more cost to features (and usually has potential for bugs if you aren't in the same stack on frontend/backend).
The frameworks for offline are few to nonexistent, and those that do exist seem very new and not battle-tested in production. It’s a hard problem because there are more states than online and offline. There’s also the “a connection is detected, but I cannot connect to anything” state.
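A minimal sketch of treating that third state as its own case rather than folding it into "online". The enum and the two boolean inputs are assumptions for illustration; in a real app they would come from the OS link status and from an actual request to your own server.

```python
from enum import Enum


class Connectivity(Enum):
    OFFLINE = "no link at all"
    LINK_ONLY = "link detected, but the server is unreachable"
    ONLINE = "server reachable"


def classify(link_up: bool, server_reachable: bool) -> Connectivity:
    # No link: nothing else matters.
    if not link_up:
        return Connectivity.OFFLINE
    # Captive portals, dead hotspots and flaky cell coverage land here.
    if not server_reachable:
        return Connectivity.LINK_ONLY
    return Connectivity.ONLINE
```

An app that only looks at `link_up` will show spinners forever in the `LINK_ONLY` case, which is roughly the hang people are complaining about upthread.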
One I’ve used before is BreezeJS[0], and it’s been around for over 8 years, maybe longer. It’s a data framework which takes care of managing dynamic object graphs, providing ways to query, save, and track local changes.
> Offline mode adds significant complexity to the service, since you are essentially replicating your entire server side stack (logic, storage, etc) on the client.
Not necessarily true if you can run the same code on the client as you do on the server.
That’s doable, but it’s not without its own issues.
Plus, you frequently don’t want to architect your client code the same as your server code (you’re not going to run a bunch of microservices on the client side, for example).
Not necessarily. Most database libraries support SQLite as an option, and if you must use PostgreSQL or MySQL, then having a container app solves those purposes nicely. Personally I'd like to see more applications move to file based storage, which scales more easily on distributed storage systems.
File systems can be used similar to a database with user locking as well. I'm not suggesting it doesn't come with new challenges but the apps I've used that write and read from files are pretty phenomenal and much easier to scale.
Files store information. Databases store information. Structuring data into folders and files to store data is not much different than structuring data in a file to store data.
App and website are really interchangeable here. Both are just clients.
You can absolutely implement offline in a web application (Google Drive does this). Obviously, you need to be online to get the app in the first place, but once it’s cached, you should be good to go.
Again, it’s a cost-benefit analysis for the business.
The most frequent reason for offline mode I've heard is "using the app while being off grid", which is quite ironic to me since that's when people are supposed to do something else. This excludes offline-only apps like a calculator.
I tried coding on a plane; it's super unproductive. Off-grid, to me, means reading a book, playing sports, enjoying nature, etc. No internet, no work. The time in your entire life without an internet connection is so short, really.
I've adopted this approach with my own app [0], and there have definitely been some challenges, but ultimately I'm glad I made the decision.
A lot of the tech in the web world is focused around the traditional client <-> server / API model. Building a local first app feels like you're going against the grain in some ways and you're often not able to take advantage of the latest and greatest tech.
In my case I am using PouchDB as a local database, which syncs to an instance of CouchDB on the server. I think there is room for improvement with this combo, but for the most part they are a mature, battle-tested combo that works really well as an offline-first sync solution.
I’ve tried PouchDB before but only on “toy” projects not deployed to production. What challenges have there been with PouchDB or other offline-first techniques?
I like offline first app design, and have built several. However, I don't do it anymore unless there is a very clear use case and direct customer benefit. Some apps just don't work offline at all.
One of the decision points is that you often have some API needs and can't just live within the PouchDB database, particularly for account creation and such. It may need to plug into other services and initiate them. You could use a database as a message queue, but it's awkward compared to an API.
The Couch/Pouch model gets somewhat more complex when you leave the one-db-per-user model. If there are shared data buckets, the database access patterns require a good deal of thought and experience with the CouchDB auth model, particularly its limitations.
Offline access also requires thought around sensitive data. If a user has had their access revoked, an offline first app may preserve it which can rub against security requirements.
Initial sync can be very slow if you're dealing with a lot of data. This can be a poor experience if you make a new user wait minutes to get started. Or you can also build a backend so they can start immediately and then switch to the local copy after replication. I've built this kind of system before.
There are advantages to this route. Minimal data transfer from the server can represent lower load for your services. You can also do an offline-only model for data, potentially lowering cloud costs for the number of users you serve, which is a definite boon for free tiers.
The replication is pretty slick and quick: WebSocket-like snappy data without WebSockets. You're using tools that do most of the heavy lifting for you, and you mostly just have to plumb them together correctly.
My startup built an SDK framework that allows companies to build offline apps that can even sync over a multihop mesh network. It’s powered by a CRDT to handle the conflict resolution.
From my experience most of our customers are not mobile developers (they’re JavaScript developers asked by their bosses to build mobile apps) and don’t even want to spend time learning Core Data for iOS or Room for Android. So most of the time they just stick to what they know: HTTP requests.
Luckily we made our SDK easy to use so that these JavaScript developers can get both the network communication and the offline first caching in one product. They absolutely love it.
Most apps aren’t offline first because it’s so hard to build the infrastructure to pull it off without a lot of bugs. Most apps focus entirely on the UI code layer. If infrastructure and frameworks made this a lot easier to use, I bet offline first would be a lot more popular and users would be a lot happier.
Are there similar services to your company's (https://www.ditto.live/) out there already? I haven't had the need for this yet, but I can see your product being pretty damn useful
I am sure it is, but the good old "sign up and we’ll tell you how much this magic costs" is a marketing pattern that I don’t trust. If you can’t be transparent about price, I don’t feel confident in giving you my info.
Regarding pricing, we are actively working on generally available pricing for our entire system. This is going to take us a couple of quarters to operationalize everything; we have a pretty good idea of what the pricing will be for the everyone-version.
Right now we have our enterprise pricing nicely figured out. This might sound a bit strange, but unlike many startups, we actually went to market backwards by focusing on the enterprise first. When we started the company, we had this crazy idea that we could build a distributed, query-replicated database that could sync over a mesh network. No one had ever built anything like this, and we had some skepticism that such a product could actually generate revenue to create a sustainable business. Eventually, most infrastructure or database startups survive in large part thanks to enterprise deployments. So we sought enterprise use cases and dollars first. And I have to say that was totally worth it, and we've been rapidly gaining customers here.
However, in our hearts, we are all frontend web, mobile, and IoT app developers who wanted to build collaborative apps like iMessage, Figma, Miro, and Trello that had offline, sync, and conflict resolution capabilities out of the box. Once we make a couple of product and ops advancements over the next couple of quarters, we will have very clear pricing like any other service out there. So just hang tight!
Want us to get to pricing faster? Definitely recommend some of your infrastructure, kubernetes, and rust distributed systems developers to join us. We are hiring hardcore!
Ditto is both an embedded + cloud database + a mesh network, it's really 2 startups in 1.
A) If you're looking for just offline-first and data sync, there are some companies that have done this:
1. Realm (MongoDB) - this is the company that my CoFounder and I came from.
2. Firebase (Google) - one of the biggest inspirations for me personally in data sync.
3. Supabase - a very popular growing open source Firebase alternative
4. All the GraphQL Backend-as-a-Service like Prisma, Hasura etc... These have offline caching with a lot of the GraphQL client libraries. I don't think they actually have a database under the hood that you can query.
B) If you're looking for just mesh networks:
1. Build it yourself using Bluetooth Low Energy, Local Area Network, P2P Wi-Fi Direct, Apple Wireless Direct, Wi-Fi Aware APIs that come with most of your device frameworks. Build an advertising system, a common communication protocol, and add your identity security system. If you want multi-hop, you'll need to create a dynamic routing and presence system on top of it. After that design an API to send data around, respond to errors. If you want offline-first you should research CRDTs and try to build a database replication system using the mesh network.
2. You could use Apple's Multipeer Connectivity framework: this is for iOS and macOS devices only. No multi-hop here, but you can build a system on top of it. One thing I've noticed is that Apple's framework is a ruthless battery drainer. My phone gets very hot after a minute. It doesn't look like it uses Bluetooth Low Energy, and its advertising system seems to be extremely aggressive.
3. Google has an abstraction called Nearby Messages that uses Bluetooth Low Energy. It isn't very stable but you could try to trick it to re-establish connections. After that you'll want to investigate how to pull off multi-hop. It's the same as step 1 and 2 https://developers.google.com/nearby/messages/overview
4. There was a company called Hypelabs that offered a mesh network solution, but not the offline-first part. I'm not sure what's up with them.
5. There's another company called Bridgefy https://bridgefy.me/ that built a chat app used in some of the Hong Kong protests
6. Open Garden also had Firechat in 2014
Ditto is a combination of both families of problems, it's basically creating 2 startups at the same time (mesh + distributed database):
* Offline first embedded mobile, web, IoT database called the small peer
* A large distributed database in the cloud called the Big Peer (this is new and what we need to operationalize for general availability pricing)
* A replication engine that uses our mesh network powered by Bluetooth Low Energy, Local Area Network, P2P Wi-Fi Direct, Apple Wireless Direct, Wi-Fi Aware
The problems that we have to tackle are so crazy: network optimizations, compression, multiplexing, conflict resolution, scaling on the edge and cloud, etc. It's like the product that we're trying to create is teaching us as we build. For example, one of the challenges that we have now with multi-hop is scaling performance. A large mesh of 1,000 devices may chatter so much just on the distributed routing table that it can cripple the replication of the actual data! So we are trying novel ways to dynamically route data by also incorporating special characteristics of CRDTs so that chatter is reduced and performance increases. Other major things we will improve are ways to prevent denial-of-service attacks even with trusted actors, decentralized access control of data, graph centrality theory, etc.
Regarding use cases?
1. Well anything that's latency sensitive is perfect for us. Think controlling robots, syncing whiteboard pen strokes across devices, games, VR+AR.
2. Industry wise, any place where _any_ issue to internet connectivity means a loss of money, life, user experience: aviation, hospitals, point of sale, education, manufacturing, defense. A lot of our customers have internet 99.9% of the time but even that 0.1% is a nightmare that causes great issues.
> 4. All the GraphQL Backend-as-a-Service like Prisma, Hasura etc...
Just wanted to quickly drop in to clarify that Prisma is not a GraphQL-as-a-Service tool any more but an ORM that gives you a type-safe JavaScript/TypeScript client for your DB and a migration tool. The main differences between Prisma 1 and the Prisma ORM (i.e. Prisma 2+) are explained here: https://www.prisma.io/docs/guides/upgrade-guides/upgrade-fro...
You see how hard it is to change perceptions? This is why people spend so much time on branding for developer tools, databases, and infrastructure companies.
I'm not sure if you know what CRDTs are but they're a family of data types that allow different actors in a distributed system to edit data concurrently even during network partitions. If enough data is shared, they will deterministically agree on the same value. They kind of give that "google docs" behavior if you're looking for an analogy. They're perfect for peer to peer and offline-first systems.
However, there is actually more to it, and a much more detailed write-up is coming soon. Ditto is a distributed database; each peer has its own database. The database is organized into collections and each collection is a Ditto Document (this does not work like most NoSQL document databases). Each property of the document is its own CRDT. You as the user can pick which CRDT you'd like to use. Our current catalog includes:
* Registers (Causal Last Write Wins)
* Counters (sums of each writer's numeric values)
* Binary Attachments (same as a Register but you can put large arbitrary data like say video files, images, PDFs whatever)
* AddWinsMaps (coming soon) - This type allows for concurrent upserting and removing of values based on a key.
* ReplicatedGrowableArray - this is an array type that allows for concurrent insertions while preserving some semblance of order. It behaves rather closely to a collaborative text editor merge behavior.
Our AddWinsMap and ReplicatedGrowableArray are more special than you might think. They can actually host nested CRDTs. Think of it like a folder within a folder in Google Drive that can hold synced documents nested within.
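For anyone who hasn't seen a CRDT in code, here is a minimal sketch of the simplest one in that list, a counter that is the sum of each writer's value. This is the textbook grow-only counter (G-Counter), not Ditto's implementation, and the class and method names are made up for illustration.

```python
from typing import Dict


class GCounter:
    """Textbook grow-only counter: one slot per writer, value is the sum."""

    def __init__(self, writer_id: str) -> None:
        self.writer_id = writer_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        # Each writer only ever touches its own slot, so two writers
        # can never overwrite each other's increments.
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.writer_id] = self.counts.get(self.writer_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Per-writer max is commutative, associative and idempotent,
        # so replicas converge no matter how often or in what order they sync.
        for writer, count in other.counts.items():
            self.counts[writer] = max(self.counts.get(writer, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())
```

Two phones that each increment while offline and then merge in either order end up with the same total, and merging the same state twice changes nothing.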
I'd love to show you, perhaps over a call! We tend to be perfectionists when it comes to documentation and have been so busy that we haven't fleshed it all out. Perhaps we might just open source our CRDT system.
Email is in my profile, I love chatting and sharing about this stuff!
A CRDT is a way to solve multi-leader replication without having the application code resolve conflicts.
This is what is required to build an app where any instance (node) can be offline for an arbitrary amount of time, but still be able to share state with the rest of the nodes when it's reconnected.
To implement this, every application node keeps a vector clock per register (an atomic piece of shared state). The vector clock allows any node to compare its own version of the register with the state received from any other node. Two values of a vector clock can either be causally related (in which case the most recent write wins) or concurrent. However, the concurrency is from the system's perspective, not necessarily from the user's perspective. An extra physical timestamp can be kept at the register level to order concurrent updates in a way consistent with the user's time perception.
Now, having the hybrid clocks in place to version each register on each node, the system must implement a protocol to ship every register update to all nodes (reliable broadcast).
Once all updates are shipped to all nodes, it's guaranteed that all nodes have the same (most recent) state.
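A minimal sketch of the comparison and merge steps described above, assuming a vector clock is a dict of per-node counters. The `Register` fields, the function names, and the use of the node id as a final tie-breaker are assumptions for illustration; the reliable broadcast part is not shown.

```python
from dataclasses import dataclass
from typing import Dict

VectorClock = Dict[str, int]


def compare(a: VectorClock, b: VectorClock) -> str:
    # A missing entry means that node has not written yet, i.e. counter 0.
    nodes = set(a) | set(b)
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"  # neither write saw the other
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"


@dataclass
class Register:
    value: str
    clock: VectorClock
    timestamp: float  # physical time of the write, used only to break ties
    node_id: str


def merge(local: Register, remote: Register) -> Register:
    order = compare(local.clock, remote.clock)
    if order in ("after", "equal"):
        return local
    if order == "before":
        return remote
    # Concurrent writes: order by physical timestamp, then by node id so
    # every node picks the same winner even if the timestamps collide.
    winner = max(local, remote, key=lambda r: (r.timestamp, r.node_id))
    nodes = set(local.clock) | set(remote.clock)
    merged = {n: max(local.clock.get(n, 0), remote.clock.get(n, 0)) for n in nodes}
    return Register(winner.value, merged, winner.timestamp, winner.node_id)
```

The merged clock covers both writes, so a later update from either node compares as "after" the merged state instead of looking concurrent again.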
(I built an offline-first product and had to roll my own protocol)
I've read the CRDT paper but never implemented it. Question - if you're not using LWW (instead you have concurrent values of a vector clock), this is where you have your CRDT and merge the states coming from every node?
3. there aren't a lot of resources to help you out if you aren't already familiar with CRDTs and the like (and that assumes your backend also already supports them)
I can't help with #1 and #2, but for #3 I found Jake Archibald to be the best source of information online. Here is a great high level presentation (he really is a gem of a presenter. I always drop what I'm doing to watch anything he puts out, offline-first or otherwise): https://www.youtube.com/watch?v=cmGr0RszHc8
Also, Trello put out a multipart blog series which gets into some nuts and bolts which are really valuable when designing your own app: https://tech.trello.com/sync-architecture/
EDIT: I just noticed Jake has a full course available. I'm not sure how good it is, or how in-depth, but I'm sure it's decent based on the author: https://jakearchibald.com/2014/offline-cookbook/
From my experience, building an offline-first app is much easier than a traditional client-server app.
I maintain an authentication backend for PouchDB/CouchDB-based apps and have received this feedback multiple times: It only requires a little bit of frontend web dev knowledge to store and retrieve JSONs locally + sync them up, without having to worry about authentication, API design, network connection or caching.
> 2. its a significant dev cost
It allows you to move much faster than a traditional setup where frontend devs might need to sit in meetings with backend devs in order to argue about some API design. But early mistakes in the data model are much harder to fix because with CouchDB, your data model is your API.
Having built apps with Pouch/Couch for > 4 years now, I think the perfect middle ground is to only accept changes made by a user through an API (proper data validation, no conflicts,...) and to use Pouch/Couch in a read-only manner, replicating all necessary user data to the client.
I design for offline because my apps might be used in places where internet might not be easily available. Like in the basement of a factory, with a hostile IT department. Lots of places all over the place where you might have equipment but can't get internet service.
I worked on a construction app that was offline first, and let me tell you the reason why people don't do it by default is how much more expensive and difficult it is. There isn't good tooling out there that just makes it work. So you have to do everything custom and carefully. And broken sync happens and conflicts happen.
It’s sad that this is a question, at least for apps that don’t do inherently online things. Pretty soon the very concept of doing your own computation with your own data will be foreign to a lot of people.
I'd really like to see more elaboration of different architectures used for offline.
Having used Apollo's GraphQL recently, it definitely has a fairly robust toolkit for handling the cached data; it's a good mini-database-like thing. It feels more than a little like having a Spring repository[1], but in the front end.
My hope is that, as well as addressing user needs, we also get a nice, well-defined front-end architecture as a positive side effect. The question of how programmers represent & handle data is fascinating, and offline-first is a more general capability we should have some story for when architecting these client-server resourceful systems.
The problem is that we need to differentiate between storage and sync. Sure, you might have something that supports the storage part, but sync is a domain/app-specific problem. When you do offline-first, you need to look at every endpoint and piece of data and make a decision about how it should be fetched/refreshed (and this is just considering the read-only use case; read-write is a much harder problem).
There are good resources online that outline the different fetch strategies (e.g. cache-first, cache-then-network, network-only) but it doesn't prevent the developer from needing to go through and make the decisions around each of those strategies and to implement them.
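For reference, a minimal sketch of the three strategies named above, with a plain dict standing in for the cache and a hypothetical `fetch` callable standing in for the network. The function names are made up for illustration.

```python
from typing import Callable, Dict, Iterator

Fetch = Callable[[str], str]


def network_only(key: str, cache: Dict[str, str], fetch: Fetch) -> str:
    # Always go to the network and never touch the cache (e.g. live data).
    return fetch(key)


def cache_first(key: str, cache: Dict[str, str], fetch: Fetch) -> str:
    # Serve the cached copy if there is one; only hit the network on a miss.
    if key in cache:
        return cache[key]
    value = fetch(key)
    cache[key] = value
    return value


def cache_then_network(key: str, cache: Dict[str, str], fetch: Fetch) -> Iterator[str]:
    # Yield the cached copy right away so the UI can render, then yield
    # the fresh copy once the network answers and update the cache.
    if key in cache:
        yield cache[key]
    fresh = fetch(key)
    cache[key] = fresh
    yield fresh
```

The code is the easy part. The work is deciding, endpoint by endpoint, which of these applies, and what the UI does when `fetch` fails halfway through.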
No problem :) Only other talk so far is a very similar talk from the iOS side available here - https://youtu.be/rvfVs_29MsI?t=4578. We do intend to talk about many more things in the near future. I'll try to remember to report back here when we do.
This completely glosses over the rather gritty realities of how to sync data between devices. It's not easy and IME most people do an awful job because they think it will be.
The only thing approaching an off the shelf solution to this is the CouchDB replication protocol. While technically good, it suffers from a number of issues if you want to use it in practice:
- dearth of cloud providers for CouchDB
- There's PouchDB for the browser but there's nothing for native mobile clients
- While PouchDB is great, it has to run on top of the browser's IndexedDB implementation, all of which suffer from reliability and capacity issues
If you're like me and clicked to read this article because you thought it was going to have some suggestions about apps that actually implement these principles, I give you this as a consolation prize: https://secondwind.guardianproject.info/
BTW, I reported the fact that their "Add the Second Wind repo to your F-Droid app" link is broken several months ago, but feel free to report it to encourage them to get that AWS problem taken care of. <3
Wow, very nice, thanks for the share. Though I admit I first saw the janky URL and was not expecting such a pleasant website UI. They really should get their own domain.
Somehow we lost the understanding of offline-first, though back when I learned about it in the .NET Compact Framework days, it was just assumed - you had a Pocket PC with some cellular sure, but that was expensive and unreliable, and Wi-Fi wasn't a guarantee. Hopefully at some point you could get your Pocket PC docked in a cradle for Exchange ActiveSync, but otherwise it was almost Wi-Fi, IrDA, or bust.
So yes, you have a small SQL Server Express cache of data and you build around syncing data and working with it offline.
I think the difference is that nowadays almost all data is shared. Offline-first was way easier to reason about when you had a single user to accommodate when handling sync and conflict resolution.
It seems that there are few options overall, but particularly few OSS options for doing offline storage and sync in native code (_without_ using JavaScript). The only thing I've encountered that really met those requirements (OSS, non-JS) was Couchbase, but it's really not a great experience, and requires complicated setup both for the binary and the server. Adding P2P or CRDT features is even more difficult.
I played around with offline-first a short while ago and building it into an app was a pain. Are there any libraries or tools that would make this easier?
Some apps only. Some apps are just useless without access to the server. App for creating diagrams based on the search request, for example - on the server side it will use a huge DB. Any application where “live” data is ultimately important.
This feature has a cost and might limit the functionality - that’s why it should be implemented only when necessary.
A few jobs ago we had an offline-first webapp. The offline support was so hard to avoid breaking that we ended up writing a proxy server for the test suite so we could make assertions about what the browser would do when it couldn't reach the site.
In practice, yeah - we could have also simulated slow / unstable connections but never found those features needed the same level of automated testing.
Why must I have three separate must-work activities (your front-end, my network, your server)? Why not just let me use the supercomputer I have in my possession and run the thing without the baggage and overhead?
> 1. Users are offline; experiencing latency issues or are in unreliable network conditions.
> 2. Fetching the data over the network will be slower than fetching it from a local source.
> 3. The app users should be informed about the low network conditions but it shouldn’t be a hindrance to their objective.
> 4. Users’ network and battery status are taken into account, and thus only the data that has changed since the last synchronization should be synced.
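Point 4 (only syncing what changed since the last synchronization) can be sketched like this, assuming the server stamps every record with a monotonically increasing change number. The `Record` shape and the cursor are assumptions for illustration, not something taken from the article.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Record:
    id: str
    body: str
    version: int  # hypothetical server-assigned change number


def changes_since(records: List[Record], cursor: int) -> Tuple[List[Record], int]:
    # Server side: return only the records changed after the client's
    # cursor, plus the new cursor the client should remember.
    changed = [r for r in records if r.version > cursor]
    new_cursor = max((r.version for r in changed), default=cursor)
    return changed, new_cursor


def apply_changes(local: Dict[str, Record], changed: List[Record]) -> None:
    # Client side: overwrite local copies with the newer server copies.
    for record in changed:
        local[record.id] = record
```

A client that last synced at cursor 2 downloads only the records with a higher version and pays nothing when there is nothing new. Deletions and local edits need extra handling that this sketch leaves out.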
> Fetching the data over the network will be slower than fetching it from a local source.
A big aspect of this is data collection. Microsoft Office has tons of features that could easily be local but that it wants to be "connected services" that are subject to their own TOS so they can steal your data. Based on the lag, it seems like all spell and grammar checks go through some server somewhere, almost certainly with the primary goal of scraping as much info as they can from what you're doing. Online first is all about data harvesting, much less so ease of operation and certainly not customer experience.
> Why not just let me use the supercomputer I have in my possession and run the thing without the baggage and overhead?
Because the app would be too easy to steal and too hard to monetize.
At this point, from an economic perspective, apps either need to be completely open source or running in the cloud behind secured servers.
Even if I dismiss piracy as a problem (and I don't), the bigger issue is that if the app does something clever/technical and is somehow successful, someone will tear it apart, clone it, and put their own copy on the app stores.
Running in the cloud is how companies are dodging that race to the bottom (see: Chitubox as a prime example--good or bad example is up to you to decide).
Err, email copies, backups, USB sticks, save-to-sync, ... There are lots of shared/safe alternatives when saving my data/work. Not having to sweat net lag, though, is a big win. And devices are pretty damn reliable unless you drop the things.
And even then, I have this phone thingie that has lots of storage - why can't I sync to that directly without 3rd-party involvement? This would address your "what if it crashed" by giving a second copy (or more if there are multiple people sitting near me).