
Local-first software: You own your data, in spite of the cloud - c-cube
https://www.inkandswitch.com/local-first.html
======
josephg
I've been thinking on and off about this problem space for about a decade now
- having worked on Google Wave, ShareJS and ShareDB. The architecture I want
is something like this:

\- My data is stored at a well known URL on a machine that I own. If people
don't want to host their own stuff, they can rent another person's computer.

\- We need a standard protocol for "data that changes over time". This is a
really obvious point once you start thinking about it - REST doesn't support
realtime edits, and websockets / zmq / etc are all too low level. We need a
standard way to express semantic changes (eg CRDT edits) and do catchup, that
can work across multiple devices / applications / underlying protocols. I've
been working on this as part of statecraft -
[https://github.com/josephg/statecraft](https://github.com/josephg/statecraft)
but it's still hidden behind all the documentation I haven't written yet.

\- Then we need application-specific schemas to be published. Eg, there should
be a standard calendar schema with events / reminders / whatever. Any calendar
vendor could provide this. Then calendar apps could request on login from the
user where the calendar data actually lives. Those apps could be web / mobile
/ desktop / whatever, because remember - we have a standard way to
interoperate with data like this.

\- Ideally the data would also be stored encrypted at rest. The server
shouldn't need to be able to read any of the user's data.
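
A minimal sketch of what "data that changes over time" with catch-up could look like, assuming a single in-memory store. This is not Statecraft's API: `ChangeFeed`, `apply`, `subscribe` and the `set` op type are hypothetical names chosen for illustration, and a real protocol would also need persistence, auth and richer (e.g. CRDT) operations.

```javascript
// Hypothetical in-memory store: every change gets a monotonically
// increasing version, so a client can ask "what happened since version N?".
class ChangeFeed {
  constructor() {
    this.ops = [];              // ops[i] produced version i + 1
    this.doc = {};              // current materialized state
    this.listeners = new Set(); // live subscribers
  }

  get version() {
    return this.ops.length;
  }

  // Apply a semantic change (here only "set key to value") and notify.
  apply(op) {
    if (op.type !== 'set') throw new Error('unknown op type: ' + op.type);
    this.doc[op.key] = op.value;
    this.ops.push(op);
    const entry = { version: this.version, op };
    for (const fn of this.listeners) fn(entry);
    return this.version;
  }

  // Catch-up: replay everything after `sinceVersion`, then stream live ops.
  // Returns a function that cancels the subscription.
  subscribe(sinceVersion, fn) {
    for (let v = sinceVersion; v < this.ops.length; v++) {
      fn({ version: v + 1, op: this.ops[v] });
    }
    this.listeners.add(fn);
    return () => this.listeners.delete(fn);
  }
}

const feed = new ChangeFeed();
feed.apply({ type: 'set', key: 'title', value: 'Lunch' });
feed.apply({ type: 'set', key: 'title', value: 'Dinner' });

// A client that last saw version 1 reconnects and catches up.
const seen = [];
const stop = feed.subscribe(1, (entry) => seen.push(entry.version));
feed.apply({ type: 'set', key: 'place', value: 'Cafe' });
stop();
console.log(seen); // [ 2, 3 ]
```

The same version-plus-ops shape could in principle be carried over websockets, HTTP long polling or a local socket, which is the point of keeping it separate from the transport.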

You could build a peer to peer system whereby my desktop apps and phone share
data with one another. But ideally, data should be accessible from any device
at any time without worrying about whether your laptop is on or off. For that
we need servers. You could make a single persistent server be a peer in a
CRDT-based cluster of devices. That might be better - but it's harder to
implement and might run into issues with bandwidth and size (I don't want my
phone to sync my whole photo library, etc). There are some generally unsolved
problems here, but I don't think they're beyond us.

If you're working on this problem and want to chat, throw me an email - I'm
me@josephg.com.

~~~
rakoo
I feel the dat project ([https://datproject.org/](https://datproject.org/))
ticks some boxes you want as a base protocol. Dat itself is an easy-to-use
tool for syncing large files between machines, but its core
([http://awesome.datproject.org/hypercore](http://awesome.datproject.org/hypercore))
gives you everything you need to build on top of.

With dat you have:

\- URLs to individual files, with the understanding that they will change
over time

\- built-in encryption and non-repudiability of every change

\- storage wherever you want, no peer is more important than any other

~~~
networkimprov
The Dat protocol is remarkably complex. [1]

Shared object updates should be deliverable by any protocol that works for a
specific application, whether client/server, peer-to-peer, or store-and-
forward.

[1] [https://datprotocol.github.io/how-dat-works/](https://datprotocol.github.io/how-dat-works/)

~~~
rakoo
It is complex if you compare it to, say, a simple cp to another system, but
for all the features it has (which don't feel optional at this point) it is
remarkably easy to grasp. The BitTorrent protocol is extremely
straightforward, and dat clearly takes inspiration from it and builds on top of
it. Perhaps I'm too used to it and see the increment as too small, though.

------
atoav
I've used Syncthing¹ for years now, mainly to sync the notes I write on my phone
to my laptop and vice versa, but also as a way to sync my photos to my PC. Or
as a way to sync my keepass password safe to other locations.

It only really works when two of the sharing machines are online at the same
time. I work around this by having a Raspberry Pi running 24/7 which does its
thing.

I always tried to avoid cloud based services, because I don't want to keep
thinking about whether I can trust cloud providers and I kinda like the idea
of a LAN being used for LAN things – you know.. _local_ stuff.

¹: [https://syncthing.net/](https://syncthing.net/)

~~~
TuringTest
Does Syncthing work reliably for you? I've been unable to make it work.

I've set it up to backup folders from my phone to a LAN storage drive (running
it as a service on Windows, with the storage connected as a network drive),
but it seems to "forget" the connection between the server and the phone. It
says the folders are up to date, and won't pull the most recent files and
photos from the phone.

~~~
atoav
I had problems on Android too. If I recall correctly, it had to do with power
saving/background connections. Somewhere in the Android app settings you can
allow Syncthing to run in the background or so, then it works.

~~~
TuringTest
Yes, I allowed background running in the power saving settings - this helps in
the short term.

But the problem seems to be in the server; after restarting the computer or
the phone, the connection between folders seems to be lost - it doesn't "see"
the remote folder anymore, even if its ID has not changed.

To be fair, this doesn't happen with Syncthing only. Other tools like
Nitroshare or Dukto are also hit or miss when trying to connect from the
phone.

~~~
atoav
Okay, weird. I didn't have such a problem up to now.

------
toxicFork
This has been a dream of mine for so long, it is great to see excitement and
serious thought for it.

One hack to give this a try would be to play around with integrating APIs for
cloud storage to go through a local cache system first, then do async
synchronisation on demand. (Collaboration is not really enabled here but the
rest of the principles in
[https://www.inkandswitch.com/local-first.html#practitioners](https://www.inkandswitch.com/local-first.html#practitioners)
become trivial). Essentially Firebase but with
developer facing API bridges, for dropbox, drive, Amazon, ftp, whatever. The
twist: you have more control of the data and it doesn't necessarily go up to
Firebase backend, the devices could even do the backend computations async.
Then you can build on top of any cloud storage platform, even decentralize
things. It gets even more exciting to think of decentralization.

I would additionally like to emphasize the case of data being in the cloud
when it doesn't need to be there. Location history comes to mind. I want
it to be local-first, local-only.

------
dawnerd
I keep an eye on this list for new additions:
[https://github.com/Kickball/awesome-selfhosted](https://github.com/Kickball/awesome-selfhosted)

Have found a couple really useful tools. My favorite so far is wallabag.

~~~
chicob
Thanks for sharing!

------
anty
I was hoping to read more about how merging of conflicts is done. The article
tells us that "users have an intuitive sense of human collaboration and avoid
creating conflicts with their collaborators". For merges that aren't
clear-cut, it says it's still an open research question how to let the
application, or the user with a view of the change history, resolve the
conflicts.

How to merge conflicts is probably the most important part in a non-trivial
app. Does anyone know of examples or research that has been done in this
direction?

~~~
tarruda
I think the simplest strategy is to create "conflict documents" containing all
conflicted data and present to the user for manual resolution.

This seems similar to what Evernote does
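
A minimal sketch of the "conflict document" idea, assuming flat records with primitive field values and a known common ancestor. `mergeRecords` and the field names are hypothetical, and this is not a description of how Evernote does it: fields changed on only one side merge automatically, and fields changed differently on both sides are collected for the user to resolve.

```javascript
// Three-way merge of flat records. `base` is the last version both sides
// agreed on; `mine` and `theirs` are the two divergent edits.
// A missing key (undefined) is treated as "deleted".
function mergeRecords(base, mine, theirs) {
  const merged = {};
  const conflicts = {};
  const keys = new Set([
    ...Object.keys(base),
    ...Object.keys(mine),
    ...Object.keys(theirs),
  ]);
  for (const key of keys) {
    const b = base[key];
    const m = mine[key];
    const t = theirs[key];
    if (m === t) {
      if (m !== undefined) merged[key] = m; // both sides agree
    } else if (m === b) {
      if (t !== undefined) merged[key] = t; // only theirs changed
    } else if (t === b) {
      if (m !== undefined) merged[key] = m; // only mine changed
    } else {
      conflicts[key] = { base: b, mine: m, theirs: t }; // needs a human
    }
  }
  return { merged, conflicts };
}

const result = mergeRecords(
  { title: 'Trip', body: 'x', tag: 'a' },
  { title: 'Trip to Rome', body: 'x', tag: 'b' },
  { title: 'Trip', body: 'y', tag: 'c' }
);
console.log(result.merged);    // { title: 'Trip to Rome', body: 'y' }
console.log(result.conflicts); // { tag: { base: 'a', mine: 'b', theirs: 'c' } }
```

The `conflicts` object is what would be stored as the conflict document and shown to the user.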

------
realPubkey
I thought a lot about this type of application when Dapps (decentralized
applications) became a thing with ethereum. I tried to build one that is
really decentralized and also works on multiple platforms.

Contrary to the article's analysis, I chose a web app that runs in
the browser.

The app itself is just an MHTML file that includes JavaScript, CSS and images.
See [https://en.wikipedia.org/wiki/MHTML](https://en.wikipedia.org/wiki/MHTML)

Users can send it around as they want and do not have to install anything.
Also the app makes a call to a server which makes recommendations on updates.

For the data-storage, I had some trouble because there was no database out
there which supported replication, json-import/export and encryption. That was
the reason I created one.
[https://github.com/pubkey/rxdb](https://github.com/pubkey/rxdb)

~~~
aeorgnoieang
Did you look at PouchDB for data storage? It's designed for replication, has
built-in JSON import and export, and there are encryption plugins.

~~~
realPubkey
RxDB is based on pouchdb

~~~
aeorgnoieang
Looks cool!

------
oblib
I've been working on software that meets at least most of the objectives this
outlines, and for the same reasons.

The concept was blown off by my "Group Mentor" in Startup School last October,
which was a bit disappointing, but it's good to see it being discussed here
and maybe it will get some legs as time goes on.

I'll be releasing a simple app soon that, hopefully, demonstrates the
advantages in a way that's easy to digest. At this point I'm not expecting
much positive feedback though. It doesn't use any of the current trendy tech
and there's nothing truly new or whizbanging about it. It is, however, fast,
solid, easy to develop and modify, and runs on any server without having to
install anything other than a web server.

------
madez
This is brave to post here. The web is based on taking control away, and many
businesses are based on that. Here we are at a place of investors in
businesses.

------
0xCMP
In my spare time I've been working on something I currently call Distos
(Distributed Operating System).

My goal is to create a sort of log database merged with an app platform which
maintains an encrypted and authenticated log structure that powers functions
in developer code that update/manipulate local stateful resources. The apps on
a user's device get access to resources managed by the platform like KV stores
or Sqlite databases. The App uses these log messages, which are filtered and
provided by the platform, to update these stores locally and creates new logs
on the user facing clients in order to make things happen both locally and
remotely once the logs are synced.

I am convinced that "logs" are the future for personal data.
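
A rough sketch of the log-driven shape described above, under loud assumptions: the entry format (`seq`, `app`, `type`, `key`, `value`) and the function names are hypothetical, and the encryption and authentication of the log are left out entirely. It only shows a platform filtering a shared log per app and the app folding those entries into a local key-value store.

```javascript
// Platform side: hand each app only the log entries addressed to it.
function entriesFor(log, app) {
  return log.filter((entry) => entry.app === app);
}

// App side: fold one log entry into a local key-value store.
function applyEntry(store, entry) {
  switch (entry.type) {
    case 'put':
      store.set(entry.key, entry.value);
      break;
    case 'del':
      store.delete(entry.key);
      break;
    default:
      throw new Error('unknown entry type: ' + entry.type);
  }
  return store;
}

// Rebuild (or catch up) local state by replaying entries newer than `fromSeq`.
function replay(log, app, store = new Map(), fromSeq = 0) {
  for (const entry of entriesFor(log, app)) {
    if (entry.seq > fromSeq) applyEntry(store, entry);
  }
  return store;
}

const log = [
  { seq: 1, app: 'notes', type: 'put', key: 'n1', value: 'buy milk' },
  { seq: 2, app: 'todo', type: 'put', key: 't1', value: 'call mom' },
  { seq: 3, app: 'notes', type: 'put', key: 'n2', value: 'fix bike' },
  { seq: 4, app: 'notes', type: 'del', key: 'n1' },
];
const notes = replay(log, 'notes');
console.log([...notes]); // [ [ 'n2', 'fix bike' ] ]
```

Because state is derived purely from the log, syncing two devices reduces to exchanging the entries the other side has not seen yet and replaying them.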

~~~
refset
Take a look at Scuttlebutt [0] which works on very similar principles of
authenticated log replication.

Also note that Lotus Notes [1] has been doing all this replicated encrypted
app platform stuff since the early 90s.

I am working on a log-centric bitemporal database at JUXT which I think
intersects with this problem space as well, see my recent tweet relating to
the article [2].

[0]
[https://www.scuttlebutt.nz/applications](https://www.scuttlebutt.nz/applications)

[1]
[https://web.archive.org/web/20170228160130/www.kapor.com/blo...](https://web.archive.org/web/20170228160130/www.kapor.com/blog-post-1/)

[2]
[https://twitter.com/refset/status/1124311089943019521](https://twitter.com/refset/status/1124311089943019521)

~~~
0xCMP
Yes Scuttlebutt is a big inspiration among others. I believe the primary
change/improvement I'm making is that it runs on the device and is also
focused on the user's data instead of running on a server or doing distributed
social networking.

Reading [1] makes it seem to me like I might just be reinventing Lotus Notes.

~~~
ChristianBundy
> it runs on the device and is also focused on the user's data

Only a few of us run servers, most people just run Scuttlebutt on their device
so that they can have their data as well as their friends' data:

\- Windows, macOS, and Linux:
[https://github.com/ssbc/patchwork](https://github.com/ssbc/patchwork)

\- Android:
[https://gitlab.com/staltz/manyverse/](https://gitlab.com/staltz/manyverse/)

\- iOS: WIP

------
Funes-
If I had to bet on personal computers' and other devices' future, I would say
that some years down the line--not many--only a barebones system will run on
them so they can connect to an OS delivered through a cloud service. The
turning point will be Google Stadia's success.

As much as I would prefer local-first as well as offline-first approaches, I
reckon that the future will only have a place on the fringes for them.

~~~
zzzcpan
The problem is that to achieve acceptable quality of service for an online
cloud OS service, with all the responsive UIs and expected reliability,
everything has to be done locally-first either way, using approaches like
CRDTs. And even
more so for Google, because their level of service quality delivered over
public internet is nowhere near acceptable for an OS and will never be. But,
of course, they can still offer locally-first OS as a service, not giving
users any control.

------
gfiorav
I’ve worked in the industry of SaaS vs. Enterprise long enough to see Google
Cloud (with all its compute power) get turned down because it requires the
corporation to share its data.

~~~
hbbio
In my experience, most customers end up trusting at least one cloud vendor. If
not GCP (which does not have regions in France for example), they will trust
Azure or AWS (who do). If they are a big e-commerce company, they will hate
AWS but go to Azure, etc.

The reason is mostly the human cost of maintaining infrastructure, and the
global lack of good people who have the knowledge to do so.

Edit: Trusting hundreds of SaaS vendors vs. one major cloud platform is
another debate though.

~~~
gfiorav
Not in the financial industry AFAIK.

Saving costs is a major driver, I agree...

------
z3t4
For latency, it depends. A fiber connection to a data center in a nearby
city gives 0-2 ms latency. Using WiFi adds about 50-100 ms. Keyboard-to-computer
is 10-20 ms, computer-to-screen 10-20 ms, software rendering 0-2 ms. So if you
have a good Internet connection (and aren't using WiFi) you wouldn't really
notice whether the "app" was running on your computer or on a server. Try, for
example, ssh -X and start a GUI app/game on a server. History tends to
repeat itself; maybe in a few years we will mostly be using really lightweight
mobile devices connected to a powerful server in some noisy data center. I do
have a love-hate relationship with hardware though: running your own
computer is an order of magnitude cheaper than running one in the cloud. So
server hosting prices need to go down.
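
The arithmetic behind that claim can be sketched by adding up the ranges quoted above. One assumption is made here that the comment does not state: each network figure is counted once per keystroke-to-pixel path (it may be meant as a round trip or one way).

```javascript
// Latency ranges in milliseconds [min, max], copied from the figures above.
const ms = {
  keyboard: [10, 20],
  render: [0, 2],
  screen: [10, 20],
  fiber: [0, 2],
  wifi: [50, 100],
};

// Add ranges component-wise: minimums with minimums, maximums with maximums.
const sum = (...ranges) =>
  ranges.reduce(([lo, hi], [a, b]) => [lo + a, hi + b], [0, 0]);

const local = sum(ms.keyboard, ms.render, ms.screen);
const remoteWired = sum(local, ms.fiber);
const remoteWifi = sum(local, ms.fiber, ms.wifi);

console.log(local);       // [ 20, 42 ]
console.log(remoteWired); // [ 20, 44 ]
console.log(remoteWifi);  // [ 70, 144 ]
```

On these numbers the wired remote case is within 2 ms of the local one, while the quoted WiFi figure dominates everything else in the budget.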

~~~
thijsvandien
50ms for WiFi is some pretty horrible WiFi.

------
z0mbie42
Interesting, but I really would love to see honesty and the drawbacks of the
approach listed.

Saas software is not only an economic incentive, but also a UX win!

The user no longer needs to update his software, and you can deploy breaking
changes (or security fixes) in seconds (as opposed to weeks, waiting for all
users to download the new release).

Security: the user no longer needs to download random software from the
internet, and random malware on his computer will have a hard time accessing
his online data.

I'm sure there are many more advantages to the centralized model, but I
feel it's unfortunate that they are never brought to light.

~~~
boomlinde
Some flip sides:

 _> The user no longer needs to update his software, and you can deploy breaking
changes (or security fixes) in seconds (as opposed to weeks, waiting for all
users to download the new release)._

The user can also no longer choose when to update the software, and you can
deploy breaking changes in seconds. Meanwhile, on my computer, I can choose
which software to update and when I want to do it, and I'll do so as to not
impact my work flow when I don't have the time to adapt to the latest and
greatest.

 _> Security: the user no longer needs to download random software from the
internet, and random malware on his computer will have a hard time accessing
his online data._

On the other hand, malicious parties interested in many users' data now have
less work ahead of them. Some shitty engineering at LinkedIn and suddenly
millions of users have their data leaked.

It is true that centralizing the data means that a team of experts can manage
the security as opposed to an amateur like me, but it's been proven again and
again that it's unreasonable to expect data that you share with a centralized
third party to be secure and private.

------
Vordimous
Wonderful article, RSS needs to make a comeback. Especially among friends and
family who love to make large posts about important topics. I try to tell them
to build a blog and then just link to articles that they write. Your article
and others inspired me to finally just put together a system to make it easier
to start blogging. I just mimic a social media platform, but since everything
is committed to a repository using the JAMstack it could easily be converted
to a full website. Any feedback would be wonderful.
[https://your-media.netlify.com/post/make-your-own-media/](https://your-media.netlify.com/post/make-your-own-media/)

Everything is owned by the end user. This is only providing a recipe for
people to use.

I will also mention that
[https://www.stackbit.com/](https://www.stackbit.com/) is doing basically the
same thing but more from a “Make life easier for Website designers”
perspective.

------
foxhop
My gripe really isn't with SaaS. My gripe is less about "app" software. My
gripe is with printers, scanners, smart phones, IoT devices, TVs, Cricuts,
which only work by sending all your data into the cloud...

If you can send my data into the cloud you should also give me the ability to
easily mock your cloud API so I can also send the data somewhere else...

------
defanor
I found it rather confusing that "local-first" is defined here roughly as
"real-time collaboration software that doesn't rely on a central server". But
with this definition it's close to saying "CRDTs can be useful for their
purpose".

The examples (MS Office, Trello, Dropbox, etc) also seemed strange to me: I'd
think that neither an average MS Office user would care about privacy, data
ownership, etc, nor an average nerdy user who cares about those would want to
use something like MS Office or Trello. Then there's plenty of easier to solve
and related issues that aren't yet solved (e.g., plain offline usage of some
software, more widespread asynchronous collaboration), and the article talking
about privacy and data ownership ends with "We welcome your thoughts,
questions, or critique: @inkandswitch or [email protected]". Looks like a nice
summary, but maybe a bit otherworldly.

~~~
paxys
The average MS Office user works for a company that most definitely cares
about all those things.

------
agentultra
This is already a thing and there are lots of developers putting effort into
the `dat://` and `ipfs://` formats and protocols. Persistent naming in
content-addressable networks that can be trusted is presently being tackled...
and how to structure apps on these protocols...

I'd be down for working on projects in this space. I'm presently contributing
some work into the Lean theorem prover where I'm hoping, with a bit of elbow
grease, it will be fairly low-cost and attractive to build out more p2p
protocols and libraries that meet our privacy and security demands.

------
yingw787
I think this may be a great idea for indie software developers, like Timing or
Standard Notes, in order to expand a particular offering. SaaS will probably
rule for a lot of enterprise software, but there are always niches to be
filled and some of them don’t make sense pricing wise to do a subscription. I
like this!

------
pjkundert
The Holo / Holochain project is building this, and is planning to deploy this
year, at scale.

------
epaga
This is excellent and gets me excited - many of those "ideal" principles were
important to me when I designed and wrote my own iOS app Mindscope, basically
a Workflowy-meets-Scapple app for visualizing your thoughts hierarchically.
[https://itunes.apple.com/us/app/mindscope-thought-organizer/...](https://itunes.apple.com/us/app/mindscope-thought-organizer/id901513028?mt=8)

It's an app I wrote primarily for myself, but it's been great to hear from
lots of people who really "got" the vision themselves and use it a lot.

I simply love apps & sites that make immediacy and the feeling of "control"
core values of the UI. Wish I had more time to give Mindscope more development
love than I've been able to lately...working on that.

------
nojvek
I have to say, there are some brilliant ideas presented here. Obviously I
don’t know much about the details of CRDTs but I'm going to dig deeper into them.

------
jbverschoor
I actually like what manager.io does

------
jasonkester
This is a lot of good work and thinking put in to a technical solution to
something that’s not a technical problem.

The reason that software is online is a business one, not a technical one.

Software as a Service is impossible to pirate and generates continuous income
rather than a single upfront fee. That’s all you really need to know to
understand why there is less and less desktop software coming to the market
these days.

So yeah, sure, if you were to build a piece of desktop software from a clean
sheet of paper today, this is a really good guide on how to do that. But
nobody is going to. Because it makes no business sense to do so.

~~~
archagon
_Most_ people won't make use of this kind of research. However, CRDTs aren't
just another way to architect the same kind of software. They are an inversion
of the tropes and techniques we've zealously stuck to over the last decade,
and they grant us brand new technical capabilities that _no_ SAAS player will
ever be able to offer:

• Offline-first support with real-time collaboration

• Real-time collaboration with local devices over Bluetooth/ad-hoc Wi-Fi

• End-to-end encrypted real-time collaboration without the server having
access to any of your content

• Transport-agnostic sync: use Dropbox, iCloud, and Bluetooth all at the same
time with no consistency issues

• The ability to switch to a different cloud provider with zero friction, and
to grab your (mergeable, collaborative, and versioned) documents from your
current cloud provider without conversion to any intermediary format

• Anxiety-free sync: the user can be 100% confident that their changes will
never fail to merge, even if they spent a month in the bush editing their
documents

These are off the top of my head, but there are many, many others. And they
are _features_. If enough people build software using these tools, people will
get used to them and start seeing the big players as annoying and clunky.
("Why can't I make changes to my spreadsheet when I go through a tunnel? Why
did I lose all the changes I've been working on over the last hour? What do
you mean this .doc file is just a link to a webpage?")

Is there Big Money in it? I don't know (or care), but I'm going to try hard to
make sure that any software I write on the side follows these principles, and
I hope others start to do the same.

You could have easily said that "the reason that software is online is a
business one" about time sharing versus personal computing, and yet here we
are. Focus on the user instead of your bottom line and you will (eventually)
win.

~~~
gritzko
Can't say it better.

I welcome your thoughts on swarmdb
[http://github.com/gritzko/ron-cxx](http://github.com/gritzko/ron-cxx)

That is syncable RocksDB with CRDTs inside. Pre-alpha.

------
marknadal
TLDR:

Local-first software is powered by CRDTs.

If you want to learn more about CRDTs, check out:

\-
[https://github.com/automerge/automerge](https://github.com/automerge/automerge)
(author's project, legit)

\-
[https://www.youtube.com/watch?v=yCcWpzY8dIA](https://www.youtube.com/watch?v=yCcWpzY8dIA)
(deep technical talk)

\-
[https://gun.eco/distributed/matters.html](https://gun.eco/distributed/matters.html)
(my Cartoon Explainer)

~~~
josephg
OT also works fine for this sort of stuff. OT algorithms are easier to
implement ([1] for an implementation I wrote of OT over arbitrary JSON
structures). OT just requires a central source of truth / a central authority.
For local first stuff depending on how you design it you can have one of those
- in the form of your server.

[1] [https://github.com/josephg/json1](https://github.com/josephg/json1)

~~~
zzzcpan
OT is inferior to CRDTs in every single way. In 2019 people shouldn't even be
looking at OT.

~~~
josephg
I disagree. I think OT systems are way simpler to reason about, because
they’re just an extension of event sourcing. Also, CRDT implementations
have been trailing OT algorithms in terms of features forever. OT got JSON
editing support first, and JSON1 (the OT algorithm I linked above) also
supports arbitrary tree reparenting, which as far as I know is missing from
all CRDT algorithms. That’s needed to implement apps like workflowy, where you
can drag trees around.

CRDT algorithms have documents which grow without bound. With OT, the
documents are always minimal and it’s easy to reason about (and implement)
trimming operations.

CRDTs are a better tool for distributed applications, but for server client
stuff OT works fine.
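
For readers who haven't seen OT, a minimal sketch of the core idea, assuming plain-text documents and insert-only operations. This is not the json1 algorithm, and `applyInsert` / `transformInsert` are hypothetical names. Two clients edit concurrently; the central server picks an order, and each op is transformed against the op that got applied before it.

```javascript
// An op inserts `text` at character offset `pos`.
function applyInsert(doc, op) {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

// Rewrite `op` so it can be applied after `other` has already been applied.
// `otherFirst` breaks ties when both ops insert at the same position: it is
// true when the server ordered `other` before `op`.
function transformInsert(op, other, otherFirst) {
  if (other.pos < op.pos || (other.pos === op.pos && otherFirst)) {
    return { ...op, pos: op.pos + other.text.length };
  }
  return op;
}

// Both clients start from 'xy' and insert at the same position.
const a = { pos: 1, text: 'A' }; // the server orders a first
const b = { pos: 1, text: 'B' };

// Replica 1 saw a, then receives b.
const replica1 = applyInsert(applyInsert('xy', a), transformInsert(b, a, true));
// Replica 2 saw b, then receives a.
const replica2 = applyInsert(applyInsert('xy', b), transformInsert(a, b, false));

console.log(replica1, replica2); // xABy xABy
```

The tie-break is where the central authority comes in: someone has to decide which of two same-position inserts goes first, and in a client/server setup that someone is the server.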

~~~
marknadal
:) I got arbitrary tree reparenting and mutable (and immutable!) state working
in our CRDT, for about 4 years now. No unbounded growth!

Runs in production, having done 1TB p2p data in a day, on $99 hardware!
Internet Archive and others use it.

I do agree tho, most CRDT implementations have just as many scaling and
compaction problems as any append-only log system.

OT is worthwhile to understand.

~~~
josephg
Oh cool! GitHub link?

~~~
marknadal
[https://github.com/amark/gun](https://github.com/amark/gun)

var a = {b: 1, c: {d: 2}, e: 3, f: {g: 4}};

a.c.z = a;

Every object has its own UUID, which makes circular references & sub-objects
easy to reference.

a.c = (UUID pointer to c)

a.c.z = (UUID pointer to a)

a.f = (UUID pointer to f)

Now we can

(a.f.z = a.c) && (a.c = null)

c doesn't actually move, just the pointers on `a` and on f.

c now has a new parent (or could have 2 parents, since any graph is allowed).

Since everything is represented on disk/wire as a flat graph, all updates can
be well defined as operations on `UUID.property = primitive/pointer` atomic
changes in the CRDT.
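
A minimal sketch of the flat-graph representation described above. It is not GUN's actual storage format or API: the `{ ref: id }` pointer shape and the `put` / `get` helpers are hypothetical, and the short ids `a`, `c`, `f` stand in for UUIDs.

```javascript
// Flat graph: every node lives at the top level under its own id, and a
// nested object is stored as a pointer { ref: id } rather than inline.
const graph = {
  a: { b: 1, c: { ref: 'c' }, e: 3, f: { ref: 'f' } },
  c: { d: 2, z: { ref: 'a' } }, // circular reference back to a
  f: { g: 4 },
};

// The only kind of write: set one property of one node to a primitive
// or a pointer.
function put(graph, id, prop, value) {
  graph[id] = { ...(graph[id] || {}), [prop]: value };
}

// Follow a property path such as ['f', 'z', 'd'] starting from node `id`.
// Pointers are dereferenced along the way; primitives are returned as-is.
function get(graph, id, path) {
  let value = { ref: id };
  for (const prop of path) {
    value = graph[value.ref][prop];
  }
  return value !== null && typeof value === 'object' ? graph[value.ref] : value;
}

console.log(get(graph, 'a', ['c', 'z', 'b'])); // 1 (went around the cycle)

// Reparent c from a to f: two pointer writes, c itself is untouched.
put(graph, 'f', 'z', { ref: 'c' });
put(graph, 'a', 'c', null);

console.log(get(graph, 'a', ['f', 'z', 'd'])); // 2
console.log(get(graph, 'a', ['c']));           // null
```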

This means there are only 7 operations to commute: (I hate switch statements,
but someone helping me formalize the CRDT wanted it expressed this way)

[https://jsbin.com/hedeqoxusa/edit?js,console](https://jsbin.com/hedeqoxusa/edit?js,console)
(click run to see each operation applied)

You see order-of-operation doesn't matter.

Once merge has happened, history does not need to be preserved (no unbounded
growth!) but it can if you want log/history.
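
To illustrate why order of operations need not matter and why history can be dropped after a merge, here is a sketch of a per-property last-write-wins merge. This is a generic technique with an assumed timestamp `ts` and an arbitrary deterministic tie-break; it is not claimed to be GUN's conflict-resolution algorithm, and `mergeWrite` / `mergeAll` are hypothetical names.

```javascript
// State maps "id.prop" to the winning { value, ts }. Only the winner is
// kept, so nothing accumulates once a write has been merged.
function mergeWrite(state, write) {
  const key = write.id + '.' + write.prop;
  const current = state[key];
  const wins =
    !current ||
    write.ts > current.ts ||
    // Equal timestamps: break the tie deterministically on the encoded value.
    (write.ts === current.ts &&
      JSON.stringify(write.value) > JSON.stringify(current.value));
  return wins ? { ...state, [key]: { value: write.value, ts: write.ts } } : state;
}

const mergeAll = (writes) => writes.reduce(mergeWrite, {});

const writes = [
  { id: 'a', prop: 'c', value: { ref: 'c' }, ts: 1 },
  { id: 'f', prop: 'z', value: { ref: 'c' }, ts: 2 },
  { id: 'a', prop: 'c', value: null, ts: 2 },
  { id: 'a', prop: 'e', value: 3, ts: 5 },
  { id: 'a', prop: 'e', value: 4, ts: 5 }, // same timestamp as the line above
];

const forward = mergeAll(writes);
const backward = mergeAll([...writes].reverse());

console.log(forward['a.c'].value, backward['a.c'].value); // null null
console.log(forward['a.e'].value, backward['a.e'].value); // 4 4
```

Both replicas end up with the same winner for every property even though they saw the writes in opposite orders.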

Merged states can be stored as a flat graph on disk with a radix tree which
allows for O(1) lookups on UUID+property pairs regardless of graph size.

There are some caveats though, of course:

Strongly Eventually Consistent, so don't use for banking.

Counter operations still grow-only but can be done in 12 lines on top (see
[https://gun.eco/docs/Counter](https://gun.eco/docs/Counter) ). Rich text also
grow-only.
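
For reference, the textbook grow-only counter (G-Counter) looks roughly like the sketch below. This is the generic CRDT, not the linked GUN counter example, and the function and replica names are hypothetical.

```javascript
// Each replica only ever increments its own slot.
function increment(counter, replicaId, by = 1) {
  return { ...counter, [replicaId]: (counter[replicaId] || 0) + by };
}

// Merge takes the per-replica maximum, so merging is commutative,
// associative and idempotent.
function merge(x, y) {
  const out = { ...x };
  for (const [id, n] of Object.entries(y)) {
    out[id] = Math.max(out[id] || 0, n);
  }
  return out;
}

// The counter's value is the sum over all replicas.
const value = (counter) => Object.values(counter).reduce((s, n) => s + n, 0);

let phone = increment({}, 'phone');         // phone: 1
let laptop = increment({}, 'laptop', 2);    // laptop: 2
phone = increment(merge(phone, laptop), 'phone'); // phone: 2, laptop: 2

console.log(value(merge(phone, laptop))); // 4
console.log(value(merge(laptop, phone))); // 4
```

The state grows with the number of replicas rather than the number of increments, which is why it counts as "grow-only" but stays small.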

Happy to expand on anything else, too!

~~~
canadaduane
Keep up the awesome work! I've been watching gun on the sidelines, and look
forward to its eventual domination ;)

------
harryking
Very nice initiative! It would help users a lot in saving their valuable data.

