
Ask HN: How can a web app operate on user-held data? - fouc
How can I reduce the need for gathering data from the user and storing it on
the backend?

For example, let's say an app that I'm building is based around birthday/gift
reminders, and depends on users adding important dates of their important
people.

It would be nice to increase trust by not actually collecting the information,
but leaving it on the user side instead. Is this possible? Is it worth it to
the user to bother?

Perhaps if the data is user-side, that could require an algorithm that runs
user-side. However, why would anyone want to run untrusted algorithms locally?

Just interested in hearing any ideas out there.
======
hluska
You could pull this off quite easily with your browser's localStorage. Besides
a few solvable technical problems, your biggest (most potentially app killing)
problem will be at the user level.

This will primarily be a communications problem because very few users will
understand localStorage. People will lose data and constantly enter data on
the 'wrong' device/browser. They will tell you that your app is broken because
they spent three hours entering all this data on their sister's computer and
now she's five hours away where the data is of no use to them. At some point,
you will receive a technical support request from someone who is reasonably
informed that applauds your commitment to security, but asks you to build
"some kind of database" so that user can use the data on all their devices.

As far as running algorithms on the client, honestly, modern browsers are
pretty safe. If you run something complicated, be sure to test it out on a
range of machines with both healthy and badly bogged down browsers.
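
For the birthday reminder case, the client-side algorithm can be tiny. Here is a minimal sketch in TypeScript, assuming a hypothetical `Birthday` record and made-up function names (`daysUntilNext`, `upcoming`); nothing in it talks to a server.

```typescript
// Hypothetical record shape for the reminder example.
interface Birthday {
  name: string;
  month: number; // 1-12
  day: number;   // 1-31
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days from `today` (the user's local calendar date) until the next
// occurrence of the birthday; 0 means it is today.
// The calendar dates are converted to UTC midnights before subtracting, so
// daylight-saving changes cannot skew the count.
// Note: a Feb 29 birthday rolls over to Mar 1 in non-leap years.
function daysUntilNext(b: Birthday, today: Date): number {
  const year = today.getFullYear();
  const todayUtc = Date.UTC(year, today.getMonth(), today.getDate());
  let next = Date.UTC(year, b.month - 1, b.day);
  if (next < todayUtc) next = Date.UTC(year + 1, b.month - 1, b.day);
  return (next - todayUtc) / MS_PER_DAY;
}

// Birthdays falling within the next `withinDays` days, soonest first.
function upcoming(all: Birthday[], today: Date, withinDays: number) {
  return all
    .map((b) => ({ ...b, daysAway: daysUntilNext(b, today) }))
    .filter((r) => r.daysAway <= withinDays)
    .sort((a, b) => a.daysAway - b.daysAway);
}
```

Calling `upcoming(list, new Date(), 7)` on page load would be enough to drive a "this week" view. Actual notifications while the page is closed are a separate problem this sketch does not address.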

~~~
fouc
Right, so the localStorage data needs to be portable. But I guess that would
require a trusted intermediary / proxy of some sort.

~~~
hluska
If you could figure out a way to explain that to users, you could be onto
something big. I think there are enough people who care about privacy, but not
enough to take a major usability hit.

------
danieka
The problem with storing data locally is that users expect data to sync when
using a web app. You could solve the syncing by storing the user's data on the
server, but encrypted so that only the user can read it. The simplest scheme
would be to let users enter a passphrase. This passphrase is used to encrypt
all data sent to the server and to decrypt data received from the server. The
user never shares the passphrase with the server, so the server can never read
the encrypted data. ProtonMail uses roughly this kind of scheme, but with
public-private keys.
[https://protonmail.com/security-details](https://protonmail.com/security-details)
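
To make the passphrase idea concrete, here is a rough sketch using the Web Crypto API (available in browsers and recent Node versions). The choice of PBKDF2 plus AES-GCM, the iteration count, the blob layout, and the function names are all assumptions for illustration; this is not how ProtonMail does it and it has not been reviewed.

```typescript
const PBKDF2_ITERATIONS = 600_000; // assumed work factor; tune for your users' devices

// What the server would store: all fields are base64 text, none reveal the data.
interface EncryptedBlob {
  salt: string;       // random per blob, needed to re-derive the key
  iv: string;         // AES-GCM nonce, must not repeat for the same key
  ciphertext: string;
}

function toBase64(bytes: Uint8Array): string {
  let binary = "";
  bytes.forEach((b) => { binary += String.fromCharCode(b); });
  return btoa(binary);
}

function fromBase64(text: string) {
  return Uint8Array.from(atob(text), (c) => c.charCodeAt(0));
}

// Stretch the passphrase into a 256-bit AES key. The passphrase never leaves the client.
async function deriveKey(passphrase: string, salt: BufferSource): Promise<CryptoKey> {
  const material = await crypto.subtle.importKey(
    "raw", new TextEncoder().encode(passphrase), "PBKDF2", false, ["deriveKey"],
  );
  return crypto.subtle.deriveKey(
    { name: "PBKDF2", salt, iterations: PBKDF2_ITERATIONS, hash: "SHA-256" },
    material,
    { name: "AES-GCM", length: 256 },
    false,
    ["encrypt", "decrypt"],
  );
}

async function encryptForServer(passphrase: string, plaintext: string): Promise<EncryptedBlob> {
  const salt = crypto.getRandomValues(new Uint8Array(16));
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const key = await deriveKey(passphrase, salt);
  const ciphertext = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv }, key, new TextEncoder().encode(plaintext),
  );
  return {
    salt: toBase64(salt),
    iv: toBase64(iv),
    ciphertext: toBase64(new Uint8Array(ciphertext)),
  };
}

// Rejects (throws) if the passphrase is wrong or the blob was tampered with.
async function decryptFromServer(passphrase: string, blob: EncryptedBlob): Promise<string> {
  const key = await deriveKey(passphrase, fromBase64(blob.salt));
  const plaintext = await crypto.subtle.decrypt(
    { name: "AES-GCM", iv: fromBase64(blob.iv) }, key, fromBase64(blob.ciphertext),
  );
  return new TextDecoder().decode(plaintext);
}
```

The app would upload the result of `encryptForServer(passphrase, JSON.stringify(dates))` and run `decryptFromServer` after download. A forgotten passphrase means the data is gone, which is the usability cost of the server not being able to read it.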

I think this solution could fit you, but crypto can be difficult to get right
and the devil is in the details. And for heaven’s sake don’t use my comment as
a starting point for your implementation.

------
nostrademons
LocalStorage or IndexedDB on the client-side, with JavaScript. The "untrusted"
aspect is not a significant concern, since both browser JS and these APIs are
sandboxed and can't escape a single file associated with the website. CPU
usage and disk storage can be a concern, but most non-abusive sites won't be
anywhere close to hitting levels where the user cares.
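
As a minimal sketch of the localStorage route for the birthday app: the record shape, the storage key, and the function names below are made up for illustration. The storage object is passed in so the same code runs against `window.localStorage` in a browser or an in-memory stand-in elsewhere.

```typescript
// Hypothetical data shape for the birthday-reminder example.
interface ImportantDate {
  name: string;
  month: number; // 1-12
  day: number;   // 1-31
}

// The subset of the browser's Storage interface this sketch relies on.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const STORAGE_KEY = "important-dates"; // assumed key name

// Read all saved dates; a missing or corrupted entry is treated as empty.
function loadDates(store: KeyValueStore): ImportantDate[] {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as ImportantDate[]) : [];
  } catch {
    return [];
  }
}

// Append one date and write the whole list back as JSON.
function addDate(store: KeyValueStore, entry: ImportantDate): ImportantDate[] {
  const dates = [...loadDates(store), entry];
  store.setItem(STORAGE_KEY, JSON.stringify(dates));
  return dates;
}

// In-memory stand-in so the sketch also runs outside a browser (e.g. in tests).
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}
```

In the browser this would be `addDate(window.localStorage, { name: "Sam", month: 3, day: 14 })`. Everything stays in that one browser profile, which is exactly the portability problem raised elsewhere in the thread.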

The biggest problem is that _a lot_ of the algorithms that made the web
successful can't be run on just one person's data. There's a reason why
webapps replaced desktop apps, and that reason isn't _just_ that you only need
a single UI with no installation and aren't vulnerable to all the Windows
worms that were going around in the early 2000s. If you try to compete with
leading webapps you quickly realize the extent to which collecting all that
data actually helps them build better products.

~~~
imauld
> If you try to compete with leading webapps you quickly realize the extent to
> which collecting all that data actually helps them build better products.

I have an extremely difficult time believing this.

I have worked on platforms that use data from millions of users and probably
billions of interactions to create features that no one uses. It would seem
that even with all of the data that Amazon has been collecting from all of us,
the best they can do is show you ads for the exact thing you just bought.

Windows was running in nearly every home long before they were collecting data
about everything you did (some people would probably argue Windows has gotten
worse since then). Most of my adult life online has been collected and
cataloged and machine learned and statistically regressed by now and wielding
all this data ads companies have managed to show me ads I've clicked on almost
dozens of times.

Sure having more data can be helpful but I don't think it's a requirement. I
also think that people can sometimes fool themselves into thinking that
without collecting every scrap from each user and without millions of users to
collect from that we can't build great software.

~~~
nostrademons
I used to work on Google Search. I now work on my own startup, which uses
ElasticSearch. In many cases, they use the exact same algorithms and data
structures, or can easily be fine-tuned to be equivalent. Nevertheless, stock
ElasticSearch results are _much, much worse_ , because you start with zero
data. To say nothing of not having Suggest, Refinements, universal results,
good spell-correction, disambiguation, etc.

I also use DDG when I'm in Tor Browser, and it always strikes me how much
worse the results are when I try to do technical searches there.

I've still got my music collection in MP3 form on my hard disk, and it's only
an iTunes click away. Nevertheless, when I listen to music, it's all on
YouTube now, because YouTube can actually recommend stuff that a.) I like b.)
is of a similar genre and c.) I haven't heard before.

I rely on GMail's spam filtering for E-mail to be useful; when I've tried to
go back to Yahoo Mail or (heaven forbid) running my own server, the result is
just unusable.

You're welcome to try founding a startup that operates as a desktop app and
collects zero data from users. With improvements in developer tools and
payment systems, most of the ideas from the late 1990s that required millions
in venture capital financing could be done by a single motivated dev in a few
months. Good luck getting users to adopt your software, though, regardless of
how privacy-conscious they say they are.

~~~
imauld
Search and recommendation services are definitely an example of things that do
actually benefit greatly from more data. If you're building something like
that then you will almost definitely benefit from more data. However, it might
still be possible to generate music recommendations for a user based only on
their usage patterns in their local music player. I'm not an expert in that
area but I imagine it could be done.
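
A deliberately naive sketch of what "local only" recommendations could look like, assuming the player keeps a per-track play count. The `LocalTrack` shape and the ranking rule (unplayed tracks from the most-played genres first) are invented for illustration and are nowhere near what a real recommender does.

```typescript
// Hypothetical shape of one entry in a local music library.
interface LocalTrack {
  title: string;
  genre: string;
  playCount: number;
}

// Suggest up to `limit` never-played tracks, preferring genres the user plays most.
function recommendUnplayed(library: LocalTrack[], limit: number): LocalTrack[] {
  // Total plays per genre, computed only from this user's own history.
  const playsByGenre = new Map<string, number>();
  for (const t of library) {
    playsByGenre.set(t.genre, (playsByGenre.get(t.genre) ?? 0) + t.playCount);
  }
  return library
    .filter((t) => t.playCount === 0)
    .sort((a, b) => (playsByGenre.get(b.genre) ?? 0) - (playsByGenre.get(a.genre) ?? 0))
    .slice(0, limit);
}
```

The obvious limit is that it can only surface tracks already on the disk; discovering music the user does not own still needs data from somewhere else.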

> Good luck getting users to adopt your software, though, regardless of how
> privacy-conscious they say they are.

It would probably be difficult mostly because we have trained users that an
app isn't useful unless you can sync it to all your devices. We are always
going on about how much better things are once you can connect them to
everything else on the planet. We've sold users on the idea that they need to
be connected all the time and we need to know everything that they do so we
can "improve their experience" which for most things means "show you ads."

I'm not knocking webapps in general. I make them all the time. Some of my
favorite things are webapps. I'm writing this comment on a webapp. I just also
believe that not everything needs to be a webapp and have a constant
connection to some server telling it about everything I'm doing.

------
thriqon
RemoteStorage set out to provide exactly this:
[https://remotestorage.io/](https://remotestorage.io/)

Basically, it provides you with a simple API to store data in a location your
user provides (and trusts). This might be in their local network, but could be
Dropbox as well.

Haven't checked it out for a while, though...

------
mindcrash
As documented here
([http://offlinefirst.org/sync/](http://offlinefirst.org/sync/)) there are two
relatively easy options to achieve this:

1) Using Firebase, which is closed source (and owned by Google)

2) Combining CouchDB with PouchDB (which can be considered "CouchDB for the
browser"), which are both open source.

Both allow saving data in the browser, and both can be configured to sync data
saved in the browser with a server backend when needed.

------
kevinsimper
I have been thinking about the same thing. My idea was to leverage Google
Drive or Dropbox for the user to store the data, so that they would own it
all. It seems weirdly complicated, though: you would have to support multiple
providers, and you would either not earn any money or still have to store
user data.

------
jlizzle30
This is the premise of Blockstack
[https://blockstack.org](https://blockstack.org). It allows users to select
their own datastore. I haven't used it so I can't say how good it is but
probably worth a look.

------
citruspi
This is actually a topic that has interested me for a while. A couple months
ago I wrote a program which was intended to be run locally on users' machines
and interacted with via an embedded JSON API and web app. I realized I could
(theoretically) point a domain, e.g. app.myapp.com, to 127.0.0.1 and then
instruct the user to

1\. install program

2\. browse to app.myapp.com to use it

That led me to start thinking about applying this to a "hybrid" application
which is hosted but which stores user data locally using a "sidecar" app. (I
was thinking about larger, more complex web applications where I might not
want to store 100% of user data in e.g. local storage or cookies) You might
have the following DNS records:

1\. app.myapp.com -> <hosted app>

2\. api.myapp.com -> <hosted api>

3\. user-data.myapp.com -> 127.0.0.1

Create a basic application which runs an HTTP API for retrieving/updating
locally-stored user data. When a user signs up, let them know that their data
is stored locally and they'll need to install an application.

When your application needs to interact with constantly updating data (e.g.
weather, finance, etc.), make requests to api.myapp.com for that data. When it
needs to interact with user data, it makes requests to user-data.myapp.com
which resolves to the user's local machine.

Obviously there's a number of things which aren't great (or that I haven't
fully thought out):

\- If the local API runs on port 80, it would require that the user isn't
running anything else on that port. Also, the application would need
privileges to bind to 80. If it isn't running on port 80 (preferable), how do
we pick a port which avoids collisions with ports used by other applications
and how do we enable the user to communicate the port their local API is
running on to the web app?

\- Securing user data from malicious apps - you'd want to make sure that only
your application (and the user) can access the user's data. The same-origin
policy would "prevent" any web application from talking to the local API, so
you'd need to enable CORS for whichever domain is hosting your web app. CORS
would help with this situation, but I don't think it would 100% solve it -
there's still vulnerabilities like DNS rebinding that you'd have to consider.

\- Securing user data from data loss - if e.g. a user's disk died or their
laptop was stolen and they didn't maintain their own backups, they'd lose
their data

\- Sync between devices - e.g. if a user signs up for a service which stores
their user data locally on their computer via something like this, how would
it work if they opened the app on their iPhone?

\- "Moving fast without breaking things" \- if you make changes to the hosted
and local apps which break compatibility with the current versions and
requires the user to update, how do we force the user to upgrade their local
app without it becoming a nuisance? This could be solved with an app which
maintains backwards compatibility and slowly deprecates APIs, allowing users
months between requiring updates to the local application. You could also just
write an incredibly generic local API which doesn't require updates often.

Some of these are problems which existed before but which users didn't need
to think about because developers managed them - e.g. securing user data from
data loss. Others are introduced specifically because of this architecture -
e.g. syncing user data, increasing complexity, etc.
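
To make the local "sidecar" API and the CORS / DNS rebinding points a bit more concrete, here is a sketch of the request handling as a pure function. The port, the `/data/<key>` path scheme, the in-memory `Map`, and all names are assumptions for illustration. Checking the `Host` header is a commonly used mitigation for DNS rebinding, but I would not claim this is a complete defense.

```typescript
// Assumed values: the hosted app's origin and the local API's host:port.
const ALLOWED_ORIGIN = "https://app.myapp.com";
const EXPECTED_HOST = "user-data.myapp.com:8765"; // port is hypothetical

interface LocalRequest {
  method: string;
  path: string;
  headers: Record<string, string | undefined>; // lower-cased header names
  body?: string;
}

interface LocalResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

const userData = new Map<string, string>(); // stand-in for data kept on the user's disk

function handleLocalRequest(req: LocalRequest): LocalResponse {
  // DNS rebinding: a page on an attacker's domain rebound to 127.0.0.1 still
  // sends the attacker's name in Host, so anything unexpected is refused.
  if (req.headers["host"] !== EXPECTED_HOST) {
    return { status: 403, headers: {}, body: "unexpected Host header" };
  }
  // Only the hosted app may talk to this API. Requests without an Origin
  // header (e.g. typing the URL into the address bar) are refused as well.
  if (req.headers["origin"] !== ALLOWED_ORIGIN) {
    return { status: 403, headers: {}, body: "origin not allowed" };
  }
  const cors = { "Access-Control-Allow-Origin": ALLOWED_ORIGIN, "Vary": "Origin" };
  if (req.method === "OPTIONS") {
    // CORS preflight sent by the browser before a cross-origin PUT.
    return {
      status: 204,
      headers: {
        ...cors,
        "Access-Control-Allow-Methods": "GET, PUT",
        "Access-Control-Allow-Headers": "Content-Type",
      },
      body: "",
    };
  }
  if (!req.path.startsWith("/data/")) {
    return { status: 404, headers: cors, body: "" };
  }
  const key = req.path.slice("/data/".length);
  if (req.method === "GET") {
    const value = userData.get(key);
    return value === undefined
      ? { status: 404, headers: cors, body: "" }
      : { status: 200, headers: cors, body: value };
  }
  if (req.method === "PUT") {
    userData.set(key, req.body ?? "");
    return { status: 204, headers: cors, body: "" };
  }
  return { status: 405, headers: cors, body: "" };
}
```

Wiring this into an actual HTTP server bound to 127.0.0.1 (for example with Node's `http.createServer`, which already lower-cases header names) is left out to keep the sketch short. It does nothing for the backup, sync, or upgrade problems listed above.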

For the average user who may not care _that_ much about the privacy of their
data, some of these trade-offs and increased complexity might not make sense.

If anyone has any further thoughts about this approach or why it's a terrible
idea, I'd love to chat more! Email in my profile.

