
I need to be able to take with me not just my data, but my software (apps)

Can you give an example of what you're talking about here?



I mean that if I think Google Sheets is the best online spreadsheet software, but I don't want to store my data on Google's servers, I should be able to run Google Sheets on some other servers. I shouldn't be forced to use some other, worse software if I choose different physical hosting. These should be independent decisions.

SaaS providers basically have vertical monopolies here and I think that hurts consumers.


What would you think of a model where instead of having the data and apps coupled at all, they're handled as separate problems? So for example you could pay one provider (or self host) to store your data, then use standard protocols to access the data (like HTTP for simple things, and maybe something more advanced and akin to Firebase for more complicated apps). So it doesn't matter where the apps are hosted. Each app can even be served from a different domain. You just point them at your "hard-drive on the web".
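To make that concrete, here's a rough sketch (in TypeScript) of what an app might look like from that angle. Everything in it is hypothetical; the point is just that the app only speaks plain HTTP to whatever base URL the user configured, so any compliant host would work:

    // Hypothetical sketch: the app never hard-codes a storage host. The user
    // supplies the base URL of their own "hard-drive on the web" plus a
    // credential issued by that storage provider, and the app talks plain HTTP.
    interface UserStorage {
      baseUrl: string; // e.g. "https://storage.example.com/alice" (user's choice)
      token: string;   // credential from the storage provider, not from the app
    }

    async function loadDoc(storage: UserStorage, path: string): Promise<string> {
      const res = await fetch(`${storage.baseUrl}/${path}`, {
        headers: { Authorization: `Bearer ${storage.token}` },
      });
      if (!res.ok) throw new Error(`GET ${path} failed: ${res.status}`);
      return res.text();
    }

    async function saveDoc(storage: UserStorage, path: string, body: string): Promise<void> {
      const res = await fetch(`${storage.baseUrl}/${path}`, {
        method: "PUT",
        headers: { Authorization: `Bearer ${storage.token}` },
        body,
      });
      if (!res.ok) throw new Error(`PUT ${path} failed: ${res.status}`);
    }

The app itself could then be served from any domain; all it needs at startup is that baseUrl and token from the user.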


Sorry, but that's exactly the model that I think doesn't work at all.

- For reasonable performance and reliability, code needs to run close to data. You can't be accessing all data remotely over the internet.

- Requirements to use "standard" data formats prevent apps from innovating new features not covered by the standards.

- Different applications call for different kinds of databases, and there are still lots of new ideas to explore in database technology. It would suck if all apps were forced to use the lowest-common-denominator database protocol supported by user storage.


I disagree, but maybe we're talking about different kinds of computation.

- There are tons of classes of data that work rather well with streaming modes of computation, especially video, music, and images, which make up most of the average user's data. It can even work for bigger data. I recently had my whole genome sequenced. They're sending me the data on a 500GB hard drive, which I'll upload to the cloud and analyze remotely with tools like iobio[0] (which I work on). Even in cases where you can't use sampling, often the actual computation is as much of a bottleneck as the network throughput is.

- I'm thinking more along the lines of a true hard drive analog, ie the apps can store whatever arbitrary data they want, including binary blobs and special config formats. The service provides a simple API for storing/retrieving files, notifying on updates, and handling auth/permissions (rough sketch after the footnote below).

- Again, there's a huge class of problems that work just fine with flat filesystems. I think databases are heavily overused, especially in cases where only 1-100 people need to use the instance. You could run into issues with large collections of music or photos, but I don't know anyone who keeps more than a few hundred photos in any one directory.

[0] http://iobio.io/
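To clarify what I mean by a "hard drive analog" API, here's a rough TypeScript interface. None of this is a real product's API; it's just the minimal surface I'm picturing: files in/out, change notifications so apps don't have to poll, and permissions granted by the user rather than by the app:

    // Hypothetical interface for a user-controlled "web hard drive".
    interface WebDrive {
      // Simple file storage/retrieval; paths are chosen by the app.
      get(path: string): Promise<Uint8Array | null>;
      put(path: string, data: Uint8Array): Promise<void>;
      delete(path: string): Promise<void>;
      list(prefix: string): Promise<string[]>;

      // Firebase-style change notifications; returns an unsubscribe function.
      watch(prefix: string, onChange: (path: string) => void): () => void;

      // The user grants an app (identified by origin) access to a subtree.
      grant(origin: string, prefix: string, access: "read" | "read-write"): Promise<void>;
    }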


Honestly, video, music, and images are the least-interesting use case, because they are large blobs of static data. Yeah you can put those in a flat filesystem just fine. We even already have open standards for this and a healthy ecosystem of file hosting providers and file-consuming applications that integrate with them.

However, the flat filesystem model would be awful for most productivity apps. Think GMail, Google Docs, Calendar, Slack, Trello, GitHub, Jira, etc. These are the use cases I care about.

Video, music, and images might be the bulk of most users' data by raw volume, but certainly not by frequency of access or importance.


> We even already have open standards for this and a healthy ecosystem of file hosting providers and file-consuming applications that integrate with them.

I'm not aware of any providers that offer the ease of Google Drive and the access of S3 (ie can I host a website on it). As far as I know both GDrive and Dropbox removed the ability to host files on the web in the last couple of years. Do you know of any comparable products that offer both of those? I'd be very interested in looking at them. I think S3 is the closest, because you can access the filesystem through their browser APIs, and publicly over HTTP. And it got popular enough for people to make open server implementations. The problem with S3 at the moment is that it wasn't designed for this use case, so it's missing features like Firebase-style update events.
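For what it's worth, since the S3 API has open server implementations (MinIO and others), you can already point the standard client at a server you run yourself. A rough sketch; the endpoint, bucket, and credentials are obviously placeholders:

    import { S3Client, PutObjectCommand, GetObjectCommand } from "@aws-sdk/client-s3";

    // Sketch only: aim the regular S3 client at a self-hosted, S3-compatible server.
    const s3 = new S3Client({
      endpoint: "https://files.example.com", // your own S3-compatible server
      region: "us-east-1",
      forcePathStyle: true,                  // commonly needed for non-AWS servers
      credentials: { accessKeyId: "...", secretAccessKey: "..." },
    });

    await s3.send(new PutObjectCommand({
      Bucket: "my-drive",
      Key: "notes/todo.txt",
      Body: "hello",
    }));

    const obj = await s3.send(new GetObjectCommand({ Bucket: "my-drive", Key: "notes/todo.txt" }));
    console.log(await obj.Body?.transformToString());

What it still doesn't give you is the update-event piece, which is the gap I was pointing at.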

> However, the flat filesystem model would be awful for most productivity apps. Think GMail, Google Docs, Calendar, Slack, Trello, GitHub, Jira, etc. These are the use cases I care about.

GMail, for sure. GDocs, I can't think of why you couldn't implement it for a reasonable number of users with nothing but text files using atomic updates, or maybe come up with a standard protocol for invoking text-based CRDTs. Or just send the change updates as git diffs, and if someone tries to change a line that would result in a conflict, reject that change and send back the current state of the document. This would be sufficient for any app for which the data can be expressed as text files. Worst case you could even allow something like CF Workers for cases where you absolutely need centralized logic, but I think there are a large number of tasks you can accomplish without that. Even something like Slack I think you could implement with text files on a disk. Pre-SSDs I'd have to agree with you, but nowadays I think a generic filesystem is good enough for most things that don't involve scale.
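As a rough illustration of the reject-on-conflict idea, plain HTTP already has the primitive: conditional PUTs with ETags. Assuming the storage host honors If-Match (many servers do; I'm hand-waving the server side), a client could look something like this:

    // Sketch: optimistic concurrency over plain HTTP. The document is just a
    // text file on the user's storage; a writer sends the ETag it last saw and
    // the server rejects the write with 412 if someone else changed the file.
    async function readDoc(url: string): Promise<{ text: string; etag: string }> {
      const res = await fetch(url);
      if (!res.ok) throw new Error(`GET failed: ${res.status}`);
      return { text: await res.text(), etag: res.headers.get("ETag") ?? "" };
    }

    // Returns true if the write landed, false on conflict (caller re-reads,
    // re-applies or merges its change, and retries).
    async function writeDoc(url: string, text: string, lastSeenEtag: string): Promise<boolean> {
      const res = await fetch(url, {
        method: "PUT",
        headers: { "If-Match": lastSeenEtag },
        body: text,
      });
      if (res.status === 412) return false; // precondition failed: someone wrote first
      if (!res.ok) throw new Error(`PUT failed: ${res.status}`);
      return true;
    }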

> Video, music, and images might be the bulk of most users data by raw volume but certainly not by frequency of access or importance.

True


I see. Thanks for the explanation.

Can you talk about how this informs your work on Cloudflare Workers? Should I be able to take my workers and run them elsewhere, and on software that's not second-rate, as you put it? Should all the proprietary code behind Cloudflare Workers be something I can pick up and take to another host?


I think it would be pretty cool if apps written on Workers could run on whatever servers the end user of that app chooses, rather than what the app developer chooses.

I'm not entirely sure yet how that would work but it's something I like to think about.


Not the parent, but I understand this as the paradigm that containers and VMs use, for instance.

Your applications and executables are also part of the data and move together with it.


Yes, you can already install an application in a VM or container (or just instantiate one of the many ready-to-use images out there; shameless plug: https://bitnami.com).

But Sandstorm is something else. It composes well. You can have an application that is a good document editor and another one that offers a spreadsheet that works for you, and have a unified layer that handles identity, authorization, and document management (collections, sharing).

Short slogans like "liberate your data" or "self-host your apps" focus on the end goal without highlighting what is really stopping us from doing it: the integrated, cohesive experience many of us expect or need, especially in enterprise environments.

I recently went through an acquisition where we had to switch from G Suite to Office 365. Oh my. What a mess.

In an ideal world we would have had our stuff in Sandstorm grains, and after the merger our new colleagues would have had access to them, even if we wouldn't necessarily have picked the same spreadsheet app, for example.

Now, for those of us for whom Word doesn't work well and who were happy with Google Docs, well, we don't have a choice: we can't possibly give a G Suite account to every employee in the larger company, and thus we have to migrate in order not to preclude potential collaboration.



