In my opinion the one thing that makes it hard for Sandstorm to grow is the requirement that apps have to be specifically modified to work on the platform, and it never reached a critical mass of developers building Sandstorm support into their apps. So it's essentially a chicken-and-egg problem. I'm not sure what the solution to that is.
I would love to read a pseudo-postmortem (I refuse to believe this heralds the death of sandstorm) by @kentonv about what he thinks the main challenges of getting to a truly decentralized world are, how sandstorm plays in, and where we go from here.
This is true, though of course without that requirement, Sandstorm wouldn't be able to do what it does.
More broadly, I think the challenge we have today is that we've settled into a local maximum way of doing things that we call "Cloud Architecture" and "Software as a Service". The industry has spent many billions of dollars exploring and optimizing this approach. Even open source developers are designing their little apps using hyper-scalable techniques because they think that's what they're "supposed" to do -- never mind that such architecture is actively hostile towards small-scale self-hosting.
I think the Sandstorm approach -- which is essentially "distribute apps to people's personal servers the same way you distribute apps to phones today" -- would be vastly preferable (both to users and developers), if it had the same level of development and investment.
We can't just invest $100B upfront to build the new world, so instead we need to find an incremental strategy to get there. That's the hard part. Sandstorm tried to harness the investment already being made by "indie" / open source developers. We got a long way with not very much money! But the indie motif didn't exactly give us enterprise credibility, or any other way to sustain the company.
My new hope is that "mainstream" cloud infrastructure will push towards being incrementally more and more decentralized on a technical level, because of the technical advantages that brings... If your software is designed to run as a million little servers rather than one huge one, then a company like Cloudflare (my current employer) can go deploy it to literally hundreds of locations around the world for huge savings in latency and long-haul bandwidth, better reliability, etc. Then once apps are designed that way, maybe, just maybe, it'll be easier to flip control of code execution and data away from the vendor towards the consumer? But obviously there's a long way to go to get there.
I can't say thank you enough for what you have done. None of this was a failure.
> This needs some sort of WRT54G kinda moment, synology or ???, you needed to have a last mile partner or a host (vpn in to services)
While I love the idea of everyone running their own hardware, I've been thinking recently that physical separateness might be the least important aspect of decentralization. I think alignment of incentives is more important. What I mean is that having your apps/data on google's servers isn't necessarily a problem, but being locked into their platform and unable to easily change is. When it's simple and convenient for customers to change providers, companies are incentivized to maintain high quality products and features such as privacy guarantees.
I was following Sandstorm closely, and although I'm making different choices than Sandstorm did, I instantly recognized your project as tackling the same domain.
I think your thinking and problem solving are in the right place; we just need a couple more iterations, more research, and more effort to get there.
I will try to give it a shot too. I've been working on this for a couple of years on top of the Chromium codebase. The design was not quite right in the beginning and needed a couple more iterations and more thought, but in its current incarnation I think I might have something to show.
I hope I can count on your comments and criticism when I launch it, which I believe is only a couple of months from now, as I have great respect for your work and your vision.
You helped move the needle forward in a domain where we desperately need a solution, or else all of our data and freedoms are at stake (not to mention that it could even be better technology than the current cloud-centric corporate vision, which is designed to maximize corporate profit and data centralization, and in which we don't actually own anything anymore).
So thanks for all of your work; it was really inspiring.
Can you give an example of what you're talking about here?
SaaS providers basically have vertical monopolies here and I think that hurts consumers.
- For reasonable performance and reliability, code needs to run close to data. You can't be accessing all data remotely over the internet.
- Requirements to use "standard" data formats prevent apps from innovating new features not covered by the standards.
- Different applications call for different kinds of databases, and there are still lots of new ideas to explore in database technology. It would suck if all apps were forced to use the lowest-common-denominator database protocol supported by user storage.
- There are tons of classes of data that work rather well with streaming modes of computation, especially video, music, and images, which make up most of the average user's data. It can even work for bigger data. I recently had my whole genome sequenced. They're sending me the data on a 500GB hard drive, which I'll upload to the cloud and use tools like iobio (which I work on) to analyze it remotely. Even in cases where you can't use sampling, often the actual computation is as much of a bottleneck as the network throughput is.
- I'm thinking more along the lines of a true hard drive analog, i.e. the apps can store whatever arbitrary data they want, including binary blobs and special config formats. The service provides a simple API for storing/retrieving files, notifying on updates, and handling auth/permissions.
- Again there are a huge class of problems that work just fine with flat filesystems. I think databases are heavily overused, especially in cases where only 1-100 people need to use the instance. You could run into issues with large collections of music or photos, but I don't know anyone who keeps more than a few hundred photos in any one directory.
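To make the "hard drive analog" idea concrete, here is a minimal sketch of what such a storage service could look like. Everything in it is hypothetical: `UserStorage`, `grant`, `put`, `get`, and `watch` are made-up names for illustration, not an existing product's API, and permissions are reduced to simple path prefixes.

```python
import os
import posixpath
from collections import defaultdict


class UserStorage:
    """Hypothetical user-owned file store: put/get, update callbacks, per-app grants."""

    def __init__(self, root):
        self.root = root
        self.grants = defaultdict(set)      # app name -> path prefixes it may touch
        self.watchers = defaultdict(list)   # normalized path -> callbacks to run on update

    def grant(self, app, prefix):
        """Record that the user allowed `app` to touch paths under `prefix`."""
        self.grants[app].add(prefix)

    def _resolve(self, app, path):
        # Collapse "a/../b" first so a grant on "notes/" cannot be bypassed.
        norm = posixpath.normpath(path)
        if not any(norm.startswith(prefix) for prefix in self.grants[app]):
            raise PermissionError(f"{app} may not access {path}")
        return norm, os.path.join(self.root, norm)

    def put(self, app, path, data: bytes):
        norm, full = self._resolve(app, path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as f:
            f.write(data)                   # arbitrary bytes: blobs, configs, anything
        for callback in self.watchers[norm]:
            callback(norm)                  # the "notify on updates" part

    def get(self, app, path) -> bytes:
        _, full = self._resolve(app, path)
        with open(full, "rb") as f:
            return f.read()

    def watch(self, app, path, callback):
        norm, _ = self._resolve(app, path)
        self.watchers[norm].append(callback)
```

An app that was granted `"notes/"` can `put` and `get` anything under that prefix and register a `watch` callback, while any other app (or any path outside the prefix) gets a `PermissionError`.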
However, the flat filesystem model would be awful for most productivity apps. Think GMail, Google Docs, Calendar, Slack, Trello, GitHub, Jira, etc. These are the use cases I care about.
Video, music, and images might be the bulk of most users data by raw volume but certainly not by frequency of access or importance.
I'm not aware of any providers that offer the ease of google drive and the access of S3 (ie can I host a website on it). As far as I know both GDrive and Dropbox removed the ability to host files on the web in the last couple years. Do you know of any comparable products that offer both of those? I'd be very interested in looking at them. I think S3 is the closest, because you can access the filesystem through their browser APIs, and publicly over HTTP. And it got popular enough for people to make open server implementations. The problem with S3 at the moment is that it wasn't designed for this use case, so it's missing features like Firebase-style update events.
> However, the flat filesystem model would be awful for most productivity apps. Think GMail, Google Docs, Calendar, Slack, Trello, GitHub, Jira, etc. These are the use cases I care about.
GMail, for sure. GDocs, I can't think of why you couldn't implement it for a reasonable number of users with nothing but text files using atomic updates, or maybe come up with a standard protocol for invoking text-based CRDTs. Or just send the change updates as git diffs and if someone tries to change a line that would result in a conflict, reject that change and send back the current state of the document. This would be sufficient for any app for which the data can be expressed as text files. Worst case you could even allow something like CF Workers for cases where you absolutely need centralized logic, but I think there are a large number of tasks you can accomplish without that. Even something like Slack I think you could implement with text files on a disk. Pre-SSDs I'd have to agree with you, but nowadays I think a generic filesystem is good enough for most things that don't involve scale.
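A rough sketch of the "reject the conflicting change and send back the current state" idea, using a single text file with atomic replacement. The `TextDoc` and `Conflict` names are invented for this example, and it assumes one server process, edits that replace whole lines, and text without embedded newlines. It is not a CRDT and not how Google Docs works.

```python
import os
import tempfile


class Conflict(Exception):
    """Raised when an edit touches a line changed after the editor's base version."""

    def __init__(self, version, lines):
        super().__init__("stale edit")
        self.version = version   # current document version
        self.lines = lines       # current document contents, sent back to the editor


class TextDoc:
    """A document stored as one text file, with per-line change tracking in memory."""

    def __init__(self, path, initial_lines):
        self.path = path
        self.version = 0
        self.changed_at = [0] * len(initial_lines)  # version at which each line last changed
        self._write(initial_lines)

    def read(self):
        with open(self.path, encoding="utf-8") as f:
            return self.version, f.read().splitlines()

    def _write(self, lines):
        # Write a temp file in the same directory, then rename it over the original.
        # Readers see either the old file or the new one, never a half-written one.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write("".join(line + "\n" for line in lines))
        os.replace(tmp, self.path)

    def replace_line(self, base_version, index, text):
        """Apply an edit made against `base_version`, or raise Conflict."""
        version, lines = self.read()
        if self.changed_at[index] > base_version:
            raise Conflict(version, lines)
        lines[index] = text
        self._write(lines)
        self.version += 1
        self.changed_at[index] = self.version
        return self.version
```

Two people editing different lines from the same base version both succeed; a second edit to a line someone else already changed raises `Conflict` carrying the current text, which the client can use to refresh.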
> Video, music, and images might be the bulk of most users data by raw volume but certainly not by frequency of access or importance.
Can you talk about how this informs your work on Cloudflare Workers? Should I be able to take my workers and run them elsewhere, and on software that's not second-rate, as you put it? Should all the proprietary code behind Cloudflare Workers be something I can pick up and take to another host?
I'm not entirely sure yet how that would work but it's something I like to think about.
Your applications and executables are also part of the data and move together with them.
But sandstorm is something else. It composes well. You can have an application that is a good document editor and another one that offers a spreadsheet that works for you, and have a unified layer that handles identity, authorization, document management (collections, sharing).
Short slogans such as "liberate your data" or "self host your apps" focus on the end goal without highlighting what is really stopping us from doing it: the integrated, cohesive experience many of us expect/need, especially in enterprise environments.
I recently went through an acquisition where we had to switch from gsuite to office 365. Oh my. What a mess.
In the ideal world we would have had our stuff in sandstorm grains, and after the merger our new colleagues would have access to them even if we wouldn't necessarily have picked the same spreadsheet app for example.
Now, for those of us for whom Word doesn't work well and who were happy with Google Docs, well, we don't have a choice: we cannot possibly give a gsuite account to every employee in the larger company, and thus we have to migrate so as not to preclude potential collaboration.
I think the most important piece of tech that Sandstorm was working on was the capability based security and the powerbox concepts (which I recognized as being similar to Android's capability apis but for the web). I don't see the decentralized data or local servers being an easy sell any time soon, but I can see the capabilities security working really well within the browser, and more importantly with existing SaaS ecosystems.
If we could get a new browser API for websites to offer to register available APIs for jsonschema based type defs through a manifest file, then users could allow those APIs to be included in their local browser registry. Other pages could request (or offer) data from those APIs, and with the user's permission, the browser could do the necessary oauth handshake and use the API, passing data to/from another service in a secure way behind the scenes.
As well since it would be a browser API, a webpage looking to call an API would be able to request provider info and show the appropriate in-app UI for selecting a service, which sounds like one of the problems Sandstorm had.
No one would have to worry about whether they implement a dropbox api or onedrive api, the page would just have to show the 'file service' selector and call the user selected api with a file. So I'd expect every SaaS out there would jump at the chance to provide an easy access way for users to use and pay for their service.
Furthermore, the browser could provide external connectors as well so I could have some native external app that registers a provider with the browser, allowing me to send my in browser data to an external native app (or vice versa). For example right click menu on a .drawio file and send to DrawIO or send to the currently opened Google Slide. Or send a contact from your fav web contact manager to your native Skype app. Seriously I can think of so many uses for this! And it fits in perfectly with the existing file or clipboard apis as well.
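To illustrate the shape of the idea, here is a toy model of such a browser-side registry. No browser ships this API; `Provider`, `BrowserRegistry`, and the `"file-service"` capability name are all hypothetical. The `handler` stands in for the OAuth handshake plus the real HTTP call, and the schema check only looks at a JSON-Schema-style `required` list rather than doing full validation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Provider:
    name: str                        # site or native app that published a manifest
    capability: str                  # e.g. "file-service"
    schema: dict                     # JSON-Schema-style description of the payload
    handler: Callable[[dict], str]   # placeholder for OAuth + the actual API call


class BrowserRegistry:
    """Hypothetical per-user registry kept by the browser."""

    def __init__(self):
        self.providers = {}  # capability -> {provider name: Provider}

    def register(self, provider, user_approved):
        # Nothing enters the registry without the user's explicit permission.
        if user_approved:
            self.providers.setdefault(provider.capability, {})[provider.name] = provider

    def providers_for(self, capability):
        """What a page would use to draw its in-app 'pick a service' selector."""
        return sorted(self.providers.get(capability, {}))

    def call(self, capability, chosen_name, payload):
        provider = self.providers[capability][chosen_name]
        missing = [key for key in provider.schema.get("required", []) if key not in payload]
        if missing:
            raise ValueError(f"payload is missing required fields: {missing}")
        return provider.handler(payload)
```

The calling page never needs to know whether the chosen provider speaks a Dropbox-like or OneDrive-like API. It asks for `"file-service"`, shows the names returned by `providers_for`, and passes a payload to `call`.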
I hope someone sees this excited rant and can make it happen! :)
I also really do hope the work that's been done in Sandstorm continues on and pushes things further for all of us. Thanks!
It stalled for many many reasons, including:
- XML is kind of awful
- Over-general solution without clear advantages for a specific use case
- Mis-aligned incentives
- easy to abuse open query capabilities
- Hard to use for both data publishers and consumers
I think that JSON Schema with incremental enhancement via JSON-LD is a more promising tech stack for another try. That would let you take advantage of the massive investments in the current API ecosystem while carrying forward the best parts of the old Semantic Web Effort.
Of course the incentives are still hard to sort out. Commercial entities WANT lock-in. They will Embrace/Extend/Extinguish anything they can because they need a moat to make money. Honestly anti-trust regulations might be needed.
Ya this looks like it could've definitely fit the bill. Too bad it never took off. And worse yet, both major browsers tried their own version and neither took off.
This just seems so easy an idea to get on board with, like VSCode solving the n-to-1 problem with their language server api. I don't understand why it fizzled out.
But I would have liked to use an architecture like this in a SaaS application. We may wish that applications were self-hosted, but SaaS is what people are used to, and there's a lot of money in it. An architecture for SaaS applications based on many small, stateful units, like Sandstorm grains, would be a refreshing alternative to the current norm of trying to make everything stateless and web-scale. And I think it's something you could have sold. I remember proposing something like this to you on an HN comment thread a few years ago, but maybe I didn't articulate it well.
The reason I wonder if you now see the statefulness of grains as a mistake is that Cloudflare Workers are going even further in the other direction. Whereas Sandstorm grains can be long-running, Cloudflare Workers are very short-lived. Of course, I realize that you aren't explicitly developing Cloudflare Workers as the successor to Sandstorm; it's a new product with its own requirements. Still, I wonder if your thinking on that part of Sandstorm has changed.
Cloudflare Workers is a long, long way from done. Stay tuned... ;)
Can you expand on this? The way I see things, the more of my stack that's stateless the better. That goes equally at the application level and at the distributed system level. There are so many advantages to designing things that way that it's hard to imagine going back to the stateful "bad old days".
So I'm curious to read a treatise about this alternative small, stateful units model you describe.
But these are problems that only exist because you're trying to create a mega-scale centralized application.
In Sandstorm's granular model, you don't have mega-scale applications. You have many distributed small instances. The programming model ends up being much more like desktop or mobile apps. In those environments, state has never been that big a deal. You don't need a database that can store and index petabytes of data; you can use something simple like sqlite or maybe even flat files. You don't need to think about machine failure, backups, etc.; that is the OS and/or device owner's problem. (Meanwhile, small-scale instances lead to clear data ownership which means that the owner can take responsibility for backups, and a distributed OS can potentially migrate grains around as needed to work around machine failures, transparently to the app.)
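As a small illustration of that desktop-style programming model, the sketch below gives every instance its own SQLite file. The directory layout, the `cards` table, and the function names are assumptions made for this example; this is not Sandstorm's actual storage format.

```python
import os
import sqlite3


def open_grain(grains_dir, grain_id):
    """Open (creating if needed) the private SQLite file for one small instance."""
    os.makedirs(grains_dir, exist_ok=True)
    db = sqlite3.connect(os.path.join(grains_dir, f"{grain_id}.sqlite"))
    db.execute(
        "CREATE TABLE IF NOT EXISTS cards (id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
    )
    return db


def add_card(db, title):
    with db:  # commits on success, rolls back if the block raises
        return db.execute("INSERT INTO cards (title) VALUES (?)", (title,)).lastrowid


def list_cards(db):
    return [row[0] for row in db.execute("SELECT title FROM cards ORDER BY id")]
```

Each instance's data lives in a single ordinary file, so there is no shared multi-tenant schema to design, and whoever owns the machine can back up or move one instance by copying one file.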
Another big problem with the stateless approach to big web apps is that off-the-shelf databases are often a terrible fit for specific use cases, but typically changing the database to suit your needs is not an option. Databases tend to be particularly bad at real-time updates, e.g. like in Google Docs where you can see other people typing. To implement Google Docs reasonably, you need a stateful server; you need for all users of the same document to land on the same server instance so that coordination and routing can occur in-memory. That's incredibly difficult to achieve in the stateless cloud architecture orthodoxy, but incredibly easy to achieve in Sandstorm's model.
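A toy version of that "everyone editing the same document lands on the same instance" pattern might look like the following. `Router` and `DocumentSession` are illustrative names only, edits are plain string inserts rather than a real operational-transform or CRDT scheme, and the callbacks stand in for open WebSocket connections.

```python
class DocumentSession:
    """State for one document, shared in memory by everyone editing it."""

    def __init__(self):
        self.text = ""
        self.listeners = {}  # user -> callback(author, full_text)

    def join(self, user, callback):
        self.listeners[user] = callback
        callback(None, self.text)  # bring the newcomer up to date

    def leave(self, user):
        self.listeners.pop(user, None)

    def insert(self, author, position, chars):
        self.text = self.text[:position] + chars + self.text[position:]
        # No database round trip: the other editors are reachable from this object.
        for user, callback in self.listeners.items():
            if user != author:
                callback(author, self.text)


class Router:
    """Sends every request for a document id to that document's single session."""

    def __init__(self):
        self.sessions = {}

    def session_for(self, doc_id):
        if doc_id not in self.sessions:
            self.sessions[doc_id] = DocumentSession()
        return self.sessions[doc_id]
```

The hard part in a stateless architecture is guaranteeing that `session_for` returns the same live object for every user of a document. When each document is its own small server instance, that guarantee comes from the platform's routing.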
Exactly, that's what I was getting at. I'm sure Google has a lot of internal infrastructure and know-how for doing this. But outside of giants like Google and Microsoft, AFAIK, tools and techniques for developing stateful web applications that are reliable and scalable don't seem to be widely known. And the orthodoxy, as you aptly put it, is so strong that I'd venture to say that most of us don't even consider the stateful alternative. So for those of us who do think about it, it seems like the unsafe choice that we're better off avoiding. I hope that changes.
At least, I remember benchmarks for nodejs a few years ago that showed a single instance handling a million concurrent, long-lived connections - on a single thread!
So if you're not aiming for Google-scale load, the field still seems interesting to explore.
We'll probably leverage cloud, edge, personal servers, mobile and any other device in the long term to move services to the user.
I share a lot of your ideas and philosophy. Shame it didn't come to fruition with sandstorm.
See https://qbix.com/QBUX/whitepaper.html for the economics of it
Can you provide some information about why this is? Are you talking about vendor lock-in with certain cloud providers? Because Kubernetes, for instance, provides a layer that lets you move applications around (even locally with Minikube) with no fuss. So I suspect I'm misunderstanding what you're referring to.
2) Kubernetes and other popular cloud infrastructure are designed to scale up to massive traffic, but they are not designed to scale down. For a personal server, you want to be able to install hundreds of apps on a single, modest machine, and only have apps consuming resources when the user is actively using them.
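The "scale down" requirement can be sketched as a supervisor that starts an app on its first request and stops it after a period of inactivity. `GrainSupervisor` and its methods are hypothetical, and the sketch only tracks bookkeeping: a real implementation would actually spawn and kill sandboxed processes where this one updates a dictionary.

```python
import time


class GrainSupervisor:
    """Tracks which installed apps are running; idle ones are shut down."""

    def __init__(self, idle_timeout, clock=time.monotonic):
        self.idle_timeout = idle_timeout  # seconds without requests before shutdown
        self.clock = clock                # injectable so the behavior can be tested
        self.running = {}                 # app id -> time of its most recent request

    def handle_request(self, app_id):
        cold = app_id not in self.running  # a real system would spawn the process here
        self.running[app_id] = self.clock()
        return "cold start" if cold else "already running"

    def reap_idle(self):
        """Stop every app that has been idle too long and return their ids."""
        now = self.clock()
        idle = [app for app, last in self.running.items()
                if now - last >= self.idle_timeout]
        for app in idle:
            del self.running[app]          # a real system would kill the process here
        return idle
```

With hundreds of apps installed, only the handful with recent requests hold memory, at the price of a cold start when an idle app is opened again.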
> My new hope is that "mainstream" cloud infrastructure will push towards being incrementally more and more decentralized on a technical level
That sounds pretty close to what I'm doing with my current project. Start with a centralized platform that is instanced per-user under the hood, provide tangible benefits vs the big guys, and eventually open source it for self-hosting. That's the dream anyway.
It depends on what you need to do. What's your use case? My primary constraint is that I need web browser compatibility, ie websockets currently. If capnproto had a solid browser JS implementation I would say use it hands down if it has the features you need. There's some work being done to shoehorn gRPC into the browser, but it seems very complicated to me (requires a proxy server IIRC). If you need a robust browser solution today, take a look at RSocket. I think the reason rsocket isn't talked about more is that it comes across as very "enterprisey", but on a technical level it looks very good to me. You may also be interested in my minimalist approach, omnistreams. It basically adds backpressure and multiplexing on top of websockets. We're running that in production at iobio.io, but the API isn't stable yet (would love external input).
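For readers unfamiliar with the terms, here is a generic sketch of multiplexing plus credit-based backpressure over a single message channel such as a WebSocket. It is not the omnistreams or RSocket wire protocol; `MuxSender`, the `(stream_id, chunk)` frame shape, and the credit scheme are assumptions chosen to show the general technique.

```python
from collections import deque


class MuxSender:
    """Sends many logical streams over one `send(frame)` function.

    A stream may only transmit as many chunks as the receiver has granted
    credit for, so a slow consumer on one stream cannot be flooded.
    """

    def __init__(self, send):
        self.send = send     # e.g. a WebSocket's send method; here any callable
        self.credit = {}     # stream id -> chunks the receiver can still accept
        self.pending = {}    # stream id -> chunks waiting for credit

    def open(self, stream_id):
        self.credit[stream_id] = 0
        self.pending[stream_id] = deque()

    def write(self, stream_id, chunk):
        self.pending[stream_id].append(chunk)
        self._flush(stream_id)

    def on_credit(self, stream_id, count):
        """Called when the receiver reports it can take `count` more chunks."""
        self.credit[stream_id] += count
        self._flush(stream_id)

    def _flush(self, stream_id):
        queue = self.pending[stream_id]
        while queue and self.credit[stream_id] > 0:
            self.credit[stream_id] -= 1
            self.send((stream_id, queue.popleft()))  # tag each frame with its stream
```

Each stream has its own queue and credit counter, so a stalled download on one stream does not block chat messages on another even though both share the same connection.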
One big caveat of capnproto vs gRPC is that capnproto doesn't really have built-in "stream" mechanics, ie the concept of making a request that you expect to return an unbounded list of elements. I've mentioned this before and kentonv explained a way it could be mimicked though.
Sandstorm is an amazing project, and the people who built it are way better programmers than me. But I'm betting that we need to go even deeper and think about why the maintenance burden for software is so high.
This is a very long-term project, and it's still early days. But none of the short-cuts seem to pan out. It makes me sad, and it stiffens my resolve.
That's not correct.
Self-hosted sandstorm is completely auto-updating. You don't have to do anything. I still have to push updates periodically, but I intend to keep doing that. After I push an update, all Sandstorm servers update themselves automatically within 24 hours.
Updating Oasis was, ironically, much more work for me than pushing an update to self-hosted sandstorm. Oasis is built on a much more complicated cluster architecture designed to be scalable -- scale that, sadly, we never actually needed.
>Oasis is built on a much more complicated cluster architecture designed to be scalable
You could dog food Oasis on top of Sandstorm. For example, Oasis could be a Sandstorm app and use Sandstorm services. In this way, you could feel very closely where and when Sandstorm is lacking. And once you fix or invent those parts, the results would be available to all of your customers, and not exclusively to Oasis. This would shift the laborious spot from Oasis to Sandstorm, and would allow you to open up the enterprise gates.
The second observation concerns the sales model. You sold a "decentralization idealism" as you coined it, but with an optional central place like Oasis. In my opinion, this offering represents a natural conflict and is not destined to work in a sustainable way. Why not sell a good old license for advanced/enterprisey features? Oasis would be a great add-on to that model for those who wanted to rent the resources. And yes, exclude the free tier from Oasis to make things fair.
There's a fair bit of history you may not be aware of. For instance, the payment system used to be part of Blackrock, but then was moved over to Sandstorm when it was open sourced so you didn't need to run Blackrock to sell subscriptions to a Sandstorm server. And there used to be paid-only features, but when Sandstorm-the-company ran out of money, they were made free as well.
That's not gonna work. Sandstorm is explicitly not designed to be big-cloud infrastructure, it's designed to host small-scale apps. Oasis is intended to be a big Sandstorm server that utilizes a cluster of machines. It therefore needs to sit on top of big-cloud infrastructure, not other Sandstorm servers.
> You sold a "decentralization idealism" as you coined it
That was teleclimber's phrase, not mine.
> In my opinion, this offering represents a natural conflict and is not destined to work in a sustainable way.
I agree, if I were doing it over I don't think I'd build Oasis. I'd focus instead on federation and migration features, and partnerships with hosting providers and device manufacturers to make setting up a private Sandstorm server as trivial as possible.
> Why not sell a good old license for advanced/enterprisey features?
We tried that: https://sandstorm.io/news/2016-08-31-sandstorm-for-work-read...
IIRC our total sales of Sandstorm for Work could be counted on one hand.
I directly communicated to them that we are ready to shell out something like $2,000 per license.
The answer by Kenton surprised me. He told me they were aiming to get the consumer market first, which is a bit odd, given the server is where the money is.
Enterprises of all kinds already have their hierarchies of customers. Moreover, they are in a constant need of a simple but highly efficient server platform (see Kubernetes of a future). It would be a much easier sell to them.
> While Sandstorm was popular on Hacker News, that popularity never really converted into paying users.
Thanks as always in either case. Hoping it somehow gets new life down the line after this!
Meanwhile, with a self-hosted server, you get better security (compared to Oasis) due to the fact that only you and people you authorize can install apps on that server. For even more security you can of course put it on a private network or behind Cloudflare Access for extra defense-in-depth.
Note: I don't rely on Sandstorm for email yet, because Sandstorm doesn't have good email support currently... but I'd like to build that support, because I'm getting really uncomfortable with gmail.
 https://www.cloudflare.com/products/cloudflare-access/ -- Disclosure: I work for Cloudflare.
My 0.02: I'd interview your target market to find out if this is a pain point they recognize. Many of these projects are started out of decentralization idealism (for good reason, and it's not a bad thing), but as a business you do not want to be left having to educate your potential buyers about why they need this.
Plenty of enterprises already need to self-host services, for compliance reasons (ITAR, FISMA, HIPAA, FINRA, GDPR, national data locality laws in Germany, Russia, China, South Korea, and others), or sometimes even just paranoia. The need is very much there.
But, like, we literally didn't know where to start. Pick up a phone and just, like, call people? Apparently that's how sales works but I'm sure as hell not the person to do it. I hate it when people call me! How could I call someone? So we didn't call anyone, and mostly hoped that fans on Hacker News would go sell Sandstorm to their IT departments for us. That was dumb and didn't work.
Sorry, my comment should have been more clear. I wasn't referring to Sandstorm or you in my comment and I didn't mean to accuse you of such silliness. It was more a reaction to the poster above who said they were targeting "small business" without specifying which kind of small business and what they are solving for them.
Anyways, yes, there are many reasons for businesses (even small ones) to self-host. Other reasons include the ability to customize (you can't customize a SaaS, unless you're a gigantic customer), and longevity (SaaS products come and go, and switching costs can be high).
Something where you could install an app and bingo, it instantly worked on all endpoints around the globe.
Or where you install a "yet another CDN provider" component (app) once, and it instantly and automatically covers all the installed apps.
Kubernetes came close to this, but it requires a significant amount of expertise, and honestly it is only a matter of time until it gets replaced by something better.
In short, what we (businesses) want is "Click and serve". Sandstorm was so close and yet did not make the expected steps in that direction. This makes me sad, but this is life.
I wish this were true, because a lot of code in Kubernetes is garbage and their entire architecture of constant reconciliation is quite wasteful, but at this point it has essentially won by being supported by vendors. It'll be a long time before something can really push it away. Which sucks, because nobody wants it (operational complexity) or can run it outside an enterprise environment, at least not in a way where you can expect a majority of helm charts to work (insanely high compute requirements for the control/instrumentation plane, plus the requirement for loadbalancer and PVC provisioner implementations).
I really want a unified "let's not think about servers" model for using compute and storage but kubernetes is not it for consumers.
What's truly unfortunate is that Sandstorm was massively ahead of its time: a lot more people are willing to understand the need for Sandstorm today than they were just a few years ago, when Sandstorm was an active development project. I feel like if Sandstorm had launched in 2019, it would have enjoyed much wider support than it got in 2014, when people weren't ready for it.
> Alternatively, you can let us run the server for you: Use Sandstorm Oasis
They still have this remark on the install page.
For me it entirely replaced Google Docs (using Etherpad and EtherCalc) and Trello (using Wekan) in particular. I'm also super dependent on it for my RSS reader with Tiny Tiny RSS. Everything on it you could host yourself individually, but it's easier and more secure via Sandstorm.
"""Sandstorm Oasis is hosted by the Sandstorm team. Sandstorm is open source; you can host it on your own server."""
Dates are hard. Use a library. Always.