
"OS" seems a little presumptuous. As does the majority of the "how it works" page. Making wildly ridiculous claims that amount to "microservices are silly" (my paraphrase) when really they've only managed to successfully sandbox applications. Much like Android with its "a user for every app by default."

> Sandstorm takes a wildly different approach [than containerized services]: It containerizes objects, where an object is primarily defined by data

Or... You've containerized the data access service and enforced a per-application sandbox. Plenty of precedent there.

> Microservices are seemingly designed to match the developer’s mental model of their system. Often, services are literally divided by the team who maintains them.

Good Lord, I sure hope that's not true. Maybe if the non-technical manager designed the system.

> In the Sandstorm model, all of the components (from web server to database) needed to handle a particular grain run on the same machine.

So your container scheduler can optimize for I/O. Again not that far from existing cloud schedulers.

I'm not saying this isn't a powerful model or platform. I am saying that I'm very disinclined to want to work on such a platform, purely because I'm worried its developers buy into the "everybody else is way off base" attitude, when really they're only doing what many many successful architectures have done before them.




Hi! I'm the founder / tech lead of Sandstorm. I wrote that "How it Works" page (https://sandstorm.io/how-it-works) and designed the architecture it is describing. Before Sandstorm, I worked at Google on infrastructure for eight years. As you may know, Google has been doing containers and microservices for over a decade. What I describe on the page is largely based on that experience.

> Plenty of precedent there.

I've personally never observed a production (non-research) server architecture which containerizes and sandboxes application servers on a per-document level. That is, say, with Etherpad on Sandstorm, every document you create is in a separate container isolated from the others.

I would love it if you could name some specific examples that work this way, as I'd very much like to see what others have done.


I get the grain-per-document thing. It sounds interesting for at least a subset of applications, and I disagree with the grandparent that it's common practice.

First question that comes to mind is how the data source is sharded like that. You say in another post that you don't mandate a tech stack. So let's say I want to store my data in MySQL. Are you saying each document literally has its own private database instance?

I have many more questions, but they will depend on your answer to that question. :)


> Are you saying each document literally has its own private database instance?

Yes. Each "grain" (our catch-all term; could mean "an Etherpad document" or "a Gitlab repo" or "a Wekan board", etc.) gets a slice of filesystem to do whatever it wants with.

Of course, MySQL is totally overkill for this task. We generally encourage people to use sqlite instead. But there are a couple apps that do actually run MySQL in the container, mostly because switching to sqlite would have been too much work. There are also several apps that use Mongo, but it turns out Mongo 3's WiredTiger engine is reasonably efficient for small data sets, so it works out.
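
To make the "private database per grain" idea concrete, here's a minimal sketch of what a grain-local SQLite setup could look like. This is an illustration under assumptions, not Sandstorm's actual API: the writable path and the table are made up for the example. The point is just that the app opens an ordinary local database file, and because the sandbox gives each grain its own private filesystem slice, the same path is a different database in every grain -- no sharding logic or tenant-id columns needed.

    import os
    import sqlite3

    # Hypothetical: assume the sandbox maps a private writable directory for
    # this grain. The exact path is an assumption made for this sketch.
    GRAIN_DATA_DIR = os.environ.get("GRAIN_DATA_DIR", "/var/grain-data")

    def open_grain_db():
        # The same path names a *different* file in every grain, because each
        # grain only ever sees its own storage slice.
        db = sqlite3.connect(os.path.join(GRAIN_DATA_DIR, "grain.sqlite3"))
        db.execute(
            "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)"
        )
        return db

    if __name__ == "__main__":
        os.makedirs(GRAIN_DATA_DIR, exist_ok=True)
        db = open_grain_db()
        db.execute("INSERT INTO notes (body) VALUES (?)", ("hello from this grain",))
        db.commit()
        count = db.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
        print(count, "note(s) stored in this grain's private database")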

The model definitely is friendlier to some kinds of apps than others. Productivity apps with clear data boundaries -- Etherpad (document editor), Wekan (Trello clone), Rocket.Chat (Slack clone), Gitlab, etc. -- work very well, and are our main focus. "Big data processing" apps are not well-suited to this model. We're happy to cede those, as there are tons of people doing great work in that domain.


In the case of Rocket.Chat, is each channel a grain, or is the entire chat a grain? It seems to me each channel should be a grain, but I am not sure.


Currently, the whole Rocket.Chat instance is a grain.

However, it would be "more Sandstorm-y" if each channel were a separate grain. For that to work well, though, we need some UI features to emulate what Rocket.Chat already does internally:

* The ability for grains to highlight themselves in the sidebar when there is activity or when you've been mentioned, so that you know to check them. This API actually just launched last week! So now the apps are being updated to use it.

* Some way to create "private message" grains between two people. We haven't done this yet but we have some ideas.

Soon, I hope to see Rocket.Chat add a "single-channel grain" mode where its own sidebar is removed and it relies on Sandstorm's sidebar. This would provide better security for channels which should only be accessible to a subset of people. Probably, the multi-channel mode will stick around too, for cases where you want the same people to have access to many channels.


> Are you saying each document literally has its own private database instance?

Yup. On the other hand, those containers are garbage-collected when nobody is viewing them.


To be clear: The container shuts down, but the storage stays. Since the data set is small, apps start back up quickly on-demand. Moreover, the storage for app assets (code, binaries, images, etc.) is shared read-only among all instances, so the marginal storage usage of each grain is only the per-grain data, which is usually very small.


First, apologies for my snippy knee-jerk comment. I should have probably taken more time to compose my thoughts. And congratulations on the platform's popularity (at least as far as I can gather from GitHub stars). It seems that it definitely hits a sweet spot for many people as far as ease-of-use. Additionally, I think it integrates a lot of great ideas, some in combinations that thus far are not quite mainstream.

I think what bothered me is actually best described by the current top comment. "How it works," too, seems like mostly marketing hype, and in my personal experience many of its technical claims regarding novelty or distinction are inaccurate and come across as sanctimonious.

As you may guess from my mentioning Android above, that's the most mainstream system that I think achieves the described level of sandboxing while allowing highly fine-grained access control when data needs to leave a sandbox. For example, if I share a photo from one app to another, it gains access only to the one I selected, not to all my photos.

The same-origin policy also springs to mind as a great example of application and data sandboxing by default.

Sandstorm does have very fine-grained defaults, which I think is best, and of course capabilities are the more feasible AuthZ primitive for distributed systems [1].

All in all, I think we have many of the same opinions, and I've seen the success of similar models several times in the past. This is probably a good deal of why I find it frustrating and disingenuous when they're described as novel.

[1] https://youtu.be/2htZv45lvLM?list=PL-XXv-cvA_iBDyz-ba4yDskqM...


Android sandboxes apps, but it does not sandbox documents within apps. For example, when I use the gmail app to connect to both personal and work email, there is nothing stopping an exploit that lands in my personal inbox from reading all my work email. Similarly, same-origin policy generally divides apps (in the form of domains), but not documents -- if I have two Google Docs, and someone XSS's one of them, they can read and infect the other.

Of course, on a phone (or a single user's browser), you usually have only one user, which makes it less clear why apps should be internally partitioned. On the server side, the traditional model is multi-tenancy, where all of Google's billions of users are potentially hitting the same server containers (modulo geography-based load-balancing).

We definitely don't claim that the capability model is new, and I'm happy to acknowledge that many research systems based on a strong capability model implement something like Sandstorm's fine-grained containerization. Mark Miller, one of the (probably, "the") foremost researchers on capability-based security, is an advisor to the Sandstorm project -- you'll see him listed on the Team page. Marc Steigler -- who I believe invented the word "Powerbox" -- is also a friend, as are Tyler Close (Waterken), Norm Hardy (KeyKOS), Alan Karp, etc.

The thing is, though, CapDesk and KeyKOS never made it beyond being research systems -- proving a point, but not widely used in the real world. Also, they predated the "cloud services" era.

I would still like to know if there are other examples of server-side fine-grained containerization that I'm missing.


> Making wildly ridiculous claims that amount to "microservices are silly" (my paraphrase) when really they've only managed to successfully sandbox applications

No, you don't understand the Sandstorm infrastructure. Sandstorm abstracts the data from the application via the Cap'n Proto protocol, so the user, not the application, is in full control of the application data. The common data format makes all sorts of interesting isolation policies possible, like containerizing internal application state. There is very little precedent for this sort of feature.


I find this aspect of their docs totally confusing. Cap'n Proto is mentioned everywhere but without context. I went looking for more info on it and it looks like a serialization format like Google Protocol Buffers, but in the Sandstorm docs it's peppered everywhere in a way that makes it sound like magic pixie dust. How does a serialization format solve all these problems?

Not saying it doesn't, just that this needs to be explained conceptually. In reading the docs I can never seem to grasp what the key invention is. How is this different from a Docker host with a gateway?

The repeated invocation of Cap'n Proto also gives the impression that I would have to rewrite my entire app for this, but the number of familiar apps obviously means this isn't the case.


Cap'n Proto isn't really the magic that makes everything work; it's just a communications layer designed to facilitate Sandstorm's security model -- namely, capability-based security. Here are some relevant links:

https://sandstorm.io/how-it-works#capabilities

https://sandstorm.io/news/2014-12-15-capnproto-0.5

Note that Sandstorm does not require that apps store their data in Cap'n Proto format. Rather, the app communicates to the outside world via Cap'n Proto. Also note that many other protocols -- especially HTTP -- can be layered on top of Cap'n Proto, allowing apps that don't know anything about Cap'n Proto to operate. Here's how HTTP-over-Cap'n-Proto is defined, BTW:

https://github.com/sandstorm-io/sandstorm/blob/master/src/sa...

The key advantage of using Cap'n Proto is that it allows Sandstorm to be aware of all of the connections that exist between apps, so that it can tell the user which apps are talking to which other apps and how those apps discovered each other, and give them the opportunity to revoke those connections.
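
To make that "aware of all the connections" point concrete, here's a toy sketch of the underlying capability idea -- in plain Python rather than Cap'n Proto, and not Sandstorm's actual API; the names (Platform, Capability, grant, revoke) are invented for the illustration. Apps never hold direct references to each other, only handles granted by the mediating layer, so that layer can list who talks to whom and cut any connection off.

    class RevokedError(Exception):
        pass

    class Capability:
        """A revocable forwarder handed to an app instead of a direct reference."""

        def __init__(self, target, label):
            self._target = target
            self.label = label        # e.g. "DocGrain#1 -> SpellcheckApp"
            self.revoked = False

        def call(self, method, *args, **kwargs):
            if self.revoked:
                raise RevokedError("capability %r was revoked" % self.label)
            return getattr(self._target, method)(*args, **kwargs)

    class Platform:
        """Grants capabilities and remembers every connection it has set up."""

        def __init__(self):
            self.granted = []

        def grant(self, target, label):
            cap = Capability(target, label)
            self.granted.append(cap)
            return cap

        def list_connections(self):
            return [c.label for c in self.granted if not c.revoked]

        def revoke(self, cap):
            cap.revoked = True
            cap._target = None        # sever the link; the app can't get it back

    if __name__ == "__main__":
        class Doc:
            def read(self):
                return "contents of the shared document"

        platform = Platform()
        cap = platform.grant(Doc(), "DocGrain#1 -> SpellcheckApp")
        print(cap.call("read"))             # the app uses the handle...
        print(platform.list_connections())  # ...the user can see the link...
        platform.revoke(cap)                # ...and sever it at any time

Because the app only ever sees the Capability object, revoking it severs the connection without the app's cooperation -- which is roughly what the Cap'n Proto layer gives Sandstorm at the protocol level.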


>> Microservices are seemingly designed to match the developer’s mental model of their system. Often, services are literally divided by the team who maintains them.

> Good Lord, I sure hope that's not true. Maybe if the non-technical manager designed the system.

This just seems like a restatement of Conway's Law. Nothing outrageous or egregious about it. It's just how people are.


>> Microservices are seemingly designed to match the developer’s mental model of their system. Often, services are literally divided by the team who maintains them.

> Good Lord, I sure hope that's not true. Maybe if the non-technical manager designed the system.

Of course it is. It isn't even a terrible way to do it - making a team responsible for their own data end to end does decouple their development velocity from other teams.

It does have problems too of course!


Good microservices architectures are about data congruity and not team boundaries. Yes, they're hopefully small enough for a single team and may warrant further slicing if they're not, but I definitely would argue strongly against partitioning teams and services into 1:1 relationships. 1:N or 2:N maybe (they should be small enough to quickly learn), but never 1:1.


Great!

What about inter-company services? Inter-division? I think your "team:service = 1:N" model above is exactly what happens in many places, and exactly what you seemed to react so strongly to.

It isn't exactly uncommon for data ownership to be aligned with team structure.

There are trade-offs with all approaches, but I think your initial ("Good Lord, I sure hope that's not true.") comment is an overreaction.


Yeah, the iPhone was also a dangerous fad that everyone bought into. There was precedent with the Newton and the Palm Pilot.


I don't have a problem with the tech at all. It's the presumptuous attitude as if it's novel that bothers me.



