Hacker News
Cloud Computing Without Containers: Hello Isolates (cloudflare.com)
72 points by metadat on July 30, 2022 | 16 comments

Related thread about "Pros and Cons of V8 isolates" that includes "the opposing argument" from the fly.io people:


From the Cloudflare article:

> Most importantly, a single process can run hundreds or thousands of Isolates, seamlessly switching between them. They make it possible to run untrusted code from many different customers within a single operating system process.

From the linked comment by a supposed fly.io engineer:

> Under no circumstances should CloudFlare or anyone else be running multiple isolates in the same OS process. They need to be sandboxed in isolated processes. Chrome sandboxes them in isolated processes.

So I’m confused where the disconnect is here. One of them has to be wrong.

Neither side is "right" or "wrong", because this is not actually a binary thing. The two sides are judging the level of risk involved and whether or not that level of risk is acceptable, and they are coming to different conclusions.

V8 is undeniably intended to be a secure sandbox. Chrome relied on V8 as the only thing isolating web sites from each other for about a decade. A couple years ago, Chrome implemented "strict site isolation", which adds defense in depth by also forcing every "site" into a separate OS process. However, this does not mean that the Chrome team no longer cares about V8 itself being secure. Strict site isolation is a defense-in-depth measure. Google will pay a bug bounty if you break either layer.

However, V8, like any piece of software, has bugs. The possibility of security bugs implies some level of risk. The question is, how much risk is there, and is it acceptable? V8 is complex, and as a result it tends to have more bugs than, say, a virtual machine hypervisor. Hence, it is argued that V8 is riskier. Some believe the level of risk is unacceptable for a server environment.

On the other hand, V8 receives more security research and better fuzzing than any other sandboxing technology. Most bugs in V8 are actually found by Google's own fuzzers. And process isolation is only one kind of defense-in-depth possible here; Cloudflare Workers implements many other types of defense-in-depth that align better with our particular requirements.[0]

My take -- as the tech lead of Cloudflare Workers -- is that people are broadly overestimating the risk because the approach is different. In fact, I personally believe that typical cloud environments which allow you to run arbitrary native-code binaries are much riskier. The reason is, sandboxing native code doesn't just require secure virtual machine software, it also requires bug-free hardware. CPUs are extremely complex, certainly much more complex than V8. Do we really believe they don't have any bugs which allow VM breakouts? What happens if someone finds such a bug, and it's not possible to mitigate without new silicon? Say it turns out some obscure instruction accidentally allows unchecked access to all physical memory -- what then? It would be quite a disaster for the industry.

In contrast, if your platform only accepts non-native code formats like JavaScript and WebAssembly, it's much easier to respond to such bugs by e.g. controlling access to the buggy instruction.

So frankly, my take is the entire industry has accepted a much higher level of risk already, but people don't talk about it much because it's already broadly accepted as the status quo. Cloudflare Workers gets more questions because we've made a different judgment.

With that said, it's absolutely possible for smart people to disagree on all these points, and probably neither side will ever be proven definitively right or wrong.

[0] https://blog.cloudflare.com/mitigating-spectre-and-other-sec...

Thoughtful and insightful answer, although I am on the other side of your argument.

I like your rationale around other cloud providers’ virtualization solutions.

I think the main comparison that’s lacking is one of v8 isolates vs strict site isolation which runs on separate processes. Why would the Chrome team build such a complex thing if v8 isolates passed rigorous security standards/bar etc? This would be good to get your insights on.

What’s the actual cost of separate processes vs v8 isolate? In terms of cpu/memory/latency mainly. I would guess either would win over other virtualization techniques but between the two is it really that stark of a difference?

I am more and more convinced you may have seen the light and people such as me are missing the big picture, so providing answers along this line of attack would be more helpful than comparing with other virtualization mechanisms.

> Why would the Chrome team build such a complex thing if v8 isolates passed rigorous security standards/bar etc?

This question assumes that there exists some threshold of "rigorous security standards" which clearly separates "secure" from "insecure". There's no such threshold. We know that risks always exist. No one can precisely quantify those risks. We can only sort of abstractly debate about which risks might be large, and take precautions by addressing those risks with defense-in-depth measures. Each such measure must, of course, be weighed against the costs -- both development costs and runtime costs.

Chrome made the decision that strict site isolation was a worthwhile trade-off to reduce risks. I generally agree with this decision. In Chrome's case, the overhead of extra processes turned out to be acceptable, and I think the risk reduction was meaningful.

In Cloudflare Workers' case, the cost of isolating every Worker in its own process would be much higher. We actually have the ability to do such isolation, and we selectively apply it in cases where we have other signals to suggest that risk may be elevated (such as when the Worker's performance characteristics suggest it may be engaging in a Spectre attack). But because Workers are designed to handle fine-grained events and often run for less than a millisecond at a time, the overhead of this isolation is much higher than in Chrome's case. The exact overhead depends on what the Worker is doing, but 5x-10x is typical (both for CPU and memory). Given this, implementing strict process isolation would not be a good trade-off for us.

Some people respond to this with "You can't sacrifice security for performance!!!", but this is naive. Again, all of the big cloud providers are trading off a whole lot of security risk for performance when they choose to run untrusted native code. In the real world we always make trade-offs.

Meanwhile, though, there are lots of other types of defense in depth that we can implement in Workers that aren't so easy to do in browsers. Dynamic process isolation is one example. Another is that we can automatically roll out V8 patches to production much faster than Chrome can. See the blog post I linked for some more examples.

There do exist rigorous security standards that have been theoretically and empirically demonstrated to distinguish "secure" from "insecure", as long as we take those terms to mean something meaningful like "can protect from state-level actors" versus "cannot protect from actors of average skill", which constitutes a difference of literal orders of magnitude. In fact, it even comes in the form of internationally recognized standards such as the Common Criteria (ISO 15408). In the general space of multi-level systems enforcing isolation between untrusted code and trusted data, we can look to the SKPP [1] or the Orange Book Class A1 [2].

[1] https://www.niap-ccevs.org/MMO/PP/pp_skpp_hr_v1.03.pdf

[2] https://csrc.nist.gov/csrc/media/publications/conference-pap...

I think it also comes down to payload. CF has a much more limited scope of what you can do in a Worker, so I'm not at all surprised they are more relaxed about isolation.

e.g. I don't think you can connect to a Redis instance from a Worker (unless your Redis provider implements a REST API)

I am under the impression that keeping isolates in separate processes is most important as a proactive defense against userspace Spectre-like attacks.

At the moment there should be no known vulnerabilities.

I believe that Cloudflare removed some of the riskier JavaScript APIs in that regard.

Past discussion:


(4 years ago; 663 points, 241 comments)

Curious if anyone outside of CF has a success or challenge story to share about Isolates. Super cool tech.

Submission inspired by @kentonv's post: https://news.ycombinator.com/item?id=32287798

Here's one with more comments than the first link in your comment: https://news.ycombinator.com/item?id=18415708

The way Cloudflare executes Worker code makes it very efficient to do things on the edge.

For example, at almost zero cost, you can have your cheap $2-$5 server host a web app that serves tens of thousands of concurrent users by using the Worker cache API.
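The pattern being described is cache-first request handling. Here's a minimal sketch with the cache and origin fetch passed in as parameters so it runs outside the Workers runtime; in a real Worker you'd use `caches.default` for the cache and wrap the `cache.put` in `event.waitUntil(...)`:

```javascript
// Cache-first handler: serve from the edge cache when possible, otherwise
// fetch from the origin and store the response for subsequent requests.
async function cacheFirst(request, cache, fetchOrigin) {
  const cached = await cache.match(request);
  if (cached) return cached;            // origin never sees this request
  const response = await fetchOrigin(request);
  await cache.put(request, response);   // a real Worker would waitUntil() this
  return response;
}
```

Once a response is cached at the edge, repeat requests never reach the origin server, which is what lets a tiny origin handle large concurrent traffic.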

On top of that, you can serve static files with ultra-low response times and reduce queries to the server by using a CDN like bunny.net.

What's amazing is that these technologies are virtually free unless your app has huge traffic, and even then they're very affordable.

There's a fair amount of FUD in there. I was curious whether they supported Python, and it says they compile the Python to JavaScript.

So, this is not really a product except for a limited user base: people writing web functions. Much of the cloud computing that is now hosted in containers is batch work (ML training, HPC jobs) that depends on a combination of dependencies, access to hardware, and a complex multithreading runtime.

Users of Lambda have demonstrated that it can do matrix multiplication at supercomputer throughput rates.

I wouldn't mind the post so much, but it badly misrepresents the nature of containers in cloud computing, and does so in a way that makes me not want to be their customer.

It's true that, at least at present, Cloudflare Workers is primarily meant to implement web application servers. There are obviously many other ways people use servers which Workers does not yet address.

I don't think the post was intended to mislead people into thinking Workers serves those other use cases well today (or 4 years ago, when it was written). Rather, I think Zack just wasn't thinking about batch work as he wrote. When you spend all your time focused on one problem space, it's easy to forget that others exist.

With that said, I do think the Workers architecture will be a great fit for batch work when we get to it. The reason being, infrastructure for batch work is often all about bringing the code to the data, rather than bringing the data to the code. Workers is extremely good at running code in whatever location is ideal, in extremely fine-grained units. Yes, supporting only JavaScript and WebAssembly will be a limitation, but I suspect the advantages will outweigh the disadvantages.

But again, we haven't had the chance to focus on that yet. If you're doing heavy batch work today, Workers probably isn't the best place for it. Yet.

(Disclosure: I'm the tech lead for Workers.)

Just giving author a hard time: when you start with "This is not meant to be an ad for Workers" but then go into listing advantages, detailed comparative pricing, and close with a "try now" CTA, what you've got is an ad for Workers.

I'm having a hard time getting enthusiastic about tech that facilitates turning computation power on and off on demand.

To me, the only real problem left in the web space is persistent data storage. This is the thing that doesn't scale, that's hard to migrate, that sometimes has hard consistency requirements, that has to be really secure, etc.

I've never ever been worried about running out of ephemeral servers. Am I the only one?

Given something like Moore's law, this feels like a power play by Cloudflare to have more control over your stuff?

