> Most importantly, a single process can run hundreds or thousands of Isolates, seamlessly switching between them. They make it possible to run untrusted code from many different customers within a single operating system process.
From the linked comment by a supposed fly.io engineer:
> Under no circumstances should CloudFlare or anyone else be running multiple isolates in the same OS process. They need to be sandboxed in isolated processes. Chrome sandboxes them in isolated processes.
So I’m confused where the disconnect is here. One of them has to be wrong.
V8 is undeniably intended to be a secure sandbox. For about a decade, Chrome relied on V8 as the only thing isolating web sites from each other. A couple of years ago, Chrome implemented "strict site isolation", which adds defense in depth by also forcing every "site" into a separate OS process. But this does not mean the Chrome team no longer cares about V8 itself being secure -- strict site isolation is a defense-in-depth measure. Google will pay a bug bounty if you break either layer.
However, V8, like any piece of software, has bugs. The possibility of security bugs implies some level of risk. The question is: how much risk is there, and is it acceptable? V8 is complex, and as a result it tends to have more bugs than, say, a virtual machine hypervisor. Hence, it is argued that V8 is more risky. Some believe that level of risk is unacceptable for a server environment.
On the other hand, V8 receives more security research and better fuzzing than any other sandboxing technology. Most bugs in V8 are actually found by Google's own fuzzers. And process isolation is only one kind of defense-in-depth possible here; Cloudflare Workers implements many other types of defense-in-depth that align better with our particular requirements.
My take -- as the tech lead of Cloudflare Workers -- is that people are broadly overestimating the risk because the approach is different. In fact, I personally believe that typical cloud environments which allow you to run arbitrary native-code binaries are much riskier. The reason is, sandboxing native code doesn't just require secure virtual machine software, it also requires bug-free hardware. CPUs are extremely complex, certainly much more complex than V8. Do we really believe they don't have any bugs which allow VM breakouts? What happens if someone finds such a bug, and it's not possible to mitigate without new silicon? Say it turns out some obscure instruction accidentally allows unchecked access to all physical memory -- what then? It would be quite a disaster for the industry.
So frankly, my take is the entire industry has accepted a much higher level of risk already, but people don't talk about it much because it's already broadly accepted as the status quo. Cloudflare Workers gets more questions because we've made a different judgment.
With that said, it's absolutely possible for smart people to disagree on all these points, and probably neither side will ever be proven definitively right or wrong.
I like your rationale around other cloud providers’ virtualization solutions.
I think the main comparison that's lacking is one of V8 isolates vs. strict site isolation, which runs sites in separate processes. Why would the Chrome team build such a complex thing if V8 isolates met a rigorous security bar? This would be good to get your insights on.
What’s the actual cost of separate processes vs v8 isolate? In terms of cpu/memory/latency mainly. I would guess either would win over other virtualization techniques but between the two is it really that stark of a difference?
I am more and more convinced that you may have seen the light and that people like me are missing the big picture, so answers to this line of questioning would be more helpful than comparisons with other virtualization mechanisms.
This question assumes that there exists some threshold of "rigorous security standards" which clearly separates "secure" from "insecure". There's no such threshold. We know that risks always exist. No one can precisely quantify those risks. We can only sort of abstractly debate about which risks might be large, and take precautions by addressing those risks with defense-in-depth measures. Each such measure must, of course, be weighed against the costs -- both development costs and runtime costs.
Chrome made the decision that strict site isolation was a worthwhile trade-off to reduce risks. I generally agree with this decision. In Chrome's case, the overhead of extra processes turned out to be acceptable, and I think the risk reduction was meaningful.
In Cloudflare Workers' case, the cost of isolating every Worker in its own process would be much higher. We actually have the ability to do such isolation, and we selectively apply it in cases where we have other signals to suggest that risk may be elevated (such as when the Worker's performance characteristics suggest it may be engaging in a Spectre attack). But because Workers are designed to handle fine-grained events and often run for less than a millisecond at a time, the overhead of this isolation is much higher than in Chrome's case. The exact overhead depends on what the Worker is doing, but 5x-10x is typical (both for CPU and memory). Given this, implementing strict process isolation would not be a good trade-off for us.
Some people respond to this with "You can't sacrifice security for performance!!!", but this is naive. Again, all of the big cloud providers are trading off a whole lot of security risk for performance when they choose to run untrusted native code. In the real world we always make trade-offs.
Meanwhile, though, there are lots of other types of defense in depth that we can implement in Workers that aren't so easy to do in browsers. Dynamic process isolation is one example. Another is that we can automatically roll out V8 patches to production much faster than Chrome can. See the blog post I linked for some more examples.
e.g. I don't think you can connect to a Redis instance from a Worker (unless your Redis provider implements a REST API)
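The limitation being described is that a Worker can only make outbound HTTP(S) requests via `fetch`, not open raw TCP connections like a normal Redis client does. If a provider does expose a REST gateway, the client side might look roughly like this sketch -- note that the gateway URL, auth scheme, and JSON response shape here are all hypothetical, not any real provider's API, and `fetch` is injected as `fetchImpl` so the sketch runs without a network:

```javascript
// Hypothetical REST gateway in front of Redis: each command is sent as a
// JSON array over HTTPS, since a Worker can't open a raw TCP connection.
// `fetchImpl` is injected so the sketch runs without a real gateway.
async function redisCommand(command, { baseUrl, token, fetchImpl }) {
  const response = await fetchImpl(baseUrl, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(command),
  });
  if (!response.ok) throw new Error(`gateway error: ${response.status}`);
  const { result } = await response.json();
  return result;
}

// A fake fetch that emulates the gateway for SET/GET, for demonstration.
const store = new Map();
async function fakeFetch(url, init) {
  const [cmd, key, value] = JSON.parse(init.body);
  let result = null;
  if (cmd === 'SET') { store.set(key, value); result = 'OK'; }
  if (cmd === 'GET') { result = store.get(key) ?? null; }
  return { ok: true, status: 200, json: async () => ({ result }) };
}

(async () => {
  const opts = { baseUrl: 'https://redis-gateway.example', token: 'secret', fetchImpl: fakeFetch };
  console.log(await redisCommand(['SET', 'greeting', 'hello'], opts)); // OK
  console.log(await redisCommand(['GET', 'greeting'], opts)); // hello
})();
```

In a real Worker you would pass the global `fetch` as `fetchImpl` and point `baseUrl` at whatever HTTP endpoint your provider actually offers.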
At the moment there should be no known vulnerabilities.
(4 years ago; 663 points, 241 comments)
Curious if anyone outside of CF has a success or challenge story to share about Isolates. Super cool tech.
Submission inspired by @kentonv's post: https://news.ycombinator.com/item?id=32287798
For example, at almost zero cost, you can have a cheap $2-$5 server hosting your web app serve tens of thousands of users concurrently by using the Workers Cache API.
On top of that, you can serve static files at ultra-low response times and reduce queries to your server by using a CDN like bunny.net.
What's amazing is that these technologies are virtually free for everyone unless your app has huge traffic, and even then it's very affordable.
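The cache-first pattern behind that Cache API usage can be sketched like so. This is a minimal illustration of the logic, not real Worker code: in an actual Worker you'd use the global `caches.default` inside a fetch handler, whereas here a plain Map and a hypothetical `fetchFromOrigin` stand in so the sketch runs anywhere:

```javascript
// Minimal sketch of cache-first request handling: check the cache, and
// only fall through to the (expensive) origin server on a miss.
const cache = new Map();

// Stand-in for a trip to the origin server (hypothetical).
async function fetchFromOrigin(url) {
  return { status: 200, body: `rendered page for ${url}` };
}

async function handleRequest(url) {
  // 1. Serve from cache when possible -- a hit never touches the origin.
  const hit = cache.get(url);
  if (hit) return { ...hit, cached: true };

  // 2. On a miss, fetch from the origin and store the response for next time.
  const response = await fetchFromOrigin(url);
  cache.set(url, response);
  return { ...response, cached: false };
}

(async () => {
  const first = await handleRequest('/index.html');
  const second = await handleRequest('/index.html');
  console.log(first.cached, second.cached); // false true
})();
```

This is why the cheap origin server holds up: after the first request per URL, repeat traffic is absorbed at the edge and never reaches it.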
So, this is not really a product except for a limited user base: people writing web functions. Much of the cloud computing that is now hosted in containers is batch work (ML training, HPC jobs) that depends on a particular set of dependencies, access to hardware, and a complex multithreaded runtime.
Users of Lambda have demonstrated that it can do matrix multiplication at supercomputer throughput rates.
I wouldn't mind the post so much, but it badly misrepresents the nature of containers in cloud computing, and does so in a way that makes me not want to be their customer.
I don't think the post was intended to mislead people into thinking Workers serves those other use cases well today (or 4 years ago, when it was written). Rather, I think Zack just wasn't thinking about batch work as he wrote. When you spend all your time focused on one problem space, it's easy to forget that others exist.
But again, we haven't had the chance to focus on that yet. If you're doing heavy batch work today, Workers probably isn't the best place for it. Yet.
(Disclosure: I'm the tech lead for Workers.)
To me, the only real problem left in the web space is persistent data storage. This is the thing that doesn't scale, that's hard to migrate, that sometimes has hard consistency requirements, that has to be really secure, etc...
I've never ever been worried about running out of ephemeral servers. Am I the only one?