Hacker News new | past | comments | ask | show | jobs | submit login
Cloud Computing Without Containers (cloudflare.com)
663 points by zackbloom on Nov 9, 2018 | hide | past | favorite | 241 comments

Hi, I'm the tech lead of Workers.

Note that the core point here is multi-tenancy. With Isolates, we can host 10,000+ tenants on one machine -- something that's totally unrealistic with containers or VMs.

Why does this matter? Two things:

1) It means we can run your code in many more places, because the cost of each additional location is so low. So, your code can run close to the end user rather than in a central location.

2) If your Worker makes requests to a third-party API that is also implemented as a Cloudflare Worker, this request doesn't even leave the machine. The other worker runs locally. The idea that you can have a stack of cloud services that depend on each other without incurring any latency or bandwidth costs will, I think, be game-changing.

Greetings Tech Lead of Workers!

I have yet to use this specific product, and after reading the blog post I'm actually really frustrated that I didn't fully understand earlier what I could do with Workers. I recently had a proof-of-concept app that would have been a nice fit to test with, since we've had issues with cold-start delays on Lambda in similar projects in the past. And that second point you make, above ... holy cow; that's going to change the way I design systems and applications for customers.

Personally speaking, I am a fan of CloudFlare like some of the more obsessed are fans of Apple[0]. Every time your company comes out with a new announcement, it solves some problem I'm experiencing on projects at that moment.

I could write paragraph after paragraph about all of the grief that using your services has eliminated from my life and my customers' lives, but I am certain you hear that enough already. There is no way my customers appreciate your free tier as much as I, the developer, do. And your UX is awesome. Whichever one of your teammates designed the API button in the control panel -- with instructions next to each option, as a way to onboard people to the API -- it's those stupid little things. Because it's there, when I'm setting up a client with a unique configuration for the first time, I start from a baseline, set it manually, test, and click into that link to see what commands I'd put into a script should I have another client with similar needs. I probably wouldn't have thought to even look for an API were it not for those links.

I'm excited to try out Workers next time I have a Lambda need that fits well. Keep up the good work -- it's taking a ton of grief out of my life.

[0] A gentleman from your company sent me a T-Shirt for providing feedback for the Argo tunnel (called something different back then in beta) product -- you guys even do the damn T-Shirts right ... subtle logo on the shoulder and front, no words, and really soft cotton. It's my favorite "free vendor t-shirt" shirt.

edit: A sentence ... got away from me and I had a dangling bullet

Thank you. Made my day.

Sorry, I just read the linked article so I don't really know the exact specifics of your system, but I think I'm having a flashback. These are exactly the promises of the sandboxed JVM running in a browser, probably with a 10-20x performance deficit to account for JavaScript vs Java. I think that after about 20 years we all know why that kind of "security" model failed. I'm actually quite stunned to see that you are running code from thousands of different websites in the same process. The only game-changing part that I can see is giving a huge incentive to a lot of not-so-good people to try to break the sandbox. And I was hoping that after the Intel Spectre debacle we were finally going to move in the right direction... Apparently we are still going in the wrong one. I really hope that I misunderstood everything and that you are not really running code from thousands of different websites in the same process.

One of the fundamental truths behind any innovation is it breaks at least one 'sacred' belief of the last generation. The JVM was not a technology built for this kind of multi-tenancy, it was bolted on later.

V8 is one of the most well-tested and well-secured pieces of software in existence, and it has a much smaller attack surface than the Linux process isolation you're referring to.

On the other hand, containerization typically occurs on an independent (from other owners) instance of a virtual machine, which typically is running on separate processor cores, helping increase the overall isolation despite residing on shared hardware. Exposed processor caches due to exploits like Meltdown are a significantly higher risk on a platform of this kind than on a containerized environment. V8 exists at a much higher level than hardware-level exploits. How does your platform mitigate these kinds of concerns? Presumably you have some kind of virtualization above this to manage roll out of your execution environment, but adding a shared execution context like V8 feels to me like the risk factor is doubled, not reduced.

> I’m actually quite stunned seeing that you are running code from thousands of different websites in the same process.

Are you also stunned every time you open your browser? Because it's the same thing.

Chrome has one process per tab. It seems exactly the opposite to me.

As I understand it, that was introduced only as a mitigation for Spectre.

It's a bit more complicated. Chrome has had the ability to run separate tabs in separate processes since day 1. However, quite often, JavaScript from separate web sites would end up running in the same process. Specifically, (i)frames, popup windows, and sometimes tabs created by a site would run in the same process as the creating site, even if the child window displayed a completely different domain. In this case, in fact, the sites would run in the same V8 isolate. The only separation was "context" separation, which essentially means that the objects from one site were prohibited from accessing the objects from another even though they occupied the same heap.

More recently, Chrome has introduced Site Isolation, which actually allows iframes to run in separate processes. On desktop, this is enabled by default as of Chrome 67. On Android, it is not enabled yet, because it has been shown to be too expensive. The Chrome team wants to fix this, but it's not clear (to me, at least) whether they'll be able to overcome the inherent barriers here.

Chrome started working on Site Isolation before Spectre, but Spectre accelerated interest in it. My take is that Spectre is probably not the main reason that the Chrome security people (who are awesome, BTW) want to do it, but it provided a great excuse to rally support behind it.

For Workers -- much like for Android Chrome -- process-per-tenant is still too costly to be feasible. So, we need to pursue other approaches, as I've described elsewhere in this thread.

Thank you for the very detailed reply, that was an interesting read. Much appreciated!

I appreciate the novel approach, but you should try to use more realistic numbers for the arguments in the post. A node process doing nothing does not consume 30MB of private memory (shared mappings are shared, just like the "shared runtime"), nor does it take 500ms-10s to start a process or 100us to context switch. You're off by 3+ orders of magnitude on the first and almost 2 orders on the second. Cold starts are a symptom of something else (likely shipping function data), which I don't think is really a property of the sandbox.

35MB is what Amazon reports and bills you based on. If that number is wrong, a great question is what are they charging people for?

Launching a process is different from launching a language runtime or interpreter. Runtimes generally are not optimized for initial start time in the way an Isolate is. The duration of a context switch varies, but when you're dealing with tens of thousands of requests per second per machine, context switches are very significant.

Using billing to measure the size or cost of something is the worst kind of measure. For example, I recently got on a Greyhound with my bicycle, packaged in a box. My ticket was $27, but the bike cost $30, even though I weigh 3 to 4 times what my bike does. Humans also take up thousands of liters of space, while the bicycle box only took up 200 liters or so. And the time the bus driver took to load the bicycle was negligible.

AWS uses virtualization domains to separate customer workloads, and Meltdown required that they all be HVM to provide strong guarantees: https://lists.xenproject.org/archives/html/xen-devel/2018-01...

Your account's Lambda functions won't share a kernel with anyone else's.

I'm not sure why you think process start-up is such a heavyweight operation. Of course there's lots of optimization for start-up, it's the whole basis for Unix.

A context switch is O(10,000) cycles, and 10k processes per machine is totally doable.

I agree that the Isolate architecture is more efficient, I just object to the straw man comparison. (And would love some real data.)

> nor does it take 500ms-10s to start a process or 100us to context switch. You're off by 3+ orders of magnitude on the first

I'm relatively sure 10ms is not the worst-case cold-start time of Lambda.

> Cold starts are a symptom of something else (likely shipping function data)

Which is all part of spinning up a containerised function in lambda.

About memory, despite being shared memory, does a container in lambda running a node process use 35mb of RAM? Because that's the claim.

> About memory, despite being shared memory, does a container in lambda running a node process use 35mb of RAM? Because that's the claim.

What do you mean by "use" in this context? Does every program that links to glibc "use" the memory for glibc? (The answer is no, because the page cache is shared. But if you just looked at memory usage you might be fooled into thinking it is copied for every program.)

Containers use a vanishingly small amount of in-kernel memory. We are talking fewer than a couple of kilobytes in the worst case.

It appears to me that the main worry was with using Kubernetes, and they've applied this to containers as a whole. I agree that for this sort of use case Kubernetes really doesn't make sense (using container images at all makes no sense here), but the core kernel primitives and simple container runtimes available are more than suitable.

There is an argument that having no userland-kernel context switches has a performance improvement (and it does) but I would like to see data before accepting at face value that "context switches are expensive" is an appropriate thing to optimise for.

After all they are still doing context switches, it's just in userspace. And if v8 is multi-threaded they've just reinvented the N-to-M scheduling model that is very well known to be fundamentally broken because of various well known pathologies.

Your argument is moot when it comes to the original point, though, isn't it? Can a single machine handle 10,000 containers with acceptable performance at load, versus a single process handling work for 10,000 customers? Substitute any n for 10k here; it's an arbitrary number.

I'm not sure how I could, without having access to CloudFlare-scale compute and load, be able to answer your question.

My point is that the article in question makes claims that I'm not sure are correct, and I wonder whether the engineers behind this solution decided against containers because of their initial experience (this isn't a dig against them, this is a very common trait of almost everyone -- you don't want to waste your time if the first impression you have is negative). Several of the statements (especially about memory) appear to be based on misunderstandings of how containers would actually operate in such a scenario (or based on testing with sub-optimal configuration).

Maybe I'm completely wrong that containers would work under this kind of load in this scenario, but reading the article I didn't find many arguments I would expect to see if someone had really tried to make it work with containers and found the flaws. Instead it reads (at least to me) to be more of a "at first glance this doesn't appear to work" -- which is a fine thing to base product decisions on, but it isn't really okay to then spread (what appears to me to be) misinformation unless you have done significant testing to justify it.

So I disagree that my argument is moot -- and I would like to know how much cheaper the userland context switches are in V8 (I just looked and it appears that V8 does have a multi-threaded core now -- but hopefully CloudFlare doesn't use that with their userland threading+isolation model...).

So, I'm the architect of Cloudflare Workers. In a previous life, I created Sandstorm.io, including implementing its container engine from scratch. I do know some things about containers.

With Sandstorm.io, the rule of thumb we landed on was that a container takes 100MB of RAM. Some apps used more, some used less. This is real, empirical data.

With Workers, we're seeing an order of magnitude better.

There is no one, simple reason for this difference -- rather, it's a large number of factors working together. If I enumerated every one of them, you probably wouldn't find any of them compelling on its own.

Sure, if we could convince all our customers to write tiny C programs, with just the right constraints, maybe they could be as efficient in terms of RAM usage. Maybe. In Sandstorm, a raw C/C++/Rust app could indeed fit in a couple megabytes. But there's still context switching overhead. And no one wants to write C, they want to write JavaScript. ¯\_(ツ)_/¯

> I do know some things about containers.

Great! :D

> With Sandstorm.io, the rule of thumb we landed on was that a container takes 100MB of RAM. Some apps used more, some used less. This is real, empirical data.

I hate to keep harping on this, but what do you mean by "used" here? Are you saying that the RSS was 100MB per container, or that the sum of private mappings used was 100MB (/proc/$pid/smaps)? Did you use a filesystem like overlayfs that facilitates page-cache sharing by allowing read-only inode sharing between different containers, or a driver like {btrfs,devicemapper,aufs} that didn't? (Sandstorm was kicking around a while ago, so this might've been before overlayfs was in mainstream use.)

I know I probably look like an asshole, but I actually don't think I understand what you mean by "used" -- because saying that a container uses 100MB (which I take to mean that each new container spawned costs 100MB of real memory) simply doesn't sound right. It implies you could run fewer than 80 containers on an 8GB machine, and I doubt that Sandstorm programs were that big. It's like someone telling you that they spent $1500 on lunch -- I'm actually confused what the word "spent" means here.

I'm sure that you'd see less RSS memory usage by only having one process, but it is very possible that this memory-usage benefit is not actually real -- I could be completely wrong, but I'm having trouble understanding how putting the same code inside a single program could make such a difference (given that the page cache already shares the memory for the V8 binary). If you were to run 10k V8 processes, the summed RSS would scale with the number of processes, but real memory usage would only increase slightly, because the per-process in-kernel cost ("struct task_struct" and friends) is minor.

As for context switches, yeah okay that's a cost you pay by having more than one process on a machine. But you do still have context switch overhead (though obviously it's much smaller) if you're running more than one program's state in a single process -- you have to switch context to a different set of protected variables right?

> And no one wants to write C, they want to write JavaScript.

My point about page caches is that the code for V8 is in the page cache and thus running more V8 processes doesn't take up any more real memory than just running one. So that cost of running JavaScript on top should be similar (though probably slightly larger because V8 stores other program information in memory -- but definitely not 10x larger).

My understanding is that he's talking about app-specific code. Yeah, the OS is smart enough to only load libc once, but php/node/python/rails/x are all going to load their language runtimes and app-specific libraries into their processes -- the non-.so parts. Then the 100MB number makes sense.

In theory (if most of the runtime code being loaded is the same files underneath) then the page cache will also help with loading them -- you get benefits from the page cache as long as you're opening the same inode (which is why I'm talking about overlayfs -- because overlayfs allows page-cache sharing for base container images).

Even then, to get significant sharing, every container would need to deploy the same frameworks, libraries, and versions -- which the single-process, restricted environment moots anyway.

I'm confused -- the article is specifically about deploying JavaScript (or something that targets WebAssembly). From where I'm standing that seems to be a fairly homogeneous thing to be deploying (if you need to have many versions of JavaScript frameworks I don't see how that's not a problem with this solution either).

"Application containers" get a bad rap because Docker's model of a bunch of tar archives with separate root filesystems isn't actually the best way of getting density (nor is it what a lot of people really want).

A lambda cold start is not equivalent to starting a process. There's obviously a lot of things happening beyond that.

Does it take 500ms every time you type 'ls' in a console? That's starting a process.

> About memory, despite being shared memory, does a container in lambda running a node process use 35mb of RAM? Because that's the claim.

It's not the claim. I wouldn't object to someone saying "we charge X, Lambda charges Y". But the post seems to conflate implementation decisions in Lambda with fundamental properties of containers and processes, which is simply not correct.

Have you tried simply using a seccomp-isolated Linux executable for each tenant? (with rlimit or cgroup-based resource limits)

You have to create a process for each tenant, but that should be much faster and take much less memory than V8 JITting JavaScript or WebAssembly (Linux process creation is very well optimized).

You can try static linking, dynamic linking or even just forking plus an ad-hoc executable loader in userspace.

Clients are intentionally writing JavaScript against (a subset of) the Service Worker API, so you would have to spawn a V8 process and context anyway.

V8 isolates are the same separation as browser tabs, and thus are also designed to be initialized extremely quickly. I’d be surprised if spawning a new process is faster, and it’s not going to be faster if your goal is to run JS/WASM.

Basically CGIs. This would be great if worker functions were written in C, but these days Web programming is done in dynamic languages with fat runtimes where hello world uses 50 MB of RAM. They're saving all that RAM by not using multiple processes.

> If your Worker makes requests to a third-party API that is also implemented as a Cloudflare Worker, this request doesn't even leave the machine. The other worker runs locally.

Doesn't this undermine your assumption that external timers can't be used in a timing side channel attack because "the network is extremely noisy"[1]? Consider a third-party API implemented as a Cloudflare Worker that just returns Date.now(). How much noise will there be if that runs locally?

[1] https://news.ycombinator.com/item?id=18280156

We control the target isolate's Date.now() too, and we can make it return the same value in both workers. Indeed, we can run the workers on the same thread, which makes the whole thing look more like a function call than a network request.

Nice! Thanks for your response.

Chrome recently started running each site's JavaScript in an isolated process, as a Spectre mitigation. [1] Why do you trust V8's isolation more than Chrome does?

[1] https://security.googleblog.com/2018/07/mitigating-spectre-w...

Never mind, I missed the previous discussion of that here: https://news.ycombinator.com/item?id=18418476

This isn’t true. Solaris 11 can have thousands of zones (containers) on one machine. Solaris zones are a very lightweight technology compared to many others on the market. Linux may not, but Solaris can. Yes those containers run Solaris not Linux, but I just wanted to point out that this ability is not unique.

It can also be done on Linux. Docker might not be able to do it, but that's because of its architecture and the language it's written in. There's no kernel limitation like that.

I think Joyent has pushed the limits of what is possible in that area with their tech. From the dev/ops point of view I run docker container / images and they "translate" docker api calls into smartos/opensolaris/illumos zones. NB I don't know much about those OSs/projects.

Docker has a nice and evolving API and OS zones provide an amazing core technology.

Right, but that's because they don't actually run Docker. They have a much better designed manager process that creates LX-branded zones.

My point was that "Linux containers" (so tools like LXC, or runc, or others) are more than capable of being used in this way. Docker has a variety of architectural and other problems that result in it not being as friendly to this kind of use case.

So when you run "docker containers" on Joyent, they just untar the image and run it in an LX-branded zone. There's no Go (or its associated problems) involved, and they sure as hell don't run Docker as root in their control plane.

Yeah yeah, sorry for not being clear. I understand that they don't use docker and just make their api to quack like a docker.

What are some container technologies that could be used for this right now?

Namespaces and cgroups (using overlayfs to get page-cache sharing for the main process mappings in each container). This is a bit of a coy response, but if you have a use case where other container solutions have a high start-up cost -- write your own. You can write a basic container runtime in tens of lines of shell script.

For existing container glue -- you could probably do this with runc (though unfortunately we are also written in Go) and the architecture is such that runc has no long-running process (only your container continues running after we've set it up). I believe LXC can also produce similar tenancy (and they're written in C which avoids quite a few of the Go problems).

chroot, cgroups, and systemd's container facilities (systemd-nspawn, for example).

pivot_root, not chroot. chroot is vulnerable to many trivial escapes.

Probably, but for Solaris it’s native out-of-the-box technology.

What about the language limits the number of containers?


Do you have plans to deploy this tech at cell towers in metro areas and support 5G-enabled applications in the future?

Yes. And a really interesting business model to incent mobile providers to install them. Stay tuned!

Yes, we would love to do that. Workers are lightweight enough that they appear to be a much more viable way to distribute compute than technologies like Kubernetes.

*More viable way to distribute JavaScript compute?

C/C++/Rust/Go all target WebAssembly.

Does V8 allow sandboxing on: cpu cycles, heap, stack and system resources? Are requests to these workers http requests? Is there any backend storage or file access (or blob) from workers?

> Does V8 allow sandboxing on: cpu cycles, heap, stack and system resources?

Yes, and that is critical to our use case.

> Are requests to these workers http requests?

Currently, yes.

> Is there any backend storage or file access (or blob) from workers?

Currently there is Workers KV (key-value storage), and we have some cooler stuff in the works.

Being a non-js person, it wasn't clear if this is just for JavaScript or not. If it is, then it's not really fair to compare against containers or vms, right?

that was my thought as well. in the section about the down sides, they do mention though that they should be able to run anything that compiles to web assembly. they specifically call out rust and go.

wasm really has quite a lot of interesting and unexpected applications! (though of course non-wasm tech can accomplish the same thing, but it’s nice that we are getting close to some form of universal binary format)

So hyperthreading and multicores are a security mess, but a JavaScript VM is going to safely allow "10,000 tenants" on a single machine?


Hyperthreading, multicores, and rings were all stability boundaries hijacked to be security boundaries. It'll never work.

I think it's very reasonable that the best possible security is not implemented in microcode.

I've been closely following CF Workers and have KV beta access, but the mere fact that we had to ask and wait for access to KV essentially made our decision for us to not use workers.

When will KV be out of beta? It's hard to commit to a platform like CF Workers when it's obviously still not intended for mainstream usage and there is no public timeline.

Hi! I work on KV at Cloudflare. Thank you so much for your interest in it, and for considering Workers at all. I'm pretty convinced if you keep considering it when situations allow, you'll choose it eventually when the moment is right for the product and your company.

I'm happy to give you beta access for you to experiment. That said, we take the distinction between a beta and GA very seriously. We don't want you relying on software we're not, at the very least, using in production ourselves.

That day will come very soon! We're moving several projects into KV (allowing us to move them out of centralized data centers), and we expect KV to track GA in the next few months.

By the way kentonv, thanks for Protobufs and Sandstorm. I work with amir and I just wanted to provide a positive note: it's cool tech. Amir is correct that we still don't feel like the KV store is mainstream. When does the KV store go GA?

> we can host 10,000+ tenants on one machine -- something that's totally unrealistic with containers or VMs.

This is definitely possible with containers. Docker has historically had density problems for a variety of reasons (its architecture as well as Go being awful for systems programming in this context). But you can definitely get this density with a simple container runtime that doesn't have large long running processes.

Saying you can't run 10k containers is like saying you can't run 10k processes on Linux.

I'm sure that it might be somewhat faster to have it supported in V8 directly, but I don't want people to think such density is impossible with containers. (And with page-cache sharing you'd be able to not have to load 10k V8 programs into memory as well.)

I think you are missing the point. Regardless of the number of processes you can run, the overhead of starting them and context switching between them is dramatically higher than the equivalent operations within a single V8 process.

The part of the comment I was responding to was talking about it being impractical to have that many containers running, which isn't true. Later I mentioned that there are probably other benefits (having everything in one process does remove the need for TLB flushes and so on), but that isn't the only thing they said. They didn't say it's drastically faster, they said it's "unrealistic" which simply isn't true.

Also things like page cache sharing would make a huge difference to the real memory cost of having many processes. When you run 10k programs you don't have 10k copies of glibc.

Out of interest, why do you say that Go is awful for systems programming in this context?

How do you protect two isolates from interfearing with one another if they land in same process? If two of them happen to both start a calculation that takes 50ms of cpu time, do both calculations end up taking 100ms to complete? Or do you prevent concurrent execution some how?

At 1/10,000th of a server per tenant (assuming 10,000 tenants), you can achieve "hard" isolation and far more CPU/memory per tenant using a TinkerBoard or something like that per client, charging a flat $2/mo with no hourly fees. That's a business model that I see materializing that could eat into your business.

EDIT to clarify: I'm assuming they can do 10,000 tenants per server. If they did 1 tenant per TinkerBoard and charged a flat $2/mo with no hourly fee, that would be an interesting business model IMO, and it achieves hard isolation between tenants.

Scaleway out of France has been doing baremetal cloud nodes for years. I used them during the beta and they are fantastic (despite slight latency due to being in France). Their smallest baremetal node starts at €3/mo for 4 ARM cores/2GB RAM/50GB SSD. Pretty cool infrastructure to be able to auto provision physical nodes on demand, but I think it's a different use case from the whole lightweight serverless processes on demand thing.

If only they supported NixOS... I would use them if/when they make the leap.

Are you sure they don't? NixOS can be installed on top of any Linux distribution, unless the hardware is really weird.

ETA: It looks like it is complicated, but might work. See https://nixos.wiki/wiki/NixOS_on_ARM/Scaleway_C1

We replicate each tenant to thousands of servers across more than 150 locations worldwide. That's why we need to support so many tenants per server.

Ah. I see. TIL something about CDNs. Thanks. Good stuff.

I think you're assuming that Cloudflare has only one server?

Nope. I’m assuming they can do 10,000 tenants per server. If they did 1 tenant per 1 TinkerBoard and charged $2/mo for it flat no hourly fee that would be an interesting business model IMO and it achieves hard isolation between tenants.

But a Workers customer gets to use more than one of their servers at once....

You can have multiple tenants per low cost edge device. ARM servers are finding so many such applications. Not sure what is the hang up?

(I sit next to Kenton)

I would actually argue with #1. The truth is, this appears to be better tech even if it isn't globally distributed.

Lightweight serverless functions with no cold-start are a really great primitive to build on. We figured them out in this way to enable global distribution, but that's a bonus, not necessary to really love the concept.

Well done! Very creative solution to the problem and using the V8 system as a starting point is brilliant!

> The idea that you can have a stack of cloud services that depend on each other without incurring any latency or bandwidth costs will, I think, be game-changing.

This looks to me like just another form of vendor lock-in.

Running 10k containers on one machine is totally realistic though. This is just yet another language vm (e.g. jvm) isolation all over again.

Does your implementation support vectorized SIMD or SSE instructions?

That's cool. Do you use any of your tech from sandstorm days?

The marketing around “No VM” is interesting. It sounds like you’re relying on V8. What did you do instead of using its VM? Precompile? Or did you reimplement V8s isolates in another VM-less system? Or do you mean more specifically no OS-level VM?

Disclosure: I work at Google Cloud.

Can you say more about the assumed security model here? We built gVisor [1] (and historically used nacl or ptrace) precisely because things like Isolates aren't (historically?!) trustworthy enough for side-by-side. Plus the whole V8-only ;).

Edit: But let me say, I still think this is awesome (and love seeing more people jumping into the "just run code" space).

[1] https://cloud.google.com/blog/products/gcp/open-sourcing-gvi...

Hi, tech lead of Workers here.

This is tricky to answer concisely because we've implemented a huge number and variety of defense-in-depth measures. (Plus, I'm in transit right now and can't type much -- maybe I'll edit to add more details later.)

First, we believe Chrome has done a pretty good job hardening V8 over the years, and we get a lot of comfort knowing there's a $15,000 bug bounty for V8 breakouts. We update V8 continuously, tracking the version shipping in Chrome.

That said, we obviously don't simply rely on V8 for everything. We've modeled what a V8 breakout is likely to look like, and added a variety of mitigations.

Presumably, a V8 breakout bug is likely to allow an attacker to run arbitrary native code within the Workers runtime process. That's bad, but they will quickly run into some barriers to weaponizing it. For example, we obviously run with ASLR, so an attacker looking for other isolates' data would be flying blind and would likely raise segfaults. Any segfaults in production raise an alert and are investigated. Similarly, we run a tight seccomp filter that denies all filesystem and network access, and if an attacker ever tries to invoke such syscalls, it will raise an alert and be investigated.

It's worth noting that we do not allow eval() (or any other mechanism of "code generation from strings"), hence all code we execute has to have been uploaded through our code deployment pipeline. This implies we have a copy of all code. When a segfault raises an alert, we immediately look at the code that caused it. Anyone using a zero-day against us is very likely to burn their zero-day long before they manage to pull off a useful attack.

You're probably also wondering about Spectre. Here's a previous comment of mine on that topic: https://news.ycombinator.com/item?id=18280156

Again, this is just a couple of the things we're doing... there's really too much to list in a HN comment. I hope to find time to write this up more formally in the future. :)

How do you prevent side-channel attacks that leak secrets, implementations, or even affect other scripts' random number generation?

Given the comfort in the bug bounty, and that bugs are inevitable, it seems like Cloudflare is willing to risk some exposure to upcoming exploits. How do you feel about this in an era of sophisticated nation states that I’m sure would have a keen interest in being “inside” Cloudflare?

Please see the comment I linked about Spectre specifically: https://news.ycombinator.com/item?id=18280156

I actually think our approach to Spectre -- focusing on preventing the observation of the side channel, rather than blocking speculation -- is likely to be more robust against Spectre variants that haven't been revealed yet. Case in point, we developed some of our core mitigations before we knew about Spectre at all.

Spectre affects containers and VMs too.

Spectre is one class of side-channel attack. When you’re in-process, I presume there will likely be many more. I meant it as a more general question.

> we do not allow eval() (or any other mechanism of "code generation from strings")

This is interesting, because preventing code generation from strings goes beyond `eval()` and `new Function()` - it also includes things like the output www.jsfuck.com can produce (which can indirectly call things like `eval()` and `new Function()` without ever mentioning `eval` or `new Function` in their source). How do you prevent things like that?

V8's embedding API provides an explicit option to disable them: SetAllowCodeGenerationFromStrings(), around line 9097 of v8.h.

If you're doing it at the interpreter level, it's pretty easy to prevent even the most obfuscated versions of such functions.

You can entirely remove eval such that even if code contained a very obfuscated form of `eval("foo")` it would get "eval is not defined".

Since they're not just sanitizing their input, but rather are forbidding things at the runtime level, things like jsfuck.com are totally irrelevant and should have no bearing on their security.

They're likely not blacklisting strings but actually removing things from the runtime.

Hi kentonv!

Do you trust the model enough to get Workers and Workers KV PCI-DSS Level 1 certified?

> This implies we have a copy of all code. When a segfault raises an alert, we immediately look at the code that caused it

Pretty sure if you run k8s or VMs your company's IP isn't exposed to random engineers at the provider.

I'd be very surprised if you managed to crash a GCP/AWS/Azure hypervisor (or the equivalent in their PaaS offerings) and they went "well, can't look at what's going on, too bad", and their terms of service almost certainly have the leeway to do that as part of a necessary and structured process.

I’m not sure gVisor was incepted because Isolates didn’t provide strong enough isolation - wasn’t it because Google was looking for a way to provide stronger isolation for containers (i.e. not V8-only) without the overhead of traditional virtualization?

That is, totally different use case... going with V8-only in return for some better non-functional metrics is a trade off Cloudflare made for a serverless platform, but not one that Google (or anyone else, probably) would make for a container-based platform?

Why aren't they trustworthy side-by-side? Due to lack of OS layers between them thereby making a cross-context vuln require only one step? Or inability to govern/throttle their resource use? In some instances in Chrome, aren't v8 context boundaries the only security between two scripts in the same process (though I know many second level domains get their own process, I don't remember what causes them to be shared)?

I would be very interested in an answer concerning resource accounting/control. It is possible to meter .wasm processes by transpiling their source to call an external metering function every time a code block is entered: https://github.com/ewasm/wasm-metering

I have not seen something like this demonstrated in a Javascript/high-level-language context yet.
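A toy version of the same idea in plain JavaScript, with the metering call injected by hand (a real tool like wasm-metering rewrites bytecode to do this automatically; all names here are made up for illustration):

```javascript
// Toy gas metering: the host charges a fee every time a code block runs.
function makeMeter(budget) {
  let gas = budget;
  return {
    charge(n) {
      gas -= n;
      if (gas < 0) throw new Error('out of gas');
    },
    remaining: () => gas,
  };
}

// Pretend this source was rewritten by the host so every loop body
// starts with a charge() call (done by hand here for illustration).
const meteredSource = `
  for (let i = 0; i < 10; i++) {
    charge(1);
    work();
  }
`;

const meter = makeMeter(1000);
const run = new Function('charge', 'work', meteredSource);
run(meter.charge, () => {});
console.log(meter.remaining()); // prints: 990
```

If the budget runs out, charge() throws and the host can terminate the tenant's execution cleanly.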

An address space provides some hardware isolation.

Even if you ignore side channels, there's a lot that can go wrong in the software only case. I'm pretty sure that site isolation as a feature predated Meltdown and Spectre (though the default changed).

Of course a language-specific sandbox with a highly restricted domain-specific API will be faster than an OS container or VM. The tradeoff is that it can run fewer applications unmodified, so fewer people can use it.

I think Cloudflare workers are a cool implementation of domain-specific scripting. I’ve used it and it’s great. And with their scale I’m sure they’ll advance the state of the art in various ways. But there’s no radically new insight here.

Yeah, there's a ton of them in the literature. Anything enforcing memory safety (eg Softbound+CETS or SAFEcode), control-flow integrity (esp Code-Pointer Integrity w/ segments), data-flow integrity, information-flow control, etc might be useful to build on here. Many are designed for C, too, which is closer to metal than Javascript, has tons of automated checkers (eg RV-Match from RVI), and a certifying compiler (CompCert).

The other drawback is you have fewer layers of protection if operating in process with language-based and/or tactical mitigations. The MMU can isolate what we call known unknowns where something goes wrong (e.g. a bitflip or RAM decay on the protection mechanism), the app goes rogue, and it's still contained a bit. That's why the strongest approach from high-assurance security is still separation kernels: microkernels designed specifically for security that fit into L1 caches, are often mathematically verified in some way, support individual apps running in their own address space optionally on a language runtime, and support VMs for legacy apps. Muen is an example of that approach for x86, seL4 an example mainly for ARM, and INTEGRITY-178B a commercial example for PPC (beware marketing hype).

> The other drawback is you have fewer layers of protection if operating in process with language-based and/or tactical mitigations.

Just because one abstracts-away the sandbox from the users doesn't mean that the sandbox can't be used as a layer of protection. The wiser approach would be to use an additional layer, though it's abstracted away from the user's point of view.

I agree. I'm just emphasizing it since many people think safe languages can be enough security, because they forget attacks that break abstraction gaps. One benefit of Spectre/Meltdown and Rowhammer was helping me counter that more easily. :)

The real hope is people start building applications this way. That's the innovation. The current serverless technologies they are using provide process level APIs which are fundamentally unnecessary for their application, and when you break free of that things change dramatically.

With the advent of WASM, the mentioned sandbox is no longer language-specific. Threads and garbage collection are coming soon to WASM, which should open up most relevant languages in use today to this more efficient sandbox.

I agree, wasm is a notable step forward. But it's not magical either. For example you still need to expose system functionality to the application. That means either exposing existing OS APIs (which gives you compatibility at the cost of security and speed); or defining new APIs which take advantage of the properties of wasm (better security and speed; worse compatibility).

As far as I can tell, the wasm community is still primarily focused on running in the browser. There is an explicit goal of supporting "non-web" targets, but no consensus on how to expose system APIs, let alone a full implementation.

Arguably it's still language-specific: the difference is that the language is now WASM bytecode instead of JavaScript. You could already compile from other languages to JS.

> With the advent of WASM, the mentioned sandbox is no longer language-specific.

Which removes many of the advantages.

How do you figure?

One issue might be that Cloudflare won't have the source and can't recompile the code.

But perhaps web assembly is easy enough to decompile?

wasm doesn't, as of now, have any means of code generation independent of the JS API. There is also no way to obfuscate call targets or corrupt the heap outside the module's linear memory. Any sandbox-breaking bugs that do not involve JS would be codegen errors, AFAICT; in that case what you need to analyze is just the wasm blob and its interaction with the codegen pipeline.
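To illustrate the point that wasm only enters the engine through the JS API: here is a hand-assembled module (the standard minimal encoding of an exported i32 add function) compiled and instantiated from JS.

```javascript
// Hand-assembled wasm module, equivalent to:
//   (module (func (export "add") (param i32 i32) (result i32)
//     local.get 0 local.get 1 i32.add))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // \0asm, version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32,i32)->i32
  0x03, 0x02, 0x01, 0x00,                               // one func of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add"
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section header
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0/1, i32.add, end
]);

const mod = new WebAssembly.Module(bytes);   // validation + compilation, via JS
const inst = new WebAssembly.Instance(mod);  // instantiation, via JS
console.log(inst.exports.add(2, 3)); // prints: 5
```

Validation happens at Module construction time, so a malformed blob is rejected before any code is generated.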

Isn't the point of running these serverless components on data-centers closer to the edge to reduce latency in emerging IoT applications?

My question is why would I want to run an unmodified application on this? Because of existing business logic in the form of shared libraries that I'd like to leverage, unmodified?

> My question is why would I want to run an unmodified application on this?

Complete straw-man here, but this is what I envision:

Ignorant businessman buys a new silver bullet from a (potentially ignorant or malicious) salesperson or idea broker, then wants you to "make it work" on the "serverless cloud".

Most B2B software provides no reasonable method of determining if their product(s) actually work. So changing the amalgamation of your boss's silver bullets (your business application) requires the rebuilding of a hard won set of folk knowledge. This task is terrifying, unproductive, potentially impossible (non-permissive licenses), and something that most developers in this situation would like to avoid.

Of course there are decent ways to avoid and/or address situations like the one described, but this seems close to what I've witnessed in industry.

I actually have experienced basically the opposite sort of thing.

There are monolithic applications and overly restrictive central IT processes where the “folk knowledge” you describe is basically a job-security obfuscation tactic for developers to wield political control and act as credit-mongering constraints on what other developer teams might want to do, under the guise of it all being for the safety or stability of the monoliths (which are already unstable and unsafe).

Particularly in cases where multiplicity of different runtime environments is critical for innovation, this central IT bureaucracy can be totally killer.

The worst is when outside consultants come in and just reinforce what the central IT people are already stonewalling with.

For this reason, it can actually be great when new silver bullets come in from consultants who are able to break that hegemony. Sure, the silver bullets are dipshitty business crap, but if it at least offers a hook for using better modern tooling, you can successfully not die from the central IT crap.

Then by showing the results of the new tools, you can undermine the “folk knowledge” gate keeper people, and force them to be honest about the declining business value of protecting the so-called stable legacy monoliths.

I can’t believe Cloudflare would run untrusted customer code side by side with that of other customers. V8 isolates sound great but did the Chromium team have this threat model in mind when designing them?

Not that long ago Cloudflare was bitten by a nasty bug where their own parser mixed up HTTP responses from different customers https://bugs.chromium.org/p/project-zero/issues/detail?id=11...

Cloudflare’s response to this incident did not inspire a lot of confidence. What I am trying to say is that I won’t trust their workers implementation unless a competent external security researcher audits it.

Edit: Related discussion https://news.ycombinator.com/item?id=18417257

I'd have to agree with this. It sounds to me like Cloudflare is sacrificing security for performance in a fairly unpredictable manner. I also think the "Disadvantages" section of the article casually skips the comparison of security between Docker containers and V8 isolates.

What about our response was lacking to you? I can tell you from the inside that huge amounts of code are being moved to rust to prevent that type of vulnerability from ever happening again.

> What about our response was lacking to you?

Several observations recorded in the Project Zero report about CF's response were problematic:

"I had a call with cloudflare, and explained that I was baffled why they were not sharing their notification with me."

"They gave several excuses that didn't make sense, then asked to speak to me on the phone to explain. They assured me it was on the way and they just needed my PGP key. I provided it to them, then heard no further response."

"I can already see that the (huge) list does not contain many pages that still have cached content from before the bug was fixed."

"Cloudflare did finally send me a draft. It contains an excellent postmortem, but severely downplays the risk to customers."

Granted, I have not been keeping up with what has changed at CF since then so maybe things are better. But the decisions taken wrt Workers doesn't inspire a lot of confidence. Even a Google Cloud engineer mentioned that Isolates probably aren't trustworthy enough to run untrusted code side-by-side https://news.ycombinator.com/item?id=18417257

For quite a while now I've wondered why we do this. We take a physical computer and put a program on it that slices the machine up into multi-tenant abstractions called 'processes' so you can run more than one. Then somebody runs just one program on that machine in a process that uses all the machine that then slices the machine up into multi-tenant abstractions called 'X' (for various versions of X - isolates, containers, Erlang Actors, Database triggers).

If the Unix process abstraction is no longer fit for all purposes, why don't we get around to de-layering the operating system so that the Unix process abstraction is just another 'Virtual Machine' sat on top of something more fundamental?

Is it time for the return of the pure hypervisor?

You're asking for unikernels, which have had a recent renaissance: http://unikernel.org/projects/

Some of the density advantages depended on Paravirtualization, which is quite difficult to make safe against Meltdown: https://xenbits.xen.org/xsa/advisory-254.html

> If the Unix process abstraction is no longer fit for all purposes [...]

Perhaps I'm misunderstanding you, but the Unix process abstraction was never meant to isolate processes from each other. It was designed to allow multiple processes to run on the same machine.

On the other hand, the Unix OS was indeed designed to isolate itself from processes running on other Unix OS'es, which is exactly why we use the entire OS as the container of processes that need to be isolated.

> Simultaneously, they don’t use a virtual machine or a container, which means you are actually running closer to the metal than any other form of cloud computing I’m aware of

I recognize this is just a little marketing (which is fine), but I'm not sure what this is supposed to mean any more.. containers have identical efficiency to running on a raw VM, and VMs only trap to emulation in very few circumstances (e.g. IO) any time in the past decade. Running pure compute load in a container, guest VM or host VM should have almost identical performance.

Running a single tenant in a container or VM should have the same performance as bare metal. The problems come when you want to scale the number of tenants, incurring RAM and context switching overhead. On these measures, containers are quite a bit lighter than VMs, but still quite a bit heavier than isolates. When you're only aiming to split a machine 10 ways, this might not matter, but we're targeting splitting a machine 10,000 ways, which calls for entirely new tech.

(Disclosure: I'm the tech lead for Workers.)
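To put rough numbers on that scaling argument, here's a back-of-envelope sketch. The per-tenant overheads below are illustrative assumptions for the sake of arithmetic, not Cloudflare's measurements:

```javascript
// Illustrative per-tenant memory overhead (assumed, not measured):
const tenants = 10000;
const mb = (x) => `${x.toLocaleString('en-US')} MB`;

const vmOverhead = 500;       // a small VM per tenant
const containerOverhead = 50; // a lean container per tenant
const isolateOverhead = 3;    // a V8 isolate per tenant

console.log('VMs:       ', mb(tenants * vmOverhead));        // 5,000,000 MB
console.log('Containers:', mb(tenants * containerOverhead)); // 500,000 MB
console.log('Isolates:  ', mb(tenants * isolateOverhead));   // 30,000 MB
```

Even with generous assumptions, splitting a machine 10,000 ways only fits in commodity RAM when the per-tenant cost is isolate-sized.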

Especially since containers use vanilla processes and light-weight access control mechanisms that I wager V8 uses some of.

But they are running multiple client's services in the same process, so that's how they're saving - less context switching, smaller memory footprint, etc.

I'm interested in how isolated the client's applications actually are...

Well, theoretically, perfectly. Practically, there's some exploits lingering in there for sure. But that's true for virtualizers too, so...

As the article says, faster spinup (since V8 is already running) seems to be the customer's greatest advantage; saving an order of magnitude of memory is certainly a big deal for the operator.

I think since you can compile pretty much anything to WebAssembly, porting over some existing tools might even be viable.

As someone who works for a large CDN and who has been working in the networking space for over two decades, the one thing that really bothers me about CF is how hyperbolic they are on their blog and twitter.

>> Unlike essentially every other cloud computing platform I know of, it doesn’t use containers or virtual machines.

This is light in comparison to their posts about QUIC, anycast geolocation, etc.

They surely have done a lot of interesting work, but they'd lead you to believe they're hosting half the internet, and invented every advancement to CDNs in the last decade.

I'm sorry you feel that way. In general we feel like our biggest target with this language is the millions of people who wouldn't care about TLS 1.3, QUIC, etc. if we didn't make it clear how important they are. To us it's not about the people who know and love CDNs; it's all the people who don't that we're trying to speak to.

The point about this approach being "closer to the metal" than other cloud providers definitely brought this talk to my mind. I just hope the nuclear war he also predicted for 2020 doesn't come to pass :/

Indeed! Amazing talk and amazing mind, thanks for sharing. Reminds me of Rich Hickey (whom he mentions in the talk), similarly independent and deep thinking from the first principles.

This will happen sooner than in 2030s: https://github.com/piranna/wasmachine (WebAssembly on FPGA)

I know Dart isn't that loved on here, but I had a pretty similar idea a couple of weeks back, a PoC serverless platform using Dart's Isolates, which are shared-nothing: https://github.com/thosakwe/serverless_dart_platform

I know that the article briefly addressed security, but in my head, I can't work out how they securely manage multiple tenants within the same process. This isn't a knock on them, it's a lack of knowledge on my part, and I'm wondering how that would even work.

Isolates in Dart can be run based on different dependency graphs (i.e. you can have an Isolate for user A's project, and a separate one for user B's, within the same process). However, things like the current working directory are shared by default. There's an IOOverrides class that lets you change this contextually, so theoretically you could hack something together that patches the entry point code to use it. I'm not sure how that could work in Node, though. What if someone were trying to overwrite another user's files?

Overall, though, the #1 thing that I can't reconcile in my head is that if all the isolates are running in the same process, wouldn't they all have the same permissions? What is there to prevent someone from touching another person's files, or, if they run in the same process as the server, more sensitive data? AFAIK you can't run a Worker/Isolate as a different user within the same process?

Not only that, but what about fork(2) bombs, etc.? The only thing I can think of is for Isolates to run under the context of some user who not only has no home directory, but also cannot spawn processes, or read/write/delete files.

Lastly, the only other thing I can't work out is whether the Isolates run in the same process as the server (and if so, how they do this securely), or how they communicate with the separate process, given that technically any Isolate could console.log (maybe via TCP sockets?).

The whole CloudFlare workers concept is super cool... I just don't understand how it works. I feel like I'm missing some key here.

Like JavaScript running in your browser, workers don't get direct access to files or other resources on the machine. We provide a limited API which only allows safe operations. Currently that means a Worker can send and receive HTTP requests, and read and write from Workers KV (a distributed storage system). We'll add more storage options in the future, but not raw disk access.
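A sketch of what that surface looks like from the Worker's side. The few lines of shim exist only so this runs under plain Node; the real runtime provides addEventListener, Request, and Response natively, and the routing is hypothetical:

```javascript
// Tiny shim so this file also runs outside the Workers runtime.
const handlers = {};
const addEventListener = (type, fn) => { handlers[type] = fn; };

// The actual Worker shape: respond to fetch events; there is no fs,
// process, or raw-socket API to reach for.
addEventListener('fetch', (event) => {
  event.respondWith(handle(event.request));
});

async function handle(request) {
  const path = new URL(request.url).pathname;
  if (path === '/hello') return { status: 200, body: 'Hello from the edge' };
  return { status: 404, body: 'not found' };
}

// Simulate one fetch event through the shim.
handlers.fetch({
  request: { url: 'https://example.com/hello' },
  respondWith: (p) => p.then((r) => console.log(r.status, r.body)),
});
```

Everything the Worker can do flows through the event's request and the response it hands back, which is what makes the safe-operations-only model enforceable.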

Ahhh okay, makes a lot more sense. Thanks.

They don't provide access to the local filesystem at all.

Fastly's CEO (a Cloudflare competitor) gave a pretty great tech talk on what it means to be isolated: https://www.youtube.com/watch?v=FkM1L8-qcjU

Guessing we'll see a competitive product from Fastly in the near future.

It's important to note that for caching purposes, you can already run VCL (Varnish Cache Language) on Fastly's edge servers.

Really interesting article but the ending is a little weird.

> This might mean Isolate-based Serverless is only for newer, more modern, applications in the immediate future.

It sounds like you're implying unless your app is written with Node or Go / Rust then it must be legacy. That's not really fair to say.

Plenty of modern apps are built with Python, Ruby and Elixir. I don't know if they are all webassembly compile targets, but in either case, you should probably rewrite that to be less "better write everything in Node or you're a dinosaur!".

Cloudflare Workers seem amazing and I would love to migrate the few thousand Node.js Lambdas I have to CF Workers... But... I noticed the 1 MB limit on function size; most of my functions are between 1.5 MB and 8 MB depending on the npm packages required. Also, execution time for my functions is between 200 ms and 30 sec (loading remote data, transforming it, generating PDFs, ...), and Workers are limited to 5-50 ms!

Cloudflare Workers execution time is counted a little differently. When you call fetch() in a Worker and are waiting for remote data, Cloudflare doesn’t count that waiting towards the “execution time” limit. Other providers do.

Note: I work at Cloudflare

I don't understand why I got a downvote... I'm really excited about Cloudflare Workers and hope they will overcome the current limitations to make them usable for more use cases (including mine; this is why I provided my metrics as a hint to the Cloudflare team).

The time limits for Workers are actual CPU time, not clock time.

The article says that pushing a new Lambda@Edge function takes 30 minutes. That has not been my experience. I routinely push Lambda@Edge versions from the AWS console and the new versions typically run within a minute or so. Have others seen anything like a 30-minute propagation delay?

Lambda@Edge is built on Cloudfront and pushing any change to Cloudfront generally takes 20-30 minutes in my experience. Can you tell me more about the change you're making? Is it code or config changing?

I'm talking about using the Lambda@edge console to push a new, numbered version of a lambda function written in node.js. I have pushed 100s of versions in this way, and I would guess that the mean time until the new version begins running is on the order of one minute. By "begins running", I mean that the new version number appears in CloudWatch output.

No other changes to config.

I haven’t used lambda@edge but I have used lambda. Lambda deploys almost instantly from my experience.

Do you have plans to open source this work?

It'd be huge for p2p/decentralized apps. As of now, you can only decentralize data - this allows decentralized compute as well.

https://github.com/laverdet/isolated-vm is similar, and used by a competitor that has a similar offer

This is awesome. I noticed the "In an Isolate universe you have to either write your code in Javascript (we use a lot of TypeScript), or a language which targets WebAssembly like Go or Rust" -- this is a very interesting statement! Being able to run wasm code means it could be possible to run .NET Core apps (really all I care about in the world right now) using such a system soon, thanks to work done for CoreRT (https://github.com/dotnet/corert)

Really really interested to see where this could lead.

> In an Isolate universe you have to either write your code in Javascript (we use a lot of TypeScript) [or Wasm]

Not really: There are many nice managed languages that use JS as compile target. You can use ClojureScript, Scala, Purescript, Fable (F#), Haste (Haskell) etc.

$5 minimum. Overpriced compared to any other faas service. Can only run one worker at a time. No thanks!

There was a company Rackspace bought maybe 6 years ago that was building a semi-POSIX compatible layer on the v8 VM. They had Python running on it and it looked pretty interesting.


Not sure what progress has been made on it.

Looks like none in the past four years: https://github.com/zerovm/zerovm

Wonder if you could build something similar using Go. A goroutine's overhead is only about 2 KB. No idea how easy it would be to sandbox it.

It wouldn't be easy at all to sandbox it; that's the problem with every language that isn't Java or JS. (And I think people gave up on Java sandboxing a decade or more ago.)

WASM will probably make sandboxing across languages a lot easier. I wonder what would happen if you ran a golang wasm binary on the golang wagon WASM interpreter [1]

[1] https://github.com/go-interpreter/wagon

> I wonder what would happen if you ran a golang wasm binary on the golang wagon WASM interpreter

IIRC it won't quite work because the plumbing required for WASM on the host side isn't done there yet [0]. The plumbing is mostly in [1]. In a project I did on the side to reduce Go WASM size and startup time [2], I built just enough of this plumbing to solve my needs. Go initially targets only JS [3], so sadly those of us with non-web use cases might have to wait.

[0] https://github.com/go-interpreter/wagon/issues/69
[1] https://github.com/golang/go/blob/master/misc/wasm/wasm_exec...
[2] https://github.com/cretz/go-wasm-bake
[3] https://github.com/neelance/go/pull/7#issuecomment-377298992

Neat, thanks!

> that's the problem with every language that isn't Java or JS.

Or WebAssembly, for which many languages now have a backend.

What about JavaScript makes sandboxing easier than, say, Go, Python, etc?

There's not much that makes sandboxing _easier_ for JS, it's just that the economics favor JavaScript (because it's a useful feature for browsers and browsers are installed on billions of devices).

Just the fact that it was designed for sandboxing hostile code from the beginning.

How does that design actually manifest itself? I don't know very much about this - what technical details about JavaScript make this easier? And then why doesn't Node support this kind of sandboxing, but V8 does, if it's so much easier to support sandboxing than other languages?

The primary advantage is that it's already done, really. It's not hard to define isolated environments in almost any language you could name ("I should be able to have two runtime environments that support as much of the original language as possible, and they should not be able to affect each other" - a thousand i's to dot and t's to cross, but that pretty much captures the idea), but implementing them is another thing entirely. So much of the rest of language design tends to cut the other way, making sure that things are hooked together deeply for the convenience of the programmer.

I'm not particularly convinced Javascript is that much better than any other language as the language goes, but the implementation is far ahead of almost anything else. I know of other implementations of this idea in other languages ("Safe" in Perl [1], various attempts at sandboxing in Python which have generally failed, Safe Haskell [2], many others), but without the browser use case driving them forward, they tend to be poorly tested and even decay over time. It tends to be a use case popular enough to get some stab at support, but not popular enough to quite result in a robust, real-world-tested library support. The browser is an exceptional use case.

[1]: https://perldoc.perl.org/Safe.html

[2]: https://ghc.haskell.org/trac/ghc/wiki/SafeHaskell

Take a look here, at the V8 embedding docs: https://v8.dev/docs/embed

It specifically covers isolates vs. contexts, which should help contextualize it.

> Wonder if you could build something similar using Go,

FTA “In an Isolate universe you have to either write your code in Javascript (we use a lot of TypeScript), or a language which targets WebAssembly like Go or Rust.”

They were talking about reimplementing isolates and a similar multi-tenant cloud computing environment in Go.

No, the previous commenter was correct. You can run any WASM targeting language on Workers.

I agree that from this article it seems possible to run Go on Cloudflare Workers by targeting WASM; it's still worthwhile to consider what a native Go implementation would look like.

I don't know very much about Go, or sandboxing, but for starters you might need to do something like: disable syscalls, and disable the ability to do operations that could hog resources, such as spawning more goroutines.

Native Client could do that.

But this thing only runs JavaScript. The constant AWS bashing in this article doesn't make sense when the only fair comparison they can make here is AWS Lambda with a Node.js runtime against Cloudflare Workers.

AWS Lambda@edge only supports the node.js runtime, so a comparison between workers and lambda@edge isn't so unreasonable; they have very similar execution models, I think.

You can run any WebAssembly targeting language in a Worker.

If you plan on using Javascript on Lambda the bashing makes perfect sense.

So if you want to write serverless functions in Go or Python, you go somewhere else. But if you, as most others do, write them in JavaScript, then this is a relevant platform to consider.

So I'm guessing that in order to use modules from npm you use webpack?

And what's available is the same as a Service Worker in Chrome. So you can use HTTP fetch but there are no raw sockets, etc. Does it support WebRTC? (Not sure if there's a way to actually use that on the server, but maybe there's some weird use case for communicating between Workers.)

Also wondering what options there are for files or databases etc.

It sounds like the design is for them to handle individual requests, but just out of curiosity, how long can the workers keep running?

Yep, I've got an index.js file that creates a simple API gateway via a basic switch() on the URL pathname. Then individual files for each function. Bundle them into a single script with webpack and copy that into the Workers UI (there's an API for deployment too, e.g. you can use the Serverless Framework).
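For anyone curious what that looks like, here's a minimal sketch of that switch()-based setup. The handler names and paths are my own illustrative placeholders, not the actual project:

```javascript
// Illustrative per-route handlers (in the setup described above, these
// would live in individual files and be bundled together by webpack).
function handleUsers(request) {
  return new Response(JSON.stringify({ users: [] }), {
    headers: { 'Content-Type': 'application/json' },
  });
}

function handleHealth(request) {
  return new Response('ok');
}

// The simple API gateway: dispatch on the URL pathname.
function route(request) {
  const { pathname } = new URL(request.url);
  switch (pathname) {
    case '/api/users':
      return handleUsers(request);
    case '/health':
      return handleHealth(request);
    default:
      return new Response('Not found', { status: 404 });
  }
}

// In the Workers runtime this is wired up as a fetch handler; the guard
// lets the same bundle load in environments without a fetch event.
if (typeof addEventListener === 'function') {
  addEventListener('fetch', event => event.respondWith(route(event.request)));
}
```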

1MB max size, max 5-50ms CPU time, max 128MB RAM: https://developers.cloudflare.com/workers/writing-workers/re...

That said, I've got a worker script running a pass-through WebSocket to a GraphQL origin server, and it keeps the connection open.

I'm doing authentication within the worker too. Long-term I can see ability to run absolutely everything there: microservices, GraphQL server (Apollo has a beta), auth, static SPA files, SSR etc. $0.50 per million calls ($5/mth minimum). And they intend to be within 10ms ping of 99% of the global population.

If they can work out how to do databases on the edge (they're working on it), then I can see a future where you could build scalable apps that serve millions of users for maybe less than $100/mth across the entire stack.

If you know how long a request takes, you can roughly calculate how many requests per second your server can handle... When working in JavaScript/Node.js, one thing that is both sync and very slow (almost 1ms) is console.log! So next time you run benchmarks, try removing the console.logs =)
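You can check this yourself with a rough micro-benchmark sketch like the one below (the absolute numbers vary a lot depending on whether stdout is a terminal, a pipe, or a file, so treat them as indicative only):

```javascript
// Rough sketch: average cost per call of console.log vs. a no-op.
// Uses the global `performance` timer available in modern Node.
function avgMs(fn, iterations) {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn(i);
  return (performance.now() - start) / iterations;
}

const N = 1000;
const logCost = avgMs(i => console.log('request', i), N);  // synchronous write per call
const noopCost = avgMs(i => i + 1, N);                     // baseline loop overhead

// Report on stderr so the result isn't mixed into the benchmarked stream.
process.stderr.write(
  `console.log: ${logCost.toFixed(4)} ms/call, noop: ${noopCost.toFixed(4)} ms/call\n`
);
```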

It's a mystery to me why we need millions of instructions just to put "hello world" in a pipe though. I'm thinking of learning low level programming, but I'm too scared of what I might find.

That's what's nice about programming retro computers like the C64. The number of layers you have to dig through is small and doesn't lead to a lot of unnecessary bloat or lag.

The article says: "Unlike essentially every other cloud computing platform I know of, it doesn’t use containers or virtual machines. "

Surely there were others before, like Azure Service Fabric?

Google App Engine also originally sort of relied on language sandboxing but it still had a separate process for each tenant.

I’ve seen the automated provisioning and scheduling of jails and zones for application deployment, pretty much since they were invented. Automated UID provisioning for isolation and resource constraint and application package deployment, at least since the nineties.

Ultimately most of us want to put files on a server and start a process. There’s just an endless variety of constantly re-invented ways to do that. Choose your scaffolding.

Jails and Zones are essentially containers - they're ways of isolating process from each other by basically constraining what they can see and change, at the kernel level.

What they're describing is actually running code from multiple people in the same process.

A process is any instance of a program that's being executed. That definition doesn't change whether it's a program running as an ordinary Unix process, as an Isolate in a V8 runtime in a container on a virtual machine on a bare-metal blade in a DC rack, or as a hypothetical MIPS-like CPU on my whiteboard.

By "choose your scaffolding", I'm just saying "how much of the OS kernel would you like to re-invent?"

That isn't a wholly cynical position. Doing so might introduce new capabilities or cost efficiencies, which seems to be Cloudflare's pitch. But very often we get excited by the possibilities and lose sight of the tons of additional stuff now required to achieve quite simple things. In the worst case (e.g. VMware) we end up re-inventing hardware abstraction, scheduling, networking, memory management, and filesystems i.e. most of the damn kernel. Docker is a less egregious offender but still deserving of the "too much scaffolding" label.

In the very best outcome, the capability gets stripped down to the core enhancement and pushed back into the expectations for a base OS, which is for example why practically every OS now comes with a hypervisor and a container mechanism.

Yeah, and that still exists doesn't it?

Oracle provides bare metal instances on demand on the Oracle Cloud Infrastructure platform. There's literally no hypervisor or anything in the way. Other cloud providers have bare metal instances as well... It really seems like such a bizarre claim that they made.

They are implicitly talking about Serverless / FaaS from other cloud platforms.

Lambda, Google/Azure functions, arguably Google App Engine etc.

So awesome to see this on the front page just a day after I was complaining about Docker on another HN thread.

Seems like a very pragmatic system. Get some great benefits for a reasonably well defined and common use case (JavaScript apps that don't need to run external binaries), without doing anything crazy just to support apps that don't quite fit in the architecture. It might not work for everything, but for many apps this will be a very simple and low overhead solution. Kudos!

Worker execution is limited to 15s of wallclock time. What happens if the client has a slow network connection and takes >15s to receive the HTTP response of a worker?

It's actually only required that all requests _start_ within 15 seconds; they can take longer before completing.

I'm working on an experimental function as a service based on CGI apps. It has a lot of advantages of the article:

- Process isolation

- Close to hardware

- No cold start

But it also has some advantages like:

- Any language, including compiled languages

- Local testing, backed by a standard

I am still experimenting with the scaling potential; my biggest concern is memory requirements. Currently I'm writing the project's blog using the project, and then am going to do some performance tests.

It's hosted on a 5 dollar digital ocean instance at bigcgi.com.

[edited for formatting]

How would this not suffer from similar issues to Lambda? The start times are going to be worse per request than for application servers today, because all of the initialization can no longer be amortized. Long before containers became the hot new thing, people were using FastCGI to amortize these costs and, importantly, to scale: CGI requires expensive process creation on every request. You can't take advantage of fast, usermode context switching like with goroutines, or event-based architectures like Node.js, because each request is a process in old-school CGI.

And doesn't that put you back at square one with isolation? Cloudflare is running multi-tenant loads; you ideally want fine-grained control over resource utilization and access controls. With CGI, you pretty much have to reinvent Docker to get the same level of isolation...

I'm not sure I follow how this is superior to other solutions except for that CGI is easier to test locally.

A FastCGI implementation is something I've looked into if plain CGI isn't fast enough. Initially I want to see exactly what the performance comparison with plain CGI is, and then iterate from there.

As for resource utilization/constraints, it runs on FreeBSD and uses rctl to limit memory usage and the number of processes. One thing I want to find out is how to balance per-user resource constraints with the total load of the shared server.

Oh, I see - so FreeBSD jails are used for this. At least for process isolation that seems like a solid approach.

Still, I do wonder how much of the problem with containers is container overhead. To me a big difference is about what costs can be amortized. With CGI, pretty much every cost is paid at every request. With long running daemons, the per request overhead is removed. With Cloudflare's functions, the scheduling, request, and startup overhead are (theoretically) removed, leaving mostly just execution.

With just CGI there's some win since you can drop the scheduling overhead, but I think you would probably lose that advantage as soon as you hit the upper limit where forking per request stops scaling. With FastCGI you can enable better scaling but since you now have to manage long running processes, you are officially in the scheduling business, though you could probably beat container scheduler like Kubernetes in terms of raw overhead.

I don't know. Maybe this is a good idea, but I'll admit my first impression was mostly "isn't this just CGI as it always was?"

I read through nearly all of the docs looking for testing guidance. How can I include a Cloudflare Worker in my automated integration tests?

The implication this article makes that "containers" are somehow further from the metal than sandboxed javascript is disingenuous.

How so? Context switching has a cost, and the solution outlined within seems to achieve similar functionality while not paying nearly as much. The same is true for quota/resource management (what cgroups etc. would provide you with containers on Linux).

I haven't used this, just read the same article you have, but the claim seems reasonable.

You can run javascript in a container and its distance from the metal is no different than if you just ran the javascript outside of a container. Containers are just processes with limited visibility into the rest of the host, they're nothing like VMs.

We've deliberately incurred the context switching costs of using multiple processes for decades because the hardware-enforced isolation of address spaces via the MMU is desirable. Of course you can gain efficiency by throwing away the multiprocess model and running separate security domains in the same address space of a single process, we're not breaking new ground here by going backwards.

Been exploring this idea in the app development context for a while now. I wished PNaCl had filled this role, but now think WebAssembly may take up this space instead as it has similar performance characteristics. The attraction of both these for us is that many languages can compile down to be run in wasm.

This tech can be a game changer for applications where latency matters more than for others. My question is about the portability of this. If I spend enough time and port certain parts of my application to this, will it be tied to the CF Workers runtime? Is it possible to run Workers on other clouds?

> An Isolate-based system runs all of the code in a single process and uses its own mechanisms to ensure safe memory access.

What property of this application allows it to share CPU time and memory more efficiently than the OS can?

The longer we go down this road, the more the sales pitches feel like Erlang.

Can someone shine some light on how secure Workers are compared to containers that use cgroups and namespaces for virtual isolation?

Are you planning to release any free plan for Workers, as Firebase Cloud Functions does? The current offer is $5/month...

Is $5/month too much?

(not OP) It is for a side project that gets very little traffic. I'm still paying that $5/mo just to play with this tech but I can't say it's not too much for my use case. I guess that's not the target market for Workers anyway

Can I get an automatic email when my Workers begin timing out?

Will there be logs or graphs to help investigate such issues?

Do workers allow you to run init code on cold-boot only, or do you have to do all of your init with each request? If the latter, presumably that would also mean that a language like Go (compiled to wasm, obviously) would have its runtime start up with each request?

Yes, you can run init code at cold-start time. JavaScript's global scope is evaluated once per isolate created, and you would normally instantiate your WASM modules at the global scope as well.
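In other words, the pattern looks something like this (a sketch with illustrative names; in a real Worker the expensive step might be instantiating a WASM module or parsing a large config blob):

```javascript
// Work done at the top level of the script runs once per isolate
// created, not once per request.
let coldStarts = 0;

function expensiveInit() {
  coldStarts++; // runs when the isolate first evaluates this script
  // Stand-in for parsed config, a compiled regex table, a WASM instance, etc.
  return { routes: { '/': 'home' } };
}

// Evaluated once, at isolate creation (cold-start) time.
const shared = expensiveInit();

// Called once per request; reuses the shared state instead of rebuilding it.
function handleRequest(pathname) {
  return shared.routes[pathname] || 'not found';
}
```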

So does this mean my function(s) must be written in JavaScript? Or can it run wasm assemblies?

> Even more importantly: Amazon is bad. It is monopolistic [...] Its owner should have his immoral hoard of wealth forcibly expropriated by the state before his power grows so great that all of society is warped by it.

That is a heck of a statement to make.

This was a reply to a different article, sorry.

Very cool tech. But ridiculously expensive. According to [1] you pay $5 per 10,000,000 requests, <10ms each. That works out to less than 27.8 hours of CPU time per month. And that's assuming near-perfect utilization of that 10ms window and ignoring upfront cost of Pro plan, which this is an add-on to.

Compare that to Linode/Hetzner/OVH/etc. where you get half-CPU for that money (i.e. 360 hours CPU time per month).

And yeah, sure. Not apples-to-apples. FaaS, infinite scalability, yadda, yadda. But computing http responses is still computing http responses. Is this FaaS magic dust really worth 1400+% premium?

[1] https://www.cloudflare.com/products/cloudflare-workers/
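For reference, the 27.8-hour figure above works out like this:

```javascript
// Back-of-envelope check: $5 buys 10,000,000 requests at up to 10ms CPU each.
const requests = 10_000_000;
const cpuMsPerRequest = 10;

// 10M requests * 10ms = 100,000,000 ms = 100,000 s of CPU time.
const totalCpuHours = (requests * cpuMsPerRequest) / 1000 / 3600;

console.log(totalCpuHours.toFixed(1)); // ≈ 27.8 hours of CPU per month
```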

Another reason why this is not an apples-to-apples comparison is that most people don't run their VMs at 100% utilization. In fact, I'd wager the vast majority sit at single-digit percentages on average. Linode will still charge you for the full CPU, of course.

Well, CF always charges you for the full 10,000,000 requests/mo too.

Also, it is super-common (especially on low end) to offer burstable CPU VMs [1,2]. So you are not getting charged for full CPU, unless you specifically ask for it.

[1] https://www.hetzner.com/cloud?country=us (bottom of the page, toggle between "default" and "dedicated CPU")

[2] https://aws.amazon.com/ec2/instance-types/t3/ (product details, "Baseline Performance/vCPU" column)

You aren't paying for Workers for the CPU time IMO, you're paying to be in 155 locations around the world so that your code runs closer to your user. You couldn't split a hetzner/ec2 box across locations like that. A lot can be accomplished in 10ms on Workers in my experience since they don't count waiting on network requests as part of that 10ms.

If requests only require <10ms of CPU anyway, $5 per 10 million requests is pretty reasonable to me.

Does the fact that a worker can run in any of the many data centers of Cloudflare guarantee some sort of multi-regional redundancy/high availability in case they have a problem with a region?

Yes. One of our goals with this, not yet mentioned, is to get rid of the idea of availability zones or regions. Every worker runs in every data center. Any data center can be taken offline and traffic will be seamlessly rerouted at the BGP level.

This is great news! In my opinion, you should emphasize it a bit more in your writing. It's one of the most important factors; it was the central discussion point when we were designing a platform for our webapp. We may reevaluate Cloudflare for the next iterations if you can provide this multi-region high availability for your k/v store.

This is awesome. Would be even better if the cpu time could be increased. Right now it’s 5ms for smaller accounts, 50ms for business accounts ( ~$200/mo ), or custom for large customers.

What kinds of things are you wanting to do with more CPU time?

Database and third party api access were the use cases that initially came to mind.

Do you mean total runtime or actual CPU time? Total runtime would be affected by the use cases you mention; CPU time would not (or would not be affected nearly as much, at least). A function waiting on an HTTP response is not burning CPU.

Interesting, I thought they were measuring runtime when they said cpu time. AWS Lambda is priced on runtime so I figured it would work the same. Thanks for pointing this out.

I re-read the article and it is indeed cpu time not runtime for Cloudflare Workers. Sweet!

The part of the article that clarifies this:

> This is not meant to be a referendum on AWS billing, but it’s worth a quick mention as the economics are interesting. Lambdas are billed based on how long they run for. That billing is rounded up to the nearest 100 milliseconds, meaning people are overpaying for an average of 50 milliseconds every execution. Worse, they bill you for the entire time the Lambda is running, even if it’s just waiting for an external request to complete. As external requests can take hundreds or thousands of ms, you can end up paying ridiculous amounts at scale.

> Isolates have such a small memory footprint that we, at least, can afford to only bill you while your code is actually executing.

> In our case, due to the lower overhead, Workers end up being around 3x cheaper per CPU-cycle. A Worker offering 50 milliseconds of CPU is $0.50 per million requests, the equivalent Lambda is $1.84 per million. I believe lowering costs by 3x is a strong enough motivator that it alone will motivate companies to make the switch to Isolate-based providers.

Yep: 50ms CPU time is a lot. Network calls aren't meaningful CPU time, which is why you can use Workers to call out to third-party APIs. You may want to be mindful of doing that during a user request, as it now adds a dependency & latency, but it's certainly possible to call out to a translation API or mapping API & cache the results...
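A sketch of that call-out-and-cache pattern is below. The cache and fetch function are injected so the logic is easy to run outside the Workers runtime; in a real Worker you'd use the global fetch and the Cache API, and all names here are illustrative:

```javascript
// Look up a third-party API result, serving from cache when fresh.
// While awaiting fetchFn, the worker is idle: that wait adds latency
// but does not burn billed CPU time.
async function cachedLookup(url, { fetchFn, cache, ttlSeconds = 300 }) {
  const hit = cache.get(url);
  if (hit && hit.expires > Date.now()) {
    return hit.value; // cache hit: no network call at all
  }
  const response = await fetchFn(url);
  const value = await response.json();
  cache.set(url, { value, expires: Date.now() + ttlSeconds * 1000 });
  return value;
}
```

With a plain Map as the cache, a second lookup of the same URL within the TTL never touches the network.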

> Network calls aren’t meaningful CPU time

I understand how async io works. I'm used to AWS Lambda charging for request start to end time regardless of cpu usage.

Yes, "The Network is the Computer" (Sun's motto), and the best implementation of it I know is the Plan 9 one. That means me (the user) at the center, in control of my own system.

Having everything on someone else's computer is not "The Network is the Computer"; it's simply someone else's computer. Also, it can't scale, no matter how much you work on it, no matter what wonderful tech you create/integrate. Many now talk about edge computing because of that, and I fear it can succeed for a certain amount of time, but still it can't scale.

We are people, not puppets; we need to be autonomous and social, not a herd of sheep with very few shepherds.

Containers were always crap. Everyone brags about speed and security. That's all bullshit; it's like jails for BSD. There are tons of exploits, and the same goes for speed. There is NOTHING compared to bare metal. Have you ever run benchmarks on applications? My team did, and found results disturbing enough to write an article and be king for a day. Docker is made for rock-star developers, not rational system engineers.

That is completely untrue. Containers (distinct from the Docker /daemon/) provide significant security gains.

Try breaking https://contained.af/

I know it's kinda old but think about how many exploits are out there in the jungle.


It’s obvious that additional abstraction layers will add overhead. It’s also assumed that they add liabilities, no matter what you choose. So I’m not sure how your point is really applicable to container tech only.

Containers are great for personal use only, not for the enterprise. I myself have a bunch of containers running, but besides the simplicity of deployment I don't find them useful. That's what they were made for: to save deployment time, not to make things fast or secure.

This is cool. But why should I run stuff in a Cloudflare Worker when it can run in the browser's service worker?

Different use-cases. You can do crypto operations in a hosted worker (sign/validate requests, auth clients), make caching decisions that don’t require you to consume (precious) client bandwidth, and make routing decisions based on both client properties (geo-location, auth state, etc) and server properties (backend uptime, latency, etc).

A good example is HTTP middleware in your web framework of choice: there are things it does that can’t be done safely or correctly on the client, but that you may want to do at the edge before it hits your backend entirely.
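For example, an edge auth check that rejects requests before they ever reach the origin might look like this (a sketch; the header name and token check are placeholders for real validation):

```javascript
// Placeholder auth check: a real Worker would verify a signature or
// look the token up, not just inspect its shape.
function checkAuth(headers) {
  const token = headers.get('Authorization') || '';
  return token.startsWith('Bearer ') && token.length > 'Bearer '.length;
}

// Middleware-style handler: unauthorized requests are answered at the
// edge and never consume origin bandwidth or compute.
async function handle(request, forwardToOrigin) {
  if (!checkAuth(request.headers)) {
    return new Response('Unauthorized', { status: 401 });
  }
  return forwardToOrigin(request); // e.g. fetch(request) in a real Worker
}
```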

For all the reasons you use any kind of backend. To access authenticated resources, to persist data centrally, to share information between users, to hide your proprietary code...

I would love to check this out... locally, on my own computer... for evaluation.

This is what we do at fly.io: https://github.com/superfly/fly

It's based on node.js right now. It's a bit clunky, but we have many success stories of customers running it locally, on their CI and deploying to infrastructure.

Currently the open source version is not the same as the production version (due to the distributed nature of our platform.) We're working on fixing that very soon and going all-in open source.

And now you have my attention.

What, BSD jails all over again?
