Hacker News
Eliminating cold starts with Cloudflare Workers (cloudflare.com)
115 points by kiyanwang 7 days ago | 47 comments

I've been very happy with the CF workers/kv store setup. For some use cases, it's almost perfect. A URL shortener is a good example. You get a very scalable setup for dirt cheap, and very little code to write.
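To give a feel for how little logic a shortener needs, here is a minimal sketch. It is written in Python rather than the JavaScript a Worker would actually run, the dict is only a stand-in for a KV namespace, and `LINKS` and `handle_request` are hypothetical names, not part of any Cloudflare API.

```python
from urllib.parse import urlsplit

# Stand-in for a key-value store: slug -> destination URL (illustrative data).
LINKS = {"hn": "https://news.ycombinator.com/"}


def handle_request(url):
    """Return (status, headers) for an incoming request URL."""
    slug = urlsplit(url).path.lstrip("/")
    target = LINKS.get(slug)
    if target is None:
        return 404, {}
    # A redirect is the whole job of a shortener.
    return 302, {"Location": target}
```

Calling `handle_request("https://example.com/hn")` yields a 302 with the stored destination, and an unknown slug yields a 404.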

I'm curious how much farther they might take the platform. Like, for example, is a distributed relational data store going to happen at some point? Or is that not a space they want to go?

> I'm curious how much farther they might take the platform.

Cloudflare co-founder Michelle Zatlyn likes to say "We're just getting started".

So basically, start the worker as soon as you see a domain in SNI. It's really interesting to see layer models and their abstractions - critical to the development of the internet - being removed/merged/coupled in order to eke out performance.
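One rough way to picture that overlap, as a hedged sketch and not Cloudflare's actual implementation: the loader is started as soon as the hostname from SNI is known and runs concurrently with the rest of the handshake. The function names, the sleep-based stand-ins, and the timings are all assumptions for illustration.

```python
import asyncio
import time


async def finish_tls_handshake(handshake_s):
    # Stand-in for the remaining handshake round trips.
    await asyncio.sleep(handshake_s)


async def load_worker(hostname, load_s):
    # Stand-in for loading and initializing the script for this hostname.
    await asyncio.sleep(load_s)
    return f"worker for {hostname}"


async def accept_connection(sni_hostname, handshake_s, load_s):
    """Return (worker, elapsed_seconds) for the first request on a connection."""
    start = time.perf_counter()
    # Start loading immediately; do not wait for the handshake to finish.
    load_task = asyncio.create_task(load_worker(sni_hostname, load_s))
    await finish_tls_handshake(handshake_s)
    worker = await load_task  # already done if load_s <= handshake_s
    return worker, time.perf_counter() - start
```

With a handshake of 0.2 s and a load of 0.15 s, the elapsed time is roughly the handshake alone rather than the 0.35 s a serial version would take.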

In my experience breaking layering leads to interesting optimizations. Another Cloudflare feature is using JPEG start-of-scan boundaries to guide HTTP/2 frame multiplexing.


It's a total protocol layering violation, but makes images appear on screen much faster.

This doesn't seem to be quite that simple. If I have 10 workers (or 100 or 1000) for the same domain then it has to warm them all. Which maybe it does but that seems inefficient.

For now only the route that uses domain.com.

And given that Cloudflare is a big proponent of encrypted SNI, I wonder how this will play into it :)

Encrypted SNI hides the SNI from an eavesdropper and not from Cloudflare if you are connecting to a Cloudflare-managed zone.

I love the discussion about what's the future abstraction for serverless, and love where Cloudflare is headed. Can't wait to see what's next. A couple of questions.

Any plans to support WebSockets terminated in the isolate (not just proxied)?

Any plans for local development? Some open-source, locally runnable or low-volume self-hostable stack that mirrors the global stack would, I think, really help dev workflow and adoption.

edit: just poking around their blog saw this. https://blog.cloudflare.com/making-magic-reimagining-develop...

Have you updated Wrangler lately?

It used to be super slow to upload the worker to Google Cloud but it's much faster now (that's what happens behind the scenes when you do wrangler dev).

I haven't actually used Wrangler yet, but will give it a try :)

It doesn't actually do local dev, as it uploads your Worker somewhere to run, but the workflow is quite painless.

That's kind of competitively fascinating. One of the biggest differences/advantages of Fastly's Lucet[1] based system was its extremely fast startup time. But if you can hide that startup time in the connection setup then ... why learn Rust?

[1] https://www.fastly.com/blog/lucet-performance-and-lifecycle

From Cloudflare's perspective we don't care about the language you choose. We're happy for you to use Rust or C or C++ or JavaScript or Python or ... https://blog.cloudflare.com/cloudflare-workers-announces-bro...

But yeah, this completely eliminates any question about cold starts or warm starts.

I cannot find this in the documentation (I'm sure it's there somewhere) so let me ask here. How long does a function stay warm? As in, a client makes a request that invokes a function, the function processes the request and responds to the client. Now what? Does the same "instance" of a function stay alive to handle another request for a certain period of time or is it immediately released?

I'm asking because sometimes it's necessary to do some expensive work in order to respond to the request and doing the same work on every invocation is not necessary. For example on AWS Lambda, one of our functions launches a Chromium instance which can take a few seconds, but because the function can stay alive after a request is done, the same Chromium instance is immediately ready on the next function invocation. Other use cases involve e.g. connecting a database (which I see CF is tinkering with over at [0]).

[0] https://github.com/cloudflare/db-connect

Isolates stay warm until the memory they consume is needed for something else. We evict the least-recently-used isolate when we need memory for a new one. So how long that is varies depending on how much traffic the isolate gets compared to its neighbors.

This is covered in more detail in my talk: https://www.infoq.com/presentations/cloudflare-v8/
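As a hedged illustration of the eviction policy described above (not the actual runtime code), a least-recently-used pool under a fixed memory budget might look like the following. `IsolatePool`, the megabyte figures, and the script IDs are assumptions made up for the example.

```python
from collections import OrderedDict


class IsolatePool:
    """Toy model: keep isolates warm until their memory is needed elsewhere."""

    def __init__(self, memory_budget_mb):
        self.memory_budget_mb = memory_budget_mb
        self.used_mb = 0
        # script_id -> memory_mb, ordered from least to most recently used.
        self._isolates = OrderedDict()

    def request(self, script_id, memory_mb):
        """Serve a request. Return True if the isolate was warm, False if cold."""
        if script_id in self._isolates:
            self._isolates.move_to_end(script_id)  # mark as most recently used
            return True
        # Cold start: evict least-recently-used isolates until the new one fits.
        # (An isolate larger than the whole budget is still admitted here.)
        while self._isolates and self.used_mb + memory_mb > self.memory_budget_mb:
            _, freed_mb = self._isolates.popitem(last=False)
            self.used_mb -= freed_mb
        self._isolates[script_id] = memory_mb
        self.used_mb += memory_mb
        return False

    def warm_ids(self):
        return list(self._isolates)
```

In this model an isolate that keeps getting traffic stays at the recently-used end and survives, while a quiet neighbor is the first to go when memory runs short.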

Excuse me if this is a silly question. Some containers take a while to initialize, do you have a way to automatically hibernate their memory state? When will you support Julia?

We don't use containers, we use V8 isolates. V8 is limited to executing JavaScript and WebAssembly, though many other languages can be compiled/transpiled to one of those two. But, we don't run native Linux binaries.

EDIT: It looks like Julia has some Wasm support. I haven't tried it though. https://github.com/Keno/julia-wasm

We have a multi-tenant app and want to use something like this for all user-customizable code (workflow steps that we eventually want to allow users to write, that we treat as completely untrusted).

CF workers possibly wouldn't work cuz we run on AWS and would have to pay for bandwidth to and from CF. (but don't know these figures, we're very early in research).

What is people's actual experience with OpenFaaS vs Lambda vs other self-hosted options vs things like CF Workers?

Lambda@Edge on CloudFront is similar to this, although I don't believe its performance is as good. I suspect that this would be enough incentive for them to catch up though.

I am able to work within Lambda@Edge's constraints (us-east-1, minimal code size, no layers, low max execution time) reasonably well, but it would be nice for them to start removing some of those limitations as well as adding a speedup (although I have no real complaints about speed).

We've built a few multitenant products, and made friends with some other companies who've done the same.

The tradeoff between "using an external FaaS system" and "just embedding v8" is pretty important. If your user customizable code can be reduced to basically one HTTP call, the options you mentioned will probably work fine.

When we did this though, we found that it was really irritating to have only coarse-grained hooks for customer interactions, and we were better off just embedding our own runtime and building a really nice JS-based API for people to use.

So if I were you, I might actually look at whether embedding Deno (https://deno.land/manual/embedding_deno) and building your own runtime API is valuable.

We went: Lambda hooks -> v8 runtime -> Firecracker VMs for our particular use case.

This is really interesting, thanks!

Does anyone have any insights/experience with cold starts and db-connect[1]?

I haven't connected a worker to a datastore (sql or kv), so data and start-times might be orthogonal to each other, but was curious what people's experiences are.

1: https://github.com/cloudflare/db-connect

As much as I like Cloudflare Workers, they are limited to 128 MB of memory. It works very well, but you need to keep that in mind. https://developers.cloudflare.com/workers/archive/writing-wo...

We'll offer the option to use more memory in the future.

Important to note that this only applies to "root" workers

> For now, this is only available for Workers that are deployed to a “root” hostname like “example.com” and not specific paths like “example.com/path/to/something.” We plan to introduce more optimizations in the future that can preload specific paths.

It's not clear what exactly they mean by cold start. I just ran a simple test with a worker returning hardcoded html without external requests (on a free plan):

1. First run: time_starttransfer: 0.507267s

2. Second run: time_starttransfer: 0.035244s

So it looks like at least some parts of the cold start take much longer and aren't eliminated.

Free tier is subject to lower quality-of-service in a number of areas. If you upgrade to a paid plan ($5/mo) you should see much better results.

We probably should have included a discussion of this in the blog post... sorry about that.

It could be due to cache warming?

RTT to a local edge is about 100ms. So let's be generous and give it 150ms for the TLS handshake. This still doesn't explain the other 500ms-1500ms of startup time a serverless function needs to start from cold.

How did they eliminate those? Are they keeping all functions warm 24/7?

From the article:

> It’s impractical to keep everyone’s functions warm in memory all the time. Instead, serverless providers only warm up a function after the first request is received. Then, after a period of inactivity, the function becomes cold again and the cycle continues.

> For Workers, this has never been much of a problem. In contrast to containers that can spend full seconds spinning up a new containerized process for each function, the isolate technology behind Workers allows it to warm up a function in under 5 milliseconds.

So, we've always had very low cold start times, and now we've made them disappear "inside" the TLS handshake.

OK, so there's a new technology that does this.

Yes. More here: https://blog.cloudflare.com/cloud-computing-without-containe...

Or deep dive here: https://www.infoq.com/presentations/cloudflare-v8/

(Disclosure: That's me giving that talk.)

Can't this mean that an invalid HTTPS request, which the handshake would reject, can reach a given worker?

No, it just means that the Worker is cold-started and ready to execute. If the TLS connection breaks in some way then worst case we've loaded something into memory but not executed it.

Anyone else read this as

  Eliminating cold
  starts with
  cloudflare workers


I did and had to do a double-take. Was wondering if this was about the environmental footprint of Cloudflare servers.

I misread it as Cloudflare championing the fight against the common cold. I was disappointed.

I misread it as a new weather tech to dispel cold fronts, with people going up into the clouds with flamethrowers. I was disappointed too.

"Eliminating cold, starts with Cloudflare Workers". Fixed it.

It's yet another attempt to create dependencies. Good luck to those who drink the Flavor Aid when you're locked in and you can't get out.

Preloading functions is a good strategy, but claims of "eliminating cold starts" and "zero overhead" are a bit hyperbolic.

I don't know enough about Cloudflare Workers, but function cold start latencies are not fully in the platform's control. You can hide the latency of scheduling and loading, but the function might load a ton of dependencies, and it might have its own custom init code that you generally have little control over. So you can hide an SSL handshake's worth of latency, and that's a nice little win, but whether that makes the user-visible cold start latency zero depends on what it was in the first place.

But I'm guessing Cloudflare Workers are used for relatively simple stuff, so they don't often have dependencies or complex init code.

(And sorry, I don't mean to sound too negative, this stuff is pretty cool!)

From their FAQ, on dependencies for JS:

> Can I use npm with Workers?

> Workers has no explicit support for npm, but you can use any build tool or package manager you need to create your Worker script. Just upload to us the final, built, script with all dependencies included.

Right, you package up your dependencies. That doesn't mean you don't have any!

Anyway, what I'm getting at is, the platform has to load a bunch of code, sometimes that will take longer than the latency that you can hide behind an ssl handshake; plus the code might run something on init.

But yes, workers seem to be oriented towards relatively small bits of code, so maybe the vast majority of load times are a few ms.
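Put as arithmetic, and assuming loading starts at the very beginning of the handshake and runs fully in parallel with it, the part of the cold start a user still sees is whatever exceeds the handshake time. The function name and the numbers below are illustrative assumptions, not measurements.

```python
def visible_cold_start_ms(load_ms, handshake_ms):
    """Latency left over after hiding the load behind the handshake."""
    return max(0.0, load_ms - handshake_ms)


# A few-millisecond load disappears entirely behind a 100 ms handshake.
print(visible_cold_start_ms(load_ms=5, handshake_ms=100))    # 0.0
# A heavy bundle with slow init code still leaks through.
print(visible_cold_start_ms(load_ms=400, handshake_ms=100))  # 300
```

So "zero cold start" holds only while load plus init time stays under the handshake time; a shorter handshake narrows that window.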

Correct me if I'm wrong, but isn't part of the design of QUIC about removing one network roundtrip from the encryption handshake? That gap is going to get a little smaller.

It's fastcgi. I can't be the only one seeing this, right?

The cover may be better than the original song, but this isn't "All along the Watchtower" or "Feeling Good" as far as I can see.

Are you sure about that? It depends on how Cloudflare defines what a cold start is. It might well include the initial loading of your code, with imports and init.
