
Eliminating cold starts with Cloudflare Workers - kiyanwang
https://blog.cloudflare.com/eliminating-cold-starts-with-cloudflare-workers/
======
tyingq
I've been very happy with the CF Workers/KV store setup. For some use cases,
it's almost perfect. A URL shortener is a good example. You get a very
scalable setup for dirt cheap, and very little code to write.
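
To give a sense of how little code that is, here's a minimal sketch of the KV-backed shortener (assuming a KV namespace bound to the Worker as LINKS; the binding name and key layout are just illustrative):

    addEventListener('fetch', event => {
      event.respondWith(handleRequest(event.request))
    })
    
    async function handleRequest(request) {
      // Look up the short code (e.g. /abc123) in Workers KV.
      const slug = new URL(request.url).pathname.slice(1)
      const destination = await LINKS.get(slug)
    
      // Unknown slug: plain 404.
      if (destination === null) {
        return new Response('Not found', { status: 404 })
      }
    
      // Known slug: redirect to the stored long URL.
      return Response.redirect(destination, 302)
    }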

I'm curious how much farther they might take the platform. Like, for example,
is a distributed relational data store going to happen at some point? Or is
that not a space they want to go?

~~~
jgrahamc
_I'm curious how much farther they might take the platform._

Cloudflare co-founder Michelle Zatlyn likes to say "We're just getting
started".

------
advisedwang
So basically, start the worker as soon as you see a domain in SNI. It's really
interesting to see layer models and their abstractions - critical to the
development of the internet - being removed/merged/coupled in order to eke out
performance.
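
Roughly the idea, if you sketched it against Node's TLS API (Cloudflare's edge obviously isn't Node, and warmUpWorkerFor/contextFor below are made-up placeholders, not anything from the post):

    const tls = require('tls')
    
    // warmUpWorkerFor() and contextFor() are made-up placeholders for
    // "load the isolate for this hostname" and "pick the right cert".
    const { warmUpWorkerFor, contextFor } = require('./edge')
    
    const server = tls.createServer({
      // Node invokes SNICallback as soon as the ClientHello (which
      // carries the SNI hostname) arrives, before the handshake is done.
      SNICallback: (servername, cb) => {
        // Fire and forget: warm the Worker while the handshake continues.
        warmUpWorkerFor(servername).catch(() => {})
        cb(null, contextFor(servername))
      }
    }, socket => {
      // By the time the first request bytes arrive here, the Worker for
      // this hostname has (hopefully) already been loaded.
    })
    
    server.listen(443)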

~~~
treis
This doesn't seem to be quite that simple. If I have 10 workers (or 100, or
1000) for the same domain, then it has to warm them all. Maybe it does, but
that seems inefficient.

~~~
hugoromano
For now, only the route that uses the root hostname (domain.com).

------
ledgerdev
I love the discussion about what the future abstraction for serverless will
be, and love where Cloudflare is headed. Can't wait to see what's next. A
couple of questions.

Any plans to support WebSockets terminated in the isolate (not just proxied)?

Any plans for local development? An open-source, locally runnable (or
low-volume, self-hostable) stack that mirrors the global stack would, I think,
really help dev workflow and adoption.

edit: just poking around their blog, I saw this.
[https://blog.cloudflare.com/making-magic-reimagining-develop...](https://blog.cloudflare.com/making-magic-reimagining-developer-experiences-for-the-world-of-serverless/)

~~~
pier25
Have you updated Wrangler lately?

It used to be super slow to upload the Worker to Google Cloud, but it's much
faster now (that's what happens behind the scenes when you do _wrangler dev_).

~~~
ledgerdev
I haven't actually used wrangler yet, but will give it a try :)

~~~
pier25
It doesn't actually do local dev, as it uploads your Worker somewhere to run,
but the workflow is quite painless.

------
cagenut
That's kind of competitively fascinating. One of the biggest
differences/advantages of Fastly's Lucet[1]-based system was its extremely
fast startup time. But if you can hide that startup time in the connection
setup, then... why learn Rust?

[1] [https://www.fastly.com/blog/lucet-performance-and-lifecycle](https://www.fastly.com/blog/lucet-performance-and-lifecycle)

~~~
jgrahamc
From Cloudflare's perspective we don't care about the language you choose.
We're happy for you to use Rust or C or C++ or JavaScript or Python or ...
[https://blog.cloudflare.com/cloudflare-workers-announces-bro...](https://blog.cloudflare.com/cloudflare-workers-announces-broad-language-support/)

But yeah, this completely eliminates any question about cold starts or warm
starts.

------
jiripospisil
I cannot find this in the documentation (I'm sure it's there somewhere) so let
me ask here. How long does a function stay warm? As in, a client makes a
request that invokes a function, the function processes the request and
responds to the client. Now what? Does the same "instance" of a function stay
alive to handle another request for a certain period of time or is it
immediately released?

I'm asking because sometimes it's necessary to do some expensive work in order
to respond to a request, and repeating that work on every invocation is
wasteful. For example, on AWS Lambda, one of our functions launches a Chromium
instance, which can take a few seconds, but because the function can stay
alive after a request is done, the same Chromium instance is immediately ready
on the next invocation. Other use cases involve e.g. connecting to a database
(which I see CF is tinkering with over at [0]).

[0] [https://github.com/cloudflare/db-connect](https://github.com/cloudflare/db-connect)
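
For reference, the pattern I mean looks roughly like this (connectToExpensiveThing and doWork are just stand-ins for the expensive setup and the actual work):

    // Module-scope state survives across requests for as long as this
    // instance (or isolate) stays warm; it's rebuilt after an eviction.
    let expensiveClient = null
    
    async function getClient() {
      if (expensiveClient === null) {
        // Paid only on a cold start (or after the instance was evicted).
        expensiveClient = await connectToExpensiveThing()
      }
      return expensiveClient
    }
    
    addEventListener('fetch', event => {
      event.respondWith(handle(event.request))
    })
    
    async function handle(request) {
      const client = await getClient()
      return new Response(await client.doWork())
    }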

~~~
kentonv
Isolates stay warm until the memory they consume is needed for something else.
We evict the least-recently-used isolate when we need memory for a new one. So
how long that is varies depending on how much traffic the isolate gets
compared to its neighbors.
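
Conceptually it's just LRU, something like this toy sketch (very much not our real code; loadIsolate is a placeholder and a simple count stands in for actual memory accounting):

    // Toy sketch: a JS Map remembers insertion order, so the first key
    // is the least recently used; a count stands in for memory pressure.
    const MAX_WARM_ISOLATES = 1000        // placeholder limit
    const warmIsolates = new Map()        // scriptId -> isolate
    
    function getIsolate(scriptId) {
      if (warmIsolates.has(scriptId)) {
        // Warm hit: re-insert to mark it as most recently used.
        const isolate = warmIsolates.get(scriptId)
        warmIsolates.delete(scriptId)
        warmIsolates.set(scriptId, isolate)
        return isolate
      }
      // No room left: evict the least recently used isolate.
      if (warmIsolates.size >= MAX_WARM_ISOLATES) {
        warmIsolates.delete(warmIsolates.keys().next().value)
      }
      const isolate = loadIsolate(scriptId)  // the cold start
      warmIsolates.set(scriptId, isolate)
      return isolate
    }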

This is covered in more detail in my talk:
[https://www.infoq.com/presentations/cloudflare-v8/](https://www.infoq.com/presentations/cloudflare-v8/)

~~~
clarkevans
Excuse me if this is a silly question. Some containers take a while to
initialize; do you have a way to automatically hibernate their memory state?
When will you support Julia?

~~~
kentonv
We don't use containers, we use V8 isolates. V8 is limited to executing
JavaScript and WebAssembly, though many other languages can be
compiled/transpiled to one of those two. But, we don't run native Linux
binaries.

EDIT: It looks like Julia has some Wasm support. I haven't tried it though.
[https://github.com/Keno/julia-wasm](https://github.com/Keno/julia-wasm)
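
To give a feel for the Wasm path, a Worker can call into a compiled module roughly like this (assuming the module is uploaded alongside the script and exposed as a WebAssembly.Module binding named WASM_MODULE, and that it exports an add function; those names are illustrative):

    addEventListener('fetch', event => {
      event.respondWith(handle(event.request))
    })
    
    async function handle(request) {
      // WASM_MODULE is a WebAssembly.Module binding set up at deploy
      // time; the module is assumed to export an `add` function.
      const instance = await WebAssembly.instantiate(WASM_MODULE)
      const sum = instance.exports.add(2, 3)
      return new Response(`2 + 3 = ${sum}`)
    }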

------
atonse
We have a multi-tenant app and want to use something like this for all
user-customizable code (workflow steps that we eventually want to allow users
to write, which we treat as completely untrusted).

CF Workers possibly wouldn't work because we run on AWS and would have to pay
for bandwidth to and from CF (but we don't know these figures; we're very
early in our research).

What is people's actual experience with OpenFaaS vs. Lambda vs. other
self-hosted options vs. things like CF Workers?

~~~
mrkurt
We've built a few multitenant products, and made friends with some other
companies who've done the same.

The tradeoff between "using an external FaaS system" and "just embedding v8"
is pretty important. If your user customizable code can be reduced to
basically one HTTP call, the options you mentioned will probably work fine.

When we did this, though, we found that it was really irritating to have only
coarse-grained hooks for customer interactions, and we were better off just
embedding our own runtime and building a really nice JS-based API for people
to use.

So if I were you, I might actually look at whether embedding Deno
([https://deno.land/manual/embedding_deno](https://deno.land/manual/embedding_deno))
and building your own runtime API is valuable.

We went: Lambda hooks -> v8 runtime -> Firecracker VMs for our particular use
case.

~~~
atonse
This is really interesting, thanks!

------
tomhallett
Does anyone have any insights/experience with cold starts and db-connect[1]?

I haven't connected a Worker to a datastore (SQL or KV), so data and start
times might be orthogonal to each other, but I was curious what people's
experiences are.

1: [https://github.com/cloudflare/db-connect](https://github.com/cloudflare/db-connect)

------
samdung
As much as I like Cloudflare Workers, they are limited to 128 MB of memory. It
works very well, but you need to keep that in mind.
[https://developers.cloudflare.com/workers/archive/writing-wo...](https://developers.cloudflare.com/workers/archive/writing-workers/resource-limits)

~~~
eastdakota
We'll offer the option to use more memory in the future.

------
conroy
Important to note that this only applies to "root" workers

> For now, this is only available for Workers that are deployed to a “root”
> hostname like “example.com” and not specific paths like
> “example.com/path/to/something.” We plan to introduce more optimizations in
> the future that can preload specific paths.

------
oddx
It's not clear what exactly they mean by cold start. I just ran a simple test
with a Worker returning hardcoded HTML without external requests (on a free
plan):

1. First run: time_starttransfer: 0.507267s

2. Second run: time_starttransfer: 0.035244s

So it looks like at least some parts of the cold start take much longer and
aren't eliminated.

~~~
kentonv
Free tier is subject to lower quality-of-service in a number of areas. If you
upgrade to a paid plan ($5/mo) you should see much better results.

We probably should have included a discussion of this in the blog post...
sorry about that.

------
grezql
RTT to a local edge is about 100ms, so let's be generous and give it 150ms for
the TLS handshake. That still doesn't explain the other 500ms-1500ms startup
time a serverless function needs to start from cold.

How did they eliminate that? Are they keeping all functions warm 24/7?

~~~
jgrahamc
From the article:

 _It’s impractical to keep everyone’s functions warm in memory all the time.
Instead, serverless providers only warm up a function after the first request
is received. Then, after a period of inactivity, the function becomes cold
again and the cycle continues._

 _For Workers, this has never been much of a problem. In contrast to
containers that can spend full seconds spinning up a new containerized process
for each function, the isolate technology behind Workers allows it to warm up
a function in under 5 milliseconds._

So, we've always had very low cold start times, and now we've made them
disappear "inside" the TLS handshake.

~~~
grezql
OK... so there's a new technology that does this.

~~~
kentonv
Yes. More here: [https://blog.cloudflare.com/cloud-computing-without-containe...](https://blog.cloudflare.com/cloud-computing-without-containers/)

Or deep dive here:
[https://www.infoq.com/presentations/cloudflare-v8/](https://www.infoq.com/presentations/cloudflare-v8/)

(Disclosure: That's me giving that talk.)

------
vemv
Can't this mean that an invalid HTTPS request, which the handshake would
reject, can reach a given worker?

~~~
jgrahamc
No, it just means that the Worker is cold-started and ready to execute. If the
TLS connection breaks in some way then worst case we've loaded something into
memory but not executed it.

------
kbutler
Anyone else read this as

    
    
      Eliminating cold
      starts with
      cloudflare workers
    
    ?

~~~
cakebrewery
I did and had to do a double-take. I was wondering if this was about the
environmental footprint of Cloudflare servers.

------
fifticon
I misread, that cloudflare would champion the fight against the common cold. I
was disappointed.

~~~
david_draco
I misread, that this was a new weather tech to dispel cold fronts with people
going up into the clouds with flame throwers. I was disappointed too.

------
johnklos
It's yet another attempt to create dependencies. Good luck to those who drink
the Flavor Aid when you're locked in and you can't get out.

------
soamv
Preloading functions is a good strategy, but claims of "eliminating cold
starts" and "zero overhead" are a bit hyperbolic.

I don't know enough about Cloudflare Workers, but function cold start
latencies are not fully under the platform's control. You can hide the latency
of scheduling and loading, but the function might load a ton of dependencies,
and it might have its own custom init code that the platform generally has
little control over. So you can hide an SSL handshake's worth of latency, and
that's a nice little win, but whether that makes the user-visible cold start
latency zero depends on what it was in the first place.
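
Concretely (a rough sketch; buildBigLookupTable and lookup are just stand-ins), anything at module scope still runs once per cold start, on top of whatever the platform hides:

    // Anything at module scope runs once per cold start, before the
    // first request is served; buildBigLookupTable() is a stand-in.
    const lookupTable = buildBigLookupTable()
    
    addEventListener('fetch', event => {
      const path = new URL(event.request.url).pathname
      event.respondWith(new Response(lookupTable.lookup(path)))
    })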

But I'm guessing cloudflare workers are used for relatively simple stuff, so
they don't often have dependencies or complex init code.

(And sorry, I don't mean to sound too negative, this stuff is pretty cool!)

~~~
sopromo
From their FAQ, about dependencies (in this case for JS):

Can I use npm with Workers?

Workers has no explicit support for npm, but you can use any build tool or
package manager you need to create your Worker script. Just upload to us the
final, built, script with all dependencies included.

~~~
soamv
Right, you package up your dependencies. That doesn't mean you don't have any!

Anyway, what I'm getting at is that the platform has to load a bunch of code,
and sometimes that will take longer than the latency you can hide behind an
SSL handshake; plus the code might run something on init.

But yes, workers seem to be oriented towards relatively small bits of code, so
maybe the vast majority of load times are a few ms.

~~~
hinkley
Correct me if I'm wrong, but isn't part of the design of QUIC about removing
one network roundtrip from the encryption handshake? That gap is going to get
a little smaller.

