
Introducing Workers KV - jgrahamc
https://blog.cloudflare.com/introducing-workers-kv/
======
noelwelsh
Holy return of tuple spaces! Are there any features for controlling
consistency, or is it just YOLO last write wins?

~~~
kloepper
Last write wins. There are several features for controlling consistency that
we have been prototyping. Based on how storage is used we will enable those as
the product matures.

~~~
zzzcpan
So why not strong eventual consistency and CRDTs? They fit perfectly for
functions running on edge nodes, and the silly limitation of 1 write per
second wouldn't be necessary.

~~~
kentonv
Indeed. This is just stage 1 of our storage plans. We have some really cool
stuff in the works but we wanted to get basic KV functionality out there for
people to start using ASAP.

~~~
_pmf_
OT: we need to read more about your website-in-kv-thingy setup.

~~~
kentonv
It was pretty trivial. I just stored each page into a value and served them
out of there, with some stupid code to infer content-type from file extension.
I meant it as a dumb demo because I wasn't feeling creative. I didn't know
John would mention it in his blog. :)

Currently this isn't a great way to serve content because the values are
limited to 64k, you have to infer content-type (which makes your URLs ugly),
etc. So I don't recommend it. We'll come up with something better in the
future. :)
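For the curious, that setup can be sketched in a few lines. This is a hypothetical reconstruction, not Kenton's actual code: `serveFromKV`, the extension map, and the `kv` parameter are all illustrative (in a real Worker, `kv` would be a KV namespace binding and the handler would be wired to the fetch event).

```javascript
// Minimal sketch: serve a static site out of a key/value store, inferring
// Content-Type from the file extension. The kv argument is anything with
// an async get(key) that returns the value or null.

const TYPES = {
  html: 'text/html',
  css: 'text/css',
  js: 'application/javascript',
  png: 'image/png',
};

function contentTypeFor(path) {
  const ext = path.split('.').pop();
  return TYPES[ext] || 'text/html';
}

async function serveFromKV(request, kv) {
  const path = new URL(request.url).pathname;
  const key = path === '/' ? '/index.html' : path;
  const body = await kv.get(key);
  if (body === null) return new Response('Not found', { status: 404 });
  return new Response(body, {
    headers: { 'Content-Type': contentTypeFor(key) },
  });
}

// In a real Worker (illustrative wiring; PAGES is an assumed KV binding):
// addEventListener('fetch', e => e.respondWith(serveFromKV(e.request, PAGES)));
```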

------
OJFord
The feature aside, this is a really great blog post: learning a bit about the
history of the feature, even if only tangentially related (Babbage and the
Analytical Engine) in the intro is great.

~~~
jgrahamc
Thank you; that's kind.

------
chrisweekly
This is really cool. Webapps are basically programs that are executed across
the network; having fine-grained control over what happens at each layer
(client, client service worker, edge nodes, proxy server, origin server) means
more choices and complexity, but also so much more power.

~~~
zackbloom
The question I would ask is: once you have the ability to run code and store
data on the network itself (which Cloudflare effectively is), why do you need
an origin at all?

~~~
manigandham
You work for Cloudflare so that's a fun question, but realistically there are
still plenty of limitations: the script size, number of routes, KV
consistency, write limits, the simple get/put API, etc.

It can work for simple apps but there's a long way to go before any serious
enterprise system will be hosted entirely on a FaaS system like Workers.

~~~
rita3ko
If you’re having trouble with script size and route limitations, would love to
chat. We’re working on providing more flexibility in that realm. rita at
cloudflare.

------
zzzcpan
A bit offtopic: could you guys move the "please enable javascript"/captcha
page to a separate but Cloudflare-owned domain? At least then, people who
don't want to enable javascript for random websites behind Cloudflare could
enable it for your domain and pass all your browser checks.

------
sudhirj
Great job - I was trying to do something similar by setting up three etcd
(Raft) nodes on each continent, but this is like 150+ nodes. Mind sharing if
you're using good clocks, Paxos, or Raft to coordinate?

Would also be great if CF could do websocket / sse fanout at the edges - if I
have a couple million websockets connected on ws.cf.com/key1 and I update the
value, would be great if the new value could be broadcast. That would help
with a lot of media / real-time / sports games, apps and websites.

~~~
zackbloom
Can you talk more about your specific use case?

~~~
sudhirj
The simplest example is liveblogging - someone is sitting at the next iPhone
launch, sending out photos and commentary that are updated on a large number
of readers' screens worldwide as they're posted. The Verge has a good system
built on polling S3 for new data, but there are much more effective ways to
do it.

The more specific cases include opening ticket sales for movies in India - for
some big movies tickets are sold out really quickly, so there'll be hundreds
or thousands of people with browsers / phones open on a seat layout and they
all need to see which seats are being sold in real-time.

Then there's broadcasting stock market movements and tips to thousands of
traders in real-time.

The only existing solution that seems really scalable is fanout.io - others
like pusher.com and pubnub.com have pricing that's not conducive to serving a
large number of simultaneous users / broadcasting. Fanout.io gets this right,
but it would be nice to have cheaper alternatives / make it a commodity.

Given that the CF nodes do support websockets, I assume they terminate them
and create new connections to origins as well - so the nodes are capable of
holding a large number of websockets open, right? Can Workers intercept
websockets / stream data into them? And given that the KV store is capable of
propagating new values to all the nodes, there could be a way to expose
incoming change notifications on keys, which would handle this use case. It
could also be implemented as continuous addition of new timestamped keys,
prefixed with a topic.
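A minimal sketch of that topic-prefixed key scheme (all names hypothetical; `kv` is anything with async get/put, standing in for a KV namespace binding):

```javascript
// Each event is written under "<topic>:<timestamp>", and a "<topic>:head"
// key records the newest timestamp so clients can poll a single key cheaply.

async function publish(kv, topic, event) {
  const ts = Date.now().toString();
  await kv.put(`${topic}:${ts}`, JSON.stringify(event));
  await kv.put(`${topic}:head`, ts);
  return ts;
}

// Returns the newest event after lastSeen, or null if nothing new.
async function poll(kv, topic, lastSeen) {
  const head = await kv.get(`${topic}:head`);
  if (head === null || head === lastSeen) return null;
  const body = await kv.get(`${topic}:${head}`);
  return { ts: head, event: JSON.parse(body) };
}
```

With the 1-write-per-second limit discussed in this thread, each topic's head key could only advance once per second, which still fits the ~10s-latency use cases mentioned below.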

~~~
zackbloom
Fundamentally what I think we want to do is build the primitives you would
need to create your own fanout. That means we need the ability to push events
to Worker nodes, and for Workers to be able to terminate WebSockets for you.

If you can accept up to 10s of latency, I agree that KV could solve the
'pushing events' problem for you. (A tuple store is semantically equivalent to
message passing, and it will get easier when we support range queries).

~~~
sudhirj
Yes, both points make sense. And I assume the 10s latency is also
geographically related - i.e. it's worst case 10s for the longest distance on
the network topology graph? That should be fine.

Low latency requirements also tend to be geographical, I think - stock markets
are mostly specific to a single country, examples like ticketing are usually
specific to a single city / edge. For general news being broadcast globally,
that kind of latency is fine.

------
flaviocopes
Pretty awesome to see Charles Babbage credited in the introduction.

~~~
sylvinus
If you're interested in his work, you might want to check out another project
of JGC: [http://plan28.org/](http://plan28.org/)

~~~
jgrahamc
Best link is probably the blog:
[http://blog.plan28.org/](http://blog.plan28.org/)

~~~
teabee89
Any reason it doesn't serve HTTPS?

~~~
jgrahamc
Turns out I never enabled HTTPS in Blogger. Just clicked the button...
[https://blog.plan28.org/](https://blog.plan28.org/)

------
mkj
Good work Kenton Varda getting his initials into the product name!

~~~
kentonv
[https://twitter.com/jgrahamc/status/1041998948733472768](https://twitter.com/jgrahamc/status/1041998948733472768)

> There's no truth in the rumour that a KV store is a @KentonVarda store. Nor
> that it maps kentons to vardas.

------
skunkworker
With this and the changes to their Worker caching API, it's now entirely
possible to write a custom pull-origin CDN service that you have complete
control over.
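A sketch of what such a pull-origin Worker might look like. The `cache` and `fetchOrigin` parameters are injected here so the logic is testable; in a real Worker they would be `caches.default` and the global `fetch`:

```javascript
// Pull-through cache: serve from cache when possible, otherwise fetch
// from the origin and store a copy for next time.

async function pullThrough(request, cache, fetchOrigin) {
  const cached = await cache.match(request);
  if (cached) return cached;
  const response = await fetchOrigin(request);
  if (response.ok) {
    // Cache a clone; the original body is returned to the client.
    await cache.put(request, response.clone());
  }
  return response;
}

// Worker wiring (illustrative):
// addEventListener('fetch', e =>
//   e.respondWith(pullThrough(e.request, caches.default, fetch)));
```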

------
skunkworker
I'm curious, but I haven't seen anything about cache expiration on keys. E.g.
expire key 'key-1' after 30 minutes, or expire at a globally recognized time.
Does anyone know if this is possible?

~~~
zackbloom
We haven't exposed it yet, but it's implemented internally. Follow our blog
and we'll update you when it's released.

~~~
skunkworker
Awesome, I'll be sure to keep up on that.

------
ryanworl
Are you planning on releasing any of the architectural details behind this
service?

If you’re truly replicating to every PoP, that’s quite a fan-out, and I can
see why you’re limited to 1 write per second per key!

~~~
jgrahamc
Yes, we will. This is something we built internally and we'll talk about
architecture at some point.

~~~
Serow225
Thank you! Lots of super cool stuff coming out of CF these days - the pace is
hard to keep up with! Not that it's a bad thing necessarily :)

------
manigandham
I wonder how the Fastly crew feels about all this. Great team over there, and
they were originally the super configurable CDN, but it seems that is now
completely beaten by Workers.

------
tracker1
This is very cool, but how do you limit an end user to only being able to
read a given set of values? It seems to me that this scheme puts all the
credentials for access in the browser, and they'd be able to run roughshod
all over your data without constraints in place.

Can you create data spaces/keys for ones that are read-write per-user and
others that are read-only for a given user?

~~~
zackbloom
You write the Worker code which runs on our infrastructure, not in a browser.
You have complete control over who can access which piece of data, as it's
your code doing the accessing.
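A sketch of that idea: since the Worker derives the key from the authenticated identity, per-user isolation is just a naming convention plus a check. Names like `user:<id>:` and the read-only set are illustrative, not a Cloudflare API:

```javascript
// The server-side prefix means a user can never name another user's keys,
// because the prefix comes from the authenticated identity, not the request.

function keyForUser(userId, key) {
  return `user:${userId}:${key}`;
}

async function handleGet(kv, userId, key) {
  return kv.get(keyForUser(userId, key));
}

async function handlePut(kv, userId, key, value, readOnlyKeys = new Set()) {
  // Some keys can be readable but not writable for the user.
  if (readOnlyKeys.has(key)) throw new Error('read-only key');
  await kv.put(keyForUser(userId, key), value);
}
```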

~~~
tracker1
Thanks, I misunderstood that part... very cool indeed. What are the overhead
limits for workers?

As an aside, it seems like this could be _really_ interesting combined into
an entire API, with static delivery from S3 or Azure Blobs and a backend on a
cloud-hosted database. With a lot of flexibility in between.

~~~
zackbloom
The ultimate goal is to let you do all of that within our network!

When you say overhead limit, are you talking about latency? Our goal is to
keep reads on the order of 5ms in the 90th percentile.

~~~
tracker1
I mean compute/memory for the workers... just a rough comparison of what that
might look like. Will probably sign up for the early access and play around
with it soon.

~~~
zackbloom
We limit you to 50ms of CPU and 128 MB of memory now, but we're working on
ways of raising those limits.

------
webjames
With the shopping cart example, I'm not sure I understand how session identity
can be preserved if I clear my local storage/cookies?

~~~
zackbloom
It's moving that shopping cart data from being stored at a single origin, to
being stored in the network all around the world. The advantage of that in
that use-case is you can render your site just as quickly as if it was a
static website, but it can contain the customer's personal shopping cart data.

~~~
adrianmonk
The shopping cart is still stored somewhere, but when the user has cleared
their cookies, etc., what information do you have that will let you know it's
their shopping cart so you can find it again?

There's a (K,V) pair somewhere, but in order to get V, you need K, which
you've lost.

~~~
zackbloom
It would work just as most shopping carts do I imagine. You could store a
session id in a cookie, and then use the data in KV to map that session id to
a user account (if they're logged in) and their cart. So you would have a
namespace full of sessions, and a namespace full of carts.

You could also store carts by session id until they're logged in, and then
store the carts by account id when they are.
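That two-namespace layout might be sketched as follows (hypothetical names; `sessions` and `carts` stand in for KV namespace bindings, and the session id would come from a cookie):

```javascript
// SESSIONS maps session id -> account id (if logged in).
// CARTS maps an owner id (session or account) -> serialized cart.

async function cartFor(sessions, carts, sessionId) {
  const accountId = await sessions.get(sessionId);
  // Logged-in users are keyed by account so the cart survives a new
  // session; anonymous carts are keyed by the session itself.
  const owner = accountId !== null
    ? `account:${accountId}`
    : `session:${sessionId}`;
  const cart = await carts.get(owner);
  return cart !== null ? JSON.parse(cart) : [];
}
```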

~~~
adrianmonk
> _You could store a session id in a cookie_

But the person you responded to asked what happens after "I clear my local
storage/cookies". So you don't have access to the session id. It was in a
cookie that is now gone.

------
mbell
Are there any plans to provide browser detection in the Worker API? I imagine
it's possible to implement as a user now, but it may be clunky to provide a
dataset to work from. This seems like a killer feature for edge workers,
allowing optimized builds to be served based on browser feature support
without paying the overhead of client-side detection.

~~~
zackbloom
If you can find a good Javascript library for doing it you can build it into
your Worker with Webpack.
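A crude sketch of the idea; a real setup would use a proper detection library and dataset, and the regex and bundle names here are illustrative only:

```javascript
// Pick a build based on a rough User-Agent check: anything that looks
// like IE or pre-Chromium Edge gets the transpiled legacy bundle.

function bundleFor(userAgent) {
  if (/MSIE |Trident\/|Edge\/1[0-8]\./.test(userAgent)) return '/legacy.js';
  return '/modern.js';
}

// In a Worker: const ua = request.headers.get('User-Agent') || '';
// then rewrite the script URL in the response to bundleFor(ua).
```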

~~~
mbell
It's not finding a parser that I find problematic, it's the dataset needed.
My understanding is that the workers are limited to 128MB of RAM and that
they may be 'restarted' at any time. Datasets for accurate browser detection
would be a tight fit in that amount of RAM when combined with whatever else
you may be doing, and you have to deal with load/parse time whenever a worker
is restarted. This seems like a common enough need that Cloudflare could
provide it as an API, and their implementation wouldn't have such issues.

~~~
kloepper
Using storage along with worker code to perform the function should make this
possible.

Our hope is that anyone could create such an API, not just Cloudflare. Just
like we are building storage on top of workers, we hope to eliminate the
distinction between what is possible for a Cloudflare employee and what is
possible for any of our customers.

------
Serow225
@jgrahamc could you link to the technical post near the top, and maybe copy
that 'Limits and Pricing' table into this post? It would be really helpful to
have that info in the general post; otherwise it's not clear what the details
are (like the write rate). Great posts BTW!

------
sctb
Shit, the Cap'n Proto website is like 10ms faster than the HN frontpage now.

------
tyingq
Is there an API method that can list existing/matching keys in a namespace
with a glob or prefix or similar?

Like, "show me all keys that match xyz* in this namespace"

~~~
zackbloom
We're planning on adding support for range requests (which becomes equivalent
to prefixes a la CouchDB). Is there a reason you need glob specifically?

~~~
tracker1
Well, one example off the top of my head would be using geohashes for
proximity in records... if you could glob every geohash with the same first
6-8 characters, you could do range queries pretty quickly.

I've considered similar for shard keys in terms of proximity related
information.
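For what it's worth, a prefix query over an ordered key space is just the range [prefix, prefix + '\uffff'), which is what would make geohash proximity lookups work once range reads exist. A toy sketch over an in-memory sorted array (note real geohash proximity also needs the neighboring cells, which this ignores):

```javascript
// All keys sharing a prefix form a contiguous range in sorted order:
// [prefix, prefix + '\uffff'). Here keys stands in for the store's keyspace.

function prefixRange(keys, prefix) {
  const upper = prefix + '\uffff';
  return keys.filter(k => k >= prefix && k < upper);
}
```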

------
gaius
Dupe
[https://news.ycombinator.com/item?id=18092660](https://news.ycombinator.com/item?id=18092660)

~~~
chatmasta
Those are different blog posts. Also, the OP of both is CTO of Cloudflare.

------
sahin-boydas
@jgrahamc what a great service. It might be great to have Cloudflare credits,
like AWS/Google/Azure credits, for CF Workers for startups.

~~~
jgrahamc
Take a look at the pricing.

$5/month for

    - 1 GB of KV storage and up to 10 million KV reads
    - 10 million requests

After that it's $0.50/month

    - per million requests
    - per GB of storage
    - per million KV reads

I think that's well within startup money. I mean, if a company has to give out
credits perhaps it's just too expensive :-)
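As a worked example of the pricing quoted above ($5 base covering 1 GB, 10M reads, and 10M requests, then $0.50 per extra million requests, per extra GB, and per extra million reads):

```javascript
// Monthly cost for a given usage, per the numbers quoted in this thread.

function monthlyCost({ requestsM, storageGB, readsM }) {
  const over = (used, included) => Math.max(0, used - included);
  return 5 + 0.5 * (over(requestsM, 10) + over(storageGB, 1) + over(readsM, 10));
}

// e.g. 30M requests, 3 GB, 25M reads:
// 5 + 0.5 * (20 + 2 + 15) = $23.50/month
```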

~~~
Operyl
Heck, I think that's well within hobbyist budgets. Thanks for pricing it out
nicely :).

------
_pmf_
Tuplespaces?

