CF Workers seem to be slowly becoming a full-blown "edge" platform. A year or two ago you couldn't really use them for much except simple cloud functions and key-value storage, but now, with WebSocket support, S3-compatible storage and a SQL database coming along, I'll be taking a second look.
Not sure why this is a surprise to people, AWS started with similar humble origins. The goal of Cloudflare has been to become the fourth public cloud for a long time now, I think in several years they will be the most powerful platform to build on.
Not really a surprise, and the platform will very likely grow further, though I wouldn't compare CF to AWS in anything except the rise in popularity. I'd say they're "reinventing" cloud in a way. From my point of view, AWS is (mostly) IaaS, the likes of Heroku are PaaS, whereas CF Workers and their "ecosystem" are primarily serverless, though - with these recent additions - they're expanding what this category of tools can do. It's a unique platform for sure.
I used to work at Render, and I said often "Cloudflare is the iceberg - only ten percent of it is above the water and it's going to tear a big chunk out of the [PaaS] boat".
Over time that hypothesis looks more and more confirmed.
True. I was actually looking at Render a while back to host my app. The Render Disks were the one feature I was most interested in, as no other PaaS provider I was aware of provided persistent disk storage. The prices were a bit too much for a side-project though, and with no managed MongoDB, I ultimately made the app not require the disk storage and moved to Railway.
Clean - yes. Working with CF Workers was (and I think still is) a joy. However, I'm curious whether, with this many features that are mostly unique to CF, there will be a feeling of vendor lock-in when developing more complex apps. Serverless functions can usually be moved back and forth pretty easily, but if you become dependent on CF-specific features, how hard will it be to migrate if the need ever arises?
Note that although there is unfortunately no complete standard that we could follow for `connect()`, the API is largely based on the JavaScript streams standard. `connect()` itself returns a ReadableStream and a WritableStream representing the two sides of the socket -- both standard APIs that are widely implemented by browsers and others. Hopefully, this means that it doesn't take much to adapt code to any other API that another vendor might provide.
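For anyone curious what that looks like in practice, here's a minimal sketch (the hostname, port, and payload are placeholders, and error handling is omitted):

    import { connect } from "cloudflare:sockets";

    export default {
      async fetch(request: Request): Promise<Response> {
        // Open a raw TCP socket; readable/writable are standard streams.
        const socket = connect({ hostname: "tcp.example.com", port: 4242 });

        // Write a request using the standard WritableStream API.
        const writer = socket.writable.getWriter();
        await writer.write(new TextEncoder().encode("PING\r\n"));
        writer.releaseLock();

        // Read a single chunk back using the standard ReadableStream API.
        const reader = socket.readable.getReader();
        const { value } = await reader.read();
        reader.releaseLock();
        await socket.close();

        return new Response(new TextDecoder().decode(value));
      },
    };

Because the two halves are plain streams, adapting to another vendor's socket API should mostly come down to replacing the `connect()` call itself.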
That is exactly my thought. As long as the APIs remain clean - SQL for D1, a simple KV store for KV, etc., with clear semantics - it will stay agnostic. Especially since caching sits at the core.
As someone who built quite a bit of tech / product on Workers / Pages over the last year and a half or so, this (and some other announcements from this week) really excites me and I wish our timing had been better.
Not being able to meaningfully use any external services that didn’t support an HTTP / fetch API was one of the biggest consistent pain points.
Arguably it was the one with the biggest negative architectural ramifications. Given how long (understandably) it has taken to move D1 forward in the ways that matter most (e.g. transaction support), this is a huge step towards production viability for a more diverse range of products.
When I left my company in April I had Cloudflare as a “glad I tried it, but not ready for production use / that was a mistake” - this week has it back on my list for evaluation on whatever I do next.
Congrats to the Cloudflare team! I admire your intuition for what customers need and your willingness to compete with yourself on stuff like this (actively support other DB providers while building D1 - respect).
Sane assessment - The transaction API for D1 will be so important as well. I've not been that excited for their approaches so far, but I also know of no other good alternative.
Something I quite like doing is a thread-local (or async-local) context transaction, and that seems quite hard to do if not impossible with both batching and stored procedures from what I've seen.
What I really wish for is to drop in any old query builder or ORM and use it identically to how I would with SQLite. I'm not sure if that's feasible, however.
So, a challenge here is that SQLite is designed for single-writer scenarios. One writer performing a transaction necessarily has to block any other writer from proceeding in the meantime. (There are some experimental approaches in the works to solve this, like "BEGIN CONCURRENT", but it's still limited compared to a typical multi-client database.)
This is all fine when the application is using SQLite as a local library since any particular transaction can finish up pretty quick and unlock the database for the next writer. But D1 allows queries to be submitted to the database from Workers located around the world. Any sort of multi-step transaction driven from the client is necessarily going to lock the database for at least one network round trip, maybe more if you are doing many rounds of queries. Since D1 clients could be located anywhere in the world, you could be looking at the database being write-locked for 10s or 100s of milliseconds. And if the client Worker disappears for some reason (machine failure, network connectivity, etc.), then presumably the database has to wait some number of seconds for a timeout, remaining locked in the meantime. Yikes!
So, the initial D1 API doesn't allow remote transactions, only query batches. But we know that's not good enough.
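For context, a batch today looks roughly like this (a sketch; the binding name `DB` and the schema are made up for illustration). The statements are shipped to the database together, so no lock is held open across a network round trip, but there's no interactive BEGIN/COMMIT from the Worker side:

    interface Env {
      DB: D1Database; // assumed binding name
    }

    export default {
      async fetch(request: Request, env: Env): Promise<Response> {
        const userId = 42;
        // The whole batch is sent in one go and executed sequentially
        // on the database side.
        const results = await env.DB.batch([
          env.DB.prepare("INSERT INTO orders (user_id, total) VALUES (?1, ?2)").bind(userId, 9.99),
          env.DB.prepare("UPDATE users SET order_count = order_count + 1 WHERE id = ?1").bind(userId),
        ]);
        return Response.json(results.map((r) => r.success));
      },
    };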
To actually enable transactions, we need to make sure the code is running next to the database, so that write locks aren't held for long. That's complicated but we're attacking it on a few different fronts.
The new D1 storage engine announced a couple weeks ago (which has been my main project lately) is actually a new storage engine for Durable Objects itself. When it's ready, this will mean that every Durable Object has a SQLite database attached, backed by actual local files. In a DO, since the database is local, there's no problem at all with transactions and they'll be allowed immediately when this feature is launched.
But DO is a lower-level primitive that requires some extra distributed systems thinking on the part of the developer. For people who don't want to think about it, D1 needs to offer something that "just works". The good news is that the Workers architecture makes it pretty easy for us to automatically move code around, so in principle we should be able to make a Worker run close to its D1 database if it needs to perform transactions against it. (We launched a similar feature recently, Smart Placement, which will auto-detect when a Worker makes lots of round trips to a single back-end, and moves the Worker to run close to it.)
Sorry it's not all there yet, but we're working on it...
I really appreciate you taking the time to provide this context!
Intuitively I had a rough idea around why this was both a) a blocking issue ("why can't we just YOLO-try some version of it anyways?" came up at least a couple times on our side internally during alpha evaluation) and b) a hard issue, but I didn't know the details re: SQLite being single-writer specific (at least for the time being).
Valuable for my own knowledge, in addition to being useful re: understanding the steps involved for Cloudflare to enable this in the future.
I was already excited about Smart Placement regardless, but now doubly so knowing it is adjacent to what will enable a more generic solution for D1. We used DOs very heavily on the project I mentioned in my previous post, but as you call out, they are more complex to reason about, and in practice it limited who could work on them effectively on our team.
I always appreciate your comments in Worker / DO-related threads here on HN and have found them very insightful / helpful in learning more about what's under the hood! Thanks for taking the time to continue posting here - I know there's lots to do elsewhere.
This was a far more comprehensive answer than I expected to my skepticism. If nothing else, I'm very excited to see how it pans out in practice and what the ergonomics are like.
Makes me curious how they track abuse. Since it's distributed pretty widely, it seems like it could make a pretty good free portscanner, command-and-control network, DDOS amplifier, etc, with a group of free-level accounts.
That's concerning. Could you elaborate on how you identified the traffic as cloudflare workers? Also, what sorts of HTTP attacks? wp-admin probes? Plain DDoS?
Cloudflare has (had?) a murky history with not taking down DDoS for hire services ironically hosted behind cloudflare. But while you could argue they had an incentive to do that (sell protection), I can't think of any incentive to let Workers be abused.
> Could you elaborate on how you identified the traffic as cloudflare workers?
Trivial based on the fact that HTTP requests coming from CloudFlare Workers have a cf-worker header. Also, any traffic coming from cloudflare-owned IP blocks clearly belongs to cloudflare and can be safely blocked.
On the second point, with the introduction of Cloudflare WARP VPN, that's not quite true. Additionally, I believe Safari Private Relay may end up looking like it originates from CF as well.
Concerning the use of this for regular DB connections: isn't there a bit of a conflict between edge computing, which runs close to the user, and using a regular centralized DB, which resides in a fixed region?
That's really interesting. Between this and TCP sockets you could now design really interesting distributed, stateful services on Cloudflare Workers.
I'd love to design a database for this environment (if you're reading this at Cloudflare, you can hire me to work on this.) I think something that distinguishes between write and read requests, moves writes close to the leader server hosting the data being written, handles reads at the edge, and replicates the deterministic request itself would perform the best and give sequential consistency. This is the approach taken by fauna.com, and it's competitive with Spanner but without the need for GPS and atomic clocks to provide an accurate time source.
I wish I could put a bet on cloudflare probably already thinking about acquiring fly.io as they've been working on stuff like this and bumping against all the challenging edges.
What's the source IP for these sockets? Is it consistent across requests?
Opening a raw socket from a worker combined with a basic HTTP implementation could let you create a dynamic proxy that uses Cloudflare's worker IP range as the source address. That sounds like a fun^Winteresting way of getting around rate limits.
Related, does waiting for I/O count as "cpu time"? A proxy request might take 10s of ms in total, but most of that will be waiting for packets to flow back and forth.
>What's the source IP for these sockets? Is it consistent across requests?
Just did some brief testing. For me, the source IP wasn't consistent, but it was from an IP range belonging to Cloudflare. Notably, however, it wasn't from one of the IP ranges listed at https://www.cloudflare.com/ips/ unlike requests made from a Worker via `fetch()`. So, if you initiate two requests from the same worker -- one with `connect()`, and the other with `fetch()` -- then the first request uses a source IP belonging to Cloudflare but not from a range listed on their IP range page, while the second uses a source IP from a range listed on their IP range page.
I suspect the reason for the different behavior is to do with the `Cf-Worker` header that `fetch()` adds, which enables applications to differentiate requests made by a Worker from requests made by Cloudflare itself. Raw TCP sockets can't add headers, so they need to differentiate themselves another way.
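As a rough illustration of that differentiation from the origin's point of view (header name as described above; the handler shape is just for the sketch):

    // Classify incoming traffic based on the Cf-Worker header.
    // fetch() from a Worker carries it; a raw connect() socket has no
    // place to put an HTTP header, so it has to be identified another way.
    function classifyRequest(req: Request): "worker-subrequest" | "other" {
      const zone = req.headers.get("cf-worker");
      return zone !== null ? "worker-subrequest" : "other";
    }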
I really like Cloudflare's offerings, they are top notch in areas like security, performance and pricing.
What I'm not fond of is the company's fixation on big clients, leaving out a serious effort to bring on average solo programmers/entrepreneurs (something companies like Stripe instead thrived on).
Cloudflare really needs to do more to target the small fish: they are tomorrow's mid-sized and big fish, and winning them means making small and medium problems trivial to solve on the platform.
My experience is the opposite. It's quite easy and cheap to start using CF, starting with free plans for their proxy. We've added other CF services and pay them now, I think, about $25/month. Very accessible.
I needed logging to track down a few issues with my website, but logging is apparently a feature for Enterprise only, and requires a recurring four-figure cost. Thus, I switched over to Cloudfront, which lacks in some security features and is insanely expensive past 1 TB, but at least provides features without having to pay a huge amount upfront.
I've got a use case that is now almost entirely covered by these workers but the JS-only API is somewhat painful from my perspective. If CF offered support for running x64 linux .NET6+ binaries on these edge workers, I'd probably block off the next 3-4 weekends to play around with the stack.
I realize this is probably untenable considering certain compromises made in the CF infra (i.e. V8 optimizations), but one can dream. For now, Azure appears to be my prison.
> If CF offered support for running x64 linux .NET6+ binaries on these edge workers, I'd probably block off the next 3-4 weekends to play around with the stack.
The trouble with "containers on the edge" is that if we just literally put your container in 300+ locations it's going to be quite expensive.
Cloudflare Workers today actually runs your Worker in 300+ locations, and manages to be cost effective at that because it's based on isolates rather than containers.
We'll probably offer some sort of containers eventually, but it probably won't be oriented around trying to run your container in every location. Instead I'm imagining containers would come into play specifically for running batch jobs or back-end infrastructure that's OK to concentrate in fewer locations.
What does the roadmap look like around establishing some sort of multi-tier architecture within the CF product stack?
I imagine I could hack something together today by combining CF workers and another hyperscaler to run my .NET workload (TCP connection definitely helps with this!), but I think that there would still be a lot of friction with operations, networking, etc at scale. Ideally, workers and backend would be automagically latency-optimized and scaled relative to each other.
We don't have this in any public example yet, but here's our simple trick with Workers to bypass paying for Amazon API Gateway or CloudFront while still getting routed to the nearest AWS location (rough sketch after the steps):
1. Add Lambda Function URLs as records in a latency-based record on Route 53. (Lambda Function URLs do not support custom domains, so you cannot use this record directly.)
2. Have the Worker do a fetch to `https://cloudflare-dns.com/dns-query` on the Route 53 CNAME to discover what the lowest-latency Lambda Function URL hostname is.
3. The Worker can then fetch the Lambda Function URL using the discovered hostname.
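A rough sketch of steps 2 and 3 in a Worker, assuming the latency-based record is `lambda.example.com` (a placeholder) and resolves via CNAME to the nearest Lambda Function URL hostname:

    export default {
      async fetch(request: Request): Promise<Response> {
        // Step 2: resolve the Route 53 latency-based CNAME over DNS-over-HTTPS.
        const dohResponse = await fetch(
          "https://cloudflare-dns.com/dns-query?name=lambda.example.com&type=CNAME",
          { headers: { accept: "application/dns-json" } },
        );
        const dns = (await dohResponse.json()) as { Answer?: { data: string }[] };
        const target = dns.Answer?.[0]?.data.replace(/\.$/, ""); // strip trailing dot
        if (!target) return new Response("DNS lookup failed", { status: 502 });

        // Step 3: fetch the discovered Lambda Function URL hostname directly.
        const url = new URL(request.url);
        return fetch(new Request(`https://${target}${url.pathname}${url.search}`, request));
      },
    };

In practice you'd probably want to cache the lookup rather than resolving it on every request.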
What do you think about running it via Wasm? .NET runtime can be compiled to WebAssembly. It creates slow-start, but pre-initialized snapshot of WebAssembly VM, like AWS SnapStart should make it low again.
I haven't used Workers, but from the article it seems that they were limited to the standard APIs available across browsers and JS runtimes like Node and Deno, so you could use the fetch API or AJAX for things like HTTP, but there was no standard for raw TCP sockets they could use.
Static file serving, HTTP requests, eventually-consistent Key-Val store, and a funky sort of way to be able to edit/view consistent data from a single location, but only if you pay a minimum of $5/month (anyone out there like Durable Objects?).
I'm currently building a real-time multiplayer editor on durable objects and I love them. I wish other cloud providers would ruthlessly copy them so I don't have that vendor lock-in feeling.
They usually stay alive while they're receiving requests and die after a period of inactivity - though they can also die unexpectedly for other reasons.
This necessitates some more careful design - clients must assume the server can go away at any time and therefore work in a local-first fashion. I was interested in doing that anyway, so it didn't feel like an imposition.
I think the core of what I really like about DO's is that it's a single thread with a globally unique and accessible address.
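That addressing model, in a minimal sketch (the binding name `EDITOR` and the class are made up for illustration): the same name always routes to the same single-threaded instance, wherever the calling Worker happens to run.

    interface Env {
      EDITOR: DurableObjectNamespace; // assumed binding name
    }

    export class EditorObject {
      constructor(private state: DurableObjectState, private env: Env) {}

      // Every request for a given document lands on this one instance,
      // serialized on a single thread.
      async fetch(request: Request): Promise<Response> {
        return new Response(`handled by object ${this.state.id.toString()}`);
      }
    }

    export default {
      async fetch(request: Request, env: Env): Promise<Response> {
        const id = env.EDITOR.idFromName("doc-123"); // same name -> same object, globally
        return env.EDITOR.get(id).fetch(request);    // forward to wherever it lives
      },
    };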
Another annoyance is how little it gels with the rest of Pages. I'd expect that if I'm developing an application from the ground up in Pages, incorporating the various Cloudflare value-ads should come relatively naturally. Instead I need to have entirely separate projects just for these components.
Agreed on Pages, though I'm not using that personally.
I forgot to mention pricing. We've taken the approach of estimating a "worst case" time per object and requests/second rate. E.g. we assume a worst case of 30 req/s for the lifetime of the project, based on 3 users simultaneously sending updates at a rate of 10/sec each. And maybe we assume 1-2 hours of cumulative time spent in the editor per object. Multiply that out to get a value of GB-seconds per object, multiply by the respective prices, and add to get the total "expected lifetime cost per thing".
(We're not actually using durable objects' built-in storage, so this would indeed be complicated by trying to price that in. We only wanted DOs for coordination.)
If it's hard to estimate the total lifetime of a thing, you could estimate against something like how many hours can/will each user spend in your app per month, and come up with a monthly benchmark price. We haven't done that as the objects we deal with have naturally limited lifespans, and our pricing is tied to the number of these objects created anyway.
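To make the arithmetic concrete, here's a hedged back-of-the-envelope version of that estimate (the 128 MB memory figure and the per-million prices are assumptions for illustration; check the current Durable Objects pricing page):

    // Worst-case assumptions from the estimate above.
    const reqPerSecond = 30;          // 3 users x 10 updates/sec each
    const lifetimeSeconds = 2 * 3600; // ~2 hours of cumulative editor time per object
    const memoryGb = 0.128;           // assumed memory billed for duration

    const requests = reqPerSecond * lifetimeSeconds; // 216,000 requests
    const gbSeconds = memoryGb * lifetimeSeconds;    // ~921.6 GB-s

    // Assumed prices: ~$0.15 per million requests, ~$12.50 per million GB-s.
    const lifetimeCostPerObject =
      (requests / 1e6) * 0.15 + (gbSeconds / 1e6) * 12.5;

    console.log(lifetimeCostPerObject.toFixed(4)); // ~0.0439, i.e. about 4 cents per object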
TCP is a last resort rather than an important feature. Most TCP connections are stateful: they expect to be long-running rather than short-lived like HTTP. It'd be very expensive running serverless TCP connections, e.g. to a database.
But stale-while-revalidate does not work. And there is no request collapsing. And you still have to write code to get CORS to work in workers. And you can't cache the response from workers.
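For what it's worth, the CORS boilerplate looks something like this (a sketch; `origin.example.com` and the allowed methods/headers are placeholders):

    export default {
      async fetch(request: Request): Promise<Response> {
        // Preflight requests have to be answered by hand.
        if (request.method === "OPTIONS") {
          return new Response(null, {
            headers: {
              "Access-Control-Allow-Origin": "*",
              "Access-Control-Allow-Methods": "GET,POST,OPTIONS",
              "Access-Control-Allow-Headers": "Content-Type",
            },
          });
        }

        // Proxy to the origin, then copy the response so its headers are mutable.
        const upstream = await fetch(
          new Request("https://origin.example.com" + new URL(request.url).pathname, request),
        );
        const response = new Response(upstream.body, upstream);
        response.headers.set("Access-Control-Allow-Origin", "*");
        return response;
      },
    };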
Looking forward to when we'll be able to listen for incoming TCP connections too. It will become a full-blown platform then. The developer experience with wrangler is still off, though.
Right? They pretend to have a dev server with live reload, but it's about the worst implementation of such I've seen. Protip is to wrap it within some watchers and cleaners of your own:
    while true; do pkill -f miniflare-dist; npx wrangler pages dev public; done;
Lest ye end up with a million rogue processes spinning down your CPU after each file saved with a syntax error causes the entire dev launcher to crash and leave its spawn everywhere...
Is there a large need for this? MailChannels works if you want free outgoing mail. (We use AWS SES for our Workers since the lack of webhooks for delivery events with MailChannels is kind of a problem.)
When building an email service, a few subrequests per mail send from a Worker acting as an SMTP server will be cheaper than the extra $ per million mails for the API at MailGun/MailChannels/SES.
I don’t understand how $0 per outgoing email with MailChannels will cost more than $0 for a million emails.
For incoming emails, Cloudflare Email Workers are $0 per email and SendGrid is also $0 per email. We only pay for the cost of the Worker with this setup.
MailChannels CEO here. Would you pay something to get access to webhooks for delivery notifications and click-tracking? We have it; it's just not exposed.
1. Would pay for webhook delivery notifications as long as it isn’t more expensive than AWS SES.
2. Would not pay for click tracking. Might use it if it’s free.
3. We probably won’t use MailChannels again yet since we want to use AMP emails and ticket #221628 was handled pretty poorly. (We made our own MailChannels-like API for AWS SES after the frustrating experience.)
I don't understand something: "Cloudflare workers" are, at the end of the day, computers, right? Probably virtual ones (either VMs or containers). So why is the ability to create TCP sockets a feature? Why did their "cloudflare workers" have that handicap to begin with?
They are neither VMs nor containers, but V8 Isolates. They lacked an API for making outgoing raw TCP connections, in the same way that JavaScript running in a browser can’t.
I might be wrong, but they're more like a browser’s "service worker" than a Node app.[0] The fact that they have to manually add "compatibility" with Node modules like `utils`[1] seems to support this.
That's exactly right, and they even use the same API[1] for accepting and replying to HTTP requests, as well as supporting other service worker APIs like caches.
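A minimal sketch of that service-worker-style shape (types assume the Workers runtime):

    // The same addEventListener("fetch", ...) / event.respondWith() surface
    // that browser service workers use; the Cache API is available here too.
    addEventListener("fetch", (event: FetchEvent) => {
      event.respondWith(handle(event.request));
    });

    async function handle(request: Request): Promise<Response> {
      return new Response("hello from a service-worker-style Worker");
    }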
That's cool, do they let you fetch the current datetime multiple times now? Last I remember, you could get the current instant but it would never change for the whole request due to “security reasons”.
Well, it's javascript at the edge. Porting it to somebody else's "javascript at the edge" apis for sockets, kv store, etc, wouldn't be that hard. The lock-in doesn't seem that strong to me. And fully open source DIY global edge seems hard outside of maybe running a botnet :)
I wouldn’t underestimate that. All CDN vendors are substantially different. Lambda @edge is broken into requests and responses, and don’t even get me started on Akamai.
Nice to see Workers steadily reinventing CGI-served PHP. Maybe in another few years they'll reinvent mod_php, too, and then we can have persistent connections!
Regardless of English communication ability, we should hope programmers understand the difference between "until" and "while". (VB, bash, certainly others.)