I think describing this as caching hides the real benefits. My startup pushes 150 Mbps on average so I care about this stuff.
Rather than being caching, this introduces a proxy server that is geographically close to your visitors and that communicates with your server using an efficient compression protocol. Much of the network path that would transfer a full payload is replaced by a compressed connection that does not have to build up and tear down a TCP connection with a three-way handshake. The most important benefit I see here for site visitors is the reduced latency from removing connection setup: they spend a few ms setting up a connection with a server a few hundred miles away instead of one on the other side of the planet. That server then serves a cached page or uses an established connection to fetch only the changes, which could mean as little as a single packet sent and a single packet received.
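To make the latency point concrete, here's a back-of-the-envelope model. The RTT numbers are assumptions I picked for illustration, not Cloudflare measurements:

```python
# Rough, assumed numbers: a cross-planet round trip vs. a nearby edge node.
RTT_FAR_MS = 150.0   # client <-> origin on the other side of the planet
RTT_NEAR_MS = 15.0   # client <-> nearby edge node

def tcp_setup_cost(rtt_ms: float) -> float:
    """The TCP three-way handshake costs one full round trip before
    the HTTP request can even be sent."""
    return rtt_ms

def first_byte_estimate(rtt_ms: float) -> float:
    """Handshake (1 RTT) + request/response (1 RTT), ignoring server time."""
    return tcp_setup_cost(rtt_ms) + rtt_ms

saved = first_byte_estimate(RTT_FAR_MS) - first_byte_estimate(RTT_NEAR_MS)
print(f"estimated first-byte saving: {saved:.0f} ms")
# prints: estimated first-byte saving: 270 ms
```

The edge-to-origin leg still exists, of course, but it rides an already-established connection, so the visitor never pays its handshake cost.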
Another benefit of this is that your local web server will be talking to a local Cloudflare client, which means there is practically zero latency from your perspective for each request. This means that each of your app server instances spends less time waiting for its client to send or receive data and more time serving app requests. It's why people put Nginx in front of Apache.
I think the most important cost benefit here is reducing your bandwidth consumption. We're constantly negotiating our colo deal based on 95th-percentile billing, and getting your throughput from 1 Gbps down to 50 Mbps (which I think this may do) would drastically reduce your real hosting costs. Of course, Cloudflare needs to maintain its servers and will be serving 1 Gbps to your customers, but those Cloudflare servers will be geographically closer to your customers. However, because data centers bill based on your throughput at the switch, not on how far away your customers are, I don't see any cost savings that they (Cloudflare) can pass on to you. They're going to be billed what you were being billed for bandwidth, but they'll mark it up. I suppose you could argue there are economies of scale they benefit from, but that doesn't seem like a compelling argument for reduced costs.
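For readers unfamiliar with how 95th-percentile ("burstable") billing works, here is a minimal sketch. The sample values are made up for illustration:

```python
# Sketch of 95th-percentile bandwidth billing, as used in many colo
# contracts: the provider records 5-minute throughput samples all month,
# throws away the top 5%, and bills on the highest remaining sample.
def ninety_fifth_percentile(samples_mbps):
    """Sort the samples and take the value below which 95% of them fall."""
    ordered = sorted(samples_mbps)
    index = int(len(ordered) * 0.95) - 1
    return ordered[index]

# A toy month: mostly ~50 Mbps with occasional 1 Gbps bursts.
samples = [50] * 95 + [1000] * 5
print(ninety_fifth_percentile(samples))
# prints: 50  (the 1 Gbps bursts fall inside the discarded top 5%)
```

This is why shaving sustained throughput, rather than peak bursts, is what actually moves the bill.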
I doubt it would go unnoticed, though, so I expect someone will be interested in persuading you to get a Business account with us, which is $200/month, because at that point we give you an SLA.
I'd spring for the $200/mo service if you are pushing real traffic just to get to try the railgun. $200 is less than epsilon for a large site.
Also, the cupcakes are probably a lie, or at least are vegan.
During that time there was a total of one Cloudflare-related outage, and it was resolved within about 15 minutes by their changing the data center the site(s) were routed through. I can tell you that one of the greatest benefits you will see with Cloudflare as it stands currently is that your bandwidth utilization is going to go down substantially. Before switching to Cloudflare we were pushing a good deal more than 50 Mbps. Essentially, if you were to switch, I have to imagine your side of the bandwidth utilization is going to drop to somewhere around 75-90 Mbps, if not more.
That said, understand what you're getting into. This is a 'cloud' service, and they require you to switch your DNS records to their service. All things considered, running a multi-million-dollar business through them has been much smoother than anticipated... we'll be looking at this new feature very carefully as well, because about half or more of our content cannot be cached.
The whole 'freeing you up to serve more requests' thing is not accurate: your app servers run as fast as they can and your frontend proxies deal with handing the data to the client, so your app servers are (or should be) always doing as many requests as they can. If anything, the reduced latency and caching will allow more connections than usual to come in, putting more potential load on your app servers. Catch-22 =)
"This means that each of your app server instances spends less time waiting for it's client to send or receive data and more time serving app requests. It's why people put Nginx in front of Apache."
Sounds silly to me. Putting a proxy in front of a proxy doesn't change the TCP/IP stack. If you tuned your network stack and Apache properly, it should be able to handle anything you throw at it. I don't remember what the setting was, but modern versions of Apache should be able to send a request to the app server only once the client has finished its request to the frontend.
Here are some helpful links - CF customer testimonials:
A fully server-side solution will probably bypass most problem cases rather than solve them, though it might solve some, so I hope CloudFlare looks into contributing to a distributed, standards-based solution.
Google's SDCH (mentioned in CloudFlare's post) is an alternative that isn't tied to a single URL, but involves prefetching some data that may or may not be needed, so it's a bit hacky.
An interesting approach would be to hook into a template engine to generate a page skeleton with dictionary references. That would allow reordering the small, newly generated varying snippets before the large common ones; both could be preemptively pushed using SPDY to avoid round-trips.
Probably going to get downvoted, but http://www.reactiongifs.com/wp-content/gallery/yes/shaq_YES....
Taking the CNN front page for example. If you set a TTL of 60 seconds, and you have 14 edge locations (taken from the CloudFlare about page), you've got to satisfy 1440 * 14 = 20160 requests a day. The CNN page is currently 97527 bytes, which gzips down to 20,346 bytes. That's 391 megabytes per day. Serving the edge locations even with this relatively short TTL is trivial.
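The arithmetic above checks out; reproducing it with the figures from the comment:

```python
# Reproducing the back-of-the-envelope math: a 60-second TTL means each
# edge location refetches the page once a minute, all day.
SECONDS_PER_DAY = 86400
ttl_seconds = 60
edge_locations = 14
gzipped_page_bytes = 20_346          # CNN front page, gzipped

requests_per_day = (SECONDS_PER_DAY // ttl_seconds) * edge_locations
daily_bytes = requests_per_day * gzipped_page_bytes

print(requests_per_day)              # prints: 20160
print(round(daily_bytes / 2**20))    # prints: 391  (MiB/day to fill edges)
```

So even with a short TTL, the origin-to-edge traffic is a rounding error next to edge-to-user traffic.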
Now, the TCP connection means the content can be pushed and not pulled, so latency is better. It also means that caching a lot of pages will become cheaper (though still expensive).
But it doesn't seem like a lot of benefit for essentially replacing HTTP (between the content servers and the edge nodes), with all the proprietary software and vendor lock-in that entails. Each byte you send to the edge nodes is going to be served orders of magnitude more times from the edge nodes to end users, which seems to be where almost all the cost would lie.
I'm sure they know what they're doing, and that their customers have asked for this, but I think some real-world case studies would help make it click. That the blog post goes on about 'caching the uncacheable' really doesn't help.
The big benefit is not in terms of bandwidth saved (for us) it's in terms of total time to get the page. That's partly driven by latency and partly by bandwidth. Because we have worldwide data centers we can see high latency from say the data center in Miami and a web server located in Sydney. Railgun helps with that problem.
Also, CNN has a TTL of 60s but many, many web sites have a TTL of 0 because they want no caching at all (see New York Times web site) or because the page is totally personalized.
I think I might be slow, but I'm not sure where the wins are here and how you overcome the personalized rendering problem? I'd love for you to talk a bit about it.
1. You install "Railgun" on your publishing web server
2. It pushes deltas of your webpage to CloudFlare
3. Who then update their cache of your webpage across their CDN.
It seems as though folks are starting to see that in most environments caching ought to be driven by POSTs, not GETs + timeouts.
Each end of the Railgun link keeps track of the last version of a web page that's been requested. When a new request comes in for a page that Railgun has already seen, only the changes are sent across the link. The listener component makes an HTTP request to the real, origin web server for the uncacheable page, makes a comparison with the stored version, and sends across the differences. The sender then reconstructs the page from its cache and the difference sent by the other side.
I take this to mean that there are two chained proxies, each proxying pages on a per-user basis. Since the upstream server-side proxy knows what the downstream client-side proxy has cached, it can send a very efficient shorthand describing how the page has changed without having to resend the information that's already been sent.
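A toy sketch of that delta idea, using Python's stdlib `difflib` (this is my illustration of the general technique, not Cloudflare's actual wire protocol):

```python
import difflib

# Both ends remember the last version of the page; only edit operations
# ("copy this range of the cached page" / "insert these new bytes")
# cross the link between them.

def make_delta(old: str, new: str):
    """Encode 'new' as operations against 'old'."""
    ops = []
    sm = difflib.SequenceMatcher(a=old, b=new)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # reference into cached page
        else:
            ops.append(("insert", new[j1:j2]))  # literal changed bytes
    return ops

def apply_delta(old: str, ops) -> str:
    """Reconstruct the new page from the cached old page plus the delta."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.append(old[op[1]:op[2]])
        else:
            out.append(op[1])
    return "".join(out)

old = "<html><body>Hello, Alice. 5 new messages.</body></html>"
new = "<html><body>Hello, Alice. 7 new messages.</body></html>"
delta = make_delta(old, new)
assert apply_delta(old, delta) == new
```

Only the literal inserts (here, a single character) plus tiny copy instructions need to cross the wire, which is why a delta for a lightly changed page can fit in one packet.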
I think it's a smart approach. So long as the origin proxy is inside the datacenter, it clearly would save a lot on data charges. But I'm surprised the speedup is as large as JGC reports, since you still have to pass the full page over the last mile to the user. I would have thought that was the slowest link. Is the core internet so congested that this is not the case? I'm presuming the data center and the origin server have very good throughput, and that even a very short message would have the same latency.
The issue with the end user number is deciding on what to report. We are currently rolling out a very large system for monitoring timing throughout our network and will be able (later this year) to report on actual end user timings to see how much Railgun makes a difference. Our goal (as usual) is to improve the end user experience because that makes our customers (publishers) happy. Railgun is one small part of improving overall web performance.
OK. Still a useful measure, but a less dramatic one.
The issue with the end user number is deciding on what to report.
I would think that time to show a slightly changed page after a "refresh" or "reload" would be appropriate. What are the other choices?
For pages that are frequently accessed, the deltas are often so small that they fit inside a single TCP packet, and because the connection between the two parts of Railgun is kept active, problems with TCP connection time and slow start are eliminated.
Just reread this part, and not sure I understand it. Yes, there are no extra packets between the proxies, but in the base case there is only a single proxy and hence no extra connection time to consider. I'd think even a very fast proxy would introduce more latency than a hop on a backbone router. Or are you indeed pushing the per-user delta to the data center in anticipation of the request?
Well, what we really want to measure is the overall effect so that we can see how Railgun improves things in general. The 'refresh' time is interesting, but we're also interested in the network scale (how does user X downloading Y improve the speed for user Z downloading the same (but slightly different) page Y).
"but in the base case there is only a single proxy and hence no extra connection time to consider"
That isn't really the base case. The base case is that we need to go get the resource with a normal HTTP connection direct to the server.
HTTP supports that too (through ETag and 304 Not Modified).
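For reference, the ETag/304 mechanism looks like this in miniature (the function names are my own illustration, not a real framework's API):

```python
import hashlib

# Minimal sketch of HTTP revalidation: the server tags a response with an
# ETag; the client echoes it back in If-None-Match; a match means the
# client's cached copy is still good, so no body is re-sent.

def make_etag(body: bytes) -> str:
    return '"%s"' % hashlib.md5(body).hexdigest()

def respond(body: bytes, if_none_match=None):
    """Return (status, payload) for a conditional GET."""
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, b""          # Not Modified: zero-byte payload
    return 200, body

page = b"<html>hello</html>"
assert respond(page, None) == (200, page)                 # first fetch
assert respond(page, make_etag(page)) == (304, b"")       # revalidation
```

The important difference from Railgun is that a 304 only saves bandwidth when the page is byte-identical; a delta scheme still wins when the page has changed slightly.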
With a different hosting partner (and a different set of sites) we saw a page download time speedup of between 2.94x and 8.12x.
They're not really caching the uncacheable, either.
Leaving a persistent connection between the cache and the server obviously does most of it (especially with latency-mitigation tricks/TCP acceleration). It would be interesting to calculate the benefits of nothing vs. a cache with an accelerated persistent TCP connection vs. deltas. I suspect it's something like 500 vs. 50 vs. 45, but every bit helps.
Cloudflare is like a bunch of people suddenly realizing what every CDN in the world with a dynamic acceleration product does, but then blogging about it as if it's all magic and unicorns.
Every DSA provider maintains persistent connections to origin nodes. Every DSA provider runs a custom multiplexing protocol between the first and last mile POPs on their network. Nothing here is new.
The only moderately interesting thing about this is that they're sending X bytes instead of Y bytes once every 60 seconds. Meh.
If you have servers with a very low-throughput outbound connection to the cache, then reducing the data transferred in this manner could be worth it. But 100 KB over the wire at today's throughput is not going to add all that much latency to the whole transaction. As you suggested, some figures would be nice, though.
(This is all from TCP acceleration, not deltas, though. Deltas might give you 1 ms on a 1G link if they save you sending 100 KB. BFD. Deltas to the edge, where you might be constrained by bandwidth on a mediocre 3G connection, are where deltas would rock - coupled with SPDY and TCP acceleration and caching, we'd be living in 2015.)
Passing deltas to a client over a mobile data network would be an awesome development, I agree. With some additional specification to HTTP and vendor implementation on clients, it'd definitely be possible.
It would possibly be fair to use something closer to 155Mbps in a lot of places (the most constrained part of the link; we're not even talking about congestion/packet loss, which would exponentially favor this technique, and which does happen on SP transit links sometimes). At that point, 4MB could actually matter:
155 Mbps is roughly 19-20 MB/sec, so 4 MB takes about 200 ms to transfer and 100 KB takes about 5 ms. If you assume a single packet for the delta instead, I'd be happy to save 5-200 ms.
So you're actually saving these 5-200ms vs. non-cloudflare delivery. It might still be 305-505ms total page load time, or easily greater (I used GEO sat for a while, feeding cell; it was hell), but if it were 305 instead of 505 I'd be quite happy actually.
(I'm assuming a 4MB page with 100KB which actually changes between loads. A 4MB all dynamic page, frequently reloaded, which can't be cached but where a tiny delta is possible would be pretty pathological.)
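Working the comment's numbers exactly (decimal megabits, which lands slightly above the comment's rounded 200 ms):

```python
# Transfer-time arithmetic for the 155 Mbps constrained-link scenario.
link_mbps = 155
bytes_per_sec = link_mbps * 1_000_000 / 8     # ~19.4 MB/sec

def transfer_ms(size_bytes: int) -> float:
    """Serialization time alone, ignoring latency and packet loss."""
    return size_bytes / bytes_per_sec * 1000

full_page = 4 * 1024 * 1024   # 4 MB page, re-sent in full every reload
delta = 100 * 1024            # 100 KB that actually changed

print(round(transfer_ms(full_page)))  # prints: 216  (ms)
print(round(transfer_ms(delta)))      # prints: 5    (ms)
```

And that's before congestion or packet loss, which hurt the 4 MB transfer far more than the 100 KB one.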
With CDNs you can also get benefits from prefetch/prefill, either opportunistically (due to multiple users hitting the same thing, or through actual scheduled fill). Works for many media assets but not for dynamically generated pages, which is what Railgun is supposed to address.
If it'd be from upstream server (or cache) to end clients, that'd be an awesome saving.
The thing that has stopped me from trying it is that you have to move your whole DNS to them though, and they have had downtime for DNS as well as their hosting.
I'm comfortable with serving www.example.com through them and dealing with the occasional downtime. But I'm not so comfortable with downtime for MX records - I really want email@example.com to be super reliable.
Because Cloudflare acts as a proxy, it gets in your way in subtle but devastating ways. First, the SSL support wasn't stable and I turned it off a few days into launch. That probably killed some traffic.
Then, I realized that caching was an all-or-nothing proposition. You can have Cloudflare cache your assets, but it doesn't seem to respect expires headers. You can set a cache time in the web interface, but the minimum is 2 hours! So you end up creating a rule to have it cache no assets, at which point you might as well use AWS Cloudfront for a CDN.
I advise you not to use Cloudflare. It's a great idea, but the execution just isn't there.
Why do you expire assets at all? If it's because you are deploying new versions of the app, would it not be better to solve this by using revision-stamped URLs? When you deploy a new version of your app, just change the stamp and let the old assets expire.
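A minimal sketch of revision-stamped asset URLs (the URL scheme here is an assumption for illustration; any stable convention works):

```python
import hashlib

# Cache-busting by content hash: the URL changes whenever the content
# changes, so the asset can be served with a far-future Expires header
# and never needs explicit invalidation.
def stamped_url(path: str, content: bytes) -> str:
    rev = hashlib.sha1(content).hexdigest()[:8]
    return f"/assets/{rev}/{path}"

css_v1 = b"body { color: black; }"
css_v2 = b"body { color: navy; }"
assert stamped_url("site.css", css_v1) != stamped_url("site.css", css_v2)
```

Deploying a new version emits new URLs in your HTML; the old assets simply expire out of the CDN on their own.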
Invalidation is a challenge with any kind of caching system. Invalidating passively by changing the cache key means that the cache has to keep more old cruft around until it expires naturally, but it's easier to implement and works around the race condition problem.
I actually haven't experimented with the SSL support yet. (we've been working on some better ways to handle SSL ourselves, which might be interesting for something like Cloudflare). I probably would pick SSL over Cloudflare if forced to decide between the two, but the $200+/mo plans support real SSL certs you provide, including EV, so I'd look at that.
I generally go with longer expires headers, and have dev vs. production sites. I don't use cloudflare for any dev sites. I believe you can manually expire everything if some catastrophe happens; you could also do that as part of the update process.
In my experience (e.g. a site with 50,000-100,000 pageviews/day), it's mostly the requests hitting the db, not server throughput, that are the bottleneck.
So for some people this could solve the wrong problem...
So yeah, basically almost all of them have it. Most have had it for years. CDNs and various other business models (POPs, etc.) have been converging for a while now, so this has become pretty standard for CDNs to provide.
I wonder why these aren't available to normal users.
That being said, I'd have used another name ;)
You can still break up your page into pieces and cache all but the ones that change. I believe this is one of the reasons you would use ESI: http://en.wikipedia.org/wiki/Edge_Side_Includes
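A toy sketch of the ESI idea in Python (fragment names and content are made up; real ESI uses `<esi:include>` tags processed at the edge):

```python
import re

# Cache the stable fragments of a page and regenerate only the
# personalized piece on each request.
fragment_cache = {
    "header": "<header>Site nav</header>",
    "footer": "<footer>(c) 2012</footer>",
}

template = "{{header}}<p>Hello, {user}!</p>{{footer}}"

def render(user: str) -> str:
    # Substitute cached fragments first, then fill in the per-user piece.
    page = re.sub(r"\{\{(\w+)\}\}",
                  lambda m: fragment_cache[m.group(1)], template)
    return page.format(user=user)

assert "Site nav" in render("alice")
assert "Hello, bob!" in render("bob")
```

Only the greeting is computed per request; header and footer are served from cache, which is exactly the split ESI makes explicit at the edge.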