This is not a CDN (Content Delivery Network). The value is in the networking bit: storage all around the world for resiliency, bandwidth cost, scalability, and low latency.
Having 1 server with some static file storage is called a web server.
As it is, the article is a bit confusing because it's the third one in a series and focuses only on NetBSD. The original one[0] offers better context, especially this bit:
> The idea is to create reverse proxies with local caching. These proxies would cache the content on the first request and serve it directly afterward. The proxies would be distributed across different regions, and the DNS would route requests to the nearest proxy based on the caller’s location. All this is achieved without relying on external CDNs, using self-managed tools instead.
Not to mention that Anycast and TCP can have some fun edge cases for users with chaotic routing, and you also need to figure out how to do sticky L4 sessions / forwarding, which can be hell to debug.
(Or you could just ignore those users when they complain; PlayFab sure does.)
Adding Varnish makes it more than just a web server for static files: it also caches content created by the web server. This can be very useful for things like on-the-fly image resizing, which some blogs seem to do (requesting paths like /images/image.jpg/100x40/... based on the reader's device size).
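As a rough illustration, caching those resized-image variants longer than everything else only takes a few lines of VCL. This is a sketch, not the article's config; the URL pattern, TTL, and backend address are assumptions:

```
vcl 4.0;

backend default {
    .host = "127.0.0.1";    # assumed origin web server doing the resizing
    .port = "8080";
}

sub vcl_backend_response {
    # Resized variants are expensive to generate but rarely change,
    # so keep them in cache longer than the default TTL.
    if (bereq.url ~ "^/images/.+/[0-9]+x[0-9]+") {
        set beresp.ttl = 7d;
    }
}
```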
Whether you do image resizing (or other CPU-bound tasks) ahead of time or just in time is an extremely important design decision that needs to consider the needs of the application, available hardware resources, traffic patterns, etc. You shouldn't leave it to what is essentially random chance (maybe it's in the cache, maybe it isn't) without considering all of the above, unless you have a low-traffic blog that can get away with anything.
Yes, I was burned really badly by this in my young days ;)
Varnish is one of those tools that has a very specific purpose (a highly configurable reverse caching proxy that is crazily fast). Most of the time I don't need it, but in the places where I have had to use it, it's made the difference between working services and failing services.
One example of where it made the difference was when we had two commercial systems, let's call them System A and System B. System A was acting as a front end for System B, but System A was making so many API calls to System B that it was grinding it to a halt. System B's responses would only change when System A made a call to a few specific APIs, so we put Varnish between System A and System B, caching the common API responses. We also set it up so that when a request was made to the handful of APIs that would change the other APIs' responses for an account, we'd invalidate all the cache entries for that one specific account. Once System A was talking to the Varnish cache, the performance of both systems drastically improved.
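Per-account invalidation like that maps nicely onto Varnish bans. A minimal sketch, assuming the account id travels as an X-Account-Id request header; the header name, backend address, and ACL are hypothetical, not details from the setup described above:

```
vcl 4.0;

backend system_b {
    .host = "system-b.internal";    # hypothetical System B address
    .port = "8080";
}

acl invalidators {
    "10.0.0.0"/8;                   # hypothetical range allowed to invalidate
}

sub vcl_recv {
    # A BAN request drops every cached object tagged with that account id.
    if (req.method == "BAN") {
        if (!client.ip ~ invalidators) {
            return (synth(403, "Forbidden"));
        }
        if (!req.http.X-Account-Id) {
            return (synth(400, "Missing X-Account-Id"));
        }
        ban("obj.http.X-Account-Id == " + req.http.X-Account-Id);
        return (synth(200, "Banned account " + req.http.X-Account-Id));
    }
}

sub vcl_backend_response {
    # Tag cached objects with the account id so bans can match on them.
    set beresp.http.X-Account-Id = bereq.http.X-Account-Id;
}
```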
- You don't really need to repeat the built-in VCL in default.vcl. In the article, you can omit `vcl_hit`, `vcl_miss`, `vcl_purge`, `vcl_synth`, `vcl_hash`, etc. If you want to modify the behavior of the built-in VCL, e.g. adding extra logs in `vcl_purge`, then just add the `std.log` line and don't `return` (it will fall through to the built-in VCL). You can read more about the built-in VCL on the Varnish Developer Portal[1] and in the Varnish Cache documentation[2].
- Related to the built-in VCL comment above: `vcl_recv` currently lacks all the guards provided by Varnish's default VCL, so it's recommended to drop the `return (hash)` line at the end, so the built-in VCL can handle invalid requests and skip caching if a Cookie or Authorization header is present. You may also want to use vmod_cookie[3] to keep only the cookies you care about.
- Since Varnish is sitting behind another reverse proxy, it makes more sense to enable the PROXY protocol, so client IPs are passed to Varnish as part of the PROXY protocol rather than via X-Forwarded-For (so `client.ip`, etc. work). This means using `-a /var/run/varnish.sock,user=nginx,group=varnish,mode=660,PROXY` and configuring `proxy_protocol on;` in Nginx. (A minimal sketch combining these points follows this list.)
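For reference, a trimmed-down default.vcl along those lines could look like this. It's a sketch, not the article's config: the backend address, the kept cookie name, and the socket path are assumptions, and the nginx side would still need the `proxy_protocol on;` bit mentioned above:

```
# varnishd listener, per the suggestion above:
#   -a /var/run/varnish.sock,user=nginx,group=varnish,mode=660,PROXY

vcl 4.0;

import std;
import cookie;

backend default {
    .host = "127.0.0.1";
    .port = "8081";           # assumed origin web server
}

sub vcl_recv {
    # Keep only the cookies the backend actually uses, so the built-in
    # "request has a Cookie -> don't cache" guard only triggers when it
    # matters. The cookie name is an assumption.
    if (req.http.Cookie) {
        cookie.parse(req.http.Cookie);
        cookie.keep("session_id");
        set req.http.Cookie = cookie.get_string();
        if (req.http.Cookie == "") {
            unset req.http.Cookie;
        }
    }
    # No return (hash) here: fall through to the built-in vcl_recv,
    # which rejects invalid methods and skips caching on Cookie/Authorization.
}

sub vcl_purge {
    # Extra logging only; not returning keeps the built-in behaviour.
    std.log("purge: " + req.url);
}
```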
I’ve heard good things about Varnish and believe I used it for a few things back in the day. Squid was also good when I used it in the mid 2000s (not sure where it is today), and I think I heard that Akamai was originally just Squid on NetBSD or something like that!! Can anyone confirm or deny?
This article is part of a series, and the goal is to create content caching nodes on hosts scattered around the world. When a user connects, the DNS will return the closest active host to them. On a larger scale, it's not much different from what commercial CDNs do.
Always nice to see a project choosing NetBSD! It's pretty easy to manage with Ansible too, so we sometimes rotate it in on "this could be any *NIX" projects and services.
I stumbled on wwwoffle about a month ago. I don't really have a need for it, but it just seems incredibly cool, and a bit like a holdout from a different time.
Many of these projects from the late 1990s just seem so well designed and built, solving very interesting problems, many of which we don't necessarily have anymore. I was also looking at uw-imap (which is no longer maintained), and the simplicity of just going "Your mailbox will be mbox and authentication is passwd" brings a bit of joy.
Varnish is not better in any shape or form than nginx for static content.
Varnish has one single use case: PHP sites. For everything else it will just add a layer of complexity that gives no gains. And since Varnish is essentially built on Apache, there are some issues with how it handles connections above about 50k/sec, where it gets complicated to configure, something that nginx does not have.
Do you mean built "for" Apache? Because I think it was written from scratch.
I wouldn't call it useless, but it's not exactly a CDN; it's missing the "Network" bit. This is just caching. You'd need something like this, but scaled out across multiple locations, for it to be a CDN.
Also, most of it isn't exactly NetBSD related; the same approach works on anything that runs Varnish and Nginx.
Besides those listed, I think a plus would be to have only one server listening on privileged ports (<1024), using the same/similar TLS configuration for both web and mail, etc. Basically having one service be the arbiter of your incoming traffic and its encryption.
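A rough sketch of that "single front door" idea with nginx terminating TLS for both HTTPS and IMAPS and handing off to unprivileged backends. Certificate paths, ports, and backend addresses are all assumptions, and the mail side could equally use nginx's mail proxy module instead of the stream module:

```
events {}

http {
    server {
        listen 443 ssl;
        ssl_certificate     /etc/ssl/example.org.crt;
        ssl_certificate_key /etc/ssl/example.org.key;

        location / {
            proxy_pass http://127.0.0.1:8080;   # e.g. Varnish or the web backend
        }
    }
}

stream {
    server {
        listen 993 ssl;                          # IMAPS
        ssl_certificate     /etc/ssl/example.org.crt;
        ssl_certificate_key /etc/ssl/example.org.key;
        proxy_pass 127.0.0.1:143;                # plain IMAP on the mail server
    }
}
```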