Make Your Own CDN with NetBSD (dragas.net)
70 points by jaypatelani 12 days ago | 33 comments

This is not a CDN: Content Delivery Network. The value is in the networking bit: storage all around the world for resiliency, bandwidth cost, scalability, and low latency.

Having 1 server with some static file storage is called a web server.


As it is, the article is a bit confusing because it's the third one in the series, focusing just on NetBSD. The original one[0] offers better context, especially this bit:

> The idea is to create reverse proxies with local caching. These proxies would cache the content on the first request and serve it directly afterward. The proxies would be distributed across different regions, and the DNS would route requests to the nearest proxy based on the caller’s location. All this is achieved without relying on external CDNs, using self-managed tools instead.

[0] https://it-notes.dragas.net/2024/08/26/building-a-self-hoste...
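
For reference, the caching edge node in that design is essentially a reverse proxy with a local cache. A minimal sketch of one such node using nginx's proxy_cache (hostnames and paths are placeholders, and the article itself puts Varnish behind nginx rather than relying on proxy_cache alone):

    # One edge node: cache responses from the origin locally and serve
    # them on later requests. TLS termination omitted for brevity.
    proxy_cache_path /var/cache/nginx/edge levels=1:2 keys_zone=edge:50m
                     max_size=5g inactive=7d use_temp_path=off;

    server {
        listen 80;
        server_name cdn-eu.example.org;              # hypothetical edge hostname

        location / {
            proxy_pass https://origin.example.org;   # hypothetical origin
            proxy_cache edge;
            proxy_cache_valid 200 301 1h;
            proxy_cache_use_stale error timeout updating;
            add_header X-Cache-Status $upstream_cache_status;
        }
    }

Repeat something like that on a VPS per region and point a geo-aware DNS record at all of them, which is the part the series builds up to.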


> DNS would route requests to the nearest proxy based on the caller’s location

Anycast is the proper way to do this. No geolocation required, just dynamic routing protocols optimizing for shortest path.


There's no such thing as "anycast". For IPv4 there is only BGP. And BGP is not viable for a rented server and IP address, unlike running a DNS server.

Sure, but Anycast requires specific investments, ASN, IP blocks, etc. This solution can be implemented using cheap VPSes in a few minutes.

Not to mention that Anycast and TCP can have some fun edge cases for users with chaotic routing, and you also need to figure out how to do sticky L4 sessions / forwarding, which can be hell to debug.

(Or you could just ignore those users when they complain, PlayFab sure does.)


True - thank you for pointing that out. I've modified the article; I think it's clearer now.

Adding Varnish makes it more than just a web server for static files, it also caches content created by the web server. This can be very useful for things like on-the-fly image resizing, which some blogs seem to do (requesting paths like /images/image.jpg/100x40/... based on the reader's device size).
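
As a hypothetical illustration of that pattern (assuming nginx was built with the image_filter module, and with a URL scheme made up to match those example paths), the resize can happen on demand while the cache in front, Varnish or proxy_cache, stores each rendered size only once:

    # Resize /images/photo.jpg/100x40/... on the fly; the cache in front
    # keeps the rendered result so the filter runs once per size.
    location ~ ^/images/(?<img>.+\.jpg)/(?<w>\d+)x(?<h>\d+)/ {
        alias /srv/www/images/$img;
        image_filter resize $w $h;
        image_filter_jpeg_quality 85;
        expires 30d;
    }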

Still not really a CDN, though.


Whether you do image resizing (or other CPU-bound tasks) ahead-of-time or just-in-time is an extremely important design decision that needs to consider the needs of the application, available hardware resources, traffic patterns, etc. You shouldn't leave it to what is essentially random chance (maybe it's in the cache, maybe it isn't) without considering all of the above, unless you have a low-traffic blog that can get away with anything.

Yes, I was burned really badly by this in my younger days ;)


Varnish is one of those tools that has a very specific purpose (a highly configurable reverse caching proxy that is crazily fast). Most of the time I don't need it - but those places I have had to use it, it's made the difference between working services and failing services.

One example of where it made the difference was where we had two commercial systems, let's call them System A and System B. System A was acting as a front end for System B, but System A was making so many API calls to System B that it was grinding it to a halt. System B's responses would only change when System A made a call to a few specific APIs, so we put Varnish between System A and System B, caching the common API responses. We also set it up so that when a request was made to the handful of APIs that would change the other APIs' responses for an account, we'd invalidate all the cache entries for that one specific account. Once System A was talking to the Varnish cache, the performance of both systems drastically improved.
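
A rough VCL sketch of that kind of setup, purely illustrative: the backend address, the X-Account-Id header, and the idea that writes go through /api/account/ are all assumptions here, not what was actually deployed.

    vcl 4.1;

    backend system_b {
        .host = "system-b.internal";   # hypothetical address of System B
        .port = "8080";
    }

    sub vcl_recv {
        # When the front end calls one of the state-changing endpoints,
        # drop every cached response tagged with that account (assumes
        # the caller always sends X-Account-Id), then pass the request
        # straight through to System B.
        if (req.method == "POST" && req.url ~ "^/api/account/") {
            ban("obj.http.X-Account-Id == " + req.http.X-Account-Id);
            return (pass);
        }
    }

    sub vcl_backend_response {
        # Tag cacheable API responses with the owning account so the ban
        # expression above can match them later.
        set beresp.http.X-Account-Id = bereq.http.X-Account-Id;
        set beresp.ttl = 10m;
    }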


Some comments:

- You don't really need to repeat the built-in VCL in default.vcl. In the article, you can omit `vcl_hit`, `vcl_miss`, `vcl_purge`, `vcl_synth`, `vcl_hash`, etc. If you want to modify the behavior of the built-in VCL, e.g. adding extra logs in vcl_purge, then just add the `std.log` line and don't `return` (it will fall through to the built-in VCL). You can read more about built-in VCL on the Varnish Developer Portal[1] and in the Varnish Cache documentation[2].

- Related to the above built-in VCL comment: `vcl_recv` currently lacks all the guards provided by Varnish's default VCL, so it's recommended to drop the `return (hash)` line at the end, so the built-in VCL can handle invalid requests and skip caching if a Cookie or Authorization header is present. You may also want to use vmod_cookie[3] to keep only the cookies you care about (see the sketch after the links below).

- Since Varnish is sitting behind another reverse proxy, it makes more sense to enable the PROXY protocol, so client IPs are passed to Varnish as part of the PROXY protocol rather than X-Forwarded-For (so `client.ip`, etc. work). This means using `-a /var/run/varnish.sock,user=nginx,group=varnish,mode=660,PROXY`, and configuring `proxy_protocol on;` in Nginx.

[1]: https://www.varnish-software.com/developers/tutorials/varnis...

[2]: https://varnish-cache.org/docs/7.4/users-guide/vcl-built-in-...

[3]: https://varnish-cache.org/docs/trunk/reference/vmod_cookie.h...
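
Putting the first two points together, a minimal default.vcl could look roughly like this; a sketch only, assuming Varnish 6.4+ with the bundled cookie vmod, and with the backend address and the "session_id" cookie name as placeholders:

    vcl 4.1;

    import std;
    import cookie;

    backend default {
        .host = "127.0.0.1";   # hypothetical backend web server
        .port = "8080";
    }

    sub vcl_recv {
        # Keep only the cookies the application needs, then fall through:
        # no return (hash) here, so the built-in vcl_recv still applies
        # its guards (invalid methods, remaining Cookie/Authorization
        # headers, and so on).
        if (req.http.Cookie) {
            cookie.parse(req.http.Cookie);
            cookie.keep("session_id");
            set req.http.Cookie = cookie.get_string();
            if (req.http.Cookie == "") {
                unset req.http.Cookie;
            }
        }
    }

    sub vcl_purge {
        # Extra logging only; without a return, the built-in vcl_purge
        # still runs and sends the usual "Purged" synth response.
        std.log("purged: " + req.url);
    }

For the third point, it should just be the quoted `-a ...,PROXY` listener on the Varnish side plus the matching `proxy_protocol` setting on the nginx side, as described above.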


I’ve heard good things about Varnish and believe I used it for a few things back in the day. Squid was also good when I used it in the mid-2000s (not sure where it is today), and I think I heard that Akamai was originally just Squid on NetBSD or something like that!! Can anyone confirm or deny?

Fastly is basically a varnish-based CDN. They even let you provide VCL code to program it.

They forked Varnish (around 2.0, judging by the VCL syntax) quite early and I suspect there isn't much left of the original codebase.

I'm very impressed how they managed to retrofit multitenancy.


Last I heard they were moving away from Varnish (but still supporting VCL for customers that were using it)

And log delivery using SMTP (attaching files) from the nodes at the border to the central servers

The first article in the series offers a better explanation of what and why:

https://it-notes.dragas.net/2024/08/26/building-a-self-hoste...


This article is part of a series, and the goal is to create content caching nodes on hosts scattered around the world. When a user connects, the DNS will return the closest active host to them. On a larger scale, it's not much different from what commercial CDNs do.

Always nice to see a project choosing NetBSD! It's pretty easy to manage with Ansible too, so we sometimes rotate it in on "this could be any *NIX" projects and services.


I stumbled on wwwoffle about a month ago. I don't really have a need, but it just seems incredibly cool, and a bit like a holdout from a different time.

Many of these projects from the late 1990s just seem so well designed and built, solving very interesting problems, many of which we don't necessarily have anymore. I was also looking at uw-imap (which is no longer maintained), and the simplicity of just going "Your mailbox will be in mbox and authentication is passwd" brings a bit of joy.


NetBSD is just leet. FTW.

What is the point of this? Isn’t a CDN’s primary purpose to cache content close to the client?

Electric hybrid

Useless

Varnish is not better in any shape or form than nginx for static content. Varnish has one single use case: PHP sites. For everything else it will just add a layer of complexity that gives no gains. And since varnish is essentially built on apache there are some issues with how it handles connections above about 50k/sec, where it gets complicated to configure, something that nginx does not have.


> varnish is essentially built on apache

Do you mean built "for" Apache? Because I think it was written from scratch.

I wouldn't call it useless, but it's not exactly a CDN, it's missing the "Network" bit. This is just caching. You'd need something like this, but scaled out on multiple locations for it to be a CDN.

Also most of it isn't exactly NetBSD related, the same approach works on anything that runs Varnish and Nginx.


It was built in 2005 by Poul-Henning Kamp. It can work with Apache. But definitely not the same codebase.

How wrong can one comment be?

You might want to read this post on the founding of Varnish https://info.varnish-software.com/blog/history-varnish-cache...

And Fastly would certainly disagree that it’s only useful for PHP as they built a whole company based on Varnish


There are quite a few CDNs based on Varnish out there. Most of them are private, though, so you don't really see them.

You have a valid point regarding nginx for static content and its simplicity.

The rest of what you wrote is either wrong, oversimplified, or inaccurate.


One thing I've never understood with nginx: why would I want my web server to also be my mail server?

I think most nginx deployments only do web, but the article on mail functionality does outline some of the reasons: https://docs.nginx.com/nginx/admin-guide/mail-proxy/mail-pro...

Besides those listed, I think a plus would be to only have one server listening on privileged ports (<1024), using the same/similar TLS configuration for both web and mail, etc. Basically having one service be the arbiter of your incoming traffic and its encryption.
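
For anyone curious what that looks like, here is a bare-bones sketch of nginx's mail proxy, assuming nginx is built with the mail module; the hostname, auth endpoint, and certificate paths are placeholders:

    mail {
        server_name  mail.example.org;                 # hypothetical
        auth_http    127.0.0.1:9000/auth;              # service that maps users to the real mail backend
        proxy_pass_error_message on;

        # Reuse the same certificate the web side serves.
        ssl_certificate     /etc/ssl/example.org.crt;
        ssl_certificate_key /etc/ssl/example.org.key;

        server {
            listen   993 ssl;
            protocol imap;
        }
        server {
            listen   465 ssl;
            protocol smtp;
            xclient  off;
        }
    }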

Some people also throw DNS via DoH/DoT in: https://www.f5.com/company/blog/nginx/using-nginx-as-dot-doh...


I suppose the technology inside nginx provides similar benefits as a mail proxy as it does as a web proxy.


