Building Your Own CDN for Fun and Profit (pasztor.at)
276 points by janoszen 35 days ago | 64 comments

Fast nameservers are not as important as the author suggests. Either way, one extra level of indirection for the nameservers would let you choose nameserver records dynamically too. With large enough TTLs and some traffic, clients won't have to go all the way to find the closest nameserver; the cache essentially hands them the fastest one. Since redundancy is built into DNS, a nameserver with a large TTL going down won't be a problem. Unlike anycast, this is much more reliable and much cheaper: you don't rely on one AS and its network infrastructure as a single point of failure, and you don't have to build one either. You can use as many different hosting providers as needed.

(I'm the author.) This whole setup is built for a comparatively low traffic blog, so DNS caching won't help much. (On normal days I get ~100 visitors.) This is compounded by the TTL which is 60s to account for node failures.
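
To put a rough number on why caching won't help much at this traffic level, here is a back-of-the-envelope sketch. It assumes visitors arrive independently at a steady average rate (Poisson arrivals) and, generously, that they all share one resolver cache; both are assumptions for illustration, not something measured on the blog. Under those assumptions each cache miss opens a TTL-long window in which every later lookup is a hit, which gives a hit ratio of rate * TTL / (1 + rate * TTL).

```python
def dns_cache_hit_ratio(visitors_per_day, ttl_seconds):
    """Expected fraction of lookups answered from one shared resolver cache.

    Assumes Poisson arrivals and a TTL that is not refreshed by cache hits.
    """
    rate = visitors_per_day / 86400.0      # lookups per second
    expected_hits = rate * ttl_seconds     # hits per miss-started TTL window
    return expected_hits / (1.0 + expected_hits)


# Figures from the comment above: ~100 visitors a day, 60 s TTL.
low_ttl = dns_cache_hit_ratio(100, 60)
# Hypothetical alternative: the same traffic with a one-day TTL.
high_ttl = dns_cache_hit_ratio(100, 86400)

print(f"60 s TTL:  {low_ttl:.1%} of lookups hit the cache")
print(f"1 day TTL: {high_ttl:.1%} of lookups hit the cache")
```

Even in this best case the 60 s TTL leaves well over 90% of lookups uncached, while the one-day TTL would serve nearly all of them from cache at the cost of slow failover.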

The optimization level is in the sub-1-second range, so not having to pay one large RTT penalty for a DNS lookup is quite important. I've measured 300+ms RTT to Australia on the previous box I was using, which impacted the load times quite severely.
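
A minimal sketch of that arithmetic, for a visitor with a cold DNS cache. Only the 300 ms figure comes from the comment above; the 20 ms edge RTT, the single uncached lookup, and the round-trip counts (one for TCP, two for a full TLS 1.2 handshake, one for the HTTP request) are simplifying assumptions.

```python
def time_to_first_byte_ms(dns_rtt_ms, edge_rtt_ms, tls_round_trips=2):
    """Rough cold-cache estimate of time to first byte, in milliseconds."""
    dns = dns_rtt_ms                       # one uncached lookup at the nameserver
    tcp = edge_rtt_ms                      # SYN / SYN-ACK
    tls = tls_round_trips * edge_rtt_ms    # assumed full TLS 1.2 handshake
    http = edge_rtt_ms                     # request out, first byte back
    return dns + tcp + tls + http


# Nearby edge node (assumed 20 ms RTT) but a nameserver 300 ms away.
far_dns = time_to_first_byte_ms(dns_rtt_ms=300, edge_rtt_ms=20)
# Same edge node with an equally close nameserver.
near_dns = time_to_first_byte_ms(dns_rtt_ms=20, edge_rtt_ms=20)

print(far_dns, near_dns)
```

With these assumed numbers the distant nameserver accounts for most of the total, even though every other round trip goes to a nearby edge node.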

That's a fun project. For production websites and blogs, I'm pretty happy with Netlify, CloudFlare, and CloudFront. But CloudFront charges $600 per month for custom SSL certificates [1], so you could save a lot of money by just spinning up ~10 servers in different AWS regions.

I noticed this line at the bottom of the page: "When it comes to picking a solution, I often choose the less traveled road". I don't agree with that at all, and it sounds a bit like NIH syndrome. It's always better to choose the most-traveled roads, especially in DevOps. If there's a problem, then you can join those communities and contribute to the projects.

[1] https://aws.amazon.com/cloudfront/custom-ssl-domains/

"When it comes to picking a solution, I often choose the less traveled road"

I forgot to add that this applies only to R&D and hobby projects, for production setups I'm a bit more careful. :)

(I'm the author.)

Ah, that makes sense!

Thank you for pointing that out, I've updated my bio to reflect that. Hopefully this way it's a little less ambiguous. :)

> But CloudFront charges $600 per month for custom SSL certificates

This is misleading. Cloudfront doesn't charge anything for putting your domains on an SSL cert that uses SNI. They only charge you if you need a cert without SNI, which requires them to allocate a dedicated IP address to you.

I'm hosting my personal blog on S3 and cloudfront, with SSL, for less than a dollar a month.

Performance and capabilities are fine for me, too. I get 0.15 seconds to first byte from Chicago, vs 0.24 for the author's site.



If you are fine with having slashes at the end of your URLs and you do not want to do anything too complicated like content negotiation for image types, S3 and CloudFront are fine. The moment you turn on Lambda@Edge to do the magic, things get slow after a period of no traffic.

I plan on expanding on the featureset, so no S3 for me. :)

Did you consider using periodic calls to keep the Lambda@Edge functions "warm"? I've been playing with Zappa (https://www.zappa.io) for standard Lambda and it sets this up by default.

Yes, but it's kind of a whack-a-mole since their reuse times are not public AFAIK, so it would constantly need tuning as they develop the service.

I think SNI is fine, all modern browsers seem to support it: https://caniuse.com/#search=sni

If you can get by with only SNI connections then you don’t have to pay the $600 per month. The $600 is for a dedicated IP that will serve a single certificate.

'It's always better to choose the most-traveled roads, especially in DevOps'

Maybe. I agree it's not as clearcut as always pick the less traveled road, but the difference between the two may include a competitive advantage that you'd be unwise to overlook. I mean, you're on HN; Paul Graham and Common Lisp back in the 90s is an excellent example.

That price is only for non-SNI browsers and devices, which are probably less than 1% of devices/browsers now [1]. Otherwise Cloudfront supports free custom certificates, and you can use Amazon's Certificate Manager to acquire and renew them automatically, also for free.


$600/month for a custom SSL certificate is totally ridiculous. Cloudflare Pro ($20/month) includes a certificate that works with non-SNI browsers.

The $600/mo figure has to be considered against the type of workload your servers run, and how variable your traffic patterns are.

If you need to handle bursty traffic, you're likely going to get best value in shared tenancy services until you can fully utilize your servers. Otherwise, you will probably end up paying for idle infrastructure.

It has nothing to do with workload, that's the price for a dedicated IP to serve a TLS certificate to serve the (rapidly diminishing number of) non-SNI capable browsers and devices.

I use https://www.keycdn.com which has free letsencrypt

"Second, BGP routes are not that stable."

This has been disproved for close to ten years empirically and academically[1]. Route flaps generally result in convergence to the exact same destination if it has another path and is still online. If it's offline, then it's working as intended, and that's no different from a server being rotated via a DNS pool going down.

1: Quick search: https://www.google.com/search?q=tcp+anycast+paper&ie=utf-8&o...

That's not our experience either. BGP is fine.

But it is the case that transit and peering connections are not stable (in the sense of going up and down randomly or suddenly experiencing high levels of packet loss) and active monitoring is a must.

Thank you to both of you, I've edited the article to clarify that point.

Is your experience that when routes reconverge, they still select the same end POP they did prior to the flap?

Assuming the end POP is still reachable along another regional route, I believe all the data I've seen shows that the client almost always hits the same destination they did before the flap.

What is POP?

A POP or edge location is a server (or multiple) that the user traffic is being routed to, hopefully close to the user. A CDN consists of multiple POPs, one in each region, with intelligent traffic routing added (as described in the article).

If you have a specialized application, knowing how to do this can be quite useful. CDN PoPs are almost nonexistent across much of the Middle East and Africa. Sometimes building your own is the only way until a commercial offering becomes available.

Also, PoPs in some regions are often nearly useless even if they exist on paper.

For example, Cloudflare has a PoP in Seoul, but it has such limited bandwidth that most sites using Cloudflare are routed to Tokyo, Hong Kong, and even Los Angeles. Several of my clients in Korea signed up for Cloudflare a few years ago when the local PoP was still usable, but now all but two of them have canceled their subscriptions. Instead, I've been building a lot of caching proxies for them lately.

If anyone is here for the Winter Olympics right now and some of your favorite sites don't seem to be living up to Korea's reputation for ultra-fast internet, Cloudflare might be one reason. (Meanwhile, Amazon's PoP in Seoul is perfectly fine, albeit expensive.)

Cloudflare has a Vancouver PoP but Telus Vancouver doesn't use it, all traffic is routed to Seattle. (as an example)


I have always thought it would be a fun and inspiring project to deploy a global CDN ... my career and my lifelong hobby have both been "UNIX sysadmin" and I love running networks ...

However, I spoke about this to some ISP/NANOG folks that I trust and they said that running a real CDN is a nightmare because all of your links (providers) hate you ... you're producing the exact opposite of the traffic that they want and they will not give you any breaks or help or benefits since you are their worst customer.

How accurate was that assessment?

It depends on the scale. Running a personal blog with sub-1MiB/s traffic is not a problem. I've seen some larger projects though where detailed data analysis had to be employed to debug bad connections... that's not a one-man-job and it was a serious headache to work around some of the less... neutral providers.

I have also heard the same. An interesting thing to do would be to also be a commercial ISP (i.e., sell to datacenters and businesses). Now that is the traffic all the ISPs want as outgoing traffic goes to such networks.

Running a Global CDN and ISP might be a tad too ambitious.

Not at all accurate, assuming the "real CDN" is well run.

We currently use KeyCDN which works out well, both performance & money wise. You may want to try it out.

Yeah. They are pretty good and very good value for money. We went from Cloudfront -> Edgecast -> KeyCDN and each change reduced our costs. Cloudfront can become really expensive since they charge for each HTTP request in addition to bandwidth.

I was using KeyCDN until I discovered bunnyCDN at $0.01/GB.

Same here. Cloudfront was ridiculous. KeyCDN works great for us.

Why KeyCDN? Why not MaxCDN or Fastly etc.?

MaxCDN has shitty performance, terrible monitoring. We have to tell them when their servers are overloaded due to our monitoring detecting regions with super high SSL negotiation times.

MaxCDN performance is not great. Fastly charges for requests in addition to bandwidth.

I don't understand how his use of Traefik gets round the SSL pain point?

> Using SSL/TLS certificates

> The next pain point is using SSL/TLS certificates. Actually, let’s call them what they are: x509 certificates. Each of your edge locations needs to have a valid certificate for your domain. The simple solution, of course, is to use LetsEncrypt to generate a different certificate for each, but you have to be careful. LE has a rate limit, which I ran into on one of my edge nodes. In fact, I had to take the London node down for the time being until the weekly limit expires.

> However, I am using Traefik as my proxy of choice, which supports using a distributed key-value store or even Apache Zookeeper as the backend for synchronization. While this requires a bit more engineering, it is probably a lot more stable in the long run.

Traefik can simply request certificates using the DNS verification method, as opposed to the certbot HTTP verification. (HTTP would not work with a distributed setup like this.) Alternatively, Traefik can also synchronize certificate requests using one of the many key-value stores supported (untested as of yet).

The drawback of the DNS method without synchronization between the nodes is that you run into the LetsEncrypt rate limit quite easily. My expansion to ap-southeast-1 and sa-east-1 is waiting for the LE cooldown.

Disclaimer: I'm the author of the article.

I guess they can use Traefik to distribute the certificates, so instead of having each node request their own set of certificates, he can instead request them once and distribute the certificate to all the other nodes, and keep themselves under the limit set by LetsEncrypt.

The author mentions that one reason not to use Cloudflare is that its CDN cache is purged often. If you want to verify whether that happens for your content, you can try this tool: http://cloudperf.speedchecker.xyz/cloudflare-tester.html

A side effect of this tool, as you might have guessed, is that using it will actually prolong the time your content stays in their cache.

So one could set up an automated crawler that runs frequently to keep everything in cache?

Yes, but you would need a crawler that does so in every region, or at least know the IPs of the edge nodes on that CDN. You would probably also hit some rate limit / DDoS protection with the CDN itself.

https://github.com/apache/incubator-trafficcontrol is an open source cache control layer (working with ATS) that has features for header rewrites, ssl, and custom urls (among others). It is built for video but can be used to cache any content. Probably a bit heavy for your use case infrastructure wise though.

Interesting, although I specifically wanted to build a push CDN (where I can push the content) rather than a pull CDN (that works with an origin) to avoid the added latency with cache misses.

Makes sense, I am enjoying looking through the source as we are moving to an ansible and hopefully dockerized deployment model.

Of course it's dockerized, it has to be cool, right? :)

Ansible is running docker-compose up -d on deployment and Traefik is doing the magic. I want to extend it to host multiple sites in the future. (Btw, Ansible run from a central location is painfully slow because of the large latency to the edge nodes.)

The content itself is deployed using rsync, Ansible was just too painfully slow for that.

> Second, BGP routes are not that stable. While DNS requests only require a single packet to be sent in both directions, HTTP (web) requests require establishing a connection to download the content. If the route changes, the HTTP connection is broken.

I thought Cloudflare uses Anycast to avoid targeted DDOS? How do they handle changing routes during HTTP requests?

There's a lot of fear around the possibility of flapping routes, but a lot of real-world data seems to show it doesn't impact web traffic that often.

People often mix and match anycast/Geo DNS and anycast/unicast http.

Some even go a step further and, for video files, anycast to a node that 302s to its own unicast address.

Indeed. There are also right ways to set up anycasting and wrong ways.

Right way: 1-2 major Tier1 carriers across all of your PoPs with local peering for regional eyeball networks.

Wrong way: Using a different set of transit carriers at each location.

You really don't want that many AS paths to reach your content from a given location (3-4 is more than enough). What you're really going for with BGP anycasting is that your local ISP has a direct route to the closest PoP via exchange peering, or that the Tier1 path drops you off at the "closest" route. Transit carriers do this for a living, and they're usually quite good at figuring out route weighting inside their own network.

Yes, I know Netflix does it differently but they use a lot more smart geo DNS routing than anycasting.

Edit: IMHO it's also better to choose a Tier1 with a moderate sized network that values stability and performance over size. So someone like NTT over say Level3.

Anycast means there are multiple routes going to the same destination. You get the route that is the shortest path via BGP to the anycast IP (least number of BGP hops). Once you have an established TCP session via one route, it will remain established through that route, as long as that route is still the “shortest” between your IP and the anycast IP.
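
A toy sketch of the selection described above, modelling only the "fewest BGP hops wins" part. Real BGP compares other attributes as well (local preference, MED, and so on), so this is an illustration rather than how a router actually decides. The PoP names and AS numbers are made up; the AS numbers are taken from the private-use range.

```python
def best_route(routes):
    """Return the PoP whose advertised AS path is shortest.

    routes maps a PoP name to the AS path (list of AS numbers) a client's
    network sees for the anycast prefix via that PoP.
    """
    return min(routes, key=lambda pop: len(routes[pop]))


# Hypothetical view from one client network of a single anycast prefix.
routes = {
    "frankfurt": [64512, 64520],
    "sydney": [64512, 64530, 64531, 64540],
}

chosen = best_route(routes)

# If the Frankfurt announcement is withdrawn, traffic shifts to whichever
# PoP is still advertised, which is the failover behaviour anycast relies on.
remaining = {pop: path for pop, path in routes.items() if pop != "frankfurt"}
fallback = best_route(remaining)

print(chosen, fallback)
```

As long as the set of advertised paths does not change, the same input picks the same PoP every time, which is why an established TCP session normally stays put.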

The route will not “change” unless cloudflare changes their routing, or you change your location/IP so that a shorter route exists. Once you’ve changed your IP, you’ve already interrupted any TCP sessions anyway.

You might find these two blog posts from LinkedIn to be helpful:



I think it should be clarified that "destination" refers to an IP address, not an individual host. My understanding is that anycasting means a single address corresponds to multiple hosts achieved by simply advertising it from several sources with BGP, and you will often still have multiple redundant routes to any of the individual hosts behind the anycast IP because most locations will have redundant internet links.

Depending on how the routing is set up, it doesn't matter if the route changes so long as you end up on the same host consistently (or one that can at least pretend it's the same host if you do some kind of fancy session mirroring, perhaps)

Google Cloud Global Loadbalancer seems to do this fancy session mirroring because you only have one IP for the http load balancing. I am very often impressed by the GCP products.

There are other techniques that Google's routers will most likely use to load-balance traffic transparently to multiple hosts. A relatively simple way is to hash the (source, destination) address pair of the IP packet to determine which host to forward the packet to, so it doesn't necessarily require mirroring or any state. Only seamless failover when the host fails requires the fancy tricks.
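
A minimal sketch of that hashing idea, not a description of what Google actually runs. The backend names and IP addresses are hypothetical, and SHA-256 is used only because it gives the same result on every machine and every run (Python's built-in `hash()` for strings is randomized per process).

```python
import hashlib


def pick_backend(src_ip, dst_ip, backends):
    """Map a (source, destination) address pair to one backend, statelessly."""
    key = f"{src_ip}|{dst_ip}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer and reduce it
    # to an index into the backend list.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# Hypothetical backends behind one load-balanced IP (documentation ranges).
backends = ["host-a", "host-b", "host-c"]

first = pick_backend("198.51.100.7", "203.0.113.10", backends)
second = pick_backend("198.51.100.7", "203.0.113.10", backends)

print(first, second)
```

Every packet of a flow lands on the same host without the router keeping any per-connection state. The catch with a plain modulo is that adding or removing a backend remaps most flows, which is where consistent hashing or the failover tricks mentioned above come in.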

> The route will not “change” unless cloudflare changes their routing, or you change your location/IP so that a shorter route exists. Once you’ve changed your IP, you’ve already interrupted any TCP sessions anyway.

That's what I thought, too. But the article explicitly states this as a potential issue.

For a more "own" CDN, here is another write-up: https://www.linkedin.com/pulse/build-your-own-anycast-networ...

Why not just set up a server that requests the website every few minutes/seconds or so? That way it would always stay in the cache.

I have tried using AWS Route53's latency-based records, but for some reason it never worked for me. I need to check it again.

I am curious: does anyone know how well Akamai works in the CDN world?

They're good but only at $100k per month and above. You really need to use their full suite of products to get the full benefit, and by that time you'll be at $100k per month.

They are OK, not great, for smaller accounts.

At work we use Akamai to serve a large website and we pay about a 10th of that and it definitely benefits us. But that's only because they simply have a large network and are one of the few CDNs that have POPs close to our customers. Other than that, it's just overkill for small businesses.

Terrible to work with and not all that performant either. They also caused us a 24+ downtime with a forced configuration change on their side, which undid some configurations their professional services had implemented (yeah, some parts of the UI are only modifiable by PS). Luckily we had a Cloudfront integration as a backup, so we switched over to that until the Akamai team finally decided to fix our problem.

Don't use them unless you have to. The vendor lock-in is strong.
