
Building Your Own CDN for Fun and Profit - janoszen
https://pasztor.at/blog/building-your-own-cdn
======
zzzcpan
Fast nameservers are not as important as the author suggests. But either way, one
extra indirection for nameservers would allow you to choose nameserver records
dynamically too. And with large enough TTLs and some traffic, clients won't
have to go all the way to find out the closest nameserver, essentially
providing clients with the fastest one from the cache. Since redundancy is
built into DNS, any nameserver with a large TTL going down won't be a problem.
And unlike with anycast this is much more reliable and much cheaper, since you
don't have to rely on one AS and network infrastructure as a single point of
failure and you don't have to even build one either. You can use as many
different hosting providers as needed.
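
The cached-fastest-nameserver idea above can be sketched as a resolver-side cache that probes the candidates once and then serves the lowest-latency one until the TTL expires. This is only an illustration of the mechanism; the hostnames and RTTs below are made up:

```python
import time

class FastestNSCache:
    """Pick the lowest-latency nameserver and cache the choice for TTL seconds."""

    def __init__(self, ttl, probe, clock=time.monotonic):
        self.ttl = ttl        # cache lifetime in seconds (a large TTL means few probes)
        self.probe = probe    # probe(ns) -> measured RTT in seconds
        self.clock = clock
        self._cached = None
        self._expires = 0.0

    def pick(self, nameservers):
        now = self.clock()
        if self._cached is None or now >= self._expires:
            # Measure RTT to every candidate and keep the fastest one.
            self._cached = min(nameservers, key=self.probe)
            self._expires = now + self.ttl
        return self._cached

# Hypothetical RTTs as seen from a European client; ns-eu wins.
rtts = {"ns-eu.example.net": 0.02, "ns-us.example.net": 0.09, "ns-ap.example.net": 0.30}
cache = FastestNSCache(ttl=86400, probe=rtts.get)
print(cache.pick(rtts))  # -> ns-eu.example.net, served from cache until the TTL runs out
```

Real resolvers do effectively this internally (tracking per-nameserver RTTs), which is what makes the large-TTL approach work without any anycast.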

~~~
janoszen
(I'm the author.) This whole setup is built for a comparatively low traffic
blog, so DNS caching won't help much. (On normal days I get ~100 visitors.)
This is compounded by the TTL which is 60s to account for node failures.

The optimization level is in the sub 1 second range, so not having to pay one
large RTT penalty for a DNS lookup is quite important. I've measured 300+ms
RTT to Australia on the previous box I was using, that impacted the load times
quite severely.

------
nathan_f77
That's a fun project. For production websites and blogs, I'm pretty happy with
Netlify, CloudFlare, and CloudFront. But CloudFront charges $600 per month for
custom SSL certificates [1], so you could save a lot of money by just spinning
up ~10 servers in different AWS regions.

I noticed this line at the bottom of the page: "When it comes to picking a
solution, I often choose the less traveled road". I don't agree with that at
all, and it sounds a bit like NIH syndrome. It's always better to choose the
most-traveled roads, especially in DevOps. If there's a problem, then you can
join those communities and contribute to the projects.

[1] [https://aws.amazon.com/cloudfront/custom-ssl-domains/](https://aws.amazon.com/cloudfront/custom-ssl-domains/)

~~~
sciurus
> But CloudFront charges $600 per month for custom SSL certificates

This is misleading. Cloudfront doesn't charge _anything_ for putting your
domains on an SSL cert that uses SNI. They only charge you if you need a cert
without SNI, which requires them to allocate a dedicated IP address to you.

I'm hosting my personal blog on S3 and cloudfront, with SSL, for less than a
dollar a month.

Performance and capabilities are fine for me, too. I get 0.15 seconds to first
byte from Chicago, vs 0.24 for the author's site.

[https://www.webpagetest.org/result/180214_K2_28c2826a6422b01...](https://www.webpagetest.org/result/180214_K2_28c2826a6422b01066a8ac8e3eb54ec5/)

[https://www.webpagetest.org/result/180214_QT_a91f7af3bf78e7b...](https://www.webpagetest.org/result/180214_QT_a91f7af3bf78e7bad4bfbe47614e0b4e/)

~~~
janoszen
If you are fine with having slashes at the end of your URLs and you do not
want to do anything too complicated like content negotiation for image types,
S3 and CloudFront are fine. The moment you turn on Lambda@Edge to do the
magic, things get slow after a period of no traffic.

I plan on expanding on the featureset, so no S3 for me. :)

~~~
gunzel
Did you consider using periodic calls to keep the Lambda@Edge functions
"warm"? I've been playing with Zappa
([https://www.zappa.io](https://www.zappa.io)) for standard Lambda and it sets
this up by default.

~~~
janoszen
Yes, but it's kind of a whack-a-mole since their reuse times are not public
AFAIK, so it would constantly need tuning as they develop the service.
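
The keep-warm approach gunzel describes boils down to a scheduled job that pings each edge-fronted URL so the function stays resident. A minimal sketch, assuming hypothetical endpoints and a guessed interval (the reuse window isn't public, as noted above):

```python
import urllib.request

# Hypothetical Lambda@Edge-fronted URLs to keep warm; adjust to your own endpoints.
WARM_URLS = ["https://example.com/", "https://example.com/healthz"]

def default_fetch(url):
    """Fetch a URL and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status

def warm(urls, fetch=default_fetch):
    """Ping each URL once; run this from cron/CloudWatch every few minutes."""
    return {url: fetch(url) for url in urls}

# statuses = warm(WARM_URLS)   # e.g. scheduled as */5 * * * *
```

Since the reuse time is undocumented, the schedule is exactly the whack-a-mole parameter mentioned above: too long and the function goes cold, too short and you pay for pointless invocations.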

------
davidu
"Second, BGP routes are not that stable."

This has been disproved for close to ten years empirically and
academically[1]. Route flaps generally result in convergence to the exact same
destination if it has another path and is still online. If it's offline, then
it's working as intended, and that's no different from a server being rotated
via a DNS pool going down.

1: Quick search:
[https://www.google.com/search?q=tcp+anycast+paper&ie=utf-8&o...](https://www.google.com/search?q=tcp+anycast+paper&ie=utf-8&oe=utf-8&client=firefox-b-1-ab)

~~~
jgrahamc
That's not our experience either. BGP is fine.

But it is the case that transit and peering connections are not stable (in the
sense of going up and down randomly or suddenly experiencing high levels of
packet loss) and active monitoring is a must.

~~~
davidu
Is your experience that when routes reconverge, they still select the same end
POP they did prior to the flap?

Assuming the end POP is still reachable along another regional route, I
believe all the data I've seen shows that the client almost always hits the
same destination they did before the flap.

~~~
severine
What is a POP?

~~~
janoszen
A POP or edge location is a server (or multiple) that the user traffic is
being routed to, hopefully close to the user. A CDN consists of multiple POPs,
one in each region, with intelligent traffic routing added (as described in
the article).

------
rsync
I have always thought it would be a fun and inspiring project to deploy a
global CDN ... my career and my lifelong hobby have both been "UNIX sysadmin"
and I love running networks ...

However, I spoke about this to some ISP/NANOG folks that I trust and they said
that running a real CDN is a nightmare because all of your links (providers)
hate you ... you're producing the exact opposite of the traffic that they want
and they will not give you any breaks or help or benefits since you are their
worst customer.

How accurate was that assessment?

~~~
janoszen
It depends on the scale. Running a personal blog with sub-1MiB/s traffic is
not a problem. I've seen some larger projects though where detailed data
analysis had to be employed to debug bad connections... that's not a one-man
job, and it was a serious headache to work around some of the less... neutral
providers.

------
kirankn
We currently use KeyCDN which works out well, both performance & money wise.
You may want to try it out.

~~~
ksec
Why KeyCDN ? Why not MaxCDN or Fastly etc?

~~~
vasco
MaxCDN has shitty performance and terrible monitoring. We have to tell them
when their servers are overloaded, because our own monitoring detects regions
with extremely high SSL negotiation times.

------
mikerg87
If you have a specialized application, knowing how to do this can be quite
useful. CDN PoPs are almost nonexistent across much of the Middle East and
Africa. Sometimes building your own is the only way until a commercial
offering becomes available.

~~~
kijin
Also, PoPs in some regions are often nearly useless even if they exist on
paper.

For example, Cloudflare has a PoP in Seoul, but it has such limited bandwidth
that most sites using Cloudflare are routed to Tokyo, Hong Kong, and even Los
Angeles. Several of my clients in Korea signed up for Cloudflare a few years
ago when the local PoP was still usable, but now all but two of them have
canceled their subscriptions. Instead, I've been building a lot of caching
proxies for them lately.

If anyone is here for the Winter Olympics right now and some of your favorite
sites don't seem to be living up to Korea's reputation for ultra-fast
internet, Cloudflare might be one reason. (Meanwhile, Amazon's PoP in Seoul is
perfectly fine, albeit expensive.)

~~~
jtl999
Cloudflare has a Vancouver PoP, but Telus Vancouver doesn't use it; all
traffic is routed to Seattle (as an example):

      colo=SEA
      spdy=h2
      http=h2
      loc=CA
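
For reference, that dump is Cloudflare's /cdn-cgi/trace output; a quick way to check which PoP you're hitting is to fetch and parse it. A sketch (the endpoint is real, but error handling is minimal):

```python
import urllib.request

def parse_trace(text):
    """Parse the key=value lines of Cloudflare's /cdn-cgi/trace output into a dict."""
    return dict(line.split("=", 1) for line in text.splitlines() if "=" in line)

def which_colo(host):
    """Fetch the trace endpoint for a Cloudflare-fronted host and return the fields."""
    with urllib.request.urlopen(f"https://{host}/cdn-cgi/trace", timeout=5) as resp:
        return parse_trace(resp.read().decode())

# The sample from the comment above:
sample = "colo=SEA\nspdy=h2\nhttp=h2\nloc=CA\n"
print(parse_trace(sample)["colo"])  # -> SEA
```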

------
porker
I don't understand how his use of Traefik gets round the SSL pain point?

> Using SSL/TLS certificates

> The next pain point is using SSL/TLS certificates. Actually, let’s call them
> what they are: x509 certificates. Each of your edge locations needs to have
> a valid certificate for your domain. The simple solution, of course, is to
> use LetsEncrypt to generate a different certificate for each, but you have
> to be careful. LE has a rate limit, which I ran into on one of my edge
> nodes. In fact, I had to take the London node down for the time being until
> the weekly limit expires.

> However, I am using Traefik as my proxy of choice, which supports using a
> distributed key-value store or even Apache Zookeeper as the backend for
> synchronization. While this requires a bit more engineering, it is probably
> a lot more stable in the long run.

~~~
janoszen
Traefik can simply request certificates using the DNS verification method, as
opposed to the certbot HTTP verification. (HTTP would not work with a
distributed setup like this.) Alternatively, Traefik can also synchronize
certificate requests using one of the many key-value stores supported
(untested as of yet).

The drawback of the DNS method without synchronization between the nodes is
that you run into the LetsEncrypt rate limit quite easily. My expansion to
ap-southeast-1 and sa-east-1 is waiting for the LE cooldown.

Disclaimer: I'm the author of the article.
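
Traefik's implementation aside, the DNS-01 challenge itself (RFC 8555 §8.4) just requires publishing a TXT record at `_acme-challenge.<domain>` whose value is derived from the challenge token and the account-key thumbprint. A sketch of the computation, with made-up token/thumbprint values:

```python
import base64
import hashlib

def dns01_txt_value(token, account_key_thumbprint):
    """RFC 8555 §8.4: TXT value = base64url(SHA-256(token || '.' || thumbprint))."""
    key_authorization = f"{token}.{account_key_thumbprint}".encode()
    digest = hashlib.sha256(key_authorization).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode()

# Hypothetical token and thumbprint; the CA and your ACME client supply the real ones.
value = dns01_txt_value("evaGxfADs6pSRb2LAv9IZf17Dt3juxGJ-PCt92wr-oA",
                        "9jg46WB3rR_AHD-EBXdN7cBkH1WOu0tA3M9fm21mqTI")
# Publish: _acme-challenge.example.com. 300 IN TXT "<value>"
```

Because the proof lives in DNS rather than on any one web server, every edge node can obtain (or share) a certificate without the HTTP challenge ever reaching the "right" node, which is exactly why this method suits a distributed setup.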

------
forcer
The author mentions that one reason not to use Cloudflare is that its CDN
cache is purged often. If you want to verify whether that happens to your
content, you can try this tool:
[http://cloudperf.speedchecker.xyz/cloudflare-tester.html](http://cloudperf.speedchecker.xyz/cloudflare-tester.html)

Side effect of this tool as you might have guessed is that using it will
actually prolong the time your content stays in their cache.

~~~
kardos
So one could set up an automated crawler that runs frequently to keep
everything in cache?

~~~
janoszen
Yes, but you would need a crawler that does so in every region, or at least
know the IPs of the edge nodes on that CDN. You would probably also hit some
rate limit / DDoS protection with the CDN itself.
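
A sketch of what such a crawler would need to do: hit each known edge IP directly while sending the original Host header, so every PoP's cache gets refreshed. The edge IPs here are placeholders; real ones would have to come from the CDN's published ranges, and real TLS would also need SNI set to the hostname:

```python
import http.client

# Placeholder edge IPs, one per region you want to keep warm.
EDGE_IPS = ["198.51.100.10", "198.51.100.20"]
HOST = "example.com"
PATHS = ["/", "/index.html"]

def crawl_plan(edge_ips, host, paths):
    """One request per (edge, path): connect to the edge IP, send the real Host."""
    return [(ip, path, {"Host": host}) for ip in edge_ips for path in paths]

def fetch_via_edge(ip, path, headers):
    """Fetch one path from one specific edge node (sketch; no SNI handling)."""
    conn = http.client.HTTPSConnection(ip, timeout=5)
    conn.request("GET", path, headers=headers)
    return conn.getresponse().status

plan = crawl_plan(EDGE_IPS, HOST, PATHS)  # 4 requests; throttle them to dodge rate limits
```

As noted above, doing this at any scale is likely to trip the CDN's own rate limiting or DDoS protection, so the request loop would need generous delays.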

------
dzolvd
[https://github.com/apache/incubator-trafficcontrol](https://github.com/apache/incubator-trafficcontrol)
is an open-source cache control layer (working with ATS) that has features for
header rewrites, SSL, and custom URLs (among others). It is built for video
but can be used to cache any content. Probably a bit heavy for your use case
infrastructure-wise, though.

~~~
janoszen
Interesting, although I specifically wanted to build a push CDN (where I can
push the content) rather than a pull CDN (that works with an origin) to avoid
the added latency with cache misses.

~~~
dzolvd
Makes sense, I am enjoying looking through the source as we are moving to an
ansible and hopefully dockerized deployment model.

~~~
janoszen
Of course it's dockerized, it has to be cool, right? :)

Ansible is running docker-compose up -d on deployment and Traefik is doing
the magic. I want to extend it to host multiple sites in the future. (Btw,
Ansible run from a central location is painfully slow because of the large
latency to the edge nodes.)

The content itself is deployed using rsync, Ansible was just too painfully
slow for that.

------
dx034
> Second, BGP routes are not that stable. While DNS requests only require a
> single packet to be sent in both directions, HTTP (web) requests require
> establishing a connection to download the content. If the route changes, the
> HTTP connection is broken.

I thought Cloudflare uses Anycast to avoid targeted DDOS? How do they handle
changing routes during HTTP requests?

~~~
chatmasta
Anycast means there are multiple routes going to the same destination. You get
the route that is the shortest path via BGP to the anycast IP (least number of
BGP hops). Once you have an established TCP session via one route, it will
remain established through that route, as long as that route is still the
“shortest” between your IP and the anycast IP.

The route will not “change” unless cloudflare changes their routing, or you
change your location/IP so that a shorter route exists. Once you’ve changed
your IP, you’ve already interrupted any TCP sessions anyway.

You might find these two blog posts from LinkedIn to be helpful:

[https://engineering.linkedin.com/network-performance/tcp-ove...](https://engineering.linkedin.com/network-performance/tcp-over-ip-anycast-pipe-dream-or-reality)

[https://engineering.linkedin.com/blog/2016/04/the-joy-of-any...](https://engineering.linkedin.com/blog/2016/04/the-joy-of-anycast--inside-the-datacenter)

~~~
chousuke
I think it should be clarified that "destination" refers to an IP address, not
an individual host. My understanding is that anycasting means a single address
corresponds to multiple hosts, achieved by simply advertising it from several
sources with BGP, and you will often still have multiple redundant routes to
any of the individual hosts behind the anycast IP, because most locations
have redundant internet links.

Depending on how the routing is set up, it doesn't matter if the route
changes, so long as you end up on the same host consistently (or one that can
at least pretend it's the same host, if you do some kind of fancy session
mirroring, perhaps).

~~~
tpetry
Google Cloud Global Loadbalancer seems to do this fancy session mirroring
because you only have one IP for the http load balancing. I am very often
impressed by the GCP products.

~~~
chousuke
There are other techniques that Google's routers will most likely use to load-
balance traffic transparently to multiple hosts. A relatively simple way is to
hash the (source, destination) address pair of the IP packet to determine
which host to forward the packet to, so it doesn't necessarily require
mirroring or any state. Only seamless failover when the host fails requires
the fancy tricks.
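
The stateless hashing described above can be sketched in a few lines: every packet of a given (source, destination) pair hashes to the same backend, so flows stay pinned without any per-connection state. (Real routers typically hash the full 5-tuple and use hardware ECMP; this is just the idea.)

```python
import hashlib

BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hosts behind one anycast IP

def pick_backend(src_ip, dst_ip, backends=BACKENDS):
    """Hash the address pair; the same pair always maps to the same backend."""
    h = hashlib.sha256(f"{src_ip}->{dst_ip}".encode()).digest()
    return backends[int.from_bytes(h[:4], "big") % len(backends)]

# All packets of one flow land on one backend; a different client may land elsewhere.
assert pick_backend("203.0.113.7", "192.0.2.1") == pick_backend("203.0.113.7", "192.0.2.1")
```

The catch, as the comment notes, is failover: when a backend dies, a plain modulo hash remaps many unrelated flows, which is where consistent hashing and the fancier tricks come in.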

------
vbernat
For a more "own" CDN, here is another write-up:
[https://www.linkedin.com/pulse/build-your-own-anycast-
networ...](https://www.linkedin.com/pulse/build-your-own-anycast-
network-9-steps-samir-jafferali/)

------
kirankn
I have tried using AWS Route53's latency-based records, but for some reason
they never worked for me. I need to check it again.

------
lucjac
Why not just set up a server that requests the website every few
minutes/seconds or so? That way it would always stay in the cache.

------
thisisit
I am curious, if anyone knows how well does Akamai work in the CDN world?

~~~
scurvy
They're good but only at $100k per month and above. You really need to use
their full suite of products to get the full benefit, and by that time you'll
be at $100k per month.

They are OK, but not great, for smaller accounts.

~~~
photonios
At work we use Akamai to serve a large website. We pay about a tenth of that
and it definitely benefits us, but only because they have a large network and
are one of the few CDNs with PoPs close to our customers. Other than that,
it's just overkill for small businesses.

