My Own Private CDN (petekeen.net)
123 points by zrail on Aug 30, 2018 | 39 comments



The main problem is often GeoIP, and properly using the results of said GeoIP lookups.

Having a dozen POPs around the world is a trivial weekend project.

Routing the client to the fastest one and scaling up to local demand while optimizing for bandwidth price is not trivial, because not everywhere is Digital Ocean. For example, what about Japan? Or Brazil?


Absolutely. This project leans heavily on AWS Route53's latency-based routing. I don't do the magic; Route53 does.
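
For the curious, latency-based routing in Route53 is just a set of records with the same name, each tagged with an AWS region and a SetIdentifier; Route53 answers each query with the record whose region has the lowest measured latency to the resolver. A rough sketch with the AWS CLI (zone ID, domain, and address are made up):

    aws route53 change-resource-record-sets \
      --hosted-zone-id Z0000000000000 \
      --change-batch '{
        "Changes": [{
          "Action": "UPSERT",
          "ResourceRecordSet": {
            "Name": "cdn.example.com",
            "Type": "A",
            "SetIdentifier": "us-east-1",
            "Region": "us-east-1",
            "TTL": 60,
            "ResourceRecords": [{"Value": "203.0.113.10"}]
          }
        }]
      }'

You repeat that per POP, each with its own Region and SetIdentifier.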


I wonder how long it will be until shared hosting, or some similar cloud hosting that handles a lot of this nuance for you, takes off more. CDNs can really reduce one of the heaviest burdens a single server faces: serving static files. I'm not familiar enough with everything offered on the cloud, just the few tools I've used, so anyone who already knows can let me know. I know the author mentions Amazon's Route53, but I mean an all-inclusive solution like one would find in shared hosting that could include:

* SSL - the free tier would bring Let's Encrypt certs by default; the end user shouldn't even think about how it works or anything of the sort, it should just work.

* CDN - no need to even think about how it works; ServerlessCDN type of stuff.

* Host any kind of app (maybe some sort of serverless approach or Docker, though the less you have to mess with even more dev tools the better).

* Intuitive admin panel for when SHTF, with detailed logging.

I'm surprised we're still stuck in the era of cPanel and L*MP stacks. Imagine if anyone could install and run Discourse as easily as they do WordPress, and the same with Ghost. DigitalOcean comes pretty close in this regard, but something more foolproof for the average Joe would be great.


Dreamhost shared hosting includes Let's Encrypt and Cloudflare; click a checkbox to enable them.

https://www.dreamhost.com/hosting/ssl-tls-certificates/

https://www.dreamhost.com/partners/cloudflare/


One of the most painful things to work on is clearing caches. I have my own setup where I invalidate the _entire_ set of assets just to make sure end users never get stale ones: I just ++ a build ID and all the URLs become uncached at the edges. Not all sites hosted on shared hosts can do this.
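
The scheme is simple enough to sketch; a hypothetical helper in Go (the URL layout here is invented):

    package main

    import "fmt"

    // assetURL embeds the current build ID in every asset path; bumping the
    // ID makes every URL brand new (and therefore cold) at the edges.
    func assetURL(buildID int, path string) string {
        return fmt.Sprintf("/assets/%d/%s", buildID, path)
    }

    func main() {
        fmt.Println(assetURL(42, "css/site.css")) // /assets/42/css/site.css
    }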


My plan for this is to just tell the cache nodes to delete the nginx cache folder entirely via their one-minute check-in cycle. Each node will get a flag in the database that says "needs clearing" and when they next check in their update script will include an `rm -rf /nginx/cache/directory/*`. Extremely blunt, but also easy and effective.
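
A sketch of what that check-in step could look like (the manager endpoint and flag name are invented for illustration):

    # hypothetical check-in step; endpoint and flag name are made up
    if curl -fsS "https://manager.example.com/nodes/$NODE_ID" | grep -q '"needs_clearing":true'; then
      rm -rf /nginx/cache/directory/*
      curl -fsS -X POST "https://manager.example.com/nodes/$NODE_ID/cleared"
    fi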


What you describe sounds like now.sh if you are on a JS-based stack.

Edit: it also works for plain static sites if you use a non-JS generator.


I think the N part of CDN is going to be the real challenge here; the CD part is easy, especially with the various FOSS options out there.


This. Building a "CDN" on top of an existing network is not building a CDN. Nice project for learning some parts of CDN management software, though.


As someone who just recently went through CDN hell and rebuilt our entire CDN network from the ground up (software and hardware), I was wondering why you picked RoR.


It’s what I know best and what I’m most productive in. The project is to get something running and learn a handful of new things, and learning a new framework would be a detriment to that first goal.

The manager app is not in the hot path with this design so performance doesn’t matter all that much.


Are you designing this CDN to pull from origin and cache temporarily, or to pull from a local file and put a strong cache on it?

If you need a hand let me know, I’ve built pretty large CDNs before (10M r/s at peak)


The former to start, but I want to add push zones and/or "s3sync" zones that proactively sync an S3 bucket to local disk.
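
A hypothetical "s3sync" zone could be as blunt as a cron entry (bucket name and paths are invented; `--delete` removes local files that disappeared from the bucket):

    # mirror the zone's bucket to the local web root every minute
    * * * * * aws s3 sync s3://example-zone-bucket /var/www/zones/example --delete --quiet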

Thanks for the offer! I might just take you up on it :)


Just be careful: understand that with a pull-only CDN you're not going to gain big benefits. If you do want a pull-only CDN, have a background task runner retrieve the files and update them locally.
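
A minimal sketch of such a background refresher in Go, assuming a fixed list of hot paths and an invented origin hostname:

    package main

    import (
        "io"
        "net/http"
        "os"
        "path/filepath"
        "time"
    )

    // hotPaths, the origin host, and the interval are all illustrative.
    var hotPaths = []string{"/css/site.css", "/js/app.js"}

    // refresh re-fetches one path from origin and rewrites the local copy.
    func refresh(path string) error {
        resp, err := http.Get("https://origin.example.com" + path)
        if err != nil {
            return err
        }
        defer resp.Body.Close()

        dst := filepath.Join("/var/www/cache", filepath.Clean(path))
        if err := os.MkdirAll(filepath.Dir(dst), 0o755); err != nil {
            return err
        }
        f, err := os.Create(dst)
        if err != nil {
            return err
        }
        defer f.Close()
        _, err = io.Copy(f, resp.Body)
        return err
    }

    func main() {
        for range time.Tick(20 * time.Second) {
            for _, p := range hotPaths {
                refresh(p) // errors would be logged in a real daemon
            }
        }
    }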


> understand that with a pull-only CDN you're not going to gain big benefits.

This statement makes no sense. A CDN edge node is just a cache; its size and your access patterns determine the hit ratio.

At $dayjob we get nginx cache hit ratios on our edge in excess of 99% with an origin-fetch setup. That is a very large benefit.

Cloudflare works entirely on origin fetch. They seem to be doing okay.


Sure. I have nginx set to keep files around for a long time, serve stale, and refresh in the background, but proactively refreshing periodically is a good idea.
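
For anyone wanting the same behavior, a minimal sketch of that serve-stale setup in nginx (zone name, sizes, and origin are illustrative; proxy_cache_path belongs at the http level):

    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=edge:100m
                     max_size=50g inactive=30d;

    server {
        location / {
            proxy_cache                   edge;
            proxy_cache_valid             200 301 302 7d;
            # hand out the stale copy while a background request refreshes it
            proxy_cache_use_stale         error timeout updating;
            proxy_cache_background_update on;
            proxy_pass                    https://origin.example.com;
        }
    }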


What would you suggest and why? Not a loaded question for all the people eager to downvote.


Personally I would build it in something like Go. I've done a lot of work in Rails, and I would probably build the signup/profile/interface in Rails 5.2 but use a high-performance Go framework for the really intensive stuff.

I've been considering building my own, but getting a gossip protocol up and running so the nodes share data isn't the easiest thing in the world to code.
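
For a flavor of what's involved, here's a bare-bones push-style gossip sketch in Go: versioned key/value state, pushed whole to a random peer on a timer, merged by highest version. Everything here (peer list, port, payload shape) is invented, and real implementations add digests, anti-entropy, and failure detection:

    package main

    import (
        "bytes"
        "encoding/json"
        "log"
        "math/rand"
        "net/http"
        "sync"
        "time"
    )

    // entry is one gossiped key/value with a version for conflict resolution.
    type entry struct {
        Value   string `json:"value"`
        Version int64  `json:"version"`
    }

    var (
        mu    sync.Mutex
        state = map[string]entry{}
        // peer list is illustrative; a real cluster would discover peers
        peers = []string{"http://edge-2.example.com:7946"}
    )

    // merge keeps the higher version of every key it receives.
    func merge(remote map[string]entry) {
        mu.Lock()
        defer mu.Unlock()
        for k, e := range remote {
            if cur, ok := state[k]; !ok || e.Version > cur.Version {
                state[k] = e
            }
        }
    }

    // gossipOnce pushes the full local state to one random peer.
    func gossipOnce() {
        mu.Lock()
        body, _ := json.Marshal(state)
        mu.Unlock()
        peer := peers[rand.Intn(len(peers))]
        resp, err := http.Post(peer+"/gossip", "application/json", bytes.NewReader(body))
        if err != nil {
            log.Printf("gossip to %s: %v", peer, err)
            return
        }
        resp.Body.Close()
    }

    func main() {
        http.HandleFunc("/gossip", func(w http.ResponseWriter, r *http.Request) {
            var remote map[string]entry
            if err := json.NewDecoder(r.Body).Decode(&remote); err == nil {
                merge(remote)
            }
        })
        go func() {
            for range time.Tick(1 * time.Second) {
                gossipOnce()
            }
        }()
        log.Fatal(http.ListenAndServe(":7946", nil))
    }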


It's a pain to code. I've done it, and I hated every second of it. Keeping dynamic data in sync in near real time is terrible.

I wrote the CDN in Go, with Redis and a smaller Go-powered daemon that retrieves assets every 20 seconds, syncs them to a local storage drive, and re-fetches after 5 days - or, if there are no requests within 48 hours, clears the unused items.

Then I set up a system where, if one edge requests an "unpopular" file, it pings a simple REST API and has all the other edges pull that file, thus allowing the edges to stay "one step ahead" of the user load.
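
That fan-out could be sketched like this (peer hostnames and the prefetch endpoint are invented; the real API is whatever your edges expose):

    package main

    import (
        "fmt"
        "log"
        "net/http"
        "net/url"
        "sync"
    )

    var peers = []string{"edge-ams.example.com", "edge-nrt.example.com"}

    // prewarm tells every peer edge to pull the file, in parallel.
    func prewarm(path string) {
        var wg sync.WaitGroup
        for _, peer := range peers {
            endpoint := fmt.Sprintf("https://%s/internal/prefetch?path=%s",
                peer, url.QueryEscape(path))
            wg.Add(1)
            go func(u string) {
                defer wg.Done()
                resp, err := http.Post(u, "text/plain", nil)
                if err != nil {
                    log.Printf("prewarm %s: %v", u, err)
                    return
                }
                resp.Body.Close()
            }(endpoint)
        }
        wg.Wait()
    }

    func main() {
        prewarm("/images/rarely-requested.png")
    }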


Yeah, when I think it through, it comes down to a hard math problem, because you have to maintain the state of the local files: whether they should live in memory vs. SSD vs. another node. Did you use an LRU cache for expunging less-utilized resources?


State is much less important to track, and it's easier to do. The real challenge is garbage collection: you need it, but you don't want to collect too much in memory. That's why Redis is a great tool for our edge servers.
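
A sketch of that pattern, assuming the go-redis client and an invented key layout: give each asset's bookkeeping key a 48-hour TTL, bump it on every hit, and let expiry (or a keyspace-notification listener) drive the cleanup:

    package main

    import (
        "context"
        "time"

        "github.com/redis/go-redis/v9"
    )

    func main() {
        ctx := context.Background()
        rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

        // on fetch-from-origin: record where the asset lives locally
        rdb.Set(ctx, "asset:abc123", "/var/www/cache/abc123", 48*time.Hour)

        // on every cache hit: slide the expiry window forward
        rdb.Expire(ctx, "asset:abc123", 48*time.Hour)
    }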


And nothing made me realize just how slow the speed of light is until I started looking into the CAP theorem and distributed databases like CockroachDB.
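
For a back-of-the-envelope feel: light in fiber covers roughly 200,000 km/s, so a ~16,000 km New York to Sydney path costs about 80 ms one way, ~160 ms round trip, before a single router or database node has done any work. No amount of engineering gets you under that floor.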


You could also look at something like Apache Traffic Control, which came out of Comcast, and is used by a number of CDNs. https://trafficcontrol.apache.org/


> which came out of Comcast

Not that it matters, but some fun history for you: ATC is built on top of Apache Traffic Server (ATS). Before being donated to the ASF, ATS was known as YTS (Yahoo! Traffic Server). Of course the story doesn't stop there; it was originally known as Inktomi Traffic Server, Inktomi having been acquired by Yahoo! in the early 2000s.


"NET::ERR_CERT_COMMON_NAME_INVALID"

off to a good start...

Your cert is valid only for corastreetpress{.}com


Hmmmmm! Thanks for the bug report!

Edit: fixed. 'Twas a dumb copy-and-paste error.


> Deploy onto the server in my basement on my ZeroTier network

I've read as much of this ZeroTier thing's website as I can handle and I still can't fully grok it.

What's the difference between this and your own private VPN?


A VPN is point-to-point. You fire up the client on your laptop and connect to a server. That server usually also acts as a bridge into a network: it gives your machine a presence in that other, well-defined network.

ZeroTier is an overlay network. What it does is create a new encrypted network, with its own address space and everything, which you join through a controller (a default one is provided by ZeroTier). It doesn't matter where the nodes are; if it detects that two nodes are on the same LAN, it routes the traffic directly. The overlay network is encrypted all the time, even when it goes over your LAN.
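
Concretely, joining one of these overlay networks is a one-liner per node (the 16-digit network ID here is illustrative):

    zerotier-cli join 8056c2e21c000001
    zerotier-cli listnetworks    # shows the assigned overlay address once authorized

Once the controller authorizes the node, every member can reach every other member at its overlay address, wherever it physically sits.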


Sounds kinda like DMVPN.


ZeroTier is peer-to-peer.


Looking through the goals, apart from the self-learning experience, bunnycdn.com seems to fit all the bills without the hassle. And despite its pricing, it is pretty damn fast as well.


I can get 4TB of transfer from Vultr or Digital Ocean for half that price.

In any case, the cost considerations are somewhat secondary. I wanted to learn some stuff and this is a practical way to do it :)


> I wanted to learn some stuff and this is a practical way to do it :)

Fair enough.

>I can get 4TB of transfer from Vultr or Digital Ocean for half that price.

You still need multiple POPs, and with a droplet in each of those regions the minimum cost is still going to be higher.


Don't see the point if it's not distributed across networks.


Digital Ocean has a bunch of locations.


Location is far from the only factor; peering can make just as much of a difference. Not saying DO is bad, but there are more factors at play than location diversity alone.


Good exercise, but I doubt this will be cheaper than a commercial offering like Cloudflare.


The skills you pick up from even attempting these things are probably the biggest reason for doing it. And bragging rights, maybe?


Sure, but some things are done just for fun, others to prove they can be done. It's not all about the $.

Also, TFA is talking about this vs. CloudFront, which is $$$.





