The main problem is often GeoIP, and properly using the results of that GeoIP lookup.
Having a dozen POPs around the world is a trivial weekend project.
Routing the client to the fastest one, scaling up to local demand, and optimizing for bandwidth prices is not trivial. Not everywhere is DigitalOcean territory: what about Japan? Or Brazil?
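For illustration, a rough sketch of the "route to the nearest POP" piece in Go, assuming a GeoIP lookup has already resolved the client to coordinates. The POP list and coordinates are made up, and real routing would also weigh peering, capacity, and cost, not just distance:

```go
// Toy nearest-POP selection by great-circle distance. Assumes client
// coordinates already came out of a GeoIP database lookup.
package main

import (
	"fmt"
	"math"
)

type pop struct {
	name     string
	lat, lon float64
}

// Made-up POP list for the sketch.
var pops = []pop{
	{"nyc", 40.7, -74.0},
	{"ams", 52.4, 4.9},
	{"sgp", 1.35, 103.8},
	{"gru", -23.5, -46.6},
}

// haversine returns the approximate distance in km between two coordinates.
func haversine(lat1, lon1, lat2, lon2 float64) float64 {
	const r = 6371 // Earth radius in km
	toRad := func(d float64) float64 { return d * math.Pi / 180 }
	dLat := toRad(lat2 - lat1)
	dLon := toRad(lon2 - lon1)
	a := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(toRad(lat1))*math.Cos(toRad(lat2))*math.Sin(dLon/2)*math.Sin(dLon/2)
	return 2 * r * math.Asin(math.Sqrt(a))
}

// nearestPOP picks the POP with the smallest great-circle distance to the client.
func nearestPOP(clientLat, clientLon float64) pop {
	best, bestDist := pops[0], math.MaxFloat64
	for _, p := range pops {
		if d := haversine(clientLat, clientLon, p.lat, p.lon); d < bestDist {
			best, bestDist = p, d
		}
	}
	return best
}

func main() {
	// Pretend the GeoIP lookup resolved the client to Tokyo.
	fmt.Println(nearestPOP(35.7, 139.7).name) // sgp in this tiny list
}
```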
I wonder how long until shared hosting, or some similar flavor of cloud hosting, takes off that handles a lot of this nuance for you. CDNs can really reduce one of the heaviest burdens a single server faces, which is serving static file downloads. I'm not familiar enough with everything offered on the cloud, just the few tools I've used, so anyone who already knows can let me know. I know the author mentions Amazon's Route53, but I'm talking about an all-inclusive solution like one would find in shared hosting that could include:
* SSL - the free tier would bring Let's Encrypt certs by default; the end user shouldn't even have to think about how it works or anything of the sort, it should just work.
* CDN - no need to even think about how it works, ServerlessCDN type of stuff.
* Host any kind of app (maybe some sort of serverless approach or Docker, though the less you have to mess with yet more dev tools the better)
* Intuitive admin panel for when SHTF with detailed logging.
I'm surprised we're still stuck in the era of cPanel and L*MP stacks. Imagine if anyone could install and run Discourse as easily as they do WordPress; same with Ghost. DigitalOcean comes pretty close in this regard, but something more foolproof for the average Joe would be great.
One of the most painful things to work on is clearing caches. In my own setup I invalidate the _entire_ set of assets just to make sure end users never get stale ones: I just increment a build ID and all the URLs become uncached at the edges. Not all sites hosted on shared hosts can do this.
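A minimal sketch of that build-ID trick, with a made-up CDN hostname: every deploy bumps the ID, so every asset URL changes and the edges see brand-new cache keys instead of stale objects.

```go
// Hypothetical build-ID cache busting. Incrementing buildID on deploy
// changes every asset URL, so edge caches treat each deploy as fresh.
package main

import "fmt"

const buildID = 42 // incremented on every deploy

// assetURL rewrites a static asset path so the cache key includes the build ID.
func assetURL(path string) string {
	return fmt.Sprintf("https://cdn.example.com/%d%s", buildID, path)
}

func main() {
	fmt.Println(assetURL("/css/app.css")) // https://cdn.example.com/42/css/app.css
}
```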
My plan for this is to just tell the cache nodes to delete the nginx cache folder entirely via their one-minute check-in cycle. Each node will get a flag in the database that says "needs clearing" and when they check in next their update script will include an `rm -rf /nginx/cache/directory/*`. Extremely blunt but also easy and effective.
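Roughly, that check-in loop could look like the following in Go. The manager endpoint, node ID, and response shape are placeholders, not the author's actual API; the cache path matches the one mentioned above.

```go
// Sketch of a one-minute check-in: ask the manager whether this node is
// flagged "needs clearing" and, if so, wipe the nginx cache directory.
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"os"
	"path/filepath"
	"time"
)

const (
	checkInURL = "https://manager.example.com/api/nodes/42/checkin" // hypothetical
	cacheDir   = "/nginx/cache/directory"
)

// clearCache is the equivalent of `rm -rf /nginx/cache/directory/*`:
// remove the children, keep the directory itself.
func clearCache() error {
	entries, err := os.ReadDir(cacheDir)
	if err != nil {
		return err
	}
	for _, e := range entries {
		if err := os.RemoveAll(filepath.Join(cacheDir, e.Name())); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	for range time.Tick(time.Minute) {
		resp, err := http.Get(checkInURL)
		if err != nil {
			log.Println("check-in failed:", err)
			continue
		}
		var body struct {
			NeedsClearing bool `json:"needs_clearing"` // assumed field name
		}
		json.NewDecoder(resp.Body).Decode(&body)
		resp.Body.Close()
		if body.NeedsClearing {
			if err := clearCache(); err != nil {
				log.Println("cache clear failed:", err)
			}
		}
	}
}
```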
As someone who just recently went through CDN hell and rebuilt our entire CDN network from the ground up (software and hardware), I was wondering why you picked RoR?
It’s what I know best and what I’m most productive in. The project is to get something running and learn a handful of new things, and learning a new framework would be a detriment to that first goal.
The manager app is not in the hot path with this design so performance doesn’t matter all that much.
Just be careful: understand that if you do a pull-only CDN, you're not going to gain big benefits. If you do want a pull-only CDN, have a background task runner retrieve the files and update them locally.
Sure. I have Nginx set to keep files around for a long time, serve stale, and refresh in the background, but proactively refreshing periodically is a good idea.
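One way to read the "background task runner" suggestion, sketched in Go with made-up URLs and paths: a worker that re-fetches a known list of assets on a timer and writes them into the local store, instead of waiting for a cache miss to pull from origin.

```go
// Toy proactive refresher: periodically pull a fixed list of origin URLs
// and write them to the local asset directory the edge serves from.
package main

import (
	"io"
	"log"
	"net/http"
	"os"
	"path/filepath"
	"time"
)

// Assumed list of origin URLs worth keeping warm.
var hotAssets = []string{
	"https://origin.example.com/css/app.css",
	"https://origin.example.com/js/app.js",
}

const localDir = "/var/cdn/assets" // hypothetical local store

// refresh downloads one asset from origin and overwrites the local copy.
func refresh(url string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	f, err := os.Create(filepath.Join(localDir, filepath.Base(url)))
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = io.Copy(f, resp.Body)
	return err
}

func main() {
	for range time.Tick(20 * time.Second) { // interval is arbitrary here
		for _, u := range hotAssets {
			if err := refresh(u); err != nil {
				log.Println("refresh failed:", u, err)
			}
		}
	}
}
```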
Personally I would build it in something like Go. I've done a lot of work in Rails and I would probably have the signup/profile/interface built in Rails 5.2 but use a high-performance Go framework for the really intensive stuff.
I've been considering building my own, but getting a gossip protocol up and running to share data between nodes isn't the easiest thing in the world to code.
It's a pain to code. I've done it, and I hated every second of it. Keeping dynamic data in sync in near real-time is terrible.
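For a sense of the mechanics, here is a toy gossip round in Go: each node keeps versioned entries and periodically merges state with one random peer. This is not either commenter's implementation, and real systems (e.g. hashicorp/memberlist) also deal with failure detection, anti-entropy scheduling, and transport.

```go
// Minimal gossip sketch: last-writer-wins merge of per-key versions
// between a node and one randomly chosen peer.
package main

import (
	"fmt"
	"math/rand"
)

type entry struct {
	value   string
	version uint64 // larger version wins on merge
}

type node struct {
	name  string
	state map[string]entry
}

// merge folds the peer's state into ours, keeping the newer version per key.
func (n *node) merge(peer map[string]entry) {
	for k, e := range peer {
		if cur, ok := n.state[k]; !ok || e.version > cur.version {
			n.state[k] = e
		}
	}
}

// gossipOnce picks a random peer and exchanges state in both directions.
func gossipOnce(self *node, peers []*node) {
	p := peers[rand.Intn(len(peers))]
	p.merge(self.state)
	self.merge(p.state)
}

func main() {
	a := &node{"edge-a", map[string]entry{"asset:/css/app.css": {"cached", 2}}}
	b := &node{"edge-b", map[string]entry{"asset:/css/app.css": {"evicted", 1}}}
	gossipOnce(a, []*node{b})
	fmt.Println(b.state["asset:/css/app.css"].value) // "cached" - newer version won
}
```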
I wrote the CDN in Go, with Redis and a smaller Go-powered daemon to retrieve assets every 20 seconds, sync them to a local storage drive, and re-retrieve after 5 days - or, if there are no requests within 48 hours, clear the unused items.
Then I set up a system where, if one edge requests an “unpopular” file, it pings a simple REST API and has all the other edges pull that file, thus allowing the edges to stay “one step ahead” of the user load.
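A hedged sketch of that pre-warm ping from the edge's side; the endpoint and payload shape are made up, since the actual API isn't described in the thread.

```go
// When this edge serves an unpopular file, notify a central endpoint so the
// other edges can pull the file before users in their regions request it.
package main

import (
	"bytes"
	"encoding/json"
	"log"
	"net/http"
)

const prewarmURL = "https://manager.example.com/api/prewarm" // hypothetical

// notifyPrewarm reports that `path` was just requested on this edge.
func notifyPrewarm(path string) error {
	body, _ := json.Marshal(map[string]string{"path": path})
	resp, err := http.Post(prewarmURL, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

func main() {
	if err := notifyPrewarm("/img/rarely-seen.png"); err != nil {
		log.Println("prewarm notify failed:", err)
	}
}
```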
Yeah, thinking it through myself, it comes down to a hard math problem, because you have to maintain the state of the local files: whether they should live in memory vs. SSD vs. another node. Did you use an LRU cache for expunging less-utilized resources?
State is much less important to track; that part is easier. The real challenge is garbage collection - you need it, but you don't want to collect too much in memory. That's why Redis is a great tool for our edge servers.
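Since the question above was about LRU: a bare-bones in-memory LRU in Go looks like the sketch below. The commenter leans on Redis instead, where key TTLs and an eviction policy such as allkeys-lru give similar behavior without hand-rolling the bookkeeping; this example is only to show the mechanism.

```go
// Minimal LRU over a doubly linked list plus a map. Touch() marks a key as
// recently used and evicts the least recently used key when over capacity.
package main

import (
	"container/list"
	"fmt"
)

type lru struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element in order
}

func newLRU(capacity int) *lru {
	return &lru{capacity: capacity, order: list.New(), items: map[string]*list.Element{}}
}

// Touch records a use of key and returns the evicted key, if any.
func (c *lru) Touch(key string) (evicted string, ok bool) {
	if el, exists := c.items[key]; exists {
		c.order.MoveToFront(el)
		return "", false
	}
	c.items[key] = c.order.PushFront(key)
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		k := oldest.Value.(string)
		c.order.Remove(oldest)
		delete(c.items, k)
		return k, true
	}
	return "", false
}

func main() {
	c := newLRU(2)
	c.Touch("/a.css")
	c.Touch("/b.js")
	if k, ok := c.Touch("/c.png"); ok {
		fmt.Println("evict", k) // evict /a.css
	}
}
```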
And nothing made me realize just how slow the speed of light is until I started looking into the CAP theorem and distributed databases like CockroachDB.
You could also look at something like Apache Traffic Control, which came out of Comcast, and is used by a number of CDNs. https://trafficcontrol.apache.org/
Not that it matters, but some fun history for you. ATC is built on top of Apache Traffic Server. Before being donated over to ASF, ATS was known as YTS (Yahoo! Traffic Server). Of course the story doesn’t stop there, it was originally known as Inktomi Traffic Server, Inktomi having been acquired by Yahoo! in the early 2000s.
A VPN is point-to-point. You fire up the client on your laptop and connect to a server. That server is usually also acting as a bridge into a network. It gives your machine a presence in that other well-defined network.
ZeroTier is an overlay network. What it does is create a new encrypted network, with its own address space and everything, to which you can connect through the controller (a default one is provided by ZeroTier). It doesn't matter where the nodes are. If it detects that two nodes are on the same LAN, it routes the traffic directly. The overlay network is encrypted all the time, even when it goes over your LAN.
Looking through the goals, apart from the self-learning experience, bunnycdn.com seems to tick all the boxes without the hassle. And despite its pricing, it's pretty damn fast as well.
Location is far from the only factor, peering can make as much of a difference too. Not saying DO is bad, but there are more factors at play than just location diversity.