
Deploying a global, private CDN - jloveless
https://blog.edgemesh.com/deploy-a-global-private-cdn-on-your-lunch-break-7550e9a9ad7e
======
chrissnell
Ugh, I hate these P2P "CDNs". A few years back, CNN tried this for their
streaming video with technology from a company called Octoshape. Users
(including myself) were unwittingly conned into accepting the plugin in order
to watch live video. This created a huge mess for big corporate IT
departments, who suddenly had hundreds or thousands of desktop machines
streaming out video whenever there was a major news event.

I realize that this is a server-side option now, however. Still, it's a crappy
deal. A decently-sized deployment of public cloud boxes to support your
private CDN is going to cost far, far more than an actual CDN. Public cloud
bandwidth is obscenely priced compared to what you can get it for on the CDN
market.

~~~
technopriest
Hello, lead engineer here. Just to clarify, you don't need a plugin to enable
Edgemesh as a user. Everything is 100% browser compliant. Your webmaster adds
our one line of code and our one JavaScript file, and you are done. Your users
will never even see a pop-up.

~~~
willitpamp573
Obviously the plugin isn't the issue, the fact that my browser will now be
uploading X MB/sec, which I have no control over, to random peers. This sucks,
especially because I have capped monthly bandwidth from my penny-pinching ISP.

Why should I host and seed your data for free?

~~~
technopriest
We detect if you are on a metered connection and disable all seeding
functionality to ensure you are not paying for the bandwidth we use. We also
have an opt-out mechanism, but that is up to the installing site to implement.
The reason you would want to host data is that you get a faster internet
experience. We fill your cache with assets that YOU are likely to request in
the future. Performance is our driving metric.
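
A rough sketch of what a metered-connection gate like this could look like, using the Network Information API hints (`saveData`, `type`) that supporting browsers expose. The function name and the exact policy are made up for illustration; Edgemesh's actual detection logic is not public.

```javascript
// Hypothetical sketch: decide whether a client should seed to peers,
// based on Network Information API hints (navigator.connection in
// supporting browsers). This is not Edgemesh's actual code.
function shouldSeed(connection) {
  if (!connection) return false;          // no signal: be conservative
  if (connection.saveData) return false;  // user asked to reduce data use
  // Treat cellular-style links as metered; wired/WiFi as unmetered.
  const metered = ['cellular', 'wimax'];
  return !metered.includes(connection.type);
}

// Example with mock connection objects:
console.log(shouldSeed({ saveData: false, type: 'wifi' }));     // true
console.log(shouldSeed({ saveData: true, type: 'wifi' }));      // false
console.log(shouldSeed({ saveData: false, type: 'cellular' })); // false
```

In a page this would be called with `navigator.connection`, falling back to "don't seed" where the API is unavailable.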

~~~
hwillis
You and jloveless have been commendably open in this submission. Personally, I
think that if you're not destroying someone's limited data then you're meeting
what should be expected of you. Demanding that you deliver your content in a
certain way is silly IMO.

Say I have a web application that displays a complex rendering of millions of
constantly changing points, and for some reason it's very expensive for me to
do computing. However it's easy to write some javascript that renders the
millions of points on the user's computer. It's absurd to say I'm being
unreasonable by streaming more data to the user instead of rendering frames
and streaming video. Using my upload speed is annoying, but it's still stupid
to pretend that using a website is entirely one-sided. It's like complaining
about ad bandwidth.

Abuse is one thing, but this isn't categorically bad. Plus, it's _really_
cool!

~~~
jloveless
Thank you! We work _really_ hard to ensure we're staying off CPU, managing
disk, and making every replication event count (generally intra-ASN). But it's
also really valuable for our NGO clients (and other non-profits). My personal
favorite is an aid program where they literally bring a Supernode on a
laptop, set up a WiFi[1] point in the middle of nowhere, and can support a
fully interactive site for refugees who have devices when they reach the
camp. They can then find out where they are and what's going on - and you can
power a surprisingly large site from a single laptop. It's also really
helpful in places like sub-Saharan Africa, where in-region bandwidth capacity
_dramatically_ outstrips off-country bandwidth.

[1] [http://www.meshpoint.me/](http://www.meshpoint.me/)

~~~
hwillis
That's fantastic! I have a personal vendetta against heavy websites
specifically because of how unusable they are in remote countries, so that
sounds just fucking awesome to me.

------
mattbillenstein
BitTorrent tried to market this product, under the BitTorrent DNA name, back
when video streaming on the web was still fairly new. As I recall, over the
year it was being developed, bandwidth prices dropped something like 80%,
which made the market for it pretty much evaporate at that time.

Most cloud bandwidth is crazy overpriced, since in the datacenter you typically
pay for peak bandwidth, not bytes. You can see this with cloud providers like
DigitalOcean, where you can essentially buy 1TB for the cost of running a
$5/mo instance. You can build a poor man's CDN using these types of services
and geo DNS that saves you a ton of coin.
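
The core of the geo DNS trick above is just "answer with the mirror closest to the client." A toy sketch of that selection logic (hostnames and coordinates are invented; real geo DNS services do this at the resolver level):

```javascript
// Toy version of geo-routed mirror selection for a "poor man's CDN".
// All hosts and coordinates here are made-up examples.
const mirrors = [
  { host: 'nyc.cdn.example.com', lat: 40.7, lon: -74.0 },
  { host: 'ams.cdn.example.com', lat: 52.4, lon: 4.9 },
  { host: 'sgp.cdn.example.com', lat: 1.35, lon: 103.8 },
];

// Great-circle (haversine) distance in kilometers.
function distanceKm(aLat, aLon, bLat, bLon) {
  const rad = (d) => (d * Math.PI) / 180;
  const dLat = rad(bLat - aLat), dLon = rad(bLon - aLon);
  const h = Math.sin(dLat / 2) ** 2 +
            Math.cos(rad(aLat)) * Math.cos(rad(bLat)) * Math.sin(dLon / 2) ** 2;
  return 6371 * 2 * Math.asin(Math.sqrt(h));
}

function nearestMirror(lat, lon) {
  return mirrors.reduce((best, m) =>
    distanceKm(lat, lon, m.lat, m.lon) < distanceKm(lat, lon, best.lat, best.lon)
      ? m : best);
}

console.log(nearestMirror(48.9, 2.3).host); // a Paris client lands on ams
```

A real setup would let Route 53 (or similar) make this decision per-resolver and add health checks so a dead mirror drops out of rotation.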

~~~
jloveless
Totally agree - and bandwidth rates on cloud are crazy expensive [1]. The more
common use case for Supernodes is on colocated servers where you have excess
capacity already, or when you need to deploy additional capacity for a moment
(Black Friday etc.), or when you have private links that are uncorrelated to
the common CDN backbones (e.g. areas in Asia and Africa)[2]. If setting up geo
DNS with healthchecks is a bit much to get going, this is a self-bootstrapping
option that doesn't require other changes. That being said, we run geo DNS as
well :)

[1] [https://blog.edgemesh.com/its-time-to-change-the-web-and-stop-paying-bandwidth-toll-booths-3f78d3203cee](https://blog.edgemesh.com/its-time-to-change-the-web-and-stop-paying-bandwidth-toll-booths-3f78d3203cee)

[2] [https://blog.edgemesh.com/understanding-diversification-networks-and-nobel-prizes-114bf61247c4](https://blog.edgemesh.com/understanding-diversification-networks-and-nobel-prizes-114bf61247c4)

------
ramshanker
I have also been contemplating making one. After all, the web is all about
decentralization. However, such crowdsourcing of bandwidth must come with full
transparency and needs a configurable soft limit on a per-user basis.

------
atrudeau
For those hosting assets on S3, you can use something like
[http://idiallo.com/blog/creating-your-own-cdn-with-nginx](http://idiallo.com/blog/creating-your-own-cdn-with-nginx) or
[https://github.com/alexandres/poormanscdn](https://github.com/alexandres/poormanscdn)
with Geo routed DNS on Route 53. Seems a lot simpler than this (but probably
not as feature-rich).

~~~
jloveless
With this there's no DNS to even set up, which is nice. Route53 is great, but
getting the failover and geo-routing to work is ... challenging. I would def
still keep a base NGINX and/or Varnish cache at the origin for sure. Can
also look at AWS CloudFront[1].

[1] [https://aws.amazon.com/cloudfront/](https://aws.amazon.com/cloudfront/)

~~~
atrudeau
Yep, CloudFront is great. The disadvantage is missing Let's Encrypt support,
which is trivial with the two options above.

~~~
manigandham
AWS has Certificate Manager; you don't need Let's Encrypt.

------
kennydude
Or you know, have your lunch :P

------
ernsheong
Sorry if this has been mentioned, but it would be worth having a link back to
your main website from Medium. I've hunted around and finally had to reach for
my URL bar to get to your product landing page.

~~~
jloveless
Thank you! We just added a navigation element to get you back to the home
page. Can't believe that hasn't been there all this time!

------
mayli
what's the difference between this and peer5, and other webrtc based p2p cdn?

~~~
jloveless
Peer5 and Streamroot.io are focused solely on video - and tap into WebRTC
media functionality to scale Video on Demand etc. Both great pieces of tech.
We are lower level and use DataChannels[1] to p2p replicate _almost_ all
assets (images, video, fonts etc.) required to build the page. This is
primarily enabled by using the ServiceWorker[2]. We also focus on updating the
client-side cache as opposed to stepping in front of a page load. E.g. we
replicate in assets that will be used to render the page, and when you request
the page we simply serve those assets from the (now populated) cache. For
video it's a bit more complex, as we replicate in the first N seconds of video
(to help with buffer lag) and then switch to a similar mode as Peer5 and
Streamroot. Feel free to PM me for more info; there will also be an ACM
article out this month that goes into more detail.

[1] [https://developer.mozilla.org/en-US/docs/Web/API/RTCDataChannel](https://developer.mozilla.org/en-US/docs/Web/API/RTCDataChannel)

[2] [https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API](https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API)

------
dylz
I want to set up a private CDN pointed to some kind of hostname / reverse
proxy and caching a hostname. Almost all of my users are on metered
connections so p2p might not be the best.

I have existing infrastructure and unused bandwidth. What are my choices for
easy deploy?

~~~
jloveless
One option is Varnish [1] with some DNS routing to your caches. It's well
tested and widely deployed. If most of the users are on metered connections,
you're correct that they won't be able to provide upload capacity (but will be
able to download). In those cases you can also just deploy the server version
[2] mentioned here and disable the browser client.

[1] [http://varnish-cache.org/trac/wiki/Introduction](http://varnish-cache.org/trac/wiki/Introduction)

[2] [https://edgemesh.com/product#Supernode](https://edgemesh.com/product#Supernode)

~~~
dylz
Is there any way I can selectively enable/disable upload capacity on the
client side with some form of JS method? I know, on my own end, specific
netblocks that are severely metered (<1-5GB/m) but likely won't show up as
such, because they're usually 3G WiFi modems or similar, so they'll just look
like a WiFi connection instead of mobile.

Are my supernodes used for any other site / are my users' browsers used for
any other site than mine?

~~~
jloveless
You can limit your supernode to your Origins [1] by setting the EM_ORIGINS
environment variable.

With regard to the first point, we should detect it (based on your ASN; if you
are on 3G modems they won't be able to upload). E.g. even though your
laptop/tablet is on 'WiFi', the actual IP that comes to the backplane will be
from your network block (the cellular address block), and so your client will
be automatically removed from the available upload pool (although you can
still download). Feel free to PM me directly if you have more questions.

[1]
[https://edgemesh.com/docs/supernode/configuration](https://edgemesh.com/docs/supernode/configuration)

~~~
dylz
Sent you an email w/some more details. Thanks!

------
theanomaly
Looking at the image on the link, the "checksums" are a suspicious 32
characters... Hoping you guys are not using md5sums.

Am I missing something, or would this let any node (supernode/browser) in the
system potentially replace arbitrary content with their own content? [1]

Hopefully JS isn't being served by this mechanism (attack vector pretty
obvious there), but even images are still a concern [2] [3].

[1] [https://en.wikipedia.org/wiki/Collision_attack#Chosen-prefix_collision_attack](https://en.wikipedia.org/wiki/Collision_attack#Chosen-prefix_collision_attack)

[2] [https://threatpost.com/apple-patches-ios-flaw-exploitable-by-malicious-jpeg/121521/](https://threatpost.com/apple-patches-ios-flaw-exploitable-by-malicious-jpeg/121521/)

[3] [https://imagetragick.com/](https://imagetragick.com/)

~~~
jloveless
There is a 3-part hash going on: an Origin ID hash, a URL hash, and then an
MD5 on the actual payload. When a new asset is registered on the mesh, the
Edgemesh backplane downloads the asset directly to confirm the MD5. If it
doesn't match, it won't allow the asset to register. On a replication, the
destination node receives the asset and calculates the MD5 again. If the MD5
doesn't match, it signals Edgemesh, who then takes that (source) node out of
the mesh. E.g. if you modify an asset and attempt to replicate it, the
receiving party will invalidate the object and signal back to Edgemesh.
Replication directions come from the Edgemesh backplane. PM me if you'd like
to go into this in more detail.

~~~
namelost
21 fucking years ago.

> In 1996, Dobbertin announced a collision of the compression function of MD5
> (Dobbertin, 1996). While this was not an attack on the full MD5 hash
> function, it was close enough for cryptographers to recommend switching to a
> replacement, such as SHA-1 or RIPEMD-160.

[https://en.wikipedia.org/wiki/MD5#History_and_cryptanalysis](https://en.wikipedia.org/wiki/MD5#History_and_cryptanalysis)

~~~
jloveless
:) You're dead right and it's why we use it inside two other top level hashes
(e.g. you'd need to collide inside the OriginID space as well). It's certainly
possible though (for extremely large sites) and we're experimenting with an
xxHash64 implementation for a later release.

~~~
tatersolid
You have SHA256 built into the browser. Use it.

Stop inventing your own crypto protocols, as you clearly have no idea what
you're doing in that area (as evidenced by any usage of MD5).

xxHash64 is not a cryptographic hash function. Collisions and pre-images
matter here, as they allow for substitution of content by an adversary.

------
jloveless
For those looking for even more detail, our ACM article is now available on
Queue [1]

[1]
[http://queue.acm.org/detail.cfm?id=3136953](http://queue.acm.org/detail.cfm?id=3136953)

------
jloveless
This breaks down running a P2P CDN on Google Cloud, but the same can be done
on AWS, Azure, and DigitalOcean. DigitalOcean definitely has the best
bandwidth rates AFAIK, and has enough regions to serve as a solid backbone.

~~~
nik736
Vultr is half the price of DO and has even more regions.

~~~
kuschku
You can actually get even cheaper.

Check out [https://git.io/vps](https://git.io/vps), where I made a comparative
listing of different providers.

~~~
fapjacks
This is actually a pretty good list. The VPS hosting industry is one of the
most awful, bottom-feeding industries in existence. I've been buying VPSes
from many different providers for fifteen years or so, and the one thing I've
learned is that the vast majority of VPS providers are scumbag thieves and
fly-by-night scammers. I'd also include Ramnode on this list of good VPS
providers, but otherwise I'd stay far away from any provider not listed here.

~~~
kuschku
I wanted to add Ramnode, but didn’t have the time (work, open source projects)
to do that yet. Thanks for reminding me!

------
gnu-user
Great article, I'll definitely take a look at this! I think it's a nice
solution for startup companies that are looking to cut costs on CDNs as well
as experiment with P2P tech.

~~~
andreapaiola
Can I propose an alternative?

[https://github.com/andreapaiola/P2P-CDN](https://github.com/andreapaiola/P2P-CDN)

------
johnvega
Would this be a potential competitor to netlify.com or perhaps partial
competitor or a complement?

~~~
jloveless
Author and Edgemesh employee here: I think it sits pretty squarely in the
'complementary' category. E.g. we have customers who run Fastly, Akamai,
Cloudflare etc. but add us as well to get increased resiliency, Real User
Metrics, and edge acceleration. One customer saw a 35% drop in page load time
when they added Edgemesh to an already fast Fastly-backed site. Enterprise
customers use Supernodes to add capacity via their own datacenters.

