
My fight against CDN libraries - agateau
http://peppercarrot.com/article390/my-fight-against-cdn-libraries
======
mstaoru
I only represent about 0.00000013% of all Chinese Internet users, but let me
chime in: EVERY website that uses Google CDNs for js or fonts just doesn't
work here. It just keeps loading and loading, and loading forever. In most
cases it's jQuery, and in most cases it's in the <head> so the page just never
shows. Cloudflare (cdnjs), Amazon CDNs, Akamai CDNs also occasionally get
blocked and take entire Internet segments with them.

If you use 3rd-party CDNs, please consider implementing a client-side failover
strategy so you don't leave out 50% of the Internet "population".
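
A common client-side failover pattern (sketched here with a hypothetical local copy at `/js/jquery.min.js`) is to test whether the library actually loaded after the CDN tag, and fall back to a self-hosted copy if it didn't:

```html
<!-- Try the CDN first; if the request is blocked or times out,
     window.jQuery is never defined, so load the local copy instead.
     The local path is hypothetical. -->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js"></script>
<script>
  window.jQuery || document.write(
    '<script src="/js/jquery.min.js"><\/script>');
</script>
```

Readers behind a blocked CDN then pay at most the timeout cost, instead of staring at a permanently blank page.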

~~~
throwawasiudy
I don't like the censorship policies of the Chinese government. I'm not going
to go out of my way to make sure my site is compatible with their censorship,
use a VPN.

~~~
deno
You care about Chinese government’s censorship, but you don’t care about
privacy from multinational corporations and governments?

~~~
xenophonf
False equivalence - the one has nothing to do with the other.

~~~
deno
They do, if you take a moralistic attitude, as op has. It’s hypocrisy. If you
just don’t care about Chinese market then say so—don’t pretend you do so out
of a moral obligation.

~~~
xenophonf
You're making assumptions about the beliefs of OP that aren't in evidence. A
"moralistic attitude" (as you put it) can give greater weight to concerns
about government censorship than to concerns about corporate respect for
privacy rights. The two are quite different, even to an avowed anti-statist
such as myself.

~~~
deno
That’s why I stated it as a question.

I still think it’s hypocrisy to care about one and not the other. Those
corporations are actually actively working to erode Internet freedoms, which
affects everyone, not just a single country, and one that is not even
democratic in the first place.

Getting on your high horse over censorship in Asia while merrily including
spying in your own code, as a simple convenience no less, is very much
indefensible.

------
a3n
Firefox on Linux.

I use uBlock Origin, Ghostery and Disconnect, and Flash Control.
peppercarrot.com is all zeroes for all three blockers, meaning nothing is
blocked because there's nothing noticed that needs to be blocked. There are no
Flash Control icons, meaning no video or audio noticed and blocked. Thanks for
caring. :)

On the front page of theguardian.com, logged in as me, there's a _V_ icon at
the top, meaning that Flash Control has blocked video, probably for some
gratuitous menu feature. I have zero trouble using and reading the site.

When I first opened theguardian a few minutes ago, uBlock was blocking 13
requests. It's steadily climbed in those minutes to 32 blocked requests.
Ghostery is noticing/blocking 0 trackers. Disconnect is blocking two: nielsen
and comscore. Disconnect is also blocking 1 from Facebook and 3 from Google.
All three tools may be seeing and blocking some of the same things.

Without these four tools, except for low/no-commercial technical sites and
public service sites like wikipedia my web is all but unusable. With them my
web is fine.

I very rarely have any problems using any site. I had to enable my bank in
uBlock to use their popup bill pay feature. I think I had trouble viewing a
cartoon at The New Yorker; I forget what I did to view it. Youtube and Flash
Control seem to be in a perpetual arms race, as was the case with Flashblock.
Youtube is my main motivation for using Flash Control, to prevent automatic
video playing.

And yep, I get that sites pay the bills with ads. I $ubscribe to three news
sites, and I also get that that doesn't pay the whole bill. The web is either
going to have to block me for using a blocker (I've been seeing that very
rarely recently, or at least "Unblock us please") or figure out a less
dangerous, intrusive and loadsome way to serve ads. (And yep, I just made up
the word "loadsome." I can do anything!)

EDIT: I whitelist duckduckgo.com in uBlock.

[https://duck.co/help/company/advertising-and-affiliates](https://duck.co/help/company/advertising-and-affiliates)

[https://duckduckgo.com/privacy](https://duckduckgo.com/privacy)

~~~
JustSomeNobody
> I use uBlock Origin, Ghostery and Disconnect, and Flash Control.

I just have to say, thank goodness for Moore's Law. Without it, we would never
have so many wasted cycles![0]

[0] Not saying you're wasting, but the fact that we have to jump through sooo
many hoops to stop all this crap is just disgusting.

~~~
Noseshine
I see a parallel to our immune system. One of the most complex pieces of
machinery in our bodies has the sole function of keeping us from getting
overrun and taken over by "hackers". When I look at nature as an example, I
see a path to _more_ complexity in the things we create, because they are not
actually completely "designed"; instead we let the laws of nature govern how
they develop. So I think it's not too far-fetched to look at existing
nature-designed systems for guidance on predictions about the future of
man-made systems.

How we operate, a good example using a very simple product:
[https://medium.com/@kevin_ashton/what-coke-contains-221d449929ef](https://medium.com/@kevin_ashton/what-coke-contains-221d449929ef)

------
mark242
From the post:

"Well a big one: Privacy of the readers of Pepper&Carrot."

Before even thinking about tossing things like Google Fonts or AddThis or
whatever, the very first thing you need to do is turn on HTTPS. If you're
concerned about privacy, or content injection, or MITM attacks, or name-your-
poison-here, you must immediately only serve up pages via HTTPS with strong
encryption.

~~~
mpweiher
These seem completely independent to me.

\- HTTPS is for attacks.

\- What the article describes is run-of-the-mill tracking by Google etc.

If I am not being attacked, the CDN resources will still allow Google to track
me. If I _am_ being attacked the CDN resources will still allow Google to
track me.

If I don't have these Google resources (let's just use Google resources for
now), I don't think that Google will MITM me.

~~~
pdkl95
> HTTPS is for attacks.

You are _always_ under attack on the internet.

This isn't really hyperbole. While I'm sure it's possible to find the
occasional exception, you really need to assume all internet traffic could be
hostile.

\- Verizon vandalizes most plaintext HTTP by adding their X-UIDH[1]
tracking-id header.

\- It's common to see Javascript appended to HTML files when they are sent
over HTTP on a cellular network. (It replaces image URLs with very highly
compressed versions.)

\- If the HTTP socket crosses the Great Firewall, more injected Javascript
might conscript your browser into the Great Cannon[2]. (Also: the "QUANTUM"
suite of tools that uses packet races for similar purposes.)

\- One of the goals is the privacy of the readers. Google isn't the only
attacker, and MitM is only one type of attack. If you aren't encrypting, your
requests are being analyzed - probably several times - with DPI[3]. If you
aren't encrypting, you are enabling _passive_ surveillance.

That's just some of the obvious stuff.

[1] [https://www.verizonwireless.com/support/unique-identifier-header-faqs/](https://www.verizonwireless.com/support/unique-identifier-header-faqs/)

[2] [https://citizenlab.org/2015/04/chinas-great-cannon/](https://citizenlab.org/2015/04/chinas-great-cannon/)

[3]
[https://en.wikipedia.org/wiki/Deep_packet_inspection](https://en.wikipedia.org/wiki/Deep_packet_inspection)

------
hhsnopek
The only issue with going against the grain here is that if you're not
putting your site itself behind a CDN, download rates will vary across the
globe. This was the intended use case for CDNs; the analytics were added so
CDNs could improve.

You're correct that they are tracking us, but there's a trade-off here that
holds tremendous value. If that value of speed isn't a factor, or is low on
your list of priorities, then by all means sever everything.

~~~
ehnto
The latency issue is only present once, on the initial page load. After that
the resources are cached. Second to that, if you're following best practices
for page speed, the user will not notice at all, because a snippet of CSS
that provides the initial layout and styles will be sent with the HTML body,
amongst dozens of other things you can do to make this a non-issue.
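
The critical-CSS technique described above can be sketched like this (the file names are hypothetical):

```html
<head>
  <!-- Small block of above-the-fold rules, inlined so first paint
       doesn't wait on any external stylesheet request. -->
  <style>
    body { margin: 0; font-family: sans-serif; }
    .masthead { height: 60px; }
  </style>
  <!-- Fetch the full stylesheet without blocking rendering. -->
  <link rel="preload" href="/css/site.css" as="style"
        onload="this.onload=null;this.rel='stylesheet'">
  <noscript><link rel="stylesheet" href="/css/site.css"></noscript>
</head>
```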

~~~
enraged_camel
>>The latency issue is only present once, the initial page load.

The initial page load is also one of the most important things to optimize for
things like, you know, conversion of visitors to paying customers. I've given
up on subscribing to new products and services simply because their pages
weren't performing well, and I'm sure many others here have done the same.

~~~
ams6110
I've given up subscribing because pages don't work (as in don't render
_anything at all_ ) with JS disabled.

~~~
leeoniya
i've given up subscribing because pages look like shit in telnet:80

:D

------
cagenut
This post and half the comments are killing me by conflating "third party
javascript" with "CDN".

~~~
pselbert
Yes. While I completely agree with the author and their quest to eliminate
third party scripts from their site, the problem isn't with CDNs. The problem
is with third party scripts, most of which aren't coming from a typical CDN
(cdnjs, for example).

It is entirely valid, and common, to front your own application code behind a
CDN.

Love the sentiment, just wish the terminology was more accurate.

------
beardog
The code injection problem can often (but not always) be solved via
Subresource Integrity:
[https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity](https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity)
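
As a sketch, the digest for an SRI `integrity` attribute can be computed locally (assuming `openssl` is available; the filename is just an example):

```shell
# Print the base64-encoded SHA-384 digest that goes into an
# integrity="sha384-..." attribute. jquery.min.js is a placeholder.
openssl dgst -sha384 -binary jquery.min.js | openssl base64 -A
```

The output is pasted into the tag, e.g. `<script src="https://cdn.example/jquery.min.js" integrity="sha384-<digest>" crossorigin="anonymous"></script>`; the browser then refuses to execute the file if the bytes served don't match the digest.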

------
smnscu
After working at an encrypted/private email service, this is my cup of tea.
However, I'd like to go off-topic and point out that the comic looks
fantastically well drawn:
[http://peppercarrot.com/en/article383/episode-19-pollution](http://peppercarrot.com/en/article383/episode-19-pollution)

~~~
severine
Made with Krita!

~~~
chrismorgan
I’ve just recently been deciding on an app to use for drawing with my Surface
Book, for illustrating all kinds of things, and I’ve settled on Krita in the
last week. It’s best-of-breed, and free to boot.

------
vbezhenar
CDNs are a common enough technique that they should be standardized in
browsers. HTML should include a link to the resource hosted by the site along
with its checksum. The browser could then use a cached copy of the resource
from any other site with the same checksum, or just download it from the
site.

There are two reasons to use a CDN. The first is caching (different sites
using the same resource from the same CDN will download it only once); the
second is speed (some browsers restrict the connection count to a single
domain, so hosting resources on different domains might improve download
time). Caching is better solved by using the checksum as the key instead of
the URL. Speed is not an issue with HTTP/2, because there's only one TCP
connection. The only remaining advantage of a CDN might be geographically
distributed servers, so a user in China would download the resource from a
Chinese server instead of a US one. I don't see an easy and elegant way to
solve that, but I'm not sure it needs to be solved at all; HTTP/2 server push
should be enough.
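
The checksum-as-key idea can be sketched in a few lines (a hypothetical `ContentCache`, not a real browser API): two sites referencing the same bytes under different URLs would share a single cache entry.

```python
import hashlib

# Hypothetical sketch of a content-addressed cache: entries are keyed
# by the SHA-256 digest of the bytes, not by the URL they came from.
class ContentCache:
    def __init__(self):
        self._store = {}

    def put(self, content):
        """Store content and return its digest, which is the cache key."""
        digest = hashlib.sha256(content).hexdigest()
        self._store[digest] = content
        return digest

    def get(self, digest):
        """Return the cached bytes for a digest, or None on a miss."""
        return self._store.get(digest)

cache = ContentCache()
lib = b"/* some shared library payload */"
# site-a.example and site-b.example both reference the same bytes:
key_a = cache.put(lib)
key_b = cache.put(lib)
assert key_a == key_b          # same bytes -> same key -> one entry
assert cache.get(key_a) == lib
```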

~~~
fenollp
> CDN is common enough technique which should be standardized in browsers.
> HTML should include link to resource hosted by site and its checksum. Now
> browser can easily use cached resource from any other site with the same
> checksum or just download it from site.

I really like this idea! Store your heavy assets in a public DHT with each
browser storing a part. Then fetch said assets by content-hash if not already
in cache. Maybe disable serving for mobiles. The W3C needs to get on this!

~~~
willglynn
The W3C has a thing called subresource integrity, which is basically what
vbezhenar described:

[https://www.w3.org/TR/SRI/](https://www.w3.org/TR/SRI/)

However, there are reasons why, e.g., hash-addressed JavaScript is not used
as a shared cache:

[https://hillbrad.github.io/sri-addressable-caching/sri-addressable-caching.html](https://hillbrad.github.io/sri-addressable-caching/sri-addressable-caching.html)

~~~
fenollp
WRT the "timing attack":

In most cases the client does not even request bytes from the CDN, which is
then unable to track the client. But then again, CDNs could implement
tracking based on this lack of requests (which is kind of ironic, and should
become infeasible the more clients use this technique, I think).

Actually the other issues are solved by the "DHT" part of this idea: no
centralized party can track which assets are already in your history.

The only tracking I can think of is by your nearest neighbours' browsers. If
such a neighbour N empties your cache (DNS attack?), that will trigger a full
fetch from N. Then N can attempt to fingerprint this asset query against what
other pages list. But then the whole point of this is to cache assets that
are used on most pages!

I love this idea. Let's make the Web decentralized again! (I couldn't resist)

------
jonchang
I use Decentraleyes to help with the CDN issue. It's not much but every little
bit helps I think.

[https://addons.mozilla.org/firefox/addon/decentraleyes](https://addons.mozilla.org/firefox/addon/decentraleyes)

------
kakarot
I use uMatrix and do not load external web fonts. I am stripping CDN reliance
out of our stack at work as well. This practice of supporting secure
protocols while still trading end-user privacy & security for ease of
development must stop.

------
blauditore
Maybe I'm missing something crucial, but why not just host the content on your
own server? I.e., just download that Google font, jquery.js or FontAwesome and
serve it directly instead of using an external CDN.

The post seems to say "I don't like where some content is coming from, so I
re-created said content by myself".

~~~
CapacitorSet
At first thought, there may be licenses in place preventing you from self-
hosting the content.

~~~
ocdtrekkie
At least in the case of Google Web Fonts and FontAwesome, I am almost positive
there is no issue with hosting locally.

------
JoshTriplett
Great to see someone paying attention to the problem of loading third-party
<script>s, and talking about the work required to avoid them.

~~~
pselbert
Before I knew it was a comic site, I was amazed they took the time to copy
all of the icons they wanted as SVG. Even knowing the author is an
illustrator, it is still admirable and impressive.

~~~
JoshTriplett
That part didn't seem strictly required to address the third-party content
problem. They could have used the font icons, and just copied all the
necessary bits to their server.

Also, for anyone with a similar problem, consider backing
[https://www.kickstarter.com/projects/232193852/font-awesome-5](https://www.kickstarter.com/projects/232193852/font-awesome-5).
They're 15 hours from completion, and $38k away from a stretch goal to
release SVG icon support in the Open Source version.

------
splitbrain
It's awesome that nearly 10 years after I came up with MonsterID, it's still
going strong. I love those cats.

------
tscs37
Why use alternatives?

You can download the Google Web Fonts and serve them from your host.

You can also download Font Awesome and serve it locally.

And there doesn't seem to be a reason why you can't do it with Gravatar
either.

I don't get this post, honestly. It seems to be about replacing stuff with
other stuff instead of replacing CDNs with locally served content.

------
madeofpalk
Good. Another reason not to use these CDNs is that they're an additional
risk: they introduce the potential for downtime and breakage. It's an
additional point of failure that just doesn't come with many benefits.

I'll happily use these services for quick POCs and throwaway demos, but once
anything starts to become semi-permanent I'll make sure I control my uptime
and host these assets myself.

~~~
this-dang-guy
I've started to leverage them with fallback, but I guess I'll see how that
plays out. (For fonts - I don't use anything else from a CDN, aside from front
caching with cloudflare)

------
dillondoyle
AddThis makes money by selling 3rd-party audience segments to advertisers
like me. I assume they get this data by tracking which users view which pages
through their sharing buttons. Example segments I can buy to advertise to:
[http://i.imgur.com/JF6ZZPC.jpg](http://i.imgur.com/JF6ZZPC.jpg)

The author doesn't even mention the big players: every FB share or like
button, on all that nasty porn you watch (even in incognito mode), reports
straight to FB. They recently changed their policies and signaled that they
are going to start using this data for ad targeting, probably in a push to
expand FAN and be more competitive with Google.

Something as simple as a share button that some blogger copy and pasted into
their blog turned into an ad tech/data company!

I personally love that story and think that's cool and innovative thinking
from AddThis.

But I also think more data = better ads, at the expense of privacy (probably
not a popular opinion around here).

------
brianzelip
Off topic, but the root site of this blog post is pretty awesome - "Pepper &
Carrot: A free, libre and open-source webcomic supported directly by its
patrons to change the comic book industry!"

------
thinkMOAR
Wonder if there will be a time when these CDNs pay you for the visitor data
you 'share/leak' with them via the linked resources (to convince you to keep
using them).

------
WildGreenLeave
I really like CDNs because of the ability to drop in a file and know it will
be cached correctly. (There is also a high probability that your user already
has a cached version of the file.) But I never thought about CDNs being able
to track you.

Isn't there an alternative? A more transparent way to provide users with
source files and still keep the 'cached items' aspect?

------
ludwigvan
In the case of Google fonts, is it legally possible to download the font and
serve it from one's own server? The FAQ has a relevant section, but does not
answer this question:
[https://developers.google.com/fonts/faq](https://developers.google.com/fonts/faq)

~~~
wanda
IANAL but they would not appear to be able to construct a case against you for
using the fonts on your own server, since at no point is it stated that such a
practice would be in violation of the terms of use.

As you observe, they do not explicitly answer the question, but their
reticence should be taken as an implicit green light, encased in a warning
about loading times.

Most Google fonts are merely served from their hardware, and not created by
them, so the license selected by the font's creator applies. Think of Google
Fonts as an aggregator of free-to-use fonts.

There is also a list of fonts and their licenses available from Google Fonts
here:
[https://fonts.google.com/attribution](https://fonts.google.com/attribution)

If you're really concerned, check who created the font and see if they make
the font available under a permissive license on their own website. Lato, for
instance, is available from its creator's website and is published under the
Open Font License.
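
Self-hosting a font like Lato then comes down to a single `@font-face` rule pointing at your own server (the file path here is hypothetical):

```css
/* Serve Lato locally instead of from fonts.googleapis.com; prefer a
   copy already installed on the reader's machine if one exists. */
@font-face {
  font-family: "Lato";
  font-style: normal;
  font-weight: 400;
  src: local("Lato"), url("/fonts/lato-regular.woff2") format("woff2");
}
```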

------
bandrami
So, here's where I mark myself as a dinosaur: why are you trying to set a
specific font for a web page? Clients select fonts for a reason.

~~~
jmcdiesel
Your question is answered by another question. Why does more than one font
exist?

~~~
bandrami
Because readers have different needs? Seems pretty obvious to me.

~~~
zachsnow
Historically, "fonts" have been set by the author, not the reader. While I
appreciate that this is no longer necessary, it seems reasonable that authors
still want to choose them for reasons of presentation. Of course, it's easy
enough to write a user stylesheet, so readers who need a different view can
get one.

------
nitwit005
The cats are pretty nice.

------
olegkikin
So your main argument is privacy, not letting Google collect users' data, but
then consider that most of your users are probably using Chrome, everything
they type in the URL box is sent to Google (for autocompletion) anyway.

Is looking at some comics website even a privacy problem? Let's say google
finds out your user X looks at your website. What possible damage can they do?
Sell it to the advertisers so they can target X with some comics ad? If you
ran a medical site, I would get it.

Then you have to give up other cool things like Google Analytics.

P.S.

Some beautiful artwork on your site.

~~~
flukus
> So your main argument is privacy, not letting Google collect users' data,
> but then consider that most of your users are probably using Chrome,
> everything they type in the URL box is sent to Google (for autocompletion)
> anyway.

I can't control information they willingly give away, but I don't have to
give them additional data on the people who chose not to send everything to
Google.

> Is looking at some comics website even a privacy problem? Let's say google
> finds out your user X looks at your website. What possible damage can they
> do? Sell it to the advertisers so they can target X with some comics ad?

For me it's because it ruins the search results. Just because I looked at a
comic doesn't mean I want them ranked higher in search results. I've found the
less google knows about me the better the search works.

> Then you have to give up other cool things like Google Analytics.

Personally, I'm about to give up on it anyway. Maybe it's useful if you're a
big site. I just want a list of page hits and the referrer URL if possible.
Analytics completely fails at this.
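
For that minimal use case, the web server's own access log already has everything needed; a sketch assuming the common/combined log format, where the request path is field 7:

```shell
# Count page hits per URL path from an access log in combined log
# format. "access.log" is a placeholder path.
awk '{ print $7 }' access.log | sort | uniq -c | sort -rn
```

The referrer is field 11 in the combined format, so the same pipeline with `$11` gives a referrer tally, with no third-party JavaScript involved.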

~~~
mavhc
> For me it's because it ruins the search results. Just because I looked at a
> comic doesn't mean I want them ranked higher in search results. I've found
> the less google knows about me the better the search works.

Isn't that what clicking the Globe icon next to the Cog icon does?

