
There are some really valid points in here and I dislike the idea of using the whole font when only a few icons are required.

But, isn't subsetting going to result in users now caching your subset instead of a cached copy of everything? I would think that does more harm than everyone grabbing a fully cached copy once from a cdn.

It's one of those things that works "in a perfect world", but in the real world it just doesn't work out that well.

For starters, leveraging caching via a common CDN pretty much requires everyone to be using a single version from a single CDN. If you can't agree on that, then every time a new version comes out the web is split and the caching doesn't work, and every time someone decides to use another CDN (or someone provides a new one) the group is split again.

But then split that across all the fonts, formats, and compression schemes available and you'll see that the chance that a visitor has seen that font, at that version, from that CDN, using that compression scheme, in that format at any point in the past EVER is actually significantly smaller than you'd think.
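
To put rough numbers on that multiplication, here is a minimal sketch. Every share in it is made up for illustration (none are measurements), and it assumes the factors are independent, which real traffic almost certainly is not:

```python
# All shares below are hypothetical, illustrative numbers, not measurements.
def shared_cache_hit_chance(shares):
    """Multiply the (assumed independent) shares a past visit must match."""
    chance = 1.0
    for share in shares.values():
        chance *= share
    return chance


assumed_shares = {
    "saw_this_font_before": 0.5,   # visitor hit any site using this font
    "same_version": 0.25,          # ...at the same version
    "same_cdn": 0.25,              # ...served from the same CDN
    "same_format": 0.5,            # ...in the same file format
    "same_compression": 0.5,       # ...with the same compression scheme
}

chance = shared_cache_hit_chance(assumed_shares)
print(f"Chance a past visit matches on every axis: {chance:.2%}")
```

Even with fairly generous guesses for each factor, the product drops below one percent, and that is before asking whether the entry is still in the cache.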

Which brings us to the next point. Even if you've seen it before, the chances that you'll still have it cached are pretty small. Browser caches are surprisingly small in the grand scheme of things, and people tend to clear them more often than you think. Add in private browsing mode and "PC cleaner" programs and the average person's cache lasts a much shorter time than I, at least, expected.

But even worse are mobile caches. IIRC older Android had something like a 4MB cache!!! And until very recently Safari had something like a 50MB limit (and before that didn't cache ANYTHING to disk!). Now it's better, but you are still looking at a few hundred MB of cache. And with images getting bigger, big GIFs being common, and huge numbers of AJAX requests happening all the time in most web pages, you'll find that the browser cache is completely cycled through on a scale of days or hours, not weeks or months.

IMO it's at the point where the "dream" of using a CDN and having a large percentage of your users already have the item in their cache isn't going to work out, and you are better off bundling stuff yourself and doing something like "dead code elimination" to get rid of anything you don't use. And that method only becomes more powerful when you start looking at custom caching and updating solutions. A few months ago I saw a library that was designed to download only a delta of an asset and store it in localStorage, so updates to the application code only need to download what changed and not the whole thing again. Sadly I can't seem to find it again.

> For starters, leveraging caching via a common CDN pretty much requires everyone to be using a single version from a single CDN. If you can't agree on that, then every time a new version comes out the web is split and the caching doesn't work, and every time someone decides to use another CDN (or someone provides a new one) the group is split again.

All this common web stuff that is distributed by several CDNs (as well as separately by individuals) really suggests to me that there should be some browser feature like `<script src="jquery.min.js" sha256="85556761a8800d14ced8fcd41a6b8b26bf012d44a318866c0d81a62092efd9bf" />` that would allow the browser to treat copies of the file from different CDNs as the same. (This would nicely eliminate most of the privacy concerns with third-party jQuery CDNs as well.)

Because anything that can cross domains instantly allows anyone to probe your browser to see what is in your cache.

So to take it to a bit of a ridiculous (but still possible) point, I could probably guess what your HN user-page looks like to you. So from there I could serve that in an AJAX request to all my visitors with this content-based hash, and if I get a hit from someone, I can be pretty damn sure it's you.

And that only really solves one or two of those issues. The versioning, compression schemes, formats, number of fonts, and sizes of browser caches will still cause this system's cache to be a revolving door, just a slightly more effective one.

And as for the security concerns of using a CDN: Subresource Integrity (which someone else here linked already) allows you (you being the person adding the <script> tag to your site) to say what hash you expect the file to have, and browsers won't execute it if it doesn't match. So that lets you include 3rd party resources without fear that they will be tampered with.
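
For anyone who wants to see what goes into the `integrity` attribute, here is a minimal sketch of how the value is built: the algorithm name, a dash, and the base64 of the raw digest. The helper names `sri_value` and `matches` are my own, and this only mimics the comparison a browser does; it is not how any browser implements it.

```python
import base64
import hashlib

# Hash algorithms the Subresource Integrity spec allows.
SRI_ALGORITHMS = ("sha256", "sha384", "sha512")


def sri_value(content: bytes, algorithm: str = "sha384") -> str:
    """Build an integrity string such as 'sha384-<base64 digest>'."""
    if algorithm not in SRI_ALGORITHMS:
        raise ValueError(f"unsupported SRI algorithm: {algorithm}")
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def matches(content: bytes, integrity: str) -> bool:
    """Return True if the content hashes to the declared integrity value."""
    algorithm, _, _ = integrity.partition("-")
    if algorithm not in SRI_ALGORITHMS:
        return False
    return sri_value(content, algorithm) == integrity


script = b"console.log('hello');"
declared = sri_value(script)
print(declared)
print(matches(script, declared))                 # untouched file
print(matches(script + b"evil()", declared))     # tampered file
```

The resulting string is what you would paste into `integrity="..."` on the tag, alongside `crossorigin="anonymous"` for a cross-origin file.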

Ideally this would be used with a sort of `Cache-Global: true` header in HTTP, and then you would only be able to grab things that are intended to be cached like this. It would do nothing to stop super-cookies with this method though.

Security hole: This could leak hash preimages that the user has in cache but are sensitive.

Solution: Using a sha256="..." attribute should only allow you to access files that were initially loaded with a tag that has a sha256 attribute, and this attribute is only used for resources the developer considers public.
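
A toy sketch of that opt-in rule, assuming a hypothetical `HashAddressedCache` class (nothing like this exists in any browser that I know of). It only models the "must have been stored with a hash to be found by a hash" part and does nothing about timing probes or the other concerns raised in this thread:

```python
import hashlib
from typing import Dict, Optional


class HashAddressedCache:
    """Toy cross-origin cache keyed by SHA-256 hex digest instead of URL."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, bytes] = {}

    def store(self, content: bytes, declared_sha256: Optional[str]) -> bool:
        # Responses loaded without a sha256 attribute never enter this cache,
        # so private pages cannot be probed by guessing their hash.
        if declared_sha256 is None:
            return False
        # Refuse content that does not match the hash the tag declared.
        if hashlib.sha256(content).hexdigest() != declared_sha256:
            return False
        self._by_hash[declared_sha256] = content
        return True

    def lookup(self, declared_sha256: str) -> Optional[bytes]:
        return self._by_hash.get(declared_sha256)


cache = HashAddressedCache()
library = b"/* some public library */"
private_page = b"<html>your user page</html>"

cache.store(library, hashlib.sha256(library).hexdigest())  # opted in
cache.store(private_page, None)                            # never opted in

print(cache.lookup(hashlib.sha256(library).hexdigest()) is not None)
print(cache.lookup(hashlib.sha256(private_page).hexdigest()) is None)
```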

The mechanism here (if not the intent) is pretty similar to subresource integrity[1].

[1]: https://developer.mozilla.org/en-US/docs/Web/Security/Subres...

Was about to reply with exactly that information, but as it turns out apparently doing content addressable caching via the SRI mechanism has some problems and maybe is not possible:


Yeah have thought about this a few times myself. Maybe missing something that makes it impossible/risky? Or maybe it's just the tendency to ignore simple solutions.

This not only solves the CDN issue but also solves the issue of having to rename the files manually every time someone makes a change. It just makes caching that much saner.

It can be used to subvert the same origin policy and content security policy.

If you see a script tag with the URL bank.com/evil.js, the browser shouldn't assume that the bank is actually hosting evil.js. Even if the hash matches, the content might not be there.

The bank might be using a content security policy to minimize the damage that an XSS attack can do. It only allows script tags from the same origin. However, now an attacker just needs to load evil.js with a particular hash into the cache, and they can create the illusion that the site is hosting it, without having to hack the server to do so.

This is the dream of IPFS.

Awesome description of the fragmentation and browser cache size problems that prevent these shared CDNs for common JS/CSS/Whatever files from providing optimal benefits the vast majority of the time. The challenge is that people still say "well, even if it only works some of the time, that's OK."

It's not. Because this hurts page load times.

You are having to create a new, cold, TCP connection to go fetch 50-100KB of CSS/JS/whatever from some random server. Which even in HTTP/1.1 is usually slower than just bundling that into your own CSS/JS/Whatever. HTTP/2 makes it even more so.

Just store and serve these things yourselves.

These things highlight how the current system of font distribution is really suboptimal. Even CDN hits are metered, and the idea that I need to either load or cache a bunch of data to render text is dumb.

My employer manages like 20k devices. I betcha we spend 5 figures annually on this crap.

It's a deceptively hard problem to solve.

Installing a ton of fonts up front takes a pretty significant amount of space, installing a subset for their language/preference or letting the user manage it makes it VERY easy to fingerprint users based on what fonts they download, and doing any kind of cross-origin long-term caching is a security nightmare as it lets you begin to map out where a user has been just based on what they download.

Exactly, but I would take it further...

It solves a problem that people other than web designers care very little about, but costs me money and creates a slew of other problems...

Personally, I wish it was easy to just turn off!

Fonts are a pretty important factor in design. Most people may not explicitly notice it, but it certainly affects the impression they get from a website.

You could compare it with http/2: If you do a survey, you won't find many people even knowing it. That doesn't mean it's useless to them.

> Most people may not explicitly notice it, but it certainly affects the impression they get from a website.

Most people already have attractive, readable fonts installed on their computers, which are likely either sensible defaults, or configured for specific reasons (e.g. minimum size to deal with eyesight). Web pages that render as blank white space for a while, or briefly tease me with text before blanking it out, give me a much more negative impression than ones that simply render in my chosen default fonts.

This is an interesting comparison because web fonts have the opposite effect of HTTP/2: They introduce a huge delay between clicking a link and being allowed to actually read anything.

On 3G or shaky Wi-Fi, I've regularly given up on browser tabs because all I see is an empty page, even after half a minute of loading and when most images have finished downloading. (Maybe other browsers are better than Safari, but I won't switch just to see prettier fonts.)

about:config browser.display.use_document_fonts=0

That will only work once designers start using SVG instead of icon fonts.

I've been blocking web fonts for a while, and it feels like I have to whitelist one site out of three because it depends on icon fonts.

If old mobile browsers have 4MB caches, 160+k of that is a big chunk. If you could reduce it to e.g. 10k, you'd need 16 sites all using FA with different font selections before you equal the original size. There's a reasonable chance that it's an investigation worth doing.

Another option: find the most-used icons or combinations. Group them.

Another option: similar to nandhp, get a hash of the font selection and name the file so. There's a very good chance a nearby proxy has that combination stored already.
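
A minimal sketch of that naming idea, assuming a hypothetical `subset_filename` helper and a made-up `fa-<version>-<hash>` naming scheme. The point is only that two sites picking the same icons, in any order, end up with the same file name:

```python
import hashlib


def subset_filename(icon_names, font_version, extension="woff2"):
    """Derive a deterministic file name from the chosen icons and font version."""
    # Sort and de-duplicate so the order of selection does not matter.
    canonical = "\n".join(sorted(set(icon_names)))
    payload = f"{font_version}\n{canonical}".encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()
    # Truncating the digest is a readability choice, not a requirement.
    return f"fa-{font_version}-{digest[:16]}.{extension}"


# "4.7.0" and the icon names are example values only.
site_a = subset_filename(["fa-home", "fa-user", "fa-search"], "4.7.0")
site_b = subset_filename(["fa-search", "fa-home", "fa-user"], "4.7.0")
site_c = subset_filename(["fa-home", "fa-user"], "4.7.0")

print(site_a == site_b)  # same selection, different order
print(site_a == site_c)  # different selection
```

The name alone only helps a shared cache or proxy if the full URL also matches, so this would still need everyone to agree on a host and path.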

Interesting. These points seem to apply to ld.so, as well :)

Wouldn't some kind of hashing allow changing CDNs without problems? There'll be a lot of room for improvement until the web looks closer to a big p2p network.

> But, isn't subsetting going to result in users now caching your subset instead of a cached copy of everything?

Disk is cheap. Particularly disks that you don't pay for like your users' disks.

It's not about disk, it's about network I/O. Making your resource "more unique" means more cache misses and more requests that need to be served (in theory anyway, see Klathmon's sibling comment[0] for more on this).

[0]: https://news.ycombinator.com/item?id=13138826

Not being a fan of using shared CDNs for static resources[1], I don't see the issue here. They're going to have to download something anyway, so from the perspective of your users it's still one download, just smaller. With proper unique namespacing (unique URL per version) and HTTP Cache-Control headers, they only have to download it once (assuming they don't clear their local cache).
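
One common way to get a unique URL per version is to put a content hash in the file name and then cache it for as long as possible. A minimal sketch, assuming a hypothetical `versioned_url` helper and an example `/static/` path; the header value uses the standard `max-age` and `immutable` directives:

```python
import hashlib
from pathlib import PurePosixPath


def versioned_url(path: str, content: bytes) -> str:
    """Insert a short content hash before the extension, e.g. icons.<hash>.woff2."""
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:12]
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


# Safe to cache "forever" because any change to the file changes its URL.
LONG_LIVED_HEADERS = {"Cache-Control": "public, max-age=31536000, immutable"}

old_url = versioned_url("/static/icons.woff2", b"subset v1 bytes")
new_url = versioned_url("/static/icons.woff2", b"subset v2 bytes")

print(old_url)
print(new_url)
print(LONG_LIVED_HEADERS["Cache-Control"])
```

The HTML that references the file should stay on a short cache lifetime, since that is how clients find out about the new URL.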

[1]: Combination of security reasons and unnecessarily coupling apps to the public internet.

What an interesting thought. I wonder what the actual user base of a library would have to be before it would even itself out and then go over the threshold of where you would see a return. Certainly 74 million sites should do it if they were all using the same CDN but I have no idea how you would start to even try to calculate this.

Usually when I want to use under 10 icons, I just download an image of each with this handy tool: http://fa2png.io (not affiliated, just a user)

Unfortunately it's the same with Bootstrap and other JS/CSS frameworks/libraries like that. You typically only use a small subset, but it is non-trivial to carve out the much smaller set that you need. There is some tooling that claims to attempt this cleanup, but I'm not sure how well tested it is.

The distribution of icons used is probably not uniform, so it's not like a worst case scenario of all misses all the time. Just the less popular ones.

Even if your subset is the same as mine, if each of us is doing the subsetting ourselves, we won't share the same URL, hence the browser will still fetch it twice.

On the other hand, it seems that FA themselves are building a CDN with subsetting, so they could in fact provide those shared subsets. Unfortunately (but understandably) it's paid, so most of us can't use it.
