Please stop using CDNs for external JavaScript libraries (shkspr.mobi)
627 points by edent 9 months ago | 347 comments



I started my career, and have spent most of it, working in places where the production network was (almost) airgapped from the entire internet (MPAA accredited facilities). I would say that the general quality of software and robustness when it comes to dependencies is so much greater in these places. If you want to use some library, it’s up to you to get it, and its dependencies, check the versions are compatible and package it up and build it all internally. Yup, it’s work... Do you need this library? Is it actually any good? Is the license compatible with our usage? This is all basic code quality stuff that’s often completely overlooked when people can just pull in whatever junk from whatever trendy repo is the hotness nowadays. And then when that goes down/bankrupt - it’s up to you to fix something you’ve no control over.


We provide software that runs within very secure financial networks, and have some extreme constraints regarding what sorts of 3rd party code we can pull in. We are having to do a vendor swap on some document parsing code because one of our clients scanned a dependency and found it could be vulnerable in a context that it would never be exposed to in our application. These types of things make it really risky to go out and build your enterprise on top of someone else's idea of a good time.

Virtually everything we do is in-house on top of first party platform libraries - i.e. `System.`, `Microsoft.`, etc. We exclusively use SQLite for persistence in order to reduce attack surface. Our deliverables to our customers consist of a single binary package that is signed and traceable all the way through our devops tool chain, which is also developed in-house for this express purpose of tightly enforcing software release processes.

This approach is certainly slower than vendoring out everything to the 7 winds, but there are many other advantages. Every developer knows how everything works up and down the entire vertical since it's all sitting inside one happy solution just an F12 away. Being able to see a true, enterprise-wide reference count above a particular property or method is like a drug to me at this point. We are definitely over the hill and reaping dividends for building our own stack. It did take 3-4 years though. Most organizations cannot afford to do what we did.


Couldn’t this end up being a disaster with a large codebase and even a small amount of turnover? Are you really advocating that someone should write all their dependencies themselves even if they can afford it?


I find it amusing that we’ve reached the point where developers can no longer imagine a shop that writes its own software.


I think the problem is mostly cognitive. The codebase should be viewed as the most important investment that a software company could ever hope to possess.

Within the codebase, the infrastructure and tooling is by far the most important aspect in terms of productivity and stability of daily work process.

If you take the time to position yourself accordingly, you can make the leverage (i.e. software that builds the software) work in virtually any way you'd like it to. If it doesn't feel like you are cheating at the game, you probably didn't dream big enough on the tooling and process. Jenkins, Docker, Kubernetes, GitHub Actions, et al. are not the pinnacle of software engineering by a long shot.


A company's codebase is a liability, not an asset - it needs to be maintained, and as you point out, it needs money spent on tooling and infrastructure to be most effective.

Unless you happen to be one of the very rare companies that sells source code and not built artefacts, your asset is the built artefact and your code is the expense you take on to get it.

Having less code to get the business outcome only makes sense when you see the code as a cost, not a thing of value itself.


> A company's codebase is a liability, not an asset

It's an asset. Like many (virtually all, other than pure financial) assets, it has associated expenses; maintenance, depreciation, and similar expenses are the norm for non-financial assets.

> Unless you happen to be one of the very rare companies that sells source code and not built artefacts, your asset is the built artefact and your code is the expense you take on to get it.

No, things that are instrumental to producing product are still assets, not just the things that you sell. That's true if it's machines on your factory floor, if it's the actual real estate of the factory, or vehicles that you use to deliver goods. And all of these assets, like a codebase, have associated expenses.

The whole "code is a liability, not an asset" line is something from people who might understand code, but definitely don't understand assets and liabilities.


From a pure accounting perspective, yes, but the whole "code is a liability, not an asset" crowd aren't using the strict definition of it by any means.

It's more just a mental footnote that the contract with the customer is what is valuable, and the code is either supporting that value or destroying it. So if you can have the same contract with the customer for less code, that's the outcome to strive for.


I think this perspective comes from the opinion of folks that generally want to cut costs, not leverage their assets to their fullest. If your means of production is an asset that you understand how to leverage over your competitors your mindset won’t be “we really need to minimize this codebase”


I think the saying is drawing on the same idea as Bill Gates' opinion that measuring progress on software by lines of code written is like measuring progress on an airplane by weight.

Airplanes have many functional parts and many not-so-functional parts. There are parts of the airplane that will, if removed, prevent it from working in various critical ways.

But from another perspective, all of those parts, decorative, functional, or essential, are liabilities, dragging your airplane back toward the ground when you want it to stay up in the air. The fact that a particular piece is important doesn't mean it's less of a problem having it; it means you have to suck it up and work around the problem.

Source code is like this. The mere fact of its existence causes problems. Some of it you can't do without. But it's causing problems anyway, and if you can do without it, you want to.


> Source code is like this. The mere fact of its existence causes problems.

So you delete your source code after you ship an app? That doesn't make sense. The source code is one of every software company's greatest assets which is why we have created so many tools to keep track of it like version control, automated testing etc. Otherwise there would be no point in clean code, code documentation etc, just write something that solves the problem and done! No need to even check in the code, just build a binary locally and ship!

I understand the point, but saying that code is a liability and not an asset is false no matter how you look at it. Source code solves a lot of problems you can't solve with binaries, it lets you adapt to change much better. So instead of saying that code is a liability, use the business saying "Focus on your core business", meaning don't write code for things that aren't your core business.


> I understand the point, but saying that code is a liability and not an asset is false no matter how you look at it.

I didn't say that. I said it was a liability. "Don't write code for things that aren't your core business" doesn't tell you that you should try to minimize the amount of code that addresses your core business. But you should.


You misunderstand the analogy. An airplane could be very heavy because it is a very large aircraft with a lot of payload capacity - that would be good. An aircraft could also be very heavy because it is inefficiently built and thus have very little payload capacity. Similarly an aircraft could be light because it is small, but still be inefficiently built, or the reverse. Bill Gates is complaining about evaluating an aircraft by its weight because its weight doesn't on its own tell you about any of the stuff you really care about with an airplane.

Likewise, code could be many lines because it is well formatted and robust. Alternatively it could be long because there's a lot of repetition and bloat. It could be short because it is very targeted to what it needs to be, or it could be very short because the developer used a lot of unreadable code-golf tricks.

In any complex system, you have optimization problems. Very rarely is the answer to an optimization problem simply to minimize an isolated variable. You can not say a lighter airplane is better than a heavier one without asking why it's lighter; likewise a smaller codebase with fewer lines can not be called better unless you know why it is smaller.

Just because a plane could get off the ground without something does not mean that cutting that thing to save weight will move the plane closer to its optimal design. Likewise just because the code base could be reduced does not mean that actually makes it more maintainable and better performing.


It comes from the fact that code depreciates like a car. Yes it does make you a profit (gets you to work) but after 20 years nobody wants your code (car) unless you have done rigorous maintenance. Having the biggest codebase(car) is not a benefit. It's kinda like measuring your software product by how many cups of coffee your developers have drunk.


> It comes from the fact that code depreciates like a car.

If a business owns a car that they use to make a profit, guess where the car is in the balance sheet?

> Having the biggest codebase(car) is not a benefit.

Sure, lines of code is not how you measure the value of code as an asset, just like tons of gross weight isn't how you measure the value of vehicles as an asset.

That doesn't mean code, and vehicles, aren't assets.

“Lines of code are a measure associated with maintenance cost, not asset value” is a reasonable statement. “Code is a liability, not an asset” is not.


No, it comes from wanting to leverage your asset to the fullest. But that's hard to do if it's full of cruft and getting in the way of actually delivering value.

More lines of code is not equal to better. What you need is better lines of code, which usually means less of them.

Let a dev loose on a codebase and they could add value, but they could also be subtracting it. Hmm maybe developers are the liability.


One needs to keep in mind though, that minimizing a code base can have significant cost as well. (1) the cost of doing the minimization itself (2) the cost of runtime or compile time differences if any. Also one needs to keep in mind, that less code does not mean easier to maintain either.

Perhaps the amount of code is the wrong metric to optimize. Perhaps it is readability, simplicity, maintainability that we really want and those are harder to put into numbers than LOC.


> It's an asset. Like many (virtually all, other than pure financial) assets, it has associated expenses; maintenance, depreciation, and similar expenses are the norm for non-financial assets.

To get technical about it, code whose discounted future maintenance costs exceed the discounted revenue it is expected to bring in or helps to bring in is a liability. Some maintenance costs are invisible, such as having team members leave; leading to recruiting expenses to seek a replacement and the associated ramp time when they're hired. Companies are often not equipped to assess the cost side in any way that approaches reality and just stick the cost of the "code" on the balance sheet as capex. Having an asset on the balance sheet after doing this doesn't mean you have an asset in reality.

> The whole "code is a liability, not an asset" line is something from people who might understand code, but definitely don't understand assets and liabilities.

Code can be a net liability. Unless you are looking purely at the asset side of the balance sheet and ignoring liabilities, which tends not to be very useful.
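
The discounted-cost framing above can be made concrete with a rough sketch. This is only an illustration under stated assumptions: the yearly revenue and maintenance figures and the 10% discount rate are made up, and `present_value` and `net_value_of_code` are hypothetical helper names, not anything from the thread.

```python
def present_value(cash_flows, rate):
    """Discount yearly cash flows (year 1, 2, ...) back to today's money."""
    return sum(cf / (1 + rate) ** year
               for year, cf in enumerate(cash_flows, start=1))


def net_value_of_code(revenue_per_year, maintenance_per_year, rate):
    """Discounted revenue attributed to the code minus discounted upkeep."""
    return (present_value(revenue_per_year, rate)
            - present_value(maintenance_per_year, rate))


# Hypothetical figures: revenue attributed to the code shrinks each year,
# while maintenance (including recruiting and ramp-up time) stays flat.
revenue = [100_000, 80_000, 60_000]
maintenance = [90_000, 90_000, 90_000]

net = net_value_of_code(revenue, maintenance, rate=0.10)
print(f"net present value: {net:,.0f}")
print("net liability" if net < 0 else "net asset")
```

With these invented numbers the discounted maintenance exceeds the discounted revenue, so the code comes out as a net liability in the sense the comment describes. Different inputs would flip the result, which is the point: the answer depends on cost estimates that companies often do not make realistically.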


I think many folks forget that nobody wants to use your software. They want something that maybe your software can help them get. But if the problem could be solved without your software, they'd try to remove your software ten times or more if they could.


This is misleading and pretty much wrong. It’s like saying the hen is a liability, the only asset is the egg. Or like saying your team is a liability and the only asset is the work they produce.


But these are both true. The problem is only if you assign a derogatory meaning to the word liability. If the company could provide the solution it does without a team, or with a smaller team, it would - the purpose of the company is to provide the solution, not feed team members. If we could get eggs without taking care of hens, we would - they're a pain to take care of and feed.

The moment you want to minimize something while still achieving your goals you know it's a liability. Do you want the same profits or solutions with a smaller team? Then the team is a liability. If the team were an asset you'd be trying to hire a bigger team without any work for them to do. If you had the chance to double egg production with constant demand, you'd eat or kill half your hens. They're a liability.


> The problem is only if you assign a derogatory meaning to the word liability

That's only true if you assign an appropriate meaning to the word derogatory.


> A company's codebase is a liability, not an asset

Why not just throw it away, then?


It’s a liability in the sense that a rusty drain pipe covered in duct tape is a liability. You think of it as the source of your problems, but throwing it away would be worse.


> It’s a liability in the sense that a rusty drain pipe covered in duct tape is a liability. You think of it as the source of your problems, but throwing it away would be worse.

So in your analogy, what's the software equivalent of replacing the drain pipe?


If you can, you should.

If you can't, it's (IMO) because you're stuck with the burden of having to write and maintain that code.

If it's an asset, why not just write more?


> If it's an asset, why not just write more?

Companies do write more.


It's still an asset, it's just that most assets require maintenance.


> A company's codebase is a liability, not an asset...

> your asset is the built artefact

Therefore, summarized, you mean that the source code needed to generate the resulting <app/service/whatever> is a liability, but that the result can be an asset (if it does generate external revenue, or internally lowers costs, etc.)?

I personally never thought about this kind of separation - interesting.


I mean that source code comes with costs, often substantial, but has no direct benefits.

It's easy for us as developers to think that source code is valuable - but this leads to problems like never removing code "in case it's needed", or with an in-house dev team developing systems you could get off the shelf.

If code is an asset, then it makes a lot more sense to write stuff yourself: you not only get the artefact, you also get the source.

If it's a liability, then it makes a lot more sense to let someone else bear the costs of that liability, especially if they have economies of scale, except where you can't get your desired outcome other than writing code.

This is of course technically incorrect. It's perhaps more accurate to say that source code requires upkeep and is expensive to maintain. That tends to draw less interest and discussion, because it's "obvious." Except as an industry we're overall pretty lousy at paying the required upkeep on code.


I think you have a point. But essentially what you're saying boils down to unmaintainable, untestable, unrefactorable, obsolete code being a liability. I don't think anyone disagrees with that and I personally have been involved with a (semi-popular public cloud) service deprecation myself precisely because it was legacy and simply had to be rewritten to be moved to a shiny new home that used new-age systems owned by another team that was doing a stellar job at upkeeping and innovating on those (though one could easily classify their "innovation" as NIH). The code that they wrote wasn't a liability at all, but in fact, it kept seeing active investment from all engineering and product management angles.


Think about a machine in a factory. You have to maintain it and it doesn't generate revenue by itself. It generates the product you then have to sell. The machine is still an asset.


It always comes down to resources. In my experience an underdeveloped or underfunded NIH project happens far more often than most tech teams would admit. That said, when a team only leans on vendors and no invention is happening, then it’s basically an expanded IT department and will eventually be treated as such.


The software community has built a lot of extraordinary tools that have been through a lot of battle testing. Pretending those lessons aren’t worth something and thinking you can do it all yourself is a mistake a lot of the time.


If you're talking about foundational dependencies like OpenSSL, Linux, LLVM, or even jQuery and React, sure. Also most stdlibs and DBs, like GP said he uses, also fall into that.

Dependencies in general, like 95% (or more) of the kind we see in modern package managers? No, they're mostly untested liabilities and the majority of them could be rewritten in an afternoon.

This whole discussion is a bit strange. GP clearly uses dependencies, just not as much as everyone else today. I don't understand why the fixation with polarizing the discussion into "use lots of dependencies" vs "write everything from scratch".


"The software community" also built npm, which then enabled "extraordinary tools" like leftpad.js... I suspect there's a lot to be learned from "those battle-tested lessons", but pretending everybody is learning them is a mistake.

Dependencies are inevitable.

I'm OK with depending on glibc and _almost_ as OK with depending on openssl. And of course I inevitably rely on gcc/clang/CPU microcode/transistor photolithography...

But those examples are a world away from npm pulling in 12+ levels of random dependencies of unknown origin buried so deep it's all but impossible to audit - and the segment of "the software community" who built that particular house-of-cards and then promotes it blindly to the segment of the software community who obliviously use it to generate code that runs on expensive production platforms in business critical applications just boggles my mind.

(And makes me alternately weep and drink heavily - having to support exactly that in production because "velocity" and "moving fast and breaking things" are considered more important than quality or security by both management and clients - both of whom will happily point the finger of blame elsewhere when the inevitable happens, in spite of having been repeatedly warned... :sigh: )


You definitely have some valid points. On the other hand, I just did a project at my company that had dynamic PDF generation from web-app data as a requirement. Being a shop that in no way has the experience, time, or budget to build a dependency-free in-house tool to do this, I instead was able to find the jspdf library, pull it in from node, and implement the functionality in an afternoon.

Believe me, as someone who cares about code, I far prefer a world where I know how every line of code in a system I'm building works. In fact, in my free time, I do just that. But my job as an in-house software dev for a non-tech company is to solve business problems with technology, and as with every business problem, there is an acceptable level of risk that you need to be OK taking. In our case in this post-COVID world, people's jobs are literally depending on us iterating quickly.


Oh sure. I'm totally not saying that the right point on the velocity/quality spectrum is always right over at the quality end (or even that those two choices are fundamentally orthogonal), and by using the phrase "an acceptable level of risk" you're demonstrating you're the sort of developer that understands there _is_ risk. And I'd assume that means you'd have pushed back if someone had instead said "and this PDF generator needs to run on the same servers and have access to the databases with all our PII and financial information on them".

I spend a lot of time talking to "full stack developers" who came out of graphic design into front end web dev and then fell into nodejs backend dev, whose only "architecture" course was about drawing buildings and whose main complaints about security requirements are that bars across windows look ugly. A startlingly large number of them don't even know there are things they don't know. I'd say fewer than half of them could tell you what OWASP was, and fewer than 10% of them could tell you what SQLi or XSS was, and whether their sites need to consider them as attack vectors. Most of them just say "I use $frameworkDeJour, it handles all the security stuff!"

(BTW, last time I needed to supply dynamic PDFs in a web app, one of the good fullstack-via-FE-and-graphic-design devs built the PDF generation in the browser (using some random js/pdf library) so we could just feed it JSON from the backend. Made me sleep better at night doing it that way...)
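
For readers who have not met the two terms mentioned above, here is a minimal sketch of what SQLi and XSS look like in practice. It uses only Python's standard `sqlite3` and `html` modules. The table, the user names, and the attack strings are invented for illustration and do not come from any system discussed in the thread.

```python
import html
import sqlite3

# Throwaway in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("root", 1)])

# Attacker-controlled input, e.g. from a form field.
malicious = "nobody' OR '1'='1"

# SQLi: concatenating input into the query lets it rewrite the WHERE clause,
# so every row matches.
unsafe_sql = "SELECT name FROM users WHERE name = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Parameterized query: the input is passed as data, never parsed as SQL,
# so no user has this literal name and nothing is returned.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()

# XSS: user text must be escaped before being placed into an HTML page,
# otherwise the browser would execute it as script.
comment = "<script>alert(1)</script>"
escaped = html.escape(comment)

print(len(unsafe_rows), len(safe_rows))
print(escaped)
```

A framework may well do both of these things for you, but only on the code paths that go through it. Knowing what the attack looks like is what lets a developer notice the places where it does not.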


Out of curiosity, why would you expect 5x more people to know what OWASP is than to know what SQL injection is? I would have thought it would be the other way around.


In my head at least, knowing of the existence of a respected/curated list of problems/solutions in your area of expertise is likely to be more widespread than knowing the details of specific items on that list.

I know _of_ HIPAA regulations, but not being in either the US or in healthcare records, I have very very vague notions of the HIPAA requirements. Same with PCI compliance - I know there are important rules and requirements, which I don't fully understand the details of, because I choose to use 3rd parties like Stripe to handle all my CC processing so that those requirements don't apply to me (with the exception of needing to understand the risks of webapps and problems like XSS in the context of Stripe-powered CC forms).


The community has produced 1000x as many tools that are garbage. It can sometimes be hard to tell the difference (if the developer cares to look at all).


It is not uncommon to find cases where a whole monstrous library was brought in just because someone needed a trivial 5-line function. And these add up. And then we all get surprised at how this page weighs megabytes and barely does anything fancy besides screwing up scrolling.


I think this is simply the recognition that our expectations of what a team of developers can do in a given time have surpassed what that team can actually write themselves. We now understand that teams have to lean on open source, or buy libraries, in order to achieve what we expect within a reasonable time.

The reason is very obvious - the most expensive part of most projects is the people. Why pay them to write code that you can just download for free? That's a tremendous waste of money.


This is exactly it.


Let's start a car company by first building an iron smelter? Using available tools is not some kind of crazy idea that software engineers came up with in the last ten years.


You should raise that point to all the clueless shareholders who relentlessly pulled even technical control out of the hands of the techies.

Most of us just do what we're told and we're in no position to question the "company's priorities" -- nevermind the fact that very often those priorities actually align with the techie's vision.


If a company isn't competent enough to even check their dependencies for vulnerabilities, as GP's company is, how could they be competent enough to write and check their own alternative versions?


Original commenter advocates for writing your own, presumably "from scratch", and mentions high-risk targets. Even if you don't take those for granted, though... Let's assume lower risk than a studio, and you relax the conditions from developed-in-house to maintained-in-house (e.g., a library exists, so you go grab it, and by the power of open source, internally it's now "yours"—a fork that you take full responsibility for and have total control over, just like if you had developed in-house, except you're shortcutting the process by cribbing from code that already exists.)

Here's an unrecognized truth:

The cost of forking a third-party library and maintaining it in-house solely for your own use is no higher than the cost of relying on the third-party's unforked version. Depending on specifics, it can actually be lower.

Note that this is a truth; the only real variable is whether it's acknowledged to be true or not. Anyone who disputes it has their thumb on the scale in one way or another, consciously or unconsciously.


You're really claiming this is a universal truth? Do you think it applies to OpenSSL? Chromium?


I didn't say it was universal. OpenSSL and Chromium are outside the scope of "CDNs for external JavaScript libraries".

(I actually included the appropriate hedging to clarify that my comments are scope-locked to that topic and to prevent digressions like this, but I edited it out because it made the comment too hard to read. Goes to show...)


Reading your "the cost of forking a third-party library and maintaining it in-house solely for your own use is no higher than the cost of relying on the third-party's unforked version" I didn't realize at all that you were trying to only talk about JavaScript that runs client side. I think your deleted clarification would have been helpful!


> I didn't realize at all that you were trying to only talk about JavaScript that runs client side

Well, not just client-side JS; server-side, too, or anywhere that NPM is used, but even more than that: e.g. other package managers that were influenced by or work similarly to NPM and encourage a similar package-driven development style, e.g. Rust's crates.io or the Go community's comfortability with importing by URL. It applies for many of those cases, too, it just wasn't the focus of my comment.


For cryptography one should just be able to rely on their OS' library, and depending on a full browser with high amounts of code churn, no compatibility for implementation code and a large dependency graph of its own is not really seen as a good thing in this context at all.


Let's be more specific: do you think Brave would be better off if they hard forked Chromium?


Chromium ships with Windows and is maintained by Microsoft. Use the OS crypto library.


libjpeg?

Specifically, I think hard forking is a bad idea for any sort of library that needs to be regularly updated for compatibility or security reasons.


That's possibly true if you don't have headcount for doing that maintenance. If you have appropriately planned for it however, it's just more software that you're writing to do the work you need done.

If you're depending on some random person on the internet to update software which underlies your whole stack, then when the next imagetragick drops you can't update until they get around to fixing it. Since you won't have developers familiar with the code, fixing it won't likely be feasible for you. That's a lot of risk.


> Note that this is a truth; the only real variable is whether it's acknowledged to be true or not. Anyone who disputes it has their thumb on the scale in one way or another, consciously or unconsciously.

Could you elaborate on the actual argument for why this is the case? On the surface it seems like the opposite of what you are claiming can just as easily be true depending on the situation. For example take a widely used utility lib such as lodash or jQuery. In your scenario there are two options:

1. Use lodash via a package manager and rely on the lodash team to fix bugs, write tests, and add new useful utilities over time.

2. Fork lodash and take on the maintenance burden yourself. You are responsible for keeping up with security vulnerabilities and making patches. You are responsible for writing high quality tests for the parts of your code that diverge from the original.

Think about how much ramp up time is required for new hires to become familiar with a company's codebase. Why would you ever want to devote that amount of time to maintaining code that for the vast majority of businesses is already "good enough"? For some companies the engineers working on the open source project may even be more competent than the resources available in house. Sure there may be edge cases where performance or security is absolutely paramount, and in those this approach may make sense, but not for the majority of generic CRUD apps.

I would have a very tough time trying to convince a competent manager that it is beneficial to devote so many man hours to this task rather than to business logic, new app features, etc.


I'm not really saying you should write anything from scratch that you don't have to, just that you should treat a dependency as something that you did. Therefore, review it, check compatibility, and have some named team responsible for its maintenance and availability.


Even without turnover, I've no idea how you're supposed to compete with the quality of top open source software. Maybe no one finds the bugs in your software, but they're definitely there.


We don't try to compete with the quality of top open source software. Our stack fundamentally consists of:

C# 8.0 / .NET Core 3.x / AspNetCore / Blazor / SQLite

Of these, SQLite is arguably the most stellar example of what open source software can provide to the world.

Everything else in our stack consists of in-house primitives built upon these foundational components.


It's funny to me the number of developers who have effectively forgotten that Microsoft exists, and that it's possible to have your entire stack be provided by one company who directly sells its software for profit.


That's really interesting. My team's stack is the same except we use Azure SQL Server instead of SQLite.

I'd love to understand why you chose that.

Feel free to hit me up at the address in my profile if you don't want to talk here.


Perhaps the parent's situation ("software that runs within very secure financial networks" cf https://news.ycombinator.com/item?id=24747781) prevents them from relying on an external service?


We cannot rely on anything outside the client's secure network. There are a few exceptional items that are allowed to talk to the internet, but our system's data persistence layer is not one of them.

Effectively, our client's operations cannot rely on cloud services and all of the related last mile connectivity into their infrastructure. If AWS/Azure/et al. go down, many of our customers are still able to continue operating without difficulty.


In the frontend space?

When I go looking for a dependency, I check the license and I have a quick read of the code.

Last time I did that was for autocomplete. I checked the most popular six options; in the space of 2 hours, I found obvious-from-reading-code bugs in all 6.

None of them had a CLA signed by contributors, so there's really no evidence their code is genuinely available under the license they claim to offer.

I wrote my own. It took about 3 hours initially plus 2-3 hours ironing out edge cases found over the following weeks. It only added 700 bytes to my bundle.

Total time spent: 1 day's work. Smaller code, loads fast, free from license issues, does exactly what I want.


It depends on the context. Yeah it'd be lovely to have your system connected to the internet - but no, our clients wouldn't give us money if you do that, and we need money. So, maybe yes in this case, start writing some quality, well thought out maintainable libraries (that can include audited third party code), and just bill it. In my case, the cost to the client of a team of devs working on that was less than the cost to them of the risk of a film leaking...

[Edit] - But what I have found through experience is that the code that was written under these constraints seemed to be better, more secure and robust, than without them. YMMV.


A lot of the time, when you are working in highly critical environments, you tend to be less cutting edge and rely more on old, proven technologies that are boring but dependable.


I assume if you have the money to be that thorough you have the money to offer some inducements to stick around.

Plus if all you know is this custom stack, where are you gonna go?


Anywhere that hires good software developers, since if you learned this stack it's presumably not hard to get a job somewhere else? Exooglers have a pretty easy time getting hired.


This is probably better from a quality perspective and also probably not worth the time it would take in 90% of projects.

That being said I'm amazed how much production software depends on multiple libraries that are developed and maintained by a single person as a hobby.


That one person (sometimes) puts more effort into that single library than I've seen some agencies put into a client project.


Intrinsic vs extrinsic motivation.

It's pretty common for someone building a library to do so for the utility, scratching a real itch. Not universal (people build libraries to be someone who built a library, too), but common.

It's very common for agencies to be building client projects in order to bill for it. Sometimes there's a special alignment where the client is also primarily interested in spending an allotted budget so long as there's a plausibly adequate deliverable.


It's hard for me to think of open source libraries that are developed and maintained by mega corporations that I actually prefer to use over libraries made by small indie developers. The only one that I can think of is pytorch, but that's kind of unfair since it was acquired by Facebook, not developed from scratch by them.


Golang is pretty well liked.

And Guava in Java land, and Abseil in C++.

It's not too surprising that the stuff big companies make for their needs isn't as useful for small indie devs as stuff small indie devs make.


For me, the biggest is React... aside from that, not much really.


Sure, everything's a tradeoff - but sometimes I see things like that as frontloading the pain of when bndlrrr.io or whatever goes down in the middle of the night and your client is angry at you. But yeah, like everything in software, it's a spectrum and 'best practice' and the 'right way' are highly dependent on the context.


Yeah like as a frontend/web developer, most of what we do is make what are essentially Wordpress themes for some company that really doesn’t matter or do much important.



Ex-Amazon SDE here. The same happens in FAANGS: tons of software is written and served internally.

It's not NIH syndrome, usually. It's about having control over the whole software supply chain for security, reliability, licensing compliance and general quality.


As another ex-Amazon... there's a whole bunch of NIH going on there too. The way I saw it, it tended to be split three ways:

1) NIH. Almost always the problems that need solved are interesting, and engineers are naturally chomping at the bit to solve them. Added bonus you can potentially make a name for yourself. This happens way more than it should, in cases that don't meet the other two ways I saw. Solving problems that have already been solved very effectively and efficiently, in a mature low friction fashion.

2) It doesn't scale to needs. A lot of software just doesn't scale to the requirements of the platform. It's hard to overstate just how much traffic and work a lot of Amazon infrastructure has to handle. Most software doesn't scale that well because it's not run in so big an environment. We're using some well known commercial software at my current employer (because it works, has a good reputation, and did everything we need) that is experiencing major scaling issues because we're literally orders of magnitude larger than any of their other customers. We're seeing stuff they've never had to deal with before. We're not even close to Amazon's scale for this particular type of software.

3) Need to control the entire software stack, have the ability to drastically modify it to meet the changing demands placed on it. A lot of public software is written to meet one need, and it rarely changes that drastically over time. That's not what the consumers want, even though needs change over time. Change your software too much and you'll lose your existing users that fundamentally need what the software is providing. You can see the boom and fall of it all with so many projects. Take a look at what's happened with Chef and Puppet, for a quick off-the-top-of-my-head example.


> there's a whole bunch of NIH going on there too

That's why I wrote "usually".


4) the most important reason: open source stuff isn't built to integrate with the decades old custom stack cruft


> It's not NIH syndrome, usually. It's about having control over the whole software supply chain for security, reliability, licensing compliance and general quality.

Maybe, though that's usually the exact rationalization given for NIH syndrome. I mean, “NIH syndrome” is never the stated reason for anything.


Bingo. I recently left my job at a FAANG, and these are the top three reasons I saw for code to be written locally.

(1) Need to interact with other internal systems.

(2) Pure NIH.

(3) Need to scale further than outside solutions.

1 and 3 are closely related. There are quite a few legitimate category-3 internal services for provisioning, configuration, service discovery, monitoring, upgrades at various levels, fault remediation, etc. Any other production service would have to interact with most or all of them. It's often easier to build something local than to add all of those "touch points" to an open-source project. I know because I did both while I was there.

But pure NIH is very close behind as a reason. Despite all protestations to the contrary, engineers get far more "impact" for creating new things than for fixing old ones. It's hard to get somebody to do X when their bonuses and raises are better served by doing !X. This ends up amplifying, instead of attenuating, the natural impulse of all engineers everywhere to build new things because it's more fun. People always make up other reasons, and perhaps even believe those reasons themselves, but nine times out of ten those reasons are pure delusion.


> Despite all protestations to the contrary, engineers get far more "impact" for creating new things than for fixing old ones.

In general, yes, and that's a problem. Luckily in some teams it's much better.

> but nine times out of ten those reasons are pure delusion.

If you were in a company/team with such level of true NIH syndrome it's good you left.


I'd say just the opposite. I worked on a 10-million LoC codebase where third-party libraries had to be individually brought in by an internal owner. I suspect most of those 10 million lines of code ended up being ad hoc, informally-specified, bug-ridden, slow implementations of half of some external library.


Where I'm working, all the dependencies have to be approved before use and are stored locally (build machines have no internet access), but I haven't found that the quality of the software is better than the other places where I've worked. I think it's more a matter of work 'culture' (not sure if that's the right word, perhaps expectations would be a better fit?), with the idea that we ship once it's good enough and that's it.


Though I think it's still a good thing that we're doing it, but it's not enough on its own to get good quality.


Why is MPAA air-gapped? Are they worried about leaks or accidentally using copyrighted material?

Seems arbitrary, but I bet the rationale is fascinating. Could you go into more detail? I'd love hearing a little about your work experience, the industry, process, etc.


Disclosure: I work on Google Cloud.

Your intuition is right: they're afraid of the content leaking.

For cloud providers, this results in ... amusingly long documents. Here's GCP's at 110 pages [1], while the AWS folks were clever and used landscape mode for theirs [2] so that it's only 59 pages :).

[1] https://cloud.google.com/files/gcp-mpaa-compliancemapping.pd...

[2] https://d1.awsstatic.com/whitepapers/compliance/AWS_Alignmen...


Good to know that Spider-Man movies are better protected than my personal health information.


That's fascinating! These companies protect their assets as if release were an existential risk. I suppose it would impact their bottom line to some extent.

It'll be interesting to see in the future when content is cheap to produce. I predict a complete shift away from this.


You clearly wouldn’t see the movie if you’d seen some final frames that were emailed to you :).

> It'll be interesting to see in the future when content is cheap to produce. I predict a complete shift away from this.

It’s actually one of the most obvious forms of friction, causing an increase in the cost of doing business. A lot of VFX houses take these rules to imply that they must segment their networks and keep workstations completely unable to reach the internet (coming full circle to the airgapped comment at the start). Pretend that your entire development workflow is like being on a plane when the WiFi is down. That’s modern VFX software engineer life :(.


I've spent much of my career on similarly, perhaps a little more, restricted networks and found the opposite.

Dependencies are painful to pull in and only pulled in when a dev needs a specific version. The internal repos end up being a missing version nightmare where people cobble together whatever works with what's available. Where feature and security upgrades go ignored, left to rot like the brains of the devs who struggle to keep up with what's available in the real world.

Many of those networks I've worked on are becoming more permeable at the edges because the cost of the air gap outweighs any benefits.


Even Cisco's web-based firewall management interface uses Google Analytics. Granted, it will work just as badly regardless of reachability, so there's that.


Even when you package your dependencies with your code, unless you have policies in place and actively enforce them, there's no preventing people from bloating the project with all kinds of junk libraries.

I've had to deal with legacy projects that include multiple versions of the same libraries, all of them being shaded/relocated so they don't conflict. The result, however, is bloated binaries that take 30 minutes to build.


True to some degree, but you can also end up with very old dependencies which have vulnerabilities if someone is not keeping track of it all.


I have to ask, why does the MPAA accredit airgapped facilities? I know movies are a big business, but that seems a little extreme.


It’s not airgapping per se. The large studios want to ensure they can send out their content pre-release to the various third-party vendors working on a project. Ensuring the vendors meet MPAA guidelines is a mechanism they use to ensure this. It’s not technically an accreditation but it’s usually contractually enforced if you want to work with a large studio.

You can actually read the whole thing: https://www.motionpictures.org/wp-content/uploads/2020/07/MP...


Lots of people like to get copies of IP worth many millions of dollars.


Are you saying that when you need to add a space character to the beginning of your strings you write it yourself?!


Yes, but after getting approval from the board I was able to get a dispensation to use https://isevenapi.xyz/ for some of our calculation services.


Loading common libraries from a CDN will no longer bring any shared cache benefits, at least in most major browsers. Here's Chrome's intent to ship: https://chromestatus.com/feature/5730772021411840 Safari already does this, and I think Firefox will, or is already, as well.


For info, this shipped in Chrome 86 just last week:

https://developers.google.com/web/updates/2020/10/http-cache...


I'm not sure I understand the threat here. Say I visit SiteA which references jQuery from CDNJS, then later visit SiteB which references exactly the same jQuery from CDNJS - what's the problem?


I'm guessing many websites are identifiable by which patterns of libs and specific versions they will force you to cache. One SiteA would then be able to tell that a user visited a SiteB (which, depending on the website, may or may not be problematic)


I'm sure some sites would be identifiable by their cached libs, but the cache is shared, so any overlapping dependencies would decrease the accuracy to unusable levels. The best you could do is know someone did not visit a site in the last ${cache_time}.

There are, of course, other vectors to consider, but I can't think of any that could be abused by third parties. If anything, isolating caches would make it easier for the CDN themselves to carry out the attack you mentioned, as they would be receiving all the requests in one batch.


What if my website tries to load a JS file that only foxnews.com loads (maybe with a less restrictive CORS config)?

I'd be able to tell if you visited Fox news recently, correct?


It would be extremely unlikely for only one site to use a specific file from a public CDN (like cdnjs). As for site-specific files like JS bundles and other static assets, those would be served on a "private" CDN, usually under the same domain (like cdn.foxnews.com) and with restrictive CORS settings for this very reason (and also to prevent bandwidth stealing).


A single file? Highly unlikely.

But three specific files can already be pretty unique. I use chart.js with two specific plugins in my toy project, and I'm willing to bet that no one else in the world uses the exact same set and version configuration.


Exactly, but a third party can't see that set from the cache, they see the union of every website recently visited. They would see hundreds of files from many websites and if only one of those uses one of the three files yours does, it's impossible to tell for sure without a file that isn't used anywhere else on the Web. Your site uses A+B+C, site 2 uses A+D+E, site 3 uses B+F, site 4 uses C. The cache contents is A+B+C+D+E+F+... did the user visit your site? It's like trying to get individual pictures out of a single piece of film that was exposed multiple times - you can make some guesses and rule some possibilities out, but nothing other than that will be conclusive.
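
To make the A+B+C example concrete, here is a minimal sketch. The site names and file letters are the hypothetical ones from the comment above, and the "cache" is just a set, not a real browser cache. It shows that a site the user never opened can still look fully cached:

```javascript
// Hypothetical sites and the shared-CDN files each one loads (from the comment above).
const siteFiles = {
  yourSite: ["A", "B", "C"],
  site2: ["A", "D", "E"],
  site3: ["B", "F"],
  site4: ["C"],
};

// The shared cache holds the union of files from every visited site.
function cacheAfterVisiting(visited) {
  const cache = new Set();
  for (const site of visited) {
    for (const file of siteFiles[site]) cache.add(file);
  }
  return cache;
}

// A site can only have been visited if all of its files are in the cache,
// but all files being present does not prove a visit.
function possiblyVisited(cache) {
  return Object.keys(siteFiles).filter((site) =>
    siteFiles[site].every((file) => cache.has(file))
  );
}

// The user never opened yourSite, yet its three files are all cached.
const cache = cacheAfterVisiting(["site2", "site3", "site4"]);
const candidates = possiblyVisited(cache);
console.log(candidates);
```

The check can rule sites out (a missing file means no recent visit) but cannot confirm one, which is the Bloom-filter-like behaviour described in the replies.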


Like a discount Bloom Filter?


It would behave like one, yes.


You only need ~30 bits of information to uniquely fingerprint someone.


Could you explain this further?


There are about 8 billion people. 33 bits is enough to give each one a unique number. A whole bunch of them don't have access to the Internet, so fewer than 33 bits are enough to identify someone on the Internet.
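
The arithmetic can be checked with a short sketch. The 8 billion figure is from the comment; the 4 billion figure is only an assumed, illustrative smaller population, not a measured count of Internet users:

```javascript
// Smallest number of bits that gives each of `population` people a distinct ID.
function bitsToIdentify(population) {
  return Math.ceil(Math.log2(population));
}

const worldBits = bitsToIdentify(8e9);   // 8 billion people, as in the comment
const smallerBits = bitsToIdentify(4e9); // assumed 4 billion, for illustration only

console.log(worldBits);   // 33
console.log(smallerBits); // 32
console.log(2 ** 33);     // 8589934592, just above 8 billion
```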


It can be used to track users across domains to some degree.


Thanks, I understand the issue now - I haven't thought about CDNs from a privacy perspective before.

I suppose with HTTP2 some of the benefits of serving JS through CDNs are gone anyway, so I guess it's time to stop using them.


Not to be dense but wasn't that always the purpose of running a CDN service for common scripts and libraries?


The script wouldn't have to be from a CDN to track people using the browser cache. I could infer whether you've visited a site that doesn't use CDNs or trackers by asking you to load something from that site and inferring whether you have that resource cached by the time it took you to load it.


This is true, but if you're running a CDN you have access to cross-domain user information just based on the headers, no?


The CDN is not the place you have to worry about.

If Site A loads a specific JavaScript file for users with an administrator account, Site B can check to see if the JavaScript file is in your cache, and infer that you must have an administrator account if the file is there.

The attack can happen with different types of resources (such as images).


This I understand, the risk of third-parties monitoring. The attacks are pretty obvious. My confusion is over what the business model of a commercial CDN is if not to track users across multiple sites? How do they pay for bandwidth?


The problem is not the CDN (or arbitrary shared source domain) being able to track you but the sites which use the CDN.

Furthermore, a CDN can't track you as simply as you might think; it would often require things that need explicit opt-in agreements on a per-website basis to be legal.

Furthermore due to technical limitations you can only get that permission from the user after the CDN was already used.

CDNs can still track aggregated information to some degree but they can't legally act like a tracker cookie.


How serious is this type of threat? Compared to all the info about us that is already shared by data brokers?


There are several data-broker-esque "services" that actually do this already with FB, Google, etc assets (favicon.ico and similar, loggedIn urls, ...) to check whether you have visited those pages, or whether you are logged in to those services by trying to request a URL that might return a large image if logged in, or fail rapidly if logged out. -- This has been a thing for a long time: https://news.ycombinator.com/item?id=15499917

If you don't use any of those sites, you're considered higher risk/fraudulent user/bot.

Here's an example of a very short and easy way to see if someone is probably gay: https://pastebin.com/raw/CFaTet0K

On chrome, I consistently get 1-5 back after it's been cached, and 100+ on a clean visit. On Firefox with resistFingerprinting, I get 0 always.


Thank you, that was insightful.

> Here's an example of a very short and easy way to see if someone is probably gay

Ok, but now the resource is in my cache, so from now on they will think I'm gay?


> Ok, but now the resource is in my cache, so from now on they will think I'm gay?

This resource is just generic, so probably not, but if you actually visited grindr's site without adblocking heavily, they load googletagmanager and a significant number of other tracking services, which will almost certainly associate your advertising profile and identifiers as 'gay'

I also can't believe they send/sell your information to 3 pages worth of third party monetization providers/adtech companies for something that is this critically sensitive.


You could have run this on a private window of the browser (and in that case, they would surely think you're a closeted gay).


Could this not be solved by Grindr setting up CORS properly for that resource? It's unlikely anyone would ever open the script directly in their browser.


CORS wouldn't help here. CORS prevents you from reading the response or making cross-origin XHR requests, not from loading an external resource from a different domain in a script or img tag.


Fun fact: browsers put scary warnings in their dev console (and some web sites log warnings to the console) because some people love copy-pasting code they got from sketchy people trying to bypass all the browser security.


It's being actively used by the ad networks to do user fingerprinting instead of cookies, since the latter are more and more blocked.


I guess as serious as any other privacy threats but one that doesn't get enough attention in my opinion. CDNs and web fonts are definitely being used to track us and can bypass mitigations like private mode in your browser and ad/tracker blockers by tracking your IP address across sites.


Try and load assets from another domain and observe if it was probably cached or not, and you can know that they visited the site


I guess, but that disadvantage seems massively outweighed by the benefits. Can always use something like [1] to check if a client is active on interesting sites.

[1] - https://www.webdigi.co.uk/demos/how-to-detect-visitors-logge...


Are there actually any benefits though? I saw an article a few years back about how when loading jquery from Google's cdn there was about a 1% chance the user had it cached already. Since you have to have the same library, the same cdn source, and the same version of the library, it is almost never the case that the user has already grabbed this recently enough that it hasn't been kicked out.

Plus the trend now is to use webpack and have all of your deps bundled in and served from the same server.


Can’t always use that. It’s much less specific compared to the potential of cache, only works when websites provide that type of redirect, doesn’t work if you block third-party cookies (I think a form of that might already be the default in some browsers), etc.


maybe not with CDNJS, but perhaps you don't want every website to know you have AshleyMadison.com assets cached.


Can websites even tell what is cached and what’s pulled fresh?


Yes, using timing


Wouldn't the act of timing a download mean that I download and pollute my cache with new assets from the site trying to find where else I've been? Does this only work for the first site that tries to fingerprint a browser in this way?


Is there a noCache option? Or can JS remove entries from the cache to reset it?

Someone below mentioned doing requests for a large image that requires authentication. Short response time means the user isn't logged in (they got a 403), long response time means they downloaded the image and are logged in.


Not if the javascript starts running only after all resources have loaded.


No there could still be timing attacks after. Just dynamically request a cross domain asset


Then those requests should not be cached?


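  // Assumes a global (unpartitioned) HTTP cache; run inside an async function or ES module.
  const start = window.performance.now();
  // "no-cors" lets the cross-origin request complete as an opaque response.
  await fetch("https://example.com/asset_that_may_be_cached.jpg", { mode: "no-cors" });
  const end = window.performance.now();
  // Rough threshold: a cache hit returns far faster than a network round trip.
  if (end - start < 10/*ms*/) {
    console.log("cached");
  } else {
    console.log("not cached");
  }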


In that case, the browser would always load the asset (it is not cached). So the rule would be that only stuff that is directly in the <head> may be cached (or stuff that is on the same domain).


To be clear, the context of the thread is "why do we need to partition the HTTP cache per domain." My example code works under the (soon-to-be-false) assumption that the cache is NOT partitioned (i.e. there is a global HTTP cache).

> In that case, the browser would always load the asset (it is not cached).

Agreed, if the cache is partitioned per domain AND the current domain has not requested the resource on a prior load. If the cache is global, then the asset will be loaded from cache if it is present: https://developer.mozilla.org/en-US/docs/Web/API/Request/cac...

> So the rule would be that only stuff that is directly in the <head> may be cached (or stuff that is on the same domain).

You could be more precise here: with a domain-partitioned cache, all resources regardless of domain loaded by any previous request on the same domain could be cached. So if I load HN twice and HN uses https://example.com/image.jpg on both pages, then the second request will use the cached asset.
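
A toy model may help here. This is only a sketch: the class and its names are made up for illustration, and real browsers use richer cache keys than the (top-level site, URL) pair assumed below.

```javascript
// Toy model of an HTTP cache. When `partitioned` is true, entries are keyed by
// (top-level site, resource URL); otherwise by resource URL alone.
class ToyHttpCache {
  constructor(partitioned) {
    this.partitioned = partitioned;
    this.entries = new Set();
  }

  key(topLevelSite, url) {
    return this.partitioned ? `${topLevelSite} ${url}` : url;
  }

  // Returns true on a cache hit; on a miss, stores the entry and returns false.
  load(topLevelSite, url) {
    const k = this.key(topLevelSite, url);
    if (this.entries.has(k)) return true;
    this.entries.add(k);
    return false;
  }
}

const hn = "https://news.ycombinator.com";
const other = "https://other.example"; // hypothetical second site
const image = "https://example.com/image.jpg";

// Global cache: the second site gets a hit and learns the image was loaded before.
const shared = new ToyHttpCache(false);
shared.load(hn, image);
const sharedCrossSiteHit = shared.load(other, image);

// Partitioned cache: repeat loads on HN still hit, but the other site misses.
const split = new ToyHttpCache(true);
split.load(hn, image);
const sameSiteHit = split.load(hn, image);
const crossSiteHit = split.load(other, image);

console.log(sharedCrossSiteHit, sameSiteHit, crossSiteHit); // true true false
```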


> To be clear, the context of the thread is "why do we need to partition the HTTP cache per domain."

Ah right, the thread is becoming long :)

> So if I load HN twice and HN uses https://example.com/image.jpg on both pages, then the second request will use the cached asset.

Good point!


On the other hand, the URL for a common library hosted on cdnjs (or one of the other big JavaScript CDNs) and included on many different websites is much more likely to already be cached on edge servers close to your users than if you host the file yourself.


The time to connect to the CDN hostname will negate any benefit, especially if push can be used.


You can mitigate this by getting your website itself on a CDN. If this is cached, then its assets (incl. javascript) would be too.

And by going that route you make sure that all pieces of your website have the same availability guarantees, the same performance profile, and the same security guarantees that the content was not manipulated by a 3rd party.


> And by going that route you make sure that all pieces of your website have the same availability guarantees, the same performance profile, and the same security guarantees that the content was not manipulated by a 3rd party.

You can already guarantee the security of the file by using the integrity attribute on the <script> tag. And the performance of your CDN is probably worse than the Google CDN (not to mention that you lose out on the shared cache).
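
For reference, a minimal sketch of how an integrity value can be computed with Node's built-in crypto module. The library contents and the cdn.example URL are placeholders; in practice you would hash the exact bytes of the file the CDN serves.

```javascript
const { createHash } = require("node:crypto");

// Builds the value for a <script integrity="..."> attribute from file contents:
// the hash algorithm name, a dash, then the base64-encoded digest.
function sriHash(contents, algorithm = "sha384") {
  const digest = createHash(algorithm).update(contents).digest("base64");
  return `${algorithm}-${digest}`;
}

// Placeholder contents standing in for the real library file.
const libraryCode = "console.log('hello from a library');";
const integrity = sriHash(libraryCode);

console.log(
  `<script src="https://cdn.example/lib.js" integrity="${integrity}" crossorigin="anonymous"></script>`
);
```

If the CDN ever serves different bytes, the hash no longer matches and the browser refuses to run the script.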


I agree on the security side if you use that attribute. However:

> And the performance of your CDN is probably worse than the Google CDN

What does "probably" mean? Other CDNs (Akamai, CloudFront, Cloudflare, etc.) are also fast.

And by pushing one piece of your website on a different CDN you force your users browser to create an additional HTTPS connection which takes additional round-trips, instead of being able to leverage one connection for all assets. This alone might as well outweigh the performance differences between CDNs.

Also the "shared cache" benefit might go away, if I read the other answers in this topic correctly.
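
A rough sketch of the round-trip cost of that additional connection. It counts one round trip for the TCP handshake plus two for a full TLS 1.2 handshake or one for TLS 1.3; the 50 ms round-trip time is an assumed figure, and the DNS lookup is left out.

```javascript
// Round trips needed before the first HTTP request can be sent on a new connection:
// 1 for the TCP handshake, plus 2 for a full TLS 1.2 handshake or 1 for TLS 1.3.
function setupRoundTrips(tlsVersion) {
  const tcp = 1;
  const tls = tlsVersion === "1.3" ? 1 : 2;
  return tcp + tls;
}

// Extra delay for connecting to a second hostname (DNS lookup not included).
function extraSetupMs(rttMs, tlsVersion) {
  return setupRoundTrips(tlsVersion) * rttMs;
}

console.log(extraSetupMs(50, "1.2")); // 150
console.log(extraSetupMs(50, "1.3")); // 100
```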


Does someone know why they don't split the cookie storage equally by the top origin?

I mean, wouldn't that take care of a whole class of attack vectors and make cross-origin requests possible without having to worry about CSRF?


One of the problems is that it breaks use cases like logging into stackoverflow.com and then visiting serverfault.com, or (if you do it by top-level origin) even en.wikipedia.org and then visiting de.wikipedia.org. [1]

While privacy sensitive users may consider this a feature in case of e.g. google.com and youtube.com, the average user is more likely to consider it an annoyance, and worse, it is likely to break some obscure portal somewhere that is never going to be updated, so if one browser does it and another doesn't, the solution will be a hastily hacked note "this doesn't work in X, use Y instead" added to the portal. And no browser vendor wants to be X.

[1] The workaround of using the public suffix list for such purposes is being discouraged by the public suffix list maintainers themselves IIRC, so the "right" thing to do would be breaking Wikipedia.

Edit: If done naively on an origin basis right now, it would break the Internet. You couldn't use _any_ site/app that has login/account management on a separate host name. You couldn't log into your Google account with such a browser anymore (because accounts.google.com != mail.google.com). Countless web sites that require logins would fail, both company-internal portals and public sites.


It's possible to get around this with a redirect staple. E.g. if Google wants you to be logged in on youtube.com and google.com simultaneously:

1) User logs in at google.com/login and sets google.com cookies. 2) Server generates a nonce and redirects to youtube.com/login?auth=$NONCE 3) youtube.com checks the $NONCE and sets youtube.com cookies 4) youtube.com redirects back to google.com.
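
A minimal sketch of the nonce part of that flow, assuming both sites can reach the same server-side store (modelled here as an in-memory Map). The function names and the user ID are made up for illustration; this is not how Google actually implements it.

```javascript
const { randomBytes } = require("node:crypto");

// Nonces issued by the first site and not yet redeemed
// (an in-memory stand-in for state shared between the two sites' servers).
const pendingNonces = new Map();

// Step 2: after login on the first site, mint a single-use nonce for the redirect URL.
function issueNonce(userId) {
  const nonce = randomBytes(16).toString("hex");
  pendingNonces.set(nonce, userId);
  return nonce;
}

// Step 3: the second site redeems the nonce once and learns whose cookies to set.
// Returns null for unknown or already-used nonces.
function redeemNonce(nonce) {
  const userId = pendingNonces.get(nonce);
  pendingNonces.delete(nonce);
  return userId ?? null;
}

const nonce = issueNonce("user-123");
console.log(`https://youtube.com/login?auth=${nonce}`);
console.log(redeemNonce(nonce)); // "user-123"
console.log(redeemNonce(nonce)); // null, the nonce cannot be replayed
```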

Firefox's container tabs can maintain isolation despite this since even this redirect will stay within a container. However there is a usability penalty since the user has to open links for sites in the right container (and automatically opening certain sites in certain containers will enable cross-container stapling again).


"if"?

webapps.stackexchange.com/questions/30254/why-does-gmail-login-go-through-youtube-com



Oh, that's interesting. I guess it makes sense from a security and privacy perspective.


Will this cause performance issues for sites that use static cookieless domains for js, images etc

Google themselves do this with gstatic.com and ytimg.com etc


> Will this cause performance issues for sites that use static cookieless domains for js, images etc

> Google themselves do this with gstatic.com and ytimg.com etc

Most probably not. The point of cookieless domains is that you can use a very simple web server to serve content (no need to handle user sessions, files are pre-compressed and cached, etc.) and it lowers incoming bandwidth a lot. If you have a lot of requests (images, css, js) the cookie information adds up quickly.

Opening video thumbnails from ytimg.com will still be cached for youtube.com as before. The only thing that will change is for embedded videos on 3rd party websites, as those won't be able to use cached ytimg.com thumbnails from elsewhere.


Couldn't the same thing be achieved by routing e.g. google.com/static/ to a separate simple webserver, instead of using another domain? Or use a subdomain, e.g. static.google.com.

The current way seems like needless DNS spam to me...


Even if Google used a separate highly optimised webserver for google.com/static/jquery.js, users who are logged in would be sending their auth cookies when requesting the library.

Given that generally people have slower upload than download, shaving off a few bytes from requests is worth it.

I also recall that browsers [used to (?)] limit concurrent requests per domain which this helps work around


Good! The whole idea of serving js from a CDN is supposed to make it easier for entry-level front-end devs to start coding. I think that is great for a school exercise, but it should never be used in business or production sites.

And on a side note, I'm very unhappy about how the bar to become a developer has lowered significantly over the last 10 years or so.


Wouldn't a better, but partial, solution be for browsers to preload the top x common libraries? All other libraries would probably have to follow this new rule.


Isn’t this essentially what Decentraleyes does?

https://decentraleyes.org/


I've been using LocalCDN, it seems to be more actively maintained and has a better selection of libraries.

https://www.localcdn.org/


What version(s) of those libraries? I mean, I don't deal with this anymore... but I've seen sites literally loading half a dozen different copies of jQuery in the past (still makes me cringe).


Please no, don't create such barriers.


Every other language runtime has a standard library, it's always been a shortcoming of the web IMO


At that point wouldn't it make more sense to just have the browsers include that functionality?


As soon as you add something to JS you have to support it forever. It's better to let sites pick what they need and scrap what they no longer need. JS has already added most of the useful stuff from jQuery. If browsers included a built-in version of React it would pretty much lock in the design as it is now without room to remove and replace bad ideas.


I wonder if an nginx plugin could be made to auto-cache CDN javascript/css files and edit the HTML on the fly to serve them locally.


You can set up a path that does a cached proxy to the CDN and just edit the HTML yourself. It's a bit annoying to get the cache settings to work properly, but editing the HTML is easy.
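
If you would rather script the HTML edit than do it by hand, a minimal sketch could look like this. The CDN URL, the /vendor/ path, and the function name are all hypothetical, and plain string replacement is assumed to be good enough for your markup:

```javascript
// Hypothetical mapping from CDN URLs to paths served (or proxied) by your own server.
const localPaths = {
  "https://cdn.example/jquery-3.5.1.min.js": "/vendor/jquery-3.5.1.min.js",
};

// Replaces every occurrence of each CDN URL in the HTML with its local path.
function rewriteCdnUrls(html, mapping) {
  let out = html;
  for (const [cdnUrl, localPath] of Object.entries(mapping)) {
    out = out.split(cdnUrl).join(localPath);
  }
  return out;
}

const page = '<script src="https://cdn.example/jquery-3.5.1.min.js"></script>';
console.log(rewriteCdnUrls(page, localPaths));
```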


You’re probably thinking about a caching proxy like squid cache.



CDNs are misunderstood these days. Caching at the browser across sites is not that important; what matters is caching at a point of presence (POP). This POP being so much closer to your end users brings performance gains because TCP is terrible over distances. QUIC may fix this by its shift to UDP. I haven't seen a benchmark yet.

Security is a concern, use SRI.

Reliability can be mitigated with fail over logic to a backup.

The part missed is bandwidth. Using a CDN means your web server doesn't have to serve out static files that you are paying per GB to serve. For small sites it's not much, but it does add up. It's a Content Delivery Network, not a Cache Delivery Network.


The post is "Please stop using CDNs for external JavaScript Libraries" not "Please stop using CDNs". If a CDN is critical to your site's performance you should put your site on it not internal libraries here and external libraries there. The page mentions this as well:

"Speed:

You probably shouldn’t be using multi-megabyte libraries. Have some respect for your users’ download limits. But if you are truly worried about speed, surely your whole site should be behind a CDN – not just a few JS libraries?"

Also, even before QUIC, HTTP/2 fixed a lot of the problems with distance, as you no longer need to wait for separate handshakes for multiple files to be streamed. QUIC will still give a few advantages, but again those advantages would be good to have on your whole site, not just a few libraries.


But wouldn't a site be faster if cacheable requests go to a CDN, and un-cacheable requests go directly to origin with no forwarding at the CDN layer?


Only if the CDN serves assets faster than the origin can. That's not necessarily true by default. Not to mention, if the speed is okay for the initial request, why not the following requests?

The final nail in the coffin for me is that CDNs are a shared resource, if your CDN is getting heavy traffic or otherwise suffering, it becomes your slow point, while your site is fine. I just don't see any upsides worth the tradeoffs.

If you have scale that demands some serious content distribution, that is different, I would argue you shouldn't be relying on public shared CDNs then even moreso. Pay for a CDN service or roll your own.


CDNs can almost always serve assets faster than the origin can.

Because PoPs are closer and transfer speed ramp rate scales to latency, the further away a server is, the longer it takes the download to ramp up to full speed. This is especially relevant when talking about smaller resources like javascript, css, and small or optimized images, and webfonts.
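
To put rough numbers on the ramp-up point, here is a minimal sketch of TCP slow start. It assumes an initial window of 10 segments of 1460 bytes that doubles every round trip, and it ignores loss, handshakes and server think time; `round_trips` and `transfer_ms` are made-up helper names, not anything from a real stack.

```python
import math

MSS = 1460       # assumed bytes per segment
INIT_CWND = 10   # assumed initial congestion window, in segments


def round_trips(size_bytes: int, init_cwnd: int = INIT_CWND, mss: int = MSS) -> int:
    """Round trips needed to deliver size_bytes if the window doubles each RTT."""
    segments = math.ceil(size_bytes / mss)
    cwnd, sent, rtts = init_cwnd, 0, 0
    while sent < segments:
        sent += cwnd   # everything allowed by the window goes out this round trip
        cwnd *= 2      # slow start: the window doubles after each round trip
        rtts += 1
    return rtts


def transfer_ms(size_bytes: int, rtt_ms: float) -> float:
    """Idealised transfer time: round trips multiplied by the RTT."""
    return round_trips(size_bytes) * rtt_ms


for rtt in (20, 200):  # a nearby PoP versus a far-away origin (assumed RTTs)
    print(f"100 kB at {rtt} ms RTT: about {transfer_ms(100_000, rtt):.0f} ms")
```

The file needs the same number of round trips either way, so for small assets the total time scales almost directly with RTT rather than with raw bandwidth.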


When I say "If a CDN is critical to your site's performance you should put your site on it" I do mean to say "put (those parts of) your site on it" not "if 51% of your site should be on a CDN put the other 49% as well".

But to your question, though: even un-cacheable content can be "those parts of". There are products from CDNs like https://blog.cloudflare.com/argo/ which combine CDN cache tiering with higher tier network transport to origin servers for all cache misses (or uncacheable content). Again though, it depends on whether it's critical to your site's performance or not. If you don't have a bunch of uncacheable content, if that content doesn't need the absolute best transport, or if the time/money could improve some other part of the site speed more, then it's not critical to your site's performance.


Wait, what is the misunderstanding? Aren't these the well known benefits of CDNs?


CDNs are just a service to handle delivery of static content for you. Their main points (unordered) are:

- reliability

- delivery speed through closeness to user (having nodes all around the world)

- cost

- ease of use

- handling of high loads for you / making static content less affected by accidental or intentional DoS situations

That multiple domains might use the same URL and might share the cache was always just a lucky bonus. Given that the other site needs to use the exact same version of the exact same library with the exact same build options accessed through the exact same URL to profit from cache sharing, it never was reliable at all.

I mean how fast does the JS landscape change?

Given how cross-domain caching can be used to track users across domains, Safari and Firefox disabled it a while ago as far as I know, and Chrome will do so soon.


go to a webpage that uses a cdn and do view source.

it all looks like https://cdn.example.com/foo/bar.js?v=129a1d14ad3


> This POP being so much closer to your end users brings performance gains because TCP is terrible over distances. QUIC may fix this by it's shift to UDP. I haven't seen a benchmark yet.

Quic can't defeat physics. Performance will still linearly degrade with distance to (edge) servers, and therefore CDNs will stay important.

What Quic will do, however, is reduce the time-to-first-byte on an initial connection by 1 RTT due to one less handshake, which can be e.g. a 30ms win. After the connection is established it aims to yield more consistent performance than e.g. HTTP/2 over TCP. But packets will still require the same time to go from the browser to an edge location, and therefore the minimum latency for a certain distance is the same.
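
A back-of-the-envelope version of the 1 RTT saving, as a sketch only. It assumes a fresh connection, one round trip for the TCP handshake, one for a full TLS 1.3 handshake, one combined round trip for QUIC, and one more for the request itself. The function name and the RTT values are illustrative.

```python
TCP_TLS13_HANDSHAKE_RTTS = 2  # assumed: 1 RTT for TCP + 1 RTT for TLS 1.3
QUIC_HANDSHAKE_RTTS = 1       # assumed: transport and crypto handshakes combined


def time_to_first_byte_ms(rtt_ms: float, handshake_rtts: int) -> float:
    """Handshake round trips plus one round trip for the request and response."""
    return (handshake_rtts + 1) * rtt_ms


for rtt in (30, 150):
    tcp = time_to_first_byte_ms(rtt, TCP_TLS13_HANDSHAKE_RTTS)
    quic = time_to_first_byte_ms(rtt, QUIC_HANDSHAKE_RTTS)
    print(f"RTT {rtt} ms: TCP+TLS {tcp:.0f} ms, QUIC {quic:.0f} ms, saved {tcp - quic:.0f} ms")
```

The saving is always exactly one RTT, so the remaining cost still grows with distance, which is the point about physics.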


How would failover actually be implemented? If you have a script tag with integrity enabled, and the cryptographic hash doesn't match, what happens next?

From some quick research, it doesn't seem like the script tag has built-in support for this. One could imagine something like multiple src attributes (used as a search order for the first valid file), but that doesn't seem to exist. So it seems like the web page has to do it manually.

Which I guess means you have to have some javascript (probably inline, so you know it's loaded and for performance?) to check and fix the loading of your other javascript.

If it's really that manual, it sounds like it adds cost to implementing this correctly. In other words, it might be one of those scenarios where correctness is achievable, but it's a whole lot simpler to just not do it that way.


Since script tags are blocking, you can do an undefined check and, if that fails, inject a new script tag pointing at either a local copy or a secondary CDN.

Link for reference. .Net Core has this built in as a tag helper too! https://www.hanselman.com/blog/cdns-fail-but-your-scripts-do...


Thanks. So it seems like it's not really that bad. Particularly if you are already using some loader tools (and don't have to add them to your build just to get this).


My understanding is that it’s not TCP alone but the TCP and TLS handshake overhead combined, whereas QUIC combines both handshakes at the protocol level.


TCP has the concept of a TCP window: a buffer of data the sender can have in flight before it has to wait for an ACK packet from the opposite side. Windows defaults to 64 KB to start. On your local LAN (which TCP was built for) that's no big deal, but across a distance every wait for an ACK costs a full round trip. Then add in one lost packet or one out-of-order packet, and TCP has to ask for it again and delay the whole thing. It's why HTTP/2 has higher latency on spotty 4G networks. The TLS handshake suffers from the same distance issue as ACK packets, though TLS 1.3 adds 0-RTT, which lets a resuming client send data along with its first handshake message instead of waiting.

QUIC puts everything in UDP, so theoretically it's a never-ending firehose of data for a download with the occasional "hey, I'm missing packets 3, 12, 18, please resend". It mimics TCP but puts the app in control instead of the kernel.
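
For a feel of why a fixed window hurts over distance, here is a rough sketch of the window/RTT ceiling: with at most one window of unacknowledged data in flight, throughput cannot exceed window size divided by round-trip time. It assumes a fixed 64 KiB window and ignores window scaling, which real stacks use; the helper name and the RTT values are made up.

```python
WINDOW_BYTES = 64 * 1024  # the 64 KB starting window mentioned above


def max_throughput_bytes_per_s(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on throughput with one window in flight per round trip."""
    return window_bytes * 1000 / rtt_ms


for label, rtt in (("LAN", 1), ("same continent", 40), ("across the globe", 200)):
    mbit = max_throughput_bytes_per_s(WINDOW_BYTES, rtt) * 8 / 1e6
    print(f"{label:>16} ({rtt} ms RTT): at most {mbit:.1f} Mbit/s")
```

The same window that is invisible on a LAN caps a 200 ms path at a few Mbit/s, no matter how fat the pipe is.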


Quic also has a window size.

> QUIC congestion control has been written based on many years of TCP experience, so it is little surprise that the two have mechanisms that bear resemblance. It’s based on the CWND (congestion window, the limit of how many bytes you can send to the network) and the SSTHRESH (slow start threshold, sets a limit when slow start will stop).

https://blog.cloudflare.com/cubic-and-hystart-support-in-qui...


Quic kind of has 3 windows:

Per stream and per connection flow control windows, which kind of indicate how much data the peer may send on a given connection before it gets a window update. Those windows also indicate how much the server is willing to store in its receive buffers, since the updates are likely sent when those buffers are drained.

A congestion window, which indicates how many low-level packets and data in them can be in-flight without being acknowledged. Those also account for retransmissions, and packets which do not necessarily contain stream data.


Sure, but if you have any custom JavaScript that needs to load before the page can render, using a CDN for just the JS libraries does not really help.

If you want the benefit of a CDN, you need to put your own code up there. And if you’re doing that, you might as well host your own copy of the libraries too, so the browser won’t have to talk to two different CDNs.


We bundle everything, including dependencies, up into minified modules and serve our whole frontend via Google Cloud's CDN, using Cache-Control to control the way things are cached. This way there are no third-party requests at all to load the site, and everything is cached close to the user.


One won’t have a loading time problem if one doesn’t ship websites with kilotons of JS crap. Also, paying per downloaded byte is dumb, as it makes it easy for an attacker to attack you financially, and a lot of hosting companies (like OVH) offer "unlimited" bandwidth.


There are some good reasons in here (especially privacy), but I'm not convinced by the security point. It seems like the example linked about British Airways was JS under the britishairways.com domain being changed, not a third party CDN.

Incidentally, a few years ago when people were loading third party scripts over HTTP, I demoed a fun hack where, if you control a user's DNS, you could redirect queries to popular CDNs to a proxy that injects keylogger code and tells the browser to cache it indefinitely. Because at the time almost every site included either jQuery or Google Analytics, you'd have a persistent keylogger even after the user switched to a more secure connection. How far we've come!


Is that demo of yours available somewhere? It sounds interesting and I'd love to read up on it


I was able to find a video of it: https://www.youtube.com/watch?v=_BUg9NzdLd4

I think this is the code: https://github.com/paulgb/cachebeacon

It basically just runs two servers:

- A DNS server that resolved a list of domains to its own IP.

- An HTTP server that looked at the HTTP host and proxied requests to the upstream server. If the content type or extension indicated that the response was JavaScript, it would add the payload and set cache headers to cache as long as possible.

- A special HTTP endpoint on the proxy server to capture data sent back from the payload.


> British Airways' payments page was hacked by compromised 3rd party Javascript. A malicious user changed the code on site which wasn't in BA's control – then BA served it up to its customers.

That's the only reason you should ever need to not load any 3p javascript including google-analytics.

If you're like me and want to avoid loading popular javascript libraries from CDNs but want those webpages to work, get the Decentraleyes plugin: https://en.wikipedia.org/wiki/Decentraleyes


Except this article and that quote are wrong. If you look at the linked report on the British Airways hack, https://www.riskiq.com/blog/labs/magecart-british-airways-br..., you'll see that the compromised script was hosted on www.britishairways.com. It wasn't a third party CDN hack, their own CMS was simply compromised.

I kept reading this article looking for an actual decent reason to not use a third party CDN, and I never found one. In fact, the right answer is really that you should always set up your CSP and subresource integrity rules to completely prevent these kinds of attacks, whether from an unintended script injection from your own domain or a 3rd party.
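
As a rough illustration of the CSP half of that advice, here is a sketch that assembles a `Content-Security-Policy` header value. The directive names are standard CSP, but `build_csp` is a made-up helper and `https://cdn.example.com` is a placeholder, so treat this as one possible shape rather than a recommended policy.

```python
from typing import Mapping, Sequence


def build_csp(directives: Mapping[str, Sequence[str]]) -> str:
    """Join directives into a single Content-Security-Policy header value."""
    return "; ".join(f"{name} {' '.join(sources)}" for name, sources in directives.items())


policy = build_csp({
    "default-src": ["'self'"],
    # Scripts may only load from our own origin and one named CDN (placeholder).
    "script-src": ["'self'", "https://cdn.example.com"],
    # Limits where script-initiated requests (fetch, XHR) may send data.
    "connect-src": ["'self'"],
    # Limits where forms may submit.
    "form-action": ["'self'"],
})

print("Content-Security-Policy:", policy)
```

The `connect-src` and `form-action` lines are the ones aimed at exfiltration: an injected script can still run if it arrives from an allowed origin, but it has fewer places to send what it collects.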


Came here to post this same thing.

A CSP would have stopped this attack. The exfil server was baways[.]com.

Ticketmaster, on the other hand, did have a 3rd party JS that was compromised: https://www.riskiq.com/blog/external-threat-management/magec...


Or at least do subresource integrity so you know you’re getting the exact same file every time.
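
For anyone who hasn't generated one, an integrity value is just the algorithm name, a dash, and the base64 of the file's digest. A minimal sketch, assuming you have the exact bytes the CDN will serve; `sri_hash` and `script_tag` are made-up helper names, and the URL and file contents are placeholders.

```python
import base64
import hashlib


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return an integrity value of the form '<algorithm>-<base64 digest>'."""
    if algorithm not in ("sha256", "sha384", "sha512"):
        raise ValueError("SRI uses sha256, sha384 or sha512")
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def script_tag(url: str, content: bytes) -> str:
    """Build a script tag pinned to the exact bytes in content."""
    return (
        f'<script src="{url}" integrity="{sri_hash(content)}" '
        'crossorigin="anonymous"></script>'
    )


library = b"console.log('hello');\n"  # stand-in for the real library file
print(script_tag("https://cdn.example.com/foo/bar.js", library))
```

If the CDN ever serves different bytes, a browser that supports SRI refuses to run the script, so you also need to regenerate the value whenever you deliberately upgrade the library.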


That can mitigate the damage, but any user with a browser that doesn’t support it will still be compromised.

If it’s an important part of the site, it might make the failure more obvious in newer browsers.. but small libraries used on only some pages might not be noticed quickly... so you’d probably also want to test all of your resources regularly.. and even then the time between those test runs may allow some users to be compromised.

So using it is a good idea but it’s not a fix for the actual problem.


https://caniuse.com/subresource-integrity

It is basically available in every browser except IE and Opera Mini, so I think it's the user's problem if they use an old browser that doesn't support a widely supported security feature.


Might as well simply block IE users outright at this point; it's just not worth the risk (and classic edge is close). It's probably better user experience to be upfront about issues than pretend you actually test and support all those old versions (unless you do... but why?)


Your own link says 94.79% coverage. So 1 in 20 users would be compromised.. on a large site that could be millions of users.

And your response is: "that's their problem" ??

I hope you're not in charge of any important or large sites or anything that handles financial data (ecommerce, etc)... because this isn't a good attitude when it comes to security.


By the same logic, TLS 1.2 isn't a solution to insecurities in 1.1 because only 98% of users currently support it.

It's perhaps worth accepting there's no silver bullet here but a combination of initiatives like SRI is still worthwhile to help reduce the attack surface for the majority of users?


Simply enabling TLS 1.2 is not a fix for problems in 1.1. You must also disable 1.1 in your server config. It's both actions that fix the insecurities: first enabling a secure method of communication; and then cutting off anyone trying to communicate insecurely. If you simply enable 1.2, but leave 1.1 working, then you haven't fixed the problem.

SRI is the equivalent of just enabling 1.2. You haven't disabled access to browsers that don't support SRI.

Your 2nd sentence sounds remarkably similar to my first post that maple responded to: SRI can help mitigate the damage, but it can't fix it.

You seem confused about the difference between mitigation and fixing.

Mitigation: the action of reducing the severity, seriousness, or painfulness of something.

Key word there is reducing. A fix actually eliminates the issue, like enabling 1.2 + disabling 1.1 eliminates the potential for communicating insecurely.

It's important to understand the difference because anything short of actually fixing the issue leaves some users exposed to the vulnerability.
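
To make the "enable 1.2 and also disable 1.1" distinction concrete, here is a small sketch using Python's standard `ssl` module. It only shows the version floor; a real server would also load a certificate and key, and `make_server_context` is a made-up name.

```python
import ssl


def make_server_context() -> ssl.SSLContext:
    """Server-side TLS context that refuses anything older than TLS 1.2."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Supporting 1.2 is not enough on its own: the floor is what cuts off
    # clients that would otherwise negotiate TLS 1.1 or 1.0.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_server_context()
print("minimum TLS version:", ctx.minimum_version.name)
```

There is no equivalent floor for SRI on the server side, which is the asymmetry being argued about here.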


Extensions that act like a local CDN. Page loads faster, more privacy, etc.

https://decentraleyes.org/

https://www.localcdn.org/ (fork of decentraleyes, with many more resources)


Interesting, I was already using DecentralEyes. Do you know why the fork happened? What policy does DecentralEyes keep that LocalCDN extends or violates?

edit: will be sticking with the original, looks like the fork maintainer made no effort to work upstream first, which is a very bad look for what is essentially a piece of security software. https://gitlab.com/nobody42/localcdn/-/issues/5


I found out about localCDN recently, commented on this subject, and got this response from the author:

https://codeberg.org/nobody/LocalCDN/issues/51#issuecomment-...

I haven't gone searching for a PR yet and didn't think to do so beforehand (all made more complicated by both projects' repos having moved locations at least once recently).


Update: searched for a PR, didn't find it.

Initial commit[1] in the LocalCDN repo is Feb 2020; I don't see any PRs on either of the Decentraleyes repos in early 2020 or late 2019. Of course, it's still possible the author reached out

[1] For some reason, the project does not continue the git history from Decentraleyes. For me this is a red flag (much easier to sneak in a change this way) and I will continue using Decentraleyes.


localCDN has more resources than Decentraleyes. It also has very important resources like Google Fonts, some cloudflare resources etc, none of which were present in Decentraleyes (the last time I checked)

> will be sticking with the original

It's your choice. The fork is better. The maintainer seems a bit more active (more updates) and extremely pro-privacy (I concluded this from his home page and extension settings)


I was thinking about switching from decentraleyes too, but I'm hesitant to install a "can access all sites" extension that hasn't been vetted ("recommended") by Mozilla.


LocalCDN is the most offline extension I have seen.

It even opens donation pages locally, instead of opening the author's website. He says ''I think it is better if your public IP address is rarely listed in any server log files.''


localCDN looks super interesting. It works with Chrome partially and fully with Firefox. I'm curious, is there any good native tool to replace localCDN, uBlock, uMatrix ( resources concern ). Btw, thanks for pointing out localCDN


I don't know any replacements, but LocalCDN can generate rulesets for uBlock Origin and uMatrix if you have configured them too strictly (hard mode, etc.)

You must enable the rulesets. It's very easy and a one time job. To generate them, go to LocalCDN settings and select your adblocker.


I use decentraleyes and like it, but although it has a ton of content, much was outdated and there were no updates. It would also be nice to be able to add resources yourself.


Am I the only one that feels using CDNs comes with very little benefit compared to just hosting all resources locally? Almost all security issues go away with locally hosted (i.e., same domain or another domain controlled by the website), plus you avoid an extra DNS lookup, and you still get caching on the same site.

I realize there are benefits, but are the benefits so extreme that they merit all the hype around CDNs? So many developers talk about them like the web would crawl to a halt if they were stopped, but I think that they have their own slippery-slope of problems that has resulted in folks just hand-waving away the expense of web assets since it's a hidden problem from the developer. I doubt actual bytes transferred and latency are affected in a significant way as folk that promote CDNs claim.

It'd be nice to see actual comparisons in a real world scenario. Keep in mind, web site responsiveness is not just linked to download time/size, but also asset processing. If your page is blocking because the JS is still being parsed, the time you saved downloading it is moot.


Yes, the benefits are substantial. I used to have a static web site served out of a Dallas data center, with a great network connection, from a powerful bare metal machine that did nothing else. Sitting in Austin, it felt instant, equal to localhost speed, until I tried accessing it while traveling, especially overseas.

It's not just geographic latency you're addressing with a CDN, you're also reducing the number of network hops. It's not uncommon to experience higher latency going from SF to San Jose datacenter just because you're on a "wrong" ISP. A good CDN usually has a POP on the same network as you.


I live in Vietnam and frequently give tech advice to people regarding websites they are developing. Most sites work fine, but I can tell straight away when someone has hosted their site like you have described, because it will take >5 minutes to load! And they'll be telling me that it's fine, and I have to explain: yes, it's fine if you only want people in the same country as you to access the site. If you want it to work worldwide, you'll have to do better.

Fortunately, these days that just means creating a free Cloudflare account.


Thanks for sharing your perspective.

In your case, are all CDNs equal? Do I just have to throw my content to the biggest provider?

Don't get me wrong. Disenfranchising non-Western visitors is the last thing I want to do, but the issue is not CDN or not, it's caching content closer to people whose ISPs don't provide sufficient service outside their own borders.

I dislike that CDNs are the only way around this and I feel like it is centralizing Internet access in an unfavorable way.


> In your case, are all CDNs equal?

The big data centers in this region are in Singapore and I expect all CDNs have a presence there, so probably. I haven't exactly done any benchmarking though.

> the issue is not CDN or not, it's caching content closer to people whose ISPs don't provide sufficient service outside their own borders.

How would you do this without a CDN (on a small budget)?


What causes these performance problems? Latency across the globe shouldn't be that bad (200ms or so). Is it limited bandwidth into your country?


Remember that the JS CDN thing started 10+ years ago, when internet connections were a lot slower and jQuery was the JS framework and only released one new version per year. That means there was a really good chance another website also used it from the Google CDN, and your browser cached it forever-ish, no DNS lookup needed.

The other thing to remember is that old browsers used to cap the number of connections per domain, and HTTP 1.1 only supports serial requests on each connection, so there were benefits to hosting on multiple domains.

Even today, the big benefit of a CDN domain is that CDNs can host static resources faster and cheaper than your webservers. Yes, you can forward requests from the CDN to your webservers, but it's also one more point of failure. What's interesting is that with a modern, JS-only site, the split becomes API and static JS.


There are situations where you do not want or need a global reach. If you are in ecommerce and you don't want to ship outside your own country, then why have images hosted on CDNs?

Scripts and CSS are (or should be) small compared to the images.

Yet, with the images, you can use mod_pagespeed on the server to replace images with picture source sets, with the server able to detect the bandwidth of the client and their device to serve them highly optimised images. So that means images that look glorious on a 4K screen with a good connection and images that are a bit jaggy for the person on their phone with only 3G.

I am sure that CDNs can do this too and that you can get mod pagespeed to work with CDNs but there is so much that can be done on a server without having to sign up for extra services and their overheads.


As a scrub in this domain. When I use a CDN, I get a network that is cheaper per-GB at delivering content, and is better at delivering content. Far more resources in a CDN are configured to deliver content faster and cheaper than whatever rinky-dink EC2 I've got serving the website.


Just how much JavaScript do you have to use for it to make a significant difference? If you serve 100k of JS (please don't), at 1 million hits per month you're looking at 100GB. Even at EC2's obscene pricing, this is still just $10/month.

If you're that stingy, why aren't you just putting the entire website behind CloudFlare and calling it a day?
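
The arithmetic behind that estimate, as a sketch. It assumes decimal units (100 kB = 100,000 bytes, 1 GB = 10^9 bytes), no caching at all, and a flat $0.09/GB egress rate as a stand-in for EC2-style pricing; the function names are made up.

```python
def monthly_egress_gb(bytes_per_hit: int, hits_per_month: int) -> float:
    """Total monthly transfer in GB if every hit downloads the full file."""
    return bytes_per_hit * hits_per_month / 1e9


def monthly_cost_usd(bytes_per_hit: int, hits_per_month: int, usd_per_gb: float) -> float:
    """Monthly egress cost at a flat per-GB rate (an assumption, real tiers vary)."""
    return monthly_egress_gb(bytes_per_hit, hits_per_month) * usd_per_gb


gb = monthly_egress_gb(100_000, 1_000_000)
cost = monthly_cost_usd(100_000, 1_000_000, usd_per_gb=0.09)
print(f"{gb:.0f} GB/month, about ${cost:.2f}/month")
```

Any browser caching on repeat visits only pushes the real number lower.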


I have a js file that is over 900kb, (nearing a mb)

Granted, it's for use in embedded webviews of a video game, and it powers like 50 different atomic interfaces via React.


I wasn't aware that CDNs were much cheaper, and am genuinely curious what service you are using at what prices.

When I look at AWS pricing, us-east-1 at < 10TB, I see: EC2 data transfer out to the internet is 9¢/GB, and Cloudfront is 8.5¢/GB for the lowest price class. That's a slight savings, but at 6% I can't justify the effort to switch over on cost alone.

Should I be looking at a different CDN service?
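
For reference, the comparison works out like this. It is a sketch using only the two per-GB prices quoted above and a hypothetical 10 TB month; `savings_percent` is a made-up helper.

```python
EC2_USD_PER_GB = 0.09          # quoted EC2 data transfer out price
CLOUDFRONT_USD_PER_GB = 0.085  # quoted CloudFront lowest price class


def savings_percent(old_price: float, new_price: float) -> float:
    """Relative saving of new_price over old_price, in percent."""
    return (old_price - new_price) / old_price * 100


pct = savings_percent(EC2_USD_PER_GB, CLOUDFRONT_USD_PER_GB)
saved_on_10tb = 10_000 * (EC2_USD_PER_GB - CLOUDFRONT_USD_PER_GB)  # 10 TB = 10,000 GB
print(f"saving: {pct:.1f}% per GB, about ${saved_on_10tb:.0f} on a 10 TB month")
```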


Cloudflare famously charges $0/GB, with some arbitrary restrictions on the way you use their service, and I’ve heard rumors of soft limits on the total amount of bandwidth you can use before they email you to upgrade to a higher plan.

I’ve never used BunnyCDN, but they charge a flat $0.01/GB for North American traffic, and I’ve heard some good things about them.

DigitalOcean, Vultr, Linode, and some other cloud providers charge $0.01/GB without a CDN, just using their regular servers, but obviously a CDN is more than just a way to save money — it’s a way to lower latency and improve user experience.

The mega clouds (AWS, Azure, and GCP) seem to significantly over-charge for egress bandwidth as a nice profit mechanism, just because they can.

My unpopular opinion is that mega clouds are overrated. They’re fine, but they have a lot of weird gotchas that most people have just accepted as “how the cloud works.”


Vercel is free.


Fun case from Sweden: the central government website for healthcare included js from a third party with no integrity hash. Third party got hacked and they changed the js to mine cryptocurrency over the weekend. Thousands of people participated in mining...


Ooh! Do you have a source on that? I couldn't find anything in English language media.


Maybe this, the website was the swedish police

https://www.privateinternetaccess.com/blog/swedish-police-we...


This is where browsers need to be locked down a bit more. The web api, and available local compute resources should be governed by permissions. Yeah it'll break a few sites at first!


It would be sufficient to just put a red triangle on the tab pointing out this one tab is using high cpu.


Stop using externally-hosted fonts, too! It's 2020. I don't visit websites to marvel at their beauty; I just want to read the article. So I'm not going to compromise my privacy just to read seven or eight paragraphs of text in Myriad instead of Helvetica.


You're not everyone's typical customer. Stop generalising personal preferences


He is not generalising, you are. Why is it that some personal preferences are met with such hostility here (unless carefully wrapped with disclaimers that they're meant only personally)?

I am running my own recursive DNS resolver and some sites pull crap from such a sheer variety of sources that before it all gets cached, the load times remind me of dialup 20 years ago.


If you have a problem with my website's font, feel free to do whatever is in your power, or put it in reader mode and be done with it. Why should I change my font to impress you? That's what I don't like when people stress their personal preferences here.


Also, he said exactly "Please stop using external fonts in 2020". I don't know how that wasn't aggressive.


when i visit a site and it has an ugly font nowadays i just click away or use a browser extension that prettifies the content(and erases any formatting the author may have intended)

if you're ok with that then by all means, use some ugly font. ime most authors aren't though...


Whichever font a designer uses, it won't look any worse if they store it on their own server instead of link to it on fontawesome.com or fonts.google.com


> So I'm not going to compromise my privacy just to read seven or eight paragraphs of text in Myriad instead of Helvetica.

uBlock Origin allows specifically blocking external fonts.

> I don't visit websites to marvel at their beauty

Using the right font is rarely about beauty.

Yes, it's often aesthetically pleasing. But good fonts can have lots of practical effects for people, especially those who are disabled or aging.

It's fine for you not to care about fonts and to block them, and it's fine for people to host fonts themselves (if the license allows), but it's misinformed to complain that all non-standard fonts are a meaningless aesthetic contrivance.


The alternative to "link to custom fonts on a third-party server" isn't "don't use custom fonts" though, it's "host them on your own server".

In any event, the point about accessibility is wrong because it's more common for designers to use custom fonts that are harder to read than easier.


I just host the font files with my website. This tool helps with that:

https://google-webfonts-helper.herokuapp.com/fonts/


Good, host them on your own server. It's one point of failure fewer, anyways!

