Please stop using CDNs for external JavaScript libraries

alibarber · on Oct 11, 2020

I started my career, and have spent most of it, working in places where the production network was (almost) airgapped from the entire internet (MPAA accredited facilities). I would say that the general quality of software and robustness when it comes to dependencies is so much greater in these places. If you want to use some library, it’s up to you to get it, and its dependencies, check the versions are compatible and package it up and build it all internally. Yup, it’s work... Do you need this library? Is it actually any good? Is the license compatible with our usage? This is all basic code quality stuff that’s often completely overlooked when people can just pull in whatever junk from whatever trendy repo is the hotness nowadays. And then when that goes down/bankrupt - it’s up to you to fix something you’ve no control over.

bob1029 · on Oct 11, 2020

We provide software that runs within very secure financial networks, and have some extreme constraints regarding what sorts of 3rd party code we can pull in. We are having to do a vendor swap on some document parsing code because one of our clients scanned a dependency and found it could be vulnerable in a context that it would never be exposed to in our application. These types of things make it really risky to go out and build your enterprise on top of someone else's idea of a good time.

Virtually everything we do is in-house on top of first party platform libraries - i.e. `System.`, `Microsoft.`, etc. We exclusively use SQLite for persistence in order to reduce attack surface. Our deliverables to our customers consist of a single binary package that is signed and traceable all the way through our devops tool chain, which is also developed in-house for this express purpose of tightly enforcing software release processes.

This approach is certainly slower than vendoring out everything to the 7 winds, but there are many other advantages. Every developer knows how everything works up and down the entire vertical since it's all sitting inside one happy solution just an F12 away. Being able to see a true, enterprise-wide reference count above a particular property or method is like a drug to me at this point. We are definitely over the hill and reaping dividends for building our own stack. It did take 3-4 years though. Most organizations cannot afford to do what we did.

postpawl · on Oct 11, 2020

Couldn’t this end up being disaster with a large codebase and even a small amount of turnover? Are you really advocating that someone should write all their dependencies themselves even if they can afford it?

jasonkester · on Oct 11, 2020

I find it amusing that we’ve reached the point where developers can no longer imagine a shop that writes its own software.

bob1029 · on Oct 11, 2020

I think the problem is mostly cognitive. The codebase should be viewed as the most important investment that a software company could ever hope to possess.

Within the codebase, the infrastructure and tooling is by far the most important aspect in terms of productivity and stability of daily work process.

If you take the time to position yourself accordingly, you can make the leverage (i.e. software that builds the software) work in virtually any way you'd like for it to. If it doesn't feel like you are cheating at the game, you probably didn't dream big enough on the tooling and process. Jenkins, Docker, Kubernetes, GitHub Actions, et al. are not the pinnacle of software engineering by a long shot.

codebje · on Oct 11, 2020

A company's codebase is a liability, not an asset - it needs to be maintained, and as you point out, it needs money spent on tooling and infrastructure to be most effective.

Unless you happen to be one of the very rare companies that sells source code and not built artefacts, your asset is the built artefact and your code is the expense you take on to get it.

Having less code to get the business outcome only makes sense when you see the code as a cost, not a thing of value itself.

dragonwriter · on Oct 11, 2020

> A company's codebase is a liability, not an asset

It's an asset. Like many (virtually all, other than pure financial) assets, it has associated expenses; maintenance, depreciation, and similar expenses are the norm for non-financial assets.

> Unless you happen to be one of the very rare companies that sells source code and not built artefacts, your asset is the built artefact and your code is the expense you take on to get it.

No, things that are instrumental to producing product are still assets, not just the things that you sell. That's true if it's machines on your factory floor, if it's the actual real estate of the factory, or vehicles that you use to deliver goods. And all of these assets, like a codebase, have associated expenses.

The whole "code is a liability, not an asset" line is something from people who might understand code, but definitely don't understand assets and liabilities.

nullsense · on Oct 12, 2020

From a pure accounting perspective, yes, but the whole "code is a liability, not an asset" crowd aren't using the strict definition of it by any means.

It's more just a mental footnote that the contract with the customer is what is valuable, and the code is either supporting that value or destroying it. So if you can have the same contract with the customer for less code that's the outcome to strive for.

altcognito · on Oct 12, 2020

I think this perspective comes from the opinion of folks that generally want to cut costs, not leverage their assets to their fullest. If your means of production is an asset that you understand how to leverage over your competitors your mindset won’t be “we really need to minimize this codebase”

thaumasiotes · on Oct 12, 2020

I think the saying is drawing on the same idea as Bill Gates' opinion that measuring progress on software by lines of code written is like measuring progress on an airplane by weight.

Airplanes have many functional parts and many not-so-functional parts. There are parts of the airplane that will, if removed, prevent it from working in various critical ways.

But from another perspective, all of those parts, decorative, functional, or essential, are liabilities, dragging your airplane back toward the ground when you want it to stay up in the air. The fact that a particular piece is important doesn't mean it's less of a problem having it; it means you have to suck it up and work around the problem.

Source code is like this. The mere fact of its existence causes problems. Some of it you can't do without. But it's causing problems anyway, and if you can do without it, you want to.

username90 · on Oct 12, 2020

> Source code is like this. The mere fact of its existence causes problems.

So you delete your source code after you ship an app? That doesn't make sense. The source code is one of every software company's greatest assets which is why we have created so many tools to keep track of it like version control, automated testing etc. Otherwise there would be no point in clean code, code documentation etc, just write something that solves the problem and done! No need to even check in the code, just build a binary locally and ship!

I understand the point, but saying that code is a liability and not an asset is false no matter how you look at it. Source code solves a lot of problems you can't solve with binaries, it lets you adapt to change much better. So instead of saying that code is a liability, use the business saying "Focus on your core business", meaning don't write code for things that aren't your core business.

thaumasiotes · on Oct 12, 2020

> I understand the point, but saying that code is a liability and not an asset is false no matter how you look at it.

I didn't say that. I said it was a liability. "Don't write code for things that aren't your core business" doesn't tell you that you should try to minimize the amount of code that addresses your core business. But you should.

jjk166 · on Oct 12, 2020

You misunderstand the analogy. An airplane could be very heavy because it is a very large aircraft with a lot of payload capacity - that would be good. An aircraft could also be very heavy because it is inefficiently built and thus have very little payload capacity. Similarly an aircraft could be light because it is small, but still be inefficiently built, or the reverse. Bill Gates is complaining about evaluating an aircraft by its weight because its weight doesn't on its own tell you about any of the stuff you really care about with an airplane.

Likewise, code could be many lines because it is well formatted and robust. Alternatively it could be long because there's a lot of repetition and bloat. It could be short because it is very targeted to what it needs to be, or it could be very short because the developer used a lot of unreadable code-golf tricks.

In any complex system, you have optimization problems. Very rarely is the answer to an optimization problem simply to minimize an isolated variable. You can not say a lighter airplane is better than a heavier one without asking why it's lighter; likewise a smaller codebase with fewer lines can not be called better unless you know why it is smaller.

Just because a plane could get off the ground without something does not mean that cutting that thing to save weight will move the plane closer to its optimal design. Likewise just because the code base could be reduced does not mean that actually makes it more maintainable and better performing.

imtringued · on Oct 12, 2020

It comes from the fact that code depreciates like a car. Yes it does make you a profit (gets you to work) but after 20 years nobody wants your code (car) unless you have done rigorous maintenance. Having the biggest codebase(car) is not a benefit. It's kinda like measuring your software product by how many cups of coffee your developers have drunk.

dragonwriter · on Oct 12, 2020

> It comes from the fact that code depreciates like a car.

If a business owns a car that they use to make a profit, guess where the car is in the balance sheet?

> Having the biggest codebase(car) is not a benefit.

Sure, lines of code is not how you measure the value of code as an asset, just like tons of gross weight isn't how you measure the value of vehicles as an asset.

That doesn't mean code, and vehicles, aren't assets.

“Lines of code are a measure associated with maintenance cost, not asset value” is a reasonable statement. “Code is a liability, not an asset” is not.

nullsense · on Oct 12, 2020

No, it comes from wanting to leverage your asset to the fullest. But that's hard to do if it's full of cruft and getting in the way of actually delivering value.

More lines of code is not equal to better. What you need is better lines of code, which usually means less of them.

Let a dev loose on a codebase and they could add value, but they could also be subtracting it. Hmm maybe developers are the liability.

zelphirkalt · on Oct 12, 2020

One needs to keep in mind though, that minimizing a code base can have significant cost as well. (1) the cost of doing the minimization itself (2) the cost of runtime or compile time differences if any. Also one needs to keep in mind, that less code does not mean easier to maintain either.

Perhaps the amount of code is the wrong metric to optimize. Perhaps it is readability, simplicity, maintainability that we really want and those are harder to put into numbers than LOC.

andrewjl · on Oct 12, 2020

> It's an asset. Like many (virtually all, other than pure financial) assets, it has associated expenses; maintenance, depreciation, and similar expenses are the norm for non-financial assets.

To get technical about it, code whose discounted future maintenance costs exceed the discounted revenue it is expected to bring in or helps to bring in is a liability. Some maintenance costs are invisible, such as having team members leave; leading to recruiting expenses to seek a replacement and the associated ramp time when they're hired. Companies are often not equipped to assess the cost side in any way that approaches reality and just stick the cost of the "code" on the balance sheet as capex. Having an asset on the balance sheet after doing this doesn't mean you have an asset in reality.

> The whole "code is a liability, not an asset" line is something from people who might understand code, but definitely don't understand assets and liabilities.

Code can be a net liability. Unless you are looking purely at the asset side of the balance sheet and ignoring liabilities, which tends not to be very useful.
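The discounted-cost framing in this comment can be made concrete with a small worked example. This is only a sketch: the function names, the 10% discount rate, and every figure below are hypothetical and chosen purely for illustration.

```python
def present_value(cash_flows, rate):
    """Discount yearly cash flows (year 1, 2, ...) back to today."""
    return sum(cf / (1 + rate) ** year
               for year, cf in enumerate(cash_flows, start=1))


def net_value_of_code(revenues, maintenance_costs, rate):
    """Positive result: net asset. Negative result: net liability."""
    return present_value(revenues, rate) - present_value(maintenance_costs, rate)


# Hypothetical figures: flat revenue, upkeep that grows as the code ages.
revenues = [100_000, 100_000, 100_000]
maintenance = [60_000, 110_000, 160_000]

net = net_value_of_code(revenues, maintenance, rate=0.10)
print("net liability" if net < 0 else "net asset", round(net))
```

With these made-up numbers the code earns more than it costs in the first year, yet the discounted total still comes out negative, which is the "net liability" case described above.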

nickbauman · on Oct 12, 2020

I think many folks forget that nobody wants to use your software. They want something that maybe your software can help them get. But if the problem could be solved without your software, they'd try to remove your software ten times or more if they could.

cloudhead · on Oct 11, 2020

This is misleading and pretty much wrong. It’s like saying the hen is a liability, the only asset is the egg. Or like saying your team is a liability and the only asset is the work they produce..

sudhirj · on Oct 12, 2020

But these are both true. The problem is only if you assign a derogatory meaning to the word liability. If the company could provide the solution it does without a team, or with a smaller team, it would - the purpose of the company is to provide the solution, not feed team members. If we could get eggs without taking care of hens, we would - they're a pain to take care of and feed.

The moment you want to minimize something while still achieving your goals you know it's a liability. Do you want the same profits or solutions with a smaller team? Then the team is a liability. If team were an asset you'd be trying to hire a bigger team without any work for them to do. If you had the chance to double egg production with constant demand, you'd eat or kill half your hens. They're a liability.

robertlagrant · on Oct 12, 2020

> The problem is only if you assign a derogatory meaning to the word liability

That's only true if you assign an appropriate meaning to the word derogatory.

iainmerrick · on Oct 11, 2020

> A company's codebase is a liability, not an asset

Why not just throw it away, then?

goodside · on Oct 12, 2020

It’s a liability in the sense that a rusty drain pipe covered in duct tape is a liability. You think of it as the source of your problems, but throwing it away would be worse.

robertlagrant · on Oct 12, 2020

> It’s a liability in the sense that a rusty drain pipe covered in duct tape is a liability. You think of it as the source of your problems, but throwing it away would be worse.

So in your analogy, what's the software equivalent of replacing the drain pipe?

codebje · on Oct 11, 2020

If you can, you should.

If you can't, it's (IMO) because you're stuck with the burden of having to write and maintain that code.

If it's an asset, why not just write more?

Eikon · on Oct 12, 2020

> If it's an asset, why not just write more?

Companies do write more.

nkohari · on Oct 12, 2020

It's still an asset, it's just that most assets require maintenance.

zepearl · on Oct 11, 2020

> A company's codebase is a liability, not an asset...

> your asset is the built artefact

Therefore, summarized, you mean that the source code needed to generate the resulting <app/service/whatever> is a liability, but that the result can be an asset (if it does generate external revenue, or internally lowers costs, etc.)?

I personally never thought about this kind of separation - interesting.

codebje · on Oct 11, 2020

I mean that source code comes with costs, often substantial, but has no direct benefits.

It's easy for us as developers to think that source code is valuable - but this leads to problems like never removing code "in case it's needed", or with an in-house dev team developing systems you could get off the shelf.

If code is an asset, then it makes a lot more sense to write stuff yourself: you not only get the artefact, you also get the source.

If it's a liability, then it makes a lot more sense to let someone else bear the costs of that liability, especially if they have economies of scale, except where you can't get your desired outcome other than writing code.

This is of course technically incorrect. It's perhaps more accurate to say that source code requires upkeep and is expensive to maintain. That tends to draw less interest and discussion, because it's "obvious." Except as an industry we're overall pretty lousy at paying the required upkeep on code.

ignoramous · on Oct 12, 2020

I think you have a point. But essentially what you're saying boils down to unmaintainable, untestable, unrefactorable, obsolete code being a liability. I don't think anyone disagrees with that and I personally have been involved with a (semi-popular public cloud) service deprecation myself precisely because it was legacy and simply had to be rewritten to be moved to a shiny new home that used new-age systems owned by another team that was doing a stellar job at upkeeping and innovating on those (though one could easily classify their "innovation" as NIH). The code that they wrote wasn't a liability at all, but in fact, it kept seeing active investment from all engineering and product management angles.

krab · on Oct 12, 2020

Think about a machine in a factory. You have to maintain it and it doesn't generate revenue by itself. It generates the product you then have to sell. The machine is still an asset.

taurath · on Oct 12, 2020

It always comes down to resources. In my experience an underdeveloped or underfunded NIH project happens far more often than most tech teams would admit. That said, when a team only leans on vendors and no invention is happening, it’s basically an expanded IT department and will eventually be treated as such.

postpawl · on Oct 11, 2020

The software community has built a lot of extraordinary tools that have been through a lot of battle testing. Pretending those lessons aren’t worth something and thinking you can do it all yourself is a mistake a lot of the time.

ratww · on Oct 11, 2020

If you're talking about foundational dependencies like OpenSSL, Linux, LLVM, or even jQuery and React, sure. Also most stdlibs and DBs, like GP said he uses, also fall into that.

Dependencies in general, like 95% (or more) of the kind we see in modern package managers? No, they're mostly untested liabilities and the majority of them could be rewritten in an afternoon.

This whole discussion is a bit strange. GP clearly uses dependencies, just not as much as everyone else today. I don't understand why the fixation with polarizing the discussion into "use lots of dependencies" vs "write everything from scratch".

bigiain · on Oct 11, 2020

"The software community" also built npm, which then enabled "extraordinary tools" like leftpad.js... I suspect there's a lot to be learned from "those battle-tested lessons", but pretending everybody is learning them is a mistake.

Dependencies are inevitable.

I'm OK with depending on glibc and _almost_ as OK with depending on openssl. And of course I inevitably rely on gcc/clang/CPU microcode/transistor photolithography...

But those examples are a world away from npm pulling in 12+ levels of random dependencies of unknown origin buried so deep it's all but impossible to audit - and the segment of "the software community" who built that particular house-of-cards and then promotes it blindly to the segment of the software community who obliviously use it to generate code that runs on expensive production platforms in business critical applications just boggles my mind.

(And makes me alternately weep and drink heavily - having to support exactly that in production because "velocity" and "moving fast and breaking things" are considered more important than quality or security by both management and clients - both of whom will happily point the finger of blame elsewhere when the inevitable happens, in spite of having been repeatedly warned... :sigh: )
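To give a feel for what "levels of dependencies" means when auditing, here is a minimal sketch that measures the depth and size of a nested dependency tree. The nested-dict shape and all package names are made up for illustration; this is not the actual layout of npm's lockfile, which would need its own parsing.

```python
def max_depth(tree):
    """Length of the longest chain of nested dependencies.

    `tree` maps a package name to a dict of its own dependencies,
    in the same shape (a hypothetical, simplified structure).
    """
    if not tree:
        return 0
    return 1 + max(max_depth(children) for children in tree.values())


def count_packages(tree):
    """Count every package occurrence in the tree, duplicates included."""
    return sum(1 + count_packages(children) for children in tree.values())


# Made-up example: two direct dependencies, one of which drags in six more.
deps = {
    "web-framework": {
        "router": {"path-parser": {}},
        "body-parser": {"bytes": {}, "charset": {"tiny-helper": {}}},
    },
    "pdf-lib": {},
}

print(max_depth(deps), count_packages(deps))
```

Even in this toy tree, only two of the eight packages were chosen directly; the rest arrived transitively, which is the part that tends to go unreviewed.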

Gene_Parmesan · on Oct 12, 2020

You definitely have some valid points. On the other hand, I just did a project at my company that had dynamic PDF generation from web-app data as a requirement. Being a shop that in no way has the experience, time, or budget to build a dependency-free in-house tool to do this, I instead was able to find the jspdf library, pull it in from node, and implement the functionality in an afternoon.

Believe me, as someone who cares about code, I far prefer a world where I know how every line of code in a system I'm building works. In fact, in my free time, I do just that. But my job as an in-house software dev for a non-tech company is to solve business problems with technology, and as with every business problem, there is an acceptable level of risk that you need to be OK taking. In our case in this post-COVID world, people's jobs are literally depending on us iterating quickly.

bigiain · on Oct 12, 2020

Oh sure. I'm totally not saying that the right point on the velocity/quality spectrum is always right over at the quality end (or even that those two choices are fundamentally orthogonal), and by using the phrase "an acceptable level of risk" you're demonstrating you're the sort of developer that understands there _is_ risk. And I'd assume that means you'd have pushed back if someone had instead said "and this PDF generator needs to run on the same servers and have access to the databases with all our PII and financial information on them".

I spend a lot of time talking to "full stack developers" who came out of graphic design into front end web dev and then fell into nodejs backend dev, whose only "architecture" course was about drawing buildings and whose main complaints about security requirements are that bars across windows look ugly. A startlingly large number of them don't even know there are things they don't know. I'd say fewer than half of them could tell you what OWASP was, and fewer than 10% of them could tell you what SQLi or XSS was, and whether their sites need to consider them as attack vectors. Most of them just say "I use $frameworkDeJour, it handles all the security stuff!"

(BTW, last time I needed to supply dynamic PDFs in a web app, one of the good fullstack-via-FE-and-graphic-design devs built the PDF generation in the browser (using some random js/pdf library) so we could just feed it JSON from the backend. Made me sleep better at night doing it that way...)

strken · on Oct 12, 2020

Out of curiosity, why would you expect 5x more people to know what OWASP is than to know what SQL injection is? I would have thought it would be the other way around.

bigiain · on Oct 12, 2020

In my head at least, knowing of the existence of a respected/curated list of problems/solutions in your area of expertise is likely to be more widespread than knowing the details of specific items on that list.

I know _of_ HIPAA regulations, but not being in either the US or in healthcare records, I have very very vague notions of the HIPAA requirements. Same with PCI compliance - I know there are important rules and requirements, which I don't fully understand the details of, because I choose to use 3rd parties like Stripe to handle all my CC processing so that those requirements don't apply to me (with the exception of needing to understand the risks of webapps and problems like XSS in the context of Stripe-powered CC forms).

dahfizz · on Oct 11, 2020

The community has produced 1000x as many tools that are garbage. It can sometimes be hard to tell the difference (if the developer cares to look at all).

rimliu · on Oct 12, 2020

It is not uncommon to find cases where a whole monstrous library was brought in just because someone needed a trivial 5-line function. And these add up. And then we all get surprised how come this page weighs megabytes and barely does anything fancy besides screwing up scrolling.

onion2k · on Oct 12, 2020

I think this is simply the recognition that our expectations of what a team of developers can do in a given time have surpassed what that team can actually write themselves. We now understand that teams have to lean on open source, or buy libraries, in order to achieve what we expect within a reasonable time.

The reason is very obvious - the most expensive part of most projects is the people. Why pay them to write code that you can just download for free? That's a tremendous waste of money.

robertlagrant · on Oct 12, 2020

This is exactly it.

adrianN · on Oct 12, 2020

Let's start a car company by first building an iron smelter? Using available tools is not some kind of crazy idea that software engineers came up with in the last ten years.

pdimitar · on Oct 12, 2020

You should raise that point to all the clueless shareholders who relentlessly pulled even technical control out of the hands of the techies.

Most of us just do what we're told and we're in no position to question the "company's priorities" -- nevermind the fact that very often those priorities actually align with the techie's vision.

srtjstjsj · on Oct 12, 2020

If a company isn't competent enough to even check their dependencies for vulnerabilities, as GP's company is, how could they be competent enough to write and check their own alternative versions?

pwdisswordfish4 · on Oct 11, 2020

Original commenter advocates for writing your own, presumably "from scratch", and mentions high-risk targets. Even if you don't take those for granted, though... Let's assume lower risk than a studio, and you relax the conditions from developed-in-house to maintained-in-house (e.g., a library exists, so you go grab it, and by the power of open source, internally it's now "yours"—a fork that you take full responsibility for and have total control over, just like if you had developed in-house, except you're shortcutting the process by cribbing from code that already exists.)

Here's an unrecognized truth:

The cost of forking a third-party library and maintaining it in-house solely for your own use is no higher than the cost of relying on the third-party's unforked version. Depending on specifics, it can actually be lower.

Note that this is a truth; the only real variable is whether it's acknowledged to be true or not. Anyone who disputes it has their thumb on the scale in one way or another, consciously or unconsciously.

jefftk · on Oct 11, 2020

You're really claiming this is a universal truth? Do you think it applies to OpenSSL? Chromium?

pwdisswordfish4 · on Oct 12, 2020

I didn't say it was universal. OpenSSL and Chromium are outside the scope of "CDNs for external JavaScript libraries".

(I actually included the appropriate hedging to clarify that my comments are scope-locked to that topic and to prevent digressions like this, but I edited it out because it made the comment too hard to read. Goes to show...)

jefftk · on Oct 12, 2020

Reading your "the cost of forking a third-party library and maintaining it in-house solely for your own use is no higher than the cost of relying on the third-party's unforked version" I didn't realize at all that you were trying to only talk about JavaScript that runs client side. I think your deleted clarification would have been helpful!

pwdisswordfish4 · on Oct 12, 2020

> I didn't realize at all that you were trying to only talk about JavaScript that runs client side

Well, not just client-side JS; server-side, too, or anywhere that NPM is used, but even more than that: e.g. other package managers that were influenced by or work similarly to NPM and encourage a similar package-driven development style, e.g. Rust's crates.io or the Go community's comfortability with importing by URL. It applies for many of those cases, too, it just wasn't the focus of my comment.

ntauthority · on Oct 11, 2020

For cryptography one should just be able to rely on their OS' library, and depending on a full browser with high amounts of code churn, no compatibility for implementation code and a large dependency graph of its own is not really seen as a good thing in this context at all.

jefftk · on Oct 11, 2020

Let's be more specific: do you think Brave would be better off if they hard forked Chromium?

Godel_unicode · on Oct 11, 2020

Chromium ships with Windows and is maintained by Microsoft. Use the OS crypto library.

jefftk · on Oct 11, 2020

libjpeg?

Specifically, I think hard forking is a bad idea for any sort of library that needs to be regularly updated for compatibility or security reasons.

Godel_unicode · on Oct 11, 2020

That's possibly true if you don't have headcount for doing that maintenance. If you have appropriately planned for it however, it's just more software that you're writing to do the work you need done.

If you're depending on some random person on the internet to update software which underlies your whole stack, then when the next imagetragick drops you can't update until they get around to fixing it. Since you won't have developers familiar with the code, fixing it won't likely be feasible for you. That's a lot of risk.

14u2c · on Oct 12, 2020

> Note that this is a truth; the only real variable is whether it's acknowledged to be true or not. Anyone who disputes it has their thumb on the scale in one way or another, consciously or unconsciously.

Could you elaborate on the actual argument for why this is the case? On the surface it seems like the opposite of what you are claiming can just as easily be true depending on the situation. For example take a widely used utility lib such as lodash or jQuery. In your scenario there are two options:

1. Use lodash via a package manager and rely on the lodash team to fix bugs, write tests, and add new useful utilities over time.

2. Fork lodash and take on the maintenance burden yourself. You are responsible for keeping up with security vulnerabilities and making patches. You are responsible for writing high quality tests for the parts of your code that diverge from the original.

Think about how much ramp up time is required for new hires to become familiar with a company's codebase. Why would you ever want to devote that amount of time to maintaining code that for the vast majority of businesses is already "good enough"? For some companies the engineers working on the open source project may even be more competent than the resources available in house. Sure there may be edge cases where performance or security is absolutely paramount, and in those this approach may make sense, but not for the majority of generic CRUD apps.

I would have a very tough time trying to convince a competent manager that it is beneficial to devote so many man hours to this task rather than to business logic, new app features, etc.

alibarber · on Oct 12, 2020

I'm not really saying you should write anything from scratch that you don't have to, just that you should treat a dependency as something that you did. Therefore, review it, check compatibility, and have some named team responsible for its maintenance and availability.
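One way to picture "treat a dependency as something that you did" is a small record kept per vendored library, plus a check that the copy in the repository is still the copy that was reviewed. This is a minimal sketch under stated assumptions: the `VendoredDependency` record, its field names, and the `tiny-math` package and `platform-team` owner are all hypothetical, not part of any particular tool.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class VendoredDependency:
    """Hypothetical per-library record kept alongside the vendored code."""
    name: str
    version: str
    license: str
    owner: str   # named team responsible for maintenance and availability
    sha256: str  # digest recorded at the time the code was reviewed


def matches_reviewed_copy(dep: VendoredDependency, content: bytes) -> bool:
    """True if `content` is byte-for-byte the copy that was reviewed."""
    return hashlib.sha256(content).hexdigest() == dep.sha256


# Made-up library contents standing in for a reviewed source file.
reviewed = b"export const add = (a, b) => a + b;\n"
dep = VendoredDependency(
    name="tiny-math",
    version="1.0.0",
    license="MIT",
    owner="platform-team",
    sha256=hashlib.sha256(reviewed).hexdigest(),
)

print(matches_reviewed_copy(dep, reviewed))
print(matches_reviewed_copy(dep, reviewed + b"// modified after review\n"))
```

The digest only proves the file has not changed since review; the review itself, the license check, and the ownership are still human work that the record merely makes visible.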

cs02rm0 · on Oct 11, 2020

Even without turnover, I've no idea how you're supposed to compete with the quality of top open source software. Maybe no one finds the bugs in your software, but they're definitely there.

bob1029 · on Oct 11, 2020

We don't try to compete with the quality of top open source software. Our stack fundamentally consists of:

C# 8.0 / .NET Core 3.x / AspNetCore / Blazor / SQLite

Of these, SQLite is arguably the most stellar example of what open source software can provide to the world.

Everything else in our stack consists of in-house primitives built upon these foundational components.

Godel_unicode · on Oct 11, 2020

It's funny to me the number of developers who have effectively forgotten that Microsoft exists, and that it's possible to have your entire stack be provided by one company who directly sells its software for profit.

steverb · on Oct 11, 2020

That's really interesting. My team's stack is the same except we use Azure SQL Server instead of SQLite.

I'd love to understand why you chose that.

Feel free to hit me up at the address in my profile if you don't want to talk here.

baud147258 · on Oct 12, 2020

Perhaps the parent's situation ("software that runs within very secure financial networks" cf https://news.ycombinator.com/item?id=24747781) prevents them from relying on an external service?

bob1029 · on Oct 12, 2020

We cannot rely on anything outside the client's secure network. There are a few exceptional items that are allowed to talk to the internet, but our system's data persistence layer is not one of them.

Effectively, our clients' operations cannot rely on cloud services and all of the related last mile connectivity into their infrastructure. If AWS/Azure/et al. go down, many of our customers are still able to continue operating without difficulty.

danielheath · on Oct 12, 2020

In the frontend space?

When I go looking for a dependency, I check the license and I have a quick read of the code.

Last time I did that was for autocomplete. I checked the most popular six options; in the space of 2 hours, I found obvious-from-reading-code bugs in all 6.

None of them had a CLA signed by contributors, so there's really no evidence their code is genuinely available under the license they claim to offer.

I wrote my own. It took about 3 hours initially plus 2-3 hours ironing out edge cases found over the following weeks. It only added 700 bytes to my bundle.

Total time spent: 1 day's work. Smaller code, loads fast, free from license issues, does exactly what I want.

alibarber · on Oct 11, 2020

It depends on the context. Yeah it'd be lovely to have your system connected to the internet - but no, our clients wouldn't give us money if you do that, and we need money. So, maybe yes in this case, start writing some quality, well thought out maintainable libraries (that can include audited third party code), and just bill it. In my case, the cost to the client of a team of devs working on that was less than the cost to them of the risk of a film leaking...

[Edit] - But what I have found through experience is that the code that was written under these constraints seemed to be better, more secure and robust, than without them. YMMV.

brixon · on Oct 12, 2020

A lot of the time, when you are working in highly critical environments, you tend to be less cutting edge and rely more on old, proven technologies that are boring but dependable.

jrumbut · on Oct 11, 2020

I assume if you have the money to be that thorough you have the money to offer some inducements to stick around.

Plus if all you know is this custom stack, where are you gonna go?

Godel_unicode · on Oct 11, 2020

Anywhere that hires good software developers, since if you learned this stack it's presumably not hard to get a job somewhere else? Exooglers have a pretty easy time getting hired.

jrsj · on Oct 11, 2020

This is probably better from a quality perspective and also probably not worth the time it would take in 90% of projects.

That being said I'm amazed how much production software depends on multiple libraries that are developed and maintained by a single person as a hobby.

ClikeX · on Oct 11, 2020

That one person (sometimes) puts more effort into that single library than I've seen some agencies put into a client project.

wwweston · on Oct 11, 2020

Intrinsic vs extrinsic motivation.

It's pretty common for someone building a library to do so for the utility, scratching a real itch. Not universal (people build libraries to be someone who built a library, too), but common.

It's very common for agencies to be building client projects in order to bill for it. Sometimes there's a special alignment where the client is also primarily interested in spending an allotted budget so long as there's a plausibly adequate deliverable.

typon · on Oct 11, 2020

It's hard for me to think of open source libraries that are developed and maintained by mega corporations that I actually prefer to use over libraries made by small indie developers. The only one that I can think of is pytorch, but that's kind of unfair since it was acquired by Facebook not developed from scratch by them.

srtjstjsj · on Oct 12, 2020

Golang is pretty well liked.

And Guava in Java land, and Abseil in C++.

It's not too surprising that the stuff big companies make for their needs isn't as useful for small indie devs as stuff small indie devs make.

tracker1 · on Oct 11, 2020

For me, the biggest is React... aside from that, not much really.

alibarber · on Oct 11, 2020

Sure, everything's a tradeoff - but sometimes I see things like that as frontloading the pain of when bndlrrr.io or whatever goes down in the middle of the night and your client is angry at you. But yeah, like everything in software, it's a spectrum and 'best practice' and the 'right way' are highly dependent on the context.

madeofpalk · on Oct 11, 2020

Yeah like as a frontend/web developer, most of what we do is make what are essentially Wordpress themes for some company that really doesn’t matter or do much important.

benhurmarcel · on Oct 12, 2020

https://xkcd.com/2347/

ex_amazon_sde · on Oct 11, 2020

Ex-Amazon SDE here. The same happens in FAANGS: tons of software is written and served internally.

It's not NIH syndrome, usually. It's about having control over the whole software supply chain for security, reliability, licensing compliance and general quality.

Twirrim · on Oct 12, 2020

As another ex-Amazon... there's a whole bunch of NIH going on there too. The way I saw it, it tended to be split three ways:

1) NIH. Almost always the problems that need solved are interesting, and engineers are naturally chomping at the bit to solve them. Added bonus you can potentially make a name for yourself. This happens way more than it should, in cases that don't meet the other two ways I saw. Solving problems that have already been solved very effectively and efficiently, in a mature low friction fashion.

2) It doesn't scale to needs. A lot of software just doesn't scale to the requirements of the platform. It's hard to overstate just how much traffic and work a lot of Amazon infrastructure has to handle. Most software doesn't scale that well because it's not run in so big an environment. We're using some well known commercial software at my current employer (because it works, has a good reputation, and did everything we need), that is experiencing major scaling issues because we're literally orders of magnitude larger than any of their other customers. We're seeing stuff they've never had to deal with before. We're not even close to Amazon's scale for this particular type of software.

3) Need to control the entire software stack, have the ability to drastically modify it to meet the changing demands placed on it. A lot of public software is written to meet one need, and it rarely changes that drastically over time. That's not what the consumers want, even though needs change over time. Change your software too much and you'll lose your existing users that fundamentally need what the software is providing. You can see the boom and fall of it all with so many projects. Take a look at what's happened with Chef and Puppet, for a quick off-the-top-of-my-head example.

ex_amazon_sde · on Oct 17, 2020

> there's a whole bunch of NIH going on there too

That's why I wrote "usually".

srtjstjsj · on Oct 12, 2020

4) the most important reason: open source stuff isn't built to integrate with the decades old custom stack cruft

dragonwriter · on Oct 12, 2020

> It's not NIH syndrome, usually. It's about having control over the whole software supply chain for security, reliability, licensing compliance and general quality.

Maybe, though that's usually the exact rationalization given for NIH syndrome. I mean, “NIH syndrome” is never the stated reason for anything.

notacoward · on Oct 12, 2020

Bingo. I recently left my job at a FAANG, and these are the top three reasons I saw for code to be written locally.

(1) Need to interact with other internal systems.

(2) Pure NIH.

(3) Need to scale further than outside solutions.

1 and 3 are closely related. There are quite a few legitimate category-3 internal services for provisioning, configuration, service discovery, monitoring, upgrades at various levels, fault remediation, etc. Any other production service would have to interact with most or all of them. It's often easier to build something local than to add all of those "touch points" to an open-source project. I know because I did both while I was there.

But pure NIH is very close behind as a reason. Despite all protestations to the contrary, engineers get far more "impact" for creating new things than for fixing old ones. It's hard to get somebody to do X when their bonuses and raises are better served by doing !X. This ends up amplifying, instead of attenuating, the natural impulse of all engineers everywhere to build new things because it's more fun. People always make up other reasons, and perhaps even believe those reasons themselves, but nine times out of ten those reasons are pure delusion.

ex_amazon_sde · on Oct 17, 2020

> Despite all protestations to the contrary, engineers get far more "impact" for creating new things than for fixing old ones.

In general, yes, and that's a problem. Luckily in some teams it's much better.

> but nine times out of ten those reasons are pure delusion.

If you were in a company/team with such level of true NIH syndrome it's good you left.

lmm · on Oct 12, 2020

I'd say just the opposite. I worked on a 10-million LoC codebase where third-party libraries had to be individually brought in by an internal owner. I suspect most of those 10 million lines of code ended up being ad hoc, informally-specified, bug-ridden, slow implementations of half of some external library.

baud147258 · on Oct 12, 2020

Where I'm working, all the dependencies have to be approved before use and are stored locally (build machines have no internet access), but I haven't found that the quality of the software is better than the other places where I've worked. I think it's more a matter of work 'culture' (not sure if that's the right word, perhaps expectations would be a better fit?), with the idea that we ship once it's good enough and that's it.

baud147258 · on Oct 12, 2020

Though I think it's still a good thing that we're doing it, but it's not enough on its own to get good quality.

echelon · on Oct 12, 2020

Why is MPAA air-gapped? Are they worried about leaks or accidentally using copyrighted material?

Seems arbitrary, but I bet the rationale is fascinating. Could you go into more detail? I'd love hearing a little about your work experience, the industry, process, etc.

boulos · on Oct 12, 2020

Disclosure: I work on Google Cloud.

Your intuition is right: they're afraid of the content leaking.

For cloud providers, this results in ... amusingly long documents. Here's GCP's at 110 pages [1], while the AWS folks were clever and used landscape mode for theirs [2] so that it's only 59 pages :).

[1] https://cloud.google.com/files/gcp-mpaa-compliancemapping.pd...

[2] https://d1.awsstatic.com/whitepapers/compliance/AWS_Alignmen...

srtjstjsj · on Oct 12, 2020

Good to know that Spider-Man movies are better protected than my personal health information.

echelon · on Oct 12, 2020

That's fascinating! These companies protect their assets as if release were an existential risk. I suppose it would impact their bottom line to some extent.

It'll be interesting to see in the future when content is cheap to produce. I predict a complete shift away from this.

boulos · on Oct 12, 2020

You clearly wouldn’t see the movie if you’d seen some final frames that were emailed to you :).

> It'll be interesting to see in the future when content is cheap to produce. I predict a complete shift away from this.

It’s actually one of the most obvious forms of friction, causing an increase in the cost of doing business. A lot of VFX houses take these rules to imply that they must segment their networks and keep workstations completely unable to reach the internet (coming full circle to the airgapped comment at the start). Pretend that your entire development workflow is like being on a plane when the WiFi is down. That’s modern VFX software engineer life :(.

cs02rm0 · on Oct 11, 2020

I've spent much of my career on similarly, perhaps a little more, restricted networks and found the opposite.

Dependencies are painful to pull in and only pulled in when a dev needs a specific version. The internal repos end up being a missing version nightmare where people cobble together whatever works with what's available. Where feature and security upgrades go ignored, left to rot like the brains of the devs who struggle to keep up with what's available in the real world.

Many of those networks I've worked on are becoming more permeable at the edges because the cost of the air gap outweighs any benefits.

tyldum · on Oct 11, 2020

Even Cisco's web-based firewall management interface uses Google Analytics. Granted, it will work just as badly regardless of reachability, so there's that.

dikei · on Oct 12, 2020

Even when you package your dependencies with your code, unless you have policies in place and actively enforce them, there's no preventing people from bloating the project with all kinds of junk libraries.

I've had to deal with legacy projects that include multiple versions of the same libraries, all of them being shaded/relocated so they don't conflict. The result, however, is bloated binaries that take 30 minutes to build.

physicsguy · on Oct 12, 2020

True to some degree, but you can also end up with very old dependencies which have vulnerabilities if someone is not keeping track of it all.

dvdkon · on Oct 11, 2020

I have to ask, why does the MPAA accredit airgapped facilities? I know movies are a big business, but that seems a little extreme.

mikeryan · on Oct 11, 2020

It’s not airgapping per se. The large studios want to ensure they can send out their content pre-release to the various third-party vendors working on a project. Ensuring the vendors meet MPAA guidelines is a mechanism they use to ensure this. It’s not technically an accreditation but it’s usually contractually enforced if you want to work with a large studio.

You can actually read the whole thing: https://www.motionpictures.org/wp-content/uploads/2020/07/MP...

srtjstjsj · on Oct 12, 2020

Lots of people like to get copies of IP worth many millions of dollars.

rapind · on Oct 11, 2020

Are you saying that when you need to add a space character to the beginning of your strings you write it yourself?!

alibarber · on Oct 11, 2020

Yes, but after getting approval from the board I was able to get a dispensation to use https://isevenapi.xyz/ for some of our calculation services.

davidmurdoch · on Oct 11, 2020

Loading common libraries from a CDN will no longer bring any shared cache benefits, at least in most major browsers. Here's Chrome's intent to ship: https://chromestatus.com/feature/5730772021411840 Safari already does this, and I think Firefox will, or is already, as well.

jakub_g · on Oct 11, 2020

For info, this shipped in Chrome 86 just last week:

https://developers.google.com/web/updates/2020/10/http-cache...

GordonS · on Oct 11, 2020

I'm not sure I understand the threat here. Say I visit SiteA which references jQuery from CDNJS, then later visit SiteB which references exactly the same jQuery from CDNJS - what's the problem?

Phemist · on Oct 11, 2020

I'm guessing many websites are identifiable by which patterns of libs and specific versions they will force you to cache. One SiteA would then be able to tell that a user visited a SiteB (which, depending on the website, may or may not be problematic)

franga2000 · on Oct 11, 2020

I'm sure some sites would be identifiable by their cached libs, but the cache is shared, so any overlapping dependencies would decrease the accuracy to unusable levels. The best you could do is know someone did not visit a site in the last ${cache_time}.

There are, of course, other vectors to consider, but I can't think of any that could be abused by third parties. If anything, isolating caches would make it easier for the CDN themselves to carry out the attack you mentioned, as they would be receiving all the requests in one batch.

flak48 · on Oct 12, 2020

What if my website tries to load a JS file that only foxnews.com loads (maybe with a less restrictive CORS config)?

I'd be able to tell if you visited Fox news recently, correct?

franga2000 · on Oct 12, 2020

It would be extremely unlikely for only one site to use a specific file from a public CDN (like cdnjs). As for site-specific files like JS bundles and other static assets, those would be served on a "private" CDN, usually under the same domain (like cdn.foxnews.com) and with restrictive CORS settings for this very reason (and also to prevent bandwidth stealing).

cedilla · on Oct 12, 2020

A single file? Highly unlikely.

But three specific files can already be pretty unique. I use chart.js with two specific plugins in my toy project, and I'm willing to bet that no one else in the world uses the exact same set and version configuration.

franga2000 · on Oct 12, 2020

Exactly, but a third party can't see that set from the cache, they see the union of every website recently visited. They would see hundreds of files from many websites and if only one of those uses one of the three files yours does, it's impossible to tell for sure without a file that isn't used anywhere else on the Web. Your site uses A+B+C, site 2 uses A+D+E, site 3 uses B+F, site 4 uses C. The cache contents is A+B+C+D+E+F+... did the user visit your site? It's like trying to get individual pictures out of a single piece of film that was exposed multiple times - you can make some guesses and rule some possibilities out, but nothing other than that will be conclusive.
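
The overlap argument can be sketched with plain sets. This is a minimal illustration, assuming the hypothetical site-to-file mapping from the comment above (`yourSite`, `site2` and so on are made-up names) and an observer who can only learn the union of cached files:

```javascript
// Hypothetical mapping of sites to the CDN files they load (from the example above).
const sites = {
  yourSite: ["A", "B", "C"],
  site2: ["A", "D", "E"],
  site3: ["B", "F"],
  site4: ["C"],
};

// The shared cache only exposes the union of files from every visited site.
function cacheAfterVisiting(visited) {
  return new Set(visited.flatMap((name) => sites[name]));
}

// A site is "consistent" with the cache if all of its files are present.
// That is necessary for a visit, but not sufficient.
function consistentWithCache(name, cache) {
  return sites[name].every((file) => cache.has(file));
}

// The user visited sites 2, 3 and 4, but never yourSite.
const cache = cacheAfterVisiting(["site2", "site3", "site4"]);

console.log([...cache].sort().join("+")); // A+B+C+D+E+F
console.log(consistentWithCache("yourSite", cache)); // true, despite no visit
```

A missing file is the only conclusive signal here: if any of A, B or C were absent, a visit to `yourSite` within the cache lifetime could be ruled out.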

notsuoh · on Oct 11, 2020

Like a discount Bloom Filter?

franga2000 · on Oct 12, 2020

It would behave like one, yes.

adrianN · on Oct 12, 2020

You only need ~30 bits of information to uniquely fingerprint someone.

angrais · on Oct 12, 2020

Could you explain this further?

adrianN · on Oct 13, 2020

There are about 8 billion people. 33 bits is enough to give each one a unique number. A whole bunch of them don't have access to the Internet, so fewer than 33 bits are enough to identify someone on the Internet.
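
As a quick check of that arithmetic (a sketch, taking the round figure of 8 billion from the comment as given; `bitsToIdentify` is a name introduced here):

```javascript
// Number of bits needed to give each of `population` people a distinct ID.
function bitsToIdentify(population) {
  return Math.ceil(Math.log2(population));
}

const worldPopulation = 8e9; // round figure from the comment above
console.log(bitsToIdentify(worldPopulation)); // 33
console.log(2 ** 32 < worldPopulation && worldPopulation <= 2 ** 33); // true
```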

dathinab · on Oct 11, 2020

It can be used to track users across domains to some degree.

GordonS · on Oct 11, 2020

Thanks, I understand the issue now - I haven't thought about CDNs from a privacy perspective before.

I suppose with HTTP2 some of the benefits of serving JS through CDNs are gone anyway, so I guess it's time to stop using them.

mmcwilliams · on Oct 11, 2020

Not to be dense but wasn't that always the purpose of running a CDN service for common scripts and libraries?

daveFNbuck · on Oct 11, 2020

The script wouldn't have to be from a CDN to track people using the browser cache. I could infer whether you've visited a site that doesn't use CDNs or trackers by asking you to load something from that site and inferring whether you have that resource cached by the time it took you to load it.

mmcwilliams · on Oct 11, 2020

This is true, but if you're running a CDN you have access to cross-domain user information just based on the headers, no?

daveoc64 · on Oct 11, 2020

The CDN is not the place you have to be worried about.

If Site A loads a specific JavaScript file for users with an administrator account, Site B can check to see if the JavaScript file is in your cache, and infer that you must have an administrator account if the file is there.

The attack can happen with different types of resources (such as images).

mmcwilliams · on Oct 13, 2020

This I understand, the risk of third-parties monitoring. The attacks are pretty obvious. My confusion is over what the business model of a commercial CDN is if not to track users across multiple sites? How do they pay for bandwidth?

dathinab · on Oct 12, 2020

The problem is not the CDN (or arbitrary shared source domain) being able to track you but the sites which use the CDN.

Furthermore a CDN can't track you as simply as you might think; it would often require things which need explicit opt-in agreements on a per-website basis to be legal.

Furthermore due to technical limitations you can only get that permission from the user after the CDN was already used.

CDNs can still track aggregated information to some degree but they can't legally act like a tracker cookie.

amelius · on Oct 11, 2020

How serious is this type of threat? Compared to all the info about us that is already shared by data brokers?

ev1 · on Oct 11, 2020

There are several data-broker-esque "services" that actually do this already with FB, Google, etc assets (favicon.ico and similar, loggedIn urls, ...) to check whether you have visited those pages, or whether you are logged in to those services by trying to request a URL that might return a large image if logged in, or fail rapidly if logged out. -- This has been a thing for a long time: https://news.ycombinator.com/item?id=15499917

If you don't use any of those sites, you're considered higher risk/fraudulent user/bot.

Here's an example of a very short and easy way to see if someone is probably gay: https://pastebin.com/raw/CFaTet0K

On chrome, I consistently get 1-5 back after it's been cached, and 100+ on a clean visit. On Firefox with resistFingerprinting, I get 0 always.

amelius · on Oct 11, 2020

Thank you, that was insightful.

> Here's an example of a very short and easy way to see if someone is probably gay

Ok, but now the resource is in my cache, so from now on they will think I'm gay?

ev1 · on Oct 11, 2020

> Ok, but now the resource is in my cache, so from now on they will think I'm gay?

This resource is just generic, so probably not, but if you actually visited grindr's site without adblocking heavily, they load googletagmanager and a significant number of other tracking services, which will almost certainly associate your advertising profile and identifiers as 'gay'

I also can't believe they send/sell your information to 3 pages worth of third party monetization providers/adtech companies for something that is this critically sensitive.

tzot · on Oct 11, 2020

You could have run this on a private window of the browser (and in that case, they would surely think you're a closeted gay).

14u2c · on Oct 12, 2020

Could this not be solved by Grindr setting up CORS properly for that resource? It's unlikely anyone would ever open the script directly in their browser.

thefreeman · on Oct 12, 2020

CORS wouldn't help here. CORS prevents you from reading the response or making cross-origin XHR requests, not from loading an external resource from a different domain in a script or img tag.

srtjstjsj · on Oct 12, 2020

Fun fact: browsers put scary warnings in their dev console (and some web sites log warnings to the console) because some people love copy-pasting code they got from sketchy people trying to bypass all the browser security.

gpvos · on Oct 11, 2020

It's being actively used by the ad networks to do user fingerprinting instead of cookies, since the latter are more and more blocked.

weare138 · on Oct 12, 2020

I guess as serious as any other privacy threats but one that doesn't get enough attention in my opinion. CDNs and web fonts are definitely being used to track us and can bypass mitigations like private mode in your browser and ad/tracker blockers by tracking your IP address across sites.

madeofpalk · on Oct 11, 2020

Try and load assets from another domain and observe if it was probably cached or not, and you can know that they visited the site

convery · on Oct 11, 2020

I guess, but that disadvantage seems massively outweighed by the benefits. Can always use something like [1] to check if a client is active on interesting sites.

[1] - https://www.webdigi.co.uk/demos/how-to-detect-visitors-logge...

Polylactic_acid · on Oct 12, 2020

Are there actually any benefits though? I saw an article a few years back about how when loading jquery from Google's cdn there was about a 1% chance the user had it cached already. Since you have to have the same library, the same cdn source and the same version of the library, it is almost never the case that the user has already grabbed this recently enough that it hasn't been kicked out.

Plus the trend now is to use webpack and have all of your deps bundled in and served from the same server.

minitech · on Oct 11, 2020

Can’t always use that. It’s much less specific compared to the potential of cache, only works when websites provide that type of redirect, doesn’t work if you block third-party cookies (I think a form of that might already be the default in some browsers), etc.

kami8845 · on Oct 11, 2020

maybe not with CDNJS, but perhaps you don't want every website to know you have AshleyMadison.com assets cached.

Uehreka · on Oct 11, 2020

Can websites even tell what is cached and what’s pulled fresh?

EE84M3i · on Oct 11, 2020

Yes, using timing

evilduck · on Oct 11, 2020

Wouldn't the act of timing a download mean that I download and pollute my cache with new assets from the site trying to find where else I've been? Does this only work for the first site that tries to fingerprint a browser in this way?

curryst · on Oct 11, 2020

Is there a noCache option? Or can JS remove entries from the cache to reset it?

Someone below mentioned doing requests for a large image that requires authentication. Short response time means the user isn't logged in (they got a 403), long response time means they downloaded the image and are logged in.

amelius · on Oct 11, 2020

Not if the javascript starts running only after all resources have loaded.

darepublic · on Oct 11, 2020

No there could still be timing attacks after. Just dynamically request a cross domain asset

amelius · on Oct 11, 2020

Then those requests should not be cached?

tylerhou · on Oct 11, 2020

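  // Assumes a global (unpartitioned) HTTP cache; run inside an async function or module.
  const start = window.performance.now();
  // "no-cors" lets the cross-origin request complete without CORS headers;
  // the response is opaque, but only the elapsed time matters here.
  await fetch("https://example.com/asset_that_may_be_cached.jpg", { mode: "no-cors" });
  const end = window.performance.now();
  if (end - start < 10/*ms*/) {
    console.log("cached");
  } else {
    console.log("not cached");
  }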

amelius · on Oct 11, 2020

In that case, the browser would always load the asset (it is not cached). So the rule would be that only stuff that is directly in the <head> may be cached (or stuff that is on the same domain).

tylerhou · on Oct 11, 2020

To be clear, the context of the thread is "why do we need to partition the HTTP cache per domain." My example code works under the (soon-to-be-false) assumption that the cache is NOT partitioned (i.e. there is a global HTTP cache).

> In that case, the browser would always load the asset (it is not cached).

Agreed, if the cache is partitioned per domain AND the current domain has not requested the resource on a prior load. If the cache is global, then the asset will be loaded from cache if it is present: https://developer.mozilla.org/en-US/docs/Web/API/Request/cac...

> So the rule would be that only stuff that is directly in the <head> may be cached (or stuff that is on the same domain).

You could be more precise here: with a domain-partitioned cache, all resources regardless of domain loaded by any previous request on the same domain could be cached. So if I load HN twice and HN uses https://example.com/image.jpg on both pages, then the second request will use the cached asset.
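
The difference between the two cache models can be sketched as a toy lookup table. This is an illustration only, not how any browser implements its cache: the class name `ToyHttpCache` and the choice of keying on just the top-level site plus the resource URL are simplifying assumptions, and real browsers may include more in the key.

```javascript
// Toy model of an HTTP cache, keyed either globally or per top-level site.
class ToyHttpCache {
  constructor({ partitioned }) {
    this.partitioned = partitioned;
    this.entries = new Set();
  }

  // With partitioning, the top-level site becomes part of the cache key.
  key(topLevelSite, resourceUrl) {
    return this.partitioned ? `${topLevelSite} ${resourceUrl}` : resourceUrl;
  }

  // Returns "hit" or "miss", and stores the resource on a miss.
  load(topLevelSite, resourceUrl) {
    const k = this.key(topLevelSite, resourceUrl);
    if (this.entries.has(k)) return "hit";
    this.entries.add(k);
    return "miss";
  }
}

const image = "https://example.com/image.jpg";

const globalCache = new ToyHttpCache({ partitioned: false });
globalCache.load("https://news.ycombinator.com", image); // first load: miss
console.log(globalCache.load("https://attacker.example", image)); // "hit": leaks the earlier visit

const partitionedCache = new ToyHttpCache({ partitioned: true });
partitionedCache.load("https://news.ycombinator.com", image); // first load: miss
console.log(partitionedCache.load("https://attacker.example", image)); // "miss": nothing to observe
console.log(partitionedCache.load("https://news.ycombinator.com", image)); // "hit": same site still benefits
```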

amelius · on Oct 11, 2020

> To be clear, the context of the thread is "why do we need to partition the HTTP cache per domain."

Ah right, the thread is becoming long :)

> So if I load HN twice and HN uses https://example.com/image.jpg on both pages, then the second request will use the cached asset.

Good point!

thorum · on Oct 11, 2020

On the other hand, the URL for a common library hosted on cdnjs (or one of the other big JavaScript CDNs) and included on many different websites is much more likely to already be cached on edge servers close to your users than if you host the file yourself.

arghwhat · on Oct 11, 2020

The time to connect to the CDN hostname will negate any benefit, especially if push can be used.

Matthias247 · on Oct 11, 2020

You can mitigate this by getting your website itself on a CDN. If this is cached, then its assets (incl javascript) would be too.

And by going that route you make sure that all pieces of your website have the same availability guarantees, the same performance profile, and the same security guarantees that the content was not manipulated by a 3rd party.

GeneralTspoon · on Oct 11, 2020

> And by going that route you make sure that all pieces of your website have the same availability guarantees, the same performance profile, and the same security guarantees that the content was not manipulated by a 3rd party.

You can already guarantee the security of the file by using the integrity attribute on the <script> tag. And the performance of your CDN is probably worse than the Google CDN (not to mention that you lose out on the shared cache).
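
For reference, the value of the `integrity` attribute is a base64-encoded digest of the file, prefixed with the hash algorithm. A minimal sketch of computing one with Node's built-in `crypto` module, assuming SHA-384 and a made-up file body (`sriHash`, `libraryBody` and the `cdn.example` URL are placeholders chosen here for illustration):

```javascript
const { createHash } = require("crypto");

// Builds a Subresource Integrity value of the form "<algorithm>-<base64 digest>".
function sriHash(content, algorithm = "sha384") {
  const digest = createHash(algorithm).update(content).digest("base64");
  return `${algorithm}-${digest}`;
}

// Stand-in for the bytes of the library you intend to reference.
const libraryBody = "console.log('hello from a hypothetical library');";
const integrity = sriHash(libraryBody);

// The script tag you would ship; the URL is a placeholder.
console.log(
  `<script src="https://cdn.example/lib.js" integrity="${integrity}" crossorigin="anonymous"></script>`
);
```

If the CDN ever serves different bytes, the digest no longer matches and the browser refuses to run the script.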

Matthias247 · on Oct 11, 2020

I agree on the security side if you use that attribute. However:

> And the performance of your CDN is probably worse than the Google CDN

What does "probably" mean? Other CDNs (Akamai, CloudFront, Cloudflare, etc) are also fast.

And by pushing one piece of your website on a different CDN you force your user's browser to create an additional HTTPS connection which takes additional round-trips, instead of being able to leverage one connection for all assets. This alone might well outweigh the performance differences between CDNs.

Also the "shared cache" benefit might go away, if I read the other answers in this topic correctly.

arendtio · on Oct 11, 2020

Does someone know why they don't split the cookie storage equally by the top origin?

I mean, wouldn't that take care of a whole class of attack vectors and make cross-origin requests possible without having to worry about CSRF?

tgsovlerkhgsel · on Oct 11, 2020

One of the problems is that it breaks use cases like logging into stackoverflow.com and then visiting serverfault.com, or (if you do it by top-level origin) even en.wikipedia.org and then visiting de.wikipedia.org. [1]

While privacy sensitive users may consider this a feature in case of e.g. google.com and youtube.com, the average user is more likely to consider it an annoyance, and worse, it is likely to break some obscure portal somewhere that is never going to be updated, so if one browser does it and another doesn't, the solution will be a hastily hacked note "this doesn't work in X, use Y instead" added to the portal. And no browser vendor wants to be X.

[1] The workaround of using the public suffix list for such purposes is being discouraged by the public suffix list maintainers themselves IIRC, so the "right" thing to do would be breaking Wikipedia.

Edit: If done naively on an origin basis right now, it would break the Internet. You couldn't use _any_ site/app that has login/account management on a separate host name. You couldn't log into your Google account with such a browser anymore (because accounts.google.com != mail.google.com). Countless web sites that require logins would fail, both company-internal portals and public sites.
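
To make the origin distinction concrete: an origin is the scheme, host and port together, so two subdomains of the same company are different origins. A small sketch using the standard `URL` API (the `sameOrigin` helper is a name introduced here):

```javascript
// Two URLs share an origin only if scheme, host and port all match.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

console.log(sameOrigin("https://accounts.google.com/", "https://mail.google.com/")); // false
console.log(sameOrigin("https://mail.google.com/a", "https://mail.google.com/b")); // true
console.log(sameOrigin("https://en.wikipedia.org/", "https://de.wikipedia.org/")); // false
```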

singron · on Oct 11, 2020

It's possible to get around this with a redirect staple. E.g. if Google wants you to be logged in on youtube.com and google.com simultaneously:

1) User logs in at google.com/login and sets google.com cookies. 2) Server generates a nonce and redirects to youtube.com/login?auth=$NONCE 3) youtube.com checks the $NONCE and sets youtube.com cookies 4) youtube.com redirects back to google.com.
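
The nonce step in that flow can be sketched as a single-use token store shared between the two login endpoints. This is a simplified model under stated assumptions: the `NonceStore` class, the in-memory `Map` and the user ID are all hypothetical, and a real deployment would also need expiry and transport security.

```javascript
const { randomBytes } = require("crypto");

// Hypothetical backend store shared by google.com/login and youtube.com/login.
class NonceStore {
  constructor() {
    this.pending = new Map(); // nonce -> user ID
  }

  // Step 2: issue an unguessable nonce for the user who just logged in.
  issue(userId) {
    const nonce = randomBytes(16).toString("hex");
    this.pending.set(nonce, userId);
    return nonce;
  }

  // Step 3: redeem the nonce exactly once; returns the user ID or null.
  redeem(nonce) {
    const userId = this.pending.get(nonce) ?? null;
    this.pending.delete(nonce);
    return userId;
  }
}

const store = new NonceStore();
const nonce = store.issue("user-123");
const redirectUrl = `https://youtube.com/login?auth=${nonce}`;

// youtube.com reads the nonce from the query string and redeems it.
const received = new URL(redirectUrl).searchParams.get("auth");
console.log(store.redeem(received)); // "user-123": set youtube.com cookies
console.log(store.redeem(received)); // null: a replayed nonce is rejected
```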

Firefox's container tabs can maintain isolation despite this since even this redirect will stay within a container. However there is a usability penalty since the user has to open links for sites in the right container (and automatically opening certain sites in certain containers will enable cross-container stapling again).
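To make the handoff in steps 1 to 4 concrete, here is a minimal sketch of the nonce bookkeeping. It assumes both sites talk to one shared backend, and every name in it (`NonceStore`, `issue`, `redeem`, `handoffUrl`) is made up for illustration. A real deployment would need a cryptographically random nonce and an expiry, neither of which is shown.

```typescript
// In-memory model of the cross-domain login handoff.
// Hypothetical names; assumes google.com and youtube.com share this store.
class NonceStore {
  private pending = new Map<string, string>(); // nonce -> user id

  // Step 2: the first site mints a single-use nonce for a logged-in user.
  // `makeNonce` is supplied by the caller; it must be unguessable in real use.
  issue(userId: string, makeNonce: () => string): string {
    const nonce = makeNonce();
    this.pending.set(nonce, userId);
    return nonce;
  }

  // Step 3: the second site redeems the nonce exactly once.
  // Returns the user id, or null if the nonce is unknown or already used.
  redeem(nonce: string): string | null {
    const userId = this.pending.get(nonce);
    if (userId === undefined) return null;
    this.pending.delete(nonce); // single use: replaying the URL gets nothing
    return userId;
  }
}

// Builds the redirect target used in step 2.
function handoffUrl(targetLoginUrl: string, nonce: string): string {
  return targetLoginUrl + "?auth=" + encodeURIComponent(nonce);
}
```

The single-use delete is the important design choice: the nonce travels in a URL, so it should be worthless once the second site has set its own cookies.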

srtjstjsj · on Oct 12, 2020

"if"?

webapps.stackexchange.com/questions/30254/why-does-gmail-login-go-through-youtube-com

fastest963 · on Oct 11, 2020

This is being worked on https://github.com/privacycg/storage-partitioning.

edent · on Oct 11, 2020

Oh, that's interesting. I guess it makes sense from a security and privacy perspective.

dndvr · on Oct 11, 2020

Will this cause performance issues for sites that use static cookieless domains for js, images etc

Google themselves do this with gstatic.net and ytimg.com etc

babuskov · on Oct 11, 2020

> Will this cause performance issues for sites that use static cookieless domains for js, images etc

> Google themselves do this with gstatic.net and ytimg.com etc

Most probably not. The point of cookieless domains is that you can use a very simple web server to serve content (no need to handle user sessions, files are pre-compressed and cached, etc.) and it lowers incoming bandwidth a lot. If you have a lot of requests (images, css, js) the cookie information adds up quickly.

Video thumbnails from ytimg.com will still be cached for youtube.com as before. The only thing that will change is for embedded videos on 3rd party websites, as those won't be able to use cached ytimg.com thumbnails from elsewhere.
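A toy model of the difference, in case it helps. This is only a sketch: `ToyCache` and its methods are invented names, and real browsers use richer cache keys than the two-part key assumed here.

```typescript
// Toy HTTP cache, with and without partitioning by top-level site.
// Hypothetical model; not how any particular browser implements it.
class ToyCache {
  private entries = new Map<string, string>();
  private partitioned: boolean;

  constructor(partitioned: boolean) {
    this.partitioned = partitioned;
  }

  private key(topLevelSite: string, resourceUrl: string): string {
    // Unpartitioned: the resource URL alone is the key, so all sites share entries.
    // Partitioned: the top-level site becomes part of the key.
    return this.partitioned ? topLevelSite + " " + resourceUrl : resourceUrl;
  }

  // Returns true on a cache hit; on a miss, stores the entry and returns false.
  request(topLevelSite: string, resourceUrl: string): boolean {
    const k = this.key(topLevelSite, resourceUrl);
    if (this.entries.has(k)) return true;
    this.entries.set(k, "cached body");
    return false;
  }
}
```

With partitioning on, a second visit to youtube.com still hits the cache for a ytimg.com thumbnail, while the same thumbnail embedded on some other top-level site is a miss the first time it is requested there.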

pferde · on Oct 11, 2020

Couldn't the same thing be achieved by routing e.g. google.com/static/ to a separate simple webserver, instead of using another domain? Or use a subdomain, e.g. static.google.com.

The current way seems like needless DNS spam to me...

dndvr · on Oct 11, 2020

Even if Google used a separate highly optimised webserver for google.com/static/jquery.js, users who are logged in would be sending their auth cookies when requesting the library.

Given that generally people have slower upload than download, shaving off a few bytes from requests is worth it.

I also recall that browsers [used to (?)] limit concurrent requests per domain which this helps work around

deepstack · on Oct 12, 2020

Good! The whole idea of putting JS on a CDN is supposed to make it easier for entry-level front-end devs to start coding. I think that is great for a school exercise, but it should never be used in business or production sites.

And on a side note, I'm very unhappy about how the bar to become a developer has lowered significantly over the last 10 years or so.

rebelde · on Oct 11, 2020

Wouldn't a better, but partial, solution be for browsers to preload the top x common libraries? All other libraries would probably have to follow this new rule.

ValueNull · on Oct 11, 2020

Isn’t this essentially what Decentraleyes does?

https://decentraleyes.org/

Forbo · on Oct 11, 2020

I've been using LocalCDN, it seems to be more actively maintained and has a better selection of libraries.

https://www.localcdn.org/

tracker1 · on Oct 11, 2020

What version(s) of those libraries? I mean, I don't deal with this anymore... but I've seen sites literally loading half a dozen different copies of jQuery in the past (still makes me cringe).

codegladiator · on Oct 11, 2020

Please no, don't create such barriers.

admax88q · on Oct 11, 2020

Every other language runtime has a standard library, it's always been a shortcoming of the web IMO

lkschubert8 · on Oct 11, 2020

At that point wouldn't it make more sense to just have the browsers include that functionality?

Polylactic_acid · on Oct 12, 2020

As soon as you add something to JS you have to support it forever. It's better to let sites pick what they need and scrap what they no longer need. JS has already added most of the useful stuff from jQuery. If browsers included a built-in version of React it would pretty much lock in the design as it is now, without room to remove and replace bad ideas.

dheera · on Oct 11, 2020

I wonder if an nginx plugin could be made to auto-cache CDN JavaScript/CSS files and edit the HTML on the fly to serve them locally.

daveFNbuck · on Oct 11, 2020

You can set up a path that does a cached proxy to the CDN and just edit the HTML yourself. It's a bit annoying to get the cache settings to work properly, but editing the HTML is easy.
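The HTML side really is just a prefix swap. A rough sketch, assuming a made-up CDN origin (`https://cdn.example.com/`) and a made-up local proxy path (`/cdn-proxy/`); the caching proxy behind that path is server configuration and is not shown.

```typescript
// Rewrites references to a CDN origin so they point at a same-origin proxy path.
// Both prefixes are hypothetical examples, passed in by the caller.
function rewriteCdnUrls(html: string, cdnPrefix: string, localPrefix: string): string {
  // split/join performs a literal (non-regex) replace of every occurrence.
  return html.split(cdnPrefix).join(localPrefix);
}
```

Run once at build time, this avoids touching the HTML per request. It is a blind string replace, so it would also rewrite the prefix inside visible text or comments.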

oefrha · on Oct 11, 2020

You’re probably thinking about a caching proxy like squid cache.

ex_amazon_sde · on Oct 11, 2020

https://decentraleyes.org/

jabart · on Oct 11, 2020

CDNs are misunderstood these days. Caching at the browser across sites is not that important; what matters is caching at a point of presence (POP). The POP being so much closer to your end users brings performance gains because TCP is terrible over distances. QUIC may fix this with its shift to UDP. I haven't seen a benchmark yet.

Security is a concern, use SRI.

Reliability can be mitigated with fail over logic to a backup.

The part missed is bandwidth. Using a CDN means your web server doesn't have to serve out static files that you are paying per GB to serve. For small sites it's not much, but it does add up. It's a Content Delivery Network, not a Cache Delivery Network.

zamadatix · on Oct 11, 2020

The post is "Please stop using CDNs for external JavaScript Libraries" not "Please stop using CDNs". If a CDN is critical to your site's performance you should put your site on it, not internal libraries here and external libraries there. The page mentions this as well:

"Speed:

You probably shouldn’t be using multi-megabyte libraries. Have some respect for your users’ download limits. But if you are truly worried about speed, surely your whole site should be behind a CDN – not just a few JS libraries?"

Also, even before QUIC, HTTP/2 fixed a lot of the problems with distance, as you no longer need to wait for separate handshakes for multiple files to be streamed. QUIC will still give a few advantages, but again those advantages would be good to have on your whole site, not just a few libraries.

MrStonedOne · on Oct 11, 2020

But wouldn't a site be faster if cacheable requests go to a CDN, and un-cacheable requests go directly to origin with no forwarding at the CDN layer?

ehnto · on Oct 12, 2020

Only if the CDN serves assets faster than the origin can. That's not necessarily true by default. Not to mention, if the speed is okay for the initial request, why not the following requests?

The final nail in the coffin for me is that CDNs are a shared resource. If your CDN is getting heavy traffic or otherwise suffering, it becomes your slow point while your site is fine. I just don't see any upsides worth the tradeoffs.

If you have scale that demands some serious content distribution, that is different, but I would argue that you should be relying on public shared CDNs even less in that case. Pay for a CDN service or roll your own.

MrStonedOne · on Oct 12, 2020

CDNs can almost always serve assets faster than the origin can.

Because PoPs are closer and transfer speed ramp rate scales to latency, the further away a server is, the longer it takes the download to ramp up to full speed. This is especially relevant when talking about smaller resources like javascript, css, and small or optimized images, and webfonts.
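A back-of-the-envelope version of that ramp, as a sketch only. It assumes idealized slow start (the window doubles every round trip, no loss), an initial window of 10 segments, and 1460-byte segments. Those numbers and the function names are my assumptions, and the handshake is ignored.

```typescript
// Idealized TCP slow start: the congestion window starts at `initialSegments`
// and doubles every round trip. Ignores loss, handshakes, and receive-window limits.
function roundTripsToDeliver(bytes: number, initialSegments = 10, segmentBytes = 1460): number {
  let remaining = Math.ceil(bytes / segmentBytes); // segments still to send
  let windowSegments = initialSegments;
  let roundTrips = 0;
  while (remaining > 0) {
    remaining -= windowSegments; // send one full window this round trip
    windowSegments *= 2;         // slow start doubles the window
    roundTrips += 1;
  }
  return roundTrips;
}

// Transfer time is simply round trips multiplied by the round-trip time.
function deliveryTimeMs(bytes: number, rttMs: number): number {
  return roundTripsToDeliver(bytes) * rttMs;
}
```

Under these assumptions a 100 KB file (102400 bytes) needs 4 round trips, so about 80 ms from a PoP 20 ms away versus about 600 ms from an origin 150 ms away. The round-trip count is the same; only the latency multiplier changes.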

zamadatix · on Oct 12, 2020

When I say "If a CDN is critical to your site's performance you should put your site on it" I do mean to say "put (those parts of) your site on it" not "if 51% of your site should be on a CDN put the other 49% as well".

But to your question: even un-cacheable content can be "those parts of". There are products from CDNs like https://blog.cloudflare.com/argo/ which combine CDN cache tiering with higher-tier network transport to origin servers for all cache misses (or uncacheable content). Again though, it depends on whether it's critical to your site's performance or not. If you don't have a bunch of uncacheable content, that content doesn't need the absolute best transport, or the time/money could improve some other part of the site speed more, then it's not critical to your site's performance.

EE84M3i · on Oct 11, 2020

Wait, what is the misunderstanding? Aren't these the well known benefits of CDNs?

dathinab · on Oct 11, 2020

CDNs are just a service to handle delivery of static content for you. Their main points (unordered) are:

- reliability

- delivery speed through closeness to user (having nodes all around the world)

- cost

- ease of use

- handling of high loads for you / making static content less affected by accidental or intentional DoS situations

That multiple domains might use the same URL and might share the cache was always just a lucky bonus. Given that the other side needs to use the exact same version of the exact same library with the exact same build options accessed through the exact same URL to profit from cache sharing, it never was reliable at all.

I mean how fast does the JS landscape change?

Given how cross-domain caching can be used to track users across domains, Safari and Firefox disabled it a while ago as far as I know, and Chrome will do so soon.

m463 · on Oct 11, 2020

go to a webpage that uses a cdn and do view source.

it all looks like https://cdn.example.com/foo/bar.js?v=129a1d14ad3

Matthias247 · on Oct 11, 2020

> This POP being so much closer to your end users brings performance gains because TCP is terrible over distances. QUIC may fix this by its shift to UDP. I haven't seen a benchmark yet.

Quic can't defeat physics. Performance will still linearly degrade with distance to (edge) servers, and therefore CDNs will stay important.

What Quic will do, however, is reduce the time-to-first-byte on an initial connection by 1 RTT due to one less handshake, which can be e.g. a 30ms win. After the connection is established it aims to yield more consistent performance than e.g. HTTP/2 over TCP. But packets will still require the same time to go from the browser to an edge location, and therefore the minimum latency for a certain distance is the same.
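The arithmetic behind that 30ms figure, as a sketch. It assumes a fresh connection with no resumption or 0-RTT, a full TLS 1.3 handshake costing one round trip on top of TCP's, and zero server processing time. The constant and function names are mine.

```typescript
// Round trips before the first response byte arrives on a fresh connection.
//   TCP + TLS 1.3: 1 (TCP handshake) + 1 (TLS handshake) + 1 (request/response)
//   QUIC:          1 (combined transport and crypto handshake) + 1 (request/response)
const TCP_TLS13_ROUND_TRIPS = 3;
const QUIC_ROUND_TRIPS = 2;

// Time to first byte, ignoring server processing time.
function timeToFirstByteMs(rttMs: number, roundTrips: number): number {
  return rttMs * roundTrips;
}

// How much QUIC saves over TCP + TLS 1.3 at a given round-trip time.
function quicSavingMs(rttMs: number): number {
  return timeToFirstByteMs(rttMs, TCP_TLS13_ROUND_TRIPS) - timeToFirstByteMs(rttMs, QUIC_ROUND_TRIPS);
}
```

At a 30 ms round trip the saving is one round trip, 30 ms. At 150 ms QUIC still needs 300 ms to the first byte, which is far worse than TCP to a nearby edge at 30 ms (90 ms), so distance keeps dominating.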

adrianmonk · on Oct 11, 2020

How would failover actually be implemented? If you have a script tag with integrity enabled, and the cryptographic hash doesn't match, what happens next?

From some quick research, it doesn't seem like the script tag has built-in support for this. One could imagine something like multiple src attributes (used as a search order for the first valid file), but that doesn't seem to exist. So it seems like the web page has to do it manually.

Which I guess means you have to have some javascript (probably inline, so you know it's loaded and for performance?) to check and fix the loading of your other javascript.

If it's really that manual, it sounds like it adds cost to implementing this correctly. In other words, it might be one of those scenarios where correctness is achievable, but it's a whole lot simpler to just not do it that way.

jabart · on Oct 11, 2020

Since script tags are blocking, you can do an undefined check and, if that fails, inject a new script tag pointing at either a local copy or a secondary CDN.

Link for reference. .Net Core has this built in as a tag helper too! https://www.hanselman.com/blog/cdns-fail-but-your-scripts-do...
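The control flow is small enough to sketch. This is not the tag helper's code, just the idea. `firstWorkingSource` and the `tryLoad` hook are hypothetical names. In a browser, `tryLoad` would stand for "emit a blocking script tag for this URL, then check whether the library's global is defined" (e.g. `typeof window.jQuery !== "undefined"`). A failed integrity check leaves the global undefined just like a network error does, so the same check covers both.

```typescript
// Tries each source in order and returns the first one after which the
// library is actually available, or null if every source failed.
// `tryLoad` is a caller-supplied hook (hypothetical): it attempts the load
// and reports whether the library's global ended up defined.
function firstWorkingSource(sources: string[], tryLoad: (url: string) => boolean): string | null {
  for (const url of sources) {
    if (tryLoad(url)) return url; // stop at the first source that worked
  }
  return null;
}
```

Usage would be something like `firstWorkingSource(["https://cdn.example.com/lib.js", "/js/lib.js"], tryLoad)`, with the CDN first and the local copy as the fallback. Both URLs are placeholders.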

adrianmonk · on Oct 11, 2020

Thanks. So it seems like it's not really that bad. Particularly if you are already using some loader tools (and don't have to add them to your build just to get this).

baskire · on Oct 11, 2020

My understanding is that it’s not TCP alone but the TCP and TLS handshake overhead combined, whereas QUIC combines both handshakes at the protocol level.

jabart · on Oct 11, 2020

TCP has the concept of a TCP window: a buffer of data that the sender can transmit before it has to wait for an ACK from the other side. Windows defaults to 64 KB to start. On your local LAN (which TCP was built for) that's no big deal, but across a distance every wait for an ACK costs a full round trip. Then add in one lost or out-of-order packet, and TCP has to ask for it again and delay the whole thing. It's why HTTP/2 has higher latency on spotty 4G networks. The TLS handshake suffers from the same distance issue that ACK packets have, although TLS 1.3 has 0-RTT, which removes that wait by sending data along with the first packet.

QUIC puts everything in UDP, so theoretically it's a never-ending firehose of data for a download with the occasional "hey, I'm missing packets 3, 12, 18, please resend". It mimics TCP but puts the app in control rather than the kernel.

baskire · on Oct 11, 2020

Quic also has a window size.

> QUIC congestion control has been written based on many years of TCP experience, so it is little surprise that the two have mechanisms that bear resemblance. It’s based on the CWND (congestion window, the limit of how many bytes you can send to the network) and the SSTHRESH (slow start threshold, sets a limit when slow start will stop).

https://blog.cloudflare.com/cubic-and-hystart-support-in-qui...

Matthias247 · on Oct 11, 2020

Quic kind of has 3 windows:

Per stream and per connection flow control windows, which kind of indicate how much data the peer may send on a given connection before it gets a window update. Those windows also indicate how much the server is willing to store in its receive buffers, since the updates are likely sent when those buffers are drained.

A congestion window, which indicates how many low-level packets and data in them can be in-flight without being acknowledged. Those also account for retransmissions, and packets which do not necessarily contain stream data.
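Put together, the amount of new stream data a sender may put on the wire at any moment is bounded by the tightest of the three. A simplified sketch with made-up names (`SendLimits`, `sendableStreamBytes`); it ignores packet overhead and the fact that retransmissions and non-stream frames consume congestion window without consuming flow control credit.

```typescript
// Snapshot of the limits that apply to one stream on one connection.
// Hypothetical structure for illustration; all values are in bytes.
interface SendLimits {
  streamCredit: number;     // what the peer still allows on this stream
  connectionCredit: number; // what the peer still allows across all streams
  congestionWindow: number; // what congestion control allows in flight
  bytesInFlight: number;    // sent but not yet acknowledged
}

// New stream data that may be sent right now: the smallest remaining allowance.
function sendableStreamBytes(limits: SendLimits): number {
  const congestionRoom = limits.congestionWindow - limits.bytesInFlight;
  return Math.max(0, Math.min(limits.streamCredit, limits.connectionCredit, congestionRoom));
}
```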

mikl · on Oct 12, 2020

Sure, but if you have any custom JavaScript that needs to load before the page can render, using a CDN for just the JS libraries does not really help.

If you want the benefit of a CDN, you need to put your own code up there. And if you’re doing that, you might as well host your own copy of the libraries too, so the browser won’t have to talk to two different CDNs.

_-___________-_ · on Oct 12, 2020

We bundle everything, including dependencies, up into minified modules and serve our whole frontend via Google Cloud's CDN, using Cache-Control to control the way things are cached. This way there are no third-party requests at all to load the site, and everything is cached close to the user.

tasogare · on Oct 11, 2020

One won’t have a loading time problem if one doesn’t ship websites with kilotons of JS crap. Also, paying per downloaded content is dumb, as it makes it easy for an attacker to attack you financially, and a lot of hosting companies (like OVH) offer "unlimited" bandwidth.

paulgb · on Oct 11, 2020

There are some good reasons in here (especially privacy), but I'm not convinced by the security point. It seems like the example linked about British Airways was JS under the britishairways.com domain being changed, not a third party CDN.

Incidentally, a few years ago when people were loading third party scripts over HTTP, I demoed a fun hack where, if you control a user's DNS, you could redirect queries to popular CDNs to a proxy that injects keylogger code and tells the browser to cache it indefinitely. Because at the time almost every site included either jQuery or Google Analytics, you'd have a persistent keylogger even after the user switched to a more secure connection. How far we've come!