NPM proxy users receiving ERR 418 I'm a teapot (github.com)
294 points by spondyl on May 29, 2018 | 252 comments

Two hours after an npm employee responded to the incident, the status is still green: https://status.npmjs.org/

I love fake status pages!

The entire traffic light metaphor of status pages and dashboards is questionable IMHO.

Ahh. The RAG status.

Professional truth-massaging

Overall, the system is up and running. I believe the issue only affects a subset of users behind proxies.

This is why I make clients drop or replace SaaS providers. When a SaaS doesn't update its status page, or doesn't do it transparently and promptly because it's not a 100% outage, it enrages me, especially if the reports turn up elsewhere first (Twitter/Reddit/HN).

You force your clients to do things for trivial reasons that send you into a rage?

I don't consider a non-working status page trivial. Yes, if a SaaS sends me into a rage then there are probably dozens of red flags already, and yes, I will work with clients to drop or replace such a SaaS with one that can communicate, absolutely, as it usually falls under my devops remit. This doesn't apply to npm, as it's not a paid IaaS/SaaS; I'm more pointing out the number of shit SaaS providers that don't manage their status updates properly. Imgix in the past, for example, or Linode during the DDoS: absolutely shambolic communication, and so on.

Wouldn't that be more of an issue of not having an SLA between SaaS and client?

If the silly non-automated dashboard is part of the SLA, then not maintaining it costs someone money/liability/trust; otherwise, "who cares as long as the issue gets resolved? People who care about the issue are tracking the bug report."

Except this IS a working status page.

I wasn't curl'ing and grepping for 200... I think that was obvious. Things aren't working for a number of users, so what's going on? If I found out here or on Twitter first, then you've failed. At least tell me via your status page that you're aware of reports (you can even use the bullshit phrase 'a minority of our users') and are investigating... Operating a status page ain't rocket science, although hmmm...

This is exactly what a status page should communicate.

Then put it on yellow or something.

And ruin the marketing value of long history of status page showing green?

I feel some companies have the following meaning for the lights: yellow is when servers are literally on fire. Red is when the CEO is bleeding out somewhere on the floor, and the Feds are about to bust in and close the shop.

AWS certainly operates that way, but even with a third level: green check mark with an info note box. I think I've only seen a "red" AWS status once or twice in the past few years.

Before AWS updated their status page to be a little more usable, I used a Chrome extension to remove green status services from the page. Funny enough, that extension also incremented every status (info check box became warning, warning became red), which was far more accurate in my experience.

For red can we just call that a Cambridge Analytica?

Why? It appears to affect a tiny subset of a tiny subset of users.

If it’s not significant (and it’s not) then it shouldn’t change to a medium grade outage status (amber).

At best a notes portion of a status page could mention it.

The status page is telling the truth if this is a client side issue.

Yes, that status page is only for the registry.

This error was in how clients were appending ports to the Host header.

This seems like playing semantics rather than focusing on what your customers care about...especially since the issue was fixed on the server.

The status page doesn't display what customers care about. It displays whether the servers are up and reachable. If this affected everyone I'd see your point, but it didn't. Note that most npm "customers" don't pay a dime.

First, the HTTP spec _mandates_ that a port number be appended when it is not the default, so a server has to accept it. Not a client issue at all. It was fixed on their service, not the clients.

Second, the service should respond with 400, not a funny 418 when it fails to parse the host field. This is also a bug on their end.

Finally, where does payment come into play? Does anybody expect bugs in homebrew / yum / aptitude / rubygems to go unfixed because those are not paid services?
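The Host-header argument above can be sketched concretely: a server should accept an optional `:port` suffix and answer a genuinely malformed value with 400 rather than 418. This is a minimal Python illustration under simplifying assumptions, not npm's actual code; `parse_host` and `status_for_host` are hypothetical helpers, and IPv6 and hostname validation are deliberately shallow.

```python
from __future__ import annotations

# Minimal sketch (hypothetical helpers, not npm's code): parse an HTTP Host
# header value of the form host[:port] and map malformed values to 400.


def parse_host(value: str) -> tuple[str, int | None]:
    """Split a Host header value into (host, port); raise ValueError if malformed."""
    value = value.strip()
    if not value:
        raise ValueError("empty Host header")
    if value.startswith("["):
        # IPv6 literal, e.g. "[::1]" or "[::1]:8080"
        end = value.find("]")
        if end == -1:
            raise ValueError("unterminated IPv6 literal")
        host, rest = value[: end + 1], value[end + 1 :]
        if not rest:
            return host, None
        if not rest.startswith(":"):
            raise ValueError("unexpected text after IPv6 literal")
        port_text = rest[1:]
    elif ":" in value:
        # "name:port": split on the last colon only
        host, port_text = value.rsplit(":", 1)
    else:
        return value, None  # no port given: the scheme's default applies
    if not host or not (port_text.isascii() and port_text.isdigit()):
        raise ValueError(f"malformed Host header: {value!r}")
    port = int(port_text)
    if port > 65535:
        raise ValueError(f"port out of range: {port}")
    return host, port


def status_for_host(value: str) -> int:
    """Status a server might choose: 200 if the Host value parses, 400 otherwise."""
    try:
        parse_host(value)
    except ValueError:
        return 400
    return 200
```

With this shape, `registry.npmjs.org:8080` parses to a host and port, and only something like `registry.npmjs.org:` falls through to a plain 400.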

> Finally, where does payment come into play?

Merely commenting on the word customers: npm has few customers but a lot of users.

> Second, the service should respond with 400, not a funny 418 when it fails to parse the host field. This is also a bug on their end.

Is it a bug that on error YouTube serves a 200 OK page saying "A team of highly trained monkeys has been dispatched" when there is an actual error? [EDIT: they have since started returning 500 errors instead of 200, though they did do this in the past.] Note that 418 is an actual HTTP status code; for all we know the server was actually a teapot. If so, is it no longer a bug?

> First, the HTTP spec _mandates_ that a port number be appended when it is not the default, so a server has to accept it. Not a client issue at all. It was fixed on their service, not the clients.

This has little to do with the status page. Yes, it's technically against the HTTP spec if npm does this, but if you want to cover that you need tests that validate output against the spec, NOT a status page, which is meant for an entirely different purpose (indicating whether your service is down or not).

The status page should be used for conveying more than just "whether the servers are up and reachable" and it's obvious that npm thinks so too. At the top of their page they list an issue about certain packages that "are currently unable to be viewed or installed."

If you're using your status page just to talk about server availability and not about all types of service interruptions, then you're not taking advantage of arguably the most important communication channel your customers care about.

In the end, you'll break their trust and you'll have to work impossibly hard to get it back.

Really disappointed in NPM lately. These past few months have been downright silly. How many major problems like this affect other package managers?

Package management is hard. RubyGems has had its issues, and Go has got itself into a holy war just deciding how best to do dependency management.

However, over 2018 we seem to have got to a point where I’m worrying about some basic professional competencies and communication systems inside the npm project. I don’t know much about it, but looking at it from a sideways glance, you have to start worrying about what it is that they’re getting wrong, and how to keep their mistakes from derailing my own team and workflows.

I’m two bug fixes away from strong-arming my team into moving to Yarn. npm needs an io.js-sized attitude adjustment. The number of showstopper bugs in the last five releases is fucking bullshit.

> Package management is hard.

and this is why writing yet another package manager instead of gradually improving existing ones - rpm, deb, etc. - is far from ideal.

rpm and deb are tied to specific operating system distributions, and are also meant to install packages globally on a system. npm, on the other hand, is meant to be cross-platform and is primarily focused on installing dependencies on a per-project basis. Its rules for lookup, matching, deduplication etc. are mostly defined or informed by Node.js conventions. I don't think apt or rpm would be willing to cover these use cases, and even if they did, it would probably not be worth it.
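To make the per-project lookup rules concrete, here is a simplified Python sketch of how Node.js searches `node_modules` directories: it walks up from the requiring file's directory, nearest first, skipping directories that are themselves named `node_modules`. It ignores global folders, `NODE_PATH` and package.json `exports`; `resolve` and its `installed` set are hypothetical stand-ins for the filesystem.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Simplified sketch of Node.js's node_modules lookup: start from the directory
# of the requiring file and walk up, skipping directories named node_modules.
# Global folders, NODE_PATH and package.json "exports" are ignored.


def node_modules_paths(start_dir: str) -> list[str]:
    parts = PurePosixPath(start_dir).parts  # e.g. ('/', 'home', 'me', 'proj')
    dirs = []
    for i in range(len(parts) - 1, -1, -1):
        if parts[i] == "node_modules":
            continue  # never look inside node_modules/node_modules
        dirs.append(str(PurePosixPath(*parts[: i + 1], "node_modules")))
    return dirs  # nearest directory first


def resolve(name: str, start_dir: str, installed: set[str]) -> str | None:
    """Return the first installed package dir for `name`; `installed` mimics the filesystem."""
    for directory in node_modules_paths(start_dir):
        candidate = f"{directory}/{name}"
        if candidate in installed:
            return candidate
    return None
```

Because the nearest copy wins, two packages in the same project can load different versions of the same dependency, which is exactly the kind of per-project nesting a system-wide rpm/deb database isn't designed for.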

the maven model has worked quite well so far.

I'm not sure if it's a model thing or just that software has bugs, but lots of people really dislike Maven. Not sure it's an example of a good package manager.

Do they hate the package management or the build tool? Maven introduced a new package format (the Project Object Model, pom.xml) at the same time as introducing a very declarative build tool.

The really confusing part is that the two are conflated by design: if building with Maven, your project's pom.xml also contains the declarative Maven configuration used to build your package, instead of containing just the package metadata (name, dependencies, etc). But that's not required, and it need not be part of the *.pom XML file deployed in the published package.

Since its release, many build tools have been introduced that can consume and produce POM-compatible packages but don't require you to use Maven itself as your builder (Ivy for Ant, Gradle for writing build scripts in Groovy, SBT for Scala, etc.).

Just a note: you can use Gradle for Java too. I haven't built a Java project without Gradle since like 2010 or so.

Yeah sorry if I wasn't clear, in Gradle the build script is written in Groovy. It is used to build any number of project types.

Gradle build scripts can also be written in Kotlin.

I spent yesterday trying to get protobuffers working in maven, see http://vlkan.com/blog/post/2015/11/27/maven-protobuf/ for the pain.

Anything counter to maven's way is a PITA. In this case, Maven dislikes platform specific binaries.

The CLI is among the least intuitive I’ve ever used, and pom.xml is pretty verbose and complex. That being said, it works very well and I’ve never had a problem with it (other than its aesthetics) after grokking it.

Technically yes, on the usability part there's still a long way to go...

> globally on a system

In the world of containers? Come on.

You understand that you can install a deb/rpm inside a container, right?

The conservation of complexity is at play here. Either you have a quasi-complex deb/rpm, and a simple Dockerfile with a RUN command installing it, or you have a quasi-complex Dockerfile which does all the packaging work itself.

> You understand that you can install a deb/rpm inside a container, right?

That's exactly why I replied. deb/rpm in container is a reasonable choice, and that way, they aren't global for the system but local in the container.

Unless the existing ones would require radical changes to realize your vision, which would be hard to do in a gradual manner and probably lead to more problems than just writing something from scratch.

Well, package management often starts too simple. Sadly, most people who build new ones seem to ignore the lessons from earlier projects. In my opinion, Paludis/Cave [1] is a decent example of which cases have to be handled, but as it uses the ebuild/exheres system it is slow as hell. The user experience isn't that good either, but at least they know how to do proper dependency management (e.g. keeping the system as stable as possible during updates by using a reasonable order).

For anybody who wants to build a package manager, I highly recommend taking a look at what Paludis does (not necessarily how they do it).

[1]: https://paludis.exherbo.org
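Keeping the system stable during updates leans heavily on ordering: a package should be handled only after the packages it depends on. A minimal sketch of that idea using Python's standard-library `graphlib` follows; it does not reproduce Paludis's resolver, and the package names in it are made up.

```python
from graphlib import CycleError, TopologicalSorter

# Sketch: order package updates so each package is handled only after
# everything it depends on. Package names used with it are made up.


def install_order(deps: dict[str, set[str]]) -> list[str]:
    """deps maps package -> set of packages it depends on."""
    # TopologicalSorter expects node -> predecessors; dependencies are predecessors.
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        # err.args[1] holds the detected cycle as a list of nodes
        raise ValueError(f"dependency cycle: {err.args[1]}") from err
```

For example, with `app` depending on `libA` and `libB`, which both depend on `libC`, `libC` comes first and `app` last. A real resolver also has to weigh versions, slots and partial failures, which is where most of the hard work lives.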

I didn't use cave, but I remember using Paludis instead of Portage on Gentoo about 15 years ago, and it was really really fast at the actual package management tasks. The builds themselves take time of course, and especially for smaller packages the "./configure" part is infuriating, but that's nothing a package manager can fix.

In comparison to Portage it is fast, but I wouldn't consider Portage a benchmark ;-) and if you compare Paludis with Apt/Yum/Pacman and the likes it is still very slow (e.g. comparing installing binary packages).

The comparison isn't completely fair, as Paludis has a much better dependency resolution algorithm (in terms of quality), but I think the main difference comes from the package format. Ebuilds/exheres are Bash scripts which need to be evaluated in order to obtain basic information like for example build- or run-time dependencies. So to find out which packages should be updated, Paludis has to evaluate a few thousand bash scripts while other package formats allow much quicker algorithms.

Cave is just a newer front-end for Paludis which improves the user experience a bit, but the performance should be pretty much the same.

> Sadly, most people who build new ones seem to ignore the lessons from earlier projects.

This applies to FOSS in general, and then some...

I suspect that the problems NPM face are not qualitatively different from other similar systems but NPM is far larger so gets a lot more press.

No. The Maven ecosystem is also massive and I don't remember the last time I heard about problems with it.

The last couple of weeks especially. Packages being published but not updated on the main registry. Super super frustrating.

Since they introduced package locks, `npm i` breaks way too often for us (we have Windows, Mac and Linux machines); it's a royal pain in the back.

Your packages break when installed from a lock file...?

`npm i` breaks. Then we delete the lock file (and usually node_modules), run npm i again and it works. Probably it works for you and we are just unlucky or we were too lazy to use some cli magic, dunno. Yarn people here (at the company) usually say that yarn worked for them better, can't clarify about that.

Once, out of fun and curiosity, we flushed the cache, deleted the lockfile and the node_modules dir, ran npm i on a Mac and on Windows, and the results were undiffably different. Okay, this was months ago and there is probably a good explanation for it, but it was not reassuring.

Yarn breaks things all the time for me. Practically every build process issue I’ve had in the last 7 months has been Yarn related.

Never had `npm i` break though... does it fail if you do `npm ci`? Are you pinning versions? Very peculiar behavior.

Didn't try it, but I will, thanks.

what else in these "past few months" are you referring to? as a moderate npm user i am blissfully oblivious.

In the past few weeks:

* NPM 6 stopped working with Node 4. Rather than actually fix it, they just left it broken for several days because it was already fixed in the upcoming release, and I guess they didn't want to do an emergency release: https://github.com/npm/npm/issues/20716

* There was a several hour period where you could publish packages, but then they would 404 when you tried to download the new version.

* They switched to using Cloudflare on Friday, and broke Yarn in the process: https://mobile.twitter.com/jamiebuilds/status/10001984632696...

* Somehow while switching to Cloudflare they blew away a bunch of the packages that had been published during the previously mentioned window. They also blew away all the versions of some packages. Last Friday night you couldn't 'npm install gulp'. Never gave any explanation for this: https://github.com/npm/npm/issues/20766

About a month ago there was a bug where npm would change permissions on a bunch of system files and render your entire system unusable if you ran it as root. But that was OK because the bug was in the "next" version of npm, not the one you'd get by running 'npm install -g npm'... Except there was also a bug in the current version of npm that installed the next version of any package by default, so it did in fact blow up a bunch of machines.

Now, apparently, they are a teapot.

And the real problem with all of this is that npm has become so ubiquitous that you can't get away from it if you're doing client side development (not even by switching to yarn).

You simply have to endure it, and accept that every now and again, for reasons entirely beyond your control (and at a quite possibly very inconvenient moment) it's going to break.

This really annoys me, but what can one do? At least it's fixed now.

> This really annoys me, but what can one do? At least it's fixed now.

Install your own repository manager? This is the standard in every company I've worked for so far, at least in the Java world. Artifactory supports NPM, so set it up as a proxy.

We use ProGet with npm, docker and nuget feeds. We only ever hit the internet when we install fresh dependencies. After that, they get cached on our local service for all developers and machines to consume. That, coupled with yarn itself, which caches dependencies locally as well so subsequent installs don't even go out to the network, has accelerated our build times considerably.
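The "only hit the internet for fresh dependencies" behaviour is a read-through cache. A minimal Python sketch of the idea follows; it is not how ProGet, Nexus or Verdaccio are implemented, and `fetch_upstream` is a hypothetical callable standing in for a real registry client.

```python
from __future__ import annotations

from typing import Callable

# Sketch of a read-through package cache (the idea behind a local proxy, not
# any product's actual code). `fetch_upstream` is a hypothetical callable:
# (name, version) -> tarball bytes.


class ReadThroughCache:
    def __init__(self, fetch_upstream: Callable[[str, str], bytes]) -> None:
        self._fetch_upstream = fetch_upstream
        self._store: dict[tuple[str, str], bytes] = {}
        self.upstream_calls = 0

    def get(self, name: str, version: str) -> bytes:
        key = (name, version)
        if key not in self._store:
            # Cache miss: the only time we go out to the internet.
            self._store[key] = self._fetch_upstream(name, version)
            self.upstream_calls += 1
        return self._store[key]
```

Entries are keyed by exact version, so a cache like this keeps serving versions it has already fetched while upstream is down, but it cannot answer for a version it has never seen.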

That's interesting - thanks. Of course, it has a price tag, but so does the rest of our pipeline, along with our time.

As far as the monetary price of implementing such a system, Nexus OSS[1] also supports NPM proxying and is free for basic usage.

[1] - https://www.sonatype.com/nexus-repository-oss

We have Nexus in a subnet for faster installing. We had to write scripts porting over lockfiles from npm to nexus and back.

This had to be added to a pre-commit hook to avoid breaking CI. Seriously, package.json should allow specifying which endpoints should be used, if available, in a given order. Right now it's up to each dev team to handle.

My main concern is that it's brittle. Nexus caches exact versions and nothing more so we don't even have assurance that it will work nicely when NPM goes down.

On the other hand lockfiles are awesome. I missed them back in 2012... copying over node_modules on USB drives was not cool.
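The wish for a list of endpoints tried in a given order can be sketched as a simple fallback loop. This is not a feature of package.json, npm or Yarn; `fetch_with_fallback` is a hypothetical helper, and `fetch` stands in for whatever HTTP client you use.

```python
from __future__ import annotations

from typing import Callable, Sequence

# Sketch of "try these registries in order". Not an npm, Yarn or package.json
# feature; `fetch` is a hypothetical callable taking a registry base URL.


def fetch_with_fallback(
    registries: Sequence[str], fetch: Callable[[str], bytes]
) -> tuple[str, bytes]:
    """Return (registry, body) from the first registry that answers."""
    errors = []
    for registry in registries:
        try:
            return registry, fetch(registry)
        except OSError as exc:  # network failures (urllib's URLError is an OSError too)
            errors.append(f"{registry}: {exc}")
    raise ConnectionError("all registries failed: " + "; ".join(errors))
```

The catch, as the lockfile discussion shows, is that whichever registry answered ends up baked into the resolved URLs, so a fallback like this still needs a policy for what gets written to the lock file.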

You can specify what registry to use with a simple project-based .npmrc file. We have ours point to our Nexus npm proxy.

That's what we eventually do but the lockfiles don't care, a resource url is a resource url. We use yarn too which does proxies only in .yarnrc IIRC.

Do you have an externally available Nexus? Using ours through a VPN defeats the main purpose: fast(er) installs. That's why for WFH scenarios we have a script to switch between our proxy and npm.

Our Nexus setup is internal only. For WFH, we have hundreds of folks using a corporate VPN which routes to our office, and then our office routes to our AWS VPC, which is where our Nexus installation lives. I set this configuration up and haven't had any real issues with it, nor do I see any reason to switch between a proxy and npm.

If a developer is using an older buggy version of npm that doesn't respect .npmrc and changes a lock file to point back to npmjs.org entries, we deny the PR and ask for it to be fixed. Right now that check is unfortunately manual, but there are plans to automate it. It can be easy to miss at times, though, since GitHub often collapses lock files in PRs due to their size.

For us, the main purpose of using Nexus as a proxy is to maintain availability and to cache/maintain package versions. If you're using Nexus to make things faster, then you probably shouldn't be using it. If you want faster installs, look into using `npm ci`.
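The manual check described above (rejecting lock files that point back at npmjs.org) could be automated along these lines. A minimal sketch, assuming tarball URLs live in `resolved` fields as in npm 5+ package-lock.json files; the forbidden-host set and the function names are illustrative, not part of any npm tooling.

```python
import json
from urllib.parse import urlparse

# Sketch of an automated lock-file check. Assumption: tarball URLs live in
# "resolved" fields (as in npm 5+ package-lock.json); the host set is an example.
FORBIDDEN_HOSTS = {"registry.npmjs.org"}


def public_registry_entries(lock: dict) -> list:
    """Return every 'resolved' URL in a parsed lock file that points at a forbidden host."""
    found = []

    def walk(node: object) -> None:
        if isinstance(node, dict):
            resolved = node.get("resolved")
            if isinstance(resolved, str) and urlparse(resolved).hostname in FORBIDDEN_HOSTS:
                found.append(resolved)
            for value in node.values():  # recurse into nested dependency trees
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(lock)
    return found


def check_lockfile(path: str = "package-lock.json") -> bool:
    """True if the lock file has no entries pointing at a forbidden host."""
    with open(path, encoding="utf-8") as fh:
        return not public_registry_entries(json.load(fh))
```

Walking the whole JSON tree rather than a fixed key path keeps the check working across lock file layouts, at the cost of being a little less precise.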

Nexus OSS can't be clustered / put in a highly-available install, which is a paid feature for Nexus.

To ensure that you're actually deriving benefit from your Nexus install, you have to block outbound connections to the NPM public registry from your CI build agents (if you don't firewall it off, you don't want to wake up one day and find that both origin and the proxy are erroring because your proxy never actually cached anything and you never tested your proxy... right?), with only the Nexus installation permitted to make such outbound connections. And as bad as NPM may be, there are real maintenance costs to running your own Nexus install (not least of which is managing updates that will take Nexus down, and communicating them to your dev team so that CI builds which error out while Nexus is down can be restarted when it comes back up), and thinking that you can do better than NPM is hubris. Running a private Nexus OSS install to try to increase availability at low cost (not zero: you still have to pay the infrastructure costs) is usually a false economy.

If you work for a company with enough operations and infrastructure resources that adding a clustered install is trivial, then you probably have enough resources to pay for an Artifactory license.

TL;DR: NPM has its faults, but it's still probably de facto both more available and better updated than a proxy you take on the responsibility of running yourself, unless you have mature ops/infra teams.

This one is free, we use it and it works for our needs. We also have it integrated with our Active Directory: https://www.verdaccio.org/

We use Verdaccio and I can't imagine any serious dev shop not using some kind of proxy / private registry for NPM packages. It's really simple to set up and has served us well, aside from minor hiccups.

Sinopia was used at a previous job https://github.com/rlidwka/sinopia

Just to save other users some time, it seems like sinopia is no longer maintained, and doesn't work on Node 8


Verdaccio, mentioned in a sibling comment, seems to be the recommended replacement...


The OSS edition of Nexus supports npm proxies. Sure there’s a little bit of setup, but it will more than pay for itself the first time an event like this one occurs.

Agreed. It's very easy to setup a private npm registry using Nexus OSS.

As others suggested, any serious deployment should be hosting their own registry / mirror or using a paid service. This also saves you in cases such as the left-pad issue, as your mirror would still have the package. It is unwise to rely on free third-party services which are out of your control, especially for something as important as deployment!

This problem is not limited to npm. I remember a few years back there were similar issues with RubyGems, where it'd go down leaving many developers unable to deploy.

Heck, how many projects do you think would be left unable to deploy if GitHub went down? I remember a few years back they'd have their occasional issues and Twitter would become a storm of angry developers.

For many projects not being able to deploy or develop at any moment, as well as dealing with left-pad style issues is implicitly accepted as a reasonable trade-off.

>This really annoys me, but what can one do? At least it's fixed now.

The weird thing is, open source development is as close to a free market as one can get. Frustration with NPM should result in multiple javascript package managers competing to undermine NPM's dominance, but the only alternative is one that uses the Node registry.

Vendor your dependencies, or host clones on infrastructure that you're responsible for.

Vendoring isn't ideal: it makes, for example, code-reviews a PITA, and you still have the problem of what to do when you upgrade or add a dependency. Then you're back to npm (or yarn). Granted, at much reduced frequency, but it's still there.

Maybe we'll go down the clone route if the cost of npm issues becomes too high relative to the hassle and expense of maintaining our own.

>Vendoring isn't ideal: it makes, for example, code-reviews a PITA, and you still have the problem of what to do when you upgrade or add a dependency.

The status quo in which anything that goes wrong breaks the entire universe because no one vendors is also not ideal.

However, consider the long list of disasters that have occurred with NPM recently, and how many of them would have been less disastrous had vendoring been the exception rather than the rule. Left-pad wouldn't even have been an issue, for example - builds simply would have been unable to update, but nothing live would have been affected.

To misquote Ben Franklin here, the tradeoff isn't between a greater ideal and a lesser ideal, but between security and convenience.

That's why I really like Yarn's "offline mirror" feature. It lets you cache the tarballs themselves locally. There's also a tool called `shrinkpack` that lets you do the same when working with NPM itself, but I'm not sure if it works right with NPM5/6.

I wrote a post a while back about using Yarn's offline mirror: http://blog.isquaredsoftware.com/2017/07/practical-redux-par...

One way to keep vendoring from causing nightmare code reviews is to keep updates to vendored components in their own release/branch/PR. Yes, there will be changes to your code to accommodate the updated packages, but it keeps it very focused so that the code review isn’t also trying to evaluate updates to business rules or functionality.

If you are updating components along with business rule updates, then, yes, it’s going to complicate code review, regardless of vendoring.

Vendoring dependencies typically entails committing at least 200-500 MB of packages into a git repo. No thanks. Availability should be easy to control with running your own Nexus or other internal registry. The rest (package versions, etc) can often be solved with an npm 5+ lock file.

> you can't get away from it if you're doing client side development (not even by switching to yarn).

Explain yourself better or stop spreading this nonsense. What do they give you that you couldn't develop without them?

This is true. The case I encountered was, you cannot install `aws-sam-local` with yarn, you will need npm to do the install.

I don't think it's fair to blame them for breaking yarn. The way yarn setup their "mirror" url and the way Cloudflare handles account security conflicted and broke down yarn. I really don't see how any team could have seen that coming, or how they should be responsible for people making creative use of their infrastructure.

Everything else is legit though.

Well, they could have thought, "Will this large infrastructure change affect any of the tools which depend on our infrastructure?", set up a small test system using Cloudflare, and then tested a few tools (like Yarn) against the new system.


nomnom is no longer maintained, so npm takes the project over and publishes a 2.0.0 update which is empty besides a README saying it's deprecated. That breaks some packages which depend on a '>=1.2' version range. So npm deletes 2.0.0, which breaks things for people whose installs had already updated to 2.0.0, even when they'd had a system working with 2.0.0 (not even using nomnom, just a dependency of a dependency for the command-line portion that isn't being used) and were ready to deploy.

In Rustland you can't publish a crate with '>=1.2' dependencies

I’m pretty sure you can, I thought we only outlaw *.
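The nomnom breakage comes down to range semantics: `>=1.2` is open-ended and happily accepts a brand-new 2.0.0, while a caret range such as `^1.2.0` stays within major version 1. A simplified Python sketch of both comparisons follows; it is not node-semver, handles only full numeric versions without prerelease tags, and supports only these two range forms.

```python
from __future__ import annotations

# Simplified semver sketch (not node-semver): numeric versions only, no
# prerelease tags, and only the two range forms discussed above.


def parse_version(text: str) -> tuple[int, int, int]:
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (3 - len(parts))  # ">=1.2" means ">=1.2.0"
    major, minor, patch = parts
    return major, minor, patch


def satisfies(version: str, range_: str) -> bool:
    v = parse_version(version)
    if range_.startswith(">="):
        return v >= parse_version(range_[2:])  # open-ended: any later major matches
    if range_.startswith("^"):
        low = parse_version(range_[1:])  # assumes a full x.y.z after the caret
        major, minor, patch = low
        if major > 0:
            high = (major + 1, 0, 0)
        elif minor > 0:
            high = (0, minor + 1, 0)
        else:
            high = (0, 0, patch + 1)
        return low <= v < high
    raise ValueError(f"unsupported range: {range_}")
```

Under these rules `satisfies("2.0.0", ">=1.2")` is true while `satisfies("2.0.0", "^1.2.0")` is false, which is why a deprecation-only major release only bit packages using the open-ended form.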

Not OP, but maybe he is referring to the bug in which the npm upgrade bumped to a non-ready version and caused havoc on production systems by overwriting permissions and files: https://github.com/npm/npm/issues/19883

That was the end of February, so not sure but that may be it.

There was also the issue where npm filter bots accidentally expelled some large, legitimate users and then made the namespace available, thus allowing random people to take the packages over, and inject code into any applications that used those projects.

This is one of the scariest aspects to modern development. Thousands of applications are one bad actor away from ruin. We're incredibly vulnerable. Chrome extension gone rogue? Package repository allowed duplicate packages? A contributor's GitHub key was hijacked?

Imagine uBlock Origin's Chrome extension author creds were hacked. "He" publishes a new version of the Chrome extension that monitors coinbase.com and fakes the transfer/confirmation screen, or submits transfers in the background. The extension has "write" access on all sites, so the rogue extension can also monitor your Gmail and silently inject a filter that routes trade confirmations to trash.

Or the "requests" library in Python gets an update to replicate 2FA codes via Twilio to a 3rd party.

Sure, you can do pinning and cryptographic signatures to verify that v1.0.0 of X is really what you expected.

But who audited 1.0.0 of X in the first place...?
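Pinning plus a content hash is roughly what npm lock files already record: each entry's `integrity` field is a Subresource Integrity string, typically `sha512-<base64 digest>`. A minimal Python sketch of computing and checking such a string follows (single-hash strings only; real SRI values may list several). As the comment points out, a match only proves you got the same bytes, not that anyone audited them.

```python
import base64
import hashlib

# Sketch: compute and check Subresource Integrity strings of the form
# "<algorithm>-<base64 digest>". Single-hash strings only; sha1 is accepted
# because some older lock file entries use it.
SUPPORTED = {"sha1", "sha256", "sha384", "sha512"}


def sri(data: bytes, algorithm: str = "sha512") -> str:
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def verify_integrity(data: bytes, expected: str) -> bool:
    algorithm, _, _ = expected.partition("-")
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported integrity algorithm: {algorithm!r}")
    return sri(data, algorithm) == expected
```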

My thoughts exactly.

We are one step away from very bad shit hitting the fan in a very painful way... so let's pretend everything's fine and try not to think about these things.

When a production system pulls "latest" of anything, such things are bound to happen. There probably is no reliable testing or QA either when deployment methods are not deterministic.

Whenever I see NPM I think of the left-pad debacle and the more recent is-odd issue:


From the link:

So in case you weren't aware: If you're using webpack, you're using is-odd. Which itself relies on the excellent package is-number.

So, how and why did the developers let things get this bad?

I think there's a feedback loop in the JS ecosystem where the desire for smaller file sizes and the ability to pare down the fluff around a packaged function in webpack/whatever incentivize lots of small single purpose modules.

If you care about the size of the code you are delivering then it makes sense to skip some larger math/numerics library and instead opt for is-odd, which itself includes is-number.

Unfortunately, when the authors of modules also do this, and not just end users, it becomes hard to track down how much a dependency actually adds, and the chain of dependencies can get very deep. Once that chain is deep, we lose the ability to easily determine how a change affects the complex system we now rely on.

And that's all really at a surface level. Once we start talking about trust and security, the problem gets much worse.
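One way to see how much a single dependency actually adds is to walk the dependency graph and record how deep each package sits. A small breadth-first sketch in Python follows; the graph is hypothetical (only is-odd and is-number are real package names, and the edges are illustrative), and real data would come from a lock file or `npm ls`.

```python
from __future__ import annotations

from collections import deque

# Sketch: measure what a dependency really pulls in by walking the graph
# breadth-first and recording each package's shortest distance from the root.


def transitive_deps(graph: dict[str, list[str]], root: str) -> dict[str, int]:
    depths = {root: 0}
    queue = deque([root])
    while queue:
        pkg = queue.popleft()
        for dep in graph.get(pkg, []):
            if dep not in depths:  # first visit in BFS = shortest depth
                depths[dep] = depths[pkg] + 1
                queue.append(dep)
    del depths[root]
    return depths


# Hypothetical graph: "my-app", "bundler" and "glob-helper" are made-up names.
graph = {
    "my-app": ["bundler"],
    "bundler": ["glob-helper", "is-number"],
    "glob-helper": ["is-odd"],
    "is-odd": ["is-number"],
}
deps = transitive_deps(graph, "my-app")
print(len(deps), max(deps.values()))  # prints: 4 3
```

Even in this toy graph, a one-line direct dependency brings in four packages three levels deep, and nothing in the app's own package.json hints at that.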

Couldn’t this just be done at transpilation time?

I have a (conspiracy) theory that JS developers bribe and pay each other to include their projects as useless NPM dependencies, so they can rack up downloads and go to the next employer with a "My package has 2 million downloads, I want x% more than market rate" attitude.

It's like the equivalent of influencer marketing, but for code packages. Of course I could just be a crazy old man. Who knows?

I don't know if they bribe each other, but there's definitely a lot of people shilling garbage libraries because they have income streams dependent on it. Certain people here have alerts set up for certain topics so they can come in and astroturf. You'll be able to spot them if you keep a look out for it.

They specifically target noob questions/discussions and post links to their blogs that talk about certain dependencies. The articles themselves have very little substance; they're full of buzzwords and spend all their words making ambiguous claims. For example, they'll say that the dependency makes your code "scale", but won't give any examples from a large code base, especially not any comparing against the alternative (without the dependency). They make claims about performance, but don't elaborate and don't show benchmarks (or they'll show benchmarks that don't apply to 99% of projects, kind of like how immutableJS is obviously not faster unless you're working with big data structures, which you almost never are in your standard CRUD app where everything is paginated server-side).

This shit is everywhere, and there's a lot of people with a vested interest in making sure these tools propagate as much as possible.

Spill the beans! What are these blogs ? Or are you afraid of retribution from this shadowy cabal?

I have seen very few examples of what you describe, if any at all. I wouldn't rule out the possibility that this is because I've missed them, and I certainly wouldn't mind getting a few clues/pointers.

My comment above (about the bribery) was a joke. I am surprised that people took it at face value.

But joke or not, in no other language would you see packages like "is-true", "is-false", "ifAnd" (because && is too difficult /s) and "ifOr" (because... what's that "ll"... wait, is it "II"? Um... How do I type that vertical || line thing again?? /s).

The problem is not with users creating "useless" packages, the problem is with the community embracing them and having completely redundant one-liners gather million of downloads every week.

It just feels... icky. As if the whole thing is put together with duct tape and bubble gum. We didn't resort to these kinds of things when we had to support IE5.5 and 6. What went wrong from 2010 till today?

Seeing "is-true" rely on "is-object"... Why? Seriously, why? Maybe I am being completely narrow-minded and bitter.

If that is so, please teach me and I'll try to be better. I am not trying to provoke, only to learn.

Not your fault, I'd say a good percentage of people here aren't great at reading tone, on top of how terrible text is at conveying tone in general. I definitely read the tone wrong :)

I think it's easier to answer in reverse order. DISCLAIMER: This is part rant, part "projected-constructed worldview". I absolutely welcome balloon-popping and "that's absolutely not true" and pointers to things that debunk/disprove points, so I can have a proper idea of history, and a better personal idea of the way things are.

First of all: the Web is "new media". Yep, fussy annoying buzzword; but one that has some truth to be discovered in it.

The position/power/operations/general relevancy of companies like Google, Mozilla, Facebook, Amazon, etc, more closely resembles that of CNN, Fox, CBS, Disney, etc around the 1995-2002 timeframe, than AT&T, AOL, CompuServe, Verizon, Comcast, etc, in the same time period. These modern companies have taken up the slack of the old TV/cable media empires of recent yesterdecades.

The Web has taken the place of TV, not by replacing it, but by becoming the new media focus.

It's a good side-illustration to mention that the Web is the perfect example of something with "Shiny Actually-New Thing Which Seriously Truly Isn't Anything Like What Came Before It" syndrome, for exactly the reason I mentioned in italics: the Web, for most people, looks nothing like TV.

I think one reason for this is the technological generation gap: older people don't fluidly relate to computers, while the younger generation don't have quite the same perception/appreciation of non-interactive TV that their elders do, so while there is some collective conversation about media agendas, it doesn't bridge the generational divide that coincidentally happened with TV>Internet, and so people don't realize the _exact same_ control agendas that went into TV now go into the Web.

But technical folks who've seen the Web grow, and change, wonder: what's happening? My working theory is that the technical changes we're seeing stem from various upstream political machinations within media.

So, to the 2nd half/2nd question. As a "new media" company, Google wants to retain captive control of as much as possible, in exactly the same way $oldmedia would. So what do they do?

Well, back in the 80s and 90s you'd run cable to the entire country to control distribution, cultivate deals and negotiations with advertisers and producers, etc etc. See Also: CNN/MSN/MSNBC/CNBC (I never did figure out what acronyms went where with those companies), Comcast (which I understand doesn't just provide garbage cable/DSL internet, but is also a TV and media network), Time Warner, etc.

So, nowadays, Google's pretty much nailed the advertising thing, undoubtedly to the chagrin of everybody else. On top of that they handle the world's searches - and you never lie to a search engine (!), and that's the world's biggest non-perceived captive audience ever right there.

But they're GOOG; they legally have to vacuum up as much of everything that everyone's perception of them supposes is within their purview, in order for the balance sheets to look right. That most definitely includes Web standards. So, how do you control Web standards?

Well, you apparently approach your cofounder and say you need to build a browser.

I wonder what the friction breakdown was - how much political, how much technical - when Google split with Apple working on WebKit, and forked/created Blink. Heh, I'll never know. But the commit rate sadly plummeted after the fact (http://mo.github.io/2015/11/04/browser-engines-active-develo...) so code quality certainly wasn't on the decision table; something else was the priority. My guess is that it was straightforward market dominance, and competing against WebKit (and iOS!).

So, (having won) one Internet (marketshare) later, now what? Well, now that you have a browser, you can play in the "democratic browser vendor" sandpit as a first-class citizen! The Standards® Committees™ are full of people who absolutely love discussing Web minutiae all day; all you have to do is fund conferences and shows for them to attend, and they'll go ahead and build their own candles to fly around. They're like the self-organizing equivalent of https://xkcd.com/356/; it's wonderful.

They do need feeding though; specifically, you need to feed them just slightly too much in terms of upstream changes, so they can almost keep up but not quite. Just on the edge of being able to manage everything. This is easy to do by adjusting the paid-employees slider rather than the workload slider; and you can indirectly do this by doing nothing to democratise/vindicate/humanize the standards management process at the W3C/WHATWG level, so it seems obscure and academic and boring; then it won't seem intuitive for upper management to prioritize funding for it, and voilà.

Now you just need to find a good source of chaos to keep everybody on their toes. You do this by investing heavily in coming up with new ideas for the standards committees to implement, keeping careful track of what everyone's discussing, and prioritizing what to carefully drop quietly on the floor ("oh yes we do need to get back to that sometime sorry about that") and what to react to as an interested browser vendor. Be polite enough about other vendors' technical interests, of course; your market share will typically dictate what'll actually get maintained at the end of the day. Pick things to cooperate on to keep everything looking democratic, so you don't get cornered into implementing things you don't want to (if this even happens?).

Oh, regarding the democratic bent that seems to underpin everything related to the Web (and open source for that matter): I consider it all a psyop (I was recently reminded of the existence of this sort of thing by https://news.ycombinator.com/item?id=16918752). Focusing on social organization (and justice and related fields, FWIW) are excellent ways to sink/drain massive amounts of energy; get humans into that loop and they don't seem to be able to disentangle themselves thanks to peer pressure and the precedence that gets created. (This is why "minority groups", typically the gender-focused ones, are in the permanent media spotlight - it allows control.) Democracy is a foundational part of the US Constitution, so of course democratic leadership is a very politically correct way to do "high-level" "important" things like directing the future of the Web; on top of that, democratic "leadership" is the kind of thing most people see straight through unless someone pokes the hologram the right way (especially when everyone's preoccupied with getting their social dance moves right), so I can easily see it being used as a tool in situations like this.

In the case of Web standards groups it was particularly easy to deploy "democratic leadership" - the Web groups seem to have been engaged in a most fascinating set of caricatured circling/courting "everyone's voice is important" dances since time immemorial. I was enlightened about the historical state of the Web a couple years ago: https://news.ycombinator.com/item?id=10684426 (I accidentally locked "i336_" by setting "showprocrast" too high, woops). Besides the antitrust bits I commented on, I've also since learned that IE6 implemented an early draft version of the DOM (or it could have been the CSSOM) before it got redesigned from scratch, then the necessary reimplementation work was never approved, and this was the source of all the rendering bugs. (I need to go re-find that link, sorry!)

Reading through those mailing list posts showed me that the various Web working groups have always been... I'll say it, somewhat stuffy, and this fact is not a new development. It's sadly so easy to psyop-weaponize though (or, in any case, the baseline likelihood that this has happened is so incredibly far away from absolute zero...), as per the xkcd I mentioned before. Meanwhile, you carefully build EME behind the scenes (the implementation of which is kinda depressing; see zb3's reply to https://news.ycombinator.com/item?id=15796420).

Unfortunately some people run off and actually go and make interesting and amazing things in the face of great odds, for example https://codepen.io/AmeliaBR/post/me-and-svg (<> https://news.ycombinator.com/item?id=14155393). It seems those who stubbornly want to succeed are handled by ignoring them and hoping they'll go away. Or (https://en.wikipedia.org/wiki/Hanlon%27s_razor) maybe whoever is calling the shots is so laser-focused that all they have is perfect indifference, since these cool things are not within the scope of their media/other agenda. An intersection of explanations is valid. But the features these amazing people come up with of course get used in any case, IMO without due acknowledgement too.

Most Web developers seem to get caught up in the entropy and chaos of continual change and development, get blind{sid}ed by it, and don't really [get the chance to] form a foundation strong enough to get them out of the slipstream, looking far afield, and doing something different and unique. Commercialism and existential "must have a job to eat" probably has a big part to play in folding in to this, along with mental health impacting society's ability to think clearly, along with all the other issues the various in-vogue generations apparently focus on.

To do one of those zoom-from-outer-space-into-a-tiny-backyard things and zoom/focus from this wide view into NPM packaging, I actually argue that the state of the JS ecosystem is easily explainable by saying that the Web doesn't have a standard way to do libraries, so everyone solves that problem by vendoring (including/bundling) everything they use. This solves everything for Node.js and the browser.

> Continued >

Fundamentally, on the ground, you're looking at economy of scale, both in terms of implementational scale and in terms of humanity collectively responding to a problem at scale and slowly annealing (if I can borrow that from "simulated annealing") toward a solution that just happens to work. The macro level works out; the micro level looks terrible. Such is how this kind of thing works out; social scale/crowd dynamics are _really_ fascinating to watch at work.

Node.JS happened, so suddenly JavaScript needs to run on servers and in browsers, and handle backend (Databases! Legacy stuff! FoxPro! How To Port Your Visual Basic Excel Macro To JavaScript! I just found two libraries for RS485!), along with frontend (Win32 event loop inspired Rube Goldberg machines!), and everything in between (......uhhhh...... all of GitHub?). The ecosystem and infrastructure - all the library code - has to be compatible with everything and everything else, across more domains than I even realized existed until I thought this through 5 minutes ago. The easiest way to work in the browser and also work on the backend apparently is to fragment everything out, because nobody's come up with a better way to stdlib JavaScript. The jQuery-esque approach of "this system works unbelievably well within its problem domain but doesn't generalize" effectively died when jQuery aged beyond its point of relevance - and because the plugin system was so locked-in to jQuery, all the plugins died with it. That type of approach is very very top-heavy and simply didn't scale. It made sense when all we had was browser [delivery] though, backend types hadn't gotten their mitts over everything and made compilation pipelines for everything, etc. Now that's happened, it makes more sense to `cat * > out.js` (well, in more LOC, anyway) because it just scales better.

JS feels clunky, but once you look all the way down, I feel like the vertigo actually causes things to kinda make a tiny bit more sense and fit into place a bit better. Maybe.

I find it not unreasonable to use the explanations above to justify the higher-level "why???" of the way things are.

The duct tape and bubble gum isn't JS-specific. It's endemic to all of technology. Like with media, there are vested interests in things working out certain ways with security and whatnot. (My current favorite meme of this is http://web.archive.org/web/20160409181051/http://article.gma..., and I also found https://news.ycombinator.com/item?id=17160109 the other day and that was kind of very fun to read, and I also read the other day that the NSA discovered Heartbleed "very shortly" after it "appeared" in 2012 or whenever it was, which was a TIL of its own)

And so it is that we use systems that operate at cross-purposes with how we think and incorporate high-friction impedance mismatches, and which raise the chances we'll make mistakes, which affect everyone: https://twitter.com/x0rz/status/865196993215442944, https://sites.google.com/site/testsitehacking/

The above being said, I do completely understand your frustration with using Node. I have the same gripes with PHP, which I learned ages ago and Really Really need to move off of, but it just runs so incredibly well on older hardware and has zero warmup time... like 0.2ms on my 12 year old laptop... sigh. One day I'll find something, or maybe I'll make it myself.

I'll leave you with three last links which I couldn't fit into the narrative, but which I think are relevant/related.

Firstly, this is an interesting article that helped me properly understand Google: https://medium.com/p/e836451a959e

Next, I wanted to call out this NNTP post from 1995 (linked from "sent out via" in the above) showing storage investments: https://groups.google.com/forum/#!topic/mail.cypherpunks/4CD...

For context about cost in the above link: I found this very cute little document while randomly manual trawling (I love finding things not in Google's index) whose HTTP headers say it's from 1996: http://users.monash.edu.au/%7e%6a%6f%68%6e%6d/%77%65%62%73%6... (#1 I don't want its filename to be indexed just yet :P and #2 I want to do an experiment to see if the crawler resolves the escapes and finds the URL anyway)

NB. If you have a better term than "new media", I'm curious. It's such an easy term to overload. The symbol picked for biohazard was designed from scratch to be meaningless so a meaning could be cultivated for/associated with it over time, and that cultivated meaning would be what "stuck". (I wonder how many other things this has happened with.) https://en.wikipedia.org/wiki/Hazard_symbol#Biohazard_symbol

NB2. If you - say - accidentally hit F5 while writing a post... and you're on Linux, suddenly get really really interested in opening Chrome's task manager, identifying the renderer PID and SIGSTOPping it in as few seconds as you can. The first ~15% of the above text was recovered via gdb generate-core-file \o/

NB2a: (For completeness: if you accidentally close the tab, hibernate (assuming you have working hibernation set up), then either go fishing in the swap partition on resume, or, if you don't want to risk stepping on anything in swap on reboot, carefully boot something that won't automatically swapon.)

NB3: $lists_i_am_on += 5000 :P

Having just js experience on your resume would/should not effectively result in increased pay without substantial references. Even if you have a shitload of package downloads. Downloads do not represent good code, work attitude or otherwise interesting properties for a company, as you already pointed out by stating the 'fake' downloads for a lazy engineer with lack of respect for others (by forcing dependencies with dubious benefit).

NPM is absolutely the worst package manager, apart from all the others I have tried.

I'm pretty sure it's worse than most others I've tried.

Relevant comment by wgrant:

    The body of the 418 is {"error":"got unknown host (registry.npmjs.org:443)"}.
    Looks like some npm clients are appending the port to the Host
    header, but only when going through a proxy, and that's confusing
    the registry.
It seems to depend on the combination of npm and Node versions.

Can confirm:

    $ curl -vvv -H 'Host: registry.npmjs.org:443' https://registry.npmjs.org:443
    > GET / HTTP/1.1
    > Host: registry.npmjs.org:443
    > User-Agent: curl/7.54.0
    > Accept: */*
    < HTTP/1.1 418 I'm a teapot
    < Date: Tue, 29 May 2018 03:52:16 GMT
    < Content-Type: text/plain;charset=UTF-8
    < Content-Length: 53
    < Connection: keep-alive
    < Set-Cookie: __cfduid=d3f8dd8d2121ede348194ee142443bccc1527565936; expires=Wed, 29-May-19 03:52:16 GMT; path=/; domain=.registry.npmjs.org; HttpOnly; Secure
    < Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
    < Server: cloudflare
    < CF-RAY: 422601de4bf49fc6-IAD
    {"error":"got unknown host (registry.npmjs.org:443)"}
Per RFC 7230, §5.4 (https://tools.ietf.org/html/rfc7230#section-5.4), the port is optional if it's the default for the URI (:80 for http, :443 for https), but nowhere in the spec does it say it's an error to include a redundant port specifier. The registry server is likely noncompliant here, and it definitely should not be throwing 418 here. (400 would be appropriate if the Host header was malformed - but that's not even the case here).
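A minimal Python sketch of the lenient handling that §5.4 permits: treat `registry.npmjs.org:443` over HTTPS as the same authority as `registry.npmjs.org`. The names `normalize_host`, `is_known_host` and `KNOWN_HOSTS` are hypothetical, chosen for illustration; this is not the registry's actual code, and IPv6 literals are deliberately out of scope.

```python
# Default ports per URI scheme (RFC 7230 §2.7.1 and §2.7.2).
DEFAULT_PORTS = {"http": 80, "https": 443}

# Hypothetical set of virtual hosts this server answers for.
KNOWN_HOSTS = {"registry.npmjs.org"}


def normalize_host(host_header: str, scheme: str) -> str:
    """Lower-case a Host value and drop a port equal to the scheme default.

    Assumes a hostname or IPv4 address; bracketed IPv6 literals are out
    of scope for this sketch. A non-default port is kept, because it
    names a different authority.
    """
    host = host_header.strip().lower()
    name, sep, port = host.rpartition(":")
    if not sep:
        return host  # no port given at all
    if port == "" or (port.isdigit() and int(port) == DEFAULT_PORTS.get(scheme)):
        return name  # empty or redundant default port
    return host


def is_known_host(host_header: str, scheme: str = "https") -> bool:
    """Compare the normalized Host value against the known virtual hosts."""
    return normalize_host(host_header, scheme) in KNOWN_HOSTS
```

With this normalization the `Host: registry.npmjs.org:443` request from the curl transcript would match; a value that still isn't recognized could then get a 400 with a descriptive body rather than a 418.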

Sure, but why is anything in the chain returning the 418 "I'm a teapot" error message?

My guess is someone wanted to use some code for "I don't really know what this error is", saw the 418 and thought "that's cute!"

Be careful, "cuteness" and "cleverness" has a way of biting you in the behind.

As per https://tools.ietf.org/html/rfc2324, the server is throwing 418 because it is a teapot, and can't brew coffee. Perhaps this is a coffeescript related problem.

I’m not sure if this is really compliant with that spec either. It states that error code should be returned when attempting to brew coffee - which means the BREW or POST method is used - but this is using the GET method.

Has a BREW call been used? The spec might not support out of order operations. That should be tried before any further GET attempts against the coffee interface.

Ha! I used to use 418 as my "everything is completely fucked and there's no way everything will ever get that fucked, so this is the best code possible" status code.

It took about a year for me to internalize that things will often be that fucked. :)

500 “Internal Server Error” is the code you should be using for generic server errors.

It's important to say why that's the correct error code. In this case, it's because 5xx errors can be retried in the future whereas 4xx errors tell the client that any retries will receive the same response.

In general, this is wrong. A 4xx status may be retried, achieving different results each time. This makes sense for, e.g., statuses 409 and 429, where it is definitely expected, or 404, since resources may be added or removed at any time independent of any single client, or 402 and 403, which may change out of band.

Conversely, there are 5xx statuses that you shouldn't expect to change with each request, for example 505 (HTTP version not supported) or, really, 500 in some cases.

There's AFAIK no standardized strategy to deal with the different status code. Instead, it's important to get the status code right, to describe the error to the client as accurately as possible. The server never tells the client what to do; it tells it what went wrong so that the client can deal with the error at its own discretion. In this case the server told the client that it was a teapot when what actually happened was an internal server error. That's wrong because it's wrong, not because every client will or even should deal with either of these errors in a predictable way.

> There's AFAIK no standardized strategy to deal with the different status code

While I was a bit too over-broad in my generalizations of 4xx vs 5xx status codes when it comes to retries, the gist is still the same. The specification does specifically call out when it's okay to retry and when it isn't. And it also specifically calls out when it's okay for intermediate servers and browsers to cache responses and when it isn't. In the case being discussed, 500 is specifically the correct error code because it cannot be cached and can be retried. That's not the case for 409 or 429, which either cannot be retried automatically (409) or can be cached (429). 402 is, IIRC, underspecified and shouldn't be used. 403 specifically states that the request should not be retried.

It's important to make a distinction between the user/application and the user agent when we talk about retries. The user agent is the program or library that implements the HTTP spec. You seem to be talking about the former, who is always capable of retrying a request when they feel circumstances have changed enough for the request to succeed. But user agents are intentionally dumber. They can't resolve conflicts, fix permissions or do anything outside of the logic dictated by the spec. And they have much more limited license to retry 4xx responses or cache 5xx responses.
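To make the user-agent side concrete, here is a hedged Python sketch of one conservative automatic-retry policy: retry only idempotent methods, only on a hand-picked set of statuses treated as transient, and honor a numeric `Retry-After`. The `TRANSIENT_STATUSES` set is an assumption and a policy choice, not something the spec mandates, and `retry_delay` is a hypothetical helper.

```python
import random
from typing import Optional

# Methods RFC 7231 §4.2.2 defines as idempotent; only these are
# retried automatically in this sketch.
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}

# Policy choice (an assumption): statuses treated as transient.
# 418 is deliberately absent, like most other 4xx codes.
TRANSIENT_STATUSES = {408, 429, 500, 502, 503, 504}


def retry_delay(method: str, status: int, attempt: int,
                retry_after: Optional[str] = None,
                max_attempts: int = 3, base: float = 0.5) -> Optional[float]:
    """Return seconds to wait before retrying, or None to give up.

    `attempt` is the number of retries already made.
    """
    if attempt >= max_attempts:
        return None
    if method.upper() not in IDEMPOTENT_METHODS:
        return None
    if status not in TRANSIENT_STATUSES:
        return None
    # Honor a delay-seconds Retry-After; the HTTP-date form is ignored here.
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after.strip())
    # Exponential backoff with full jitter.
    return random.uniform(0, base * (2 ** attempt))
```

Under this policy the registry's 418 is never retried automatically, whereas a 500 or 503 on a GET would be.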

This is honored more in the breach than the observance at times, though, particularly with 404.

I'm sorry that I wasn't more clear. The person I was replying to aptly wrote "'cuteness' and 'cleverness' has a way of biting you in the behind", and I was agreeing with/adding to that comment. I've been guilty of cleverness and sure enough, it has usually found a way to get me in the end.

Today, in similar situations, I'd return a 500 with a message in the body.

No, it's not: since this is an error on the client's side, it should be a 4xx code, though probably not 418.

Yes a 500. Not sure how anything else would ever pass the smell test of even a lazy pull request.

I've seen the teapot error used when the internal route of a request was circuitous (many servers involved) and a 500 could have come from anywhere. It was used as a canary, and only on this codebase, which was bought-in and absolutely awful.

Actually, 503 is better in this case. Internally it should report 500.

500 is the correct error code to use in that case.

It's 4xx so it should be used only to represent an issue with the request/client, not the server :)

> Sure, but why is anything in the chain returning the 418 "I'm a teapot" error message?

My best guess? The request ended up at a teapot and the poor thing was asked to do something that no teapot has ever done before.

> Be careful, "cuteness" and "cleverness" has a way of biting you in the behind.

Generally speaking I tend to agree, but in this specific case how is sending a 418 "biting you in the behind" worse than sending a more proper 500 error? (Or perhaps a 400? Hard to say exactly, feels like the server is having unexpected errors since it cannot properly parse the hostname.)

If I'm doing quick generic error handling, I'm going to assume anything in the 500 range is a server issue and display an error of the sort. If it's in the 400 range I'm going to handle the common ones and then assume it's the client's or user's fault. (And of course a user fault is really a client validation fault.)

So yes, returning a 4XX for a 5XX is problematic.
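A sketch of that kind of quick, generic handler in Python shows the practical cost of the teapot. The `describe_failure` helper and its messages are hypothetical; the point is that 418 falls into the catch-all 4xx branch and tells the user to fix a request that was actually fine.

```python
def describe_failure(status: int) -> str:
    """Map a status code to the message a generic client might show."""
    if 500 <= status <= 599:
        return "server problem: try again later"
    if status in (401, 403):
        return "check your credentials or permissions"
    if status == 404:
        return "not found"
    if status == 429:
        return "rate limited: slow down"
    if 400 <= status <= 499:
        # Every other 4xx is treated as "your request was wrong".
        return "bad request: fix the request and try again"
    return "unexpected status"


# The registry's 418 lands in the client-fault branch.
print(describe_failure(418))
```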

But is that relevant to _this specific_ case? The problem is not 3rd party clients, it's the npm client itself - so I don't know how error handling is relevant? (In the general case, yes of course it is, but in this and the parent comment I'm specifically talking about issues in only the narrow use case here.)

The error was on the server side, not the client side. The server was not following the HTTP spec. The fact that it was server side was why they were able to fix this so quickly.

Since it was a server-side error, and not a client-side one, the 500 error code is most appropriate. The fact that the npm client itself was the first to trigger this error is not the issue here, as it could just as easily have been yarn or another client. Following standards is important, and we weren't trying to make tea.

Being first party doesn't mean it's the same person/team writing both sides.

And error handling is always relevant.

The type of status/error matters. It's how you communicate, server to client.

Think about your normal interactions with problem solving as a team. What's more useful, a teammate who tells you when there's a problem what the problem is or a teammate who hides things, obfuscates, and makes random jokes?

400 tells you that you asked in a way that wasn't understood.

401 says you're unauthorized and 403 says you're not allowed (similar but potentially different implications, not my favorite nuance but it's a common example that might come up)

504 says there's a problem communicating between the two of you.

404 says that your data doesn't exist.

500 says the server is having problems doing the thing and it's completely out of your hands.

418 says "I'm a teapot"

What's more useful data?

> Or perhaps a 400?

Well ideally some 5xx code because it's a server error. I personally think servers should default to 500 if there is an uncaught or unhandled error condition.
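One way to get that default, sketched as Python WSGI middleware (PEP 3333): any exception escaping the wrapped app is logged server-side and turned into a plain 500. `default_to_500` is a hypothetical name, and the sketch does not catch errors raised later while the response body is being iterated. Passing `exc_info` to `start_response` follows PEP 3333, so headers that were set but not yet sent get replaced.

```python
import logging
import sys
from typing import Callable, Iterable

log = logging.getLogger(__name__)


def default_to_500(app: Callable) -> Callable:
    """Wrap a WSGI app so an uncaught exception becomes a plain 500."""
    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        try:
            return app(environ, start_response)
        except Exception:
            # Keep the details in the server log, not in the response.
            log.exception("unhandled error; responding with 500")
            body = b"Internal Server Error"
            start_response(
                "500 Internal Server Error",
                [("Content-Type", "text/plain; charset=utf-8"),
                 ("Content-Length", str(len(body)))],
                sys.exc_info(),
            )
            return [body]
    return wrapped
```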

In general, proxies and other third party clients will consider 4xx errors (edit: un)retriable.

That's one simple obvious difference.

Do you mean 5XX? 4XX errors generally cannot be retried with a different result.

The origin server (registry.npmjs.org) is the one throwing the 418, so they're primarily to blame. Some intermediate proxy server has decided to send a somewhat unusual Host header (registry.npmjs.org:443 instead of the usual registry.npmjs.org), and the registry server barfs up a 418 in response.

There's nothing clever about returning the wrong error. That's anti-clever.

A moment of sympathy for the engineers who have to explain to their non-technical project manager why their application is getting this error.

I learned the hard way not to put cutesy messages in the logs. I was young and dumb, and in my case it was for a restore-from-backup system. A customer had failed hardware and was trying to restore from backup. It was failing, and all of a sudden the funny message didn't sound so funny when the customer was reading it back to me in their own voice over the phone.

Oh, I’ve done that too as a rookie. The software displayed an error message when the application was in a state that was impossible for it to be in, at least in theory. The error message was a line from 2001: A Space Odyssey, “I’m sorry Dave, I’m afraid I can’t let you do that”. Of course, when it ended up getting displayed, it was displayed to a customer called Dave who had never seen (or read) 2001: A Space Odyssey.

The first lesson was to avoid funny error messages in the future. The other was that “impossible” things tend to happen surprisingly frequently in software.

> The other was that “impossible” things tend to happen surprisingly frequently in software.

"One in a million is next Tuesday"


That article contains a link to (http://www.jumbojoke.com/000036.html), but the link is broken. I found it at (https://web.archive.org/web/20050205043510/http://www.jumboj...).

With 7 billion people on the planet, a one-in-a-million event is practically guaranteed to happen to someone; on average, to about 7,000 people.

Law of large numbers, etc.

There's a lot of different possible 1 in a million events, though...

True. I really intended it as 7k people will have had the same 1 in a million event. Sorry that wasn't clearer.
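The arithmetic as a quick Python check, assuming each person independently has the same one-in-a-million chance (an assumption the comments don't state explicitly), so the count of affected people is binomial:

```python
import math

population = 7_000_000_000  # people, as in the comment
p = 1e-6                    # chance of one specific event per person

# Number of people it happens to ~ Binomial(population, p).
expected = population * p
std_dev = math.sqrt(population * p * (1 - p))

# Probability that it happens to nobody: (1 - p) ** population,
# computed via log1p for accuracy; the result underflows to 0.0.
p_nobody = math.exp(population * math.log1p(-p))

print(f"expected ~ {expected:.0f}, std dev ~ {std_dev:.1f}, P(nobody) = {p_nobody}")
```

The expected count is 7,000 with a standard deviation of about 84, and the chance that nobody is affected is effectively zero.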

That is an excellent point actually...

"Million-to-one chances...crop up nine times out of ten." https://wiki.lspace.org/mediawiki/Million-to-one_chance

A formulation of this I heard recently:

As a lottery ticket buyer, you may not need to prepare for winning the lottery. But as a lottery commission employee, you had better prepare for someone winning.

More like the birthday theorem applies here.
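For comparison, a Python sketch of the birthday calculation: with d equally likely outcomes, a repeat becomes likely after only on the order of √d draws. `collision_probability` is a hypothetical helper that computes the exact product rather than the usual exponential approximation, assuming independent, uniform draws.

```python
def collision_probability(n: int, d: int) -> float:
    """Chance that at least two of n independent, uniform draws from
    d equally likely values coincide (the birthday problem)."""
    p_all_distinct = 1.0
    for i in range(n):
        # The (i+1)-th draw must avoid the i values already seen.
        p_all_distinct *= (d - i) / d
    return 1.0 - p_all_distinct


print(collision_probability(23, 365))          # classic birthday case
print(collision_probability(1178, 1_000_000))  # "one in a million" buckets
```

Both results come out at roughly 0.5: 23 people suffice for a shared birthday, and about 1,200 draws make a shared one-in-a-million outcome a coin flip.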

Wow, that's hilarious. Did he get creeped out at seeing his own name?

Not just logs. Recent Opera versions have taken to replacing error messages in the browser with cutesy animated spaceships and whatnot. An error message I can live with; an uninformative cartoon just makes me furious while I'm trying to figure out what broke.

Yeah, good logging and good error reporting are things you learn by being burned by a funny message at 3 am, or by being embarrassed by one. Usually both at the same time.

It's amazingly frustrating to see: Hey if you just barfed the output of that failed command into a shell I could fix it but now I have to figure out why this several-hour command failed... bleh.

Yeah. Save the cheeky errors for internal tests.

Please don't even use it for internal tests... https://thedailywtf.com/articles/We-Burned-the-Poop

I meant internal test suite code. As in, the client / customer would have to be able to run my test suite (and likely watch it fail) to see the error code.

I've used "Nothing happens" (from ADVENT) when an unknown value was given for an "action" parameter. Appropriate, I still think, but a bit cute.

Unless they're paying for NPM, my sympathy is limited. They should _really_ be putting Artifactory (or equivalent) between themselves and the public NPM registry. Or, at the very least, fallback package.jsons pointing to the source Github repos of each dependency. Anything at all so that npm install doesn't have a single point of failure.

I'm as technical as anybody who might be explaining this to anybody else, and I still don't understand why they're getting this error even after reading the linked thread and the comments here.

Here's to hoping that all my yarn caches will continue to carry me through this.

Seriously, though. I'd love to get a look at their SLA and how they define "uptime" for their "org" and "enterprise" users. I don't think most teams are running out the door to throw their money at npm, and it's this type of instability that creates mistrust amongst developers.

Yarn seems to be unaffected with this issue so I imagine you'll be fine. There's been a few users (including myself) who simply swapped out npm for yarn and have had the issue resolved. It doesn't really address the root cause but in this case, it's presumably some bit of logic in the actual npm client itself rather than the registry.

Last time I checked the yarnpkg registry is still a proxy to npm, unless they've started hosting. I haven't checked in some time.

Even so, yarn is basically an npm wrapper that adds caching, so it's good for avoiding intermittent errors.

You're right

> The HTTP 418 I'm a teapot client error response code indicates that the server refuses to brew coffee because it is a teapot. This error is a reference of Hyper Text Coffee Pot Control Protocol which was an April Fools' joke in 1998.


yeah but, no but...

They say they've fixed this in this related issue (https://github.com/npm/registry/issues/335) but I can't tell what change was made. The only discussion was around accepting a port number in the Host header. All well and good to fix what I assume was a config bug. But I can't tell if the secondary (but arguably more important) issue of returning valid error messages was addressed. I'm not sure what the exact right error message should have been. If the server was intentionally differentiating host headers with port numbers then probably 400 or possibly 404. But I'm guessing this was just a plain and simple bug and that 418 was set as a cutesy catchall "something unexpected happened" error status to be sent in some exception condition. I hope they've changed that to return 500, which would at least honestly accept blame for the issue.

I'm guessing they used 418 to mean "I have no idea who you think you're talking to" (not entirely unfitting as 418 was proposed to indicate the coffee pot you're talking to is actually a tea pot), considering it was triggered by having the Host set to something it did not recognise. But they should still have used a real error code and a more informative message.

What makes me sad is that this error (caused by the misuse of the 418 status code) and a previous issue (I forget the link), where a web dev was complaining that 'I'm a teapot' was stupid and should be pulled from the specs, will result in 418 being removed from the standard, and the humourless suits win again.

If we can't have a little bit of levity in our lives, then what's the point of it all?

> the humourless suits win again

I think this is misguided. Clearly, there are plenty of humourless prudes, even amongst Us, the Blessed T-shirt Wearers. To wit, the "campaign" to remove the 418 code did not originate in a corner office.

Being a "suit" is a state of mind, not an article of clothing.

"Hey man, I'm not a suit." "Yeah you are, you just don't know it yet."

(Entourage, https://youtu.be/E-v1hOJCphI?t=78 (NSFW video))

Yeah well, if they do that, we'll make our own internet, with blackjack, and teapots!

> I'm gonna lock this for now, because I'm sure it's gonna get plenty of traffic. You really don't need to respond to repeat what every other poster is saying. The registry team has been informed.

It's alright if they want to lock the thread to prevent massive +1 spam, but at the very least they should have given a link to track the status of this.

When your production systems are down, "thanks for the report" is not a good enough response.

His tone is all wrong here:

> You really don't need to respond to repeat what every other poster is saying.

Obviously they did need to because no one from the team had responded yet. Also his response without any link to track the status is rather disappointing but entirely what I'd expect from the NPM team.

> his response without any link to track the status is rather disappointing but entirely what I'd expect from the NPM team.

Unfortunately, agreed. I hate being pessimistic but nothing disappoints me more than the way NPM handles a problem.

It's clearly not just a fluke at this point.

Contrast this with Yarn’s response, also this holiday weekend, to an outage incidentally caused by the NPM team. https://github.com/yarnpkg/yarn/issues/5885#issuecomment-392... Professional, responsive, not afraid of blowback from accepting feedback and “+1” posts. Is there any reason to lock the post besides bruised ego and overwhelming phone notifications? Neither of those is justifiable when you’re emitting teapot errors because you implemented a spec wrong. This feels very indicative of a toxic, amateur culture, and perhaps we’ve let it operate our package infrastructure for too long.

I'm actually blocked by the main person behind NPM because I disagreed with him over a politically motivated tweet he made... so it wouldn't surprise me in the slightest.

I was wondering how far down I would have to scroll to find something like this.

The guy is a total political zealot who hates corporations and hates that npm had to become one. He presented this at the end of NodeConfEU17 and also took the opportunity to lecture us (in Ireland) for being too white and not being as "diverse" as he wanted us to be.

IIRC, more than 50% of the presenters were women, which he said was the only reason he attended the last day to speak at us. Great.

Hearing him talk was like someone taking a shit on the floor after an otherwise wonderful conference. It makes me cringe that I have to use npm after hearing that guy talk. I am not even sure I will go to that conference again after that.

Looks like his talk wasn't recorded[0]? He published the slides but they're sadly not very informative[1].

To be honest I can't believe the registry is still using CouchDB under the hood. It's not a good fit for the problem space.

I'm also not surprised he says this in his talk:

> Ultimately, I don’t like anyone else having control. If I’m going to give npm to a company, I want control of the company.

The npm registry and client should be controlled by a foundation for all the same reasons Node is. Yarn was a great step in that direction, but it seems npm Inc is doubling down, and based on how communication between the yarn maintainers and npm Inc went when they accidentally broke yarn[2][3], it feels like they're trying to fight yarn rather than cooperate.

I've seen npm Inc employees (including "community managers") attack people ("paying customers") on Twitter in response to criticism of how npm Inc runs their open source projects. They also don't seem to make any distinction between personal opinion and representing npm Inc, pretty much dot-com era startup "bro culture" but with different social politics.

[0]: https://www.youtube.com/playlist?list=PL0CdgOSSGlBaxNkrUIHrh...

[1]: https://www.dropbox.com/s/9rx9aalvts60w5y/why-npm-inc.pdf?dl...

[2]: https://twitter.com/jamiebuilds/status/1000198463269699584

[3]: https://twitter.com/mikeal/status/1000164993667555328

*her/their tone, if you care

As the person mostly on the hook for https://news.ycombinator.com/item?id=16435305 (as I recall), I can imagine she/they would be a little on edge.

Still not a good look...

I see no reason why "they've screwed up before so it's okay if they're being dismissive and opaque" would ever be a valid argument. The point of excusing past mistakes is that they are learning opportunities.

If anything, the filesystem permissions bug only makes this worse, because it was a destructive bug in a widely promoted release (even if it was technically not supposed to be stable, npm employees actively recommended using it on Twitter) and npm's reaction was fairly dismissive (because it's not a stable release for production use, dummy).

Only intended to explain, not excuse. I totally agree.


Please don't post unsubstantive comments here.


This has been a pattern for a while. See the other issue from 3 days ago (disappearing packages), where no info was given and they removed a ton of comments, then closed and locked it too: https://github.com/npm/npm/issues/20766

Some time back, when they accidentally installed @latest, which accidentally wiped your hard drive, I narrowly avoided destroying my company laptop. After giving it the same treatment on GitHub and blaming their users for being dumb, they finished our Twitter exchange by calling me “whiny”. Happy yarn user ever since.

And here I thought the systemd people were touchy...

Something more along the lines of:

> Thanks so much for the report. Currently, we are doing our best to resolve the issue. Please continue to check back on our status page to see our updates. https://status.npmjs.org/

Sounds more appropriate

If they had locked it a little earlier, wgrant wouldn't have had the opportunity to report the cause.

Isn't it? I would rather get an update after the problem has been fixed than know that development efforts have been slowed down in order to keep me in the loop.

I am not sure if you are putting your money where your mouth is. This product can be used for free, and there is a simple workaround (i.e. test before you upgrade).

https://cloudplatform.googleblog.com/2017/02/Incident-manage... is a good article on how Google handles incidents.

In an incident, constant and clear communication is key.

Different scope. This is definitely how to handle incidents like "our live production service is currently having issues", because there are critical consequences. E.g. when a system that I have worked on goes down, trucks would literally be parked at the border of different states and countries waiting for clearance.

This is a different magnitude from "I upgraded my free dependency management tool, and now I have to downgrade it. Please tell me when I can upgrade again."

Npm Inc is a company. Their products are npm enterprise and npm orgs. Both of these are only useful in combination with the npm client. Npm enterprise likely wasn't affected by this (although related problems may have affected npm enterprise users in the past, for all we know), but npm orgs were, as their repositories are on the same registry.

So this is the equivalent of the official docker CLI having a bug that causes it to break after an update to the official docker hub. Sure, it may mostly affect users that aren't paying customers, but it affects users indiscriminately, and those users who are paying customers can't use npm the way it was sold to them (i.e. using the official client with the official registry).

FWIW it also seems that this bug wasn't triggered because users updated their clients. It was a pre-existing bug in the client that was triggered by the registry behavior changing (but I'm not sure on that because the issue doesn't give many details).

Most people aren't google.

Isn't that +1 "spam" counter an indicator of how severe the issue is and how much impact it's causing? Do I sense some sort of playing down/covering up?

P.S. The issue was fixed about 16 minutes ago, according to ceejbot's last reply to the GitHub issue.

> a link to track the status of this

The GitHub issue _is_ where you go to track the status of the fix. I'm sure if there are any updates they'll be posted there, and if you're interested you can now subscribe to the issue without getting your inbox flooded by dozens of users posting "me too" in the thread.

Yarn appears to be working, so that's a temporary workaround.

It actually reads like an IoT protocol, way ahead of its time in 1998. Life not only imitates art, but eventually catches up with parody:


I suspect the joke was inspired in part by a certain well known coffee pot monitoring system that preceded it: https://en.wikipedia.org/wiki/Trojan_Room_coffee_pot

Just been fixed, apparently[0]:

> We've fixed this -- we now accept the port appended to the host. @wgrant's comment was most helpful, thank you!

[0] https://github.com/npm/npm/issues/20791#issuecomment-3926484...

They did fix it, but e.g. appending a period (https://registry.npmjs.org./) still throws a 418. I expect this to bite them eventually. I don't understand why their Host header parsing is so strict; it seems unnecessarily restrictive.

418 is really a confusing response here. They should fix that ASAP by giving a normal error message.

I would recommend (if any NPM people are reading this) that the error for an unknown Host header be changed to 404. Bad host headers are already filtered out by Cloudflare (which returns 400), so any host headers that make it through would represent "unknown"/"missing" hosts (hence 404).
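A minimal sketch of more lenient Host matching plus the 404 suggestion, assuming the server only needs to map the Host header onto a fixed set of names (`normalize_host`, `status_for_host` and `KNOWN_HOSTS` are hypothetical): lowercase the value, drop an explicit `:port`, drop one trailing dot (the fully-qualified form `registry.npmjs.org.`), and answer 404 for anything still unknown.

```python
from typing import Optional

# Hypothetical set of hostnames this server answers for (an assumption).
KNOWN_HOSTS = {"registry.npmjs.org"}


def normalize_host(header: str) -> str:
    """Lowercase a Host value, strip an optional :port and one trailing dot."""
    host = header.strip().lower()
    name, sep, port = host.rpartition(":")
    if sep and port.isdigit():
        # "registry.npmjs.org:443" -> "registry.npmjs.org"
        host = name
    # "registry.npmjs.org." (fully qualified) -> "registry.npmjs.org"
    return host[:-1] if host.endswith(".") else host


def status_for_host(header: Optional[str]) -> int:
    """400 for a missing/blank Host, 404 for an unknown one, 200 otherwise."""
    if not header or not header.strip():
        return 400
    return 200 if normalize_host(header) in KNOWN_HOSTS else 404
```

If Cloudflare already rejects malformed Host headers with a 400, as described above, the 400 branch here is only a fallback; the 404 branch is the case the comment is about.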

With no explanation of why it was a 418. Par for the course, I suppose.

- Obligatory reminder that the NPM system is a failboat.

- Public Service Announcement: Don't board the failboat.

Thanks, but what's the alternative when yarn relies on npm infrastructure? Like it or not, you're still using npm. It's got so you basically cannot do serious client-side development without one of them.

"Enterprise development" is not the same as "serious development". Anyone still doing regular old JavaScript without the transpiling and megaframework dependencies is arguably taking their task more seriously than those who are.

I'm sorry, but after having used TypeScript there's no way I'm ever going to go back to writing pure JavaScript (for anything but the most trivial stuff). It's borderline irresponsible, IMO.

I'd like to say I love TypeScript, too. It's better for collaborative development than raw JS. My point is different: working in "low-level" or raw JS, obfuscated or not, is where the substantive work happens. A good example is the people working on frameworks.

Who's talking about transpiling and mega-frameworks? I'm simply talking about the headache of managing dependencies on a reasonably complex project. The "mega-frameworks", by which I assume you mean products such as Angular, React, Vue, etc., are often not the worst offenders when it comes to external dependencies. Of course, when you start pulling in extensions that story can change, but that can be true of pulling in any other library that provides equivalent functionality (where such exists).

> you basically cannot do serious client-side development without one of them

You're going to have to explain that better, otherwise you're just perpetuating that dumb lie.

You absolutely can do serious (whatever that means) client-side development without yarn or npm.

It comes down to managing dependencies: without a package manager this becomes a serious headache on products/sites of even moderate complexity.



I've used them for fairly basic front-end JS development in mostly-Java projects. They worked fine. I don't know if they lack things you'd need for more involved work.

Perhaps it really is a teapot.

It isn't being asked to brew coffee, though, so it's a misuse of the spec.

Or maybe you're asking a teapot to install npm packages.

Maybe we all are...

we are ALL teapots on this blessed day :)

This HN comment is probably one of the best explanations of why npm can be objectively said to be unreliable. I think it deserves attention.


Okay, I appreciate npm and everything, but nowadays new versions come out on a weekly basis, yet something still breaks for us, either locally or in CI. This has made me fear two things the most lately: `npm i` and `npm i -g npm`.

I’m sure the anti-NPM brigade is on its way, but I recall there being a huge backlash against removing this error code from Node a few years ago.

The problem isn't that Node supports this error code. The problem is that it's used by the registry.

Heck, technically the problem isn't even that it's used by the registry; the problem is that the npm client is a broken, untestable mess and users seem to routinely unearth extremely surprising bugs.

This all has very little to do with teapots. The fact they used 418 and didn't provide a useful error message only makes it more interesting to talk about.

Yeah, fair point.

// This code will never execute

The last comment is locking the issue because too many people are responding saying the same thing. This comment makes me want to reply and say the same thing too, but I can't now.

I think this is one reason GitHub rolled out the "thumbs-up" reactions: too many comments just state "+1", which pollutes actual technical discussion, but at the same time it is crucial to know whether many people have the same issue.

Locking a discussion with zero unreasonable responses as "too heated" is bad optics, but I think locking the discussion is a pragmatic decision, as the NPM team has been alerted about the issue and any meaningful comments would get lost in the noise otherwise.

It's likely that the "too heated" is GitHub putting words in the mouth of the repo admins. I don't think you get to choose the text of that message.

ETA: The options are "off-topic", "too heated", "resolved" and "spam". It looks like you can also elect to not give a reason, though...

One of the last few messages before that, by wgrant, was helpful in solving the problem. They might have missed that if they had closed the thread a little bit earlier.

Instead, they're just plain lost. Not much of an improvement.

If they are annoyed by the spam, they can “mute”.

This doesn't help if I see the problem and want to see the latest update from the developer, but all I see is a wall of memes.

Is it too soon to call this Teapotgate?

It's 'too soon' until the water is boiling. Then it's 'tea time'.

Teapot Dome

It’s tea soon.

It’s appalling how bad the NPM development team is. Do they have any sort of test suite at all? I’m switching to Yarn.

I’m reminded of when npm was beginning to get popular and all the web devs I worked near were acting like it was pure gold and the first of its kind; as if other package managers like maven or apt never existed.

These days I just have to laugh and shake my head.

I'm wondering if returning 418 is some sort of off-by-one error. Maybe their list of error codes is in an array, and it was meant to send 500 but picked index n-1, landing on 418?

My favourite HTTP response...

Why connect to npm through a proxy?

Because corporate won't let you connect without one.

- To cache

- To have your own repo since npm doesn’t support composite repos like Maven (last I checked)

- To not trust that npm won’t take a dump like today

- To not let npm know the IP addresses of your dev or build boxes

I’m sure there are many more reasons...
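The "not trusting npm not to take a dump" point can be sketched as a stale-on-error cache: always try upstream first, but keep the last good response so an outage like this one degrades to stale metadata rather than a failed install. This is a minimal illustration only; `fetch_upstream` and `UpstreamError` are hypothetical stand-ins for the real registry request and its failure, and a real proxy would also need auth, tarball storage and expiry.

```python
from typing import Callable, Dict


class UpstreamError(Exception):
    """Hypothetical: raised by the fetcher when the registry request fails."""


class StaleOnErrorCache:
    """Fetch from upstream first; on failure, serve the last good copy."""

    def __init__(self, fetch_upstream: Callable[[str], bytes]) -> None:
        self._fetch = fetch_upstream  # hypothetical registry request function
        self._store: Dict[str, bytes] = {}

    def get(self, package: str) -> bytes:
        try:
            body = self._fetch(package)
        except UpstreamError:
            if package in self._store:
                # Upstream is failing (e.g. answering 418): serve stale data.
                return self._store[package]
            raise  # nothing cached, so the failure has to surface
        self._store[package] = body  # remember the last good response
        return body
```

Note that this version never reduces upstream traffic; a cache aimed at the first bullet would check `_store` before calling upstream and add some expiry policy.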

Enterprise users or people in offices?

I'm going to cancel my paid subscription and remove my private packages. npm's incompetence has been a joke for years and it's a real shame they're propped up so highly by Node.

Better choice: move away from Node.

Comments about the error not being helpful!

Presumably the npm community is full of web devs. Given that, how can they not know the HTTP codes, especially one as famous as 418?

I recognised it straight away when it popped up this morning and thought "What the fuck, how could anything even trigger that?". There were no issues mentioning it for the longest time so we figured no way, it must be something besides npm.

I'd argue that 418 veers more towards "trivial" than it does "famous." Expecting knowledge of more than 400, 401, 403, 404, and maybe 412 is pushing it.

I didn't know 418 and I'm a web dev. HTTP status codes are a tiny part of my job.

Not only that, but if you’re returning this status code you’re almost certainly doing something wrong or user hostile.

What if it really is a teapot? Not that farfetched with today's internet connected devices.

I still don't think returning a 418 (in the realm of client errors) is the correct code, even if it is a connected teapot which you're trying to identify!

I think the appropriate action would be a client sends an HTTP HEAD to the service, which responds 200 "TEAPOT".

There’s the EKG+, an internet-connected smart kettle:


I have the plain EKG, and quite like it.

However, both are designed for pour-over coffee, so can’t correctly return 418.
