
NPM proxy users receiving ERR 418 I'm a teapot - spondyl
https://github.com/npm/npm/issues/20791
======
iends
2 hours after the incident was responded to by an npm employee and the status
is still green: [https://status.npmjs.org/](https://status.npmjs.org/)

I love fake status pages!

~~~
wxuan
Overall, the system is up and running. I believe the issue only affects a
subset of users behind proxies.

~~~
sitepodmatt
This is why I make clients drop/replace SaaSs. When a SaaS doesn't update its
status page, or doesn't do so in a transparent and prompt way, because it's
not a 100% outage, it enrages me, especially if the reports are found
elsewhere - twitter/reddit/hn.

~~~
mattmanser
You force your clients to do things for trivial reasons that send you into a
rage?

~~~
sitepodmatt
I don't consider a non-working status page trivial. Yes, if a SaaS sends me
into a rage then there are probably dozens of red flags already, and yes, I
will work with clients to drop or replace such a SaaS with one that can
communicate, absolutely, as it usually falls under my devops remit. This
doesn't apply to npm as it's not a paid IaaS/SaaS, but more to point out the
number of shit SaaSs that don't manage their status updates properly. Imgix in
the past for example, Linode when DDoSed - absolutely shambolic communication,
and so on

~~~
nonconvergent
Wouldn't that be more of an issue of not having an SLA between SaaS and
client?

If the silly non-automated dashboard is part of the SLA, then it costs someone
money/liability/trust to not maintain it, otherwise "who cares as long as the
issue gets resolved, people who care about the issue are tracking the bug
report?"

------
vorpalhex
Really disappointed in NPM lately. These past few months have been downright
silly. How many major problems like this affect other package managers?

~~~
swyx
what else in these "past few months" are you referring to? as a moderate npm
user i am blissfully oblivious.

~~~
jwalton
In the past few weeks:

* NPM 6 stopped working with Node 4. Rather than actually fix it, they just left it broken for several days because it was already fixed in the upcoming release, and I guess they didn't want to do an emergency release: [https://github.com/npm/npm/issues/20716](https://github.com/npm/npm/issues/20716)

* There was a several hour period where you could publish packages, but then they would 404 when you tried to download the new version.

* They switched to using Cloudflare on Friday, and broke Yarn in the process: [https://mobile.twitter.com/jamiebuilds/status/10001984632696...](https://mobile.twitter.com/jamiebuilds/status/1000198463269699584)

* Somehow while switching to Cloudflare they blew away a bunch of the packages that had been published during the previously mentioned window. They also blew away _all_ the versions of some packages. Last Friday night you couldn't 'npm install gulp'. Never gave any explanation for this: [https://github.com/npm/npm/issues/20766](https://github.com/npm/npm/issues/20766)

About a month ago there was a bug where npm would change permissions on a
bunch of system files and render your entire system unusable if you ran it as
root. But that was OK because the bug was on the "next" version of npm, not
the one you'd get by running 'npm install -g npm'... Except there was also
a bug in the current version of npm that installed the next version of any
package by default, so it did in fact blow up a bunch of machines.

Now, apparently, they are a teapot.

~~~
bartread
And the real problem with all of this is that npm has become so ubiquitous
that you _can't_ get away from it if you're doing client side development
(not even by switching to yarn).

You simply have to endure it, and accept that every now and again, for reasons
entirely beyond your control (and at a quite possibly very inconvenient
moment) it's going to break.

This really annoys me, but what can one do? At least it's fixed now.

~~~
flipp3r
> This really annoys me, but what can one do? At least it's fixed now.

Install your own repository manager? This is the standard in every company
I've worked for so far, at least in the Java world. Artifactory supports NPM,
so set it up as a proxy.

~~~
bartread
That's interesting - thanks. Of course, it has a price tag, but so does the
rest of our pipeline, along with our time.

~~~
PebblesHD
As far as the monetary price of implementing such a system, Nexus OSS[1] also
supports NPM proxying and is free for basic usage.

[1] - [https://www.sonatype.com/nexus-repository-oss](https://www.sonatype.com/nexus-repository-oss)

~~~
zamber
We have Nexus in a subnet for faster installing. We had to write scripts
porting over lockfiles from npm to nexus and back.

This had to be added to a precommit hook to not break CI. Seriously,
package.json should allow you to specify which endpoints should be used, if
available, in a given order. Now it's up to each dev team to handle.

My main concern is that it's brittle. Nexus caches exact versions and nothing
more so we don't even have assurance that it will work nicely when NPM goes
down.

On the other hand lockfiles are awesome. I missed them back in 2012... copying
over node_modules on USB drives was not cool.

~~~
acejam
You can specify what registry to use with a simple project-based .npmrc file.
We have ours point to our Nexus npm proxy.

~~~
zamber
That's what we eventually did, but the lockfiles don't care; a resource url is
a resource url. We use yarn too, which does proxies only in .yarnrc IIRC.

Do you have an externally available Nexus? Using ours through a VPN defeats
the main purpose - fast(er) installs. That's why for WFH scenarios we have a
script to switch between our proxy and NPM.
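
A minimal sketch of what such a switch script could look like, in Python. It
assumes the registry is set through a `registry=` line in a project-level
.npmrc. The proxy URL is a made-up placeholder and `set_registry` is a
hypothetical helper name, not part of npm or Nexus.

```python
# Hypothetical helper: point a project's .npmrc at a different registry.
NPM_REGISTRY = "https://registry.npmjs.org/"
# Placeholder URL, assumed for illustration only.
PROXY_REGISTRY = "https://nexus.example.internal/repository/npm-proxy/"


def set_registry(npmrc_text: str, registry_url: str) -> str:
    """Return npmrc_text with its `registry=` line replaced (or appended)."""
    new_line = f"registry={registry_url}"
    lines = npmrc_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # Only touch the unscoped key, not e.g. `@scope:registry=`.
        if line.split("=", 1)[0].strip() == "registry":
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"
```

A wrapper would read .npmrc, call `set_registry(text, PROXY_REGISTRY)` or
`set_registry(text, NPM_REGISTRY)`, and write the result back. This only
changes where new installs resolve from; `resolved` URLs already written to a
lockfile stay as they are.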

~~~
acejam
Our Nexus setup is internal only. For WFH, we have hundreds of folks using a
corporate VPN which routes to our office, and then our office routes to our
AWS VPC, which is where our Nexus installation lives. I set this configuration
up and haven't had any real issues with it, nor do I see any reason to switch
between a proxy and npm.

If a developer is using an older buggy version of npm that doesn't respect
.npmrc and changes a lock file to point back to npmjs.org entries, we deny the
PR and ask for it to be fixed. Right now that check is unfortunately manual,
but there are plans to automate it. It can be easy to miss at times though,
since GitHub often collapses lock files on PRs due to their size.
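
One way that check might be automated, as a rough Python sketch. It assumes
the lockfile is a package-lock.json whose entries carry a `resolved` URL, and
it simply walks the whole JSON tree looking for those keys. The function names
are hypothetical.

```python
import json
from urllib.parse import urlsplit

PUBLIC_HOST = "registry.npmjs.org"


def find_public_resolved(node, path=""):
    """Yield (path, url) for every `resolved` URL that points at PUBLIC_HOST."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}" if path else key
            if key == "resolved" and isinstance(value, str):
                if urlsplit(value).hostname == PUBLIC_HOST:
                    yield path, value
            else:
                yield from find_public_resolved(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_public_resolved(item, f"{path}[{index}]")


def check_lockfile(text: str) -> list:
    """Return the offending entries found in a package-lock.json document."""
    return list(find_public_resolved(json.loads(text)))
```

A CI step could call `check_lockfile` on the file contents and fail the build
when the returned list is non-empty, instead of relying on a reviewer
expanding the collapsed diff.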

For us, the main purpose of using Nexus as a proxy is to maintain availability
and to cache/maintain package versions. If you're using Nexus to make things
faster, then you probably shouldn't be using it. If you want faster installs,
look into using `npm ci`.

------
tiles
Relevant comment by wgrant:

    
    
        The body of the 418 is {"error":"got unknown host (registry.npmjs.org:443)"}.
        Looks like some npm clients are appending the port to the Host
        header, but only when going through a proxy, and that's confusing
        the registry.
    

It seems to depend on the combination of npm and node versions.

~~~
hn_throwaway_99
Sure, but why is anything in the chain returning the 418 "I'm a teapot" error
message?

My guess is someone wanted to use some code for "I don't really know what this
error is", saw the 418 and thought "that's cute!"

Be careful, "cuteness" and "cleverness" has a way of biting you in the behind.

~~~
nmjohn
> Be careful, "cuteness" and "cleverness" has a way of biting you in the
> behind.

Generally speaking I tend to agree, but in this specific case how is sending a
418 "biting you in the behind" worse than sending a more proper 500 error? (Or
perhaps a 400? Hard to say exactly, feels like the server is having unexpected
errors since it cannot properly parse the hostname.)

~~~
reificator
If I'm doing quick generic error handling, I'm going to assume anything in the
500 range is a server issue and display an error of that sort. If it's in the
400 range I'm going to handle the common ones and then assume it's the
client's or user's fault. _(And of course a user fault is really a client
validation fault.)_

So yes, returning a 4XX for a 5XX is problematic.
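
A sketch of that kind of quick generic handling, in Python. The bucket names
and the particular "common" 4XX codes are arbitrary choices for illustration.

```python
def classify_status(status: int) -> str:
    """Bucket an HTTP status code the way a quick generic handler might."""
    if 500 <= status <= 599:
        return "server error"
    # The "common ones" handled explicitly; this selection is an assumption.
    if status in (401, 403, 404):
        return "handled client error"
    if 400 <= status <= 499:
        return "client error"
    if 200 <= status <= 399:
        return "ok"
    return "unknown"
```

Under this scheme a 418 lands in the generic "client error" bucket, so the
caller gets blamed for a fault that a 500 would have attributed to the server.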

~~~
nmjohn
But is that relevant to _this specific_ case? The problem is not 3rd party
clients, it's the npm client itself - so I don't know how error handling is
relevant? (In the general case, yes of course it is, but in this and the
parent comment I'm specifically talking about issues in only the narrow use
case here.)

~~~
tedivm
The error was on the server side, not the client side. The server was not
following the HTTP spec. The fact that it was server side was why they were
able to fix this so quickly.

Since it was a server side error, and the issue was not a client side one, the
500 error code is most appropriate. The fact that the npm client itself is the
first one to trigger this error is not the issue here, as it could have just
as easily been yarn or another client. Following standards is important, and
we weren't trying to make tea.

------
Abishek_Muthian
A moment of sympathy for the engineers who have to explain to their
non-technical project manager why their application is getting this error.

~~~
rdtsc
I learned the hard way not to put cutesy messages in the logs. I was young and
dumb, and in my case it was for a restore-from-backup system. A customer had
failed hardware and was trying to restore from backup. It was failing, and the
funny message all of a sudden didn't sound too funny when the customer was
reading it back to me in their own voice over the phone.

~~~
DrJokepu
Oh I’ve done that too as a rookie. Software displayed an error message when
the application was in a state that was impossible for it to be in, at least
in theory. The error message was a line from 2001: A Space Odyssey: “I’m sorry
Dave, I’m afraid I can’t let you do that”. Of course when it ended up getting
displayed it was displayed to a customer called Dave who had never seen (or
read) 2001: A Space Odyssey.

The first lesson was to avoid funny error messages in the future. The other
was that “impossible” things tend to happen surprisingly frequently in
software.

~~~
DoreenMichele
With 7 billion people on the planet, a one-in-a-million event can be expected
to happen to about 7k people.

Law of large numbers, etc.
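
The 7k figure is an expected value rather than a guarantee. A quick check in
Python, treating each person as an independent one-in-a-million trial (a
simplifying assumption):

```python
import math

population = 7_000_000_000
odds = 1_000_000  # "one in a million"

# Expected number of people hit, if each person is an independent trial.
expected = population // odds

# Standard deviation of that count under the same binomial assumption.
std_dev = math.sqrt(population * (1 / odds) * (1 - 1 / odds))

print(expected, round(std_dev))
```

The spread is small relative to the mean, so under that assumption the actual
count would almost always be close to 7,000.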

~~~
mlyle
There are a lot of different possible 1 in a million events, though...

~~~
DoreenMichele
True. I really intended it as 7k people will have had the _same_ 1 in a
million event. Sorry that wasn't clearer.

------
olingern
Here's to hoping that all my yarn caches will continue to carry me through
this.

Seriously, though. I'd love to get a look at their SLA and how they define
"uptime" for their "org" and "enterprise" users. I don't think most teams are
running out the door to throw their money at npm, and it's this type of
instability that creates mistrust amongst developers.

~~~
spondyl
Yarn seems to be unaffected by this issue so I imagine you'll be fine.
There have been a few users (including myself) who simply swapped out npm for
yarn and have had the issue resolved. It doesn't really address the root cause
but in this case, it's presumably some bit of logic in the actual npm client
itself rather than the registry.

~~~
olingern
Last time I checked the yarnpkg registry is still a proxy to npm, unless
they've started hosting. I haven't checked in some time.

~~~
lnanek2
yarn is basically an npm wrapper that adds caching, so even so, it's good for
avoiding intermittent errors

------
hbcondo714
> The HTTP 418 I'm a teapot client error response code indicates that the
> server refuses to brew coffee because it is a teapot. This error is a
> reference of Hyper Text Coffee Pot Control Protocol which was an April
> Fools' joke in 1998.

[https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/418](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/418)

~~~
nielsbot
yeah but, no but...

------
skywhopper
They say they've fixed this in this related issue
([https://github.com/npm/registry/issues/335](https://github.com/npm/registry/issues/335))
but I can't tell what change was made. The only discussion was around
accepting a port number in the Host header. All well and good to fix what I
assume was a config bug. But I can't tell if the secondary (but arguably more
important) issue of returning valid error messages was addressed. I'm not sure
what the exact right error message should have been. If the server was
intentionally differentiating host headers with port numbers then probably 400
or possibly 404. But I'm guessing this was just a plain and simple bug and
that 418 was set as a cutesy catchall "something unexpected happened" error
status to be sent in some exception condition. I hope they've changed that to
return 500, which would at least honestly accept blame for the issue.

~~~
pluma
I'm guessing they used 418 to mean "I have no idea who you think you're
talking to" (not entirely unfitting as 418 was proposed to indicate the coffee
pot you're talking to is actually a tea pot), considering it was triggered by
having the Host set to something it did not recognise. But they should still
have used a real error code and a more informative message.

------
Jaruzel
What makes me sad, is that this error (caused by the mis-use of the 418 status
code) and a previous issue (I forget the link) where a web dev was complaining
that 'I'm a teapot' was stupid and should be pulled from the specs, will
result in 418 being removed from the standard, and the humourless suits win
again.

If we can't have a little bit of levity in our lives, then what's the point of
it all ?

~~~
mseebach
> the humourless suits win again

I think this is misguided. Clearly, there are plenty of humourless prudes,
even amongst Us, the Blessed T-shirt Wearers. To wit, the "campaign" to remove
the 418 code did not originate in a corner office.

~~~
coldtea
Being a "suit" is a state of mind, not an article of clothing.

~~~
sundarurfriend
"Hey man, I'm not a suit." "Yeah you are, you just don't know it yet."

(Entourage,
[https://youtu.be/E-v1hOJCphI?t=78](https://youtu.be/E-v1hOJCphI?t=78) (NSFW
video))

------
umurkontaci
> I'm gonna lock this for now, because I'm sure it's gonna get plenty of
> traffic. You really don't need to respond to repeat what every other poster
> is saying. The registry team has been informed.

It's alright if they want to lock the thread to prevent a massive +1 spam, but
at the very least they should have given a link to track the status of this.

When your production systems are down, "thanks for the report" is not a good
enough response.

~~~
actionowl
His tone is all wrong here:

> You really don't need to respond to repeat what every other poster is
> saying.

Obviously they did need to because no one from the team had responded yet.
Also his response without any link to track the status is rather disappointing
but entirely what I'd expect from the NPM team.

~~~
JCSato
*her/their tone, if you care

As the person mostly on the hook for
[https://news.ycombinator.com/item?id=16435305](https://news.ycombinator.com/item?id=16435305)
(as I recall), I can imagine she/they would be a little on edge.

Still not a good look. . .

~~~
pluma
I see no reason why "they've screwed up before so it's okay if they're being
dismissive and intransparent" would ever be a valid argument. The point of
excusing past mistakes is that they are learning opportunities.

If anything, the filesystem permissions bug only makes this worse because it
was a destructive bug in a widely promoted release (even if it was technically
not supposed to be stable -- npm employees actively recommended using it on
twitter) and npm's reaction was fairly dismissive (because it's not a stable
release for production use, dummy).

~~~
JCSato
Only intended to explain, not excuse. I totally agree.

------
forapurpose
It actually reads like an IoT protocol, way ahead of its time in 1998. Life
not only imitates art, but eventually catches up with parody:

[https://tools.ietf.org/html/rfc2324](https://tools.ietf.org/html/rfc2324)

~~~
FroshKiller
I suspect the joke was inspired in part by a certain well known coffee pot
monitoring system that preceded it:
[https://en.wikipedia.org/wiki/Trojan_Room_coffee_pot](https://en.wikipedia.org/wiki/Trojan_Room_coffee_pot)

------
stevenjohns
Just been fixed, apparently[0]:

> We've fixed this -- we now accept the port appended to the host. @wgrant's
> comment was most helpful, thank you!

[0]
[https://github.com/npm/npm/issues/20791#issuecomment-3926484...](https://github.com/npm/npm/issues/20791#issuecomment-392648459)

~~~
nneonneo
They did fix it, but e.g. appending a period
([https://registry.npmjs.org./](https://registry.npmjs.org./)) still throws a
418 - I expect this to bite them someday (eventually). I don't understand why
their Host: parsing is so strict; it seems unnecessarily restrictive.
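
For what it's worth, a more lenient comparison is not much code. A Python
sketch, assuming the goal is to treat an explicit default port and a trailing
dot as the same host; `normalize_host`, `is_known_host` and `KNOWN_HOSTS` are
hypothetical names, and IPv6 literals are deliberately not handled:

```python
KNOWN_HOSTS = {"registry.npmjs.org"}


def normalize_host(host_header: str, default_port: int = 443) -> str:
    """Reduce a Host header value to a canonical lowercase form."""
    host = host_header.strip().lower()
    # Split "name:port"; port is empty when no colon is present.
    name, _, port = host.partition(":")
    # A single trailing dot marks a fully qualified name for the same host.
    if name.endswith("."):
        name = name[:-1]
    # Keep the port only when it differs from the scheme's default.
    if port and port != str(default_port):
        return f"{name}:{port}"
    return name


def is_known_host(host_header: str) -> bool:
    """True when the normalized Host matches one of the served names."""
    return normalize_host(host_header) in KNOWN_HOSTS
```

With this, `registry.npmjs.org:443` and `registry.npmjs.org.` both compare
equal to `registry.npmjs.org`, while a non-default port is kept and so still
counts as a different host.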

~~~
gpvos
418 is really a confusing response here. They should fix that ASAP by giving a
normal error message.

~~~
nneonneo
I would recommend (if any NPM people are reading this) that the error for an
unknown Host header be changed to 404. Bad host headers are already filtered
out by Cloudflare (which returns 400), so any host headers that make it
through would represent "unknown"/"missing" hosts (hence 404).

------
flavio81
- Obligatory reminder that the NPM system is a failboat.

- Public Service Announcement: Don't board the failboat.

~~~
bartread
Thanks, but what's the alternative when yarn relies on npm infrastructure?
Like it or not, you're still using npm. It's got so you basically _cannot_ do
serious client-side development without one of them.

~~~
mykull
"Enterprise development" is not the same as "serious development". Anyone
still doing regular old JavaScript without the transpiling and megaframework
dependencies is arguably taking their task more seriously than those who are.

~~~
moogly
I'm sorry, but after having used TypeScript there's no way I'm ever going
back to writing pure JavaScript anymore (for anything but the most trivial
stuff). It's borderline irresponsible, IMO.

~~~
mykull
I'd like to say I love typescript, too. It's better for collaborative
development than raw JS. My point is different, that working in the "low
level" or raw JS, obfuscated or not, is where the substantive work happens.
A good example is the people working on frameworks.

------
minikomi
Perhaps it really is a teapot.

~~~
itgoon
It isn't being asked to brew coffee, though, so it's a mis-use of the spec.

~~~
d4n1elchen
Or maybe you're asking a teapot to install npm packages.

------
partycoder
This HN comment is probably one of the best explanations of why npm can be
objectively said to be unreliable. I think it deserves attention.

[https://news.ycombinator.com/item?id=17176486](https://news.ycombinator.com/item?id=17176486)

------
xab9
Okay, I appreciate npm and everything, but nowadays new versions come out on a
weekly basis yet something still breaks, either locally or in the CI for us.
This made me fear two things the most lately: `npm i` and `npm i -g npm`.

------
incadenza
I’m sure the anti-NPM brigade is on its way, but I recall there being a huge
backlash against removing this error code from Node a few years ago.

~~~
pluma
The problem isn't that Node supports this error code. The problem is that it's
used by the registry.

Heck, technically the problem isn't even that it's used by the registry, the
problem is the npm client is a broken untestable mess and users seem to
routinely unearth extremely surprising bugs.

This all has very little to do with teapots. The fact they used 418 and didn't
provide a useful error message only makes it more interesting to talk about.

~~~
incadenza
Yeah, fair point.

------
anonnyj
// This code will never execute

------
danschumann
The last comment is locking the issue because too many people are responding
saying the same thing. This comment makes me want to reply and say the same
thing too, but I can't now.

~~~
freditup
Locking a discussion with zero unreasonable responses as "too heated" is bad
optics, but I think locking the discussion is a pragmatic decision, as the NPM
team has been alerted about the issue and any meaningful comments would get
lost in the noise otherwise.

~~~
phyzome
It's likely that the "too heated" is GitHub putting words in the mouth of the
repo admins. I don't think you get to choose the text of that message.

ETA: The options are "off-topic", "too heated", "resolved" and "spam". It
looks like you can also elect to not give a reason, though...

------
btown
Is it too soon to call this Teapotgate?

~~~
jugg1es
It's 'too soon' until the water is boiling. Then it's 'tea time'.

------
symlinkk
It’s appalling how bad the NPM development team is. Do they have any sort of
test suite at all? I’m switching to Yarn.

~~~
ulkesh
I’m reminded of when npm was beginning to get popular and all the web devs I
worked near were acting like it was pure gold and the first of its kind; as if
other package managers like maven or apt never existed.

These days I just have to laugh and shake my head.

------
JazzXP
I'm wondering if returning 418 is some sort of "off by one" error. Maybe their
list of error codes is in an array, meant to send 500, but picked n-1 falling
on 418?

------
purephase
My favourite HTTP response...

~~~
donpdonp
[https://http.cat/418](https://http.cat/418)

------
stevebmark
Why connect to npm through a proxy?

~~~
marksomnian
Because corporate won't let you connect without one.

------
notacuck
I'm going to cancel my paid subscription and remove my private packages. npm's
incompetence has been a joke for years and it's a real shame they're propped
up so highly by Node.

~~~
yeukhon
Better choice: move away from Node.

------
kasey_junk
Comments about the error not being helpful!

Presumably the npm community is full of web devs. Given that how can they not
know the http codes, especially one as famous as 418?

~~~
whatsstolat
I didn't know 418 and I'm a web dev. Http status codes are a tiny part of my
job

~~~
49bc
Not only that, but if you’re returning this status code you’re almost
certainly doing something wrong or user hostile.

~~~
matte_black
What if it really is a teapot? Not that farfetched with today's internet
connected devices.

~~~
49bc
I still don't think returning a 418 (in the realm of client errors) is the
correct code, _even if it is a connected teapot which you're trying to
identify!_

I think the appropriate action would be a client sends an HTTP HEAD to the
service, which responds 200 "TEAPOT".

