> Contrary to what npm states, this package actually depends on one of our aforementioned spam packages. This is a by-product of how npm handles and displays dependencies to users on its website.
For me personally, this is the biggest surprise and takeaway here. By simply having a key inside package.json's dependencies reference an existing NPM package, the NPM website links it up and counts it as a dependency, regardless of the actual value that the package references (which can be a URL to an entirely different package!). I think this puts an additional strain on an already fragile dependency ecosystem, and is quite avoidable with some checks and a little bit of UI work on NPM's side.
Here it's clear that the package links to something in a weird, non-standard way. A manual review would tell you that this is not axios.
The package.json lets you link to things that aren't even on npm [1]. You could update this to something like:
"axios": "git://cdnnpmjs.com/axios"
And it becomes less clear that this is not the thing you were intending. But at least in this case, it's clear that you're hitting a git repository somewhere. What about if we update it to the following?
"axios": "axiosjs/latest"
This would pull the package from GitHub, from the org named "axiosjs" and the project named "latest". This is much less clear and is part of the package.json spec [2]. Couple this with the fact that the npm website tells you the project depends on Axios, and I doubt many people would ever notice.
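To make the mismatch concrete, here's a rough sketch (plain Node, with made-up dependency data) of the kind of check the npm website could run: treat any dependency whose specifier isn't a plain semver range as worth flagging for manual review. Real range syntax is richer than this regex, so this is a heuristic, not a complete validator.

```javascript
// Heuristic sketch: flag dependency specifiers that don't look like plain
// semver ranges (git URLs, GitHub "org/repo" shorthand, etc.).
// The sample dependencies below are made up for illustration.
const SEMVER_LIKE = /^(\^|~|>=?|<=?|=)?\d+\.\d+\.\d+/;

function suspiciousDeps(dependencies) {
  return Object.entries(dependencies)
    .filter(([, spec]) => !SEMVER_LIKE.test(spec) && spec !== 'latest' && spec !== '*')
    .map(([name, spec]) => `${name} -> ${spec}`);
}

const deps = {
  lodash: '^4.17.21',                        // ordinary registry dependency
  axios: 'axiosjs/latest',                   // GitHub shorthand: org "axiosjs", repo "latest"
  'left-pad': 'git://cdnnpmjs.com/left-pad', // arbitrary git URL
};

console.log(suspiciousDeps(deps)); // flags axios and left-pad, not lodash
```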
You should think of the package metadata as originating from the publisher, not from the registry. Aside from the name, version, and (generated) dist and maintainers fields, I don't think any of it is even supposed to be validated by the registry?
Agreed the website UX is confusing and could be better but in general package metadata is just whatever the publisher put there and it's up to you to verify if you care about veracity.
the fucking website processes it and after some mighty compute somehow shits out the wrong link. it's actively making things worse by trying to be helpful.
confusing is one thing, but there's a screaming security chasm around that innocent little UX problem.
MS bought npmjs and now it's LARPing as some serious ecosystem (by showing how many unresolved security notices installed packages have) while they cannot be arsed to correctly show what's actually in the metadata?
this is a little too stoic a take with respect to a tool that very unserious people building things for serious but non-technical people use on a daily basis. i think we should strive for more. npm can continue to exist in its very libertarian form, but perhaps there's room for something that cares a bit more about caution
How about removing the incentive? Take down every package with tea.yaml in it, after say 1 month's warning, so legitimate packages trying to use it don't leave their users in the lurch. The tea protocol is clearly not going to accomplish what it set out to (see below), and is instead incentivising malicious behaviour and damaging the system it set out to support.
From https://docs.tea.xyz/tea/i-want-to.../faqs: "tea is a decentralized protocol secured by reputation and incentives. tea enhances the sustainability and integrity of the software supply chain by allowing open-source developers to capture the value they create in a trustless manner."
> allowing open-source developers to capture the value they create
But... then why would I use their code if whatever value it creates is captured by the developers themselves, leaving me no better off than where I was? That's like paying your employees the full additional value they produce instead of market wages: you'd then have literally no reason to hire them, since their work would be exactly profit-neutral.
I combed through their docs to try to find how these tokens would actually make maintainers money, and it seems like people pay projects for fixing bug reports (and penalize them if they don't)? The other demand drivers of the token seem to just shuffle money around and are at best a pyramid scheme. I'm a little confused how someone seriously thought this was gonna be a good idea.
That would be a clear violation of the npm Unpublish Policy[0]. If all it takes is some spam and pissing people off to walk away from principles, they never meant anything. A proper response needs to not break expectations like this.
The entire NPM ecosystem is a garbage fire. Who cares about whatever 'principles' it supposedly has? Other than avoiding malware I can't think of something I care about less than whatever principles NPM / JS developers in general have because they've mostly been bad so far.
I wouldn't be surprised if principles in this case leave us with thousands of spam packages degrading the node ecosystem forever. It'd be exactly what I expect. So I guess I should thank the principle of consistency.
I know it's a meme on HN to rant about the terrible JavaScript ecosystem and how bad JS developers are, but I would ask that if you're going to do it you be specific about what you mean instead of just generally accusing it of being "bad".
It's not even that I disagree, it's that it's a conversation killer. "The JS ecosystem is bad" has no response someone could make besides "no it's not", which is boring. "The JS ecosystem encourages using a million tiny unmaintained packages and that is bad" is a much more interesting statement that can spark a useful discussion.
We can empirically observe that NPM-sphere is relatively alone among software ecosystems to have this particular problem.
This is an indication that the problem is either with some facet of NPM itself, javascript the language or js programmers, as that is what distinguishes the ecosystem from e.g. Maven or Pip that do not suffer from the same problems, at least not to the same extent.
However, going from this observation to isolating causal factors is a lot harder, and randomly guessing isn't very likely to hit the mark.
It's two things really: a small standard library and sheer size of developer community. JS has way more developers than any other language. But if you search for "$PROGRAMMING_LANGUAGE supply chain issues" you literally find reports for all popular languages.
[1] claims that half of Python packages have security issues.
[2] says that the Rust supply chain has security issues.
You're doing it again, though: are "this particular problem" and "these problems" the tea.yaml spam? The million tiny packages problem I mentioned? The fact that people online will generically attack the ecosystem without being specific about their complaints?
I'm not asking for solutions, and I'm not asking for people to identify causal factors. I'm asking for people to put a little more effort into their criticisms of the JS ecosystem than just "it's obviously and empirically a dumpster fire".
A lot of people have already been very specific in many other threads -- "the JS ecosystem has way too many and way too small packages and there's zero curation".
Not sure what your seemingly intended moderation is supposed to achieve, but the complaints about the JS ecosystem have been very clear for no less than 10 years.
So we're specifically talking about the tea.yaml spam. More than any other topic that seems like one that is worth digging into details on rather than just shrugging and saying isolating causality is hard.
If we look at the chart in the original article [0] that this one is a follow up to, the NPM spam suddenly picked up around the end of February, with new packages per day first doubling and then tripling. So this 70% figure is specific to the last 6 months, not something that has been the case with the ecosystem for a long time.
That makes tracing causality much simpler: the Tea protocol seems to be pretty clearly the source of the problem. The big open question is why NPM, but the way that people jump to the conclusion that NPM being the target of this attack must have something to do with the flaws in the ecosystem smacks of victim blaming. Isn't it just as possible that NPM was targeted because it's huge? If you're going to run a massive spam campaign you do it where the people are.
Could NPM learn from this and start controlling spam better? Yes! But that's not the same thing as attributing this tea.yaml nonsense to systemic flaws in the ecosystem; spam prevention has to be balanced with usability, and the balance was pretty decent until 6 months ago.
> The JS ecosystem encourages using a million tiny unmaintained packages and that is bad
continuing on this, I wonder if this is a cultural thing or if there are actual technical choices made in NPM that play a role. Could NPM change something in their package management to change this? Should they?
it's language-cultural. to "publish a package" in Go simply means having a public git repository. and yet, nobody who writes Go imports packages. it's well-understood that if you can't write something like leftpad (or many other JS packages) yourself in your own codebase in a few lines, you're an absolute nonce. Javascript developers on the other hand tend to skew towards the juniors in our broader ecosystem, and they seek easy and quick prestige, which leads to "star farming"/"download farming"
... _what_? i don't think you parsed my comment correctly at all
to address the part of your comment that doesn't make my head spin: only very occasionally do i see senior Go developers import 3rd party libraries. i'm just speaking from my experience
The JS ecosystem doesn't have any singular bad feature that other languages do not share.
Instead, what it does have is a huge prevalence of those features, and a minimally sized "safe space" where one can have some confidence they will not appear. Both are quantitative differences that people cannot summarize in a short comment, and that can easily be dismissed with (misguided or dishonest) counterexamples.
So, what you are asking for is a full blown large scale study of several ecosystems. Somebody may do something like that, but not for a comment, and not because you asked.
I ask because I don't believe the JS ecosystem is notably worse than the Python ecosystem or the Java ecosystem and I'm tired of the meme of railing on JS developers when what people are really railing against is developers in general.
All ecosystems that are sufficiently popular have terrible problems. They have different problems, but none is consistently pleasant to work with. Out of all of them, though, JS gets singled out for constant attacks because... reasons.
I just want people to identify what those reasons are so we can have a conversation about them rather than just endlessly repeating the meme.
It's not about principles in some abstract sense though, it's terms of use. Package authors need to know what the rules of the road are when dedicating time to publishing to npm, and package users need to know how much they can rely on the packages they depend on still being there tomorrow.
It'd be one thing if npm added audit warnings along the lines of "3 dependencies are likely spam." It'd be a totally different story for npm to remove them automatically based on a toolset used, in the GP example.
The unpublish document describes the options that users of NPM have to remove packages themselves. It was created after the left-pad incident, when someone unpublished an important package and broke builds across the ecosystem.
A whole different set of terms governs which packages NPM itself can remove, and that definitely covers these packages, either as "abusive" or as "name squatting".
Not only that, but NPM's TOS makes it very clear that you have no recourse if they decide to remove your package for any reason.
> Registry data is immutable, meaning once published, a package cannot change. We do this for reasons of security and stability of the users who depend on those packages. So if you've ever published a package called "bob" at version 1.1.0, no other package can ever be published with that name at that version. This is true even if that package is unpublished.
This statement makes assertions and sets expectations for both publishers and users. It would be senseless if npmjs started arbitrarily "taking down" packages at its own discretion simply because they include a tea.yaml file (as proposed in the comment I replied to).
Principles are a means to an end, not an end in themselves. The end here (presumably) is a healthy ecosystem, an end which this principle arguably harms more than it helps. Rigid and unthinking adherence to principles is dogmatic, and dogma has no place in engineering.
> The end here (presumably) is a healthy ecosystem
More specifically, the end here is a package manager that doesn't randomly start breaking your builds because a dependency you need can just vanish from the main servers or lose files you expected to be there. That may or may not contribute to a healthy ecosystem, but it definitely contributes to widespread usage of npm.
if you've been duped into importing a package which has been broadly deemed as spam (but you're not looped into the public conversation about that fact enough to realize it), wouldn't "breaking the build" be a good way to get you to realize your folly and avoid the trap?
No, that is but one condition of the end, but not the whole of the end.
A system is all the parts it requires to continue to exist. Widespread usage of NPM will collapse if everything on it is hot dangerous garbage that infects your CI/CD or dev box with something when you type a wrong character. There are multiple dimensions to trust. Is the package I'm using going to disappear? That's one. Is the package I'm using a virus? That's another. Is the entire NPM ecosystem going to collapse under the weight of uncontrolled growth and hosting costs, leaving me with nothing? That's yet another.
You need to back up and look at the whole elephant.
Why are these spam accounts not perma banned and removed?
For example, this[1] account mentioned in the article has 1781 packages of gibberish.
Also, the whole reporting process is onerous: there is a large form. Some gatekeeping on reporting is good, of course, but there should be a way to report a publisher's entire profile, not just individual packages.
Isn't it better to leave accounts that correlate spam than to force spammers to obscure the connection by creating a new account for each piece of spam?
That primarily works if you can shadow-ban the account. Otherwise the spam is still negatively impacting the community (e.g. by polluting search results).
If you make them create a new account each time you remove a package, how does that help you find or remove pollution going forward? It seems to work in the moment, but if you have no plan to change the system the resulting equilibrium is worse than if you can identify a connection between packages from the same spammer.
That's not how spammers work. There is this profile with thousands, and there are still hundreds of spam profiles with just a handful of packages so far. If you let them grow unchecked, they grow exponentially. The broken windows theory fits well here.
I'm not sure I follow how this fits the broken window fallacy? That fallacy is "if I break/destroy something, I create value elsewhere". I'm curious how spamming a bunch of accounts with junk would fit that; no doubt it creates negative value, but nothing is destroyed to do it. "Broken window" is probably not the right pattern here? I can think of a couple of other terms that fit better, but they still don't seem right.
The other commenter was actually just mixing it up with the broken windows fallacy[0]. Funnily enough, it's a common enough confusion that both of the pages reference each other at the top of the page.
>This article is about the economic parable. For the criminological theory, see Broken windows theory.
> Next, because the AI hype train is at full steam, we must point out the obvious. AI models that are trained on these packages will almost certainly skew the outputs in unintended directions. These packages are ultimately garbage, and the mantra of “garbage in, garbage out” holds true.
hmm, inspiring thoughts. An answer to "AI is going to replace software developers in the next 10 years" is to create 23487623856285628346 spam packages that contain pure garbage code. Humans will avoid, LLMs will hallucinate wildly.
We can also seed false information more generally, especially on Reddit which every AI company loves to scrape - less so on Hacker News. I recently learned that every sodium vapor streetlamp is powered by a little hamster running on a wheel. Isn't that interesting?
Most of the recent gains in LLM quality came from improving the quality of inputs (i.e. recognizing that raw unfiltered internet is not the ideal diet for growing reason).
I don't know how good the filters are though, since they're mostly powered by LLMs...
That's not what "hallucination" is. Hallucinations in LLMs are when they unexpectedly and confidently extrapolate outside of their training set when you expected them to generate something interpolated from their training set.
In your example that's just a pollution of the training set by spam, but that's not that much of an issue in practice, as AI has been better than humans at classifying spam for over a decade now.
If I agree with your definition of hallucinations in the context of LLMs... Then isn't your second paragraph literally just a way to artificially increase the likelihood of them occurring?
You seem to differentiate between a hallucination caused by poisoning the dataset vs a hallucination caused by correct data, but can you honestly make such a distinction considering just how much data goes into these models?
Yes, I can make such a distinction - if what the LLM is producing is in the training data then it's not a "hallucination". Note that this is an entirely separate problem from whether the LLM is "correct". In other words, I'm treating the LLM as a Chronicler, summarizing and reproducing what others have previously written, rather than as a Historian trying to determine the underlying truth of what occurred.
Frankly, hallucination as used with LLMs today is not even really a technical term at all. It literally just means "this particular randomly sampled stream of language produced sentences that communicate falsehoods".
There's a strong argument to be made that the word is actually dangerously misleading by implying that there's some difference between the functioning of a model while producing a hallucinatory sample vs when producing a non-hallucinatory sample. There's not. LLMs produce streams of language sampled from a probability distribution. As an unexpected side effect of producing coherent language these streams will often contain factual statements. Other times the stream contains statements that are untrue. "Hallucination" doesn't really exist as an identifiable concept within the architecture of the LLM, it's just a somewhat subjective judgement by humans of the language stream.
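A toy illustration of that point (vocabulary and probabilities invented for the example): the sampling loop below is indifferent to truth; "Paris" and "Mars" are just tokens with different probabilities, produced by the exact same mechanism.

```javascript
// Toy next-token sampler: walk the cumulative distribution until the random
// draw falls in a token's bucket. Nothing here models "true" vs "false".
function sampleToken(probs, rand = Math.random()) {
  let acc = 0;
  for (const [token, p] of Object.entries(probs)) {
    acc += p;
    if (rand < acc) return token;
  }
  return Object.keys(probs).pop(); // guard against floating-point leftovers
}

// Made-up distribution for the next token after "The capital of France is".
const nextTokenProbs = { Paris: 0.7, Lyon: 0.2, Mars: 0.1 };
// Most draws produce the factual "Paris", but the occasional "Mars" comes out
// of the same loop; the "hallucination" label is applied by the human reader,
// not by anything in the model.
```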
The Tea protocol's flawed incentive model is a disaster, effectively encouraging developers to pollute npm with spam. It's a prime example of what happens when protocols prioritize quantity over quality, compromising the entire ecosystem.
A "better" way is to modify the package-lock.json. You can still spoof the package but almost no one actually reviews it as npm will usually modify 1000s of lines.
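A minimal sketch of reviewing that attack surface yourself, assuming npm's current lockfile layout (a top-level `packages` map with `resolved` tarball URLs): scan the lockfile for resolutions that don't point at the public registry. The sample lockfile object below is made up.

```javascript
// Sketch: list lockfile entries whose "resolved" URL is not the public
// npm registry. Assumes lockfileVersion 2/3 shape with a "packages" map.
function foreignResolutions(lock) {
  const out = [];
  for (const [name, meta] of Object.entries(lock.packages ?? {})) {
    const url = meta.resolved;
    if (url && !url.startsWith('https://registry.npmjs.org/')) {
      out.push(`${name}: ${url}`);
    }
  }
  return out;
}

// Made-up minimal package-lock.json contents.
const lock = {
  packages: {
    'node_modules/lodash': {
      resolved: 'https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz',
    },
    'node_modules/axios': {
      resolved: 'https://cdnnpmjs.example/axios.tgz', // spoofed source
    },
  },
};
console.log(foreignResolutions(lock)); // only the axios entry is reported
```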
I also discovered that npm doesn't actually verify what's in node_modules when using "npm install". I found this out a while ago after I had some corrupted files due to a flaky internet connection. Hugely confusing. There also doesn't seem to be a straightforward way to check this (as near as I could tell in a few minutes).
But luckily "npm audit" will warn us about 30 "high severity" ReDos "high impact" "vulnerabilities" that can never realistically be triggered and are not really a "vulnerability" in the first place, let alone a "high impact" one.
No, it's more than that. Did you read the documentation page linked in the comment you replied to?
> But what about after the command has run?
If you munge around in node_modules after a successful `npm ci`, that's on you. If you run scripts that do, that's on you. If you depend on packages that run such scripts, that's on you.
> What, you mean I'm supposed to audit my dependencies myself? That's too much work!
Yes, as part of code review we expect our devs to manually inspect every change in the lockfile for anything that matters or might start to, which includes most things. No, you can't outsource that task to an AI, regardless of how well-performing it appears.
It says exactly that: "If a node_modules is already present, it will be automatically removed before npm ci begins its install".
I didn't "munge around" in node_modules; I said "if something goes wrong". Like I said in my previous comment: "I found this out a while ago after I had some corrupted files due to a flaky internet connection". That's not munging around, that's computers being computers. Network errors happen. Disk errors happen. Memory errors happen. Things like that. I've also had an install ISO corrupted at some point. I always check the sums now, just in case, because there was a lot of confusion before I found out the ISO had simply downloaded wrong. Stuff doesn't often get randomly corrupted, but it does happen, and with 2GB ISO files (or a 2GB node_modules) the chances do grow.
On Go I can do "go mod verify". I think yarn has "yarn check" for this (but didn't verify). I don't know about other package managers off-hand, but if they don't have something for it, they should. You need to be able to verify the content on disk is identical to what's expected.
I never mentioned anything about auditing dependencies or AI.
Your entire post is a masterclass in arguing against things that were never claimed and forceful injection of your own bugbears. I just want to check if node_modules is identical to what's expected, just in case, because computers kind of suck and are unreliable.
> That's just "rm -r node_modules && npm install".
And there is more to it than that, very much relevant to your original complaint, which doesn't apply to `npm ci`.
> I didn't "munge around" in node_modules; I said "if something goes wrong". Like I said in my previous comment: "I found this out a while ago after I had some corrupted files due to a flaky internet connection".
I do believe that if you had run `npm ci` instead of `npm install`, that would have resulted in an error and an empty node_modules instead of inconsistency.
> Disk errors happen. Memory errors happen. Things like that.
Those are at lower layers than the package manager. I think it's unreasonable to expect the package manager to check for inconsistencies induced by hardware errors. You have file-system level solutions (zfs,btrfs) and things like ECC for that. Nothing is a silver bullet.
> I've also had an install ISO corrupted at some point. I always check the sums since, just in case because there was a lot of confusion involved before I found out the ISO was just downloaded wrong for some reason
Good! It happens. Keep doing it. There's a reason integrity verification is a step in every Linux distro installation guide.
> Your entire post is a masterclass in arguing against things that were never claimed and forceful injection of your own bugbears.
Fair! Though I do want to call out your "Who is carefully auditing if the repo URL in the lockfile is actually the correct one?" in a sibling comment - apparently the inferred bugbear wasn't all off-base :p
> I just want to check if node_modules is identical to what's expected, just in case, because computers kind of suck and are unreliable.
I think we can agree that npm is not sufficient tooling for this. My response was about additional tooling. I guess you want a more feature-complete package manager; wish I had one to recommend. npm is severely neglected after the MS acquisition; yarn's maintainers are completely misguided on a couple of fundamentals; and every time I get around to taking another look at pnpm, I either run into a bug or catch a recent enough breakage-outside-of-semver that I decide they're not ready yet...
I think that's true for most package managers: if there's a lock file, there's typically a default command to use it for installs and ignore the main config file.
Yeah maybe, I don't really know off-hand and I'd have to check. I know it's not possible in Go but not sure about anything else. I'd consider it hugely surprising for other packagers where that's possible too. Who is carefully auditing if the repo URL in the lockfile is actually the correct one?
Poetry for sure acts this way. Some checks on things like "poetry.lock is older than pyproject.toml", but no real checks unless you specifically ask for them. Not saying it's good, of course. Just that it's typical.
> But luckily "npm audit" will warn us about 30 "high severity" ReDos "high impact" "vulnerabilities" that can never realistically be triggered and are not really a "vulnerability" in the first place, let alone a "high impact" one.
Yeah, you want to be using a tool that lets you ignore/acknowledge specific entries.
That (and anything else relying on the lockfile) won't take effect for users who install the package from the npm registry, unlike changes in package.json.
You just demonstrated the uglier package-manager-independent overrides (npm) / resolutions (yarn) alternative method. Because for whatever reason they couldn't play nice with each other.
npmjs.com seems to be interpreting the field incorrectly, but 1) AIUI that does not affect actual npm usage, and 2) if you rely on that website for supply-chain-security input, I have a bridge to sell you... Basically all the manifest metadata is taken as-is, and if the facts are important they should be separately verified out-of-band. Publishers can arbitrarily assign unassociated authors, repo URLs, and so on.
I was sad to read this and thought "this is why we can't have nice things."
But following the links was fun and educational:
"The end goal here [of the Tea protocol] is the creation of a robust economy around open source software that accurately and proportionately rewards developers based on the value of their work through complex web3 mechanisms, programmable incentives, and decentralized governance."
Which lead to:
"The term cobra effect was coined by economist Horst Siebert based on an anecdotal occurrence in India during British rule. The British government, concerned about the number of venomous cobras in Delhi, offered a bounty for every dead cobra. Initially, this was a successful strategy; large numbers of snakes were killed for the reward. Eventually, however, people began to breed cobras for the income. When the government became aware of this, the reward program was scrapped. When cobra breeders set their snakes free, the wild cobra population further increased."
Which lead to:
"Goodhart's law is an adage often stated as, 'When a measure becomes a target, it ceases to be a good measure.'"
I recently stumbled upon a bunch of repos which were clearly copied from popular projects but then renamed with a random Latin name and published to npm.
I reported some of them as spam, but there were hundreds of them. I couldn't figure out why somebody would waste the time to do that, but now it makes sense.
There was a similar thing to tea a while back. I think I saw the project posted on here. Went to their github and found a typo in their Readme. Opened a pr with a correction and then they started sending me about a dollar in btc every month till they ran out of money and the project imploded.
Package managers often come with rating systems: npmjs has weekly downloads, pull requests, and other popularity scores.
I'm a layman in AI, but why would anyone think that this would affect anything, like AI? Why would anyone train on a no-name package that no one uses?
Spam packages can have higher-than-zero stats, but that also makes them vulnerable to sweeping removal of all potential spam packages, since they are connected, etc. etc.
Any credible company will not use a no-name spam package and will verify its contents. That is at least what happened at all the companies I have worked for.
> why would anyone think that this would affect anything, like AI? Why would anyone train on a no-name package that no one uses?
…almost certainly for the same reason that any “train AI using only good data, reduce hallucinations!” suggestion is in the “daydream” rather than “great idea” category.
Creating high-quality filtered datasets is enormously more time-consuming and expensive than just dumping in everything you can get your hands on.
It seems obvious to ignore packages that are obviously unused spam, but tl;dr: nobody is going to pour spam into npm unless there's some kind of benefit from it; people accidentally using it, it getting mixed into the dependency tree of legit packages, etc.
It's more likely that the successful folk doing this aren't being caught, and the ones being caught are "me too" idiots. Or the spam is working and people are (for whatever incomprehensible reason) actually using at least some of the packages.
tl;dr: if dependency auditing and supply chain attacks were trivial to solve, they wouldn't be a problem.
…but based on the fact that we continue endlessly to see these issues, you can assume that it's probably more difficult to solve than it appears.
This is such a low effort insincere comment I can barely be bothered to respond to it… but tldr; no, it didn’t.
If it was easy, people would have done it. It’s not easy. Phi is not a state of the art model. It does not perform significantly better or even on par with larger models.
Yes, I've read the tech reports and used it. No, I don't believe it has any meaningful bearing on the problem in question here, which I posit, again, is basically unsolvable:
Given a large user contributed repository of code (npm), it’s very hard to determine “good” from “bad” in terms of quality at scale, when you have malicious actors.
…I mean, it's not impossible with enough time and effort, I suppose, but if Microsoft, who own npm, have a good way of filtering out bad content for their language models, you've really got to ask why the duck they're only using it for their language models and not, you know, to unduck npm…
I'm confused. Are you saying that removing low quality inputs from training data doesn't improve a model? (Or conversely, adding high quality inputs.) Or are you saying that we don't yet have the technology to reliably do this at scale?
I again, can’t comprehend how this can possibly be ambiguous from my comment, but the second one.
We don’t (by all accounts, no one does) have a way to create this kind of dataset at scale, in this kind of complex user contributed content environment (specifically npm and other places like it).
Microsoft's curation techniques for the Phi models remain proprietary. So we can't really criticize or praise their methods, because we don't know what they are. It might be GPT-4. It might be Artificial Artificial Intelligence (a warehouse in Pakistan). But the results speak for themselves.
The models are a bit janky in my testing (especially prone to leaking test materials, and highly specialized on a narrow domain), but fantastic for their size.
Intentional "under-generalization" seems like a fairly self-evident approach to making optimal (and economical, on the training side) use of smaller models.
As for whether it works for a general purpose model, my intuition says that it does (i.e. cutting off the "long tail of knowledge" in favour of a better handling of the mainstream, by the limited neurons available).
As for whether that tech exists, I reckon a simple tf-idf would get you 80% of those wins, but that might be ignorance/arrogance on my part.
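For what it's worth, the tf-idf idea is easy to sketch (toy corpus invented for the example): terms that saturate one package's text but are rare across the corpus score high there, which is the kind of signal a crude filter could threshold on.

```javascript
// Minimal tf-idf: term frequency in one document times (log) inverse
// document frequency across the corpus. Documents are token arrays.
function tfidf(term, doc, corpus) {
  const tf = doc.filter((w) => w === term).length / doc.length;
  const df = corpus.filter((d) => d.includes(term)).length;
  const idf = Math.log(corpus.length / (1 + df)); // +1 avoids division by zero
  return tf * idf;
}

// Made-up mini corpus of package descriptions.
const corpus = [
  ['fast', 'http', 'client'],
  ['tea', 'rewards', 'tea', 'rewards'], // spammy: saturated with reward talk
  ['utility', 'library', 'strings'],
];
```

Whether this actually buys 80% of the wins against adversarial spammers is exactly the open question; it's a baseline, not a defense.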
If you look at the purpose of this Tea protocol, it is exactly to provide a chain of credibility. Though, by connecting ranking with monetization, tea has created perverse incentives, leading spammers to pump up their tea ranking by linking and starring packages in circles. Their goal is to make it look like it’s a highly used package.
Luckily, nobody thinks that tea ranking matters, except for the spammers themselves.
They are no doubt attempting to poke at other, more established metrics as well. This could eventually fool an AI or even humans.
> Why would anyone train on noname package, that noone uses?
Not that I disagree, but in the same line of thinking: Why would anyone train an LLM on some random blog written in broken English? Why would you train an LLM on the absolute dumpster fire that is Reddit comments? And why are my GitHub repos of half-finished projects and horribly insecure coding practices being used as input to Copilot? Yet here we are, LLMs writing broken, insecure code (just like a real person) and telling people to eat rocks.
Heh, I work in a sector that works with some very large companies we all know the names of. I've seen applications written by them that are seemingly very little code, just hundreds or thousands of packages/modules glued together. It is quite common that the tooling they use catches 'low reputation' packages where someone actually put in the wrong package name, then, when it didn't work, added the package they needed but never removed the misnamed one.
Would you be at all surprised? I'm fairly confident that like with browser addons, NPM package maintainers get offers from randoms to 'buy' their package in order to get backdoor access.
A secured registry is long overdue, where every release gets an audit report verifying the code and authorship of a new release. It won't be nearly as fast as regular NPM package development, but that's a good thing: this is intended for LTS versions used in long-term software. It'd be a path to monetization as well, as the entities using a service like this are enterprises, and both the author(s) of the package and the party doing the audit would get a share.
> A secured registry is long overdue, where every release gets an audit report verifying the code and authorship of a new release.
Microsoft did exactly that (since they own both NPM and Github) by allowing you to verify the provenance of NPM packages built using Github Actions [1]. It's not required for all packages though. They've also started requiring all "high impact" packages to use two factor authentication [2].
Who says there is one? It takes basically zero effort to publish these packages, so why not do it? Script kiddie stuff. Lots of people run dumb unsuccessful hustles. The long term plan seems to be macaroni. That is: throw enough macaroni at a wall and hopefully some of it will stick. Or maybe not. Who cares? Wasn't my macaroni and I won't have to clean the wall.
I don't know if they've managed to fix it in recent years, but JS dependency management used to be broken. I think the left-pad[0] incident is the best-known one, but not the only one. My guess is that if you spam enough, at some point one of the packages will go viral.
I'm fairly proficient in Javascript, but mismanagement of the ecosystem like this is a major reason why any time I see that something requires Node.js, I just turn and run in the other direction. It's just not worth the headaches.
I mean realistically it's representative of the Internet as a whole. Makes me wonder where all the porn packages are.
Pulling in unexpected dependent packages is a real issue though. How do other ecosystems deal with it? NPM is really missing some level of trust beyond just using "brand name" packages.
My general judgement is usually how often it's worked on/how many downloads it has but gut feel isn't really enough, is it?
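The git:// and owner/repo specifiers discussed above could be caught mechanically. A rough heuristic sketch of such a check (this is not npm's actual validation logic, and the regex is a deliberately crude approximation of a semver range):

```javascript
// Heuristic check: flag dependency specifiers that don't look like plain
// registry semver ranges (git URLs, tarball URLs, GitHub "owner/repo"
// shorthand, local paths). A key like "axios" paired with such a value
// may resolve to something entirely different from the registry's axios.

function suspiciousDependencies(pkg) {
  const flagged = [];
  for (const [name, spec] of Object.entries(pkg.dependencies || {})) {
    // Crude approximation: a semver range starts with optional range
    // operators and a digit ("^1.2.3", ">=2.0.0 <3.0.0", "1.x"), or is
    // "*" / "latest". Anything else gets flagged for review.
    const looksLikeRegistrySpec =
      /^([\^~>=<\s]*\d[\w.\-+*x\s<>=^~|]*|\*|latest)$/.test(spec);
    if (!looksLikeRegistrySpec) flagged.push({ name, spec });
  }
  return flagged;
}

const pkg = {
  dependencies: {
    axios: "git://cdnnpmjs.com/axios", // git URL, not the registry's axios
    lodash: "^4.17.21",                // ordinary registry specifier
    "left-pad": "axiosjs/latest",      // GitHub owner/repo shorthand
  },
};
console.log(suspiciousDependencies(pkg).map((d) => d.name));
// flags 'axios' and 'left-pad'
```

Something like this on the website side, paired with a visible warning next to the dependency link, would go a long way toward closing the gap the parent comment describes.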