NPM 6.9.1 is broken due to .git folder in published tarball

kotxig · on June 28, 2019

While this is clearly a bug, the default publish strategy is to publish all the contents of the package including dev/test files. If you look at the node_modules tree on a typical server-side node project it's full of garbage files.

We still think it's a great idea to pull down hundreds if not thousands of copies of modules from a remote every time we generate a build. The community has always been lacking maturity around how it manages modules and releases code for production, preferring to preserve developer productivity and trinket features over best security practices and optimizing for the size and quality of production packages.

I don't want to install hundreds of modules every time I create a new build. I don't want your tests or your README in my production tarball. I don't want your browser compatibility code in my server code. I don't want my node_modules tree to be 9 layers deep. I don't want to have to dedupe multiple copies of modules by hand. I don't want to have to debug where shrinkwrap isn't respected. I don't want to play roulette with the package manager version to figure out which one does the right thing for my package. I don't want to run your dubious pre/post install scripts. I don't want to use npm.

The sad thing is, npm is a commercial entity that has its package manager published by default with an opensource community project. And this is why we can't have nice things.

shados · on June 28, 2019

Thats why we never replaced our homegrown, predates-npm package manager. It feels crazy to be maintaining our own when there's such a popular 3rd party one out there, but honestly...it really sucks. Migrating nearly a decade worth of code to something that is often objectively worse doesn't feel worth it. And so the homegrown package manager lives on...

growtofill · on June 28, 2019

‘We’ being?

LeoNatan25 · on June 28, 2019

Maybe "we" should not be writing our server code using a terrible ecosystem? Just an idea.

halfmatthalfcat · on June 28, 2019

Node has a great server-side ecosystem. I'd go as far as saying Node lends itself to certain layers of the backend stack because of it's history being more CPU-efficient then other server-side runtimes.

Can you provide any examples where Node does not provide a good ecosystem for backend programming? Genuinely curious.

smt88 · on June 28, 2019

People have written about this at length many times, but I'll rehash some of the arguments.

Node is missing very basic features, leading to a million tiny modules like leftpad. That's annoying by itself because there's overhead in finding the best package for everything.

The other consequences of that issue are much worse. You have lots of easy opportunities for misleadingly-named packages with malware in them. Package hijacking is easier to miss. Security updates, if packages even get them, have to be applied seemingly every hour.

It's just a major problem to have a simple project with 500 dependencies.

And that doesn't even get me started on issues with the mess that module importing was/is.

halfmatthalfcat · on June 28, 2019

I agree that JS is missing a lot of standard library features that libraries like lodash or underscore have tried to backfill and the proliferation of libraries in npm is a cause for concern but is that enough to call an entire runtime's ecosystem obsolete or unwanted?

I just find it hard (however biased I may be) that we can just say "Node has all these problems, let's throw it out" when it's helped lower the barrier to entry for new people programming at-large.

smt88 · on June 28, 2019

I actually disagree that it's lowered the barrier. JS is a hard language with lots of "wtf" behavior and weird runtime errors. You need lots of discipline to write code without tons of runtime errors, and beginners don't even know what they need to be disciplined about.

I'd argue that JS raises the barrier because lots of beginners now think it's normal to have a bizarre type system where null is an object, to have a complex Webpack/Babel setup just to use new-ish language features, and to have to constantly hit "run" to debug code.

Most previous-gen web languages also have lots of gotchas, but something like Go, Kotlin, or Dart would be a much more sane beginner language.

krapp · on June 28, 2019

>I just find it hard (however biased I may be) that we can just say "Node has all these problems, let's throw it out" when it's helped lower the barrier to entry for new people programming at-large.

This seems to assume that it would be impossible to design a better package manager which also provides a low barrier to entry.

Bear in mind, however, that all you actually need to write javascript and learn programming with it at a basic level is a text editor. The low barrier that NPM provides is for publishing to NPM, which needn't be synonymous with "javascript development."

Your comment also seems to assume that the barrier to entry everywhere else is prohibitively high, but plenty of people still learn with Ruby or Python or other languages. There are whole industries for teaching new programmers.

tinus_hn · on June 28, 2019

JavaScript has the functionality this module provides. You just have to know to use it instead of adding another dependency.

smt88 · on June 28, 2019

Node was far behind the ECMAScript standard for a long time, as were browsers.

kotxig · on June 28, 2019

"Backend programming" is a really broad class. It's safe to say that computationally intensive problems aren't best solved in node. For many server side "web service" type applications node is sufficient but there are a lot of tradeoffs being made that you may or may not be aware of.

By choosing node you are choosing single-threaded async IO as your primitive. You have to be ok without a threading model that supports shared memory (v8 objects don't pass between isolates, only certain types pass between web workers). And what's more, you have to be certain that your problem will fit these choices in the future.

Is it ever a good idea to hamstring a backend service with one narrow concurrency model? By comparison, golang offers you all of those things. If you need to write a backend service that performs well and can respond well to changing requirements, you have a much better chance of doing it with golang than node. I just can't promise any individual developer that they will enjoy writing the golang service more than the node one.

I don't want you to think I'm saying golang is better than node, we haven't even started to talk about the module system. I'm just saying fundamentally it is more flexible and can do everything that node does with at least the same level of performance, and generally it is "faster". Most things out of the box are faster. Benchmarking and performance is in the DNA of the go community and this isn't true for the node community.

_revy · on June 28, 2019

> By choosing node you are choosing single-threaded async IO as your primitive

In node is it very easy to fork processes and have them communicate. Scripting languages for serving HTML and JSON aren't the place for threads.

Node now offers sync versions of most IO functions as well.

laurencerowe · on June 28, 2019

Running multiple processes doesn’t solve the head of line blocking problem where a single large request causes all the small requests behind it to pile up increasing their latency too. When you have thread pools this is not a problem as each request is processed by the next available worker.

JS has nice fast runtimes, and crucially the ability to share code between client and server. I think there’s potential (even without threads) to have a JS server architecture that does not require the entirety of a request/response to be processed in a single thread but could break it up so it could be processed by a pool of workers at IO event boundaries. Either as one isolate per request or explicitly passing state for IO event callbacks.

kotxig · on June 28, 2019

You just don't have shared memory at your disposal by forking processes. For example, having a rather large dictionary of translation strings which are read-only and static, you have to load these in every process as there is no copy on write property to these "forks".

If your JSON payloads are large, you can't have a forked process do the deserialization of it (because you would need to re-serialize across the communication boundary, which defeats the point).

I think it would be reasonable to have immutable data structures shared with web workers. I also think it would be reasonable to make it possible to pass complex objects (all primitive types supported by JSON) between workers without a serialization step.

zaarn · on June 28, 2019

Node isn't great when you need not only CPU-efficient but also multithreaded. The multi-threading or even multi-processing story in node is a clusterfuck. Other than running two full fat Node.js processes, your options are limited to a few APIs that have more chance to break your kneecaps than run the code.

halfmatthalfcat · on June 28, 2019

Sure but that's the whole premise of Node, there doesn't _need_ to be multithreading. Whats wrong with spinning up multiple workers (in something like Docker/Swarm/K8S) that split up the work and report back to whatever collector or service that wants the end result?

So many problems have spawned out of the desire to thread everywhere that the safety of single-threadedness is a feature not a bug.

zaarn · on June 28, 2019

Multithreaded (or Multiprocess) can be done safely.

I write a lot of stuff in Rust these days, where (unless you use unsafe) it doesn't even let you do anything that would break under multi-threading, even if you don't use it.

As a result, it was incredibly easy to write a light-weight crawler that delivery results into a main worker queue and the data is processed by <cpu_cores> threads. The crawler largely uses async for IO advantages but the worker threads rely heavily on data crunching where your throughput is limited by CPU not by IO and async suffers a bit.

And since crawler and worker are on the same process, the latency between a page being grabbed and starting to be processed is incredibly low, any kind of networking would likely cut throughput by a factor of 10 or more. Not even mentioning having to serialize the data instead of relying on references to get read-only copies and only making minimal copies on the stack.

halfmatthalfcat · on June 28, 2019

Ok, fair point Rust makes multithreading easier but one could easily make the argument that fighting Rust's borrow checker is a huge burden to development.

My point is Node isn't as bad as a server-side language as people claim. There are certain use cases where you can leverage Node's strengths and use another language somewhere else where its shines more.

I use Node heavily in my backend stack and have been extremely pleased with it's performance and library ecosystem.

zaarn · on June 28, 2019

You can do the things you do in Node, Async, in Rust too.

Futures and friends are in nightly and soon in Stable, which enable you to mix the best of both world; Multithreaded and Async.

Once you get used to Rust, you fight the borrow checker a lot less.

hombre_fatal · on June 28, 2019

You can level "just use Rust" at any non-Rust ecosystem. It's kind of tired advice. You might as well be recommending Haskell or any other hobby horse language and then argue from a standpoint of technical superiority as if that's the only concern when choosing tools.

I use Rust all the time yet I'll still use Node for most HTTP-based applications and tools I start, and not because I'm an idiot or beginner, as many comments here will suggest. They are completely different languages and ecosystems and workflows that satisfy different goals.

zaarn · on July 1, 2019

This isn't "just use rust". This is "just use something more mature". And there are plenty of more mature ecosystems developing Async capability or that have a good Threading story.

marktangotango · on June 28, 2019

Whats wrong with spinning up multiple workers (in something like Docker/Swarm/K8S) that split up the work and report back to whatever collector or service that wants the end result?

The "spinning up multiple workers" is what's wrong with that. Specifically a language/runtime that does support multi-processing handles it for you; ie gomaxprocs. Another example would be servers like apache forking off several workers during start up (depending on the mpm module). With node this has to be managed manually, as you describe, with Docker or whatever.

With a node async server, you're generally not cpu bound, so there's really no reason to try to utilize all the cores on the machine. Except if you are cpu bound, then you'd want to do that.

HereBeBeasties · on June 28, 2019

If you think there's nothing wrong with all the complexity that comes with running a distributed system for something that could be done trivially in a single process in a sensible language, I would like to sell you this hammer - you will find it very useful when you next need to screw something together.

novaleaf · on June 28, 2019

assuming your code doesn't actually need to be multithreaded, but you want to take advantage of multicore:

1) you can use a launcher like pm2 to auto-fork your node app

2) you can horizontally scale (put your servers behind a load-balancer)

if you do need multithreading for computations, you are right that node probably isn't a very good choice, but you have some options:

1) use worker threads

2) offload your intensive computations to a separate process written in a different language.

zaarn · on June 28, 2019

Even without multithreading, Rust will soon regain advantages with Async being stabilized. You get the best of both worlds.

rhinoceraptor · on June 28, 2019

It's also a lot harder to write, and in most cases developer time costs a lot more than another instance or two.

zaarn · on July 1, 2019

It's not that much harder to write if you use modern practises and languages.

_revy · on June 28, 2019

> two full fat Node.js processes

Do you know about copy on write?

zaarn · on June 28, 2019

Starting up a second Node.js process gives you no CoW advantages, that would occur if you fork off the original node process. But Multiprocess support in Node is piss-poor at best.

_revy · on July 6, 2019

> that would occur if you fork off the original node process.

Right, so why wouldn't you? Node has a "fork" where you can give it a path to a script to run. Yeah it has to JIT it, but the V8 and Node part isn't duplicated.

zaarn · on July 7, 2019

Because running fork() in Rust,C&Friends is more efficient? I don't have to reload the code image at all and I instantly share all data, including any configuration.

Plus you can more easily do shmem in those languages than JS which gives you another performance advantage over pipes.

spion · on June 28, 2019

https://nodejs.org/dist/latest-v12.x/docs/api/worker_threads...

zaarn · on June 29, 2019

Have you considered that workers in JS absolutely suck?

spion · on July 2, 2019

Have considered you that suck conditions race more with threads and shared memory parallelism?

If your counter-suggestion is Rust, have you considered.unwrap() that.unwrap() thinking.unwrap() about.unwrap() managing.unwrap() memory.unwrap() all the time might not be the wisest choice and that a GC is a useful abstraction to have?

zaarn · on July 3, 2019

unwrap() can very very often be avoided in favor of a simple ? operation, the amount of nesting you suggest does occur... but not in a single function so it's spread over many lines and functions. There are GC libraries for Rust if you need them. Or Python/Lua bindings that have it too.

Have you considered that avoiding race conditions isn't hard and you can't avoid having to avoid them even in JS?

spion · on July 3, 2019

"Race conditions of threads with shared memory parallelism" - yes you can avoid those in JS, the "yield" points are obvious in all code (await). Although its there if you need it (SharedArrayBuffer)

Regarding whether its hard or not, yes, I have some first-hand (non-Rust) experience.

On second thought I agree with your original point. Node isn't great for CPU-efficient + multithreaded. But

* most platforms aren't (either not great because unsafe, non-cpu-efficient or both) * most of the time its not what you really need.

So a good question would be, what would you use shared memory parallelism for in typical back-end programming?

zaarn · on July 3, 2019

I've developed plenty of shared memory software. In 99% of cases, a well placed mutex lock will solve the problem entirely. Of the remaining percent, maybe 5% are the type where you can just not lock and it'll work out sufficiently often. Another 45% might benefit from designing a channel/queue. The reamining 50% need a specialized lock to run well.

But all shared memory problems can, without much difficulty, be solved by using a mutex. It might not be best performance but you'll likely not need that much performance in the first place (after all, you're coming from the JS ecosystem).

Memory barriers aren't particularly hard to understand either, I've written some systems using them that ran very stable. Same for lockless or atomic or reentrant algorithms. It's all very fun and doesn't take that much skill if you're willing to read into it.

spion · on July 3, 2019

And plenty of people have said otherwise, and have developed entire languages and formal method frameworks to better reason about these problems. (e.g. TLA+)

So anyway... what would you use shared memory parallelism for in typical back-end programming?

zaarn · on July 4, 2019

You use the advanced methods to squeeze out more performance, it is not necessary to actually build solutions.

There is plenty of reasons for shared memory in back-ends, workers queues with zero-copy messaging would be one example that a lot of applications can benefit from.

spion · on July 4, 2019

Zero-copy message queue would be an unnecessary micro-optimization for most backends I've worked on. I don't think its something you would normally do in typical back-end programming, unless your scale is bonkers-level or you're creating infrastructure for others to use (maybe if you're doing analytics for others? except there are other alternatives there...). Its definitely not a niche targeted by node.

zaarn · on July 5, 2019

I don't think it's a microptimization, it's not that terribly complicated if you are careful about the code you write and it's definitely fun.

novaleaf · on June 28, 2019

Agree! There's a package for just about anything you could want. If you are in a small shop (I'm a 1 person team) it's so invaluable.

NPM's module selection is so wide that it's sometimes hard to find the best one. Too many choices! I would rather have that problem than lack of any choice.

Overall I'd guess to get a functional product, NPM's modules saved me about 50% of my time.

Personally I love C#, but can't use that for backend because of the lack of module ecosystem.

pjmlp · on July 1, 2019

Better read on on NuGET, FAKE, COM and Assemblies.

skellera · on June 28, 2019

Just curious, what is missing from C# for you?

novaleaf · on June 28, 2019

just the lack of module ecosystem.

skellera · on June 29, 2019

Is that not nuget?

lmm · on June 28, 2019

> I'd go as far as saying Node lends itself to certain layers of the backend stack because of it's history being more CPU-efficient then other server-side runtimes.

If you're talking about async I/O, TCL (among others) has offered that for more than 20 years (from AOLServer on), and in a much better-designed language than JS.

spion · on June 28, 2019

Is the design better in some fundamental way, or is it just about equality comparison and other shallow "wtfs"?

blacksqr · on June 29, 2019

I'm not a Node expert, but my understanding is that if any code throws an error, the server crashes and has to be restarted. That prompted a serious WTF from me when I read about it.

While Tcl is running as a server in an event loop, if an error is thrown while processing an event it is reported via a background error API, and the server keeps going.

spion · on July 2, 2019

That is true, and node's official advice in this regard is bad. It comes from previous bad designs of error handling and leaking resources (file descriptors mostly) caused by domains. However, modern node code typically uses promises and you can use slightly more old-fashioned methods of managing resources: https://stackoverflow.com/a/19540907/110271

You can go a step further and fully isolate "crash-assuming" code from "non-crash-assuming" (promise based code): https://gist.github.com/spion/ed8deb7a3b4add0a6d727dc78fe635...

lmm · on June 30, 2019

I don't know what you consider a deep part of programming language design. E.g. I would consider prototypal inheritance and non-lexical scope to be deep language design mistakes in JavaScript. But even beyond that TCL is a particularly elegant design (rather lisplike in a way) - e.g. Guido van Rossum cites it as an inspiration.

spion · on July 2, 2019

Why do you believe that JavaScript isn't lexical? My understanding is that it is. Do you mean the behavior of `this`? Its similar to Lua in that `this` is actually an implicit function argument (passed left of the dot) but thats definitely very different from say, dynamic scope

Prototypical inheritance isn't a "deep" design mistake in so far that its not actually used as such (its mostly used as classical inheritance). Classical inheritance could be argued a design mistake, but one that most mainstream languages make, so I wouldn't go as far as to call it a deep one.

lmm · on July 8, 2019

> Why do you believe that JavaScript isn't lexical? My understanding is that it is. Do you mean the behavior of `this`? Its similar to Lua in that `this` is actually an implicit function argument (passed left of the dot) but thats definitely very different from say, dynamic scope

`this` but also hoisting - variables get hoisted out of blocks where they are declared, so their scope is not what it looks like lexically.

> Prototypical inheritance isn't a "deep" design mistake in so far that its not actually used as such (its mostly used as classical inheritance).

I'd argue that even if it's not being used it still adds complexity when reading or debugging.

spion · on July 8, 2019

Hoisting is still lexical, except the scope is function level. But the point is moot, since we have `let` which behaves "normally" now.

`this` is not non-lexical scope, its argument passing. Its the most confusing wart in JS but its actually not a fundamental design flaw, just a syntactic wart: https://gist.github.com/spion/7180482

> I'd argue that even if it's not being used it still adds complexity when reading or debugging.

I disagree, it barely adds a couple of bits of noise here and there. Definitely not a deep flaw.

The worst JS flaws are in its anemic standard library and the pre-promises backend standard library included with node. They are overdue for a refresh.

pjmlp · on July 1, 2019

Node being more CPU efficient than Java, .NET, C++, Go?!?

rubber_duck · on June 28, 2019

>Can you provide any examples where Node does not provide a good ecosystem for backend programming?

A decent URL parsing/matching lib not tied to a full web framework ? Disclosure - haven't done serious node work in a while but ~2 years ago when I was building something in with it all the published libs for this were complete garbage - looking at the code for the packages made me realize what sort of crap gets published to NPM and then actually used/referenced by community.

I think it's better than PHP (took on a PHP integration job to help out a friend in trouble recently - that was going back to the dark ages with package management and module system). But it's the same tier in my eyes - filled with clueless newbies trying to lead other newbies - a community of blind leading the blind.

In my eyes a sane/mature language like Python beats Node any day of the week quality/productivity wise and so does stuff built on top of JVM/CLR. As much as C#/Java devs like to obfuscate code with useless patterns I'd take a random C#/Java package over a random Node package any time - at least the developer managed to convince the type system to compile his code - wouldn't say that the authors could do the same for some of the NPM packages I saw.

jeffschofield · on June 28, 2019

FYI a URL parser was actually added around 2 years ago in v8.0.0. It's based on the WHATWG standard and matches the implementation in modern browsers.

https://nodejs.org/api/url.html#url_url_parse_urlstring_pars...

rubber_duck · on June 28, 2019

Good to know. Maybe if I got back to it the core is more functional now.

My impression is that it's soo easy to publish to NPM and the ecosystem is newbie friendly - this is good but it also creates a lot of noise and popularity/usage is not a valid indicator of reliability/quality (compared to other platforms). And because the core was so barebones you need to use third party solutions a lot more often so it compounds the issues.

toastal · on June 28, 2019

Any type system that allows null or any shouldn't be trusted any more than a dynamic language

rubber_duck · on June 29, 2019

It's not so much that I trust in the type system to prove correctness it's more that the type system and build system setup is a barrier to entry that puts a threshold on newbies publishing packages. Personally I've only seen such low quality code published and used in Node and PHP communities - because of the missing standard library and newbies creating noise with creating low quality libs or not knowing how to select/filter those and increasing their popularity. For example I recently had to deal with a grunt build system (could not chose a different task runner) and the quality of those plugins ...

kotxig · on June 28, 2019

Indeed. It's hard to avoid that conclusion, but you can't deprecate things overnight.

drngdds · on June 28, 2019

Yeah, I don't know why npm defaults to including everything in the folder for the production build instead of having it be a whitelist. It's not that hard to specify "include package.json, package-lock.json, and ./build" in your package.json.

kotxig · on June 28, 2019

It promotes certain developer-friendly ideals such as going into the module directly within node_modules or using npm explore. I think this is the wrong priority.

halfmatthalfcat · on June 28, 2019

That's why it's a good idea to leverage bundlers and simply roll all of your desired dependencies into a single file and then just execute Node against that.

You can easily do this with Webpack.

edit: To clarify I'm talking about server-side components.

andrew_ · on June 28, 2019

For anyone reading this comment, investigate Rollup. My own anecdotal experience is that it kicks webpack's behind when it comes to bundling/packaging server-side anything, or anything that'll be run via `node`

tracker1 · on June 28, 2019

`npm ci` will use a local cache, and only pull specifically identified versions.

MehdiHK · on June 28, 2019

Probably relevant-

This week's Node.js weekly [1] mentions: "Last week we mentioned the long awaited status of npm 6.9.1 and the possible ‘strike’ [2] in ongoing community work on the project, but npm’s Isaac Z. Schlueter has stepped up, got a release out"

[1] https://nodeweekly.com/issues/294

[2] https://gist.github.com/aeschright/8ed09cbc2a4aee00fcb4ad350...

ch4s3 · on June 28, 2019

Well, they did just fire a bunch of people for complaining about working conditions, so it's no surprise that they're stumbling into other mistakes.

_Marak_ · on June 28, 2019

NPM still has bugs from v5 which haven't been addressed and are ignored by it's maintainers. Problems with `npm link` were never fixed. There is a discussion on the NPM tracker where the maintainers tell Substack that his ( very correct ) assessment of an issue was wrong and lock the thread.

It's always the same usual suspects at NPM never accepting responsibility or blame for multiple incidents and ongoing quality issues.

NPM is currently on it's last legs before acquisition or buyout.

shados · on June 28, 2019

npm link almost feels like a troll attempt. Running 2 packages locally together is one of the most basic features I can think of, and it works so poorly.

mercurysmessage · on June 28, 2019

Found this out not long ago on my own. Wish I would have known link was so bad before I tried it.

msoad · on June 28, 2019

Yup! Even this "amazing" bug was reported and was ignored by the team before:

https://npm.community/t/npm-6-9-1-is-broken-due-to-git-folde...

samirm · on June 28, 2019

If they are on the verge of a buyout, I hope Microsoft gets them.

gigatexal · on June 28, 2019

I think this probably happens more often in other package repositories than we hear about it’s just that NPM is gigantic and people like to hate on JavaScript.

I’ve done stupid things too. I once committed the s3 keys to our public repo and had to explain why our s3 bill was so high. Never made that mistake again.

lmm · on June 28, 2019

Having submitted packages to Maven Central, NPM seems distinctly lax by comparison. No federation and no package signing speaks to an organisation that makes poor choices rather than one that's just been unlucky.

pluma · on June 28, 2019

Unlike Maven Central or almost every other package registry in the world, NPM is operated by a venture capital founded startup (npm Inc).

So, yeah, their priorities are very different and that leads to questionable technical decisions.

A number of prominent ex-employees have collaborated on a new, federated package repository called Entropic. Frankly, I hope they succeed and npm Inc meets its inevitable fate.

awinder · on June 28, 2019

This doesn’t count? https://blog.npmjs.org/post/172999548390/new-pgp-machinery

gigatexal · on June 28, 2019

That low hanging fruit should be addressed. Having hashes of the data attached to the package that could then be checked on download would be nice. But it wouldn’t have fixed this adding of the .git folder.

lmm · on June 28, 2019

Sorry, I wasn't very clear about the split between my two sentences. Maven central runs a battery of tests on packages before accepting them into the repository, which I suspect would catch this kind of issue. In parallel, I believe NPM is genuinely a worse-run repository than those of other languages, because of their stance on those low-hanging fruit, even though as you say those low-hanging fruit are not relevant to this particular issue.

tracker1 · on June 28, 2019

What I really wish that NPM would do, as an initial security addon would be to load each of the JS files and check for all the IP addresses and URLs in the module, and include that in the package information. If it's not an API client, then one can/could be very leery of any addresses and could flag any packages that add/change them between releases, or maybe require a major version bump.

afuchs · on June 28, 2019

A basic search could find strings that look like network addresses, but those addresses are easy to obfuscate.

Any algorithm that can detect every possible obfuscated address would have to include a solution for the halting problem. So detecting every single address that might be encoded in a JS file is impossible, an undecidable problem.

tracker1 · on July 1, 2019

could load the module in an isolated container and run it... does it try to make network connections or dns queries?

jtdev · on June 28, 2019

Or... maybe those other package repos are actually better managed? And maybe, just maybe, JavaScript might have actual, material language and ecosystem issues.

krapp · on June 28, 2019

> I think this probably happens more often in other package repositories than we hear about it’s just that NPM is gigantic and people like to hate on JavaScript.

This is Hacker News... if other package managers had issues more often than the circus of errors that is NPM, we would definitely hear about it.

We don't, because NPM is qualitatively worse than all of them.

wwright · on June 28, 2019

I believe NPM is at least an order of magnitude bigger than every other package repository, for what it’s worth.

stepbeek · on June 28, 2019

To its credit? Anecdotally, I feel like I run into way more abandonware on npm registry than on say maven central.

tracker1 · on June 28, 2019

Spelunk through nuget sometime... ;-)

goranmoomin · on June 28, 2019

This is Hacker News... if other package managers that HN hates had issues more often than the circus of errors that is NPM, we would definitely hear about it.

We don’t, because HNers hate NPM the most than all of them.

krapp · on June 28, 2019

We don't hear about those other issues as often because those issues don't exist as often.

It would only take one user to report on the constant failures of any other package manager. As much as people here do hate NPM (because they have to use it,) a vituperative community like HN still wouldn't pass up the chance to rant about anything else.

thrwaway48373 · on June 28, 2019

Ouch. How long did it take you to figure out what happened, and how high was your bill? How many people abused it, do you think (just one?) and any idea for what?

gigatexal · on June 28, 2019

Just one day. Someone uploaded thousands of junk and since we had versioning it took a long time to delete. And I think we were being billed extra on the high load being given to our bucket. So could have been multiple someones. I revoked the keys, deleted the bucket, even my own account, and reset everything. Lucky for me the CTO was understanding. I don’t remember the exact bill amount but it went up in relative terms by something like 2x

thrwaway48373 · on June 28, 2019

Oh, that sounds like you really avoided something much worse that could have happened. I thought you meant like a bill a couple of orders of magnitude larger than you ever paid normally. Thanks for the answers!

Good on you for catching it relatively quickly.

smurv1337 · on June 28, 2019

Been there done that. I Think we were charged some $6000, not my proudest moment ...

gigatexal · on June 28, 2019

Oh man I think had our bill been that high I’d have been fired for sure. We were a very cash strapped small op at the time.

carlosperate · on June 28, 2019

I'll never understand the idea of punishing employees for honest mistakes. Investing the time and money to hire a competent developer, and their ramp up to full productivity is definitely more expensive than that. Not only that, but the new person might make the same or similar mistake in the future. By being understanding you can save money and have the peace of mind that this mistake won't happen again.

gigatexal · on June 28, 2019

Well there are good managers out there that follow this.

firethief · on June 28, 2019

cargo won't publish with a dirty git working directory unless you force it, so that wouldn't have happened with Rust (and .git/ is excluded implicitly).

fs111 · on June 28, 2019

Which other package repos are those supposed to be?

gigatexal · on June 28, 2019

Maybe github’s repo push is going to fix things.

rolltiide · on June 28, 2019

The interesting thing about semi-decentralized technologies is that every issue seems like a clusterfuck but it actually gets fixed fairly quickly and nobody in the future has to think about it at all

choeger · on June 28, 2019

More interesting question: How many hip startups could not build/test/ship their latest iteration yesterday because their whole CI pipeline depended on npm working?

gridlockd · on June 28, 2019

What CI pipeline depends on the latest release of any piece of software working?

Updates can always break stuff.

TeMPOraL · on June 28, 2019

A dumb one that pulls fresh copies of all packages off the Internet every time a CI job is run?

There seems to be a few of these around, judging by the noise made every time Github suffers a brief outage.

y4mi · on June 28, 2019

Oh, No. That's a different issue.

If GitHub is unavailable, the repository itself is unreachable. A lot verify hashes from the repo.

If the latest release of one package is broken and your ci breaks, then you really have a problem... You'd have to disregard lock files and go out of your way to reinstall everything to latest

oarsinsync · on June 28, 2019

> If GitHub is unavailable, the repository itself is unreachable.

If a central repository is unavailable, a decentralised version control system should continue to function. If developers are creating fragile tooling that centralises a by-design decentralised service, that feels like a flawed decision.

eeZah7Ux · on June 28, 2019

> hip startups could not build/test/ship their latest iteration

This sounds like a feature.

Sahhaese · on June 28, 2019

I see it's been fixed by releasing 6.9.2.

Does this mean you can now "brick" people's projects by sneaking npm@6.9.1 into their package.json?

KhoomeiK · on June 28, 2019

Certainly seems possible... Looks like I probably won't be installing anything with npm until they unpublish it.

Ajedi32 · on June 28, 2019

I don't think so. NPM packages aren't installed globally by default, they're scoped to the local project via `node_modules`. So by default, the version of npm on the user's PATH wouldn't get updated.

anc84 · on June 28, 2019

The majority of npm based installation instructions I encounter "out there" are using "npm install -g".

hombre_fatal · on June 28, 2019

Well, that would be up to the user to decide. And you don't generally install project deps that way beyond some dev-deps like linters and test runners.

WorldMaker · on June 28, 2019

Even that guidance has changed with the cultural acceptance of `npx`. At this point the only thing that should be `npm install -g` is npm itself, and that only depending on your philosophy of how current you need to keep npm versus the version that ships with your NodeJS install.

bassman9000 · on June 29, 2019

Everyone installs npm globally, because the whole model is broken. Which is why we keep all node env in a Docker image.

z3t4 · on June 28, 2019

These sort of bugs are hard to find with unit tests. But it's still easy to test, to make sure it never happen again. I think it's called integration tests. It can also be done with live testing, by checking all packages that lands on the public repository - if they include .git folder, passwords/keys, or other credentials.

mrolla · on June 28, 2019

Bug aside, it baffles me that it's a real person publishing rather than CI after tagging.

majewsky · on June 28, 2019

I don't know about their process, but I always make my release tags myself. I wouldn't want to give the signing key to any CI system.

WoahNoun · on June 28, 2019

That seems fundamentally backwards. The CI system should do the tagging. Allowing manual tagging introduces intentional or unintentional malfeasance in shared projects.

javagram · on June 28, 2019

Manual tagging is the best way for most projects to do stuff like sign the package using an offline hardware key.

Putting your keys on CI makes you vulnerable to your CI being hacked, which anecdotally seems to have happened to several projects.

mrolla · on June 28, 2019

I am very reluctant myself, but I think at some point you have to compromise. You can self host and go the hard way, trust some 3rd party CI (reputation is key) or go, if available, with the one from your cloud provider. This is exactly why we went with Codebuild at some point. It's not great, but hey we are not giving it anything more they don't already have (we host everything on AWS).

nullwasamistake · on June 28, 2019

Software this popular doesn't even run a test that installs, upgrades, then uninstalls before deployment? Isn't that the most important thing the software needs to be able to do? Does NPM test anything at all before they release? Another reason to switch to yarn.

fps_doug · on June 28, 2019

Welp, not that our product is that popular, we're selling in one country only, but there are exactly zero tests. Everything is manual. And I think this is still how the majority of projects out there do it. Not that this is great, but management, and a lot of devs don't see too much value in it, more like "nice to have if there's extra time".

not_kurt_godel · on June 28, 2019

That is not how the majority of professional software projects do things, and even if it were it doesn't matter because npm is not your average software project anyway. It is absolute clown status that they don't have integ tests to catch this sort of thing, and adds yet more doubt in my mind about the wisdom of using it (on top of an already considerable doubt pile).

fps_doug · on June 28, 2019

Define "professional". If "professional" == "projects that have tests" you're probably right. ;-) Otherwise, you might've gotten lucky with your previous jobs/open source projects you contributed to. The best I've seen so far at the 4 jobs I had in my life were that some, mostly smaller, components had test suites, then others some half-arsed attempts that stopped being maintained a year or two ago, and as soon as some test stopped working because something got refactored, it was simply removed if it couldn't be fixed within a couple minutes. My previous job probably had the best test coverage, but still not integration tests. Maybe it's because people slowly become more conscious about the importance.

mikekchar · on June 28, 2019

I'm still amazed that there are companies that actually test diligently (I work at one now -- I sometimes complain about our tests, but in general they are pretty darn good). Literally only 20 years ago there were a lot of places you'd be laughed out of if you suggested writing automated tests for all of your code (or even any of your code ;-) ). The first time I did TDD seriously, I thought, "I'd better hang on to this job. I'll never get another one like it". But, it seems that there are groups that are doing it, and even people who think you aren't professional unless you are doing it. It's down right amazing!

TeMPOraL · on June 28, 2019

To drop another n=4 anecdote, yes, all places I worked in were more-less like that.

(One reason why I dislike the terms like "professional software $something" or "industry standard" is because while they connote quality, they're defined by "as done by people who are paid to do this" and "the popular thing among people who get paid for this work", respectively. The two viewpoints - quality vs. what professionals do - are almost completely opposite in practice, given the shit show our industry is.)

Sahhaese · on June 28, 2019

I prefer the term "best practice" to capture what companies should do but typically don't adhere to all best practices for everything.

aprdm · on June 28, 2019

How do you know how majority of it isn't like this ? Having a few years of experience and having worked in 10+ companies across different industries I would be surprised if it wasn't the case.

emerongi · on June 28, 2019

Pretty sure it is how the majority of software projects are doing it, but I agree that NPM is not your average software project.

gridlockd · on June 28, 2019

Dude, I'm pretty sure the only people that have all this "best practice CI mumbojumbo" are people on HN.

Also, they use it not because they know that it saves them time, but because they're afraid other HN people like you will shame them if they don't.

TheSpiceIsLife · on June 28, 2019

Your comment would be better without the first word, it’s unnecessary.

That aside, I often think and read about the Toyota Production System, and notice how the industry I work in, structural steel fabrication and erection, a subset of the construction industry, has a tenancy to me nothing like it.

It’s interesting that software production has similar failings. Just recently another building in Sydney suffered serious structural faults[1]; the 737 Max is a disaster; the MacBook Pro keyboard is an unmitigated dumpster fire that, let’s be serious, got Jony Ive sacked; Tesla and Musk are pathologically deceptive.

I’m beginning to think we’ve lost the ability to build anything of genuine quality.

In my opinion we have reached and passed Peak Design and Manufacturing.

I could be wrong. What genuinely quality and durable things do humans make?

gridlockd · on June 28, 2019

> Your comment would be better without the first word, it’s unnecessary.

You couldn't be more wrong. The word sets the tone and frame for the entire comment. It is, in fact, the most important word of the whole comment.

> In my opinion we have reached and passed Peak Design and Manufacturing.

The second+ generation of any product is often worse in many respects, because the companies figure out which corners they can cut without losing the customer.

> What genuinely quality and durable things do humans make?

Humans don't really care much for durability, perhaps because they're not durable themselves.

bassman9000 · on June 29, 2019

How old is the product? Have you done any decent refactoring?

floatingatoll · on June 28, 2019

All autoupdate tests will eventually fail to prevent a self-breaking autoupdate from affecting one or more users. The trick is to prepare effectively for that scenario ahead of time. For example, homebrew’s “brew doctor” command has existed for a long time, but in recent years it now comes with a litany or helpful text and other commands (including autoupdate code logic) reference it when they encounter errors.

eeZah7Ux · on June 28, 2019

> doesn't even run a test that installs, upgrades, then uninstalls before deployment

That would be what Debian does - automatically.

Not hipster enough.

de_watcher · on June 28, 2019

> Another reason to switch to yarn.

Another reason to stop making dozens of different package managers. Or dozens copies of literally anything.

Can you people just talk and work together? These package managers aint rocket science.

Look at GNU/Linux: there are only 2 or 3 variations for everything.

diggan · on June 28, 2019

> Can you people just talk and work together

This is a general problem in technology, not just about package managers

> Look at GNU/Linux: there is only 2 or 3 variations for everything.

Not true, there is probably hundreds of versions of everything (probably not bad in itself) but 2-3 popular ones that you know about.

The reason why we keep seeing new package managers is because people think they can do it better than the existing stuff. And I think it's a necessary step to improvement, but some of the suggested solutions will be crap. But then sometimes something really good comes along and the ecosystem self-corrects.

pluma · on June 28, 2019

There are literally only two package managers for the npm registry right now. This is akin to apt vs dpkg.

There used to be bower, which installed via GitHub directly and has been dead for ages. There are now also several experimental alternatives but they have basically no traction.

The newest contestant is Entropic, which was built by former npm developers as a reaction to the problems of a for-profit venture capital funded startup running a centralised registry and controlling the client.

So for the Linux comparison (ignoring the actual implementations): yarn and npm are apt and dpkg, Entropic is yum and Bower is pacman. Not exactly proliferation if you consider how many packages there are on npm right now.

sime2009 · on June 28, 2019

> Look at GNU/Linux: there are only 2 or 3 variations for everything.

Mostly false.

For package management on Linux I can think of 4 just off the top of my head:

* apt

* rpm

* apk

* pacman

There are bound to be more and this is not counting other approaches like AppImage and Snap.

de_watcher · on June 28, 2019

Totally different.

He said "switch from npm to yarn". I can't imagine the phrase "switch from apt to rpm". The difference here is that there is the whole layer of distribution maintainers who do the actual work to ensure compatibility and quality.

AppImage and Snap are different beasts emerged from the need to bundle proprietary software. There are around two of them like I've said.

pluma · on June 28, 2019

Okay, how many node package managers do you think there are for the npm registry?

Bower? Nope, it used github.

Entropic? Nope, federated as an alternative to npm Inc's monopoly.

JSPM? Nope, it loads modules directly off the web.

Pnpm? Ah, yes, this one is actually a replacement for the npm client like yarn although with much less traction.

So we have npm, yarn and pnpm. Compare that to apt, aptitude, apt-get, dpkg, synaptic, gdpm, gnome-apt, dselect, wajig, PackageKit, kpackage and gdebi. Sure, most of these are GUI tools or somehow build on low-level tools like apt-get but the point is that "Linux doesn't suffer from fragmentation but JS package management does" is absurd.

EDIT: This doesn't even go into there being at least three popular mutually incompatible package systems for Linux (Arch/RedHat/Debian) and every distro having its own registry (or "repository").

Fnoord · on June 28, 2019

Emerge/portage, nix, ...

Even macOS already has Homebrew (Linuxbrew is the Linux equiv), Fink (derived from APT), and MacPorts on top of the App Store and package managers for programming languages like NPM (NodeJS), Cargo (Rust), Gem (Ruby).

A nice command to upgrade all of these is Topgrade [1] which does all the upgrades for you, including even Tmux plugins.

Of course it comes at the same price of a rolling distribution. Things will break occasionally.

As for NPM since the company behind NPM is commercial, and it is centralized, it shouldn't be a surprise that there's alternatives.

[1] https://github.com/r-darwish/topgrade

Vogtinator · on June 28, 2019

rpm and apt act on two different layers, either you replace apt with dpkg or rpm with dnf/yum/yast/zypper

benbristow · on June 28, 2019

Don't forget portage for Gentoo and zypper for OpenSUSE

eeZah7Ux · on June 28, 2019

> Another reason to stop making dozens of different package managers.

Even better: decouple what is generic / OS specific / language specific into libraries and formats and reuse them.

I've worked on many package managers and it's astonishing how every new language reinvents the wheel and makes the same mistakes again.

People ignore the history of software packaging every day.

Already__Taken · on June 28, 2019

lol what? 2 or 300 maybe. rpm, Deb, apt, apt-get, APK, nix, aptitude, dpkg, aur. some of these are different, or the same, or not. dammed if I can tell. hopefully containers eat them all.

mustardo · on June 28, 2019

Fuckit.JS to the rescue?

https://github.com/mattdiamond/fuckitjs/blob/master/README.m...