The 50GB figure is the number of Node modules a developer has installed on their local machine that are duplicated between multiple Node projects/repos checked out on that machine. Even for the notoriously bloated JS ecosystem that seems well above average, even assuming any given developer has >1 project checked out at once.
For reference, I'm a developer primarily working in JS. I use an old 2017 MB Air with the smallest disk (120GB) and have many many Node projects checked out (including random GH FOSS I've contributed to once). I don't use pnpm & I've never had disk space issues.
Don't get me wrong, pnpm is cool. I've started trying it out and will likely convert a lot of stuff. But 50GB is extreme even for Node.
I didn’t really find it clickbait. It’s not unusual for a complex react app to have 600+ mb in dependencies, and at work I have at least 50 repositories checked out at once. 50gb may be a bit of a stretch for most users but I imagine multi-gig savings to be fairly common! We’ve moved many of our projects over to pnpm.
Not really. Some dependencies can be heavy. As soon as your project includes, let's say, Electron, you have ~200MB consumed by a single dependency on top of anything else your project uses. And if your node_modules was populated by an older npm version, you may have multiple copies of that dependency within a single project.
Depends entirely on how many projects you regularly work with.
pnpm is objectively better than npm in many other ways too, though. It does, by default, all the things that npm finally realized it needed to do in version 8.
> even assuming any given developer has >1 project checked out at once.
One? Oh lord you don't want to look at my work machine then. Probably a couple dozen projects pulled down, and once added, they aren't getting removed until I have to swap PCs.
I work in code analysis automation so I have way more than a couple dozen but the point is:
1. I'm not an average case.
2. Even I've never hit 50GB
3. (not mentioned in my original post but...) if you did reach 50GB, that's likely because you have a load of old projects lying around that need deleting. Using a new tool in order to retain that dysfunction isn't exactly a great recommendation.
Pnpm is great for other reasons: not recommending against it, just calling out the ridiculous title here.
> Believe it or not, this is basically how everything in the Python and Ruby worlds work
And those two paragraphs are not really true. Python and Ruby environments aren't a "these days" thing - they've been available since before npm existed (virtualenv 2007, npm 2010).
The system/project split exists in the same way npm --global / npm exists. The only real difference is that you can't have different versions installed in the same environment at the same time - not the other things implied by the post.
The big problem for me is that python does not seem to have settled on just a single (or even obvious) way of dealing with this. Every project I have to figure out if I need to run setuptools, pip, virtualenv, etc.
That said, I did have issues with the same paragraph, because PHP's Composer has been doing it the correct way since forever.
> Every project I have to figure out if I need to run setuptools, pip, virtualenv, etc.
You're mixing layers. Pip uses setuptools to install packages inside a virtualenv. You need something that manages environment/dependencies and something that installs them. Sometimes they're the same thing, like with poetry or pipenv.
How do you know what to use? Read the project readme. Same as people choosing yarn or npm.
The worst part about all these node modules is the tiny, silly ones that do something really inane - like just getting the current year.
I said the same thing about some ruby gems years ago and thankfully that’s a little bit sane now.
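For what it's worth, the "current year" case really is a single built-in call; something like this (a trivial sketch, no package involved):

    // No dependency needed: the current year is one built-in call away.
    const currentYear = new Date().getFullYear();
    console.log(currentYear); // e.g. 2024, depending on when you run it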
I don’t use JS that often. But recently I looked at the dependencies for some library I was using and I was astonished at the literally hundreds of tiny modules that were being used.
And it gets even worse - those tiny little modules have their dependencies too.
Someone linked me to 1-liners[0], which is - you guessed it - a bunch of one-liners. I think it's nice to have as a reference. But a dependency? Really?
My least favorite is assign.
Not only does JavaScript support that natively (though I suppose the library may predate widespread Object.assign support), but the 1-liners assign also flips the order of the parameters!
And most of them are just straight-up pointless! Like, let's introduce a dependency for decrement lol
EDIT: In looking up whether the 1-liners assign predated widespread Object.assign support, I found that their implementation - confusingly named extend[1] at first - literally used Object.assign from the very beginning. And they still chose to mess with the parameter order. For shame lol
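For anyone who hasn't seen it, this is roughly the kind of flip being described - the native call puts the target first, while the wrapper reverses it (the parameter names below are my guess, not copied from the 1-liners source):

    // Native: Object.assign(target, ...sources) copies into and returns the target.
    const defaults = { retries: 3 };
    const options = { retries: 5 };
    const merged = Object.assign({}, defaults, options); // { retries: 5 }

    // Roughly the flipped one-liner being complained about (hypothetical signature):
    const assign = (source, target) => Object.assign(target, source);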
Not entirely unreasonable as all `number`s are floats by default in JS, but the implementation of the entire package (https://github.com/nefe/number-precision/blob/master/src/ind...) is less than 100 lines of code and actually contains a method called "minus".
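The floating-point surprise that motivates packages like that is real, even though it's easy to handle inline; a rough sketch (not the package's actual code):

    // The classic IEEE 754 rounding surprise:
    console.log(0.1 + 0.2);       // 0.30000000000000004
    console.log(0.3 - 0.1);       // 0.19999999999999998

    // A common inline workaround: round to a fixed number of decimal places.
    const minus = (a, b, dp = 10) => Number((a - b).toFixed(dp));
    console.log(minus(0.3, 0.1)); // 0.2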
The very worst JS packages I've seen have got to be is-odd and is-even. 430,796 and 202,268 downloads every week, I kid you not!
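Ignoring whatever input validation the real packages add, their entire useful surface is a couple of expressions you could keep in your own utils file - a quick sketch:

    // Inline replacements - no dependency needed.
    const isOdd = (n) => Math.abs(n % 2) === 1;
    const isEven = (n) => !isOdd(n);

    console.log(isOdd(3));   // true
    console.log(isEven(10)); // true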
A project I've been working on has roughly 40 dependencies. If you run `npm install` it'll pull about 1050 npm packages.
Change one minor version number and everything breaks. Forget one dependency and npm will not tell you that a dependency is missing; instead it'll complain that Steam has a broken link in the home directory (this has been a known open issue for years, and the only two solutions are to uninstall Steam or to use a Docker container).
Needless to say I am not a fan of large web projects.
I don't understand why you're getting downvoted. I'm a JS developer (and framework developer) and what you describe is a serious problem. It's great that there are so many problems already solved, but some stuff is just a one- or two-liner that should be in your own app's /lib, not a dependency.
This is one reason I like Deno's idea of having a standard library (I just wish Ryan had proposed that for Node directly instead of creating a brand new runtime).
The solution to all this dependency mess is for NPM to make a standard library. Yeah, that sounds crazy and weird, but they are in the best place to make a unified standard library for JavaScript. This would bypass all the junk transitive dependencies and have more libraries rely on a centralised but standard library.
Anyone who's run a CI platform for more than a few devs and NodeJS projects quickly bumps into inode problems unless they thought about build server filesystems in advance. Very quickly you end up with hundreds of thousands of minuscule files filling up the disk.
Right - every single dependency adds potentially another human maintainer who, if bribed or threatened, could release an update that exploits your project.
Kind of, yes... not all dependencies are direct for the app; a lot are just dev dependencies. Just to get eslint/prettier to warn, auto-format and clean up my code when I save a file, it takes 13 direct dev dependencies in my project [0].
I often hear “why would people do this to themselves” and I look around and wonder what they’re talking about given I’m perfectly happy using it daily.
I’d rather use node with all its ugly bits than Python or .NET or C++ with all their ugly bits. I’d rather use Rust over any of them but rarely is it the right tool for the job with what I do.
JS runs everywhere and is the most popular language. The package managers are maturing, as is the rest of the tooling. C# didn't have NuGet at first, Python didn't have pip, etc.; these tools evolve as part of the ecosystem over time. NuGet didn't reuse package references from a central cache until recently, and had pretty much the same issue as this. If you think this issue means JS isn't worth using, that's an interesting line to draw in the sand, but it doesn't mean people who use JS are somehow suffering because of this and that the ecosystem is insane.
1. Looks like it attracts a lot of talent these days
2. It's basically so well designed that its smooth learning curve has enabled generations of people to develop cs skills/get jobs/build stuff independently
3. While this and the previous points are not meant to say the JS universe is perfect, I have seen far worse stuff haunting the industry in the past: PHP and Java alone have produced thousands of so-called professionals who still have trouble distinguishing the DB from the backend from the frontend, and have no remote idea of what dependency management is whatsoever.
A different / additional thing you can do on a few systems is compress node_modules specifically: afsctool on macOS, chattr +c on btrfs, folder properties on Windows. You don't have to have compression enabled on the whole drive to use that.
Since node_modules is mostly text, this has amazing results and can be applied to the deduplicated pnpm store as well.
I am just amazed that this is still an issue for JS when it has basically been fixed for other languages. For such a widely used language with such a fanatical user base claiming "it's the best", I would expect this to no longer be an issue, nor all the security issues that come with it.
It's not really "fixed" for other languages in general. The support in Node for multiple different versions of transitive dependencies is actually quite nice. In Python, for example, you simply can't have multiple versions of transitive dependencies, and this can lead to issues with commonly-used utility packages. I've seen issues like this come up with utility libraries like six or boto and its variants. Likewise with larger libraries like numpy.
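For anyone unfamiliar with how Node pulls that off: npm simply nests a private copy of the conflicting version under the package that needs it. A hypothetical layout (package names other than lodash invented for illustration):

    node_modules/
    ├── lodash/                  <- 4.x, hoisted for everything that accepts it
    └── legacy-widget/
        └── node_modules/
            └── lodash/          <- 3.x, private copy just for legacy-widget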
As someone who's worked pretty heavily in both ecosystems – it's definitely not something I think about every day on the Python side, but Python dependency conflicts are very annoying... while in Node they're mostly not a big deal except in a small set of cases where peer dependencies show up.
I agree it's not entirely fixed in any language yet, but as you mention, Python not only handles it reasonably well, it also has virtual envs plus virtual env managers (pipenv, for example).
In Java it's basically a non-existent problem. You CAN have dependency conflicts, yes; nonetheless dependency management is simple, and you keep everything in a local central repo when using Maven, which also provides a very nice dependency tree plus tools for filtering it - which you can of course also combine with grep for even easier dependency conflict debugging.
Also, using features such as dependencyManagement in Maven allows you to replace all usages of a library across your entire application "at your own risk", which simplifies addressing security vulnerabilities.
Until you have to figure out that the reason something doesn’t work is that dependency v1 is storing the data that dependency v2 is trying to use, and it complains about missing data that you are sure is there.
I very much enjoy having those issues up front, instead of at runtime.
Thanks, I wouldn’t have thought of sharing npm packages like that.
I would argue building different containers from a shared node_modules is inherently dangerous anyway. Sounds like your "workaround" is in fact pretty much the optimal setup for quickly performing multiple similar builds.
You can also use Yarn Berry (version 2 and onward has this codename). It uses a Plug'n'Play install strategy instead of node_modules, but you can also use it with a pnpm resolver if PnP breaks stuff, which it sometimes does since many libraries assume node_modules exists.
Fwiw, it looks like you can use pnpm as nodeLinker with yarn berry. I’m a fan of yarn’s plugin support and the extensibility that provides, so I’ll likely be trying that with pnpm as the linker and see how that goes.
Why, exactly? I know Windows wasn't built to withstand hardlink-based attacks so it puts them behind admin permissions by default, but this seems like an excellent use case for hard links to me.
Hard links are not too bad of an idea as long as they are done on files, though they do have some downsides. There are some very good reasons why hard links on directories are not easy to enable: anything scanning recursively would end up in an infinite loop unless it keeps track of (disk, inode) tuples for parent directories, especially if it follows symlinks. IIRC they are explicitly disallowed in Linux; you would need to modify the kernel source to allow them.
I have no clue why they say they are hard linking directories; I sure hope they are only doing it at the file level.
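For concreteness, the (disk, inode) bookkeeping looks roughly like this in Node - a minimal sketch, not production code:

    // Walk a tree while following symlinks, tracking (dev, ino) pairs so that
    // link cycles don't turn into infinite recursion.
    const fs = require('fs');
    const path = require('path');

    function walk(dir, seen = new Set()) {
      const st = fs.statSync(dir);           // statSync() follows symlinks
      const key = `${st.dev}:${st.ino}`;
      if (seen.has(key)) return;             // already visited: break the cycle
      seen.add(key);

      for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
        const full = path.join(dir, entry.name);
        try {
          if (fs.statSync(full).isDirectory()) {
            walk(full, seen);
          } else {
            console.log(full);
          }
        } catch {
          // dangling symlink, permission error, etc. - skip it
        }
      }
    }

    walk(process.cwd());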
I've caused infinite scanning bugs to appear by accident with just soft links, though. I know soft links and hard links are processed differently at the lowest levels of the I/O stack, but I think there are few languages and runtimes where that's actually the default.
In most programming languages I've used on Linux, symlink expansion is on by default, creating all the problems you can think of.
Yes, you can use whatever API call readlink relies on to prevent loops, but you can also keep a set of inode numbers on a file system and stop processing on duplicates. In both cases you need to do all kinds of workarounds.
The best argument I've heard is that hard linked directories break the acyclic graph property of the file system but even there I'm not so sure if that's really a problem in an environment where most tools recurse into soft links anyway.
I suppose the C folks that like to do all the hard things themselves would get annoyed by having to add another check?
I don't think the kernel forbids directory hard links per se; NTFS has junction points which are somewhere between hard links and soft links for directories, and NTFS support made it into Linux. I don't know how the kernel exposes directory junctions in Linux, but they're different from soft links (in that they'll be processed in the server for SMB file servers, as opposed to symlinks which are resolved on the client).
Sadly my team had to go back to npm. pnpm had issues resolving dependencies. Also, the build box didn't have it. There were enough quirks to make it hard to write scripts that ran smoothly between npm and pnpm.