Thing is, node_modules isn't just third-party modules.
It's just what Node's standard path resolution uses, which is also how Local Modules (the "/src/node_modules" structure) let you import other files with clean paths, without adding a config to every tool in the toolchain, post-install symlinks, or any other non-cross-platform tricks. It just works because it's literally what Node uses to resolve paths, and all build tools are based on Node, so when you stick to the standard resolution you can add new tools to the toolchain without needing a bunch of configs for them to find your files. For example, it's now also the default resolution in TypeScript.
The only time /src/node_modules doesn't work is when a tool goes out of its way to break it and wrongly assumes that node_modules can only ever contain external third-party code (e.g. Jest).
So best of luck getting Node + NPM + Yarn to agree on a new path resolution syntax, but I hope we won't end up with yet another tool-specific resolution that only works in Yarn.
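To make the pattern concrete (the file names here are just illustrative): with a helper at src/node_modules/utils/format.js, any file under src/ can use a bare specifier, and Node, webpack, and TypeScript (with "moduleResolution": "node") all resolve it without extra configuration:
----
// src/main.js - no aliases, no baseUrl, no bundler config needed
const { formatDate } = require('utils/format');
console.log(formatDate(new Date()));
----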
This doesn't break that; it specifically says it will fall back to Node's module resolution algorithm when the module you're looking for isn't in the static resolutions table. That means you can keep using that technique as you have been.
As an aside, you can also use lerna[1], yarn workspaces[2], or pnpm workspaces[3] to achieve the same effect, depending on your package manager of choice. You might get additional boosts to code organization/productivity; it's explained in the links.
> when the module you're looking isn't in the static resolutions table
The fallback will kick in when the package that makes the request isn't in the static resolution table. Since those packages aren't part of the dependency tree we assume their dependencies have been installed independently, hence the fallback.
That said, I think the use case described by the parent post is neatly solved by the `link:` protocol, which basically 'aliases' a module to a specific name.
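For reference, the alias lives straight in the package.json; the dependency name and path below are just placeholders:
----
{
  "dependencies": {
    "my-utils": "link:./src/my-utils"
  }
}
----
Yarn then points `my-utils` at that local folder instead of fetching anything from the registry.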
I think that approach is really good. If a package doesn't fit into the static resolution table, it could cause a lot of problems to point its dependencies in that direction, especially since Node has no issue letting you override local modules and reuse the same names deeper in the tree.
On that note though, it seems at first glance like there'd be no way for Yarn to tell if a require is overriding another, since (maybe I'm reading the paper wrong) the require speed improvements come from skipping the native module resolution.
Let's say I have a package and my source is set up with the following files:
----
/-->package.json
/-->src--->node_modules--->leftpad.js
/-->src--->main.js
----
And let's say I then install leftpad through the proposed system.
If I call ``require('leftpad');`` from main.js, what will get returned? Will Yarn know to use the local version of leftpad instead of the installed one?
If I'm understanding you correctly, this would break, but I'd use the Yarn-specific "link" system to fix it?
I don't maintain any monorepos, I've always used Git sub-modules; not just for Node.js, but for all sorts of stuff.
However, I'm increasingly finding that submodules are a bit of a pain. If I patch a submodule, I have to go into every single project that depends on it and pull the latest version. Additionally, if the submodule is large it's a real waste of storage on my computer.
That said, monorepos have annoyances of their own, e.g. many modern package managers will happily check out dependencies directly from Git. However, that's simply not going to work unless there's some standardisation of monorepo structure that the package manager is able to interpret.
Here's a good article: https://danluu.com/monorepo/ - it cites simplified organization, simplified dependencies, superior tooling, and easier cross-project changes.
https://blog.ffwll.ch/2017/08/github-why-cant-host-the-kerne... is another good one. You can build issue/PR tools treating a monotree as containing multiple repositories; that's how Google and Facebook do their work (including checked-in OWNERS/MAINTAINERS files).
Monorepos are excellent if you have a group of related projects, especially if they depend on each other. Instead of opening a series of PRs for each project in a specific dependent order, you can open a single PR and move everything forward in lock-step, ensuring that every commit / tag has all projects in the monorepo working together.
The biggest downsides are really around tooling: we've had tons of issues with our continuous integration environment (Travis CI) and build times trying to build only specific sub-projects using Lerna. The Github Pull Reviews and Issues pages can get pretty messy too, though we have a process to automatically add Github tags based on changes via Lerna.
I suspect this is why bigger companies seem to be more successful with monorepos than smaller ones, because they have the resources to invest in building their own tooling.
Excellent question! I use both regularly, so am qualified to answer:
- monorepo. Versioning over entire repo. node modules are used for granularity, separation of concerns, etc.
- git submodules. Versioning over individual node modules. node modules are used for sharing code.
Nothing stopping you from using both.
Also, silly semi-technical point: I have never, ever, had a bad experience using git submodules, but many others have. I like submodules, but often work in teams where otherwise technically good people really hate them due to bad prior experiences.
I'm the author behind the proposal, feel free to ask me any question you might have!
As a personal note, I'm super excited and so grateful to have had the chance to work on this project - node_modules has been a longstanding thorn in the side of the JavaScript ecosystem, and to finally have a chance to try something completely new is amazing.
This is awesome. I didn't read the full proposal yet, but at first glance it's very close to how we do things where I work (because we needed something long before Node even existed, and never replaced it).
The biggest hurdle we constantly face is tools that make assumptions about the environment (e.g. Flow, TypeScript, WebStorm, etc.), so changing anything about module resolution breaks everything. We have to spend a lot of time hacking it in every single time. Sometimes the hooks just aren't there (TypeScript has a hook for overriding module resolution, but editors often don't give you an easy way to hook into TypeScript's own hooks).
Any thoughts on how things would work for tools if this was to go forward? Would they all need to be updated to honor this new way to resolve modules?
Flow is already 100% compatible with PnP, through the `module.resolver` configuration setting. We even saw some slight perf increases after switching.
I guess the same could be done for other tools: the .pnp.js file can easily be interfaced with by anything you throw at it, without those tools having to care about the way the dependencies are resolved under the hood. Even non-JS tools can simply communicate with it through the CLI interface, which returns JSON data ready to use.
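As a sketch of what that could look like from a tool's side (the resolveRequest(request, issuer) entry point follows the RFC's high-level description, so treat the exact API surface as an assumption):
----
// tool-resolver.js - a minimal sketch, not the definitive API
const path = require('path');

// Load the resolution tables generated by the install
const pnp = require(path.resolve(process.cwd(), '.pnp.js'));

// "Where does lodash live when it's required from src/main.js?"
const resolved = pnp.resolveRequest('lodash', path.resolve('src/main.js'));
console.log(resolved); // an absolute path inside Yarn's cache
----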
This is a bit of a tangent, but do you know of good tutorials, blog posts, videos, or books which help build a mental framework of how javascript packaging works internally?
My goal is to be better at debugging development environment issues.
To be more specific, I am asking this from the perspective of a user of the Node ecosystem who has repeatedly had the experience of running into problems with my development environment or packages and not really knowing what to do to get myself un-stuck. I can totally search Google/Stack Overflow/GitHub for an answer, and sometimes there is a clear one. Sometimes I find a command to run, sometimes I don't. In either case, I come away with a strong feeling that I've not actually learned anything. If the same thing happened again, I might remember what to google for next time, but I wouldn't be able to reason clearly about what I'm working with. In the past when I've run into these issues, I've just thrown time at them until something sticks. That kinda sucks.
I'd like to find a way to put in some work and come away with a solid understanding. When you were starting this project, how did you go about building up your knowledge of the underlying abstractions that npm/yarn/webpack/etc. use?
When you see a misconfiguration or get a bug report with this project, how do you go about investigating it?
Have you ever seen a really good explanation of a bug/misconfiguration which helps the reader build a solid mental model?
I quickly read the paper and I wonder if there's anything in place to deal with packages that generate side effects via postinstall. Can we at least manually create a list of them in package.json or something so that Yarn copies them into node_modules like before?
Postinstall scripts are the main issue, yep. Right now the current implementation doesn't do anything special with them, meaning that they are installed inside the cache (except when they're disabled altogether, which often works well enough since there aren't that many packages that require postinstall scripts).
This is obviously wrong, so we'll soon move to a model where we "unplug" those packages and put them into a specific directory (a bit like the node_modules folder, but entirely flat). The feature itself is already there (`yarn unplug` in the RFC), but we need to wire it into the install process.
Ideally, I think it would be beneficial for the ecosystem to move away from postinstall scripts as much as possible - WebAssembly has become a very good solution for native libraries, and has the added benefit that it works everywhere, including the web.
Someone working on wasm will likely be able to tell you more, but from what I understood dynamic linking is on the table[1]. I didn't say it was ready yet, just that it's on the right path.
This seems to be solving a problem quite similar to the one of getting browsers to resolve non-relative imports (without file system access, they also need a precise map from module names to paths). Unfortunately, I can't remember who or what was working on that and I'm not sure how much progress it is making, but it'd be wonderful if this could somehow be framed in a way where it also helps that issue along. Are you taking this into consideration in the design?
I think you're referring to the package-name-maps[1] proposal. We were made aware of it midway through development, and while we chose to continue with the static data tables approach, the two aren't incompatible.
The reason we abstracted the static tables behind a JS API is precisely to make it easier for everyone to experiment with various implementations - some might use the internally embedded data as we do, some could consume a package-name-maps file, and some could do something entirely different. As long as the contract described in the document is fulfilled, everything is possible.
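Conceptually, both come down to a static lookup consulted before touching the filesystem, mapping bare specifiers to locations - something along these lines (the shape is purely illustrative, not the actual package-name-maps syntax):
----
{
  "lodash": "./node_modules/lodash/lodash.js",
  "left-pad": "./node_modules/left-pad/index.js"
}
----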
Hey, I just saw a tweet about this, it looks super exciting. A question I have is how does this work with yarn workspaces? I have a monorepo setup and would love to try this, but I use both yarn workspaces and lerna to run commands across packages. Will nohoist be respected?
Workspaces work just fine: instead of creating the symlinks, we simply register them in the static tables and resolve them from their actual location on disk.
Nohoist was a bit of a hack from the beginning (precisely because the hoisting was anything but guaranteed), so some incompatibilities might happen for packages that can't work without it.
That said, I'm not too worried: we've tried a bunch of the most popular open-source packages, and they all seemed to work fine - the one issue was an interaction between create-react-app and eslint, but there are discussions going on to fix that in a clean way.
As a data point, my company is using yarn workspaces where one of the projects is an Electron app, and another has a Node server with native dependencies.
Our nohoist section currently looks like this in order to get things to behave:
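(The real entries are specific to our setup; the placeholders below just show the shape - glob patterns under the `workspaces.nohoist` key of the root package.json.)
----
{
  "private": true,
  "workspaces": {
    "packages": ["packages/*"],
    "nohoist": [
      "**/some-native-dep",
      "**/some-native-dep/**",
      "**/some-postinstall-dep",
      "**/some-postinstall-dep/**"
    ]
  }
}
----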
These are likely going to require auto-ejection due to being native and having postinstall scripts.
Would it make sense to not bother with hoisting of ejected modules when using YPnP, given that YPnP solves the same problems as hoisting in a more global way? We'd be able to get rid of the nohoist section in that case.
The reason I use nohoist is that my monorepo has a react native package and a web package inside it. I found that if I hoist the react native stuff it doesn't resolve properly with the metro bundler, so I have to nohoist every react native package.
Another question—does this also resolve from directory paths to index.js, add missing extensions, and so on, or does it only map package names to directories? As in, does this get rid of all 'superfluous' file system access during resolution?
It works in two steps: the first step resolves to "unqualified paths" (which are just the paths within the cache, without the index.js/extension resolution). This step is entirely static; no filesystem is involved.
The second step converts the "unqualified paths" into "qualified paths", and is basically the index.js/extension resolution. We currently access the filesystem in order to resolve them (just like Node), because we didn't want to store the whole list of files within our static tables (fearing that it would become too large and would slow down the startup time because of the parsing cost).
So to answer your question: we get rid of the node_modules folder-by-folder traversal, but decided that the extension check was an acceptable tradeoff. We might improve that a bit by storing the "main" entries within the tables, though, which would be a nice fast path for most package requires.
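As an illustration of that split, here's a minimal sketch with made-up table and function names (not the actual implementation): step 1 is a pure lookup, step 2 is the only part that touches the disk.
----
const fs = require('fs');
const path = require('path');

// `table` maps "<issuer package>|<dependency name>" -> cache directory.
function resolveToUnqualified(table, issuer, request) {
  const [name, ...rest] = request.split('/');
  const base = table.get(`${issuer}|${name}`);
  if (!base) return null; // unknown issuer -> fall back to Node's algorithm
  return path.join(base, ...rest); // may still lack index.js / an extension
}

// Apply the usual index.js / extension rules to get a concrete file.
function resolveUnqualified(unqualified) {
  const candidates = [
    unqualified,
    `${unqualified}.js`,
    path.join(unqualified, 'index.js'),
  ];
  for (const candidate of candidates) {
    if (fs.existsSync(candidate) && fs.statSync(candidate).isFile()) {
      return candidate;
    }
  }
  throw new Error(`Could not qualify ${unqualified}`);
}
----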
We'll see what the community response is, and once we're confident this is what everyone wants we'll merge it into master (PR is already up[1]).
While there isn't an official build at the moment, the code PR includes a prebuilt version of the current branch, and we have a playground repository to experiment with it.
Removing `node_modules` would be fantastic, but not at the expense of native modules:
From the PDF:
"On the long term we believe that post-install scripts are a problem by
themselves (they still need an install step, and can lead to security issues),
and we would suggest relying less on this feature in the future, possibly
even deprecating it in the long term. While native modules have their
usefulness, WebAssembly is becoming a more and more serious candidate
for a portable bytecode as the months pass."
WebAssembly is not "more and more a serious candidate" to replace native modules.
The issue with post-install scripts needs a better long-term solution, but simply deprecating native modules is not it.
I mentioned it in another answer, but we'll eject the packages that require postinstall scripts - so they will work as before, but will take extra time to be installed.
As for wasm, I'm curious to hear what you think isn't good enough. I think the two main issues are garbage collection and dynamic linking, and there's ongoing work on them to fix them.
Wasm is designed to run inside the VM sandbox, which is fantastic if you're writing a server, since it has better security properties than actual native code.
It's not fantastic if you're working on an Electron app where there might be one specific feature that requires a native module in order to hit an OS API that you can't do from a browser sandbox (and that Electron itself doesn't expose).
Rather than pursue total deprecation/eventual removal of postinstall/native modules, I think a package.json "allowNativeDependencies": "never" | "devOnly" | "always" option that defaults to "devOnly" if not specified is the way to go.
I would say make it a boolean flag, defaulting to false, with a comment that says "this must be enabled for packages which use the local filesystem or OS APIs". If wasm delivers on its potential, OS-level interactions would really be the only valid use case (and legacy code, which should be considered a little bit inherently unsafe anyway).
Perhaps yarn could even develop official packages that include native code wrapping the most relevant OS APIs, and the default setting of the flag would allow only those native packages.
"I mentioned it in another answer, but we'll eject the packages that require postinstall scripts"
Sure, but you also mentioned in the PDF that the other solution was just to deprecate native modules in the long term, and that's not acceptable.
"As a data point, we encountered no problem at Facebook with adding --ignore-scripts to all of our Yarn invocations. The two main projects we’re aware of that still use postinstall scripts are fsevents (which is optional, and whose absence didn’t cause us any harm), and node-sass (which is currently working on a WebAssembly port)."
I think you're too quick to sacrifice native modules because you're not really using them.
Isn't native interoperability the entire point of Node.js though?
Filesystem access etc. has to, at some point, cross the barrier from VM to native code. Sure Node.js has file-system access (and other stuff) built in, but willingly moving in a direction that's non-extensible seems odd.
I know you've since said you'll just pull native modules out of the cache, and I'm perfectly happy with that. However, please don't overlook the importance of native interoperability because the code you tend to write abstracts this away from you. Native interoperability is what makes Node.js.
Native code has two benefits over JavaScript: it's faster, and it can call the OS. WASM is faster, but it can't call the OS. It cannot replace all uses of native modules.
Are you describing something you wish that node had, or something that actually exists? As far as I'm aware, native modules are the FFI for node, and that every package that involves presenting an FFI to the developer is using a native module to do it.
Yeah, that's a native module. If you look at the C source code in "src", the "NAN_METHOD" and stuff are Node native module macros. node-ffi will not work if they remove native modules.
I don't think WASM achieves native performance yet. You need to copy data in and out of it instead of zero-copy. WASM can't link to native system libraries. WASM can't access most hardware.
It's very neat, but I can't see it fully replacing native modules soon.
Almost all the native modules I come across are about accessing system calls not available from within the sandbox. Only a few are there for speed, and since native calls add quite some overhead, they are only useful in specific use cases.
I didn't dive in deeply enough, but I'd assume they could just fall back to classic node_modules behavior for the native packages, right? You wouldn't actually delete the node_modules directory then, but you'd still get most of the speed benefits.
npm announced crux yesterday, which seems focused on the same problems. The approach seems slightly different, although there aren't as many technical details available for crux (unless someone has a better link):
This is effectively how pub, the package manager for Dart, works. At version resolution time, we generate a manifest file that contains the local paths to every depended-upon package. Then, at compile/load time, tools just use that file to locate packages.
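Concretely, the generated manifest (the `.packages` file) is just a name-to-location listing along these lines; the entries below are illustrative:
----
# .packages - generated by pub; maps each package name to its lib/ directory
args:file:///home/user/.pub-cache/hosted/pub.dartlang.org/args-1.5.0/lib/
http:file:///home/user/.pub-cache/hosted/pub.dartlang.org/http-0.12.0/lib/
my_app:lib/
----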
Very cool, does this mean you can check in the dependency file to version control? Just like yarn.lock?
As much as I hate node_modules, there are times when I want to see how a library is implemented. Is there a way to keep some libraries in node_modules? Say, only the ones listed in the package.json file.
Second this. I'd ask that any dependencies in node_modules/ override whatever is cached.
For a work project, I'm updating a dependency written against an undocumented specification. Much fun is being had doing this, I assure you. Being able to crack open the sources in node_modules/, instrument things with logging, and fix fix fix until I get something working is very helpful. I'm sure I can track things down in a cache directory and do the same, but it's nice knowing that my changes are limited to just this particular project, and that a checkout/reinstall of the module in another project will get the original broken sources for comparison.
The .pnp.js file can be checked in, but it needs the cache to work. Right now I'd advise against checking it in.
`yarn unplug` will eject a dependency from the cache and put it into the project folder. That's a quick and easy way to inspect and debug libraries you use.
It doesn't have a lot of advantages right now. That said, it won't stay true forever. PnP is but the first step of a long-term goal I have. Check the Section 6.B from the whitepaper[1] for more details.
I've been trying to clean up my excess node modules and found that when building a web-socket app, you don't need express (native http works fine) and you don't even need socket.io (unless your project needs to work on IE 9 or less) It's hard to find examples so I wrote one: https://github.com/chrisbroski/bare-socket
Bringing packages into your work is your own choice. You should always be doing an evaluation of the benefit given by a package vs. the bloat it brings to your project. For many packages, it is worth it. But if you are in the habit of installing a package just to avoid writing a widget in your UI or writing some helper functions, then... yes, your projects will amass dependencies quickly.
My main complaint is probably about webpack and its dependencies. Unfortunately, it seems to be the accepted standard bundler. The benefits of hot module replacement are too nice to forego easily.
Not sure that webpack still depends on `is-odd` (at least not directly). Note that `is-odd` depends on `is-number` facepalm. Some people in this world seem to think their purpose is to create as many JS modules as possible.
Put your content inside of a nested node_modules folder and you won't have this problem anymore.
I don't think node_modules is perfect, and I get why it gets hate, but IMHO the algorithm is actually kinda nice for nesting packages.
If you set up your folders like so:
----
src--->node_modules--->utils--->helper.js
src--->main.js
----
You can require your helper in main.js with a simple ``require('utils/helper.js');``
What's nice is that you can't do the reverse. So your helper doesn't get the same access to your main.js. I use this a lot for testing - it means that my tests can require my components using the same syntax that I use in normal code, but those components don't have special access to the tests.
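To make the testing point concrete (the layout is illustrative):
----
src--->node_modules--->utils--->helper.js
src--->__tests__--->helper.test.js
src--->main.js
----
helper.test.js can do ``require('utils/helper.js');`` because Node walks up from src/__tests__/ and finds src/node_modules/, but helper.js can't reach main.js or the tests through a bare specifier - only through an explicit relative path.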
A big "aha" moment for me with node_modules was figuring out that it's an entirely separate system from npm. It's not trying to build a flat package structure; it's trying to make it easy to on-the-fly set up packages from anywhere.
I've also gotten into the habit of checking my installed packages into source for projects that I don't expect users to rebuild (ie. websites, games, etc...). That's a much longer conversation, but it mitigates a large number of issues with node_modules on long-lived projects.
> A big "aha" moment for me with node_modules was figuring out that it's an entirely separate system from npm.
I wish this was true. My workflow from back in the early days of node has always been to `npm install` external dependencies, then `npm link` (symlink) the dependencies I'm working on. But npm >= v5 removes such symlinks if I `npm install` anything afterwards. I spend a significant amount of my time re-linking what npm install unlinked.
Usually when I hit a problem like this it's because things have moved on and I'm doing something wrong. But when an npm developer says "consider npm as it is to be broken" and closes the issue [1], I'm not so sure.
To repeat: a big "aha" moment for me with node_modules was figuring out that it's an entirely separate system[0] from npm.
Node's module resolution has nothing to do with npm. Npm is a package repository and installer built on top of node_modules. The reason your system broke is because npm cleared out your node_modules folder as part of its install. And from the sound of things on your linked issue, the devs are entirely aware of the problems this behavior causes and are planning to fix it.
An interesting exercise that I highly encourage people to do if they're finding this weird is to take a weekend and build their own version of npm just to demystify what's going on with it. It's not that hard to do, Node gives you all the tools you need - in its simplest form you need to curl whatever packages you want to install, and stick them in a node_modules folder. Then you need some way to track which packages you've downloaded. Node handles all the rest.
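As a rough sketch of that weekend project (the registry URL pattern is the public one; everything else - error handling, dependency trees, lockfiles - is deliberately left out):
----
// mini-install.js - fetch one package tarball from the npm registry and
// unpack it into node_modules; Node's resolver handles the rest.
const { execSync } = require('child_process');
const fs = require('fs');

function installOne(name, version) {
  const dir = `node_modules/${name}`;
  fs.mkdirSync(dir, { recursive: true });
  // Registry tarballs live at a predictable URL for unscoped packages.
  const url = `https://registry.npmjs.org/${name}/-/${name}-${version}.tgz`;
  // Tarballs wrap their contents in a top-level "package/" folder.
  execSync(`curl -sL ${url} | tar xz --strip-components=1 -C ${dir}`);
}

installOne('left-pad', '1.3.0');
// From here, require('left-pad') resolves through the normal algorithm.
----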
That npm occasionally breaks Node behavior is bad - I've been on the wrong end of those regressions as well[1] and it was super frustrating. But that doesn't really have anything to do with Node, it just means the npm team needs to test their releases more.
While it would be great to have a standard way to do absolute imports, I almost like how it forces people into more sensible project structures and into splitting large projects into multiple packages. The "smell" of the ../../../ encourages better practices, so it's not all bad IMO.
I just saw two comments comparing Yarn's RFC to Maven - yours and one over on Lobsters.
Having never used Maven, any chance you could point to a document explaining how Maven's cache approach works, and maybe expand on the similarities between that and Yarn's RFC?
Nice work – what's being done to make this the default behavior in node and deprecate node_modules proper, or is this forever destined to live in awkward userland-but-not-really territory?
People have already mentioned native modules. Install scripts are a nuisance, but exist for reasons. If you remove support for them – provided this project takes off, which I suppose it will because bandwagons – you risk people performing install-type tasks on first module load, potentially creating an even worse situation. Has this been considered as a risk?
> what's being done to make this the default behavior in node and deprecate node_modules proper
My hope is that PnP proves that this approach can work and raises the stakes for Node to implement APIs that would allow for a better integration. It's a long term plan and we have yet to prove ourselves on the long run, but I'm quite confident :)
> If you remove support for them
I don't think we'll ever remove support for them. I'll personally advocate for them not to be used because of their subpar developer experience, but apart from that it won't cost us much to support them.
From what I understand: instead of cp -rf from offline cache, or ln -sf (which doesn't work for a large percentage of packages due to node being node), they propose to use a custom resolver to tap into the cache directly.
This will also break various packages due to fs.readFile* dependencies, gyp, and other things. If your dependency tree is 4k+ node modules, the "vanilla" yarn or npm resolution and reconciliation techniques are already so brittle that changing them will undoubtedly break things.
Currently I use npm for Angular projects. Before writing even hello world, I have to perform npm install on the project and install so many modules to serve up a small web app. This seems like we're going in the right direction; can't wait to try yarn as a package manager.
This is soon to be solvable by mounting the cache directory during the build, thankfully: https://github.com/moby/buildkit/pull/442 ("dockerfile: add run --mount support #442 ")
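For reference, the Dockerfile side would then look roughly like this (the syntax directive and cache path depend on your BuildKit version and base image, so treat them as placeholders):
----
# syntax = docker/dockerfile:experimental
FROM node:10

WORKDIR /app
COPY package.json yarn.lock ./

# Persist the Yarn cache across builds so repeat installs skip the network.
RUN --mount=type=cache,target=/usr/local/share/.cache/yarn \
    yarn install --frozen-lockfile

COPY . .
----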
That is not something I'd want to do. Docker's solution is good enough for almost every build that happens. I can eat the time cost not covered by that solution and not have to worry about bugs stemming from improper copying of dep snapshots.
The problem in JavaScript seems to be more an ideological one, where programmers instantly reach for a package that has multiple dependencies, each of which has even more dependencies.
There's this culture of not caring about bloat, it seems, in the vast majority of JavaScript projects. left-pad comes to mind as the poster boy for this stuff.
Yes and no. It's more a question of degrees and developer culture. I think JS just has a stronger "glue stuff together" mentality combined with the lack of a thorough standard API.
In my experience, C# libraries tend to be more averse by default to taking on extra dependencies, but that's in part because .NET already does so much work for you. Python is a bit less averse, but certainly not to the level of JS where you can easily end up with hundreds of nodes in the dependency graph. But then Python isn't used much for client UI code.
Part of the problem comes from using a popular package. That package could be importing hundreds of other things.
TypeScript is one of the very few node modules that is largely self-contained. Install babel, webpack, and eslint and that's easily over 1k packages.
So yes, the JS ecosystem is a nightmare for security folks, since any one of those thousands of packages could access the filesystem and network and create backdoors.
Our express site got hacked because one of the sub dependencies was compromised.
Seriously, stay away from using Node.js with glued-together packages to serve production traffic for serious projects. If you want to do it, use extremely thin, well-vetted packages and be very mindful of upgrades.