It's just what the Node path resolution standard uses, which is also how Local Modules (the "/src/node_modules" structure) allows you to import other files with clean paths, without having to add a config in every tool of the toolchain, post-install symlinks, or any other non-crossplatform tricks. It just works because it's literally what Node uses to resolve paths, and all build tools are based on Node, so when you stick to the standard resolution, you can add new tools to the toolchain without needing a bunch of configs for them to find your files. For example, it's now also the default resolution in Typescript as well.
The only time /src/node_modules doesn't work is when tool goes out of its way to break it, and wrongly assumes that node_modules can only ever be used for external thirdparty code (e.g. Jest).
So best of luck to make Node + NPM + Yarn to agree on a new path resolution syntax, but I hope we won't end up with another tool-specific resolution that only works in Yarn.
As an aside, you can also use lerna, yarn workspaces or pnpm workspaces to achieve the same effect, depending on your package manager of choice. You might get additional boosts to code organization/productivity, it's explained in the links.
The fallback will kick in when the package that makes the request isn't in the static resolution table. Since those packages aren't part of the dependency tree we assume their dependencies have been installed independently, hence the fallback.
That said, I think the use case described by the parent post is neatly solved by the `link:` protocol, which basically 'aliases' a module to a specific name.
On that note though, it seems at first glance like there'd be no way for Yarn to tell if a require is overriding another, since (maybe I'm reading the paper wrong) the require speed improvements come from skipping the native module resolution.
Let's say I have a package and my source is set up with the following files:
And let's say I then install leftpad through the proposed system.
If I call ``require('leftpad');`` from main.js, what will get returned? Will Yarn know to use the local version of leftpad instead of the installed one?
If I'm understanding you correctly, this would break, but I'd use the Yarn-specific "link" system to fix it?
However, I'm increasingly finding that sub-modules are a bit of a pain. If I patch a sub-module, I have to move into every single project that depends on the sub-module and pull the latest version. Additionally, if the sub-module is large it's a real waste of storage on my computer.
That said, monorepos have annoyances of their own e.g. many modern package managers will happily check-out dependencies directly from Git. However, that's simply not going to work unless there's some standardisation in monorepo structure that the package manager is able to interpret.
The biggest downsides are really around tooling: we've had tons of issues with our continuous integration environment (Travis CI) and build times trying to build only specific sub-projects using Lerna. The Github Pull Reviews and Issues pages can get pretty messy too, though we have a process to automatically add Github tags based on changes via Lerna.
I suspect this is why bigger companies seem to be more successful with monorepos than smaller ones, because they have the resources to invest in building their own tooling.
https://blog.ffwll.ch/2017/08/github-why-cant-host-the-kerne... is another good one. You can build issue/PR tools treating a monotree as containing multiple repositories; that's how Google and Facebook do their work (including checked-in OWNERS/MAINTAINERS files).
- monorepo. Versioning over entire repo. node modules are used for granularity, separation of concerns, etc.
- git submodules. Versioning over individual node modules. node modules are used for sharing code.
Nothing stopping you from using both.
Also, silly semi-technical point: I have never, ever, had a bad experience using git submodules, but many others have. I like submodules, but often work in teams where otherwise technically good people really hate them due to bad prior experiences.
I use the 'monorepo' pattern.
- /packages is my own modules - it's entirely full of interesting project-relevant things.
- /node_modules is third party modules installed by yarn, and symlinks added to your own modules by yarn. I leave it collapsed.
It works well.
The biggest hurdle we constantly face is tools that make assumptions about the environment (eg: Flow, TypeScript, Webstorm, etc), so changing anything to module resolution breaks everything. We have to spend a lot of time hacking it in every single time. Sometimes the hooks just aren't there (TypeScript has a hook for overriding module resolution, but editors often don't give you an easy way to hook into TypeScript's own hooks).
Any thoughts on how things would work for tools if this was to go forward? Would they all need to be updated to honor this new way to resolve modules?
I guess the same could be done for other tools: the .pnp.js file can be easily interfaced with anything you throw at it without them having to care about the way the dependencies are resolved under the hood. Even non-js tools can simply communicate with it through the cli interface, which returns JSON data ready to use.
My goal is to be better at debugging development environment issues.
To be more specific, I am asking this from the perspective of a user of the node ecosystem who has repeatedly had the experience of running into problems with my development environment or packages and not really knowing what to do to get myself un-stuck. I can totally search google/stack overflow/github for an answer and sometimes there is a clear one. Sometimes I fond a command to run, sometimes I don't. In either case, I come away with a strong feeling that I've not actually learned anything. If the same thing happened again, I might remember what to google for next time, but I wouldn't be able to reason clearly about what I'm working with. In the past when I run into these issues, I've just thrown time at them until something sticks. That kinda sucks.
I'd like to find a way to put in some work and come away with a solid understanding. When you were starting this project, how did you go about building up your knowledge of the underlying abstractions that npm/yarn/webpack/etc. use?
When you see a misconfiguration or get a bug report with this project, how do you go about investigating it?
Have you ever seen a really good explanation of a bug/misconfiguration which helps the reader build a solid mental model?
This is obviously wrong, so we'll soon go to a model where we "unplug" the packages and put them into a specific directory (a bit like the node_modules folder, but entirely flat). The feature itself is already there (`yarn unplug` in the rfc), but we need to wire it to the install process.
Ideally, I think it would be beneficial for the ecosystem to move away as much as possible from postinstall scripts - WebAssembly became a very good solution for native libraries, and as the added benefit that it works everywhere, including the web.
Native modules exist because people sometimes need to drop down to work with the platform directly.
The reason we abstracted the static tables behind a JS api is precisely to make it easier for everyone to experiment with various implementations - some might use the internally embedded data as we do, some could consume a package-name-maps file, and some could do something entirely different. As long as the contract described in the document is fulfilled, everything is possible.
Nohoist was a bit of a hack from the beginning (precisely because the hoisting was anything but guaranteed), so some incompatibility might happen for packages that cannot work without.
That said, I'm not too worried: we've tried a bunch of the most popular open-source packages, and they all seemed to work fine - the one issue was on an interaction between create-react-app and eslint, but there's discussions going on to fix that in a clean way.
Our nohoist section currently looks like this in order to get things to behave:
Would it make sense to not bother with hoisting of ejected modules when using YPnP, given that YPnP solves the same problems as hoisting in a more global way? We'd be able to get rid of the nohoist section in that case.
The second step converts the "unqualified paths" into "qualified paths", and is basically the index.js/extensions resolution. We currently access the filesystem in order to resolve them (just like Node), because we didn't want to store the whole list of files within our static tables (fearing that it would become too large and would slow down the startup time because of the parsing cost).
So to answer your question: we get rid of the node_modules folder-by-folder traversal, but decided that the extension check was an acceptable tradeoff. We might improve that a bit by storing the "main" entries within the tables, though, which would be a nice fast path for most package requires.
While there isn't an official build at the moment, the code PR includes a prebuilt version of the current branch, and we have a playground repository to experiment with it.
They'll be published as separate packages once we merge the PR.
From the PDF:
"On the long term we believe that post-install scripts are a problem by
themselves (they still need an install step, and can lead to security issues),
and we would suggest relying less on this feature in the future, possibly
even deprecating it in the long term. While native modules have their
usefulness, WebAssembly is becoming a more and more serious candidate
for a portable bytecode as the months pass."
WebAssembly is not "more and more a serious candidate" to replace native modules.
The issue with post-install scripts needs a better long-term solution, but simply deprecating native modules is not it.
As for wasm, I'm curious to hear what you think isn't good enough. I think the two main issues are garbage collection and dynamic linking, and there's ongoing work on them to fix them.
It's not fantastic if you're working on an Electron app where there might be one specific feature that requires a native module in order to hit an OS API that you can't do from a browser sandbox (and that Electron itself doesn't expose).
Rather than pursue total deprecation/eventual removal of postinstall/native modules, I think a package.json "allowNativeDependencies": "never" | "devOnly" | "always" option that defaults to "devOnly" if not specified is the way to go.
perhaps yarn could even develop official packages which include native code that wraps all the most relevant OS APIs, and the default setting of the flag would allow only those native packages.
this is all pretty far down the road anyway.
Sure, but you also mentioned in the PDF that the other solution was just to deprecate native modules in the long term, and that's not acceptable.
"As a data point, we encountered no problem at Facebook with adding --ignore-scripts to all of our Yarn invocations. The two main projects we’re aware of that still use postinstall scripts are fsevents (which is optional, and whose absence didn’t cause us any harm), and node-sass (which is currently working on a WebAssembly port)."
I think you're too quick to sacrifice native modules because you're not really using them.
Anyway, regardless of my own long term feelings, rest assured that postinstall scripts will be supported as well as they are now.
Filesystem access etc. has to, at some point, cross the barrier from VM to native code. Sure Node.js has file-system access (and other stuff) built in, but willingly moving in a direction that's non-extensible seems odd.
I know you've since said you'll just pull native modules out of the cache, and I'm perfectly happy with that. However, please don't overlook the importance of native interoperability because the code you tend to write abstracts this away from you. Native interoperability is what makes Node.js.
It's very neat, but I can't see it fully replacing native modules soon.
EDIT: just saw a comment by the author that says roughly this: https://news.ycombinator.com/item?id=17977986
So, native packages will keep on working.
edit: found the repo, seems a bit behind yarn's effort and not yet beta status. https://github.com/npm/crux
As much as I hate node_modules, there are times when I want to see how a library is implemented. Is there a way to have some libraries in node_modules? Say only the ones in listed in the package.json file
For a work project, I'm updating a dependency written against an undocumented specification. Much fun is being had doing this, I assure you. Being able to crack open the sources in node_modules/, instrument things with logging, and fix fix fix until I get something working is very helpful. I'm sure I can track things down in a cache directory and do the same, but it's nice knowing that my changes are limited to just this particular project, and that a checkout/reinstall of the module in another project will get the original broken sources for comparison.
`yarn unplug` will eject a dependency from the cache and put it into the project folder. That's a quick and easy way to inspect and debug libraries you use.
I don't think node_modules is perfect, and I get why it gets hate, but IMHO the algorithm is actually kinda nice for nesting packages.
If you set up your folders like so:
You can require your helper in main.js with a simple ``require('utils/helper.js');``
What's nice is that you can't do the reverse. So your helper doesn't get the same access to your main.js. I use this a lot for testing - it means that my tests can require my components using the same syntax that I use in normal code, but those components don't have special access to the tests.
A big "aha" moment for me with node_modules was figuring out that it's an entirely separate system from npm. It's not trying to build a flat package structure; it's trying to make it easy to on-the-fly set up packages from anywhere.
Edit: example - https://gitlab.com/dormouse/dormouse/tree/master/src
I've also gotten into the habit of checking my installed packages into source for projects that I don't expect users to rebuild (ie. websites, games, etc...). That's a much longer conversation, but it mitigates a large number of issues with node_modules on long-lived projects.
I wish this was true. My workflow from back in the early days of node has always been to `npm install` external dependencies, then `npm link` (symlink) the dependencies I'm working on. But npm >= v5 removes such symlinks if I `npm install` anything afterwards. I spend a significant amount of my time re-linking what npm install unlinked.
Usually when I hit a problem like this it's because I'm things have moved on and I'm doing something wrong. But when an npm developer says "consider npm as it is to be broken" and closes the issue , I'm not so sure.
Node's module resolution has nothing to do with npm. Npm is a package repository and installer built on top of node_modules. The reason your system broke is because npm cleared out your node_modules folder as part of its install. And from the sound of things on your linked issue, the devs are entirely aware of the problems this behavior causes and are planning to fix it.
An interesting exercise that I highly encourage people to do if they're finding this weird is to take a weekend and build their own version of npm just to demystify what's going on with it. It's not that hard to do, Node gives you all the tools you need - in its simplest form you need to curl whatever packages you want to install, and stick them in a node_modules folder. Then you need some way to track which packages you've downloaded. Node handles all the rest.
That npm occasionally breaks Node behavior is bad - I've been on the wrong end of those regressions as well and it was super frustrating. But that doesn't really have anything to do with Node, it just means the npm team needs to test their releases more.
It angers me how many dependencies very simple projects amass.
pnpm saves one version of a package only ever once on a disk and node_modules consume drastically less space
but this concept is fresh, pnpm already works;)
You just need to forgo all the conveniences the ecosystem provides and write stuff yourself instead.
Node.js or the browser have everything you need.
Hopefully yarn does a better job at validating dependencies than early Maven 1 and 2 did.
Having never used Maven, any chance you could point to a document explaining how Maven's cache approach works, and maybe expand on the similarities between that and Yarn's RFC?
People have already mentioned native modules. Install scripts are a nuisance, but exist for reasons. If you remove support for them – provided this project takes off, which I suppose it will because bandwagons – you risk people performing install-type tasks on first module load, potentially creating an even worse situation. Has this been considered as a risk?
My hope is that PnP proves that this approach can work and raises the stakes for Node to implement APIs that would allow for a better integration. It's a long term plan and we have yet to prove ourselves on the long run, but I'm quite confident :)
> If you remove support for them
I don't think we'll ever remove support for them. I'll personally advocate for them not to be used because of their subpar developer experience, but apart from that it won't cost us much to support them.
This will also break for various packages due to fs.readFile* dependencies, gyp, and other things. If your dependency tree is 4k+ node modules, the "vanilla" yarn or npm resolution and reconciliation techniques are already so brittle, that changing them will undoubtedly break things.
I've been thinking about unifying our current ivy+yarn+bower setup, but haven't yet gotten much past thinking about it...
"Efficient: One version of a package is saved only ever once on a disk. So you save dozens of gigabytes of disk space!"
Without digging, I imagine this will be more like Maven's cache.
NPM's design decision to flatten the version hierarchy baffled me. And has occasionally tripped me up.
Does any other language have a solution to this?
In my experience, C# libraries tend to be more averse by default to taking on extra dependencies, but that's in part because .NET already does so much work for you. Python is a bit less averse, but certainly not to the level of JS where you can easily end up with hundreds of nodes in the dependency graph. But then Python isn't used much for client UI code.
Typescript is one of the very few node modules that is very self contained. You install babel, webpack and eslint and that’s easily over 1k packages.
So yes, js ecosystem is a nightmare for security folks since anyone of those thousands of packages could access filesystem, network and create Backdoors.
Our express site got hacked because one of the sub dependencies was compromised.
Seriously stay away from using nodejs to serve production traffic for serious projects using glued packages. If you want to do it, use extremely thin, well vetted packages and be very mindful of upgrades.