Case Study: Npm uses Rust for its CPU-bound bottlenecks [pdf] (rust-lang.org)
369 points by yarapavan on Mar 3, 2019 | 292 comments

Title could be improved, as I was wondering if the command line tool npm had itself been rewritten.

It’s actually one of npm’s web services that was rewritten.

FTA: “Java was excluded from consideration because of the requirement of deploying the JVM and associated libraries along with any program to their production servers. This was an amount of operational complexity and resource overhead that was as undesirable as the unsafety of C or C++.”

I find this comparison a bit odd. Even if you're not using containers, the JVM isn't hard to deploy, as distro package managers include it. Unless a team is managing servers manually rather than with an automated tool, this doesn't seem that complex. Am I missing something here?

Well, speaking from experience with a JBoss application layer in a recent software project I worked on:

New java versions do break existing libraries or apps, and need to be tested thoroughly. When the company hasn't budgeted for that expense, it becomes difficult to update.

Often an architect or software team will insist on using the Oracle JVM rather than the included OpenJDK. That adds extra steps to download, store as an artifact, distribute, verify, etc.

The people who wrote the build pipeline have since been laid off, and an updated set of libraries requires a lot of work to trace back through poorly documented and understood code to make changes.

(Not to disagree with you here, it's more that I'm trying to illustrate how, with poor foresight, Java dependencies can get difficult to manage)

> New java versions do break existing libraries or apps

Have you used the Rust compiler? My experience tells me that any GitHub Rust project not updated in the last two years doesn't work with my Rust compiler. Whereas Java apps written a decade ago still compile and run on OpenJDK/Oracle, often with zero or near-zero changes.

> Often an architect or software team will insist on using the Oracle JVM rather than the included OpenJDK.

Yes, you could choose this. But if you have the ability to choose an entirely different language, I guarantee you have the ability to choose OpenJDK if that's what you really want.

> build pipeline ... an updated set of libraries

Doesn't Rust have libraries?

In fact, if anything, Java libraries have a longer shelf life and greater compatibility because they don't require integration of build systems. Java libraries can (and do) use 5-year-old compilers, postprocess bytecode for perf (e.g. HikariCP) or size (e.g. ProGuard), and even use entire other languages like Scala and Kotlin. But the only thing you as the library user need is the JVM bytecode, which is still high-level enough to maintain runtime interoperability but sufficiently low-level to achieve strong build-time interoperability.


Much of the "organizational overhead" seems to come down to "I don't like managing the runtime". And that's not wrong; static binaries are nice.

But how much harder is it really to manage/isolate/version your JVM runtime for your server deployment, than manage/isolate/version your Rust compiler for your CI pipeline?

> My experience tells me that any Github Rust project not updated in the last two years doesn't work with my Rust compiler.

Please file bugs against the Rust compiler then, because that would be a serious violation of the compatibility rules!

It's definitely a thing. It's easier to see with applications that use a lock file rather than libraries that always try to fetch the latest dependencies. That's because there have been incompatible changes made, but within the "allowed breakage" policy. For example, try checking out 0.2.0 or 0.3.0 of ripgrep and building it with Rust 1.33. Both fail for different reasons. The former fails because rustc-serialize 0.3.19 was using an irrefutable pattern in a method without a body, and this was prohibited: https://github.com/rust-lang/rust/pull/37378 However, if you run `cargo update` first, then the build succeeds.
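For the curious, the rejected construct looked roughly like this; this is a reconstruction of the class of breakage, not the actual rustc-serialize code, and the trait and type names are made up:

```rust
// Before rust-lang/rust#37378, a (tuple) pattern was accepted in a trait
// method declaration that has no body; this later became a hard error.
// The now-rejected form looked like:
//
//     trait Encoder {
//         fn emit(&mut self, (tag, value): (u32, u64)); // error today
//     }
//
// The accepted fix is a plain identifier in the bodiless declaration:
trait Encoder {
    fn emit(&mut self, pair: (u32, u64));
}

struct Sink(Vec<(u32, u64)>);

impl Encoder for Sink {
    // Patterns are still fine where the method has a body.
    fn emit(&mut self, (tag, value): (u32, u64)) {
        self.0.push((tag, value));
    }
}

fn main() {
    let mut s = Sink(Vec::new());
    s.emit((1, 42));
    assert_eq!(s.0, vec![(1, 42)]);
}
```

This is why `cargo update` fixes the build: newer rustc-serialize releases use the accepted form.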

The 0.3.0 checkout fails because of a change in Cargo. Specifically, I think it used to define OUT_DIR while compiling the build script, but this was not intentional. It was only supposed to be set when running the build script. ripgrep 0.3.0 got caught here in the time after when the bug was introduced but before it was fixed. See: https://github.com/rust-lang/cargo/issues/3368 In this case, cargo update doesn't solve this. You cannot build ripgrep 0.3.0 with any recent version of the compiler. You need to go back to a version of the Rust distribution which ships a Cargo version that sets OUT_DIR when compiling build scripts. (Or, more likely, patch build.rs since it's a trivial fix.)

Of course, both of these things are fine on their own. But strictly speaking, from a user experience point of view, code does break from time to time. With that said, ripgrep 0.4.0 and newer all compile on Rust 1.33, and ripgrep 0.4.0 was released over two years ago. So, strictly speaking, it does show the GP is wrong in at least some circumstances. :-)

This has less to do with Rust itself than with the ecosystem. Once, I updated my system and obtained a newer version of OpenSSL on my OS.

Sadly, some of my projects still used an older version of the openssl crate somewhere inside their crate tree.

The openssl-sys crate author chose to check that the native library version it was compiled against was below a certain version, so it broke. All requests by users to fix the legacy bug were deflected with the mantra "go update the openssl crate, the fixed version has been out for a year already"...



Unfortunately, I don't know enough to rewind history and do enough archaeology to find out whose "fault" it is.

But as a possible example, typemap [1] which is a library featured on Awesome Rust [2] ("curated list of Rust code and resources").

Project uses cargo. Has a lock file with two deps. Last code commit was May 2017. And I can't get it to compile.

[1] https://github.com/reem/rust-typemap

[2] https://github.com/rust-unofficial/awesome-rust

There's no lock file in that project.

But moreover, you stumbled upon one of the worst cases, a soundness problem that has yielded a lot of discussion because of that crate specifically: https://github.com/rust-lang/rust/issues/50781

The issue here is that the typemap crate was found to be relying on a compiler bug. The compiler bug could be exploited to write transmute in safe code: in other words, all the safety guarantees of Rust go out the window unless we break that crate. It's a bad situation, but I don't know what we could have done differently. I think everyone agrees it's not worth sacrificing all the safety properties of Rust to keep typemap building.

What error do you get?

My project stopped building on compilers with the new module system. I did file an issue (56247; see also 56317) and was told it won't be fixed. The fix to my dependency was trivial, but it did stop my code from compiling with no change to my Cargo.toml/Cargo.lock, and ultimately it did require me to unpin that crate's version and update it.

My understanding was that by virtue of my project not having the edition 2018 key I would have been isolated from such changes to the language.

> any Github Rust project not updated in the last two years doesn't work with the latest Rust compiler.

I don't think this is correct? Rust has an "editions" system (with 2015 as the baseline edition), introduced with the very aim of ensuring forward compatibility on a crate-by-crate basis. The aim is definitely that any crate written to be compatible with some "edition of Rust", whether 2015, 2018 or whatever, can be made to compile on a future version of the compiler.
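The opt-in lives in the crate's manifest; a minimal sketch (the package name is illustrative):

```toml
[package]
name = "example"
version = "0.1.0"
edition = "2018"  # crates omitting this key are treated as 2015-edition code
```

The compiler keeps supporting older editions indefinitely, which is what gives per-crate forward compatibility.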

One possibility is that he used something written against the nightly Rust compiler. That code would be very unlikely to compile today.

Maybe, but you have to specifically opt-in to the nightly toolchain in order to compile something that uses it. It's not something that's included by default, or appropriate for use in production scenarios (unless you're willing to deal with the resulting breakage, of course).

Of course it's not suitable for production, but it was still quite common to look at a rust project and find that it used the nightly compiler.

It's been a while, but if memory serves, tons of stuff was using the nightly compiler.

So, yes buyer beware. But when the whole ecosystem does it...

Exactly. I think it's getting better now that some of the cool things people were waiting on have made it into the 2018 edition. Off-hand I know that Remacs switched from the nightlies to 2018.

It's probably just the lack of tone in text, but your reply comes off as confrontational. It's also seemingly replying to me about general Rust issues when I'm just sharing my experience of a recent Java project. I don't have much experience of Rust beyond reading some code every now and then when an interesting blog post pops up here.

I'm also not condoning the decisions made in the project I describe. In fact, those ancient decisions are responsible for a significant amount of stress and work for me right now.

I've absolutely had experience with good JVM-based projects that had loose dependencies on libraries and JDK versions, such that they ran just fine under OpenJDK, and ran just fine when that package was updated.

> I don't have much experience of Rust

Oh, then that makes more sense.

It's always possible to improve. But Java is pretty much king when it comes to compatibility and maintainability. (Not so much in other areas, like verbosity, language features, type system, memory overhead, etc.)

I just like that someone at npm would avoid something because it has lots of dependencies and overhead. The irony is strong with this one.

Not so ironic—just because a practice is common among the median engineer in an ecosystem, doesn’t mean it will be common among the most experienced engineers in that ecosystem.

Node gets a bad rap mostly for the fact that it has tons and tons of inexperienced engineers using it (probably as one of their first programming languages.) Same reason PHP got a bad rap back when. You can build solid software in both, by finding and absorbing engineering best-practices from people who have already had hard-fought battles to learn them; but an engineer will have to be burned at least once on building/scaling+SLAing something before they start looking for those. You might say that the Node ecosystem has a lot of programmers that are “engineering virgins”—they’ve never been forced to contend with the real problems of engineering software. But they’d be “engineering virgins” no matter what language they’re using; that’s not an indictment of the language, just a consequence of its popularity and approachability.

Do you have any advice for PHP best practices? I'm starting a job that will probably have me using PHP...

I think the majority of good practices are the same in PHP as in other languages like Java. Many PHP issues were caused by building SQL statements from user input (you can have the same issues with beginner devs in most languages), so the best practice is to find a framework that does all the ugly parts for you (logins, session management, routing). There are large and small frameworks; I suggest using a popular one, even if it's not perfect, to avoid the risk of hitting bugs in something little-used, or the risk of it being abandoned or deprecated.

Don't get me wrong: if you need to do one small thing (for example, I had a desktop app where the user could submit feedback directly from the app), then a single PHP file was enough (no dependencies, no frameworks): you get the submitted data, clean it, and put it in the database or submit it to a third-party API that can handle it.

A fair amount of the pitfalls are already gone. register_globals is off by default now, placeholders for SQL queries are the default in examples, etc.

You will probably hear that you should use Laravel or similar. I'd argue it's a pretty big hammer, so don't reach for it if you don't need it.

The biggest issue is probably still the breadth and inconsistency of the standard library. Too many ways to do the same thing. Also, the general issues of a dynamically typed language, sprinkled in with things like == vs ===.

I like the books this guy writes: https://github.com/codeguy

I find it funny that the "right way" link has HTTPS whereas the "wrong way" link doesn't.

We're not talking about average JS developers, we're talking about the NPM maintainers.

EDIT: I did miss that. Sorry for the noise.

You missed the joke. Let me quote the GP again:

> I just like that someone at npm would avoid something because it has lots of dependencies and overhead. The irony is strong with this one.

For this to be "ironic", having lots and lots of dependencies would actually have to be a Node.js "best practice." But it's not.

It seems like a best practice to outsiders, for the same reason that setting `register_globals` seemed like a "best practice" in PHP back in PHP3/4. Because it was extremely common, one might assume that it was endorsed as a canonical approach. And so you do it yourself, and write sophomoric tutorials suggesting others do the same, perpetuating the problem.

In reality, the "best practice" followed by experienced software engineers (for Node.js or any other language) is to carefully consider your dependencies, and to try to avoid dependencies that cause an explosion of sub-dependencies. The NPM maintainers are experienced engineers, and so they follow this best practice.

There is no irony here. It is not "the Node.js way" to use tons and tons of dependencies, such that the NPM maintainers are going against the grain somehow. It's just the way of programmers inexperienced in engineering to not care about dependency proliferation; and then, further, to make a large set of their own tiny libraries (with already-exploded deps trees) because they aren't yet at a stage of programming expertise where they see that code as trivial to bang out whenever they need it (see: the left-pad package) that then further encourages others to depend on them. It's the "copying and pasting a solution together from bad code in five StackOverflow posts" phase of one's programming career, except instead of having to copy-and-paste, all the snippets are symbolically linked together into a big tree and you refer to them by name. (Again, that's not an indictment of Node.js—there's nothing you can do to stop a bunch of inexperienced engineers from doing this to your package ecosystem as well, if they happen to be drawn into your community.)

I'm not convinced. I see stuff like this fairly often: https://news.ycombinator.com/item?id=19290801 (Basically how it might be common to have 1000+ dependent files in a JS app)

From people that seem plenty intelligent and experienced.

That thread is about a module system. 1000 files != 1000 third-party dependencies.

On the other hand, NPM itself kind of paved the way for ecosystems with tons of dependencies, managed and versioned with a single tool. Installing system software like the JVM is more old school.

Npm doesn't seem to break any substantial new ground to me. Things like apt, CPAN, etc, have been around longer.

You don't have to install a JVM, you can bundle it with a distribution as well.

If I have a choice between dealing with it and not I’ll choose not every time. It’s an annoying dependency that got even more annoying when Oracle decided to be a pain in the ass. Just because something is “easy” doesn’t mean it’s easier than the alternative.

I just unpack the JDK; I have no idea what's so complex about it. It was complex on Windows because I used a VM to install Java and then copied the folder, but they made that easier with 11.

One example might be that if you unpacked the Oracle JDK, you owe Oracle a non-trivial amount of money now.

Only if deployed in production, and there's nothing hard to understand about it.

Commercial development or test use isn't free. It's only free for personal, noncommercial use. So I'd argue it isn't so easy to understand.

How isn't it? You just explained it quite well.

Not to mention that the Web is now full of discussions around this, with official posts from Oracle, Red-Hat, Amazon, IBM, Azul, Microsoft explaining how to go forward.

Figure out what you would pay if you used it in an elastic ECS setup. Using only Oracle documentation as a guide :)

Why should I? There is OpenJDK for that.

But assuming that Oracle support is actually desired, here is what you are asking for.




Old Java dog here.

But let's just forget about the efforts from AdoptOpenJDK, Red Hat, Amazon, and Azul and bash Oracle instead, it's more fun.

"here is what you are asking for"

None of that helps you with what the charge is when you're elastic, or using multiples of hyperthreads that don't add up to an integer number of cores. I agree that OpenJDK is a better idea.

If you want to be fully sure and insist on using the Oracle JDK, then do like any enterprise shop and call the sales team to get their written word. What is so hard about that?

Why would you use only Oracle documentation? Call the sales guys and ask them for a license. They would be happy to sell you something, and there's even some chance of some kind of discount.

To me it reads as "We didn't want to use Java".

Yeah. I can see the Java issue as a minor annoyance, but to put it on the same level as the lack of safety with C/C++ seems hyperbolic to me.

It reads like an innocent misperception due to inexperience with Java.

Very few people would choose to install libraries (JARs) used by their code via their OS package manager for instance.

JVM tuning (especially the GC and memory allocation scaling) can be a huge PITA.

But having options is always good. Isn't it the same with compiled languages? You keep the default options, but if you want extra performance you try different compiler flags, or a different compiler if the language has more than one.

Honestly I'd rather have good defaults than lots of options. If one tool meets requirements out of the box, then it's preferable to a tool that requires a lot of tuning.

But if you do not have options, then what is a "good default"? You only have a default if there is more than one option to choose from. So do you prefer software with no options?

Can you give examples of languages (compilers or VMs) with bad defaults, in your opinion?

Java by default has a hard memory limit which, when hit, causes services to hang in a semi-responsive state rather than exiting cleanly, requiring every user to hand-tune limits and carefully set up health checks and monitoring. Two decades in, they added a non-standard option to exit instead.

If they had instead made the default to act like almost everything else it would have worked with standard process monitors and limits with no effort required and a substantial fraction of the downtime I’ve seen for Java applications would never have happened.
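For reference, the option in question is a pair of HotSpot flags (available since JDK 8u92); `app.jar` here is a placeholder name:

```shell
# Exit the JVM on the first OutOfMemoryError instead of limping along
# in a half-alive state:
java -Xmx2g -XX:+ExitOnOutOfMemoryError -jar app.jar

# Or capture a heap dump for post-mortem analysis before exiting:
java -Xmx2g -XX:+HeapDumpOnOutOfMemoryError -XX:+ExitOnOutOfMemoryError -jar app.jar
```

With the exit flag set, a plain process supervisor (systemd, a container orchestrator's restart policy) can handle recovery the same way it does for any other crashed process.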

Isn't this default better for the average Joe who runs a Java desktop app? Developers who deploy apps should know how to change these defaults.

How is it better for something to stop working but not exit? It’s exceedingly uncommon for anyone to have correct handling for OOM exceptions so the most likely effect is that stuff partially works - servers accept requests but never respond, apps have menus/buttons which don’t work, etc.

Similarly, if developers who deployed apps knew enough to avoid this, we’d know by now because it wouldn’t happen so frequently. It does highlight who failed to do real health-checks (e.g. years ago most Hadoop services needed huge amounts of RAM to start without crashing but they’d be listed as okay) but it’s the kind of thing sysadmins have been cleaning up for decades.

You are probably right for this case. My initial comment was about performance-tuning configuration; the comment I replied to sounded like "the JVM should have had X set to my preferred value" or "I prefer software that has no options to confuse me", aka the GNOME mentality.

You can have options and have good default values for those options. Good default values are those that are reasonable for most people most of the time such that they don't need to be (or hire) domain experts to properly configure their tool.

My opinion of a bad default would include a language toolchain which defaults to dynamic linking and requires one to opt into static linking (Java, Python, JS, etc, etc, etc). Worse than that is an anorexic toolchain that has no defaults whatsoever and requires you to pass every little detail directly to the compiler--bonus points if your language has a massive ecosystem of competing tools which are meant to manage these sorts of details for you but utterly and uniformly fail to do so (looking at you, C/C++).

The JVM runtime is installed and used by average people, so the defaults should be set for the people who install Java to run a desktop app, IMO; the developers who want extra performance should read the manual and configure things.

I’ve never heard of a runtime that forces a dichotomy between end-user- and developer-friendliness (putting aside for the moment that end users are famously annoyed by the Java runtime). Rust, Go, etc don’t have runtimes which force a choice between end-users and developers...

The JVM can be used for desktop, server, and (in the past) applets; it would be impossible to find a configuration that is optimal for all applications, so competent developers tweak the defaults or use different deployment methods like AOT compilation or bundling your own JVM. Is your problem that the JVM is used in so many different places that different algorithms, runtimes, and optimizations were created for it? Your examples of Go and Rust are languages that have so far not been used in such diverse places, and they have no good alternative compilers; something like Python, by contrast, has a real diversity of runtimes.

How much is that actually necessary these days? I remember spending ages tweaking flags in the 1.4 - 1.6 years. But there have increasingly been sensible defaults with broad applicability. Now that G1 is both default and usable, even more so.

As I've moved onto 11, I've slashed our apps' JAVA_OPTS down to almost nothing: max heap size, some GC logging flags, that's it.

At scale? Very necessary. The default GC as of Java 8 (the last version I used in production) suffered significant performance issues above 40 GB of working set, and required careful tuning thereafter.

I completely agree. The JVM is fine until it isn't, and at that point it becomes very frustrating to deal with - even when using less than 16GB RAM.

The default GC as of OpenJDK 8.

The default GC from IBM, Azul, and others wasn't the same one.

In fact Azul has made a business of selling JVMs with GC that can handle hundreds of data GBs.

Modern OpenJDK versions also have such GCs now.

Hardly any different than having to use VTune or perf to optimise C and C++, which require the added steps of recompiling with different flags and redeploying.

I wouldn’t recommend C or C++, but in their defense, you don’t need to worry about tuning as soon as you do on the JVM. Many apps won’t need to bother with it at all (which is good because C/C++ programmers have all manner of other things to worry about that aren’t a concern for modern languages).

There's also a notion of failure mode for improperly tuned apps. This is certainly a personal preference, but I prefer true failures over constraint failures. I'd rather OOM in golang than run out of threads in my Java execution context while having plenty of free memory. At least when starting out.

Me neither, I do like Rust and see its adoption at NPM as good PR for the language, however I also think that the way the decision process went wasn't free of bias.

I'm not a fan of relying on distro package managers for installation of runtime dependencies on servers. Too many opportunities for variables to creep in if the version isn't locked, and then you have to manage the package manager's own dependencies and config as well. Even if you automate, you're still at the mercy of the repo to have the version you need, etc., and oftentimes you need to customize the install for a highly-available and/or virtualized environment. Bad times all around.

What's the alternative? Not installing the dependencies so the app doesn't work?

The alternative is vendoring the dependencies. That includes the JDK or Node runtime. With "modern" deployments you'll see this with container-based packaging that includes the runtimes, either explicitly or via system packages (in the container). The classic approach is to copy the runtimes into your final application tarball.

Either way the runtime is baked into the app and gets deployed and tested with it as a core component. Runtime upgrades then become vanilla deployments.

Lots of options: containerization, downloading prebuilt binaries, downloading and building from source, or downloading, building, then packaging the artifacts and leaving somewhere centralized for deploy.

Using the language package manager - npm, cargo, etc.

Also with GraalVM you may get away with building a native image which doesn't require a JVM at all.
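The basic flow, sketched with placeholder names (`app.jar`, `app`); exact flags vary by GraalVM release:

```shell
# Compile a jar ahead of time into a standalone native executable with
# GraalVM's native-image tool, then run it with no JVM installed:
native-image -jar app.jar app
./app
```

The trade-offs (limited reflection support, longer builds) are discussed further down the thread.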

That is not the only option, though.

Enterprises that care about AOT Java have been using them almost since Java exists.

A really cool demo that shows you the power of GraalVM


Deploying a JVM app/service is now even more complex than figuring out which version of Node to use to run a service. Is it supposed to be Oracle Java? OpenJDK? AdoptOpenJDK or one of a half dozen more? Which version of it? Anything to tweak in GC or startup settings for it/that version? Do we need to regression test the service on a minor JDK upgrade? Is that JVM compatible with some OS version we are running that has some security patches and other settings?

> Deploying a JVM app/service is now even more complex than figuring out which version of Node to use to run a service.

> Is it supposed to be Oracle Java? OpenJDK? AdoptOpenJDK or one of a half dozen more? Which version of it?

All of these must have been tested against the Java TCK. So, the answer is "whichever you/your company decided is okay" for the first part and for the second part "newest update of the Java version your software runs on". Doesn't sound very complex to me.

It is pretty clear to any regular Java developer that bothers to keep up with the news.

Regression tests are required for any toolchain.

You are making the parents point for them.

Yeah, "npmjs.com" would have been more enlightening.

The issue with the JVM is keeping it updated, in the context of a constant stream of security issues that need patching.

Hey folks! This is part of our whitepaper series. This means that the audience is CTOs of larger organizations, and so the tone and content are geared for that, more than HN. Please keep that in mind!

Are CTOs of largish organizations still not part of the Hacker News audience...? That seems a bit of a damning statement to make about the population of CTOs...

I'm sure some are on HN, but it's definitely not "the place" for them, and many have probably never heard of it

I would be shocked if any CTOs of large organizations browse HN. Large orgs put people in those positions that are more about theory and future thinking over current knowledge.

It's expected that high-level executives of large organizations deal with planning and strategy rather than low-level technical details. They're not following language features or implementations.

However, in that case, this whitepaper (and many others like it) is damning in how little it actually states, and illustrates why so many technical decisions go wrong.

Good overall whitepaper and I like to see efforts like these.

One area I felt was missing is some data here:

"It keeps resource usage low without the possibility of compromising memory safety. "

How did the resource usage compare with the Go and node rewrites? What metrics were used under which workload? Benchmarks are never perfect but I think a CTO-level person would like to see a table of results like that.

A better title would be the subtitle of the article: "The npm Registry uses Rust for its CPU-bound bottlenecks".

Note that only one service (authentication) was rewritten from node to Rust.

That’s what I titled it when I submitted it after we first published this.

There’s also a second service we know about in Rust, and that’s the one that renders package README pages.

Or the title could use "npm, Inc" as it's referring to the organization.

> Java was excluded from consideration because of the requirement of deploying the JVM and associated libraries along with any program to their production servers. This was an amount of operational complexity and resource overhead that was as undesirable as the unsafety of C or C++.

Just so everyone here is aware, this is by now an outdated complaint against Java.


I'm choosing Vert.x as an example since it already competes with Rust- and C-based applications over at https://www.techempower.com/benchmarks but you ought to be able to compile general programs ahead of time.

This is not at all outdated. GraalVM and SubstrateVM are still very new and only support a subset of JVM features. The Vert.x article itself mentions that, as well as this:


Before I respond, please don't misinterpret me. I think it is perfectly acceptable to choose Rust over Java saying nothing more than, "We felt Rust would be a better fit for the team," or "We were more excited by Rust." However, if somehow you have imposed on you the constraint that you need to compile ahead of time, this is not enough to toss Java out of the running.

Vert.x encountered some issues with reflection while Rust simply does not support the sort of dynamic reflection that Java running on the JVM can achieve. SubstrateVM forces you to have compile time reflection, and Rust can also support this to some extent. If Rust is a feasible alternative to Java for you, then you are not going to encounter this limitation if you choose to go with Java. Plus, if you ever decide you do need this power in your application, if you go with Java you can pay the cost of the increased operational expense and install a JVM.

Are there any companies using these native images in production?

Shared this in this same thread for another comment.

The videos below show a GraalVM-based app loading up the Spring framework and the Flowable process engine and making a REST call to an external service, all in 13 ms!!!


https://youtu.be/9BQiDmvOnZw https://youtu.be/yLvnkkRys2Y

Twitter uses GraalVM in production, you can find several talks from them.

Beyond that, enterprises using PTC, Aicas, IBM, ExcelsiorJET JDKs have had the option to deploy AOT native code in production for the last 20 years or so.

I like how this whitepaper sidesteps the "but the rewrite is the real improvement!" by also rewriting the service in Node.js along with Go and Rust.

I’ve been playing around with rust since it came out but only recently did I decide to use it for a part of a project. It’s a very pleasant language. I didn’t fight the borrow checker much (maybe due to prior experience).

The language is nuts. It's true what they say: Cargo is even better than the language. It's just so easy to add packages to your project or to split your project into packages.

Cargo is an amazing investment, as it will help people write non-duplicated code. Like, how many string implementations are there across C code bases? Each C project has so much code that's the most boring, repetitive shit you can imagine. Cargo lets you concentrate on writing your code without hassle.

I have experience with a lot of package managers, gems, go, cocoapods, sbt, cabal, pip, spm, npm, you name it but cargo is on a different plane of existence. Cargo makes the whole internet your standard library.

I also like cargo workspaces. Modern development needs a workflow where you pull in a dependency, and work on it in tandem with your code. Achieving a good workflow for this is surprisingly hard.
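A sketch of that workflow with cargo (all crate names and paths below are made up): a workspace root Cargo.toml that lists both your crate and a local checkout of the dependency, so edits to the dependency are picked up on every build:

```toml
# Cargo.toml (workspace root)
[workspace]
members = [
    "myapp",              # your own crate
    "vendored/somedep",   # a local checkout of the dependency you are hacking on
]

# myapp/Cargo.toml
[package]
name = "myapp"
version = "0.1.0"

[dependencies]
# point at the local checkout while you work on both in tandem
somedep = { path = "../vendored/somedep" }
```

Both crates then share one target directory and one lockfile, and `cargo build` at the root rebuilds whichever of the two changed.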

> write non duplicated code ... Like how many string implementations are there across c code bases.

In one of my projects I needed a foo::string that could share data with foo::variant without copying the data each time. So foo::string was implemented as a COW string, a smart pointer to foo::string_data. std::string simply does not work under such requirements.

So I am not sure I understand how cargo will help in this case. Either the Cargo source repository will contain an implementation for every possible permutation of string requirements, or people will just use the standard std::string.

The only feature that I need in C/C++ is unified ability to include libraries in code:

    #define PNG_APNG_REQUIRED 
    #include source "libs/png/png-amalgamated.c"  
I am perfectly fine with downloading png.tar.gz manually and putting it in place where I need it.

In any case, the decision to include a library in a product requires quite a lot of reasoning and architectural investigation.

For typical web front-end projects NPM or Cargo probably make sense. But projects like Node.js or Cargo itself should not use any such automatic downloader.

Your problem isn't hard to solve in Rust. Reasoning about allocations is a foundational idea of Rust and is solved using lifetimes.

Basically, lifetimes make the ownership of allocations a first-class language construct. You can reason about whether values are stack- or heap-allocated and specialize your code based on that. Lifetimes are definitely an advanced feature and I think you can do some crazy optimizations with them. But your use case is not hard to implement. If you show me the C++, I'll show you how to achieve the same semantics.
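For the copy-on-write part specifically, Rust's standard library already ships `std::borrow::Cow`. A minimal sketch; the `sanitize` function is an illustrative stand-in, not from the parent's actual codebase:

```rust
use std::borrow::Cow;

// A copy-on-write string: borrow the input when no change is needed,
// allocate an owned copy only when we actually have to modify it.
fn sanitize(input: &str) -> Cow<'_, str> {
    if input.contains(' ') {
        Cow::Owned(input.replace(' ', "_"))
    } else {
        Cow::Borrowed(input)
    }
}

fn main() {
    // No space: the original buffer is shared, nothing is copied.
    assert!(matches!(sanitize("already_clean"), Cow::Borrowed(_)));
    // Space present: a new owned String is allocated just this once.
    assert!(matches!(sanitize("has spaces"), Cow::Owned(ref s) if s == "has_spaces"));
}
```

Callers that only read the result never pay for an allocation, which is the same sharing property the C++ foo::string was built for.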

Cargo helps because Rust lets you build cleaner abstractions and abstractions that compose nicer. So integration is super easy.

> For typical web front-end projects NPM or Cargo probably make sense.

I might misunderstand your sentence, but neither Rust nor cargo are for front-end?

npm is used a lot for front end code, incidentally. Node is an environment for a lot of build tooling.

> Modern development needs a workflow where you pull in a dependency, and work on it in tandem with your code. Achieving a good workflow for this is surprisingly hard.

Well put! I've struggled with this exact situation, and although I figured out a setup that works for me, it's still not ideal. In my experience, npm the package manager doesn't enable such workflows reliably (yet).

I agree. A lot of times I end up just pushing dependencies to github and pulling from github in the main codebase.

I'd like to point out that even though it took them a week compared to an hour, a week is actually incredibly fast to learn Rust and build something useful with it. Learning C++ can take more than a month of training, and it's only because most people learn it over an entire semester at school that they learn it at all. This is also the time it takes to learn Rust, and presumably now that they've written one program, writing the next program will take them a fraction of a week.

The amount of time it takes to learn something is often indicative of its power. Anyone who has learned a foreign language or a musical instrument knows that the time spent investing up front pays huge dividends down the road when you have the skills and tools to richly express yourself. The reason that Go takes two days to learn is because it artificially limits the amount of up front investment at the cost of limiting expressiveness over the lifetime of your use of the language.

> I'd like to point out that even though it took them a week compared to an hour, a week is actually incredibly fast to learn Rust and build something useful with it.

I would argue that something that takes even an experienced engineer a mere hour to write is very small and has little complexity (especially if unit tests are counted towards the hour it took them to re-write it). This means it's difficult to gauge how much Rust was 'learned' during that week.

Learning any of C, C++, Rust or other systems languages takes way, way more than a week or a month. For C, the simplest of the three, it is usually rated at 3 months.

What happens is that if you are already proficient in one of them, the others take way less time (especially in the case of C++, given it forces you to learn almost all paradigms).

In addition, writing a small program does not mean you have learnt C, C++ or Rust.

While I'm a big fan of Rust, excluding Java because "JVM" is kinda laughable. It's not hard to run at all. You package everything into a jar then run a single command. As easy to get working as a JS backend.

If their complaints are about GC tuning, is it not the same thing as tuning the GC in JS/Go? Java still has arguably the most mature GC of any language.

No surprise here, good language design pays off for real world applications especially at scale like the NPM infra.

Anyone knows if they considered .NET Core? It has dependency management, can be deployed as a self-contained binary and is memory safe. Seems to match the requirements for me.

> This stuff happened before, or at least around the time, that .NET Core was released. So it either didn’t exist or was an extremely new option, at least.

This entire article is a pretty damning report on JavaScript in general, but this sentence takes the cake (emphasis mine):

> The process of deploying the new Rust service was straight-forward, and soon they were able to forget about the Rust service because it caused so few operational issues. At npm, the usual experience of deploying a JavaScript service to production was that the service would need extensive monitoring for errors and excessive resource usage necessitating debugging and restarts.

Is this satire?

They also state that writing the service in Node took them an hour, two days for Go, and a week for Rust. Even taking into account their unfamiliarity with the language, it's probably fair to say that when switching to Rust, you'll usually spend more time writing and less time debugging. Whether that trade-off is worth it depends on the project.

It depends. I am over a year into Rust and it doesn't take me that much longer to write something in it than, say, in Python. My confidence that the thing I wrote is correct is way higher in Rust, and it is usually much faster.

And: it is incredible easy to build and deploy.

I think whether Rust is useful or not depends entirely on the application. If you need high confidence in what the thing is doing, need it to run parallel and fast, and are familiar with the concepts Rust is built on, then it isn't a bad choice. For me it replaced Python in nearly every domain except one-use scripts and scientific stuff.

It can be hard for advanced programmers to abandon certain patterns they bring from other languages though. In the first months I tried too much to use OOP, which doesn't make any sense and leads to convoluted code. If you work more in a compositional and data oriented way while making use of Types and Traits, you will end up with much simpler solutions that work incredibly well.

Catherine West's RustConf 2018 talk on ECS systems describes this incredibly well, and might be worth watching even if you never intend to use Rust at all, because the patterns discussed are quite universal: https://www.youtube.com/watch?v=aKLntZcp27M
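The core pattern from that talk can be sketched in a few lines: plain data structs stored in parallel arrays, with "systems" as ordinary functions over them. All names here are illustrative, not from the talk's actual code:

```rust
// Composition over inheritance: capabilities are plain data components,
// not methods on a class hierarchy.
struct Position { x: f32, y: f32 }
struct Velocity { dx: f32, dy: f32 }

// An "entity" is just an index; its components live in parallel storage.
struct World {
    positions: Vec<Position>,
    velocities: Vec<Option<Velocity>>, // not every entity moves
}

impl World {
    // A "system" is a plain function over exactly the data it needs.
    fn step(&mut self, dt: f32) {
        for (pos, vel) in self.positions.iter_mut().zip(self.velocities.iter()) {
            if let Some(v) = vel {
                pos.x += v.dx * dt;
                pos.y += v.dy * dt;
            }
        }
    }
}

fn main() {
    let mut world = World {
        positions: vec![Position { x: 0.0, y: 0.0 }],
        velocities: vec![Some(Velocity { dx: 1.0, dy: 0.0 })],
    };
    world.step(2.0);
    assert_eq!(world.positions[0].x, 2.0);
}
```

Because systems borrow only the component arrays they touch, the borrow checker stays out of your way in exactly the places where an OOP object graph would fight it.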

> I tried too much to use OOP, which doesn't make any sense and leads to convoluted code.

Oh, well, you don't need Rust for that! :P

Yes, the same is true of C++ or C# for example, which is where the move away from OOP and towards DOD/ECS started.

Which is ironic, given that people tend to forget that it was C++ which made OOP mainstream, there wasn't any Java or C# back then, two fully OOP based languages.

The other part is that component-oriented programming is actually a branch of OOP, from a CS point of view, with books published on the subject at the beginning of the century.

"Component Software: Beyond Object-Oriented Programming"


First edition uses Component Pascal, Java and C++, with the 2nd edition replacing Component Pascal for C#.

Once a programmer achieves a certain competency level with Rust, writing familiar workflows requires little more effort than it would in a dynamic language. However, lower-level Rust will demand more, regardless of proficiency.

> Whether that trade-off is worth it depends on the project.

Sure, but when you consider the drastically reduced operational cost that they're talking about there... that week is absolutely peanuts in comparison, and that was also a week including getting to grips with the language sufficient to produce the component. You really don't want to have to pay attention to production. You want to be able to concentrate on getting stuff done, not losing time keeping what you've already got just ticking along.

> writing the service in Node took them an hour

I'm really skeptical of this unless it's just a wrapper for a thing that happens to already exist. It would be interesting to have comparative LOC numbers.

> At npm, the usual experience of deploying a JavaScript service to production was that the service would need extensive monitoring for errors and excessive resource usage necessitating debugging and restarts

So, they deployed it after an hour, but it wasn't finished until they stopped having to debug it in production?

Fail early, fail often

deploy anyways

> about a week to get up to speed in the language

> about a week to get up to speed in the language and implement the program

is the actual quote.

the point is to measure at a common level of proficiency across different languages. Once you are proficient and familiar with a language, then you can measure how long it takes you compared to another language.

>They also state that writing the service in Node took them an hour, two days for Go, and a week for Rust.

>At npm, the usual experience of deploying a JavaScript service to production was that the service would need extensive monitoring for errors and excessive resource usage necessitating debugging and restarts.

But if you factor in the lifetime of the program, where Node saves you a week at the initial implementation but you pay it back in extensive monitoring, it is probably safe to say Rust's TCO is much lower.

Not sure how Rust will fare against Go. But I think there is a high probability that Rust is better in the longer run.

> Not sure how Rust will fare against Go.

I guess it will be domain dependent. Go uses a highly-developed concurrent GC, which is going to make it a lot more convenient for certain specialized workloads that involve graph-like or network-like structures. (That's the actual use case for tracing GC, after all. It's not a coincidence that garbage collection was first developed in connection with LISP. And yes, you could do the same kind of thing in Rust by using an ECS pattern, but it's not really idiomatic to the language.)

Rust's package management tool (cargo) is the best thing of its kind I have ever seen. The most basic thing you can do is: cargo new funkyproject

Which creates a new barebones Rust project called "funkyproject". Every dependency specified in its Cargo.toml will be automatically downloaded at build (if there is a new version).

When a build is successful, the versions of said dependencies will be saved into a Cargo.lock file. This means if it compiles for you, it should compile on every other machine too.

A Cargo.toml also allows you to use (public or private) repositories as a source for a library, to specify version ranges that only select e.g. versions newer than 1.0.3 and older than 1.0.7, etc.
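For illustration, a Cargo.toml exercising those features; all crate names and URLs below are made up except serde:

```toml
[package]
name = "funkyproject"
version = "0.1.0"
edition = "2018"

[dependencies]
# caret requirement (the default): any semver-compatible 1.x release
serde = "1.0"
# explicit range: at least 1.0.3 and older than 1.0.7
somecrate = ">=1.0.3, <1.0.7"
# pulled straight from a (public or private) git repository
internal-tools = { git = "https://git.example.com/team/internal-tools.git" }
# or from a local path, which pairs nicely with cargo workspaces
helpers = { path = "../helpers" }
```

The first successful build writes the exact resolved versions into Cargo.lock, so every machine building from the same lockfile gets the same dependency tree.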

Because the compiler will show you unused dependencies, you never really end up including anything you don't use. In practice this system not only works incredibly well, but is also very comfortable to use and isolates itself from the system it is running on quite well.

I really wish Python also had something like this. Pipenv is sort of going into that direction, but it is nowhere near cargo in functional terms.

> Every dependency specified in it's Cargo.toml will be automatically downloaded at build (if there is a new version).

Why do people want this? The builds are no longer reproducible, security and edge-case issues can come out of nowhere, API changes from an irresponsible maintainer can break things, and network or resource failure can break the build. It's just a terrible idea.

Proper use of a semver system is entirely optional and unenforceable, and I've seen people bitten countless times by some developer breaking their package and everyone complaining... If the tool didn't do the stupid thing of just randomly downloading crap from the internet, none of this would be a problem.

I presume all my dependencies are buggy... I just know that the current ones don't have bugs that I have to deal with now. Swap out new code and, who the heck knows, it becomes my job again. It's more work because of a policy that doesn't make sense.

Newer code isn't always better. People try new ideas that have greater potential but for a while the product is worse. That's fine, I do it all the time. But I sure as hell don't want software automatically force updating dependency code to the latest lab experiment.

Cities, power plants, defence systems, satellites, and airplanes run on software from the 80s; they don't break because a new version of some library had bugs and it automatically updated, no. They fucking work.

There's a giant, irreplaceable value in predictability, and this approach ignores all those lessons.

Reproducibility was a core concern for cargo. Your parent is incorrect. A lock file means that your dependencies are never updated unless you explicitly ask for an update.

There is also cargo vendor to download the dependencies locally. I’m using just that at work to ensure builds without network access work.

Rust is no worse here than, say, Haskell with cabal or stack, or Swift with whatever they were using (I forget), or Go for that matter.

You're misreading your parent. The download only happens for the first build using a new dependency. As they mention, once the version is written into the Cargo.lock file, that is the exact version that is used until there is an explicit update step run.

What does "if there is a new version" mean then? If it's a new dependency, there's no old version.

Sorry, English is not my first language, I meant this: when you build initially, the used dependencies get downloaded. New things will only be downloaded if you [A] update, [B] add a new dependency, or [C] clean your project and build it again.

If you update, the versions in your Cargo.lock are ignored, and are updated if the build is successful.

If you add a dependency, only that dependency is downloaded; the rest is kept as you had it.

If you clean, it is as if you cloned the project fresh with git, and you will have to download all dependencies. If there is a lockfile, the exact versions from it will be used.

To me this is extremely flexible and works very well, AND you get precise control over versions if you want it. By the way, it is also possible to clone all dependencies and keep a local copy of them, so you are really 100% sure that nothing could ever change with them. Although I am quite sure crates.io doesn't allow changes without a version number change, which means you should be safe as long as you rely on the version number.

Yes, I suppose that's rather misleading, and that sentence contradicts the actual behaviour described later in the original comment. For a fixed set of dependencies, versions are only checked and changed on an explicit 'cargo update' run.

It's a good thing Cargo has lockfiles!

npm (and yarn) literally does exactly all of this, via `npm init funkyproject` and `package-lock.json`.

Except that npm will gladly update your lock file when you run npm install which is insane.

Npm hasn't done this in over a year.

The current version of npm does this and this is "correct behaviour". I got bitten by this a few weeks ago.

For the passers-by, the only way to make npm behave as expected in this specific case is to use "npm ci" instead of "npm install". If you do not do this, npm will assume you want to update the packages to the latest version at all times, at all costs, even if you have a lock file in place, and even if you have your package file and lock file locked to exact versions. (i.e. 2.0.0 exact, not ^2.0.0)

This is a new addition; it was added a couple of months ago. Before that, you had to check your dependencies into your source control. That might still be the best practice, and likely the only trustworthy way to get reproducible builds consistently over a longer time horizon.

> and even if you have your package file and lock file locked to exact versions. (i.e. 2.0.0 exact, not ^2.0.0)

Wait what? Are you sure about that part? That's a violation of npm's semver constraints https://semver.npmjs.com

(I agree with you that "npm ci" should be the default behavior, and "npm install" should be called something different, like update-and-install)

Yes - to be more specific, you can lock your own package's dependencies to an exact version, but you cannot lock dependencies of your package's dependencies. You can't do anything about them. They will get updated because their package lock specifies the lock in the form of ^2.0.0. The fact that a package lock can resolve to multiple versions is counterintuitive. One would think the whole point of a package lock is to lock packages.

As a result, when you do a npm install in your oblivious and happy life, npm naturally assumes you want to summon Cthulhu. If you didn't want to summon Cthulhu, why did you call the command that summons Cthulhu? Yes, the default command summons Cthulhu because we believe in agile Cthulhu. If you don't want to summon Cthulhu, try this undocumented command with a misleading name we've added silently a few weeks ago for weird people like you who don't want to summon Cthulhu when they want to do a npm install. But seriously, why do you not want to summon Cthulhu?

Unfortunately, this was the impression I've gotten of the position of npm folks when I read a few threads about this. I've moved to npm ci for now and moved on. Npm's package lock is many things, however, none of the things it is, is a package lock.

Or use Yarn.

…so would Cargo? If you install a new package, why wouldn’t you expect it to show up in your lock file?

No, you guys don't understand: npm updates the package lock even when not adding a new package, i.e. on the initial `npm install`. It's insane. I'm thinking of going back to yarn again.

I'm with you, the default behavior is so counter intuitive.

You can use ‘npm ci’ for actually sensible install behaviour.

Hmm, that's pretty stupid. What is the rationale behind this? That you check before you run an install?

Why is that insane? What else is supposed to happen when you install a package?

EDIT: I misunderstood and thought you were talking about installing a package. If you're running `npm install` to just reinstall dependencies then yes the lockfile should not be modified. However it seems like that is indeed the case and you may be talking about a prior bug with NPM.

`npm install` is what you the developer would run when you first clone a project; it should install exactly what's in the package-lock.json file. Unfortunately, it sometimes doesn't do that.

Python does have something like this, which is conda [0].

It allows specifying dependencies with much of the same freedom you mentioned, in an environment.yaml file and other config files, you can provide arbitrary build instructions via shell scripts, use a community led repository of feeds for stable and repeatable cross-platform builds of all libraries [1], generate the boilerplate automatically for many types of packages (not just Python) [2], compiled version specifics with build variants / "features", and you can use pip as the package installer inside a pip section in the conda yaml config file.

[0]: https://github.com/conda/conda [1]: http://conda-forge.org/#about [2]: https://conda.io/projects/conda-build/en/latest/source/resou...

Well just like many other languages with sane environment (dependencies, building, etc.) management. I think this is the norm nowadays (D, Clojure, and so on).

I think Poetry [1] is the most promising in the python build/dependency space. I've used pipenv and left dissatisfied.

[1] https://github.com/sdispater/poetry

I gave poetry a try and liked it a lot as well, but I really miss the virtualenv handling from pipenv. How do you handle virtualenvs with poetry?

That sounds pretty similar to NPM, as well as NuGet and Paket for .NET. TBH, it's the 'obvious' way for a package manager to work, so I'd be a little surprised if they didn't all work more or less the same?

You have to run npm ci instead of npm install to get npm to respect the lock file. I don’t consider that remotely obvious. And this feature was just added to npm last year, 8 years after npm was invented!

That is incorrect. Both `npm install` and `npm ci` respect the lock file, and if a lock file is present, will make the `node_modules` tree match the lock file exactly.

`npm ci` is optimized for a cold start, like on a CI server, where it's expected that `node_modules` will not be present. So, it doesn't bother looking in `node_modules` to see what's already installed. So, _in that cold start case_, it's faster, but if you have a mostly-full and up to date `node_modules` folder, then `npm install` may be faster, because it won't download things unnecessarily.

Another difference is that `npm ci` also won't work _without_ a `package-lock.json` file, which means it doesn't even bother to look at your `package.json` dependencies.

Thanks for the reply Isaac! This doesn’t match my first-hand experience unfortunately. Are there any circumstances under which npm install with a lockfile present deviates from the lockfile where npm ci does not?

For example, why did this person experience the changing lockfile? https://github.com/npm/npm/issues/17101

Or why do these docs say?

> Whenever you run npm install, npm generates or updates your package lock https://docs.npmjs.com/files/package-locks

Oh, this seems like what I experienced: https://stackoverflow.com/a/45566871/283398

It does appear that npm works somewhat differently than the “obvious” way we would expect package managers to work vis a vis lockfiles :(

At least npm ci gets the job done for my use case :)

If you run `npm install` with an argument, then you're saying "get me this thing, and update the lock file", so it'll do that. `npm install` with no argument will only add new things if they're required by package.json, and not already satisfied, or if they don't match the package-lock.json.

In the bug linked, they wanted to install a specific package (not matching what was in the lockfile), without updating the lockfile. That's what `--no-save` will do.

The SO link is from almost 2 years ago, and a whole major version back. So I honestly don't know. Maybe a bug that was fixed? If this is still a problem for you on the latest release, maybe take it up on https://npm.community or a GitHub issue?

> Both `npm install` and `npm ci` respect the lock file

This is not correct. `npm install` will update your dependencies, not install them, disregarding the package versions defined in the lock file.

It feels like you are not getting the point of having a lock file in the first place. It should be obvious that you can't do an install (which npm calls ci) if you don't have a lock file.

The lock file represents your actual dependencies. Package.json should only be used to explicitly update said dependencies.

If you run `npm install` with no arguments, and you have a lockfile, it will make the node_modules folder match the lockfile. Try it.

    $ json dependencies.esm < package.json
    # package.json would allow any esm 3.x >=3.2.5
    $ npm ls esm
    tap@12.5.3 /Users/isaacs/dev/js/tap
    └── esm@3.2.5
    # currently have 3.2.5 installed
    $ npm view esm version
    # latest version on the registry is 3.2.10
    $ npm install
    audited 590 packages in 1.515s
    found 0 vulnerabilities
    # npm install runs the audit, but updates nothing
    # already matches package-lock.json
    $ npm ls esm
    tap@12.5.3 /Users/isaacs/dev/js/tap
    └── esm@3.2.5
    # esm is still 3.2.5
    $ rm -rf node_modules/esm/
    # remove it from node_modules
    $ npm i
    added 1 package from 1 contributor and audited 590 packages in 1.647s
    found 0 vulnerabilities
    # it updated one package this time
    $ npm ls esm
    tap@12.5.3 /Users/isaacs/dev/js/tap
    └── esm@3.2.5
    # oh look, matches package-lock.json!  what do you know.
Now, if you do `npm install esm` or some other _explicit choice to pull in a package by name_, then yes, it'll update it, and update the package-lock.json as well. But that's not what we're talking about.

I often don't know what I'm talking about in general, but I do usually know what I'm talking about re npm.

Everything has evolved to get to that point. I suppose if you start with something modern like npm then it's not obvious how bad the earlier ones are. Compare the good ones with composer, dpkg, rpm, apt or dnf, to name a few examples.

Dub definitely does this (pretty much exactly the same, dub.json = Cargo.toml, dub.selections.json = Cargo.lock), and afaik cpan does something similar.

I wrote and deployed a production service written in pre-1.0 Rust. In over three years of being deployed I never once had to touch that code. The infrastructure around it evolved several times, we even moved cloud providers in that time, but that particular service didn't need any changes or maintenance. It just kept chugging along.

Perhaps Rust's name is apropos: your code will be so reliable that you won't need to look at it again until it has collected rust on its thick iron framework.

I too have written a service that still runs, and I've never touched its code since. I think I used COBOL.

Interestingly, Graydon has mentioned in the past that he may have named the language after the fungi.

Never touched code is seldom a sign of quality. But there's the old saying; if it ain't broken, don't fix it.

Another saying, “broken gets fixed but shoddy lasts forever”.

Edit: this seems like I’m suggesting rust makes shoddy results. Didn’t mean to imply that. I’m actually very excited to use Rust in prod soon.

When code is never touched, it's usually because its business requirements don't change.

Or everyone is too afraid to touch it.

Deploying a service written in any language into production environment at scale of npmjs is far from straightforward.

I think the satire here is that the internet has gotten so centralized lately that even a simple piece of code in JavaScript requires such a huge behemoth of an org running and maintaining all this monstrous infrastructure.

You've identified the problem, but I think you're wrong about the cause. It's not internet centralization that's the problem, it's the mundane fact that JavaScript does not have a large standard library. And JavaScript absolutely should have a large standard library, but it's not clear what organization would have the motivation to implement, support, and promote one. It's not impossible that a community-driven effort could accomplish the same thing, but nobody seems to be working on it.

I think the biggest hurdle to a stdlib is the fact that anything unused has to be shaken out. I think that's a low barrier now though, with more than adequate tooling. Another comment mentions a different stdlib for server vs browser, but that's also not a terribly hard problem.

I think a good first pass would come from studying analytics from npm. What are the most used packages? The most stable? I know lodash makes a lot of sense, but there's also underscore. I think the biggest hurdles are really political rather than technological, as everyone is so entrenched now that a one-size-fits-all stdlib would be hard. Not impossible, just hard. I do wish someone were working on it, and I hate to say it, but Google probably has the most skin in the game with V8 and Chrome, yet I don't really trust them not to abandon it. So who else is there? It wouldn't be a very 'sexy' project either, but it still seems worth it to at least try.

> I think a good first pass would come from studying analytics from npm. What are the most used packages? The most stable?

I think it would also make a lot of sense to look at what's in the Python and Ruby standard libraries.

The whitepaper notes that almost 9 billion NPM packages are downloaded per week, so I don't see anything laughable about needing good monitoring.

Which is roughly the equivalent of every single human being downloading two npm packages per week. To me, this suggests that the real problem is that too many packages are being downloaded.

I think this is a natural result of two things which should be appealing to fans of old-school UNIX philosophy:

- NPM is intentionally suited to lots of small libraries that do one thing and do it (hopefully) well, and composing those libraries in useful ways. Whereas systems like Debian have scaling limits with large numbers of packages, NPM tries hard to avoid this so that one hundred ten-line packages are as reasonable as a single thousand-line package.

- CI systems aim for reproducibility by deploying from source and having declarative configurations, in much the way that most distro package builds happen in a clean-room environment.

Probably a lot of these downloads are from bots. Continuous Integration is very common in Node.js/JavaScript projects, so on each git commit, anyone with CI (and no dependency caching) will download lots of packages.

> Which is roughly the equivalent of every single human being downloading two npm packages per week

The current human population of earth is about 7.7 billion, so that number should probably be closer to 1.17 npm packages per week per human being. That is still quite a lot, though

This highlights the problem of averages. Most (99.87% or so) humans download zero npm packages. But those that do, often download them in the thousands at a time. And yes, clean-room CI servers are a big part of that.

Perhaps npm could save themselves oodles of money by supplying a nice turnkey npm package cache and requiring major users to use it.

And perhaps the CI server folks would want this anyway because it would be vastly faster.

You might be surprised (or maybe not) to learn that many service providers are far more willing to spend money on predictably large bandwidth bills than on less predictable changes in their infrastructure which require human time and attention to implement.

Yep, not that surprising though, given the anemic state of the JavaScript standard library.

The idea of a scripting language is that it does not have a standard library. It will be different in each environment. You, for example, don't want the same standard library in Node.js and the browser. Each runtime can choose what APIs it wants to expose.

That’s not a definition of "scripting language" I’ve ever heard before, and it’s neither true nor desirable. Even JavaScript has a standard library (think of things like Set, Map, RegExp, Promise, etc.) because those are universally useful, as opposed to runtime-environment features like the DOM, which are less relevant to many use cases. JSON is a great example of something crossing over, as an increasingly high percentage of projects will use it.

Not having a standard library on par with other scripting languages just added overhead and incompatibility for years as people invented ad hoc alternatives, often buggy. The accelerated core language growth has been hugely helpful for that but you still have issues with things as basic as the module system which exist for understandable reasons but are just a waste of developer time.

Python is the batteries included scripting language. The two concepts are not mutually exclusive.

I would expect that to be par for the course for most languages. The more dynamic the more problematic, but it stands to reason that the less you can check for and enforce statically the more will eventually blow up at runtime.

Resource usage is similar though not exactly aligned e.g. Haskell has significant ability to statically enforce invariants and handle error conditions, but the complex runtime and default laziness can make resource usage difficult to predict.

I'd guess OCaml would also have done well in the comparison, as it too combines an extensive type system that is difficult to bypass with an eager execution model.

Default laziness in Haskell is not as big a problem as it is made out to be. For someone like NPM though, Haskell's current GC would probably be too much of a bottleneck. Haskell's GC is tuned for lots of small garbage and does not like a large persistent working set. But this has nothing to do with laziness.

> Default laziness in Haskell is not as big a problem as it is made out to be

It will take a fair amount of time to be proficient in Haskell to the point where it is not a (potential) big problem.

Honestly just using strict data structures (one StrictData pragma will do) and strict accumulators in recursive functions will get you there, it's not that hard.

This is my experience, too.

I wrote trumped.com and deployed it prior to the last presidential election. The frontend and assets have been redeployed, but the core rust service for speech generation hasn't been touched. I've never had a service this reliable, and it took so little effort!

Rust is the best language I've ever used, bar none, period. And I've used a countless many of them.

The only places where I won't write Rust are for small one-off scripts and frontend web code. (Even with wasm, Typescript would be tough to beat.)

And what other languages have you written something like trumped.com in? What was it about them that required more effort?

> This entire article is a pretty damning report on JavaScript in general

How so?

A team that likely has lots of JavaScript expertise basically stated that JavaScript is unsuitable for their task.

And that the operational improvement once written in Rust was notable enough to write a paper.

Imagine the K8S team porting from Go to some other language for similar reasons.

Does every language have to be suitable for every task? “Operating a business-logic-embedding CDN at scale” isn’t ever something Node claimed to be capable of. I wouldn’t expect any other dynamic runtime-garbage-collected language, e.g. Ruby or Python or PHP, to be suited to that use-case either.

Use the right tool for the job. The PHP website is running PHP, but it isn’t running a web server written in PHP. Web servers are system software; PHP isn’t for writing system software. Same thing applies here. Node does fine with business logic, but isn’t really architecturally suited to serving the “data layer” when you have control/data separation.

> The PHP website is running PHP, but it isn’t running a web server written in PHP.

Are you sure? I'm not familiar with the PHP.net architecture, and there may be less gains from how PHP has traditionally tied itself as a module to web servers in the past, but Rails (and any number of other dynamic language frameworks) are actually web servers implemented in that language, with an optional separate web server such as NGINX or Apache you can run in front to handle the stuff they aren't as good at (static file serving, etc).

Now, that is a framework, and not the language proper, but I wouldn't be all that surprised to find python.org running on top of a Python framework.

Their mirroring page suggests they most likely use Apache.

"NOTE: Some of our maintainers prefer to use web servers other than Apache, such as Nginx. While this is permitted (as long as everything ultimately works as directed), we do not officially support these setups at this time"


Seems Python.org runs on the Django framework.


From the paper it appears to be an authorization service that decides what rights a particular user has. Not a webserver or CDN. It mentions it being CPU bound, though it isn't clear to me why it would be, or why JS wouldn't work well enough for that.

The paper makes it clear they were evaluating languages based upon efficiency.

The rust implementation was more efficient than the JS one. A CPU bound service of course is bottlenecked at the CPU, and this benefits from efficiency.

At scale, it makes sense to replace this with Rust. Javascript did the job, but did not provide the same efficiency as Rust.

I'm not clear on why this is particularly CPU heavy, though: "the authorization service that determines whether a user is allowed to, say, publish a particular package"

Lots of users. Anything will be CPU-heavy if you give it enough work.

Except the things that end up being memory-bound instead, but the NPM database isn't large enough for that.

> Anything will be CPU-heavy if you give it enough work.

Not in a relative sense. If authorization is 5% of the work, scaling it leaves it at 5% of the work, and it's never a bottleneck. Authorization was being a significant bottleneck, not a tiny percent, and that is somewhat surprising.

Obviously authorisation will be a huge overhead compared to sendfile + nginx right? Am I misunderstanding what npm does?

I mean, to use sendfile you need to open the file, and that does a permission check too...

No I'm talking about private NPM right. The perms on the file system are not equal to (or as costly as) the auth I need to have to access my private NPM repo.

Couldn't you implement the authorization logic in, say, Redis? Then the npm service is I/O-bound again, and everyone is doing the job they're optimized for.

Authorization checks aren't that expensive. The overhead of using an external service might well make it take up more hardware overall.

It's not clear to me either. It makes sense to use Rust for CPU-heavy tasks, but a CRUD service that does authentication would be fine in Node.js, since all the low-level crypto is implemented in C. So I'm not sure exactly what they mean; tbh the paper is very light on details.

Some CPU heavy operations like crypto are not put in a threadpool. While it may be running C code, it will block the main thread while executing. See https://github.com/nodejs/node/issues/678

Perhaps the new worker threads may alleviate this, but I'm not sure (it's still an experimental API).

My gut feeling is that the user keys are not randomly generated, but actually encrypted strings that contain the permission values. Decrypting them with modern encryption algorithms (like ChaCha) is pretty CPU intensive.

"Javascript did the job", but not very well apparently. They specifically say that they "were able to forget about the Rust service because it caused so few operational issues". Add to that the increased CPU- and RAM-efficiency that generally comes with a rust implementation, and that rust rewrite looks like a no-brainer.

Well I'd say that if it's an authorization service there will be cryptographic calculation which is "heavy" on the CPU.

I don't think that's what it does. "the authorization service that determines whether a user is allowed to, say, publish a particular package"

Also, I would assume any crypto in v8 is already written in C, with JS calling into it.

Users who suggest cryptography are getting downvoted (myself included). I am curious as to why. Is it perceived as spam or an attack on nodejs? Encrypted cookies and JWT (json web tokens) rely on a similar strategy so this is pretty standard. It would be pretty secure as long as the encryption key is not unique, so definitely not criticism on the NPM team but mere theories as to why a (what one would assume is a) database or memory bottle-neck is being presented as a CPU bottleneck in this scenario.

I guess I just don't see it. NPM is a massive infrastructure where one little piece was rewritten in Rust and then the Rust team wrote a promotional paper about it. The paper isn't bad, but it also isn't a technical white paper.

I don't see how any of this is critical of JavaScript, which isn't even really discussed in the paper and still runs the rest of the infrastructure. If anything the paper is more damning of C, C++, and Java but I still think damning is far too extreme to describe what the paper said.

> If anything the paper is more damning of C, C++

I don't see it as damning of either of these. On C++ it says "we didn't want to learn it." Which is fine. Maybe after learning it they would have decided differently, or not. On Java they said "we didn't want to learn how to operate it", as they feared the complexity of a Java application server for a single small service, which they can create in a way that hooks into their monitoring infrastructure. No damning there either.

However their company's purpose is to push Javascript and they are saying "operating JavaScript is hard, doing this in Rust is easy" which directly goes against their business.

I think you and I read completely different papers. I didn’t see anything about not wanting to learn or motivations of easy.

I mean, we still write the overwhelming majority of our code in JavaScript. We port to Rust when CPU-heavy task becomes a bottleneck to the rest of the system. It's not as if this paper is saying (nor is it the case) that we've ported the whole registry to Rust. JS has lots of advantages we appreciate.

I'm curious why this authorization service is CPU intensive. The article says it's basically deciding if you're authorized to publish a package. It sounds like the sort of thing that would talk to a database, or a cache like redis, and therefore mostly be IO bound itself.

Is this maybe parsing data structures itself?

It's been a few years since I was directly involved in engineering, but my fairly educated understanding is that it's more around reading of possibly-private packages than publishing.

Publishing is a relatively rare event compared with reading, but in a world of private packages, orgs, and teams, the "can {user} read {object}" gets more complicated. It probably wouldn't be CPU bound if not for the sheer scale we're dealing with, but once all the IO bottlenecks are resolved, you still have to check to make sure that a login token is valid, then get the user associated with it, then check the teams/orgs/users with access to a thing (which might be public, in which case, the problem is a lot simpler, but you still have to get that info and check it), and then whether the user is in any of those groups. So there's a straightforward but relevant bit of CPU work to be done, and that's where Rust shines.

> Imagine the K8S team porting from Go to some other language for similar reasons.

k8s was originally written in Java, so they have already done this once.

I didn't know that, but it makes it a more interesting comparison. I imagine at least part of rewriting it in Go was to "eat your own dogfood".

It was more like "a new team got in charge", actually.

There is a FOSDEM 2019 talk about it.

Thanks! The talk:

"The clusterfuck hidden in the Kubernetes code base" https://fosdem.org/2019/schedule/event/kubernetesclusterfuck...

One wonders if there'll be a similar talk in a year...

> The audience walks away feeling empathetic that they aren’t alone in their journey to writing idiomatic Go and is now equipped with strong refactoring techniques developed by some of the world’s top engineers for the Kubernetes project.

As an occasional user of kubernetes, minikube etc. it's not something I would have guessed to have been developed by the world's top engineers.

I mean, kubernetes tries to, and probably manages to, provide a useful abstraction, but at a few million LOC and a few man-months of full-time senior engineering effort to run anything in production, it's not exactly the epitome of elegant and efficient engineering.

You would be surprised at the quality of Android tooling stable releases, to the point that now there is Project Marble in place to try to improve its image.


If you want to have some reading fun, check /r/androiddev/ every time there is a "stable" release.

Even people working for Google on Android tell me the tooling is pretty garbage, so I'm less surprised than you might think ;)

The interesting thing with Kubernetes is that it's basically a re-imagining of Borg, which one assumes was not a few million lines of code when it was already running all of Google's infra more than a decade ago. It's obviously not solving exactly the same problem (e.g. Google correctly recognized that DNS isn't so hot and wrote their own replacement protocol, BNS, which wouldn't fly for external adoption etc.). But I'd be curious to know how Borg's code quality and size back when it became the standard way to run stuff at Google maybe 12 years ago compares to Kubernetes today.

I bet k8s would look different if it were written by developers used to a more limited set of cpu, io, and memory resources.

Well, given the hoops one has to jump through for the privilege of working at Google, another level of code quality is to be expected.

Wow. Didn't realize it was that bad:

> We look at what it would take to begin undoing the spaghetti code that is the various Kubernetes binaries

Well at least the developers are being frank about it I guess.

My guess would be more on the side of native binaries (i.e. no runtime JVM dep) and lower memory use, but could be that too.

The k8s team probably should, judging from the state of their codebase.

I’m interested to see what becomes of https://www.cloudatomiclab.com/rustyk8s/

I'd kill for a rust client with feature parity. Writing k8s operators in rust would be so much better.

If you feel that way, this is pretty funny then: https://news.ycombinator.com/item?id=19295841

Easy to criticize a project that has X million lines of code.

Every project of that scale is going to have issues.

There's a lot of circumstantial evidence suggesting that efforts to refactor the codebase were torpedoed because of developer elitism.

Anecdotal, but I recently used Gulp in a project to run some css clean-up tasks as part of a build process.

The JS dependencies:

The number of node modules: just over 400.

So I'm not at all surprised that this might create surprises when deploying JS services in production.

That’s an ongoing annoyance with using NPM in a security-conscious environment. It’s really easy to end up with thousands of submodules and the amount of time you’ll have an audit showing a vulnerable package can be many months while layers of dependencies slowly update. You can usually show that it’s not exploitable but the number of modules on the average project means you’ll be doing that all the time.

npm now automatically reports known vulnerable packages, right?

Yes - which is great for surfacing this, along with GitHub’s alerts, but unless it’s a direct dependency I find I’m usually just stuck researching the vector and waiting months for numerous layers of dependencies to update in sequence.

Oh, and to be clear: I think this is a problem with OSS sustainability – shipping updates takes real work – more than NPM, mildly exacerbated by the anemic JS stdlib leading to more modules being used instead.

The same is generally true for things written in Go, Java, C#, etc. Strong typing and memory safety eliminate huge classes of common bugs.

Do you mean static typing?

The distinction in the vernacular between statically typed and strongly typed languages is so narrow at this point that pointing it out is a bit pedantic.

I would say Rust is both a strongly typed language and a statically typed language. The static type checking happens at compile time, and in general the types in use are strict and strongly typed at runtime.

But, even Rust allows you to cast types from one to another and use dynamic types determined at runtime.
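For instance (illustrative only), numeric casts in Rust are explicit via `as`, and the `Any` trait provides a checked runtime downcast:

```rust
use std::any::Any;

fn main() {
    // Numeric conversions never happen implicitly, but an explicit
    // `as` cast lets you request one (including lossy truncation).
    let x: u64 = 300;
    let y = x as u8; // 300 mod 256 == 44
    assert_eq!(y, 44);

    // The `Any` trait allows runtime-determined types: a checked
    // downcast back to the concrete type, a bit like dynamic typing.
    let boxed: Box<dyn Any> = Box::new("hello".to_string());
    assert_eq!(boxed.downcast_ref::<String>().unwrap(), "hello");

    let n: Box<dyn Any> = Box::new(42i32);
    assert!(n.downcast_ref::<String>().is_none()); // wrong type: None
    println!("ok");
}
```

Both escape hatches are opt-in and visible in the source, which is what keeps the language "strong" despite allowing them.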

Yes, most people would say that static typing is the primary advantage you get from the compiler in Rust.

It wasn't intended as a drive-by pedantic swipe - I was genuinely curious whether OP meant strong or static. Conversations about type systems and application correctness are exactly the place where precise definitions are welcomed, but I understand that the distinction between strong and static is often conflated. It can be relevant if we're discussing static typing for example, as then something like TypeScript becomes useful.

There's a great writeup by one of the C# people (I want to say it was Erik Meijer, but I'm having a hard time finding it atm) about the distinctions we're discussing here, their relevance to correctness, and the impact on ergonomics. My takeaway from it was that occasionally you will encounter problems that are easier to solve with some freedom and that's why strong/static languages like C#/Rust include pragmatic escape hatches like the dynamic object and the Any trait (respectively).

I think that's fair.

If you do find that writeup, I'd be interested in reading it.

I think I meant statically and strongly typed.

> The distinction in the vernacular between statically typed and strongly typed languages is so narrow at this point that pointing it out is a bit pedantic.

I'm not sure why you think this has changed today, or what you mean. It appears to me that many programmers don't realize that the two are orthogonal, so I find it an important, not pedantic, distinction (it just happens that languages generally improve on both fronts over time, hence asking for one also gives you the other, but it's because of correlation, not causation). I'll make an attempt at describing it here, please tell me if I'm missing something.

Strong typing means that types describe data in a way that the data won't accidentally be mistreated as something else than what it represents. E.g.

(a) take bytes representing data of one type and interpret it as data of another type (weak: C; strong: most other languages)

(b) take a string and interpret it as a number without explicitly requesting it (weak: shells, Perl, JavaScript; strong: Python, Ruby)

(c) structs / objects / other kinds of buckets (strong: C if the type wasn't cast; weak: using arrays or hash maps without also using a separate type on them that is enforced, the norm in many scripting languages although usually strengthened via accessor methods which are automatically dispatched via some kind of type tag; also, duck typing is weaker than explicit interfaces)

(d) describe data not just as bare strings or numbers, but wrap (or tag / typedef etc.) those in a type that describes what it represents (this depends on the programmer, not the language)

(e) a request of an element that is not part of an array / list / map etc. is treated as an error (similar to or same as a type error (e.g. length can be treated as being part of the type)) instead of returning wrong data (unrelated memory, or a null value which can be conflated with valid value)

These type (or data) checks can happen at runtime ("dynamically") or at compile time ("statically"). The better a static type system is, the more of these checks can be done at compile time.

For security and correctness, having strong typing is enough in principle: enforcing type checks at runtime just means getting a failure (denial of service), and systems should be designed not to become insecure or incorrect when such failures happen (fail closed), which of course might be done incorrectly [1]. Testing can make the potential for such failures obvious early (especially randomized tests in the style of quickcheck).

Static typing makes the potential for such failures obvious at compile time. It's thus a feature that ensures freedom from denial of service even in the absence of exhaustive testing. It can also be a productivity feature (static inspection/changes via IDE), and it can enable more extensive use of typing as there is no cost at run time.

[1] note that given that static type systems usually still allow out-of-memory failures at run time, there's usually really no way around designing systems to fail closed anyway.

It's not that I disagree with any of your points. I think they're all valid. I do think most people use the term strongly typed where they mean statically -and- strongly typed.

I personally don't fret about it unless we get into specific details about these notions. I do especially like your (d), which many people often overlook when designing programs. An example would be to use a String as the Id in a DB, but not wrap the String in a stronger type to represent the Id, thus not getting the advantage of static type checking by the compiler. So there are definitely areas where this conversation can lead to better advantages of different languages.

For example, in Rust declaring a type to be a `struct Id(String);` adds no memory or allocation overhead compared to a plain String. Not all languages can say that, thus we could also get into a fun conversation about the overhead associated with the type system itself. All fun topics.
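A minimal sketch of that newtype pattern (the `UserId`/`lookup` names are made up for illustration):

```rust
use std::mem::size_of;

// Newtype wrapper: a distinct type for the compiler,
// identical layout at runtime.
struct UserId(String);

// Can only be called with a UserId, never a bare String,
// so mixing up different kinds of ids becomes a compile error.
fn lookup(id: &UserId) -> String {
    format!("user:{}", id.0)
}

fn main() {
    // Zero overhead: the wrapper is exactly the size of a String.
    assert_eq!(size_of::<UserId>(), size_of::<String>());

    let id = UserId("alice".to_string());
    assert_eq!(lookup(&id), "user:alice");
    // lookup(&"alice".to_string()); // type error: expected &UserId
    println!("ok");
}
```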

Thanks for your reply. I fully agree.

Do you actually believe it and why? To me it looks like a fairly pathetic attempt to damn JavaScript but I'm not sure why anyone who has actually used JS for any length of time would be convinced of its accuracy.

The article is hosted at rust-lang.org so one would be wise to take their words with a grain of salt. And Rust isn't even a tiny fraction as popular as JavaScript and so when you use Rust you're choosing from a small set of packages written by experts and the kind of people who use languages that nobody else really uses. Meanwhile, JS has millions of packages written by everybody for various platforms (since JS can run in all sorts of environments where nobody would ever want to run Rust.)

Also there's the anecdotal, yet easily empirical evidence that just about any developer who uses JS can tell you about: I deploy new JavaScript services all the time without any of those problems.

So, I wonder if you're actually asking this question or if you have some other agenda.

This paper is written about the experiences of the organization that distributes those millions of packages. Those same people have written so much important JavaScript that their tool gets distributed with Node itself.

What kind of load do your node services get? I’d be willing to bet the npm registry has more. That plays into this kind of thing.

The article may have been hosted at rust-lang.org but it was written by people at npm which is very much a JavaScript boosting organization.

Rust definitely has some benefits as well as tradeoffs when compared to JavaScript, which they discuss. The learning curve is higher, but the end product is probably devoid of a number of errors and operational issues over the lifetime of the service. While it's in theory possible to get similar results with JavaScript, the level of consistent discipline it requires is in practice impossible.

These are facts not opinions.
