
This entire article is a pretty damning report on JavaScript in general, but this sentence takes the cake (emphasis mine):

> The process of deploying the new Rust service was straight-forward, and soon they were able to forget about the Rust service because it caused so few operational issues. At npm, the usual experience of deploying a JavaScript service to production was that the service would need extensive monitoring for errors and excessive resource usage necessitating debugging and restarts.

Is this satire?




They also state that writing the service in Node took them an hour, two days for Go, and a week for Rust. Even taking into account their unfamiliarity with the language, it's probably fair to say that when switching to Rust, you'll usually spend more time writing and less time debugging. Whether that trade-off is worth it depends on the project.


It depends. I'm over a year into Rust and it doesn't take me that much longer to write something in it than in, say, Python. My confidence that what I wrote actually works is way higher in Rust, and it is usually much faster.

And: it is incredibly easy to build and deploy.

I think whether Rust is useful or not depends entirely on the application. If you need high confidence in what the thing is doing, it should run in parallel and fast, and you are familiar with the concepts Rust uses, then it isn't a bad choice. For me it replaced Python in nearly every domain except one-off scripts and scientific stuff.

It can be hard for advanced programmers to abandon certain patterns they bring from other languages though. In the first months I tried too much to use OOP, which doesn't make any sense and leads to convoluted code. If you work more in a compositional and data oriented way while making use of Types and Traits, you will end up with much simpler solutions that work incredibly well.
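Roughly, the shift looks like this. A made-up minimal sketch (none of these types come from a real project): plain data structs composed together, with shared behaviour expressed as a trait rather than a base-class hierarchy:

    // A made-up sketch of composition over inheritance: small plain-data
    // components combined into a struct, with shared behaviour in a trait
    // rather than a base class.

    #[derive(Debug, Clone, Copy)]
    struct Position { x: f32, y: f32 }

    #[derive(Debug, Clone, Copy)]
    struct Velocity { dx: f32, dy: f32 }

    // Behaviour lives in a trait, not in a class hierarchy.
    trait Update {
        fn update(&mut self, dt: f32);
    }

    // An "entity" is just data composed of components.
    struct Particle {
        pos: Position,
        vel: Velocity,
    }

    impl Update for Particle {
        fn update(&mut self, dt: f32) {
            self.pos.x += self.vel.dx * dt;
            self.pos.y += self.vel.dy * dt;
        }
    }

    fn main() {
        let mut p = Particle {
            pos: Position { x: 0.0, y: 0.0 },
            vel: Velocity { dx: 1.0, dy: 2.0 },
        };
        p.update(0.5);
        println!("{:?}", p.pos); // Position { x: 0.5, y: 1.0 }
    }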

Catherine West's RustConf 2018 talk on ECS describes this incredibly well and might even be worth watching if you never intend to use Rust at all, because the patterns discussed are quite universal: https://www.youtube.com/watch?v=aKLntZcp27M


> I tried too much to use OOP, which doesn't make any sense and leads to convoluted code.

Oh, well, you don't need Rust for that! :P


Yes, the same is true of C++ or C#, for example, which is where the move away from OOP and towards DOD/ECS started.


Which is ironic, given that people tend to forget that it was C++ that made OOP mainstream; Java and C#, two fully OOP-based languages, didn't exist back then.

The other part is that component-oriented programming is actually a branch of OOP, from a CS point of view, with books published on the subject at the beginning of the century.

"Component Software: Beyond Object-Oriented Programming"

https://www.amazon.com/Component-Software-Beyond-Object-Orie...

The first edition uses Component Pascal, Java and C++, with the second edition replacing Component Pascal with C#.


Once a programmer achieves a certain competency level with Rust, writing familiar workflows requires little more effort than it would in a dynamic language. However, lower-level Rust will demand more, regardless of proficiency.


> Whether that trade-off is worth it depends on the project.

Sure, but when you consider the drastically reduced operational cost that they're talking about there... that week is absolutely peanuts in comparison, and it was a week that included getting to grips with the language well enough to produce the component. You really don't want to have to pay attention to production. You want to be able to concentrate on getting stuff done, not lose time keeping what you've already got just ticking along.


> writing the service in Node took them an hour

I'm really skeptical of this unless it's just a wrapper for a thing that happens to already exist. It would be interesting to have comparative LOC numbers.

> At npm, the usual experience of deploying a JavaScript service to production was that the service would need extensive monitoring for errors and excessive resource usage necessitating debugging and restarts

So, they deployed it after an hour, but it wasn't finished until they stopped having to debug it in production?


Fail early, fail often

deploy anyways


> about a week to get up to speed in the language


> about a week to get up to speed in the language and implement the program

is the actual quote.


The point is to measure at a common level of proficiency across different languages. Once you are proficient and familiar with a language, then you can measure how long it takes you compared to another language.


>They also state that writing the service in Node took them an hour, two days for Go, and a week for Rust.

>At npm, the usual experience of deploying a JavaScript service to production was that the service would need extensive monitoring for errors and excessive resource usage necessitating debugging and restarts.

But if you factor in the whole lifetime of the program, where Node saves you a week at the initial implementation and you pay it back in extensive monitoring, it is probably safe to say Rust's TCO is much lower.

Not sure how Rust will fare against Go. But I think there is a high probability that Rust is better in the longer run.


> Not sure how Rust will fare against Go.

I guess it will be domain dependent. Go uses a highly-developed concurrent GC, which is going to make it a lot more convenient for certain specialized workloads that involve graph-like or network-like structures. (That's the actual use case for tracing GC, after all. It's not a coincidence that garbage collection was first developed in connection with LISP. And yes, you could do the same kind of thing in Rust by using an ECS pattern, but it's not really idiomatic to the language.)
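For what it's worth, the ECS-ish workaround in Rust usually just means storing nodes in a Vec and using indices as edges, so cycles are plain data instead of owned references. A minimal, made-up sketch of the idea:

    // A made-up sketch of index-based graph data in Rust: nodes live in a
    // Vec and edges store indices instead of references, which sidesteps
    // ownership problems with cyclic links (no GC needed).

    struct Node {
        value: String,
        edges: Vec<usize>, // indices into Graph::nodes
    }

    struct Graph {
        nodes: Vec<Node>,
    }

    impl Graph {
        fn add_node(&mut self, value: &str) -> usize {
            self.nodes.push(Node { value: value.to_string(), edges: Vec::new() });
            self.nodes.len() - 1
        }

        fn add_edge(&mut self, from: usize, to: usize) {
            self.nodes[from].edges.push(to);
        }
    }

    fn main() {
        let mut g = Graph { nodes: Vec::new() };
        let a = g.add_node("a");
        let b = g.add_node("b");
        g.add_edge(a, b);
        g.add_edge(b, a); // a cycle is just data, not an ownership problem
        let first = g.nodes[a].edges[0];
        println!("{} -> {}", g.nodes[a].value, g.nodes[first].value);
    }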


Rust's package management (Cargo) is the best thing I have ever seen of its kind. The most basic thing you can do is: cargo new funkyproject

Which creates a new barebones Rust project called "funkyproject". Every dependency specified in its Cargo.toml will be automatically downloaded at build (if there is a new version).

When a build is successful, the versions of said dependencies will be saved into a Cargo.lock file. This means if it compiles for you, it should compile on every other machine too.

A Cargo.toml also allows you to use (public or private) repositories as a source for a library, specify version ranges to select e.g. only versions newer than 1.0.3 and older than 1.0.7, etc.

Because the compiler warns you about unused dependencies, you never really end up including anything you don't use. In practice this system not only works incredibly well, but is also very comfortable to use and isolates itself from the system it is running on quite well.

I really wish Python also had something like this. Pipenv is sort of going in that direction, but it is nowhere near Cargo in functional terms.


> Every dependency specified in its Cargo.toml will be automatically downloaded at build (if there is a new version).

Why do people want this? The builds are no longer reproducible, security and edge-case issues can come out of nowhere, API changes from an irresponsible maintainer can break things, network and resource failures can break the build; it's just a terrible idea.

The proper use of a semver system is entirely optional and unenforceable, and I've seen people bitten countless times by some developer breaking their package and everyone complaining ... If the tool didn't do the stupid thing of just randomly downloading crap from the internet, none of this would be a problem.

I presume all my dependencies are buggy... I just know that the current ones don't have bugs that I have to deal with now. You swap in new code and who the heck knows, it becomes my job again. It's more work because of a policy that doesn't make sense.

Newer code isn't always better. People try new ideas that have greater potential but for a while the product is worse. That's fine, I do it all the time. But I sure as hell don't want software automatically force updating dependency code to the latest lab experiment.

Cities, power plants, defence systems, satellites, and airplanes run on software from the 80s; they don't break because a new version of some library had bugs and it automatically updated, no. They fucking work.

There's a huge, inherent, irreplaceable value in predictability, and this approach ignores all those lessons.


Reproducibility was a core concern for cargo. Your parent is incorrect. A lock file means that your dependencies are never updated unless you explicitly ask for an update.


There is also cargo vendor to download the dependencies locally. I’m using just that at work to ensure builds without network access work.

Rust is no worse here than, say, Haskell with cabal or stack, Swift with whatever it uses (I forget), or Go for that matter.


You're misreading your parent. The download only happens for the first build using a new dependency. As they mention, once the version is written into the Cargo.lock file, that is the exact version that is used until there is an explicit update step run.


What does "if there is a new version" mean then? If it's a new dependency, there's no old version.


Sorry, English is not my first language. I meant this: when you build initially, the dependencies you use get downloaded. New things are only downloaded if you [A] update, [B] add a new dependency, or [C] clean your project and build it again.

If you update, the versions in your Cargo.lock are ignored and rewritten if the build is successful.

If you add a dependency, only that dependency is downloaded; the rest is kept as you had it.

If you clean, it is as if you cloned the project fresh with git, and you will have to download all dependencies again. If there is a lockfile, the exact versions from it will be used.

To me this is extremely flexible and works very well, AND you get precise control over versions if you want it. By the way, it is also possible to clone all dependencies and keep a local copy of them, so you are really 100% sure that nothing could ever change with them. Although I am quite sure crates.io doesn't allow changes without a version number change, which means you should be safe as long as you rely on the version number.


Yes, I suppose that's rather misleading, and that sentence contradicts the actual behaviour described later in the original comment. For a fixed set of dependencies, versions are only checked and changed on an explicit 'cargo update' run.


It's a good thing Cargo has lockfiles!


npm (and yarn) literally does exactly all of this, via `npm init funkyproject` and `package-lock.json`.


Except that npm will gladly update your lock file when you run npm install, which is insane.


Npm hasn't done this in over a year.


The current version of npm does this and this is "correct behaviour". I got bitten by this a few weeks ago.

For the passers-by, the only way to make npm behave as expected in this specific case is to use "npm ci" instead of "npm install". If you do not do this, npm will assume you want to update the packages to the latest version at all times, at all costs, even if you have a lock file in place, and even if you have your package file and lock file locked to exact versions. (i.e. 2.0.0 exact, not ^2.0.0)

This is a new addition; it was added a couple of months ago. Before that, you had to check your dependencies into your source control. That might still be the best practice, and likely the only trustworthy way to get reproducible builds consistently over a longer time horizon.


> and even if you have your package file and lock file locked to exact versions. (i.e. 2.0.0 exact, not ^2.0.0)

Wait what? Are you sure about that part? That's a violation of npm's semver constraints https://semver.npmjs.com

(I agree with you that "npm ci" should be the default behavior, and "npm install" should be called something different, like update-and-install)


Yes - to be more specific, you can lock your own package's dependencies to an exact version, but you cannot lock the dependencies of your package's dependencies. You can't do anything about them. They will get updated, because their dependency specifications are in the form of ^2.0.0. The fact that a package lock can resolve to multiple versions is counterintuitive. One would think the whole point of a package lock is to lock packages.

As a result, when you do a npm install in your oblivious and happy life, npm naturally assumes you want to summon Cthulhu. If you didn't want to summon Cthulhu, why did you call the command that summons Cthulhu? Yes, the default command summons Cthulhu because we believe in agile Cthulhu. If you don't want to summon Cthulhu, try this undocumented command with a misleading name we've added silently a few weeks ago for weird people like you who don't want to summon Cthulhu when they want to do a npm install. But seriously, why do you not want to summon Cthulhu?

Unfortunately, this was the impression I got of the npm folks' position when I read a few threads about this. I've moved to npm ci for now and moved on. Npm's package lock is many things; however, none of the things it is, is a package lock.


Or use Yarn.


…so would Cargo? If you install a new package, why wouldn’t you expect it to show up in your lock file?


No, you guys don't understand: npm updates the package lock even when not adding a new package, i.e. on the initial `npm install`. It's insane; I think I'm going to go back to yarn again.


I'm with you, the default behavior is so counterintuitive.


You can use ‘npm ci’ for actually sensible install behaviour.


Hmm, that's pretty stupid. What is the rationale behind this? That you check before you run an install?


Why is that insane? What else is supposed to happen when you install a package?

EDIT: I misunderstood and thought you were talking about installing a package. If you're running `npm install` to just reinstall dependencies then yes the lockfile should not be modified. However it seems like that is indeed the case and you may be talking about a prior bug with NPM.


`npm install` is what you the developer would run when you first clone a project; it should install exactly what's in the package-lock.json file. Unfortunately, it sometimes doesn't do that.


Python does have something like this, which is conda [0].

It allows specifying dependencies with much of the same freedom you mentioned, in an environment.yaml file and other config files; you can provide arbitrary build instructions via shell scripts, use a community-led repository of feedstocks for stable and repeatable cross-platform builds of all libraries [1], generate the boilerplate automatically for many types of packages (not just Python) [2], handle compiled version specifics with build variants / "features", and use pip as the package installer inside a pip section in the conda yaml config file.

[0]: https://github.com/conda/conda [1]: http://conda-forge.org/#about [2]: https://conda.io/projects/conda-build/en/latest/source/resou...


Well just like many other languages with sane environment (dependencies, building, etc.) management. I think this is the norm nowadays (D, Clojure, and so on).


I think Poetry [1] is the most promising in the python build/dependency space. I've used pipenv and left dissatisfied.

[1] https://github.com/sdispater/poetry


I gave poetry a try and liked it a lot as well, but I really miss the virtualenv handling from pipenv. How do you handle virtualenvs with poetry?


That sounds pretty similar to NPM, as well as NuGet and Paket for .NET. TBH, it's the 'obvious' way for a package manager to work, so I'd be a little surprised if they didn't all work more or less the same?


You have to run npm ci instead of npm install to get npm to respect the lock file. I don’t consider that remotely obvious. And this feature was just added to npm last year, 8 years after npm was invented!


That is incorrect. Both `npm install` and `npm ci` respect the lock file, and if a lock file is present, will make the `node_modules` tree match the lock file exactly.

`npm ci` is optimized for a cold start, like on a CI server, where it's expected that `node_modules` will not be present. So, it doesn't bother looking in `node_modules` to see what's already installed. So, _in that cold start case_, it's faster, but if you have a mostly-full and up to date `node_modules` folder, then `npm install` may be faster, because it won't download things unnecessarily.

Another difference is that `npm ci` also won't work _without_ a `package-lock.json` file, which means it doesn't even bother to look at your `package.json` dependencies.


Thanks for the reply Isaac! This doesn’t match my first-hand experience unfortunately. Are there any circumstances under which npm install with a lockfile present deviates from the lockfile where npm ci does not?

For example, why did this person experience the changing lockfile? https://github.com/npm/npm/issues/17101

Or why do these docs say?

> Whenever you run npm install, npm generates or updates your package lock https://docs.npmjs.com/files/package-locks

Oh, this seems like what I experienced: https://stackoverflow.com/a/45566871/283398

It does appear that npm works somewhat differently than the “obvious” way we would expect package managers to work vis a vis lockfiles :(

At least npm ci gets the job done for my use case :)


If you run `npm install` with an argument, then you're saying "get me this thing, and update the lock file", so it'll do that. `npm install` with no argument will only add new things if they're required by package.json, and not already satisfied, or if they don't match the package-lock.json.

In the bug linked, they wanted to install a specific package (not matching what was in the lockfile), without updating the lockfile. That's what `--no-save` will do.

The SO link is from almost 2 years ago, and a whole major version back. So I honestly don't know. Maybe a bug that was fixed? If this is still a problem for you on the latest release, maybe take it up on https://npm.community or a GitHub issue?


> Both `npm install` and `npm ci` respect the lock file

This is not correct. `npm install` will update your dependencies, not install them, disregarding the package versions defined in the lock file.

It feels like you are not getting the point of having a lock file in the first place. It should be obvious that you can't do an install (which npm calls ci) if you don't have a lock file.

The lock file represents your actual dependencies. Package.json should only be used to explicitly update said dependencies.


If you run `npm install` with no arguments, and you have a lockfile, it will make the node_modules folder match the lockfile. Try it.

    $ json dependencies.esm < package.json
    ^3.2.5
    # package.json would allow any esm 3.x >=3.2.5
    
    $ npm ls esm
    tap@12.5.3 /Users/isaacs/dev/js/tap
    └── esm@3.2.5
    # currently have 3.2.5 installed
    
    $ npm view esm version
    3.2.10
    # latest version on the registry is 3.2.10
    
    $ npm install
    audited 590 packages in 1.515s
    found 0 vulnerabilities
    # npm install runs the audit, but updates nothing
    # already matches package-lock.json
    
    $ npm ls esm
    tap@12.5.3 /Users/isaacs/dev/js/tap
    └── esm@3.2.5
    
    # esm is still 3.2.5
    
    $ rm -rf node_modules/esm/
    # remove it from node_modules
    
    $ npm i
    added 1 package from 1 contributor and audited 590 packages in 1.647s
    found 0 vulnerabilities
    # it updated one package this time
    
    $ npm ls esm
    tap@12.5.3 /Users/isaacs/dev/js/tap
    └── esm@3.2.5
    # oh look, matches package-lock.json!  what do you know.
Now, if you do `npm install esm` or some other _explicit choice to pull in a package by name_, then yes, it'll update it, and update the package-lock.json as well. But that's not what we're talking about.

I often don't know what I'm talking about in general, but I do usually know what I'm talking about re npm.


Everything has evolved to get to that point. I suppose if you start with something modern like npm then it's not obvious how bad the earlier ones are. Compare the good ones with composer, dpkg, rpm, apt or dnf, to name a few examples.


Dub definitely does this (pretty much exactly the same, dub.json = Cargo.toml, dub.selections.json = Cargo.lock), and afaik cpan does something similar.


I wrote and deployed a production service written in pre-1.0 Rust. In over three years of being deployed I never once had to touch that code. The infrastructure around it evolved several times, we even moved cloud providers in that time, but that particular service didn't need any changes or maintenance. It just kept chugging along.

Perhaps Rust's name is apropos: your code will be so reliable that you won't need to look at it again until it has collected rust on its thick iron framework.


I too have written a service that still runs, and I've never touched its code since. I think I used COBOL.


Interestingly, Graydon has mentioned in the past that he may have named the language after the fungi.


Never-touched code is seldom a sign of quality. But there's the old saying: if it ain't broke, don't fix it.


Another saying, “broken gets fixed but shoddy lasts forever”.

Edit: this seems like I’m suggesting rust makes shoddy results. Didn’t mean to imply that. I’m actually very excited to use Rust in prod soon.


When code is never touched, it's usually because its business requirements don't change.


Or everyone is too afraid to touch it.


Deploying a service written in any language into a production environment at the scale of npmjs is far from straightforward.

I think the satire here is that the internet has gotten so centralized lately that even a simple piece of code in JavaScript requires such a huge behemoth of an org running and maintaining all this monstrous infrastructure.


You've identified the problem, but I think you're wrong about the cause. It's not internet centralization that's the problem, it's the mundane fact that JavaScript does not have a large standard library. And JavaScript absolutely should have a large standard library, but it's not clear what organization would have the motivation to implement, support, and promote one. It's not impossible that a community-driven effort could accomplish the same thing, but nobody seems to be working on it.


I think the biggest hurdle to a stdlib is that anything unused would have to be shaken out of bundles. I think that's a low barrier now though, with more than adequate tooling. Another comment mentions a different stdlib for server vs browser, but that's also not a terribly hard problem.

I think a good first pass would come from studying analytics from npm. What are the most used packages? The most stable? I know lodash makes a lot of sense, but there's also underscore. I think the biggest hurdles are really political rather than technological, as everyone is so entrenched now that a one-size-fits-all stdlib would be hard. Not impossible, just hard. I do wish someone were working on it, and I hate to say it, but Google probably has the most skin in the game with V8 and Chrome, yet I don't really trust them not to abandon it. So who else is there? It wouldn't be a very 'sexy project' either, but it still seems worth it to at least try.


> I think a good first pass would come from studying analytics from npm. What are the most used packages? The most stable?

I think it would also make a lot of sense to look at what's in the Python and Ruby standard libraries.


The whitepaper notes that almost 9 billion NPM packages are downloaded per week, so I don't see anything laughable about needing good monitoring.


Which is roughly the equivalent of every single human being downloading two npm packages per week. To me, this suggests that the real problem is that too many packages are being downloaded.


I think this is a natural result of two things which should be appealing to fans of old-school UNIX philosophy:

- NPM is intentionally suited to lots of small libraries that do one thing and do it (hopefully) well, and composing those libraries in useful ways. Whereas systems like Debian have scaling limits with large numbers of packages, NPM tries hard to avoid this so that one hundred ten-line packages are as reasonable as a single thousand-line package.

- CI systems aim for reproducibility by deploying from source and having declarative configurations, in much the way that most distro package builds happen in a clean-room environment.


Probably a lot of these downloads are from bots. Continuous integration is very common in Node.js/JavaScript projects, so on each git commit, anyone with CI (and no dependency caching) will download lots of packages.


> Which is roughly the equivalent of every single human being downloading two npm packages per week

The current human population of earth is about 7.7 billion, so that number should probably be closer to 1.17 npm packages per week per human being. That is still quite a lot, though


This highlights the problem of averages. Most (99.87% or so) humans download zero npm packages. But those that do, often download them in the thousands at a time. And yes, clean-room CI servers are a big part of that.


Perhaps npm could save themselves oodles of money by supplying a nice turnkey npm package cache and requiring major users to use it.

And perhaps the CI server folks would want this anyway because it would be vastly faster.


You might be surprised (or maybe not) to learn that many service providers are far more willing to spend money on predictably large bandwidth bills than on less predictable changes in their infrastructure which require human time and attention to implement.


Yep, not that surprising though, given the anemic state of the JavaScript standard library.


The idea of a scripting language is that it does not have a stdlib. It will be different in each environment. For example, you don't want the same stdlib in Node.js and the browser. Each runtime can choose what APIs it wants to expose.


That's not a definition of a scripting language I've ever heard before, and it's neither true nor desirable. Even JavaScript has a standard library - think about things like Set, Map, RegExp, Promise, etc. - because they're universally useful, as opposed to the runtime environment, where things like the DOM are less relevant to many use cases. JSON is a great example of something crossing over, as an increasingly high percentage of projects will use it.

Not having a standard library on par with other scripting languages just added overhead and incompatibility for years as people invented ad hoc alternatives, often buggy. The accelerated core language growth has been hugely helpful for that but you still have issues with things as basic as the module system which exist for understandable reasons but are just a waste of developer time.


Python is the batteries included scripting language. The two concepts are not mutually exclusive.


I would expect that to be par for the course for most languages. The more dynamic the more problematic, but it stands to reason that the less you can check for and enforce statically the more will eventually blow up at runtime.

Resource usage is similar though not exactly aligned e.g. Haskell has significant ability to statically enforce invariants and handle error conditions, but the complex runtime and default laziness can make resource usage difficult to predict.

I'd guess OCaml would also have done well in the comparison, as it too combines an extensive type system that is difficult to bypass with an eager execution model.


Default laziness in Haskell is not as big a problem as it is made out to be. For someone like npm, though, Haskell's current GC would probably be too much of a bottleneck. Haskell's GC is tuned for lots of small garbage and does not like a large persistent working set. But this has nothing to do with laziness.


> Default laziness in Haskell is not as big a problem as it is made out to be

It will take a fair amount of time to become proficient enough in Haskell that it is not a (potentially) big problem.


Honestly just using strict data structures (one StrictData pragma will do) and strict accumulators in recursive functions will get you there, it's not that hard.


This is my experience, too.

I wrote trumped.com and deployed it prior to the last presidential election. The frontend and assets have been redeployed, but the core Rust service for speech generation hasn't been touched. I've never had a service this reliable, and it took so little effort!

Rust is the best language I've ever used, bar none, period. And I've used a great many of them.

The only places where I won't write Rust are for small one-off scripts and frontend web code. (Even with wasm, Typescript would be tough to beat.)


And what other languages have you written something like trumped.com in? What was it about them that required more effort?


> This entire article is a pretty damning report on JavaScript in general

How so?


A team that likely has lots of JavaScript expertise basically stated that JavaScript is unsuitable for their task.

And that the operational improvement once written in Rust was notable enough to write a paper.

Imagine the K8S team porting from Go to some other language for similar reasons.


Does every language have to be suitable for every task? “Operating a business-logic-embedding CDN at scale” isn’t ever something Node claimed to be capable of. I wouldn’t expect any other dynamic runtime-garbage-collected language, e.g. Ruby or Python or PHP, to be suited to that use-case either.

Use the right tool for the job. The PHP website is running PHP, but it isn’t running a web server written in PHP. Web servers are system software; PHP isn’t for writing system software. Same thing applies here. Node does fine with business logic, but isn’t really architecturally suited to serving the “data layer” when you have control/data separation.


> The PHP website is running PHP, but it isn’t running a web server written in PHP.

Are you sure? I'm not familiar with the PHP.net architecture, and there may be less gains from how PHP has traditionally tied itself as a module to web servers in the past, but Rails (and any number of other dynamic language frameworks) are actually web servers implemented in that language, with an optional separate web server such as NGINX or Apache you can run in front to handle the stuff they aren't as good at (static file serving, etc).

Now, that is a framework, and not the language proper, but I wouldn't be all that surprised to find python.org running on top of a Python framework.


Their mirroring page suggests they most likely use Apache.

"NOTE: Some of our maintainers prefer to use web servers other than Apache, such as Nginx. While this is permitted (as long as everything ultimately works as directed), we do not officially support these setups at this time"

http://php.net/mirroring.php


Seems Python.org runs on the Django framework.

https://github.com/python/pythondotorg/network/dependencies


From the paper it appears to be an authorization service that decides what rights a particular user has. Not a webserver or CDN. It mentions it being CPU bound, though it isn't clear to me why it would be, or why JS wouldn't work well enough for that.


The paper makes it clear they were evaluating languages based upon efficiency.

The Rust implementation was more efficient than the JS one. A CPU-bound service is of course bottlenecked at the CPU, and so benefits from efficiency.

At scale, it makes sense to replace this with Rust. JavaScript did the job, but did not provide the same efficiency as Rust.


I'm not clear on why this is particularly CPU heavy, though: "the authorization service that determines whether a user is allowed to, say, publish a particular package"


Lots of users. Anything will be CPU-heavy if you give it enough work.

Except the things that end up being memory-bound instead, but the NPM database isn't large enough for that.


> Anything will be CPU-heavy if you give it enough work.

Not in a relative sense. If authorization is 5% of the work, scaling it leaves it at 5% of the work, and it's never a bottleneck. Authorization was being a significant bottleneck, not a tiny percent, and that is somewhat surprising.


Obviously authorisation will be a huge overhead compared to sendfile + nginx right? Am I misunderstanding what npm does?


I mean, to use sendfile you need to open the file, and that does a permission check too...


No, I'm talking about private NPM, right? The perms on the file system are not equal to (or as costly as) the auth I need to have to access my private NPM repo.


Couldn't you implement the authorization logic in, say, Redis? Then the npm service is I/O-bound again, and everyone is doing the job they're optimized for.


Authorization checks aren't that expensive. The overhead of using an external service might well make it take up more hardware overall.


It's not clear either; it makes sense to use Rust for CPU-heavy tasks, but a CRUD service that does authentication would be fine in Node.js, since all the low-level crypto uses C. So I'm not sure exactly what they mean; tbh the paper is very light on details.


Some CPU heavy operations like crypto are not put in a threadpool. While it may be running C code, it will block the main thread while executing. See https://github.com/nodejs/node/issues/678

Perhaps the new worker threads may alleviate this, but I'm not sure (it's still an experimental API).


My gut feeling is that the user keys are not randomly generated, but are actually encrypted strings that contain the permission values. Decrypting them with modern encryption algorithms (like ChaCha) is pretty CPU-intensive.


"Javascript did the job", but not very well apparently. They specifically say that they "were able to forget about the Rust service because it caused so few operational issues". Add to that the increased CPU- and RAM-efficiency that generally comes with a rust implementation, and that rust rewrite looks like a no-brainer.


Well, I'd say that if it's an authorization service, there will be cryptographic calculations, which are "heavy" on the CPU.


I don't think that's what it does. "the authorization service that determines whether a user is allowed to, say, publish a particular package"

Also, I would assume any crypto in v8 is already written in C, with JS calling into it.


Users who suggest cryptography are getting downvoted (myself included). I am curious as to why. Is it perceived as spam or an attack on Node.js? Encrypted cookies and JWTs (JSON Web Tokens) rely on a similar strategy, so this is pretty standard. It would be pretty secure as long as the encryption key is not unique, so this is definitely not criticism of the npm team, but mere theories as to why a (what one would assume is a) database or memory bottleneck is being presented as a CPU bottleneck in this scenario.


I guess I just don't see it. NPM is a massive infrastructure where one little piece was rewritten in Rust and then the Rust team wrote a promotional paper about it. The paper isn't bad, but it also isn't a technical white paper.

I don't see how any of this is critical of JavaScript, which isn't even really discussed in the paper and still runs the rest of the infrastructure. If anything the paper is more damning of C, C++, and Java but I still think damning is far too extreme to describe what the paper said.


> If anything the paper is more damning of C, C++

I don't see it as damning of either of these. On C++ it says "we didn't want to learn it." Which is fine. Maybe after learning they would have decided differently, or not. On Java they said "we didn't want to learn how to operate it", as they feared the complexity of a Java application server for a single small service, which they can create in a way that hooks into their monitoring infrastructure. No damning there either.

However, their company's purpose is to push JavaScript, and they are saying "operating JavaScript is hard, doing this in Rust is easy", which directly goes against their business.


I think you and I read completely different papers. I didn't see anything about not wanting to learn, or ease being the motivation.


I mean, we still write the overwhelming majority of our code in JavaScript. We port to Rust when a CPU-heavy task becomes a bottleneck to the rest of the system. It's not as if this paper is saying (nor is it the case) that we've ported the whole registry to Rust. JS has lots of advantages we appreciate.


I'm curious why this authorization service is CPU intensive. The article says it's basically deciding if you're authorized to publish a package. It sounds like the sort of thing that would talk to a database, or a cache like redis, and therefore mostly be IO bound itself.

Is this maybe parsing data structures itself?


It's been a few years since I was directly involved in engineering, but my fairly educated understanding is that it's more around reading of possibly-private packages than publishing.

Publishing is a relatively rare event compared with reading, but in a world of private packages, orgs, and teams, the "can {user} read {object}" gets more complicated. It probably wouldn't be CPU bound if not for the sheer scale we're dealing with, but once all the IO bottlenecks are resolved, you still have to check to make sure that a login token is valid, then get the user associated with it, then check the teams/orgs/users with access to a thing (which might be public, in which case, the problem is a lot simpler, but you still have to get that info and check it), and then whether the user is in any of those groups. So there's a straightforward but relevant bit of CPU work to be done, and that's where Rust shines.
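(To be clear, this is a purely hypothetical sketch of the kind of check described above, not npm's actual code: the hot part is essentially a per-request set-membership test, cheap individually but significant at registry scale.)

    // Purely hypothetical sketch of a "can {user} read {package}" check;
    // none of these types come from npm's actual code.

    use std::collections::HashSet;

    struct Package {
        public: bool,
        allowed_groups: HashSet<String>, // orgs/teams granted read access
    }

    struct User {
        groups: HashSet<String>,
    }

    fn can_read(user: &User, pkg: &Package) -> bool {
        // Public packages short-circuit the whole question.
        if pkg.public {
            return true;
        }
        // Otherwise: is the user in any group that has access?
        user.groups.intersection(&pkg.allowed_groups).next().is_some()
    }

    fn main() {
        let mut devs = HashSet::new();
        devs.insert("acme:devs".to_string());

        let user = User { groups: devs.clone() };
        let pkg = Package { public: false, allowed_groups: devs };
        assert!(can_read(&user, &pkg));
    }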


> Imagine the K8S team porting from Go to some other language for similar reasons.

k8s was originally written in Java, so they have already done this once.


I didn't know that, but it makes it a more interesting comparison. I imagine at least part of rewriting it in Go was to "eat your own dogfood".


Actually, it was more like "a new team took charge".

There is a FOSDEM 2019 talk about it.


Thanks! The talk:

"The clusterfuck hidden in the Kubernetes code base" https://fosdem.org/2019/schedule/event/kubernetesclusterfuck...


One wonders if there'll be a similar talk in a year...

> The audience walks away feeling empathetic that they aren’t alone in their journey to writing idiomatic Go and is now equipped with strong refactoring techniques developed by some of the world’s top engineers for the Kuberentes project.

As an occasional user of Kubernetes, minikube, etc., it's not something I would have guessed to have been developed by the world's top engineers.

I mean, Kubernetes tries to, and probably manages to, provide a useful abstraction, but at a few million LOC and a few man-months of full-time senior engineering effort to run anything in production, it's not exactly the epitome of elegant and efficient engineering.


You would be surprised at the quality of Android tooling stable releases, to the point that now there is Project Marble in place to try to improve its image.

https://adtmag.com/articles/2019/01/17/project-marble.aspx

If you want to have some reading fun, check /r/androiddev/ every time there is a "stable" release.


Even people working for Google on Android tell me the tooling is pretty garbage, so I'm less surprised than you might think ;)

The interesting thing with Kubernetes is that it's basically a re-imagining of Borg, which one assumes was not a few million lines of code when it was already running all of Google's infra more than a decade ago. It's obviously not solving the exact same problem (e.g. Google correctly recognized that DNS isn't so hot and wrote their own replacement protocol, BNS, which wouldn't fly for external adoption, etc.). But I'd be curious to know how Borg's code quality and size, back when it became the standard way to run stuff at Google maybe 12 years ago, compares to Kubernetes today.


I bet k8s would look different if it were written by developers used to a more limited set of cpu, io, and memory resources.


Well, given the hoops one has to jump through for the privilege of working at Google, another level of code quality is to be expected.


Wow. Didn't realize it was that bad:

> We look at what it would take to begin undoing the spaghetti code that is the various Kubernetes binaries

Well at least the developers are being frank about it I guess.


My guess would be more on the side of native binaries (i.e. no runtime JVM dep) and lower memory use, but could be that too.


The k8s team probably should, judging from the state of their codebase.


I’m interested to see what becomes of https://www.cloudatomiclab.com/rustyk8s/


I'd kill for a Rust client with feature parity. Writing k8s operators in Rust would be so much better.


If you feel that way, this is pretty funny then: https://news.ycombinator.com/item?id=19295841


Easy to criticize a project that has x million lines of code.

Every project of that scale is going to have issues.


There's a lot of circumstantial evidence suggesting that efforts to refactor the codebase were torpedoed because of developer elitism.


Anecdotal, but I recently used Gulp in a project to run some CSS clean-up tasks as part of a build process.

The JS dependencies:

    "gulp"
    "gulp-clean-css"
    "gulp-postcss"
    "gulp-uglify"
    "autoprefixer"
    "postcss-uncss"
    "uncss"
The number of node modules: just over 400.

So I'm not at all surprised that this might create surprises when deploying JS services in production.


That’s an ongoing annoyance with using NPM in a security-conscious environment. It’s really easy to end up with thousands of submodules and the amount of time you’ll have an audit showing a vulnerable package can be many months while layers of dependencies slowly update. You can usually show that it’s not exploitable but the number of modules on the average project means you’ll be doing that all the time.


npm now automatically reports known vulnerable packages, right?


Yes - which is great for surfacing this, along with GitHub’s alerts, but unless it’s a direct dependency I find I’m usually just stuck researching the vector and waiting months for numerous layers of dependencies to update in sequence.


Oh, and to be clear: I think this is a problem with OSS sustainability – shipping updates takes real work – more than NPM, mildly exacerbated by the minimal JS stdlib leading to more modules being used instead.


The same is generally true for things written in Go, Java, C#, etc. Strong typing and memory safety eliminate huge classes of common bugs.
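For a concrete flavour of the kind of bug class this removes, here's a tiny illustrative Rust sketch (a made-up example, not from the article): the "value might be missing" case is part of the type, so forgetting to handle it is a compile error rather than a runtime null dereference:

    use std::collections::HashMap;

    fn main() {
        let mut owners: HashMap<&str, &str> = HashMap::new();
        owners.insert("left-pad", "alice");

        // `get` returns an Option: the "not found" case is part of the type,
        // and the compiler forces this code to handle it.
        match owners.get("right-pad") {
            Some(owner) => println!("owned by {}", owner),
            None => println!("no such package"),
        }

        // Using the value without checking simply doesn't compile:
        // let owner: &str = owners.get("right-pad"); // error: Option is not &str
    }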


Do you mean static typing?


The distinction in the vernacular between statically typed and strongly typed languages is so narrow at this point that pointing it out is a bit pedantic.

I would say Rust is both a strongly typed language and a statically typed language. The static type checking happens at compile time, and in general the types in use are strict and strongly typed at runtime.

But, even Rust allows you to cast types from one to another and use dynamic types determined at runtime.

Yes, most people would say that static typing is the primary advantage you get from the compiler in Rust.


It wasn't intended as a drive-by pedantic swipe - I was genuinely curious whether OP meant strong or static. Conversations about type systems and application correctness are exactly the place where precise definitions are welcomed, but I understand that the distinction between strong and static is often conflated. It can be relevant if we're discussing static typing for example, as then something like TypeScript becomes useful.

There's a great writeup by one of the C# people (I want to say it was Erik Meijer, but I'm having a hard time finding it atm) about the distinctions we're discussing here, their relevance to correctness, and the impact on ergonomics. My takeaway from it was that occasionally you will encounter problems that are easier to solve with some freedom and that's why strong/static languages like C#/Rust include pragmatic escape hatches like the dynamic object and the Any trait (respectively).
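For the Rust side, that escape hatch is std::any::Any; a tiny illustration (my own toy example, not from that writeup):

    use std::any::Any;

    // Runtime type inspection via the Any trait: an explicit, opt-in escape
    // hatch rather than the default way of doing things.
    fn describe(value: &dyn Any) {
        if let Some(n) = value.downcast_ref::<i32>() {
            println!("an i32: {}", n);
        } else if let Some(s) = value.downcast_ref::<String>() {
            println!("a String: {}", s);
        } else {
            println!("something else");
        }
    }

    fn main() {
        describe(&5i32);
        describe(&String::from("hello"));
        describe(&3.14f64);
    }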


I think that's fair.

If you do find that writeup, I'd be interested in reading it.


I think I meant static and strongly typed.


> The distinction in the vernacular between statically typed and strongly typed languages is so narrow at this point that pointing it out is a bit pedantic.

I'm not sure why you think this has changed today, or what you mean. It appears to me that many programmers don't realize that the two are orthogonal, so I find it an important, not pedantic, distinction (it just happens that languages generally improve on both fronts over time, hence asking for one also gives you the other, but that's correlation, not causation). I'll make an attempt at describing it here; please tell me if I'm missing something.

Strong typing means that types describe data in a way that the data won't accidentally be mistreated as something else than what it represents. E.g.

(a) take bytes representing data of one type and interpret it as data of another type (weak: C; strong: most other languages)

(b) take a string and interpret it as a number without explicitly requesting it (weak: shells, Perl; strong: Python, JavaScript, Ruby)

(c) structs / objects / other kinds of buckets (strong: C if the type wasn't cast; weak: using arrays or hash maps without also using a separate, enforced type on them, the norm in many scripting languages, although usually strengthened by using accessor methods which are automatically dispatched via some kind of type tag; also, duck typing is weaker than explicit interfaces)

(d) describe data not just as bare strings or numbers, but wrap (or tag / typedef etc.) those in a type that describes what it represents (this depends on the programmer, not the language)

(e) a request for an element that is not part of an array / list / map etc. is treated as an error (similar to or the same as a type error (e.g. length can be treated as part of the type)) instead of returning wrong data (unrelated memory, or a null value which can be conflated with a valid value)

These type (or data) checks can happen at runtime ("dynamically") or compiletime ("statically"). The better a static type system is, the more of these checks can be done at compile time.

For security and correctness, having strong typing is enough in principle: enforcing type checks at runtime just means getting a failure (denial of service), and systems should be designed not to become insecure or incorrect when such failures happen (fail closed), which of course might be done incorrectly [1]. Testing can make the potential for such failures obvious early (especially randomized tests in the style of quickcheck).

Static typing makes the potential for such failures obvious at compile time. It's thus a feature that ensures freedom from denial of service even in the absence of exhaustive testing. It can also be a productivity feature (static inspection/changes via IDE), and it can enable more extensive use of typing, as there is no cost at run time.

[1] note that given that static type systems usually still allow out-of-memory failures at run time, there's usually really no way around designing systems to fail closed anyway.


It's not that I disagree with any of your points. I think they're all valid. I do think most people use the term strongly typed where they mean statically -and- strongly typed.

I personally don't fret about it unless we get into specific details about these notions. I do especially like your (d), which many people often overlook when designing programs. An example would be to use a String as the Id in a DB, but not wrap the String in a stronger type to represent the Id, thus not getting the advantage of static type checking by the compiler. So there are definitely areas where this conversation can lead to better advantages of different languages.

For example, in Rust, declaring a type as `struct Id(String);` adds no memory overhead for the Id compared to a plain String. Not all languages can say that, so we could also get into a fun conversation about the overhead associated with the type system itself. All fun topics.
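A minimal sketch of that newtype idea (hypothetical names, just to illustrate):

    // Hypothetical names, just to show the newtype pattern: the wrapper is
    // represented exactly like the String it contains, but the compiler
    // no longer lets a bare String stand in for an Id.

    struct Id(String);

    fn load_user(id: &Id) -> String {
        format!("user:{}", id.0)
    }

    fn main() {
        let id = Id("42".to_string());
        println!("{}", load_user(&id));
        // load_user(&"42".to_string()); // compile error: expected &Id, found &String

        // Same size as a plain String in practice: no runtime cost for the wrapper.
        assert_eq!(std::mem::size_of::<Id>(), std::mem::size_of::<String>());
    }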


Thanks for your reply. I fully agree.


Do you actually believe it and why? To me it looks like a fairly pathetic attempt to damn JavaScript but I'm not sure why anyone who has actually used JS for any length of time would be convinced of its accuracy.

The article is hosted at rust-lang.org so one would be wise to take their words with a grain of salt. And Rust isn't even a tiny fraction as popular as JavaScript and so when you use Rust you're choosing from a small set of packages written by experts and the kind of people who use languages that nobody else really uses. Meanwhile, JS has millions of packages written by everybody for various platforms (since JS can run in all sorts of environments where nobody would ever want to run Rust.)

Also there's the anecdotal, yet easily empirical evidence that just about any developer who uses JS can tell you about: I deploy new JavaScript services all the time without any of those problems.

So, I wonder if you're actually asking this question or if you have some other agenda.


This paper is written about the experiences of the organization that distributes those millions of packages. The same people who have written so much important JavaScript that their tool gets distributed with Node itself.

What kind of load do your node services get? I’d be willing to bet the npm registry has more. That plays into this kind of thing.


The article may have been hosted at rust-lang.org, but it was written by people at npm, which is very much a JavaScript-boosting organization.

Rust definitely has some benefits as well as tradeoffs compared to JavaScript, which they discuss. The learning curve is higher, but the end product is probably free of a number of errors and operational issues over the lifetime of the service. While it is in theory possible to get similar results with JavaScript, the level of consistent discipline it requires is in practice impossible.

These are facts, not opinions.



