There is no way for anyone to know what all this code is doing, there is little way to verify updates, and it's simply untenable.
If some developers like this sort of unsafe practice, it should be strictly limited to their machines and should never make it into a deployment artifact in any form.
There are already secure distribution package managers with the necessary infrastructure; you are not special, use those. Ruby is already paying a price for imposing dependency hell on users and wasting millions of man-hours. Many have suffered and do not even bother with Ruby apps anymore. Node and others who think this is a good model will be next. Users should simply boycott such user-hostile developers and languages that encourage this kind of insecurity.
> There is no way for anyone to know what all this code is doing, there is little way to verify updates, and it's simply untenable.
Because few people have time to read source code. Take Django as an example. Big community, lots of contributors. Can we say we should trust the code because of the number of eyes? Probably, but not always.
Why? We can overlook things and pretend the code is legitimate. Take the Linux kernel or some of the crypto projects out there. A lot of unreadable old tricks make backdoors really easy. The linked incident is interesting because someone apparently made an authorized changeset to BitKeeper. Are you going to read the JRE code to make sure there's no backdoor? Nah.
Is there a solution? Nope, and there never will be. The best one can do is use a DVCS to prevent unauthorized changesets (provided every contributor does a sanity check before pulling in a changeset) and trust the community. No machine can distinguish good code from bad code. Humans can't even tell unless something raises a red flag.
Sometimes a change of project ownership can also introduce some uncertainty, but that is a rare case for major projects.
I am pretty confident there is code in the Linux kernel that no one has touched or read in years.
I suggest you read the link you mentioned. It was not an authorized change in BK, it was not an authorized change at all. It was only in a third-party CVS mirror, and it was found exactly when someone asked why there was a changeset in the CVS mirror that wasn't in BK.
And while I don't doubt that there is enough unreadable code to hide a backdoor in, certainly somewhere in the massive driver tree, this backdoor is a fairly obvious one. I don't think you could sneak an assignment disguised as a comparison past code review. It's such a common error to make that it really stands out.
> Are you going to read JRE code to make sure no backdoor?
That is a highly misleading question.
While I will trust the JRE developers without checking every single change, they are a diverse enough bunch that the fact that they check each other's work goes a long way.
It's very much the same situation as with the kernel and the compiler. The difference is only one of magnitude.
The difference between that and nodejs or ruby, where anyone can upload anything, anonymously and completely unchecked, is enormous.
I acknowledge that when it comes to size my comparison is not fair, but I am trying to say that even with a large community, we sometimes still overlook things. Usually big projects have some "module owners" who approve merges. This limits approval to a small number of people. If the project is active enough, there will be dozens of commits or more per day. Sometimes people really do overlook things and let them pass. Until someone spots something wrong, the code could have been in the wild for days or years.
They simply need to deploy the app and keep it updated safely. If developers and languages have not thought through this basic step, then please defer to the distribution package manager, which has.
These kinds of frauds won't pass the average distribution package manager's scrutiny, and CVEs will get quick updates that are tested to work.
Contrast this with users scrambling to update affected apps and their individual local libraries, which in turn may have their own specific deps that may not have been updated and thus will fail because of version inconsistencies. End result: millions of man-hours wasted because of clearly bad engineering practices.
What's worse is that I skimmed the tree to check for anything particularly heinous, but there was nothing that stood out as unneeded.
With such a tiny stdlib, especially out of the browser environment, there's not really a better alternative than to make it easy to include dependencies for your dependencies. Without NPM, the Node community would be tiny if not already dead.
I don't have a better idea either. For my purposes, it just means I use a different language, but that's not really a solution.
These things must be thought through upfront and cannot be left vague and open to ad hoc practices that leak through to end users and deployments, creating insecurity.
For instance, a package should be a package; it should not be a simple class or function that in turn pulls in a hundred dependencies. Ten packages like this can pull in 600 packages, all with conflicting versions. This is exactly the dependency hell that creates insecure practice, as no one can verify the chain given the sheer number of packages.
Language developers cannot just sit back and let this happen. Eventually you will pay the price for this kind of shoddy engineering. This should be the minimum required from any responsible language.
And if you don't use the system package manager, then at least don't mix the two and create a Frankenstein setup that multiplies complexity for everyone.
The biggest thing is to stop being faddish. Recognize upfront that there are always developers and groups jockeying for influence and creating dependencies on their apps and platforms. Discourage them from foisting their self-serving bad ideas on the ecosystem and multiplying complexity for everyone. Ultimately the language and ecosystem have to develop consensus on robust engineering practices, or become a wildland that will eventually lose users.
The reason npm has teething problems is because it's one of the first package managers to actually reliably address the problem of libraries depending on other libraries. Almost incredibly, nobody bothered to solve this problem before npm. i.e. npm addresses the case of library X depending on Z v2.0 and library Y depending on Z v3.0 without falling flat on its face.
Or did you mean the case where you want package D as well, and it depends on a conflicting version of package C? In that case, it solves the problem by pulling in both versions of package C and running both side by side.
Running both C side by side cannot work if C is incompatible (that is, even if both versions are API-compatible, each may assume it's the sole C being loaded, and therefore do some static/singleton setup that gets clobbered when loaded again).
There isn't even a common specification for the way packages should export public symbols. You have a choice of CommonJS, AMD, Ecmascript 2015, etc.
I've honestly never encountered an NPM package which couldn't be run side-by-side with another version of itself. This is due to the nature of how Node's module system (CommonJS) works; packages are isolated from each other and only share resources with each other via explicit exports and imports.
I suppose a conflict might be possible if the package was using native extensions or connecting to some external service or something, but for the most part NPM's module system makes conflicts very unlikely.
How confident is anybody that a random CPAN package's original maintainer is still actively maintaining anything, given the average Perl hacker's age? And how confident would you be that the NPM repos are going to suffer less bitrot than CPAN in, say, 10 years from now?
How confident are you that all major versions of your own code will either be removed from any active package manager or production install, or be patched for all known security flaws?
And lastly, the big one: how confident are you that all of the fixed-version dependencies you add won't be a big open security hole sitting there waiting for anyone still stupid enough to just npm install your code 3 years from now?
NPM benefits from still being just before/around peak hype, where most of the people who committed to npm's repos are still early in their careers. At some point it will face the moment when most original package submitters have abandoned the task of maintaining them, even if node itself remains around to a greater extent than Perl still does today.
In the Linux package management world it is significant when a project is packaged and put into a core repo, as most Linux distributors promise to help fix abandoned code included with the core distribution. Nobody in the world of gem, yarn, pip or npm makes any such guarantee, and nobody screens new package maintainers before granting them a "name" with commit access, even to the degree of Debian or Fedora, which are both fairly open communities.
Do a little research — check the package's npm page, assess whether it's too light or too heavy for your use-case. Check its github page to assess whether it's currently maintained (and how important that is for your use-case). If you're unsure, look at similar packages and/or peruse the source code.
It only takes a few minutes and you'll have much greater confidence because you know you picked the correct multi-byte-string-length-calculating dependency for your use case, not the naive implementation which is 100x slower (for example).
You're too kind.
NPM is the symptom. JS is the problem.
To clarify, I did NOT choose that package. Because it brought in 690 dependencies...
Am I saying padding a string is difficult? No. But I am saying it's an incredibly common operation as evidenced by how much broke when it was pulled from NPM.
And as this talk from 2016 shows, the versions that used to be available on NPM don't even pass a reasonable set of tests for a left pad: https://youtu.be/FyCYva9DhsI?t=605 Not even being in the spotlight was enough to catch the bugs there; if you reimplement the world from scratch you're bound to make some errors yourself.
Packages taken over include misspelling of urllib. What exactly do you propose as an alternative here?
I'm all for limiting bloat, but your rant here seems completely inappropriate for the issue being described.
No, but downloading random code is just insane.
As a comparison, Debian has ~3000 packages, and every single package has an identified maintainer with their own GPG key, validated in a face-to-face meeting with an ID card by three people, an identified upstream, etc. Each maintainer is physically identified, has passed a number of technical validation steps, has explained their motivation, etc.
There is also a dedicated security team that can be contacted 24/7.
The system is not perfect, but it provides a good level of security. And this is a project only made by volunteers.
As of yesterday npmjs had 516,132 packages, which was an increase of 373 since the day before. Debian's 3000 packages is a couple of weeks of npmjs activity. Even if you stripped out the unnecessary, abandoned, or duplicate packages you're still looking at something significantly different from Debian.
For what it's worth I think a well-maintained, verified, and known secure subset of npmjs would be a great idea, but the logistics of providing that would be effectively impossible without some serious cash behind it.
How do things like left-pad even come to be widespread dependencies? Does the node development process involve a lot of "gee I wonder if someone made a package for (simple thing I need to do with a few lines of code)"?
The linked article is a nice piece on the subject of left-pad. If you can't implement a left-pad in less than 5 minutes yourself, you don't know how to code.
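For reference, a workable left-pad really is only a few lines. This is a sketch, not the actual npm package's code:

```javascript
// A minimal left-pad: pad `str` on the left with `ch` up to length `len`.
function leftPad(str, len, ch = ' ') {
  str = String(str);          // accept numbers and other values too
  let pad = '';
  while (pad.length + str.length < len) pad += ch;
  return pad + str;           // unchanged if already long enough
}

console.log(leftPad('5', 3, '0'));  // "005"
console.log(leftPad('abc', 2));     // "abc" (already long enough)
```

As the talk linked above shows, even a function this small has edge cases (non-string inputs, multi-character pad arguments) that published versions got wrong.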
Not to mention the code reuse factor.
If npm took the route of debian's verification/certification, you might end up with no libs at all!
Nicely said, but you omit the elephant in the room: that the dependencies have their cost, which is quite high and is rarely matched, much less offset, by the value of these "simple libraries".
Python projects I have recently worked on might max out at 50 packages. The last project I worked on with nodejs as front end had 3000+ packages just for the UI.
Only in volume. In code quality and man hours they're an order of magnitude ahead of all of npm.
(Just consider something like the Linux kernel, GCC, Apache, OpenJDK, coreutils, Python, Postgres, etc., compared to the tripe that even the best npm packages are.)
It's irrelevant to your point, but Debian 9 included >51,000 packages.
As for Debian making some policy decisions that you disagree with, it's very different for Debian to make decisions than for NPM to make decisions. There's just no comparison.
No, they mean that we need a better method than trusting some random popular GitHub / PyPI / npm whatever storage and delivery mechanism.
Strong core language libraries ("batteries included"), that come with your distribution of the language/compiler and are well maintained, used by millions, and signed would be a good start to at least have 80% of dependencies being actually dependable.
There's a running joke in python that stdlib is where modules go to die. Stdlib by definition needs to be more dependable and stable than other places. But that reduces the innovation. There's a reason urllib is terrible to use and we're on urllib3 now, yet the nicest solution (requests) is not included.
See http://www.leancrew.com/all-this/2012/04/where-modules-go-to... This applies to some extent to all languages / runtimes.
The trick is avoiding another urllib: shipped as a stdlib, and passed over for libraries with better interfaces.
perl has the idea of core and dual-life modules: when a new version is released, it ships with some additional modules. Some of these modules live in the perl repository and are called 'core modules'; upgrading them requires a new version of perl to be released. Some of these modules are forked and published to CPAN and are called 'dual-life' modules; upgrading them requires installing them from CPAN, and new versions of perl can provide a well-tested newer version.
All the modules used to be core-only, and over the years any which weren't heavily tied to the internals of the perl interpreter (like opcode deparsers etc.) have moved to be dual-life instead. This avoids problems like interface issues in the standard library by allowing a new version to be immediately released (with some overhead for developers who need to upgrade it for deployments), be tested in the real world, updated, and then included in the next language release.
I've never done it, but it at least sounds like there's a process where you need to convince 3 or more people that adding your package is a good idea and will not hurt security. That's very different from PyPI or npm.
However, you have to sign everything with PGP, including updates, and that is verified. You also have to own the domain matching your package namespace, meaning the namespace is larger and name clashes are less likely. They actually check this and won't release unless you actually host the project, which explains why Java open source tends to use packages like com.github.my_account.my_project
It's not a free-for-all like npm, but maybe npm can learn a thing or two.
1. A Debian-style way of identifying who is responsible for a package: this would allow those affected to identify a named person if malicious code is added.
2. Fuzzy matching of package names (so taking the library name urllib also takes urilib, urlib, urllib2 and other names within a short edit distance).
3. Built in mechanism to allow third party reviews within the repositories infrastructure.
They made it pretty clear that they get them from a distro repository. Most are much better about managing breaking changes and offering years of security patches (at least to core packages).
> Packages taken over include misspelling of urllib. What exactly do you propose as an alternative here?
Stricter guidelines and auditing should have caught this.
Even if you can work with that, you need to sort out the differences between your destination system, test system, and all of your developers' machines (who most likely run Mac, not your server's Linux flavour). Sure, there's still Docker, Vagrant, etc. But this means slapping on more and more layers just to make the same packages from the same source available.
And unless you want a "works on my machine" environment, you can't just leave developers to do whatever and use a different system in production. A home-brewed, limited system will just result in shadow IT, which will use PyPI unless your solution is much better and easier to use (and allows fresh versions to be imported).
I've been doing this stuff for years in big projects. It's not easy.
This compounds the dislike companies already have for updating software; updating packages isn't something that's ever planned and budgeted for. A standard enterprise app will be using tonnes of outdated and potentially insecure packages; I've come across some that are a decade out of date with no pain-free update path.
And now it seems that even "systems languages" are heading down this path.
Every single Ruby post has commenters complaining about dependency hell and steering clear of Ruby apps. This was not the case even a couple of years ago.
This is effectively a user boycott which Ruby may not deserve but has brought on itself by letting the 'break everything crowd' run amok. They have moved on to Node and will move again to the next big thing but it's Ruby left dealing with the fallout.
Strangely, the problems with rubygems and related package managers were apparent five years ago (at my first contact with earnest gem development). It seems the major change is that major gems get abandoned; take for example bcrypt. I wonder if the rise in distrust and the rise of abandoned-but-critical gems are related.
These sort of grandiose statements have of course existed on the Internet for decades, but with the rise of demagogues like Trump, we see that such statements can be readily believed en masse without a second thought.
Could you provide data or numbers explaining that Ruby's package management system is a factor in new apps not being built in that language anymore?
I don't mean to necessarily add politics in here, but hastily throwing out intimidating messages--"Ruby's package manager is broken and now no one is using it; Node and others will of course head down the same path"--helps no one. First, it harms the morale of new programmers learning these languages, who may be led to believe they are wasting their time. Second, it frankly is rude to the groups of individuals who do work on the package managers themselves; if you have a better idea, build or sponsor one.
The burden of proof is on you to provide these statistics, so until then, there's also the third result: it makes you look like a crank, not a professional who has considered the pros and cons of different package management systems.
Also, there's a third choice if you have a better idea about how package managers could work: you can use an existing alternative that works better. If said existing alternative happens to be in some other language, and you don't have a strong reason to stay on your current language, then switching languages is perfectly sensible.
btw you should be making $6000 + $1000 or so a month for at least a couple of years doing that, not $6000 a single time. the total lifetime value of setting up an application should be above $30k, hopefully way above, but at the entry level where you are that's a nice healthy number to shoot for.
we bill our total customer base over $6000 a day for this kind of work and it all starts with 'assembling a job' but certainly doesn't end there, or ever (hopefully). good luck buddy!
Hi bro :)
Leave Messages via HTTP Log Please :)
Happy to see somebody find it ! :)
Just curious about how long it would take for people to find those 'bad' packages
As you see, that's just a toy script, no harm, hope you enjoy it !
While our code was similar in nature, it was non-obfuscated and we always threw an exception telling the user that he installed something he shouldn't.
It's like saying the good thing about not having a cellphone is that you avoid a lot of fights with your girlfriend because you can't talk as much.
I can't quite see what's wrong with that...
Lodash functions are available in both forms.
(2 days ago, 465 points, 245 comments)
Interestingly, my fake system packages have been downloaded about 480 000 times so far this year
But NPM is worse, with all its dependencies of dependencies. Composer (PHP) got both namespaces and dependencies right: flat dependencies; it's up to the developer to resolve conflicts, not the package manager to create insane dependency trees.
It leads to more stable packages and makes spotting fakes easier.
Even more useful might be domain/keys/kreitz/publickeylisting
The key signing party is much harder to arrange but is easier to be confident about
Personally I am surprised the Post Office does not do key signing
Edit: I thought pip did do cryptographic checks?!
That sounds useless. An attacker could verify that they own evilattacker.com & publish their malicious packages there.
I don't think a web of trust is the answer here, because it doesn't really matter if the attack is anonymous or not, and if only trusted people can publish packages, trust will be given more readily to encourage new programmers to contribute.
I think a reputation/review system is better, like docker search's star count, or metacpan's ++ rating.
PyPI also supports adding GPG signatures alongside packages, but with no trust/verification process to assert "this key really is the key of the person who should be releasing this package", the signature is literally worthless; anyone who could put up a fake package could also generate a signature for it, and you'd have no way of knowing that the key which signed that package shouldn't be trusted for that package.
It is a very very hard problem, and people need to appreciate that.
– acqusition (uploaded 2017-06-03 01:58:01, impersonates acquisition)
– apidev-coop (uploaded 2017-06-03 05:16:08, impersonates apidev-coop_cms)
– bzip (uploaded 2017-06-04 07:08:05, impersonates bz2file)
– crypt (uploaded 2017-06-03 08:03:14, impersonates crypto)
– django-server (uploaded 2017-06-02 08:22:23, impersonates django-server-guardian-api)
– pwd (uploaded 2017-06-02 13:12:33, impersonates pwdhash)
– setup-tools (uploaded 2017-06-02 08:54:44, impersonates setuptools)
– telnet (uploaded 2017-06-02 15:35:05, impersonates telnetsrvlib)
– urlib3 (uploaded 2017-06-02 07:09:29, impersonates urllib3)
– urllib (uploaded 2017-06-02 07:03:37, impersonates urllib3)
pip list --format=legacy | egrep ‘^(acqusition|apidev-coop|bzip|crypt|django-server|pwd|setup-tools|telnet|urlib3|urllib) ‘
pip list --format=legacy | egrep '^(acqusition|apidev-coop|bzip|crypt|django-server|pwd|setup-tools|telnet|urlib3|urllib) '
works for me