The paper also discusses possible mitigation measures, including prohibiting registering new packages within a certain Levenshtein distance of existing packages and using additional namespacing.
Even if NPM isn't prohibiting packages, you'd imagine they'd have internal security alerting for Levenshtein distance from the names of very popular npm packages. Such an alerting script wouldn't take terribly long to write (or to run). It'd let them catch this type of abuse much faster even if they decided (for some inane reason) that banning the names outright would break UX.
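To give a sense of how small such an alerting script could be, here is a rough Python sketch. The `POPULAR` list, the `max_distance` threshold, and the function names are all made up for illustration; nothing here reflects what NPM actually runs.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # drop a character from a
                current[j - 1] + 1,      # add a character to a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


# Hypothetical watch list; a real one would come from download statistics.
POPULAR = ["cross-env", "mongoose", "ffmpeg", "node-sass"]


def suspicious_neighbours(new_name: str, max_distance: int = 2) -> list[str]:
    """Return popular names that new_name is close to but not identical to."""
    return [
        popular for popular in POPULAR
        if popular != new_name and levenshtein(new_name, popular) <= max_distance
    ]
```

Running `suspicious_neighbours` on each newly registered name and alerting on any non-empty result is the whole script. Note that plain Levenshtein counts a transposition such as ffmepg/ffmpeg as two edits, so a threshold of 1 would miss it.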
You can't assume that the original developer will be the first person to post their package. For example, I published some code on my blog long before NPM was a thing. Today that code gets 10k downloads a month on NPM but I had nothing to do w/ publishing it on NPM and haven't audited it for changes.
There's a minor regex error in the one-liner: not escaping the dot, which matches any character. Fortunately that won't cause any false negatives, and will only incorrectly match weird things like fabric9js
Fixed:
npm ls | grep -E "babelcli|crossenv|cross-env\.js|d3\.js|fabric-js|ffmepg|gruntcli|http-proxy\.js|jquery\.js|mariadb|mongose|mssql\.js|mssql-node|mysqljs|nodecaffe|nodefabric|node-fabric|nodeffmpeg|nodemailer-js|nodemailer\.js|nodemssql|node-opencv|node-opensl|node-openssl|noderequest|nodesass|nodesqlite|node-sqlite|node-tkinter|opencv\.js|openssl\.js|proxy\.js|shadowsock|smb|sqlite\.js|sqliter|sqlserver|tkinter"
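If you'd rather not escape by hand, the pattern can be built from plain strings. This is only a sketch in Python's `re` module (not `grep -E`, though the dot behaves the same way in both), using a short subset of the names above:

```python
import re

# A few names from the list above, kept as plain strings.
NAMES = ["d3.js", "jquery.js", "crossenv", "mongose"]

# re.escape turns "." into "\." so it only matches a literal dot.
escaped = re.compile("|".join(re.escape(name) for name in NAMES))

# Joining the raw names leaves "." as "any character".
unescaped = re.compile("|".join(NAMES))

print(bool(unescaped.search("d3xjs")))  # the unescaped dot matches "x"
print(bool(escaped.search("d3xjs")))    # the escaped dot does not
```

As the parent says, the unescaped version only ever matches more than it should, so it cannot hide a bad package.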
Wow some of these are incredibly similar, if they get indexed by google (as most NPM packages do) it's really easy to mistakenly add the incorrect package while searching for the legitimate one.
This is something I've always been concerned about with the node/NPM environment. Any project, even the smallest, has hundreds of dependencies. All it takes is a few lines of rogue code, and your entire project is vulnerable. Especially in JS, where you can do network requests and various critical actions all in one line of code.
It's next to impossible for the average team to avoid hundreds of dependencies as their project grows though. You could write your own everything, but then what's the benefit of the node ecosystem?
I wish the node ecosystem would learn from more mature ones.
I would like to see a curated set of popular libraries that are stabilized and blessed, and a core group that handles security updates and upgrading packages in the blessed set.
who gets paid to support that ecosystem, and who does the paying? as a simple example, I use the hapi.js server framework (because it is awesome). At one point, it was supported by Walmart Labs. It isn't anymore - it's mostly supported by the original author (Eran Hammer), who is getting paid less than $5k per month to develop / support it via patreon (and take a look at the bimodal distribution on his patreon page). And that's just for part of the overall ecosystem.
Part of the excitement of js dev is that there's always really useful libraries being created and distributed (ramda, rxjs, react to just name three things that start with R).
Not sure there is a good solution here - we want tons of value, but I suspect nobody's willing to actually pay for it. Libraries like cross-env, ramda, and so on all are excellent, useful, well-written, and the authors are responsive.
Other programming ecosystems have solved these problems without these glaring problems, so I don't understand why you're acting as though this is unsolvable while requiring lots of money.
Other programming ecosystems are at least an order of magnitude smaller and not intrinsically tied to technology that's in relatively extreme flux with a massive amount of users (browsers).
It's true that NPM is the biggest package repository by some way, but it's only ~2x the count of the Maven repository (492135 vs 194954). PHP and Ruby also have the same order of magnitude.
> tied to technology that's in relatively extreme flux with a massive amount of users
Maven/Java has similar challenges to some extent with both Android and server side development being extremely common in the same language. The large number of users and the extreme technology flux is also similar.
I doubt he does, but i also doubt that installing a framework via maven brings in 1,000 implicit dependencies from 1,000 untrusted authors, unlike in JS land.
Edit: react, webpack, babel, babel-preset-env bring in 1,257 dependencies. Try vetting all those by hand.
I think my (primarily Java) project at work brings in about 50-100 external dependencies. That includes some ridiculously large frameworks, like Spring. Honestly, I don't know because I don't have dependency problems.
You know how hard it would be for me to vet the dependency if I did have a dependency problem? "mvn dependency:tree > deps.txt." It would take half the day, but I could vet them all.
I also have a Django project on the side. It has 8 dependencies. I have vetted all of them very carefully. It's smaller than the work project, but that's not the reason why it has fewer dependencies than a typical JS mess. It has fewer dependencies, because Pythonistas have a philosophy and cultural practices that produces quality software with lots of functionality included.
Npm being split means more metadata, not necessarily more code. You admit to not reading your code or inspecting your dep, and you assert that you have no problems - based on what?
Your deps.txt is barely different than a lock file here.
Reading the code from dependencies is not really hard anywhere here.
0. What I do is irrelevant to whether or not the other people in the community have solved that problem. Which they have.
1. That's a Gradle plugin. I do not use Gradle. I use Maven.
2. I don't need to "manually verify everything I whitelist" (which is not exactly how Maven works but w/e) because everything in Central is cryptographically signed and can't be replaced simply because someone deleted all of their projects in a fit of pique, and some other person swooped in and used the exact same name.
3. I don't have to worry about typos, because I can't add a dependency to my project without editing the project's POM, so fat fingering on the command line cannot add random packages to the project. Plus, Maven artifacts are namespaced, so I would have to be pretty drunk to make so many typos that a different, existing package downloaded.
2. If you don't whitelist your deps, then your dep chain can change with any update. Signing keys prove only that the publisher held the keys, not that they aren't malicious, or that the keys aren't stolen.
3. Fat fingering your pom is no different, this argument is as wasteful as 0 and 1 and you know it.
I didn't say that "I don't whitelist my deps." I said I don't need to manually verify every tiny thing that I whitelist.
If you actually knew what you were talking about, you would know that you can't specify a dependency in a Maven POM without specifying a version, so everything is effectively version pinned and doesn't magically change unless you do it explicitly. And you can't change an artifact after it has been released; you have to issue a new release with different coordinates. These are features of Maven that npm is missing that would make it vastly more secure than it is now.
And fat fingering a pom is different, because it requires many more mistakes to download some other software package than the one intended. That's what this discussion is about right? That it's absurdly easy to download the wrong package on npm but not other package managers, because npm is an insecure package manager.
Also, if you read the docs you'll notice that the deploy plugins encourage users to put their central passwords in an xml file in the plain AND their pgp key passphrase in another, in the plain. No joke. Read the docs.
An injection attack on a download is not one of the problems we're actually talking about here. That's a problem that affects any package manager.
And it's patently false to say that the deploy plugin documentation encourages you to include that information. They very clearly preface that with "if your repository is secured" and use the explicit example of an internal repository within a corporate context. People smart enough to read that documentation in detail should also be smart enough to determine when it is and isn't a good idea to do that.
Maybe the benefit of the node ecosystem isn't as great as you imagine if a project requires hundreds of dependencies when projects that are part of other ecosystems only require dozens.
How many of those dependencies are trivial though? It's one thing to rewrite express, but something different to rewrite a package that just contains a function that capitalizes the first letter of a string or something similarly contrived, yet plausibly already a package.
That's the argument behind a stdlib of sorts for npm with blessed packages that are actively maintained and signed. Having every project rely on tons of dependencies isn't great, but re-writing everything a bunch of times also isn't great. There are bound to be bugs within 100 different implementations of X.
It is certainly out of control though. I just checked the node_modules of a fresh create-react-app generated app and there are 877 packages! Tons of duplication here, like having both array-uniq and array-unique (not to mention that's a feature built into languages like Python).
NPM themselves recently launched a new package called npx [0] which will download and execute packages directly from the registry if you don't already have them installed. So if you make a simple typo like this:
point is that npm are encouraging you to use npx, which is bundled with npm 5.2+ as a general tool for executing adhoc commands from the terminal. It just massively increases the chance of typos.
At some point there needs to be some trust, and NPM puts that boundary at installation. If you trust a package enough to install it, then it's allowed to act on your behalf with the permissions of your user.
`npx` doesn't change that dynamic. It's no different than installing an RPM or Debian package over the internet.
And yes, I get the irony of adding another dependency to help with the security mess caused by the node ecosystem's bent towards external untrusted / unverified dependencies.
Not a js dev, but it seems like when doing a compare if a package already exists, hyphens should be removed (so "crossenv" and "cross-env" are considered identical). "js" seems like needless verbosity, maybe take that out too.
I wouldn't doubt that there are package names that would collide because of such a change, but that's probably a good thing.
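A rough Python sketch of that comparison, to make the idea concrete. The exact rules (which separators to drop, how to treat a trailing "js") are my guesses at what the parent means, not anything npm implements, and scoped names like @scope/name are not handled:

```python
import re
from collections import defaultdict


def canonical(name: str) -> str:
    """Collapse a package name to a comparison key (illustrative rules only)."""
    name = name.lower()
    # Drop a trailing "js", ".js" or "-js", unless that would leave nothing.
    stripped = re.sub(r"[-_.]?js$", "", name)
    if stripped:
        name = stripped
    # Drop separators so "cross-env" and "crossenv" compare equal.
    return re.sub(r"[-_.]", "", name)


def collisions(names: list[str]) -> dict[str, list[str]]:
    """Group names that share a canonical key; keep only groups with clashes."""
    groups = defaultdict(list)
    for name in names:
        groups[canonical(name)].append(name)
    return {key: group for key, group in groups.items() if len(group) > 1}
```

Feeding the full registry list through `collisions` would show how many existing names would clash under such a rule, which is the number you'd want before deciding whether the change is survivable.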
Does npm normalize package names with unicode in them? Would "сrοѕѕ-еnν" be considered equivalent? (Although this would only work if users copy/paste the name).
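I don't know what npm does here, but it's easy to show why ordinary Unicode normalization would not be enough and what a mixed-script check might look like. This is a Python sketch; the script guess based on the first word of the Unicode character name is a crude heuristic I picked for brevity, and the lookalike string is spelled out with escapes so it is clear which characters are involved:

```python
import unicodedata


def scripts(name: str) -> set[str]:
    """Rough script guess: first word of each letter's Unicode character name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in name
        if ch.isalpha()
    }


def looks_spoofed(name: str) -> bool:
    """Flag names whose letters come from more than one script."""
    return len(scripts(name)) > 1


# Cyrillic es, Latin r, Greek omicron, Cyrillic dze (x2), hyphen,
# Cyrillic ie, Latin n, Greek nu: renders much like "cross-env".
lookalike = "\u0441r\u03bf\u0455\u0455-\u0435n\u03bd"

# NFKC folds compatibility forms, but it does not map Cyrillic or Greek
# letters onto Latin ones, so the two names stay different.
still_different = unicodedata.normalize("NFKC", lookalike) != "cross-env"
```

So equivalence would need a confusables table or a restriction to a single script (or to ASCII), rather than normalization alone.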
(It's the first time I've published anything to npm so let me know if I have done anything wrong...)
It uses the list of package names from the all-the-package-names package and returns the 10 packages with the most similar names to the supplied parameter (using Levenshtein distance)
It also displays their rank based on dependent packages to give an idea of how they compare in usage.
I'm sure there are improvements that could be made - PRs welcome on the github repository.
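For anyone curious about the general shape of such a lookup, here is a minimal Python sketch. It is not the tool described above: it swaps Levenshtein distance for the similarity ratio in the standard library's `difflib`, and the name list is a made-up stand-in for the all-the-package-names data.

```python
import difflib


def most_similar(query: str, names: list[str], limit: int = 10) -> list[str]:
    """Return up to `limit` names, most similar to `query` first."""
    # cutoff=0.0 keeps every candidate in the running so we always get a ranking.
    return difflib.get_close_matches(query, names, n=limit, cutoff=0.0)


# Hypothetical sample of registry names.
SAMPLE = ["cross-env", "express", "lodash", "crossenv-cli"]
closest = most_similar("crossenv", SAMPLE, limit=2)
```

The ratio-based ranking and an edit-distance ranking usually agree on near-identical names, but they can order more distant candidates differently.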
Just yesterday there was a thread about how the chrome plugin "user agent switcher" sends your entire browsing history externally. And it's still published.
The problem is not unique to the npm ecosystem, the main problem here is "web of trust" whether through GPG or even just things like 'download counts', etc.
Also there are at least three chrome extensions in the store called "user agent switcher" which confuses matters. From useragentswitcher.org, google.com and toolshack.com
1. For every big, important package, you can probably count on number of downloads/stars a library has to attest its trustworthiness.
2. For small packages, you should always look at the code directly. Search npm, see the GitHub repository link, click, read the source to see if it more-or-less does what you want. I think a lot of people do this already.
3. Typosquatting is still the only unsolved problem, but an addition to the npm CLI that checks if there are packages with similar names when you're downloading and alerts you -- maybe even suggesting the package that has much more downloads/stars -- should solve that.
Or package authors should start using scoped packages.
Instead of publishing as cross-env you publish as @guy/cross-env
That makes typosquatting harder, and can help give users some ideas of packages which are by the same authors.
NPM could help by allowing packages to be published both to the "global namespace" AND as a scoped package automatically. (In other words, always allow accessing any global package by its scoped name)
I would rather have some GitHub integration in place, so I could `npm install github.com/someone/somepackage`, like Golang forces us to do, for example.
I don't do that for all packages automatically nowadays because there is this bizarre culture of people publishing different things to npm and GitHub. To npm they send only "built" files from ES7 to ES5-compatible mode, while to GitHub go only the unbuilt sources which will not run anywhere.
A solution to that would be for an automatic builder to be run on every `git push`. A third-party service, somehow, someone, somewhere. Travis CI, maybe? I'm waiting for someone to have an insight and solve this problem in these lines.
Npm has had github integration for a while now (and straight git integration). Depending on how they setup things, a repo with a good postinstall script will build once it gets pulled so you'll have ES5 compatible files in your node_modules by the time you are running your application.
Not that that's an ideal system, but it's an option for some packages.
That's what Composer (PHP package manager) does. All packages are scoped, no global packages exist. Not fool-proof, but it makes typo-squatting harder.
Is there an api to query recent NPM packages, as well as get a full list of packages?
It'd be interesting to write a tool that monitors as packages are added to npm, compare them against the existing list, and check for potential typo-squatting. Like, remove dashes, check Levenshtein distance, etc.
I mean, NPM themselves should be doing that but ... since they aren't, might as well do it for them, ya?
This is serious stuff and we will definitely see more of it in the future! As there are more and more node.js developers, it will be more profitable to run a scam like this and you only need to hijack one page that has a lot of dependencies, one package that is for example used by `express` to get access to a lot of users.
The only thing you can do is be careful and listen for projects like node security.
For every node package, see which node packages are within a Levenshtein distance of 3 when it's uploaded, and go through all existing packages too. Add an optional flag that will, if you try to install a package that has a neighbor within that edit distance that is an order of magnitude more popular, give you a warning and skip that package; a second optional flag that errors; and a way to force it for a specific package. In the future, make the first optional flag the default, so people get warnings with instructions to override. Allow whitelisting of particular packages that may be problematic, and expect a shared whitelist. Then, a year later, after repeated warnings of the coming nodepocalypse, make the second optional flag the default, so if you haven't whitelisted or explicitly forced installing such a package, it will fail.
You will, of course break production for a few people who just didn't listen.
Alternatively, instead of edit distance, allow users to report problematic packages and do a similar thing. Do not provide an explicit reward to users who report, so nobody would create fake malware just to report it.
In both cases, either implicitly or explicitly you are using the wisdom of the crowd to figure out the bad packages.
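The staged rollout above boils down to one small decision function. Here's a Python sketch of it, assuming the list of nearby names (edit distance 3 or less) has already been computed elsewhere; the mode names, the return values, and the download count for the squatting package are all invented for illustration.

```python
from enum import Enum


class Mode(Enum):
    OFF = "off"      # today's behaviour: no check at all
    WARN = "warn"    # first flag: warn and skip the package
    ERROR = "error"  # second flag: fail the install


def check_install(name, downloads, neighbours, mode=Mode.WARN,
                  whitelist=frozenset(), force=False):
    """Return "install", "skip" (after a warning) or "fail".

    neighbours maps names within the edit-distance threshold to their
    download counts; computing that map is out of scope here.
    """
    if mode is Mode.OFF or force or name in whitelist:
        return "install"
    # "An order of magnitude more popular" read as at least 10x the downloads.
    much_bigger = [
        other for other, count in neighbours.items()
        if count >= 10 * max(downloads, 1)
    ]
    if not much_bigger:
        return "install"
    return "skip" if mode is Mode.WARN else "fail"
```

With a hypothetical 700 downloads for crossenv against 1.3 million for cross-env, `check_install("crossenv", 700, {"cross-env": 1_300_000})` returns "skip", while installing cross-env itself is unaffected because its only neighbour is far less popular.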
I wonder why they didn't obfuscate the code a bit more, they could have even positioned it as a reference package that helps resolve typos.
Would be interesting to know how many systems have potentially been hit by this, and if any leaked production credentials. I think it's unlikely to yield a lot of useful results due to the dragnet nature of the project. A targeted attack might make more sense (e.g. on an open source library, targeting specific developers).
In Crystal, we've decided that the dependency manager shards will have no centralised package repository, which we hope will solve problems like this. It makes forking shards very easy, it completely avoids name squatting, and it should help prevent typo squatting like this.
We also don't want shards to become yet another system package manager, used for installing executables to your PATH. Shards should never be run as root, unlike npm and pip.
Not really much they can do other than take it down and maybe 'protect' some popular packages from typo squatting by reserving some common misspellings. They're a public repository where users upload arbitrary code. The trust relationship really isn't there.
You trust NPM to be secure and serve exactly the code that the author published unmodified.
You trust the author to not act maliciously. Nothing you can really do if a user voluntarily installs leet-virus.
Another possible solution would be for each module's `package.json` to carry a list of node APIs that the module opts out of, like http, access to env variables, fs, etc. This would need to apply to the package itself and any dependencies it requires.
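To make the "itself and any dependencies" part concrete, here is a Python sketch of how opt-outs could flow down a dependency tree. The `deniedApis` field is invented (it is not part of `package.json`), the package names are hypothetical, and the sketch assumes a plain tree with no shared or circular dependencies:

```python
# Hypothetical manifests; "deniedApis" is an invented field.
MANIFESTS = {
    "my-app": {"deniedApis": [], "dependencies": ["string-utils", "http-client"]},
    "string-utils": {"deniedApis": ["http", "fs", "env"], "dependencies": ["tiny-helper"]},
    "tiny-helper": {"deniedApis": [], "dependencies": []},
    "http-client": {"deniedApis": ["fs"], "dependencies": []},
}


def effective_denials(root, manifests):
    """Map each package under root to the set of APIs it may not use."""
    result = {}

    def visit(name, inherited):
        # A package is bound by its own opt-outs plus everything its parents opted out of.
        denied = inherited | set(manifests[name]["deniedApis"])
        result[name] = denied
        for dep in manifests[name]["dependencies"]:
            visit(dep, denied)

    visit(root, frozenset())
    return result
```

Here `tiny-helper` declares nothing itself but ends up denied http, fs and env because `string-utils` opted out of them. Actually enforcing the denials at runtime is the hard part and is not sketched here.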
All of this seems fixable by just providing decent UI protections. Error when typo-squat domains near-alias more popular packages; force them to pass a flag to override.
Is there a reason this isn't done, or has it just not been allocated the time to build it?
I wonder how it was detected. Also someone ITT said something about if they'd squatted the dependencies it wouldn't have been seen or something? Can't find back to that comment.
Oscar is a dev, Kent is the maintainer of "cross-env", a popular package. hacktask is an unknown malicious module uploader that was harvesting credentials.
"However, this is just a piece of an overall solution, and it brings with it a lot of the baggage that comes along whenever GnuPG or PGP get involved. Without a web of trust (sigh), a PKI (ugh), or some other mechanism to tie identities to trust metrics, this is essentially a complicated, very expensive, and fragile version of the shasum check npm already has."
I really like how the NPM simultaneously insults two legends in crypto and does _nothing_ to protect the node ecosystem, deferring to "better solutions" that don't exist and will never exist.
> I really like how the NPM simultaneously insults two legends in crypto and does _nothing_ to protect the node ecosystem, deferring to "better solutions" that don't exist and will never exist.
But they're right. What exactly would PKI do here? Someone is generating confusion.
You could argue that maybe a PKI solution could be used to inform the UI such that users are less likely to make mistakes, but browbeating NPM over this is silly. Maven has this problem (people really concerned built their own tools: https://github.com/whispersystems/gradle-witness), Chocolatey has this problem, pip has this problem, everyone has this problem.
The big difference is that the NPM ecosystem is just an order of magnitude bigger than most others, and its model of many small packages can hide many more key packages in the noise.
Maven does not have this problem. That package was just something you googled that hasn't been updated in two years because maven has required signed packages forever. My packages are all cryptographically signed with my private key. Maven doesn't just offer code signing, it's mandatory to deploy projects to the central repo. The automated package verifier will reject you if you don't have it.
If someone gains access to my account and tries to modify my package without my private key it will not be accepted into the repo.
And actually, I'm pretty sure maven signs not just the code but also the documentation and everything else within your package.
Light-years ahead of NPM, even after their publicized issues.
How does "You need my private key to sign the cross-env package" stop someone from creating a "crossenv" package? What does the specific workflow look like that makes sure I don't add the wrong thing to my project?
I can explain how it can be done. Let's say you need a new dependency. You can't just add it to the project. Management would kill you. There's a lot of checks you have to make. Like licenses and signatures.
In this case, you're concerned about signatures. I'll stick to that. If I believe I need crossenv, I see it advertises kent@doddsfamily.com as the author. I look up Kent Dodds. I find various bits of information about him. I find his twitter @kentcdodds for instance. He may even have a page published @doddsfamily.com to verify his key.
If the key has been around for a while, that's probably sufficient for me. If he doesn't have such a page, I send him a polite message, "Hi Kent. I'm evaluating your software for use in my project. Can I get you to verify your key please?" and Kent is a professional, so he agrees. I send kent@doddsfamily.com an encrypted message and he sends me an encrypted reply. Done. I'm satisfied that he controls the key.
At this point... oh, wow. That's not my key! Did you say crossenv? My package is cross-env! Alert the Node authorities! There's a malicious package pretending to be mine!
Is that 100% infallible? No, but here's the great thing. Even though my verification system may not be bulletproof, others with resources like DoD or FBI are out there verifying keys. They'll go see Mr Dodds, in person, if they have to. In this way, key verification is a bit like herd immunity. And the older the key is, the more trustworthy it probably is.
People who are here arguing against this system have a few things in common. They don't propose a better system, because they don't have one. They're also a bit like anti-vaxers. They irrationally refuse to participate in such a system to everyone else's detriment. They're a bit like congress as well. They can't just look at a working system like single payer, and copy it. They refuse to accept that this is the best available option, so they fold their arms and actively fight any attempt to implement it, as is the case on that github 4016 issue.
So now that I've maybe offended everyone, strike me down with down votes. That's the basics of how I would go about it though.
Hashes already exist. The problem is not a lack of integrity verification. We have that. The problem is not a lack of identity verification, we haven't really shown that NPM is lacking for that.
The problem is one of addressing. People want to get packages by a utf8 and natural language string, followed by a version boundary check.
If people referred to packages by their proper hash (as one does when referencing values from IPFS), then we wouldn't have this problem. If people had a public key and added a key fingerprint that would also work, but would not provide any additional verification to the code (1).
But people don't want to do this. They want to address packages by relatively simple and memorable tuples. That is the problem.
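To illustrate what addressing by hash buys you, here is a small Python sketch. It shows the general idea of content addressing only, not how IPFS or npm actually encode their hashes; the `sha256-` prefix, the in-memory `store`, and the sample tarball contents are stand-ins I made up.

```python
import hashlib


def content_address(tarball: bytes) -> str:
    """Name a package by the hash of its bytes rather than by a chosen string."""
    return "sha256-" + hashlib.sha256(tarball).hexdigest()


def fetch(address: str, store: dict[str, bytes]) -> bytes:
    """Look up bytes by address and refuse anything that does not hash back to it."""
    data = store[address]
    if content_address(data) != address:
        raise ValueError("content does not match its address")
    return data


# Hypothetical package contents.
genuine = b"module.exports = run;"
imposter = b"module.exports = stealEnv;"

pinned = content_address(genuine)
```

There is no name to mistype here: a dependency pinned to `pinned` can only ever resolve to the genuine bytes. The cost is exactly the one described above, since nobody remembers or types a 64-character digest.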
> They're also a bit like anti-vaxers. They irrationally refuse to participate in such a system to everyone else's detriment. They're a bit like congress as well. They can't just look at a working system like single payer, and copy it.
You aren't even in Keybase. Have you personally participated in any keysigning parties? Can I find your public key details here?
I can answer yes to all 3 above questions.
> So now that I've maybe offended everyone, strike me down with down votes. That's the basics of how I would go about it though.
You haven't actually offended everyone. What you haven't done is say anything that anyone else doesn't know already. We all know how signature verification works, and we have all seen its problems in real world implementations.
Please reconsider paragraphs like this.
(1) :: The thing this would do is make it arguably more secure contacting the author or maintainers of the code, although given GPG's failure to achieve widespread adoption or integration in popular tooling, it seems unlikely it'll see much utility.
> If people referred to packages by their proper hash (as one does when referencing values from IPFS), then we wouldn't have this problem. If people had a public key and added a key fingerprint that would also work, but would not provide any additional verification to the code (1).
The hash naturally changes with every release. The key fingerprint doesn't. Updating your dependencies to new releases is much, much more frequent than adding a new dependency; people are willing to put more effort into the latter than the former.
> You aren't even in Keybase. Have you personally participated in any keysigning parties? Can I find your public key details here?
Keybase encourages poor practices and as such I avoid it. But I've been to keysigning parties and my key's details are published any number of places. (Since adding one more is all to the good, the fingerprint is 400A C7D2 E7A1 802A AE2C C459 B1E5 712A 6D03 3D61)
I don't understand why you feel that the keybase question was directed at you. The decision to enter into this part of the discussion definitely hurt an otherwise interesting post.
It's true that a signature is based around a component with longer lifespan than a hash. However the management and trust of that component damages this argument severely.
I am unaware of any web of trust in active use today that could operate at npm scale. Could you share one with me?
Because it was a direct reply to the individual. If used as an example you should give context, otherwise your comments are directed towards who you reply to.
Maven prevents this as well. To add a package to the global repository you need to show ownership of your namespace by posting your key, your website, your email, or most commonly these days your GitHub repo with identifying information. They do check to make sure you're allowed to use each package identifier.
All package submissions are hand-curated, which should catch typosquatters. There's a clearly laid out pattern for what package space you're allowed to use based on website or company ownership.
The system is highly automated but you have to wait to get your namespace approved. And it's not unrealistic to do this with npm, maven has somewhere over a million packages.
Actually Maven Central has about 200K packages. Incidentally about the same number of packages by which npm has increased in the last year. So no, a hand-curated submission process is probably not reasonable
Well npm requires you to login through their CLI in order to publish packages... The only difference is that you use a password instead of a key. No difference.
There's a huge difference. I could put my private key on a hardware dongle completely isolated from my PC environment if I wanted.
Npm is a public internet login with a password of your choosing, probably the most insecure form of authentication there is bar doing nothing. I could be brute forcing your login credentials right now and you wouldn't even know.
Also, users are safe even if maven itself is compromised because attackers still can't validate my private key. I don't think you could say the same about npm...
> Well npm requires you to login through their CLI in order to publish packages... The only difference is that you use a password instead of a key. No difference.
Yes, there is a difference: releases must be digitally signed on maven. They are not on NPM, so a hacker can hijack your packages just by obtaining your npm credentials.
That's crypto 101 and you can't tell the difference?
The issue is of who to trust, the package maintainer or the package repository?
It is comparatively trivial to steal a strong password when it is transmitted over the internet vs local key material.
Nation states with bad certs (china has done this) can steal your password, npm can steal your password, npm can be hacked and leak passwords, npm can be subject to a NSL and forced to hand over passwords, etc, etc.
There is a big difference between passwords and keys.
> The issue is of who to trust, the package maintainer or the package repository?
Interesting framing.
> It is comparatively trivial to steal a strong password when it is transmitted over the internet vs local key material.
"Comparatively trivial?" I don't really understand this. I think you're suggesting that SSL cert attacks are easier than evil maid attacks due to this paragraph:
> Nation states with bad certs (china has done this) can steal your password, npm can steal your password, npm can be hacked and leak passwords, npm can be subject to a NSL and forced to hand over passwords, etc, etc.
Pardon me, but we're so far afield of the actual reality and the logistics that plague it that I decline to continue this conversation. If your adversary is nation states, you need a code audit (preferably several). You can't trust the signatures precisely because nation states are so well equipped to perform evil maid attacks.
It is 2017, evil maid attacks are the reality of the nation-state intelligence industry. We're seeing examples of them leaking out. They're in active use not just in a targeted mechanism, but speculatively! We have evidence intelligence agencies sell compromised USB keys in bulk near the physical locations targets are likely to enter just to see if they can catch them in a moment of bad opsec.
Honestly this all feels like an attempt to say, "We should use GPG because it is good." Maybe it is. I'm not so sure, given the logistical reality.
But it wouldn't prevent the attack we see in the tweet above. It wouldn't make the root post of this thread correct. And I'm not sure it would accomplish what you're saying.
I'm sorry, I simply don't have time to chase this degree of abstracted hypothetical on a day old thread. I'm writing this final message here as a courtesy to those involved.
Distributed trust isn't better than centralized trust. In fact, it'll fail more often. It's just that the failures have varying degrees of severity.
Same principle as saying, "I will make my website more reliable by adding more servers." In fact you make it less reliable by doing so, you just change the severity of the problems.
And we've seen economic mechanisms compromise signed star-shaped trust graphs. E.g., all these atom plugins with features/spyware circulating because a small handful of companies think it's a business model. That's literally just buying the property from folks and ruining it. That's a very powerful attack against a web of trust and often cheaper than the fake cert attack, which is actually something you can guard against if you feel inclined to do so.
I guess you are arguing that you can't find 10-ish core npm maintainers that can be bothered to sign keys for well-known developers and bootstrap a web of trust that would then provide crypto attestation and perform basic gatekeeping on new package submissions.
It's not like you'd have to write a bunch of software. Also, other mission-critical open source repos have been doing this for at least a decade, so you don't even have to invent and validate a new set of processes for this.
For me, this calls into question the reliability and quality of the whole npm infrastructure and the packages it hosts.
Also, blaming users because npm (knowingly, through willful negligence) hosts malicious packages that typo squat on legitimate packages doesn't seem appropriate to me.
Yeah, I'm not really following how users desiring a natural-language tuple-looking thing prevents proper identity verification. Even without namespacing, strong identity controls (signature verification and preventing unauthorized accounts from posting under a taken name, for example) can prevent most attacks short of typosquatting.
Typosquatting is exactly what these malicious packages were doing. So you're agreeing that arguments re. identity verification are a useless distraction in this case?
Name canonicalization gets you part of the way there. But unless you want to go full-on namespacing then you must realize you are fighting a pointless battle.
You cannot reasonably expect to save users from themselves in every way.
One of the central problems of trust is that simple solutions don't scale well. Back in the day, one acquired a .com address by sending someone an email, because everyone knew everyone.
cross-env has had 1.3 million downloads in the last month. How many of those "hey, I am evaluating your library" emails can Dodds field?
Most node projects have hundreds of dependencies, if you include transitive dependencies. How many of those can you test?
I suppose that to get an estimate of the number of actual users, you have to scale that number of downloads down by a factor of 10 - 1000, depending on if and what kind of dependency cache JS toolchains use (esp. around CI/CD).
But yeah, everyone contacting an author directly doesn't scale at all.
I do see what you're saying, but those 1.3 million downloads are not unique. That number is mainly comprised of automated builds pulling in the package (either directly or transitively), not new devs trying it out.
The part where you personally checked out who published the package and did due diligence would save you with NPM. That gpg protection scheme would prevent MITM attacks presumably. The presence of that key isn’t going to establish trust (though maybe you could build off of it so it’s foundational in that sense), or somehow enable the fbi and dod.
Why would the malicious user advertise the email associated with crossenv as kent@doddsfamily.com and not kent@dodds.family? Attacker could control the latter and hand you an evil cert?
One would think you would do more than send an email if you're trying to verify.
Websites, Twitter, Github, Keybase, etc.
It would be pretty hard for a bad actor to overtake the real author's entire Google-findable presence (assuming it's a reasonably popular package - why would you typosquat anything obscure).
If all you do is send an email, then you haven't really done "due diligence" in any acceptable form.
What if the attacker instead advertises `some-totally-different-person@gmail.com`? How are you even supposed to know who wrote the legitimate version of the package in the first place? And if you _do_ know who wrote the package, you don't need a GPG key to verify that; just their NPM username or even the actual, real package name will do fine for preventing this specific attack.
Presumably you inspect the thing you're including in your project first, if it seems trustworthy you mark whatever key you have as somewhat trustworthy. Then you inspect differences on version updates of your npm deps. If it still looks ok, you update your trust of the key.
It's the same way you gain trust in something in real life, by watching actual behavior of someone over time. It is just assisted by technology.
Then you can ignore the issue of email completely, because you're not basing your trust on authority of the author, but on his track record as determined by you.
I bet you'll not find many attackers who would maintain some hijacked package for a few months, before launching their attack. Original author would probably notice something fishy too, given enough time.
It doesn't. That's not the problem being solved by signing. In all likelihood, if an end user got socially engineered to download the wrong package, there's very little to be done.
Perhaps the install process can do a search and display similarly named packages, and the users could be more alert to irregularly named ones?
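To make the "display similarly named packages" idea concrete, here is a minimal sketch in Python. It is not anything npm actually does: the `POPULAR` list, the function names and the distance threshold of 2 are all assumptions made for illustration, and a real tool would need a real source of popular names.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


# Hypothetical list of well-known names; a real tool would derive this
# from registry download statistics.
POPULAR = ["cross-env", "mongoose", "ffmpeg", "node-sass"]


def similar_popular_names(requested: str, max_distance: int = 2) -> List[str]:
    """Return popular names that are close to, but not equal to, the requested name."""
    return [
        name for name in POPULAR
        if name != requested and levenshtein(requested, name) <= max_distance
    ]
```

An installer could call `similar_popular_names("crossenv")` before fetching anything and ask "did you mean cross-env?" when the result is non-empty. It would not stop a determined user, but it would make the irregular name visible.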
What's to be done is remove the package and try and dox the perpetrator to kingdom come, in hopes that law enforcement and social reputation can make an example of them.
I think you're reading a lot more into my comment than I wrote. It's not "vigilante justice" to ban known scammers and circulate blacklists of them. It's how our community functions.
I can't reply directly, but in another subtree you asked:
> Which part of the statement implied "vigilante justice."
To put my own 2 cents in:
Doxxing someone is generally considered an attack, at least in some internet circles. It's pretty vigilante if you ask me, especially when we've seen some pretty striking examples of doxxing gone wrong in the past.
If you connect to my resources and use my systems to hurt other people, you don't really have an ethical leg to stand on if I share what details about you I have with law enforcement and other service providers.
It's absolutely an attack, but it's an "attack" of a kind that is acting to end misuse and widespread tampering. It's difficult to imagine a coherent ethical system that gives the author of malicious software an expectation of privacy as they attack other people, violating similar rights.
Doxxing in my experience implies publishing their info publicly. That's certainly what I thought you were saying.
I'm fine sharing with law enforcement, but sharing with other service providers seems to be a slippery slope. I imagine a dev losing access to their github account because they used a shitty password on their npm account and got compromised. That would suck.
I'd much rather we invent a better UI for dealing with software dependencies, but alas.
Like sibling says, doxxing implies that you'll post their personal info online.
The problem does not lie in attacking bad people, the problem is that there is a high risk that you THINK you've identified who the bad actor is but actually the person you decide to "retaliate" against had nothing to do with what was done to you. That's why we leave law enforcement to the law enforcement officials and justice to the justice system. Even they make a lot of mistakes but at least there is a process that gives a chance for the truth to be found.
But sharing info about a suspect with law enforcement is what you should do yes.
> Like sibling says, doxxing implies that you'll post their personal info online.
It's unfortunate that so many people don't know what the word means, because now we're redefining the word to a very specific and malicious definition that makes communication about nuances around the intersection of rights here more difficult.
> there is a high risk that you THINK you've identified who the bad actor is but actually the person you decide to "retaliate" against had nothing to do with what was done to you.
I mean, you'll know their IP address, login, email, ISP and whatnot at a minimum. If the target is a compromised computer, notifying them is the bare minimum you should do. So I'm sort of confused what kind of final consequence you're imagining here.
I think folks just see the word "doxxing" and their pattern matching misfires.
> I think folks just see the word "doxxing" and their pattern matching misfires.
Or maybe you're trying to weasel out of what you said and are now going for broke.
Linking once again to define words, we go to Wikipedia[0]:
> Doxing is the Internet-based practice of researching and broadcasting private or identifiable information
> Doxing may be carried out for various reasons, including to aid law enforcement, business analysis, extortion, coercion, harassment, online shaming, AND VIGILANTE JUSTICE.
I can see this is going to be a constructive dialogue. If I had wanted to "weasel" I would have deleted the post last night when it passed under the negative point threshold.
I have absolutely 0 moral and ethical problems with publishing any details I have on a person who is using my system to attack other users. I think in fact this is a responsible thing to do, and necessary. In this specific case, I might be careful about the timing of the disclosure to try and round up any nasty packages in other systems they might have generated.
But I'd publish it. Happily. Gleefully even. I have 0 moral or ethical obligations not to. I have a clear ethical imperative to do so.
I guess fortunately for this scammer, I don't own NPM.
> "Weasel?"
> I can see this is going to be a constructive dialogue. If I had wanted to "weasel" I would have deleted the post last night when it passed under the negative point threshold.
I wasn't going to accuse you of being a weasel, but this is the most weasel-y thing I've ever seen.
> It's unfortunate that so many people don't know what the word means
Perhaps you can cite a history of the word, then maybe I'll trust your definition over some other.
You asked for clarification to avoid future misunderstanding and then proceed to reject our clarifications as if there's some nerd-word central authority that we're not aware of. We can't even agree on one 'x' or two.
I've said I have 0 problems publishing their data publicly. I'm happy to own even the stronger model of doxxing you lay out. I've put a few time qualifiers on it you didn't like.
But I have no problem burning the identity of people who think they can use me or my infrastructure to defraud others. Quite the opposite.
> dox the perpetrator to kingdom come, in hopes that law enforcement and social reputation can make an example of them.
Is a hell of a lot closer to "vigilante justice" than the version you just said. Had you made the sane posting first (or not pretended you did the second time) I wouldn't have said anything and just upvoted in agreement.
Which part of the statement implied "vigilante justice"? The part with the law enforcement, or the "social reputation" part, which is exactly the wording game theory uses when discussing a bad actor in a problem?
Just so I can modify my words to avoid future misunderstanding.
Doxxing means sharing people's relevant personal details that you gained without the target's consent.
In the case of malicious scammers, doxxing them so they can be cut off from other code repositories seems less like "vigilante justice" and more like "a public responsibility."
GPG-signing doesn't prove maliciousness, it just proves authenticity; that it was written by a particular person. The web-of-trust is used to enforce social reputation - if a user is malicious, their key is revoked. If a user signs up poor-quality other users, their key is revoked. Blacklists go stale and are also hard to get off if you've landed on one in error. Webs-of-trust are more work, but more robust.
> That package was just something you googled that hasn't been updated in two years because maven has required signed packages forever.
That something hasn't been updated in two years because it's feature complete and does what it's supposed to: verifies the integrity of dependencies for Signal [1].
> If someone gains access to my account and tries to modify my package without my private key it will not be accepted into the repo.
This isn't true. Sonatype/Maven Central requires PGP signatures on all new artifacts, but there is no requirement to use the same key. It will happily accept a signature from _any_ key for new releases.
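For contrast with "accepts a signature from any key", here is a rough Python sketch of what pinning a key per package could look like on the client side. This is not how Maven Central or npm behaves; the class name, method name and fingerprint strings are hypothetical, and the sketch assumes the signature itself has already been verified by some other tool.

```python
class KeyPinStore:
    """Trust-on-first-use pinning: remember the first signing key seen per package."""

    def __init__(self):
        self._pins = {}  # package name -> pinned key fingerprint

    def check(self, package: str, fingerprint: str) -> bool:
        """Pin the key on first sight; afterwards accept only the pinned key."""
        pinned = self._pins.setdefault(package, fingerprint)
        return pinned == fingerprint


store = KeyPinStore()
first_release_ok = store.check("cross-env", "AAAA1111")   # first sighting, pinned
same_key_ok = store.check("cross-env", "AAAA1111")        # same key, accepted
changed_key_ok = store.check("cross-env", "BBBB2222")     # different key, flagged
```

The obvious cost is that a legitimate key rotation also trips the check, so someone has to decide by hand whether a changed key is an attack or an expired key being replaced.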
Great point. I think that Maven Central is great about checking incoming packages. But most maven clients are really bad.
The default in maven client is usually to download via http. The default is usually to _not_ check the hash. There is not a great way to pin a library to a repository which, when coupled with the ease of third-party repositories slipping into your project, means that you can download things like your crypto oauth library from some random server on the web.
Many of these issues can be mitigated by running your own repository that mirrors what you need. Most big corporate shops do this. I think that approach works for any package management system. I guess open source devs and hobbyists are screwed?
I'm guessing maven feels safer because its packages must be compiled against a specific interface and few if any execute any code during setup. Maven is rarely if ever used to install interactive tools the way npm very often is. Maven is not a reasonable analog here.
Somewhat agreed, maven is a build tool and packages it downloads do not execute code through maven. This does not preclude malicious typosquatting packages making it into applications built using it, but does provide some option for reducing the attack surface.
In practice I think most developers would be running their project on the same exact box as they use for building it, which nullifies the separation of build/runtime environment. The reason that we don't typically see egregious typosquatting in the Java ecosystem is that Sonatype has a manual check on the claimed namespace for the organization publishing a project (among other checks). npm, Inc. could do this, but they so far have chosen not to.
People keep saying this, but it's easy to imagine that the malicious code in a maven-included package only works when it detects it's being invoked in a unit test, which puts it in build time easily.
It's true it doesn't immediately build on site, but it sure could run in the developer's machine.
No, they don't. Some people actually implemented the "web of trust (sigh)".
Security is hard; the answer is not to just go "too hard, kick the can down the road" and then make excuses when the time comes for damage control. NPM being an order of magnitude larger means that more focus should be given to security, not less, since it has that extra noise acting as another way to hide malicious activity.
Just signatures would indeed not help much in this particular case. Signatures plus other ways to establish trust will help. To give two examples:
- macOS apps need to be signed (to run without extra work). The keypair is associated with a developer ID account that has a credit card on file. Abuse is still possible (stolen credit card, stolen certificate), but a lot harder.
- Some open source projects have their own WoT. For example, IIRC NetBSD required new developers to meet with one or two existing developers in person to verify their identity. (Pretty much like a regular PGP WoT.)
These are more work, but they also make the world safer for users.
> For example, IIRC NetBSD required new developers to meet with one or two existing developers in person to verify their identity. (Pretty much like a regular PGP WoT.)
Debian also requires OpenPGP keys and WoT for all developers.
Apple doesn't have a web of trust. Microsoft and Google do not either. They bless your code for their marketplaces. Crypto is just coincidentally how they do it.
A web of trust implies transitive trust.
I'm pretty sure the same is true of Debian, but I don't know about the others. But these are NOT webs of trust.
What's more, other open source projects simply do not deal with the scale of NPM. The amount of data they move and offer is pretty brutal. Lots of dismissive engineers sneer at the javascript numeric tower and simply do not understand how difficult and perhaps even surprising the implementation of NPM as a platform is, given its scale.
Easy to say. Care explaining? These signatures seem like exactly the thing to prevent mitm attacks. I trust signature A; i won't load the package unless it's signed by signature A.
Most people won't be that strict in informal development, but that's not really what this is about.
I think this is a problem with the "fad" approach to dev. As a sysadmin I try to stay on top of tech at a 50k ft view, so when node and angular and MEAN stack started showing up everywhere, I tried my hand a bit, and walked away remembering how much I hated the javascript ecosystem that existed before these things, and how the community seemed to be younger, newer devs trying something shiny and repeating fuckups that had already been through other langs/environments. Security for these kinds of devs is not even an afterthought, it's someone else's problem.
I'm confused, how would the above help with a typosquatting package? The issue here is that `crossenv` is malicious, and `cross-env` isn't. The signatures would all be ok in both cases.
It wouldn't. Nothing will help with package managers that follow the "wild west" or "any old crap" model where there is no maintainer or distributor between the developer and consumer that is allowed to perform any sort of quality control or sanitisation. This is what makes me hugely favour the "maintained" model followed by distributions or nix/guix.
The wild west model scares the bejesus out of me to be honest.
The "wild west" model doesn't disallow anyone from providing quality control, it just doesn't enforce one particular person or entity's idea of what quality control should be.
What ruse? "Kent C. Dodds" published the `cross-env` package, not the `crossenv` package. In this case, the attacker is the legitimate owner of the `crossenv` package; the problem is that's not the package I actually want. If I had typed `npm install cross-env` instead of `npm install crossenv`, I would have gotten a non-malicious, perfectly legitimate copy of the package I wanted; no GPG key required.
Saying you won't use the tools we have because we don't have something better is choosing to accept a greater risk of running dangerous code than you have to. Which is your choice, of course. But one best made with clear eyes.
>nobody but the most tinfoily of us is going to do that
Are you characterizing people who verify keys as crazy? It's not like you can't just reach out to @kentcdodds and get an answer in under 5 hours as Oscar Bolmsten just did in the OP.
I'm working on a node-based REST endpoint. It's nothing special - it looks up stuff in the database, does CRUD things, pokes postgres, has a cache layer in redis, a websocket for handling sidechannel stuff like model update events for realtime data updates and so on. It has 514 dependencies in total. It's actually only 39 real dependencies, the rest are subs and sub-subs (after removing duplicates).
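For anyone wondering how a "39 direct, 514 total after removing duplicates" kind of figure gets computed, here is a small Python sketch. The nested `"dependencies"` mapping is only loosely modeled on the shape of `npm ls --json` output, which is an assumption here, and the tiny `TREE` is made up for illustration. Duplicates are collapsed by name only, ignoring versions.

```python
def collect_names(tree, seen=None):
    """Walk a nested dependency tree and return the set of unique package names."""
    if seen is None:
        seen = set()
    for name, info in tree.get("dependencies", {}).items():
        seen.add(name)
        collect_names(info, seen)  # recurse into transitive dependencies
    return seen


# Made-up example tree: two direct dependencies that share a sub-dependency.
TREE = {
    "dependencies": {
        "express": {"dependencies": {"debug": {"dependencies": {"ms": {}}}, "qs": {}}},
        "redis": {"dependencies": {"debug": {"dependencies": {"ms": {}}}}},
    }
}

direct_count = len(TREE["dependencies"])
total_count = len(collect_names(TREE))
```

Here the two direct dependencies expand to five unique packages, which is the same effect as 39 turning into 514, just at a smaller scale.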
Also, some of those 39 dependencies are things I wrote (various general-purpose ORM type mappers for different styles of datastore), so I guess that number could be a bit lower.
Either way, there is a lot of verification to do here. And there's two different kinds of attacks. The dangerous attack that automated key lookups and signing can protect against would be someone publishing a malicious version n+1 of a library I wrote or rely on (either hacking the npm credentials or just buying access as in the case of the minimap / autocomplete-python debacle from last week). In that case, the key change might be noted, but then again, people lose access to keys so you'd want to have ways for package authors to revoke and reissue keys (in which case compromised or purchased credentials might not be noted).
Another type of attack is the one we're seeing here - a typo attack. Without protections of the type discussed in maybekatz's tweet (in that thread), it's pretty hard to see you're in a typo attack. The malicious publisher can still sign a malicious package, and if you accidentally install crossenv instead of cross-env, there's no protection from signing. Again, unless you are manually auditing all 500+ dependencies in your tree by figuring out who the author should be, since you're viewing the npm-reported information as potentially compromised you'll need to find other connections to that person.
At this point, you've basically recreated a web-of-trust architecture, with all the challenges that go with it. This isn't simple, I don't think isaacs and the rest of the npm crew are maliciously ignoring obvious answers. It's more likely that the actual answers are hard to find and harder to scale.
The answer ultimately is: we need to audit all the code we use, or have someone else we trust audit the code we use. And if we're not auditing it ourselves, we probably have to pay someone else to do that, and that's not cheap. Walled gardens have their benefits, but explosive growth and rapid invention / iteration / elaboration is not one of them.
What if someone just maintained a list of "bad" npm packages, and you could run your package.json against that service to make sure you didn't accidentally install crossenv instead of cross-env?
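A rough sketch of that check in Python, assuming such a list existed. The `KNOWN_BAD` set below is a hypothetical stand-in seeded with a few of the names quoted earlier in the thread; no such service or API is implied. Note that it only looks at what the manifest declares, so a bad package pulled in transitively would need a walk over the installed tree instead.

```python
import json

# Hypothetical blocklist; a real service would publish and update this.
KNOWN_BAD = {"crossenv", "mongose", "ffmepg", "babelcli"}


def flag_bad_dependencies(package_json_text, known_bad=KNOWN_BAD):
    """Return the sorted names of declared dependencies that are on the blocklist."""
    manifest = json.loads(package_json_text)
    found = set()
    for section in ("dependencies", "devDependencies", "optionalDependencies"):
        found.update(name for name in manifest.get(section, {}) if name in known_bad)
    return sorted(found)


EXAMPLE_MANIFEST = """
{
  "name": "my-app",
  "dependencies": {"express": "^4.15.0", "crossenv": "^1.0.0"},
  "devDependencies": {"mongose": "^4.0.0"}
}
"""

flagged = flag_bad_dependencies(EXAMPLE_MANIFEST)
```

Like any blacklist, this only catches names somebody has already reported, so it helps with cleanup after the fact rather than with the first victim.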
i've been building web-based software since 1998, when I had to manually parse http requests on stdin (and read headers from env variables) in a C application running as a cgi-bin plugin, remembering to end the stream with two \n characters. i (a) know what i am doing and (b) am super-glad i don't have to parse a lot of jank to get to the interesting parts of the code.
every approach has weaknesses. I'm pretty sure there's tradeoffs everywhere: ergonomics vs speed, security vs inclusivity, etc. I'm also pretty sure it's uncool to make implications about my mental health in public.
Like the person you are replying to, I've been developing web apps since the 1990's (1997 in my case).
I'm unconvinced that 514 is crazy.
In fact, the only unusual thing I see there is that the author knows that number.
Back in the 1990s PHP was very popular. To use it, you had to compile it yourself, which involved compiling Apache 1.3 with modules. There were also various image libraries, font libraries etc. It wouldn't surprise me at all if the dependency tree of that included hundreds of libraries.
So how, exactly, did you know that @kentcdodds wrote cross-env? Did you have some external knowledge, or did you get that information from the exact repo you're trying to verify? If the latter, what steps did you take to check that the author is trustworthy?
On the contrary. I'd say we're the sane ones (note that I'm part of that crew). I'm just realistic that we're a tiny minority. Most people don't understand the topic or care to do any of that. Most people also blindly copy paste curl scripts into sudo bash.
Why would anyone go to Kent C. Dodds to verify that he controls the key instead of going to Joe R Badguy or whoever is listed as the author of the malicious package?
There's no secure list of "good people" and there's no secure list that provides a mapping of who should be signing each package. Especially for things maintained by multiple people, I wouldn't have (and shouldn't have to have) any idea of which particular people are the proper signers.
I agree that npm et al should support code signing but genuine question: how would package signing solve this particular issue? This was not a code integrity attack. Presumably the author of the malicious package can sign his code as well. I suspect if the malicious author had made the package proxy the desired package's exports it would have been a lot longer before this was noticed if it ever was.
The usual way code-signing is used here is that code for a platform is signed with keys issued by the platform manager acting as a CA, who does identity verification at time of issuance. Thus, if the platform-as-CA revokes someone's code-signing cert, they're effectively banning that person (rather than just that pseudonymous identity) from ever publishing on their platform again.
It not only works after-the-fact, but is usually also quite effective as a deterrent to stop people from publishing malware in the first place (except state-level actors who can afford to create and burn real identities for the sake of cyberwar.)
Better verification is well-meaning and all, but it ignores the central problem of NPM-land: there are just way too many tiny little one-off packages for any human being to possibly verify, even with the right tools. And whenever you happen to include something non-trivial you're basically trusting that somebody else has done just that with all of their dependencies, and so on and so forth until you have 600 different packages nobody knows anything about in your node_modules directory.
It's almost as if most of NPM should be replaced by some kind of ... self-contained encyclopedia of code. Maybe it could even be maintained by a single group of people that get along with each other and adhere to a release schedule. And perhaps there is some way it could be organized into modules with consistent documentation. While we're talking about this amazing world of tomorrow: maybe those docs could even be on the cloud, with hyperlinks between sections!
Okay, sorry, getting ahead of myself. It's crazy-talk, I know.
Signed packages tell you one and only one thing: that the package was signed by a particular key.
They don't tell you that the package was signed by someone you think should be authorized to produce that package.
Linux distros can get away with signing everything because there's typically a very small set of people the distro's organizational structure trusts to make packages, and thus a very small set of keys and real-world identities to verify.
Open-to-the-public package systems cannot hope to verify the identity of every person who creates a package, and thus cannot provide you with the web-of-trust model you want (since what you seem to want is not "is this signed by a PGP key" but rather "is this signed by a PGP key I personally think should be authorized to make packages").
They should first get package signatures implemented, it's a bigger threat to the npm community. At worst, the misspelled packages affect a handful of people who don't double check the package name for the package they're installing. If someone compromises the integrity of an ultra popular package, it threatens thousands, perhaps even millions of people (counting all people consuming the code downstream, i.e. users). And the npm repo has been shown to be vulnerable to compromise multiple times over the past few years. Here's a writeup of just some of the more egregious security weaknesses of NPM packages in recent history: https://www.bleepingcomputer.com/news/security/52-percent-of...
>They should first get package signatures implemented, it's a bigger threat to the npm community
Considering that signature checking would not have prevented this attack that has actually happened, I would say that not having signed packages is not in fact the bigger threat.
Or can you point us to a prior example of a successful attack that could have been thwarted with proper signature checking?
Requiring 2FA on publish would do just as well to prevent malicious actors from exploiting poor user passwords, and without imposing such a burden on developers. I don't have numbers, but it certainly seems like developers are more likely to maintain a set of TOTP key/epoch pairs than a PGP keypair.
I guess that TOTP-based 2FA challenges would be annoying in the case where CI performs the "publish" step.
The problem is that users downloading the package can't verify that the developer used TOTP to publish. Really, it takes minimal effort to make and use a GPG key.
> can't verify that the developer used TOTP to publish
they can if npm enforces the usage of TOTP for publishing.
As a user who uses both a GPG key to sign commits and a 2FA token to authenticate to all sites where this is possible, I can assure you that dealing with a TOTP token is more fun than dealing with GPG keys.
> they can if npm enforces the usage of TOTP for publishing.
So what is going to happen to all of the packages published before TOTP is turned on? Not to mention that there have been many cases where second-factors have been bypassed (even Google's authentication[1]). Which means I'm forced to trust that there are no exploits in NPM's authentication system, as opposed to trusting that PGP signatures are not broken. I know which one I would bet on.
As for dealing with PGP keys, come on. We all know that GPG's interfaces are bad for normal users, but all it takes to be able to sign things is:
% gpg --generate-key
And answering the interactive prompts. There are many tools that wrap this functionality as well. Once you have a key you can just write a single script and then re-use it (I would expect that NPM would also publish said script to make it even easier). I'm sorry, but if you are trying to develop software for other people to depend on, I expect you to have enough technical literacy to be able to run two commands and read some documentation. It's really not that hard.
>So what is going to happen to all of the packages published before TOTP is turned on?
same as what happens with all the packages that were uploaded before the hypothetical GPG support was added to npm and packages could be signed.
>Which means I'm forced to trust that there are no exploits in NPM's authentication system
with signatures you are forced to trust NPM's authentication system to make sure that nobody has stripped a signature of a published package or changed the signature of an existing package.
Alternatively, it's up to you to keep track of all previously used signing identities of all your dependencies and to manually check the whole dependency tree if any of the keys in the tree have expired and been replaced.
> but all it takes to be able to sign things is[…]
unless you have more than one machine. If you do, you have to sync your keys between machines and just putting ~/.gnupg on Dropbox (which would be ok as the keys are encrypted) won't do because there are still two maintained forks of GPG out there that work differently and require different config settings.
> And answering the interactive prompts
of which depending on GPG version some give bad advice with regards to key compatibility and strength and none of these prompts will help you deal with an expired key in the future (and yet, these prompts recommend you create one that expires after only a year).
Just stating `gpg --generate-key` as the complete solution will put people in a position where in case of an emergency release they won't be able to publish that release because of previous administrative failure. That's a risky proposition.
And finally, the same malware that steals your 2FA token can also steal your ~/.gnupg and the passphrase once you enter it.
What I'm getting at is that gpg is actually significantly harder to use and maintain for users, requires significant updates to npm on both the server and client end, will cause false positives due to key changes and doesn't provide much more security than enforced 2FA authentication for publishing packages which would just require a small server-side change.
I get that you personally are totally willing to deal with the maintainer's key of a dependency of a dependency of a dependency of yours having expired and thus being replaced with a new key and I also totally get that you yourself are willing to manually check the signatures of the whole dependency tree for changes (you're not willing to trust NPM itself as a public key repository, I get that, so you'll have to manually keep all previously used public keys around), but don't expect this same due diligence from everybody else.
Once you trust NPM.com to manage identities (which is the only way to halfway conveniently deal with key rotation), everything hinges on NPM's authentication system again and at that point we're back to square one.
Seems like they went completely overboard in terms of complexity. Why not have any camel/kebab/snake case reserve the same word in any type of casing? I.e. if I submit cross-env I also get crossenv, cross_env and crossEnv for free. Same goes if I submit any of the others (Exception of course being "crossenv" which only reserves "crossenv").
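The reservation rule described above can be sketched in a few lines of Python. The function and class names are made up for illustration, and this is a symmetric version: it treats `crossenv` and `cross-env` as colliding in both directions, which is stricter than the exception mentioned in the comment (where `crossenv` would only reserve itself).

```python
import re


def canonical_name(name: str) -> str:
    """Collapse case and separators so cross-env, cross_env and crossEnv all collide."""
    return re.sub(r"[-_]", "", name).lower()


class NameRegistry:
    """Hypothetical registry that reserves every casing variant of a published name."""

    def __init__(self):
        self._taken = {}  # canonical form -> name as originally published

    def register(self, name: str) -> bool:
        """Accept the name unless a variant of it was already published."""
        key = canonical_name(name)
        if key in self._taken:
            return False
        self._taken[key] = name
        return True


registry = NameRegistry()
original_accepted = registry.register("cross-env")  # first publisher wins
squat_accepted = registry.register("crossenv")      # collides with cross-env
```

This only covers separator and casing variants. Names like `mongose` or `ffmepg` would still get through and need an edit-distance style check.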
@danjoc: do you have a reference for this? I'm interested.
> I really like how the NPM simultaneously insults two legends in crypto and does _nothing_ to protect the node ecosystem, deferring to "better solutions" that don't exist and will never exist.
This is not unexpected if you take into account that behind all the code there are still humans.
Everyone can share everything for free, safe and sound in a happy world.
Didn't happen ever in the "real world", won't happen here. It's idealistic bias.
I'm sure many things have been written on this, but this is essentially an issue rooted in human behaviour.
It always comes down to having one or more arbiters to maintain a standard. The issue with this in this type of ecosystem is that it's simply too big and too dynamic, unless the devs and the curators are on common terms release-wise.
By now you are basically treading toward being an organisation, potentially elevating the privileges of just a small portion of devs to realistically deal with the scale of things. In this centralized state it can swing the other way: maintaining heavy arbitration and release standards (Apple, for example), creating a potentially more stable and secure but closed system.
Last time I wrote a Firefox extension, the code was manually reviewed before being officially published on addons.mozilla.org (it's there as an experimental release with big warning bars while it sits in the queue for code review).
To publish an Android app, I need to verify my name by paying Google some money ($25?) and my code has to pass some automated checks.
It seems like anyone can publish just about anything anonymously on npm. That model has upsides, but it's not exactly state-of-the-art in terms of QA (though you could argue whether QA is the right term here).
Your QA teams are looking up every entry in your package.json files, your Maven poms, your Gemfiles, your requirements.txt files? They're making sure that something that builds completely cleanly and shows no external errors doesn't have a typo in it?
That's a pretty big straw man you wrote there, to imply that getting rid of only some errors is no better than not getting rid of any errors at all.
In fact, GNU/Linux distros with even minimal QA will disallow network access during builds. Also we do in fact manually audit quite a lot of stuff to make sure this sort of bullshit doesn't get uploaded to the archives.
My QA team, upon a request to add a package called "crossenv" to the npm repo, would say "this is suspiciously similar to the existing cross-env package. Request denied." Alas, npm has no such team.
A problem that exists because there was no QA to start with... Instead we get "awesome" lists of "curated" packages on github, which do nothing to solve the problem.
Half a million, is that all?
Levels of QA exist. As pointed out by https://news.ycombinator.com/item?id=14905660 it would take very little to require something like a bug report that then has to be signed off on at various levels.
Paper: http://incolumitas.com/data/thesis.pdf
Blog post: http://incolumitas.com/2016/06/08/typosquatting-package-mana...
Discussion: https://news.ycombinator.com/item?id=11862217 https://www.reddit.com/r/netsec/comments/4n4w2h/
The paper also discusses possible mitigation measures, including prohibiting registering new packages within a certain Levenshtein distance of existing packages and using additional namespacing.
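For anyone wondering what the Levenshtein check would look like, here's a minimal sketch in Python. The threshold of 1 and the sample package list are assumptions for illustration, not anything taken from the paper or from npm, and `too_close` is a made-up helper name.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def too_close(candidate, existing, max_distance=1):
    """Existing names within max_distance edits of candidate (threshold is an assumption)."""
    return [name for name in existing
            if name != candidate and levenshtein(candidate, name) <= max_distance]

# Sample registry contents, purely illustrative
popular = ["cross-env", "mongoose", "express"]
print(too_close("crossenv", popular))
print(too_close("mongose", popular))
```

A registry could either reject a new name outright when the list is non-empty or just flag it for manual review. A real deployment would probably restrict the comparison to popular packages, since a distance of 1 will also hit plenty of legitimate short names.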