I think that this clearly falls under the heading 'naming issue.' People know what they want, but do not enter it properly.
I can't think of a 100% solution off-hand, which isn't surprising, because it's a hard problem.
pmontra's suggestion to use typo blacklisting ain't a bad idea. Maybe some sort of reputation-per-name could help?
I wonder if you could do something similar here - enter the name of the package and a code of some sort. I haven't thought this through in a lot of detail.
That doesn't work with arbitrary names because they are, well, arbitrary.
This could get mildly annoying every once in a while when there are legitimate non-clashing names. A better metric/typo recognition technique is probably possible. Or else some manual process for requesting exceptions (maybe with a tiny fee to help fund the overall project) would also address this problem.
EDIT: Just downloaded and read the thesis abstract. The author actually suggests the first idea: "The analytical part generates ideas for countermeasures that allow repository maintainers or users to detect typosquatting attacks in the future. For this purpose potential typosquatting candidates could be generated for each legitimate package name with the help of the Levenshtein distance algorithms or Bayesian networks. Another option that can be considered is the Metaphone algorithm."
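For what it's worth, here is a minimal sketch of the Levenshtein idea from the quoted abstract. It is not the thesis author's implementation. The function names `levenshtein` and `near_misses`, the `max_distance` threshold, and the sample package list are all assumptions made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain edit distance: insertions, deletions and substitutions each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca -> cb (free if equal)
            ))
        previous = current
    return previous[-1]


def near_misses(candidate: str, known_names: list[str], max_distance: int = 1) -> list[str]:
    """Hypothetical registry check: known names within max_distance edits
    of the candidate, ignoring exact (case-insensitive) matches."""
    candidate = candidate.lower()
    return [
        name for name in known_names
        if name.lower() != candidate
        and levenshtein(candidate, name.lower()) <= max_distance
    ]


# Illustrative data only, not a real registry listing.
popular = ["requests", "django"]
print(near_misses("djano", popular))                     # one deletion away from "django"
print(near_misses("reqeusts", popular, max_distance=2))  # swapped letters cost two edits
```

Note that plain Levenshtein counts two swapped adjacent letters as two edits, so a threshold of 1 would miss "reqeusts". A Damerau-Levenshtein variant, which treats a transposition as a single edit, may be a better fit for typos.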
Who would use that?
Package managers have humans to deal with edge cases (removing malicious packages, investigating package errors, etc.) and this is no different. It wouldn't significantly increase their burden because only a small fraction of package names should require human validation.
- Maintainership can change over time.
- Multiple people may trade off releasing a package, but it's still the same package.
- There may be multiple repos (consider you may want to run a local company repo for non-redistributable modules).
I imagine in the end, one of the better approaches to the installation name typo problem might be to scan the code for what packages are required (utilizing as much specific information as possible), and confirm that each one exists as a local package that can be installed, or offer to install it. Package installers should be able to take a source file or files and install the modules listed within. This won't solve all cases (dynamically determined and loaded modules may still be a problem), but it will solve quite a few of them.
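A rough sketch of what the scanning step could look like for Python source, using only the standard library (`ast` and `importlib.util`). The helper names `imported_top_level_modules` and `missing_modules` are made up for this example. It also assumes the import name matches the package name, which is not always true (for instance, `import yaml` is provided by the PyYAML distribution), so a real tool would need a mapping on top of this.

```python
import ast
import importlib.util


def imported_top_level_modules(source: str) -> set[str]:
    """Collect the top-level module names that a piece of source code imports."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 means a relative import (from . import x); skip those.
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return modules


def missing_modules(source: str) -> set[str]:
    """Hypothetical helper: imported modules that cannot be found locally."""
    return {
        name for name in imported_top_level_modules(source)
        if importlib.util.find_spec(name) is None
    }


# The last module name is deliberately bogus, for illustration.
example = "import os\nfrom json import dumps\nimport surely_not_installed_pkg_xyz\n"
print(missing_modules(example))
```

As noted above, imports that are computed at run time (for example via `importlib.import_module` with a dynamic string) would not be caught by a static scan like this.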
GitHub allows transferring repos to another "namespace" (username), and will even forward requests from the old one to the new one for a while (how long, I'm not sure...)
Thinking about it a bit more, that kind of "mutability" might not be the best idea in a package manager...
Still, I think namespaces can help more than they hurt if the platform is designed with them in mind, as even "namespace-less" systems still suffer from some of those issues, like wanting to rename a package or split it up into multiple smaller packages.
For a while I bumped into projects that tried to follow the old Linux model of even/odd version numbers for telegraphing API stability. Long term support and backported security enhancements are another case where maybe the guys working on new functionality are exactly the wrong people to take responsibility.
There could also be some other cool tricks you could apply (This is the first time you are installing a package from "Maintainer", would you like to continue?).
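Something like the following could back that prompt. It is only a sketch of a trust-on-first-use check: the function names, the JSON file format, and the idea of keying on a maintainer string are all assumptions, not how any existing package manager works.

```python
import json
from pathlib import Path


def _load_seen(store: Path) -> set[str]:
    """Read the set of previously accepted maintainers (empty if no file yet)."""
    return set(json.loads(store.read_text())) if store.exists() else set()


def is_first_install_from(maintainer: str, store: Path) -> bool:
    """True if the user has never accepted a package from this maintainer."""
    return maintainer not in _load_seen(store)


def remember_maintainer(maintainer: str, store: Path) -> None:
    """Record the maintainer after the user confirms the install."""
    seen = _load_seen(store)
    seen.add(maintainer)
    store.write_text(json.dumps(sorted(seen)))


def confirm_install(maintainer: str, store: Path) -> bool:
    """Hypothetical installer hook: ask only on the first install per maintainer."""
    if not is_first_install_from(maintainer, store):
        return True
    answer = input(
        f'This is the first time you are installing a package from "{maintainer}", '
        "would you like to continue? [y/N] "
    )
    if answer.strip().lower() == "y":
        remember_maintainer(maintainer, store)
        return True
    return False
```

An installer would call `confirm_install` before fetching anything, with `store` pointing at some per-user file.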
The maintainer-level confirmation could be of slight assistance to advanced users, but it's no panacea.
For example, on the Python Package Index five people have authorization to publish a new Django release. Creating a "Django" org namespace wouldn't help, since someone could typo the org name and hit a squatted malicious version (and a typo is almost certainly what it would end up being; our GitHub org is named "django").