
Reminds me of the quote, 'there are only two hard things in computer science: naming things, cache invalidation and off-by-one errors.'

I think that this clearly falls under the heading 'naming issue.' People know what they want, but do not enter it properly.

I can't think of a 100% solution off-hand, which isn't surprising, because it's a hard problem.

pmontra's suggestion to use typo blacklisting ain't a bad idea. Maybe some sort of reputation-per-name could help?

Sure it's not an off-by-one[-key] error? :)

Banks have a similar problem when people write cheques or set up standing orders. You have to put a name and the account number.

I wonder if you could do something similar here - enter the name of the package and a code of some sort. I haven't thought this through in a lot of detail.
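Here is a rough sketch of that idea, under loudly stated assumptions. The registry derives a short code from each package name (hypothetically, the first few hex digits of a SHA-256 of the lowercased name). The project publishes the name and code together, and the installer refuses any request where the two disagree. `name_code` and `confirm_install` are made-up names, not part of any real package manager.

```python
import hashlib


def name_code(package_name: str, length: int = 6) -> str:
    """Derive a short, hypothetical confirmation code from a package name.

    Assumption: the code is the first `length` hex digits of SHA-256 over
    the lowercased name, so anyone can recompute it from the name alone.
    """
    digest = hashlib.sha256(package_name.lower().encode("utf-8")).hexdigest()
    return digest[:length]


def confirm_install(package_name: str, code: str) -> bool:
    """Accept an install request only if the typed name matches the code."""
    if not code:
        return False  # an empty code would otherwise match anything
    return name_code(package_name, len(code)) == code.lower()
```

A mistyped name then fails because its code no longer matches. The only exception is when the typo happens to produce the same code, which with 6 hex digits is roughly a 1 in 16.7 million chance.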

Banks generally solve the issue with simple, classic checksum schemes designed so that a number with a single typo or a swapped pair of neighbouring digits almost always comes out invalid.
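To illustrate that kind of checksum, here is the Luhn (mod 10) check used on payment card numbers. It is only a stand-in: actual bank account schemes vary by country (IBANs, for example, use a mod-97 check). Treat this as a sketch of the idea, not any particular bank's method.

```python
def luhn_valid(number: str) -> bool:
    """Return True if a digit string passes the Luhn (mod 10) check."""
    total = 0
    # Walk from the rightmost digit (the check digit) leftwards,
    # doubling every second digit and subtracting 9 when it exceeds 9.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def luhn_check_digit(payload: str) -> str:
    """Return the single digit that makes payload + digit Luhn-valid."""
    for candidate in "0123456789":
        if luhn_valid(payload + candidate):
            return candidate
    raise RuntimeError("unreachable: exactly one digit always works")


if __name__ == "__main__":
    number = "7992739871" + luhn_check_digit("7992739871")
    print(number, luhn_valid(number))  # 79927398713 True
    print(luhn_valid("79927398718"))   # single-digit typo: False
    print(luhn_valid("79927398731"))   # swapped neighbours: False
```

Luhn catches every single-digit substitution. It also catches every swap of neighbouring digits except 09 and 90 (both `091` and `901` pass), which is why some schemes use stronger checks.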

That doesn't work with arbitrary names because they are, well, arbitrary.

Why not? Central repositories could require that no two names be within a certain Levenshtein distance of one another.

This could get mildly annoying every once in a while, when two legitimate and unrelated names happen to be close. A better metric or typo-recognition technique is probably possible. Alternatively, a manual process for requesting exceptions (maybe with a tiny fee to help fund the overall project) would also address this problem.
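Here is a minimal sketch of the registration-time check. It assumes the registry simply compares a new name against every existing one; a real index would need something faster than this linear scan. `conflicting_names` and the `min_distance=2` threshold are illustrative assumptions. A non-empty result would go to the manual exception/review process rather than being rejected outright.

```python
from collections.abc import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    # `previous` is the DP row for a[:i-1]; previous[j] = distance to b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


def conflicting_names(new_name: str, existing: Iterable[str],
                      min_distance: int = 2) -> list[str]:
    """Existing names closer than `min_distance` edits to `new_name`.

    Comparison is case-insensitive. An empty result could be accepted
    automatically; anything else would be queued for human review.
    """
    candidate = new_name.lower()
    return sorted(
        name for name in existing
        if levenshtein(candidate, name.lower()) < min_distance
    )
```

Plain Levenshtein scores a swap of two neighbouring letters as two edits, so with this threshold `reqeusts` would slip past `requests`. The Damerau-Levenshtein variant counts a transposition as one edit and may suit typo detection better.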

EDIT: Just downloaded and read the thesis abstract. The author actually suggests the first idea: "The analytical part generates ideas for countermeasures that allow repository maintainers or users to detect typosquatting attacks in the future. For this purpose potential typosquatting candidates could be generated for each legitimate package name with the help of the Levenshtein distance algorithms or Bayesian networks. Another option that can be considered is the Metaphone algorithm."
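The candidate-generation idea from the abstract can be sketched in the style of a classic spelling corrector. First enumerate every string one edit away from a legitimate name, then intersect that set with the names actually registered. The allowed character set is an assumption (lowercase letters, digits, `-` and `_`). Unlike plain Levenshtein, this counts a transposition as a single edit. Metaphone is not shown.

```python
import string

# Assumed character set for package names; real registries differ.
ALPHABET = string.ascii_lowercase + string.digits + "-_"


def typo_candidates(name: str) -> set[str]:
    """Every string one edit away from `name`.

    Edits are deletion, transposition of neighbours, replacement and
    insertion, so a swap counts as one edit (Damerau-style).
    """
    name = name.lower()
    splits = [(name[:i], name[i:]) for i in range(len(name) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1}
    replaces = {left + c + right[1:]
                for left, right in splits if right for c in ALPHABET}
    inserts = {left + c + right for left, right in splits for c in ALPHABET}
    # Drop the original name (e.g. from replacing a letter with itself).
    return (deletes | transposes | replaces | inserts) - {name}
```

`typo_candidates("requests") & registered_names` would then list registered names worth a closer look.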

"Sorry, the otherwise 100% valid and reasonable name you've selected for your project is invalid because an algorithm has determined it is arbitrarily too close to this other unrelated project. Try again."

Who would use that?

"The project title has been flagged due to similarity with an existing name. Your submission has been sent for moderator review".

Package managers have humans to deal with edge cases (removing malicious packages, investigating package errors, etc.) and this is no different. It wouldn't significantly increase their burden because only a small fraction of package names should require human validation.

Or just refer to packages by two names, e.g. Maintainer/PackageName.

It solves so many problems, this included.

This is all part of a much larger problem, which is package identification. Perl 6 specced out[1] a fair bit of a future system to handle much of this, and I believe a lot of it is now implemented. A few things you need to consider:

- Maintainership can change over time.

- Multiple people may trade off releasing a package, but it's still the same package.

- There may be multiple repos (consider you may want to run a local company repo for non-redistributable modules).

I imagine that, in the end, one of the better approaches to the installation-name typo problem might be to scan the code for the packages it requires (using as much specific information as possible) and either confirm that each one exists as an installed local package or offer to install it. Package installers should be able to take a source file or files and install the modules listed within. This won't solve all cases (dynamically determined and loaded modules may still be a problem), but it will solve quite a few of them.
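For Python, a minimal sketch of the scan-the-code idea can use the standard `ast` module to collect imports and `importlib.util.find_spec` to check whether each one resolves locally. It deliberately ignores dynamic imports (e.g. `importlib.import_module` with a computed name), matching the caveat above. It also only reports import names. Mapping an import name to the distribution to install (`yaml` comes from PyYAML, for instance) would still need package metadata.

```python
import ast
import importlib.util


def imported_top_level_modules(source: str) -> set[str]:
    """Collect top-level module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as "from . import x".
            modules.add(node.module.split(".")[0])
    return modules


def missing_modules(source: str) -> list[str]:
    """Imported top-level modules that cannot be found in this environment."""
    return sorted(
        name for name in imported_top_level_modules(source)
        if importlib.util.find_spec(name) is None
    )
```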

1: http://design.perl6.org/S11.html#Versioning

Those are some good points, and I guess in my head I'm thinking of how GitHub does repos on their site as my "example".

GitHub allows transferring repos to another "namespace" (username), and will even forward requests from the old one to the new one for a while (I'm not sure how long...).

Thinking about it a bit more that kind of "mutability" might not be the best idea in a package manager...

Still, I think namespaces can help more than they hurt if the platform is designed with them in mind, as even "namespace-less" systems still suffer from some of those issues, like wanting to rename a package or split it into multiple smaller packages.

I'm not arguing for no namespaces; quite the opposite. I'm arguing that the way most languages implement modules is fairly haphazard, and that this leads to the problem. If you review the link I included previously, you can see some examples of how you could definitively specify a particular module version. E.g.

    use OldDog:name<Dog>:auth<cpan:JRANDOM>:ver<1.2.1>;
This would use Dog from the CPAN repository, author JRANDOM, version 1.2.1, and namespace it as OldDog. You could also just "use Dog;" to use the canonical Dog package from the canonical sources (in order). Imagine we could point our package manager at this source code and have it determine: "Hmm, you have a Dog module of that version, but not from that author and repo, and you have a Dog module from that repo and author, but not that version. Looks like we need to install it." That would leave us in a much better place, both for code that uses definitive versions of packages and for admins/programmers installing packages and making sure they get the right one, where it's been specified.
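Here is a toy sketch of the resolution described above, in Python rather than Perl 6. It parses an S11-style `use` line into name/auth/ver, then compares it against a hypothetical list of installed modules. The parser only handles the simple form shown (a plain alias followed by `:key<value>` adverbs). Versions are compared as exact strings rather than with Perl 6's version-matching rules. `ModuleId`, `parse_use`, `diagnose` and the `github:someone` authority are invented for illustration.

```python
import re
from typing import NamedTuple, Optional


class ModuleId(NamedTuple):
    name: str
    auth: Optional[str]  # None means "any authority"
    ver: Optional[str]   # None means "any version"


# Assumed simple form: "use Alias" followed by zero or more :key<value> adverbs.
USE_LINE = re.compile(r"^\s*use\s+(\w+)((?::\w+<[^>]*>)*)\s*;?\s*$")
ADVERB = re.compile(r":(\w+)<([^>]*)>")


def parse_use(line: str) -> tuple[str, ModuleId]:
    """Split a use line into the local alias and the requested module."""
    m = USE_LINE.match(line)
    if m is None:
        raise ValueError(f"unsupported use line: {line!r}")
    alias = m.group(1)
    adverbs = dict(ADVERB.findall(m.group(2)))
    # Without :name<...>, the alias is the module name itself.
    return alias, ModuleId(adverbs.get("name", alias),
                           adverbs.get("auth"), adverbs.get("ver"))


def diagnose(wanted: ModuleId, installed: list[ModuleId]) -> str:
    """Say whether `wanted` is installed and, if not, what partly matches."""
    def agrees(mod: ModuleId, field: str) -> bool:
        value = getattr(wanted, field)
        return value is None or getattr(mod, field) == value

    same_name = [m for m in installed if m.name == wanted.name]
    if any(agrees(m, "auth") and agrees(m, "ver") for m in same_name):
        return "installed"
    notes = []
    if any(agrees(m, "ver") for m in same_name):
        notes.append("right version, different auth")
    if any(agrees(m, "auth") for m in same_name):
        notes.append("right auth, different version")
    return "needs install" + (f" ({'; '.join(notes)})" if notes else "")
```

For example, suppose `Dog` is installed from `cpan:JRANDOM` at 1.0.0 and from `github:someone` at 1.2.1. The `OldDog` line above then yields `needs install (right version, different auth; right auth, different version)`, while a bare `use Dog;` is satisfied by either.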

A different maintainer per major/minor version number is probably a common enough requirement that it should absolutely be considered in the scheme.

For a while I kept bumping into projects that tried to follow the old Linux model of even/odd version numbers for telegraphing API stability. Long-term support and backported security enhancements are another case where the people working on new functionality may be exactly the wrong people to take responsibility.
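Here is a small sketch of per-branch responsibility. It assumes a registry that maps version prefixes to maintainers and picks the longest matching prefix. It also includes a helper for the old Linux even/odd convention. The branch table and team names are hypothetical.

```python
from typing import Optional


def responsible_maintainer(version: str, branches: dict[str, str]) -> Optional[str]:
    """Return the owner of the longest version prefix that matches.

    `branches` maps prefixes such as "2" or "2.4" to maintainer names.
    """
    parts = version.split(".")
    # Try "2.4.7", then "2.4", then "2" until some branch owner is found.
    for n in range(len(parts), 0, -1):
        prefix = ".".join(parts[:n])
        if prefix in branches:
            return branches[prefix]
    return None


def is_stable_series(version: str) -> bool:
    """Old Linux kernel convention: even minor number = stable series.

    Assumes the version has at least MAJOR.MINOR components.
    """
    return int(version.split(".")[1]) % 2 == 0
```

With a hypothetical table like `{"1": "lts-team", "2": "core-team", "2.4": "security-team"}`, a 2.4.x release would route to the security team and anything else in 2.x to the core team.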

One imagines that "Maintainer" could be typoed as e.g. "Maintaner" just as easily as "PackageName" could be "PackagName".

But then the attacker would need to register a ton of packages under their namespace that match other popular packages, which could set off some alarms. (I guess "solve" was a bit too strong a word to use there...)

There could also be some other cool tricks you could apply (This is the first time you are installing a package from "Maintaner", would you like to continue?)
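The first-time prompt could be combined with a lookalike check. This sketch assumes the installer keeps a local set of maintainers the user has already installed from. It uses the standard library's `difflib.get_close_matches` (a similarity ratio, not Levenshtein) with an arbitrary 0.8 cutoff to point out trusted names that the new one resembles. `maintainer_warnings` is a hypothetical helper.

```python
import difflib
from collections.abc import Collection


def maintainer_warnings(maintainer: str, trusted: Collection[str],
                        cutoff: float = 0.8) -> list[str]:
    """Warnings to show before installing from `maintainer`.

    `trusted` holds maintainers the user has installed from before.
    """
    if maintainer in trusted:
        return []
    warnings = [f'This is the first time you are installing a package '
                f'from "{maintainer}". Continue?']
    # Similarity ratio from difflib, not Levenshtein; the cutoff is arbitrary.
    lookalikes = difflib.get_close_matches(maintainer, list(trusted),
                                           n=3, cutoff=cutoff)
    if lookalikes:
        warnings.append("It looks similar to: " + ", ".join(lookalikes))
    return warnings
```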

An attacker would only need to register the equivalent of the package under attack. Other packages would continue to error out harmlessly as they did before.

The maintainer-level confirmation could be of slight assistance to advanced users, but it's no panacea.

That gets into issues with needing to either support multiple individual maintainers for a single package, or require any multi-maintainer package to create an organization they'll all work under, and use the org name. And since the org name is likely to be the name of the package, you're back at square one.

For example, on the Python Package Index five people have authorization to publish a new Django release. Creating a "Django" org namespace wouldn't help, since someone could typo the org name and hit a squatted malicious version (and that's almost certainly what it would end up being; our github org is named "django").

I guess that would work, as long as you require PackageName to be unique across all Maintainers.

Though this would obviate the most compelling argument for namespacing, which is to allow exactly that.

could go easily be misled by:

