
The package manager maintainers probably know which typos their users make, from the 404s (or equivalent errors) in their logs. One preventive measure would be to blacklist every name that resolves to a 404. If somebody later tries to upload a package whose name is on the blacklist, a maintainer should check the code and whitelist the name. Obviously people can be very crative with typos and with squattinq and there is no real protection against mistakes.
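A minimal sketch of the blacklist idea. It assumes a toy in-memory registry in which every lookup miss counts as a 404 and that name is blacklisted immediately. The `Registry` class and its `fetch`, `upload` and `approve` methods are hypothetical and are not the API of any real package index.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    # Hypothetical toy registry, not a real package index API.
    packages: set[str] = field(default_factory=set)
    blacklist: set[str] = field(default_factory=set)  # names that once returned 404
    whitelist: set[str] = field(default_factory=set)  # names a maintainer approved

    def fetch(self, name: str) -> int:
        """Return an HTTP-like status; a miss blacklists the requested name."""
        if name in self.packages:
            return 200
        self.blacklist.add(name)
        return 404

    def upload(self, name: str) -> str:
        if name in self.packages:
            return "taken"
        if name in self.blacklist and name not in self.whitelist:
            return "needs review"  # a maintainer must inspect the code first
        self.packages.add(name)
        return "published"

    def approve(self, name: str) -> None:
        """A maintainer reviewed the code and allows this name."""
        self.whitelist.add(name)


reg = Registry({"requests"})
reg.fetch("reqeusts")          # a user's typo produces a 404
print(reg.upload("reqeusts"))  # needs review
reg.approve("reqeusts")
print(reg.upload("reqeusts"))  # published
```

A real index would read the names from its access logs rather than blacklisting on every miss, but the flow (404, blacklist, review, whitelist) is the same.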

Might it work to require that an uploaded package's name be at least some minimum Levenshtein distance (or a similar measure) from the names of all existing packages? Then you wouldn't have to maintain a blacklist.
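A rough sketch of the minimum-distance rule: a standard Levenshtein edit distance plus a check against every existing name. The function name `name_allowed` and the default threshold of 2 are assumptions for illustration, not any registry's actual policy.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the chars match)
            ))
        prev = curr
    return prev[-1]


def name_allowed(name: str, existing: set[str], min_distance: int = 2) -> bool:
    """Hypothetical rule: accept only names at least min_distance edits from all others."""
    return all(levenshtein(name, other) >= min_distance for other in existing)


print(levenshtein("libm", "libc"))      # 1
print(name_allowed("libm", {"libc"}))   # False
print(name_allowed("serde", {"libc"}))  # True
```

With a threshold of 2, a single-character difference such as `libm`/`libc` is enough to reject the upload.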

That would mean that, for example on crates.io, you couldn't create a `libm`, because `libc` is already very popular. I don't think that works.

The default rule would stop automated attacks; there is no reason the repository couldn't whitelist libm after a review.

True, Levenshtein isn't the best algorithm for this purpose. Is there an algorithm that takes key proximity into account? For example, 'libm' and 'libc' are different enough to rule out a typo, but 'lib[n/j/k]' or 'lib[x/d/f/v]' are not.
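One possible approach is a weighted edit distance where substituting a neighbouring key is cheaper than any other edit. This sketch assumes a US QWERTY layout (letters only, lowercase) and an arbitrary cost of 0.5 for a neighbouring-key slip; `NEAR_KEY_COST`, `keyboard_distance` and `looks_like_typo` are hypothetical names.

```python
# Assumption: US QWERTY letter rows, lowercase only.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
NEAR_KEY_COST = 0.5  # hypothetical cost of hitting an adjacent key


def build_neighbors() -> dict[str, set[str]]:
    pos = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}
    neighbors: dict[str, set[str]] = {ch: set() for ch in pos}
    for ch, (r, c) in pos.items():
        # Each row sits about half a key right of the row above, so key (r, c)
        # touches (r-1, c), (r-1, c+1), (r+1, c-1) and (r+1, c).
        for rr, cc in [(r, c - 1), (r, c + 1), (r - 1, c), (r - 1, c + 1),
                       (r + 1, c - 1), (r + 1, c)]:
            if 0 <= rr < len(ROWS) and 0 <= cc < len(ROWS[rr]):
                neighbors[ch].add(ROWS[rr][cc])
    return neighbors


NEIGHBORS = build_neighbors()


def sub_cost(a: str, b: str) -> float:
    if a == b:
        return 0.0
    return NEAR_KEY_COST if b in NEIGHBORS.get(a, set()) else 1.0


def keyboard_distance(a: str, b: str) -> float:
    """Levenshtein distance with cheaper substitutions between adjacent keys."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [float(i)]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + sub_cost(ca, cb)))
        prev = curr
    return prev[-1]


def looks_like_typo(a: str, b: str) -> bool:
    """Flag pairs that differ by a single adjacent-key slip (cost 0.5)."""
    return keyboard_distance(a, b) < 1.0


print(looks_like_typo("libm", "libn"))  # True: n is next to m
print(looks_like_typo("libm", "libc"))  # False: c is far from m
```

The neighbour table reproduces the examples in the comment: n, j and k are adjacent to m, and x, d, f and v are adjacent to c. Other layouts would need their own `ROWS`.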

Key proximity on which of the hundreds of keyboard layouts?

Good question... I'd imagine standard QWERTY is used by a sizeable majority of programmers, but I have no data to back that up... :)

It seems like a good idea.

I used the Ruby code at the beginning of http://stackoverflow.com/questions/16323571/measure-the-dist... to calculate the distance between the package names on page 60 of the thesis and their typos. The maximum is 2.

I checked some similar package names from the Gemfile.lock of one of my projects. Unfortunately, the gems hike and hirb are also at distance 2. Many short names are probably this close under this metric.

A combination of the two approaches could work: a name that was blacklisted should count as a bad name regardless of its distance from any other name, and names within distance 2 of an existing package would also need maintainer approval.
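A sketch of the combined rule, assuming blacklisted names always go to review and any name within Levenshtein distance 2 of an existing package does too. `upload_decision` and its parameters are hypothetical; the distance function is the standard edit distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard edit distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def upload_decision(name: str, existing: set[str], blacklist: set[str],
                    review_distance: int = 2) -> str:
    """Hypothetical policy combining the blacklist with a distance check."""
    if name in existing:
        return "taken"
    if name in blacklist:
        # A name that once produced 404s is suspect whatever its distance.
        return "needs review"
    if any(levenshtein(name, other) <= review_distance for other in existing):
        return "needs review"
    return "published"


gems = {"hike", "rails"}
print(upload_decision("hirb", gems, set()))      # needs review (distance 2 from hike)
print(upload_decision("nokogiri", gems, set()))  # published
```

As the hike/hirb example shows, legitimate short names will regularly land in the review queue under this rule.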

But a blacklist could create another kind of squatting, with people trying to pre-blacklist perfectly legit names. Only one thing is sure: there is more work to do for the maintainers, and this extra friction is not good.

Edit: the distance approach suffers from the same problem.

Surely some troll would deploy a fleet of machines to flood package indexes with requests for unregistered names, effectively blacklisting entire dictionaries and eventually all short names.

Yeah, I came to the same conclusion and mentioned it in another comment. Somebody suggested using a distance metric, but trolls could attack that too.

> Obviously people can be very crative with typos and with squattinq and there is no real protection against mistakes.

I see what you did.
