
Malicious Python libraries found and removed from PyPI - bobjordan
https://www.zdnet.com/article/twelve-malicious-python-libraries-found-and-removed-from-pypi/
======
ris
Ah, wild west package repositories...

Only yesterday the developers of Itsdangerous went and _deleted_ their
recently-released 1.0.0 version. That's right. Deleted. Gone. Anyone who made
the upgrade and for some insane reason was using pip in their release
pipeline suddenly had an un-deployable app.

And people complain that they don't want maintainers "getting in the way"
between them and developers.

~~~
js2
[https://github.com/pallets/itsdangerous/issues/112](https://github.com/pallets/itsdangerous/issues/112)

> _I’m sorry for the inconvenience caused but I missed that there was a
> signature change that made it into 1.0. I yanked the release now because
> this change had some very bad consequences and yanking the release is less
> risky in comparison._

Ugh, why not release 1.0.1 with the change reverted? That would be the right
thing to do whether 1.0 were yanked or not.

~~~
aidos
It sounds like mitsuhiko was between a rock and a hard place and had to do
damage control to mitigate the situation. Without looking into the details,
reverting wasn’t a straightforward option.

------
JackC
Another pypi issue I've been wondering about recently is binary wheels. We saw
an issue a few weeks back where pipenv hashes stopped matching because binary
wheels were added for a package years after the last release, and it wasn't
immediately obvious how to check if they were legitimate. It would be nice if
there was some automated scan to check that binary builds match the source.

I think there have been some python package reproducible build efforts at
Debian, and there's a standard image for building wheels, so maybe that's
somewhat possible?
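
For the narrower problem of checking a downloaded wheel against the hashes pinned in a lock file, a minimal sketch might look like the following. It assumes pinned hashes are written as `sha256:<hex>` strings; the function names are made up for illustration. It only tells you whether a file is one you already pinned, not whether a binary build matches the source.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_hash(path, pinned_hashes):
    """Check a file against pinned hashes.

    pinned_hashes is assumed to be an iterable of 'sha256:<hex>' strings.
    """
    actual = "sha256:" + sha256_of_file(path)
    return actual in set(pinned_hashes)
```

A wheel uploaded years after the release would simply fail this check until someone reviews it and adds its hash to the pinned set.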

~~~
ris
Nix manages this. [https://nixos.org/](https://nixos.org/)

------
nikonyrh
Is there an official policy on what kind of code a package can have? I assumed
it is the responsibility of the end-user to determine if the package has code
they are willing to run. I mean the system worked just as intended: the user
asked for a piece of code and PyPI delivered it.

~~~
ploggingdev
[https://pypi.org/help/#admin-intervention](https://pypi.org/help/#admin-intervention)

------
Eli_P
I've been trying to build a smarter approach to preventing typosquatting in
my GUI for package managers called pips[1], by means of fast BK-tree[2]
package name comparison against the offline index.

There's a different approach I'm still working on to roll into pips; I hope I
won't be too lazy to make it happen, eventually.

[1] [https://github.com/ptytb/pips](https://github.com/ptytb/pips) [2]
[https://en.wikipedia.org/wiki/BK-tree](https://en.wikipedia.org/wiki/BK-tree)
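
To illustrate the general idea (this is a rough sketch, not the code in pips), a BK-tree over Levenshtein distance can look up every known package name within a small edit distance of a requested one. The class and function names here are hypothetical.

```python
def levenshtein(a, b):
    """Plain edit distance: insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


class BKTree:
    """Each node is (word, children); children maps distance -> node."""

    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # already present
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, word, max_distance):
        """Return sorted (distance, name) pairs within max_distance."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            candidate, children = stack.pop()
            d = self.distance(word, candidate)
            if d <= max_distance:
                results.append((d, candidate))
            # Triangle inequality: only these subtrees can hold matches.
            for edge, child in children.items():
                if d - max_distance <= edge <= d + max_distance:
                    stack.append(child)
        return sorted(results)
```

With an index containing "django", searching for "diango" at distance 1 returns "django", which is the kind of near miss a tool could warn about before installing.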

------
closed
Were the research / data leakage packages from the student who did their
thesis on the security of pypi etc..?

[https://incolumitas.com/2016/06/08/typosquatting-package-
man...](https://incolumitas.com/2016/06/08/typosquatting-package-managers/)

------
r0f1
What was wrong with the library timeit? Isn't this a legitimate one?

~~~
polotics
The real timeit doesn't need to be installed off PyPI; it comes with Python.
That makes this example the sneakier one. The rest is mostly typosquatting:
diango, dajngo, etc.

~~~
r0f1
Oh wow.. I didn't know that. I will be more cautious from now on.. Thanks for
the info :)

------
Alex3917
Maybe PyPI should require uploaders to verify their real identities, or at
least flag packages where users haven't done so.

~~~
mounds
Whoa whoa whoa! Revolutionary idea - signed packages?! /s

In all seriousness though, Python's community is so communally fractured, I
don't see one standard winning out. When the community revolts against its
own 'benevolent dictator for life' for wanting to better the language
specification... It's most likely headed for doom.

Python and PyPI are in a precarious state. They are getting a lot of attention
from a lot of people, who may or may not have their best interests in mind.
Previous Python users were using the language out of love for the language and
its facilities...

This issue is the tip of a much deeper iceberg lingering in the Python
ecosystem: how is Python being intelligibly managed, if at all?

~~~
Alex3917
> When the community revolts against its own 'benevolent dictator for life'
> for wanting to better the language specification... It's most likely headed
> for doom.

That's Python's biggest strength. If you want a bug fixed or feature added to
JavaScript, you need to pay $20M to join ECMA or whatever. If you want the
same in Python you just post on the mailing list and argue your case. You
might not win, but you're at least guaranteed your day in court. That's the
most valuable 'feature' of the Python ecosystem, even if it gets a little
heated at times.

------
red-tea
Things like diango don't really seem like they would target typos as much as
malicious copy and paste targets.

~~~
Latteland
The whole world of package maintenance is subject to this, from pip to C#
packages, npm, everything. I'm sure we all (developers) have some of these on
our systems, because we rely on downloading packages that in turn depend on
other things.

It makes me want to do development in a dummy account, and my 'personal
account' with my passwords and ssh credentials is somehow a separate account.
I never do that in practice because it would be too much pain. That's why
these kinds of package attacks are valuable.

So what can we do to address this? This story exactly lines up with my long
time worry. I could be careful or lucky and avoid things like this, but the
packages I use might not be so careful or lucky.

~~~
ris
> The whole world of package maintenance is subject to this

Wrong.

There are two types of package repository, "maintained" and what I call "wild
west". The latter include pypi, npm, homebrew, dockerhub, and any other repos
where any old joe can sign up and start uploading packages under some name
they choose. Uploaded packages are controlled by the single entity in charge
of the account, except in rare circumstances like this when the site owners
were alerted to specific mischief.

"Maintained" repositories have a layer of "maintainers" between the developers
and the users. Their responsibility is to shield the user from irresponsible,
user-hostile or potentially malicious decisions the developers may (and
surprisingly often do) make. These include most Linux distribution
repositories but also others like Nix and Guix. They tend to have fewer
packages because of the added work of performing the maintenance and tend to
lag behind release versions for the same reason but also because of an
inherent conservatism of the maintainers. In return users get greatly improved
stability. In the best cases (e.g. Debian) the maintainers even do backports
of security fixes to older stable releases. The maintainers also make
decisions in a more public consensus basis and are better able to coordinate
releases between different packages to ensure compatibility.

Given the choice, I run a mile from the former style of repository.

~~~
donaldstufft
Honestly, this is kind of FUDish and assumes the worst case scenario for
upstream developers, and the best case scenario for the distro maintainers.

Another way of phrasing what this extra layer of maintainers provides, is a
second group of people who can introduce their own irresponsible, user-
hostile, and potentially malicious (or at the very least, negligent)
decisions. Worse, often times these developers have less (in some cases, far
less) knowledge of how the code itself works, and are applying their own
patches, often with minimal testing, without fully understanding the scope or
impact of the changes they're making. For every poor decision you can find in
a package that is popular enough to even appear in one of these downstream
repositories, one could just as easily find a case where this extra layer is
introducing their own problems.

The non-FUD-ish answer is that whether you get your software directly from the
upstream developers through an uncurated repository like PyPI, or through a
curated repository like a Linux repository, neither one is inherently better
than the other. Each of them has a variety of pros and cons, and part of
modern-day engineering is looking at these tradeoffs and choosing the right set
for your particular situation. Sometimes that will even mean that you're
choosing different tradeoffs for different packages on the same system.

~~~
ris
> one could just as easily find a case where this extra layer is introducing
> their own problems.

In many years of using both, I have found examples of the latter being much
rarer. The example that everyone likes to pull up is Debian and OpenSSL, and
that's an example from 2006 when attitudes to security were very different.

The major difference between maintainer decisions and developer decisions is
that maintained repositories tend to be consensus based whereas wild-west
repositories are my-way-or-the-highway based. If there's a poor decision made
by a maintainer, there's the opportunity to engage with the community and make
your case. You don't tend to get rogue maintainers that ignore the rest of the
maintainers - such people will get removed.

I've never seen a user-hostile decision made by a package maintainer.

As for testing, we're actually getting to the point where maintained
distributions are better tested than the vanilla packages. Nix, for instance,
makes efforts to enable tests during the build process of most packages,
meaning that when you get an installed package, it is known to pass its tests
_with its actual set of installed dependencies_ (which themselves should have
all passed their tests). This is moving towards having "integration tested"
packages.

With pip, the best you'll get these days is a warning message that it might be
installing the wrong versions of things because of a version conflict (and
that's a recent addition). No tests run, best of luck...

~~~
uranusjr
Another example would be how pip and venv have some edge case failures on
Debian or Ubuntu. Repackaging is a constant struggle, especially when the
software adds new features that are not thoroughly tested by the packager.

Another caveat to consider is that people tend to blame the first party
(Python developers) when this happens, while the problem is really caused by
packagers. Bad things may happen less often with a packager in the middle, but
when they do, it’s generally a lot more difficult to deal with exactly because
of the added layer and complexity.

