
Malicious software libraries found in PyPI posing as well known libraries - nariinano
http://www.nbu.gov.sk/skcsirt-sa-20170909-pypi/
======
hannob
Ok, here's some ugly backstory on this: This problem has been known for a
while, yet both the pypi devs and the python security team decided to ignore
it.

Last year someone wrote his thesis describing python typosquatting and
standard library name squatting:
[http://incolumitas.com/2016/06/08/typosquatting-package-
mana...](http://incolumitas.com/2016/06/08/typosquatting-package-managers/)

However, after that, the packages used in the thesis (the most successful one
being urllib2) weren't blocked, they were simply deleted. Benjamin Bach was
able to register urllib2 afterwards. Benjamin and I decided that we'd now try
to register as many stdlib names as possible.

See also: [https://www.pytosquatting.org/](https://www.pytosquatting.org/)

~~~
hodgesrm
This is a scary attack. One partial mitigation is to use a firewall (e.g.,
Amazon VPC network ACLs) to restrict outbound network traffic to a small
number of known addresses like well-known repos. I can't think of a good
reason why code in any well-behaved application should be allowed to make
random outbound network calls.

I think it's also on app developers to rethink the culture of randomly
grabbing packages to build applications quickly. This is already a security
problem even with approved repos. Having a rat's nest of packages makes it
hard to upgrade quickly when those repos post updates to address
vulnerabilities.

Edit: Removed confusing statement about return connections

~~~
zokier
> One partial mitigation is to use a firewall (e.g., Amazon VPC network ACLs)
> to restrict outbound network traffic to a small number of known addresses
> like well-known repos.

That breaks down very quickly with the combination of public CDNs and TLS. I
suppose you could do SNI-based firewalling, but that is a bit ugly, and AFAIK
you can't do that easily with common firewalls (like netfilter).

~~~
hodgesrm
As I mentioned above, some of this goes back to application design. Ideally,
if you are layered correctly, the components of the system that own data
should not talk to the outside world except in very narrowly circumscribed
ways.

Unfortunately most real world systems that have internet access are created in
far from ideal conditions.

------
chatmasta
Package managers seem to be an increasingly popular attack vector. It's only
luck that none of the attacks have been particularly malicious yet.
Considering how many package manager downloads go to a server in a datacenter,
a widely distributed malicious package could control a botnet with extremely
high throughput, or wreak havoc on any databases it comes into contact with.

It's only a matter of time before something like this happens. A big part of
the problem is that application package managers, like pip or npm, are far
less sophisticated than those of operating systems, like aptitude or yum. It
needs to be easy for developers to open source their code, and to mark
dependencies with precise commit hashes, but the download also needs to be
secure and verifiable. There are many difficult tradeoffs to consider in terms
of usability, centralization, security and trust.

~~~
raesene6
Another fun fact to consider is that with many package formats you can
execute arbitrary code _at install time_, so if a malicious package can get
into a repository, it's very likely to start compromising systems quickly.

Whilst a package repo compromise would be the biggest bang in terms of
attack, compromising the credentials of the developers of popular
libraries would be an easier attack (and indeed is already happening:
[https://twitter.com/chrispederick/status/892768218162487300](https://twitter.com/chrispederick/status/892768218162487300))
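
To make the install-time point concrete: `setup.py` is an ordinary Python script that pip executes when installing a source distribution, so any module-level code runs with the installing user's privileges before `setup()` is even called. A benign sketch (the helper name is mine, purely illustrative):

```python
# Anything at module level in a setup.py runs at install time.
import getpass
import socket

def gather_host_info():
    # A malicious setup.py could collect data like this and send it
    # to an attacker's server before setup() is even called.
    try:
        user = getpass.getuser()
    except Exception:
        user = "unknown"
    return {"user": user, "host": socket.gethostname()}

info = gather_host_info()
print("install-time code ran as:", info["user"])

# The legitimate part would follow, e.g.:
# from setuptools import setup
# setup(name="demo-package", version="0.0.1")
```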

~~~
ams6110
Doubly so, since when installed on a server it will very often be done as
"sudo pip install ....." (s/pip/other-package-manager/ as needed).

~~~
tekromancr
I almost never see this. Even on systems that are only running a single python
project, I only ever see folks use virtualenv. The only time I ever see things
installed with sudo is when the package is being installed in a docker
container.

~~~
striking
This used to be super common, though. I see it all the time in legacy apps.

~~~
acdha
Yes – to the degree that it's uncommon now, that's because many people in the
community spent years loudly advising against it.

------
kasabali
Yet another attack vector that doesn't exist at all in Linux distributions but
was invented by language package managers, sadly.

They solved the issue two decades ago by heavily vetting packages before
accepting them into repositories. Users can still add and use packages from
third-party repositories if they choose.

Maybe the solution to this is creating curated repositories based on the
publicly open ones and using them by default (requiring opt-in for other
repositories). Conda for Python and Stackage for Haskell seem like relevant
solutions.

~~~
collyw
Sure, it's nice (and easier) to use the distro's package management system, but
it often just isn't up to date enough. You end up using things that are a while
out of date and may have security flaws as a result.

~~~
eeZah7Ux
> using things that are a while out of date and may have security flaws as a
> result

On the contrary, on distributions that perform security updates the level of
security of a package can only increase over time.

It might sound obvious, but vulnerabilities are created in new releases, while
vulns in existing packages can only be found and fixed, not created.

(Of course I'm talking only about vulnerabilities here and excluding removal
of obsoleted crypto or addition of new security features)

~~~
andrewflnr
I would just like to point out that a "fix" for a vulnerability does
occasionally introduce others.

------
thearn4
It looks like the code phones home to a server in China:

    IP: 121.42.217.44
    Decimal: 2032851244
    Hostname: 121.42.217.44
    ASN: 37963
    ISP: Hangzhou Alibaba Advertising Co.,Ltd.
    Organization: Hangzhou Alibaba Advertising Co.,Ltd.
    Type: Broadband
    Assignment: Static IP
    Continent: Asia
    Country: China
    State/Region: Zhejiang
    City: Hangzhou
    Latitude: 30.2936 (30° 17′ 36.96″ N)
    Longitude: 120.1614 (120° 9′ 41.04″ E)

~~~
sdiepend
When you go to this address:
[http://121.42.217.44:8080/](http://121.42.217.44:8080/)

"Hi bro :)

Welcome Here!

Leave Messages via HTTP Log Please :)"

------
IgorPartola
This, to me, is the nightmare scenario. Well, one of two; the other being
that a developer of an obscure library I use has their PyPI password
compromised and a bad actor uploads a backdoored version of the library.

Fundamentally, the reason this is different from how things like Linux distros
work is that Linux distros have maintainers who are in charge of making
sure every new update to one of their packages is legit. I am sure you can try
to sneak malicious code in, but it isn't going to be easy.

I am not advocating that PyPI (and npm) adopt the same model. That would be
too restrictive. But maybe just showing the number of downloads isn't the best
way to judge whether a package is legit. Perhaps some kind of built-in review
system would be nice.

~~~
raesene6
A review system unfortunately isn't likely to be practicable with current
development models. npm alone has over 500,000 packages
([http://www.modulecounts.com/](http://www.modulecounts.com/)) so even a one
time review isn't going to happen.

If people want a more trusted solution the likely outcome is that they'll need
to use a smaller more static set of libraries and then either do the audits
themselves, or outsource that to a 3rd party.

Of course, with the current speed of change and deployment, it doesn't seem
likely that many companies will adopt that model.

~~~
mschuster91
> npm alone has over 500,000 packages
> ([http://www.modulecounts.com/](http://www.modulecounts.com/)) so even a one
> time review isn't going to happen.

But at least the modules with the most downloads (webpack, react, or stuff
like left-pad) could be vetted, and npm especially could implement a
two-or-more-person model: everyone with publish access can upload a new
artifact, but to actually have it distributed to end users, a second person
would be required to sign off.

~~~
IgorPartola
That's the thing: I worry less about popular packages. I can check that
Django's GitHub repo links to PyPI and vice versa. But a random package to
parse DSNs? I don't know it from Adam. I want to use it, and lots of others
do too, but not everyone is going to review it. Maybe just a button on the
package that says "I found insecure code!" would be good.

~~~
mschuster91
> That's the thing. I worry less about popular packages. I can check that
> Django's GitHub repo links to PyPI and vice versa.

I worry about the most popular, and then about the small and
next-to-unmaintained ones. Just think back to the left-pad disaster that broke
builds all over the world, and imagine it was not a deleted package but an
update containing malware. I assume there are lots of such "hidden gems" where
the maintainer has gone away... the consequences of hacking just _one_
improperly secured account are severe.

------
raesene6
This isn't, in any way, a new problem. I did a presentation on this topic for
OWASP AppSecEU 2015 ([https://www.youtube.com/watch?v=Wn190b4EJWk&list=PLpr-
xdpM8w...](https://www.youtube.com/watch?v=Wn190b4EJWk&list=PLpr-xdpM8wG-
ZTcHhFfAeBthNVZVEtkg9&index=10)) and when doing the research for that I
encountered cases of repo. attacks and compromise.

IME the problem will continue unless the customers (e.g. companies making use
of the libraries hosted) are willing to pay more for a service with higher
levels of assurance.

The budget required to implement additional security at scale is quite high,
and probably not a good match with a free (at point of use) service.

~~~
fovc
If someone here wants to build a business around this, count me in for npm
(high willingness to pay) or PyPI (lower WTP).

Here's an idea: make it similar to Kickstarter, where customers can commit a
certain amount of funds towards a specific package. If the package doesn't
"tilt" in a certain amount of time, the money goes back. Otherwise you vet a
point release and add it to your repo. You could offer subscriptions to keep
packages updated, or handle each update as its own project (with presumably
lower costs if a recent release has been audited). Handling dependencies is
key, and is left as an exercise for the reader.

~~~
pmoriarty
One thing to consider if you're going to provide a service like this:

What happens if a vulnerability nevertheless sneaks through?

Then whoever did the vetting could conceivably get sued. So they might
want to take out insurance or try to protect themselves from lawsuits in some
other way -- all of which is likely to make such a service even more
expensive.

~~~
cookiecaper
It has to be constrained to something reasonable. You can't guarantee the
software is safe, but you can guarantee it is published by someone who is who
they say they are, similar to EV certificates for domains. You can also refuse
to publish packages with intentionally-confusing names.

~~~
pmoriarty
_"you can guarantee it is published by someone who is who they say they are"_

Can you? Positively identifying people seems a pretty tricky and easily
screwed up business.

IDs can be forged, and a web of trust requires, well, trust.

I guess such a service could say something like "we got this person's ID
(and/or address)" or "here's this key's web of trust", and that would probably
be a bit better than what we have today (which is virtually nothing), but it
would still be a far cry from _"guaranteeing it is published by someone who
is who they say they are"_.

~~~
cookiecaper
EV certs have a complex verification process that can involve sending a
physical representative from the company down to the place of business to
confirm its presence/existence.

Bitcoin trading platforms have shown that compliance with AML/KYC regulations
can be performed virtually by manual verification of a valid government ID,
timestamped photo, handwritten note, and other mechanisms.

A company offering this service would go outside of the keyserver and verify
the ID independently. It'd be much more of a "notarized packages" paradigm
rather than just "published by 1337PyHax0r-88".

It is true that even extensive manual verification processes dependent on
government-issued IDs can be faked, but there's a much higher bar involved.

------
Sir_Cmpwn
I think a more Linux-like approach to package repos is better: a curated
package repository run by volunteers in maintainership roles. Then you have a
human being verifying the upstream and keeping malware out, and you get more
consistency across packages as a bonus. If you want your package added, it's
as simple as sending an email, and it provides a new avenue for people to
contribute to the success of the ecosystem as package maintainers.

When you make the next big thing, consider this approach.

~~~
zzzcpan
I don't think there are any maintainers that verify upstream code, they only
manage packages and updates. Which is actually safer to do without
maintainers, completely automatically, as it will eliminate a huge attack
surface introduced by a maintainer.

~~~
ex_amazon_sde
Debian Developers are responsible for the quality of their packages. Most only
package well-known software that they are familiar with or have read the code
of. Some do more thorough security audits.

Some security-sensitive packages are maintained by teams to share the
workload.

~~~
zzzcpan
They do a lot that could be and sometimes even is automated, but that doesn't
help with security, only weakens it. Which is my point. It's better to
automate package generation and trust fewer people, mainly the authors, not
introduce maintainers into it.

~~~
peterwwillis
Authors don't generate distro packages because there's too many distros and
each distro needs to make changes that have nothing to do with code.
Maintainers are necessary and a package signed by the author is a non-starter.

However, source code _can_ be signed and then used to make a package signed by
a distro.

~~~
zzzcpan
Think of it this way: instead of allowing people access to the system to
maintain packages, we allow them to submit code that generates packages.

~~~
peterwwillis
Of course you can do this - all packages are basically just wrappers around
upstream code. But you still need someone to maintain the wrapper, and they
have to check every new code release to see if there's something in the
wrapper that has to change. And there are multiple distros. There's no getting
away from maintainers with traditional Linux distros.

Code package management is different. The author writes their software
specifically to conform to the one code package management system. There's no
wrapper glue needed, so you don't need a maintainer. Just release your new
code and it fits into the system, and other code/tools/etc can just pick it up
and use it.

This works if you constantly update all the software you use everywhere, and
is pretty much guaranteed to become a nightmare if you don't. CPAN is probably
the most mature software package management system in existence and it's still
a nightmare if you don't keep a private repo and tightly manage releases, and
you absolutely need a maintainer.

~~~
zzzcpan
To be clear, what I'm suggesting is to generate those wrappers automatically,
instead of maintaining them manually. A script can visit a release page daily,
parse it and check for updates. If there is a new upstream release, it can
generate a wrapper and let the build system do the rest, produce binaries,
test them, etc. When things break, the code needs to be fixed, but it's
definitely very far from every release. And you don't have to trust the
maintainer of that script anymore or even have a separate maintainer,
everything could be reviewed on pull requests with only a small group of
people having commit rights to the repository.
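
A rough sketch of that polling approach, with a hypothetical upstream (`foo`, `example.org`) and a trivial JSON "wrapper" standing in for a real .spec/PKGBUILD file:

```python
# Poll an upstream release page; when a new version appears, emit a
# package "wrapper" for the build system to consume. URL, feed format,
# and spec layout are all made up for illustration.
import json
import re

def latest_version(release_page_html: str) -> str:
    """Parse tarball names like 'foo-1.2.3.tar.gz' out of a release page."""
    versions = re.findall(r"foo-(\d+\.\d+\.\d+)\.tar\.gz", release_page_html)
    # naive: pick the highest version by numeric tuple comparison
    return max(versions, key=lambda v: tuple(map(int, v.split("."))))

def generate_wrapper(version: str) -> str:
    """Emit a build spec; a real system would render a .spec/PKGBUILD."""
    spec = {
        "name": "foo",
        "version": version,
        "source": f"https://example.org/foo-{version}.tar.gz",
        "build": ["./configure", "make", "make install"],
    }
    return json.dumps(spec, indent=2)

page = '<a href="foo-1.2.3.tar.gz">x</a> <a href="foo-1.10.0.tar.gz">x</a>'
print(generate_wrapper(latest_version(page)))
```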

~~~
peterwwillis
That's basically how packages are maintained today; they just don't have as
much automation, and there are far fewer packages as a result. If you made one
package for every software update and had to review each one, you'd spend a
lot more time reviewing.

Trust isn't an issue in reviewed/maintained repos because you have eyeballs on
everything. When anyone can just ship an app/library and release it
automatically you get these malicious software issues.

------
sdiepend
Related to this? [http://incolumitas.com/2016/06/08/typosquatting-package-
mana...](http://incolumitas.com/2016/06/08/typosquatting-package-managers/)

~~~
xnyhps
That one is even more malicious, it uploads contents of your ~/.bash_history
and system profile. But at least it notifies you afterwards...

~~~
edraferi
Yes, but they do filter bash_history client side, only transmitting pip-
related commands. They did this to find additional common typos. The relevant
code:

    def get_command_history():
      if os.name == 'nt':
        # handle windows
        # http://serverfault.com/questions/95404/
        # is-there-a-global-persistent-cmd-history
        # apparently, there is no history in windows :(
        return ''

      elif os.name == 'posix':
        # handle linux and mac
        cmd = 'cat {}/.bash_history | grep -E "pip[23]? install"'
        return os.popen(cmd.format(os.path.expanduser('~'))).read()

------
mwexler
Both Anaconda (for Python, [https://docs.anaconda.com/anaconda/packages/pkg-
docs](https://docs.anaconda.com/anaconda/packages/pkg-docs)) and Microsoft
(for R, [https://mran.microsoft.com/](https://mran.microsoft.com/)) have
"reviewed and audited" collections of packages for their languages. That's
part of what you pay for when you buy support for the open source tools.

------
geekamongus
Why is there no indication of any of this on the python.org website or any of
their social media accounts?

I checked:

[https://pypi.python.org/pypi](https://pypi.python.org/pypi)

[https://www.python.org/blogs/](https://www.python.org/blogs/)

[http://planetpython.org/](http://planetpython.org/)

[https://pypi.python.org/security](https://pypi.python.org/security)

[https://twitter.com/pythoninsider](https://twitter.com/pythoninsider)

[https://plus.google.com/+Python](https://plus.google.com/+Python)

[https://www.facebook.com/pythonlang?fref=ts](https://www.facebook.com/pythonlang?fref=ts)

[https://twitter.com/ThePSF](https://twitter.com/ThePSF)

~~~
takluyver
I guess that's because it's not a surprise. This has come up before, and it's
basically unavoidable with the way PyPI is designed to work: if you see an
unclaimed name, you can put whatever you want there.

~~~
geekamongus
I see your point, but I don't think that it needs to be a surprise to be
announced or made well-known. No one is surprised that Microsoft releases
loads of patches every Patch Tuesday, but they still publicize it and make it
well known and easy for people to find out about.

------
rantanplan
The regex they have for identifying fake/harmful packages is wrong.

`pip list --format=legacy | egrep '^(acqusition|apidev-coop|bzip|crypt|django-server|pwd|setup-tools|telnet|urlib3|urllib) '`

This incorrectly lists `urllib3` or the `cryptography` package for example,
which are perfectly valid packages.

[UPDATE]

Read "tobltobs" comment below. I incorrectly removed a trailing space from the
regex.

~~~
nariinano
I believe urllib3 is built-in. So if you have installed it from PyPI you've
gotten a malicious version.

~~~
cpburns2009
_urllib_ and _urllib2_ are built-in for Python 2, and were merged and
reorganized as just _urllib_ in Python 3. _urllib3_ is a third-party module.

~~~
haikuginger
This is correct. In general, though, most packages don't rely on urllib3
directly, but on `requests`, which uses urllib3 but provides a friendlier API
and built-in SSL cert verification.

------
singularity2001
"Success of the attack relies on negligence of the developer"

How about package manager maintainers accept their enormous responsibility?
urllib vs urllib2, one is a virus? Sorry, but that is not "negligence of the
developer".

~~~
zokier
Managing your supply chain is one of the basic principles of good engineering.
Not properly vetting your sources is negligence. The problem, of course, is
that computers are really good at amplifying work, including mistakes. So a
small mistake, like a typo, can have a catastrophic impact, like injecting
malware that takes over the whole system.

------
mwerty
How about a Levenshtein distance threshold for new package names to be
accepted? I.e., only allow names that are different enough from the existing
set to avoid typos (or whatever errors we are trying to guard against).

~~~
andrewfong
You don't need a strict ban for this to work either. Maybe just an end-user
warning if distance < N and the relative popularity of the two modules is very
high. You could also allow users or organizations to explicitly whitelist some
names.
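
The threshold-plus-warning idea above can be sketched in a few lines. The popularity table and the `check_name` helper are hypothetical, purely for illustration:

```python
# Warn when a requested package name is within a small edit distance
# of a popular existing name.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# Hypothetical download counts, not real PyPI data.
POPULAR = {"requests": 50_000_000, "urllib3": 40_000_000}

def check_name(candidate: str, max_distance: int = 2) -> list:
    """Return popular names the candidate may be typosquatting."""
    return [name for name in POPULAR
            if 0 < levenshtein(candidate, name) <= max_distance]

print(check_name("urlib3"))    # flags "urllib3"
print(check_name("requests"))  # exact match, no warning
```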

------
EstDelenda
Any method of software distribution which is not rooted in cryptographic
author verification against a fine-grained, user-manageable trust store should
have been put below the sanity waterline 20 years ago.

------
defined
Here's something that contributes to typosquatting: the lack of responsiveness
by package management organizations to claims on orphaned or unmaintainable
packages.

People who upload packages often leave organizations, who are then stuck with
a package they can't update because the password went with the person, and the
email reset link points to a now-defunct email address.

Petitioning the package management team is sometimes fruitless, forcing a
needless new instance of typosquatting.

~~~
chupapuma
I have found the PyPI group of people to be very helpful in these cases. You
also should probably, as an organization, have more than one owner of your
packages. That way, unless two people leave, things aren't orphaned. We have
gone as far as to have a 'meta-user' that is on all packages. It is only ever
used to recover a fully abandoned package.

~~~
defined
I understand you are trying to be helpful, and of course you are right, but
the fact is that sometimes things fall between the cracks, especially in, say,
hard-pressed startups.

There are so many _shoulds_ in the world that don't make it to _dids_, it
reminds me of the joke about the salesman trying to sell farming improvement
techniques and being turned down by the old farmer, who says, "Son, I don't
farm half as good as I know how to already."

Unfortunately, I have not found the PyPI group as helpful as you have. Perhaps
I have been looking in the wrong places.

~~~
defined
UPDATE: I was helped out by a very nice person from PyPI, so kudos to them.

------
ehnto
Part of my dislike for the Node ecosystem in particular (and I am sure others
have a similar problem) is that the dependency trees are super complex.

Because packages tend to be small and many, and each of those has its own
dependencies, you can end up with hundreds of packages installed, which is
simply impractical to review manually.

It is not Node, but we do in fact manually review each package we use for
our given language, because it's feasible and worthwhile when the dependency
tree is small. Each and every package is a possible attack vector, whether
intentionally or just because it's poorly written, and we can't simply ignore
that because it's the done thing and "the community reviews them".

------
bhouston
I bet there are quite a few malicious npm packages that we do not know about.

Is Node used in government and military solutions? If so, the npm
ecosystem is likely targeted by state actors, and it is a sitting duck.

~~~
thehardsphere
State actors do not limit themselves to government and military targets; many
of them target civilians for all sorts of purposes.

------
asperous
I once tried to upload a package called "requirements.txt" (since people do
`pip install requirements.txt` all the time, forgetting the -r).

PyPI actually blocks that name from being a package!

------
EGreg
Here is the general problem with dependencies:

When a dependency changes, all the projects that directly depend on it should
get notified immediately and their maintainers should rush to test the new
changes, to see if they break anything.

There is no shortcut around this, because if B_1, B_2, ..., B_n depend
on A_1, the consequences may be different for each B_k.

The only real secure optimization that can be done is realizing that some of
the B_k use A_1 the _exact same_ limited way, and thus making an intermediate
A_1b that depends on A_1, which those B_k's depend on. These "projection"
builds may be automated by e.g. the set of methods called by the B's.

Anyway, this is the way iOS does it before iOS 11 goes out to users: they
release a beta to all developers, and they even fix bugs in the beta before
releasing to the public.

Without beta testing periods, you get laziness and just auto-accepting of
whatever came out.

There could be an "alpha release" feature in git where maintainers put out
the next version to be tested by all who depend on it. _This feature should
notify the maintainers subscribed to the repo, and the build itself should
get issues and ratings from maintainers as they test the new build._ And
releases should not be too frequent.

This is the way to prevent bad things from happening. But that also means that
the deeper the dependency is, the more levels this process could take to
propagate to end-users.

------
teilo
I think we need a system to prevent this instead of the wild-west that PyPi
has become. For example: Developer signatures that are checked against a
community rating. If someone does `pip install` pip would look up the
developer signature of the package and check a community rating that would
verify this is a developer who has offered legit packages in the past. It's
not foolproof, but it would go a long way towards solving this.

~~~
wongarsu
That sounds easy to defeat. Make some mundane but legit packages (maybe one of
those "$X but without the pointless complexity" packages), gain trust, and
once trust is reached start uploading typosquatting packages.

Knowing today's internet, programmers from cheap-labour nations (India & Co.)
would soon start offering "trusted PyPi accounts" for sale on hacker forums.

~~~
teilo
Yes, but keys can also be revoked, providing a way to mitigate this.

------
dpflan
This is interesting in conjunction with the recent post about Python's
popularity [1], because that popularity may be a weakness exploited here. It's
easy to use and install and get libraries for anything, and apparently
libraries for infecting your machine :(.

[1]
[https://news.ycombinator.com/item?id=15249348](https://news.ycombinator.com/item?id=15249348)

~~~
nariinano
The problem is that the vetting process of PyPI is completely nonexistent.
This has happened many times in the past; the last time I remember, they
uploaded a few libraries called "bs4" and stuff like that.

------
ConfucianNardin
Relatedly: [https://github.com/pypa/pypi-
legacy/issues/644#issuecomment-...](https://github.com/pypa/pypi-
legacy/issues/644#issuecomment-305134745)

Also [http://evilpackage.fatezero.org/](http://evilpackage.fatezero.org/) /
[https://github.com/fate0/cookiecutter-evilpy-
package](https://github.com/fate0/cookiecutter-evilpy-package)

That one has neutered the call-home code by now, though.

------
elcapitan
Would it be possible to have a general package manager (like apt) as a
reusable base for the individual language-specific package managers? I know
that npm and pip and gem etc. all do some additional stuff, but at the core
they all do the same thing (pull packages from a repo, do some post-install,
resolve dependencies, maybe in some cases even check if the package is legit).
So we could implement and check that once, and then just reuse it like we do
with many other libraries for image processing etc.

~~~
eeZah7Ux
+1 to this. There's no need to reinvent the wheel a million times. At the
least, there should be a shared standard for how to do packaging.

------
marcinkuzminski
This has been a known problem for a while. For example, when we (RhodeCode)
had an installer based on pip, we rolled our own PyPI index. Hosting one
yourself is very easy, and there are nice projects in existence that allow it.
It solved the problem of deployment when PyPI wasn't available, sped up our
test installer builds, and gave us total control over the packages we ship.

------
pishpash
Maybe packages should be signed by several trusted maintainers. Or, since
PyPI packages sometimes list a source code link on GitHub, along those lines
there could be a process to prove ownership of some known online identity,
keybase-style. Unpopular packages could also be flagged, especially ones that
have a near twin that is much more popular. There are many solutions.

------
1ba9115454
Unless your package manager enforces signatures and you trust the person that
signed the package, this is an attack vector for you.

That includes Java (Maven), Ruby (Gems, Bundler), Node (npm), Haskell (stack),
etc.

Installing code via package managers is the coder's equivalent of opening an
exe sent to you in an email.

Code downloaded from the internet is not to be trusted.

~~~
zokier
Package signing is no silver bullet.

Signing packages helps against typosquatting about as much as SSL certificates
help against phishing. Or in other words, not at all, especially if we don't
have the certificates rooted in real world identities (like EV SSL certs).

------
julianj
Looks like they missed one:

[https://pypkg.com/pypi/xml/f/setup.py](https://pypkg.com/pypi/xml/f/setup.py)

Dork: site:[https://pypkg.com](https://pypkg.com) intext:"just toy, no harm"

------
ris
Hooray for the "wild west" model of package repositories.

Come back maintainers & packagers, all is forgiven!

------
EGreg
This is why I am not a huge fan of using package managers. I like to
understand the code we put into our platform and vet it, and not have it
change under us automatically after that, but review the changes manually
before accepting them.

I felt a bit curmudgeonly but we have a responsibility at
[https://qbix.com/platform](https://qbix.com/platform) for all our apps being
secure. I wanted to use repos for each package and manually git pull or hg
pull them when they changed.

I was finally convinced by our developers to just use package managers with
version pinning. Honestly it's really hard to avoid package managers,
especially for all the newer functionality such as Payment Requests or Web
Push. Luckily there is version pinning.

We want our clients to feel secure that we vetted ALL the code that went into
the platform, so our package.json (and composer.json) use version pinning.
We'd rather take a bug report and manually fix it than get NO bug report and
have a SHTF moment.

------
atticusberg
to see if you have any of these deps on your python path:

pip list --format=legacy | egrep -e '^acqusition$' -e '^apidev-coop$' -e
'^bzip$' -e '^crypt$' -e '^django-server$' -e '^pwd$' -e '^setup-tools$' -e
'^telnet$' -e '^urlib3$' -e '^urllib$'

to see if you have any projects in a given directory that require them:

cat $(find /path/to/dir -name 'requirements.txt') | egrep -e '^acqusition=='
-e '^apidev-coop==' -e '^bzip==' -e '^crypt==' -e '^django-server==' -e
'^pwd==' -e '^setup-tools==' -e '^telnet==' -e '^urlib3==' -e '^urllib=='
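
For what it's worth, the same installed-package check can be done from Python itself. This sketch uses `importlib.metadata` (Python 3.8+, so an assumption relative to this thread's era); the name list is the one from the advisory:

```python
# Compare installed distributions against the known-bad names
# from the advisory.
from importlib.metadata import distributions

BAD_NAMES = {
    "acqusition", "apidev-coop", "bzip", "crypt", "django-server",
    "pwd", "setup-tools", "telnet", "urlib3", "urllib",
}

def flag_bad(installed):
    """Return the known-bad names present in an iterable of names."""
    return sorted({n.lower() for n in installed} & BAD_NAMES)

def installed_names():
    return [d.metadata["Name"] for d in distributions() if d.metadata["Name"]]

print(flag_bad(installed_names()))
```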

~~~
jastr
pip list --format=legacy | cut -d' ' -f1 | egrep '^(acqusition|apidev-
coop|bzip|crypt|django-server|pwd|setup-tools|telnet|urlib3|urllib)$'

------
justinsaccount
well shit, I guess I should have followed up on this after I noticed it 2
months ago.

[https://twitter.com/JustinAzoff/status/881163562739277824](https://twitter.com/JustinAzoff/status/881163562739277824)

~~~
andrew3726
Yup, same:
[https://gist.github.com/Spotlight0xff/829b7ebf32c4feec60ec44...](https://gist.github.com/Spotlight0xff/829b7ebf32c4feec60ec44eb86b0fb3f)

------
kumarvvr
Whoa, urlib3 & urllib. Those mimic pretty popular packages, especially among
newbies. Hundreds of websites that teach web scraping use those libraries.

I wonder what an effective form of protection against such attack vectors
would be.

Do digitally signed certificates fit into this usage scenario??

~~~
zapt02
> Do digitally signed certificates fit into this usage scenario??

No, because either the package author would have to sign them, in which case
you have to choose to trust each package author, or the repository would sign
them, in which case there would be no improvement for this current issue,
since the repo would sign the fake packages as well.

~~~
kumarvvr
Any way in which blockchain technology can be used? Like, the transaction
becomes the act of the author uploading the code and the repo and user verify
the transaction in some form?

~~~
peterwwillis
Nope. You can't solve phishing with technological means. You have to curate
either a whitelist or blacklist.

The best way to handle this is whitelists of trusted package maintainers
and/or code authors.

~~~
zapt02
To be fair you can do a lot with simple heuristics. A new package with a large
number of downloads, or a package from a new author: show users a warning
message before installing the package.
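A minimal sketch of that heuristic; the package fields, the 30-day window, and the download threshold are all made up for illustration:

```python
from datetime import datetime, timedelta

def should_warn(package, known_authors, now=None):
    """Flag a package for a pre-install warning: a brand-new package
    that is already seeing heavy downloads, or one from an author
    with no prior uploads."""
    now = now or datetime.utcnow()
    is_new = now - package["uploaded"] < timedelta(days=30)
    heavy_downloads = package["downloads"] > 10_000
    new_author = package["author"] not in known_authors
    return is_new and (heavy_downloads or new_author)

# A two-day-old package with 50k downloads from an unknown author
pkg = {"uploaded": datetime.utcnow() - timedelta(days=2),
       "downloads": 50_000, "author": "nobody"}
assert should_warn(pkg, known_authors={"django-team"})
```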

------
jwilk
python-dev thread:

[https://mail.python.org/pipermail/python-
dev/2017-September/...](https://mail.python.org/pipermail/python-
dev/2017-September/149569.html)

------
jastr
To check several different requirements.txt files (looks three folders deep):

find . -maxdepth 3 -name requirements.txt | xargs egrep '^(acqusition|apidev-
coop|bzip|crypt|django-server|pwd|setup-tools|telnet|urlib3|urllib)'

~~~
jastr
Avoid some false positives

pip list --format=legacy | cut -d' ' -f1 | egrep '^(acqusition|apidev-
coop|bzip|crypt|django-server|pwd|setup-tools|telnet|urlib3|urllib)$'

------
jamespo
I wonder if it's worthwhile having a check that compares closeness of the name
to existing popular packages and if so does some extended vetting.
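Such a closeness check can be sketched with the standard library's difflib; the popular-package list and the 0.85 cutoff here are arbitrary stand-ins for whatever a registry would actually maintain:

```python
import difflib

# A stand-in for the "popular packages" list a registry would keep
POPULAR = ["urllib3", "requests", "numpy", "django", "setuptools", "cryptography"]

def suspicious_lookalikes(name, popular=POPULAR, cutoff=0.85):
    """Return popular names a new package name closely resembles,
    excluding an exact match (which the index would reject anyway)."""
    matches = difflib.get_close_matches(name.lower(), popular, n=3, cutoff=cutoff)
    return [m for m in matches if m != name.lower()]

assert "urllib3" in suspicious_lookalikes("urlib3")
assert "setuptools" in suspicious_lookalikes("setup-tools")
assert not suspicious_lookalikes("flask")
```

A registration that trips this check could then be queued for the extended vetting suggested above.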

------
wiradikusuma
Anyone know if this is also an issue for Java? I've used Maven repository for
ages, and I know many big cos depend on it.

~~~
thehardsphere
It's less of an issue, but it still could be an issue.

Deployed Maven artifacts from Central are required to be signed with a PGP
key and are only supposed to come from approved hosts. I don't know how
strictly that is enforced or how hard it is to become a host, but at least
there is some kind of process.

Maven Central also doesn’t allow the removal of artifacts after they've been
published, and every artifact requires a unique version and name. And the
names are namespaced. So you don't have the issues that you see with npm,
where someone can pull a package and break everything people are using, and
then some third party can come in and publish anything under the exact same
name.

Is this model perfectly secure? No, you still have to trust that the artifact
was signed by a non-malicious person from a host that was not compromised.

------
jiffyToGo
tiny open source project for this.
[https://github.com/williamforbes/pypi_hacked_names](https://github.com/williamforbes/pypi_hacked_names)

------
pishpash
Maybe gov.sk should be vouched for too, I mean what's the chain of trust here?
Why should I trust anyone?

------
phonkee
Do not forget their password that worked for a couple of years: nbuSR123 ...

------
a3n
Dry run?

~~~
b101010
The "malicious" code at the end of the advisory looks like nothing more than a
beacon announcing it was installed?

    edit:
    get current working directory
    get username
    get hostname
    concatenate the last 3 together
    obfuscate (encrypt?) this string
    send the result as an HTTP request to 121.42.217.44 (the value of the base64 string)
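In Python, the pseudocode above amounts to roughly this defanged sketch: plain base64 stands in for the samples' obfuscation step (whatever it actually was), and the network call is deliberately omitted.

```python
import base64
import getpass
import os
import socket

def build_beacon_payload():
    """Collect cwd, username, and hostname, join them, and obfuscate
    the result, as the pseudocode above describes."""
    data = "|".join([os.getcwd(), getpass.getuser(), socket.gethostname()])
    return base64.b64encode(data.encode()).decode()

# The real code then sent this payload in an HTTP request to
# 121.42.217.44; that part is left out here.
```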

~~~
julianwachholz
# Welcome Here! :)

# just toy, no harm :)

------
amykhar
some of this could be helped by intelligent naming of packages. If something
is called urllib, name the package urllib because that's what people are going
to look for.

------
fruiapps
Curious to know if something similar is happening for Scala.

------
VMG
how likely is it that npm and other package managers that do not use digital
signatures by default are unaffected?

------
kevin_thibedeau
Python needs a way to run 2to3 during package installation that doesn't use
setup.py (setup.cfg or wheels). As it stands now, you have the hassle of
building a release four times if you want to support all combos of Py2, Py3,
32-bit, and 64-bit platforms. The lack of 2to3 support in the safer
alternatives is why I stick with setup.py.

~~~
ThiefMaster-
No, what you need to do is fix your package's code to work on Python 2 and 3
_without_ running 2to3 on it. The only case where this doesn't work is if you
have binary extensions - but then you need separate wheels in any case.
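For the common case of renamed stdlib modules, a sketch of the single-source approach described here (urlopen is just one example of a renamed import):

```python
# Single-source Python 2/3 compatibility without 2to3: handle the
# renamed stdlib modules at import time.
from __future__ import print_function, division

try:
    from urllib.request import urlopen   # Python 3
except ImportError:
    from urllib2 import urlopen          # Python 2

def fetch(url):
    """Runs unchanged on both interpreters."""
    return urlopen(url).read()
```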

------
Aissen
I'm glad Go completely sidestepped the name rush induced by this type of
package manager (composer, cpan, rvm, pypi, npm…). Just provide a URL. Done.

~~~
eeZah7Ux
...which is even less secure.

~~~
Aissen
May I ask why? If anything, it's _more_ secure, since you know exactly who's
publishing what.

Yes, it might put a higher burden on the publisher if they don't host on
github/gitlab, etc.

But it strips the "magic" part and makes sure the dev knows where the code is
coming from.

~~~
eeZah7Ux
With packages identified by full URLs, it's more likely that you'll make a
typo, misremember part of the URL, search for it on a search engine and pick a
fork instead of the right one, or paste one from Stack Overflow or another
forum that is plain wrong or only looks legitimate due to Unicode tricks. A
DNS MITM/hijack can also be used to inject a backdoor, as can the expiration
of a legitimate domain.

~~~
krapp
You can also make a typo when all you need is a package name - as long as
human beings have to type things out, that's going to be a problem. On the
other hand, with a URL, you can actually inspect the code directly and (if
it's hosted on Github or somewhere similar) see whether it's starred, forked
or has any issues. It's not a case of URLs being less secure, it's just a
tradeoff that pushes some of the security work to the community itself, rather
than a dedicated staff of curators.

And since most package managers eventually resolve packages to a URL
_somewhere,_ the issues you mention are probably present in other package
managers, albeit hidden behind abstractions.

~~~
eeZah7Ux
> You can also make a typo when all you need is a package name

"packagename" instead of a full URL is quite a difference. And you are not
addressing the other risks.

> On the other hand, with a URL, you can actually inspect the code directly

You can do that with most package managers as they show you the upstream URL.

Expecting every developer and every system engineer to verify every package
and every dependency they install is not "just a tradeoff". It's simply
impossible.

> since most package managers eventually resolve packages to a URL somewhere,
> the issues you mention are probably present in other package managers

Some check for the SSL certificate, some use package signing (e.g. APT). Also
if the pypi domain expires everybody will know, unlike a random library.

~~~
krapp
>"packagename" instead of a full URL is quite a difference. And you are not
addressing the other risks.

It is more characters, and therefore easier to misspell, but a URL also gives
you a domain and probably a namespace for the developer, each of which can act
as indicators of trustworthiness and help disambiguate packages with the same
or similar names.

If you can't double check your spelling for a package name or you just pick
the first Google result, or paste from SO, then you deserve what you get.
Domain hijacking, MITM, Unicode shenanigans and such are real risks, but not
of URLs as package identifiers per se, so much as risks of distributing
packages over the internet, which most if not all do anyway.

>You can do that with most package managers as they show you the upstream URL.

But if you don't _have_ to deal with the URL, chances are you won't, and it's
less likely you'll bother to follow it. I'm arguing that, if URLs are
dangerous because of their length, then package names alone are dangerous
because of their abstraction. I _know_ that I can probably trust including
"[https://github.com/symfony/symfony](https://github.com/symfony/symfony)"
but "symfony" or even "symfony/symfony" alone tells me nothing useful.

>Expecting every developer and every system engineer to verify every package
and every dependency they install is not "just a tradeoff". It's simply
impossible.

True, but Linus' Law is still basically the security model that's supposed to
underpin open source software, even if it's proven not to scale as well as
assumed. Someone, somewhere has to know the code is safe, and that's either
you or someone you trust, or (as is likely the case with most developers)
someone you just assume exists.

>Some check for the SSL certificate, some use package signing (e.g. APT). Also
if the pypi domain expires everybody will know, unlike a random library.

There's no reason a package manager using URLs can't also require package
servers (which, let's face it, are probably going to be Github and Bitbucket
in almost all cases) or maintainers to do something similar. Or at the very
least put out warnings the way browsers do about invalid or untrusted
certificates or unknown domains. You would lose the freedom of the "wild west"
model in its purest form but still not be tied down to a single source of
authority.

------
cdnsteve
I'm all for security but this hit a nerve with me: "Success of the attack
relies on negligence of the developer, or system administrator, who does not
check the name of the package thoroughly."

Package managers need to do more. If they had an enterprise version that you
could subscribe to with a monthly or annual invoice, you would get enterprises
on board; they are concerned about security and will pay. Developers like us
will help encourage it. I'd rather not see some third-party "secure" package
managers; instead, make this part of PyPI and send the funding to the Python
foundation. They are seeking donations, but that doesn't work well with
businesses. Make it a monthly/yearly service.

~~~
underko
It says to check the package name, not to go through the whole source code.

