
Two malicious Python libraries caught stealing SSH and GPG keys - choult
https://www.zdnet.com/article/two-malicious-python-libraries-removed-from-pypi/
======
greggman2
I don't know what the solution is but it feels like this is a much bigger
issue and we need some rethinking of how OSes work by default. Apple has taken
some steps, it seems, in the last 2 macOS updates, where they block access to
certain folders for lots of executables until the user specifically gives that
permission. Unfortunately for things like python the permission is granted to
the Terminal app so once given, all programs running under the terminal
inherit the permissions.

Microsoft has started adding short life VMs. No idea if that's good. Both MS
and Apple offer their App stores with more locked down experiences though I'm
sad they conflate app security and app markets.

Basically any time I run any software, every time I run "make" or "npm install"
or "pip install" or download a game on Steam etc., I'm having to trust 1000s of
strangers that they aren't downloading my keys, my photos, my docs, etc...

I think you should be in control of your machine but IMO it's time to default
to locked down instead of defaulting to open.

~~~
XorNot
This is literally what SELinux does, and has been able to do for years. We
don't need "default block lists" \- we need solid SELinux policy in all
distros.

~~~
geofft
"Solid SELinux policy" is really the hard part there.

When I pip install paramiko, I do, in fact, want it to have access to my SSH
keys. When I pip install ansible, I want ansible to be able to shell out to
OpenSSH to use my keys. If I write custom Python code that calls gpg, I want
that custom code to be able to load libraries that I've pip installed without
the gpg subprocess being blocked from loading my keys. If I have a backup
client in Python, and I tell it to back up my entire home directory, I want it
backing up my entire home directory including private keys.

If I wanted an OS where I couldn't install arbitrary code and have it get to
all my files for my own good, I'd use iOS. (I do, in fact, use iOS on my phone
because I sometimes want this. But when I'm writing Python code, I don't.)

SELinux has been able to solve the problem of "if a policy says X can't get to
Y, prevent X from getting to Y" for years. Regular UNIX permissions have been
doing the same for decades. (Yes, SELinux and regular UNIX permissions take a
different approach / let you write the policy differently, but that's the
problem they're fundamentally solving; given a clear description of who to
deny access to, deny this access.) Neither SELinux nor UNIX permissions nor
anything else has solved the problem of "Actually, in this circumstance I mean
for X to get to Y, but in that circumstance I don't, and this is obvious to a
human but there's no clear programmable distinction between the cases."

To be clear - I think there is potentially something of a hybrid approach
between the status quo and what newer OSes do. For instance, imagine if each
virtualenv were its own sandboxed environment (which could be "SELinux
context" or could just be "UNIX user account") and so if you're writing code
in one project, things you pip install have access to that code but not your
whole user account. I'm just saying that SELinux hasn't magically solved this
problem because all it provides is tools you could use to solve it, not a
solution itself.

~~~
marcus_holmes
This is why I don't think this is an OS problem. I think it's a developer
mindset problem.

Dependencies are bad.

Every single dependency in your code is a liability, a security loophole, a
potential legal risk, and a time sink.

Every dependency needs to be audited and evaluated, and all the changes on
every update reviewed. Otherwise who knows what got injected into your code?

Evaluating each dependency for potential risk is important. How much coding
time is this saving? Would it, in fact, be quicker to just write that code
yourself? Can you vendor it and cut out features you don't need? How many
other people use this? Is it supported by its original maintainer? Does it
have a deep-pocketed maintainer that could be an alternative target for legal
claims?

Mostly, people don't do that and just "import antigravity" without wondering
if there's a free lunch in there...

~~~
sectiondetail
At the risk of sounding like someone who wants to spark a language fight
(which I genuinely don't) this is why I love Go. The standard library is so
good that I rarely need to bring in any third-party dependencies, and the few
I do use are extremely well-known with many eyes on their code.

~~~
arminiusreturns
I've heard of more and more sysadmins liking Go for this reason.

------
ddevault
This is part of why I install my Python dependencies from downstream Linux
distro repos. I never use virtualenv. If a distro is missing a package I need,
it's a simple process to put it together, and the additional steps and checks
built into the process stop close to 100% of these issues. Getting a human
here also lets you do things like patch out telemetry or other anti-features.

Software repositories without a human review process are a bloody stupid idea.

~~~
cozzyd
Indeed. Plus this whole "install the whole python ecosystem for each thing you
want to use" is insane.

~~~
gdevenyi
In science this is pretty important so we can control versions and reproduce a
result again as well.

~~~
MetalGuru
There are other, more efficient ways of handling dependency version conflicts
than having an isolated env where each module is downloaded specifically for
that env. For example, it doesn't make sense that if I have two virtual envs
that use the exact same module (and version), it's downloaded and stored
twice on my machine.

~~~
hipnopotam
That's one of the issues that conda
([https://docs.conda.io/en/latest/](https://docs.conda.io/en/latest/)) solves
by design. When using conda environments you get hard links whenever possible.
Improvements to venv, making it a built-in module, and projects designed to
simplify dependencies made conda less attractive in comparison, but it's still
a solid way to have Python. Still, it doesn't really solve the fundamental issue
with Python packages you mentioned, unfortunately.
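
A rough illustration of the hard-link idea, assuming a filesystem that supports hard links. The directory layout and file names are made up, and this is not conda's actual code, just the underlying mechanism:

```python
import os
import tempfile

def link_into_env(cache_file: str, env_file: str) -> None:
    """Expose a cached package file inside an environment without copying it."""
    os.makedirs(os.path.dirname(env_file), exist_ok=True)
    os.link(cache_file, env_file)  # hard link: a second name for the same data

with tempfile.TemporaryDirectory() as root:
    # Hypothetical package cache holding one file of one package version.
    cached = os.path.join(root, "pkgs", "example-1.0", "module.py")
    os.makedirs(os.path.dirname(cached))
    with open(cached, "w") as fh:
        fh.write("VALUE = 1\n")

    # Two environments "install" the same file by linking to the cache.
    env_a = os.path.join(root, "env_a", "module.py")
    env_b = os.path.join(root, "env_b", "module.py")
    link_into_env(cached, env_a)
    link_into_env(cached, env_b)

    shared = os.path.samefile(env_a, env_b)   # same underlying file on disk?
    link_count = os.stat(cached).st_nlink     # how many names point at the data
```

Both environments see the file, but the bytes exist only once; deleting one environment just removes one of the names.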

------
maxander
So they caught the guy using a cliched I-vs-l typosquatting scheme and lazily
writing the malicious code in Python; we can presume they haven't caught the
guy who took the trouble to put their malicious code in a pre-compiled C
extension.

Reminds me of the fraudulent scientific papers that get caught using _really
dumb_ fakes (e.g., microscopy pictures that are copies of one another re-
zoomed and rotated); we catch the dumb ones, but presumably not _all_
malicious actors are dumb, so there must be a lot more fraudulent work out
there.

For that matter, isn't there a reasonable systematic way to catch out
typosquatters simply based on text analysis? _Any_ library name that's a short
edit distance from a popular library should have been carefully reviewed from
the start; there's no excuse for "jeilyfish" to have lasted more than a couple
of days.
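
A minimal sketch of that kind of check, using a hand-written Levenshtein distance so it needs nothing outside the standard library. The list of "popular" names and the threshold of 2 are assumptions for illustration, not anything PyPI actually does:

```python
from typing import List

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or keep) one
        prev = curr
    return prev[-1]

# Illustrative stand-in for "the most downloaded packages".
POPULAR = {"jellyfish", "python-dateutil", "requests"}

def suspicious_neighbours(candidate: str, max_distance: int = 2) -> List[str]:
    """Popular names that a new package name is confusingly close to."""
    name = candidate.lower()
    return sorted(p for p in POPULAR
                  if p != name and edit_distance(name, p) <= max_distance)
```

With this, both `suspicious_neighbours("jeIlyfish")` and `suspicious_neighbours("python3-dateutil")` come back non-empty, so a new upload with either name could be held for manual review.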

~~~
commandersaki
> For that matter, isn't there a reasonable systematic way to catch out
> typosquatters simply based on text analysis?

You could probably write one using Python & jellyfish.

~~~
CGamesPlay
I'm extremely disappointed to see that you didn't suggest using jeIlyfish
instead.

~~~
rmtech
nicely done

------
objectified
This will keep happening, and not only will SSH and GPG keys be the target,
but any interesting data will be stolen.

And the problem is much larger than these typosquatting attacks. Abandoned
GitHub projects taken over by malicious users, rogue Maven/npm/PyPI/what have
you repositories, hacked accounts on any website that is used for distributing
programs, feature branches in open source projects that are automatically built
on CI servers inside corporate networks: the possibilities to grab data and
send it to somewhere on the internet are endless.

One security measure that somehow fell out of fashion over the last few years
is, at least on application servers, to disallow any outgoing network traffic,
especially to the internet (at least any cloud environment I see nowadays
_allows_ it by default). This would largely prevent these sorts of attacks
from being able to actually send anything out, but also prevent XXE attacks
from happening, prevent reverse connections to an attacker host from being set
up, make SSRF attacks harder to verify, and so on.

I strongly recommend whitelisting only the network traffic that your
application actually needs.
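
To make the idea concrete, here is a small model of the allowlist decision. The networks below are invented examples, and real enforcement belongs in the firewall or cloud security group rather than in application code; this only sketches the rule itself:

```python
import ipaddress
from typing import Iterable

# Hypothetical allowlist: an internal database subnet and one external API host.
ALLOWED_EGRESS = ["10.0.5.0/24", "203.0.113.10/32"]

def egress_allowed(destination: str,
                   allowed: Iterable[str] = ALLOWED_EGRESS) -> bool:
    """Default deny: permit a destination only if it falls in a listed network."""
    addr = ipaddress.ip_address(destination)
    return any(addr in ipaddress.ip_network(net) for net in allowed)
```

Under such a policy, `egress_allowed("10.0.5.17")` is true, while an arbitrary internet address such as the upload server mentioned elsewhere in this thread is rejected.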

~~~
pouta
How would this work for a public facing API? Or an API that serves a SPA?

I'm interested in this approach

~~~
objectified
I'm not sure I understand your question correctly, but I'm talking
specifically about outbound network traffic. Your API's application servers
(where such evil libraries could be deployed) should not be able to have any
network connectivity _towards_ the internet. So on that server, you should not
be able to do even `curl www.google.com` for example.

~~~
wil421
GP was asking how you would allow APIs to respond to requests if you are
blocking outbound traffic.

I’m assuming if you open a connection for a sync request you’d be fine. What
about an async request? I’d imagine a scenario where your API needs to do some
processing first, connect to another internal system, and then respond async
to the outside system.

------
OrangeTux
At pypistats.org download numbers of the last half year can be found.

* python3-dateutil has 271 downloads from non-mirrors in last month[1]

* jeilyfish has only 106 downloads from non-mirrors in last month[2]

[1]:[https://pypistats.org/packages/python3-dateutil](https://pypistats.org/packages/python3-dateutil)

[2]:
[https://pypistats.org/packages/jeilyfish](https://pypistats.org/packages/jeilyfish)

~~~
riyakhanna1983
I'm assuming that by "only" you mean there's limited impact. However, if the
malicious package steals user keys, the harm can spread to the packages that
may have received way more downloads.

------
arijun
In the end I think the solution to these issues will be something like what's
promised by the Bytecode Alliance[0]. The idea is you give each package its
own WASM sandbox with granular control over its permissions.

That solution also has the benefit of allowing you to call a package from any
language from your language of choice.

I highly recommend reading their article introducing the idea; it's very
convincing:

[0]
[https://bytecodealliance.org/articles/announcing-the-bytecod...](https://bytecodealliance.org/articles/announcing-the-bytecode-alliance)

~~~
ryukafalz
I agree with this. They don’t mention this explicitly in the article, but it
has a capability-based security model, which is something I think we
desperately need in our OSes. (They do link to a paper about it that mentions
this.)

There are a few other such systems that look interesting; Agoric is working on
one for JavaScript, Google has a kernel patch set that adds capability support
to Linux, and Christopher Lemmer Webber is working on a similar system on top
of Racket called Spritely Goblins. I’m excited about all of them though,
because it feels like this kind of security model is starting to gain public
awareness!

~~~
gnufx
Hear, hear on capability systems, but they seem of limited use confined to
specific language implementations, as opposed to the whole system. I wonder
what's the Google kernel patch, and how it compares with Capsicum. It's rather
tragic to gain public awareness so long after KeyKOS et al...

~~~
ryukafalz
The kernel patchset I was referring to is
[https://github.com/google/capsicum-linux](https://github.com/google/capsicum-linux)
\- it's a Linux version of Capsicum. Though now that I look at it more closely,
it appears to no longer be maintained. :(

That said, the Bytecode Alliance stuff appears to be multi-language, so that's
neat! I could see that making WASM runtimes pretty useful even outside the
web.

------
jacquesm
Package management and curation are the Achilles heel of open source.
Abandoned packages, typo and letter substitutions, maliciously crafted pull
requests and so on are all going to go up in frequency until the environment
is hardened enough that the bulk of these attempts fail. That's a long way to
go, and the number of capable maintainers and curators is relatively small.

Some environments (Python, Node) are more susceptible to this sort of trickery
than others.

------
tus88
You know, allowing people to upload libraries with names that closely
resemble existing ones is almost reckless.

There is a bunch of stuff PyPI could do short of full curation that would make
this much harder.

A better solution is for languages to provide a kind of "module sandboxing"
where modules need to declare the capabilities they need, and the runtime
prevents them from accessing anything else.

In this case, the module would have needed to request the ability to make an
outgoing TCP/IP connection - and that request should raise red flags for a
date parsing utility.

~~~
JackRabbitSlim
MyPy requests permission to write data to the hard drive. MyPy writes
malicious payload to a file and then execs it. MyPy never wrote a damn thing
to a socket, netcat did through a bash/ash shell. (Windows named pipes may be
a little trickier to work with but can yield similar results)

Sandboxing libraries specifically seems like a fool's errand.

~~~
tus88
Writing to HDD would be considered a suspicious permission for most modules.
Like opening network socket or patching another module's namespace.

I don't know exactly how it would work. Maybe modules that request these
permissions get extra scrutiny, or users can specify "levels" that different
modules can run at. If you specify MyPy to run at the least-priv level, it
will fail to install/load if it requests greater capabilities.
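
Purely as a sketch of what I mean: the capability names and levels below are invented, and nothing like this exists in pip or CPython today.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical privilege levels and the capabilities each one grants.
LEVELS: Dict[str, FrozenSet[str]] = {
    "least": frozenset(),
    "files": frozenset({"fs.read", "fs.write"}),
    "full": frozenset({"fs.read", "fs.write", "net.connect", "process.spawn"}),
}

def excess_capabilities(declared: Set[str], level: str) -> Set[str]:
    """Capabilities a module asks for that its assigned level does not grant."""
    return set(declared) - LEVELS[level]

def may_load(declared: Set[str], level: str) -> bool:
    """A module loads only if everything it declares fits inside its level."""
    return not excess_capabilities(declared, level)
```

A date parsing module pinned to "least" that declares `net.connect` would then fail to load. This does nothing by itself about the write-then-exec chain described above; the runtime would also have to treat `fs.write` plus `process.spawn` as equivalent to full access.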

I can't really think of any other way around the problem. It is a problem for
all other scripting languages, especially nodeJS.

------
lovelearning
"pip" has a usability problem. It should do a lot more to prevent this kind
of thing. When using pip, it's not easy to see information like the release
date, how many versions have been released, and so on.

Since such info is available from PyPI API, I wrote my own "pypisearch" script
to sort by latest release date and include number of releases to weed out
packages that seem useful but are old or rarely released. I should probably
integrate PGP signing info too into it.
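
Not the actual script, but the core of such a tool could look roughly like this. It assumes PyPI's JSON endpoint at `https://pypi.org/pypi/<name>/json` and its `releases` and `upload_time` fields; the function names are my own:

```python
import json
from datetime import datetime
from typing import Any, Dict
from urllib.request import urlopen

def fetch_metadata(package: str) -> Dict[str, Any]:
    """Download a package's metadata from PyPI's JSON API (needs network)."""
    with urlopen(f"https://pypi.org/pypi/{package}/json") as resp:
        return json.load(resp)

def summarize(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Count releases and find the most recent file upload time."""
    releases = metadata.get("releases", {})
    uploads = [
        datetime.fromisoformat(f["upload_time"])
        for files in releases.values()   # each release maps to a list of files
        for f in files
        if f.get("upload_time")
    ]
    return {
        "name": metadata["info"]["name"],
        "release_count": len(releases),
        "latest_upload": max(uploads) if uploads else None,
    }
```

Something like `summarize(fetch_metadata("jellyfish"))` then gives enough to spot a package with a single release uploaded last week.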

~~~
iudqnolq
Is the code public? I'd love to use something like that.

~~~
lovelearning
It isn't. I'll make it public and announce it here by Friday.

~~~
iudqnolq
Cool. Feel free to ping me by replying to this when you do.

------
mikorym
> The first is "python3-dateutil," which imitated the popular "dateutil"
> library. The second is "jeIlyfish" (the first L is an I), which mimicked the
> "jellyfish" library.

~~~
qxnqd
I don't get it. Who would type "pip install jeilyfish" by mistake?

~~~
jagged-chisel
I can only see this working where someone would copy and paste the package
name.

EDIT: another vector I saw mentioned in another comment: you pull in what
appears to be a 'valid' dependency, and jeIlyfish is listed as a dependency of
that package; looks legit so you proceed.

~~~
jolmg
I suppose that could happen in a malicious tutorial or comment/post with the
snippet, like in a StackOverflow answer.

~~~
bredren
The attacker would need to leave more footprints to do this, but yes. It is
common for people to pipe up with "I wrote a thing that does this" and I
imagine that results in people picking up odd packages.

I think an experienced programmer probably would be less likely to do this,
but perhaps a junior programmer working on a system that no one wants to
support anymore introduces a "bad" module.

------
frou_dh
These two have been caught. How many haven't yet been caught?

Traditional Unix file permissions are pretty much a joke for the way developer
computers get used (one user - does everything). Real process sandboxing is
needed.

~~~
pnutjam
Not everything needs to run as root.

~~~
swebs
Ok, but your private keys are stored in your home directory. No root access
required.

~~~
otachack
For sure, but does applying a password on pub-priv key creation encrypt it?
Haven't tried it myself yet but may later when I'm back at my computer.

~~~
mikorym
Yes, if you use OpenSSL it will ask if you want to encrypt with a password.
You can verify this by just opening the file in a text editor and you'll see
that the contents are obfuscated via that password.
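
If you would rather check programmatically than by eye, here is a best-effort sketch. It assumes the key is in one of the common PEM-style text formats; for the newer OpenSSH format it relies on my understanding that the decoded blob starts with the magic string `openssh-key-v1` followed by a length-prefixed cipher name, which is `none` for unprotected keys. The function name is made up:

```python
import base64
import struct

OPENSSH_MAGIC = b"openssh-key-v1\x00"

def private_key_is_encrypted(pem_text: str) -> bool:
    """Best-effort check of whether a private key file is passphrase-protected."""
    lines = [ln.strip() for ln in pem_text.strip().splitlines()]
    header = lines[0]
    if "ENCRYPTED" in header:
        return True   # e.g. "-----BEGIN ENCRYPTED PRIVATE KEY-----"
    if any(ln.startswith("Proc-Type:") and "ENCRYPTED" in ln for ln in lines):
        return True   # legacy PEM files carry a Proc-Type header when encrypted
    if header == "-----BEGIN OPENSSH PRIVATE KEY-----":
        blob = base64.b64decode("".join(lines[1:-1]))
        if not blob.startswith(OPENSSH_MAGIC):
            raise ValueError("not an openssh-key-v1 blob")
        offset = len(OPENSSH_MAGIC)
        (length,) = struct.unpack(">I", blob[offset:offset + 4])
        cipher = blob[offset + 4:offset + 4 + length]
        return cipher != b"none"   # cipher name "none" means no passphrase
    return False
```

Usage would be along the lines of `private_key_is_encrypted(open(path).read())` for each file under `~/.ssh`.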

------
Jonnax
I get the python3-dateutil because you might think it's an updated version of
the standard library.

But how does jellyfish with a different char for L work? Someone would need to
copy and paste it. But if they go to pypi, it won't have many installs.

Unless they started writing tutorials with:

"okay now just pip install X"

~~~
mixedmath
I think the key here is that jeIlyfish had malicious code in it, and the fake
python3-dateutil imported jeIlyfish. Even if you were examining the source for
malicious code, you might not notice the difference between jellyfish and
jeIlyfish, I guess.

------
giis
The article points out GitLab account olgired2017
([https://gitlab.com/olgired2017](https://gitlab.com/olgired2017)). I think it
would be interesting if GitLab shared his/her list of active session details,
like IP address, browser, date and time, from their site logs
(profile/active_sessions).

~~~
saagarjha
I would consider this to be a huge breach of privacy.

~~~
tanderson92
I wonder if the users who were affected by this malware think something
similar.

~~~
4bpp
As a general principle, Western legal systems don't let the victims determine
the punishment for a misdeed. Are you suggesting this is not a good thing?

~~~
tanderson92
I wonder what gave you the idea that I was implying that.

Gitlab is of course free to disclose any information they like about those
abusing their platform.

------
rmbryan
What's the best information source for me to follow to keep up to date on
these kinds of library vulnerabilities? I would make a feed of the homepages
for all the libraries I know I use, but that won't help me with the libraries
I use without knowing.

~~~
medecau
"bandit" (available in pypi) is a nice static analysis tool - I don't remember
if it is able to recurse into dependencies though

[safety]([https://pyup.io/safety/](https://pyup.io/safety/)) is a commercial
product that monitors your dependencies for this kind of shenanigans

LGTM.com seemed to be working in this area - Semmle was acquired by
github/microsoft

------
acdha
If you want a concrete hardening step to avoid this attack, try using a
hardware PIV/CAC device (e.g. a Yubikey) as the only copy of your private
keys.

This is very easy to set up on macOS High Sierra or later
([https://support.apple.com/en-us/HT208372](https://support.apple.com/en-us/HT208372)):

1\. Generate the key:
[https://developers.yubico.com/yubico-piv-tool/Actions/key_ge...](https://developers.yubico.com/yubico-piv-tool/Actions/key_generation.html)

2\. Use "ssh-keygen -D /usr/lib/ssh-keychain.dylib" to extract the public key
fingerprint to put in your authorized keys list.

3\. Add this line to your SSH config file to tell the client to attempt to
log in using the key on your device:
“PKCS11Provider=/usr/lib/ssh-keychain.dylib”

On Windows, Putty-CAC supports this and can reportedly be used with Git:
[https://piv.idmanagement.gov/engineering/ssh/#ssh-using-putt...](https://piv.idmanagement.gov/engineering/ssh/#ssh-using-putty-cac)

------
samwillis
Does anyone know of a way of running something like “little snitch” on Heroku
so that you can whitelist outgoing connections?

That seems like a possible way of mitigating this type of issue.

------
roryrjb
I have been working on a Node.js library (very much work in progress and
progressing slowly due to lack of time) that integrates libseccomp to be used
programmatically inside Node.js whether at library or application level. For
me at the moment it's kind of an experimental idea as it can be quite tricky
to get everything right and in order without the kernel killing the process by
mistake, but at least I think for some of the cases in the category that this
issue is in, it will help mitigate it. I believe libseccomp already has
official bindings for Python, as well as third party bindings for other
languages, but this kind of work has been very successful in OpenBSD with
pledge and I think it has been overlooked in dynamic programming languages and
Linux in general.

~~~
roryrjb
It would be good to get some feedback on this instead of just anonymous
downvotes.

I'm thinking the way it would work is that, at the application level, you'd
open sockets and other file descriptors and then lock everything down, so if
some malicious library tried to read a file or spawn a process the application
would be either be killed or return an error, obviously depending on the
application logic and whether or not it can handle the error. I would advocate
this kind of approach as it doesn't need any external hardening, i.e. you get
the benefits whether or not you're inside or outside of a container, whether
or not the application is being run on a distro with SELinux or AppArmor, it's
basically built in. I may be missing some big thing here but like I say it
would be good to get some feedback here.

------
Nokinside
> "jeIlyfish" (with an upper case I) and "python3-dateutil" (not "dateutil").

Libraries should take lessons from writing safety critical code. If you
identify libraries visually by name, the main problems are:

* easily misread characters like 1 (one) and l (lower case L), 0 and O, 2 and Z, 5 and S, or n and h.

* identifier names that differ only by one or a few characters, especially if they are long.

It's possible to enforce a set of rules that make identifier names visually
distinguishable, and to use a string distance measure to check all new
libraries that are added against old names.
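
The visual-similarity rule could be sketched like this: collapse characters that are easily misread into one representative and compare the results. The table only covers the pairs listed above plus upper case I, and is an illustration rather than a complete confusables list:

```python
from typing import Iterable, List

# Look-alike characters mapped to a single representative (applied after
# lowercasing, so an upper case I is handled through "i").
CONFUSABLE = str.maketrans(
    {"1": "l", "i": "l", "0": "o", "2": "z", "5": "s", "h": "n"}
)

def skeleton(name: str) -> str:
    """Lowercase a name and collapse visually similar characters."""
    return name.lower().translate(CONFUSABLE)

def visual_collisions(new_name: str, existing: Iterable[str]) -> List[str]:
    """Existing names that look the same as new_name once collapsed."""
    target = skeleton(new_name)
    return [old for old in existing
            if old != new_name and skeleton(old) == target]
```

Here `visual_collisions("jeIlyfish", ["jellyfish", "requests"])` reports `jellyfish`, so the upload could be rejected or queued for review. The rule is deliberately conservative and will also flag some innocent names.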

~~~
keyP
There's a bit more to it as "dateutil" is actually installed via "pip install
python-dateutil", not simply "pip install dateutil". If someone was to see
"python3-dateutil", there's every chance they think it's the same module but
with Python3 compatibility.

------
megous
Anyone here not encrypting their private keys?

Also known_hosts file is a double edged sword. It's pretty sensitive in
combination with a private key.

~~~
avian
Modern SSH versions only store hashes of domain names in the known_hosts file
exactly for this reason.

~~~
megous
Hmm, you're right, but HashKnownHosts in openssh defaults to no, still.

------
sciurus
Does anyone know where this was officially announced? I don't see anything at
[https://mail.python.org/archives/list/security-announce@pyth...](https://mail.python.org/archives/list/security-announce@python.org/latest)

------
keyP
One thing I've always thought would be a good idea is a tool (either local or
part of the pip/other package manager download process) that greps and prints
out all URLs and IP addresses within the code, including common encodings.
Additionally, any lines that use any transfer protocols (like HTTP requests)
should be highlighted too, as IPs/URLs can be encoded. Any HTTP request, for
example, to suspiciously encoded URLs could raise flags.

The official library itself could have a "urls" file which has a list of urls
that are expected and so anything that doesn't match can be questioned.

Whilst this won't solve the issue 100%, it raises the difficulty barrier to
implement outgoing network calls.
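
A simple version of the scan, as a sketch only: it handles plain-text URLs and IPv4 literals and makes no attempt at the encoded cases. The function names and the substring match against expected hosts are my own simplifications:

```python
import re
from typing import List, Set, Tuple

URL_RE = re.compile(r"https?://[^\s'\"<>)]+", re.IGNORECASE)
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)

def scan_source(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, match) for every URL or IPv4 literal in the text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in (URL_RE, IPV4_RE):
            hits.extend((lineno, m.group(0)) for m in pattern.finditer(line))
    return hits

def unexpected_hits(hits: List[Tuple[int, str]],
                    expected: Set[str]) -> List[Tuple[int, str]]:
    """Drop hits that mention a host from the package's declared "urls" list."""
    return [(n, h) for n, h in hits if not any(e in h for e in expected)]
```

A URL that embeds an IP address is reported twice, once per pattern, which seems acceptable for a tool whose job is to draw attention.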

~~~
jstanley
> it raises the difficulty barrier to implement outgoing network calls.

Not very much. You just obfuscate your code until this tool doesn't notice
anything untoward, and then upload it.

~~~
keyP
Highly obfuscated code would raise suspicions, especially in similar cases
found in NPM packages.

E.g. in Python, obfuscators I've come across tend to replace characters with
non-Latin Unicode chars, which should raise flags when found in
predominantly Latin-based source code.

~~~
jstanley
Only if a person is looking at it.

If the only thing looking at it is a machine, then you can keep iterating
until the machine doesn't notice anything.

~~~
keyP
I agree, it's nowhere near bulletproof, but it's about raising barriers as
well as updating the tool once workarounds are found. I don't see an easy
solution to this issue but in most of the cases (including the ones in this
article) I've seen to date, a simple URL scan would've caught them let alone
more complex methods.

------
kardos
Could Python be retrofitted with import flags, much like the OpenBSD pledge?
E.g. you could import a library but not permit file I/O or network access; that
would snipe this kind of attack, and it would be a reasonable restriction
for many libraries.

------
campuscodi
To search your own projects for the malicious libraries:

pip3 freeze | grep -i jeIlyfish

pip3 freeze | grep -i python3-dateutil
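
The same check from inside Python, as a sketch using `importlib.metadata` (Python 3.8+). The set of bad names is just the two from the article, and the name normalization follows how PyPI treats `-`, `_` and `.` as equivalent:

```python
import re
from importlib import metadata
from typing import Iterable, List, Optional

KNOWN_BAD = {"jeilyfish", "python3-dateutil"}  # normalized names from the article

def normalize(name: str) -> str:
    """Lowercase and treat runs of '-', '_' and '.' as a single '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()

def find_bad(installed_names: Iterable[Optional[str]]) -> List[str]:
    """Installed distribution names that match a known-bad package."""
    return sorted(n for n in installed_names if n and normalize(n) in KNOWN_BAD)

def installed_distribution_names() -> List[Optional[str]]:
    """Names of every distribution visible to the current interpreter."""
    return [dist.metadata["Name"] for dist in metadata.distributions()]

flagged = find_bad(installed_distribution_names())
```

An empty `flagged` list only says these two names are absent from the current environment; each virtualenv has to be checked separately.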

------
panarky
Original source:
[https://github.com/dateutil/dateutil/issues/984](https://github.com/dateutil/dateutil/issues/984)

------
jmstfv
Another insidious exploit is to hijack the maintainer's package manager
account and push the code directly there, bypassing the repository altogether.
It doesn't rely on you installing a new package since the hijacked package in
question is already a dependency of yours.

I got into a habit of checking both the package manager and the project's
repository before updating any given dependency.

[https://news.ycombinator.com/item?id=20377136](https://news.ycombinator.com/item?id=20377136)

------
u801e
Would sticking to the OS upstream package manager be a safer option compared
to installing from pypi directly?

How often does something like this happen with packages in the CentOS, epel,
or Debian repositories?

~~~
jeremyjh
I think it would be quite a bit safer, and you are already trusting your
distribution, but those libraries tend to be very old/stale versions, with a
limited selection.

------
dpc_pw
Ideally programming languages should include capability-based access control,
so that a random library that is supposed to do X, can't do Y.

Until then, we need to vet our dependencies. Check out
[https://github.com/crev-dev/cargo-crev/tree/master/cargo-cre...](https://github.com/crev-dev/cargo-crev/tree/master/cargo-crev)
for a distributed review system we're working on.

------
neop1x
I am curious when the hackers will start stealing ~/.kube/config. In the
default on-premise Kubernetes install, a token or admin certificate is just
lying there, unprotected, and adding a passphrase to the cert is not
supported. Some are using an OAuth identity provider or other mechanisms, but
that unnecessarily complicates the setup, and smaller k8s clusters could be
stolen this way...

------
stelonix
I don't see a solution other than every application being launched in a
separate container, exposing only what the user explicitly gives access to,
similar to how mobile applications require permissions. We have the
technology, e.g. cgroups & unshare on Linux; what's missing is something that
plumbs all these brittle pieces into a secure application launcher made for
desktop (rather than server/cloud) usage.

------
burrnii
Best practice: Don't use pip search && pip install. Search for the project
site and copy & paste the pip install instructions.

------
JakeMimoni
We just recently launched a free tool that helps Python developers prevent
exactly this type of issue.

Feel free to check it out: [https://trustd.dev](https://trustd.dev)

We work preventatively, so as you download packages, the tool will analyse them
and tell you of any issues found.

We’re looking to collaborate with devs to work out what features should be
next.

~~~
andybak
First question is what the hell does this mean:

> As you install open-source packages, trustd will scan them and provide you
> with instant feedback on any problems.

What kind of scanning? Algorithmic? Based on human review? If we're
outsourcing trust to you, I'd want to know a _lot_ more.

And "we use Slack instead of a dashboard" doesn't sound terribly appealing.
I'd want a dashboard _and_ a range of notification options (for me email >
Slack. Others may differ)

~~~
JakeMimoni
might not have explained it the best I could have haha..

It means as you pull packages in from NPM et al, the analysis goes to work,
telling you of any known vulnerabilities, or any license in-compliance.

With regards to Slack, we are hearing that a lot, it isn't the best mechanism
for providing this feedback, and we are working on alternatives now, including
email.

Happy to answer any more questions on here or reach out jake@418sec.com

------
roland35
From the article... The libraries were "jeIlyfish" (with an upper case I) and
"python3-dateutil" (not "dateutil"). Both libraries were close spellings of
the real libraries.

The lesson for developers is to double-check your imports! Spelling does count
and there are lots of similarly named libraries (most are not malicious,
thankfully).

~~~
keyP
Agreed, although I think there's more to this than just a spelling-comparison issue
here. "dateutil" via pip is installed using "pip install python-dateutil". I
can easily see someone thinking "python3-dateutil" is simply a Py3 compatible
version of "python-dateutil". The "python3-dateutil" module imported
"jeIlyfish" so my guess is that the creator banked more on people installing
the fake dateutil library than directly downloading jeIlyfish.

------
abstractbarista
Terrifying but understandable. We need better ways to stop this but I'm not
sure how...

In the meantime, store your private keys on a security device like a YubiKey.
I use it simultaneously for all signing, encrypting, and authenticating (SSH
as well as PAM to my workstations).

Be sure to set a strong PIN on the device. And have a backup!

------
yingw787
Does PyPI offer package notarization and make that observable in the lockfile
or the installation logs? Or offer optimized SEO for notarized packages over
those not notarized in package search? If that’s not there, and I don’t see it
as part of the PyPA roadmap, it might be a good first step to take.

------
swiley
Don’t install packages with large numbers of dependencies (for me, this is
more than 2.)

 _Don’t install packages you haven’t at least been to the website for and
preferably couldn’t build yourself._

Libraries and package managers help us work together, they’re not excuses for
not thinking.

~~~
johnday
Placing the burden of responsibility for security on the end user is _not_ the
way to go about this - at least, not if you want people to actually use your
product / language.

Package managers do not only exist as a convenience. They should also provide
_guarantees_ about their packages, or at the very least some level of
moderation.

~~~
swiley
You could make that argument about repositories not package managers (which
are just software.)

Some do! (Main OS repos (not community ones) do.) But the reality is that this
takes man hours, so repositories have been set up without those guarantees in
the name of efficiency and expectation of responsibility.

I don’t think these are built because people “want users for their language”
but because they were needed. This whole “you shouldn’t do that because it
scares users away” thing tends to result in terrible software IMO.

------
wglb
This shows the inadequacy of thinking “open source makes all bugs shallow”.

~~~
jolmg
On the contrary, this wouldn't even need to be hidden if it was closed source.
It was caught because it's open source.

~~~
wglb
Possibly, but some of the solutions proposed, e.g., monitoring of network
activity, would work either way.

It concerns me that one of these sat out there for a year.

~~~
jolmg
But it's not like malicious activities could only involve the network. Also,
it's possible to obfuscate network activity and hide such things among
legitimate traffic.

> It concerns me that one of these sat out there for a year.

Certainly it being open source doesn't guarantee that someone will notice such
things, but it raises the probabilities. It could have been like that longer
if it were closed source.

------
johnklos
The server to which these libraries upload stolen keys, 68.183.212.246, is
with Digital Ocean. As of Thu Dec 5 12:50:10 UTC 2019, it's still up with ssh
open.

Nice job, Digital Ocean!

------
whb07
The solution to this is WASM: you compile a cross-platform library like this
one to it, and thanks to the sandboxing it can't arbitrarily access memory
outside its own package.

------
tandav
bruh moment:

    $ python -m pip list | grep date
    python-dateutil               2.8.0

Yeah, this is a different library, but that was creepy anyway.
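
If you'd rather not eyeball `grep` output, here is a rough sketch that checks the current environment for the two names from the article, `python3-dateutil` and `jeIlyfish` (capital "I" standing in for the first "l"). The function names are my own, and it only catches exact known names, not new typosquats:

```python
from importlib import metadata

# The two package names reported as malicious in the linked article.
# Compared in lowercase: "jeilyfish" still differs from the real "jellyfish".
KNOWN_BAD = {"python3-dateutil", "jeilyfish"}


def flag_installed(installed_names, bad_names=KNOWN_BAD):
    """Return the installed names whose lowercase form is a known-bad name."""
    return sorted(
        name for name in installed_names
        if name and name.lower() in bad_names
    )


def installed_distribution_names():
    """Names of the distributions visible to the current interpreter."""
    return [dist.metadata["Name"] for dist in metadata.distributions()]


if __name__ == "__main__":
    hits = flag_installed(installed_distribution_names())
    print("suspicious:", hits if hits else "none found")
```

The legitimate `python-dateutil` shown above is not flagged, since only the exact bad names match.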

------
leovander
Some of the permission comments reminded me of deno[0].

[0] [https://deno.land/](https://deno.land/)

------
vortico
There are probably several open-source projects that you can download today
that include unknown malicious code. But they will be discovered eventually.
Proprietary software, on the other hand, can keep malicious code for its
entire relevant lifetime. And in fact, it's rare for proprietary software to
_not_ have malicious code these days, with personal data being sent to
servers, ads being delivered, and installers bundling third party products and
plugins.

~~~
iudqnolq
Complete false equivalency. Exploiting user data deliberately, when you
technically mention it in a thousand page legal document, is completely
different from distributing malware because your dev machines got infected.

------
techntoke
This is why the Node-ecosystem is plagued, because many apps will require
hundreds of unvetted libraries.

------
Havoc
Alarming, given that PyPI is treated as trusted by many, I think (even if it
shouldn't be)

~~~
blub
What is trusted then, the Anaconda base repo?

~~~
Havoc
I don't know the answer frankly.

The serious devs in actual dev shops I've asked answered with something like:

>We do code reviews on the libraries we use. Basically if someone wants a
library they're responsible for checking it

Me: Isn't that a shtload of work with new versions etc

>Yeah so we tend to lag behind official versions quite a bit

It sounded like they host local mirrors of some sort with just the vetted
code. Though I think vetted here is a quick glance over for shady sht rather
than true security vetting

------
bhouston
Maybe SSH keys and GPG keys need to be protected from access by anything local
unless given permission? Right now they are just files sitting there that can
be read by any standard user process right?
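
To make that concrete, here is a small sketch (the helper name `readable_files` is just for illustration) that lists what any process running as you can open in a directory such as `~/.ssh`. Mode 0600 keeps other users out, but not other programs running under your own account:

```python
import os
from pathlib import Path


def readable_files(directory: Path):
    """List regular files in `directory` that this process can open for reading."""
    if not directory.is_dir():
        return []
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and os.access(p, os.R_OK)
    )


if __name__ == "__main__":
    # Any pip-installed code running under your account sees the same list.
    for path in readable_files(Path.home() / ".ssh"):
        mode = oct(path.stat().st_mode & 0o777)
        print(f"{path} is readable by this process (mode {mode})")
```

A passphrase on the key still helps, since the file contents are then encrypted even if they are copied.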

~~~
saurik
On macOS you can easily throw your SSH keys into the secure Keychain (not that
I do this...).

~~~
bhouston
It should be the default on every OS that they are protected, though. We should
not automatically trust locally installed apps anymore. They should be
sandboxed by default like on Android and they should ask for permissions as
they need them.

Windows and Linux need to get with the times.

My dev env should be sandboxed like everything else. Git can have ssh
permissions but not every random tool from pip or the cesspool that is npm.

~~~
swebs
Snap packages on Linux do this already

[https://ubuntu.com/blog/a-guide-to-snap-permissions-and-
inte...](https://ubuntu.com/blog/a-guide-to-snap-permissions-and-interfaces)

------
novaRom
Is it possible to find out where that IP is located?

------
Animats
Can the person responsible be found?

------
codedokode
Why should a Python program have access to SSH keys? Popular Linux
distributions, unlike proprietary systems like Android or iOS, cannot protect
user's data from malicious programs run by the user.

Also, in popular Linux distributions programs can read unique hardware
identifiers like MAC address or HDD serial number, read browser's history and
cookies. These valuable data are not protected by Linux.

~~~
NieDzejkob
Android isn't proprietary.

On the other hand, Windows and macOS, which, as far as I am aware, also have
this problem, are proprietary.

Hence, I don't see why an argument like "unlike proprietary systems" is
justified.

~~~
pjmlp
Windows and macOS have been pushing for sandboxes for quite a while as well,
exactly to prevent this kind of behavior.

Currently macOS is more aggressive than Windows in this area, with Apple now
requiring notarization for all software, which you can still bypass, but you
need to explicitly allow it.

~~~
saagarjha
Apple does not require notarization for all software; it's just enabled by
default and checked on all applications downloaded from the internet and
opened through Launch Services.

~~~
pjmlp
Which is already more than any other FOSS UNIX.

As for macOS, I am sure that it will come, as these features have slowly
been added release after release.

