Hacker News
A Large-Scale Security-Oriented Static Analysis of Python Packages in PyPI (arxiv.org)
72 points by afrcnc 6 months ago | 25 comments

Their conclusion, "security issues are common in PyPI packages", doesn't really follow from the results. Their methods will flag any use of a function that is not cryptographically secure (MD5, random), even if it is not used in a cryptographic setting. Similarly, any use of a function that is not safe to use on untrusted input (pickle, yaml.load, subprocess, eval) will be flagged, even if the usage is completely safe.
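To make the distinction concrete, here is a minimal sketch. The function names are hypothetical and nothing here comes from the paper. The first two functions use the kind of primitive that gets flagged, but in a setting where no security property is relied on; the third shows what a security-sensitive use would look like instead.

```python
import hashlib
import random
import secrets


def cache_key(payload: bytes) -> str:
    # MD5 used as a plain fingerprint (cache key, deduplication).
    # Nothing here depends on resistance to a deliberate collision.
    return hashlib.md5(payload).hexdigest()


def shuffle_playlist(tracks: list) -> list:
    # The random module is fine for non-security randomness such as shuffling.
    shuffled = list(tracks)
    random.shuffle(shuffled)
    return shuffled


def session_token() -> str:
    # Anything an attacker must not be able to guess should come from secrets.
    return secrets.token_hex(16)
```

A pattern-based scanner sees `hashlib.md5` and `random` in all such code and cannot tell which of these situations it is looking at.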

Yes this, but also you shouldn't do an analysis like this on all of PyPI. Anyone can upload to it. It's full of abandoned experiments, name-squatting, and college students uploading hello world libraries just to learn how to do it. Analyzing those is pointless because nobody is using them and nobody is going to use them.

Also listing the subprocess module as a standout because of code injection seems silly. That's the entire point of it existing. You may as well say a shell is insecure because it allows injecting shell commands. Obviously, don't put strings from untrusted sources in there, but Python is largely intended for system administration automation, the first thing to turn to when the shell isn't enough if you don't like Perl. It would be pretty useless if you couldn't actually use it to orchestrate arbitrary shell commands.

I was with you until you said that Python is largely intended for system administration automation.

Python is used in a wide variety of application environments, especially the web. For example, early versions of YouTube were nothing more than a Python application. Many websites run Django or other web frameworks. Python is heavily used in scientific computing environments, in data retrieval and visualization, as well as fintech (i.e. hedge funds).

Its use in system administration/configuration management is real and significant, but represents a smallish proportion of its use.

Certainly some static analysis would be beneficial here, especially in areas like science or financial technology, but much of your original point remains completely true: much of the concern raised speaks more to system design around passing strings as objects of reference, and to security design, than to language issues.

Instagram still runs on Python as well, I believe.

> For example, early versions of Youtube were nothing more than a Python application

I heard about it, but I'm wondering, is that how Python got popular at Google and then became mainstream?

Python applications were widely used before either Google or YouTube existed

Yes, but Python is actually older than Java, yet the boom started much later for Python.

There are two reasons I see for this based on my experience.

First, as the other commenter mentioned, Java was targeted at large enterprise. I remember hearing about Java Beans, and Java Enterprise this and that. It fit with the model of development many were seeing in the late 90s as well: UML diagrams written by some middle manager. There were even tools like Rational Rose that would transform your UML into Java outlines.

Python was an academic language for the most part and so it took time to catch up.

This is the same time period when Linux was dismissed as a toy or hacker's tool. Serious people worked on Sun Microsystems computers, or Windows NT. Sun and NT were "Real tools for business" and Linux was for people like me: kids in their dorm rooms.

It took time for the industry to catch up, which it did largely out of necessity. A company like Google could run thousands of computers for a fraction of the cost of doing so with NT or Sun, and that gave them a competitive advantage.

With Python, the advantage was different: it wasn't the cost as much as it was the simple convenience and network effect of libraries and a robust community.

Sun could throw money towards documentation, education and marketing. Python was a grass roots movement that only later had a foundation and corporate involvement.

Yes, but I remember that at the time a big selling point was that Google used it.

It's similar now with golang: it is a mediocre language, but because Google is using it, it got very popular.

A good reason is that Java was explicitly engineered and targeted towards enterprise use-cases from the beginning by Sun, whereas Python was created as a means to teach programming with a simpler language.

It is possible to use subprocess safely, you just have to be aware of the sharp edges of it: shell=True, sanitising commands, option injection.
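A rough sketch of what avoiding those sharp edges can look like. The helper names are made up for illustration, and `ls` is just an example target. The list form of `subprocess.run` with the default `shell=False` never hands the string to a shell, and the conventional `--` end-of-options marker (honoured by most POSIX tools, but not all) keeps a value like `-rf` from being read as a flag.

```python
import subprocess
import sys
from typing import List


def build_listing_argv(untrusted_name: str) -> List[str]:
    # "--" ends option parsing for most POSIX tools, so a name that starts
    # with "-" is treated as an operand rather than an option.
    return ["ls", "-l", "--", untrusted_name]


def run_without_shell(untrusted: str) -> str:
    # Demo child process: a Python interpreter that prints its first argument.
    # The untrusted text is one argv element; no shell ever parses it.
    child = [sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted]
    result = subprocess.run(child, capture_output=True, text=True, check=True)
    return result.stdout.rstrip("\n")
```

With this shape, `run_without_shell("hello; echo injected")` hands the whole string to the child as data; the `;` has no special meaning because nothing interprets it as shell syntax.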

One of the Bandit maintainers here (the tool used for this research). Static analysis results cannot be used to judge the overall security posture of an application.

Bandit can and often has found vulnerabilities, but it's not something you can run and expect accurate results every time.

It requires human review as it will get things wrong and require adjustments to skip false positives at each later run.
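For anyone who hasn't used it, the "adjustments" usually mean marking a reviewed line so it is skipped on later runs. A small sketch using Bandit's `# nosec` comment marker; the function is hypothetical, and exactly which findings a given Bandit version raises for this code is not something verified here.

```python
import subprocess
import sys

# Fixed argv with no untrusted input, so a reviewer judged the call below safe.
VERSION_ARGV = [sys.executable, "--version"]


def interpreter_version() -> str:
    # The trailing "# nosec" tells Bandit to skip findings on this line.
    result = subprocess.run(VERSION_ARGV, capture_output=True, text=True, check=True)  # nosec
    return (result.stdout or result.stderr).strip()
```

The marker records a human decision; it does not make the code any safer, which is why the review step matters.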

This is a really weak paper IMHO, and I would give it a strong -1 if I was a reviewer.

These people are mindlessly applying the results of a static analysis tool which, as most similar tools do, reports ginormous amounts of false positives, and conclude without even a caveat "half of the packages on PyPI have at least one security issue".

That's about as useful as administering an unreliable COVID test that gives 50% false positives and concluding "half of the world population has COVID".

From the PDF, these are some of the security vulnerabilities that the static analysis came up with:

Use of the exec function

Insecure permissions for files

Binding a socket to all network interfaces

Use of hard-coded passwords in non-function contexts

Use of hard-coded passwords in function arguments

Use of hard-coded passwords in default function arguments

Use of hard-coded temporary directories

Using pass as a catch-all-style exception handling

Using continue as a catch-all-style exception handling

Running a Flask web application in debug mode

Use of insecure deserialization

Use of insecure deserialization

Use of MD2, MD4, MD5, or SHA1 hash functions

Use of insecure ciphers such as DES

Use of insecure cipher modes

Use of the insecure mktemp function

Use of the possibly insecure eval function

Use of the possibly insecure mark_safe function

Use of the insecure HTTPSConnection with some Python versions

Use of a file scheme in urlopen with some Python versions

Use of pseudo-random generators for cryptography/security tasks

Use of the insecure Telnet protocol

Use of possibly insecure Extensible Markup Language (XML) parsing

Most of those can be used securely (e.g. the mark_safe() function is specifically intended for contexts where the user understands what they are doing; some people may mess it up, but its presence does not indicate a security vulnerability). So the "at least one issue is present for about 46% of the Python packages" number doesn't worry me too much.

I agree most can be used securely, but I can't think of any legitimate use for DES in 2021. The old hash functions obviously have utility in non-secure contexts, but not really symmetric encryption.

I can think of one specific use case: supporting an upgrade path for older versions of a given application.

That's about it though.

s/can/will most likely/. The typical python programmer has a whistle and readily available graveyard. :-)

why is mktemp insecure?

Python's mktemp has been deprecated since 2003. I think there's a race condition that allows a malicious process to slip in a file or symlink with different permissions at the path mktemp returns.

Just for clarity, the race condition seems deliberate. It just returns a path and leaves it to you to create the file.

Yes — the core problem is you can’t be guaranteed exclusive write access to a path without actually creating a file or directory.

The warning in the docs (https://docs.python.org/3/library/tempfile.html#tempfile.mkt...) aims to show an example of how to get “just a path” by making and deleting a NamedTemporaryFile, but the example is confusing and has unnecessary steps. Seems like you could just do `with NamedTemporaryFile() as f: path = f.name`.

Also, the warning seems to imply using `NamedTemporaryFile` addresses the race condition, but it doesn’t — the problem exists even for “secure” methods any time you re-use the path after cleanup.
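For comparison, a minimal sketch of the create-it-yourself route with `tempfile.mkstemp`, which creates and opens the file in one step and returns an open descriptor along with the path. The `write_temp` helper and the `demo-` prefix are made up for illustration.

```python
import os
import tempfile


def write_temp(data: bytes) -> str:
    # mkstemp creates the file exclusively and returns an already-open
    # descriptor, so there is no window between picking a name and creating it.
    fd, path = tempfile.mkstemp(prefix="demo-")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    # The caller owns the file and is responsible for os.remove(path).
    return path
```

This only covers the initial creation. As noted above, re-using the path after the file has been removed brings the race back.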

The article is about running Bandit over PyPI, but I presume there are other options for this than Bandit. What is everyone's favourite security oriented static analysis tool for Python code?

I've done extensive research in this area and looked at existing tools, including Bandit, for scanning the whole PyPI repository and monitoring what is being uploaded there. My conclusion was that most of the tools are not up to this task, so I made a new framework from scratch that is specially designed for this purpose. It's called Aura: https://github.com/SourceCode-AI/aura

This kind of research always interests me and at first glance I had a concern with this paper given that they mention:

> The dataset is based on a simple index file provided in the Python Package Index [76]. In total, 224,651 packages were listed in the index at the time of retrieving it.

For separate research reasons I've recently had cause to download the index as well and current versions are 315,000+ packages. The reference [76] indicates they retrieved it on March 28, 2020.

Initially I had thought that almost 100k packages in a little over a year had to be incorrect given that the first archived index from 2018 had around 170k packages listed (meaning 170k -> 225k in 2 years).

This increase probably just highlights how much Python has exploded in popularity. However, it does cast some doubt on the effectiveness of this kind of research on the basis that there are a lot of new packages and likely noise in that dataset.

A follow-up and perhaps more useful bit of research would be to do this same analysis with the top downloaded packages visible via the published stats[1] and then perform evaluations as to whether Bandit was actually identifying vulnerabilities. I have no doubt that of the 197,726 packages they actually scanned there was a lot of noise. Also, if a package has fewer than some cutoff of downloads in the past month (perhaps 10, 100?) or is newer than a certain date it may make sense to exclude it.
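A sketch of the kind of pre-filter meant here. The `PackageStats` record, the thresholds, and the sample values are all hypothetical; real download numbers would have to come from the published stats.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass
class PackageStats:
    name: str
    downloads_last_month: int
    first_upload: date


def select_for_scan(
    packages: Iterable[PackageStats],
    min_downloads: int = 100,
    newest_allowed: date = date(2020, 1, 1),
) -> List[str]:
    # Keep only packages that are actually used and not brand new.
    return [
        pkg.name
        for pkg in packages
        if pkg.downloads_last_month >= min_downloads
        and pkg.first_upload <= newest_allowed
    ]


sample = [
    PackageStats("widely-used", 250_000, date(2015, 6, 1)),
    PackageStats("hello-world-test", 3, date(2019, 9, 1)),
    PackageStats("brand-new", 5_000, date(2020, 3, 1)),
]
selected = select_for_scan(sample)
```

Here only `widely-used` survives: the second entry falls under the download cutoff and the third is newer than the date cutoff.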

The authors mention the accuracy of static analysis tools being a potential problem, but the fact that no spot checking was done to see if it was even remotely correct is a bit of a problem given the conclusion. I admittedly skimmed over certain sections but I didn't notice any discussion of the "confidence" metric that Bandit uses, and this is a huge problem. The only low-confidence findings Bandit reported were injections. Consider that for a moment: the static analysis tool reported no other type of vulnerability that was low confidence, and that includes a break-out for XSS. Every other category was Medium- or High-confidence only.

Having worked extensively with a variety of static analysis tools, I find they vary in quality by language and detection capabilities but are generally very poor measures of application security and are often rife with false positives. The underlying premise of the paper's conclusion is that Bandit is a trustworthy enough tool to merit the conclusion that "security issues are common in PyPI packages."
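A quick way to look at that breakdown yourself is to tally a Bandit JSON report by check and confidence. This is a sketch: the sample report below is invented, and the field names (`results`, `test_id`, `issue_confidence`) are what Bandit's JSON formatter emits as far as I know, so check them against your own output.

```python
import json
from collections import Counter

# Invented sample shaped like the output of `bandit -f json`.
SAMPLE_REPORT = """
{
  "results": [
    {"test_id": "B608", "issue_severity": "MEDIUM", "issue_confidence": "LOW"},
    {"test_id": "B303", "issue_severity": "MEDIUM", "issue_confidence": "HIGH"},
    {"test_id": "B608", "issue_severity": "MEDIUM", "issue_confidence": "LOW"}
  ]
}
"""


def confidence_breakdown(report_json: str) -> Counter:
    # Count findings per (check id, confidence level) pair.
    report = json.loads(report_json)
    return Counter(
        (finding["test_id"], finding["issue_confidence"])
        for finding in report["results"]
    )


breakdown = confidence_breakdown(SAMPLE_REPORT)
```

Grouping this way makes it obvious when one category carries all of the low-confidence findings.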

Having some experience with it in the past, I disagree with that foundational assumption. Bandit is good at finding certain classes of issues but is overall not something to rely upon for anything more than sanity-check catching egregious types of problems. For example, I'd rate it quite highly on detecting the use of the "generally avoid this" functions. A good regex could also detect these. I would not trust the XSS findings, though no doubt some are correct.

[1]: https://packaging.python.org/guides/analyzing-pypi-package-d...
