Also listing the subprocess module as a standout because of code injection seems silly. That's the entire point of it existing. You may as well say a shell is insecure because it allows injecting shell commands. Obviously, don't put strings from untrusted sources in there, but Python is largely intended for system administration automation, the first thing to turn to when the shell isn't enough if you don't like Perl. It would be pretty useless if you couldn't actually use it to orchestrate arbitrary shell commands.
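To make the "don't put untrusted strings in there" part concrete, here is a minimal sketch, not anything from the paper. `run_echo` is a made-up helper name, and the child process is just the current Python interpreter echoing its argument. The point is that an argument list never goes through a shell, so metacharacters stay literal.

```python
import subprocess
import sys

def run_echo(untrusted: str) -> str:
    """Pass untrusted text as one argv element; no shell ever parses it."""
    # The child is the current interpreter, which keeps the sketch portable.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")

# Shell metacharacters arrive in the child as plain text.
print(run_echo("hello; echo injected"))

# The risky form, shown only as a comment: with shell=True the string is
# handed to a shell, so "; echo injected" would run as a second command.
#   subprocess.run(f"echo {untrusted}", shell=True)
```

Both forms are "use of subprocess" to a pattern matcher, which is why a raw count of subprocess calls says little on its own.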
Python is used in a wide variety of application environments, especially the web. For example, early versions of YouTube were nothing more than a Python application. Many websites run Django or other web frameworks. Python is heavily used in scientific computing environments, in data retrieval and visualization, as well as in fintech (i.e. hedge funds).
Its use in system administration/configuration management is real and significant, but represents a smallish proportion of its use.
Certainly some static analysis would be beneficial here, especially in areas like science or financial technology, but much of your original point remains completely true: much of the concern raised speaks more to system design around passing strings as objects of reference, and to security design, than to language issues.
I heard about it, but I'm wondering, is that how Python got popular at Google and then became mainstream?
First, as the other commenter mentioned, Java was targeted at large enterprise. I remember hearing about Java Beans, and Java Enterprise this and that. It also fit the model of development many were seeing in the late 90s: UML diagrams written by some middle manager. There were even tools like Rational Rose that would transform your UML into Java outlines.
Python was an academic language for the most part and so it took time to catch up.
This is the same time period when Linux was dismissed as a toy or hacker's tool. Serious people worked on Sun Microsystems computers, or Windows NT. Sun and NT were "Real tools for business" and Linux was for people like me: kids in their dorm rooms.
It took time for the industry to catch up, which it did largely out of necessity. A company like Google could run thousands of computers for a fraction of the cost of doing so with NT or Sun, and that gave them a competitive advantage.
With Python, the advantage was different: it wasn't the cost so much as the simple convenience and network effect of libraries and a robust community.
Sun could throw money towards documentation, education and marketing. Python was a grassroots movement that only later had a foundation and corporate involvement.
It's similar now for golang: it is a mediocre language, but because Google is using it, it got very popular.
Bandit can find vulnerabilities, and often has, but it's not something you can run and expect accurate results every time.
It requires human review as it will get things wrong and require adjustments to skip false positives at each later run.
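As a rough illustration of what those adjustments look like in practice: Bandit honours an inline `# nosec` comment that suppresses findings on that line. The sketch below assumes a reviewer has decided the SHA-1 use is harmless. `cache_key` is a hypothetical function, not taken from any scanned package.

```python
import hashlib

def cache_key(payload: bytes) -> str:
    """Derive a short, stable name for a cache entry."""
    # Reviewed by a human: SHA-1 only names a cache entry here and
    # protects nothing, so the weak-hash warning is a false positive.
    return hashlib.sha1(payload).hexdigest()  # nosec

print(cache_key(b"example payload"))
```

The marker records a human decision, which is exactly the review step a bulk scan of an entire index skips.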
These people are mindlessly applying the results of a static analysis tool which, as most similar tools do, reports ginormous numbers of false positives, and concluding without even a caveat that "half of the packages on PyPI have at least one security issue".
That's about as useful as administering an unreliable COVID test that gives 50% false positives and concluding "half of the world population has COVID".
Use of the exec function
Insecure permissions for files
Binding a socket to all network interfaces
Use of hard-coded passwords in non-function contexts
Use of hard-coded passwords in function arguments
Use of hard-coded passwords in default function arguments
Use of hard-coded temporary directories
Using pass as a catch-all-style exception handling
Using continue as a catch-all-style exception handling
Running a Flask web application in debug mode
Use of insecure deserialization
Use of MD2, MD4, MD5, or SHA1 hash functions
Use of insecure ciphers such as DES
Use of insecure cipher modes
Use of the insecure mktemp function
Use of the possibly insecure eval function
Use of the possibly insecure mark_safe function
Use of the insecure HTTPSConnection with some Python versions
Use of a file scheme in urlopen with some Python versions
Use of pseudo-random generators for cryptography/security tasks
Use of the insecure Telnet protocol
Use of possibly insecure Extensible Markup Language (XML) parsing
Most of those can be used securely (e.g. the mark_safe() function is specifically intended for contexts where the user understands what they are doing; some people may mess it up, but its presence does not indicate a security vulnerability). So the "at least one issue is present for about 46% of the Python packages" number doesn't worry me too much.
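Another item from that list, pseudo-random generators, makes the same point. The sketch below is only an illustration with made-up function names (`roll_die`, `session_token`): the first call is the kind of thing a pattern-based check flags, yet nothing about it is a vulnerability, while the second is where the choice of generator actually matters.

```python
import random
import secrets

def roll_die() -> int:
    """Non-security use: a game die. Predictability harms nobody here."""
    return random.randint(1, 6)

def session_token() -> str:
    """Security use: the value must be unpredictable, so use `secrets`."""
    return secrets.token_hex(16)  # 16 random bytes, rendered as 32 hex chars

print(roll_die(), session_token())
```

A tool that only sees the `random` call cannot tell which of these two situations it is looking at.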
That's about it though.
The warning in the docs (https://docs.python.org/3/library/tempfile.html#tempfile.mkt...) aims to show an example of how to get “just a path” by making and deleting a NamedTemporaryFile, but the example is confusing and has unnecessary steps. Seems like you could just do `with NamedTemporaryFile() as f: path = f.name`.
Also, the warning seems to imply using `NamedTemporaryFile` addresses the race condition, but it doesn’t — the problem exists even for “secure” methods any time you re-use the path after cleanup.
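A small sketch of what I mean, using only the standard library. The variable names are mine, and the private-directory approach at the end is one possible way to sidestep the problem, not something the docs prescribe.

```python
import os
import tempfile

# Getting "just a path": the file is deleted when the with-block exits.
with tempfile.NamedTemporaryFile() as f:
    path = f.name
path_exists_after_close = os.path.exists(path)  # the name is free again

# Between this point and any later open(path, "w"), another process could
# create a file or symlink at that name. That window is the race, and it
# exists no matter which function produced the name.

# One way to avoid reusing a bare name: create files inside a directory
# that tempfile made for this process and cleans up afterwards.
with tempfile.TemporaryDirectory() as tmpdir:
    private_path = os.path.join(tmpdir, "scratch.txt")
    with open(private_path, "w") as out:
        out.write("data")
    wrote_ok = os.path.exists(private_path)

print(path_exists_after_close, wrote_ok)
```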
> The dataset is based on a simple index file provided in the Python Package Index. In total, 224,651 packages were listed in the index at the time of retrieving it.
For separate research reasons I've recently had cause to download the index as well, and the current version lists 315,000+ packages. The reference indicates they retrieved it on March 28, 2020.
Initially I had thought that almost 100k packages in a little over a year had to be incorrect given that the first archived index from 2018 had around 170k packages listed (meaning 170k -> 225k in 2 years).
This increase probably just highlights how much Python has exploded in popularity.
However, it does cast some doubt on the effectiveness of this kind of research on the basis that there are a lot of new packages and likely noise in that dataset.
A follow-up and perhaps more useful bit of research would be to do this same analysis with the top downloaded packages visible via the published stats and then perform evaluations as to whether Bandit was actually identifying vulnerabilities. I have no doubt that of the 197,726 packages they actually scanned there was a lot of noise. Also, if a package has fewer than some cutoff of downloads in the past month (perhaps 10, 100?) or is newer than a certain date it may make sense to exclude it.
The authors mention the accuracy of static analysis tools being a potential problem, but the fact that no spot checking was done to see if it was even remotely correct is a bit of a problem given the conclusion. I admittedly skimmed over certain sections, but I didn't notice any discussion of the "confidence" metric that Bandit uses, and this is a huge problem. Every injection finding Bandit reported was low confidence. Consider that for a moment: the static analysis tool reported no other type of vulnerability that was low confidence, and that includes a break-out for XSS. Every other category was Medium- or High-confidence only.
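For anyone who wants to check this on their own scan, a rough sketch of the tally is below. It assumes a parsed Bandit JSON report whose `results` entries carry `test_id` and `issue_confidence` fields. The sample data is hand-written for illustration, is not real scan output, and the test IDs in it are placeholders.

```python
from collections import Counter

def tally_by_confidence(report: dict) -> Counter:
    """Count findings per (test_id, confidence) pair in a parsed report."""
    return Counter(
        (r["test_id"], r["issue_confidence"]) for r in report.get("results", [])
    )

# Hand-written sample, NOT real scan output.
sample_report = {
    "results": [
        {"test_id": "B608", "issue_confidence": "LOW", "issue_severity": "MEDIUM"},
        {"test_id": "B608", "issue_confidence": "LOW", "issue_severity": "MEDIUM"},
        {"test_id": "B303", "issue_confidence": "HIGH", "issue_severity": "MEDIUM"},
    ]
}

counts = tally_by_confidence(sample_report)
# Test IDs that appear with LOW confidence at least once.
low_only = {test for (test, conf) in counts if conf == "LOW"}
print(counts, low_only)
```

Breaking the headline number down this way would show how much of it rests on findings the tool itself marks as low confidence.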
Having worked extensively with a variety of static analysis tools, I can say they vary in quality by language and detection capabilities, but they are generally very poor measures of application security and are often rife with false positives. The underlying premise of the paper's conclusion is that Bandit is a trustworthy enough tool to merit the conclusion that "security issues are common in PyPI packages."
Having some experience with it in the past, I disagree with that foundational assumption. Bandit is good at finding certain classes of issues but is overall not something to rely upon for anything more than a sanity check that catches egregious types of problems. For example, I'd rate it quite highly on detecting the use of the "generally avoid this" functions. A good regex could also detect these. I would not trust the XSS findings, though no doubt some are correct.
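To show what I mean by "a good regex", here is a quick sketch. The pattern and the `find_risky_calls` helper are my own illustration, not how Bandit works internally.

```python
import re

# Matches a bare call to eval( or exec(, but not names such as
# "my_eval(" or attribute calls such as "cursor.exec(".
RISKY_CALL = re.compile(r"(?<![\w.])(eval|exec)\s*\(")

def find_risky_calls(source: str):
    """Return a (line_number, function_name) tuple for every match."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for m in RISKY_CALL.finditer(line):
            hits.append((lineno, m.group(1)))
    return hits

sample = "x = eval(user_input)\nresult = my_eval(y)\n# never call exec(code)\n"
print(find_risky_calls(sample))  # [(1, 'eval'), (3, 'exec')]
```

Note that the third line is only a comment and still gets reported. That is the false-positive problem in miniature: detecting that a name appears is easy, deciding whether it matters is not.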