I'm not sure I understand what this piece is trying to say about Python memory safety. Conventionally, in software security, Python is considered a memory-safe language. The piece makes the case that Python isn't memory safe when you FFI into a C library. But by that standard, neither is Rust when you use `unsafe`. What matters in both cases is how little unsafe code you end up writing.
Memory safety is a software security concern. You can squint and make it about reliability or resiliency, but the reason we talk about memory safety is (to a first approximation) browser vulnerabilities.
The piece goes on to discuss data races. I'm a little keyed up on software security essays that bring data race safety into the discussion. I have a hard time not reading them as shibboleths for "Rust is the only safe language", which is manifestly false.
The vulnerabilities endemic to memory-safe languages (logic and higher-level vulnerabilities like SQLI, metacharacter quoting, filesystem traversal, and cryptography bugs) are common both to languages like Python and Java and also to Rust --- the only super-common class of vulnerability endemic to languages like Python and Java that Rust avoids is deserialization (you avoid deserialization vulnerabilities by not building hypercapable serialization formats).
Data races are a common source of reliability bugs. They're a meaningful software engineering concern. In exotic scenarios (userspace-sandboxed attacker-controlled code), they can constitute practical security vulnerabilities. But in the main, data races have not empirically proved out as a source of exploited vulnerabilities. If you have a fixed budget to transition off a C codebase, one that would let you migrate to Python now or, if you saved up, to Rust next year, and all you care about is security, then ceteris paribus you should do the Python thing. The data races aren't going to burn you.
To me it’s a deep philosophical post in the vein of “what even is memory safety anyway?”
> The piece makes the case that Python isn't memory safe
It’s a philosophy argument tactic. Take something everyone considers to be true “Python is memory safe” then push it to logical extremes. The purpose of this isn’t to learn anything about Python, the purpose is to learn about the extremes. In this case about memory safety.
I think the overall point is that “memory safe languages don’t truly exist” in the purist sense, since every language must touch unsafe code at some point. However, some languages and tools do a better job of isolating these interactions. We call these tools “memory safe”.
> Conventionally, in software security, Python is considered a memory-safe language. The piece makes the case that Python isn't memory safe when you FFI into a C library.
Interesting and largely unknown trivia: it's possible to trigger memory errors in the underlying C interpreter from pure Python code — no libraries and no imports needed!
One way of doing this is by creating new `code` objects with crafted bytecode. There is no bytecode verifier in Python to make sure, say, referenced stack variables in the VM are valid...
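A minimal sketch of what this looks like (opcode encodings vary across CPython versions, and the oparg value here is just an illustrative out-of-range index; this only constructs the object and deliberately never executes it):

```python
import dis
import types

def f():
    return None

# Hand-build bytecode: LOAD_CONST with a constant index far beyond
# the end of co_consts. CPython has no bytecode verifier, so nothing
# checks that index 200 is valid before execution.
LOAD_CONST = dis.opmap["LOAD_CONST"]
RETURN_VALUE = dis.opmap["RETURN_VALUE"]
bad_bytecode = bytes([LOAD_CONST, 200, RETURN_VALUE, 0])

# .replace() accepts arbitrary bytes -- no verification happens here.
crafted = f.__code__.replace(co_code=bad_bytecode)
print(type(crafted))  # <class 'code'>

# Running it is left commented out: depending on the CPython version,
# executing this can return garbage or crash the interpreter outright.
# types.FunctionType(crafted, {})()
```

The point is that `CodeType.replace` (and the `CodeType` constructor) will happily hand you a code object containing whatever bytes you like; validity is only "checked" by whatever the VM happens to do when it runs them.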
Is this a bug that might be fixed in the future, or is it considered an unavoidable consequence of some design decision that will stay that way for the foreseeable future?
From what I understand about Rust, if something similar was possible in safe Rust it would be considered a bug and eventually fixed.
I think it will stay that way for the foreseeable future (but who can say). Ways to fix the particular hole:
(1) disable creating new `code` objects directly from Python. This probably would break lots of things.
(2) Add a bytecode verification mechanism that would reject `code` objects whose bytecode would result in memory errors when executed. This could be a lot of implementation work; I'm not sure.
You also don't need to FFI into some buggy C library to violate memory safety with ctypes. It's trivial to produce a segfault using nothing but ctypes itself, which is part of the standard library. I doubt I'd have much trouble finding other ways to make a segfault with pure Python and the standard library (`struct` springs to mind).
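For instance (the one-line segfault is commented out so the snippet survives; the refcount peek assumes the standard 64-bit CPython object layout, an internal detail that differs on, e.g., the free-threaded build):

```python
import ctypes
import sys

# One line is enough to segfault CPython: dereference address 0.
# ctypes.string_at(0)

# Less destructively, ctypes lets pure Python read raw process memory.
# Reading the first machine word at a live object's address recovers
# its refcount on a conventional CPython -- an internal layout detail,
# not a documented API.
obj = [1, 2, 3]
word = ctypes.c_ssize_t.from_address(id(obj)).value
print(word, sys.getrefcount(obj))
```

Nothing here is a bug in ctypes; arbitrary memory access is its job. It just means "memory safe" for CPython-plus-stdlib is a convention, not a guarantee.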
CPython really isn't very safe at all. Its focus has always been on being a convenient, dynamic scripting language with minimal-fuss access to native code. It has never been hard to violate its internal assumptions and it probably never will be.
Actually Rust does go quite far in reducing the probability of these bugs, even if it doesn't have specific features for it. This is through a combination of:
* really strong type system ("if it compiles it works")
* Better ergonomics, e.g. using prepared queries is much easier than in C.
* Library code being generally very high quality, and easy to obtain.
Data races are definitely exploited! If we are considering TOCTOU issues then this is a very easy way to get fairly reliable and simple exploits. If we are talking about races of the “two threads access the same value” kind then it’s easy (well, achieving reliability is left as an exercise for the reader) to turn this into a UAF or OOB access by having one thread work with a stale version of an object that has been modified elsewhere.
Right, my understanding is that a data race is the second thing I mentioned. I was just so surprised to hear this viewpoint that I figured I’d throw it in just in case we were talking about different things.
While data races may not be a top category empirically, they are undefined behavior, which means that (a future version of) the compiler is allowed to make your program do anything at all after a data race happens. We are setting the bar incredibly low for ourselves if we just accept that things like that happen on the regular.
> But in the main, data races have not empirically proved out as a source of exploited vulnerabilities.
Say what? Data races, otherwise lumped under the bucket “timing attacks”, are a common source of security exploits. A basic example is racing with code that is creating a file and applying an ACL in two steps. If I can “time” things right from a concurrent thread/process, I can get into this file before the ACL prevents me.
There are countless scenarios where multi-step operations that need to be treated atomically can be exploited by racing.
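The file-then-ACL example above can be sketched in Python using POSIX permissions as the stand-in for an ACL (paths and the 0o600 mode are illustrative; this shows the race window, not an actual exploit):

```python
import os
import stat
import tempfile

d = tempfile.mkdtemp()

# Vulnerable two-step pattern: create, *then* restrict. Between the
# two steps the file exists with default umask-derived permissions,
# and a concurrent process can open it before the chmod lands.
racy = os.path.join(d, "racy.txt")
with open(racy, "w") as f:
    f.write("secret")
# <-- race window: an attacker can open the file right here
os.chmod(racy, 0o600)

# Safer pattern: have the kernel create the file with its final
# permissions in one atomic step, so there is no window to race.
safe = os.path.join(d, "safe.txt")
fd = os.open(safe, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
os.write(fd, b"secret")
os.close(fd)

mode = stat.S_IMODE(os.stat(safe).st_mode)
print(oct(mode))
```

The fix isn't a language feature at all: it's collapsing the multi-step operation into one atomic syscall, which is exactly why this class of race is orthogonal to Rust-vs-Python.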
That's not a vulnerability Rust prevents; it's an interaction between multiple competing runtimes. I'm not denying that race conditions (or timing attacks, another bug class Rust doesn't prevent) exist and are exploited! I'm denying that in-process data races that corrupt memory are a meaningful source of exploitable vulnerabilities.
For background, I've spent most of my career doing vulnerability research. I'm by no means a world expert on memory corruption vulnerabilities (I'm still impressed that I got my imapd shellcode to work with no uppercase ASCII characters), but you can safely assume I'm not just completely blowing off huge classes of exploitable vulnerabilities because I've never heard of them. Doesn't mean I'm right! But like, if you're going "say what", you're probably misconstruing me.
OK, but I think you are moving the goal posts. You referred to “data races” and “security exploits” and suggested the two were not related. Memory corruption is only one (small) class of security exploits. Data races cause just as many in-process, in-memory exploits as multi-step file operations (we are talking breaking application security models). Perhaps Rust can prevent most of these! (I don’t know Rust.)
What are they? Show me the vulnerabilities you're talking about. I don't think I'm moving the goalposts here. The major distinction between Rust and (say) Java is Rust's type system formalisms to prevent in-process data race memory corruption. Those are real features, but they don't mitigate a major class of vulnerabilities.
> I have a hard time not reading them as shibboleths for "Rust is the only safe language", which is manifestly false.
Given that quite simple classes of vulnerabilities are endemic to all other major languages, no, it's not "manifestly false". The state of software safety really is bad enough that "all major languages that aren't Rust are unsafe" is plausible.
> The vulnerabilities endemic to memory-safe languages (logic and higher-level vulnerabilities like SQLI, metacharacter quoting, filesystem traversal, and cryptography bugs) are common both to languages like Python and Java and also to Rust --- the only super-common class of vulnerability endemic to languages like Python and Java that Rust avoids is deserialization (you avoid deserialization vulnerabilities by not building hypercapable serialization formats).
SQLI at least should be a lot less common in ML-family languages like Rust where manipulating structured data is relatively easy (or at least, the ease advantage of string manipulation over structured data is smaller). Carefully distinguishing between character strings, file paths, and byte sequences, as Rust does, should also eliminate at least some common kinds of vulnerabilities.
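The SQLI distinction is the same in any language with parameterized queries; a minimal Python sketch using the stdlib sqlite3 module (table and inputs are made up for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
db.execute("INSERT INTO users VALUES ('alice', 0), ('mallory', 0)")

attacker_input = "alice' OR '1'='1"

# Vulnerable: string interpolation lets the input rewrite the query,
# so the tautology matches every row.
injected = db.execute(
    f"SELECT name FROM users WHERE name = '{attacker_input}'"
).fetchall()

# Safe: a parameterized query treats the input as a value, not as SQL,
# so the literal string matches nothing.
safe = db.execute(
    "SELECT name FROM users WHERE name = ?", (attacker_input,)
).fetchall()

print(injected)  # every row comes back
print(safe)      # no row matches
```

Both versions are available in Python, which is the upthread point: the language doesn't force you onto the safe path, it just makes it more or less ergonomic.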
> The data races aren't going to burn you.
Eh maybe. All we can really say so far is that they haven't reached low-hanging fruit level yet. There have been plenty of similarly unsafe things that weren't thought to be exploitable that have turned out to be major sources of vulnerabilities as the bar gets raised and more effort gets put in, e.g. there was a time when the conventional wisdom was that double-free() was only a reliability/resiliency concern and not a security issue.
> Given that quite simple classes of vulnerabilities are endemic to all other major languages, no, it's not "manifestly false". The state of software safety really is bad enough that "all major languages that aren't Rust are unsafe" is plausible.
We're really very good at documenting vulnerabilities; the mere documentation of vulnerabilities is itself a 9-figure industry. So: cough up the examples. I can't think of any, so that's where I'm setting the bar for you.
A reminder that memory corruption bugs in FFI-bound libraries don't count --- Rust has plenty of those --- and neither do deserialization vulnerabilities, which were discussed upthread. It also doesn't matter if a condition makes it unsafe to run attacker-controlled code in a shared runtime; nobody does that (with native languages; they try, with Javascript, and it has been a disaster). You're looking for vulnerabilities that are widely exploited and intrinsic to a memory-safe language that isn't Rust. Not to a library, but to the language.
> We're really very good at documenting vulnerabilities; the mere documentation of vulnerabilities is itself a 9-figure industry. So: cough up the examples. I can't think of any, so that's where I'm setting the bar for you.
Your own post listed a bunch of vulnerability classes that happen in those languages ("logic and higher-level vulnerabilities like SQLI, metacharacter quoting, filesystem traversal, and cryptography bugs").
He says explicitly that these are endemic to memory-safe languages, including Rust. They aren't something that Rust handles better than Python or Java.