I'm not sure I understand what this piece is trying to say about Python memory safety. Conventionally, in software security, Python is considered a memory-safe language. The piece makes the case that Python isn't memory safe when you FFI into a C library. But by that standard, neither is Rust when you use `unsafe`. What matters in both cases is how little unsafe code you end up writing.
Memory safety is a software security concern. You can squint and make it about reliability or resiliency, but the reason we talk about memory safety is (to a first approximation) browser vulnerabilities.
The piece goes on to discuss data races. I'm a little keyed up on software security essays that bring data race safety into the discussion. I have a hard time not reading them as shibboleths for "Rust is the only safe language", which is manifestly false.
The vulnerabilities endemic to memory-safe languages (logic and higher-level vulnerabilities like SQLI, metacharacter quoting, filesystem traversal, and cryptography bugs) are common both to languages like Python and Java and also to Rust --- the only super-common class of vulnerability endemic to languages like Python and Java that Rust avoids is deserialization (you avoid deserialization vulnerabilities by not building hypercapable serialization formats).
Data races are a common source of reliability bugs. They're a meaningful software engineering concern. In exotic scenarios (userspace-sandboxed attacker-controlled code), they can constitute practical security vulnerabilities. But in the main, data races have not empirically proved out as a source of exploited vulnerabilities. If you have a fixed budget to transition off a C codebase, one that would let you migrate to Python now or, if you saved up, to Rust next year, and all you care about is security, then ceteris paribus you should do the Python thing. The data races aren't going to burn you.
To me it’s a deep philosophical post in the vein of “what even is memory safety anyway?”
> The piece makes the case that Python isn't memory safe
It’s a philosophy argument tactic. Take something everyone considers to be true “Python is memory safe” then push it to logical extremes. The purpose of this isn’t to learn anything about Python, the purpose is to learn about the extremes. In this case about memory safety.
I think the overall point is that “memory safe languages don’t truly exist” in the purist sense, since every language must touch unsafe code at some point. However, some languages and tools do a better job of isolating these interactions. We call these tools “memory safe”.
> Conventionally, in software security, Python is considered a memory-safe language. The piece makes the case that Python isn't memory safe when you FFI into a C library.
Interesting and largely unknown trivia: it's possible to trigger memory errors in the underlying C interpreter from pure Python code — no libraries and no imports needed!
One way of doing this is by creating new `code` objects with crafted bytecode. There is no bytecode verifier in Python to make sure, say, referenced stack variables in the VM are valid...
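A minimal sketch of what this looks like (opcode encodings vary across CPython versions, and the oparg value here is just an illustrative out-of-range index; this only constructs the object and deliberately never executes it):

```python
import dis
import types

def f():
    return None

# Hand-build bytecode: LOAD_CONST with a constant index far beyond
# the end of co_consts. CPython has no bytecode verifier, so nothing
# checks that index 200 is valid before execution.
LOAD_CONST = dis.opmap["LOAD_CONST"]
RETURN_VALUE = dis.opmap["RETURN_VALUE"]
bad_bytecode = bytes([LOAD_CONST, 200, RETURN_VALUE, 0])

# .replace() accepts arbitrary bytes -- no verification happens here.
crafted = f.__code__.replace(co_code=bad_bytecode)
print(type(crafted))  # <class 'code'>

# Running it is left commented out: depending on the CPython version,
# executing this can return garbage or crash the interpreter outright.
# types.FunctionType(crafted, {})()
```

The point is that `CodeType.replace` (and the `CodeType` constructor) will happily hand you a code object containing whatever bytes you like; validity is only "checked" by whatever the VM happens to do when it runs them.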
Is this a bug that might be fixed in the future, or is it considered an unavoidable consequence of some design decision that will stay that way for the foreseeable future?
From what I understand about Rust, if something similar was possible in safe Rust it would be considered a bug and eventually fixed.
I think it will stay that way for the foreseeable future (but who can say). Ways to fix the particular hole:
(1) disable creating new `code` objects directly from Python. This probably would break lots of things.
(2) Add a bytecode verification mechanism that would reject `code` objects whose bytecode would result in memory errors when executed. This could be a lot of implementation work; I'm not sure.
You also don't need to FFI into some buggy C library to violate memory safety with ctypes. It's trivial to produce a segfault using nothing but ctypes itself, which is part of the standard library. I doubt I'd have much trouble finding other ways to make a segfault with pure Python and the standard library (`struct` springs to mind).
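For instance (the one-line segfault is commented out so the snippet survives; the refcount peek assumes the standard 64-bit CPython object layout, an internal detail that differs on, e.g., the free-threaded build):

```python
import ctypes
import sys

# One line is enough to segfault CPython: dereference address 0.
# ctypes.string_at(0)

# Less destructively, ctypes lets pure Python read raw process memory.
# Reading the first machine word at a live object's address recovers
# its refcount on a conventional CPython -- an internal layout detail,
# not a documented API.
obj = [1, 2, 3]
word = ctypes.c_ssize_t.from_address(id(obj)).value
print(word, sys.getrefcount(obj))
```

Nothing here is a bug in ctypes; arbitrary memory access is its job. It just means "memory safe" for CPython-plus-stdlib is a convention, not a guarantee.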
CPython really isn't very safe at all. Its focus has always been on being a convenient, dynamic scripting language with minimal-fuss access to native code. It has never been hard to violate its internal assumptions and it probably never will be.
Actually Rust does go quite far in reducing the probability of these bugs, even if it doesn't have specific features for it. This is through a combination of:
* really strong type system ("if it compiles it works")
* Better ergonomics, e.g. using prepared queries is much easier than in C.
* Library code being generally very high quality, and easy to obtain.
Data races are definitely exploited! If we are considering TOCTOU issues then this is a very easy way to get fairly reliable and simple exploits. If we are talking about races of the “two threads access the same value” kind then it’s easy (well, achieving reliability is left as an exercise for the reader) to turn this into a UAF or OOB access by having one thread work with a stale version of an object that has been modified elsewhere.
Right, my understanding is that a data race is the second thing I mentioned. I was just so surprised to hear this viewpoint that I figured I’d throw it in just in case we were talking about different things.
While data races may not be a top category empirically, they are undefined behavior, which means that (a future version of) the compiler is allowed to make your program do anything at all after a data race happens. We are setting the bar incredibly low for ourselves if we just accept that things like that happen on the regular.
> But in the main, data races have not empirically proved out as a source of exploited vulnerabilities.
Say what? Data races, otherwise lumped under the bucket “timing attacks”, are a common source of security exploits. A basic example is racing with code that is creating a file and applying an ACL in two steps. If I can “time” things right from a concurrent thread/process, I can get into this file before the ACL prevents me.
There are countless scenarios where multi-step operations that need to be treated atomically can be exploited by racing.
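The file-then-ACL example above can be sketched in Python using POSIX permissions as the stand-in for an ACL (paths and the 0o600 mode are illustrative; this shows the race window, not an actual exploit):

```python
import os
import stat
import tempfile

d = tempfile.mkdtemp()

# Vulnerable two-step pattern: create, *then* restrict. Between the
# two steps the file exists with default umask-derived permissions,
# and a concurrent process can open it before the chmod lands.
racy = os.path.join(d, "racy.txt")
with open(racy, "w") as f:
    f.write("secret")
# <-- race window: an attacker can open the file right here
os.chmod(racy, 0o600)

# Safer pattern: have the kernel create the file with its final
# permissions in one atomic step, so there is no window to race.
safe = os.path.join(d, "safe.txt")
fd = os.open(safe, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
os.write(fd, b"secret")
os.close(fd)

mode = stat.S_IMODE(os.stat(safe).st_mode)
print(oct(mode))
```

The fix isn't a language feature at all: it's collapsing the multi-step operation into one atomic syscall, which is exactly why this class of race is orthogonal to Rust-vs-Python.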
That's not a vulnerability Rust prevents; it's an interaction between multiple competing runtimes. I'm not denying that race conditions (or timing attacks, another bug class Rust doesn't prevent) exist and are exploited! I'm denying that in-process data races that corrupt memory are a meaningful source of exploitable vulnerabilities.
For background, I've spent most of my career doing vulnerability research. I'm by no means a world expert on memory corruption vulnerabilities (I'm still impressed that I got my imapd shellcode to work with no uppercase ASCII characters), but you can safely assume I'm not just completely blowing off huge classes of exploitable vulnerabilities because I've never heard of them. Doesn't mean I'm right! But like, if you're going "say what", you're probably misconstruing me.
OK, but I think you are moving the goal posts. You referred to “data races” and “security exploits” and suggested the two were not related. Memory corruption is only one (small) class of security exploits. Data races cause just as many in-process, in-memory exploits as multi-step file operations (we are talking breaking application security models). Perhaps Rust can prevent most of these! (I don’t know Rust.)
What are they? Show me the vulnerabilities you're talking about. I don't think I'm moving the goalposts here. The major distinction between Rust and (say) Java is Rust's type system formalisms to prevent in-process data race memory corruption. Those are real features, but they don't mitigate a major class of vulnerabilities.
> I have a hard time not reading them as shibboleths for "Rust is the only safe language", which is manifestly false.
Given that quite simple classes of vulnerabilities are endemic to all other major languages, no, it's not "manifestly false". The state of software safety really is bad enough that "all major languages that aren't Rust are unsafe" is plausible.
> The vulnerabilities endemic to memory-safe languages (logic and higher-level vulnerabilities like SQLI, metacharacter quoting, filesystem traversal, and cryptography bugs) are common both to languages like Python and Java and also to Rust --- the only super-common class of vulnerability endemic to languages like Python and Java that Rust avoids is deserialization (you avoid deserialization vulnerabilities by not building hypercapable serialization formats).
SQLI at least should be a lot less common in ML-family languages like Rust where manipulating structured data is relatively easy (or at least, the ease advantage of string manipulation over structured data is smaller). Carefully distinguishing between character strings, file paths, and byte sequences, as Rust does, should also eliminate at least some common kinds of vulnerabilities.
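The SQLI distinction is the same in any language with parameterized queries; a minimal Python sketch using the stdlib sqlite3 module (table and inputs are made up for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
db.execute("INSERT INTO users VALUES ('alice', 0), ('mallory', 0)")

attacker_input = "alice' OR '1'='1"

# Vulnerable: string interpolation lets the input rewrite the query,
# so the tautology matches every row.
injected = db.execute(
    f"SELECT name FROM users WHERE name = '{attacker_input}'"
).fetchall()

# Safe: a parameterized query treats the input as a value, not as SQL,
# so the literal string matches nothing.
safe = db.execute(
    "SELECT name FROM users WHERE name = ?", (attacker_input,)
).fetchall()

print(injected)  # every row comes back
print(safe)      # no row matches
```

Both versions are available in Python, which is the upthread point: the language doesn't force you onto the safe path, it just makes it more or less ergonomic.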
> The data races aren't going to burn you.
Eh maybe. All we can really say so far is that they haven't reached low-hanging fruit level yet. There have been plenty of similarly unsafe things that weren't thought to be exploitable that have turned out to be major sources of vulnerabilities as the bar gets raised and more effort gets put in, e.g. there was a time when the conventional wisdom was that double-free() was only a reliability/resiliency concern and not a security issue.
> Given that quite simple classes of vulnerabilities are endemic to all other major languages, no, it's not "manifestly false". The state of software safety really is bad enough that "all major languages that aren't Rust are unsafe" is plausible.
We're really very good at documenting vulnerabilities; the mere documentation of vulnerabilities is itself a 9-figure industry. So: cough up the examples. I can't think of any, so that's where I'm setting the bar for you.
A reminder that memory corruption bugs in FFI-bound libraries don't count --- Rust has plenty of those --- and neither do deserialization vulnerabilities, which were discussed upthread. It also doesn't matter if a condition makes it unsafe to run attacker-controlled code in a shared runtime; nobody does that (with native languages; they try, with Javascript, and it has been a disaster). You're looking for vulnerabilities that are widely exploited and intrinsic to a memory-safe language that isn't Rust. Not to a library, but to the language.
> We're really very good at documenting vulnerabilities; the mere documentation of vulnerabilities is itself a 9-figure industry. So: cough up the examples. I can't think of any, so that's where I'm setting the bar for you.
Your own post listed a bunch of vulnerability classes that happen in those languages ("logic and higher-level vulnerabilities like SQLI, metacharacter quoting, filesystem traversal, and cryptography bugs").
He says explicitly that these are endemic to memory-safe languages, including Rust. They aren't something that Rust handles better than Python or Java.