totally_safe_transmute, Line-by-Line (2021) (yossarian.net)
80 points by iafisher 3 months ago | 40 comments



I appreciate that "totally_safe_transmute" carries some connotation that this is not a "safe" transmute, but rather a suspiciously specific denial.


So `totally-unsafe-transmute` (does not exist as of 2024-01-09T06Z) would be an actually safe transmute... right?


Also possible to do directly in the "safe" type system, without messing around with /proc/mem: https://zyedidia.github.io/blog/posts/5-safe-transmute/


That is certainly a strange bug! It took me several minutes to wrap my head around, but if I'm reading it correctly, `transmute_obj` shouldn't expect to be passed a U because `<T as Object<U>>::Output` should be using `Object<U>::Output`, but instead it's using `T::Output`. I need to mess around with this later on my computer, because I'd expect that `<T as Object<U>>::Output` accepting `T::Output` means that it would _not_ accept `Object<U>::Output`, and I'm super curious what error it would give if it were passed that instead...
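
For anyone else who wants to poke at it, here's a from-memory sketch of the shape of the trick from the linked post (names may not match the post exactly):

    trait Object<U> {
        type Output;
    }

    // Blanket impl: for every T, <T as Object<U>>::Output is U...
    impl<T: ?Sized, U> Object<U> for T {
        type Output = U;
    }

    // ...so this body typechecks: the compiler believes x is a U.
    fn transmute_obj<T: ?Sized, U>(x: <T as Object<U>>::Output) -> U {
        x
    }

    // But at this call site, T is a trait object whose associated-type
    // binding says Output = T, and the compiler resolves through that
    // binding instead of the blanket impl. So the caller hands in a T
    // and gets a U back, with no unsafe anywhere.
    fn transmute<T, U>(x: T) -> U {
        transmute_obj::<dyn Object<U, Output = T>, U>(x)
    }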


The Rust team did a deep dive on the bug in 2020, which has some more details that might be helpful for understanding what's going on: https://github.com/rust-lang/lang-team/blob/master/design-me....


It's in fact way safer, since it doesn't rely on the unsafe-riddled std::fs!


Why don't the safe file I/O operations panic when /proc/self/mem is opened for writing? I understand why they don't want to make all file I/O unsafe just for edge cases like this, but shouldn't this be handled at runtime?


Because you could just do the Rust equivalent of system("dd of=/proc/myprocess/mem ...") instead, so it would be security theater. Memory safety just isn't a part of the default Unix model.
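
A sketch of that equivalent, in entirely "safe" Rust (illustrative only; a real attack would seek to a mapped, writable region rather than writing from offset 0):

    use std::process::Command;

    fn main() {
        // 100% "safe" Rust that scribbles over this process's own memory.
        // The language can't help here; only the OS can.
        let pid = std::process::id();
        Command::new("dd")
            .arg("if=/dev/zero")
            .arg(format!("of=/proc/{pid}/mem"))
            .status()
            .expect("failed to run dd");
    }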

Note the emphasis on "default" above; you can use the Linux sandboxing features such as seccomp-bpf to build a sandbox which is truly memory-safe, closing this hole. The OS is in charge of the features it exposes, and Rust can't do much about that.

Note also that the existence of totally_safe_transmute doesn't mean that Rust's memory safety features are pointless. Empirically, memory-safe programming languages result in far fewer memory safety vulnerabilities, because they make exploitation way harder.


Okay but why doesn’t Rust set up an LLM to analyze all output of the process and if it determines that the process is trying to communicate to the outside world that it intends to do something memory unsafe it pani


Ah yes, predictive-text-based panics are what we need in our compilers...

The LLM hype has really jumped the shark.


I've considered proposing this before, but presumably it's a cat-and-mouse game. Can the Rust stdlib reliably detect writes to /proc/self/mem in the face of links and raw FDs? And it would need to be reliable, because nobody writes to /proc/self/mem by accident; a best-effort check only stops people who weren't a threat anyway.

And even then, it doesn't help with whole-system security when every program in every other language on your system, including safe ones like Java and Python, has the same capability. (Although if I'm wrong that there's no precedent for languages attempting to block this, I'd love to see the prior art.)


It’s not just /proc/self/mem. One could also modify executable files and cause all manner of other mayhem. At the end of the day, if you have access to all the ambient privileges available to your process, you can use them. Trying to specifically block memory-safety violations would be a bit odd.


> One could also modify executable files

ETXTBSY.


Not if they’re not running or if you replace them outright. So you can rename a garbage file over your main executable, exec yourself, and segfault. Is that memory unsafety?

How about ptracing yourself?

How about undervolting your CPU such that it malfunctions?

How about running your program off a FUSE filesystem that changes the contents out from under it?

How about editing the raw block device you’re running from?

How about modifying /dev/mem on older systems that allow that?

This particular rabbit hole is extremely deep, and I don’t think it’s practical to address it for real short of making a real security model for what running code may and may not do to the rest of the world. (I think doing that is an extremely worthwhile endeavor, but disallowing opening specific files won’t be part of it.)


In case anyone else was wondering if this was an acronym for an aphorism (like YAGNI), it's an error code ("text file busy"), and the "text" refers to the executable.


Rust is not a sandbox language like JS. It only catches accidental programming errors, and it's improbable that someone would write a hack via /proc/self/mem by mistake.


It's hard to do by accident and is pretty Unix-specific. Also, what if you really want to access this and it panics?


> Also, what if you really want to access this and it panics?

That's the easy part, you'd provide some OS-specific API for getting this file handle, call it `std::os::fs::proc()` or thereabouts, and make it an `unsafe` function. The "hard to do" is the bigger problem, because if you're providing an unsafe alternative then you'd like to plug all the holes in the safe interface, which AFAIK is non-trivial.
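
Something like this, say (hypothetical, to be clear; no such function exists in std):

    use std::fs::{File, OpenOptions};
    use std::io;

    // Hypothetical std::os::fs::proc()-style API, per the above. Marking
    // it `unsafe` moves the memory-safety obligation onto the caller.
    pub unsafe fn proc_mem() -> io::Result<File> {
        // SAFETY contract (informal): the caller promises that nothing
        // written through this handle breaks the program's memory safety.
        OpenOptions::new().read(true).write(true).open("/proc/self/mem")
    }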


This is cute, but I hope it never turns up in any real codebase!

There’s an updated version with Windows support and better performance: https://github.com/John2143/totally-speedy-transmute/

What worries me is this macro, which “smuggles” the unsafe keyword past the forbid(unsafe_code) flag: https://github.com/John2143/totally-speedy-transmute/blob/ma...

In my mind, this kind of capability makes Rust crate safety scanning and associated metadata worthless as currently implemented.

Package management tools ought to store code instead of binaries, and perform safety checks via instrumented compilers.


> In my mind, this kind of capability makes Rust crate safety scanning and associated metadata worthless as currently implemented.

If you wanted to backdoor a Rust program, you wouldn't need the `unsafe` keyword at all. And if you want to use unsafe code, that's fine, plenty of crates use unsafe code without anyone being up in arms about it (e.g. the regex crate). This is a party trick rather than something to be concerned about; at the end of the day either you're auditing your dependencies (in which case this would stick out like a sore thumb) or you're not (in which case there are far easier ways to pwn you).


You can always smuggle unsafe past the compiler, it can't stop you even in principle.

The totally safe transmute happens at runtime, so an instrumented compiler cannot detect it (the halting problem is in the way). You'd need runtime instrumentation of your binary. And even then, it's wildly impractical.

If you let an application interact with the environment, then tomorrow Linux or Windows could add a new magic file, or a special COM call, or whatever else creates unsafety. Rust can't have a complete list of all the unsafe things that are outside of its control.

What you probably want is a runtime VM, like WASM.


> You can always smuggle unsafe past the compiler, it can't stop you even in principle.

The linked updated library uses a different method: it literally smuggles the "unsafe" keyword past the safety checks by removing the space character from "un safe".

This can and should be caught by the compiler -- it has full access to the syntax tree at every intermediate stage of compilation! Instead, the Cargo tool and the rustc compiler are simply keyword-searching for the literal string "unsafe", and are hence easily fooled.

Note that this updated method is not the same thing as the Linux process memory mapping and doesn't rely on OS APIs in any way. It is a purely compile-time hack, not a runtime one.

What I'd love to see is a healthy ecosystem of open-source packages that are truly safe: using a safe language, without any escape hatches, etc.

E.g.: Libraries for decoding new image formats like JPEG XL ought to be 100% safe, but they're not! This hampers adoption, because browsers won't import huge chunks of potentially dangerous code.

Now we can't have HDR photos on the web because of this specific issue: The Chromium team removed libjxl from the code base because of legitimate security concerns. A million other things are also going to be "unsupported" for a long time because of the perceived (and real!) dangers of importing third-party dependencies.

We'll be stuck with JPG, GIF, and PNG forever because there is no easy way to share code that is truly safe in the "pure function with no side-effects" sense.

PS: This type of issue is also the root cause of problems like Log4J and various XML decoder security holes. By default, most languages allow arbitrary code even in libraries that ought to perform "pure transformations" of one data format into another, such as reaching out to LDAP servers over TCP/IP sockets, as in the case of Log4J. Similarly, XML decoders by default use HTTP to retrieve referenced files, which is madness. Even "modern" formats like YAML make this mistake, with external file references on by default!

> What you probably want is a runtime VM, like WASM.

Sure, that's one way to sandbox applications, but in principle it ought to be entirely possible to have a totally safe ahead-of-time compiled language and/or standard library.

Rust is really close to this goal, but falls just short because of tricks like this macro backdoor. (It would also need a safe subset of the standard library.)


> The linked updated library uses a different method: it literally smuggles the "unsafe" keyword past the safety checks by removing the space character from "un safe".

> This can and should be caught by the compiler -- it has full access to the syntax tree at every intermediate stage of compilation! Instead, the Cargo tool and the rustc compiler are simply keyword-searching for the literal string "unsafe", and are hence easily fooled.

This is not what's going on. No Rust tool performs any such string scanning. The "un safe" with the space is purely for aesthetic effect. In fact, the proc macro works exactly the same when modified to just use "unsafe" directly, and cargo-geiger still doesn't report any issues.

The real effect is that macros and proc macros from foreign crates are allowed to emit unsafe { ... } blocks in their output, even if the caller of the macro uses #![forbid(unsafe_code)]. Effectively, the output is considered as coming from a "different crate" than the caller with the #![forbid], so it's not affected by the forbidden lint. This behavior was implemented in [0], which silences all lints (including forbidden lints) within the output of macros defined in foreign crates, except for certain lints which explicitly opt in.

The #![forbid(unsafe_code)] inside the src/lib.rs defining the proc macro similarly doesn't do anything, since it restricts the definition of the proc macro, but not the output of the proc macro.

As far as I know, there is currently no way to deny unsafe code in the output of proc macros.

[0] https://github.com/rust-lang/rust/pull/52467
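
A minimal sketch of the mechanism (a made-up proc-macro crate, not the actual totally-speedy-transmute source):

    // In a crate with `proc-macro = true` in its Cargo.toml:
    use proc_macro::TokenStream;

    #[proc_macro]
    pub fn sneaky_transmute(_input: TokenStream) -> TokenStream {
        // The emitted tokens contain an `unsafe` block. Because the
        // expansion comes from a foreign macro, a caller's
        // #![forbid(unsafe_code)] is silenced for it.
        "unsafe { std::mem::transmute::<u32, f32>(0x4048_F5C3) }"
            .parse()
            .unwrap()
    }

A downstream crate with #![forbid(unsafe_code)] at the top can still invoke sneaky_transmute!() and get 3.14f32 out of raw bits, and it compiles without complaint.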


The language itself may not do such string scanning, but reviewers and home-brewed scripts might.


This is a really weird hack, to say the least. It's more of a flex showing that the author can implement transmute without unsafe than something you’d really use.


The point is likely to show that Rust's "safety" is not absolute and it's possible to do lots of silly stuff with "safe" code.


Absolute safety would require a totally managed runtime with least privilege, not a 1970s Unix derivative.


That's true, but this particular crate exists to show an inevitable hole in the definition of memory safety, whereas you can do lots of stupid things (like `rm -rf /`) with nothing but absolutely safe code.


/proc/self/mem is the moral equivalent of `unsafe`. Of course you can do arbitrary things with it. Why would anyone be surprised? You could use https://man7.org/linux/man-pages/man2/process_vm_readv.2.htm.... You could fork and ptrace. You can do any number of weird things.

Every day that goes by is a day I think we should make a beeline for CHERI, even when we have "safe" languages.


For the uninitiated, CHERI:

https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/


`process_vm_writev` would be simpler.
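
For illustration, a rough sketch via the `libc` crate's binding (which, as noted below, genuinely requires `unsafe`):

    // Overwrite a word in our own address space via process_vm_writev(2).
    fn overwrite_self(target: &mut u32, new_value: u32) {
        let local = libc::iovec {
            iov_base: &new_value as *const u32 as *mut libc::c_void,
            iov_len: std::mem::size_of::<u32>(),
        };
        let remote = libc::iovec {
            iov_base: target as *mut u32 as *mut libc::c_void,
            iov_len: std::mem::size_of::<u32>(),
        };
        unsafe {
            // Data flows from the local iovec to the remote one; the
            // "remote" process here is just our own PID.
            libc::process_vm_writev(libc::getpid(), &local, 1, &remote, 1, 0);
        }
    }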


Calling C functions like process_vm_writev from libc requires unsafe code.


Calling C functions like open() requires unsafe code.


C doesn't provide any reinterpretation operator, and the C++ one's name is a misnomer.

Casts are conversion: a new value is produced based on an existing one.

Reinterpretation requires a value to be in memory, and to be accessed using an lvalue of a different type. Most situations of this kind are undefined behavior.


    #include <string.h>  /* for memcpy */

    double transmute_int_to_double(int x) {
        /* Copy the int's bytes into a zero-initialized double. */
        double result = 0;
        memcpy(&result, &x, sizeof(int) < sizeof(double) ? sizeof(int) : sizeof(double));
        return result;
    }
is not UB, and it doesn't even necessarily touch any memory:

    transmute_int_to_double:
        movd    xmm0, edi
        ret
[0] https://godbolt.org/z/czM3eh8er


> is not UB

Not by my interpretation of n3096 (April 2023 ISO C draft).

> doesn't even necessarily touch any memory

The abstract semantics calls for memory being touched. Data flows that go through memory in the abstract semantics can be optimized not to go through memory. UB can do anything at all.


> Not by my interpretation of n3096 (April 2023 ISO C draft).

What clause(s) support the claim that that example is UB? At least at first glance it looks pretty similar to the sometimes-recommended way to perform safe type-punning in C, and the only way to "directly" invoke UB via memcpy (passing pointers to overlapping objects) isn't relevant here.

The only suspicious part to me is the size/number of bytes to copy, but I'm not sure that's outright UB?


We have no idea whether the bytes that were copied into that object are a valid representation.


Ah, I didn't think of that.

That being said, is there a representation for doubles that could be considered invalid? I thought all double bit patterns are interpretable as a "valid" double, even if you get NaNs.


You can alternatively read `reinterpret_cast` as "casting or conversion used to achieve reinterpretation", which is clearly correct.



