
It seems like you've got it backwards. Even unsafe Rust is still stricter than C. Here's what the book has to say (https://doc.rust-lang.org/book/ch20-01-unsafe-rust.html):

"You can take five actions in unsafe Rust that you can’t in safe Rust, which we call unsafe superpowers. Those superpowers include the ability to:

    Dereference a raw pointer
    Call an unsafe function or method
    Access or modify a mutable static variable
    Implement an unsafe trait
    Access fields of a union
It’s important to understand that unsafe doesn’t turn off the borrow checker or disable any other of Rust’s safety checks: if you use a reference in unsafe code, it will still be checked. The unsafe keyword only gives you access to these five features that are then not checked by the compiler for memory safety. You’ll still get some degree of safety inside of an unsafe block.

In addition, unsafe does not mean the code inside the block is necessarily dangerous or that it will definitely have memory safety problems: the intent is that as the programmer, you’ll ensure the code inside an unsafe block will access memory in a valid way.

People are fallible, and mistakes will happen, but by requiring these five unsafe operations to be inside blocks annotated with unsafe you’ll know that any errors related to memory safety must be within an unsafe block. Keep unsafe blocks small; you’ll be thankful later when you investigate memory bugs."

This description is still misleading. The preconditions for the correctness of an unsafe block can very much depend on the correctness of the code outside of it, and it is easy to find Rust bugs where exactly this was the cause. This is very similar to C, where out-of-bounds accesses are often caused by some logic error elsewhere. Also, an unsafe block has to maintain all the invariants that the safe part of Rust needs to maintain correctness.

So, it's true that unsafe code can depend on preconditions that need to be upheld by safe code.

But using ordinary module encapsulation and private fields, you can scope the code that needs to uphold those preconditions to a particular module.

So the "trusted computing base" for the unsafe code can still be scoped and limited, allowing you to reduce the amount of code you need to audit and be particularly careful about for upholding safety guarantees.

Basically, when writing unsafe code, the actual unsafe operations are confined to the unsafe blocks, and their preconditions can be confined to a particular module boundary, so that only a limited amount of code needs to be audited for upholding all of the safety invariants.
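
As a minimal sketch of what that looks like (a hypothetical Cursor type, not from any real library): the unsafe block relies on an invariant, and privacy confines the code that could break that invariant to one module:

    mod cursor {
        /// Invariant: `idx` is always in bounds for `buf`. Both fields are
        /// private, so only code in this module can break the invariant --
        /// this module is the whole audit surface.
        pub struct Cursor {
            buf: Vec<u8>,
            idx: usize,
        }

        impl Cursor {
            pub fn new(buf: Vec<u8>) -> Option<Self> {
                if buf.is_empty() {
                    return None;
                }
                Some(Cursor { buf, idx: 0 })
            }

            /// Safe code, but it participates in upholding the invariant.
            pub fn advance(&mut self) {
                if self.idx + 1 < self.buf.len() {
                    self.idx += 1;
                }
            }

            pub fn get(&self) -> u8 {
                // SAFETY: every method in this module keeps idx < buf.len(),
                // and no outside code can touch the private fields.
                unsafe { *self.buf.get_unchecked(self.idx) }
            }
        }
    }

    fn main() {
        let mut c = cursor::Cursor::new(vec![10, 20, 30]).unwrap();
        c.advance();
        assert_eq!(c.get(), 20);
    }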

Ralf Jung has written a number of good papers and blog posts on this topic.


And you think one cannot modularize C code and encapsulate critical buffer operations in much safer APIs? One can; the problem is that a lot of legacy C code was not written this way. A lot of newly written C code is not written this way either, but the reason is often that people cut corners when they need to get things done with limited time and resources. You will see the same with Rust.

There is no distinction between safe and unsafe code in C, so it's not possible to draw the same line there that you can in Rust.

And even if you try to provide some kind of safer abstraction, you're limited by the much more primitive type system, which can't distinguish between owned types, unique borrows, and shared borrows, nor can it express thread-safety properties.

So you're left with convention and documentation for that kind of information, but nothing checking that you're getting it right, making it easy to make mistakes. And even if you get it right at first, a refactor could change your invariants, and without a type system enforcing them, you never know until someone comes along with a fuzzer and figures out that they can pwn you.
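
To make the contrast concrete, here's a sketch of the contract a Rust signature carries by itself (hypothetical functions, purely for illustration); in C, all of this would live in comments, unchecked:

    // The signature alone tells the caller (and the compiler) the rules.
    fn consume(s: String) { // takes ownership; caller can't use `s` afterwards
        drop(s);
    }

    fn inspect(s: &str) -> usize { // shared borrow: read-only access
        s.len()
    }

    fn mutate(s: &mut String) { // unique borrow: no other access while held
        s.push('!');
    }

    fn main() {
        let mut s = String::from("hi");
        assert_eq!(inspect(&s), 2);
        mutate(&mut s);
        consume(s);
        // inspect(&s); // error[E0382]: borrow of moved value: `s`
    }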


There is definitely a distinction between safe and unsafe code in C; it is just not a simple binary distinction. But this does not make it impossible to screen C for unsafe constructions, and it also does not mean that detecting safety issues in Rust is always trivial.

Even innocent-looking C code can be chock-full of UB that invalidates your "local reasoning" capabilities. So, not even close.

Care to share an example?

   int average(int x, int y) {
       return (x+y)/2;
   }

But this is also easy to protect against if you use the tools available to C programmers. It is part of the Rust hype that we would be completely helpless here, but this is far from the truth.

I assume you are hinting at the fact that 'int' is signed here, and that signed overflow is UB in C? Real question: ignoring what the ISO C language spec says, are there any modern hardware platforms (say, ARM64 and X86-64) that do not use two's complement to implement signed integers? I don't know of any. As I understand it, two's complement handles signed overflow in a well-defined way (it wraps).

I might be old, but more than 10 years ago, hardly anyone talked about UB in C and C++ programming. In the last 10 years, it is all the rage, but seems to add very little to the conversation. For example, if you program C or C++ with the Win32 API, there are loads of weird UB-ish things that seem to work fine.


> Ignoring what the ISO C language spec says, are there any modern hardware platforms (say: ARM64 and X86-64) that do not use two's complement to implement signed integers?

This is not how compilers work. Optimization happens based on language semantics, not on what platforms do.
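
For contrast, a quick sketch of how Rust pins down the same operation instead of leaving it undefined:

    fn main() {
        let x = i32::MAX;
        assert_eq!(x.checked_add(1), None);      // overflow reported, not UB
        assert_eq!(x.wrapping_add(1), i32::MIN); // explicit two's-complement wrap
        // Plain `x + 1` panics in debug builds and wraps in release builds;
        // either way the behavior is defined, so the optimizer can't
        // "assume it never happens".
    }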


At least in recent C++ standards, integers are defined as two's complement. As a practical matter, whatever non-two's-complement hardware may still exist doesn't have a modern C++ compiler, rendering the point moot.

UB in C is often found where different real hardware architectures had incompatible behavior. Rather than biasing the language for or against particular architectures, the standard left it to the compiler to figure out how to optimize the cases where instruction behavior diverges. This is still true on current architectures, e.g. shift-amount overflow, where instruction behavior differs between ISAs, which is why shift overflow is UB.


AI rewrote to avoid undefined behavior:

  int average(int x, int y) {
    long sum = (long)x + y;
    if(sum > INT_MAX || sum < INT_MIN)
        return -1; // or any value that indicates an error/overflow
  
    return (int)(sum / 2);
  }

> long sum = (long)x + y;

There is no guarantee that sizeof(long) > sizeof(int), in fact the GNU libc documentation states that int and long have the same size on the majority of supported platforms.

https://www.gnu.org/software/libc/manual/html_node/Range-of-...

> return -1; // or any value that indicates an error/overflow

-1 is a perfectly valid average for various inputs. You could return a larger type to encode an error value that is not a valid output, or just return the error and the average in two distinct variables.

AI and C seem like a match made in hell.


> There is no guarantee that sizeof(long) > sizeof(int), in fact the GNU libc documentation states that int and long have the same size on the majority of supported platforms.

That used to be the case for 32-bit platforms, but most 64-bit platforms in which GNU libc runs use the LP64 model, which has 32-bit int and 64-bit long. That documentation seems to be a bit outdated.

(One notable 64-bit platform which uses 32-bit for both int and long is Microsoft Windows, but that's not one of the target platforms for GNU libc.)


I’m not convinced that solution is much better. It can be improved to x/2 + y/2 (which still gives the wrong answer if both inputs are odd).
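
For comparison, the widening trick is portable in a language with fixed-width integer types; a sketch in Rust, where i64 is guaranteed to be exactly 64 bits (unlike C's long):

    fn average(x: i32, y: i32) -> i32 {
        // The sum of two i32 values always fits in an i64, and the average
        // of two i32 values is always in i32's range, so the cast back is fine.
        ((x as i64 + y as i64) / 2) as i32
    }

    fn main() {
        assert_eq!(average(i32::MAX, i32::MAX), i32::MAX);
        assert_eq!(average(-3, 5), 1);
    }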

We're about to see a huge uptick in bugs worldwide, aren't we?

I don't know why this answer was downvoted. It adds valuable information to this discussion. Yes, I know that someone already pointed out that sizeof(int) is not guaranteed on all platforms to be smaller than sizeof(long). Meh. Just change the type to long long, and it works well.

Copypasting a comment into an LLM, and then copypasting its response back is not a useful contribution to a discussion, especially without even checking to be sure it got the answer right. If I wanted to know what an LLM had to say, I can go ask it myself; I'm on HN because I want to know what people have to say.

It literally returns a valid output value as an error.

An error value is valid output in both cases.

> Meh. Just change the type to long long, and it works well.

C libraries tend to support a lot of exotic platforms. zlib for example supports Unicos, where int, long int and long long int are all 64 bits large.


Sorting floats with NaN? Almost anything involving threading and mutation, where people either don't realise how important locks are, or don't realise their code has suddenly been threaded?


You're a lot more limited in the kinds of APIs you can safely encapsulate in C. For example, you can't safely encapsulate an interface that shares memory between the library and the caller in C. So you're forced into either:

- Exposing an unsafe API and relying on the caller to manually uphold invariants

- Doing things like defensive copying at a performance cost

In many cases Rust gives you the best of both worlds: sharing memory liberally while still having the compiler enforce correctness.
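
A small sketch of the idea (a hypothetical Parser type): the library hands out a zero-copy view into its own buffer, and the borrow checker stops callers from misusing it:

    struct Parser {
        data: Vec<u8>,
    }

    impl Parser {
        // A borrowed, zero-copy view into the library's buffer;
        // no defensive copy needed.
        fn header(&self) -> &[u8] {
            &self.data[..2]
        }
    }

    fn main() {
        let mut p = Parser { data: vec![0xDE, 0xAD, 0xBE, 0xEF] };
        let h = p.header(); // shared borrow of p's buffer
        // p.data.clear();  // rejected: cannot borrow `p.data` as mutable
        //                  // while `h` is alive (error[E0502])
        println!("{:?}", h);
    }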


Rust is better at this, yes, but the practical advantage is not necessarily that huge.

Which is just a convoluted way of saying that it is possible to write bugs in any language. Still, it's undeniable that some languages do a better job of helping you avoid certain bugs than others.

It's true, but if you subject Rust to this analysis, it's only fair to subject other languages to it too: the scrutiny you're implying an unsafe Rust block needs would have to be applied to all C code, because any C code could depend on code anywhere else for its safety characteristics.

In practice (in both languages) you check what the actual unsafe code does (or "all" the code, in C's case), note the code that depends on external actors for safety (it's not all C code, nor is it all unsafe Rust blocks), and check its callers (and callers' callers, etc.).


What is true is that there are more operations in C which can cause undefined behavior, and they are more densely distributed over C code, making it harder to screen for undefined behavior. Rust certainly has an advantage here, but it is not nearly as big an advantage as the "Rust is safe" (please do not look at all the unsafe blocks we need to make it also fast!) and "all C is unsafe" story wants you to believe.

What Rust provides is a way to build safe abstractions over unsafe code.

Rust's type system (including ownership and borrowing, Sync/Send, etc.), along with its privacy features (allowing types to have private fields that can only be accessed by code in the module that defined them), allows you to create fully safe interfaces around code that uses unsafe; there is provably no combination of uses of the interface that leads to undefined behavior.
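
For one small piece of that, a sketch of how thread safety itself lives in the type system:

    use std::sync::Arc;
    use std::thread;

    fn main() {
        // An `Rc<i32>` is not `Send`, so moving one into `thread::spawn`
        // is a compile error (E0277). `Arc` opts in to thread safety:
        let shared = Arc::new(1);
        let handle = thread::spawn(move || println!("{}", shared));
        handle.join().unwrap();
    }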

Now, yeah, it's possible to also use unsafe in Rust just for applying a local optimisation. And that has fewer benefits than a fully encapsulated safe interface, though it is still easier to audit for potential UB than C.

So you're right that it's on a continuum, but the distinction between safe and unsafe code means you can more easily find the specific places where UB could occur, and the encapsulation and type system makes it possible to create safe abstractions over unsafe code.


You sound pretty biased, gotta tell you. That snark is not helping whatever argument you think you are making -- and you are not really making one; you are just making fun of Rust, which is pretty boring and uninformative for any reader.

From my past experience with Rust, the team never had to think about data races once, or about mutable volatile globals. And all of us suffered from those decades ago with C, and sometimes C++ as well.

You like those and don't want to migrate? More power to ya! But badmouthing Rust with what seem to be fairly uninformed comments is just low. Inform yourself first.


The places where undefined behaviour can occur are also limited in scope; you insist that that part isn't true, because operations outside those unsafe blocks can impact their safety.

That's only true at the same level of scrutiny as "all C operations can cause undefined behaviour, regardless of what they are", which I find similarly shallow.


Rust is plenty fast; in fact, there are countless examples of safe Rust trivially beating C in performance, because the absence of aliasing enables better vectorization, among other things. It is also simply a more expressive language that lets you write better optimizations (e.g. small-string optimization vs. the absolutely laughable C strings that perform terribly), and you can actually get away with sharing more stuff in memory, instead of doing defensive copies everywhere, because it is safe to do so.

And there are not many things we have statistics on in CS, but memory vulnerabilities being absolutely everywhere in unsafe languages, and Rust cleaning up the absolute majority of them even when only the new parts are written in Rust, are some of the few things we do know, based on actual, real-life projects at Google and Microsoft, among others.

A memory safe low-level language is as novel as it gets. Rust is absolutely not just hype, it actually delivers and you might want to get on with the times.


    > absolutely laughable c-strings that perform terribly
Not much being said here in 2025. Any good project will quickly switch to a tiny structure that holds a char* and a length. There are plenty of open-source libs to help you.

I take it that you consider most major projects written in C not to be "good"?

Most major software projects are not good, no matter what language.

This is technically correct, but a bit pedantic.

Sure, you can technically write your own vulnerability for your own program and inject it via an unsafe block and see the whole world crumble... but the exact same is true for any form of FFI call in any language. Is Java memory safe? Yeah; the fact that I can technically grab a random pointer and break anything I want doesn't change that.

The fact that a memory vulnerability can either appear nowhere at all OR only in the couple hundred lines of unsafe code spread throughout the whole project is a night-and-day difference.


No. Correctness of the code outside unsafe blocks depends on correctness inside those blocks, not the other way around.

[flagged]


tf are you talking about

In a more helpful framing: safe Rust code doesn't need to worry about its own correctness; it just is correct.

Unsafe code can be incorrect (or unsound), and needs to be careful about it. Part of being careful is that safe code can call the unsafe code in a way that triggers that unsoundness; in that way, safe code can cause undefined behaviour in unsafe code.

It's not always the case that this is possible; there are unsafe blocks that don't need to depend on safe code for their correctness.


They are (rudely) talking about https://news.ycombinator.com/item?id=43382369

But “Dereference a raw pointer”, in combination with the ability to create raw pointers pointing to arbitrary memory addresses (which you can do even in safe Rust), allows you to write to arbitrary memory from unsafe Rust.

So, in theory, unsafe Rust opens the floodgates. In practice, though, you can use small fragments of unsafe code that programmers can fairly easily check to be safe.

Then, once you’ve convinced yourself that those fragments are safe, you can be assured that your whole program is safe (using ‘safe’ in the Rust sense, of course).

So, there may be some small islands of unsafe code that require extra attention from the programmer, but that should be just a tiny fraction of all lines, and you should be able to verify those islands in isolation.
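
A minimal sketch of that split: creating the raw pointer is safe; only the dereference is one of those islands:

    fn main() {
        let mut x: u32 = 7;
        let p: *mut u32 = &mut x; // making a raw pointer: safe
        unsafe {
            // SAFETY: `p` points to a live local variable; this block is
            // the small island a reviewer has to check.
            *p = 42;
        }
        assert_eq!(x, 42);
    }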


> allows you

This is where the rubber hits the road. Rust does not allow you to do this, in the sense that this is possibly undefined behavior. That "possibly" is why the compiler allows you to write this code, because by saying "unsafe", you are promising that this specific arbitrary address is legal for you to write to. But that doesn't mean that it's always legal to do so.


The compiler won't allow you to compile such code without the unsafe. The unsafe is *you* promising the compiler that *you* have checked to ensure that the address will always be legal, so that the compiler will let the code compile.

Right, I'm saying "allow" has two different connotations, and only one of them, the one that you're talking about, applies.

I gotcha. I misread and misunderstood. Yes, we agree.

I believe the post you are replying to was referring to the fact that you could take actions in that unsafe block that would compromise the guarantees of Rust; e.g. you could do something silly, leave the unsafe block, then hit an “impossible” condition later in the program.

A simple example might be modifying a const value deep down in some class, where it only becomes apparent later in the program’s execution. Hence their analogy of the wet dog in a clean room: whatever beliefs you have about the structure of memory in your entire program, beliefs guaranteed by the compiler, could have been undone by a rogue unsafe block.


Would someone with more experience be able to explain to me why these operations can't be "safe"? What is stopping Rust from producing the same machine code in a "safe" way?

Rust's raw pointers are more-or-less equivalent to C pointers, with many of the same types of potential problems like dangling pointers or out-of-bounds access. Rust's references are the "safe" version of doing pointer operations; raw pointers exist so that you can express patterns that the borrow checker can't prove are sound.

Rust encourages using unsafe to "teach" the language new design patterns and data structures; and uses this heavily in its standard library. For example, the Vec type is a wrapper around a raw pointer, length, and capacity; and exposes a safe interface allowing you to create, manipulate, and access vectors with no risk of pointer math going wrong -- assuming the people who implemented the unsafe code inside of Vec didn't make a mistake, the external, safe interface is guaranteed to be sound no matter what external code does.
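
A stripped-down sketch of that idea (a hypothetical MiniVec, far simpler than the real Vec, just to show the shape): raw-pointer bookkeeping on the inside, a safe interface on the outside:

    use std::alloc::{alloc, dealloc, realloc, Layout};
    use std::ptr;

    pub struct MiniVec {
        ptr: *mut i32, // private: outside code can't corrupt the bookkeeping
        len: usize,
        cap: usize,
    }

    impl MiniVec {
        pub fn new() -> Self {
            MiniVec { ptr: ptr::null_mut(), len: 0, cap: 0 }
        }

        pub fn push(&mut self, value: i32) {
            if self.len == self.cap {
                let new_cap = if self.cap == 0 { 4 } else { self.cap * 2 };
                let new_layout = Layout::array::<i32>(new_cap).unwrap();
                // SAFETY: the layout is non-zero-sized, and any old pointer
                // was allocated with the matching layout.
                let new_ptr = unsafe {
                    if self.cap == 0 {
                        alloc(new_layout) as *mut i32
                    } else {
                        let old_layout = Layout::array::<i32>(self.cap).unwrap();
                        realloc(self.ptr as *mut u8, old_layout, new_layout.size())
                            as *mut i32
                    }
                };
                assert!(!new_ptr.is_null(), "allocation failed");
                self.ptr = new_ptr;
                self.cap = new_cap;
            }
            // SAFETY: len < cap, so the write is in bounds.
            unsafe { self.ptr.add(self.len).write(value) };
            self.len += 1;
        }

        pub fn get(&self, i: usize) -> Option<i32> {
            if i < self.len {
                // SAFETY: i < len <= cap, so the read is in bounds.
                Some(unsafe { self.ptr.add(i).read() })
            } else {
                None
            }
        }
    }

    impl Drop for MiniVec {
        fn drop(&mut self) {
            if self.cap != 0 {
                let layout = Layout::array::<i32>(self.cap).unwrap();
                // SAFETY: ptr was allocated with exactly this layout.
                unsafe { dealloc(self.ptr as *mut u8, layout) };
            }
        }
    }

    fn main() {
        let mut v = MiniVec::new();
        for i in 0..10 {
            v.push(i);
        }
        assert_eq!(v.get(7), Some(7));
        assert_eq!(v.get(10), None);
    }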

Think of unsafe not as "this code is unsafe", but as "I've proven this code to be safe, and the borrow checker can rely on it to prove the safety of the rest of my program."


Why does Vec need to have any unsafe code? If you respond "speed"... then I will scratch my chin.

    > For example, the Vec type is a wrapper around a raw pointer, length, and capacity; and exposes a safe interface allowing you to create, manipulate, and access vectors with no risk of pointer math going wrong -- assuming the people who implemented the unsafe code inside of Vec didn't make a mistake, the external, safe interface is guaranteed to be sound no matter what external code does.
I'm sure you already know this, but you can do exactly the same in C by using an opaque pointer to protect the data structure. Then you write a bunch of functions that operate on the opaque pointer. You can use assert() to protect against unreasonable inputs.

Rust doesn't have compiler-magic support for anything like a vector. The language has syntax for fixed-size arrays on the stack, and it supports references to variable-length slices; but it has no magic for constructing variable-length slices (e.g. C++'s `new[]` operator). In fact, the compiler doesn't really "know" about the heap at all.

Instead, all that functionality is written as Rust code in the standard library, such as Vec. This is what I mean by using unsafe code to "teach" the borrow checker: the language itself doesn't have any notion of growable arrays, so you use unsafe to define its semantics and interface, and now the borrow checker understands growable arrays. The alternative would be to make growable arrays some kind of compiler magic, but that's both harder to implement correctly and not generalizable.

> you can do exactly the same in C by using an opaque pointer to protect the data structure. Then you write a bunch of functions that operate on the opaque pointer. You can use assert() to protect against unreasonable inputs.

That's true and that's a great design pattern in C as well. But there are some crucial differences:

- Rust has no undefined behavior outside of unsafe blocks. This means you only need to audit unsafe blocks (and any invariants they assume) to be sure your program is UB-free. C does not have this property even if you code defensively at interface boundaries.

- In Rust, most of the invariants can be checked at compile time; the need for runtime asserts is less than in C.

- C provides no way to defend against dangling pointers without additional tooling and runtime overhead. For instance, if I write a dynamic vector and get a pointer to an element, there's no way to prevent me from using that pointer after I've freed the vector, or after appending an element caused the container to be reallocated elsewhere.
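
A sketch of that last hazard in Rust, where the compiler rejects it outright:

    fn main() {
        let mut v = vec![1, 2, 3];
        let first = &v[0]; // shared borrow into v's buffer
        // v.push(4);      // rejected: `push` may reallocate, so the compiler
        //                 // refuses to borrow `v` mutably while `first` is
        //                 // alive (error[E0502]); the C equivalent would just
        //                 // leave a dangling pointer
        println!("{}", first);
    }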

Rust isn't some kind of silver bullet where you feed it C-like code and out comes memory safety. It's also not some kind of high-overhead garbage collected language where you have to write unsafe whenever you care about performance. Rather, Rust's philosophy is to allow you to define fundamental operations out of small encapsulated unsafe building blocks, and its magic is in being able to prove that the composition of these operations is safe, given the soundness of the individual components.

The stdlib provides enough of these building blocks for almost everything you need to do. Unsafe code in library/systems code is rare, and is used to teach the language new patterns or data structures that can't be expressed solely in terms of the types exposed by the stdlib. Unsafe in application-level code is virtually never necessary.


Often the unsafe code is at the edges of the type system, e.g. sometimes the proof of safety is that someone read the source code of the C library that you are calling out to. It's not useful to think of machine code as safe or unsafe. Safety often refers to whether the types of your data match the lifetime dataflow.

Those specific functions are compiler-builtin vector intrinsics. The main reason is that they can easily read past the ends of arrays and have type-safety and aliasing issues.

By the way, the Rust compiler does generate such code, because under the hood LLVM runs an autovectorizer when you turn on optimizations. However, for the autovectorizer to do a good job you have to write code in a very special way, and you have no way of controlling whether or not it kicked in, or whether it did a good job once it did.

There’s work on creating safe abstractions (that also transparently scale to the appropriate vector instructions), but progress on that has felt slow to me personally, and it’s not available outside nightly currently.


    > However, for the autovectorizer to do a good job you have to write code in a very special way
Can you give an example of this "very special way"?

For example, many autovectorizers get upset if you put control flow in your loop.
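
A sketch of the kind of rewrite that helps (hypothetical functions; whether the vectorizer actually fires still depends on target and optimization level):

    // Branchy: the `if` inside the loop can block autovectorization.
    fn sum_positive_branchy(xs: &[i32]) -> i32 {
        let mut s = 0;
        for &x in xs {
            if x > 0 {
                s += x;
            }
        }
        s
    }

    // Branchless: `max(0)` compiles to a select, which vectorizes well.
    fn sum_positive_branchless(xs: &[i32]) -> i32 {
        xs.iter().map(|&x| x.max(0)).sum()
    }

    fn main() {
        let xs = [3, -1, 4, -1, 5];
        assert_eq!(sum_positive_branchy(&xs), sum_positive_branchless(&xs));
    }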

The way I have heard it described, which I think is a bit more succinct, is: "unsafe admits undefined behavior as though it was safe."


