Assuming this is talking about normal "safe" Rust, I think I disagree with this. The Rust analogue of a pointer in safe code is a reference, not a raw pointer, and these can't be null at all. You could use an Option<> of a reference, and Rust will internally use null to represent the None (empty) case, but an attempt to use the option without checking for None will result in an error at compile time, not runtime. Yes, you could convert that into a runtime error, but if it was an error condition for that variable to be None then (depending on the context) you could choose not to use an Option at all, and then it would be a compile-time error at the call site to attempt to put None into it.
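To illustrate with a minimal sketch (function and names invented): the compiler won't let you touch the reference until the None case is handled.

    fn print_len(s: Option<&String>) {
        // Calling s.len() directly would be a compile-time error:
        // Option<&String> has no len() method, so the None case
        // must be handled before the reference is usable.
        match s {
            Some(s) => println!("{}", s.len()),
            None => println!("no string"),
        }
    }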
I don't think I understand what is meant by "type confusion". Surely this would also cause compile-time errors? Even C++ would give compile time errors for this unless you use a cast! (C, unlike C++, lets you implicitly convert from void* to any other pointer type so you don't need a cast to get pointer confusion.) Could someone think of an example of what might be meant here, and how it would cause a runtime error?
> I don't think I understand what is meant by "type confusion".
Accessing a memory location with one type as if it's another. In C or C++ it's usually because you accessed a union without checking a tag somewhere else.
One classic way this happens is in an interpreter where you have a big enum for all the different possible data types, along these lines (a minimal Rust sketch; the variant names are invented):
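    enum Value {
        Int(i64),
        Str(String),
    }

    impl Value {
        fn as_int(&self) -> i64 {
            match self {
                Value::Int(n) => *n,
                // Guessing the variant wrong is a checked runtime
                // error here, not a silent reinterpretation of bits.
                _ => panic!("expected an Int"),
            }
        }
    }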
In Rust and Zig this code will produce a runtime error if you screw up, but in C it's easy to forget to check the tag and then you get UB. Similarly, unwrap is checked in Rust and Zig, but the equivalent in C - dereferencing a pointer that you are pretty sure is not null - is not.
Of course Rust and Zig both have support for C-style unions too, but they're not the first thing people reach for.
Thanks for answering. As I said in another comment (in reply to tsimionescu, who suspected that's what you meant), I maintain that this is a compile-time error. Yes, one usage is to panic if the contents of the enum are not what you expect, in which case an unexpected type causes a runtime error. But that is the programmer choosing to deal with it that way.
Fundamentally, in the language, a type mismatch is a compile-time error. In many situations it's feasible to exhaustively match and deal with all possible cases of an enum (which then causes a compile-time error if you try to reference one of the other inner types from within the "wrong" case). Where one particular case is expected, you can often ensure through the type system that this case is the only possibility by using that individual type directly rather than passing the enum around. In some situations it's not feasible, or it is feasible but the extra faff isn't worth the reward - but even then I don't think it justifies saying that the type confusion is detected at runtime in a comparison of languages.
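For instance, a minimal sketch (enum and names invented) of how exhaustive matching makes the "wrong case" inexpressible:

    enum Shape {
        Circle { radius: f64 },
        Square { side: f64 },
    }

    fn area(s: &Shape) -> f64 {
        match s {
            // Each arm only has that case's fields in scope; referring
            // to `side` inside the Circle arm simply won't compile.
            Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
            Shape::Square { side } => side * side,
        }
    }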
> In rust and zig this code will produce a runtime error if you screw up, but in c it's easy to forget to check the tag and then you get UB.
As pointed out elsewhere, somewhat surprisingly, this is not UB in C, though it is in C++. In C it is merely unspecified behavior, but perfectly safe if intended.
Another (much smaller) detail is that signed overflow in C is undefined (and iirc GCC takes advantage of that when optimizing) but signed overflow in Rust is precisely defined to error in debug mode and wrap in release mode.
And unsigned overflow is also error-in-debug, wrap-in-release with Rust. If you want wrapping arithmetic, you have to ask for it. In C, unsigned overflow always wraps, so even though you can compile with overflow checks, there's no way to distinguish unsigned arithmetic that is supposed to wrap from arithmetic that isn't.
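Concretely, a small sketch of the default behaviour:

    fn bump(x: u8) -> u8 {
        // Debug build: panics with "attempt to add with overflow".
        // Default release build: wraps, so bump(255) returns 0.
        x + 1
    }

    fn main() {
        // Wrapping on purpose is spelled out explicitly:
        assert_eq!(255u8.wrapping_add(1), 0);
        println!("{}", bump(254));
    }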
> Java-style wrapping integers should never be the default, this is arguably even worse than C and C++’s UB-on-overflow which at least permits an implementation to trap.
I actually dislike Rust's "wrap in production" default, tbh. It strikes a strange balance: "we care about performance in release mode but we are also going to check and make sure this code does specific things on overflow".
An important thing to remember is that even with wrapping, Rust maintains memory safety.
Still, I agree, I prefer consistent semantics between dev/prod as much as possible. Especially since there are methods for checked/wrapping/etc. arithmetic, so I can always reach for those if I want the other behavior.
I'm not sure I follow. In release mode it does the most performant thing by default. In debug mode and tests it catches potential problems with using this default.
Either way there are explicit methods for doing wrapped, checked or saturating operations in every mode.
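For example, a quick sketch:

    fn main() {
        let a: i32 = i32::MAX;
        assert_eq!(a.wrapping_add(1), i32::MIN);   // two's complement wrap
        assert_eq!(a.checked_add(1), None);        // None instead of a panic
        assert_eq!(a.saturating_add(1), i32::MAX); // clamp at the bound
    }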
Undefined behavior on overflow is always the most performant, followed closely by "the result is unspecified". Wrapping is less performant because it often forces the implementation to actually wrap if the behavior is observable, which might be extra work (e.g. i32) or interfere with loop optimizations.
Undefined behaviour isn't an option for Safe Rust, where the impossibility of invoking undefined behaviour is the whole point. Non-deterministic unspecified values aren't in keeping with the Safe Rust philosophy either.
I imagine throw-on-overflow is slower than wrap-on-overflow.
C# can be configured to throw an exception when an int is overflowed, [0] but this behaviour isn't the default and is rarely used (typically it uses wrap-on-overflow). I imagine it might have a significant performance impact, but I'm not sure.
In a language like SPARK Ada, intended for formal verification, you can insist upon a rigorous proof that unintended overflow can never occur. That isn't an option for Safe Rust, at least not without significant breakthroughs in tooling.
> I imagine it might have a significant performance impact,
It depends on the CPU: MIPS has (or had, if you consider MIPS dead) some integer operations which trapped on integer overflow. AFAIK the 'trapping operations' were as fast as the non-trapping operations (when no overflow occurred, of course).
Unfortunately, even though RISC-V is MIPS's successor, it doesn't have those trapping operations :-(
I must admit I don't understand why no modern CPU ISA has these trapping instructions: explicit checks have an 'instruction cache cost', so users won't enable them, which reduces security.
Perhaps, but I often know via information not available to the compiler (though it might be available to a SPARK proof) that overflow won't happen, and I don't want to check for it.
Programmers can't be trusted to do this kind of free-form reasoning correctly, as attested by the unending stream of security vulnerabilities arising from undefined behaviour in C and C++ codebases.
The push for safe languages is motivated by pragmatism, not theoretical purity.
The same argument applies to various other instances of undefined behaviour.
Integer division in Java never results in undefined behaviour. Divide-by-zero results in an exception, as does (Integer.MIN_VALUE / -1). In C/C++, both of those operations result in undefined behaviour. Modern JVMs have to generate some additional instructions to implement this [0], but I wonder if the real-world performance penalty is that substantial.
Another example is reading an uninitialized variable, which is undefined behaviour in C/C++. Does this footgun really improve performance, with modern compilers? I don't have a solid answer here but I suspect not.
> IMHO 'integer overflow is UB' should be scrapped
The C committee is opposed to radical change; they never want to step on the toes of exotic compilers and exotic hardware architectures, so I doubt they'll ever change it. I think it would be more realistic to ask the major compiler vendors to commit to never doing anything unsafe on signed integer overflow. I believe GCC has an 'opt-in' flag for this. I wonder what the performance cost is, if any. Perhaps it breaks some optimisations, but as you say, other fast languages like Rust seem to get by fine.
Zero UB in safe code is a core design constraint of Rust, so I think it makes sense. Having integer overflow be implementation-defined would sound more logical to me than the current behavior, but maybe there are arguments against it also.
To be clear, it actually is implementation defined. The rules are:
Integer overflow is a "program error." This case is handled by either "default" or "enabled" overflow checking:
* If checks are "enabled", then overflow must panic
* If checks are "default", then you'll get two's complement wrapping
For implementations, if debug_assertions are enabled, then so must overflow checking be, unless the user specifically requests otherwise.
According to these rules, rustc today has "enabled" checking when debug_assertions is on, or when the user requests it via a flag. Otherwise, it leaves it to "default." If these checks ever become cheap enough, rustc may move to "enabled" in all cases by default. We'll see if that ever happens.
Introducing implementation-defined behaviour would undermine the advantages of Safe Rust. If I understand the goals of the Safe Rust project correctly, it aims to be a truly safe language, like Java or JavaScript. This means it must have no undefined behaviour, and beyond that, it should be as close to 'totally defined' as possible, without leaving program behaviour up to the particular platform, which would open the door to subtle bugs. (Concurrency is an exception here, as it really can't be made to be deterministic. Floating point might be another.)
An obvious example: does this code result in a divide-by-zero? (I'll use C syntax.)
    int myInt = INT_MAX;
    ++myInt;
    int myOtherInt = 1000 / myInt;
If signed overflow is permitted to result in myInt holding zero, then we have a divide-by-zero. Not the kind of thing that should be left up to the particular platform.
The behaviour of your Java code does not change when you move it from a 32-bit x86 machine to a 64-bit ARM machine. That's part of the appeal of Java. The same should be true of Safe Rust.
To put that another way: Safe Rust is remarkable because of its ambition: to be a truly safe language, while also having excellent real-world performance. It seems to be succeeding in doing both, without trading off on performance (Java, Go, C#) or safety (C++, and even Ada). If it starts compromising on either dimension, it becomes 'just another language'.
> "we care about performance in release mode but we are also going to check and make sure this code does specific things on overflow".
More like:
> "integer overflow checks are a painful/unacceptable performance degradation for some use-cases, but we still want to cough over-/under-flow bugs during testing"
Anyway, luckily you can just enable integer overflow checks in release builds, which is not an uncommon setup in use-cases like server code.
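With Cargo, for instance, that's a one-line profile setting:

    [profile.release]
    overflow-checks = true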
So, how do professional C programmers deal with this in general? Do they manually check for `x > INT_MAX || x < INT_MIN` every time they want to do some arithmetic? Do they manually check the CPU overflow flag after an operation? Or something else?
(I only have limited C experience, and only for hobby projects)
Use unsigned integers, which have well-defined overflow behavior.
In my experience with C (which is biased towards some specific use cases), most numbers that are likely to overflow are things like sizes and counts, which cannot meaningfully be negative anyway, so you may as well use unsigned integers for them. Other cases really do require signed integers, but for most arithmetic operations you can 'just' convert to unsigned before doing the arithmetic and then convert the result back to signed.
(Some may disagree. For instance, the Google C++ style guide [1] specifically says not to "use unsigned types to say a number will never be negative", because they want the undefined overflow behavior of signed types, in order to allow the compiler to diagnose bugs and to avoid "imped[ing] optimization". I think this is mostly nonsense; the drawbacks far outweigh the benefits, and tools for detecting overflow like UBSan can be told to check unsigned overflow as well.)
That said, even if you avoid the UB cases, checking for overflow correctly is hard; I've found many security vulnerabilities caused by missing or incorrect overflow checks. __builtin_add_overflow and friends are very nice if you have them, though unergonomic. I wish a more ergonomic version were standardized as part of the language.
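For comparison, a sketch of that checked pattern using Rust's standard integer methods (function name invented), roughly the ergonomics the builtins lack:

    fn add_sizes(a: usize, b: usize) -> Option<usize> {
        // None on overflow, instead of UB or silent wrapping.
        a.checked_add(b)
    }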
> (Some may disagree. For instance, the Google C++ style guide [1] specifically says not to "use unsigned types to say a number will never be negative", because they want the undefined overflow behavior of signed types, in order to allow the compiler to diagnose bugs and to avoid "imped[ing] optimization". I think this is mostly nonsense; the drawbacks far outweigh the benefits, and tools for detecting overflow like UBSan can be told to check unsigned overflow as well.)
Yes, I really disagree. Unsigned integers mean one thing, which is "modular arithmetic". Unless you are in the very uncommon case of actually needing modular arithmetic, for instance when implementing a crypto or hash algorithm, you want normal integers. As soon as you have anything that has any chance of introducing a subtraction somewhere, unsigned will cause bugs.
I don't know how many times I had to debug broken code such as
    for(int i = 0; i < some_size - 1; i++) { ... }
because some_size was unsigned.
If you really want a "number that cannot be negative", you don't want some_size - 1 to silently give you UINT_MAX, you want a type that will give you a compile-time or at worst run-time error.
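A sketch of what that looks like in Rust (function name invented):

    fn process(some_size: usize) {
        // In Rust, some_size - 1 already panics in debug builds when
        // some_size == 0, instead of silently producing usize::MAX; to
        // get that error in every build, make the check explicit:
        let upper = some_size.checked_sub(1).expect("size must be >= 1");
        for i in 0..upper {
            println!("{}", i);
        }
    }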
Let's say some_size is the length or size of something, so a negative number does not make sense, as there will only ever be a positive length or size (is that actually the case for arrays and such?). Then how would you express that fact? What is your proposal for such a type?
C does not provide anything specific for this (though some have argued that it should). Many projects use compiler builtins such as __builtin_add_overflow.
Either you need to restrict arithmetic to values which can’t overflow, or yes, you need to check for overflow manually. Note that you need to do those checks without actually triggering the overflow because once undefined behaviour is possible, all bets are off.
It varies a lot depending on your needs. Here are a few options.
Parts of postgres just heavily document the allowed range and expected domain of such functions (effectively encoding a type system into the comments and relying on a human to enforce it).
I've seen people drop down into assembly to prevent UB. Note that checking the overflow flag doesn't necessarily suffice; the problem happens during compilation -- e.g. if you write `x = (x-INT_MIN)+INT_MAX;` the compiler might be able to reason that the only value for x which wouldn't have UB is INT_MIN. Consequently the result must be INT_MAX, and the compiler can inline that constant anywhere else it's used without ever issuing the instructions that would let you check an overflow flag. If you do those calculations in assembly then you have a lot more freedom in that regard.
Some projects do manually check every arithmetic call (or more commonly they'll lean on the pre-processor to do that kind of busywork for them), or at least they'll do so outside of some small, core kernel which is more heavily vetted and can't afford the overhead.
Well, first, `x > INT_MAX || x < INT_MIN` makes little sense if x is an int; it will never be true. If adding a+b, you would check b > INT_MAX - a.
I would say that's very rare, only for special cases or defensive programming. Usually you either know/assert that the operation will not overflow because the inputs are bounded, or you use wider types (e.g. use an int32_t when adding two int16_t).
This has nothing to do with C programming. If an arithmetic operation could overflow, you always have to add a check, regardless of programming language. It's simply that a lot of high-level code doesn't care about such a level of correctness. Another exception is languages like Python that automatically upgrade your integers to arbitrary precision on overflow.
That said, most of the time you end up counting objects in the current address space. If you assume that there can exist no more than `SIZE_MAX` objects in memory, you can avoid many overflow checks.
My question does have to do with C programming. Other languages (not C++, of course) do not treat signed overflow as undefined behaviour. My question is specifically about this.
The point is that integer overflow practically always indicates a programming error. How a certain languages handles integer overflows is secondary. These overflows shouldn't happen in the first place. Making overflows a well-defined operation actually hides programming errors.
With regard to C compilers, there are a few cases where a compiler performs optimizations because it assumes signed integer overflow cannot happen. This is bad but typically, compiled C behaves like the underlying platform which means signed integers wrap around. With GCC, you can enforce this behavior and make signed integer overflow a defined operation with `-fwrapv`. You can also compile your code with UBSan to get runtime checks during testing. UBSan can also check for unsigned integer overflow which is defined behavior in C. So with modern C compilers, the situation is basically the same as with Rust, Zig or other safer languages.
Zig has a nullable pointer type, which will have null checks. But only in ReleaseSafe build mode, and it's trivial to construct a non-nullable pointer that is null or uninitialised.
Depends on the exact guarantees e.g. technically it's trivial to construct a nullable reference in Rust, but it's also `unsafe` and flagrantly UB so...
> I don't think I understand what is meant by "type confusion". Surely this would also cause compile-time errors? Even C++ would give compile time errors for this unless you use a cast! (C, unlike C++, lets you implicitly convert from void* to any other pointer type so you don't need a cast to get pointer confusion.) Could someone think of an example of what might be meant here, and how it would cause a runtime error?
Given the footnotes, I think they are referring to unions - probably writing to one union member but reading through another (e.g. uni.intVariant = 19; float a = uni.floatVariant).
I don't personally know how Rust and Zig handle this, but I believe it is UB in C.
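For what it's worth, Rust allows the same access but makes it explicitly unsafe; a minimal sketch (type and field names invented):

    union IntOrFloat {
        i: u32,
        f: f32,
    }

    fn main() {
        let u = IntOrFloat { i: 19 };
        // Reading any union field requires an unsafe block; the
        // compiler cannot know which field was last written.
        let f = unsafe { u.f };
        println!("{}", f);
    }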
> I think they are referring to unions - probably writing to one union member but reading through another
Thanks, you could be right. In that case, I think it ought to be listed as a compile-time rather than a runtime error too (just as I already argued for null pointer dereference).
It depends a bit on use of course: if you have a function that takes an enum and considers all cases using pattern matching, then the compiler will stop you from accessing the wrong member within a given case. If you pass an enum to a function that expects one particular case to be active, then yes you can convert the compile time error into a run time one, but this is your own choice, not something that is naturally a run time error.
It's actually defined as doing a type pun on the bit representation, interestingly. (The history of this specific behavior is somewhat complicated–it wasn't always clear-cut what this did.)
[Edit: this isn't true] That is definitely how it is used, and it works on the major compilers. But, surprisingly, this is indeed undefined behaviour according to the standard. There was an infamous rant by Linus [1] (more infamous for the language than the subject) about how the kernel can and should continue to use unions for type punning. He noted that even though the C standard doesn't specify a behaviour for that usage, gcc does (so long as you don't use a particular compiler switch mentioned in that discussion), and went on to say "The standard simply is not important, when it is in direct conflict with reality and reliable code generation."
EDIT: Having looked into this more I think I was getting confused with C++ (where the only supported way to type pun is to use memcpy or similar, which despite the name might not have to actually copy the bytes at runtime). Here [2] is a StackOverflow answer (/discussion) on the matter.
> If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
This verbiage has existed in a footnote of the standard since a defect report was filed against C99: http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_283.htm. This happens to be one of the few instances where Linus is right about the C standard, but without knowing why ;)
Does the Linux kernel still only support compilation using gcc? That’s about the only situation in which the standard could be considered “not important”. I wonder how many other C projects are in the same situation and only support the use of a single compiler?
> Does the Linux kernel still only support compilation using gcc?
There was an effort to build the kernel with Clang [1]; I'm not sure about its current status. It helps that Clang tracks GCC's features quite closely, including some of its idiosyncrasies.
> but an attempt to use the option without checking for None will result in an error at compile time, not runtime. Yes, you could convert that into a runtime error, but if it was an error condition for that variable to be None then (depending on the context) you could choose not to use an Option at all, and then it would be a compile-time error at the call site to attempt to put None into it.
This wording still suggests that it is a null pointer dereference, which is technically UB but in practice a segmentation fault on any modern operating system; that is not what happens in Rust.
Rather, an optional reference cannot be dereferenced at all before it is converted to an actual reference, typically with appropriate guards for the None case. It is true that there is a way to convert an optional value of type `T` to type `T` without handling the None case, but that is simply a trivial library function that handles the None case by panicking the thread. The None case must technically always be handled in some way in safe code.
The distinction is rather meaningful, I would say, as the error occurs not at the null pointer dereference stage, but at the panic raised when attempting to convert a None value, which not only happens before that point, but provides far cleaner debug information.
In safe Rust, one simply cannot dereference an optional reference; one can only coerce it to a reference, with the understanding that this operation panics the thread in the None case. An optional reference in Rust is not a "reference" at all as it is in other languages with special support for it.
I don't understand the practical consequence. It will make your program panic and therefore stop, which is orders of magnitude worse than throwing a null pointer exception. Can you imagine your long-running application (e.g. a server) stopping in production?
The main difference[1] between a panic and undefined behaviour, including a null pointer access in C/C++, is that undefined behaviour might not crash. It's tempting to equate "undefined behaviour" with SIGSEGV but it doesn't have to be so. The undefined behaviour is even allowed to travel back in time to an earlier line of code, so long as the compiler is able to prove that it would have eventually invoked that undefined behaviour (e.g. it can prove that it would later dereference that null pointer).
So however bad it might be for your production server to crash, imagine how much worse it might be for it to appear to continue working while it actually corrupts memory and database entries, or an attacker uses it as an exploit to read other users' information, or whatever else. At least when it crashes you can detect that with your health monitoring and potentially start it back up (if the same problem doesn't cause it to immediately crash again in a loop, of course).
[1] A secondary difference is that it is possible to catch Rust panics in an analogous way to catching C++ exceptions. This isn't encouraged and might not even work (if they've been turned into calls to abort() at compile time). The fact that they're not undefined behaviour is the big one.
Well I did link to a blog post that explains this, so I feel like it's sort of on you at this point. But I guess the short answer is that in Rust you know exactly what 'unwrap' will do on a None, and in C/C++ you can't.
> can you imagine your long running application (e.g a server) stopping in production..
Yes. It happens all the time, and in fact it is inevitable. Far better than the program misbehaving, which is what undefined behavior leads to. In fact if you're building a serious, production service, you might want to skip panicking altogether and just kill the process.
Java's NPE is not at all the same thing as the undefined behavior of dereferencing a null pointer in C (or Zig in release mode). In fact, it's the same exact thing as calling unwrap. Your Java server's NPE is not bringing down the entire server because the exception is caught somewhere. The same can be done in Rust (and is, by many frameworks), where panics are caught before they crash the entire application.
My understanding is that Zig doesn't allow pointers to be null in the first place (regardless of release mode) unless you're 1) manually creating a pointer from an integer or 2) interfacing with C, and in both of those cases all bets are already off anyway (as they would be in Rust). The only "supported" options outside of that would be a non-null pointer or None.
You can likely use SEH/signals in Zig to catch the panic and resume.
In C++, you'll be lucky if you throw a nullptr exception. It's undefined behaviour, and it's often exploited by compilers for optimization. Here's a super simple example [0] showing how the compiler makes assumptions and generates a very unexpected result.
Almost no code can correctly reason about a stray null pointer, and usually they result in strange auxiliary crashes or data corruption. Loudly crashing is often the best choice, even in production.
If my code dereferences null, I want it to panic even in production. Assuming your server will never fail is very dangerous. The OOM killer can just kill your process for reasons unrelated to your server.
There's a difference between a value being null, and a pointer being null. While Rust disallows both, it handles the cases differently; for the latter, a Rust program containing only safe Rust cannot create "null" references. However, for a "nullable" type, Rust encourages using `Option`, and forces you to handle the `null`/`None` case via a check.
Here's the score for tokio without tests and examples:
$ find . -name '*.rs' | grep -v -e test -e example | xargs -n1 sed '/mod test/q' | grep -v '^\s*//' | grep -cF -e '.unwrap(' -e '.expect('
224
I see a lot of unwrapping of locks. IIRC that only fails if another thread crashed while holding the lock, in which case crashing the current thread is often unobjectionable.
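For context, the pattern looks like this (a minimal sketch): `Mutex::lock` returns a `Result` that is `Err` only when the lock is poisoned.

    use std::sync::Mutex;

    fn main() {
        let m = Mutex::new(0);
        // Err here means another thread panicked while holding the
        // lock; propagating the crash via unwrap is the usual choice.
        let mut guard = m.lock().unwrap();
        *guard += 1;
    }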
You are right. I should not have picked two random projects. I merely wished to say that most Rust projects I have encountered contained a lot of unwraps.
In addition to what Ygg2 mentioned (tests and unwrap_or), Rust is still (slowly) moving towards support for the `Never` type that auto-erases impossible branches. Without `Never`, if you statically know that your function always returns `Option::Some`/`Result::Ok`, you have to use `.unwrap()` or `.expect("reason")`. This is just you doing the obvious work for the compiler, which is not smart enough yet to figure it out by itself.
This really should get more attention. Safe null access in languages like TypeScript or Kotlin through the '?' syntax can still be abused (through '!!'), but the incentive to abuse it is much lower than in Rust, because the safe syntax is just as short or shorter (in Kotlin).
This is the opposite of reality: in Rust, the safe syntax `?` exists and there is no short syntax for the panicking form (`.unwrap()`). Users are not incentivized to unwrap in Rust; users are as disincentivized as possible to unwrap.
The uses of unwrap in the projects the GP cited are overwhelmingly in tests and examples, which are not expected to handle errors in the same way as application code. The remainder are mainly lock poisoning unwrapping, which is a completely endorsed idiom.
For errors the ? operator is both correct and easy most of the time.
But for options it's trickier. Unless the surrounding function is structured just right, the equivalent to TypeScript's and Kotlin's ? operator is .map()/.and_then(), and that's pretty ugly. .unwrap() is easier.
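A sketch of the contrast (function names invented):

    // `?` reads nicely when the function itself returns Option:
    fn first_byte(s: Option<&str>) -> Option<u8> {
        let s = s?;
        s.bytes().next()
    }

    // When it doesn't, you fall back to combinators, which is clunkier:
    fn describe(s: Option<&str>) -> String {
        s.and_then(|s| s.bytes().next())
            .map(|b| format!("first byte: {}", b))
            .unwrap_or_else(|| "empty".to_string())
    }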
Those are not mutually exclusive statements: it can be officially discouraged yet abused in practice, because it is often the path of least resistance for the developer.
One thing that's important to note is that Zig in general enforces much more correctness than C and also has much better stdlib APIs. IME many memory corruption issues in C (and C++) are actually "secondary effects" of C's and C++'s general "sloppiness" and bad stdlib APIs.
One thing I found where Zig is currently worse than C compilers: returning a pointer to a stack variable generates a warning in "modern" C and C++ compilers, while Zig lets this slip through. I hope that "trivial" things like this will be fixed on the way to 1.0
I ran into this in Zig and I was surprised that it wasn't caught. Since I'm newer to lower-level programming I didn't automatically think to look for this, and I was lost for a day or two.
One thing that's important to emphasise: sound safety, i.e. using a safe language, is no one's goal; rather, it is a means to an end. What people want is correct programs. The question is, then, does a language help write correct programs?
Ensuring safety from important classes of bugs with sound compile-time guarantees is one way to help write correct programs, and both Zig and Rust use it; ensuring safety from important classes of bugs with sound guarantees based on runtime checks is another way, and both Zig and Rust do it, too; a simple language that's easy to understand and analyse is another very important way to help write correct programs, and both Zig and Rust try to be simpler than their predecessors; making it easy to write tests and run them frequently is another way to get more correct programs that both languages try to employ. Both languages drastically differ in the use of those techniques from either C or C++, because they both put a very strong emphasis on writing correct programs, but they also differ a lot from each other in how they balance those techniques.
It is impossible to tell without careful empirical research which helps write correct programs more than the other, and it is also possible that different people find it easier to write correct programs in either Zig or Rust. Rust certainly provides stronger guarantees against temporal memory bugs than Zig, so let's assume Rust programs will contain zero, and a Zig program will contain more than zero but far fewer than C or C++. But that delta is insufficient to determine that Rust's balance of techniques reduces more bugs overall.
Also, Zig already has decent checks for use-after-free, and they'll get better; and avoiding uninitialised memory is also very easy to do (and verify) in Zig, despite there not being any checks. Even if, like other runtime checks, it is turned off in production, it still helps catch errors in that category.
At first order not even that; what people want is programs that behave correctly (or correctly enough for their purposes, which may not be very correct at all) for their particular inputs and execution environment.
Conversely those of us who want the industry to advance the state of the art generally don't want to just produce correct programs at a particular point in time, but programs whose correctness can be easily maintained even as implementations and requirements change. More than that, we want to produce libraries and frameworks that will lead as-yet-unknown programs to be correct.
> what people want is programs that behave correctly (or correctly enough for their purposes, which may not be very correct at all) for their particular inputs and execution environment.
Ah, yes. This raises an interesting philosophical question with real ramifications for software quality assurance: is a bug in the algorithm that never manifests in the system really a bug? Something like that happened in two well-used pieces of code: there was a bug in the TimSort algorithm used in both Java and Python whose probability of actual failure is similar to the probability of failure due to a bit flip caused by cosmic rays. Because hardware can only be correct with some probability, no running system can be soundly verified, i.e. with certainty, anyway; so while the correctness of algorithms can be absolute, the correctness of systems cannot. And since soundness has a big cost in verification, many in software correctness research now focus on unsound techniques that are cheaper.
> Conversely those of us who want the industry to advance the state of the art generally don't want to just produce correct programs at a particular point in time, but programs whose correctness can be easily maintained even as implementations and requirements change. More than that, we want to produce libraries and frameworks that will lead as-yet-unknown programs to be correct.
True, but that is not a winning argument for soundness. The cost of soundness manifests even during maintenance. It's therefore an equally strong argument that a language that compiles quickly and makes it easier to run, say, concolic tests, mutation tests, etc. serves that goal, too.
A language that makes code reviews easier also works toward that goal of maintaining program correctness over time. The point is, there are many different paths to correctness, all of them state-of-the-art, yet often in conflict with one another, and we don't have any mechanism other than empirical research to compare them. For example, is it beneficial to increase soundness at the expense of making code reviews harder? Not only do we not have an answer to that question, it is likely that there is no general answer (I say it's likely because whatever empirical research we do have shows messy results with large variance).
So much this. And also keep in mind the way that we typically do code reviews: we are usually looking at GitHub diffs. So if you are in a situation where code changes compose badly - for example, if something looks safe in place A and something looks safe in place B, but when you put them together it's unsafe - then you could be in deep trouble with the async way that we do reviews.
I've seen you make that argument a few times in Zig-related discussions but I'm not sure I buy it. It essentially boils down to: simpler language => easier to reason about and build tools for => fewer bugs.
While the thought has merits, the empirical evidence we have indicates that yes, it is possible to achieve good software with faulty languages plus strong rules and tooling, but it is certainly not as straightforward as you make it seem.
In the end though, for the Zig case I agree that the jury is still out. But if I was a betting man, my money would not be on it, even though I personally prefer Zig.
You're also forgetting: simpler language/explicit code => faster build times. Zero-cost abstractions are only zero-cost in optimized builds, and complex optimization isn't free.
Whether static checking vs faster iteration time is more important depends entirely on the context, but rust isn't going to help you when you accidentally did front-face culling instead of back-face culling.
> but it is certainly not as straightforward as you make it seem.
I never claimed it is straightforward; it is anything but. As a practitioner and advocate of formal methods and verification, I've been following research in software correctness for many years (and have written much about it, e.g. https://pron.github.io/posts/correctness-and-complexity), and I've come to realise how complex the problem is. There's more we don't know than we know, and even for the things we know are problems, we don't know what the best solution is, because solutions often carry with them more problems.
Nonetheless, there are certain principles. We know that we can eliminate certain bugs with compile time guarantees; we also know that code reviews catch many (many!) bugs, and so making them easier helps. But what if these two are in opposition? It's not easy to tell which wins in which circumstances.
> In the end though, for the Zig case I agree that the jury is still out.
True, but the jury is still out on Rust, too. In fact, for most languages. However, there is no clear argument that we should assume, a priori, that Rust results in more correct programs than Zig. Many such arguments in the past have failed to yield positive empirical results (e.g. https://youtu.be/ePCpq0AMyVk). In fact, given the empirical research, the safest bet is to assume the null hypothesis -- that there is no difference. Out of an abundance of caution, I'll assume that languages whose designers place a strong emphasis on correctness might achieve it more easily than languages whose designers put no emphasis on it at all, but Zig and Rust are in the same category here. Both are designed with correctness as a primary goal. But as their designs and means of achieving correctness are so different, I think it's impossible to make an educated guess as to which of them, if any, yields more correctness more easily.
If we want some bottom line, it is this: software correctness is so complex, and solutions are often so non-obvious (i.e. many work in theory but not in practice), that we cannot say anything with certainty until we have actual empirical results, and even then we need to be careful not to extrapolate from one study to other circumstances with different conditions (i.e. that TypeScript seems to have fewer bugs than JavaScript does not extrapolate to the general claim that typing always reduces bugs compared to no typing, in the same amount or at all, when other languages are concerned).
Wow, thanks for that link. I only made it through the first part for the moment but it is an incredible read. You clearly thought about this more deeply and carefully than I did.
Edit: I'm not entirely sure how that came across so I want to explicitly say that this is not a dry ironic statement (communication is hard, and I am a poor writer).
> We know that we can eliminate certain bugs with compile time guarantees; we also know that code reviews catch many (many!) bugs, and so making them easier helps. But what if these two are in opposition? It's not easy to tell which wins in which circumstances.
I understand the argument, but I'm not sure on what basis you consider that Rust's type system harms code review. Do you have specific examples in mind? (And because the discussion is about Zig, this is a pretty strange argument to make, because Zig's ubiquitous usage of metaprogramming is in fact a hindrance to code review).
Rust is easily among the top five most complex programming languages ever created (it's in the good company of other low-level languages that follow a similar design philosophy, like C++ and Ada).
Calling Zig's comptime "metaprogramming" is a little misleading when compared to other low-level languages. It is used for the same purpose as metaprogramming in other low-level languages (like macros in C++ and Rust, or templates in C++), but doesn't have any quoting mechanism [1] and doesn't operate at any "higher level." In fact, Zig's semantics would be unchanged if comptime were executed at runtime. It is more similar to metaprogramming in a dynamic language with reflection, with the benefit that the related "runtime" errors are actually reported at compile time. So comptime doesn't increase Zig's complexity. It can be thought of as a pure optimisation.
[1]: Zig's comptime is referentially transparent, i.e. if two terms, x and y, have the same meaning, then one cannot write a unit e in Zig such that e(x) and e(y) have different meanings, unlike in C++ or Rust. So the metaprogramming features in C++/Rust are trickier than Zig's.
> Rust is easily among the top five most complex programming languages ever created
You said that already[1], this is unsubstantiated and you declined to answer to my rebuttal.
> So comptime doesn't increase Zig's complexity. It can be thought of as a pure optimisation.
I'll grant you that it doesn't increase Zig's implementation complexity and also has a smaller learning-curve cost than other mechanisms. But when reading a piece of Zig code, you constantly have to wonder at which time the given code is gonna run. And there's much, much more comptime in use in any piece of Zig code than you'll encounter macros in Rust or C++. So yes, it adds its share of friction when reading Zig code.
> You said that already[1], this is unsubstantiated and you declined to answer to my rebuttal.
Sorry, didn't see your response. I can answer it in two ways, subjective and objective. The subjective is "I know it when I see it," which roughly corresponds to the difficulty of determining what an unfamiliar piece of code does, as well as how many language rules I need to know to figure that out. The objective one is literally language complexity, i.e. the computational complexity of determining whether a string is in the language or not (i.e. whether or not it is well-formed).[1]
> you constantly have to wonder at which time the given code is gonna run
You really don't. The semantics of Zig are the same as those of Zig', which would be the language that runs comptime at runtime. The whole point of comptime is that as far as semantics -- not performance -- is concerned, you do not have to care when code would run.
[1]: There's a complex theoretical caveat here, because I believe both Zig and Rust are undecidable. So we can exclude degenerate cases from Rust, and look at the complexity of Zig', the language I introduced in the second paragraph, which is semantically the same as Zig.
Anyway, complexity can come from many factors:
- feature bloat: C++ is way more complex now than it was in 1990, because features were added on top of features. In that regard, the older a language gets, the more complex it becomes. C++ is the most cited example, but I think PHP is even worse in that regard: it's probably the single most feature-bloated PL ever, probably because there is not even a standardization committee to add friction to the feature-addition process. By that metric, Rust is slowly becoming more complex every year, like every other language (but the growth of its complexity isn't particularly concerning compared to others; Go, for instance, has recently been on a much steeper track).
- platform fragmentation: when Internet Explorer was still a thing, JavaScript development was made incredibly complex by the huge implementations differences between browsers. Code that worked somewhere failed somewhere else more often than not, and you had to keep work-around for old versions or IE for years. IE is mostly dead, Safari is less shitty every year, and google killed Android Browser and replaced it with Chrome, so it's a much smaller issue than before, but problems remain.
- cultural factors: Haskellers' love of obscure mathematical terms, or the fetishism of OOP design patterns in Java in the late '90s and 2000s, are good examples of culturally-induced complexity.
- ecosystem churn: JavaScript between 2013 and 2018 or so, with new frameworks, libraries, and tools replacing the old ones every six months before getting replaced themselves in the following months, was a massive source of complexity; fortunately it seems to have settled a bit and the churn rate is lower than before. In Rust's early days, when many useful features were still unstable and feature-gated in the nightly version of the compiler, this phenomenon also existed (though at a much smaller scale). By that metric, Rust's complexity has decreased quite a bit since 1.0, as many libraries have been adopted as the de facto standard way of solving a bunch of problems (a few domains remain prone to this though, like error-handling helpers, and ECS for game engines apparently), and Rust is now roughly in the same situation as most languages.
- counter-intuitive semantics: cf. pre-ES6 JavaScript and how `this` and `var` bindings worked, which was simply the opposite of what people wanted in 95% of cases.
- obscure control flow: `with` statement in non “strict mode” JavaScript, languages relying on a lot of `goto`, or even languages with exceptions.
- too much responsibility: manual memory management in C (or Zig for that matter), for which we now have half a century of evidence that no human is able to do it consistently right all of the time.
- poor interactions between features: see C++, how modern features interact poorly with older (more C-like) ones.
Rust is less complex than many mainstream languages on at least one of these dimensions, and less complex than JavaScript on most of them…
> The objective one is literally language complexity, i.e. the computational complexity of determining whether a string is in the language or not (i.e. whether or not it is well-formed).[1]
This is a stupid metric, because it confuses implementation complexity with user-facing complexity (brainfuck wins this benchmark, yet good luck building anything with it). But from a theoretical perspective, this is a fun one, because there's not one but two classes of undecidability involved:
First, with most languages with type polymorphism, it is undecidable whether a given program will successfully compile. But there's also a second level: when a language has undefined behaviors, a program compiling successfully isn't enough: it can still be invalid, and whether or not it is valid is also undecidable. C is not in the former situation but is in the latter, C++ and Zig are in both, safe Rust is in the first only, but unsafe Rust is also in both. So in that regard, safe Rust is strictly less complex than Zig, but the whole of Rust is equivalent.
> You really don't. The semantics of Zig are the same as those of Zig', which would be the language that runs comptime at runtime. The whole point of comptime is that as far as semantics -- not performance -- is concerned, you do not have to care when code would run.
This argument is pretty similar to the Rust point of "when you get used to it, ownership doesn't add any cognitive burden". Maybe when you gain enough familiarity with Zig you can gloss over it without hassle, but I'm clearly not at that point yet, so you'd really better not assume that it's gonna be straightforward and instantaneous for everybody; it is not.
I don't think so, because I don't think I'm claiming what you think I am.
> Anyway, complexity can come from many factors:
I completely agree, but I'm only talking about language complexity, in the strict syntactic, linguistic sense. I am not saying that all things considered, Rust makes maintaining programs harder than other languages -- nobody knows that until we have some empirical study -- but linguistic complexity is one very prominent property of Rust, as is, say, the memory safety of safe Rust, which, similarly, does not mean that Rust programs are overall safer than those written in, say, Zig, when all things are considered, because correctness also has many contributors, not just sound syntactic guarantees. There, too, only empirical study can settle the issue.
But you can't have it both ways, focusing on one specific piece when it comes to correctness yet insist on looking at the full picture only when it comes to complexity. All you can say is that, linguistically/syntactically, Rust offers some sound guarantees re memory safety and that it is complex, and that overall, both subjects are complex, involve many aspects, and require empirical study to make any definitive claim about.
> This is a stupid metric, because it confuses implementation complexity with user-facing complexity
It is obviously not stupid because it is commonly and usefully employed in computer science. But as with any precise definitions, it focuses on some aspects and not others. It captures the intrinsic difficulty of answering a question about a program. You are correct that it does not take into account human ergonomics and psychological aspects, but it is one more useful metric, even if not comprehensive.
> This argument is pretty similar to the Rust point of “when you get used to it, ownership doesn't adds any cognitive burden”,
Absolutely not (and, BTW, I was not referring just to ownership and lifetime when I spoke of Rust's complexity). It is a very precise and well defined property of Zig. The semantics of a Zig program, i.e. what it does in terms of what action it computes, is completely independent of comptime. It is not an ergonomic or psychological argument. comptime does not change the meaning of anything, and not only do you not need to figure out what happens at compile time and what happens at runtime -- unless you want to reason about efficiency -- but that knowledge contributes nothing. It's a meaningless distinction when it comes to semantics. It's a very powerful, well thought-out, theoretical and practical aspect of Zig's design.
> but I'm only talking about language complexity, in the strict syntactic, linguistic sense.
Then again, on the strict syntactic sense, Rust is even less complex than C, because of the “most vexing parse” issue. If you wanted a rigorous analysis of the syntactic complexity, you could attempt to measure how difficult it is to write a lexer and a parser for every popular languages, and see how Rust performs. But given that the language grammar has been designed with parsing complexity in mind and have benefited from the hindsight of others before it, you'd be terribly disappointed.
From this discussion, and many of your previous comments on this forum, it's pretty clear, even though the reason isn't, that you have developed a resentment towards Rust and you can't help bashing it.
Rust isn't a silver bullet; it has a fairly tough learning curve, and as it tries to push the frontier of system programming languages forward, it will take a few decisions that will ultimately be regarded as dead ends, and I have no doubt that future languages will avoid these pitfalls and improve over the state of the art.
In the meantime, spreading your hate with unsubstantiated judgements like “Rust is one of the 5 most complex programming languages ever” or “Rust harms code reviews” isn't really constructive for anyone.
Zig is a cool motorbike, Rust is a SUV. Arguing that your bike can indeed be safer than a SUV because you have more visibility and agility to avoid the danger is beyond childish.
Super easy cross-compilation and incredible development velocity on small-medium projects are super cool features of Zig, and Rust can't beat that. No need to downplay the importance of Rust for the software industry (and as a friendly reminder, Rust is making its way into the Linux kernel, with the approval of Linus, because unlike C++ or Ada it isn't too complex for his taste ;).
Perhaps this may disappoint you, but I -- like many and perhaps most developers -- don't have such emotional responses, positive or negative, to any programming language [1], which might appear as resentment to the emotionally attached. I am very impressed with some aspects of Rust, less impressed with others, and my feelings toward it overall are shaped just like yours: by personal aesthetics. I don't find Rust's aesthetics very appealing, and so Rust isn't my cup of tea (although I wouldn't resent working in it, because I'm not emotional toward languages, and I currently program mostly in a language whose aesthetics I like even less than Rust's [2]), while you find them appealing and so you do like Rust. It's all just a matter of taste, and I fully accept that not everyone shares mine. I think your approach is too coloured by emotion, and is therefore unconstructive. You're a zealot, and you project that attitude onto others, so "unconvinced" appears to you as a personal attack, and scepticism or dislike seems to you like bashing.
> Rust is even less complex than C, because of the “most vexing parse” issue. If you wanted a rigorous analysis of the syntactic complexity, you could attempt to measure how difficult it is to write a lexer and a parser for every popular languages, and see how Rust performs
No. The complexity of a formal language, like that of any set, is the computational complexity of deciding whether a string is in the language (so, including type-checking), not the complexity of the parsing phase (https://en.wikipedia.org/wiki/Computational_complexity_theor...). I'm not saying this is the most useful way to talk about language complexity in this context (and caveats are needed, anyway, to make a finer distinction between languages), but it is certainly one well-known way to talk about the intrinsic complexity of a language.
> Zig is a cool motorbike, Rust is a SUV. Arguing that your bike can indeed be safer than a SUV because you have more visibility and agility to avoid the danger is beyond childish.
It is beyond childish to make such inane statements about software correctness when you're clearly not very familiar with the subject, and are drawn to arguments like "more soundness => more correctness". The effect of language design on correctness is a complex subject with mostly unsatisfying answers, and even in software verification research, the debate over the value of soundness is far from settled (and not currently leaning toward more soundness). An equally inane statement would be, "Zig is like a modern aeroplane, relying on multiple levels of safety, some mechanical and some human, while Rust is like an old train that breaks down and kills everyone once there's a problem with the tracks." If we've learned anything about software correctness in the past decade, it's that there is not much we can assume in advance, and that we don't really know one best way to improve it. It is true that some researchers think that the best answer to any correctness issue is more soundness in the language, but not only is this not a consensus opinion, I doubt it's even a majority opinion.
[1]: I would say I'm a "language sceptic." I'm generally sceptical toward any claim about the bottom-line effectiveness of linguistic features without empirical support, and overall think that whatever empirical studies we do have show little overall impact from language design (comparing "reasonable" alternatives, at least), certainly compared to what all language fans claim. I would never, say, make a definitive claim like, "Zig yields more correct programs than Rust", or "Rust yields more correct programs than Zig," without clear empirical support (and my guess based on prior results would be that they're about the same).
> I think your approach is too coloured by emotion, and is therefore unconstructive.
Your little “I'm a rational agent, you are too emotional” bit is pretty cute. But it would work better if your whole attitude in this thread didn't contradict it, don't you think? “Rust is among the five most complex languages” is not a rational argument, it's a personal feeling. Why you feel the need to spread your feelings over the internet while pretending you're not an “emotional” person is quite intriguing. If you want to look more like a rational person (no human really is), try to keep as much personal and unsubstantiated judgement out of your writing.
“Rust marketing makes safety claims that we should not take at face value” is alright; “Rust is one of the five most complex languages ever” doesn't pass this test.
> is the computational complexity of deciding whether a string is in the language (so, including type-checking)
But for Rust, C, Zig, and many others, it's undecidable, so by this definition of complexity, these languages are definitely too complex (and equally so).[1]
In fact, your desperate attempt to save your initial argument about complexity, by narrowing it to a tiny technical corner makes me cringe a bit.
> I would never, say, make a definitive claim like, "Zig yields more correct programs than Rust", or "Rust yields more correct programs than Zig," without clear empirical support (and my guess based on prior results would be that they're about the same).
The technicality of what constitutes a “definitive claim” in a human-to-human conversation is an interesting question, but in practice truisms like “we can't conclude whether Rust is safer than C” or “we can't conclude that Rust doesn't bring more bugs than Zig does” aren't neutral: what such a claim attempts to do is insinuate the opposite. And when combined with gratuitous judgements like “Rust is one of the 5 most complex languages ever”, it looks a lot like an attempt to deter people from using a language you don't like. (And now I have a clue about the root cause of your bad feelings.)
[1] And as I said earlier, in the case of Zig and unsafe Rust it's not just about type-checking: because of UB, even after compilation, whether the compiled binary is the binary of a valid program is also undecidable.
> But for Rust, C, Zig, and many others, it's undecidable, so by this definition of complexity, these languages are definitely too complex (and equally so)
My comments on this tried to be as careful and as precise as possible, and touched on this very issue.
> In fact, your desperate attempt to save your initial argument about complexity, by narrowing it to a tiny technical corner makes me cringe a bit.
Your emotional response here is so powerful that I think we're conversing on entirely different levels. I am sorry if my mild and careful statements have touched on something that you clearly see as essential to your identity.
> aren't neutral: what such a claim attempts to do is insinuate the opposite
No. It is the most precise and careful statement that I can make, having followed the research for years and being a practitioner and advocate of formal methods. The more careful I try to be in what I say, the further it seems to send you into rage (and abuse). I think you're in the middle of a tantrum that's clouded your judgment, or perhaps, being a zealot, you cannot imagine any other attitude. Anyway, this conversation is making me very uncomfortable, as I sense you're in a very agitated emotional state, and I want no part of that.
> No. I think you're in the middle of a tantrum that's clouded your judgment.
I have to admit, this is a cool rage-quit punchline!
Edit:
> I am sorry if my mild and careful statements have touched on something that you clearly see as essential to your identity, but I simply see no way to discuss this subject with you.
> Anyway, this conversation is making me very uncomfortable, as I sense you're in a very agitated emotional state, and I want no part of that.
You said that already[1]; this is unsubstantiated, and you declined to answer my rebuttal.
How would you propose to measure the concept of "programming language complexity"? One metric could be "how difficult is it to write programs that do not contain certain classes of bugs"? By that metric, C is indeed incredibly complex. An alternate metric might be "how long does it take the average developer to learn the language well enough to write reasonably effective programs"?
In the absence of formal studies we just have to go by our intuition. Personally, I kinda hate the "I'm not smart enough to write C, so I write Haskell/Rust" argument. It comes across as incredibly condescending to me. What I can tell you from my experience is that I spent a month trying to learn Rust on nights and weekends, and by the end of that was able to write some extremely simple programs with a lot of effort. On the other hand I was making nontrivial contributions to Zig itself within a week of learning the language. So to me, Rust is much more complex than Zig.
I'm not a native English speaker, but as far as I know, the word complexity in English is pretty close to its meaning in French (where it comes from). From Wikipedia:
> Complexity characterises the behaviour of a system or model whose components interact in multiple ways and follow local rules, meaning there is no reasonable higher instruction to define the various possible interactions.
This is in fact the most antithetical possible description of Rust, which, thanks to its strong type system and compile-time rules, keeps the interactions between different components and features as clear and specified as possible.
Yes Rust is hard to learn, but learning curve and complexity are orthogonal concerns.
From my point of view, until Zig fixes the issues marked as "none" in the table, it adds very little value over existing alternatives.
I can already use C and C++ to suffer those issues in production and use VC++ static analysers to mitigate them, while languages like Ada, D, Rust, Nim and Swift prevent them from happening at all.
Zig's safety is not at all like C's (or even C++'s), even with static analysers and sanitisers. It is core to the language through things like slices and nullability types. What Zig brings to the table is an extremely powerful and expressive, yet remarkably simple language that places as much emphasis on correctness as Rust (albeit in a radically different way).
I don't think you can say that Zig is as correct as Rust given that memory-safety is not guaranteed by Zig (as evidenced by the article we're commenting on).
Whether a language is "correct" is meaningless (hopefully, most compilers/interpreters are reasonably correct); we're talking about which language makes it easier to write correct programs, and because both languages focus heavily on that goal yet take very different approaches to achieving it (the article only compares one), it is simply impossible to tell at this point which of those languages, if any, achieves that goal better than the other.
Except you are assuming that Zig will never change after 1.0 release.
C17 is also quite different from K&R C, especially in what optimizers do with UB.
The only way OS vendors can fix the issues that Zig shares with C, C++ and Objective-C, as per the article, is to adopt hardware memory tagging, something already available on Solaris SPARC, Azure Sphere and iOS (yes, PAC is a bit different), with ongoing work for ARM.
So I really don't see the benefit, but let's see what Zig 1.0 actually looks like, and I might be wrong by then.
I'm talking about Zig as it is now. If the design drastically changes, it would be a different story. The UB comparison is not very relevant, though, because Zig, a language that takes correctness seriously, aims to make it very easy to not have any UB (at least with high probability) in its safe mode. Aside from being inherently safer than C, and arguably C++, even with all of those enhancements (not considering safe variants of C), Zig brings benefits other than safety, like terrific cross-compilation, fast builds, and a language that is extremely expressive yet very simple and easy to learn.
But I've long ago learned that language preference is mostly a matter of personal aesthetics, so all I can say is that I find Zig very appealing. Its design is certainly radical, and it doesn't feel like any other low-level language I've ever seen (it is about equidistant from C, C++, Rust, D, Nim, Ada; even when pushed I don't think I'd be able to say which of those Zig is most like, because it is so different). Like it or not, it offers a fresh vision on how low-level programming can be done.
By the way, Apple decided to just use "Safe C" for their iBoot firmware, but other than documentation references on Apple Developer, they are probably not going to share it with the world.
It's important to clarify that memory safety is only one aspect of writing safe, secure software.
To this list, then, I would also add and compare: OOM-safety under overload conditions, and fine-grained error-handling safety, in particular because error handling tends to be one of the leading causes of faults in distributed systems [1].
To be fair, I was surprised that Rust did not have checked arithmetic on by default, and that this needs to be turned on via a compiler setting or linted against. The presence of integer overflow in a program can facilitate a whole range of exploits, even with memory safety.
> memory safety is only one aspect of writing safe, secure software
A good example of this is the SQLite documentation page "Why Is SQLite Coded In C". Among other things, it describes those memory safety issues as "the easy problems" compared to "the rather more difficult problem of computing a correct answer to an SQL statement".
Not all of our programs are like SQLite, of course (and not all of us mere mortals are like its developers). But I would certainly say that just because you've eliminated memory safety bugs doesn't mean you've eliminated all bugs. Depending on the program, you might not even have eliminated most bugs.
> We established that simply querying a database may not be as safe as you expect. Using our innovative techniques of Query Hijacking and Query Oriented Programming, we proved that memory corruption issues in SQLite can now be reliably exploited. As our permissions hierarchies become more segmented than ever, it is clear that we must rethink the boundaries of trusted/untrusted SQL input. To demonstrate these concepts, we achieved remote code execution on a password stealer backend running PHP7 and gained persistency with higher privileges on iOS. We believe that these are just a couple of use cases in the endless landscape of SQLite.
IIRC, QOP/QH relies on the somewhat unfortunate way that tables are laid out and initialized in SQLite, so my impression was that QOP/QH is the highest-order problem that needs to be patched; after all, there are other types of vulns, beyond memory safety problems, that are reachable via QOP/QH.
I found this part about Rust at the end particularly interesting:
> All that said, it is possible that SQLite might one day be recoded in Rust. Recoding SQLite in Go is unlikely since Go hates assert(). But Rust is a possibility. Some preconditions that must occur before SQLite is recoded in Rust include:
> Rust needs to mature a little more, stop changing so fast, and move further toward being old and boring.
> Rust needs to demonstrate that it can be used to create general-purpose libraries that are callable from all other programming languages.
> Rust needs to demonstrate that it can produce object code that works on obscure embedded devices, including devices that lack an operating system.
> Rust needs to pick up the necessary tooling that enables one to do 100% branch coverage testing of the compiled binaries.
> Rust needs a mechanism to recover gracefully from OOM errors.
> Rust needs to demonstrate that it can do the kinds of work that C does in SQLite without a significant speed penalty.
Sure: almost anything that uses numbers, e.g. to control access, parse data structures, or do "something important". Here are two off the top of my head:
* Rate limits for an OTP, which could be trivially reset through overflow.
* Parsing zip files, where the hostile content is self-referencing, to sneak in cyclic references for a DoS, or to change file extension type for code execution after bypassing a content filter, or to change output destination (e.g. as part of a symbolic link or directory traversal) to overwrite system files.
Unchecked arithmetic is, for me, by far one of the scariest exploit vectors, because it's so easy to get wrong and one of the first things a trained attacker would look for.
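To make the overflow point concrete, here is a contrived Rust sketch (the u8 counter and the wrapping call are purely illustrative stand-ins for unchecked arithmetic): wrap the counter past its maximum and the limiter thinks almost no attempts were made:

    fn main() {
        // Hypothetical OTP guess counter; u8 keeps the numbers small.
        let mut attempts: u8 = 250;
        for _ in 0..10 {
            // wrapping_add stands in for unchecked arithmetic that silently wraps
            attempts = attempts.wrapping_add(1);
        }
        assert_eq!(attempts, 4); // 250 + 10 wrapped around 256 down to 4
        if attempts < 5 {
            println!("OTP attempt allowed"); // rate limit effectively reset
        }
    }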
Programs in any language are always exploitable, in some way or another. Memory safety is no guarantee that a program is "safe" let alone correct.
> Rust did not have checked arithmetic on by default
Except it does, in debug mode, which is the default compilation mode. And you don't have to tweak compiler settings or linters to get checked arithmetic in release: you simply call the checked arithmetic functions, which have the added benefit of giving you complete freedom in how to handle overflow.
Wherever you 'learned' this misinformation from, please stop considering it a trustworthy source :(
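For anyone unfamiliar, a quick sketch of what that looks like in practice; plain `+` panics on overflow in a debug build, and the explicit methods spell out the intended overflow behaviour in any build mode:

    fn main() {
        let x: u8 = 250;
        // `x + 10` panics in debug builds ("attempt to add with overflow")
        // and wraps in release builds (unless overflow-checks is enabled).
        assert_eq!(x.checked_add(10), None);          // overflow reported as None
        assert_eq!(x.wrapping_add(10), 4);            // wrapping requested explicitly
        assert_eq!(x.saturating_add(10), u8::MAX);    // clamp at the maximum
        assert_eq!(x.overflowing_add(10), (4, true)); // value plus an overflow flag
    }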
The league of being safe, without use-after-free, letting the developer focus on productivity, while providing the language features to do C-style programming when required.
It is optional in the sense that D's GC is also optional. Technically true, but you have to go out of your way to make it work for you; average off-the-shelf libraries cannot be easily utilized.
I don't have much experience with Zig, but one thing that stuck out to me was that I was able to build for RISC-V with a one-liner. I didn't have to change or do anything at all to make this happen. That's so cool.
In contrast, I have yet to be able to build any RISC-V binaries with Rust. It just doesn't work. Sure, I can see some potential approaches, like writing a custom JSON target spec to describe the environment and maybe building with a cross-compilation toolchain. But after a certain amount of time and no answers, it was not worth my time anymore.
The recommended way to deal with use-after-free and double-free in the language is to catch them in tests. You can pretty trivially get "asan"-like behaviours out of the Zig std library. A good demonstration is here: https://www.youtube.com/watch?v=4nVhByP-npU&t=3h12m
I kind of like this philosophy, because in a sly way it's a carrot to get you to write tests. Come for the memory safety, stay for the robustness.
As a bonus, the first two hours of the video are a fantastic and honest discussion about the role of emotional empathy in tech communities and tech employment (while also acknowledging that it is possible to be an asshole and deliver good tech).
True, UAF still kind of sucks in zig. My prediction is that we are going to eventually get some sort of formal verification engine for zig as a third party tool.
To be clear - you get this same UAF protection in ReleaseSafe mode by default with GeneralPurposeAllocator and PageAllocator. It's not just a test thing. The testing system just chooses some nice defaults for you.
Thanks for the clarification! Also, you can arbitrarily compose a GPA backed by allocators other than PageAllocator, in which case you might lose the early-exit segfault behaviour.
This is as ridiculous as it is in C. You can't be sure you're testing all the use-after-free cases, and if the code changes and you forget to update the test, then you have no idea that you could have introduced a use-after-free. Tests rot over time.
The more I read about Zig I feel like it's made for C programmers stuck in the Stockholm Syndrome of C and don't want out. (Speaking as a former C programmer.)
It’s interesting to note and probably underappreciated how many of these safety issues are addressed simply by having a garbage collector — eliminating manual memory management prevents all of these kinds of bugs except for data races. Of course there are situations where you cannot afford a GC, but for how many programs is avoiding a GC worth all the additional language complexity? Preventing data races is no small thing, but this observation certainly suggests that the approach of GC + tasks & channels + a good race detector is more powerful than it is commonly given credit for — think about how much user-facing language complexity it replaces. This sounds like a pitch for Go, and to some extent it is, but Julia takes very much the same approach for the same reasons.
One other underappreciated point is how to deal with other resources besides memory.
Go has a GC, which helps with memory, but you're entirely on your own for files, sockets, database connections, mutexes, channels, etc.
The features that Rust uses for memory management are fully general-purpose, and also help you with safely handling all other kinds of resources.
Consider how many ways you can misbehave with a Mutex in Go that are all caught by rustc at compile-time. This has nothing to do with memory management, but the same thing that prevents use-after-free is what also prevents using a mutex-protected value after releasing the lock.
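A small sketch of what I mean; the protected value is only reachable through the lock guard, so "use after unlock" is a compile error rather than a latent bug:

    use std::sync::Mutex;

    fn main() {
        let data = Mutex::new(vec![1, 2, 3]);

        let mut guard = data.lock().unwrap(); // acquire the lock
        guard.push(4);                        // the Vec is only reachable via the guard
        drop(guard);                          // dropping the guard releases the lock

        // guard.push(5); // error[E0382]: borrow of moved value: `guard`
        // No handle remains that could touch the data without relocking.
    }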
The power of software is in composing things via powerful abstractions. When you write a useful library in a PL with GC, you automatically make it unavailable to all applications that can't use GC. What's worse, other libraries that could have built something around yours have to choose whether to limit their applicability to applications with GC or find another way.
But if you have a clever language that navigates this tradeoff and lets you build powerful zero-cost abstractions (C++, Rust, Zig), then it attracts significantly more talent that compounds.
That’s a fair point. Something we’re very interested in is exposing libraries written in Julia without the full runtime, ditching the JIT or the GC if you don’t need them. It would be easy to write a libm replacement in Julia, for example, without using either of those.
Regarding talent, it depends on the kind of talent you’re talking about. Yes, systems programmers like Rust. On the other hand, needing to deal with a strict borrow checker excludes a very large number of people with numerical computing, data science and machine learning expertise (not all, but definitely most). So it cuts both ways.
I think this is a roughly fair assessment, but I also think it's important to contextualize memory safety. Ultimately, the goal here is to produce /correct/ software. Memory safety is a subset of this, but there are other aspects to correctness as well.
I really like Zig's approach of explicitness and fast iteration cycles. Fast compile times and the very flexible build system make me hopeful for a really slick workflow for embedded development, where Zig code can be used to deploy and test as well. For my own use I think it's a clear win.
On the other hand, the amount of damage poorly architected Zig code can cause is about as large as for poor C code. For typical enterprise code, the Rust compiler will make sure that many bad decisions will not even compile. There's still a risk of towering abstractions, but at least I could avoid spending as much time debugging hideous race conditions.
Just somewhat related to the article, but I think one thing that is often misunderstood about Rust is its borrow checker.
The borrow checker is not about memory safety but about aliasing guarantees.
It just happens that combining this with deterministic destructors (RAII) enables reliable "automatic manual memory management" (or whatever you want to call it).
And combining it with some clever auto traits (Send/Sync) happens to prevent data races (as always, provided no unsafe is used).
But the benefits are not limited to just that: not only memory-resource management but also other kinds of resource management benefit from this design.
Similarly, while Send/Sync is about preventing multi-threaded data races, there are single-threaded patterns with quite similar problems, e.g. "racing" between iterating a collection and changing it in the body of the iteration, and the aliasing guarantees make sure you don't have such problems either.
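A minimal example of that single-threaded case; mutating the collection during iteration is rejected at compile time:

    fn main() {
        let mut v = vec![1, 2, 3];
        for x in &v {       // immutable borrow of `v` for the whole loop
            println!("{}", x);
            // v.push(*x);  // error[E0502]: cannot borrow `v` as mutable
            //              // because it is also borrowed as immutable
        }
        v.push(4);          // fine once the iteration is over
    }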
Similarly, Rust's main pointer type (`&`) provides not only compile-time non-null guarantees but also compile-time guarantees about how the data can be accessed (dereferenceable, writable, etc.).
And then there is the choice to use the type system to prevent application-logic bugs in many ways.
So the bullet points in the table miss many dimensions.
But then, Zig is still a great language; it's just that trying to convince people it's good enough by pointing at not reusing allocations doesn't seem to be the best way.
Instead, look at the reasons people still use C today (not C++!) and what they conceptually like about it, and you might realize that many of those points still apply to Zig.
Honestly, Zig seems to be a great choice for WebAssembly or similar sandboxed systems, where the potential damage of use-after-frees or double-frees can be massively reduced.
I think it is very important that people have started to actually discuss and compare different approaches to safety, instead of just saying that since Rust is safer, we should throw out all C/C++ code and rewrite everything in Rust.
It is amusing to me because the last time I was using C, the problems Rust solves weren't the problems I had in C.
Deeply embedded code doesn't use malloc, doesn't use threading.
I could use a better type system, à la Ada, being able to say "this variable is of type distance-in-meters, this variable is of type time-in-milliseconds"; that would have cut the number of bugs by a huge amount.
But simple, unsexy type system changes like that aren't what language designers are focused on.
Who here has never confused milliseconds and seconds when passing a variable around? It's trivial for a compiler to catch with a half-decent type system, but few modern languages bother to try.
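For what it's worth, this is expressible today with newtypes; a Rust sketch (the wrapper names are made up for illustration):

    // Zero-cost wrappers; Millis/Secs are hypothetical names.
    #[derive(Clone, Copy, Debug, PartialEq)]
    struct Millis(u64);
    #[derive(Clone, Copy, Debug, PartialEq)]
    struct Secs(u64);

    impl From<Secs> for Millis {
        fn from(s: Secs) -> Millis {
            Millis(s.0 * 1000) // the unit conversion lives in exactly one place
        }
    }

    fn sleep_for(_d: Millis) { /* ... */ }

    fn main() {
        sleep_for(Millis(500));
        sleep_for(Secs(2).into());
        // sleep_for(Secs(2)); // compile error: expected `Millis`, found `Secs`
    }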
Even when writing modern code in newer languages, I rarely directly use threads, and if I need to pass data between them 95% of the time I can get away just doing a deep copy to avoid the hassles of sharing data between threads!
Obviously Rust is meant to solve different problems than the ones I face. I have friends who frequently write highly threaded code, but in my day-to-day, Rust doesn't offer much more safety.
(However, Zig does look super cool and interesting!)
Have you looked into the subset of D called "Better C"? I recently stumbled upon it and have been wanting to try it out. It seems to solve the exact same problem you're describing, though I don't know how good of a job it does at that.
Oh we had this long before Rust, and most of C++ usage in new applications was displaced by safer (among other things) languages.
I think the biggest thing was that university curriculums and mainstream app development platforms (like Microsoft) stopped pushing it as hard when the level of horror got past a certain point. It used to be pretty bad. Business apps being written using MS "Active Template Library" in C++ and then used as signed ActiveX plugins on IE6-only web pages etc.
Safety (memory and otherwise) isn't new, but during my CS curriculum, including a course on programming language theory, there was little/no mention of techniques to ensure safety in the space between C++ and Java. I probably would have pointed toward formal verification if someone said they needed safety guarantees in the absence of garbage collection and a potentially slow or bloated runtime.
Though I believe there were some languages with features to that end, at least research languages, they weren't that well represented. I think Rust's presence brought attention to the possibilities there, and an increasing number of people see the value of investigating and developing that niche.
Microsoft still is the main company pushing it hard (C++ use) despite all security reports, most likely due to how the Windows and Office teams don't accept anything else.
So basically you have the DevTools and Azure teams pushing for .NET, Java and other safer languages, while Azure Sphere has a C only SDK and WinUI/UWP push C++ above anything else, with some C++ only APIs.
> Temporal memory safety and data race safety. [...] Unique to rust. [...] add a significant amount of complexity to the language.
Rust seems to be a very complex language. Is all that complexity essential to providing memory safety without GC? Or would it be possible to have a significantly simpler language that is equally safe? A language that’s “safe & C-like” compared to Rust’s “safe & C++-like”?
You can write very simple Rust, it's worth trying it out if you haven't already.
I'm not a rust developer, but I've ported over a handful of python or golang projects to see how it works. I managed to write the code without understanding much about things like the borrow checker.
I'm certain my code is not as performant or elegant as it could be using some of the more complex tools and concepts in the language, but it is possible.
Technically true, but only really true if you don't use many dependencies. At some point you're going to use some dependency that uses async/await all over the place or really goes wild with generics and then it is definitely not simple.
Two examples:
* Heim (https://docs.rs/heim/0.0.11/heim/) is a great crate for getting system info, but it only uses async/await so you are thrown into that rather painful world even if you don't need it.
* Plotters (https://github.com/38/plotters) is a pretty great graph plotting library for Rust (the only one as far as I know), but they have definitely gone a bit overboard with the generics. Want to draw a scatter graph?
I tried simply calling `PointSeries::new()` and got a basically impossible-to-follow error about Rust not being able to infer the type `E`.
"You can write <adjective> <programming language>" is an answer that usually get rebutted by pointing out that a) you need to read a lot more code than you write and b) other people might not write <adjective> code or c) other people might have a different definition of <adjective>.
It’s also true that teams using languages with more features than they need can just take the parts that they do need. It’s not quite as ideal as having a language that’s perfectly suited to your use cases, but it works well enough.
For example, I’ve been writing JavaScript for 3+ years. I have yet to use the prototype chain directly, only through the use of the `class` keyword, and I’ve only reviewed code using it once. I hear C++ is similar in that teams use a slice of the available language features.
It’s all a matter of opinion but I actually find Rust a wonderful language to write in, given the right circumstances. Which usually means “without having to deal with lifetimes”.
I tried Go; I wanted generics and errors. I like C#, but not the ecosystem that comes along with it. And so on. So for me personally, Rust is a valid choice even when performance isn’t a first concern.
Reusability is an important feature: if you write a Rust library you can reuse it from most runtime-based languages with “native modules” (or whatever they are called in the language in question), exactly like a C library.
Because even slow Rust code might have better performance or memory usage characteristics than idiomatic code in these languages. Or because it still protects you from race conditions. Or because you like cargo more than, say, maven. Or because you want to learn the language.
I haven't read that article yet, but I wouldn't really call the simple Rust version idiomatic. Stdin::lines() incurs a string allocation for each line (fixing this takes three lines of source code), which can be quite significant. And garbage-collected languages will have faster allocation, so I'm not too surprised.
Of course, I know who the author of that code is. I just wanted to point out that it's not such a trivial comparison to make.
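For reference, the buffer-reuse fix mentioned above looks roughly like this: read into one String instead of allocating a fresh one per line:

    use std::io::{self, BufRead};

    fn main() -> io::Result<()> {
        let stdin = io::stdin();
        let mut handle = stdin.lock();
        let mut line = String::new();
        // read_line appends into the same buffer, so one allocation
        // simply grows to fit the longest line seen.
        while handle.read_line(&mut line)? != 0 {
            // ... process &line here ...
            line.clear(); // reuse the buffer for the next line
        }
        Ok(())
    }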
>Rust seems to be a very complex language. Is all that complexity essential to providing memory safety without GC?
The language features specific to memory safety i.e. the borrow checker, are essentially irreducible. It is also Rust's biggest piece of complexity, and the one that is hardest to learn. There is no simpler language inside Rust that has the same safety guarantees, unless you strip out other useful features (traits, async, etc.).
> There is no simpler language inside Rust that has the same safety guarantees
I’d argue there is: there’s reference counting. Rather than using references and fussing with lifetimes you could sprinkle Rc<> wherever it’s necessary. You’d take a performance hit but the code would be simpler to write.
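A tiny sketch of that style; shared ownership through Rc means no lifetime annotations at all, at the cost of a reference-count bump per clone:

    use std::rc::Rc;

    struct Config {
        verbose: bool,
    }

    fn main() {
        let cfg = Rc::new(Config { verbose: true });
        let a = Rc::clone(&cfg); // cheap pointer copy, no lifetimes to annotate
        let b = Rc::clone(&cfg);
        // The Config is freed when the last of cfg/a/b is dropped.
        println!("{} {}", a.verbose, b.verbose);
    }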
>So there could be a simpler language with Rust’s safety guarantees if you were willing to strip out traits, async etc?
Well yes, there exists a hypothetical C + borrow-checker language. But that language wouldn't really be significantly simpler, because the borrow-checker is the largest contributor to Rust's complexity. The only things you would have taken away are the more well-understood features, as they already occur in other languages.
The C/C++ comparison doesn't really work, because there is (to my knowledge) no single C++ feature which makes up the majority of its complexity over C. You could strip out independent features of C++ one at a time to return to a simpler language. Rust doesn't have the same property.
I don't have enough Ada experience to compare, but Rust is IMHO much easier and less complex than C++. Rust is renowned to have a steep initial learning curve, but that curve doesn't climb nearly as high as C++'s.
Interesting comparison. Long term we badly need something to replace C (or at least minimize its usage drastically), so perfect should not be the enemy of good.
I hope something like Zig gets widespread adoption, including in embedded/IoT/automotive environments. Especially automotive. We're moving more and more life-and-death scenario-type tools into software.
We are, and I hope for life-and-death situations people are willing to work a little bit harder to get the extra protections Rust provides ... or much harder, and formally verify their code in which the language you use no longer matters as much.
Considering that Rust exists, I really don't hope that Zig gets much adoption, at least until the language improves a lot in some key aspects.
There definitely is a design space for a language simpler than Rust that is easier to write, but Zig is too far toward the C side and has lots of unsafety that is trivial to introduce. It's an improvement over C, but IMO not enough.
I have tried to like the language, but sadly having to think about types and lifetimes robs precious energy that should be devoted to thinking about business rules and what I am actually trying to achieve.
In some niches Rust is perfect, but in every language thread on HN there's often someone who suggests using Rust whatever the use case. C, in that respect, is more flexible and gets out of the way much more, of course while being less safe, but it's easier to keep your mind on the goal rather than on figuring out the best memory-safe approach for this piece of logic.
Which is why I'm very excited for Zig. I don't want another C++. Give me safer C, thanks.
> I have tried to like the language, but sadly having to think about types and lifetimes robs precious energy that should be devoted to thinking about business rules and what I am actually trying to achieve.
I've always wanted a "shut up about memory safety for a while, just don't free anything, I want to find out if my code produces the right answer" mode in rustc.
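You can get surprisingly close to that today by just leaking: Box::leak hands back a 'static reference, so a prototype can ignore lifetimes entirely. A sketch, not a recommendation for production code:

    struct Parser {
        input: &'static str, // 'static because the input is simply never freed
    }

    fn main() {
        // Leak the buffer; the OS reclaims the memory at process exit.
        let data: &'static str =
            Box::leak(String::from("prototype input").into_boxed_str());
        let p = Parser { input: data };
        println!("{}", p.input);
    }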
This might just be me being naive, but if you have a business-rules heavy project, why not just use a garbage collected language? I can't think of many use cases where you need a systems programming non-GC language but also have to write tons of custom business logic.
Because, for example, a business-rules heavy project would also benefit from type safety and from a compiler that checks that return values are not ignored, that variables are defined correctly and not shadowed, and that errors are all handled. I'm not sure if there are many GC'ed languages that would do all that? These kinds of safety guarantees tend to come from compiled languages not GC'ed languages.
Beyond the correctness argument, also because the GC can really come back to bite you when you least expect, following the sudden "knee" of Little's Law. I've seen multi-minute pauses every few seconds even with V8's GC in production and it was not a pleasant experience. It cropped up, out of the blue, and in the end required a V8 core team member to advise and help comment out a few lines of C++ GC code that were overzealous.
In your first paragraph, you seem to be confusing interpreted languages and GC'd languages. Even Java has all of the features you've listed above, afaict.
> Because, for example, a business-rules heavy project would also benefit from type safety and from a compiler that checks that return values are not ignored, that variables are defined correctly and not shadowed, and that errors are all handled. I'm not sure if there are many GC'ed languages that would do all that? These kinds of safety guarantees tend to come from compiled languages not GC'ed languages.
GC is orthogonal to (as in: has nothing to do with) type safety.
Java is GCed; so are Scala, Kotlin, C#, and F#.
Even dynamic GCed languages towards the scripting side of things are moving to static typing: TypeScript, Python's mypy, Ruby's types (I forgot the name of the project).
Of course, but in the past these would tend to go hand in hand, and the context here is not only about type safety, but about checked behavior for ignored return values and for exhaustive switches, i.e. all syscall errors are handled, and the compiler (or interpreter) will crash at compile time or run time with an error if not. Do Scala, Kotlin, C#, F# have all those features? Do most Java versions also not allow integer overflow?
I'm responding really late because I didn't check my threads for a while, but in response to your first point, there's plenty of GC'd languages that can give you type safety and compiler checks. As others have pointed out, Java, TypeScript and Go are fairly solid compiled languages with GC. If you want more type safety, Scala, F#, Ocaml and Haskell all have extremely powerful type systems that I personally find very useful when working on complicated business logic (my personal favorite is Ocaml). In the more exotic space, Nim has a really cool cross between ARC and GC, and comes with dependent types for extra safety! GC is not perfect, and as you point out, a "stop the world" GC can cause more pain than is saved from not having to do manual memory management, but I think there's a lot of good work in the space that makes me think really hard about picking up a language that would require manual memory management.
Indeed I do, these days I spend most of my time on Elixir. But it's handy to have a lower level language that can compile statically for some complex sysadmin task. Go is fine, but a little too plain for my tastes.
Of course, but you don't need a lot of work to make the C compiler happy. The result might not be 100% safe, secure and mathematically proven, but sometimes you need to deliver, fast, not create the safest 1k lines of code on Earth.
To be honest I haven't used C in a long time, but I've been looking for a low level language that sparks as much joy as C does. Go, Rust ain't it, IMO.
In C you have to think about lifetimes (and, to a lesser extent, types) at specific times, typically after the "make it work" step of writing code. Rust, on the other hand, forces you to think about these things from the start, and this inhibits solving the actual problem first.
Now, there's an argument that front-loading these decisions may be beneficial overall, but I don't find that compelling either. After the "make it work" stage there's thinking about performance, comments, logging, etc., and you usually need to think about lifetimes again as they change at this point, so front-loading them has only added work overall.
This description doesn't match my experience of writing Rust at all.
During the exploratory phase of my projects, I very rarely run into nontrivial lifetime issues, and when I do, I can just put whatever data into an Arc and then it just works. Most of my exploratory code is just objects that own their data.
Later in a project, while doing a bunch of refactoring to handle performance, logging, etc. , I have a much easier time letting the compiler tell me when I've made mistakes with a value's lifetime rather than trying to keep the whole program in my head for the duration of the refactor.
I don't spend a lot of time thinking about lifetime issues during either phase of my projects. Mostly I just write the same sort of thing that I'd have written in other languages, and the compiler tells me when I've made a mistake.
I totally agree that Rust isn't the right language for every domain.
But especially in all those domains where memory safety is an issue , in my view it is currently the best option.
Rust forces you to think about memory safety and ownership, which is hard to adjust to for many. But it does so for a good reason.
In C you also have to think about lifetimes all the time, but the compiler lets you do whatever you want, and the issues instead have to get fixed when bugs pop up, or with static analysis tooling, etc.
"I don't want to think about lifetimes" is exactly how we end up with vulnerable and buggy software.
After the initial learning curve, Rust is a very productive language, exactly thanks to the powerful type system.
Like I said, I do wish for a simpler language that can provide similar guarantees, and I do think the design space is in reach, but Zig is (currently) not it.
If we are talking about domains where memory safety is an issue, then surely the category of memory safety must include OOM-safety, i.e. safe handling of out-of-memory conditions under overload?
Which, relevantly, is something where Zig excels. OOM is just another ordinary error to be handled, and indeed is an error that can be leveraged on a much more fine-grained basis than an OS-level allocation. Being able to gracefully handle OOM conditions is a dramatic improvement over the average C codebase; enforcing that graceful handling, as Zig does by requiring callers to handle errors as part of the return type and by including allocation failures in that category, is a godsend.
And taking this further, since Zig's convention around allocators is for them to be an explicit argument of all functions needing to allocate, it's trivial to write tests specifically to validate correct behavior in OOM conditions. There's even a custom allocator in the standard library for exactly this purpose.
Yes, and I think the matrix should really include these aspects of Zig's safety to be a fair comparison, because otherwise it's like evaluating Rust's safety but without mentioning the borrow checker.
Lots of things that must be memory-safe are run on top of the Linux kernel, which doesn't give you the OOM-safety you're looking for because of overcommit+OOM killer.
Anyway, for Windows and other platforms where this is a reasonable goal, there is work in progress to add this to Rust. See this RFC[1], which has been merged and whose implementation progress can be followed here [2].
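Parts of this are already usable on current stable Rust; for example, collections expose try_reserve, which surfaces allocation failure as a Result instead of an abort. A minimal sketch:

    use std::collections::TryReserveError;

    fn append_chunk(buf: &mut Vec<u8>, chunk: &[u8]) -> Result<(), TryReserveError> {
        // Report allocation failure to the caller instead of aborting,
        // so the program can shed load or degrade gracefully.
        buf.try_reserve(chunk.len())?;
        buf.extend_from_slice(chunk);
        Ok(())
    }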
Lots of things are done that way but there is also plenty of software for which OOM safety is a critical component of memory safety. As you say, not every platform is Linux, and if Rust will be moving towards OOM safety as a global default and making this explicit throughout the std lib, then I think we are both in agreement.
I personally like panics and assertions, but the "safety" of the approach would still have to depend on the characteristics of the system. I do not agree that panic-on-OOM should be considered "safe" at a global level hidden within a std lib, where there is no knowledge of the target domain.
For example, if an attacker could arbitrarily inject overload to restart rate-limiting processes and then abuse this to trivially brute-force OTP logins.
The definition of safety with respect to a resource in general always needs to include the safety of the system as it crosses the threshold i.e. in this case into out-of-memory, so if a system claims memory safety, the first thing I would want to ask is, what about OOMs?
Is it? I'd say "crashing cleanly" is oxymoronic; "crashing" strongly implies a failure of the process to clean up after itself. Any sort of crash can wreak havoc on whatever non-atomic operations were in flight.
> but sadly having to think about types and lifetimes robs precious energy that should be devoted to thinking about business rules and what I am actually trying to achieve.
That's a really strange argument, because as soon as you're not in a GCed language, you need to think about the lifetime of your objects. The big difference with Rust is that you can't make mistakes when doing so, because the compiler will catch it.
You don't have the mental burden that if you make a mistake everything will blow up and you can focus on your business rules instead.
Which is why Zig shipping with a functional C compiler with cross platform support is a brilliant idea. Just run `zig cc` and you don't even need `clang` around.
Kotlin has settled its future by marrying Android.
As Java evolves and Kotlin needs to cater to its Mountain View masters, upgrading your Java code file by file won't be enough, as many modern features don't exist on ART.
Kotlin sealed classes are planned to be rendered as JVM sealed classes when JVM sealed classes go out of preview (probably Java 17), and the same is planned for mapping Kotlin value classes as Project Valhalla user-defined primitive types.
All these features obviously won't be available if you target Android, but they will still be there for you if you don't.
And isn't it ironic that devs will be forced to use KMM between the JVM and ART, despite Kotlin's selling point of Java compatibility?
Faking JVM features on other platforms means that performance is not what you would expect when moving across them, and some surprises might happen when linking to libraries that use modern features.
> JetBrains went out of their way to make migrating to Kotlin as easy as possible: you can literally upgrade your Java project file by file.
Yes, offering a good path to switching (and conversely keeping the old voodoo part of the code no one wants to touch) is the way to go. And as I understand it, Zig offers this possibility as well.
The word "none" in this blog post is misleading, for the rows use after free, double free, and uninitialized memory. Criticism is of course welcome but let's make sure we get all the facts on the table so we're not arguing a straw man.
With regards to use after free and double free, this is solved in practice for heap allocations. The basic building block of heap allocation, page_allocator, uses a global, atomic, monotonically increasing address hint to mmap, ensuring that virtual address pages are not reused until the entire virtual address space has been exhausted. In practice, this is a very long time for 64-bit applications. The standard library GeneralPurposeAllocator in safe build modes follows a similar strategy for large allocations and for small allocations, does not re-use slots. Similarly, an ArenaAllocator backed by page_allocator does not re-use any virtual addresses.
This covers all the use cases of heap allocation, and so I think it’s worth making the safety table a bit more detailed to take this scenario into account. Additionally, as far as stack allocations go, there is a plan to do escape analysis and add this (optional) safety for stack allocations as well.
As far as uninitialized memory goes, Zig forces you to initialize all variable declarations, so deliberately uninitialized memory has the keyword undefined in its declaration. And in safe build modes, this writes 0xaa bytes to the memory. This is not enough to be considered "safe", but I think it's enough that the word "none" is not quite correct.
As for data races, this is an area where Rust completely crushes Zig in terms of safety, hands down.
I do want to note that the safety story in Zig is under active development (https://github.com/ziglang/zig/projects/3) and will be worth checking back in on in a year or two to see what has changed :)
Integer overflow is defined behaviour, however, so we can provide library types which are checked. And since D's templates are very clean to define, they are easy to use.
The null pointer thing is true; however, all the other mechanisms mentioned do help avoid bumping into them, and there will be more on the way.
If you want to provide the null checks yourself, D has Ada-style contract programming too.
WebAssembly has several flaws; one of them is the lack of bounds checking for memory accesses within the same linear segment. So while it is sandboxed, it cannot prevent the data from turning into garbage, and is thus open to attacks that change the outcome of public APIs based on internal memory state.
Yes, but when you put it like that, it feels like D makes use of a certain hardware protection unlike other languages. The article states "none" for those cases.
I think if Zig wanted to, it could introduce lightweight linear types using the concept of proof variables and interleaving from ATS. Since resource management is explicit in Zig anyways, there's not much additional overhead in "consuming" proof values to signal that you've dealt with a resource.
Not really: Rust's pointers are the `&`/`&mut` references, which are proven at compile time to be not just non-null but actually dereferenceable and, in the case of `&mut`, writable in the given context. Those are MUCH stronger guarantees than just "not null".
> 2. only when using tagged unions
Which in Rust are the default: untagged unions didn't exist in the language for quite a while, and they require unsafe, which heavily discourages their use.
Besides that, Rust enums are not tagged unions where you check a tag at runtime and then access the payload, or where you panic/throw an exception when you access the wrong type; they are incorporated into the type system and the language, giving quite a different experience from classical tagged unions.
Lastly, type confusion applies to more than just tagged-union-style access; it also covers subtype-style access, in which case Rust can use trait objects instead of sum types.
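To illustrate the enum point with a minimal sketch: the payload is only reachable through a match on the variant, so "reading the wrong variant" isn't even expressible:

    enum Value {
        Int(i64),
        Str(String),
    }

    fn describe(v: &Value) -> String {
        // The only way to reach the payload is through the matching arm,
        // and the match must be exhaustive.
        match v {
            Value::Int(n) => format!("int: {}", n),
            Value::Str(s) => format!("str: {}", s),
        }
    }

    fn main() {
        println!("{}", describe(&Value::Int(42)));
        println!("{}", describe(&Value::Str("hi".into())));
    }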
Besides that, many of the listed ways Zig can achieve more safety (whether applicable or not) are also applicable to C. And some of the checks Zig does can be "somewhat" achieved in C too, by combining non-standard compiler options and code analysis tools.
Don't get me wrong: Zig is a very interesting language, and I would argue the spiritual successor of C in how it's designed.
Still, I guess the main way to add more safety (and the like) to languages like Zig (or C) is to compile them to WebAssembly. The module isolation it provides, while still allowing calls to other functions without too much overhead, might lead to quite interesting trends in the future.
Safety, for me, is the confidence to use the thing: for me in my own code, but also for others on my team who may work on this code.
I mostly have experience building things in GC languages. But with Rust I managed to safely use [1]:
- stack references in threads
- kept mmap references alive until threads finish work
- zero copy xml parsing (from mmaped data!)
- SSE/AVX enabled searching
The Rust language empowered me to do these things with a high degree of confidence. Not one segfault or core dump, just lots of compiler errors.
I played with Zig. Admittedly, the small-ecosystem aspect is something all languages go through, and it would be a better experience with Zig-specific libraries. But Zig doesn't empower library authors to make a large category of bugs impossible; it leaves that to documentation. This is like C: I don't have enough confidence in myself to use it.
Brilliant people are building powerful, safe-ish, reusable libraries in Rust. For mere mortals like me, this is Awesome.
Namely, there is no guarantee that the bytes between `<page>` and `</page>` will be valid UTF-8. It may be the case that you only run this program with UTF-8 input, in which case, UB is never triggered. But it's worth pointing out here since there is nothing actually stopping your program from hitting UB.
Also, as long as you're bringing in the twoway crate, you might as well use it on lines 43 and 48 since you're just searching for a single needle.
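For readers following along: the checked conversion is the safe alternative here, turning invalid input into an error value instead of UB. A sketch, assuming the original program reached for the unchecked variant:

    // Hypothetical helper: interpret the bytes between the tags as &str.
    fn page_text(bytes: &[u8]) -> Option<&str> {
        // from_utf8 validates the bytes; invalid UTF-8 becomes None here
        // rather than undefined behaviour.
        std::str::from_utf8(bytes).ok()
    }

    fn main() {
        assert_eq!(page_text(b"hello"), Some("hello"));
        assert_eq!(page_text(&[0xff, 0xfe]), None); // invalid UTF-8 rejected
    }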
Ah, gotcha. Yeah, I haven't added reverse searching to aho-corasick yet. Ran out of steam.
Either way, my point here is to be a counter-balance. To be fair, you did say, "But with Rust I managed to safely use." But the code you posted is technically unsound. It's not a huge deal if you know you'll always be feeding the program valid UTF-8. But it is worth mentioning here in this HN thread that is specifically comparing the safety properties of competing programming languages. :-)
A problem for Zig is the combination of unsigned overflow being UB, implicit widening, and always performing operations at the type size of the operands.
For example, let’s say you have `a = b + c` with b and c being i8 and a being i32. This calculation is first performed as an 8-bit add, then extended to 32 bits. This is true for both Rust and Zig, but Rust requires an explicit cast to widen the result of `b + c`, making it obvious that an extension happens and that `b + c` is not performed in 32 bits. In Zig there is no such indication; you need to look up the definitions of b and c. Other problems occur as well that both C and Rust avoid in their own ways. Hopefully Zig can improve this situation.
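Concretely, in Rust the widening has to be spelled out, and the spelling determines the width at which the add happens; a small sketch:

    fn main() {
        let b: i8 = 100;
        let c: i8 = 100;
        // let a: i32 = b + c;     // error: mismatched types (i8 vs i32)
        // let a = (b + c) as i32; // compiles, but the add is 8-bit:
        //                         // 100 + 100 overflows i8 (panics in debug)
        let a: i32 = i32::from(b) + i32::from(c); // widen first: a 32-bit add
        assert_eq!(a, 200);
    }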
I cannot live without RAII. Zig is just a no-go for me without automatic resource cleanup. Without RAII, a lot of people forget cleanup, and that destroys safety.
https://github.com/ziglang/zig/issues/782
Zig is apparently a PL for folks with perfect memory who never make mistakes like those described in that ticket. "Lesser" programmers like me can choose another language.
I don't know Zig, at all, but I do know Rust and C, and I know what a UAF bug looks like. What does a UAF bug look like in Zig? That a modern memory-safe language could be vulnerable to the C-language UAF pattern is a surprising claim.
Ok, but that video suggests that Zig's allocator wires the program to segfault if you access the freed memory. With a 64-bit address space, I guess you can do this perpetually?
That's what I figured: the 64-bit address space ensures that they're just never going to reuse address space. Which, in turn, means that C-style UAFs are unlikely to be an issue. I think this page should probably capture that.
> The standard library includes a set of allocators which don't reuse allocations, preventing use-after-free, and which catch double-free. I'm not clear yet on how high the runtime and memory overhead are though, which will dictate when it is practical to use these.
I didn't include it in the table because I'm not yet convinced that the overhead will be low enough that people will actually ship software using those allocators. (All the zig programs I've written so far use the libc allocator and are definitely susceptible to UAF)
Perhaps I'll spend some time measuring it this week and post an update.
Zig is a C-like language that isn't memory safe. A UAF bug looks the same as it does in C where you manually call the allocator to get a block of memory, manually free it, then try to perform an access via a pointer to it.
To be fair, Zig provides a spectrum of memory safety, as the comparison table in the post makes clear. Sure, it isn't 100% memory safe, especially not around UAF, but it's still orders of magnitude safer than C. In the safety department it's not at all a "C-like language"; it's a massive leap forward.
Type confusion and memory lifecycle flaws are probably the dominant source of exploitable vulnerabilities at this point. I'm surprised to see it suggested that Zig is weak to them.
Hey Thomas, I would have thought you would have said that at this point JavaScript or Postel's law were probably the dominant source of exploitable vulnerabilities. You're right though, Zig is weak to them, but it's not all or nothing as with C. It's a spectrum, and having spent some time with the language, I think that for Zig's goals, it makes the right set of trade-offs.
But I think this page may be overstated? Again: I don't know anything about Zig, but I sure know how a UAF bug works. :) And it doesn't look like Zig is meaningfully susceptible to them? You can crash a Zig program with a UAF, but the actual vulnerability wants more than the crash: it wants the program making uncontrolled writes to live memory used elsewhere in the program, which is a condition I don't think is present in Zig as it's being described.
If that's the case, that bodes poorly for the claim that Zig is susceptible to C/C++-style double free vulnerabilities, too.
It would be genuinely weird to see a new language rolling out that had C/C++'s UAF problem.
(As was pointed out elsewhere: if you're using an external allocator, or the `c_allocator`, all bets are off. But so is unsafe code in Rust, I guess?)