Checked C (github.com/microsoft)
175 points by tosh 17 days ago | 133 comments



This is similar to an approach I proposed in 2012.[1]

Microsoft: "Rather than changing the run-time representation of a pointer in order to support bounds checking, in Checked C the programmer associates a bounds expression with each _Array_ptr<T>-typed variable and member to indicate where the bounds are stored. The compiler inserts a run-time check that ensures that dereferencing an _Array_ptr<T> is safe (the compiler may optimize away the runtime check if it can prove it always passes). Bounds expressions consist of non-modifying C expressions and can involve variables, parameters, and struct field members. For bounds on members, the bounds can refer only to other members declared in the same structure."

Me: No array descriptors are generated by the compiler. The programmer tells the compiler the size of the array as an expression, and that expression is evaluated at the point in the program where the relevant declaration appears.

The Microsoft syntax is very Microsoft. It involves many leading underscores and long keywords. What I proposed involved adding C++ "&" references, and making lengths explicit in the places in C where you can write "[]". This is more concise. The basic idea is the same, though. Arrays have a length, which is defined by some expression on nearby variables, such as other fields of the same struct or other parameters in the same function call. C programs that work usually have a length around somewhere, but the compiler doesn't know where it is. Make that explicit, and you can check.

If you add ranges and references to C syntax, and provide length information, you can deal with one of C's three big questions - "How big is it?" (The other two big questions are "who owns it?" and "what locks it?")

This area is 10% technology and 90% politics, so you need sizable organizational support to make much happen.

[1] http://www.animats.com/papers/languages/safearraysforc43.pdf


> The Microsoft syntax is very Microsoft. It involves many leading underscores and long keywords.

Not at all, Microsoft follows the ISO C guidelines to add new keywords to the language.

Which is why you have stuff like _Generic, _Atomic,....


Cool, thanks for working in this area of course. I had a quick look at your paper out of general interest (I'm probably a very conservative C programmer, I would rather move to Rust than modify C to make it safer).

I didn't understand this part:

    int arr100[100];   // an array of 100 ints
    arr100& arr100r;   // ref to array of 100 ints

> This is valid C++ today.
Is it really? That doesn't look valid to me, but I'm well and truly much more of a C programmer than a C++ one. I did try it on Godbolt, and got an error:

    <source>:2:1: error: 'arr100' does not name a type
        2 | arr100& arr100r;   // ref to array of 100 ints
          | ^~~~~~
    Compiler returned: 1
Apologies if this is already mentioned somewhere in errata or something, or if I'm simply being too dense.


Definitely isn't. Correct would be

    decltype(arr100)& arr100r = arr100;
or just

    auto& arr100r = arr100;


Right, that needed to be an initialization.


But also arr100 isn't the name of a type as mentioned by the error message.


Honest question with no bad intent:

Do you really think the "C community" as a whole will accept ANY syntax changes to the language?

From my experience I would say no. I think it would be "easier" to force tools on them than any syntax change (even if really really minor).


> Do you really think the "C community" as a whole will accept ANY syntax changes to the language?

No. That's why I didn't push on this.

I have some disagreements with the design of Rust, but it's a big step forward.


Why not just Ada/SPARK? I believe Rust's syntax is troublesome (heavy use of symbols being the major reason; in Rust it is completely normal to have six symbols next to each other). I often look at code written in Rust and cannot tell what it is supposed to do; it is way too hidden from me. It is almost akin to reading Perl. Reading someone else's code written in Ada/SPARK is not a problem, even without knowing much about the language.

Mind you, Ada/SPARK has been out there for quite some time now, and it has been a "big step forward" that people were unfortunately unaware of. If only Ada/SPARK had gotten the hype of Rust... there would not be much of a need for Rust. Ada/SPARK would have an ideal ecosystem, perhaps even another compiler, and so on.


What matters is WG14.


Tell that to Annex K... Getting something into the spec isn't enough.


Indeed, but getting into the spec is the first step to matter at all.

And you can get Annex K, on Windows.

On the other hand, exactly because of the way some C developers are against anything that helps security, is why Microsoft, Google, Oracle, ARM, Apple are all pursuing projects to have hardware memory tagging.

On Solaris SPARC, it already doesn't matter what those more religious developers think about bounds checking and similar issues.


Prior art from Microsoft which leverages similar idea, but using macros: https://docs.microsoft.com/en-us/cpp/code-quality/best-pract...

They also have SAL annotations for "what locks it?".


That's been a problem with verification systems. If the annotations are not well integrated into the language, they're viewed as a nuisance. Especially if the compiler neither enforces them nor uses them to help optimization.


The enforcement does not necessarily need to come from the compiler; it's not uncommon to introduce additional required processes/checks that new code needs to pass (e.g. "tests need to pass", "at least one reviewer must approve a change"). The compiler probably won't be able to leverage those annotations, since there's always a potential for mismatch between annotations and code (though the verification system should be able to catch it). I agree that integration into the language would be needed here, but that's not likely to happen with C/C++, so I'm open to using a solution that's not perfect. :-)

I don't expect it to get traction.

• C is valued for being super portable and compatible with every platform, including legacy ones, but this applies specifically to C89 or C90, and nothing else. As soon as you embrace new features, it's not your universal C any more. You're going to need newer compilers, new tooling, and that's almost like starting with a new language.

• C users have mostly self-selected to like C the way it is: a small, flexible, zero-overhead language that isn't C++. The addition of checked containers, generics, restrictions on pointer arithmetic, etc. makes Checked C not feel like C any more.

• Checked C still needs refactoring of the code. For large established codebases this is still a lot of work, and still a risk of introducing new bugs. It's very tempting to settle on just running one more analyzer, sanitizer or fuzzer instead.

And in the end with all this effort, back-compat break, whole-codebase refactoring you're still left with the old C, with headers and include paths, sloppy macros, ancient stdlib, painful dependency management, etc.

If you're going to rewrite C incrementally to a more complex dialect, and require a newer non-standard compiler for it, you may as well run c2rust on it first.


I don't know why people who want a smaller language don't cut out some of the more disastrous portions of C and replace them with more modern C++ equivalents, which really aren't all that much more complicated if you consider that you usually end up using complicated macros anyway. Something like MISRA C++. There are obviously others. Embedded compilers have gotten a lot better at C++ over time as well.


Because of politics.

Many of the people that keep using C as is, will never use something else no matter what.


Non-standard is one of the biggest drawbacks of Rust. If there were a standard and independent Rust equivalents of gcc, clang, icc and Visual C, then the decision to try out Rust would be easier.

One reference implementation is a lock-in, not only technologically, but also socially in terms of a community that is too monolithic for its own good.


I would say give it time, you can't measure Rust with this yardstick (yet), Rust is a baby in terms of age.

Let's not forget that C is ~50 years old and yet having reliable, open-source, high-quality compilers is a thing of the past 20 years or so. Clang was born in the 2000's, GCC is much older but it wasn't that great of a compiler for some time. Same with VS.


As far as I know, though, C has had multiple implementations for years: IBM has its own compiler, Sun had one, Borland and all sorts of other vendors had one. It seems to me that C comes from an era where languages were standardized and multiple vendors provided their own implementations: Common Lisp, Ada, Fortran, even a relatively obscure language like Eiffel all fit this criteria.


Yep, multiple implementations and glorious code workarounds for each one of them! Ahh, those were the days.


This might be an overly rosy view of history :) prior to 89 C wasn’t even standardized, and C++ was a mess until 98. And those are just the years that the standards came up. It took time for existing compilers to catch up.


I’m somewhat familiar with this: I write a lot of Common Lisp and try to stick to the portable features (standard + compatibility libraries). So, I occasionally find issues with particular implementations.


What an interesting take. Go, Python, Ruby, and countless other languages only have one implementation and real community. I don't think this is a real concern for most developers, as long as the language solves their problems and is well maintained, with a community that trusts the language maintainers.


> Go, Python, Ruby and countless other languages only have one implementation

That's not true for all those examples: both Ruby and Python have several great implementations, and Go started with a specification and two implementations.


This is splitting hairs a bit, but, those languages only really have one implementation that dictates where things are going as far as the language is concerned. C is special because of when it started and what it was trying to solve in its day. My point was just that the communities and languages are every bit as monolithic as Rust, so why was Rust singled out?


That's not true of those other languages. While the alternative implementations are always playing catch-up as it relates to language changes, the developers of those implementations do participate in the discussion around those changes.


I think we can go with the 90% rule here. I would bet that at least 90% of Python users are using cpython.


Just read that the remobjects folks' elements compiler platform can also compile go...go figure!

https://elementscompiler.com/elements/gold/


Python has at least 3 relatively popular implementations that I know of.


For what it's worth, this is being worked on[1].

1. https://github.com/Rust-GCC/gccrs


It hasn't been a problem for Go. Having multiple implementations didn't seem to help D either.

It might be a lock-in but fragmentation is bad too.


One of Rust's strengths is its solid structure. We saw what happened with C and C++; nothing is stopping other individuals from forming their own coalition and forking Rust. It's a young language, and the last thing we need is fragmentation.


Many people will not move from C to Rust because Rust is non-standard, doesn't have multiple implementations and multi-platform support at the level of C. The C language, despite its shortcomings, is at another level compared to other languages and it will take decades to make this comparison even possible.


About your first point: the code looks like it will be standard C when you strip the checked parts. So won't this offer a way to develop plain C, but with some guards?

And indeed, it's probably not suitable for refactoring, but if you want to start a driver from scratch?


It looks reasonable to me as well; I think it could be quite promising for embedded programming on small IoT devices. I prefer a cut-down C++, but that isn't always available.


The goal of these kinds of projects is to eventually be integrated into ISO C, if enough people at the committee get interested.


> [...] a small, flexible [...]

I do not see what would make C more flexible than any other language. The other points are correct and C's niche is indeed that it is small, ubiquitous, simple, and yet low level.


I believe it is Microsoft trying the same trick as with their TypeScript thing.

An unsuccessful attempt at embrace, extend, extinguish.

If they just needed type checking, it could've been done with either tests or annotations for checks at compile time, but instead they've developed a whole new language that gets more and more divergent from the original one with each release.


What is the trick that you're saying Microsoft is doing with TypeScript?

Also "TypeScript thing" is definitely downplaying where TS is as a language https://2020.stateofjs.com/en-US/technologies/javascript-fla...


Another way to do it:

https://www.digitalmars.com/articles/C-biggest-mistake.html

i.e. just change:

    void foo(char* a)
to:

    void foo(char[] a)
and that passes a phat pointer (pointer and length) instead of just a pointer, enabling array bounds checking.


This comment could mislead people: The linked article explicitly says that * and [] are identical when used in function arguments. It then goes on to suggest a possible syntactical change for the C language where the [] syntax would be a fat pointer.


OP wrote the linked article...


Yes, but the comment could mislead people, so I clarified it.


The article you reference actually proposes new syntax:

  void foo(char a[..])
for passing an array pointer and length, and to eventually deprecate:

  void foo(char a[])
Personally I like this suggestion, but as the article is 12 years old, I'm obviously in a minority.


> 12 years old

Which is amazing. It's a simple, backwards compatible solution to buffer overflows. I know it works because it's been used in D for 20 years and has made buffer overflows a thing of the past.

It's also nice because it implicitly documents the difference between a pointer to a single object and a pointer to an array of objects. Just this is a huge win, when used consistently.


Psst, he didn’t just reference the article, he wrote it.


Thanks for pointing that out! Still, his comment here doesn't quite match the article and certainly confused me. Is he now suggesting:

  void foo(char[] a)
instead of what he originally proposed, which is:

  void foo(char a[..])
If so, that does seem cleaner. I still like the overall suggestion.


The problem is backward compatibility with external linking, because this expects another word on the call stack. It also relies on the programmer to be smart about bounds, which they're already intentionally ignoring, so, I don't know. However, there are some attempts at a hardware solution to provide memory safety without modifying the stack: https://en.m.wikipedia.org/wiki/Intel_MPX. At array creation time, the pointer bounds are loaded into a register and the address is translated between a safe address and a real address. I don't actually know if this works in a transparent, backward-compatible way, as in, ignoring unregistered pointers and translating without being told to. I should also mention that AddressSanitizer does something similar in software, and is becoming ubiquitous. The performance/size penalties are significant, but it seems to me that the hardware translation should be fast if it is separately integrated with the existing cache address translation and memory bus.


> The problem is backward compatibility with external linking because this expects another word on the call stack.

It's completely compatible if you use the [..] syntax.

> performance/size penalties

For the proposed scheme, they are essentially non-existent. This is because the array bounds is usually already being passed as a separate argument.

For arrays that rely on a sentinel to determine the end (like 0-terminated strings), one can simply not use the new syntax.


In the future, new low-level projects will probably use languages that make it easier to write safe programs, like Rust, Zig, perhaps new languages -- each according to the programmers' tastes and preferences -- but what many people might not know is that there are tools (like https://trust-in-soft.com/) that can guarantee no memory errors in C code. They do require a significant effort to use, but it is significantly less effort than a rewrite in another language, and significantly less risky to boot (not to mention that there are quite a few platforms that use custom hardware and don't have compilers for new languages [1]). These tools are actually more battle-tested, and more heavily used in safety- and security-critical systems software than any new language. Given the amount of C (and C++) code in existence and heavy use, those tools and extensions like Checked C (assuming they can compile to C) are extremely important for the quality of software we use.

[1]: https://youtu.be/uqan23518Yc?t=610


> They do require a significant effort to use, but it is significantly less effort than a rewrite in another language

I do not believe this. Most C code by far is not written in a way that makes it easy to formally guarantee memory safety, so the lowest effort path will necessarily involve refactoring it until it does.

And that means that rewriting it altogether into a new language has very few drawbacks, and some additional benefits.


You can also say that for JavaScript, and see how successful TypeScript is.

Yes, you can rewrite everything in Elm, PureScript, or even something more similar to JS, but step-by-step conversion to TS enables a significant quality improvement for large codebases. (I personally think it doesn't fare so well on small codebases, though some people use it even for one-off scripts; but at least on large codebases I think we can agree.)


> You can also say that for JavaScript, and see how successful TypeScript is.

I'm not sure that's what you meant, but interoperability between C and Rust does seem to be a selling point of Rust (like TS and JS), and allows C codebases to be oxidized gradually rather than rewritten from scratch.


But you can't just add types to C and it becomes rust. It needs serious refactoring. You'd probably go lib by lib, whereas you can rename a JS file to TS and it starts checking types (even without, if you configure it that way).

By the way, I love rust, but I just don't understand the "rewrite all the things" hype. Being battle tested means a lot for low-level libs. You can't guarantee that you ported everything exactly, with bugs and all. Pragmatic solutions like "Checked C" prove a lot in that regard.


> you can't just add types to C and it becomes Rust

You actually can “just” turn C into Rust with something like c2rust. But you haven't gained any safety without refactoring.

Is that not also the case for Checked C?


You cannot be sure that you kept 100% of the semantics when you use c2rust. You need to focus on the semantics plus the types. With Checked C, you can just keep your focus on the types.


TypeScript doesn't have to solve memory safety problems because JavaScript effectively does not have them. Marking whether things are strings, integers or objects is a much easier problem than e.g. assigning correct lifetimes to C pointers.


> so the lowest effort path will necessarily involve refactoring it until it does. And that means that rewriting it altogether into a new language has very few drawbacks, and some additional benefits.

No, because while some refactoring is necessary, it is virtually always local. E.g. if the sound checker can't prove that an access is within an array's bounds, adding a dynamic test just before it (assuming you don't want to add annotations to help the checker) will make it provable.

Rewriting in a new language is certainly one way to reduce or eliminate memory errors, but if your goal is to reduce or eliminate memory errors in existing codebases, there are other ways that are cheaper, more proven, less risky and are applicable to more environments. These tools are already much more heavily used in safety- and security-critical codebases than any new language, so people are slowly getting good experience with them.


I’m not sure what C code bases you’ve worked with but refactoring in a large C code base is almost never local. Especially if it’s a code base that made extensive use of the gcc flags that disable making sure code does not alias.


Providing enough information for the prover does not require a big refactoring, and is significantly cheaper and less risky than a rewrite for an established codebase.


P has chronically bad taste when it comes to all things PL-related.


The problem is most developers and/or management don't care about security. They only care about being done as fast and/or as cheaply as possible.

I'm still not convinced the new languages (Rust & Zig) will change the overall situation.

Why? Simple example in Rust:

    "LOL! I need to get stuff done."
    
    unsafe {
        Really nasty hacks here to circumvent any obstacles.
    }
Example in Zig:

    "LOL! I need to get stuff done. Tests? What tests???"

    zig build-exe example.zig -O ReleaseFast
Trying to solve those security bugs with the help of the language is of course the low hanging fruit. But systems languages will always need an escape hatch for certain problems to be solved. And those mechanisms WILL be abused in the name of "get it done already".

So we really need other means to erase those bug classes. Which ones I don't know.


> Trying to solve those security bugs with the help of the language is of course the low hanging fruit.

I'm not sure I follow, it seems to me like it's actually a freaking high-hanging fruit. The amount of engineering and design that goes into creating a safe, fast, low-level programming language (i.e. Rust in this case) is enormous and it requires huge amounts of upfront investment.

The only reason the industry has largely resorted to unsafe languages is because there's many other lower-hanging fruits, e.g. fuzzing, rigorous unit testing, static analysis tools. These help immensely but at the end of the day only take you so far.


Yes, most developers and management don't care about security, that's why it's important to bake the security into the language. Unsafe blocks are inconvenient and stand out, so their usage will be limited, while the remaining code will be safe. 90% safe code and 10% unsafe code is much safer than 100% unsafe code.


Honestly, just let them have a buffer-overflow. That's what we call "junk food" software and if they are negligent, let them fail.


We haven't yet solved the problem of how to most effectively reduce such issues in projects where developers and management do care about security (and there are certainly more now that companies can be heavily fined for data breaches), so let's tackle that one first.


In an iterative development process, it's not the end of the world to get potentially unsafe code out the door and then make it safe later, and there's a huge benefit in having unsafe code labeled and framed in source when security bugs are found.


If you just need to get stuff done in Rust, you can just start wrapping things in Rc<>, Arc<>, Box<> or even Gc<>, which introduces runtime overhead on the level of Java. It's safe and allows you to work around thinking about ownership, which I think is the main pain point of using Rust versus other languages. Obviously, if performance later becomes critical, you can refactor.


This comment is as low-effort as the hypothetical examples.

Will they do things in that way? Heck, let’s just ask: are they doing things in those ways?

crickets

Just meaningless rhetoric (“being done fast and/or cheaply”) and unfounded hypotheticals.


This is a problem you can't fix 100%; it has to be taught through catastrophic failure, unfortunately. It's a human flaw. Rust (et al.) gives a pathway forward after that.


I second this: it's because C is a simple language. Modern languages are a cluster-f*k of features and there are still opportunities to code in security holes. Frankly, if you were to boil down the "unsafe" C vulnerabilities it is a small class of exploits, suitable for automated detection.

The interesting thing about C is that it is one of the few languages which has been able to survive without massive changes throughout the years. All these hyped-up modern languages are in flux ALL THE TIME.


> Frankly, if you were to boil down the "unsafe" C vulnerabilities it is a small class of exploits, suitable for automated detection.

Ah, it's simple, then. We will eagerly await the non-forthcoming links to all those efforts/implementations.

Or would you like to walk that back to “in principle suitable for automated detection”?


It is certainly not simple, and while there's a limit to what languages can do to assist correctness, C sets a particularly low bar, one that newer low-level languages like Zig and Rust can and do improve upon. Having said that, eliminating memory errors from C is not only doable in principle, but used in practice, so far more than new low-level languages (see, e.g. https://trust-in-soft.com/). It does take a fair deal of work, but for established codebases, the approach is cheaper, less risky and better established than new languages.


I'm a bit hesitant to believe that in the future low level systems programming will be done using languages like Rust, Nim or Zig. I would really love to see that but the programmers I know in that area are very reluctant to accept any additional layer between them and the hardware. It's not even just low level systems programming, but basically any large project with some sort of performance constraints still seems to stick to C++.


Nim is inherently higher-level compared to C, so I understand your point. But how does e.g. Rust or Zig interpose additional layers between the programmer and the hardware? It's not like they introduce complicated runtimes or have expensive FFI. Rust in particular is receiving lots of interest for security-conscious and/or performance-critical projects (browsers, operating systems, cryptography, container technologies, HFT-related stuff). I think that speaks volumes as to Rust's capabilities in this area.


All the layers that I see Rust and Zig are introducing have to do with the compiler. And if advanced compilation is a “layer” then Mr. zero-cost-abstractions C++ shouldn’t throw stones.


Linus Torvalds on why C and not something else (2012): https://www.youtube.com/watch?v=MShbP3OpASA&t=20m45s


C's success is an effect of UNIX's adoption.

Outside Bell Labs, operating systems were being written in safer systems programming languages. That is one of the reasons why Unisys still has enough customers that care to pay for ClearPath MCP, which has a security level no UNIX clone is capable of, thanks to NEWP.


Could you explain what additional layer is between me and the hardware when using Rust vs using C? I've written a lot of Rust, and a lot of C, and I can't figure out what you're referring to here.


Zig is probably closest to C for the target audience.

Although I wouldn't mind if Pascal or Ada swings back to mainstream's attention.


I want to like the idea, but to me the examples look like the abomination that was (is?) Managed C++.

As per the wiki, they state that they have automatic conversion tools, but I highly doubt that this will be used in production, even at Microsoft.

Example from their wiki (https://github.com/microsoft/checkedc/blob/master/samples/st...):

    // Delete all c from p (adapted from p. 47, K&R 2nd Edition)
    // p implicitly has count(0).
    checked void squeeze(nt_array_ptr<char> p, char c) {
      int i = 0, j = 0;
      // Create a temporary whose count of elements can change.
      nt_array_ptr<char> s : count(i) = p;
      for ( ; s[i] != '\0'; i++) {
        // We need to widen the bounds of s so that we
        // can assign to s[j] when j == i.
        // (In the long run, this will be taken care of by
        // flow-sensitive bounds widening.)
        nt_array_ptr<char> tmp : count(i + 1) = s;
        if (tmp[i] != c)
          tmp[j++] = tmp[i];
      }
      // If i == j, this writes a 0 at the upper bound. Writing a 0 at
      // the upper bound is allowed for pointers to null-terminated
      // arrays. It is not allowed for regular arrays.
      s[j] = 0;
    }


C gets a lot of flak, especially here, but I like the fact that you have more choices in how to do things. Yes, it's absurdly easy to shoot yourself in the foot, but you can also make use of design patterns that help.

For a simple example: if you have a struct that holds a pointer to a previously allocated array, along with a length and maybe an allocated capacity (and you're careful to keep them correct), then you can pass a pointer to the struct to a function that verifies up front that the length is sufficient for what is to be done. The function then doesn't have to check the length every time it accesses an element, as long as it can be assured that the allocated memory and the length don't change. This saves a lot of generic behind-the-scenes code that verifies the length on every array access. It's up to the developer to make sure all the pieces are in place and everything is prepared before calling this function. If that code is in a tight loop, it can make a big difference in performance.

It's a sharp knife, but sometimes you need a sharp knife. You just have to be more careful. Dulling the blade makes it safer, but it doesn't let you make use of its sharpness.

Is it often overused? Probably. Does it have a steep learning curve to be able to write code that doesn't exhibit certain classes of errors? Sure. Should other options be used where performance isn't as critical? Probably. But it has its place.


Related but for c++ https://herbsutter.com/2018/09/20/lifetime-profile-v1-0-post...

There is also cyclone.

Finally, if you missed it there is a new probabilistic variant of asan: https://llvm.org/docs/GwpAsan.html (and also a hardware accelerates one but only for ARM)


Cyclone has been dead for decades, and only works on 32-bit anyway. (I did some work to try to port it, but it's not very practical to.)


The real problem with C and C++ is you have to be an expert to not make disastrous errors. And you have to always get it right.


And the industry's experts who are famous for writing "idiomatic C" keep making mistakes constantly that result in new CVEs.


There often is a choice of using a safer solution over raw performance. In C++ for example you can fight C strings, or you can just use std::string, there is no need to always go with the dangerous solution when that piece of code isn't a bottleneck.


The main problem here is, as Animats notes in his post, 90% politics: many organizations are too ingrained in C culture.

The performance might not matter at all for what is being developed, but in no way are you going to spend those extra 5 ms required for safer code.


C/C++ is a potential footgun, but it has its place for applications that need precise control.


You're not being precise, though. "Precise" ought not to mean "it is possible to express precise thought regarding memory, yada yada, in this language"; it ought to mean "a person is capable of expressing precise thought regarding [...] in this language, without making mistakes".


Could you offer some insight or examples of where or how C permits or enabled more-precise control than is permitted or enabled by Rust?


You need unsafe in Rust just to do a doubly-linked list.


> You need unsafe in Rust just to [implement] a doubly-linked list.

And? That's what unsafe is for.


Yes, I agree, you can implement a doubly-linked list in Rust exactly as you can in C, with the same precise control.

You can even implement intrusive collections: https://docs.rs/intrusive-collections/

Unsafe is a normal part of the language, and how you manually implement precise control behind a safe reusable abstraction whose correct use can be verified by the compiler.

Do you have any examples of precise control you can express in C, but can't express in Rust?



Since nobody else has posted it yet:

    #include <stdio_checked.h>
    #include <stdchecked.h>

    #pragma CHECKED_SCOPE ON

    int main(int argc, nt_array_ptr<char> argv checked[] : count(argc)) {
      puts("hello, world");
      return 0;
    }


It's better to convert existing C code to Rust semi-automatically with the C2Rust[1] framework than to waste time converting to this.

[1] https://github.com/immunant/c2rust/


There is nothing lifetime-related here, right? I.e. nothing to help prevent use-after-free errors.


I think using SAL annotations along with the /analyze flag in MSVC is much less obtrusive than this solution, and it lets programmers annotate portions of their code without having to learn, or rewrite in, an entirely new dialect of C.


It would be nice if you could move an existing C codebase piecemeal towards a safer standard using linters and compiler checks. Projects with a lot of legacy (mostly C) code, which means all popular operating systems, cannot be rewritten in safer languages without some additional motivation, like features that have meaning to the business. So most likely a lot of it will stick around for a very long time, but it still needs to become 2021-level secure.


One way to do that could be with Frama-C [1]. It allows you to put the checks in comments that are interpreted prior to C compilation. See the example here [2].

1. https://frama-c.com/

2. https://frama-c.com/html/acsl.html
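For illustration, a small ACSL-annotated function of the kind shown on that page might look like this (a hypothetical example: a plain C compiler ignores the `/*@ ... */` contract, while Frama-C's analyzers check it):

```c
#include <stddef.h>

/*@ requires n > 0;
    requires \valid_read(a + (0 .. n-1));
    assigns \nothing;
    ensures \forall integer i; 0 <= i < n ==> \result >= a[i];
*/
int max_of(const int *a, size_t n) {
    int m = a[0];
    for (size_t i = 1; i < n; i++)
        if (a[i] > m)
            m = a[i];
    return m;
}
```

Because the annotations live in comments, the same file builds unchanged with any C compiler, which is exactly what makes this approach attractive for piecemeal adoption.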


iOS is doing this. No more information is available though.

https://support.apple.com/guide/security/memory-safe-iboot-i...


That's not surprising. It's Apple, after all.

I get downvoted a lot for saying this, but I'll say it anyway: I'm getting increasingly convinced that the purpose of this whole "safe languages" push is ultimately to benefit companies who want to deepen their control over users. If you look at who the biggest proponents are, it's pretty obvious what their motivations are. They don't want you jailbreaking or rooting; they want to force you into their walled garden and keep you there.

https://news.ycombinator.com/item?id=7883026


Serious question: how does a project adopting a "safe" language benefit the company that's pushing it or give them more control? For example, if I write my web server in Go, does that benefit Google in any meaningful way? How?

Likewise, I usually think of Mozilla as one of the primary promoters of safe languages (specifically, Rust). Do you think they benefit in some illicit way if someone chooses to use Rust?

I read your linked post, but, without reading the whole thread, I'm not sure how it doesn't amount to saying that security bugs are, on net, a good thing.

To be clear, I have my own criticisms of these newer languages that center on their corporate-driven development (e.g. they're generally designed around the belief that the developer should also be the final packager of the software or should at least determine what the final binary looks like), but their additional security is not something that I had thought to criticize as such.


It's not about a single project, or even many; it's the gradual normalisation/acceptance of "safe languages" (and likewise, the ostracising of "unsafe languages") that is the main concern. A "creeping absolutism", in other words.

Do I want a browser that's free of remotely exploitable vulnerabilities? Yes.

Do I want a "device" that I can't take control of, despite the fact that I bought it? Do I want unbreakable DRM? No. Hell no.

The problem is that those who work on the latter will see the former, and adopt the same technologies to enslave instead of empower.

Ultimately it's a balance of power. Neither total lawlessness nor total authoritarianism ("zero crime", as some like to call it), to use a real-world analogy.


What a lot of people don’t realize though is that only a company with the force of Apple can get the kind of access to the hardware you need to write this level of firmware.

Apple writes every instruction executed by the processor from the moment it powers on until it’s booted. Unlike for instance devices running on the Intel platform, where the processor is running an entire OS behind your back, and you can’t turn it off.


I wonder if projects like this are the first step towards rewriting projects in Rust?



Yet another solution to make C type-check. If people want type-checked C, then they can use C++/Rust/Go, as the overhead would be minimal. C gives the programmer tons of freedom, and that's the beauty of C. Taking away that freedom defeats the whole purpose of using C in the first place.


You can't just rewrite an existing application to another language. What you can do is add tooling to an existing codebase to improve it. Facebook did this some time ago with Flow for Javascript.


Agreed, but that also requires throwing away UNIX clones and I don't see that happening on my lifetime.


If anyone from Microsoft is listening:

I'm a very experienced C programmer, and what I would like, much more than a new language version, are some basic OS functions where I can query the status of a pointer. Is it readable? Is it writable? Where does the memory allocation start and end? Is the memory in the heap or stack, or a memory-mapped file?

All the information is available inside the OS, but I can't get at it. If it were available, it would be possible to write all kinds of instrumentation. If it were possible (in debug mode) to get the layout of the stack, that would also be super useful. Buffer overruns are really mostly a problem when they are on the stack.

By wrapping malloc and free, you can easily find buffer overruns, memory leaks and double frees; they become a non-issue. But with stack memory you have very little control. I avoid using the stack for operations where this might be an issue, just because it's so much harder to debug.

I would love to be able to write APIs that have built in tests (in debug mode) where all pointers are checked, and if anything is broken the API can fail gracefully and give out precise errors.
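The malloc/free wrapping mentioned above can be sketched with a trailing canary that detects overruns (illustrative names only; a real debug allocator would also track allocation lists, alignment, and double frees):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative-only wrapper: record the requested size before the user
 * region and a magic canary just past it.  A scribbled canary reveals
 * a buffer overrun. */
#define CANARY 0xDEADBEEFu

void *dbg_malloc(size_t n) {
    /* layout: [size_t size][user bytes...][uint32_t canary] */
    unsigned char *raw = malloc(sizeof(size_t) + n + sizeof(uint32_t));
    if (!raw) return NULL;
    memcpy(raw, &n, sizeof(size_t));
    uint32_t c = CANARY;
    memcpy(raw + sizeof(size_t) + n, &c, sizeof(c));
    return raw + sizeof(size_t);
}

/* Returns 0 if the trailing canary is intact, 1 if it was overwritten. */
int dbg_check(const void *p) {
    const unsigned char *raw = (const unsigned char *)p - sizeof(size_t);
    size_t n;
    memcpy(&n, raw, sizeof(size_t));
    uint32_t c;
    memcpy(&c, raw + sizeof(size_t) + n, sizeof(c));
    return c == CANARY ? 0 : 1;
}

void dbg_free(void *p) {
    free((unsigned char *)p - sizeof(size_t));
}
```

As the comment says, this is easy for heap memory you allocate yourself; the hard part is that nothing equivalent exists for the stack or for allocations made by code you don't control.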


> Is it readable? Is it writable?

VirtualQuery, but the granularity is coarse: pages are 4 KB.

> Where does the memory allocation start and end?

HeapSize if that's on heap, and you know the heap.

> Is the memory in the heap or stack, or a memory mapped file?

QueryVirtualMemoryInformation

> APIs that have built in tests (in debug mode) where all pointers are checked, and if anything is broken the API can fail gracefully and give out precise errors.

Visual C++ in debug builds does lots of runtime checks on all levels, C runtime, C++ runtime, and even compiler. For instance, they fill uninitialized and freed memory with magic numbers.


Visual studio is great at catching loads of stuff in debug mode.

HeapSize is close, but it can't be used to check incoming pointers, since the pointer has to be the start of the allocation, and again it doesn't recognize the stack.

VirtualQuery is better, but as you say, a little coarse.


Indeed, unfortunately outside of Windows, there isn't that much culture in doing that. C++ Builder is also quite similar.


Well, I don't know how NT does it, but for POSIX, what the OS hands out isn't arbitrary malloc'd blocks but fixed-length pages from something like sbrk or mmap. The real malloc and free implementations live in user space in libc. If I had to guess, NT does something similar.

I don't think the OS should be in charge of memory safety in this way since this can all be done in user space.


I'm doing as much as I can in userspace, but I do run into some limitations: say I want to implement a library with a debug/developer mode where I can check pointers given to the API by an application I didn't write. In that case I can't wrap that application's memory allocations to keep track of memory.


Check out Valgrind, it kind of does what you want. For as far as it is possible.


I have used Valgrind. It's nice, but on the slow side. GFlags is another great option. I would still like more options for instrumenting my own code. I have a fairly capable wrapper that takes care of most of my memory issues:

http://gamepipeline.org/forge_Debugging_.html

Most of why I want this kind of functionality is not to test my code but to be able to build tools. I have written a live memory viewer/poker, and a memory query API would really help that kind of effort:

https://www.youtube.com/watch?v=pvkn9Xz-xks


You're describing a very intrusive garbage collector.

So, a weakly typed language with garbage collection? Where have we heard of that before?

I can imagine that if most of your code is for checking that the remaining fraction reads correctly from rvalues and writes correctly to lvalues and each check requires a round trip through a context switch, you're not going to be wringing very much productivity out of either your hardware as the CPU thrashes like mad or the one developer you manage to keep on staff.


It's not a garbage collector. All I want is to be able to query the list of allocations made by a process.

Performance would not need to be great; I would even assume it would be quite bad. But being able to write APIs with a debug mode that tells you precisely when you give an API call broken pointers would be very useful.


The ability to know which memory is valid and which is garbage is indistinguishable from what a garbage collector does.



I would like to be able to query it. Windows used to support IsBadReadPtr and IsBadWritePtr but they have been deprecated. I'm not aware of something similar on Linux.
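One POSIX workaround (not an official API, just a known trick) is to let a system call probe the pointer for you: a write() into a pipe forces the kernel to copy from the buffer, and it fails with EFAULT instead of crashing when the address is unmapped. A hedged sketch:

```c
#include <errno.h>
#include <unistd.h>

/* Probe whether one byte at p is readable, without risking a SIGSEGV.
 * Returns 1 if readable, 0 if not, -1 on unrelated failure.
 * Hypothetical helper name; slow (two syscalls plus a pipe per call),
 * so only suitable for debug builds. */
int is_readable(const void *p) {
    int fds[2];
    if (pipe(fds) < 0) return -1;
    ssize_t n = write(fds[1], p, 1);    /* kernel copies from p */
    int ok = (n == 1) ? 1 : (errno == EFAULT ? 0 : -1);
    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

Like IsBadReadPtr, this only tells you the page is mapped and readable, not that the pointer points at the object you think it does.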


It’s not very useful: if a pointer is bad and you access it, you’ll crash anyway. The problem is when the pointer has been manipulated to be valid, but not pointing at what you expect.


Let me give you an example:

Let's say you have an API that implements a function similar to memcpy. It interacts with more than one user-provided pointer. Let's also assume the API is something big and complicated, like a graphics driver, possibly closed source.

If you give it a broken pointer, it may crash, but it won't tell you which pointer was bad or in what way. Maybe no pointer was bad; maybe the implementation is broken, or some other parameter is. An API can check its other parameters to make sure they are valid, and many APIs do, but it's not yet possible to check pointers.

You have to dig into the implementation to do that, and that's a lot of work even if it's open source. Crashing in your own code is fine, because the debugger will tell you what is going on, but crashing in someone else's code is much harder to debug.

Worse, the code may not crash at all; it may overwrite another allocation belonging to the same process. The implementation can't check the mallocs; it only crashes if the pointer steps outside a memory block owned by the process.


> Why would an api be able to tell you anything the debugger can’t?

An error message is much easier to read than a stack trace, especially if you don't have source code.

> If you want to do what you describe you have to make such changes that the language really isn’t c anymore.

I don't want any changes to C. I just want to be able to query a process's allocations. The OS has to keep this information in order to implement realloc and free, so it's already there.


The operating system doesn’t handle malloc, realloc and free. That task is handled by the C library. All the operating system provides is a way for these functions to claim and release large areas of memory.

Applications use malloc to allocate some memory they then use to store a collection of data. Pointers often don’t point to the exact start of that memory area, which is what you need to do memory management. So from a pointer you can’t necessarily realloc or free.

And the problem is, if a buffer overruns for instance, you end up writing memory that is allocated and valid, just not for the thing the developer thought.

Alternative implementations of malloc exist that try to help with this, by encasing memory between guard pages.

Also, with glibc malloc, for example, you can use the MALLOC_TRACE environment variable (together with a call to mtrace()) to get a log of all allocations.


I already have an alternative implementation of malloc that does this. But I still need Microsoft's help to get the same information from a process that didn't use my implementation. I don't need to free or realloc pointers; I just need to know if a pointer is valid.


In normal processes you can find this through the _heapwalk function:

https://docs.microsoft.com/en-us/cpp/c-runtime-library/refer...


Why would an API be able to tell you anything the debugger can’t? If you want to do what you describe, you have to make such changes that the language really isn’t C anymore.



