
Checked C - zmanian
http://research.microsoft.com/en-us/projects/checkedc/
======
Animats
This seems to be a successor to CCured from 2005, a project partially funded
by Microsoft. [1]

The general idea is straightforward: _"The unchecked C pointer type * is
kept and three new checked pointer types are added: one for pointers that are
never used in pointer arithmetic and do not need bounds checking (ptr), one
for array pointer types that are involved in pointer arithmetic and need
bounds checking (array_ptr), and one for pointer types that carry their bounds
with them dynamically (span)."_ So this is really a new language derived from
C, to which programs can be converted.
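
To make the "carries its bounds with it dynamically" idea concrete, here is a minimal sketch in plain C, not Checked C syntax. The names `int_span`, `span_of`, `span_get` and `span_set` are hypothetical and exist only for this illustration; the real language would have the compiler insert the checks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a "span": a pointer that carries its bounds
   at run time. This is ordinary C, not Checked C syntax. */
typedef struct {
    int   *base;   /* first element */
    size_t count;  /* number of elements */
} int_span;

/* Build a span over an existing array. */
static int_span span_of(int *base, size_t count)
{
    assert(base != NULL || count == 0);
    int_span s = { base, count };
    return s;
}

/* Checked read: returns 1 and stores the element on success,
   returns 0 (leaving *out untouched) when index is out of bounds. */
static int span_get(int_span s, size_t index, int *out)
{
    if (index >= s.count)
        return 0;
    *out = s.base[index];
    return 1;
}

/* Checked write with the same return convention. */
static int span_set(int_span s, size_t index, int value)
{
    if (index >= s.count)
        return 0;
    s.base[index] = value;
    return 1;
}
```

The cost is visible here: every span is two words instead of one, which is why the quoted design keeps separate cheaper types for pointers that never need it.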

This is basically a good idea, but it's only useful if pushed hard by somebody
like Microsoft. People won't convert without heavy pressure.

[1]
[https://www.cs.virginia.edu/~weimer/p/p477-necula.pdf](https://www.cs.virginia.edu/~weimer/p/p477-necula.pdf)

~~~
dragandj
Being pushed by Microsoft could be exactly the reason _not_ to convert, for
many people. Especially when there is no shortage of alternatives that are
backed by a bit more benevolent entities (Rust, and many other less known
variants).

~~~
dragandj
In reply to myself: The times are changing indeed. It seems that HN-ers
have started liking Microsoft. Whenever I post some concern, even if only mild,
there are first a few upvotes, and then a bunch of downvotes.

Microsoft became the darling of the community, OR, in addition to embracing
open source, they embraced voting ring tactics. These are two explanations I
have, maybe there is another. Maybe my criticism really was obnoxious...

~~~
ocdtrekkie
Maybe it's just because yes, we all remember the 90s. But today is not
necessarily the 90s. It was a lot more recent that Google was a relatively
benevolent company, and we've all seen how that's changed.

And the thing about open source projects, is just that. They're open source.
If the wider community takes hold of them, there's not much Microsoft can do.
Because then you just see a fork.

You don't have to be a fan of Microsoft to simply appreciate the contribution
to open software, as it is.

~~~
EugeneOZ
pff.. they are "open source" just because they want to attract developers,
because without developers writing apps for their platform, Microsoft will die
(as any other platform would). They are still just money-makers and nothing
else; there is no idea except money behind their moves into the open source
world, otherwise they wouldn't take revenues from Android.

~~~
eropple
Everything you say may be (and probably is) true.

Now why does anything in your post _matter_?

~~~
EugeneOZ
Because it's important to distinguish an intention to contribute to the
open-source world from an intention to use this world to make money.

~~~
devnonymous
It's sad to see that despite open source (or free/libre..etc) software being
around for so long, people still don't get it that money and freedom are not
mutually exclusive. I would argue that contributing to open source for
business reasons is perhaps even more beneficial for the software and
community than just doing it for a hobby or some philosophy because companies
pay people to even work on stuff that is not exciting / interesting.

~~~
EugeneOZ
Sad to see that people still think "it's sad to see" that somebody can have a
different opinion.

~~~
devnonymous
Huh? I acknowledge that you have a different opinion and I'm saddened by that
opinion.

I respect your right to have a different opinion. I don't have to respect the
opinion itself. Hope you can see the difference.

------
Animats
I had a go at this problem, backwards-compatible safe C, back in 2012.[1] It
was discussed on the C standards mailing list, and the consensus was that it
would work technically, but was politically infeasible. What I proposed was
not too far from "Checked C", but the syntax was different.

I'd defined a "strict mode" for C. The first step was to add C++ references to
C. The second step was to make arrays first-class objects, instead of reducing
them to pointers on every function call. When passing arrays around, you'd
usually use references to arrays, which wouldn't lose the size information as
reduction to a pointer does.

Array size information for array function parameters was required, but could
be computed from other parameters. For example, UNIX/Linux "read" is usually
defined as

    
    
        int read(int fd, char buf[], size_t len);
    

The safe version would be

    
    
        int read(int fd, &char buf[len], size_t len);
    

In both cases, all that's passed at run time is a pointer, but the compiler
now knows that the size of the array is "len" and has something to check it
against. The check can be made at both call and entry, and in many cases, can
be optimized out. In general, in all the places in C where you'd describe an
array with a pointer with empty brackets, as with "buf[]", you'd now have to
put in a size expression.
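
A rough way to approximate the call-site check in today's C is sketched below. This is not the proposal's syntax: `fill_checked` and `FILL` are hypothetical names, and the macro only works where the argument is still a true array (so `sizeof` gives its real size), which is exactly the information that is lost once the array decays to a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a real API: copies at most `len` bytes from
   `src` into `buf`, but refuses if the caller-declared `len` exceeds the
   real capacity of the destination. Returns bytes copied, or -1. */
static long fill_checked(char *buf, size_t capacity, size_t len,
                         const char *src, size_t src_len)
{
    assert(buf != NULL && src != NULL);
    if (len > capacity)   /* the check a strict-mode compiler would insert */
        return -1;
    size_t n = src_len < len ? src_len : len;
    memcpy(buf, src, n);
    return (long)n;
}

/* At the call site the array has not yet decayed to a pointer, so sizeof
   still yields its true size. Only valid for real arrays, not pointers. */
#define FILL(arr, len, src, src_len) \
    fill_checked((arr), sizeof(arr), (len), (src), (src_len))
```

With `char buf[8];`, a call such as `FILL(buf, 16, "abcdef", 6)` is rejected because the declared length exceeds the array, while `FILL(buf, 4, "abcdef", 6)` copies four bytes.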

You could do pointer arithmetic, but only if the pointer had been initialized
from an array, and was attached to that array for the life of the pointer.

    
    
        char s[100];
        char* p = s;
        ...
        char ch = *p++;
    

Because p is associated only with s, the compiler knows what to check it
against.
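
As a run-time model of that association (in the proposal the compiler would track it and no extra data would exist at run time), one could write something like the following. `bound_ptr`, `bind_to` and `next_char` are hypothetical names for this sketch only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical run-time model of "p is attached to s for its whole life".
   The proposal has the compiler track this; here it is made explicit. */
typedef struct {
    const char *cur;  /* current position */
    const char *end;  /* one past the last element of the owning array */
} bound_ptr;

static bound_ptr bind_to(const char *array, size_t size)
{
    assert(array != NULL);
    bound_ptr p = { array, array + size };
    return p;
}

/* Checked equivalent of `ch = *p++`: returns 1 and advances on success,
   returns 0 without reading memory once p has reached the end. */
static int next_char(bound_ptr *p, char *ch)
{
    if (p->cur >= p->end)
        return 0;
    *ch = *p->cur++;
    return 1;
}
```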

There was more, but that's the general idea. A key point was that the run-time
representation didn't change; there were no "fat pointers". Thus, you could
intermix strict and non-strict compilation units, and gradually convert a
working program to strict mode.

This took less new syntax and fewer new keywords than "Checked C". I was
trying to keep C style, adding just enough that you could talk about arrays
properly in C.

[1]
[http://www.animats.com/papers/languages/safearraysforc43.pdf](http://www.animats.com/papers/languages/safearraysforc43.pdf)

~~~
vasilipupkin
I've already been downvoted for this question, but I don't get the business
case here. If you are worried about this, why not write code in modern C++
which has plenty of ways to make all sorts of issues like this safe ? Why
stick to C at all?

~~~
Animats
C++ doesn't solve the problem. C++ papers over the problem with templates. You
can still get raw pointers out of C++ objects, and for most APIs, you have to.
The underlying language objects are still unsafe.

~~~
vasilipupkin
ok, so write C++ classes that don't let you access raw pointers. I mean I
understand that C/C++ languages are less safe than, say, Java - but that is
the whole point. The lesser safety gives you better performance. If you don't
care about bleeding edge performance, you should probably choose Java or
Python or some other language that is much more flexible and easier to use

~~~
Animats
_The lesser safety gives you better performance._

No. That excuse has been bandied about for years, but it's wrong, as we're now
seeing as Rust gets better. C's problems with arrays come from lack of
expressive power in the language. C doesn't even let you talk about array size
in parameters.

Many of C's painful design decisions come from trying to cram a compiler into
a PDP-11 with 128KB (not MB) of memory for a process. Global analysis on a
machine that small while retaining reasonable compile times was hopeless. This
is no longer a limitation.

------
Keyframe
You know, some people seem to scoff at replacement Cs. I'm kind of one of
them. I know my reason(s) though. It's not about other languages as much as C
itself. I've been with it for an awfully long time now and I used to think I
kind of knew my way around it. I've changed that line of thinking, because I know
it's not true. I make a lot of mistakes, especially considering memory. At
least (almost all of) my code isn't public-facing.

Since I thought I knew C very well, I came to the realisation (due to many
memory bugs/leaks) that maybe that's not the case after all. What is it then? I can
get performance out of it, lots of it (that's one thing I know to do, somewhat
at least). That's when I questioned myself and started thinking that it's not
that I know C (I don't, after 20 years or so), but that I am very comfortable
with it and I don't want to change that comfort. C++ and 90's traumas and post
90's (STL) traumas had a lot to do with it. Now, I'm fishing out for newer
languages that would replace C for me, Rust namely, but I'm still falling back
to C all the time. Now, I'm thinking that that's just the way it is. I'm a C
programmer and that's how it will stay. At least for a long while. I do and
have used a lot (A LOT) of languages in my past, but my core is always C (also
the first language I learned). I don't program anymore for a living (I do
"creative stuff" in film and tv now), so that is now more prominent than any
other time in the past. I do stuff for me only and I can pick and choose
whatever I want to - yet, it's always C.

Sorry for the "rant".

~~~
PeCaN
This mirrors my experience. C is a drug, man. It's bad but you can't stop. You
know how to get shit done in C and it's _easy_ to think in. C is, IMO,
something of a local optimum for programming. Problem is, it's actually a
relatively awful language, and I don't subscribe to the worse-is-better
philosophy (though C may well be the best example of it).

Thankfully Rust gets my ML instincts going. I'm trying to use it more. It'll
be nice not to be anxiously running my test suite under ASAN after every
build.

~~~
shrugger
Yeah but after sixty years of C, why should we trust an organization like
Mozilla? They've been thoroughly unwilling to come up with any sort of ANSI
standard or anything, despite Rust being a reasonably mature language.

Google sought ECMA standardization when they took on Dart. The Go language has
an official specification, so someone could hypothetically re-implement it if
they wanted to, and remain compatible with Go projects.

Mozilla only wants Mozilla to be involved with Rust. They want everybody to
use it in their systems, and have no input on where the language should go.

~~~
PeCaN
Huh? Anyone can submit an RFC:
[https://github.com/rust-lang/rfcs](https://github.com/rust-lang/rfcs)

While Mozilla may not be the best organization, the Rust team seems much more
willing to listen to their users than the Firefox team.

------
rdtsc
Interesting. I thought Microsoft had stopped liking C a while back. I remember
complaining about C99 support in VS and getting a response along the lines of
"Just use the C++ compiler as a C compiler, you don't need C anymore". It took
them until VS 2015 to finally support it.

~~~
linkregister
Also the DDK (Windows Driver Development Kit) didn't support C++ in any
meaningful way. If I remember correctly, all my drivers had to be written in
C89.

~~~
pjmlp
C++ has been supported in kernel space since Windows 8.

------
kbart
The idea is nice (although old and tried more than once), but I'm pessimistic
as it doesn't seem to be backwards compatible and requires a specific, new
compiler. For legacy projects it's hard/impossible to change a
toolchain/compiler and for the new projects one can as well use Rust or other
modern language.

------
partycoder
Reminds me of Cyclone:
[https://cyclone.thelanguage.org/](https://cyclone.thelanguage.org/)

Many Cyclone ideas made it into Rust.

I strongly prefer Rust.

~~~
vintermann
You should look at their paper. They have an impressive survey of prior
approaches including Cyclone, and they claim that their approach (if I
understand them correctly) could allow you to do some things outside of unsafe
blocks if implemented in C# or Rust.

~~~
pcwalton
I don't see any use-after-free prevention here from a skim of the paper, so
this doesn't seem to address many of the most important benefits you get from
C# or Rust.

Use-after-free is not a theoretical problem. All of the Pwn2Own
vulnerabilities this year were UAF, for example.
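
To see why bounds checking alone does not cover this, consider the sketch below. An index can be perfectly in range while the object it refers to has already been freed. The names `buffer`, `buffer_init`, `buffer_release` and `buffer_get` are hypothetical. Resetting the pointer on release is a common C mitigation, but it only protects this one handle: any other copy of the old pointer still dangles, which is the part that needs ownership or lifetime tracking.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    char  *data;
    size_t len;
} buffer;

/* Allocate a zero-filled buffer; returns 0 on allocation failure. */
static int buffer_init(buffer *b, size_t len)
{
    assert(b != NULL);
    b->data = calloc(len ? len : 1, 1);
    b->len  = b->data ? len : 0;
    return b->data != NULL;
}

/* Release and reset, so a stale use through this handle sees NULL/0
   instead of freed memory. Aliases of the old pointer are NOT protected. */
static void buffer_release(buffer *b)
{
    free(b->data);
    b->data = NULL;
    b->len  = 0;
}

/* Bounds-checked read; the NULL test catches use after buffer_release(). */
static int buffer_get(const buffer *b, size_t i, char *out)
{
    if (b->data == NULL || i >= b->len)
        return 0;
    *out = b->data[i];
    return 1;
}
```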

------
markokrajnc
This clarifies the need for a more "rust-like" C. I hope something like this
will flow into standard C...

~~~
yitchelle
I can't upvote this enough. I started to play with Rust a few months ago and I
especially like their compile-time checks of the code. I don't see why C
compiler vendors can't do the same thing. Imagine if folks like GreenHills or
WindRiver adopt such a practice in their compiler. It would revolutionise
their industry. IAR has MISRA-C checks in their compilers but that is not
enough.

~~~
pcwalton
Because the C language is fundamentally hostile to Rust's safety features.

~~~
choosername
what about fail quick fail often

------
baq
the obvious question: how does this compare to rust? it looks like rust and
this aim to solve a very similar set of issues in more or less similar way
('static and dynamic checking'). i'd be very interested in a table that
compares capabilities of both. of course there's a gigantic advantage of this
being C, so in theory valid checked C would be valid C with all benefits of
that.

~~~
pcwalton
Well, this isn't C: it's a different language that extends C. There's a big
difference between that and just being C. In particular, valid checked C is
not valid C, because checking requires using the language extensions.

Regarding the comparison to Rust, Rust prevents use-after-free, while this
doesn't seem to from a skim of the paper. Use after free is one of the most,
if not the most, common remote code execution security issues in C and C++
code nowadays.

~~~
jjnoakes
> Use after free is one of the most, if not the most, common remote code
> execution security issues in C and C++ code nowadays.

I'd love to see a citation on this. My gut feeling tells me buffer overruns
and integer overflows are seriously in the running.

~~~
jerf
It is certainly "one of the most", even if it is not "the most".

~~~
jjnoakes
I'd love to see a citation on this, as I asked for previously. Repeating the
comment I replied to isn't a citation.

~~~
jerf
That is somewhere around asking for a citation if the sky is sometimes cloudy.

[https://web.nvd.nist.gov/view/vuln/search-results?query=use-...](https://web.nvd.nist.gov/view/vuln/search-results?query=use-after-free&search_type=all&cves=on)

Before getting too excited and claiming that it's only 1.3% of all CVEs or
something, remember that it's 1.3% of _all vulnerabilities_. (Especially with
the explosion of dynamic web languages, a lot of CVEs aren't really
C/C++-related.) There's a power law to these things, so by the power law
metric, it's not that far behind "buffer overflow" (6,500 entries), and ahead
of the well-known "format string" (577), which is also certainly "one of" the
most common C issues.

~~~
jjnoakes
I'm looking specifically for remote code execution vulnerabilities, which is
what the original comment was discussing, and which is a subset of what you
posted.

And no, this is not like asking for citations for the sky sometimes being
cloudy because the original comment didn't say "use-after-free sometimes leads
to remote code exploit".

This is like asking for citations for a claim like "whenever the skies are
cloudy it is due to acid rain more than any other reason". And a claim like
that should be accompanied with some citations.

Let's have an honest discussion here, or don't bother, please.

------
ansgri
_It is a design goal to allow Checked C to be a subset of C++ too._

Interesting. In times when many people advocate a safe C++ subset, Checked C
grows the other way, adding C++-compatible notation to represent the most
vital things like smart pointers.

------
eggy
Good timing for me, since I have been falling back to C for some personal
projects. I keep looking at Rust, but I just don't have the time. It would be
nice to leverage the experience I already have, and see what Checked C offers.

------
legulere
This greatly lacks an overview as the specification[1] is very in-depth.

I kind of wonder if they're working on automatic conversion tools between C
and Checked C. At least for ptr<> this should be trivial. If a function does
no pointer arithmetic with a * pointer and only uses it in function calls that
take ptr<>, it can be converted to a function taking a ptr<>.

[1]
[https://github.com/Microsoft/checkedc/releases/download/v0.5...](https://github.com/Microsoft/checkedc/releases/download/v0.5-final/checkedc-v0.5.pdf)

------
colejohnson66
microsoft.com discussion:
[https://news.ycombinator.com/item?id=11900009](https://news.ycombinator.com/item?id=11900009)

~~~
ansgri
It would be nice to merge these two submissions' comments.

------
silent90
Language extension for maintenance of legacy application sounds a little
bullsh1t to me.

0) If the application needs checks on anything (at the cost of performance)
then a higher level language (like C++) should be chosen at design time. No use
for new applications.

1) Existing applications will NOT port directly. Real-life applications are
tightly coupled with supported compiler(s), so the compiler would need the
update. Errors/exceptions (like overflows) would need handling and changes in
logic. It could only deny read/write from an illegal area, but without the
feedback. The speed is also a major thing. Boundary checks could possibly
prevent some bugs, but the performance will drop dramatically (example:
commonly used libs like OpenSSL).

One use case I see is to add an extension (like GCC's for instance) for an
existing compiler which does this. User could build a slower debug application
and spot the silent errors during testing. An implementation thing, not the
language extension.

------
fithisux
I wish they supported namespaces.

~~~
chj
+1

------
fredmorcos
So they couldn't be bothered to contribute that directly back to LLVM/Clang or
am I missing something?

~~~
notdonspaulding
Baby steps.

The phrase "_couldn't be bothered_" is assuming a lot of bad faith on
Microsoft's part. While I would agree that in times past MS didn't deserve the
benefit of the doubt, these days they're a different company.

With no special insight into their reasoning, I think the more likely answer
to the question, "Why didn't they contribute this upstream?" is probably "Give
it time."

------
stuaxo
Amazed at the amount of open source coming out of MS these days, very nice !

------
ArkyBeagle
So databases and browsers are "system software" now?

------
vasilipupkin
why? I mean if you think you really need checked C, then why not just use C++?

------
c3833174
Does it include telemetry calls?

------
known
[https://en.wikipedia.org/wiki/Smart_pointer](https://en.wikipedia.org/wiki/Smart_pointer)

------
EugeneOZ
Microsoft style - be too closed to adopt existing solutions, always invent own
ways/standards.

------
vortico
I've written lots of C and never had problems with buffer overruns, bounds
checking, double frees, and other memory issues. Typically at a glance in
one's code, I can tell when behavior may be undefined, and once someone has a
little experience, they can avoid undefined behavior altogether. Why does so
much work go into fixing these problems? In other words, what are some
examples of use cases of a stricter language like this, that would be too
complicated for human eyes to quickly verify?

~~~
dchest
Here you don't check the return value of calloc():

[https://github.com/AndrewBelt/bored/blob/master/src/map.c#L3...](https://github.com/AndrewBelt/bored/blob/master/src/map.c#L33)

and then access it:

[https://github.com/AndrewBelt/bored/blob/master/src/map.c#L4...](https://github.com/AndrewBelt/bored/blob/master/src/map.c#L44)

Here if realloc() fails, you'll have a memory leak and, again, accessing NULL
later:

[https://github.com/AndrewBelt/bored/blob/master/src/priq.c#L...](https://github.com/AndrewBelt/bored/blob/master/src/priq.c#L28)

There is also integer overflow: alloc is int, so if it becomes greater than
2^31-1, it may wrap around [I think signed int behaviour in C is undefined in
this case], and you'll allocate fewer bytes than needed, leading to buffer
overflow.
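
For comparison, here is a minimal sketch of a growth routine that avoids all three problems: the unchecked allocation, the leak on realloc() failure, and the wrapping size computation. It is not taken from the linked repository; `int_vec`, `vec_reserve` and `vec_push` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable int array, for illustration only. */
typedef struct {
    int   *items;
    size_t count;
    size_t alloc;   /* size_t rather than int, so no signed overflow */
} int_vec;

/* Ensure room for at least `want` items. Returns 1 on success, 0 on
   overflow or allocation failure; on failure the vector is unchanged. */
static int vec_reserve(int_vec *v, size_t want)
{
    assert(v != NULL);
    if (want <= v->alloc)
        return 1;
    if (want > SIZE_MAX / sizeof *v->items)   /* byte count would wrap */
        return 0;
    /* Keep the old pointer until realloc succeeds, so failure does not leak. */
    int *grown = realloc(v->items, want * sizeof *v->items);
    if (grown == NULL)
        return 0;
    v->items = grown;
    v->alloc = want;
    return 1;
}

/* Append one value, doubling capacity when full. Returns 1 on success. */
static int vec_push(int_vec *v, int value)
{
    if (v->count == v->alloc) {
        size_t next = v->alloc ? v->alloc * 2 : 8;
        /* Reject if doubling wrapped around or memory ran out. */
        if (next < v->alloc || !vec_reserve(v, next))
            return 0;
    }
    v->items[v->count++] = value;
    return 1;
}
```

A caller starts from `int_vec v = {0};`, checks the return value of every `vec_push`, and calls `free(v.items)` when done.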

~~~
dmytroi
Well, to be honest, very few programs are able to orchestrate >2GB allocations
correctly when compiled to a 32-bit binary. For example, the Visual Studio
linker is not aware of >2GB sizes by default (one needs to pass the
/LARGEADDRESSAWARE flag for it), which was a problem for modern browsers
because the linker was unable to fit everything into the limited virtual
address space.

And, also to be honest, very few programs on desktop/mobile actually check the
return values of malloc/calloc, because the amount of data that a program
operates on is usually much smaller than the amount of RAM available. It's
certainly a concern for embedded, but you usually simply don't use malloc on
embedded.

