

Beyond the PDP-11: Processor support for a memory-safe C abstract machine [pdf] - jsnell
http://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/201503-asplos2015-cheri-cmachine.pdf

======
haberman
This is a really interesting paper, though I haven't had time to read it in
detail yet. Just a couple of quick reactions:

> We would like to be able to make a const pointer a guarantee that nothing
> that receives the pointer may write to the resulting memory. This allows
> const pointers to be passed across security-domain boundaries.

I've always wished that were the case too. It would also allow the compiler
more optimization opportunities, because it could assume that a call to f(&x)
will not change x unless a non-const pointer to x has escaped.
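
A minimal sketch of why the compiler cannot make that assumption today. The function names are made up for illustration. Because `x` itself is not declared const, casting the qualifier away and writing through the pointer is well-defined C, so the caller has to reload `x` after the call:

```c
/* Hypothetical callee: its signature promises const, then it casts the
 * qualifier away. This is well-defined as long as the pointed-to object
 * was not itself declared const. */
static void sneaky_write(const int *p)
{
    *(int *)p = 42;
}

/* Returns the value of x as seen after the call. */
static int observe_after_call(void)
{
    int x = 1;          /* not const-qualified */
    sneaky_write(&x);   /* no non-const pointer to x was ever handed out */
    return x;           /* must be reloaded from memory: it is now 42 */
}
```

With the hardware-enforced const described in the quote, the write inside `sneaky_write` would presumably fault instead of succeeding.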

> Container describes behavior in a macro common in the Linux, BSD, and
> Windows kernels that, given a pointer to a structure member, returns a
> pointer to the enclosing structure [20]. This may or may not be permitted
> behavior according to the standard, due to the ambiguous definition of
> ‘object’.

I'm not sure what interpretation of the C standard would disallow this. For
the initial member of a structure it seems particularly clear-cut that it is
allowed:

C99, 6.7.2.1.13:

"A pointer to a structure object, suitably converted, points to its initial
member (or if that member is a bit-field, then to the unit in which it
resides), and vice versa. There may be unnamed padding within a structure
object, but not at its beginning."

This seems to pretty unambiguously state that you can convert a pointer to an
initial member to a pointer to the enclosing struct. I can't imagine what
interpretation of the standard would disallow this, or even disallow using
offsetof() to convert a non-first member to a pointer to the struct.
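
For concreteness, here is a rough sketch of the two conversions in question. The struct and function names are invented for this example. The first cast is the one the quoted paragraph covers; the second is the offsetof() idiom whose standing is being debated, and it behaves as expected on mainstream compilers:

```c
#include <stddef.h>

struct inner { int a; };

struct outer {
    struct inner first;   /* initial member */
    int second;           /* non-initial member */
};

/* Initial member: a plain cast, per the C99 text quoted above. */
static struct outer *from_first(struct inner *p)
{
    return (struct outer *)p;
}

/* Non-initial member: step back by offsetof() through a char pointer.
 * Whether the standard permits this is the open question. */
static struct outer *from_second(int *p)
{
    return (struct outer *)((char *)p - offsetof(struct outer, second));
}

/* Returns 1 if both conversions recover the address of the enclosing struct. */
static int round_trip_ok(void)
{
    struct outer o = { { 1 }, 2 };
    return from_first(&o.first) == &o && from_second(&o.second) == &o;
}
```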

~~~
arielby
The C standard guarantees Pointer to Struct ~ Pointer to
(transitively)-initial Member (through casts, either directly or via `void*`
or via `intptr_t`). It doesn't guarantee Pointer to Struct + offset ~ Pointer
to non-initial Member for any value of offset.

~~~
haberman
So you can easily use &struct_var->member to go from pointer-to-struct to
pointer-to-member.

It sounds like you are claiming that there is no way to go back.

------
kabdib
I worked on a 68K emulator that supported heap tagging, so you could:

- detect multiple frees of the same memory, even when pointers had been
returned by subsequent allocation calls (e.g., alloc => A, free(A), alloc =>
A', free(A) would trap, where A and A' have the same program-observable
values)

- trap reads and writes to free'd memory, even if the memory had been
subsequently re-allocated

- do strong bounds checking (can't read or write heap block B with an offset
of a pointer to A, even if you computed a valid address in B)

- trap reads of uninitialized memory (you _did_ mean to zero that buffer you
allocated, right?), clear down to "that read of 32 bits included 8 bits that
weren't initialized, did you mean that?"

It wasn't perfect (you could do pointer arithmetic and smuggling operations
that foiled the tagging), but we found some nice bugs with it.

You can approach this with MMU games, but tagging all your values is more
powerful.
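
A toy sketch of the stale-free case from the first bullet, not the emulator's actual design (which is not described here). It assumes a hidden generation counter per heap slot; every name below is hypothetical. The slot index plays the role of the program-visible address, and the generation is the tag the program cannot see:

```c
#define SLOTS 4

/* Shadow state kept outside the program's view. */
static unsigned slot_gen[SLOTS];   /* bumped on every allocation */
static int slot_live[SLOTS];       /* 1 while the slot is allocated */

/* `slot` is the program-visible value; `gen` is the hidden tag. */
struct tagged_ptr { int slot; unsigned gen; };

/* Allocate the first free slot and stamp the pointer with a fresh tag. */
static struct tagged_ptr tag_alloc(void)
{
    struct tagged_ptr p = { -1, 0 };
    for (int i = 0; i < SLOTS; i++) {
        if (!slot_live[i]) {
            slot_live[i] = 1;
            slot_gen[i]++;
            p.slot = i;
            p.gen = slot_gen[i];
            return p;
        }
    }
    return p;  /* out of slots */
}

/* Returns 0 on success, -1 where an emulator would trap. */
static int tag_free(struct tagged_ptr p)
{
    if (p.slot < 0 || p.slot >= SLOTS)
        return -1;
    if (!slot_live[p.slot] || slot_gen[p.slot] != p.gen)
        return -1;  /* double free, or free through a stale pointer */
    slot_live[p.slot] = 0;
    return 0;
}

/* Replays alloc => A, free(A), alloc => A', free(A). Returns 1 if caught. */
static int stale_free_is_caught(void)
{
    struct tagged_ptr a = tag_alloc();
    if (tag_free(a) != 0)
        return 0;
    struct tagged_ptr a2 = tag_alloc();     /* reuses the same slot */
    int same_address = (a2.slot == a.slot);
    int trapped = (tag_free(a) == -1);      /* old tag no longer matches */
    int owner_ok = (tag_free(a2) == 0);     /* current owner may still free */
    return same_address && trapped && owner_ok;
}
```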

~~~
sitkack
Could you point me to some more info on this?

~~~
kabdib
Internal project for a company I worked for about 15 years ago. Proprietary
and dead.

I can say that a modern CPU ran the 68K emulator at an instruction ratio of
well over 100 to 1, with a similar ratio in memory footprint, and that you can
do quite interesting things with all those extra resources.

------
Animats
They built a custom CPU on an FPGA that stays memory-safe even for C code
that relies on widely used undefined behavior. They still need "fat pointers";
this doesn't let them run existing executables, or anything close to them.
It's neat that they were able to do this. It's painful that they had to.

The "container"/"offsetof" problem is strange. If you only have a pointer to a
member of a struct, you don't really know the type of the containing struct.
You don't even know if there _is_ a containing struct. "offsetof" implies an
unchecked cast, violating type checking. Why is that used anywhere, let alone
in the Linux kernel? (In decades of C/C++ programming, I never wanted to do
that.)

This is why I really hope Rust can replace C. (As I write occasionally, "Rust
guys, please don't screw it up".) Rust is complicated, but it's simpler than
the semantics of C undefined behavior. That may be the answer to use on C
programmers who think Rust is too complicated. All those little issues in C,
such as "can you dereference the element one past the end of the array" and
"after dereferencing, what happens if you then test for a null pointer"
require a good understanding of the machine model and how an optimizing
compiler works.
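
A small sketch of the second issue, with made-up function names. In the first function the dereference comes before the test, so an optimizing compiler is allowed to assume the pointer is non-null and drop the check; calling it with NULL is undefined behavior. The second function shows the order that is actually safe:

```c
#include <stddef.h>

/* Risky order: dereference first, test afterwards. The optimizer may
 * treat the test as dead code. Never call this with NULL. */
static int value_risky(const int *p)
{
    int v = *p;
    if (p == NULL)
        return -1;
    return v;
}

/* Safe order: test first, dereference only on the non-null path. */
static int value_checked(const int *p)
{
    if (p == NULL)
        return -1;
    return *p;
}
```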

~~~
arielby
> why Linux uses offsetof

Essentially, downcasting with "multiple inheritance" (non-linear subtyping)
and thin pointers - objects can be members of several intrusive linked lists,
held by different parts of the kernel, and container_of is used to get an
object from the LL pointer after you know the type.
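
Roughly like this. The macro is a simplified stand-in (the kernel's version adds type checking), and `struct task` with its two list members is a hypothetical object invented for the example:

```c
#include <stddef.h>

/* Simplified stand-in for the kernel macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct list_node { struct list_node *next; };

/* Hypothetical object that sits on two independent lists at once. */
struct task {
    int id;
    struct list_node run_queue;   /* link owned by one subsystem */
    struct list_node wait_queue;  /* link owned by another subsystem */
};

/* Each subsystem only holds a list_node and knows which member it is. */
static int id_from_run_queue(struct list_node *n)
{
    return container_of(n, struct task, run_queue)->id;
}

static int id_from_wait_queue(struct list_node *n)
{
    return container_of(n, struct task, wait_queue)->id;
}

/* Recovers the same task through both links and sums the two ids. */
static int demo_two_lists(void)
{
    struct task t = { 7, { NULL }, { NULL } };
    return id_from_run_queue(&t.run_queue) + id_from_wait_queue(&t.wait_queue);
}
```

At most one of the two links can be the initial member, so the initial-member rule alone cannot cover both lookups.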

------
ape4
Pretty cool. If C was designed to run nicely on the PDP-11, it's slightly
strange that there could be a better machine for it to run on. Maybe in the
future somebody will make a language for the CHERI machine.

