Hacker News
Beyond the PDP-11: Processor support for a memory-safe C abstract machine [pdf] (cam.ac.uk)
51 points by jsnell on May 10, 2015 | 12 comments

This is a really interesting paper, which I haven't been able to read in full yet. Just a couple of quick reactions:

> We would like to be able to make a const pointer a guarantee that nothing that receives the pointer may write to the resulting memory. This allows const pointers to be passed across security-domain boundaries.

I've always wished that were the case too. It would also allow the compiler more optimization opportunities, because it could assume that a call to f(&x) will not change x unless a non-const pointer to x has escaped.

> Container describes behavior in a macro common in the Linux, BSD, and Windows kernels that, given a pointer to a structure member, returns a pointer to the enclosing structure [20]. This may or may not be permitted behavior according to the standard, due to the ambiguous definition of ‘object’.

I'm not sure what interpretation of the C standard would disallow this. For the initial member of a structure it seems particularly clear-cut that it is allowed:


"A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning."

This seems to state pretty unambiguously that you can convert a pointer to an initial member into a pointer to the enclosing struct. I can't imagine what interpretation of the standard would disallow this, or even disallow using offsetof() to convert a pointer to a non-first member into a pointer to the struct.

To get from a pointer to member to a pointer to an object, you have to subtract the offset of the member within the struct from the pointer you have.

The point at issue is that there is no portable way to write the 'offsetof' macro yourself, even though every compiler on the planet (AFAIK, YMMV) works fine with

  #define offsetof(st, m) ((size_t)(&((st *)0)->m))
See http://en.m.wikipedia.org/wiki/Offsetof#Implementation, http://stackoverflow.com/questions/6700114/portability-of-us...

Long story short: use what your compiler provides in stddef.h, and you'll be fine. If it doesn't provide offsetof, migrate to a modern ISO C compiler, or copy the definition used above, and test.

The C standard guarantees that a pointer to a struct and a pointer to its (transitively) initial member are interconvertible (through casts, either directly, via `void*`, or via `intptr_t`). It doesn't guarantee that pointer-to-struct plus some offset is interconvertible with a pointer to a non-initial member, for any value of the offset.

So you can easily use &struct_var->member to go from pointer-to-struct to pointer-to-member.

It sounds like you are claiming that there is no way to go back.

I worked on a 68K emulator that supported heap tagging, so you could:

- detect multiple frees of the same memory, even when pointers had been returned by subsequent allocation calls (e.g., alloc => A, free(A), alloc => A', free(A) would trap, where A and A' have the same program-observable values)

- trap reads and writes to freed memory, even if the memory had been subsequently re-allocated

- do strong bounds checking (can't read or write heap block B with an offset of a pointer to A, even if you computed a valid address in B)

- trap reads of uninitialized memory (you did mean to zero that buffer you allocated, right?), clear down to "that read of 32 bits included 8 bits that weren't initialized, did you mean that?"

It wasn't perfect (you could do pointer arithmetic and smuggling operations that foiled the tagging), but we found some nice bugs with it.

You can approach this with MMU games, but tagging all your values is more powerful.

As I understand it, the lowRISC project is proposing to do something like this: http://www.lowrisc.org/

Could you point me to some more info on this?

Internal project for a company I worked for about 15 years ago. Proprietary and dead.

I can say that a modern CPU ran the 68K emulator at an instruction ratio of well over 100 to 1, with a similar ratio in memory footprint, and that you can do quite interesting things with all those extra resources.

This sounds essentially the same as Valgrind (or AddressSanitizer, if you can recompile with compiler instrumentation instead), so you could always check those out.

They built a custom CPU on an FPGA that stays memory-safe while still supporting widely used C idioms that technically rely on undefined behavior. They still need "fat pointers", so this doesn't let them run existing executables, or anything close to them. It's neat that they were able to do this. It's painful that they had to.

The "container"/"offsetof" problem is strange. If you only have a pointer to a member of a struct, you don't really know the type of the containing struct. You don't even know whether there is a containing struct. Using offsetof to get back to the container implies an unchecked cast, which defeats type checking. Why is that used anywhere, let alone in the Linux kernel? (In decades of C/C++ programming, I never wanted to do that.)

This is why I really hope Rust can replace C. (As I write occasionally, "Rust guys, please don't screw it up".) Rust is complicated, but it's simpler than the semantics of C undefined behavior. That may be the answer to give C programmers who think Rust is too complicated. All those little issues in C, such as "can you dereference the element one past the end of the array" and "after dereferencing, what happens if you then test for a null pointer", require a good understanding of the machine model and of how an optimizing compiler works.

> why Linux uses offsetof

Essentially, it's downcasting with "multiple inheritance" (non-linear subtyping) and thin pointers: objects can be members of several intrusive linked lists, held by different parts of the kernel, and container_of is used to get the object from the list-node pointer once you know the type.

Pretty cool. If C was designed to run nicely on the PDP-11, it's slightly strange that CHERI could be a better machine for it to run on. Maybe in the future somebody will make a language for the CHERI machine.
