
Moving to a provenance-aware memory object model for C: proposal for C2x - matt_d
https://hal.inria.fr/hal-02089889/
======
zik
Can anyone summarise?

~~~
pm215
The "n2362+appendix" pdf is only 7 pages if you ignore the appendix which is
"changes to the actual C standard to implement the proposal", so it's not that
long to read, and it starts with a description of the basic idea. The short
summary is that it's trying to formalize an existing idea (which has been
discussed by the committee and used by compiler implementations but not really
nailed down in the standards text) about when the compiler can assume two
pointers don't alias. Intuitively, if you create a pointer by dubious
arithmetic means (e.g. by adding 1 to the address of a local variable) the
compiler shouldn't have to assume that that might alias with other local
variables, or it would generate terrible code out of paranoia. But you do want
common operations where a legitimate pointer is arithmetically manipulated to
work. The paper proposes a formalization that accommodates reasonable things
but rules some implausible/weird stuff out as UB, satisfying both those
desires.
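
To make the "legitimate pointer arithmetic" case concrete, here is a minimal C sketch (my own illustration, not code from the paper) of the kind of manipulation that, as I read it, the proposal wants to keep working: a pointer is cast to an integer, adjusted, and cast back, staying inside the same array. The function name `write_second_via_integer` is hypothetical, and the sketch assumes an implementation that provides `uintptr_t` and a flat address space.

```c
#include <stdint.h>

/* Hypothetical helper: writes `value` into a[1], reaching it through an
   integer round trip instead of through a[1] or a + 1 directly.
   Assumes uintptr_t exists and that integer arithmetic on the converted
   address matches byte offsets, as on mainstream flat-memory targets. */
int write_second_via_integer(int a[2], int value)
{
    uintptr_t u = (uintptr_t)&a[0]; /* pointer -> integer */
    u += sizeof(int);               /* plain integer arithmetic */
    int *q = (int *)u;              /* integer -> pointer, same array */
    *q = value;                     /* store lands in a[1] */
    return a[1];
}
```

Called with `int a[2] = {1, 2};` and a value of 5, this should leave `a[1] == 5` and `a[0]` untouched on mainstream compilers. Strictly speaking, the result of the integer-to-pointer cast is implementation-defined in ISO C.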

~~~
simias
It's interesting work but I really wonder if it's desirable. What real-world
problems is this supposed to solve exactly? Making the compiler extra-clever
about detecting aliasing seems like a good recipe for having broken code
appear to work correctly until it doesn't or getting strange behavior that's a
pain to debug. Current aliasing rules are already a significant pain point in
the C standard IMO, I personally don't want them to become even more
complicated.

The only concrete example given in the paper (having two contiguous objects
and referencing the 2nd one through a pointer on the first one) is frankly
contrived and I'm not sure how that can be considered reasonable or good code.
The only time I could see myself writing code like that is when dealing with
memory-mapped hardware registers at fixed addresses which would let me do
funky pointer arithmetic, but in this scenario the pointer is usually tagged
as volatile and sometimes accompanied by a memory barrier to make sure the
compiler does the right thing.

Note that if the programmers _really_ wanted the code given in the paper to
work correctly (i.e. changing y through an offset pointer from x) they could
do it by adding a memory barrier such as

    asm volatile("" : : : "memory");

after the *p = 11;. Is it elegant? Arguably not, but it seems like such a
niche use to me that it seems like a fair compromise. It's also not standard C
but if you're making such pointer gymnastics chances are you're already making
assumptions about your environment that go beyond the standard.

I much prefer Rust's approach of "mutable references never alias" and if the
programmer really needs to share a reference they have to manually do the work
by wrapping. Of course integrating that into C in a backward-compatible
fashion doesn't seem very practical.

~~~
fooker
>I much prefer Rust's approach of "mutable references never alias"

This rules out a vast majority of fundamental algorithms. Jumping through
hoops for getting simple logic working doesn't sound very desirable.

~~~
ekidd
>>I much prefer Rust's approach of "mutable references never alias"

>This rules out a vast majority of fundamental algorithms.

To be more precise, in Rust:

- '&mut T' (a mutable reference to T) can never alias.

- '*mut T' (a mutable pointer to T) can alias.

The advantage is that you can write most production code with a mix of '&T'
and '&mut T', which allows the compiler to assume no mutable aliases anywhere.
But if you're building a doubly-linked list, then you can explicitly choose
'*mut T'. (For other options, see
https://rust-unofficial.github.io/too-many-lists/.)

In practice, it depends a lot on the code you're writing. I've written tens of
thousands of lines of production Rust without needing shared mutability. But
if I were writing a traditional game or GUI toolkit, then I might miss shared
mutability a lot more.

~~~
hedora
The & semantics sound like Fortran’s semantics.

How does the Rust compiler know if two references passed in from calling code
can alias or not? (I.e., what if I get the type annotations wrong?).

In general, determining if two pointers can alias is equivalent to the halting
problem.

The default C way is “assume aliasing unless you can prove otherwise”. This
seems safer and like less work to me.
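
A small C sketch of what "assume aliasing unless you can prove otherwise" means in practice (my own illustration; the function names are made up). In the first function the compiler has to allow for `a` and `b` pointing at the same int, so it must reload `*b` after the first store. The second uses C99 `restrict`, which is the programmer promising there is no overlap, roughly the Fortran-style assumption.

```c
/* Adds *b to *a twice. Since a and b may alias, the second read of *b
   must observe the first store to *a. */
void add_twice(int *a, const int *b)
{
    *a += *b;
    *a += *b;
}

/* Same body, but restrict promises the caller never passes overlapping
   pointers, so the compiler may load *b once and reuse it. Calling this
   with a == b is undefined behaviour. */
void add_twice_restrict(int *restrict a, const int *restrict b)
{
    *a += *b;
    *a += *b;
}
```

With `x = 1` and `y = 1`, `add_twice(&x, &y)` leaves `x == 3`. With `z = 1`, `add_twice(&z, &z)` leaves `z == 4`, because the second addition sees the updated value. That difference is exactly what the compiler has to preserve when it cannot rule out aliasing.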

~~~
Ar-Curunir
The compiler ensures that you can only take one &mut reference at a time. You
can have as many immutable `&` references as you want, because you can't
change anything through those references (unless you do some unsafe trickery).

~~~
cesarb
> you can only take one &mut reference at a time. You can have as many
> immutable `&` references as you want

There's another small detail: these two cases are mutually exclusive. You can
either have one &mut reference, or as many & references as you want. This
means that code receiving a & reference can be sure the object won't change,
and that code receiving a &mut reference knows the object can be modified
safely since nothing else could be looking at it. (Assuming the object doesn't
contain a Cell, the rules are a bit different when a Cell is involved.)

------
growtofill
Wow, what a horribly designed website.

Here’s the direct link to the pdf:
https://hal.inria.fr/hal-02089889/document

~~~
nestorD
It's a HAL website, the French equivalent of arXiv. Most French research
institutes have one to publish openly.

https://en.wikipedia.org/wiki/Hyper_Articles_en_Ligne

~~~
msla
It's still a horribly-designed website.

------
layoutIfNeeded
Sorry for my ignorance, but my impression was that in the native spaces the
consensus is that Rust is the only reasonable choice for any new code. So I
don’t understand why people are still working on a C2x standard - shouldn’t
the native community work on a roadmap to migrate the existing legacy C/C++
codebases to Rust instead? Am I missing something here?

~~~
WoodenChair
> Sorry for my ignorance, but my impression was that in the native spaces the
> consensus is that Rust is the only reasonable choice for any new code.

Please tell that to the many millions of C/C++ programmers writing new code
every day, who are only vaguely familiar with Rust if at all. If by consensus
you mean "consensus amongst the vanguard of Hacker News" then yes. If you mean
"consensus" as in the "consensus of the software development industry" then
absolutely not.

~~~
adamnemecek
If you program but don’t post about it on HN, are you really programming?

~~~
aduitsis
Not really, no.

Also, Perl does not exist and is actually an urban legend from the '90s.

