
Pointers Are Complicated, Or: What's in a Byte? (2018) - pcr910303
https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html
======
raphlinus
Also potentially relevant: Ralf, or I should say, Dr. Jung, recently completed
a PhD, which I'm sure will have a lot more fascinating material for those
interested in this paper. I'm hoping to find time myself to read it, but I
seem to spend too much time on sites like Hacker News...

[https://www.ralfj.de/blog/2020/09/03/phd.html](https://www.ralfj.de/blog/2020/09/03/phd.html)

------
0x09
The article doesn't touch on this, but in C pointer types are also the only
kind that can be invalidated without any apparent change to their value:

    
    
      void *x = malloc(...);
      free(x);
      if(x); // undefined behavior
    

Note that this isn't about dereferencing x after free, which is understandably
not valid. Rather the standard specifies that _any_ use of the pointer's value
itself is undefined after being used as an argument to free, even though
syntactically free could not have altered that.

This special behavior is also specifically applied to FILE* pointers after
fclose() has been called on them.

If there is some historical reason or architecture that could explain this part
of the specification, I would be interested to hear the rationale; this has
been present in mostly the same wording since C89.

~~~
jcranmer
My gut instinct is as follows:

    
    
      void *x = malloc(...);
      void *y = malloc(...);
      assert(x != y); // standard guarantees this [1]
    

Yet it's fairly reasonable that:

    
    
      void *x = malloc(...);
      free(x);
      void *y = malloc(...); // malloc reused x's allocation here.
    

So, in effect, guaranteeing that the results of two mallocs can never alias
each other, while allowing the implementation to reuse freed memory, requires
semantically adjusting the value of a pointer to a unique, unaddressable
value.

[1] I think, but I'm not sure which versions of C/C++ added this guarantee
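
A minimal sketch of the distinction drawn above, not a statement of what any particular allocator does. The function names are hypothetical, and it assumes `uintptr_t` is available (it is optional in the standard, though nearly universal) and that the requested size is nonzero. The first function checks the guarantee for two live allocations. The second saves the address as an integer before calling `free`, so the later comparison never reads the freed pointer itself; whether the address is actually reused is up to the implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Two allocations that are both still live must compare unequal
 * (assuming both calls succeed). */
bool live_allocations_are_distinct(size_t size) {
    void *x = malloc(size);
    void *y = malloc(size);
    bool distinct = (x != NULL) && (y != NULL) && (x != y);
    free(x);
    free(y);
    return distinct;
}

/* Record x's address as an integer while x is still valid, free it,
 * then allocate again. Only integers are compared afterwards, so the
 * freed pointer's value is never used.
 * Returns true if the allocator happened to hand back the same address. */
bool allocator_reused_address(size_t size) {
    void *x = malloc(size);
    if (x == NULL) return false;
    uintptr_t old_address = (uintptr_t)x; /* taken before free */
    free(x);                              /* x must not be read from here on */
    void *y = malloc(size);
    if (y == NULL) return false;
    bool reused = ((uintptr_t)y == old_address);
    free(y);
    return reused;
}
```

Calling `allocator_reused_address(16)` may return either result depending on the allocator, which is why the sketch makes no claim about it.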

~~~
firethief
This would be kind of hilarious if true.

It seems like they could have just said: malloc won't give you a pointer that
overlaps with the storage of any _live_ malloc'd object. Such a malloc is
implementable without too much trouble. But instead, they gave a stronger
guarantee--that all malloc'd pointers would be "unique". It would be
unboundedly burdensome on the implementation to meet this property, so what do
they do? Update the standard to offer the achievable guarantee? No! They add a
new rule, ensuring that it's impossible to _observe_ that the stronger
guarantee is not met without doing something "illegal". Instead of getting
their act together, they have elected to punish whistleblowers.

~~~
firethief
On second thought, the choice they've made isn't as sadistic as it sounds. I
was thinking of the standard as a contract between the language implementor
and the programmer, but actually it is a contract between the language
implementor, the programmer, and arbitrarily many other programmers. The
stance they have chosen mandates a social convention, that the names of the
dead will never be spoken. If everyone builds their APIs with this covenant in
mind, it makes it possible to use pointers as unique ids (for whatever that's
worth). C has never had much in the way of widely-followed social conventions,
so practically speaking, the only way to ensure everyone knows they can depend
on other programmers behaving this way is for the compiler to flagellate
anyone who steps out of line.

------
stephc_int13
I read the article and I am not convinced by the author's arguments.

He is talking about edge cases. I've been using C and C++ for over 20 years
now, for both low level and high level stuff, and I never needed to know about
those edge cases.

And I think you don't want to know about them, you don't need to know about
them to write good code.

Nitpicking about edge-case tradeoffs made by compiler designers and language
standards decades ago is like attacking under the belt, in my opinion.

~~~
SubjectToChange
Can you clarify what you disagree with in the article? The author wasn't
saying programmers need to understand all the complexities of pointers to
write good code, nor was he saying anything about the quality of those
languages.

Also, it is incredibly defensive to accuse the author of "attacking under
the belt" when his observation wasn't aimed at any specific language.

~~~
stephc_int13
In my opinion, this is a way to try and push the adoption of Rust by shedding
an unpleasant light on the current contender: C/C++

Another way to say it is that this is propaganda.

~~~
steveklabnik
That doesn’t make sense, given:

> Both parts of this statement are false, at least in languages with unsafe
> features like Rust or C:

The two languages share these features, which is the whole reason this is
being written in the first place.

~~~
stephc_int13
Unsafe as a language feature is something that was invented by the Rust design
team. And I don't think there is much value to this concept.

This is like calling code "ugly" and saying that's a feature.

Pointers are not unsafe.

------
saagarjha
In addition to all the things mentioned above, there's also another kind of
byte: structure padding. Nobody really knows how to deal with it, and it has
extremely strange rules–it's kind of uninitialized in some cases, but you can
give it a value sometimes…and sometimes you can't, or not in any way that
sticks. It's truly strange stuff, and its exact nature is still being worked
out.

~~~
nwmcsween
Pack your structs to not have padding? The probable reason it sometimes works
is purely how the compiler decides to do the copy. Anything that relies on
implicit padding is horribly broken software.

~~~
saagarjha
I think you might be misunderstanding my comment. I’m not complaining about
struct padding causing issues in my code; I’m saying the standard is
wishy-washy about it. This can be problematic when you are doing things like
memcpying a structure, or trying to zero it out before sending it somewhere
else. (And, FWIW, there is no standard way to disable struct padding, and
doing so will likely generate slower code anyways.)

~~~
nwmcsween
How is the standard not explicit? Copying a struct with padding doesn't need to
copy the padding at all, or it can if that's faster; memcpy is an intrinsic, so
the compiler can copy whatever it deems fit with respect to padding.

~~~
saagarjha
Because there are certain cases in C11 where padding can be legally observed
because it is defined to be set to zero–brace initialization, for example. It
is unclear what happens in this case when you do anything with the memory
other than read it (for example, storing to a structure member, based on the
current reading of the standard, may make padding indeterminate again–plus, is
"memcpy" a "read"?). See here for some of the questions raised:
[http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1793.pdf](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1793.pdf)
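
One way to sidestep these questions when sending a structure somewhere else is to never touch the padding bytes at all and copy only the named members. The sketch below is an illustration under assumptions, not a recommendation from the standard: `struct record`, `record_pack`, `record_unpack`, and `RECORD_WIRE_SIZE` are hypothetical names, and the packed bytes use the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record; most ABIs insert padding after `tag`. */
struct record {
    uint8_t  tag;
    uint32_t value;
};

/* Size of the packed form: just the members, no padding. */
enum { RECORD_WIRE_SIZE = sizeof(uint8_t) + sizeof(uint32_t) };

/* Copy only the named members into `out`, so no padding byte is read. */
void record_pack(const struct record *r, unsigned char out[RECORD_WIRE_SIZE]) {
    memcpy(out, &r->tag, sizeof r->tag);
    memcpy(out + sizeof r->tag, &r->value, sizeof r->value);
}

/* Rebuild the members from the packed bytes; padding in `r` is left alone. */
void record_unpack(const unsigned char in[RECORD_WIRE_SIZE], struct record *r) {
    memcpy(&r->tag, in, sizeof r->tag);
    memcpy(&r->value, in + sizeof r->tag, sizeof r->value);
}
```

The cost is one copy per member instead of a single `memcpy` of the whole structure, in exchange for not depending on what the padding contains.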

------
teddyh
It always bothered me that C, and similar languages which are termed “low
level”, aren’t actually compiling to anything like the _actual_ low level
hardware anymore. The match used to be quite close in the 1980s, but nowadays
the “machine code” which is “run” by a CPU is actually a kind of virtual byte
code language with an interpreter implemented in microcode inside the CPU. But
this byte code has flaws: For instance, this virtual machine byte code
language has little or no sense of memory caches (of any kind), out-of-order
execution, branch prediction, etc. Both the compiler and the CPU know (or
could know) about these things, but the compiler has no way to communicate
this to the CPU other than using this ’80s style byte code which does _not_
have these concepts. It’s like talking about advanced mathematics using
nothing but words in “The Cat in the Hat” – a rather narrow communication
pipe.

I’d always imagined that as parallelism rose, a new model of virtual machine
would rise with it, with its own “assembly” (i.e. low level language closely
tied to the machine), which would in turn be the target of a compiler of a new
parallel-by-default high level language. Alas, this has not happened.

~~~
x87678r
> out-of-order execution

This broke me. I loved C++, and I love pointers, shared memory, synchronization,
etc., but optimisations finally creeped me out enough to switch to Java.

~~~
jimmaswell
I don't get the motivation here. It's all invisible to you; how is it weird
enough to make you not want to use it?

------
dang
Discussed at the time:
[https://news.ycombinator.com/item?id=17604402](https://news.ycombinator.com/item?id=17604402)

