
Pointers are Easy, Optimization is Complicated - pcr910303
https://blog.metaobject.com/2020/09/pointers-are-easy-optimization-is.html?m=1
======
jcranmer
While it's true that the difficulty of building a semantic model of pointers
is mostly related to permitting optimizations, I don't think the author
actually understands how many optimizations are actually disabled by a
simplistic pointers-are-integers model.

Let me give an example here. Consider these two functions:

    
    
      int a() {
        int x = 5;
        return x;
      }
    
      int b() {
        return 5;
      }
    

In a simplistic, pointers-are-integer model, these two functions are _not
equivalent_ , and it is illegal for any compiler to transform the first into
the second. This is because declaring the variable x means it must have some
associated storage, which can be accessed by some unknown integer name. Any
pointer which points to that integer value can then observe its value, and
deleting the store to that temporary value is a potentially observable change
to a legal program.

To permit this basic optimization, you need some sort of rule that allows you
to reason that a value whose address is never taken can never be legally
accessed through a pointer. To actually effect this rule in the semantics, you
now have to start tracking extra metadata about what pointers can and cannot
access... which is exactly the kind of model that's being criticized.

~~~
dnautics
Maybe there's a question here: if you really need the kind of performance
afforded by that level of fine-grained optimization, shouldn't you instead
consider emitting better code yourself, at least in hot loops? It seems less
dangerous to have the code do semantically exactly what you tell it to in the
90% case, because otherwise you can unwittingly introduce difficult-to-
reason-about code. A general-purpose programming language could well end up
deployed in mission-critical code where lives are at stake.

~~~
jcranmer
To get an idea of the kind of performance degradation involved, make every
variable and every pointer in your program volatile.
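A rough sketch of what that means: marking a variable volatile imposes on that one object the obligation a pointers-are-integers model would impose on every variable, namely that each written access must actually be performed.

```c
/* Sketch: volatile forces the compiler to emit both the store and the
 * load; it may not fold the function down to "return 5". */
int a_volatile(void) {
    volatile int x = 5;
    return x;
}

/* Without volatile, the store and load can be deleted entirely. */
int a_plain(void) {
    int x = 5;
    return x;
}
```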

Honestly, the actual pointer model of C is probably easier to understand than
an everything-can-randomly-change-anything pointers-are-integers model.
Undefined behavior gets a bit of a bad rap, arguably largely because of
gratuitous usage of it (signed integer overflow and strict aliasing being the
biggies). But undefined behavior is actually a contract where the programmer
promises not to do certain things. For pointers, the contract boils down to
the following:

* Do not violate strict aliasing (I'd call this an unnecessary rule, and it can be disabled in most compilers).

* Do not read memory you haven't initialized.

* Do not use memory after it's deallocated.

* Do not look outside the bounds of the object you are given.

* Do not look at memory you haven't been told about.

There is very little reason to ever need to violate these rules. In fact,
violating these rules tends to result in hard-to-reason-about semantics:
suddenly, people can read things they shouldn't be able to, or the program
can decide to do things like execute random code.
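To illustrate the first rule with a sketch: reading a float's bits through an int pointer breaks strict aliasing, while memcpy expresses the same reinterpretation within the contract (the expected bit pattern assumes IEEE 754 single-precision floats):

```c
#include <stdint.h>
#include <string.h>

/* UB: accesses a float object through an incompatible uint32_t lvalue,
 * i.e. a strict-aliasing violation. */
uint32_t bits_punned(float f) {
    return *(uint32_t *)&f;
}

/* Defined: memcpy copies the representation without an aliasing read. */
uint32_t bits_memcpy(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```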

~~~
petergeoghegan
I agree with your points about undefined behavior. I'm pleasantly surprised
that you didn't defend C's terrible strict aliasing rules.

Your interpretation seems uncharitable. I imagine that you've interpreted the
author's words literally, as somebody who has expert knowledge of compiler
internals. Could the author really have meant to propose an "everything-can-
randomly-change-anything pointers-are-integers" model?

~~~
sanxiyn
What is the author's model then? I agree everything-can-randomly-change-
anything interpretation is uncharitable, but then I am confused what the model
is. (As you pointed out, that's the literal interpretation.)

I think there is no model, and the author is just vaguely complaining they are
not satisfied with status quo. But then the tone is unbelievably arrogant and
deserves all the derision.

By the way, I am of the opinion that C's pointer model could be greatly
improved to be programmer-friendly, beyond getting rid of the strict aliasing
rule. For example, I think comparing two unrelated pointers (for sorting,
lock ordering, etc.) should be well defined and not UB. But claiming
"pointers are easy" is not the way to go.
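For reference, a sketch of what code does today: relational comparison of pointers into different objects is undefined in C, so sorting and lock-ordering code typically launders the addresses through uintptr_t first, which makes the result merely implementation-defined rather than undefined:

```c
#include <stdint.h>

/* UB under the current standard if a and b point into different objects:
 *     return a < b;
 * The usual dodge is to compare the integer representations instead. */
int addr_before(const void *a, const void *b) {
    return (uintptr_t)a < (uintptr_t)b;
}
```

A lock-ordering scheme would then acquire the lock at the lower address first, using addr_before to pick the order.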

~~~
petergeoghegan
Why would you assume that the author must have a formal model? It's a short
blog post.

Perhaps the author really is deserving of derision to some degree - who can
say? It should not be based on a straw man that _nobody_ could possibly
believe, though.

------
Diggsey
> I prefer the simple and obvious pointer model. Vastly.

That's great, but then you're not writing C anymore. When you write C (or
indeed, almost any language in use today) you are writing code against an
abstract machine, not against real hardware.

Pointers are integers on (most) real hardware. They are indisputably _not_
integers in the C abstract machine.
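A classic sketch of that gap: in the abstract machine, two pointers with the same integer value are not necessarily interchangeable, because the machine also tracks which object each pointer came from (its provenance). Comparing the integer values is a weaker question than comparing the pointers:

```c
#include <stdint.h>

/* Compare only the *integer values* of two pointers, ignoring provenance.
 * In the abstract machine this differs from p == q: a one-past-the-end
 * pointer to one object can share an address with the start of another,
 * yet only the integer comparison is always meaningful, and writing
 * through the wrong one is UB. */
int same_address(const void *p, const void *q) {
    return (uintptr_t)p == (uintptr_t)q;
}
```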

The reason for this abstract machine existing is not to enable optimizations,
it's to allow the code to be portable. If there was no abstract machine, you
would have to reason about the correctness of your code for each target
architecture independently. With an abstract machine, you can reason that your
code is correct on the abstract machine, and the compiler can do the hard work
of translating that to each target architecture in a way that preserves well-
defined behaviour.

Now, given that the abstract machine exists, you could ask why it is so
complicated, and the answer this time _is_ for optimizations, and also to
allow the translation to the target architecture to introduce as little
overhead as possible.

~~~
scottlamb
> Pointers are integers on (most) real hardware.

I'm curious. Can you elaborate on "most"? The closest thing I know of to an
exception is ARM's memory tagging extension stuff, [1] and I'm not sure if
that qualifies because it looks like the actual checking of the tag is done by
compiler-emitted assembly rather than automatically by the hardware. [2] I
suppose it at least violates the idea that casting a pointer to an integer
type doesn't change the value, and that you can still sort addresses via a
simple <.

[1] https://security.googleblog.com/2019/08/adopting-arm-memory-tagging-extension.html

[2] https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html

~~~
Diggsey
There's CHERI: https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/

------
Taniwha
Really, what is being argued here is that C pointer optimisation is
difficult, and because C is designed to do stuff like coding the insides of
the 'new' primitive (i.e. malloc), you can't make the sorts of assumptions
about aliasing that you can in other languages.

I'd argue that if you don't understand that, you probably shouldn't be coding
in C (I'll let others argue about C++)

~~~
saagarjha
Coding a malloc in C is actually fairly questionable from a standards
perspective. You really have to consider the entire address space to be one
big object for this to work.
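A sketch of why: an allocator carves objects out of raw storage, which only really fits the C object model if you treat the whole arena as one big array of bytes (the names here are illustrative, not any real malloc):

```c
#include <stddef.h>

/* Minimal bump-allocator sketch. Every returned pointer derives from one
 * big char array, which is the "entire address space is one object"
 * reading; a real malloc hands out pieces of memory the C object model
 * never saw created, which is where the standards trouble starts. */
static unsigned char arena[1024];
static size_t arena_used;

void *bump_alloc(size_t n) {
    /* round the cursor up to the strictest fundamental alignment */
    size_t aligned = (arena_used + _Alignof(max_align_t) - 1)
                     & ~(_Alignof(max_align_t) - 1);
    if (n > sizeof arena - aligned)
        return NULL;                 /* out of space */
    arena_used = aligned + n;
    return arena + aligned;
}
```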

------
sanxiyn
If pointers are integers, all pointers escape. This is not a matter of "nice
to have" optimizations.

(The usual model used by compilers is that pointers are not integers, but a
pointer _becomes_ an integer when it is cast to one. This prevents the
majority of pointers, those never cast to integers, from escaping, which is
important.)
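A sketch of the distinction (the sink function is a stand-in for "the integer goes somewhere the compiler can't see"; whether a given optimizer actually keeps the local in a register is an assumption about typical compilers, not a guarantee):

```c
#include <stdint.h>

static uintptr_t last_seen;                 /* stand-in destination */
static void sink(uintptr_t v) { last_seen = v; }

int no_escape(void) {
    int local = 5;
    int *p = &local;   /* address taken, but it never leaves the function, */
    return *p;         /* so the compiler may keep local in a register     */
}

int escapes(void) {
    int local = 5;
    sink((uintptr_t)&local);   /* cast to integer: local now escapes, and */
    return local;              /* anything seeing the integer might alias it */
}
```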

------
drivebycomment
If you don't like the current C standard and you want to change the standard,
feel free to propose a change. The standard is there for a reason. If everyone
adds their own extra constraints to the language, the standard falls apart and
becomes useless and meaningless.

I will not be holding my breath waiting for the committee to accept this
though. The chance of this being accepted seems really low, as it has serious
consequences for the meaning of the large amount of existing C code.

~~~
sanxiyn
Eh, no. While I agree it won't be accepted, this is a more restrictive
semantics, so it doesn't cause any compatibility problems.

------
a_t48
> I find this fascinating: a "nice to have" optimization is so obviously more
> important than a simple and obvious pointer model that it doesn't even need
> to be explained as a possible tradeoff, never mind justified as to why the
> tradeoff is resolved in favor of the nice-to-have optimization.

Hold up - there's also a tradeoff in assumptions you can make about well
behaving code. It's _also_ an optimization in reading and internalizing the
code.

------
Sulik
To me, the philosophy of C (and even C++ before things became nuts) is that
you should be able to reasonably guess the assembly code resulting from the C
code, the idea being that you can write assembly code with much less typing.
These days, things seem to be moving in a more dogmatic direction with the
underlying assumption that the vast majority of programmers are bad
programmers.

~~~
Ar-Curunir
That’s not really true today, because the assembly generated depends on the
architecture, and there’s sufficient diversity there.

The only thing that you can predict about compiled C code is the instructions
in the C abstract machine, because that’s what C targets.

~~~
Sulik
You can predict a pseudo-assembly output for a mental model of the
architecture(s) you're targeting. It doesn't have to be an exact match,
register allocation and all.

------
tjalfi
If you want PCC, you know where you can find it.

