
The float/int aliasing via pointer is the strawman version of this which is used to sell the undefinedness and the optimizations as a good thing.
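For reference, the float/int case usually looks something like the sketch below. The name store_both is made up for illustration. Since int and float are not compatible types, a compiler applying strict aliasing may assume the two pointers never refer to the same object and fold the final load to the constant 1.

```c
#include <assert.h>

/* store_both is a hypothetical name. Under strict aliasing the compiler may
   assume pi and pf point to different objects, so the store through pf
   cannot change *pi and the final load may be folded to the constant 1. */
int store_both(int *pi, float *pf)
{
    assert((void *)pi != (void *)pf);  /* callers must not alias the two */
    *pi = 1;
    *pf = 2.0f;
    return *pi;
}
```

Called with two distinct objects, as in store_both(&i, &f), the behavior is defined and the result is 1 at any optimization level.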

In fact, int/int aliasing is also undefined behavior:

   {
     struct foo { int x; } sfoo = { 42; };
     struct bar { int x; } *psbar = (struct bar *) &sfoo;

     return psbar->x; // UB, even though int accessed.
   }
The reason this is undefined is that psbar->x is actually (* psbar).x. There is an lvalue here which is (* psbar) and that is of type struct bar; it doesn't match the type of the object, which is struct foo.
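A minimal sketch of two ways to read that int without forming an lvalue of type struct bar on a struct foo object. The function names are invented for illustration. The first relies on the rule that a pointer to a structure, suitably converted, points to its initial member; the second copies the bytes into an object whose declared type really is struct bar.

```c
#include <assert.h>
#include <string.h>

struct foo { int x; };
struct bar { int x; };

static_assert(sizeof(struct foo) == sizeof(struct bar),
              "the byte copy below assumes equal sizes");

/* The lvalue *(int *)pf has type int and designates an int object. */
int read_x_as_int(struct foo *pf)
{
    return *(int *)pf;
}

/* b is declared as struct bar, so b.x is an ordinary access. */
int read_x_by_copy(const struct foo *pf)
{
    struct bar b;
    memcpy(&b, pf, sizeof b);
    return b.x;
}
```

Neither version gives the compiler a struct bar lvalue that designates a struct foo object, which is the part the effective type rule objects to.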

Maybe it's time for an alternative to ISO C: a safe C with more behaviors pinned down.

It should be that when an lvalue is formed by a chained expression like a->b[2].c.d the type of the access should be considered to be the final type which designates the data object, without regard for the intermediate types in the chain. The object should be deemed to be accessed via that type. So if the .d member in a->b[2].c.d has type int, and the object being designated has that type, everything is good.




> Maybe it's time for an alternative to ISO C: a safe C with more behaviors pinned down.

There was a proposal for such an alternative [1], which was later abandoned [2] because of the difficulty of settling on a single set of limitations that everybody would agree on.

[1] http://blog.regehr.org/archives/1180

[2] http://blog.regehr.org/archives/1287


Why would you expect the above to work? It is a pretty clear violation of the type rules on any sane language.

For what it's worth, I would love to be able to wrap logically different kinds of floats and ints in distinct structures so that I can benefit more from TBAA, but I believe that GCC, for many reasons, internally uses a structural type system to track TBAA.


> It is a pretty clear violation of the type rules on any sane language.

Is it? What is "sane"?

  This is the TXR Lisp interactive listener of TXR 141.
  Use the :quit command or type Ctrl-D on empty line to exit.
  1> (defstruct foo nil x)
  #<struct-type foo>
  2> (defstruct bar nil x)
  #<struct-type bar>
  3> (defun access-x (obj) obj.x)
  access-x
  4> (access-x (new foo x 42))
  42
  5> (access-x (new bar x 42))
  42
This is so insane that it will work even if x is at different offsets in foo and bar.

We can whip up a similar thing statically with, say, C++ templates.
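C itself can do a rough static version of this with a macro instead of a template. This is only a sketch; ACCESS_X and the two struct layouts are invented for illustration. The macro is expanded separately for each argument type, so x can sit at different offsets.

```c
struct foo { int x; };
struct bar { double padding; int x; };  /* x is not the first member here */

/* Expands textually, so it works for any type that has a member named x. */
#define ACCESS_X(obj) ((obj).x)

int foo_x(struct foo f) { return ACCESS_X(f); }
int bar_x(struct bar b) { return ACCESS_X(b); }
```

No aliasing is involved: each expansion accesses the object through its own declared type.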

Note that C does explicitly allow this:

   {
     struct foo { int x; char other; };
     struct bar { int x; double other; };
     union combined { struct foo f; struct bar b; } c;

     c.f.x = 42;
     return c.b.x;
   }
If structs which have some common sequence of leading members (same names and types) are combined via a union, then those common members can be accessed through any member of that union.
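The usual practical use of this guarantee is a tagged union. The sketch below uses invented names (shape_kind, square, rect, shape_area) and assumes both structs begin with the same int tag.

```c
enum shape_kind { SHAPE_SQUARE, SHAPE_RECT };

struct square { int kind; int side; };
struct rect   { int kind; int width; int height; };

/* Both structs start with `int kind`, part of their common initial sequence. */
union shape { struct square sq; struct rect r; };

int shape_area(const union shape *s)
{
    /* Reading sq.kind is permitted even when the union currently holds a
       struct rect, because the access goes through the union and kind is
       in the common initial sequence. */
    switch (s->sq.kind) {
    case SHAPE_SQUARE: return s->sq.side * s->sq.side;
    case SHAPE_RECT:   return s->r.width * s->r.height;
    default:           return -1;  /* unknown tag */
    }
}
```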


> This is so insane that it will work even if x is at different offsets in foo and bar.

You're accessing foo.x or bar.x. You're just doing it in a generic way. That's fine and should work, in any language.

> If structs which have some common sequence of leading members (same names and types) are combined via a union, then those common members can be accessed through any member of that union.

Unions have defined semantics in all cases, no?

    union combined { float f; int i; } c;
    c.f = 1.0f;
    return c.i;
will return some (possibly implementation-defined) value rather than being undefined behaviour, because that's the whole point of a union - access to different fields of a union shouldn't be an aliasing violation.
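Whatever one concludes about the standard's wording, the two common ways to get at the bits of a float look roughly like this. The function names are invented, and uint32_t is used instead of int so the sizes can be checked at compile time.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "assumes a 32-bit float");

/* Write one union member, read the other. */
uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}

/* Copy the object representation byte by byte. */
uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

On an implementation where float is IEEE 754 binary32, both should give 0x3f800000 for 1.0f; the exact value depends on the representation, which is the implementation-defined part.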


Unions have well-defined semantics in that special case, spelled out in ISO C.

I have a copy of C99 handy, where it is paragraph 5 in 6.5.2.3 (Structure and union members):

"One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the complete type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members."

The part about "anywhere that a declaration of the complete type of the union is visible" isn't superfluous verbiage: it means that the aliasing doesn't have to go through the union; but it has to be in a region of code where the union is declared. So for instance, it is no longer well-defined if you do this:

   // external fun has no idea about the union
   external_fun(&c.f);

   // external_fun is this:

   void external_fun(struct foo *pf)
   {
     struct bar *pb = (struct bar *) pf;
     // access through pb
   }
Here, c.f is a member of a union and has a "common initial part" with c.b of the same union object. But this is not known in external_fun, which has no declaration of the union, and so the translation of external_fun doesn't have to honor this aliasing if there is no completely declared union type in scope which aliases struct foo and struct bar.
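By contrast, a version that stays on safe ground under either reading of the rule passes the union itself, so the callee both sees the complete union type and performs the access through it. A sketch, with invented function names:

```c
struct foo { int x; char other; };
struct bar { int x; double other; };
union combined { struct foo f; struct bar b; };

/* The callee receives the union, so the aliasing between pc->f.x and
   pc->b.x is spelled out in the access expression itself. */
int read_x_as_bar(const union combined *pc)
{
    return pc->b.x;
}

int write_f_read_b(void)
{
    union combined c;
    c.f.x = 42;                 /* the union currently contains a struct foo */
    return read_x_as_bar(&c);   /* x is in the common initial sequence */
}
```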

(See, C goes out of its way to make it clear that aliasing between different struct types is bad: even when an exception is made, there is weasel wording to carefully contain it!)


The 'visible union' rule is generally understood as meaning that the union type must be part of the access expression. Any other interpretation that only requires the union to be in scope would pretty much render TBAA moot for any non trivial program.

And yes, this rule is part of the reason the current TBAA rules are such a mess. There is an outstanding C issue to clarify the wording here but the C committee is severely under capacity.

There are proposed C++ wordings that clarify this rule, in addition to generally clarifying lifetimes and TBAA (while still being far from settling the issue, mind you). These wordings also introduce std::launder(), a function that blesses some of the type punning tricks by explicitly informing the compiler about what's going on.


> Any other interpretation that only requires the union to be in scope would pretty much render TBAA moot for any non trivial program.

An interpretation of the 'visible union' rule as being in scope is the only interpretation which doesn't render that text completely unnecessary.

Of course if the access is taking place through the union, the union must be completely declared! When would you be able to access a union member as u.x, or pu->x, when the type of u or *pu is incomplete?


is this about strict or static typing, which lisp doesn't use?


It is and isn't. There is only one access-x function, and it receives obj, which is de facto a pointer into the heap (a boxed object). So it's a case of aliasing: "access slot x in this thing, regardless of its specific type". It is analogous to what the C code is doing when it does type punning. It is tangentially relevant because C implementations of dynamic languages sometimes bend the rules to skirt with aliasing.


Because C's type rules are often thought of as syntactic sugar for pointer arithmetic, not sacrosanct definitions of what is or is not an X or a Y.


> Why would you expect the above to work?

Because

1. What it is trying to do is perfectly clear.

2. What it is trying to do is not only perfectly clear, but has a completely machine independent interpretation.

3. "Work" is better than "not work", all round, for everyone involved.


1. It is so perfectly clear that no two C programmers can agree to a common set of rules.

2. A machine independent interpretation would prevent any memory safety sanitizer from doing its job, which seems to me much more useful than allowing such shenanigans.

3. The best resolution for everyone involved is for the program to abort as soon as it detects a rule violation. "Working" for some ill-defined definition of working is not useful to anyone.


No two C programmers can possibly disagree about what that code should do if it were to be defined.

There are many ways of adding requirements to make it defined (and those ways will differ in what else they make defined).

> The best resolution for everyone involved is for the program to abort as soon as it detects a rule violation.

The pointer cast in the code makes it clear that the aliasing between the identically structured struct is intentional. (If we remove the cast, we have a diagnosable constraint violation: no need to whip out the memory sanitizers).

At least we seem to agree as far as that optimizing based on the assumption that aliasing does not take place is one of the worst things that could happen. The rule violation is not diagnosed, and the obvious meaning doesn't occur either, but rather some weird result. For instance, garbage is accessed because the underlying struct is not yet initialized (on grounds that it's not necessary to initialize something before an improperly typed access to it).


"No two C programmers can possibly disagree about what that code should do if it were to be defined."

The whole point is that there is no agreement on how they should be defined.

"At least we seem to agree as far as that optimizing based on the assumption that aliasing does not take place is one of the worst things that could happen"

I do want the compiler to optimize assuming no aliasing but first I want: a) the rules to be clarified beyond the current mess; b) have a strict compilation mode (possibly on by default) that will enforce the rules so that mistakes can be caught. Such mode would necessarily be very slow as the C language is very unfriendly to such an analysis (in the worst case you need a shadow type map for every memory location), but given the nature of aliasing violations, even a debug only mode should catch the large majority of mistakes.


> Why would you expect the above to work? It is a pretty clear violation of the type rules on any sane language.

It doesn't violate structural typing rules, so I disagree.


You are actually right that it will work on a structurally typed language. Have an upvote.


By the way, in the original C language by Dennis R. struct member names were in a kind of global namespace of all such names. So for instance if we had a

  struct stat { ... int st_ino; ... }

  struct dirent { ... int d_ino; ... } d;
It was possible to access:

  d.st_ino;
The compiler would just use the st_ino offset from struct stat, and generate an access into the struct dirent object d.

This is the main reason members have funny prefixes in Unix structures. Once upon a time they had to, to prevent clashes.


Indeed, it depends how one should interpret struct declarations. If you consider all structs structurally typed in C, then that should pass. If you only consider anonymous structs to be structurally typed, and named structs are the equivalent of newtype in Haskell (which is how C currently behaves I think), then the sample as written should fail, though using an anonymous struct for the alias would pass.

I think the latter semantics makes for a better language because you can exploit the type checker to greater effect, but the former is certainly not unequivocally wrong.


Structs with the same structure being equivalent for the correctness of access, while still denoting incompatible types, is useful.

Let's face it, if we have this:

   void some_library_api(foo *object, int arg);
we want a diagnostic if we do

   bar b;
   some_library_api(&b, 42);
otherwise we are almost back to C with no prototypes.

We probably want that diagnostic even if it happens that both struct foo and struct bar are in scope and look identical.

Maybe they are accidentally identical. The program then compiles. A new version of the API comes out and things break. Needless tensions arise between the users and the API provider.


I think that at least for standard C and certainly for C++, the type system is meant to be nominal. Unnamed structures are better understood as having a unique but unnameable type. Anonymous structures are just weird and probably better understood as being just a syntactic aggregation without any type level connotation (C++ doesn't have anonymous structures; it does have anonymous unions though).


I agree it's mostly nominal, but anonymous structs using structural typing semantics are incredibly useful. Consider for instance this general function pointer macro [1], which you can use with any number of arguments like so:

    fn(int, char, double) f = &functionMatchingSignature;
and which you can invoke with just:

    double d = f.call(arg0, arg1);
and it type checks correctly and ensures the correct argument types are applied. Way better than the typical double(*f)(int, char). You can go even further and implement full closures in C11, curried and non-curried, and it's mostly all type safe using anonymous structs (see the rest of the repo for the work in progress).

[1] https://github.com/naasking/cclosures/blob/typed-clo/closure...
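The linked macro itself isn't reproduced in the thread. As a rough idea of the technique, here is a much simplified sketch: FN is a hypothetical macro that takes the return type first (unlike the fn(...) usage quoted above) and wraps the function pointer in an unnamed struct with a single call member.

```c
/* FN is a hypothetical, simplified macro: return type first, then the
   parameter types. It is not the macro from the linked repository. */
#define FN(ret, ...) struct { ret (*call)(__VA_ARGS__); }

double scale(int n, char unit)
{
    return n * (unit == 'k' ? 1000.0 : 1.0);
}

double call_wrapped(int n, char unit)
{
    FN(double, int, char) f = { scale };  /* reads as <type> <name> */
    return f.call(n, unit);
}
```

The compiler checks the initializer and the arguments of f.call against the wrapped pointer type, so a call with the wrong signature is still diagnosed.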


I appreciate any form of clever preprocessor abuse, but I fail to see how the above, by itself, is an improvement to the fn pointer declaration syntax and why you need the unnamed (not anonymous btw) struct. Couldn't you make the macro expand to a function pointer instead of one wrapped in a struct? Is it because you want to manipulate all such function pointers under some sort of static polymorphism and you require a uniformly named field?


Every other value declaration in C is <type> <name>, but function pointer values are different because the name to which it's bound is right in the middle of the type. I understand the historical reasons, but it's unnecessarily obtuse, and this fixes that. Declarations of function pointer values are now just like any other value.

The macro can't expand to a raw function pointer type without including the name to which it's bound in the macro arguments, once again introducing this obtuseness. Even worse actually, because the bound name is now indistinguishable from the types, where the ordinary syntax, while noisy, clearly distinguishes names from types.

This style also matches the closure declaration style, ie. clo(T0, T1, ...), thus providing a seamless transition from function pointer values to actual function values which can carry environments.


Oh, right, now I get it. Another option is for the macro to define a separate function type typedef (properly mangled with the line number) for the function type and then use it to define a pointer to it.
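A sketch of what that could look like. DECLARE_FN and the FN_CAT helpers are hypothetical names; the macro emits a typedef whose name includes the current line number and then declares a pointer to that function type.

```c
/* Hypothetical helpers: paste the current line number into a typedef name. */
#define FN_CAT2(a, b) a##b
#define FN_CAT(a, b) FN_CAT2(a, b)
#define DECLARE_FN(var, ret, ...) \
    typedef ret FN_CAT(fn_type_, __LINE__)(__VA_ARGS__); \
    FN_CAT(fn_type_, __LINE__) *var

int add(int a, int b) { return a + b; }

int apply_add(int a, int b)
{
    /* Expands to a typedef of int(int, int) followed by `... *op = add;` */
    DECLARE_FN(op, int, int, int) = add;
    return op(a, b);
}
```

Note that the bound name has to be passed to the macro, and the expansion is two declarations, so it only works where a declaration can appear as a statement of its own.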


> Another option is for the macro to define a separate function type typedef (properly mangled with the line number) for the function type and then use it to define a pointer to it.

Yes, but this requires forward declarations of the function type, which seems unnecessary. I can now just write:

    void sort(int* vals, int vcnt, fn(int, int, int) compare);
    int* filter(int* vals, int vcnt, fn(int, bool) clause, int* rcnt);
without any forward declarations, and it's still immediately obvious what's going on. This approach makes C semantically a little more like a functional language without sacrificing a single thing people love about C (I hope).


One would expect it to work because the memory layout is the same for the two structures.


First, this is not undefined, so you are going to need to find another example :)

6.7.2.1/13

"Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning."

(This is also known as the first member rule)
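The first member rule is what makes the familiar "base struct" idiom work. A small sketch, with invented type and function names:

```c
#include <assert.h>
#include <stddef.h>

struct base    { int id; };
struct derived { struct base base; int extra; };

static_assert(offsetof(struct derived, base) == 0,
              "the initial member sits at offset zero");

/* Works on any object whose initial member is a struct base. */
int base_id(const struct base *b)
{
    return b->id;
}

int derived_id(const struct derived *d)
{
    /* Suitably converted, a pointer to d points to its initial member. */
    return base_id((const struct base *)d);
}
```

Here the object reached through the converted pointer really is a struct base, so the access matches its effective type.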

6.5/7 then says: "An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type."

These are both types compatible with the effective type of the object.

IE your first pointer is really also an int*, as is your second pointer. Your dereference attempt, which is an access to an int, through something with an effective type of pointer to an int, is a valid thing to do.

(Note: I implemented a large part of GCC's current pointer aliasing analysis and rules, though the underlying TBAA set construction predates me by quite a while :P)

Now, if you added a field and tried this with the second member, of a different type than the first, you'd maybe get a different answer.

However, pay close attention to:

an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or

GCC takes this very seriously. So if you construct types that contain both compatible types, and use that, gcc will consider it okay, no matter how weird what you are doing is.

(Note also i do not claim either set of rules produces sane results, and in fact, can easily construct examples where the standard says nonsensical things, or rules seem to change depending on whether you have one source file or two or ...)

In general, however, your complaint is basically: "C is not a pure structural type system, it's a name based one for the most part". This is true, and it is meant to be that way.

There are other programming languages with structural type systems (OCaml is a good example).


> First, this is not undefined, so you are going to need to find another example :)

This example will do just fine, in fact. It is arguably not undefined, as you say. It is also silently miscompiled by both Clang and GCC.

https://godbolt.org/g/7Gb9S3

And there you have one of the problems with type-based aliasing optimizations: the people who write compilers have not read the standard and just make it up as they are going along. They have been acting like the six-year-old who pretends to be reading rules from the back of the box in a game of Monopoly.


You should file a bug report. This should in fact, work, and did, in fact, work, when i stopped working on it.

If you split the type definitions across different translation units, it will stop working (because it can't know they are compatible), but this is one of the weird edge cases.

"And there you have one of the problems with type-based aliasing optimizations: the people who write compilers have not read the standard and just make it up as they are going along. They have been acting like the six-year-old who pretends to be reading rules from the back of the box in a game of Monopoly."

This is, well, bullshit. The situations get incredibly complex very quickly, and it's completely unclear in a lot of cases what the standard meant to happen.

You should not assume bad faith without a good reason. The vast majority of people implementing this stuff either are committee members, or work closely with them, so saying "they haven't read the standard" just makes you look petty, because, in a lot of cases, they helped write the standard.

At the same time, while implementing it, i think we filed something like 15 DR's against the standard, some of which are still unresolved because the committee didn't know what to do and punted on it due to lack of consensus. So if you want to say who makes it up as they go along, i think you may be pointing fingers in the wrong direction:

Take, for example, DR 236 which was filed in 2000, and was not resolved until 2006, and they basically punted (note how the answer does not answer the questions): http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_236.htm

(note also that most DR's from the same time period were resolved in 1 year or less)

GCC had at least 3 committee members who were consulted on most of this stuff, and agreed with the current set of interpretations.

In short, if you think it's so easy to do, feel free to fix it. I think you will find yourself quickly in a world of trying to figure out what anyone meant to happen.

Contrary to what you seem to think, there are rarely objectively right answers, just interpretations that you can get consensus or no consensus for.

(Not sure what one would expect from a programming language standard built by something akin to the UN)


>You should file a bug report. This should in fact, work, and did, in fact, work, when i stopped working on it.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14319 (12 years ago, status: suspended)


Err, this is a completely different case.

This is the union case you can't make work right all the time, and is actually two DR's, as the bug says. This is a case where the standard is completely broken (and will likely never be fixed)

The viewpoint of committee members i spoke with is that explicitly visible union accesses should be required to get the right answer here, because anything else is insanity.

I could place your two memory accesses in a union, in a different translation unit, and pass it to this function, and you would never have any reason to know they alias, because it looks like i handed you two struct pointers. IE imagine the function and main were in two different files, one of which had the union, and the other did not.

So either the compiler assumes that literally all memory, everywhere, aliases, forever (despite any other rules that the standard says exist, which directly contradict this), or we require programmers to make explicit union accesses (or your answers change depending on how much code the compiler can see, and whether it does whole program optimization or not or ....)

Like I said, this is a case where the standard is truly broken, and the best you can do is try to build consensus about what to do.

Note also the bug was suspended to figure out what the language was supposed to mean here. Nobody said it was not a bug, they said "no idea what's supposed to happen here". The committee punted (contrary to the last comment, they didn't really resolve the question), so it's never been worked on.


> In short, if you think it's so easy to do, feel free to fix it.

Naturally. The GCC developers have not documented their choices; what GCC actually does is fuzzy enough that a GCC developer who actually worked on the implementation of type-based aliasing optimizations gets it wrong. In these conditions, I am going to fix the fact that they didn't document their intentions (let alone doing what the standard says, since we have established that the standard committee, composed of compiler authors, is not helpful—I don't know why you think this is extenuating circumstances; to me, it makes things worse) by reverse-engineering GCC's assumptions, and extrapolate to what GCC might do in the near future, and help programmers determine whether their C programs might betray them now or soon.

In fact, this is exactly what we have been doing. Drop me an e-mail if you wish to help beta-test it: cuoq at-sign trust-in-soft.com


I noticed that ICC 13 on Godbolt compiled it "correctly", assuming correctly means as written. I wondered if this was just because it's an older version that wasn't yet optimized, so I tried with the current ICC 17 Beta. But with all the optimization options I tried, it stuck to its guns and compiled it as written:

   0:	c7 07 03 00 00 00    	movl   $0x3,(%rdi)
   6:	c7 05 00 00 00 00 04 	movl   $0x4,0x0(%rip)
   d:	00 00 00
  10:	8b 07                	mov    (%rdi),%eax
  12:	c3                   	retq
From what I can tell testing with http://webcompiler.cloudapp.net, MSVC seems to also produce the "correct" result at all optimization settings, while GCC and Clang do so only with -O1 or lower for all versions that I tried.


"arguably not X" is the same as "arguably X".


I find your conclusion very surprising, foo and bar are certainly not compatible. psbar->x is equivalent to (*psbar).x and dereferencing psbar violates the effective type rule, thus UB. The fact that the type of the 'x' fields are the same is immaterial as you are already in UB territory.

I believe that GCC uses a structural type system for its TBAA analysis (and generally much looser rules), but from the standard point of view it seems that this is a pretty clear violation of aliasing rules.


> "an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or"

I don't think you're properly understanding this wording (which was there since 1989 ANSI C, IIRC).

Firstly, we can interpret it very poorly, so that this appears defined:

  { double x = 42.0;  /* declared type is double */
    struct foo { int y; double z; } *p = (struct foo *) &x;

    return p->y; }
Hey, x is being accessed via an aggregate type, and that type includes one of the aforementioned types (the declared type of x) as one of its members, namely member z!

Here is the real intent of the wording. Why it is necessary is simply because members are implicitly accessed when an aggregate is accessed as a whole. That is all! If we assign one "struct foo" to another, for instance, and that "struct foo" has a member "int x", that member is accessed and it is being accessed through an lvalue of type "struct foo". Without the above wording, that use would appear not to be conforming, because a "struct foo" type is being used to access an "int".
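A minimal illustration of that reading, with an invented function name: the whole-struct assignment uses an lvalue of type struct foo, and that single access implicitly reads the int and double members.

```c
struct foo { int x; double z; };

int copy_and_read(void)
{
    struct foo a = { 42, 1.5 };
    struct foo b;

    /* The lvalue `a` has type struct foo, yet this one assignment reads the
       int a.x and the double a.z as part of the whole-struct access. */
    b = a;
    return b.x;
}
```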

> In general, however, your complaint is basically: "C is not a pure structural type system, it's a name based one for the most part".

No, that isn't my complaint at all because, note that I still want a diagnostic if we remove the cast:

   // diagnosable constraint violation:
   struct foo *p = &struct_bar_instance;
I just want the access to be defined to members that happen to be of the same type and at the same offsets in these structures, without going all the way to assert that they are compatible types for the purposes of assignment/passing and pointer conversions.

The access should be defined even if the offset is the same in different ways, on those platforms where it happens to work out:

   struct foo { char c; int x; };
   struct bar { short s; int x; };
If c is one byte, s is two, and x is aligned to the next multiple of 4, so its offset is 4 in both structures, accessing x should work through either struct foo or struct bar on this implementation. I.e. it is undefined behavior if the offset isn't the same, otherwise requirements apply. Just like 1 << 30 is defined, and its value is the same, on all platforms where int is at least 32 bits wide. But it is not defined if int is, say, 16 bits (out of range shift).
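The proposed semantics can't be had in ISO C as it stands, but the condition "same type at the same offset on this implementation" can at least be checked, and the read can be done through bytes rather than through the other struct type. A sketch, with an invented function name:

```c
#include <stddef.h>
#include <string.h>

struct foo { char c; int x; };
struct bar { short s; int x; };

/* Returns 1 and stores the value in *out if x has the same offset in both
   structs on this implementation; returns 0 otherwise. */
int read_bar_x_from_foo(const struct foo *pf, int *out)
{
    if (offsetof(struct foo, x) != offsetof(struct bar, x))
        return 0;
    /* memcpy reads the bytes as characters, which is always permitted. */
    memcpy(out, (const char *)pf + offsetof(struct bar, x), sizeof *out);
    return 1;
}
```

Whether the offsets match is a property of the target ABI, which is exactly the platform dependence being discussed.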


"No, that isn't my complaint at all because, note that I still want a diagnostic if we remove the cast."

Okay, then you want something super strange and in-between :)

"I just want the access to be defined to members that happen to be of the same type and at the same offsets in these structures, without going all the way to assert that they are compatible types for the purposes of assignment/passing and pointer conversions. "

Okay ...

"The access should be defined even if the offset is the same in different ways, on those platforms where it happens to work out: "

This qualifies in my book as super-strange. I'm pretty sure you will find no one who views the standard this way, as it contradicts the explicit wording pretty directly :P.

So if you want that, you'd probably need a new language.

Note also that it makes actual pointer analysis (IE Andersen's, etc) impossible without complete and total target info.

You could never have a target independent thing do pointer analysis in this world.


Is it really fair to call this int/int aliasing?


Since one of the ints is italicised then I assume that the HN software has eaten an asterisk. The second one is probably pointer-to-int.


The italics were for emphasis. There is no pointer-to-int there.


I can't get that to compile, is that semi colon wayward?

    structs.c:5:39: error: unexpected ';' before '}'
If I remove that, I can't even get tis-interpreter to complain, strongly suggesting this is fine.


tis-interpreter does not detect strict aliasing violations yet.

An analysis for exactly that is being worked on, and it will warn on this example, but this is a minefield of ambiguous specification by the standard, compiler practices that seem to go beyond what the standard says in some cases, and existing programs that “work” and that no-one wants to read thousands of warnings about.


Thanks for this clarification. Also, thanks for the product, it's been invaluable.

You're correct though, there are plenty of common applications that are either meant to be compiled with strict aliasing disabled, or which just produce tonnes of warnings under normal circumstances as it is. It also seems to be compiler dependent. I found a popular application where the compiler complained about type aliasing, but I moved to a newer GCC and the error went away.


s/42;/42,/



