It's nonsense. The function is external, called from a separately compiled file. The pointer comes in as a char *. The code checks its alignment before assuming it can be cast to a block. There is no way it could be "miscompiled".
The ivec argument could in fact have come from an object that is of type aes_block_t. The only thing which might reveal that it didn't is wrong alignment. In other regards, there is no way to tell.
Lastly, any cross-compilation-unit optimization which could break code of this type is forbidden, because ISO C says that semantic analysis ends in translation phase 7.
I'm looking at C99, not the latest, but I think it's the same.
In translation phase 7 (second last), "The resulting tokens are syntactically and semantically analyzed and translated as a translation unit." Note the "semantically analyzed": semantic analysis is where the compiler tries to break your code due to strict aliasing.
In translation phase 8, "All external object and function references are resolved. Library components are linked to satisfy external references to functions and objects not defined in the current translation. All such translator output is collected into a program image which contains information needed for execution in its execution environment."
No mention of any more semantic analysis! So unless the mere resolution of external symbols can somehow break OpenSSL's AES, I don't see how anything can go wrong.
One thing I would do in that code, though, is make sure that it doesn't use the original ivec pointer. In the case where "chunking" goes on, it should just cast it to the block type and put the result of that cast in a local variable. All the memcpy's and load/store macros would be gone, and the increments by AES_BLOCK_SIZE would just be + 1.
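To make that concrete, here is a rough sketch of the restructuring I mean (hypothetical names; aes_block_t, AES_BLOCK_SIZE and the four-word layout are stand-ins for whatever the real code uses):

#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t w[4]; } aes_block_t;   /* assumed 16-byte block type */

/* Fast path, entered only after the alignment check on the char * has passed. */
static void ctr_xor_blocks(unsigned char *out, const unsigned char *in,
                           const unsigned char *ivec, size_t nblocks)
{
    /* Cast once, keep the results in locals, and never touch ivec again. */
    aes_block_t *dst = (aes_block_t *)out;
    const aes_block_t *src = (const aes_block_t *)in;
    const aes_block_t *iv = (const aes_block_t *)ivec;

    for (size_t i = 0; i < nblocks; i++) {
        for (int j = 0; j < 4; j++)
            dst[i].w[j] = src[i].w[j] ^ iv->w[j];
        /* advancing by AES_BLOCK_SIZE is now just + 1 on block pointers */
    }
}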
Citing the translation phases in the standard as evidence that undefined behavior is ok, as long as it's divided between two translation units, strikes me as wishful thinking.
Undefined behavior is the absence of requirements: there not being any requirements for some situation.
Suppose a document tells you that for some special situation X, there is an absence of requirements. However, suppose that some other general rules elsewhere in that document in fact imply a requirement for that situation.
That just means that the claim that situation X has no requirements is incorrect.
For instance, ISO C says that two struct types appearing in separate translation units are only compatible if they have the same typed members in the same order ... with the same names.
This says that if you do aliasing with otherwise identical structures whose members don't have the same names, the behavior is undefined: i.e. there is no requirement that it work.
But, we can infer that it must work by the logical fact that during the semantic processing of one translation unit, the translator has no clue what the names of struct members are in another translation unit. They disappear at translation time and turn into offsets.
I mean, we can fwrite a struct to a file, right? We can send that file over a network. According to ISO C, every program (or at least every C program) will have to use a structure with the correct names to fread that area of the file! Ridiculous!
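For illustration (my own toy example, not from any real code base), this is the kind of thing everyone expects to work even though the two structs share no member names:

#include <stdio.h>

struct point_a { int x, y; };
struct point_b { int h, v; };   /* same layout, different member names */

void save(FILE *f) {
    struct point_a p = { 1, 2 };
    fwrite(&p, sizeof p, 1, f);
}

void load(FILE *f) {
    struct point_b q;
    if (fread(&q, sizeof q, 1, f) == 1) {
        /* q.h and q.v now hold the bytes written as p.x and p.y */
    }
}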
Suppose we take ISO C and add a statement to it like, "the consequences are undefined if one of the operands of the + operator is the integer 42". The rest of the document would still be exactly what it is, and if we strike out that sentence with a black marker, nothing has changed. The rest of the document continues to inform us that adding 42 to something is well-defined (in the absence of overflow, or overflow-like issues with pointer displacement and so on).
Basically, it's a contradiction: the document gives a description which adds up to some requirements, but then some sentence tries to take them away.
In such a situation, we can just proceed as if the requirements apply. That is to say, when a requirement conflicts with the claim that there is no requirement, just let the requirement prevail.
(In a situation where conflicting requirements are asserted, it's a different story, of course.)
You can't infer shit, because of the way current compiler writers interpret the standard today. At one point in the 90's it was obvious to the entire planet, compiler authors included, that some undefined behaviors simply did not apply on a given target architecture, so obvious that nobody even felt the need to have it spelled out in the compiler docs (still nice to have, but you wouldn't be too angry if it wasn't there).
Now they just add their "optimizations" at the highest levels, so of course they kick in even for targets where they make no sense, without asking for your permission, even by default, and even for ones they themselves consider "aggressive". So you'd better have provisions to avoid all that shit, like a guy disabling all the new ones each time you upgrade your compiler, plus some defense in depth. And I agree with you that using TUs as boundaries is also a good idea, provided your compiler + build system has an option to NOT do WPO.
But it is just not in any way guaranteed by the ISO standard, and remains just an implementation detail from its point of view. And honestly that's a problem, because compiler writers will take more drugs and come up with new imaginative ways to break your code even more, in the name of their "strict conformance and nothing more" crazy ideal.
Translation phases are all fun and games, but WPO can still break your code thanks to the as-if rule, and because nothing prevents alias analysis from being performed across TU boundaries. And compilers are doing it.
> nothing prevents alias analysis from being performed across TU boundaries
Standard conformance does. You know, that principle in the name of which the alias-breaking optimizations are done in the first place.
The "as if" principle (there is only one) means that optimized code produces the same results as the abstract semantics (under a certain set of requirements of what it means for the abstract and actual semantics to agree).
The separation between translation phase 7 and 8 is part of that abstract language semantics.
> And compilers are doing it.
GCC currently only does optimizations across translation unit boundaries when it is told to via special options, and only for the .o files which are designated as participating in it. This is no different from using __attribute__ or __asm__.
Well, in the model, there is no such thing as the effective types of objects losing their power at TU boundaries, and otherwise no relationship is defined between effective types and linkage. An implementation that tags all accesses to dynamically check/enforce the effective type rules (by killing a faulty program or doing otherwise nasty stuff) would not lose its conformance because of that. The default behavior of an implementation used under a particular and completely unspecified build system is irrelevant to the fact that aliasing analysis is allowed across TU boundaries in the model, and the fact that it does not happen in some cases is just an implementation detail, one that, if not guaranteed by other means, can always be changed without even a warning by compiler vendors.
Now that is the situation with regard to the standard. Whether we should be happy with it or not is another story.
C is actually quite simple, even the aliasing rules (I think the aliasing rules all fit on about a quarter page). Programming in C though is anything but.
The tension between weakly typed pointers and the desire to generate efficient code is where there is a problem. More or less anywhere you violate the aliasing rules you are doing something that Fortran doesn't allow at all, and by disallowing it semantically the hope was that C programmers could have their cake and eat it too. The reality is that all systems code should probably be compiled with alias analysis disabled.
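A minimal sketch of that tension (my example, not from the article): under the aliasing rules the compiler may assume that an int * and a float * never refer to the same storage, so it can skip re-reading *i after the store through f.

int set_and_read(int *i, float *f) {
    *i = 1;
    *f = 2.0f;    /* assumed not to modify *i under strict aliasing */
    return *i;    /* typically compiled as "return 1" */
}

Pass the same address in through both parameters and the "optimized" answer differs from the byte-for-byte one, which is exactly the have-your-cake-and-eat-it-too problem.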
A spec of a quarter page can already be incredibly convoluted, with consequences that are very hard to anticipate -- even more so when crazy people are interpreting it without caring about the real-world consequences of their acts. And C is not just about aliasing rules; the current situation is that ANY undefined behavior is a landmine waiting to kill you, regardless of whether it seems to make sense for your target architecture. And that is mostly because compiler writers have an insane interpretation of the standard: the definition of "undefined behavior" is "behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements".
A key word here is "nonportable", and it is clearly not considered often enough by some compiler writers, who generally prefer to see all undefined behaviors as a licence to "optimize" your code without caring too much about warning you about potential bad side effects, because warning is "hard" according to them.
This does not make even the beginning of any sense. If it is not what many programmers are expecting (a majority? -- most coworkers I know, including my direct boss, are not even aware of all that mess), is costly during the translation phase, has unclear/unquantified runtime performance benefits, is dangerous in the real world and is hard to detect when bugs are activated by those "optimizations", then WHY are they doing them in the first place? From an engineering point of view this is just plain insane. Correct execution and defense in depth are extremely valuable, and only becoming more so year after year, and when they pretend that it's not their fault that programs are breaking they are being even more ridiculous; a compiler does not exist in a vacuum, nor does it exist just to reproduce itself and satisfy the curiosity of geeks for mathematical logic.
Obviously some amount of alias analysis can be useful, and this is clearly one of the topics the standard really intended from the start to address for performance reasons, but maybe it would be enough to explicitly identify what you want not to alias. Seeing the bug where they discuss allowing uint8_t not to behave in general as an unsigned char with regard to aliasing is just plain disgusting and makes me lose yet another portion of the tiny remaining trust I had in them.
It will soon get to the point that C/C++ will not be realistic languages to consider if you want any kind of reliability. Maybe I'm even deluded in thinking this is not already the case.
I get a bit fed up when software developers complain about C compilers and blame the developers, as though compiler developers are somehow a different breed or something. It's all software, and compilers are actually one of the easier things to write.
Somehow, the developers of gcc, clang, and various commercial compilers are all crazy, while people who work on any other project in C are sane? Why haven't the sane software developers forked an open-source compiler and implemented sane semantics?
Blaming the state of compilers on compiler developers without understanding their motivations for making engineering choices is intellectually lazy, particularly for other software developers.
What you write is all very meta, so maybe enlighten us on their motivations and why they should continue in that way? And on how using a language is the same as implementing it and how it seems that the application space has magically folded into a single point last night?
I already know too much for my taste about the reasons they gave for coming up with such horribly risky designs. Example: "it's difficult to properly warn where we are doing dangerous optimizations". I'm not buying it. Don't do them in the first place, or use more heuristics. Publish your data about your risk/benefit analysis. I think they made strategic mistakes. "Everybody" in academia and the security/safety business is saying that. And no, that won't be solved with dynamic checkers. They should concentrate on optimizations that minimize the risk of new behaviors concretely appearing, and on heuristics that maximize the detection of the programmer's intent. That sounds informal as hell, because it is, but in the real world solving a problem does not always mean finding the perfect solution to an equation, and good-enough approximations are often the best thing to seek. Compiler writers know that. They are just not seeking the good thing. The "smartness" they think they are creating through their imaginative use of undefined behavior is as smart as my ass, because on real projects those kinds of sufficiently smart compilers are indistinguishable from an adversary.
I have recently implemented some kind of compiler, though not C to machine code. In the real world it's not easy. It's messy. It's made of blood and tears. You have to care about all kinds of little details for your users. You have to bridge two domains that sometimes have quite different semantics -- there is never a perfect solution to that. And all of this with minimal risk of being misused. If you don't care for your users, there is no point. I have the greatest respect for the authors of Clang, which I used for the front end. It's not perfect, but it is quite good and it does the job. But sometimes, when you feel that somebody's effort really is misplaced, you'd better tell them (or the community; my voice is not particularly original) and explain your reasoning. Otherwise, if you wait too long, you effectively won't be entitled to complain later.
So now what do we have: two domains, C and the various target architectures, that by design used to have quite a low impedance mismatch, are now treated as radically different. The expectations of most users no longer have anything to do with how the compiler writers think users should write their code (or should have written it in the past, for less maintained software). That won't end well. Actually, we are already in the mess, even though we didn't especially need more than what we already had.
So go on and please explain your POV. But anyway, no, I won't fork and write a "friendly C" compiler overnight. Neither will maintainers of cryptography software, probably. I'm just yet another data point who will lose some time adding all the -fdisable-insanity-of-the-day flags anywhere he passes. Because somebody somewhere else in the world thought it would be a good idea to infer proofs, using rules written for the least common denominator of all computers, to detect whether you have read N1570 over and over again enough times to earn the privilege of a sane translation of your code, which will anyway only ever run on x86_64, thank you very much.
The same debacle somehow happened with memory models (see how they have been received for kernel work). If you only ever care about the abstract, real prospective users won't greet you the way you might have expected. Rightly so. Especially when your formalism has been proven unimplementable and unsound.
I don't think it's going to be a fruitful exercise to demand that implementers of ISO C/C++ change their implementation to guarantee certain properties of programs that ISO C/C++ consider to be invalid, and thereby weaken their competitive position vs. other implementations.
The root cause is the ISO C standard itself, so if you want any change in that direction, the best approach would be to join the ISO C working group and make proposals to replace various undesirable undefined behaviors with, at the minimum, implementation-defined behaviors (starting with signed integer overflow, perhaps). Probably there aren't any vendors of one's complement machines and the like left to veto your proposals.
The C11/C++11 memory models for the most part don't solve a problem that kernel developers have, because all kernels already contain tested solutions to the same problem, so one would not expect kernels to quickly exchange all their tested and highly performant concurrency primitives. The only benefit for them is that the memory model effectively prohibits implementations from doing some optimizations that would be valid only under the assumption that there is only one thread of execution, and this benefit is implicit, i.e. you don't need to use any of the new language features or standard library headers to benefit.
First of all, I think that many people underestimate the performance advantages of some of these "optimizations". I've seen real-world loops that slow down by 40%+ when alias analysis is disabled. All of those fancy loop optimizations that everybody agrees are good fall apart if you can't prove that the various objects in them don't alias.
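As a rough illustration of the kind of loop I mean (a sketch, not one of the loops I actually measured): with type-based alias analysis the compiler knows that stores through out (a float *) cannot change *n (an int), so *n is loaded once and the loop can be unrolled or vectorized; without alias analysis, *n may have to be reloaded on every iteration.

void scale(float *out, const float *in, const int *n) {
    for (int i = 0; i < *n; i++)
        out[i] = in[i] * 2.0f;
}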
Secondly the way people shop for compilers is completely broken. It's often nearly entirely done by running the candidate compilers against a suite of benchmarks and picking the one with the best numbers. Usually only a single -O option is passed. Disabling any of these "optimizations" by default means you lose.
Thirdly I think you are overestimating the number of people that care about this problem. John Regehr ran a bunch of crypto code through clang's undefined behavior detection tools and reported bugs. Many of those bugs were closed with a WONTFIX because the code generated today is correct!
Lastly, the ISO spec is what it is, and has what it has. To the extent that compilers are in competition with each other, a single vendor working on a separate dialect of C is spending effort that helps it not at all in the benchmark game, and which over 90% of their users will never use. GCC being the 800 lb. gorilla it is, users of other compilers occasionally request GCC's language extensions, but most often their checkboxes are "complies with the ISO standard" and "generates good code". I would love for the next version of the standard to have a more friendly C dialect as an annex; then everyone who cares could just pass --std=c17friendly and it would work across all compilers.
[edit]
As for my original post being very meta, that's mainly because none of what I'm saying above has anything to do with my primary complaint: people who aren't writing compilers are complaining about decisions made by those who are, and attributing it to something fundamentally wrong with those people. Compiler developers aren't insane or stupid. Since most C compilers are self-hosting, compiler developers use their product more than most other developers. Saying they have tunnel vision is perhaps a bit closer to the truth, but it is worth examining the environment they are in, as it's not like there is a "will be a compiler developer" gene that selects for compiler-writing skill along with "doesn't give a shit about users".
And most of this isn't directly pointed at you, but rather the fact that there's been an unusual amount of compiler-writer hate on HN recently. I thought I could let it go by, but I hit my limit.
[edit2]
To sum up my first edit: I feel like I could have said all of my "meta" post even if I didn't know a lot of compiler developers. I don't really know any web developers well, but I don't assume that the 3MB of JS that is loaded for tracking and advertisement on most large sites is there because web developers are insane or stupid. I assume rather that at least some of them are trying to deliver the best product within the constraints of their customers (a page that can be monetized, and completed in under a certain number of billable hours).
Ahaha. Rust is a language that assumes dynamic memory allocation never fails. The way current compilers treat undefined behavior is terrible, but Rust is worse.
The standard library does assume this, but it's not an inherent language issue. Most people who are in resource constrained environments won't be using the standard library anyway.
Rust also made the same mistake that Go did: it tried to appease the "exceptions are bad" crowd and ended up adding exceptions as an afterthought. I'm angry: Rust is merely good, but with a few early tweaks, it could have been great.
Rust does not have exceptions, not even in the limited capacity that Go sort-of has. It's legal for a Rust implementation to translate panics into aborts, which means that they cannot be relied upon as an error-handling mechanism. Furthermore, the means to halt unwinding in Rust exists only to prevent memory unsafety from occurring when Rust is embedded in another language via the C interface, because unwinding across an FFI boundary is undefined behavior. In fact the exact mechanism used in the aforementioned role has deliberately arbitrary restrictions placed upon it in order to prevent anyone from using it as a general-purpose error-handling mechanism. Finally, the prevailing Rust culture overwhelmingly discourages using anything other than Result for error handling, to such an extreme that I've never in all my days even seen anyone attempt to "catch" a panic as an error-recovery mechanism.
https://doc.rust-lang.org/beta/std/panic/fn.recover.html looks a lot like catch to me. That the documentation suggests not using it as "catch" is merely a political statement. While it may be legal to turn panics into abort, I doubt implementations will do that, because by doing so, you'll break programs that assume that panic recovery works.
The reason Rust's standard library isn't safe against malloc failure is that propagating Error everywhere would be cumbersome; I distinctly recall rust mailing list discussions on this point.
The only reason handling allocation failure has to be difficult is that the Rust people put themselves in the position of giving in to the anti-exception people. Once you allow exceptions, allocation failure becomes as easy to address as any other kind of resource exhaustion.
The Rust designers made a serious error eschewing the only fully general and fully ergonomic error handling strategy we've found.
> the Rust people put themselves in the position of giving in to the anti-exception people
It's a shame that you characterize design decisions as "giving into" the crowd instead of trying to understand why the decision might make sense in the context of Rust. Designing the error handling story was a discussion that spanned years, with dozens of discussions, many implementations, and likely hundreds of participants.
> That the documentation suggests not using it as "catch" is merely a political statement
The -fno-exceptions flag also makes me sad, because it encourages people to write libraries that are worse than they could be purely for the sake of interacting with code that chooses not to use an important feature of the language. Exception support should not be optional.
What's wrong with Rust's error-handling mechanism (namely, the `Result` type)? Generally when writing Rust, panics only come up when you actively invite them with something like an `.unwrap()` call or an explicit `panic!`, which is Decidedly Bad Style for anything serious.
One section claims "Physical Subtyping is Broken", where "physical subtyping" is defined as "the struct-based implementation of inheritance in C." I assume this means the typical pattern of:
typedef struct {
    int base_member_1;
    int base_member_2;
} Base;

typedef struct {
    Base base;
    int derived_member_1;
} Derived;
The article claims physical subtyping is broken because casting between pointer types results in undefined behavior. The article gives this example:
I agree this example is broken, but casting between pointer types in this way is totally unnecessary for C-based inheritance. You can do upcasts and downcasts that are totally legal:
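For example, using the Base/Derived definitions above, both of these casts are fine (a sketch; the standard-based justification is quoted further down the thread):

void example(void) {
    Derived d;
    Base *b = (Base *)&d;          /* upcast: a pointer to a Derived, suitably
                                      converted, points to its first member */
    Derived *back = (Derived *)b;  /* downcast: fine here, because the object
                                      really is a Derived */
    (void)back;
}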
If you compile this on an architecture like x86 that truly allows unaligned reads, you'll see that modern compilers do the "chunking optimization" for you:
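Roughly, the pattern in question looks like this (my sketch): go through memcpy/char and let the compiler emit the wide load and store itself.

#include <stdint.h>
#include <string.h>

/* Both of these typically compile to a single 8-byte mov at -O1 and above
   on x86-64; no manual "chunking" or pointer casting is needed. */
static uint64_t load64(const unsigned char *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v);   /* legal for any object, any alignment */
    return v;
}

static void store64(unsigned char *p, uint64_t v) {
    memcpy(p, &v, sizeof v);
}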
It says next that "int8_t and uint8_t Are Not Necessarily Character Types." That is indeed a good point and probably not well-known. So I agree this is something people should keep in mind. But most of this article is warning against practices that are generally unnecessary and known to be bad C in 2016.
It's true that a lot of legacy code-bases still break these rules. But many are cleaning up their act, fixing practices that were never correct but used to work. For example, here is an example of Python fixing its API to comply with strict aliasing, and this is from almost 10 years ago: https://www.python.org/dev/peps/pep-3123/
I'm with you. I wrote the code that breaks this stuff in gcc (and the implementation of struct-sensitive pointer analysis).
It explicitly and deliberately follows the first member rule, as it should :)
In C++, this is covered by 6.5/7, and allowed because it's a type compatible with the effective type of the object (in a standard-layout class, a pointer to the structure object points to the initial member).
I understand the upcast (which is certainly legal but it forces the casting code to know the depth of the inheritance hierarchy - as in &derived->base1.base2), but what's the argument making the downcast back to Derived legal C? (I honestly wonder; personally I either compile with -fno-strict-aliasing or trust my tests to validate the build...)
> I understand the upcast (which is certainly legal but it forces the casting code to know the depth of the inheritance hierarchy - as in &derived->base1.base2)
If this is inconvenient, just casting directly to Base pointer is also legal.
> but what's the argument making the downcast back to Derived legal C?
The justification comes from this part of the C standard (C99 6.7.2.1 p13):
"Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning."
It follows that:
Derived *d = GetDerived();
// This is legal: a pointer to Derived, suitably converted,
// points to its initial member "base":
Base *base = (Base*)d;
// This is also legal: a pointer to Derived.base, the initial
// member of Derived, suitably converted, points to Derived.
Derived *d2 = (Derived*)base;
About the memcpy stuff, when you consider the whole picture, this is ridiculous though. There should be no reason for any sane implementation to ever do nasty stuff with code like that.
Now I don't remember the article where I saw that, but given the current orientation of compiler writers there are technically some even more ridiculous situations. Like (a<<n) | (a>>(32-n)) having the obviously desired effect on all current architectures when you look at what would be an obvious direct translation (and quite an efficient one already), and yet I would not like to see that code AT ALL today unless it is proved that n always stays in the range 1 to 31. And now if they want to restore any kind of efficiency after all that madness, they have to implement yet another case of convoluted peephole optimization. Stupid. Give me the original intent of the language back, because virtually everybody is using it like that, consciously or not, and that will just not change.
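For reference, the usual workaround (a sketch) is to mask the shift count so that n == 0 no longer produces an undefined shift by 32; current GCC and Clang recognize the pattern and still emit a single rotate instruction:

#include <stdint.h>

static uint32_t rotl32(uint32_t a, unsigned n) {
    return (a << (n & 31)) | (a >> (-n & 31));   /* defined for every n;
                                                    n == 0 just yields a */
}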
How about the fact that the addresses might not be aligned?
How about the fact that there is no reason to write that if what you actually mean is memcpy(dst, src, 8)? Chunking yourself is a premature optimization that the compiler is in a better position to actually perform.
typedef struct {
    int x;
} Base;

typedef struct {
    Base base;
    int y;
} Derived;

int f(Base* b, Derived* d) {
    b->x = 0;
    d->base.x = 1;
    return b->x;
}
Notice that if we are accessing the base members of "d", we are still accessing them through a struct of type "Base" (d->base.x). If we compile this with strict aliasing, you can see the generated code allows for the possibility that the two pointers alias (while this isn't a proof, it's a strong indication that the code is aliasing-correct).
C99 6.5p7: "An object shall have its stored value accessed only by an lvalue expression that has one of the following types: [...] an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union)"
If we talk in terms of concepts that exist in the C standard, we would say that you can't cast to pointer-to-X and use the result unless the pointer actually points at an X.
The reason your example is illegal is that you are casting to pointer-to-"struct derived", but the thing being pointed to is not actually a "struct derived."
The "physical subtyping" pattern works because the C standard says that a pointer to a struct, suitably converted, also points to its first member. So a pointer-to-Derived, converted to a pointer-to-Base, points at Derived's first member. But a pointer-to-Base doesn't point at a Derived unless that object actually is a Derived. So the downcast is only legal if the object actually is a Derived.
Looks like your other comment hit the max reply depth, so this will have to be where I finish up; but in any case, I don't agree with your reading of the "vice versa".
It may be that the aliasing rules are also required to fully justify my conclusion (ie. a Base can't have its stored value accessed via a pointer-to-derived due to the aliasing rules). But I have a very high degree of confidence in the conclusion itself. I think that you will find that your compiler implements the behavior I have described.
There isn't actually a depth limit (or if there is we haven't hit it yet :). HackerNews just hides the "reply" link for 5 minutes or so to cool down flamewars.
You can work around this by clicking on the link for the post itself (ie. "3 minutes ago") which allows you to reply immediately.
I'm not making a one-way argument. If the underlying object actually is a Derived, you can freely cast between pointer-to-Base and pointer-to-Derived. That is what "and vice versa" means.
But if the object isn't actually a Derived, you can't cast to pointer-to-Derived:
Derived derived;
Derived *pDerived = &derived;
// This is legal because it's equivalent to:
// Base *pb = &derived.base;
//
// ie. there actually is a Base object there that the
// pointer is pointing to.
Base *pBase = (Base*)pDerived;
// This is legal because pBase points to the initial member
// of a Derived. So, suitably converted, it points at the
// Derived.
//
// The key point is that there actually is a Derived object
// there that we are pointing at.
pDerived = (Derived*)pBase;
Base base;
// This is illegal, because this base object is not actually
// a part of a larger Derived object, it's just a Base.
// So we have a pDerived that doesn't actually point at a
// Derived object -- this is illegal.
pDerived = (Derived*)&base;
// Imagine if the above were actually legal -- this would
// reference unallocated memory!
pDerived->some_derived_member = 5;
To my mind, 'illegal' means that the compiler will complain. In this case, I don't even see weird, scary UB; this is just a case of the standard being completely unable to say anything about what will happen.
After spending too much of my life chasing these bugs, I can tell you that here the compiler will do exactly what you told it to, which probably means making your day miserable.
> Actually, casts to/from char * are always defined in C (chars are always assumed to alias).
Not true. The standard says you can access any object's value via the char type, but not the reverse. You can't cast a character array to any type and dereference it.
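A small sketch of the asymmetry (my example):

#include <string.h>

void demo(void) {
    int x = 42;
    unsigned char *p = (unsigned char *)&x;   /* fine: any object may be
                                                 examined through char */
    unsigned char first = p[0];
    (void)first;

    unsigned char buf[sizeof(int)];
    /* int *q = (int *)buf; *q = 7;     <- not fine: buf's type is
                                           unsigned char, not int */
    memcpy(buf, &x, sizeof x);          /* the legal way around */
}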
> The author was talking about "chunking" non-char units.
Sure, but you can call my copy_8_bytes() function like so legally:
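For instance (a hypothetical reconstruction of copy_8_bytes() and a call site, not the original code; the real function presumably copies through unsigned char like this):

#include <stddef.h>

void copy_8_bytes(void *dst, const void *src) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < 8; i++)
        d[i] = s[i];
}

void use(void) {
    double from = 1.0;           /* assuming 8-byte double and long long */
    long long to;
    copy_8_bytes(&to, &from);    /* legal: every access inside the function
                                    happens through unsigned char */
}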
Yes, I believe it would be. That's a good point, now that you mention it -- I have code that does just that, and I hadn't realized it's probably undefined.
The "effective type" (this is a term defined in the standard) of the char array elements would be "char", whereas the memory returned from malloc() is considered to be an object that initially has no effective type. I don't know of any way to take the char array and "erase" its effective type so that it can be used generically, like the value returned from malloc().
This is one reason among many why people who write serious low-level code (e.g. game developers) think all the new aliasing rules are completely bonkers.
We implement our own allocators all the time. If you can't even do such a basic thing legally, then the rules are obvious nonsense.
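Concretely, the pattern I mean looks something like this (a hypothetical sketch, just to illustrate the objection): a static char array as backing store keeps its declared type, unlike memory from malloc(), so strictly speaking storing anything else there runs afoul of the effective-type rules, even though every compiler does the expected thing.

#include <stddef.h>

static _Alignas(max_align_t) unsigned char arena[4096];
static size_t arena_used;

static void *arena_alloc(size_t n) {
    void *p = &arena[arena_used];
    arena_used += (n + 15) & ~(size_t)15;   /* bump pointer, keep alignment */
    return p;
}

struct msg { int id; int len; };

void use(void) {
    struct msg *m = arena_alloc(sizeof *m);
    m->id = 1;   /* strictly an aliasing violation: these bytes have
                    declared type unsigned char, not struct msg */
}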
Just make a fast allocator that uses the heap instead of the stack. You only need to malloc once, and it can be used for any type since, like you pointed out, its effective type can be changed.
Sometimes, especially in embedded systems, it is useful to have a bunch of statically allocated heaps. You can see them in a memory map, and the linker will tell you if they don't fit in memory.
There is also the case where you have some raw data from a file or network, that you want to re-interpret as a struct. That is always dangerous with endianness and struct padding, but it is a very common practice. You could always memcpy from a char array to a struct, but that can waste memory.
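The memcpy route mentioned above looks roughly like this (a sketch; hdr and its fields are made up for the example):

#include <stdint.h>
#include <string.h>

struct hdr { uint32_t magic; uint32_t len; };

struct hdr read_hdr(const unsigned char *raw) {
    struct hdr h;
    memcpy(&h, raw, sizeof h);   /* no cast of raw to struct hdr * needed,
                                    so no aliasing or alignment trouble */
    return h;                    /* endianness and padding are still on you */
}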
"More importantly, they allocate a data-dependent amount of stack space that can trigger difficult-to-find memory overwriting bugs: "It ran fine on my machine, but dies mysteriously in production"."
Oh, that's nice. :/