
Clang emits memcpy for std::swap, which can introduce undefined behavior - luu
https://llvm.org/bugs/show_bug.cgi?id=27498
======
kazinator
The compiler is the implementation; it cannot _cause_ "undefined behavior".

What it can be is "nonconforming".

A correct program which hits that situation continues to have well-defined
behavior (which we know from the text of the program and the standard). Just
the implementation isn't handling the requirements correctly; it is not
conforming.

An implementation is not bound by the standard in its own use of the standard
library functions. The implementation can use a function like memcpy in ways
that would be formally undefined if they occurred in user code; its
implementation of memcpy just has to harmonize with that use, so that the
intended behavior is ensured. This is because implementations can add their
own requirements to areas that the standard leaves undefined. For instance, an
implementation can add the requirement to its memcpy that the copy may
overlap, if the destination has a lower address. Then, the implementation can
generate code which uses memcpy that way, or make internal uses of memcpy from
other library functions which use it that way. It's just using its own
(possibly not publicly documented) extension.

~~~
Dylan16807
The compiler has different layers. Layer X is nonconforming, which causes
undefined behavior in layer M.

Layer M's memcpy doesn't have to have the same semantics as the C standard's
memcpy, but the bug report implies that it does.

~~~
kazinator
Yes, if "undefined behavior" doesn't just mean "a situation where the ISO
language standard doesn't impose a requirement", but has a broader meaning
like "behavior not defined by anyone at all".

~~~
Dylan16807
The compiler is using the ISO language standard to define the behavior of an
internal piece. It is then using that piece in the wrong way. It's not generic
"not defined" behavior, it is specific C "undefined behavior".

Think of that part of the transformed code as still C for relevant purposes.

~~~
chandlerc1024
I want to assure you that the compiler does not exclusively use the ISO
language standard to define the behavior of internal components of the
implementation.

~~~
Dylan16807
I'm talking about a specific component named memcpy. Obviously this does not
apply to other components.

And even that is only if the bug report is correct.

~~~
chandlerc1024
The compiler is not producing C code that calls memcpy. It is leveraging a
specific implementation of a memcpy symbol in a particular system library. The
contract between Clang and that system library can include guarantees not
provided by any ISO standard.

~~~
Dylan16807
Then the bug report is wrong, and disregard the second and third posts I made.
They depend on the bug report not being wrong.

My first post is generic and not about this particular issue; it's still fine.

~~~
whitequark_
The bug report is not wrong. Rather, whether the current behavior is correct
depends on the platform.

For example, Darwin's libc offers this contract for memcpy and clang is
perfectly within its rights to generate such code (note that the IR it
generates is still violating LLVM's contract on the memcpy intrinsic); glibc
offers no such contract for memcpy, and so clang's code is nonconformant.

------
mehrdada
The compiler causing undefined behavior on its own reminds me of one of the
fun bugs we found[1] with 32-bit x86 Clang, where one compiler pass fools
another into thinking there's undefined behavior by combining three struct
field initializations into a single 4-byte load, even though the struct itself
was 3 bytes long. Another pass looked at this, thought there was undefined
behavior writing past the struct boundary, and silently eliminated the
initialization altogether, resulting in bad codegen.

[1]: [https://mehrdad.afshari.me/pubs/compiler-validation-via-equivalence-modulo-inputs.pdf](https://mehrdad.afshari.me/pubs/compiler-validation-via-equivalence-modulo-inputs.pdf)

~~~
slededit
Compilers are way too overzealous eliminating code because of undefined
behavior. Legally it is allowed and I'm sure it helps in synthetic benchmarks.
But writers of real programs won't be impressed by the perf gains when you
stop running half their program.

~~~
kazinator
Rather, they are getting zealous in eliminating code because of _the
assumption that there is no undefined behavior_. The set of undefined
behaviors exploited this way keeps growing, and that breaks programs that rely
on traditional compiler behavior, or that have hidden bugs.

I believe that C should have safer semantics, plus standard-defined modes of
operation which restore the unsafe semantics selectively.

The optimizer then must obey whichever semantics in effect (safer or less
safe).

Then, in addition to telling your compiler you want a certain optimization
level, you could tell it which standard-defined modes of operation you would
like.

Currently, this is already _de facto_ the case outside of the standard. For
instance, users of GCC who want to do certain pointer aliasing without
unpredictable behavior tell the compiler to be in a mode in which that is
allowed (effectively an altered C dialect) using -fno-strict-aliasing.

This kind of thing should be a standard, portable feature. Look, everywhere in
this program, I want nice left-to-right evaluation within all expressions and
among initializers and function arguments. Except in super-fast.c; when
compiling super-fast.c to super-fast.o, please do it in a mode where
unspecified-evaluation-order semantics applies and optimize accordingly. (I
might even want this at a finer-grained scale than translation units.)

We need safety, and we need to sometimes throw away safe semantics in exchange
for better speed (whereby we still ensure that the code we are writing is
correct with regard to the weakened semantics; i.e. we promise to the compiler
that even if it relaxes the evaluation order, or whatever, we know what we are
doing and everything is cool).

It would be good to do this without compiler-specific guesswork.

Optimization control _alone_ over a language with a single set of semantics
(which is largely unsafe) is a bad way to achieve this.

~~~
slededit
I agree you should be able to opt in to a safety level. For example, most
programmers do expect signed integer overflow to follow two's complement
behavior - regardless of what the standard says.

There really seems to be a large disconnect between the tool writers and the
users of said tools. I've had countless discussions with compiler writers
where we've essentially talked past each other. The compiler writer arguing
from a legalistic (and dare I say moral) perspective that the programmer must
not write undefined code, and myself from the perspective that developers _do_
write undefined code and we should therefore find a way to handle it in a safe
way. Perhaps even by standardizing more than we do currently.

On reflection, it's not unlike the drug legalization debate.

~~~
jeffdavis
Keep in mind that the compiler doesn't always know whether code is undefined
or not. Is x+y at risk of overflow? Maybe, depending on some logic in a
separate compilation unit.

So what you're asking is to define behavior for all code, such that the
compiler can prove locally that it's well-defined. That's not a crazy idea,
and a lot of languages do that.

But C counts itself as special, for better or worse.

I think we really just need more alternatives. There's C++, Rust, Ada, and not
much else. I am very hopeful about Rust.

~~~
kazinator
The problem is that C compilers don't bother proving anything, when it comes
to taking advantage of what is defined.

For instance, if x is signed, a C compiler can assume that x + 1 produces a
value which is greater than x, and optimize accordingly. It's up to the
programmer (and failing that, user) to ensure that x is less than INT_MAX
prior to the execution of that x + 1. In fact there is a proof involved, but
it is trivial. The compiler assumes that x + 1 > x because, by logical
inference, the only way this can be false is if x == INT_MAX. But in that
case, the behavior of x + 1 is undefined, and so, who cares about that; it is
the programmer's responsibility that this is not the case. Q.E.D.

Rather than require compilers to implement complicated proofs, which
essentially consists of a whole lot of work whose end result is only to
_curtail_ optimizations, it's far simpler to have ways in the language to
tweak the semantics.

If there is a safe mode in which x + 1 is well-defined, even if x is INT_MAX,
then the compiler can no longer assume that x + 1 > x. That assumption is
simply off the table. (Now the difficult work of proof kicks in if you still
want the same optimization: the compiler can still optimize on the assumption
that x + 1 > x, but that assumption can be false, yet x + 1 can be well-
defined! If an assumption can be false in correct code, then you have hard
work to do: you must prove the assumption true before relying on it.)

This safe mode is not linked to optimization level. Unless turned off by the
programmer, it is in effect regardless of optimization level.

Right now we have a situation in which compiler optimization or "code
generation" options are used for controlling safety. Programmers know that _de
facto_ these options affect the actual semantics of C code, and so they get
used that way. We know that if we tell GCC "-fno-strict-aliasing", we are not
simply defeating some optimizations, but we are in effect requesting a C
dialect in which cases of certain aliasing are well defined.

This is a poor situation. Why? Because it's nonportable (the next compiler you
use after GCC might not have this option). But, more importantly, semantics
and optimization must be decoupled from each other. We pin down the semantics
we want from the language (we pin down what is well-defined), and then
optimization _preserves_ that semantics: whatever is well-defined behaves the
same way regardless of how hard we optimize.

Semantics and optimization have to be independently controlled.

That is how you can eat your cake and have it too. If you think some code
benefits from an optimization which assumes that x + 1 > x, you just compile
that under suitable semantics. Then if you don't ensure x < INT_MAX, it's
really your fault: you explicitly chose a mode in which things break if you
don't ensure that, and then you didn't ensure that.

~~~
jeffdavis
That's a good point, and a lot of languages use that philosophy.

C has a number of things that have kept it going. Let's get some real
alternatives that address those reasons.

Or, you can look at Ada and Rust, which are closer to what you want.

------
haberman
Given how aggressively compilers (including Clang) are exploiting undefined
behavior these days, it is surprising to see Clang developers (I presume) on
the bug being so blasé about introducing undefined behavior themselves.

~~~
chandlerc1024
Hi, LLVM and Clang developer here.

I don't see anyone being blasé about this. The issue comes down to something
really simple:

- The compiler can emit substantially simpler code if it has a guarantee from
the platform's implementation of memcpy.

- The compiler authors thought they had such a guarantee and so chose to
leverage it.

Now, maybe they don't have that guarantee on all platforms. If that's the
case, it's flat out a bug in the compiler. And you'll find folks on
that bug arguing that it doesn't actually hold, which is good and important.

But it's actually quite reasonable to have a guarantee from a platform's libc
that this will work and leverage that. The compiler isn't generating C code or
code bound by any standard. This is about two parts of the implementation of C
agreeing about what internal invariants will hold.

The problem this bug highlights (and it is a real problem, and not one anyone
takes lightly) is that we have insufficient agreement about these invariants,
both for certain platforms (glibc) and even within LLVM. Fixing this remains
absolutely critical, but may only involve documenting the agreement.

~~~
haberman
Hi Chandler,

I should have clarified that the comments I was referring to were on this bug:
[https://llvm.org/bugs/show_bug.cgi?id=11763](https://llvm.org/bugs/show_bug.cgi?id=11763)

I was surprised to see: "I don't think it's really worth warning about,
considering I don't know of any real-world implementation where it doesn't
work."

That sounds a lot like someone knowingly writing UB in a C program because no
compiler they know of breaks the intended behavior. I doubt any memcpy() has
ever guaranteed in any kind of official way that overlapping regions are ok,
even if it has historically worked in practice.

I mean I'm sure it'll get fixed now, I was just surprised to see this kind of
reasoning coming from someone who I presume works on the compiler.

btw. I'm pretty sure I heard that glibc broke this behavior sometime recently.
There's a new memcpy() symbol in glibc these days with a different symbol
version that no longer provides correct behavior in the case of overlapping
memory regions. This is all second-hand from some other people I was working
with, but if I got their meaning correctly I think this issue will actually
have relatively widespread impact.

~~~
chandlerc1024
That comment was actually a considered comment: when we look at what warnings
we should produce, we consider cases where everyone has written code a
particular way and the fact that it isn't defined is actually a language bug.

We (the Clang community members, and I suspect the GCC community members as
well) have also worked to change language standards to provide guarantees
relied on consistently and where the lack of guarantee has no reasonable
basis. Also that comment is very old, and predates us learning about all of
why this is probably something actually worth warning on. That discussion
ended up in a different forum, and we do actually warn on stuff like self
assignment.

Understand that it is often precisely the people working on a compiler that
_do_ deal with the cases of when we discover widespread UB in real code and
whether or not the code is incorrect and must be changed or the standard is
incorrect and must be fixed.

But all of that is about a warning, and not about the actual question of how
the compiler and the system library interact.

Note that both Clang and GCC have at least some variants of this bug, and so I
don't actually expect glibc to break this in the foreseeable future. But we're
still actively working to defend against this on platforms for which we don't
have a guarantee.

~~~
haberman
I can't see an argument here for why the standard should be changed. memcpy()
does one thing and memmove() does another. Changing the standard so they are
the same just because memcpy() gets misused in real-world programs doesn't
make sense to me. If people misuse memcpy(), then have a compiler flag akin to
-fstrict-aliasing where memcpy() gets rewritten to memmove() unless people
compile with -fstrict-memcpy.

That seems a separate issue though from what calls the compiler should emit. I
am not following why LLVM would emit memcpy() when the semantics it actually
requires from libc are memmove(). Am I missing something?

~~~
Joky
It is not about making memcpy behave the same as memmove: in this case the
difference would still be between exact identity (src == dest) and partial
overlap.

~~~
haberman
I see, thanks for clarifying. I agree there could be reason to consider a
change in that case.

------
nikic
The bug report contains a quote from the C++ standard: "An rvalue or lvalue t
is swappable if and only if t is swappable with any rvalue or lvalue,
respectively, of type T"

What does this statement mean and why is it not tautological?

~~~
yelnatz
If you extract it for only one case, it's easier to read:

    > An rvalue t is swappable if and only if t is swappable with any rvalue of type T

------
ben0x539
You don't even need swap. t = t also emits a memcpy.

------
hendzen
Specifically in the case of a self-swap, which is a somewhat odd thing to do
in the first place.

~~~
dkopi
Imagine reversing a string:

    
    
      while(pStart <= pEnd)
           swap(*pStart++, *pEnd--);
    

If swap doesn't work for a self swap, this code would break in the case of an
odd length string. It's an easy fix in this case, but being able to swap an
element with itself is convenient.

~~~
Matheus28
A more common case would be swapping a vector element with the back of the
vector and popping it off.

~~~
plorkyeran
It was common pre-C++11, but these days you should just be doing a move and
pop rather than a swap and pop.

~~~
adrianN
Luckily all existing code has been modernized to C++11 /s.

------
steplee
Aside: am i crazy for pronouncing clang see-lang in my head?

~~~
boulos
No, you're not crazy. Some people definitely pronounce it C-lang, while others
produce a banging sound (klang). I prefer the banging sound ;)

~~~
masklinn
The banging sound is also the one used by the developer team:
[http://lists.llvm.org/pipermail/llvm-dev/2008-July/015629.html](http://lists.llvm.org/pipermail/llvm-dev/2008-July/015629.html)

------
boulos
IIRC, there are passes that also detect memcpy-"equivalent" loops and replace
them with the memcpy builtin (mostly to get the vectorization benefit). And
of course there's SimplifyLibCalls (or whatever it's named) that undoes this
or any other "you called a standard library function we know how to inline in
this case".

~~~
majewsky
The second one sounds like big fun when you LD_PRELOAD a different memcpy
(say, for debugging purposes) and spend hours and hours trying to figure out
why it is not being called.

~~~
boulos
Yes, it's a nightmare, particularly on OS X with its funny loader semantics
that make replacing malloc and free correctly very painful. For example, you
can easily LD_PRELOAD your way out of malloc() but did you remember that
asprintf also calls an allocation routine and won't use your new malloc?
Enjoy.

~~~
comex
Hmm, why won't it? Maybe in the past when asprintf and malloc were located in
the same dylib, since OS X dylibs generate direct calls to other public
functions in the same image (rather than going through the PLT as GNU does by
default), but for many years /usr/lib/libSystem.B.dylib has been split into
several reexported libraries in /usr/lib/system, so there shouldn't be an
issue there. Probably better to use this technique though (should still work
in the latest version but needs mprotect):

[http://eatmyrandom.blogspot.com/2010/03/mallocfree-interception-on-mac-os-x.html](http://eatmyrandom.blogspot.com/2010/03/mallocfree-interception-on-mac-os-x.html)

------
raverbashing
I wonder if it's possible to identify that both objects are the same (or it
might just be easier to use memmove instead).

------
SFJulie
Reading most of the LLVM bug threads so far about these "undefined behaviours"
I see a pattern.

LLVM's goal is performance over correctness. Stop.

If old code breaks, it is because you played with fire; correct your code to
eliminate these undefined behaviours. Stop.

We made wonderful coloured warnings to help you. Full stop.

Well, on one hand I get the point: people played with fire. Code should be
"correct" by standards that are emerging, but I can predict one thing: before
the new "correctness pattern" that enables lots of awesome, robust,
high-performing code spreads, there will be chaos.

My first question is: how much time will the LLVM team need to build a
completely correct, stable compiler without any surprises? If it takes two
decades, there will be a lot of chaos involved...

My second question is: can the code that "is incorrect" per LLVM's standard be
fixed without rewriting the source code (I guess not), and how much will it
cost before all the code is "corrected"? Do we even have the manpower to do
it?

You see, some C code is used every day but is so complex that hardly any
developers, even well-paid ones, like to look at it (openssl, gnutls, libpng,
zlib, dhcpd, bind...). And with platform-specific ASM, we may discover that
code has lost portability, and it may take a lot of time to discover that it
is already happening.

Don't we risk another strike of bad news one day, like discovering that old
software/libraries behave differently and need to be rewritten, and that no
one wants to make the same effort as LibreSSL, especially since some software
has a LOT of dependencies on C code (I am not thinking too loudly of PHP or
Python, or Perl)?

Of course there are unit tests, but there are so many of them that some (like
those for Python on FreeBSD with curses) have been disabled, because "no one
has the time to fix them". It is of course a bad example in this precise case,
but in context it feels like a growing ghettoisation between the
CPU/OS/language combinations with lots of eyeballs/investment and the old/poor
market, especially non-company-backed open source software. A smell of
obsolescence by attrition of the resources to report bugs and rewrite code.

As a former experimental physicist I can feel a transition coming; it smells
of chaos and instability. The question is always: how long, how smooth, before
the new equilibrium, if we ever reach it, and what will break in the process?
And since transitions are transitions, we may reach a new equilibrium, but
nothing guarantees it will be reversible, hence some stuff may disappear in
the process forever.

I don't know why, but I guess the most vulnerable ecosystem to the changes to
come is the non commercial open source.

~~~
lmm
Much of the transition has already happened - most applications have already
moved on from C to safer languages. What's coming is either the replacement of
C libraries (Rust, very careful C rewrites), or perhaps bypassing them
entirely (OCaml unikernel work).

I would fear more for proprietary systems. Open source is relatively
innovative and adaptable.

~~~
SFJulie
Still it sounds like the death of portability.

We are kind of crowning the x86 computer industry's right to rule. LLVM is
very tied to Apple; maybe it can extend to ARM. But it means that some HW
platforms that are too old may be difficult to maintain in the future (HPPA,
DEC Alpha, MIPS, PDP, z series?).

And some computers have life cycles of more than 10 years (expensive
industrial robots, telco switches, some medical devices, aeronautic/space
CPUs, radars, some very old mainframes used for accounting, embedded
automation devices)...

Is this push to obsolescence really cool?

Okay, critical systems already handle the problem, but what about the stuff in
the grey zone? Stuff that was not critical but got adopted nonetheless because
it just worked and could be updated?

~~~
lmm
Huh? Higher-level languages are much more portable than C - you only have to
port the language runtime, and often someone's done that already. LLVM has a
lot of different backends available.

~~~
SFJulie
Hum... well. If you imagine that they behave identically on every platform
and are 100% fully supported, it would work.

But as I stated in my example, Python/Perl/C# and probably others are dropping
support for some platforms, or supporting only x% of the features on some
platforms...

I dare you to access your services using C#/Mono on OpenBSD.

~~~
lmm
I'm running FreeBSD as it happens and C# works absolutely fine - I have even
just downloaded random .net exes and run them and had them work.

If you had a race where you picked a random C program and a random
python/perl/C# program off github/sourceforge/whatever and tried to run them
on OpenBSD, I'm pretty sure 9/10 times the python/perl/C# would win.

