
Understanding GCC Builtins to Develop Better Tools [pdf] - mpweiher
https://www.manuelrigger.at/papers/GCCBuiltins-ESECFSE19-preprint.pdf
======
teddyh
Not a builtin, but a GCC feature which has improved my C code hugely is
__attribute__((cleanup)). It is used to specify a destructor for a local
variable; when the variable goes out of scope the function is called with a
pointer to the variable as an argument.

Example:

      {
        __attribute__((cleanup(foo)))
        int a;
        …
        /* foo(&a) will implicitly be called here */
      }

~~~
therein
Didn't know this existed. The closest thing I can think of is Go's `defer`.

------
jart
It'd be great if the GCC devs adopted a process for introducing new builtins
that required code review, unit testing, and justification of use cases.

Some of the GCC builtin wrappers are outstanding, e.g. __builtin_memcpy on
two-power sizes, since it can do things that wouldn't be possible otherwise.
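
One concrete case is the classic type-punning idiom (float_bits is an
illustrative name, not from the thread): with a small power-of-two size known
at compile time, GCC typically lowers the copy to a single register move
rather than a library call.

```c
#include <stdint.h>

/* Bit-level reinterpretation of a float. With a fixed 4-byte size,
   GCC typically compiles this __builtin_memcpy to one register move,
   with no call to the C library's memcpy. */
static uint32_t float_bits(float f) {
    uint32_t u;
    __builtin_memcpy(&u, &f, sizeof u);
    return u;
}
```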

Some GCC builtins, e.g. __builtin_va_start, can take away flexibility, since
they lock nontrivial code generation into the core codebase that's suboptimal
in certain cases.

Most of the builtins are neutral, e.g. the atomics API, which provides 42
different ways to say asm("mov"), 13 different ways to say asm("xchg"), and
twenty different ways to say asm("lock cmpxchg").

Other GCC builtins, e.g. __builtin_ia32_rdrand32_step, don't even have the
multi-architecture benefit; they only seem to exist because some folks
disagree with the asm() keyword, and they're naturally problematic from a
security standpoint when the GCC devs get them wrong.

What's great about a primitive like asm() is that it's one simple
general-purpose abstraction that puts the power of many compiler internals
into the hands of the user at once, rather than another narrowly focused,
case-by-case feature you need to beseech the core dev team for.

~~~
earenndil
The main problem with assembly, afaik, is that gcc doesn't know how to
optimize even very simple assembly.

~~~
canarypilot
GCC treats inline assembly as an opaque blob of text it knows nothing about,
plus some register constraints it does know something about. It will “optimise”
assembly if you’ve passed the right registers as input/output, and haven’t
marked it volatile and memory clobbering. Simple things like eliminating it if
it is dead code, but this probably isn’t the optimisation you were looking
for!

Depending on who you ask, this is a feature, not a defect. Inline assembly is
your last line of defence against a compiler which thinks it knows better than
you.

Clang/llvm go a step further and do parse and optimize your inline assembly;
that makes me uncomfortable, but I’ve always subscribed to the “don’t touch
user asm” mindset for compiler development.

The main benefit of builtins over assembly is that, no matter how hard I
tried, I couldn’t get the replacement the parent suggested for
__atomic_compare_exchange to assemble on my arm64 server!

~~~
jart
I've never coded directly for ARM, but if it's anything like x86, then this
example might help you (on GCC6+):

    char DidIt;
    asm("lock cmpxchg %3, %1"
        : "=@ccz"(DidIt), "+m"(IfThing), "+a"(IsEqualToMe)
        : "r"(ReplaceItWithMe));

~~~
canarypilot
Thank you for assuming the best from me, and providing a genuine attempt at
help in response to my much more bad-faith remark.

Two things complicate the AArch64 (Arm64) story for atomic builtins.

One is Arm’s weak memory model. This permits different, more efficient
implementations of compare-exchange depending on which of the C++ ordering
guarantees you ask for. A compiler will generate different code for the
sequentially consistent model than it will for acquire/release.

The other is that there are two ways to provide atomics on Arm, and which you
have access to depends on the architecture revision implemented by your
particular Arm hardware. If you have Armv8.1-A atomic operations available to
you, you’ll use a similar interface to x86; a single instruction which looks
like a cmpxchg. If you don’t have the atomics extension, you’ll instead be
using a load-linked/store-conditional loop of ldxr/stxr for all of your atomic
operations.

Getting this exactly right is tricky, so where possible I’d encourage using
the atomic builtins (or, of course, your language’s portable API) over
assembly.
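
As a sketch of what that portable path looks like (cas_int is an illustrative
wrapper; the builtin itself is documented GCC/Clang API):

```c
#include <stdbool.h>

/* One call, many lowerings: x86 gets lock cmpxchg; AArch64 gets
   either a single CAS instruction (Armv8.1-A LSE) or an exclusive
   load/store loop, depending on -march and the orders requested. */
static bool cas_int(int *p, int expected, int desired) {
    return __atomic_compare_exchange_n(
        p, &expected, desired,
        false,             /* strong CAS: no spurious failure */
        __ATOMIC_SEQ_CST,  /* ordering on success */
        __ATOMIC_SEQ_CST); /* ordering on failure */
}
```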

The biggest benefit is that if architecture technology moves forward again in
the future, the GCC and language builtins will let you move with it, while
assembly leaves you rewriting by hand every so often.

~~~
jart
I think it's awesome that GCC provides builtins that help ARM devs generate
the best atomic code possible, for ARM. As someone who only cares about her
code running natively on x86, I would obviously never use C/C++11 atomics,
since it's kind of regressive for a high-level language to specify 42
different ways to say "MOV". How would I even know my code works? The way I
see it, the "portability" here only flows in one direction: helping
ARM/MIPS/etc. folks move to x86. The only thing I don't get, is why it's part
of the language standard.

------
TazeTSchnitzel
It surprises me that more of these haven't become standardised features.
__builtin_expect is very common (though as a compiler hint I somewhat
understand its absence), and __builtin_clz is arguably a missing feature.
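
For instance, the canonical use of __builtin_clz, with the caveat that it is
undefined for zero (ilog2 is an illustrative name; this gap was later filled
by C++20's std::countl_zero and C23's <stdbit.h>):

```c
/* Floor of log2 for a nonzero value via count-leading-zeros.
   Assumes a 32-bit unsigned int; __builtin_clz(0) is undefined
   behaviour, so callers must check for zero themselves. */
static int ilog2(unsigned x) {
    return 31 - __builtin_clz(x);
}
```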

~~~
jart
I suspect (but haven't verified) that modern compilers actually implement
__builtin_expect() as a no-op. Much better alternatives exist today, e.g. GCC
8+ __cold__ and profile-guided optimization, which don't have a negative
impact on code readability.
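
For reference, the two styles side by side (likely/unlikely and alloc_failed
are illustrative names; whether a given GCC version still folds the hint into
branch layout is exactly the unverified part):

```c
#include <stddef.h>
#include <stdlib.h>

/* The classic hint macros around __builtin_expect; they never change
   the value of the condition, only (possibly) the branch layout. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* The newer alternative: mark the rare path cold, so GCC optimises
   it for size and moves it away from the hot code path. */
__attribute__((cold, noinline)) static void *alloc_failed(void) {
    return NULL;
}

static void *checked_malloc(size_t n) {
    void *p = malloc(n);
    if (unlikely(p == NULL))
        return alloc_failed();
    return p;
}
```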

