
Clang Format Tanks Performance - ingve
https://travisdowns.github.io/blog/2019/11/19/toupper.html
======
BeeOnRope
Author here, happy for any feedback.

I'll own up to misleading-and-possibly-clickbait title, I just gave up trying
to think of anything better without revealing the conclusion.

~~~
JonathonW
Just to make sure I'm understanding this correctly: the performance issue you
found actually amounts to include-order dependent behavior in _GCC_ and glibc;
clang doesn't really have anything to do with anything here except that you
used its code formatter to sort your include statements.

If that's the case, the clickbait-y headline's bordering on useless, since the
performance differences here have _nothing_ to do with clang itself-- you'd
see the same results if you had ordered your includes the same way by hand.

~~~
BeeOnRope
I'm talking about clang-format, a formatting tool for C and C++ code, which
has nothing to do with _compiling_ your code with clang. You can use it with
any compiler.

That said, the title is still a lie in that clang-format is not really at a
fault here at all: clang-format _triggered_ the issue on my codebase, by
sorting header files, and this in turn triggered a libc performance issue.

So as I initially say it, a commit which purely clang-format 'd my codebase
caused a massive regression - but the full story is doesn't implicate clang-
format at all, really.

~~~
glandium
The title would be a /little/ clearer with a dash between Clang and format.

~~~
BeeOnRope
Good point, I put a hyphen a few places but not everywhere nor in the title.
It seems hyphen is definitely the standard though.

Fixed now [1], although the HN title will remain hyphen-free free I guess.

\---

[1]
[https://github.com/travisdowns/travisdowns.github.io/commit/...](https://github.com/travisdowns/travisdowns.github.io/commit/95c6788e011eaa9638e12223d6d7dbd0465e5d8f)

------
pdimitar
I may eat downvotes but here's an honest opinion nonetheless: this article
shows almost exactly why I wouldn't touch C++ with a ten-foot pole still,
after running away screaming from it 11-12 years ago.

Not saying it's an unique problem with C++'s tooling. I'm just saying that it
has been my experience that one is much more likely to stumble upon such
horrors while working with C++ as opposed to at least 7 other languages I
worked with.

I'll take the slower Rust and Go and Elixir, thank you very much.

Don't take this as trash-talking, though. I simply want to get things done,
not fight my environment.

~~~
wffurr
Rust is slower than C++?

~~~
The_rationalist
Yes, at best 20% slower but can be far more slower if you hit worst case
scenario.

~~~
Ygg2
Where did you pull those numbers?

Looking at [https://benchmarksgame-
team.pages.debian.net/benchmarksgame/...](https://benchmarksgame-
team.pages.debian.net/benchmarksgame/fastest/gpp-rust.html)

While the worst case is 40% slower (where even C perfored abysmally), most of
the time the Rust is within a 10% margin of error, and in quite a few cases
it's faster.

~~~
NoZZz
Strange benchmark, comparing llvm(rust) to gcc(c++ code)

------
abainbridge
Isn't the correct way to write this loop actually:

    
    
      void toupper_cmov(char* buf, size_t size) {
          for (size_t i = 0; i < size; i++) {
              char c = buf[i];
              buf[i] = c - 32 * (c >= 'a' && c <= 'z');
          }
       }
    

Because that persuades gcc and clang to transform the conditional branch into
a conditional move which avoids the branch mispredict penalty and removes the
need for a lookup table.

See [https://godbolt.org/z/xHYKft](https://godbolt.org/z/xHYKft)

Disclaimer: I haven't actually run this function. That would be too much like
work.

~~~
nocturnial
Why not use -march=native? When I look at it, I hope the compiler knows what
it's doing wrt cache and whatnot.

[https://godbolt.org/z/zRQRkr](https://godbolt.org/z/zRQRkr)

~~~
abainbridge
> Why not use -march=native?

The conditional jump remains even with -march=native.

[https://godbolt.org/z/2p7uAB](https://godbolt.org/z/2p7uAB)

~~~
nocturnial
Yep just noticed it.. wanted to delete the comment but was to slow apparently

------
thestoicattack
You also get the slow version if you include <cctype> instead of <ctype.h>.

[https://godbolt.org/z/cbeGc4](https://godbolt.org/z/cbeGc4)

~~~
BeeOnRope
Yes, I saw that too and mention it:

> As a corollary, if you include <cctype> rather than <ctype.h> (the C++
> version of the C header which puts functions in the std:: namespace) you
> also get the slow behavior because <cctype> ultimately includes
> <bits/os_defines.h>.

~~~
thestoicattack
Oh, so you do!

------
hombre_fatal
Please give us more than one pixel of color in the legend of your graph.

------
MiSeRyDeee
This is both technically in-depth while so hilarious. Love this post!

------
ufo
Can the std::transform version be written without a lambda?

    
    
        std::transform(buf, buf + size, buf, toupper);

~~~
BeeOnRope
Yes, it can!

In fact I had that at some point and clobbered it with a git reset (the perils
of developing on two machines and once and using git a sync mechanism).

Fixed [1] and credited.

\---

[1]
[https://github.com/travisdowns/travisdowns.github.io/commit/...](https://github.com/travisdowns/travisdowns.github.io/commit/cff0322d9b45d9ec18a8a069f7429589574b8b83)

~~~
account42
This could actually change performance though as the lambda version gives you
a unique templace instantiation of std::transform while the non-lambda version
requires std::transform to be inlined into the caller (or cloned to an
argument-specific version) before the compiler can make any optimizations
based on its knowledge of toupper.

~~~
BeeOnRope
Good point. I would _usually_ expect std::transform to be inlined, since it is
quite simple - but it may not be and it's a very good point in general
regarding the optimization probability for function pointers versus lambdas.

Ultimately, I ended up reverting the change, so using a lambda based on the
advice the taking the address of a standard-library function is verboten,
since they may introduce overloads at any time [1].

\---

[1]
[https://twitter.com/foonathan/status/1197051249822195712](https://twitter.com/foonathan/status/1197051249822195712)

------
lazyjones
Messy (esp. standard library) header files like in this case is why C/C++
isn't really fun today, for me anyway.

------
_kst_
> I was including stdlib.h just to get the definition for size_t.

size_t is also defined in stddef.h, and including that likely won't cause the
same issues.

------
ufo
A fascinating read. Is there a sane way to fix this in a future version of
glibc?

~~~
BeeOnRope
I mean if you wanted the inline body to be visible to C++ you just have to
slightly reorganize the ctype.h header so those definitions are available to
C++ code (i.e., visible when __cplusplus is defined), because right now the
bodies are getting caught the same #ifdef net that macro definitions are (but
the body is not a macro).

That said, any exposure of implementation details, including inline bodies
like this, is risky and constraining for future changes, since you bake in
part of the implementation and also commit to supporting that implementation
as far as you backwards compatibility strategy requires.

So it is entirely possible that the non-inlining is intentional, but then the
part about include ordering mattering doesn't make sense, and if you are
willing to commit for C programs, you are probably willing to commit in C++
too. You'd have to really spelunk the history of this files or ask the people
responsible to be sure.

It seems more likely that that conditional compilation block is intended to
avoid implementing standard library functions as macros, which is probably
barred by the C++ standard (e.g., so you can use the name `std::tolower`) -
and the inline bodies just got caught in the crossfire.

------
john-radio
Yeesh, I can't read assembly. Is there somewhere I learn what all movsxd, lea,
cmp, ja, and friends do in plain english?

~~~
slavik81
Follow the links in the article to the godbolt compiler explorer. The
disassembly there has a short description of each instruction when you hover
over it, and links to the full reference.

------
LessDmesg
Order of headers should not matter in any sane language. This just proves that
C++ is not a sane language. Yet Bjarne Stroustrup keeps getting engineering
awards! If I were king of programming languages, I would forbid any new lines
of C++ to be brought into this world, and install mandatory programmes for
rewriting legacy C++ code in sane languages.

~~~
intea
But the root cause here lies with GCC or rather glibc and not the language?
C++ is but a tool, just like every other language.

~~~
devit
It likes with C++ not having a proper module system enforcing properties like
module import order being irrelevant, and the glibc developers apparently
failing to produce modules having such properties even if the language doesn't
enforce it.

------
sesuximo
cctype is such garbage. not real functions, takes/returns int, not contexpr,
and now this trash. bleh

------
goalieca
Link time inlining would solve this.

~~~
BeeOnRope
In a sense, yes - specifically if libc could be link-time inlined.

I don't think many people are proposing that for libc though because it would
shut down the possibility to patch libc for anything that has been link-timed
inline, and more generally expose the internal implementation of libc in a way
that will make evolving it very difficult.

That is, link-time inlining will see a lot more use within an application
before it will ever apply to system libraries.

~~~
quotemstr
Applying LTO to libc would be no worse than statistically linking it. I don't
like static linking, but on Linux, we're stuck with supporting it.

~~~
gpm
Isn't glibc almost always dynamically linked on linux? I thought half the
point of musl was that it supported static linking.

~~~
quotemstr
Almost everyone _does_ dynamically link Glibc. You should dynamically link
glibc too. But if you really want, you can statically link glibc.

    
    
        $ dpkg -S '*/libc.a'
        libc6-dev:amd64: /usr/lib/x86_64-linux-gnu/libc.a
    

But please _don 't_. Static linking doesn't give you a compatibility
advantage: dynamically linked glibc is specifically designed to be forward and
backward compatible. Static linking only makes life more difficult later,
since it complicates use of NSS and PAM and precludes nice, clean shims
between user and kernel mode. Seriously. It was a mistake for Linux to make
the kernel-user transition the stable system ABI instead of putting that
stability boundary at the libc layer like most other operating systems.

As for static linking being a competitive advantage of musl: I don't know
where you got that idea. Musl is less featureful than glibc, not more. In
particular, I'm not going to use it until it supports dlclose like every other
full-featured computing platform.

~~~
gpm
> As for static linking being a competitive advantage of musl: I don't know
> where you got that idea.

That's how I've been told to build fully statically linked rust binaries, I
can't say I've dug into why glibc isn't used for those.

~~~
glandium
The usual reason is that if you want a binary that can be used "everywhere",
you can't just build on your machine without more thought if you use the glibc
because chances are you're going to be using symbols that aren't available in
older versions of the glibc, making your binary incompatible with many (older)
distros. musl is an easy way out of that. Another is to build against an old
glibc.

~~~
quotemstr
The right way to solve this problem would be to teach the linker to link
against the set of glibc symbols that existed as of a particular version
number or date. Then you'd just pass a flag (along with a preprocessor macro,
so the headers would know to define the right version of the linker symbols)
specifying this version target when you built your program. For want of this
relatively simple toolchain feature, we have people statically linking entire
C libraries. It's a shame.

You can do a moral version of the same thing using linker scripts, but nobody
bothers with linker scripts.

~~~
glandium
The macOS SDK does this better. It is indeed a shame that the glibc doesn't
care.

------
krzat
It's 2019 and people are still using languages where order of imports
matters...

