What every scientific programmer should know about compiler optimizations? (acm.org)
94 points by blopeur 81 days ago | 31 comments

The paper covers inefficiencies in compiler-generated binaries, either from missed optimization opportunities or from the compiler's own poor code generation.

The paper shows that some of these perceived inefficiencies can be resolved by manual changes on a few lines of code, such as eliminating redundancies. However, the paper then demonstrates that such manual changes can sometimes make the results worse because a particular compiler might not be able to perform inlining or vectorization.

The paper concludes the optimization study by stating, "no compiler is better or worse than another production compiler in all cases." (The study was on GCC, LLVM, and ICC.)

That conclusion is hardly insightful. We don't expect one compiler to be universally better than another in every single conceivable situation. Of course, one can still build benchmarks that evaluate how well a compiler performs on typical numerical code, for instance.

Yes, however typical users do appear to expect or claim that, normally for the Intel compiler, on the basis of myth and not measurement and understanding. I'm not convinced there is typical numerical code across scientific applications, though; consider the behaviour on the Polyhedron Fortran benchmarks, for a start.

I used to work with some folks who believed the Intel compiler was amazing. I think the logic to the myth was, "Who would better know how to optimize for Intel chips?".

However, in my experience icc crashed more than any other compiler I've ever used (internal compiler errors). That kind of a bug in a compiler makes me really wonder what else is wrong under the hood. And of course, being closed source, I can't possibly know...

Indeed, and you can be told you have to use it even when it won't compile the code, or the result won't run correctly (and obviously you should use it and MKL on AMD hardware too). At least it's not as bad as the infamous "Pentium GCC" from long ago.

The conclusion is just a restatement of the No Free Lunch theorem in optimization [1].

[1] https://en.m.wikipedia.org/wiki/No_free_lunch_theorem

A lesson I learned as an undergraduate: Sometimes changing optimization settings can yield different answers.

It's been ~17 years since then, so I can't recall the options used, but seeing different results emerge when I used a higher optimization option caused an unanticipated puff of smoke from my little brain.

A possible moral of the story -- test every little change.

I used to be a compiler developer and this was no big deal (except when it was an actual code generator bug!). The compiler switches tell the compiler to do different things, so of course some of them change the output semantics. It’s up to the user to decide what she wants — it’s not like the differences are obscure or surprising.

Then I went on to other things and realized that developers are experts in their domains, and those “obvious” things were only clear to someone steeped in the code — not obvious at all. In fact some of them are no longer obvious to me any more after decades doing other things. Hopefully I’ve become at least slightly less arrogant and thoughtless by now.

I used to be a compiler maintainer because I was the maintainer for a large scientific suite -- i.e. I was on both sides of the fence -- and I haven't really changed my opinion. The arrogance I've seen has mainly been from scientific users who lecture you about the code you've worked on and the language standard, neither of which they know. A classic case is ignoring the Fortran storage association rules (that happen to have some significance in GCC history).

Are you talking about things like very small differences in the results of floating point calculations?

Some optimization switches are off by default as they can subtly change the semantics of your code.
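To make that concrete, here is a minimal Python sketch (not compiler output) of what reordering additions does to a sum. It assumes a hypothetical 2-lane vectorized reduction, the kind of transformation that changes the order of floating-point additions and so is typically only allowed under relaxed floating-point options. The function names `sum_sequential` and `sum_lanes` are made up for illustration.

```python
def sum_sequential(values):
    """Add the values strictly left to right, as the source code is written."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_lanes(values, lanes=2):
    """Model a vectorized reduction: one partial sum per lane, combined at the end."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    return sum_sequential(partial)


data = [1e16, 1.0, -1e16, 1.0]
print(sum_sequential(data))  # 1.0 (the first 1.0 is absorbed by 1e16)
print(sum_lanes(data))       # 2.0 (the large terms cancel in their own lane)
```

Both results are "correct" for the order of operations performed; only the second happens to match the exact sum here.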

Yes, this can trip up even quite knowledgeable people as with the fp post from Lemire the other day — he thought gcc was making a mistake, but it was doing the correct thing.

But there are plenty of others, for example telling the compiler not to worry about aliasing. I might use this on certain modules or individual files which were carefully written for this.

Or asking for more aggressive loop unrolling, which could increase code size but improve performance (or not!).

99.9% of the time the defaults are fine.

Very small errors in floating point calculations can be made arbitrarily large.

There are domains where floating point problems being ill-conditioned is the norm rather than an exception. Try running `1e100+1e20-1e100` in python, js, etc....
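For anyone who wants to try it, a small Python sketch of that expression. The comparison against exact rational arithmetic and `math.fsum` is my addition, not something from the paper; it assumes the usual IEEE 754 binary64 floats.

```python
import math
from fractions import Fraction

terms = [1e100, 1e20, -1e100]

# Plain left-to-right evaluation: 1e20 is far below the spacing of
# doubles near 1e100, so it vanishes in the first addition.
left_to_right = (terms[0] + terms[1]) + terms[2]

# Exact rational arithmetic on the same three doubles.
exact = sum(Fraction(t) for t in terms)

# math.fsum tracks the lost low-order bits and rounds once at the end.
compensated = math.fsum(terms)

print(left_to_right)  # 0.0
print(float(exact))   # 1e+20
print(compensated)    # 1e+20
```

The inputs are all exactly representable, yet the naive answer has no correct digits at all.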

I'd say the moral is to write standard-conforming, well-defined code, and use standard-conforming compiler options, e.g. not the Intel defaults. Also use the available warnings and diagnostics.

Well-formed code is not sufficient. For example, C11 says this about floating point:

> The accuracy of the floating-point operations (+, -, *, /) and of the library functions in <math.h> and <complex.h> that return floating-point results is implementation-defined, as is the accuracy of the conversion between floating-point internal representations and string representations performed by the library functions in <stdio.h>, <stdlib.h>, and <wchar.h>. The implementation may state that the accuracy is unknown.

There was an article recently talking about the bizarre rounding that happens when the x86 FPU registers get involved. You get one rounding for operations occurring in the register (which is 80 bits wide) and another if you get kicked out of the register and into RAM (which is only 64 bits wide, for doubles).
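Python has no 80-bit type in the standard library, so the following is only an analogy, not a reproduction of x87 behaviour: binary64 plays the role of the wide register and binary32 plays the role of the narrower memory slot. The helper names `to_float32` and `scaled_excess` are hypothetical.

```python
import struct


def to_float32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def scaled_excess(x, spill):
    """Compute ((1 + x) - 1) / x, optionally 'spilling' the intermediate."""
    a = 1.0 + x
    if spill:
        # Stand-in for the intermediate being stored in a narrower format.
        a = to_float32(a)
    return (a - 1.0) / x


tiny = 2.0 ** -30
print(scaled_excess(tiny, spill=False))  # 1.0
print(scaled_excess(tiny, spill=True))   # 0.0
```

Same source expression, same inputs; the answer depends only on whether the intermediate kept its extra bits.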

I've seen that kind of rounding error in code that went through emscripten.

The initial C++ code predated std::log10 so it was implemented using a change of base, i.e. log(value)/log(10). When this was translated into Javascript, it was as Math.log(value) / Math.log(10) which rounds to 64 bits each step of the way. In this particular calculation, this caused error to build up over time.

Replacing the C++ code with std::log10 translated into Javascript as Math.log10 and since the browsers natively implemented that, the values would stay in registers and only round once.
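A rough Python analogue of the two formulations, for comparison. This is not the original C++ or the emscripten output, and whether (and where) the two disagree depends on the platform's math library, so no expected output is shown. `log10_change_of_base` is a name I made up.

```python
import math


def log10_change_of_base(value):
    """log10 via change of base: two rounded logarithms plus a rounded division."""
    return math.log(value) / math.log(10.0)


value = 1000.0
print(log10_change_of_base(value))
print(math.log10(value))

# Exact powers of ten where the two formulations disagree on this platform.
mismatches = [
    10.0 ** k
    for k in range(1, 16)
    if log10_change_of_base(10.0 ** k) != math.log10(10.0 ** k)
]
print(mismatches)
```

If the list is non-empty, those are inputs where the extra rounding steps already show up in the last bit, before any accumulation.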

I said "well-defined" for that sort of reason, but there's more. Fortunately I haven't had to discuss -ffloat-store (which may have been originally for m68k) for a long time.

I've skimmed the paper, and I'm not sure what to make of it. Definitely you need tools, but even basic profiling is a good start and quite rare. Suggesting changing source code because of the behaviour of a particular compiler version's optimizer is a bad idea generally, especially without checking the optimization reports and investigating tuning the available parameters, like cost models. Optimizers can be quite unstable for good reasons.

The paper isn't concentrating on what are probably the two most important performance governors these days, vectorization and fitting the memory hierarchy, which can make a factor of several difference. ("Your mileage may vary", of course.) Then it isn't obvious whether comparing different generations of compilers was even sufficiently controlled; their default tuning may be for different micro-architectures which were considered the most relevant at the time. GCC performance regressions are monitored by maintainers, but I wonder what real optimization bugs uncovered in a maintained version have been reported as such from this sort of work.

In large scale scientific code, such compiler optimizations may be relatively unimportant anyway compared with time spent in libraries of various sorts: numerical (like BLAS, FFT), MPI communication, and filesystem i/o. Or not: you must measure and understand the measurements, which is what users need to be told over and over.

Video: https://www.youtube.com/watch?v=YbHzM6bNTXk

CCTLib, a framework for fine-grained monitoring and execution-wide call path collection: https://github.com/CCTLib

Why does this title have a question mark at the end? (The ACM did it -- it was copied accurately here?)

How does this pan out for major sci libraries, like NAG, IMSL/MKL, GSL, BLAS, LAPACK etc.? Do distributions control the optimization levels used with a particular compiler, or just make sure it simply passes the tests?

I'm seeing these kind of titles (what every X programmer should know about Y) quite frequently which seems like the trend of adding "for humans" in front of library names.

>I'm seeing these kind of titles (what every X programmer should know about Y) quite frequently which seems like the trend

Some call it a "snowclone". (Previous comment: https://news.ycombinator.com/item?id=19072271)

It's a headline writing rhetorical device that's been with us even before computers. E.g. newspaper headlines.

Are you saying that we should replace "by {poster}" with "posted with love by {poster}"?

Only if there really is love behind it. Otherwise, such a message is fraudulent.

I guess I need to have a talk with the Atom devs about expectations and boundaries.

The question mark at the end was new.

They can sometimes be more interesting than 'X considered harmful' articles.

I found the title irritating and childish, as if the author knows what all scientific programmers need or require.

Doesn't the question mark at the end make up for the arrogance of the rest of the title?
