#ifdef considered harmful (1992) [pdf] (usenix.org)
39 points by Annatar 6 months ago | 46 comments

I’ve been maintaining unifdef since about 2001. One of the rules I have set myself is to never use #if in unifdef. It’s an easy program to port, so this isn’t very hard :-) but Microsoft’s appalling C libraries make it more painful than it should be.

The main technique is to set the include path depending on the behaviour of the target platform, and have per-platform variant header files. For unifdef, this basically means standard vs windows.

The other thing I don’t have to deal with is build-time configuration options or feature switches. It’s easy to get into a combinatorial explosion with #if but kinda hard if you force yourself to reify each option as a pair of include files!

Isn't this just moving the goal posts? Now all of your #ifs and the complexity they bring are in your build system.

It already was in the build system. The macros that control conditional compilation when one is doing it with the preprocessor do not appear out of thin air. A build system has to set them appropriately in the first place.

All of the mechanism for auto-detecting (or manually specifying) the capabilities of the target have to be there. It is simply a case of whether they set preprocessor macros via command-line options or tell the build system to use alternative source files.

The latter has the advantage of only creating build dependencies against the modules that actually use the library functionality in question, rather than against the compiler command line and potentially forcing a re-build of everything (when an honest dependencies mechanism is being used).

For gRPC we needed the code to work on all major OSes and had to support a number of different build systems. In no case were the capabilities of the target system manually specified, as that wouldn't have been sustainable. And specifying for each build system which files to include or not for each platform would have been onerous and hard to work with (as the person adding some platform-specific code probably isn't fluent in build systems for other platforms).

So the solution we adopted is a header file that detects the target platform capabilities. And each source file being either platform-agnostic, or specific to one platform. In the latter case, the whole file contents would be #ifdef'd out if compiling for a different platform.

Perhaps there are fewer conditionals in the build than there would be in the code.

But your build system will probably need that platform specific piece anyway, so the question is more "should we have one or multiple switches on platform type".

It bears noting that at some point, you will likely need to #ifdef. There's no way around it for non-trivial code. Simple example: getting the size of a file when that file is larger than the maximum value of a long, meaning you can't fseek() to the end and use ftell(); you'll need platform-specific APIs like stat(2) or GetFileSizeEx().

You can also do the platform checks in the build system instead and have multiple per-platform C files implement APIs for portability shims -- but given that the platform check is likely unportable (at least in make as specified by POSIX), you're just making different portability assumptions.

> You can also do the platform checks in the build system instead and have multiple per-platform C files implement APIs for portability shims -- but given that the platform check is likely unportable (at least in make as specified by POSIX), you're just making different portability assumptions.

And yet, it seems that's what the authors of the article did:

The best method of managing system-specific variants is to follow those same basic principles: define a portable interface to suitably-chosen primitives, and then implement different variants of the primitives for different systems. The well-defined interface is the important part: the bulk of the software, including most of the complexity, can be written as a single version using that interface, and can be read and understood in portable terms.

Let's not forget microcontrollers: the moment one wants to write something more universal (supporting different MCUs), using ifdefs is basically unavoidable.

Not when using one translation unit per MCU and letting the Makefile pick the right one.

Back in my C days, I was able to write portable code across Windows NT, Windows 2000, Red Hat Linux, Debian Linux, AIX, HP-UX, and Solaris, where, in my own code, #ifdef was only used for include guards.

Seconded. And it's only gotten better, as microcontroller code is about the best case for LTO, being both small and all in one final image. Now there's not even the specious argument about "artificial" compilation unit boundaries hurting performance terribly.

Also, you'd be surprised how heavy software can get without outgrowing single-unit LTO, as long as you don't try to pull in all of Qt/GTK or similar bloat.

(I don't see those as a bad choice, but they are too heavyweight for this style of LTO to work well.)

> Not when using one translation unit per MCU and letting the Makefile pick the right one.

Could you please link any examples of that I could examine myself?

Also, makefiles aren't really an option with platforms like Arduino; the IDE does not really support them.

People have written Makefiles for Arduino projects, for example this project:


To be honest, the simpler way is to do it "by hand": if you enable the appropriate options in your Arduino IDE you can see the actual compilation commands that are executed. Copy and paste them into a Makefile.

I'm very familiar with make, etc, but still use the IDE for my Arduino projects (simple as they might be). I just use Emacs as an external editor.

As you can see from my OS list, those days were a while back.

Basically I adopted some ideas from Pascal/Modula world, plus some tips from somewhere/someone during my university days.

Basically you consider a translation unit to be C's version of a module, and use abstract data types for the data structures.

The header contains the common code, and only in performance-critical code are macros used instead of functions (validated with a profiler that doing so is worthwhile).

Then you have module.c for the common code, followed by module_os.c for each platform.

Sorry for not providing Makefile rules, it has been a while.

Err, you can use the Arduino ecosystem with GNU make... look for it, there is documentation, though it leaves ease of use to be desired and requires an installation that drives me towards believing in chroot.

Sure I can, but it's ugly and cumbersome and doesn't work when you try to provide a library that could be used with the IDE.

I may be wrong, but I was under the impression that this was portable on all POSIX systems:

    uname_S := $(shell sh -c 'uname -s 2>/dev/null || echo not')
    ifeq ($(uname_S),Linux)
This is how git does its conditional compilation purely powered by Makefile logic: https://github.com/git/git/blob/master/config.mak.uname

Ack… please don’t do this. It will break if you’re cross-compiling binaries for a different OS than the OS you’re building on. If you want to do per-OS configuration in a Makefile or some type of configure step, that’s fine – but you should do it either based on a target triple (--host in autoconf-speak) which can be overridden by the user, or by running the given compiler on test programs and checking the output (like autoconf does, but ideally with one or two tests rather than a bazillion of them).

That requires GNU Make; neither $(shell) nor ifeq is in POSIX Make.

":=" assignments are not in POSIX either.

Makefile logic is basically just an ifdef one level up.

But your per platform build logic pretty much has to exist anyway on any non-trivial project, as differences in compilers (even different versions of the same compiler) crop up. Better to just stick that distinction all in one place, IMO.

Which is exactly why I'll often try to keep build configuration in C++ #ifdefs.

Everyone wants/needs proper Visual Studio projects in gamedev. And Xcode projects. Maybe makefiles on top of that for some of your other platforms. Maybe a project generator to generate it all. Maybe some .props files to express things that the project generator can't. And then CI build server configuration on top of that. And then, because you're trying out another CI solution, a second CI build configuration.

It's a mess of multiple unrelated tools - so it will never be in "one place" at the build system level. Shoving it in there anyway causes tons of duplicated build configuration logic. Alternatively, I could minimize the configuration in there - to just the bare minimum needed to drive high-level configuration - and then centralize all the derived configuration in the one thing that's shared between them all: C++.

C++ is a common denominator among my coworkers. When they have to dig through Gradle or Ant configuration because we're doing Android builds, I become a bottleneck. Lame. Yet another project generator? Ends up on my plate again. Lame. My coworkers have legitimately better things to be doing than becoming build system experts.

They already understand "hey, I can pull configuration #defines into a separate file and only rebuild the things that #include it", meaning minor settings tweaks don't invalidate several build-hours' worth of compiled translation units, something I still don't know how to do effectively in some build systems.

I mean, that's the whole point of cmake, ninja, etc. They'll build all of those projects, complete with testing and other ancillary projects. You don't have to be a "build system expert".

And you can absolutely make your build times better than what you're citing, coming from someone with binaries measured in tens of megs with templates out the wazoo. Just break up your dependencies. Seriously, you can almost certainly get ten to a hundred times faster with only a little bit of work (which more than pays for itself in increased productivity).

> I mean, that's the whole point of cmake, ninja, etc. They'll build all of those projects, complete with testing and other ancillary projects. You don't have to be an "build system expert".

Every company I've worked for has had a new and unique variation of which ones they use and how. Most recently, I've started submitting the occasional fix upstream for GENie, a premake fork. They're a horrifically leaky abstraction though, and invariably I need to fix them, extend them, or partially sidestep them. By the time you've gotten up to speed on two or three of them well enough to effectively manage all that configuration, you've probably become a build system expert, if only to debug when they fall over. I guess you don't technically need to be one, though?

> And you can absolutely make your build times better than what you're citing,

I mean, yes. It's an ongoing battle. I enumerated one of many improvements - breaking up your configuration dependencies.

> coming from someone with binaries measured in tens of megs with templates out the wazoo.

Typically I'm dealing with gigs of build outputs (tens of gigs including temps like .obj files) per individual cell of a rather large CI build matrix. My current job is no exception.

> Just break up your dependencies.

Trust me, I get it. I'm currently dealing with several hundred separately built projects. Coworkers are happy that VS2017 can load our .sln files faster, and we have our project generator able to select subsets of that in dozens of different ways.

But I work on large C++ codebases with macros. If I tweak something truly and fundamentally global - the configuration of if and how our assertion macros should expand, for example - no amount of "breaking up dependencies" is going to fix anything. I will have invalidated everything. The build server will rebuild everything, because that's a dependency of everything. Those macros are used in a large enough subset of TUs that we toss them in a precompiled header for improved build performance. That's legitimately the right choice, and it still means rebuilding literally every single TU.

When that's configured as inherited global defines in a project generator, as it almost always is, and people add their own configuration defines there, as they almost always do to "keep things in one place" and all that, suddenly tweaking a configuration option that is used in one place rebuilds the entire tree. So I go to fix that when I find myself needing to tweak them.

Say you have a network desync check assertion or debugging macro exposed by a networking module (we'll imaginatively call it 'network'). Where do you drive the #defines configuring when it's removed?

In the generator config for the network project's private defines? No, this is a public macro, stripping it from the consumers requires they see the config too.

In the generator config for the network project's public/inherited defines? Touching that now means rebuilding everything that uses the network module, which might be a lot more widely used than the macro.

In the generator config of each project that uses the macro, so they can specify which file(s) to make the configuration defines in? That's incredibly brittle and an invite to merge hell and a lot of duplicate configuration spread all over.

In config_network_desync_checks.h? You can #include it from just the header that defines the macro. Tweaking it will only rebuild the TUs that actually depend on that file. You can still put them in "one place" (e.g. a central directory) if you want and still retain those advantages. You don't have to rerun your project generator, wait for it to rebuild everything, click through Visual Studio prompting you to let it reload the project, watch it hang or occasionally crash as it reparses things... you can just edit, let your build system recompile a few TUs, relink as needed, and then just run.

If they aren’t build systems experts, they have no business programming. Seriously. That’s their job as programmers. What kind of a programmer doesn’t know build systems?

Plenty of webdev doesn't (or at least didn't, once upon a time) involve build systems at all - with that in mind, why would a webdev need to know? Even now, what use does a frontend developer have for C++ build rules arcana when they don't even touch the language? And how many thousands of lines of msbuild rules have you written? Are you sure you're a build systems expert? :P

Granted, I'm going to be extra thorough in a code review request from a webdev dabbling in C++. Similarly, I'd hope those same webdevs would be extra thorough in reviewing my dabblings in JavaScript. Programmers have different specializations - nothing wrong with that, properly supported. Some sacrifice breadth for depth, or vice versa, and there's nothing wrong with those choices either.

So what do you do then?

Any place you have to parameterize signatures, behaviors, or types, call it something meaningful and define it in a platform/compiler/OS-specific header.

Sure, in the end it's the same thing, but I find it easier to manage over time. And when you need to port to a new platform/compiler/OS, you have a list of the important dependencies at hand... maybe you have to define a new one.

I don't consider ifdefs harmful.

But if you do, you must surely consider logic in Makefiles equally harmful.

Why so?

No; this is GNUmake specific syntax.

The simplest and cleanest thing to do is to write a Makefile per uname(1).compiler, like so:

...and so on. The source code and generic targets like clean: would go into a central Makefile.core, which would then be included by the OS.compiler-specific Makefile. The OS-specific Makefile would only set $CC, $CXX, $FC, $CFLAGS, $CXXFLAGS, $FFLAGS, and so on.

This way, no configure system would be needed and adding support for a new $OS.compiler suite would be trivial.

The code would have to be so clean as to allow compilation by any compiler of course.

> $(shell sh -c 'uname -s 2>/dev/null || echo not')

Is there a reason they don't use the simpler:

  $(shell uname -s 2>/dev/null || echo not)


This doesn't answer my question.

It tells you whom to ask and what xe was thinking at the time.

Yes, nobody likes using #ifdef, but if you need it, it's great that it's there.

Hence why translation units should be named according to their abstraction level.

Meaning module_posix would be for all POSIX platforms, whereas module_aix would be AIX-only.

Similarly, if you're writing a native extension targeting multiple versions of Ruby, Python, etc., ifdefs are unavoidable.

Every construct can be harmful if one doesn’t understand its purpose.

The article does not seem to mention any uses of conditional compilation outside of portability. But consider the most common case: debug and release versions of the same program.

You want the extra checks and diagnostics in the debug version and fastest code in the release. Using assert() will only get you so far because you might need to compute values to assert on and if those require function calls then the calls will still be there when asserts are eliminated in the release. And if you want additional room in your data structures to store debug info then assert() is of no use.

When you say "the most common case", I assume you mean "the most common case now", and not in 1992 when this paper was written.

The abstract does start "We believe that a C programmer's impulse to use #ifdef in an attempt at portability is usually a mistake" so it's fair to assume that they were not trying to cover other uses of #ifdef.

That said, your points weren't completely ignored. Page 193 mentions #ifdef use in debug builds in the text starting "An optional feature such as debugging assistance ...".

The part about "additional room in your data structures" follows under the guideline "Such declarations, in turn, preferably should be confined to header (.h) files, to minimize the temptation to introduce #ifdef into main-line code".

“Debug” and “Release” only pertain to Windows. You are always supposed to build with -g on UNIX and UNIX-like operating systems because, besides a few milliseconds of startup time, including the debug information in the binary doesn't affect anything but file size. The runtime linker skips those sections and links only code and data into memory, but the information is there if you load the binary into the debugger. Kernel engineers taught me this.

There are many more differences between debug and release builds than having debug info included in the executable, so your reasoning makes little sense. Many people, for example, build games for the PS4, which runs a FreeBSD-derived Unix-like OS, unaware that their debug and release builds only pertain to Windows...

If someone wants to fix Wikipedia's article on Henry Spencer, a full citation just waiting for {{cite conference}} is at https://news.ycombinator.com/item?id=17610233 .

