Why does the C preprocessor interpret the word “linux” as the constant “1”? (stackoverflow.com)
275 points by dcro on Oct 9, 2013 | 57 comments

And that's why, in my opinion, non-capitalized macros are always evil (yes, I'm looking at you, Linux kernel).

When I see a lowercase symbol used as a variable, I expect it's going to be some actual global variable. When I see a lowercase symbol used like a function, I expect it's an actual function. And then comes the day I want to take the address of the "function" and it fails. Or worse, I make a syntax error somewhere and because of the macro expansion it gives me the most perplexing error message in the world, and it takes me 5 minutes of digging through 4 levels of include files to realize I forgot a colon or something.

Not to mention the mess if the macro is poorly implemented and does things like using a parameter several times, which causes hard-to-catch undefined behaviours in the application. Nothing is worse than having seemingly harmless code like "foo(a++)" turn into the bug of doom because you didn't know foo was a macro. Please don't do that.
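To make the double-evaluation hazard concrete, here is a minimal sketch in C. All the names (`max_of`, `max_of_fn`, `next_value`, `demo`) are made up for illustration, and `next_value()` stands in for a side-effecting argument like `a++`:

```c
#include <assert.h>

/* Hypothetical lowercase macro: reads like a function, but pastes its
   arguments into the expression, so `a` can be evaluated twice. */
#define max_of(a, b) ((a) > (b) ? (a) : (b))

/* The same operation as a real function: each argument is evaluated once. */
static int max_of_fn(int a, int b) { return a > b ? a : b; }

/* A side-effecting argument, standing in for `a++`: every call bumps
   a counter and returns the new value. */
static int calls = 0;
static int next_value(void) { return ++calls; }

static void demo(void) {
    calls = 0;
    int via_macro = max_of(next_value(), 0);   /* expands to two calls */
    assert(calls == 2 && via_macro == 2);

    calls = 0;
    int via_fn = max_of_fn(next_value(), 0);   /* exactly one call */
    assert(calls == 1 && via_fn == 1);

    /* A real function also has an address; a function-like macro does not. */
    int (*fp)(int, int) = max_of_fn;
    assert(fp(3, 4) == 4);
}
```

Nothing at the call site tells you which of the two you are dealing with, which is the whole argument for the capitalization convention.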

On the other hand, when I get a weird behaviour/error message around a capitalized symbol I automatically think that it must be a macro and go dig for the definition.

This is why a major focus of D was to figure out the valuable use cases of the C preprocessor, and instead design replacements for them that are semantically part of the language.

On the whole, this has been very successful - about half of D programmers have a C/C++ background and aren't shy about being vocal if there's something they really need.

Yup. This is why in Rust macros have to end with a ! (and syntax highlighters know about this). This was a hard call, because it prevents using macros to create forms that look exactly like the built-in ones, but it prevents issues such as the one described in the OP.

That's not the original issue. The problem is that the compiler silently predefines an undocumented macro. If the compiler silently predefined an undocumented variable, type, or function called "linux" or "unix", that too can be a problem. E.g., if there were a global function linux() on Linux systems only and you tried to define a global variable linux in your code, you'd get an error, too.

The clean solution is to do what clang does and generate informative error messages. In the case of macros, by showing the macro expansion history. For example, the following (using __STDC__ instead of linux for portability):

  foo.c:2:7: error: expected identifier or '('
    int __STDC__ = 1;
  <built-in>:159:18: note: expanded from here
  #define __STDC__ 1

Note that there can be other reasons to have macros that are syntactically different from functions and variables, but this isn't one of them.
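For anyone who wants to poke at the predefined-macro issue itself, here is a rough sketch of how code could detect the legacy `linux` macro and reclaim the identifier. The `LEGACY_LINUX_WAS_PREDEFINED` and `RESERVED_LINUX_IS_PREDEFINED` names and the helper functions are hypothetical; only `linux` and `__linux__` come from the compiler. It assumes a GCC- or clang-style compiler, where the unreserved `linux` spelling is predefined only in the default (GNU) mode and not under strict modes such as `-std=c99`:

```c
#include <assert.h>

/* Record what the compiler predefined before touching anything. */
#ifdef linux
#  define LEGACY_LINUX_WAS_PREDEFINED 1
#  undef linux               /* reclaim the identifier for our own use */
#else
#  define LEGACY_LINUX_WAS_PREDEFINED 0
#endif

/* The reserved-namespace spelling cannot collide with user identifiers. */
#ifdef __linux__
#  define RESERVED_LINUX_IS_PREDEFINED 1
#else
#  define RESERVED_LINUX_IS_PREDEFINED 0
#endif

/* Compiles everywhere now, because `linux` is an ordinary identifier again. */
static int linux = 42;

static int legacy_macro_was_predefined(void) { return LEGACY_LINUX_WAS_PREDEFINED; }
static int reserved_macro_is_predefined(void) { return RESERVED_LINUX_IS_PREDEFINED; }

static void demo(void) {
    assert(linux == 42);
}
```

Without the `#undef`, the declaration would expand to `static int 1 = 42;` on an affected system, which is the error shown in the clang output above.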

But Patrick isn't responding to the original issue, he was responding to simias' assertion that macros should be syntactically disambiguated.

  > If the compiler silently predefined an undocumented 
  > variable, type, or function called "linux" or "unix", 
  > that too can be a problem.

While we're on the topic of Rust, the compiler actually used to do this! In the early days, the identifiers "error", "warn", "info", and "debug" were all magical, undocumented constants for controlling logging levels. As you might expect, this caused problems. :P So don't think that Patrick isn't keenly aware of the perils of including non-keyword identifiers by default.

As for clang, their solution is the best they can do given that they don't have the authority to make the call over whether to syntactically disambiguate macros. It's definitely a debate to be had in the case of a language designed from scratch, and the Rust devs decided to err on the side of explicitness (you can contrast e.g. Elixir, which has the same sort of structural macros but chose not to distinguish their invocation syntax).

But... the ability to use macros to create forms that look like "real" syntax is sort of the point. I don't think you can fix this with a language feature or style convention.

Metaprogramming is "dangerous" because the meta constructs don't (can't!) act like what they look like. Metaprogramming is "great" because the meta constructs look like simple programming but have surprisingly deep behavior.

As I see it all you did was add a "!" to the confusion, honestly. People likely to get tripped up by e.g. the "address of macro" problem above are going to be no less confused by the extra syntax IMHO.

When you see an exclamation point, you know that compile-time fishiness is happening. If you don't see it, you know that there's nothing fishy going on. You might not think that's useful, but I hardly see how it can add to the confusion.

For clarity I just want to note here that Rust macros are structural (like Scheme) rather than textual (like C).

People don't expect an ordinary-looking identifier like "linux" to macro-expand to something else, like the number 1. That was the source of the confusion in the OP. Syntactically distinguishing identifiers that will be macro-expanded from identifiers that won't does solve this problem.

But again, that's situational. This particular macro is confusing because it (1) looks like an identifier and (2) doesn't act like one, instead being a 1970s-style platform identifier that (3) no one actually uses, in favor of constructs like __linux__ or __GNU_SOURCE__ or whatever.

But the more abstract point is that macros allow you to provide abstractions that do look just like an identifier (or function call, etc...) and do act like identifiers in the context where they're used. And that this is a good thing, because that kind of syntactic flexibility is useful in defining domain-specific languages that make the expression of hard problems clearer.

Adding a bunch of "!" to those DSLs isn't helping anything.

Basically, we have a macro that was abused here. Your solution is to make it harder to abuse macros; mine is to accept the occasional abuse (with a "don't do that" in most cases, though here it's a backwards-compatibility problem) in exchange for more clarity on the hard stuff.

As I understand it, it is one of Rust's goals to be predictable in the code it runs/emits. This is fitting in a language designed for performance-critical systems. I see what you mean about the point of macros being that they can transparently add compile-time behavior. That's just a different, inconsistent philosophy from that of Rust.

Sure. Lots of successful languages exist without macro facilities. Lots exist with it. Frankly I don't see much correlation with "performance-critical systems" given that the overwhelmingly dominant languages in that field both have metaprogramming facilities (C++ has two!).

I was just quibbling with the idea that putting a magic "!" on a macro somehow makes it a better design choice. I don't think it does, either as a macro or as a safety mechanism.

Ah! That's great, I didn't know that. The more I hear about Rust the more I like it. But then I'm reminded that it bakes garbage collection into the core language...

Oh well, perfection is not of this world.

GC is being removed from the core language and will migrate to a library before 1.0.

There's a project, zero.rs, that lets you have no runtime in Rust. It's been shown that it's possible to make a simple kernel with it without too much pain. The disadvantage, of course, is that libraries will use garbage collection, and as soon as it's used, you get the runtime, so you can only use certain libraries designed for zero.rs, which causes fragmentation.

All this would happen if gc was in the std library anyway though.

Rust libraries rarely use GC. The standard library goes out of its way to avoid it wherever possible, while remaining safe.

Fair enough. I just have to cross my fingers and hope the GC won't catch on then! :)

We strive to avoid GC like the plague in the stdlib, and the language itself encourages users to do the same. I wouldn't go so far as to say that we try to make it hard to use GC in Rust, but we definitely believe that it's currently too easy to resort to it over better alternatives. This is why we're moving it into a library.

As a thought experiment, what would happen if the GC was removed altogether? What consequences would that have (good and bad)?

No thought experiment necessary, you can do this today in two ways:

1) We have an attribute that you can stick in a module that will throw an error at compile time if a managed pointer gets used anywhere in that module, including in its dependencies. (Note: this is still experimental, and I'm not sure if it actually works yet.)

2) It's possible to write Rust entirely without a runtime, which means your result binary just plain won't contain any of the machinery for GC, tasks, or TLS (task-local storage).

This second approach does have consequences. Bad: random bits of the standard lib won't be usable, since our preferred means of error handling uses TLS (and some functional/persistent datastructures will use GC to handle cyclical pointers, since that's our only internally acceptable use case for GC). We have plans to look at dividing the stdlib into "usage profiles" to allow users to selectively disable certain runtime features while still remaining aware of exactly which pieces of the stdlib are still usable.

But the good consequences of disabling the runtime are that if your code exposes a C ABI, then you can write a library that can be called from any other language with a C FFI (i.e. basically every language, ever). This is actually one of my favorite things about Rust.

What's wrong with garbage collection being built into the language?

Nothing if you're writing your whole application in Rust. If you're making libraries or a kernel, though, it starts to present problems.

Performance. Rust is able to provide its desired expressiveness without forcing you to make that tradeoff every time you use it.

Two notes that'll date me pretty well:

(1) I had to port a game engine to solaris a few years ago. In that engine, if you looked up, you'd see a star emitting light. Guess what 3-letter, all-lowercase identifier was conflicting.

(2) Xt (only useful, really, if you're doing Motif work) had a macro or two that looked exactly like a function call, and had a very natural use case for incrementing an index you feed as an argument. If you knew not to do it, no problem. Otherwise, that was a mean bug to track down.

> (1) I had to port a game engine to solaris a few years ago. In that engine, if you looked up, you'd see a star emitting light. Guess what 3-letter, all-lowercase identifier was conflicting.

sel? The suspense is killing me! :)

I think it's sol!

The star in question is 93 million miles away...


oh, duh! The phrase "porting to solaris" should have clued me in.. <:)

The other problem is what I call greppability.

If I can't grep the header files/source code for a function call I will want to hurt you badly. Especially if a function call is hidden behind 6 - 7 different macro indirections!

/me glares over at OpenSSL...

I spent at least an hour a while ago trying to track down a bug that turned out to be caused by exactly this. I couldn't work out what was happening until I looked at the preprocessor output.

I'm increasingly turning to -E to debug these hard-to-understand problems. I'd love a -E2 or what-have-you that would be even more verbose, and list all preprocess transformations applied to a line and the source lines where they were defined.

Unfortunately, the standard preprocessor output doesn't provide a convenient way to do this without a comment format. I guess you could use a do-nothing #pragma.

Eventually, I'm sure somebody will spin up some extension to clang to show you the complete evolution of a block of code and what other code contributed to that evolution.

Emacs has a "c-macro-expand" function which is quite hackish but extremely useful, IMO: it attempts to macro expand the region.

For instance running it on the following code in the middle of a C source file:

    #define BOGO_MAX(a, b) a > b ? a : b

    int test()
    {
      return BOGO_MAX(3 + 4, 5);
    }

Creates a new buffer containing:

    int test()
    {
      return 3 + 4 > 5 ? 3 + 4 : 5;
    }
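If you don't have Emacs handy, a two-level stringification macro gives a rough view of an expansion from inside the program itself. This is only a sketch: `SAFE_MAX`, `STRINGIFY`, `EXPANSION_OF` and the helper functions are names invented here, and it also shows why the unparenthesized `BOGO_MAX` above earns its name:

```c
#include <assert.h>

#define BOGO_MAX(a, b) a > b ? a : b             /* as in the comment above */
#define SAFE_MAX(a, b) ((a) > (b) ? (a) : (b))   /* hypothetical parenthesized variant */

/* Two-level stringification: EXPANSION_OF fully macro-expands its argument,
   then STRINGIFY turns the resulting tokens into a string literal. */
#define STRINGIFY(x) #x
#define EXPANSION_OF(x) STRINGIFY(x)

/* Text of the expanded expression, for printing or logging. */
static const char *bogo_expansion(void) { return EXPANSION_OF(2 * BOGO_MAX(1, 3)); }

/* Expands to 2 * 1 > 3 ? 1 : 3, which parses as (2 * 1 > 3) ? 1 : 3. */
static int scaled_bogo(void) { return 2 * BOGO_MAX(1, 3); }

/* Expands to 2 * ((1) > (3) ? (1) : (3)). */
static int scaled_safe(void) { return 2 * SAFE_MAX(1, 3); }

static void demo(void) {
    assert(scaled_bogo() == 3);   /* not the 6 a reader would expect */
    assert(scaled_safe() == 6);
    assert(bogo_expansion()[0] == '2');
}
```

Printing `bogo_expansion()` with `puts` shows the expanded token sequence, which is usually enough to spot a precedence problem like this one.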

How well does that play with ifdef? The problem is rarely what does this macro expand to, but which macro is going to be expanded.

Well, I guess it depends on what your code looks like.

That being said if I'm not sure if some code is being compiled I just add an "#error foo" and rebuild. Or even simpler I just type in some garbage to trigger a compilation error.
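A slightly more reusable version of the "#error foo" trick, as a sketch: keep the probe in the source but only arm it when you ask for it. `HYPOTHETICAL_FAST_PATH` and `PROBE_BUILD` are made-up macro names, not anything a real project defines:

```c
#include <assert.h>

/* Build with -DPROBE_BUILD to make the compiler announce (by failing)
   which branch it actually sees. Both macro names are hypothetical. */
#ifdef HYPOTHETICAL_FAST_PATH
#  ifdef PROBE_BUILD
#    error "probe: the HYPOTHETICAL_FAST_PATH branch is being compiled"
#  endif
static int active_branch(void) { return 1; }
#else
#  ifdef PROBE_BUILD
#    error "probe: the fallback branch is being compiled"
#  endif
static int active_branch(void) { return 0; }
#endif

static void demo(void) {
#ifdef HYPOTHETICAL_FAST_PATH
    assert(active_branch() == 1);
#else
    assert(active_branch() == 0);
#endif
}
```

An `#error` inside a skipped conditional group is never triggered, so the probes cost nothing in a normal build.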

Just a note: This was in the Stack Overflow weekly newsletter yesterday. If you're interested in more questions like this, it's worth checking out. I like it a lot.

For anyone wondering where to subscribe: http://stackexchange.com/newsletters

PS: That's one of the few reddit customs I really like "link for the lazy" :)

While I can't imagine much discussion here that couldn't occur on SO, I did want to say thank you for posting this. I found it quite interesting.

+1, I love those interesting Stack Overflow (and Stack Exchange) posts that started to appear here every other week. I always learn something new/interesting from them. Thank you dcro :).

no doubt the SO question will soon be closed as "not a question" or "too interesting" or some other draconian response :(

They'll lock it to keep newbies away if nothing else, but I doubt it'll get shut down. This is a "good fit" for the Stack sites though - it's a direct question with one single direct answer.

This is awesome, I am going to divide by linux in my code everywhere in an attempt to obfuscate what I am doing.

You should divide by (linux - unix).

At least anyone trying to port your code to Windows will be a bit surprised... :-)

Edit: Or /.*BSD/. :-(

Non portability to Windows is a feature.

If you're in the habit of writing "struct sockaddr_in sin;", you'll probably be tempted to write "struct sockaddr_un sun;". Just don't expect that code to compile on Solaris.

Is that just hardcoded into the preprocessor itself? Or is it part of ANSI C? Wondering if the MS C++ compiler will do this as well.

It's not and can be tuned using compiler options as mentioned in TFA. Definitely not standard.

Hacker News C-discussion response template:

"That's why [insert other language] has [insert feature]. This avoids some of the edge cases in [insert popular non-C language] along the lines of [insert unused obscure language]. Early benchmarks suggest it's almost as fast as C!"

More or less equivalent to the question, "Why is clang/llvm running roughshod over gcc?" It's the sum of a lot of really questionable decisions, like this one, made over a very long time.

It's quite sad that you must specifically ask for conformance.

Doesn't the name Keith Thompson ring a "bell"?


(No relation to Ken Thompson, if that's what you were thinking.)

Linux is No. 1

Unless you compile it in Windows.

One could argue the real problem here is C's weak typing and lack of a proper boolean type.

Wow. This would've been amazing news... back in 1992.

Which makes it amazing news to people who weren't programming back in 1992.

Some people here likely weren't even born in 1992.
