Hacker News new | past | comments | ask | show | jobs | submit login
C Macro Magic (sagartewari01.com)
210 points by sagartewari01 24 days ago | hide | past | web | favorite | 96 comments

This is exactly how the list implementation in the Linux kernel works [1]. I've extracted this header file and used it in user-space before, too. The cool trick is in calculating an offset to the start of the struct containing the link.

But I do not think this trick or list implementation is newly introduced in Wayland. It existed at least as far back as kernel 2.6.11 [2].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...

[2] https://elixir.bootlin.com/linux/v2.6.11/source/include/linu...

I've been using queue.h from bsd for as long as I can remember. It's slightly different syntax, but it's available on linux, from userland, and is not GPL...

Note that the one that ships in linux is pretty simple, the one that is in BSD is a lot more complete.

This is an exceedingly common way of using containers in C, because how fast and predictable it is during the run-time. It also allows putting a single data item into multiple containers like that.

It's used pretty much in every OS kernel, Windows included.

I've seen this same trick done with macros in BSD kernel code from the 80s!

> Maybe that’s the reason people who use the C language love it so much. You can’t do that in any other language.

But you can't do it in C. __typeof__ is a GNU extension.

> Note that wl_list is not a very safe data structure, in the sense that a programmer need to know exactly how to use it.

You can do it in C++ with template, and it would be type safe.

>But you can't do it in C. __typeof__ is a GNU extension.

Yes you can. You just have to provide the type every time you use the macro yourself. You can even create a macro that creates helper “methods” for you.

With that said, I really dislike how common it is for C programmers to just use whatever their compiler's authors felt like adding and still calling it “portable C”. At this point, if you dislike “plain” C so much, you might as well just switch to a subset of C++ or D's “Better C” mode.

Yeah, i know you can even do it without providing the type:

    #define wl_container_of(ptr, sample, member) \
        (void*)((char *)(ptr) + ((char*)(sample) - (char*)(&(sample)->member)))

typeof is a GNU extension, but it (or an equivalent) is implemented in every compiler anyone actually uses.

You can do it in C, you just can't do it in the part of C the standards committee has decided to standardize.

I'm not sure what the value is of defining C as standard C when loads of software has shipped for decades using this pattern.

And there you have it. Purists claim C or C++ can't do X because it's UBH in the standard. Pragmatists know that C and (especially) C++ aren't languages on their own, while "gcc" and "MSVC++" are complete (and very similar) languages.

And careful people keep in mind that only what's defined in the standard can be expected to be true across compiler updates.

Good clarification, you really only know what you're going to get on (language, compiler, compiler version, platform).

You also need to know the exact implementation of your compiler's optimizer, otherwise you don't know whether it's safe to change your program. Any change might give the optimizer enough information to unleash the nasal demons in previously cromulent code.

And it's stuff like this which is making me increasingly uncomfortable with using C++ as my main language. Currently it looks like Rust is the most suitable replacement but I'm not sure if it's there yet.

Sure, for UB.

But extensions like typeof are different from UB. They have documented behavior, it is just documented by GNU instead of the C standards committee.

Not even that. Compilers had and have bugs.

It's not a bug if it's UB. If the compiler behaves differently than the standard says it should you can file a bug.

The compiler vendor can treat UB however they like. One option would be for them to specify that it does something in particular...

(The C standard explicitly calls this out as a valid way of handling UB.)

...with the notable exception of MSVC.

I wouldn't call MSVC a C compiler until more recently.

A comparable implementation in Zig. Key detail is `@fieldParentPtr` [1], which has the same functionality as `container_of`.

    const std = @import("std");

    const wl_list = struct {
        prev: *wl_list,
        next: *wl_list,

        fn init(list: *wl_list) void {
            list.prev = list;
            list.next = list;

        fn insert(list: *wl_list, elem: *wl_list) void {
            elem.prev = list;
            elem.next = list.next;
            list.next = elem;
            elem.next.prev = elem;

    const Data = struct {
        data: i32,
        link: wl_list,

        fn init(data: i32) Data {
            return Data{
                .data = data,
                .link = undefined,

    pub fn main() void {
        var foo_list: wl_list = undefined;

        var e1 = Data.init(1);
        var e2 = Data.init(2);
        var e3 = Data.init(3);


        var entry = @fieldParentPtr(Data, "link", foo_list.next);
        while (&entry.link != &foo_list) : (entry = @fieldParentPtr(Data, "link", entry.link.next)) {
            std.debug.warn("{}\n", entry.data);

[1] https://ziglang.org/documentation/master/#fieldParentPtr

2 things concerning wl_list:

1) In C++ you can use struct inheritance to get the same functionality without macros or templates. ie: struct ThingList : wl_list { /* data elements */ }

2) Like in the C++ case above, the next and prev elements should probably be at the top if you plan on iterating a lot since the next/prev pointers will be in the first cacheline that you load when you dereference previous list elem. In this case, it's also worth allocating list nodes with posix_memalign.

OTOH, there might be reasons why you want the list buried deep in the struct, like if the other data element access times are more important.

You can do something similar in C - make struct wl_list the first member of struct data, and just cast the wl_list node pointer to a data pointer (offsetof is zero).

I think the most common reason for this generic-location offset-of style is so that the struct data can have multiple list nodes and be on multiple lists (without any extra allocation). This is pretty common in the kernel.

Re: #1 - Now try and place Thing on two lists, a map and a hash table or a combination of thereof.

No way to do this cleanly in C++. At best, it's going to be an iterator galore, at worst some boost-style frankencode.

Intrusive lists. A trick probably as old as C itself. Multiple libraries provide an equivalent in C++. The claim that you can only do that unsafely or need C "magic" is completely wrong. The safe implementations are not even slower, as the main gain is node cache locality.

FWIW, this magic is how the Linux kernel implements generic linked lists, hash tables, and red-black trees:


So this would be hashing the pointer? In which case it doesn't support keys, like strings, etc. I'm now wondering if this is the more common use case for hash tables, where you just need a reference to a unique object rather than a specific key.

No, that's not true. The insert function (hash, rbtree, etc.) takes a hash node as an argument. So, in the kernel implementation the caller is expected to hash.

EDIT: check out some examples to see for yourself: https://elixir.bootlin.com/linux/latest/ident/hlist_add_head

Java has IdentityHashMap for this use. I hesitate to claim it is more common.

Could be a catch that while this kind of constructs work OK for local lists, you have to be aware when passing the struct between applications or processes compiled with different packing but using the same struct definition with non-aligned members.

struct foo { uint8_t data[3]; struct wl_list link; };

For an application link member may be at offset 3 while for another it would be at offset 4.

That may be the reason link should be put as the first member, and a simple cast would let you walk the list.

Usually it does not matter as you keep items of the same type or know the type and exact compiler.

Unless you want to write them somewhere or put it as part of API and ABI. Which you probably shouldn't.

It's true that types has something to do with it, but the main issue is padding. While I can keep the same type (i.e. uint8_t), it's the padding following uint8_t var[3] that may vary according on packing options when compiling.


Spelling: "it's" should be "its" except when short for "it is" I believe. i.e.

>It’s a simple doubly linked list. Here’s it’s definition

The first one's right, the second should be its.

Though almost most native speakers seem to do this, so I guess it won't be 'wrong' for much longer...

FYI some people were taught to use an apostrophe to imply ownership. It used to be the "official" way, but no longer.

The gorilla guards it's bananas.

>It used to be the "official" way

Oh right, thank you, I didn't know the history. Short version: https://www.merriam-webster.com/words-at-play/the-tangled-hi...

Did you mean, some people now alive were taught that "it's" is right (in "it's definition"), or just ownership generally (Harry's). Well, it's better than using apostrophes for plural's, which seems very common too.

And if we are talking about several gorillas, they guard they're bananas.

It makes sense now, thanks.

As others have pointed out, this is also used in the Linux kernel. But unfortunately it shares the kernel error of using the same type for list head and member. That's not good separation, and on particular it means list_add() is ambiguous (the kernel one is member, head, which is backwards).

Hence I recommend kernel refugees start with CCAN's list: http://ccodearchive.net/info/list.html

why is this amazing? such macros have been around since the 80s. then we got various implementations of intrusive list in C++ and they became typesafe

Because most people haven’t seen it before? For people who program in languages in which this stuff isn’t possible, this pattern isn’t obvious. No need to dismiss it just because it’s obvious to you.

Yes, I suppose such use cases were the reason 'offsetof' was introduced. It's just that I wasn't taught any uses of macros other than trivial examples like `MIN` and `MAX`. That's why I found it interesting.

I remember using it in C in '79...

C is surprisingly complex when you dig into it - I have a little example I pull out every once in a while where you can write code that runs one way when a global typedef is defined with a certain name, and other way when the variable isn't defined (compiling just fine both ways).

No C is very very simple. That’s why it’s not easy :)

No but really, there isn’t anything to it. Just think about how a CPU sees memory and registers and it becomes intuitive. But that doesn’t make it easy to use because it doesn’t give you any guard rails to show you the one true way to accomplish a task.

I agree partially - overall C's promise was that it was a light wrapper over top of registers and memory, absolutely.

The issues with C are around some of the weirder bits of the parsing process, namely the context-sensitive parser and the "interesting" corner cases of how the C preprocessor works.


Here's a fun example - try it with and without the typedef line commented out:

  typedef int T1;

  int main() {
      const T1 (*T1);
      printf("%d\n", _Generic((T1), const int *: 1, const int (*)(int *): 2));

I think C has broken that promise since. It now targets an abstract machine, and a lot of behavior that would make sense for a light wrapper compiles to something counter-intuitive in an optimizing compiler.

In a different time, that something was left undefined by the standards might have meant that the result might be some artefact of the target machine architecture, but with aggressively optimizing C compilers the undefined behavior is more likely to be an artefact of the optimizer.

A thin light wrapper wouldn't compile this:

    static int doub(i) { return i + i; }
    int main(void)
     int i;
     int sum;
     sum = 0;
     i = 5;
     while (i--)
      sum += doub(i);
     return sum;
into this:

    movl $20, %eax
I'm not programming memory or registers, I'm instructing an optimizer.

What compiler does that? None of the ones I tried on god bolt seem to (after fixing the missing type on the function parameter)? And what's the undefined behaviour here?

I genuinely forgot to specify the type of the function argument, but that's not considered an error in C which will default to int when the type is omitted. If the compiler you're using objects to that it's probably for the better, but it's still legal C.

There is no undefined behavior in the example. It's meant to further illustrate my point that C is not a light wrapper around memory and registers. In the example, it's just a deceptively imperative looking way of describing the return value to the optimizer.

The compiler in this case was gcc with -O3.

For an example of behavior that is straight forward in the machine but is undefined in C, try signed integer overflow. Intuitively on x64 it should behave just like ADD, possibly adjust the carry and overflow flags and wrap around. In C, the behavior is entirely undefined. Some operation that results in UB is low hanging fruit for the optimizer. If it can deduce that the operation will result in UB, in this case signed integer overflow, it might omit the operation altogether.

You're probably not turning on optimization. For example, gcc 8.3 with -O3 gives the result that the OP posted.

There's no undefined behavior in that example, as far as I can see. The point is that the compiler optimizes away all of the actual computation, since it's able to determine at compile time that the result will always be 0x20.

Also the rules around undefined behaviour, type aliasing and so on are definitely not simple.

Seems like a "most vexing parse", but for C!

Is it expected to compile without the typedef?

Amazingly, it does compile. I believe the declaration that looks as though it shouldn't (because T1 isn't defined) compiles as a function pointer declaration, where the first T1 is the name of the variable defined and the second is, in syntactic terms, the name of the function. With the typedef commented out, you can delete the second T1 on that line.

Note that the type of a variable defaults to int in C, so you can write 'const a' instead of 'const int a'. The return type of a function also defaults to int.

Clang accepts it, but GCC does not. I suspect there's a bug here.

It fails because gcc appears not to support _Generic by default, but that's not the important part of the code. If you comment out the printf, or figure out which compiler options to set, gcc will compile it.

C is far from simple, a simple language would be something like Scheme, Forth, or even SML. It so happens that the C standard (which is nearly 1k pages long nowadays) hides a lot of pitfalls, making the job of optimisers, implementators, and regular programmers miserable.

Generating performant code from Scheme, Forth or SML is far more difficult compared to the C family of languages.

Are you sure about that? Forth is famous for being extremely fast. As for SML, assuming a variant with manual memory management (which I do not consider to add complexity in C), I can't think of any reason on why it would be more difficult to optimise.

> Forth is famous for being extremely fast.

Have you seen that claim backed up by any evidence produced within the last 25 years?

> I can't think of any reason on why it would be more difficult to optimise.


The main reason a language like SML or Haskell is difficult to optimize is because it is incredibly difficult to canonicalize for the purpose of making peephole optimizations and abstract interpretation effective. https://sunfishcode.github.io/blog/2018/10/22/Canonicalizati...

C is extremely simple. In fact is one of the few languages that you could manage to understand completely in the sense of knowing exactly what it does.

C macros on the other hand, should die. I use C a lot but never ever use c macros, they are evil. Extremely hard to debug, confusing and very weak.

When we want macros, we use Lisp, that have proper macros.

I'm not sure of all the valid use cases for macros, but I think they are useful when you need to do something simple. For example, a debug flag to run some code like verbose logging but you don't want to include that in the release binary. C# has preprocessor directives but I wish that java had something similar. I guess this is different than complex micros though.

An alternative could be to guarantee that the first two elements of the data structure are next and prev pointers. This way the implementation can just assume the first 8 or 16 bytes (depending on sizeof(void*)) are the next/prev addresses. Then the same set of functions can work with everything that looks like a linked list. Another option is to have a "struct list" that contains the information needed, and the API allows, when you initialize the list, to say how much space you want starting at "node->data". This way the node is always over-allocated by the library, and when you want to manipulate the list items you cast node->data to your privata data structure.

If wl_list was required to be the first member of the struct, the wl_container_of() FLM becomes trivial. This implementation comes from the kernel though, where a single item might be in multiple lists.

This approach is less type-safe.


> You can’t do that in any other language.

Except languages with parametric polymorphism, such as ML and Haskell.

Not quite right, in other languages you (a) can't, and (b) don't have to (thankfully).

The compiler doing my offsetting for me using type information is much safer and more reliable.

Or C++ and Java.

This is C's version a Ruby-style Mixin. By adding a List link to your datatype, you can use List operations on your data.


Also, is this macro magic, or simply memory layout magic? The macros look like appreviations for readability, not magic required to make the structure work.

An alternative implementation is to put the link structure at the beginning of the data structure. Then converting between one and the other is just a cast between the two types. This is a fairly standard technique in C to get some version of inheritance of structs, and how I have implement LLs in the past.

I don't see what this extra complexity of sizeof() offset calculations gains you.

More importantly offsetof is an implementation-specific macro. Whatever it does cannot be generalized from its definition for one or many compilers to a language requirement. The use of the offsetof macro in a program may cause undefined behavior when compiled with GCC or Clang. GCC instead provides __builtin_offsetof. If multiple lists are desired, then an array of them should be declared at the head of the struct and the index be fixed to the type of list desired to contain a reference to the struct. Additionally, as rustyrussell wrote in another comment the list head and member structs should be different in order to optimize the list struct as CCAN's list is implemented, but it too uses offsetof, which leads to the same issue of undefined behavior.




This is not true. offsetof is defined in the C standard. See http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf, Section 7.17, point 3 on page 254.

I prefer not to use it due to the loss of type information and the potential for screwups, but you can write ANSI portable C using it.

The Offsetof macro is vaguely "defined" in the C standard, but the implementation of the macro is not explicit. So, I do not understand why the original comment is not true?

The standard I linked describes both the type signature, and also what the macro should do. There are various ways that that can be implemented by the compiler, but assuming that you are using a compiler that meets the ANSI C standard, then the macro will behave as the standard says it should.

Using __builtin_offsetof is undefined, because it is not standardised. The fact that __builtin_offsetof is used to implement offsetof by GCC does not affect the fact that offsetof itself is standardised (in the same way that a given internal heap operation is not standardised, but might be used by GCC to implement malloc which is).

It allows the object to be part of multiple lists.

That's an excellent point, thanks.

This implementation of doubly linked lists is not even exclusive to Unix-based operating systems, but also known as struct LIST_ENTRY under Windows/ReactOS and heavily used for kernel development there. They also provide a singly linked list version as struct SINGLE_LIST_ENTRY.

Check e.g. https://docs.microsoft.com/en-us/windows/desktop/api/ntdef/n... and https://git.reactos.org/?p=reactos.git&a=search&h=HEAD&st=gr...

Macro-based, non-intrusive data structures: https://gitlab.com/oliver117/ucds/tree/master/src

These sorts of macro tricks are cute, but they quickly become a pain when using an FFI.

Oh but this is only the beginning. With boost MPL, you can do full metaprogramming at preprocessing time (I'm not 100% sure but I think the C++ preprocessor is compatible with that of C? So that would make MPL applicable to C code as well.). You can generate code by iterating over lists of preprocessor 'variables' and do other crazy things that seem impossible the first few times you think about it. It's great, albeit a little bit confusing the next time you come back to it, because it makes you think more about what level you're metaprogramming at.

I think you mean the Boost Preprocessor Library. MPL is template based and quite a bit dated by today's standards.

C++ Preprocessor is pretty much the same as C (they might go formally out of sync from o e standard release to another but in practice compilers implement the same features as far as I know).

The C preprocessor wouldn't balk at "char template;" but that of C++ ?

wl_list seems like an incomplete re-implementation of the LIST_ macros in <sys/queue.h>.



It isn’t, at least directly. I am actually really happy to see this alternative, as queue.h uses anonymous structs, which are C11 and not well supported.

I've used uthash in the past for a similar hash implementation. It certainly has its wrinkles, i.e. you may need to include several if you want to include an object in multiple hashes... but it was fun to use and straightforward! https://troydhanson.github.io/uthash/

From Objective-C, you can make switch statements support strings with 6 lines of macro:

  #define TTD_CASE(str)                       if ([__s__ isEqualToString:(str)])
  #define TTD_SWITCH(s)                       for (NSString *__s__ = (s); ; )
  #define TTD_DEFAULT

Somebody, please just fix the language. Extend all the free compilers or, better yet, run the changes through the standards committee. People have been dying for this feature for over half a century.

While you're at it, just make the feature generic. Handle arbitrary arrays, structs, unions, and floating-point types.

The ... feature supported by gcc is also a huge usability improvement. Don't allow weird sorting order issues with strings. Don't allow it for structs and unions.

None of those types can be compared directly in C, so that's not making the feature "generic", it's making it magically super-powered. Associating comparison intelligence with arbitrary types is not very obvious how to do in C, but I guess the new super-switch could accept a function pointer in addition to the data, or something. Like:

    const char * s const = "bar";

    switch (s, strcmp)
    case "foo":
      printf("I got foo\n");
    case "bar":
      printf("nopes, I got bar\n");
      printf("neither foo nor bar :/\n");
Which would be fine/acceptable, but not exactly how C tends to roll.

No, that entirely loses the expected performance and clarity of a switch. Remember that "arbitrary types" in C are not so exotic. A simple byte-by-byte comparison will do, skipping over any struct padding. Compilers today will generate code that uses various strategies (tree, hash, etc.) to do the comparisons, and it would be awful to impede that via an arbitrary user-specified comparison function. Doing a linear scan of every case, O(n*m) for the number of cases and the lengths of the strings, defeats the whole point of using a switch. One can already do that with cascading "if".

For strings, the simplest syntax is to have the new behavior triggered by passing a pointer to any type which can be evaluated as an integer. That last requirement makes scanning for a terminator work in an obvious way. This does seem to lose the possibility of comparing arrays that are not NUL terminated, but that isn't the use case that people are dying for.

Since the vast majority of the desire to switch on strings is involving ASCII keywords, it would be reasonable to prohibit gcc-style ranges (like "case 1 ... 4:") for non-ASCII strings. This means that the strings can be treated like giant integers stored in big-endian form. (after the NUL, treat it as more NUL going on forever) Clearly it is well-understood how to switch on integer values, and thus we can switch on strings.

Right now the ways to switch on strings are terrible. If they are short, they can be memcpy into long long. With gcc one can use the computed goto extension, plucking values out of a table via bsearch or via a perfect hash function generated by something like gperf. It's all really miserable, leading low-performance programmers to just use cascading "if".

Why not just use an enum for the switch, and if strings need to reflect the enum, use an xmacro to expand the enum into an array of strings?


That doesn't seem to have anything to do with the problem.

The problem: You have a string. (pointer to char with NUL termination) Depending on what that string is, you want to run different code. You want to do this with high performance and a minimum of fuss. No, it isn't an option to say "but what if I used an enum instead?". You have a string. A string is what you have.

Currently in C, you must choose between high performance and a minimum of fuss. Pick one.

The high-performance solution is to use bsearch or a perfect hash, mapping the string to an index of some sort. With purely standard C you would then use a normal switch. Using gcc extensions, you could get slightly better performance with a computed goto. Writing and maintaining this code is a pain, so most people don't bother.

The minimum-fuss solution is a whole bunch of "else if ... else if ... else if ..." that tries each possibility in turn. The performance is terrible, but good enough for stuff like command line option parsing.

This is quite cool, that being said, you would need to add a break at least at the end to that, unlike a regular switch. Plus it would not work in the case that you want multiple strings to lead to the same code

I avoid anything magic in C.

Just because you can, doesn't mean you should. Unless you are hacking the linux kernel, use a type safe language folks.

C macro magic? Run.

Applications are open for YC Summer 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact