Hacker News new | past | comments | ask | show | jobs | submit login
Modular C: Reusable and Maintainable Code Using the C Language (1994) [pdf] (metamodular.com)
92 points by goranmoomin on Dec 19, 2019 | hide | past | favorite | 30 comments



"System programming differs from application programming in that system programming emphasizes performance over reusable and maintainable code"

Maybe for sufficiently small systems. But on a large scale product (reusable and maintainable code == performance). Am I missing something. Seems like something one or our EEs would say to me to excuse their poor code quality


no, that's not true.. in C, extracting struct elements into variables, holding temporary values for longer, explicitly adding intermediate steps in a dense reference of references, would be examples of 'optional' contents that help readability and lessen potential ambiguity to a human coder. Each time data moves, clock cycles are used, at least.

For decades, compilers have looked for this sort of code and condensed it as an optimization, so the way the code is written is not always the way it executes at runtime, but the longer, verbose steps might add cycles, and therefore would be avoided when coding for 'systems performance'.


Optimizing compilers are good at optimizing obvious code, so this usually isn’t a trade off. Many of the clever hacks you can find in books from the 80s are irrelevant now because the compiler can figure out what your naive code is doing and emit something optimized.

Modern processors don’t even operate sequentially and will execute multiple lines in parallel when there are no data dependencies.

For most code, the performance killer is when you try to be too clever or have too many pointer indirections. The compiler has a harder time with this, and is also precluded from applying other optimizations like auto vectorization, because it cannot figure out if it would change the meaning of your code.

For true performance, you need to enter the land of intrinsics, manual vectorization, and cache-aware algorithms. Domains which very few engineers are qualified to work on.

So just keep it simple and trust the compiler. This also applies to algorithms, complicated ones with lots of branching will often perform far far worse in practice than the naive one, you can’t trust the big-O alone. So profile when in doubt.


> So just keep it simple and trust the compiler. This also applies to algorithms, complicated ones with lots of branching will often perform far far worse in practice than the naive one, you can’t trust the big-O alone. So profile when in doubt.

I've found that optimizing for cache is usually the biggest gain when trying to be clever with algorithms. Eliminating branches and being naive in the algorithm itself is usually a better idea than trying to be clever with special cases.

In my experience.


> Am I missing something Copyright 1994. C89 only, this predates C99... The year ought to be in the title.


This is old-fashioned (uses C89 instead of C99) and some of its "idioms" are questionable, like this one:

    list *p;
    list l = (list) malloc(sizeof(struct list));
    l -> val = val;
    for(p = &the_list; *p && (*p) -> val < val; p = &((*p) -> next))
      ;
    l -> next = *p; *p = l;
Here p is a doubly indirect pointer and I would personally think this is an unwarranted cleverness.

Furthermore this is the year 2019. Why would you still manually traverse a linked list? Oh right because C doesn't have parametric polymorphism (genericity) so you can't abstract out the traversal in a standard library.

Consider the C++ equivalent (although I chose to use a doubly linked list for better reusability):

    list.insert(std::lower_bound(list.begin(), list.end(), val), val);
See how the search for the location to insert is done by the library, and the insertion is also done by the library. This, to me, is truly reusable and maintainable code.


It's sort of funny that linus torvalds actually talked about why that sort of iteration is better: https://youtu.be/o8NPllzkFhE?t=861

I think he puts it pretty well (although he them also goes on to say dumb little snippets like that really don't matter in the big picture). Essentially, by making a doubly indirect pointer you get to treat the first element exactly the same way you are going to treat every other element.


However, Linus would object to hiding the true type of list (it's actually a pointer) --- as do I, since I was a little confused by that fragment at first, wondering what exactly was doubly indirected about it. It looks very different from the usual implementation of a "virtual head", which uses a single-indirect pointer only.


Exactly. And using such indirection often allows you to write a single search function and then use it to simply find and return the element, or find the spot and do something else: insert a new one, or replace or delete what you've found.


FWIW personally i prefer the C code. It is explicit at what it does, whereas the C++ one is harder to follow exactly what it does - and also very ugly (i never liked C++'s iterators).

The double indirect pointer is hairy but at least everything you need to know to "parse" the code is right there.


It depends on what you want. If you want to be able to see everything, then the C++ code is better. If you want to get the trees out of the way so that you can see the forest, the C++ code is better.

And, for most of us, if you want to just write it and have it work, the C++ code is better. Sure, I might have to find an example of the iterator code in order to get the syntax right, but it's a lot easier to mess up the double indirect pointer code. Worse, when I mess up the iterator code, it likely doesn't compile. If I mess up the double indirect pointer code, it it likely to silently corrupt memory somewhere.


> The double indirect pointer is hairy but at least everything you need to know to "parse" the code is right there.

but why do you want to spend time parsing code that

- does textbook-level things ?

- has a thousand different places where a typo can end up

- needs to be rewritten for every kind of list

those are the red flags. Do you also go read your graphics driver's code when you push pixels to the screen ?

> and also very ugly (i never liked C++'s iterators).

ignoring the jump from what you like to a global assertion (the C version makes my eyes bleed), in C++ 20 you can also just do

    list.insert(std::lower_bound(list, val), val);


> Why would you still manually traverse a linked list? Oh right because C doesn't have parametric polymorphism (genericity) so you can't abstract out the traversal in a standard library.

It's pretty easy to make something like this work:

    list_insert(&itemList, struct Item, &item, &compare_item);
Which would be just a thin macro, for example defined like this

    #define list_insert(list, type, item, compare) \
        __list_insert((list), \
            sizeof (type), __alignof(type), \
            (item), (compare))
You could further "improve" to even allow

    DEFINE_LIST_FUNCS(itemList, struct ItemList, compare_items);

    itemList_insert(&itemList, &item);
It's not the end of the world. The reason C programmers usually don't make or use such "libraries" is that these are pretty easy to craft on the spot, and there are just a billion way of making it work. Allocation strategy and intrusively vs extrusively linked are two example considerations. I for one prefer to think about the best way to organize my code and data, so I'm not interested in such a library.


A a pointer to a pointer is a perfectly valid C idiom. I'd say typedef'ing a pointer would raise more eyebrows :)

E.g. I have a singly-linked list of

    struct x { int y; struct x* next };
I want to locate an element here and then maybe remove it (1), maybe insert a new one before (2) or after (3), maybe replace (4), or maybe just return what I've found (5). How do I write this base locator function? If I do it naively and return just a pointer to struct x I can only do (3) and (5); but if I return a pointer to a pointer, I can do all five.

A doubly-linked list is simpler in this regard because the element has pointers to both neighbors and thus a mere pointer to struct x would suffice, but it's also one pointer more expensive. By the way, will your C++ code work with singly-linked lists?

Also, look: your C++ example calls four methods. The C code calls only one (malloc), all the rest is core language. It's 4:1 ratio for extra complexity.


> By the way, will your C++ code work with singly-linked lists?

Yes, but you'll use merge instead:

    list.merge(std::forward_list<T>{val});
I chose to use std::lower_bound for binary search. If you are content with linear search like in the C example, you can even write a single function that works for both doubly linked lists and singly linked lists.

> Also, look: your C++ example calls four methods. The C code calls only one (malloc), all the rest is core language. It's 4:1 ratio for extra complexity.

This is what I call unnecessary manual inlining and reinventing the wheel for no good reason.


Binary search on a list??


Forgot to check for null return from malloc().


Excuse me, but this sort of complaining about toy code is pretty pointless, at least when it comes to bashing the language. I don't use malloc directly in real code...

You'd never object "forgot to catch std::bad_alloc from new'ing", would you?


> Furthermore this is the year 2019. Why would you still manually traverse a linked list? Oh right because C doesn't have parametric polymorphism (genericity) so you can't abstract out the traversal in a standard library.

You actually can. But it takes some clever allocation tricks to precede the object allocated by the list struct. There usually is 'a way' to do it in C but it is hardly ever an elegant one.


Holy cow. Defining list to be an alias for a struct list pointer is itself bonkers. Section 4.9: "Client code may consider that the pointer IS the object." Well now.


You're comparing an implementation to an interface. Do you really think that's valid?


My point is, C doesn't give you enough tools to build an elegant, easy-to-use interface, so people reimplement things over and over again. So in typical application code you see people using the "interface" in C++ but manually inlining the "implementation" in C. And what's why it is a fair comparison.


Generic collections in C implemented with macros are so heinous, I think this comparison may actually be fairer to C. I say that as an abuser of C macros, and someone who genuinely likes C.


Not a huge fan of those kind of macros myself, because they still leave too much of the implementation exposed. The other alternative is pointer-arithmetic implementations, which do hide the implementation better but often come with limitations (e.g. list members have to be first and you can't be on two lists at once). But either way, that's C for you. The language has its limitations, and there's only so much you can do. Comparing that to an API in a language with better support for such things seems a bit unfair and snobbish. It's like comparing a matrix-math API in a language where it's native to the guaranteed-horrific internals of a C++ library to do the same thing.


For double-linked list, a better choice is to use intrusive data structures. Cleaner and more flexible than both macros and void pointers. However, only double-linked list and binary search trees, to a lesser extend, can be implemented with intrusive data structures.


I agree that intrusive data structures are often a better choice and do not have the issues GP mentions (by using something like a container_of() macro, so the data structure member does not need to be the first member, and the object can be in multiple data structures at once). But of course many other "pointer-based" data structures such as singly-linked lists, separate chaining hash tables, etc can also be implemented with them.


The table of contents is the last three pages. (Is that normal within some community?)


Looks like this is written in TexInfo. Is there a HTML version or the texi source available?


[flagged]


> the proper modern programming language

I understand wanting to advocate for your favorite programming language but this kind of elitist attitude doesn't do the cause any favors, it's more likely to turn people away than anything.

Rust is not "the proper programming language". It's just one of many excellent choices.


You're absolutely right. These people have seriously marred my view of the language.




Applications are open for YC Summer 2023

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: