Some obscure C features (mort.coffee)
340 points by mort96 on Jan 25, 2018 | 156 comments

The author tried using __COUNTER__ and a static array of function pointers, and narrowly missed a workable idea—a static linked list! Forward declarations pun definitions, and can be used roughly as follows:

  struct Element {
    int value;
    struct Element *next;
  };

  /* CAT(a, b) pastes two tokens and SUCC(n) expands to n + 1; both are helper macros not shown here. */
  #define ELEMENT(value) _ELEMENT(value, __COUNTER__)
  #define _ELEMENT(value, n) \
    struct Element CAT(e, SUCC(n)); \
    struct Element CAT(e, n) = { (value), &CAT(e, SUCC(n)) }
You can then walk from e0 through the linked list, but because of how the initialization works, you're done when the current element has `next == 0`.

  int main(void) {
    struct Element *e = &e0;
    while (e->next) {
      printf("%d\n", e->value);
      e = e->next;
    }
    return 0;
  }
Here's a repl.it: https://repl.it/repls/DetailedYoungAsianconstablebutterfly
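The snippet above leaves `CAT` and `SUCC` undefined. Here is a minimal single-file sketch of how they could look, assuming a compiler that provides `__COUNTER__` (GCC, Clang, and MSVC do) and that nothing else in the file, including headers, uses it. The names `SUCC_0`..`SUCC_3`, `ELEMENT_IMPL`, and `sum_list` are made up for this illustration.

```c
#include <stddef.h>

struct Element {
    int value;
    struct Element *next;
};

/* Two-level paste so that arguments are macro-expanded before ## is applied. */
#define CAT_(a, b) a##b
#define CAT(a, b) CAT_(a, b)

/* The preprocessor cannot add, so the successor is a small lookup table
 * (assumption: at most four elements; extend the table for more). */
#define SUCC_0 1
#define SUCC_1 2
#define SUCC_2 3
#define SUCC_3 4
#define SUCC(n) CAT(SUCC_, n)

/* Tentatively define the next node, then define this node pointing at it. */
#define ELEMENT(value) ELEMENT_IMPL(value, __COUNTER__)
#define ELEMENT_IMPL(value, n) \
    struct Element CAT(e, SUCC(n)); \
    struct Element CAT(e, n) = { (value), &CAT(e, SUCC(n)) }

ELEMENT(10);
ELEMENT(20);
ELEMENT(30);

/* Walk from e0; the last node is only tentatively defined, so it is
 * zero-initialized and its next pointer is NULL. */
static int sum_list(void)
{
    int sum = 0;
    for (struct Element *e = &e0; e->next != NULL; e = e->next)
        sum += e->value;
    return sum;
}
```

Calling `sum_list()` from `main` visits the three registered values and stops at the trailing sentinel node.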

The Linux kernel also uses another trick: placing data in a specially named "section."

The linker's job is to combine all the separate pieces in a "section" (special linker term) and write out one section.

Using a specially named section lets you create an array using macros. Using it for function pointers is probably safe, but using it for other data types might be pretty fragile. http://www.compsoc.man.ac.uk/~moz/kernelnewbies/documents/in...

I was impressed when I discovered our codebase was using this trick to implement new CLI commands. Each C file would implement the functions and helptext etc for implementing a command, but I couldn't find where each command was being registered in the central parser. Because they had to be registered in some central list, right?...

Turned out the macros I overlooked at the end of each file placed the struct defining the command in a special linker section, and the parser init would go and load all the commands from that section.
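A rough sketch of what such a registration macro might look like, assuming GCC or Clang targeting ELF with GNU ld or lld (which define `__start_<name>` and `__stop_<name>` for sections whose names are valid C identifiers). This is not the kernel's macro or the codebase described above; `struct command`, `REGISTER_COMMAND`, `cmd_table`, and `find_command` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

struct command {
    const char *name;
    int (*run)(int arg);
};

/* Put a pointer to the command into the "cmd_table" section (GCC/Clang
 * extension). `used` keeps the otherwise unreferenced pointer from being
 * discarded by the compiler. */
#define REGISTER_COMMAND(cmd_name, fn) \
    static const struct command cmd_##fn = { cmd_name, fn }; \
    static const struct command *const cmd_ptr_##fn \
        __attribute__((used, section("cmd_table"))) = &cmd_##fn

static int do_double(int arg) { return 2 * arg; }
REGISTER_COMMAND("double", do_double);

static int do_negate(int arg) { return -arg; }
REGISTER_COMMAND("negate", do_negate);

/* Assumption: the linker provides these bounds (GNU ld and lld do on ELF). */
extern const struct command *const __start_cmd_table[];
extern const struct command *const __stop_cmd_table[];

/* The "central parser" never names a command; it just scans the section. */
static const struct command *find_command(const char *name)
{
    for (const struct command *const *p = __start_cmd_table;
         p < __stop_cmd_table; ++p)
        if (strcmp((*p)->name, name) == 0)
            return *p;
    return NULL;
}
```

Storing pointers rather than the structs themselves sidesteps questions about padding between entries, which is one source of the fragility mentioned above.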

Interesting to know that this trick already existed in the Linux kernel.

I'm (perhaps perversely) disappointed we aren't using more of the linker's power in many C codebases.

> I couldn't find where each command was being registered in the central parser.

This is the reason why I do not like such tricks. This obviously hurts maintainability and probably portability. Is the tradeoff really worth it? Plain stupid registration is just a single line. KISS.

"Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?" –Brian Kernighan

Yeah, the linker offers a lot of magic waiting to be discovered.

A primary problem with linker magic is that it is usually very non-portable. Hacks you do with macros may be ugly but at least they work everywhere, unless you used a non-standard behavior.

Linker tricks tend to be platform specific, and even more so than other tricks (e.g. POSIX platforms share a lot of common behavior, but still tend to have quite different linkers).

I had a minor epiphany a few months ago where I realized the linker can be seen as sort of doing dependency injection for a bunch of object files. In theory, you could have a program that depends on some set of symbols, two object files that each include the symbols, and a linker script that at load time checks some configuration parameter before determining which subsystem to link with. You could even get really fancy and dynamically generate another object file with those symbols that acts as a proxy. And from there you could basically implement Spring for C, which obviously is what everyone wants to do ;)

In theory? LD_PRELOAD is specifically designed for it. For examples, see https://rafalcieslak.wordpress.com/2013/04/02/dynamic-linker...

This is roughly how the Windows CE driver architecture works, with "device manager" as the runtime proxy in the middle: https://msdn.microsoft.com/en-us/library/jj659831.aspx

That is how I used to debug C and C++ memory leaks on UNIX platforms, during the late 90's.

I overrode malloc() and free() with my own memory tracking functions.

On Windows side it wasn't needed, as VS already provided some APIs for it.

That's also how C++ static constructor/destructor calls work --- a section contains entirely function pointers, which the init/uninit code in the RTL calls before and after main().


Another interesting thing is that it relies on the linker ordering names in alphabetical order.

Yeah, the Macintosh Programmer's Workshop (MPW) did pretty much the same thing (back in 1987 or so). I forget the ordering guarantee, it might have been lexicographic by filename. Global initialization in C++ has been a horrible mess, portability-wise, for literally decades.

It is a mess in any language that allows it.

The initialization order is implementation dependent in module/package initialization blocks.

The only guarantee is that the modules being imported run before the one being executed.

I believe the JVM specification describes precisely when class static initializers run, so that should not be implementation dependent.

Of course it still won't be deterministic if you load classes in parallel in multiple threads.

It's true the JVM specifies this, but it doesn't mean it's always easy or nice to figure out what's going on.

For example

  class A { static final int x = B.y; static final int y = 1; }
  class B { static final int x = A.y; static final int y = 2; }
will not produce any static initializers and will do what you probably wanted because everything can be turned into a class constant and resolved at compile time. However this

  class A { static final Integer x = B.y; static final Integer y = 1; }
  class B { static final Integer x = A.y; static final Integer y = 2; }
will not work the way you might want because there is no order in which the static initializers can be run that will do everything in the correct order.

No it's not. OCaml gets it right.

By forcing developers to remember which order they should be linked?

I wish so much that linker arrays were part of the C and C++ standards! They are much more flexible than pre-main static constructors, and could be used to implement them at the user level in a less magical way!

You would need a linker script for that. Not very accessible for a "normal" project.

Note that this does not work if you make use of __COUNTER__ anywhere else in the file (directly or indirectly).

That's true, and in the event that you're actually using __COUNTER__ for something you could build something similar out of #include with a header that increments an internal number.

  // COUNTER.h
  #if !defined(COUNTER)
  #define COUNTER 0
  #elif COUNTER == 0
  #undef COUNTER
  #define COUNTER 1
  #elif COUNTER == 1
  #undef COUNTER
  #define COUNTER 2
  #endif
And then change each invocation of ELEMENT to have a leading #include "COUNTER.h".

  #include "COUNTER.h"

  #include "COUNTER.h"
Note that if you take this approach then (as far as I know) you're actually writing standard C!

That's really neat, I hadn't considered that. I will definitely try it out, and might end up using it for the library.

I think the other solutions proposed are better, but:

> Sadly, you can't set the value of an index in an array like that in the top level in C, only inside functions.

This is not exactly true. An assignment is a value, you can do e.g.

int a = b[x] = 0;

(which implies that you can do your array-index assignments at the global level by creating nonsensical/unused variables and initializing them).
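One caveat, as far as I can tell: in C (unlike C++) a file-scope initializer has to be a constant expression, so C compilers reject `int a = b[x] = 0;` at the top level. When all the entries are known in one place, C99 designated initializers can set individual indices at file scope. A minimal sketch with made-up names (`test_fn`, `tests`, `run_tests`):

```c
#include <stddef.h>

typedef int (*test_fn)(void);

static int test_add(void) { return 1 + 1 == 2; }
static int test_mul(void) { return 2 * 3 == 6; }

/* Indices are set explicitly and out of order; slots not named stay NULL. */
static test_fn tests[4] = {
    [2] = test_mul,
    [0] = test_add,
};

/* Runs every non-empty slot and returns how many tests passed. */
static int run_tests(void)
{
    int passed = 0;
    for (size_t i = 0; i < sizeof tests / sizeof tests[0]; ++i)
        if (tests[i] != NULL && tests[i]())
            ++passed;
    return passed;
}
```

This does not help when the entries are scattered across macro invocations, which is the situation the article was in.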

There's a simple solution you're missing here:


This is what I came to say. Here is a working example that builds an array of function pointers at program initialization time.


This also works across multiple source files, unlike __COUNTER__.
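One way to build such an array before `main()` runs, assuming GCC or Clang and their `__attribute__((constructor))` extension (not necessarily the approach the comments above had in mind). `TEST`, `register_test`, `registry`, and `run_all_tests` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*test_fn)(void);

#define MAX_TESTS 64
static test_fn registry[MAX_TESTS];
static size_t registry_len;

static void register_test(test_fn fn)
{
    assert(registry_len < MAX_TESTS);
    registry[registry_len++] = fn;
}

/* Define a test function plus a constructor that registers it.
 * Assumption: the compiler supports __attribute__((constructor)),
 * which is a GCC/Clang extension and not standard C. */
#define TEST(name) \
    static int name(void); \
    static __attribute__((constructor)) void register_##name(void) \
    { register_test(name); } \
    static int name(void)

TEST(addition_works) { return 1 + 1 == 2; }
TEST(subtraction_works) { return 3 - 1 == 2; }

/* Returns the number of registered tests that reported success. */
static size_t run_all_tests(void)
{
    size_t passed = 0;
    for (size_t i = 0; i < registry_len; ++i)
        passed += registry[i]() ? 1 : 0;
    return passed;
}
```

To share one registry across several source files, `registry`, `registry_len`, and `register_test` would need external linkage and a declaration in a common header; the order in which constructors from different files run is not something to rely on.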

Yeah, it's a weird one! I discovered the idea while building my own toy unit test library, but I never got around to finishing it.

The biggest downside is that (as far as I know) it's impossible to get this approach to coordinate across translation units, so you're forced to define a main in each test file (via the library's header file?). But I doubt that matters too much in practice.

My not-so-humble-opinion is that if you find yourself doing metaprogramming with the C preprocessor, it's time to upgrade to a more powerful language.

Sometimes you just don't have other options than C.

Like kernel drivers, embedded (microcontroller) development.

Or perhaps you just need to develop a small loadable library without any runtime (no malloc, etc.). Or perhaps you're writing a C module for Python, Lua, etc. for better performance.

You just need to treat C with proper respect, not be too arrogant about your skills.

I hope over time we can move more and more to Rust or some other language which gives as many errors at compile time as possible and helps us avoid shooting ourselves in the foot.

Shouldn't have too many C-macros, of course. Some macro messes I've seen can be nearly impossible to debug...

> You just need to treat C with proper respect, not be too arrogant about your skills.

We've been hearing "C isn't the problem, bad programmers are the problem" for 40 years. This has continued to perpetuate the problems with the language.

I mean, I certainly agree that there are no alternatives to C now for certain domains, but there are real problems with it that aren't solved by just "treating it with proper respect".

Having programmed in C for nearly 30 years, I absolutely agree with you that C is the problem.

But often you just need to get the job done, and have limited options available.

Perhaps in 50-100 years, people looking back at software from this era won't be able to understand how we managed to make it work at all.

I haven't met or know anyone mastering C/C++ (especially C++), and doubt I'll ever hear about one.

I know plenty of C masters, and a few C++ masters. If you want to meet C++ masters, go to cppcon or get a job writing systems software at a respectable C++ shop.

I’d say I had a working knowledge of C++ after about a year ramping up on a codebase based on a mature C++ framework.

That was after years of writing reliable, performant C code. My C++ is now significantly better than the C I can produce, on both fronts, and at the same time.

I’m also much more productive as a side effect of that.

I doubt very seriously that any of those C masters can recite by heart the 200 plus use cases of UB recognized by ANSI C.

> I haven't met or know anyone mastering C/C++ (especially C++), and doubt I'll ever hear about one.

C can definitively be mastered. C at its core is pragmatic minimalism. The basic ideas are very simple. If you want to be a language lawyer and know the standard completely (including all historical accidents), it can get complex, but still manageable. But anyways, you shouldn't be a language lawyer. That's not C. Make maintainable programs instead, stay in the middle of the road.

C gets out of the programmer's way. It's not about mastering C, but about mastering programming. And while I agree there's much bad C software out there: I haven't seen one maintainable non-bloated Java project. Here is one master of C that I'm sure you know: Linus Torvalds. Of course there are many, many more. Just remember it's not about C, but programming machines in general.

> I haven't seen one maintainable non-bloated Java project

I also haven't seen one large-scale network-facing C project written using reasonable development practices (i.e. not absurdly expensive formal verification) that hasn't had some sort of terrible remote code execution vulnerability. By contrast, Java programs tend to be far less vulnerable to this kind of problem. (The closest thing is probably bugs with attacker controlled java.io.Serializable, which while severe are much less frequent than RCE in C programs.)

> Here is one master of C that I'm sure you know: Linus Torvalds.

I agree that Linus is a master of C. Linux is also an excellent example of the above problems with C.

> I also haven't seen one large-scale network-facing C project written using reasonable development practices (i.e. not absurdly expensive formal verification) that hasn't had some sort of terrible remote code execution vulnerability.

This has gotten a lot better with static code analysis (clang-analyzer, Coverity, ...) and whitebox-fuzzing (libfuzzer, afl, ...). Both simply point you to the bugs and don't require a formal spec.

Together with Valgrind, these tools have revolutionized C programming for me.

These tools are important for legacy code or code that cannot be written in any language other than C. They do not bring C code anywhere near to code written in other languages in terms of security and correctness.

Plus those tools are not available in all C compilers, especially in the embedded world.

I believe BIND9 has never had a remote code execution vulnerability - plenty of denial of service vulnerabilities, though, because it kills itself when it detects memory corruption.

I wouldn't say "mastering C" should imply "never have any security issues". And many vulnerabilities are conceptual and don't have anything to do with the use of C.

Use your tools wisely. Architect programs for clear data flow. Validate inputs as soon as possible. Don't do (de)serialization by hand.

By all means use a managed "memory safe" language if you have many data accesses that are not behind a validation gateway. But for large amounts of data or complex software, that's still not an option. C is still the only language in which I can write software of advanced complexity - because it does not get in my way.

The entire industry has been successfully writing programs that process "large amounts of data" and "complex software" in languages other than C for decades.

Counter-examples: Kernels, Game Engines, Software with lots of small "objects". I once had to write a SAT solver in Java (dozens of millions of clauses) and it was a disaster which I could only rescue by converting to highly unidiomatic Java code. It would have been much more straightforward in C.

> Kernels

Mesa/Cedar at Xerox PARC, Topaz at DEC/Olivetti, Oberon at ETHZ, JavaOS at Sun, Singularity/Midori at Microsoft Research

> Game Engines

Unity is slowly rewriting parts of their C++ pipeline with C#, after their IL2CPP compiler got mature.

Then there are game engines like Xenko being done.

Also, having had one foot in games since the '80s, I remember the transitions: "Pascal/C are too slow, we stick with Assembly", then "C++ is too slow, we stick with C", and now we are in the "JavaScript/C#/Java are too slow, we stick with C++" phase.

> C can definitively be mastered.

If that's true, how come there are almost no widely used network facing C programs that didn't have plenty of security vulnerabilities?

(Anything written by Daniel J. Bernstein doesn't count, because there's no way he's a mere mortal. :))

OpenBSD has done a great job, especially if you look at what they achieved, how fast they got there, and what resources were at their disposal.

Qmail's direct competitor, Postfix? Postfix has more, but Qmail has one with a higher score: https://www.cvedetails.com/vulnerability-list/vendor_id-8450... vs https://www.cvedetails.com/vulnerability-list/vendor_id-86/p...

Postfix is also much more widely used than qmail.

Even djb had flaws in his C code despite being a semi-god. It took a $10k bounty and several years to find, but it was there!

Which language has? Keep in mind market share, so don't say Go.

for, while, do...while, recursion, goto: is not a sign of "pragmatic minimalism" unless one chooses to define it tautologically as "what C has".

"recursion" is not a concept in C. Just function calls. "for" and "while" are very simple concepts and basically the same. They are very simple and well known idioms and not C specific. Most languages have both - not a big deal. I use "for" mostly for iterating over arrays, and I have otherwise mostly while loops and while(1) loops with manual breaking.

Goto, you just can't do without it.

Also not saying that in some cases even C is too much abstraction. But in general, its memory abstractions and its function call abstraction have proved to be very effective helping humans develop in a modular fashion.

> Goto, you just can't do without it.

Strange, the last time I used it I was still doing Basic in the early 90's, or when I do some Assembly programming.

So yeah, we can do without it, as long as the languages are powerful enough.

I do find forward gotos useful in C. For error handling and breaking out of nested scopes. But only when goto is simpler and easier to understand than the alternative.
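A minimal sketch of the forward-goto cleanup pattern, using only standard library calls. `copy_first_line` and its labels are names invented for this example; the point is that each failure jumps forward to the label that releases exactly what has been acquired so far.

```c
#include <stdio.h>
#include <stdlib.h>

/* Copies the first line of src_path to dst_path.
 * Returns 0 on success and -1 on any failure. */
static int copy_first_line(const char *src_path, const char *dst_path)
{
    int rc = -1;
    FILE *src = NULL;
    FILE *dst = NULL;
    char *buf = NULL;

    src = fopen(src_path, "r");
    if (src == NULL)
        goto out;
    dst = fopen(dst_path, "w");
    if (dst == NULL)
        goto close_src;
    buf = malloc(256);
    if (buf == NULL)
        goto close_dst;
    if (fgets(buf, 256, src) == NULL)
        goto free_buf;
    if (fputs(buf, dst) == EOF)
        goto free_buf;
    rc = 0;

    /* Cleanup runs in reverse order of acquisition; the success path
     * falls through the same labels. */
free_buf:
    free(buf);
close_dst:
    if (fclose(dst) != 0)
        rc = -1;        /* buffered data may fail to flush */
close_src:
    fclose(src);
out:
    return rc;
}
```

The alternative without goto is usually nested ifs or repeated cleanup code in every error branch, which is where the "simpler and easier to understand" judgment comes in.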

Not saying there's no use for backward gotos, just don't remember ever needing them. Maybe they could be useful for some C code generation cases or perhaps for retrying some operation?

What really annoys me are C++ exceptions used for flow control. Surprise gotos. I loved exceptions long ago at first, but over time the feeling has shifted more and more towards hate. Exceptions make code bases so much harder to understand. Especially when combined with inheritance.

> "recursion" is not a concept in C. Just function calls.

There are indeed C compilers that don't support recursion (or function re-entrancy). What does the standard say about stacks?

I'm not a standards guy, so I might be wrong, but this is the only reference to recursive calls that I could find from the C11 draft [1], Section 6.5.2.2, item 11:

    Recursive function calls shall be permitted, both directly and indirectly through any chain of other functions.
Stacks, I don't know. I only care about the concept (and semantics) of function calls. The call stack is an implementation detail. I wouldn't be surprised if the concept of a call stack was not part of the standard.

[1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

Indeed, it's not. In the case of what we'd usually call stack-allocated variables, the C standard just uses the phrase "automatic storage duration", and defines their lifetime without explicitly referencing a stack:

> For such an object that does not have a variable length array type, its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way. (Entering an enclosed block or calling a function suspends, but does not end, execution of the current block.) If the block is entered recursively, a new instance of the object is created each time. The initial value of the object is indeterminate.
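A small sketch of what that paragraph guarantees, written without any reference to a stack: each recursive entry gets its own instance of the local, and the outer instances stay alive while the inner call runs. `MAX_DEPTH`, `live`, and `count_instances` are names made up for the illustration.

```c
#include <stddef.h>

#define MAX_DEPTH 4

/* Pointers to the `local` object of every call that is currently active. */
static const int *live[MAX_DEPTH];

/* Recurses MAX_DEPTH levels deep. At the bottom, counts how many of the
 * still-live `local` objects have a distinct address and kept their value. */
static int count_instances(int depth)
{
    int local = depth;              /* automatic storage duration */
    live[depth] = &local;

    if (depth + 1 < MAX_DEPTH)
        return count_instances(depth + 1);

    int distinct = 0;
    for (int i = 0; i <= depth; ++i) {
        int ok = (*live[i] == i);   /* outer blocks are suspended, not ended */
        for (int j = 0; j < i; ++j)
            if (live[i] == live[j])
                ok = 0;
        distinct += ok;
    }
    return distinct;
}
```

The pointers in `live` are only read while every call is still active; once the calls return, the objects' lifetimes have ended and the pointers must not be used.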

Pragmatic == close enough to all of the popular and contemporary assembler flow control constructs.

> C at its core is pragmatic minimalism.

Pragmatic minimalism as it was for the hardware at the beginning of the 1970s, to be exact.

The C machine model is still the only model I can easily program and not make a mess. And without being a hardware architecture expert: Has so much changed on the assembly level? And what other languages are truer to today's machines?

None of them are, not even Assembly, unless the CPU executes the opcodes directly in hardware without micro-code help.

The C machine model says nothing about vector instructions, instruction re-ordering, NUMA, GPGPU, IO ports, Harvard architecture, multi-core.

As of C11, the C machine model does (finally) include threads and memory ordering, so at least you can say it says something about multi-core…
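For a flavor of what C11 added, here is a sketch of release/acquire publication using only `<stdatomic.h>` (the `<threads.h>` part of C11 is optional and not available everywhere). `payload`, `ready`, `publish`, and `consume_or` are illustrative names, and the example is called from a single thread, so it shows the API rather than an actual race.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;            /* plain data, published via the flag */
static atomic_bool ready;      /* static storage: starts out false */

/* Producer side: write the data, then set the flag with release ordering. */
static void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer side: an acquire load that sees `true` is guaranteed to also
 * see the payload written before the matching release store. */
static int consume_or(int fallback)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return fallback;
    return payload;
}
```

In a real program `publish` and `consume_or` would run on different threads; the memory orders are what make the unsynchronized-looking access to `payload` well defined.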

Modern C compilers do an enormous job of optimizing C code, despite its shortcomings like the lack of distinction between pointers and arrays. You don't need to worry too much about that, maybe you'll need to sprinkle vectorization intrinsics or assembler code if the compiler isn't smart enough, or to add a couple of memory barriers.

Would mostly sign that, but not without inspecting security or performance critical code in disassembler.

Modern compilers are still rather bad at vectorization. What's worse, those optimizations are often unstable. A small change might make code several times slower. Performance regression tests and when that fails, vector intrinsics are a must.

I'm curious about what other languages could be used in its place. Maybe Ada? It's both generics-oriented and low-level.

A friend of mine who is working in HPC research (high-performance computing) generates CUDA kernels using higher-kinded templates. He can instantiate you a

  template<template<class>, template<class, template<uint>>, class>
before breakfast.

How long will we keep hearing about "the problems of C" before people realize there might be reasons it's the most successful programming language yet invented?

There are definitely reasons behind C's success. Most important was that it was the language of Unix, which spread like wildfire in the '70s and '80s because it was free.

There are equally important reasons why C's usage has been in steady decline from the '90s to today.

> Most important was that it was the language of Unix

More important was it was the only language that really worked on DOS, which ruled the computer biz for 10 years. I would guess 80% of programming was done for DOS, and C was a very good fit for the 640K systems.

On my corner of the planet we were using Turbo Pascal, Turbo Basic, Quick Basic, Clipper, FoxPro, DBase.

C and C++ had no special place on the spectrum of programming languages usage, unless we were porting code between MS-DOS and UNIX.

Until it was time to move into Windows, and anyone not using C or C++ started to be left behind, manually writing FFI wrappers (e.g. Turbo Pascal/Delphi) and we eventually migrated to one of them.

None of that is true at all. Turbo Pascal was hugely popular on DOS in the 1980s into the early 1990s. It wasn't until Windows 3.1 that Turbo Pascal really started losing market share to C. You are also completely forgetting about Apple. IBM PC compatibles did not even pass 50% market share until 1986.

Well, sure, but Microsoft wouldn't have chosen to promote C if not for the success of Unix. :)

Quite true, and in those days Microsoft was equally promoting Quick Basic and Quick Pascal, while writing MS-DOS in pure Assembly.

No, C got famous thanks to UNIX, just like we have to use JavaScript thanks to the browsers.

Path dependency is why. It's not complicated, it's just expensive to overcome.

I think only recently has a language been invented which has a good chance of replacing C and that's Swift. It's like C++ done right, it's user friendly, pragmatic, backed by a huge company and doesn't use GC.

Reference counting is a GC algorithm.

> Sometimes you just don't have other options than C.

This is true. With all the new and fancy languages around, it's strange that there are so few attempts at making a better C.

I would say that the characteristics of C are: dead simple, manual memory management, "fancy assembly".

Rust is close, but fails on being dead simple.

The best attempt I know of, is Zig: http://ziglang.org/

But it's still very early in development. I am using it for microcontroller code right now though, and it's much nicer than C already, despite having to deal with bugs.

"Sometimes you just don't have other options than C."

You always have other options if better languages can compile to C or integrate with it via FFI. There have been plenty that do that, too. For simple and safe, Modula-2 was an early one compiled to C that could've had its syntax modified to be more like it. If aiming for power, PreScheme was a systems dialect of Scheme that compiled to C at one point. The OcaPIC (Ocaml) project and the use of the ATS language for 8-bit targets show that stuff like that could probably be made to work in areas where C is used, too.

So, I'd say it's a myth you need to develop in C just because a target has only C libraries or compilers that safer languages can use. Instead, we get a new claim that people just didn't do it for reasons that varied per project or person but weren't technical capability.

For the microcontrollers example, the 'C' that the device's compiler uses could be a very loose implementation of a subset of the language. How do you ensure that the code you're compiling to will work for it?

Depends on the microcontroller.

The majority of them also enjoy Pascal and Basic compilers.


So sometimes, there is indeed an option.

Also many languages have compilers that are able to generate C code.

As to ensuring, well that is what testing is for.

That's a good point. I doubt most languages generating C put a lot of thought into what subset is supported in proprietary compilers. If anyone tries one, then there could be profiles done for unusual compilers where the generated code has certain properties. As for the past, I think people could've tested the output just to see what happened. Then, either ditch the high-level language per function, module, or project depending on how far the incompatibility effects went.

How does that sound?

>Sometimes you just don't have other options than C. Like kernel drivers, embedded (microcontroller) development

Use Common Lisp to generate C code. There are three libs that are specific for this, see C-mera for example.

That is how the .NET GC was originally implemented.

Wait. If I use non-tail recursion on a micro controller, then I’ve probably implemented a stack out of memory crash, which, for embedded devices, is actually worse than a buffer overflow on unsanitized input.

Is this a joke, or do they do some sort of stack-bounding static analysis?

What are you talking about? Running out of stack has nothing to do with what you use for code generation/preprocessing. I also don't see why running out of stack would ever be worse than a buffer overflow. The former is at least going to be fail-fast.

I keep hearing this. How exactly common is it that you can't use C++ for technical, as opposed to human, reasons? gcc is by far the most talked about compiler in this space (at least from what I hear) and obviously it's a C++ compiler as well.

My guess is that in most (not all) cases C++ is usable instead of C, but the culture of such development is to prefer C. It's a real pity, because for someone with a moderate amount of judgement (i.e. not going off the deep end on unnecessarily complicated C++), it only takes a moderate amount of C++ knowledge to be able to write more maintainable and correct code that performs equally well.

What happens in reality though is that as soon as developers get access to the standard library all hell breaks loose. People switch to the Java mode of thinking and start almost mindlessly using vectors, maps, shared pointers, etc. and C-level performance goes out of the window. With their exposure to templates, there goes their sanity and, as a consequence, code maintainability.

Exactly. I like C++ because it lets me abstract over the absolute fastest C-like code.

For example, to produce generic SIMD-accelerated code over both Intel-specific SIMD operations and the Sleef vector math library I needed for FHT-based JL-transforms and kernel projections, I used template specialization to abstract to operations on physical memory. [0]

The generic methods themselves were gross, but in application, it was extremely simple in downstream code to specify an operation and a container and have the compiler pick the fastest method for the job. I’ve also checked my generated assembly and the smaller functions are always inlined, even though they’re called from static constexpr function pointers.

This is also a great use case for macros, as they saved me huge amounts of typing.

I’m sure many will say C++ is the wrong direction to go for metaprogramming, but it’s worked great for my purposes and lets me get straight to the hardware.

[0]: https://github.com/dnbaker/frp/blob/master/include/frp/vec.h

As a heads-up, in case anyone checks, I’ve moved this to its own repository so I can use it in other projects easily: https://github.com/dnbaker/vec.

Agreed. A sad amount of the history of Unix has consisted of gradually migrating ill-advised solutions made with text-pasting preprocessors to more sensible designs. See the Bourne shell (written in Bournegol), autoconf (written in m4), troff, sysvinit, etc.

I don't think I agree with your point in the general case, but even if you were 100% correct, and nobody should do metaprogramming with the C preprocessor in regular production code, I still don't think that's very relevant. The specific use case I had for all of these features were a unit testing library, which is very different from a regular program. If you need a testing library for C, that's probably because you have C code. You probably shouldn't try to write a test suite in Python for your C library.

Why not? The majority of engineering professions say use the right tool for the job. Whatever is most cost-effective. Programming is one of few things where people try to do the opposite. Fortunately, many in the profession do right tool for the job philosophy at least for whole domains like OS code, web apps, and so on. Still an insistence on sticking to same language too much.

Python has incredible productivity and versatility with ease of learning. It's readable. It's easier to prototype things like verification tools in it than C. There's also tools like Hypothesis to test that. Then, there's tools to speed up the code like Cython.

If anything, Python would be a safer, faster way to write tooling for modifying or testing C code than C itself. One could also convert Python generated tests to a C library of tests with automatic validation of data types, coverage, etc. So, I think it would make sense if someone did anything from prototyping C in Python to using it to test C code.

And do note I've enjoyed reading your experimentation with macros in C. I learned about defer through it. Just commenting on that one point.

How is python safer than C? I deal with a code base that’s 10x more C++ than python, and when I’m on call to babysit the test clusters, >99% of the type errors are in python. With the occasional oddball exception to break the rule, the C++ type/memory errors are easier to track down to boot. (Python has its doozies too, BTW)

I think this is because writing python code is like writing C++ without templates, but all the integer types declared “auto” and all the pointers void.

(The main difference is that the C++ compiler *still* does more type checking than python even after you abuse the warning flags enough to get it to stop gently telling you to change professions!)

Reading it reminds me of when I accidentally got my emacs syntax highlighting stuck at “white on white” for a bunch of stuff, but I digress.

I guess I find it to be a completely unproductive, unreadable, and unmaintainable language. It’s OK for throw away prototypes, but I’ve never seen a team throw away a python prototype and rewrite it in another language. I know of python projects that failed and were [repeatedly] rewritten by the same team in python, and I know of python teams that failed and got swapped out for another language, but that doesn’t count. Part of the problem with the python prototyping story is that python wants to own the event loop and asynchrony / thread model, and changing those are often the main reason to swap languages — the next step is always a rewrite in my experience.

Wow. Long rant. One too many ‘assert “” != None’ errors this week. :-)

Python now supports type annotations, and static checking (using mypy).

Regarding the event loop, I don't think that's true; you can embed libpython and call functions from your own loop: https://docs.python.org/3/extending/embedding.html

"How is python safer than C?"

Its basic operations don't often crash a system or lead to hacks. It also promotes readable, concise code with a lot of FOSS utilities for things like testing (esp see Hypothesis). This both increases development pace and reduces defect according to about every study that's ever pitted a HLL against C. The best one I've seen in terms of apples to apples was this Ada and C comparison:


Note that Python isn't the main point or even what I brought up, though. mort's point read like you shouldn't develop C-related tooling in high-level languages like Python. I'm countering saying you should do as much work out of C as you can with even Python being beneficial. I give much better examples in my other response here:


> Why not?

TLDR: because you need to test the real thing, and using Python to test C code causes too many important differences between tests and production.

To test C code you need to compile it, and there’s no C compiler inside Python. Also if you have C++, sometimes you can abuse templates for similar effects, and Python doesn’t have C++ AST either.

C toolchains largely depend on environment: included & linked files, environment variables, compiler & linker switches, they all make a huge difference. You’ll spend much time replicating that in Python, and it’ll break when you upgrade any part of the toolchain (compiler/IDE/build automation/etc).

To a lesser extent, the same applies to native binaries as well. If you manage to use Python to produce tests, a C compiler to build them, then Python to run them, the runtime environment will still be different from the real one.

"To test C code you need to compile it"

I think nickpsecurity has introduced two ideas, and you are only countering the more complex one. The ideas were,

* Use python as a preprocessor

* Use python to call into C.

To the first idea: the language you use as your preprocessor is your choice. Instead of using C macros, you could have an alternate DSL that you transform into C. Then you compile this. Given what an awkward thing the C preprocessor is, I am surprised that it continues to hold mindshare against options like this. Awk is a more powerful transform tool, and easily compiled for any platform it is not already on.

Line numbers are a complication if you write your own preprocessor. This post on PHP suggests how to deal with this, https://stackoverflow.com/questions/396644/replacements-for-...
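
One standard answer to the line-number problem is C's own `#line` directive: a generator can emit it so that `__LINE__`, `__FILE__`, and compiler diagnostics point back at the input file instead of the generated C. A minimal sketch, where the file name `tests.dsl` and both function names are made up for illustration:

```c
#include <assert.h>
#include <string.h>

/* Pretend the code below was generated from line 42 of "tests.dsl"
   (a hypothetical input file). The line after the directive counts as 42. */
#line 42 "tests.dsl"
int generated_line(void) { return __LINE__; }
const char *generated_file(void) { return __FILE__; }
```

A generator would emit one such directive in front of each chunk of code it copies through from its input.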

To the other point. The complications you raise are real, but you can manage them away by setting your C project up to create a shared-object build. For example, it is trivial to use Racket to write tests against shared object files (once you know Racket). This still doesn't address the runtime issue you raise.

Separate issue. There is a set of debugging C-preprocessor macros with similar intent to the ones posted in OP published in "Learn C the Hard Way".

> Use python as a preprocessor

It’ll become harder to find developers, and longer for new ones to start being productive. Also it’ll become harder to do cross platform development because you’ll have to port, and then support, that custom pre-processing tools/DSL for every platform.

> Use python to call into C

Last time I checked, Python couldn’t import C headers. To call into C, you somehow need to negotiate function prototypes & calling conventions between the two languages. Regardless of whether you do it manually or automatically, it’s very expensive to support. People do it when they have to (e.g. to interop Python with C libraries), but for tests, the approach is way too expensive for most projects.

> it is trivial to use Racket to write tests against shared object files (once you know Racket)

I don’t know Racket but I doubt it’s trivial. SO libraries export C functions, they use C structures for arguments and return values, and these can be very complex. Pointer graphs, function pointers, pointers to pointers, variable-length structures, and other advanced features are a pain to use from any other language (besides languages specifically designed to be backward compatible, like C++, Objective-C, and maybe D). And if you need to test C++ code it’s even harder: unlike C, it doesn’t have a standardized ABI, i.e. the ABI changes between compilers and their versions.

You introduce a strong point with the issue of moving structs over the barrier. When I do it, I create constructor functions in C, and only send pointers over the bridge between the non-C and C. I'm happy to do this, but recognise that it is boilerplatey, and should be seen as a different tradeoff choice rather than a trivial alternative.
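
A rough sketch of that constructor-function pattern, assuming a made-up `struct point`; every name here is hypothetical. The foreign side only ever holds the pointer and calls the accessors, so it never needs to know the struct layout:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical struct; the other language treats it as an opaque handle. */
struct point { int x, y; };

/* Constructor: allocates and initializes on the C side. */
struct point *point_new(int x, int y) {
    struct point *p = malloc(sizeof *p);
    if (p) {
        p->x = x;
        p->y = y;
    }
    return p;
}

/* Accessors, so the caller never touches the fields directly. */
int point_x(const struct point *p) { return p->x; }
int point_y(const struct point *p) { return p->y; }

/* Destructor: the C side also owns deallocation. */
void point_free(struct point *p) { free(p); }
```

The boilerplate cost is visible even here: four functions for a two-field struct.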

Agree regarding C++ ABI also. I don't have much experience with C++. My usual approach is to use C for system calls and bare-minimum surrounds, and then use the wrapping approaches described above to get to a high-level language. This is why the struct issue didn't come to mind for me.

"* Use python as a preprocessor

* Use python to call into C."

You nailed it. Also note that Python isn't my recommendation so much as what mort brought up that I'm countering with examples showing even it can help. If it was my choice, I'd pick a better HLL for these goals. Let's keep looking at this a bit, though, where I'll introduce those where they're useful.

PHP was a great example I didn't think of on preprocessor. It versus C's is either the 1st or 2nd most used one out there. On the high end, some "term-rewriting languages" can easily handle jobs like refactoring: TXL, Rascal, OMeta, Ohm. Alan Kay et al in STEPS project did a whole OS with graphics stack and all in a mere tens of thousands of lines of code using such a language. One can, as people do with Prolog and STEPS did, pick a general language that can be extended with meta facilities for DSL's or rewriting where opportunistic. Then, where that doesn't work out, you at least have general-purpose language to fall back on.


(Use Control-F to go to anything that says "STEPS" to find the reports. Work from the bottom up since it's chronological series.)

On your other point, your Racket example is one way it could work. I was also advocating in this thread just using Python itself to build tooling for its own benefits and ecosystem. A lot is already built. Use C for low-level software that strictly needs it if nothing else is available with HLL's like Python for stuff that doesn't need it. The language I've been telling C programmers to check out for scripting-like use is Nim. It looks like Python, has real macros, can call C, and compiles to C. Here's a comparison I just dug up:


I also doubt it will be harder to find C developers if tooling is written in HLL's. For one, they just have to use the tooling rather than write it. I doubt most C# developers extend Visual Studio, most Java developers extend NetBeans, and so on. If they do have to learn, using a language like Python or Nim should make for a low barrier to entry since even folks with no experience pick Python up quickly. A C-like subset of Nim or something similar will be a lot like C with easier syntax and less stuff to worry about. If anything, productivity will go up over C like shown in about every language study ever done.

I have encouraged people to do C-like languages with safe defaults, cleaner semantics, a REPL, better macros, easy integration of C, and outputting C. That way, one can program in a cleaner language avoiding most headaches and accidents that come with C being designed for 1970's hardware. Aside from Wirth's languages with C syntax, a good picture of what this might look like is the Julia language, which was femtolisp on the inside. Especially its seamless interoperation with C and Python.


  I have been working on Snow, a unit testing library for C.
OP requires a unit testing library for C, so they've written one. What is the alternative solution you propose? The framework itself may appear complex, but the entire end goal here is to provide simple/abstract tools to write tests in a readable/self-documenting manner. The framework itself is the means, not the end goal.

My take: rethink the problem and the solution. While there are perfect applications of the preprocessor (such as pasting or stringifying tokens) - yes at some point (rather sooner than later) it gets unmaintainable. But I don't share the sentiment that C wasn't powerful. It has everything one needs (and, due to memory control and cpp, arguably more than most languages).

These days I mostly use C, and occasionally resort to python as an accompanying scripting language to generate data definitions (either in plain text formats or as C structures).

Which more powerful language do you suggest?

Macro metaprogramming is still needed in C++. Other powerful languages might lack some nice properties of C. Like easy interfacing with other languages, and a wide availability.

Considering the parent comment's author he might suggest D :-)


Well, if you use metaprogramming in e.g. 5% of the C code base, it would be OK, in my opinion, as C has lots of benefits vs other high level languages.

For instance by inventing one. :) Was this how you got started?

Only the first two of these are available in C. Computed gotos and local labels are gcc extensions; and the dynamic linker is POSIX but not C.

(Incidentally, the dynamic linker hack is precisely how both FreeBSD and Linux perform all their kernel initialization: Individual modules define symbols with a particular naming pattern, and then the kernel linker enumerates those symbols.)

The second sentence in the article:

> I wanted to see how close I could come to making a DSL (domain specific language) with its own syntax and features, using only the C preprocessor and more obscure C features and GNU extensions.

I was objecting to the misleading title (which is currently "Some obscure C features"), not to the content of the article.

I think you're missing the point of the article. The author isn't limited to standard C, and neither is anyone with a unix and GCC.

Are you just nitpicking the author's description of extensions as "C?"

It was a clarifying comment, didn't denigrate anyone, and added a useful and interesting tidbit of information. I see nothing wrong with this.

Perhaps the comment author should receive some benefit of a doubt as to his intentions?

Sorry, you're just incorrect about both the content of Colin's comment and authorial intent. Colin clarifies he is expressing an objection, not a clarification, and that it is in fact the nitpick I guessed it was: https://news.ycombinator.com/item?id=16236015

Yes, it doesn't denigrate anyone, but that's a really low bar.

I do give authors the benefit of the doubt when something isn't clear. That's why I couched my message in "I think" and asked a question instead of just asserting facts.

> Sorry, you're just incorrect about both the content of Colin's comment

You seem to have misinterpreted me. I wasn't saying Colin was attempting to clarify (I didn't know his intent), I was saying that his comment was factually clarifying. I didn't have any evidence as to his intent at the time, but I didn't see a reason to assume anything other than he was being helpful.

> Yes, it doesn't denigrate anyone, but that's a really low bar.

That's not even half of the criteria I listed, so it's not really the bar I set at all, is it?

The comment contributed by clarifying the submission name for those that had not yet read the article. It additionally contributed by adding some factual information about some of the referenced features and how they are used in the kernel.

I don't think anything he said takes away from the article, so I don't think he's necessarily missing the point.

> I do give authors the benefit of the doubt when something isn't clear. That's why I couched my message in "I think" and asked a question instead of just asserting facts.

I think you also misunderstood what I was trying to accomplish, how much of my statement was meant to be a condemnation and correction, and how much was leveled at you instead of the general readership. The only existing reply to you at the time I posted was referencing people on HN that like to nitpick. It was meant as a "this isn't necessarily negative, so let's just take it for what it provides, and it provides some usefulness" and not as "shame on you for assuming the worst".

Edit: Blah, there was some weird wording in that last paragraph

I find that some HN people read articles with the express intention of picking nits. Like the dwarf with learning difficulties, it’s not big and it’s not clever.

The dynamic linker is also Win32. You can wrap things to either call dlopen or LoadLibrary. In Cygwin, you have dlopen and friends which work with DLLs.

I also encountered the last problem, it can be handled pretty easily with __attribute__((constructor)):

  void (*_test_functions[NTEST_FUNCTIONS])(void);
  int _test_function_idx;

  #define TEST_FUNCTION(fname) \
    void fname(void); \
    __attribute__((constructor)) static void _init_ ## fname (void) { \
      _test_functions[_test_function_idx++] = fname; \
    } \
    void fname(void)

  TEST_FUNCTION(test_foo_bar) {
    assert(foo_bar == 0);
  }

A quite fun, related feature of the AIX linker is that it will automatically make any function name prefixed with "__sinit_" statically invoked the same as this attribute does. So you name the function a certain way to take care of AIX/xlc, add pragmas to take care of other UNIX vendor compilers, add the attribute for any GCC-based (or compatible) compiler, and they are actually pretty cross-platform! Not... that... I would use these all the time or anything like that :)
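
For the GCC/Clang case, the constructor-attribute registration can be sketched end to end as below. This is only an illustration: `MAX_TESTS`, `tests_run`, `run_all_tests`, and the two sample tests are made-up names, and the bound of 16 is arbitrary.

```c
#include <assert.h>

#define MAX_TESTS 16                      /* assumed upper bound */
static void (*test_functions[MAX_TESTS])(void);
static int test_function_count;
static int tests_run;                     /* bumped by each sample test */

/* Declares the test, registers it before main() runs, then opens its body. */
#define TEST_FUNCTION(fname) \
    static void fname(void); \
    __attribute__((constructor)) static void init_##fname(void) { \
        if (test_function_count < MAX_TESTS) \
            test_functions[test_function_count++] = fname; \
    } \
    static void fname(void)

TEST_FUNCTION(test_addition) { assert(1 + 1 == 2); tests_run++; }
TEST_FUNCTION(test_sizeof)   { assert(sizeof "abc" == 4); tests_run++; }

/* Runs every registered test and returns how many were registered. */
int run_all_tests(void) {
    for (int i = 0; i < test_function_count; i++)
        test_functions[i]();
    return test_function_count;
}
```

Calling `run_all_tests()` from `main` is enough; no central list of tests has to be maintained by hand. The order in which constructors run within a file is not something I would rely on.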

The fun was even better in the old days, where each UNIX had a different concept of what a shared object should be.

AIX used to implement shared objects just like on Windows, using an export definitions file and import libraries, for example.

Still does to this day. And it has a machine-wide cache of libraries, so simply updating the library and restarting the processes isn't enough to get the new version -- you have to be aware it could be in the cache and might need to be manually flushed by someone with elevated access. shudder

Thanks for the clarification.

Last time I used it was about 2003.

I love taking the address of labels. Back in high school I wrote a couple of articles on using them to create JIT compilers:



This is similar to the technique used to implement gforth. The interpreter is all contained in one function, and the instructions are accessed using local labels and goto.

I believe CPython does something similar.
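
A toy version of that dispatch style, using the GCC/Clang labels-as-values extension. The three opcodes and the `run` function are invented for this sketch and have nothing to do with gforth's or CPython's real instruction sets:

```c
#include <assert.h>

enum { OP_INC, OP_DOUBLE, OP_HALT };      /* hypothetical opcodes */

/* Interprets a byte-code program and returns the accumulator. */
int run(const unsigned char *code) {
    /* Table of label addresses, indexed by opcode. */
    static void *dispatch[] = { &&op_inc, &&op_double, &&op_halt };
    int acc = 0;

    goto *dispatch[*code++];              /* jump to the first instruction */
op_inc:
    acc += 1;
    goto *dispatch[*code++];
op_double:
    acc *= 2;
    goto *dispatch[*code++];
op_halt:
    return acc;
}
```

Each instruction ends with its own indirect jump instead of returning to a central `switch`, which is the whole point of the technique.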

And local functions.

And __attribute__((cleanup)).

And statement-expressions.

And Blocks is nice too.

All of this should be standardized.
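
For anyone who hasn't met the cleanup attribute: GCC and Clang call the named function with a pointer to the variable when it goes out of scope. A small sketch, with `free_ptr`, `use_buffer`, and the `frees` counter being names I made up:

```c
#include <assert.h>
#include <stdlib.h>

static int frees;   /* counts cleanup calls so the effect is observable */

/* Cleanup handlers receive a pointer to the variable leaving scope. */
static void free_ptr(void *p) {
    free(*(void **)p);
    frees++;
}

int use_buffer(void) {
    __attribute__((cleanup(free_ptr))) char *buf = malloc(16);
    if (!buf)
        return -1;      /* free_ptr still runs; free(NULL) is a no-op */
    buf[0] = 'x';
    return buf[0];      /* free_ptr runs after the return value is computed */
}
```

Every exit path from `use_buffer` frees the buffer without an explicit `free` or a `goto out` label.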

That shorter URL is 404ing.

Oops, looks like I forgot to migrate it from my old blogspot

Instead of the tricks with __COUNTER__ it is better to define the macro such that it also defines an initializer function which registers whatever you want to register dynamically on program startup. The way of doing that is somewhat non-portable, but essentially every interesting compiler-platform combination has a straightforward way to do that, as it is needed for both C++ (to call constructors of global variables) and Objective-C (as part of the code that @implementation expands into) implementations.

Edit: also, syntax-wise it seems to me that it is better to define the macro such that it can be used as

    define_foo(blabla...){ ... code ...}
In this case this would involve code like

    #define define_foo(name, ...) \
    static void foo_ ## name(void); \
    /* ... initializer function ... */ \
    static void foo_ ## name(void)
which also solves the first problem mentioned in the article.

I agree with your point about syntax; it would have been nice if the syntax was `define(blah) { ... }` and `it("whatever") { ... }`. However, I don't think that's possible, because I need to include code after the end of the block.

That might not be obvious from the simplified macros I used in my example, but it's pretty clear by looking at the actual definitions. The describe and subdesc-macros (https://github.com/mortie/snow/blob/a9ad850df456f78bcf96e1aa...) need to increment counters and print their status after the block, and the `it` macro (https://github.com/mortie/snow/blob/a9ad850df456f78bcf96e1aa...) needs to run deferred expressions.

I think it would be possible to implement the `describe` macro such that it can be used as `describe(blah) { ... }`, because that defines a function and we can both give the function argument and expect return values from it, but I can't think of any way to do it with the other macros which just create regular `do { ... } while(0)` blocks.

If I'm wrong, please show me how; the `foo(blah) { ... }` syntax would make the __LINE__ macro work, it would play better with auto indenters and syntax highlighters, and it would give prettier error messages if you have a syntax error in a test case. I just can't see any way it would be possible.

    #include <stdio.h>
    #define before_after(before, after)                     \
        for(int before_after_##__LINE__ = 0;                \
            before_after_##__LINE__ < 3;                    \
            before_after_##__LINE__++)                      \
                if(before_after_##__LINE__ == 0) {          \
                    printf("%s\n", before);                 \
                } else if(before_after_##__LINE__ == 2) {   \
                    printf("%s\n", after);                  \
                } else
    int main(void) {
        before_after("hello", "world") {
            printf("to all the\n");
        }
    }

You should combine "__COUNTER__" with "before_after_##__LINE__" to make the unique identifier more robust.

Your snippet won't work with this inline example:

  before_after("hello", "world") { before_after("hello", "world") { }}
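
One detail worth knowing when building identifiers this way: operands of `##` are not macro-expanded, so pasting `__LINE__` directly produces the literal token `__LINE__` in the name. The usual fix is an extra level of macro indirection. A small sketch that makes the difference visible by stringifying both results (all macro and function names here are mine):

```c
#include <assert.h>
#include <string.h>

#define CAT_(a, b) a##b
#define CAT(a, b)  CAT_(a, b)     /* extra level lets __LINE__ expand first */
#define STR_(x)    #x
#define STR(x)     STR_(x)

#define NAME_DIRECT   var_##__LINE__        /* pastes the unexpanded token */
#define NAME_INDIRECT CAT(var_, __LINE__)   /* pastes the line number */

/* Return the generated identifier names as strings for inspection. */
const char *direct_name(void)   { return STR(NAME_DIRECT); }
const char *indirect_name(void) { return STR(NAME_INDIRECT); }
```

The same `CAT` indirection works for `__COUNTER__`, which GCC and Clang provide as an extension.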

I suppose you could do a linked list hack of sorts too. Just create a linked list of goto labels. When done, iterate through to clean up everything.

A personal wish for an extension to C is to be able to create a list of function arguments and slop that around until needed.

   int BarBaz(int bar, int baz)
   {
      return bar+baz;
   }

   auto foo = args_of(BarBaz(10, 20));

   printf( "BarBaz = %i\n", BarBaz(foo));

@mort96, for the latter example, the BSDs provide this in the sys/linker_set.h header, with macros like SET_BEGIN, SET_FOREACH, etc. Unfortunately I don't see the same functionality in Linux headers, but there's no reason the same technique cannot be used there.
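
A rough sketch of how the same idea can be hand-rolled with a GNU-compatible toolchain on an ELF target. It relies on the linker (GNU ld, and lld as far as I know) defining `__start_<name>` and `__stop_<name>` for sections whose names are valid C identifiers. The section name `cmd_set`, the `COMMAND` macro, and `struct command` are all invented for this example; this is not the BSD header's API.

```c
#include <assert.h>
#include <string.h>

struct command { const char *name; int (*run)(void); };

/* Defines a command and drops a pointer to it into the "cmd_set" section.
   Storing pointers keeps the section a plain, densely packed array. */
#define COMMAND(cmdname) \
    static int cmdname##_run(void); \
    static struct command cmdname##_cmd = { #cmdname, cmdname##_run }; \
    static struct command *cmdname##_ptr \
        __attribute__((used, section("cmd_set"))) = &cmdname##_cmd; \
    static int cmdname##_run(void)

COMMAND(hello) { return 1; }
COMMAND(world) { return 2; }

/* Section bounds supplied by the linker (assumption: ELF + GNU-style ld). */
extern struct command *__start_cmd_set[];
extern struct command *__stop_cmd_set[];

/* Looks a command up by name; returns NULL if it is not registered. */
const struct command *find_command(const char *name) {
    for (struct command **p = __start_cmd_set; p < __stop_cmd_set; p++)
        if (strcmp((*p)->name, name) == 0)
            return *p;
    return NULL;
}
```

Unlike the constructor approach, nothing runs at startup here; the table is assembled entirely at link time, and `COMMAND` can be used from any number of source files.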

Enormously powerful, and powerfully unreadable for mere mortals. You can take the assembler out of the loop but you can't take the failure to comprehend out of the programmer.

I don't do C for a living any more (barely did) but the number of times I asked smarter people "why did you use this idiom with this terrifically confusing side-effect" and they said "that's not a side effect" comes to mind...

More examples (C89/90 preprocessor compatible):

Function building using templates:

Passing macros to macros (meta-templates):


My favorite (somewhat) obscure C feature:

    puts("Line one\nLine two\nLine three\n");

and

    puts("Line one\n"
         "Line two\n"
         "Line three\n");
mean the exact same thing. This can be a godsend for making complex debug/log output more presentable in the source.
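
A quick way to convince yourself, as a sketch (the variable and function names are mine): adjacent string literals are concatenated during translation, so the two spellings compare equal at run time.

```c
#include <assert.h>
#include <string.h>

/* One long literal... */
const char *one_literal = "Line one\nLine two\nLine three\n";

/* ...and the same text split into adjacent literals, merged by the compiler. */
const char *split_literal =
    "Line one\n"
    "Line two\n"
    "Line three\n";

/* Returns nonzero when both spellings produce identical strings. */
int literals_match(void) {
    return strcmp(one_literal, split_literal) == 0;
}
```

The same rule is what makes format strings like `"%" PRIu64 "\n"` from `<inttypes.h>` work.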

Or, "Things you can do with macros in C, but probably shouldn't."

Why not? I'm serious. Please don't say "don't use C". We all agree on that. I'm asking why one should not use clever macro tricks -- I know many codebases that do, and I'd not shy away from it when I have to use C anyways.

> Please don't say "don't use C". We all agree on that.

I wouldn't be so sure John Nagle agrees with such a blanket statement.

> I don't know how that works, but the important part is that it does.

Coding in C is already super risky, as proven by the huge number of projects that have exploitable memory errors in them. Adding complexity using features one doesn't really understand is at best going to complicate debugging and make things work magically, and in the worst case lead to the risk that someone is actually going to understand how your code works and exploit it.

Not worth it.

I might not have expressed myself clearly enough. It's not risky, it's not a feature I don't understand how it works, and I'm using it exactly as it's described in GNU's documentation on labels as values and computed gotos: https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html

The "I don't know how that works, but the important part is that it does." sentence was meant to highlight the apparent absurdity of dereferencing a void*, not to say that I don't understand exactly how to use the feature.

The author is talking about code used in a unit testing library that he is developing. As long as you don't release your test code as part of your software package, I don't see why some smart shortcuts that can help with the programming experience can not be used.

It's just a "computed goto." A GCC extension, yes, but not a particularly hard one to understand.

Sending blocks as arguments to macros: Just wrap them in another set of parentheses. For parentheses-balanced input (such as valid C blocks) this should always work (I think).

    $ echo '#define x(a,b,c) a b
    x(1, (2, 3), 5)' | cpp

That isn't really a viable solution, because C doesn't allow parentheses around blocks.

Calling `describe(foo, ({ int a, b; }))` would make the preprocessor happy, but it would also produce invalid syntax:

    void test_foo() ({ int a, b; })

Tsk tsk, if you're going to know about label values and other GCC C extensions, you might as well know about statement expressions (another GCC C extension). `({ <statements> })` is an expression.

Naturally, Clang also supports statement expressions...
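
The classic use of statement expressions is a macro that evaluates its arguments only once. A minimal sketch for GCC/Clang; `MAX_ONCE` and `max_of_increments` are names I picked, and `__typeof__` is another extension:

```c
#include <assert.h>

/* Evaluates each argument once; the last expression is the block's value. */
#define MAX_ONCE(a, b) ({ \
    __typeof__(a) max_a_ = (a); \
    __typeof__(b) max_b_ = (b); \
    max_a_ > max_b_ ? max_a_ : max_b_; })

/* With a naive ((a) > (b) ? (a) : (b)) macro, one increment would run twice. */
int max_of_increments(int *x, int *y) {
    return MAX_ONCE((*x)++, (*y)++);
}
```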

EDIT: I highly recommend spending some time reading about all the various GCC C extensions, and what compilers support them (typically Clang, but also Sun Studio, if that's still around..., and maybe others). There are quite a few very useful ones.

Interesting as highlights of what C preprocessor and compiler extensions can do. But I'd hate seeing it in actual production code.

I used to be enamored with what I could do with the preprocessor back in the 80's. I've since reverted all of that (in C code I still use) back to mundane uses of the preprocessor.

Eventually one tires of no support in the symbolic debugger, program analysis tools, compiler error messages that are based on the post-expansion code, nobody else understands the code, hygiene problems, etc.

Exactly my sentiment, especially the collaboration part. If you take it too far, you effectively invent a new language. Forcing it upon others is more about hubris than skills.

> you effectively invent a new language

Perfect. Can't tell if that's deliberate.

I haven't seen a major project written in C that _didn't_ use some magical macro-foo.

PostgreSQL source code for example, has a foreach() macro that loops thru a linked list.
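
To show the general shape of such a macro (this is not PostgreSQL's actual definition; `struct node`, `list_foreach`, and `list_sum` are made up for the sketch):

```c
#include <assert.h>
#include <stddef.h>

struct node { int value; struct node *next; };

/* Expands to a for-loop header, so an ordinary block can follow it. */
#define list_foreach(var, head) \
    for (struct node *var = (head); var != NULL; var = var->next)

/* Sums every value in the list using the macro. */
int list_sum(struct node *head) {
    int sum = 0;
    list_foreach(n, head) {
        sum += n->value;
    }
    return sum;
}
```

Because the macro only supplies the loop header, `break` and `continue` behave the way a reader would expect.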

There are straightforward macro things like comparisons, list/queue head declarations and perhaps iterators (tacky IMO) that are understood by most practising C programmers. Deviate from that too much, you effectively penalize collaborators/future maintainers by forcing them to learn your toy language.

At the same time the abstractions provided by C preprocessor are too weak for significant productivity gain via elaborate DSL route.

It's better to rather stick to universally understood vocabulary than force your sometimes buggy, often inflexible language upon others.

I agree that there's a risk of getting off into the weeds.

That being said, C Preprocessor behavior is pretty simple once you know it, and modern compilers are pretty good these days about helping you see what happens when you have an error in your macro-expanded code. You're right though, good to stick to the universally understood vocabulary.

Do you use PuTTY? Because PuTTY does crazy C macro things. Actually, maybe not that crazy. It just has C functions that are thousands of lines long, using macros to turn them into fancy Duff's device co-routines.
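
For the curious, the trick looks roughly like this. It is a stripped-down sketch in the style of Simon Tatham's "Coroutines in C" write-up, not PuTTY's actual macros; the three-argument shapes and `next_value` are my own simplification.

```c
#include <assert.h>

/* The switch jumps back to wherever the previous call returned from. */
#define crBegin(state)     switch (state) { case 0:
#define crReturn(state, v) do { (state) = __LINE__; return (v); \
                                case __LINE__:; } while (0)
#define crFinish           }

/* Yields 1, 2, 3 on successive calls, then -1. */
int next_value(void) {
    static int state = 0;   /* resume point, kept between calls */
    static int i;           /* "locals" must be static to survive a return */
    crBegin(state);
    for (i = 1; i <= 3; i++)
        crReturn(state, i);
    crFinish;
    return -1;
}
```

The `case` label sits inside the `for` loop, which is legal for the same reason Duff's device is. One known limitation: two `crReturn` calls on the same source line would produce duplicate case labels.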

No I don't use PuTTY. Also, I'm aware what you can do with C macros, doesn't mean you should do it. See my sister comment.

In college my buddy made a defer function in C that is a little simpler (but only works with clang)


It's not just that it only works with Clang, but that it only works with Blocks (Apple's extension to C which basically adds closures of indefinite extent, which is pretty cool).

Really interesting read. However, I fear that if other C library maintainers start to use these kinds of techniques, creating bindings for other languages will become increasingly difficult.
