Abusing the C Compiler (wingolog.org)
40 points by wingo 2568 days ago | 19 comments

Hardly abusing or original: generating small chunks of code and invoking the C compiler on each of them is how GNU autoconf works (and the reason it is so slow, too!)

An interesting parallel. I guess the novelty is that there is no intermediate expanded file -- no config.h, etc.

What would you call tmp.c, a.out, tmp.txt and so on if not intermediate files?

I don't see a single sentence in the article that implies this.

Not sure what you mean here; the autoconf analogy came from the grandparent.

If you are under the impression that an intermediate expanded file is produced, then you misunderstand how Lisp macros work.

If I understood correctly, here the Lisp code generates the C code, which generates the small chunks of Lisp code, which are then compiled. But I don't understand what you mean by "intermediate expanded file." In autoconf, config.h is the final resulting chunk that is used in the final compilation (somewhat parallel to the Lisp chunks generated by the C). What's your take?

I'd like to point you to the 2008 paper by Felix Klock [1], or perhaps the more easily digestible presentation [2], where you will find (slides 39-47) a description of the inner workings of procedural macros like define-c-info. It explicitly states that an intermediate C program is generated, compiled and executed to get the desired result(s).

This is not groundbreaking per se, but it is certainly a smart compile-time tactic of interfacing with a C ABI.

[1] http://www.ccs.neu.edu/home/pnkfelix/Published/klock-ffi-sch... [2] http://www.ccs.neu.edu/home/pnkfelix/Published/klock-ffi-sch...

I agree, that is a great paper. It's the one I linked to in my article :)

I was specifically referring to the usage of intermediate files... I think I assumed too much, since you obviously have a very different view of what constitutes an intermediate file than I do. Would you care to elaborate?

I don't want to get too bogged down here, but sure:

When you use autoconf, you typically run configure, then you're left with a config.h, which then parameterizes later builds.

On the other hand, when you evaluate the definition of dirent->name, no intermediate file is left behind.

Of course in both cases you make temporary C files, but they are ephemeral. The difference is that in the first case you are generating files for inclusion in a later phase, and in the second you are effectively extending your Scheme compiler with a C compiler.

The surprising thing about this code from a Scheme programmer's POV is that macros are usually about rewriting Scheme source using Scheme. In this case the macro generates C source, forks to compile and run it, and munges the output into the resulting text.

But sure, I can see that from a certain point of view, autoconf and Scheme macros can do similar things :)

I see your point, but the fact remains that the underlying technique is just the same. Whether intermediate files are used further down the build process seems somewhat irrelevant.

http://perldoc.perl.org/pstruct.html takes this a bit further (but less portably), and compiles the code to assembly and parses out the debug records to find the information about the structures.

Very cute; very problematical. C struct offsets depend entirely on the model, packing, target etc. How can this be controlled?

There are compiler intrinsics that can do that but it is not pretty.

You include stddef.h and use offsetof().

Unless you're compiling for another platform, or a different code model than the default etc.

In the first case, you are likely going to have a libc and headers from the target platform sitting around so that you can link.

May take switches on the compiler command line.

Further, packing can depend upon the pragma in force when the header file is included. How to set up that environment?

You know, the whole dynamic FFI world is very much seatbelts-off, which is initially quite disconcerting. Usually I rely on my distro to ensure compatibility, but once you start saying "this syscall gives me a pointer to memory which should be interpreted as two ints, one char, and a float, packed conventionally" you begin to realize what exactly are the interfaces between various bits on your system. For better and for worse of course.

What Guile has now is the "list-all-members-in-the-struct" approach that Klock discourages. It's the difference between API and ABI compatibility. I'd like to figure out how to do the former.

