Large single compilation-unit C programs (2006) (csail.mit.edu)
50 points by Sadkov on Oct 9, 2020 | 21 comments


May be relevant to the discussion: SQLite is also compiled from a single, 220k LOC C file called "the amalgamation".

https://www.sqlite.org/amalgamation.html


I also found this "amalgamate" script on GitHub, intended to allow creating such amalgamations from C/C++ projects:

https://github.com/rindeal/Amalgamate

It seems interesting; however, when I tried the FreeType example, there seemed to be a preprocessing issue: some function definitions are conditionally excluded even though they are called later. I didn't have time to find out whether this was an issue in the original code or something the amalgamation script introduced.

In any case, such single-C programs are very useful for quickly testing tools, so having more of them would be great.


I'm not a C programmer, but I have heard of amalgamation, and I wonder why a standard workflow to create a single compilation unit from multiple source files isn't more straightforward.


Because C is stuck in the dark ages.

Textual preprocessors are evil.


It took a few minutes to get that working on newer macOS.


I've been working on a project that auto-generates C programs, sometimes up to 1.5m lines of code, in a single file (actually two files, but the second is only 35 lines).

Not open source but happy to share benchmarks if that would be useful.


Too bad it's not open source, but will some of the generated programs be?

Also, would you mind comparing it to Csmith (https://embed.cs.utah.edu/csmith/)?


There is quite a lot of IP in the generated programs, so sadly it's probably not possible to share them.

I wasn't aware of Csmith, so thanks for highlighting it. My C code doesn't really test many features of the compiler, so I suspect it's mainly of interest for seeing how the compiler handles a really large single file.


There's also https://github.com/intel/yarpgen which I haven't used. I believe there are a couple of others...


Some compile times for those interested:

Hardware: 2016 12" MacBook (1.1GHz Core m3). OS: Ubuntu 20.04 running in Docker. Compiler: Clang 9 at -O0 (more optimisation increases the compile times a lot!)

0.53m LOC, 41MB, 34s

0.99m LOC, 76MB, 91s

1.44m LOC, 110MB, 167s

I suspect the code is relatively straightforward to compile - few function calls etc.
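
As a rough reading of the three measurements above, here is a small sketch that converts each one to lines compiled per second. The function name `loc_per_second` is made up for illustration. With the figures given, it works out to roughly 15.6k, 10.9k and 8.6k lines per second, so across these three data points compile time grew faster than linearly with program size.

```c
/* Hypothetical helper: compile throughput in lines of code per second.
   mloc is the program size in millions of lines, seconds is the
   measured compile time. */
double loc_per_second(double mloc, double seconds) {
    return mloc * 1e6 / seconds;
}
```

Calling `loc_per_second(0.53, 34.0)`, `loc_per_second(0.99, 91.0)` and `loc_per_second(1.44, 167.0)` gives the three throughput figures. Three points are too few to say anything about the shape of the curve beyond "worse than linear here".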


Please share the benchmarks...


Will do - give me a few hours (and that's not the compilation times!).


I like to code this way. You just include "foo.c" instead of "foo.h", which does not exist at all. The compilation is really simple, and there are half as many files!


1283 = continue, 1432 = license, 1766 = gnu

So for every loop continue statement there is a GPL license text :D


I know it's half serious but it's simply not true, in the same way as grepping for "Stallman" in the leaked Windows source code (nobody actually mentioned RMS there, these were false positives). In this case, some headers contain multiple occurrences of GNU in a single header. Then there are several #ifdefs like "__GNU_LIBRARY__" or "__GNUC__" or e-mail addresses of people in the gnu.org domain.

In practice, it doesn't matter at all as the preprocessor replaces all license headers with a single space even before the compiler has the chance to look at it.


The preprocessor removes the comments? I thought that was the compiler?



The preprocessor doesn't remove comments, and at least in Clang, comments are parsed into the AST.


Well, according to C99, it should. Section "5.1.1.2 Translation phases" says (in phase 3): "Each comment is replaced by one space character."

Edit: just checked and clang behaves just like gcc with -E. Maybe you didn't mean comments but preprocessor directives?
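
A minimal sketch of how the phase 3 rule quoted above can be observed from inside a program, without looking at -E output. The names `STR`, `answer`, `comment_became_space` and `get_answer` are made up for illustration. It relies on two things: a comment between two tokens acts as whitespace, and the `#` stringification operator collapses whitespace between tokens to a single space.

```c
#include <string.h>

/* Stringify the macro argument as written. */
#define STR(x) #x

/* The comment is replaced by one space in translation phase 3, so
   `int` and `answer` remain two separate tokens and this is an
   ordinary declaration. */
static int/* comment acting as whitespace */answer = 42;

/* Returns 1 if the comment between `a` and `b` reached the
   preprocessor as whitespace, i.e. the stringified text is "a b". */
int comment_became_space(void) {
    return strcmp(STR(a/* removed before tokens are formed */b), "a b") == 0;
}

/* Returns the value declared above, showing the declaration parsed. */
int get_answer(void) {
    return answer;
}
```

If comments were not turned into whitespace before tokenization finished, `int/*...*/answer` would not parse as a declaration and the stringified text would not be "a b".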


Clang, like GCC, has -C and -CC flags to preserve comments during preprocessing. However, these are really flags for the underlying preprocessor. Your parent might be thinking of some application of the Clang frontend that does not run preprocessing. For example, clang-format will probably not want to preprocess the code nor strip out comments.


You are correct, my bad.



