Embedding Binary Objects in C (tedunangst.com)
257 points by ingve on April 16, 2020 | 136 comments

If we're in the realm of "non-standard linker tricks"...

Compilers will concatenate sections of the same name. You can use this trick to produce a concatenation of arrays across several files:

    $ cat t1.c
    __attribute__((section("some_array"))) int a[] = {1, 2, 3};
    $ cat t2.c
    __attribute__((section("some_array"))) int b[] = {4, 5, 6};
    $ cat t.c
    #include <stdio.h>
    extern const int __start_some_array;
    extern const int __stop_some_array;
    int main() {
      const int* ptr = &__start_some_array;
      const int n = &__stop_some_array - ptr;
      for (int i = 0; i < n; i++) {
        printf("some_array[%d] = %d\n", i, ptr[i]);
      }
      return 0;
    }
    $ gcc -std=c99 -o t t.c t1.c t2.c && ./t
    some_array[0] = 1
    some_array[1] = 2
    some_array[2] = 3
    some_array[3] = 4
    some_array[4] = 5
    some_array[5] = 6
This is the mechanism that linkers use "under the hood" to get a list of C++ object initializers that need to run pre-main().

It's unfortunate that there is no standard way of getting at this functionality in portable C, or to get it in C++ without actually running code pre-main(). Sometimes you really want a linker-initialized list of things (of some sort) that have been linked in, without actually running code pre-main() (which has all kinds of issues).

I would love to see a thing like this standardized in both C and C++. C++ compilers need it under the hood anyway.

What a weird coincidence seeing a comment about mergeable sections today...

A patch I helped review that's related to this in LLVM just landed today: https://reviews.llvm.org/D72194. (LLVM previously had support, but would perform bad merges in specific and rare edge cases; that patch fixes those cases).

Also, note that you need a custom linker script to define those symbols (__start_some_array, __stop_some_array). The Linux kernel does this, as noted in: https://nickdesaulniers.github.io/blog/2020/04/06/off-by-two...

> A patch I helped review that's related to this in LLVM just landed today

Cool. :)

> Also, note that you need a custom linker script to define those symbols

The code I posted above compiles and runs without any custom linker script.

> The code I posted above compiles and runs without any custom linker script.

What?! Ok, I've seen the Linux kernel's custom linker script define symbols for custom ELF section boundaries, but TIL that you can get these automagically. I'm curious to see if LLD implements this logic, too.

How exactly is the ordering of items across the section decided? Is it relatively random or is it based on the order that the object files are passed to the linker?

It's unspecified, and I've seen it change over time as the linker changes its data structures.

This is true, to my knowledge; I would NOT rely on the order always being consistent. But I think you can enforce the order in the linker file, i.e. load the arrays one after the other, and then the order should stay constant. But you still have several places that you need to keep synchronized and, really, most people don't look in the linker file at all.

You're right, you can manually specify the ordering of object files in the linker file. Though, like you say, this is rarely done because of how brittle it is.

I was referring more to the way shown above, where I've seen a linker change the list of object files from a sorted list of filenames, to a hash table of filenames, which obviously changed the ordering with which it iterated across all object files.

You can sort the array during program initialization, if you care about the order.

It's probably already in link order. If you have to know the names of the symbols in code to be able to sort them, it sort of defeats the purpose of the scheme in the first place.

You might want to sort them by some other criteria.

For example, the FreeBSD kernel's use of this construct embeds a well-defined priority[1], and the array is sorted by this priority value at boot time.[2]

[1]: https://github.com/freebsd/freebsd/blob/master/sys/sys/kerne...

[2]: https://github.com/freebsd/freebsd/blob/master/sys/kern/init...
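A rough sketch of the priority idea in GNU C, assuming GCC or Clang and an ELF linker that provides `__start_`/`__stop_` symbols as in the example at the top of the thread. As with FreeBSD's linker sets, the section holds pointers to the entries rather than the entries themselves, and startup code sorts a copy of that pointer list by priority. `struct init_entry`, `INIT_ENTRY`, the section name `init_set` and `run_inits()` are hypothetical names for illustration, not FreeBSD's actual SYSINIT API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical registration record: a priority plus a function to run. */
struct init_entry {
    int priority;              /* lower values run first */
    void (*fn)(void);
};

/* Put a pointer to each entry in section "init_set"; the linker concatenates
 * these pointers from every object file that uses the macro. "used" stops
 * the compiler from discarding the otherwise unreferenced pointer. */
#define INIT_ENTRY(name, prio, func)                                  \
    static const struct init_entry name = { (prio), (func) };         \
    static const struct init_entry *const name##_ptr                  \
        __attribute__((section("init_set"), used)) = &name

/* Section bounds generated by the linker for C-identifier section names. */
extern const struct init_entry *const __start_init_set[];
extern const struct init_entry *const __stop_init_set[];

static int by_priority(const void *a, const void *b) {
    const struct init_entry *x = *(const struct init_entry *const *)a;
    const struct init_entry *y = *(const struct init_entry *const *)b;
    return (x->priority > y->priority) - (x->priority < y->priority);
}

/* Copy the pointer list, sort the copy by priority, then run each entry.
 * Returns the number of entries run, or -1 if allocation fails. */
static int run_inits(void) {
    size_t n = (size_t)(__stop_init_set - __start_init_set);
    if (n == 0)
        return 0;
    const struct init_entry **sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        sorted[i] = __start_init_set[i];
    qsort(sorted, n, sizeof *sorted, by_priority);
    for (size_t i = 0; i < n; i++)
        sorted[i]->fn();
    free(sorted);
    return (int)n;
}

/* Example registrations, deliberately written out of priority order. */
static int order[3];
static int order_len;
static void second(void) { order[order_len++] = 2; }
static void first(void)  { order[order_len++] = 1; }
static void third(void)  { order[order_len++] = 3; }

INIT_ENTRY(e_second, 20, second);
INIT_ENTRY(e_first, 10, first);
INIT_ENTRY(e_third, 30, third);
```

Sorting a copy keeps the linker-built section read-only and makes the run order depend only on the declared priorities, not on whatever order the linker happened to concatenate the object files in.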

Assume it's random. See also the static initialization order ‘fiasco’: https://isocpp.org/wiki/faq/ctors#static-init-order

The BSDs define some macros in sys/linker_set.h to make this slightly easier. The mechanism is used somewhat widely in both kernel and userspace in FreeBSD.

Can you give an example of a use case where this is needed?

One place it's used is to avoid central "registration" for multiple "things". For example, consider a program with many fairly separate "subcommands". Using this mechanism allows adding a new subcommand simply by linking in a new source file (without changing a shared source file to list the new command).

I use this idea in embedded work, where the code is split into "modules". The same firmware source is shared amongst various differing pieces of hardware, each of which utilise a different subset of modules.

I like to use this technique for CLI command and help tables. It eliminates having a single giant table in a file, instead you have a C macro which makes the system aware of your command right next to your command definition.

   int my_command(..) { ... }
   COMMAND(my_command, "command_name", "example command")
   HELP("use --foo argument for this...")
   HELP("use --bar argument for that...")
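One way a `COMMAND` macro like this might be implemented, as a sketch assuming GCC/Clang and an ELF linker that provides `__start_`/`__stop_` symbols. For brevity the help text is a single string argument rather than separate `HELP` lines, and `struct command`, the section name `cmd_table`, `find_command()` and the two example handlers are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* One record per command. */
struct command {
    const char *name;
    const char *summary;
    int (*handler)(int argc, char **argv);
};

/* Define the record next to the handler and drop a pointer to it into
 * section "cmd_table"; no central table has to be edited. */
#define COMMAND(fn, cmd_name, text)                                    \
    static const struct command cmd_##fn = { (cmd_name), (text), (fn) }; \
    static const struct command *const cmd_ptr_##fn                     \
        __attribute__((section("cmd_table"), used)) = &cmd_##fn

extern const struct command *const __start_cmd_table[];
extern const struct command *const __stop_cmd_table[];

/* Linear search over every command linked into the program. */
static const struct command *find_command(const char *name) {
    for (const struct command *const *p = __start_cmd_table;
         p < __stop_cmd_table; p++)
        if (strcmp((*p)->name, name) == 0)
            return *p;
    return NULL;
}

/* Example commands; in a real program each would live in its own file. */
static int do_hello(int argc, char **argv) {
    (void)argc; (void)argv;
    return 42;
}
COMMAND(do_hello, "hello", "print a greeting");

static int do_version(int argc, char **argv) {
    (void)argc; (void)argv;
    return 7;
}
COMMAND(do_version, "version", "show the version");
```

A help listing would walk the same `__start_cmd_table`..`__stop_cmd_table` range and print each `name` and `summary`.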
Also used this all the time back when I programmed assembly language.

It's definitely a problem that this is not standardized. For example, TI has their own C compiler for their ARM CPUs, not everybody uses gcc..

This is used for constructor & destructor functions in C code. The function pointers get put in an array in a certain section which is collated then the startup and exit code calls them. Ordering is not guaranteed.

Note that the order in which the constructors and destructors run, once the binary has been constructed, is guaranteed by your C runtime.
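When the order matters, GCC and Clang accept an optional priority argument on the `constructor` attribute: constructors with lower numbers run first (and prioritized destructors run in the opposite order). A small sketch; values 0 to 100 are reserved for the implementation, and `init_log` exists only to record the order for demonstration.

```c
/* Records the order in which the constructors below ran. */
static int init_log[2];
static int init_count;

/* Defined first in the file, but priority 200 makes it run after the
 * priority-101 constructor (101 is the lowest value open to programs). */
__attribute__((constructor(200))) static void late_init(void) {
    init_log[init_count++] = 200;
}

__attribute__((constructor(101))) static void early_init(void) {
    init_log[init_count++] = 101;
}
```

Both functions run before `main()`, so by the time `main()` starts, `init_log` holds 101 followed by 200.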

There's also gnu's objcopy.

This post covers a lot of different approaches, and has lots of tips: https://www.devever.net/~hl/incbin

I got about 90% of the way to having a working ruby single-file executable builder which used objcopy to embed a sqlite database of the source files into an MRI build. Then YARV happened and the ruby build chain changed just enough that I needed to throw it out and start again.

Every now and again I ponder having another go with mruby...

FWIW, this is roughly how Tclkits work in the Tcl world. Although by default they use a Metakit database instead of SQLite.

Currently they append the database to the end of the executable, which has some problems and I'm working to make including it in the image more standardized as part of XVFS [0].

[0] https://chiselapp.com/user/rkeene/repository/xvfs/

Thanks for the link! That’s got a lot of goodies.

One of us from Memfault also wrote a post about tips with Binutils, and this approach is covered in it.


TIL objdump --visualize-jumps…

The image background on that webpage is extremely distracting

Make your browser width narrow, and it drops to the bottom. Not that you should have to :)


I needed this last week! Building an embedded firmware image for a dashboard display, with lots of PNG files for icons etc. 61 of them.

The original developer wrote a tool to expand the PNGs to BMPs (arrays of 32-bit pixel values) and generate a C array definition as a text file, which is much bigger than the original PNG (13K => 100K sort of thing). That C source was then included in the build, and it used up 700K of my firmware image space, which was only 1MB to begin with.

So I wrote a tool to represent the raw PNG file as a C declaration, then added parts of libpng to the boot code to decompress the images during initialization. Even with libpng, I saved over 400K. Now the images use RAM instead of ROM, but that's ok I had buttloads of RAM.

Anyway, this is a much slicker way of including binary files in an image. I may go back and change my build.

There are even smaller png libs than libpng, try a stripped down stb_image for example. Wouldn't use that for user-supplied images, but since you control them all it should be fine.

I do this with embedded graphics in boards for a lot of reasons. Works great - and STB is pretty easy to modify if you need a funny one (e.g. monochrome PBM for small displays). Animated GIF, PNG, and a bunch of others all work pretty smoothly. Just include the bits you need so it can be even smaller.

Might be worth recompressing the PNGs too if you haven't already. In particular, using 8-bit PNG can save a lot on file size if you don't need loads of colours.

Yes! And having tried a few different tools for this, I recommend pngcrush (easy to use, excellent results).

Edit to add: I settled on pngcrush 3 or 4 years ago; if there's something newer and better, I'm interested in hearing about it!

Nope. Dumb Visual Studio doesn't support embedded binary in cross-compile. That's the tool the client wanted, so I'm stuck with it.

Funny, they (VisualGDB) even have a tutorial on doing it. But the option 'Embedded Resource' doesn't exist in my (modern) Visual Studio.

Ha! And the feedback I gave VisualGDB about their outdated tutorial, they responded "Just pay HERE and we'll be glad to solve your problem!" They can go ahead and have outdated documentation, screw them.

Could you not use the objcopy method mentioned here outside Visual Studio (e.g. as a step before the build) and then just link the resulting .o as an external object? I'd expect there to be at least a way to:

- hook custom tool calls

- link external objects into the project

Also, hello fellow automotive embedded SW eng! I know your suffering! :))

VisualGDB uses a tool-created makefile for cross-compiling and building. Not a lot of room for me to innovate. I suppose the .o trick could work. But it still requires careful hand-scripted builds. I already have that - I run a script to read a binary file and emit .c declaration, which I pipe to a C file and include in the project.

I suppose it lets me drop support of the script, which is something.

Afaik `xxd -i` is much simpler. No need to write another tool.

imagemagick and netpbm can do that. Gimp too.

I assume so. But how many tools do I want to include in my client's build dependencies?

I wrote this tool a long time ago, which allows you to embed files and then access them as a "filesystem" with stdio. May be of interest:


If you just want the contents of the file as a symbol, the approach described in the article is 100% the way to go.

Or, as any CTFer worth their salt will tell you,

  __asm__(".incbin \"file\"");

A real world example of use: https://github.com/google/honggfuzz/blob/075756bea8d1f08eb19...

I think it's a bit better than

  ld -b binary ...

because it doesn't depend on the static linker generating symbols, which feels a bit like a non-standard/non-portable feature. But who knows, maybe the vast majority of ld variants actually implement it.

CTF = ?

A capture the flag challenge https://ctftime.org/ctf-wtf/

One of the Capture the Flag challenges at CCC last year was to figure out a way to leak the contents of a file on a remote compiler server. The server would accept some C code and just give you a boolean true/false value whether the code compiled or not---without ever executing it.

Others came up with a solution abusing the C preprocessor by defining macros that would make the known structure of the file valid C and therefore they could just #include it. But my solution works with arbitrary files, without knowing the structure beforehand.

As others have pointed out here, you can use inline assembly and the .incbin directive to include a file. But how could that influence whether the compilation succeeds or fails? I figured out how to guess a byte of the file and create metadata sections only accepted by the linker if the guess was correct.


We didn't actually use .incbin on that challenge, interestingly; however, we did use it (along with some nested static constructor function trickery) for Online Calc from TokyoWesterns CTF. For that we had some straightforward tricks to get around the forbidden characters, then we abused the flag format to get the #include to work. After that we could leak the flag byte-by-byte using Linux's BUILD_BUG_ON_ZERO, which is basically an upgraded static_assert. I think this is the code we ran:

  _Pragma("clang diagnostic push")
  _Pragma("clang diagnostic ignored \\"-Wtrigraphs\\"")
  ??= define STRINGIZE(...) ??=__VA_ARGS__
  ??= define hxp EXPAND_AND_STRINGIZE(
  ??= define BUILD_BUG_ON_ZERO(e) (sizeof(struct <% int:-!!(e); %>))
  ??= define BUILD_BUG_ON(condition) ((void)BUILD_BUG_ON_ZERO(condition))
  const char flag =
  ??= include "flag"
  BUILD_BUG_ON(flag == {});
  _Pragma("clang diagnostic pop")
(The {}, of course, being Python format parameters from our script.)

My favorite is simply using a tool to create a C file with your binary data:

  static uint8_t mydata[] = {0xDE, 0xAD, 0xBE, 0xEF, ... };
The advantage is that it works anywhere, with any compiler.

The disadvantage is that it can increase compile time. I would limit each .c file to 10MB; it seems there's a quadratic increase in build time with file size, at least with gcc.

Also, instead of "0x%02x" I use decimal notation and no spaces in order to decrease the .c file size:

  static uint8_t mydata[] = {222,173,190,239,...};
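A minimal sketch of such a generator, written in C so it needs nothing beyond the compiler itself. `emit_c_array()` is a hypothetical helper: it prints decimal values with no spaces as described above, and it also emits `const` and a length variable, which are choices of this sketch rather than part of the comment.

```c
#include <stddef.h>
#include <stdio.h>

/* Write `data` as a C array definition named `name`, using decimal byte
 * values without spaces, followed by a length variable.
 * Precondition: n > 0 (an empty initializer list is not valid before C23).
 * Returns 0 on success, -1 if any write fails. */
static int emit_c_array(FILE *out, const char *name,
                        const unsigned char *data, size_t n) {
    if (fprintf(out, "#include <stddef.h>\nconst unsigned char %s[] = {",
                name) < 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        const char *sep = (i + 1 < n) ? "," : "";
        /* Break the line every 32 values so the output stays editable. */
        const char *nl = (i % 32 == 31 && i + 1 < n) ? "\n" : "";
        if (fprintf(out, "%u%s%s", (unsigned)data[i], sep, nl) < 0)
            return -1;
    }
    if (fprintf(out, "};\nconst size_t %s_len = %zu;\n", name, n) < 0)
        return -1;
    return 0;
}
```

In a build step this would be called with `stdout` or a freshly opened `.c` file after reading the input file into memory; other sources then use `extern const unsigned char mydata[];` and `extern const size_t mydata_len;`.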

I've never noticed the "quadratic" build time increase you mention, so I did a test[0]. Files in size from 1mb to 50mb, 3 trials each. These are the results, and they look absolutely linear[1].

[0]: https://pastebin.com/Z9329xkc

[1]: https://i.imgur.com/I3XSBDg.png

Nice, I love it when people actually try it out. In the end, I much prefer the method outlined in the article, less processing overall, no need to transform it into an array nor compile the array into an object afterwards. You just skip 2 steps and bring it to an object directly!

I stand corrected! I shouldn't have guessed "quadratic" without measuring.

Large files will make gcc use more memory, and if you're building on a low-RAM machine, the OS will start to swap. That's probably the effect I saw. Nothing to do with "quadratic" compilation time.

The article specifically leads with this as the default technique and then wanted to describe a fun alternative.

For those wondering, the easiest way to do this is with:

    xxd -i filename.bin
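For reference, `xxd -i filename.bin` prints a definition along these lines, shown here for a hypothetical 4-byte file; the exact layout may differ between xxd versions. Note the missing `const`, which comes up later in the thread. `checksum()` is just an illustrative consumer of the array.

```c
#include <stddef.h>

/* Shape of `xxd -i filename.bin` output: names come from the file name,
 * with non-alphanumeric characters such as '.' turned into '_'. */
unsigned char filename_bin[] = {
  0xde, 0xad, 0xbe, 0xef
};
unsigned int filename_bin_len = 4;

/* Example consumer: a simple additive checksum over the embedded bytes. */
static unsigned checksum(const unsigned char *data, size_t len) {
    unsigned sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return sum;
}
```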

Yes, so long as you're willing to take a build-dep on vim. An od+awk+sed+sh combo will get you there from a POSIX base—and you have to add some wrapper text around the output of xxd anyway…

xxd exists in standalone form. Compile it.


I mean, so long as you're willing to take a build-dep on that, in that case.

Just a c compiler.

No. So long as you're willing to take a build-dep on that code.

Either you're expecting the code from that repository to exist somewhere in the compilation-host system already, or you are vendoring the code by copying it into your own repository and taking on any potential maintenance, unportability, any necessary configuration integration to make sure, for instance, that you're compiling for the host rather than the target in a cross-compile setup (did you make sure?)—

Either way, if you use that code as part of your build process, you are taking a build-dep on that code. The fact that its only build-dep is a C compiler may play into making this a sensible choice as compared with something that has even more transitive dependencies, but it does not make it go away.

Too bad that xxd doesn't include the const before the array decl. With const it saves a lot of memory.

I don't see how const would save memory, unless you have a separate ROM region that keeps that out of the main memory.

VMM. const memory can just be evicted and be swapped back in with zero cost. rw memory must be backed somehow, which is much more expensive.

I submitted a patch

Not true unless you actively modify it (or the page it was in). "RW" mapped memory that is not dirty is also discarded on memory pressure.

If you click through from the article to the mailing list post that inspired it, you'll see that that's what they originally used, but it created a memory problem.

I also did quite a bit of benchmarking, and never saw any quadratic behaviour [0]. However, there is a floor for RSS and elapsed time, but you would need to test assets less than ~1MB to see it. For assets over a limited range, the graph might look quadratic, but if you cover a large enough range you will see there are two regimes of behaviour.

[0]: http://stonecode.ca/posts/binc_benchmarks/

Typically such data can be 'const', too.

Also note that for bytes, hex literals are longer than decimal ones a lot of the time and never shorter due to the "0x" prefix.

I wonder what causes it to be nonlinear?

if you do that you could also do base64 ;)

But you can’t? Base64 won’t be recognized by the C compiler, you’ll end up having to decode it at runtime.

yep. That's a small piece of c-code you need to invest. See https://stackoverflow.com/questions/342409/how-do-i-base64-e...
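To make the trade-off concrete, a minimal decoder sketch (`b64_decode()` is a hypothetical name; standard alphabet, `=` padding, no whitespace allowed). It shows the extra work involved: the bytes have to be decoded into a writable buffer at run time instead of being usable directly from the executable's read-only data.

```c
#include <stddef.h>

/* Map one base64 character to its 6-bit value, or -1 if it is invalid. */
static int b64_value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode `len` characters of padded base64 from `in` into `out`, which
 * must hold at least len / 4 * 3 bytes. Returns the number of bytes
 * written, or -1 on malformed input. */
static long b64_decode(const char *in, size_t len, unsigned char *out) {
    if (len % 4 != 0)
        return -1;
    size_t o = 0;
    for (size_t i = 0; i < len; i += 4) {
        int v[4];
        int pad = 0;
        for (int k = 0; k < 4; k++) {
            char c = in[i + k];
            /* '=' is only allowed in the last two slots of the final group. */
            if (c == '=' && i + 4 == len && k >= 2) {
                v[k] = 0;
                pad++;
                continue;
            }
            if (pad > 0)                /* data after padding */
                return -1;
            v[k] = b64_value(c);
            if (v[k] < 0)
                return -1;
        }
        unsigned long triple = ((unsigned long)v[0] << 18)
                             | ((unsigned long)v[1] << 12)
                             | ((unsigned long)v[2] << 6)
                             | (unsigned long)v[3];
        out[o++] = (unsigned char)(triple >> 16);
        if (pad < 2) out[o++] = (unsigned char)(triple >> 8);
        if (pad < 1) out[o++] = (unsigned char)triple;
    }
    return (long)o;
}
```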

So, by encoding in base64, you’ve made a more complicated solution which is also slower and harder to use? This seems like it is worse in every conceivable way. The only reason I can see for embedding base64 is if you need the source code to be compact, which makes sense for e.g. JavaScript but makes no sense for C.

I’ll take the simple, fast, and easy solution, thank you.

Where did they define the symbols `_binary_quine_c_start` and `_binary_quine_c_end`? I would expect that the symbol names would need to be passed to the command that produced the object file from the binary file:

    ld -r -b binary quine.c -o myself.o -m elf_amd64

> I would expect that the symbol names would need to be passed to the command that produced the object file from the binary file:

It's the other way around. The command produces these symbols basing on the file name:

    $ readelf --symbols myself.o

   Symbol table '.symtab' contains 5 entries:
      Num:    Value          Size Type    Bind   Vis      Ndx Name
        0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
        1: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
        2: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    1 _binary_quine_c_start
        3: 0000000000000113     0 NOTYPE  GLOBAL DEFAULT  ABS _binary_quine_c_size
        4: 0000000000000113     0 NOTYPE  GLOBAL DEFAULT    1 _binary_quine_c_end

Those are standardized names generated by ld, derived from the file name (in this case "quine.c", which the author treats both as C source and as a binary data file). I bet there's a CLI switch to change the defaults.

They are assuming gnu ld, and that's the pattern it uses. It will change some characters, like "." to "_" to keep valid symbol names.
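A small sketch of that naming rule, assuming GNU ld/BFD behaviour for `-b binary` input: `_binary_`, then the file name as given on the command line, then `_start`, `_end` or `_size`, with every non-alphanumeric character replaced by `_` (so a path like `assets/logo.png` also has its `/` mangled). `binary_symbol_name()` is a hypothetical helper, e.g. for generating matching `extern` declarations.

```c
#include <ctype.h>
#include <stdio.h>

/* Build the symbol name ld would generate for `ld -r -b binary <path>`.
 * `suffix` is "start", "end" or "size". Returns 0 on success, or -1 if
 * the name does not fit in `out` (capacity `cap`). */
static int binary_symbol_name(char *out, size_t cap,
                              const char *path, const char *suffix) {
    int n = snprintf(out, cap, "_binary_%s_%s", path, suffix);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    /* Mangle anything that cannot appear in a C identifier. */
    for (char *p = out; *p; p++)
        if (!isalnum((unsigned char)*p))
            *p = '_';
    return 0;
}
```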

GNU objcopy has a way to change the generated names. As far as I can tell, ld does not.

Totally not relevant for this post, but this is how we do this in Nim:

    const a = readFile("mydata")
The `const` makes the expression evaluate at compile time, conveniently slurping the file into `a`, where it will be available at run time.

That is pretty cool, but also not the behaviour I'd expect coming from other languages where this would be "get the contents of this file from the current directory of my runtime environment, and assign them to immutable variable `a`."

Understandable; there's `let` for run time and `const` for compile time evaluation.

Unfortunately, this technique is a bit problematic with modern C compilers. Because the `start` and `end` symbols are unrelated objects, as far as the compiler is concerned, the subtraction `&end - &start` to get the length of the data invokes undefined behavior. Just for that reason, I feel the include file with a hex dump is the better method.

> Because the `start` and `end` symbols are unrelated objects

That's irrelevant. What matters is whether they point to (or just past the end of) the same object, which they do.

To get the size you would use the `_binary_FILENAME_size` symbol that the linker puts into the object file.

Last time I checked in the Green Hills compiler I'm using you also have the option of using a symbol that contains the size.

Can you elaborate on this? Does "unrelated" have any special meaning in this context? Why is this operation undefined?

In C, accessing data outside the bounds of an object is undefined (like, you can’t legally “go past the end” of an array and end up in some other object). As far as the compiler is concerned, the “start” and “end” pointers point to totally different objects, so it may optimize away a loop from one pointer to the other, since it’s impossible to increment a pointer so that it goes from pointing at one object to pointing at another.

So is the idea that just because the two objects happen to appear sequentially in memory under a certain implementation, the compiler (or linker in this case?) has no obligation to ensure that assumption holds?

Correct. (Although, the pointers being defined that way seems to make it quite unlikely that the compiler could optimize this incorrectly…)

Where is data outside any object accessed? Note that forming a pointer just past the end of an object is well-defined; only dereferencing it isn't.

That's what [u]intptr_t is for.
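For example, a sketch that computes the length with integer arithmetic instead of pointer subtraction. It assumes a flat address space where converting a pointer to `uintptr_t` yields its address; ISO C leaves that mapping implementation-defined, so this is a practical rather than a strictly portable fix. `blob_length()` is hypothetical; in real use its arguments would be the linker-generated `_binary_..._start` and `_binary_..._end` symbols declared as `extern const char ...[];`.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte distance between two addresses, computed on uintptr_t values so no
 * pointer arithmetic across (possibly) different objects takes place.
 * Relies on the common, implementation-defined pointer-to-integer mapping. */
static size_t blob_length(const void *start, const void *end) {
    return (size_t)((uintptr_t)end - (uintptr_t)start);
}
```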

In the section on search paths…

It follows the same implementation experience guidelines as #include by leaving the search paths implementation defined, with the understand that implementations are not monsters and will generally provide…

Really? We are going to "understand that implementations are not monsters" after what they've done with "undefined"? I think maybe these standards should be written from the perspective that implementations are sociopathic demons summoned against their will.

(But I really want this feature. I regularly use xxd and some Makefile rules to embed assets in my executables. For instance, in a web service I might have all my default configuration and templates in the executable with command line options to send them to standard output and override them from external files. Then on the chance someone needs to make a change they can just make their own file and use it.)
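A rough sketch of the default-plus-override pattern described above. `default_config` stands in for an array generated by `xxd -i` or a similar tool, and `load_config()` is a hypothetical helper that reads an external file when a path is given and otherwise returns a copy of the embedded default.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for an embedded asset (e.g. generated by xxd -i). */
static const char default_config[] = "listen = 8080\nworkers = 4\n";

/* Return a malloc'd, NUL-terminated copy of the configuration: the contents
 * of `override_path` if it is non-NULL, otherwise the embedded default.
 * Returns NULL on I/O or allocation failure. */
static char *load_config(const char *override_path) {
    if (override_path == NULL) {
        char *copy = malloc(sizeof default_config);
        if (copy != NULL)
            memcpy(copy, default_config, sizeof default_config);
        return copy;
    }
    FILE *f = fopen(override_path, "rb");
    if (f == NULL)
        return NULL;
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    while (buf != NULL) {
        len += fread(buf + len, 1, cap - len - 1, f);
        if (len < cap - 1)
            break;                          /* EOF or read error */
        char *bigger = realloc(buf, cap * 2);
        if (bigger == NULL) {
            free(buf);
            buf = NULL;
            break;
        }
        buf = bigger;
        cap *= 2;
    }
    if (buf != NULL) {
        if (ferror(f)) {
            free(buf);
            buf = NULL;
        } else {
            buf[len] = '\0';
        }
    }
    fclose(f);
    return buf;
}
```

A "print the default" option would then simply write `default_config` to standard output, giving users a starting point for their own override file.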

May I ask what's wrong with using xxd? I don't have an opinion about C, but C++ is already pretty complicated, and adding features to the language when there are already well-known solutions doesn't seem wise.

(For the record, I'm aware that there are size limitations when using xxd, but there are also other solutions).

Windows doesn't ship it, so now your build system got even more complicated on Windows.

That's true, but I put that in the bucket of "C++ needs a package manager so that we can use dependencies more complicated than a single header file".

I wouldn't do it this way, within the toolchain. Better to just append the data to the finished executable, with a header that contains an identifying marker. The program can scan its own image and look for that marker to get at the data. That will work on any OS with any executable and linker format and can be done post-production (users or downstream distributors can receive the binary and add customized data to it without requiring a dev environment with a toolchain, just tiny utility you can bundle with the program).

Scanning for the marker can be avoided, if we do the following:

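   /* inside the program, at file scope */

   #include <stdint.h>

   /* volatile: the utility patches offset in the file after compilation,
      so the compiler must not assume it is still 0 */
   volatile struct {
     char marker[16];  /* exactly 16 characters, no terminating NUL */
     uint32_t offset;
   } binary_stuff = { "dW5pcXVlbWFyawo=", 0 };
Then your tiny utility program opens the executable and looks for the sequence "dW5pcXVlbWFyawo=" inside it. Having found it, it puts the offset of the data into the 32-bit offset field which follows, and writes the data at that offset.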

When the program runs, if it sees a nonzero value in binary_stuff.offset, it can open its image and directly proceed to that offset to read the stuff or map it: no searching.
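The utility's half of that scheme might look like this sketch, working on the executable's bytes already loaded into memory. `patch_offset()` is a hypothetical name; it assumes the 32-bit field sits immediately after the 16-byte marker (true for the struct above on common ABIs, since a 16-byte `char` array leaves the following 32-bit member naturally aligned) and stores the value in the host's byte order, which is what the running program will read. Refusing a second match guards against the marker string also appearing elsewhere in the image.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MARKER "dW5pcXVlbWFyawo="
#define MARKER_LEN 16

/* Find the unique marker in `image` and store `offset` in the 32-bit field
 * that follows it. Returns 0 on success, or -1 if the marker is missing,
 * has no room for the field after it, or appears more than once. */
static int patch_offset(unsigned char *image, size_t size, uint32_t offset) {
    unsigned char *found = NULL;
    for (size_t i = 0; i + MARKER_LEN + sizeof offset <= size; i++) {
        if (memcmp(image + i, MARKER, MARKER_LEN) == 0) {
            if (found != NULL)
                return -1;              /* ambiguous: marker appears twice */
            found = image + i;
        }
    }
    if (found == NULL)
        return -1;
    memcpy(found + MARKER_LEN, &offset, sizeof offset);
    return 0;
}
```

After patching, the utility writes the image back out and places the data at the chosen offset.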

This isn't portable. Executable files aren't binary blobs, they're a structured format. If you append data to an executable there is no guarantee it's actually going to end up mapped in memory for you. You'd have to put it into an actual ELF segment, and at that point you're back to using the linker.

It's a lot more sensible to ask the linker to do this as in OP than to hack together something like you've described.

> If you append data to an executable there is no guarantee it's actually going to end up mapped in memory for you.

I didn't state it clearly enough, but I didn't say anything about it being mapped. It almost certainly isn't mapped. Loaders do not blindly map the whole thing to memory; then you would end up with debug info unconditionally mapped.

Even if it were mapped, the program wouldn't easily find it with the latter approach I described: the offset given in the structure is measured from the start of the file, not from some address in memory.

> ask the linker

It is not portable either. For instance, it doesn't work with Microsoft's linker which is called link.exe.

The advantage of the approach described in the article is that there's less work to do at startup time. For applications that run on end-users' machines, we should do whatever we can to minimize work done at startup, getting it as close as possible to just mapping a big blob into memory and jumping to the code that does the real work of the program.

> For applications that run on end-users' machines, we should do whatever we can to minimize work done at startup.

That is not an absolute given. If this is some tool that is launched a large number of times from frequently running shell scripts, or from a busy web server's CGI, I'd tend to agree.

The principle you are championing is not widely followed anyway. All sorts of applications that people use on their machines on a daily basis have atrocious startup times. Though "everyone's startup time sucks" is no excuse, it does remove the motivation to worry about single digit millisecond differences.

With my technique we don't actually have to do anything at all until the blob is required. If there are ways of executing the program that don't require the data, that initialization need not even take place. With lazy initialization we can move the cost out of the startup time.

Even if you work at minimizing startup time, there is a lot of activity in a modern application, like attaching shared libraries and opening various files. The contribution from this approach is like a rounding error in the least significant digit of startup time, compared to the default stuff that happens before your program even gains control.

You should check out how self-extracting archives work. All of those that I am aware of have a first part that is a regular executable, and the archive to be extracted is appended to that. File formats like zip have an advantage here as they keep their index at the end of the file after the compressed data. So there is no need to scan for the start of the archive.

IIRC this is how nwjs works too.

No need to scan the image if the last few bytes have a "footer" describing the length of the data that precedes it.
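A minimal sketch of reading such a footer, assuming a hypothetical layout of an 8-byte magic string followed by a 64-bit little-endian payload length, with the payload stored immediately before the footer. `find_payload()` and `FOOTER_MAGIC` are made up for illustration, and the function works on a buffer for simplicity; a real program could read only the last 16 bytes of its own file and then seek back by the length.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FOOTER_MAGIC "PAYLOAD1"   /* hypothetical 8-byte marker */
#define FOOTER_SIZE 16            /* 8-byte magic + 8-byte length */

/* Locate a payload laid out as [ program | payload | magic | length ].
 * On success sets *payload and *len and returns 0; otherwise returns -1. */
static int find_payload(const unsigned char *file, size_t size,
                        const unsigned char **payload, uint64_t *len) {
    if (size < FOOTER_SIZE)
        return -1;
    const unsigned char *footer = file + size - FOOTER_SIZE;
    if (memcmp(footer, FOOTER_MAGIC, 8) != 0)
        return -1;                        /* nothing appended */
    uint64_t n = 0;
    for (int i = 7; i >= 0; i--)          /* decode little-endian length */
        n = (n << 8) | footer[8 + i];
    if (n > size - FOOTER_SIZE)
        return -1;                        /* length runs past start of file */
    *payload = footer - (size_t)n;
    *len = n;
    return 0;
}
```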

I still think I like adding it as a section better, though.

The footer could work. If you round up the file to some reasonable power-of-two block size and put the footer right at the end of the last block, that would be highly portable; it would work on systems/devices that only write binary files in multiples of a block size (even if you write a short block at the end).

Sounds like a great vector for malware.

Any unsigned native executable is a vector for malware. Images with stuff appended to them can be signed.

For instance, think of signed Linux kernel images on secure embedded systems. The signature includes the embedded initramfs and device tree blob.

According to https://news.ycombinator.com/item?id=22865842, the #embed proposal is in the review process (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2499.pdf). A similar (the same?) proposal has also been presented to WG21 and is under review for inclusion in the C++ standard.

The author puts a lot of work towards making this proposal a reality (as far as I as a casual twitter/slack observer can see) and I'm looking forward to it.

Unfortunately, the proposal has been stalled by the C++ committee and the author is uninterested in continuing it. See their post here https://thephd.github.io/full-circle-embed

Is Circle something that is actually coming to C++?

You may want to add -z noexecstack or linking the object will give your program an executable stack.

e.g: ld -r -b binary -z noexecstack input.bin -o output.o

This is simply the include_bytes! macro in Rust. There's also the include_str! macro which does a compile-time check for UTF-8 validity.

I'm pretty sure this is trivial in D too although you have to tell the compiler the file exists (you can't do arbitrary FS reads at compile time)

In Rust: `let foo = include_bytes!("path/to/file")`. And people wonder why I think C is difficult.

C does not have the feature (yet), it is not about being difficult or not.

The fact that C doesn't have the feature makes it difficult if I want to perform that task!

Thing is, embedding binary files is not that useful, and there are easy enough ways to do it for those few that really needed it.

The C standard has been very cautious about adding features that would make the standard and the compilers more complex since it is the lingua franca of computing.

Nowadays there are really just a handful of architectures in use, and their compilers are all backends for GCC/LLVM, so they are relaxing the gating of features a bit.

Seems like a trivial feature to add though.

Like all the other features C has been missing for the last 30 years.

Found Chromium is using a shell script that calls od and sed. https://chromium.googlesource.com/chromiumos/platform/ec/+/m...

There's also Drew DeVault's Koio: https://git.sr.ht/~sircmpwn/koio

It works pretty well, though I don't believe it works on Windows.

I am in India and I cannot access the site. The DNS resolves to `` but the server doesn't seem to respond to pings either.

I can access the site through a VPN.

Anyone else facing the same issue? Anyone know what's up with that?

Same, from Canada. My guess is that it will come back up later.

The linked page is currently not available. Archive link: https://web.archive.org/web/20200416183057/https://flak.tedu...

There are also ways to do this on Windows and Mac OS X.

This library provides CMake support to do this: https://github.com/motis-project/res

Do you know how it compares with CMakeRC? See https://news.ycombinator.com/item?id=22888879

Ted has one of the best, and longest running tech blogs I've seen, and it covers such a wide array of topics. If you're reading Ted, thanks.

If you do this with a dynamically loaded library that you write to during execution, can C become a dynamic language?

Any turing language can be as dynamic as you want if you're motivated enough. After all, many interpreted languages are implemented in C...

Many dynamic recompiler/JIT implementations effectively do something like what you describe: they map some executable portion of memory and output native code that is then executed.

You've invented JIT.

Back in the day Borland shipped a tool called binobj which did exactly this.

Using GNU assembly .incbin is also quite flexible. And you can use .dc.* and .asciz and .byte to add arbitrary data inline.

Generating assembly files is my favorite way of including outside data into my C programs.

Heh, I used this trick when I went to Hackaday Belgrade 2 years ago and ported a NES emulator to the badge. The ROM being played was compiled in using the same basic method. Neat!

Damn I want to read this, but looks like it's down? 4/16/20 ~2 PM PST

Used to do this for embedding stuff into homebrew GBA roms! :)

Is it possible to automate this with cmake?

You may want to check out the CMRC library (CMake Resource Compiler) [0].

[0]: https://github.com/vector-of-bool/cmrc

Neat. Looks a little more complex than it needs to be though, and I'm surprised it targets C++ (and even relies on exceptions) with no support for C.



I used to have this automated as part of a makefile, so I'd say yes.

You can use add_custom_command() to run a script that generates an assembly file using the incbin directive.
