
Incorporating and accessing binary data into a C program - signa11
http://smackerelofopinion.blogspot.com/2015/12/incorporating-and-accesses-binary-data.html
======
rwmj
The linker trick has a problem. It doesn't set the non-executable stack bit,
so as a result your whole binary will have an executable stack, and therefore
be insecure against various stack smashing exploits.

Try running 'readelf -S blob.o' and you won't see any '.note.GNU-stack' in the
output.

If you go via a C source file compiled with gcc (or clang), then the compiler
sets the bit properly.

Edit: Also I tried to add a comment to the original posting to warn them, but
blogger just eats my comment.

~~~
cnvogel
If you are right, then this can be mitigated by two additional options passed
to objcopy, namely --add-section to create an empty section and
--set-section-flags to adjust the flags of this empty section. I'm just
recreating the section/flags that I see in a normally compiled file.

    
    
        $ make
        cc -Wall -Wextra -ggdb -Os   -c -o testme.o testme.c
        objcopy -I binary -O elf64-x86-64 -B i386:x86-64 \
        	--rename-section .data=.rodata,alloc,load,readonly,data,contents \
        	--add-section ".note.GNU-stack"=/dev/null \
        	--set-section-flags ".note.GNU-stack"=contents,readonly \
        	/etc/passwd passwd.o || (rm -f passwd.o ; exit 1)
        cc   testme.o passwd.o   -o testme
    

([https://github.com/vogelchr/objcopy_to_carray](https://github.com/vogelchr/objcopy_to_carray))

Without --add.../--set-section:

    
    
        $ readelf -lW testme
        ...
        GNU_STACK (...) RWE 0x10
    

With --add.../--set-section:

    
    
        $ readelf -lW testme
        ...
        GNU_STACK (...) RW  0x10

~~~
rwmj
Looks good. The reason I know a bit about this is I had the same "bin2o" hacky
script that used objcopy. It broke every time someone found a new architecture
or platform (i.e. having to choose the correct -O and -B flags is non-trivial
if you want to support every architecture).

The solution (which is not really much better than yours) is a script that
creates some _assembler_ and assembles it:

[https://github.com/libguestfs/supermin/blob/master/src/bin2s...](https://github.com/libguestfs/supermin/blob/master/src/bin2s.pl)

I think it's bad that such a simple thing is so hard to do.

 _Edit:_ In fact looking at the script now I can see we really should be using
.rodata instead of .data. That bug has survived for at least 4 years. Patch
posted:
[https://www.redhat.com/archives/libguestfs/2015-December/msg...](https://www.redhat.com/archives/libguestfs/2015-December/msg00095.html)
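
A minimal sketch of the generate-assembler idea described above, written in C rather than Perl. It is not the supermin script and does not reproduce its output; the function name `bin2s` and the `<name>_start` / `<name>_end` symbol naming are assumptions made for illustration. It emits GNU assembler text that places the bytes in `.rodata` and adds an empty `.note.GNU-stack` section so the resulting object does not request an executable stack.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write GNU assembler source that embeds `len` bytes of `data` between the
 * symbols <name>_start and <name>_end (hypothetical naming scheme).
 * Returns the number of characters written, excluding the terminating NUL,
 * or -1 if `out` is too small. */
int bin2s(const unsigned char *data, size_t len, const char *name,
          char *out, size_t outsz)
{
    size_t used = 0;
    int n;

    assert(name != NULL && out != NULL);
    assert(data != NULL || len == 0);

/* Append formatted text to `out`, bailing out on truncation. */
#define EMIT(...)                                               \
    do {                                                        \
        n = snprintf(out + used, outsz - used, __VA_ARGS__);    \
        if (n < 0 || (size_t)n >= outsz - used) return -1;      \
        used += (size_t)n;                                      \
    } while (0)

    /* Empty note section: marks the object as not needing an executable stack. */
    EMIT("\t.section .note.GNU-stack,\"\",%%progbits\n");
    /* Read-only data, rather than .data. */
    EMIT("\t.section .rodata\n");
    EMIT("\t.globl %s_start\n%s_start:\n", name, name);
    for (size_t i = 0; i < len; i++)
        EMIT("\t.byte 0x%02x\n", data[i]);
    EMIT("\t.globl %s_end\n%s_end:\n", name, name);
#undef EMIT

    return (int)used;
}
```

The generated text would then be fed to the platform's assembler (for example through `cc -c`), which picks the right object format and architecture on its own. That is the point of going through assembler instead of hand-picking objcopy's -O and -B flags.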

------
cnvogel
Larger amounts of binary data linked into your program should most likely be
"read-only data", and hence be put in the ".rodata" section. From the
objcopy manpage:

    
    
        objcopy -I binary -O <output_format> -B <architecture> \
            --rename-section .data=.rodata,alloc,load,readonly,data,contents \
            <input_binary_file> <output_object_file>
    

I had made a short demo of this technique quite a few years ago, but
regrettably hadn't included the .rodata thing:
[https://github.com/vogelchr/objcopy_to_carray](https://github.com/vogelchr/objcopy_to_carray),
so if you want to check this out with minimal effort, have fun with it ;-).

    
    
        [optiplex /home/chris/objcopy_to_carray (master)]
        $ make
        cc -Wall -Wextra -ggdb -Os   -c -o testme.o testme.c
        objcopy -I binary -O elf64-x86-64 -B i386:x86-64 \
        	--rename-section .data=.rodata,alloc,load,readonly,data,contents \
        	/etc/passwd passwd.o || (rm -f passwd.o ; exit 1)
        cc   testme.o passwd.o   -o testme
        [optiplex /home/chris/objcopy_to_carray (master)]
        $ ./testme 
        Dumping /etc/passwd, in memory @0x4006a4, size is 1701.
        root:x:0:0:root:/root:/bin/bash
        bin:x:1:1:bin:/bin:/bin/false
        daemon:x:2:2:daemon:/sbin:/bin/false
        mail:x:8:12:mail:/var/spool/mail:/bin/false

~~~
twoodfin
I'll add that by segregating read-only and writeable data, you allow the
virtual memory system to efficiently share a single copy of the read-only data
among all processes loading it.

~~~
cnvogel
You are perfectly right. And there's also a second use case: in some embedded
applications things run directly from ROM/flash, and things put in the .rodata
section will stay there and not get copied to RAM (which you obviously have to
do for modifiable data).

------
DSMan195276
This isn't extremely important, but you should probably declare those symbols
as plain `char`s, not `void *`. The symbols themselves aren't pointers: each
symbol refers to the first byte in your binary block, the same way `b` from
`char b` refers to a byte on the stack. That's why you have to take the address
of the symbol to get the address of the block. It makes more sense to declare
it as a `char` because then nobody will attempt to use the original block
symbols as if they were pointers to the blob.

As it is, you have `_binary_blob_start`, `start`, `_binary_blob_end` and
`end`. All four are declared as `void *`, but `start` and `end` are the only
ones which actually point to the block! Read as `void *`, `_binary_blob_start`
and `_binary_blob_end` are "pointers" made up of 4/8 bytes of the raw data at
those locations, and thus aren't actually pointers.
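
A minimal sketch of the declaration style suggested above, using `const char` arrays (a small variation on plain `char` that avoids the explicit `&`). So that the example links without the objcopy/ld step, a top-level asm block stands in for the generated object file; that stand-in assumes GCC or Clang targeting ELF (for example Linux), and the 11-byte payload is made up. The helper names `blob_data` and `blob_size` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: stand-in for the blob.o that objcopy or ld would produce.
 * It defines the two boundary symbols around 11 bytes of read-only data.
 * In a real build, drop this block and link the generated object instead. */
__asm__(
    ".pushsection .rodata\n"
    ".globl _binary_blob_start\n"
    "_binary_blob_start:\n"
    ".ascii \"hello, blob\"\n"
    ".globl _binary_blob_end\n"
    "_binary_blob_end:\n"
    ".popsection\n"
);

/* The symbols name bytes, not pointers, so declare them as char arrays.
 * Using an array name yields the symbol's address, which is what we want. */
extern const char _binary_blob_start[];
extern const char _binary_blob_end[];

/* Address of the first byte of the embedded data. */
const char *blob_data(void)
{
    return _binary_blob_start;
}

/* Size in bytes: distance between the two boundary symbols. */
size_t blob_size(void)
{
    uintptr_t start = (uintptr_t)_binary_blob_start;
    uintptr_t end = (uintptr_t)_binary_blob_end;

    assert(end >= start);
    return (size_t)(end - start);
}
```

With these declarations there is no way to accidentally dereference the blob's first bytes as an address, which is the mistake the `void *` declarations invite.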

------
comex
On OS X: pass -sectcreate when linking (see man ld) or use segedit.

If you're building native executables rather than cross-compiling for some
embedded target, xxd has the advantage of not needing different commands on
different platforms, although xxd itself may not be commonly found on Windows
- oddly enough, it's part of vim...

That said, if it's not an embedded target, you should consider why you're
embedding data into executables in the first place.

------
jclulow
In illumos distributions, we have elfwrap for this:

[http://illumos.org/man/1/elfwrap](http://illumos.org/man/1/elfwrap)

------
wyldfire
> ... alternative way is to use the linker ld as follows:

Good to know. I've always used objcopy to do the same. But this looks saner
because it uses the native arch by default.

------
ksherlock
Back in the days of BeOS and gcc 2.95, I ran out of memory (32M!) trying to
compile xxd-type data. x86 BeOS used ELF and PPC BeOS used PE (and the
MetroWerks toolchain) so using the linker trick probably wouldn't have been an
option (had I known about it at the time!).

------
sigjuice
The problem is that such tricks are not portable. This will not work with ld
on OS X or will need something different. Others have suggested xxd.

    
    
      $(fw_bin_o): $(fw_bin)
            @echo "CONVERT $@"
            $(SILENT)xxd -i $^ | $(CC) -c -x c - -o $@
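
For readers who haven't seen it, this is roughly the C that `xxd -i` hands to the compiler in a rule like the one above: an `unsigned char` array plus an `unsigned int` length, both named after the input file. The file name `fw.bin`, its five bytes, and the `fw_checksum` helper are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Shape of the output of `xxd -i fw.bin` for a hypothetical 5-byte file
 * containing "FW01\n". The identifiers are derived from the file name. */
unsigned char fw_bin[] = {
  0x46, 0x57, 0x30, 0x31, 0x0a
};
unsigned int fw_bin_len = 5;

/* Example consumer (hypothetical): sum of all bytes in the embedded blob. */
unsigned int fw_checksum(void)
{
    unsigned int sum = 0;

    /* The generated length should always match the array size. */
    assert(fw_bin_len == sizeof fw_bin);
    for (unsigned int i = 0; i < fw_bin_len; i++)
        sum += fw_bin[i];
    return sum;
}
```

Because this is ordinary C, it compiles the same way on every platform. The cost is compile time and memory for large inputs.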

------
jheriko
if it's so big that it's unwieldy as a byte array in a source file, does it
really need embedding in the executable?

what about the case when you want n blobs of data instead of one?

good to know, but edge case useful...

~~~
kentonv
> does it really need embedding in the executable?

Yes, because self-contained executables are massively easier to deploy than
ones that have data dependencies, because it's just one file that you can put
anywhere. Plus you avoid having to write any file I/O code (and deal with
potential errors).

Honestly I never understood why this technique is so obscure, rather than
being standard practice for C/C++ devs.

> what about the case when you want n blobs of data instead of one?

Each one turns into a .o file, then you link them all together. There's
nothing limiting you to just one.
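
One way the program side might look with several blobs is a small lookup table keyed by name. This is only a sketch: the two arrays below are stand-ins defined inline so the example is self-contained, whereas in a real build each would come from its own object file. The names `logo_png`, `help_txt`, `struct blob` and `find_blob` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for two embedded blobs (made-up contents). */
static const unsigned char logo_png[] = { 0x89, 'P', 'N', 'G' };
static const unsigned char help_txt[] = { 'h', 'e', 'l', 'p', '\n' };

/* One table entry per embedded file. */
struct blob {
    const char *name;
    const unsigned char *data;
    size_t len;
};

static const struct blob blobs[] = {
    { "logo.png", logo_png, sizeof logo_png },
    { "help.txt", help_txt, sizeof help_txt },
};

/* Return the entry with the given name, or NULL if there is none. */
const struct blob *find_blob(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof blobs / sizeof blobs[0]; i++)
        if (strcmp(blobs[i].name, name) == 0)
            return &blobs[i];
    return NULL;
}
```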

~~~
spoiler
Well, the only drawback that comes to mind is that you can't mutate the state
of that blob (well, you _can_ , but you _really really_ shouldn't). Also if
it's obscenely large, it might be better to keep it on disk and load only as
much as you need/can.

~~~
icebraining
_Also if it's obscenely large, it might be better to keep it on disk and load
only as much as you need/can._

That's what this technique does, since the kernel doesn't fully load the
executable file to memory, it'll mmap it but only load data from disk as it's
requested.

~~~
jheriko
that's only true in some cases. in general it is not true.

~~~
kentonv
When is it not true?

------
guytv
If anyone is curious, doing this in Java is only possible if the resulting
Java file is no bigger than 65kb and does not include more than 65k literals /
constants.

~~~
jclulow
If you're shipping Java software, you probably already deliver more than one
class file, possibly within a JAR. I think it makes more sense to include the
binary data as a regular file and load it as a resource through the class
loader.

------
jgh
Interesting trick, I've always done it the xxd way. I'll have to keep this in
mind next time I plan on putting a blob in C/C++ code!

------
kbart
Cool, that looks like a very useful C trick. It had never struck me to use a
linker to include binary data. Gotta try it someday.

------
ape4
We need a cross platform resource compiler.

