
How C array sizes become part of the binary interface of a library - Sindisil
https://developers.redhat.com/blog/2019/05/06/how-c-array-sizes-become-part-of-the-binary-interface-of-a-library/
======
simias
It's interesting, I've been coding in C for 15 years and I wasn't aware of
this caveat. I'm surprised I never encountered this issue in the wild but then
again I can't really imagine a situation where I'd share a raw extern array
across ABI boundaries without any abstractions on top of it.

Still, it's noteworthy that the dynamic linker will attempt to fully copy the
array in such a situation; that seems rather heavy-handed even if I understand
the justification. Maybe now that LTO is more prevalent the linker could
figure out that the symbol is not part of the executable and insert the
indirection through the GOT instead?

~~~
foldr
>Maybe now that LTO is more prevalent the linker could figure out that the
symbol is not part of the executable and insert the indirection through the
GOT instead?

I'm pretty sure this is already possible. I think the copy relocations are a
kind of optimization. If a shared lib contains mutable variables then at least
some of the library has to be copied every time a process modifies that
variable. Copy relocations help to avoid this.

------
avar
There are obviously many ways to avoid this situation, but it seems remiss not
to mention in the "How to avoid this situation" section that you can usually
malloc() such an array.

Obviously that's not a drop-in replacement in all cases, but once the article
starts talking about API changes the obvious thing is to have a struct with
(among other things) a pointer to such an array, have a my_init() function
that returns a pointer to the struct which'll include a pointer to a malloc'd
array, and a corresponding custom my_free() function to free the struct and
its array(s).

This is the common way I've seen, e.g., Pascal strings implemented in C: a
struct with size_t alloc_pool, size_t strlen, and char *str.
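
A minimal sketch of the init/free shape described here. Only my_init() and my_free() come from the comment; the struct name my_table and its fields are hypothetical, chosen for illustration:

```c
#include <stdlib.h>

/* Hypothetical library type: the array lives on the heap, so its size
   never appears in an exported data symbol. */
struct my_table {
    size_t len;   /* number of elements in data */
    int *data;    /* heap-allocated array owned by the struct */
};

/* Allocate the struct and a zero-initialized array; returns NULL on failure. */
struct my_table *my_init(size_t len)
{
    struct my_table *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->data = calloc(len, sizeof *t->data);
    if (t->data == NULL && len != 0) {
        free(t);
        return NULL;
    }
    t->len = len;
    return t;
}

/* Free the array, then the struct; accepts NULL like free() does. */
void my_free(struct my_table *t)
{
    if (t == NULL)
        return;
    free(t->data);
    free(t);
}
```

Callers only ever hold a pointer, so the library can change the array length between versions without the executable having recorded the old size.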

~~~
saagarjha
You don't even need malloc: just expose the address of the first element of
the array (i.e. "erase" the size) instead of presenting the whole backing
array, size and all. The issue here, of course, is that you can no longer use
sizeof.

~~~
klodolph
How would you do that?

The problem is that:

    extern int my_array[];

appears to expose only the address of the first element and not the size, but
in fact the size is exposed through the linker.

~~~
foldr
The idea is that the library would expose a pointer to the first element of
the array and the size of the array.
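
Roughly, on the library side, it could look like the sketch below. The names my_array_storage and my_array_len are hypothetical; the point is that the exported objects are a pointer and a size_t, whose own sizes do not change when the array grows:

```c
#include <stddef.h>

/* Backing storage is static, so no sized array symbol is exported. */
static int my_array_storage[] = { 1, 2, 3, 4 };

/* Exported instead: a pointer to the first element and the element count.
   Each of these has a fixed size no matter how long the array becomes. */
int *const my_array = my_array_storage;
const size_t my_array_len =
    sizeof my_array_storage / sizeof my_array_storage[0];
```

Client code then iterates with my_array_len rather than sizeof, which is the trade-off mentioned upthread.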

~~~
klodolph
That's proposed by the article, under the section "how to avoid this
situation".

~~~
foldr
Yes, I was answering your question "How would you do that?".

------
adontz
Looks like PE is superior in this case, since initialization of dynamic
modules is handled in the dynamic modules themselves? Or am I missing anything?

[https://docs.microsoft.com/en-us/windows/desktop/Dlls/dynami...](https://docs.microsoft.com/en-us/windows/desktop/Dlls/dynamic-link-library-entry-point-function)

~~~
userbinator
Dynamic linking on Unixes (POSIX-ish, BSDs, Linux, you know what I mean...)
has always been a pretty horrible mess (speaking as someone who has mostly
worked in the Windows side of things, starting with DOS and Win16), but
DllMain() is not really the way in which it's better on Windows.

On Windows, symbols (really just addresses) that you want accessible from
other modules need to be explicitly declared "dllexport" in one, and imported
(using the "dllimport" attribute) in others. The loader is part of the OS and
resolves the imports at load time (although you can do it yourself if you
really need/want to, as evidenced by things like packers and plugin modules.)
On the nixes, it seems dynamic linking was more of an afterthought (the
dynamic loader being its own executable _whose path is hardcoded in the
binary_ is one artifact of this) and they tried to reuse what they already had
for static linking but at runtime. I suppose it's more "elegant" in some ways,
but on the other hand having all the complexity of static linking reproduced
at runtime just seems overly convoluted to me.

~~~
saagarjha
> DllMain() is not really the way in which it's better on Windows.

To expand on this, ELF shared libraries have this too: any function pointers
in the .init_array section will get executed when the library is loaded. You
can get one of these in your library by marking a C function with
__attribute__((constructor)) and compiling with a compiler that understands this
attribute.
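
A small sketch of that, assuming GCC or Clang on an ELF target. The names lib_init, lib_initialized and lib_is_initialized are hypothetical:

```c
/* Hypothetical flag so the effect of the constructor is observable. */
static int lib_initialized = 0;

/* With GCC or Clang on typical ELF targets, this function runs when the
   containing object is loaded, before main() in the case of an executable. */
__attribute__((constructor))
static void lib_init(void)
{
    lib_initialized = 1;
}

/* Accessor so callers do not touch the flag directly. */
int lib_is_initialized(void)
{
    return lib_initialized;
}
```

The same attribute works in an executable, which makes it easy to try without building a shared library first.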

~~~
cryptonector
The .init and .fini sections are run at object load and unload times,
respectively, yes, but DllMain() also gets called for each thread at thread
startup and teardown time. On Unix you can catch thread teardown with pthread
keys, but you can't catch thread startup.
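
A sketch of the teardown half using a pthread key destructor. All names here (teardown_key, on_thread_exit, run_one_worker and so on) are hypothetical, and note that the thread has to opt in by storing a non-NULL value for the key:

```c
#include <pthread.h>
#include <stddef.h>

static pthread_key_t teardown_key;
static pthread_once_t teardown_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t teardown_lock = PTHREAD_MUTEX_INITIALIZER;
static int teardown_count = 0;   /* hypothetical counter, for illustration */

/* Runs in the exiting thread if its value for the key is non-NULL. */
static void on_thread_exit(void *value)
{
    (void)value;
    pthread_mutex_lock(&teardown_lock);
    teardown_count++;
    pthread_mutex_unlock(&teardown_lock);
}

static void make_key(void)
{
    pthread_key_create(&teardown_key, on_thread_exit);
}

/* The thread registers itself; nothing fires automatically at startup. */
static void *worker(void *arg)
{
    pthread_once(&teardown_once, make_key);
    pthread_setspecific(teardown_key, arg);
    return NULL;
}

/* Start one worker, wait for it, and return the teardown count so far. */
int run_one_worker(void)
{
    static int token = 1;   /* any non-NULL value will do */
    pthread_t tid;
    int n;

    if (pthread_create(&tid, NULL, worker, &token) != 0)
        return -1;
    pthread_join(tid, NULL);

    pthread_mutex_lock(&teardown_lock);
    n = teardown_count;
    pthread_mutex_unlock(&teardown_lock);
    return n;
}
```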

Just an FYI.

------
FrozenVoid
Such unsized arrays in structs are standard in C99.
[https://en.wikipedia.org/wiki/Flexible_array_member](https://en.wikipedia.org/wiki/Flexible_array_member)
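
For reference, a minimal sketch of a flexible array member. The struct int_vec and the helper int_vec_new are hypothetical names:

```c
#include <stdlib.h>

struct int_vec {
    size_t len;
    int items[];   /* flexible array member: must be the last member */
};

/* Allocate the header plus room for len elements in one block.
   Returns NULL on allocation failure; release with free(). */
struct int_vec *int_vec_new(size_t len)
{
    struct int_vec *v = malloc(sizeof *v + len * sizeof v->items[0]);
    if (v == NULL)
        return NULL;
    v->len = len;
    for (size_t i = 0; i < len; i++)
        v->items[i] = 0;
    return v;
}
```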

~~~
kazinator
You know, a shared library system with special declaration specifiers to
indicate shared symbols (think: Microsoft's __declspec(dllimport)) would be
entirely acceptable. I could live with it. Sure, it's less convenient if
you're compiling the same code both ways or converting.

That would address situations like this. Arrays that live in one object only
wouldn't be exported and could be accessed using the faster instruction
sequence within that object. Arrays exported for sharing would be referred
through the global offset table using the slower machine sequence, without
having to be copied.

------
95014_refugee
Just one more reason why commons are evil and have no place in a well-formed
program.

~~~
saagarjha
If you're talking about global variables, then they're a necessary evil in
most programs.

~~~
nitrogen
For one thing you can't pass parameters into signal handlers, so the only way
to set a flag for your main app is a global variable.
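
A sketch of that pattern in standard C. The names got_signal, on_signal and demo_signal_flag are hypothetical, and SIGINT is just an arbitrary choice for the demonstration:

```c
#include <signal.h>

/* volatile sig_atomic_t is a type the C standard permits a handler to write. */
static volatile sig_atomic_t got_signal = 0;

/* The handler receives only the signal number, so it reports via the global. */
static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Install the handler, raise SIGINT against ourselves, and return the flag
   (or -1 if the handler could not be installed). */
int demo_signal_flag(void)
{
    got_signal = 0;
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    raise(SIGINT);             /* the handler runs before raise() returns */
    signal(SIGINT, SIG_DFL);   /* restore the default disposition */
    return got_signal;
}
```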

------
winter_blue
This is _part_ of the reason why the Linux kernel doesn’t have a stable API.

Quote from
[https://github.com/torvalds/linux/blob/master/Documentation/...](https://github.com/torvalds/linux/blob/master/Documentation/process/stable-api-nonsense.rst):

> Depending on the version of the C compiler you use, different kernel data
> structures will contain different alignment of structures, and possibly
> include different functions in different ways (putting functions inline or
> not.) The individual function organization isn't that important, but the
> different data structure padding is very important.

~~~
saagarjha
I don't think this is quite related: this article is about array sizes being
part of the ELF ABI, while what you've linked is a discussion of the Linux
kernel not having a stable _API_.

