
Sorry state of dynamic libraries on Linux - d0mine
http://www.macieira.org/blog/2012/01/sorry-state-of-dynamic-libraries-on-linux
======
dkarl
_All this for the possibility of interposition? Yes, it seems so. The impact
is there for this little-known and little-used feature. Instead of optimising
for the common-case scenario where the symbols are not overridden, the ABI
optimises for the corner case._

Using LD_PRELOAD isn't so rare and strange that you can propose throwing it
out without quantifying its performance cost. I can't recall offhand why I
needed it (I would guess Valgrind or Massif) but I've used it several times as
a developer. What exactly is the payoff for giving it up? It can't be bigger
than the current performance difference between statically and dynamically
linked executables, can it?
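
(For reference, the whole mechanism fits in a dozen lines; a minimal sketch of interposing `puts`, with made-up file names:)

    // interpose.cpp -- build: g++ -shared -fPIC interpose.cpp -o interpose.so -ldl
    // Run a program with every puts() call diverted through the wrapper:
    //   LD_PRELOAD=./interpose.so ./some_program
    #include <dlfcn.h>   // RTLD_NEXT needs _GNU_SOURCE; g++ defines it by default
    #include <cstdio>

    extern "C" int puts(const char *s) {
        // RTLD_NEXT finds the "real" puts in the next object in search order
        // (libc) -- that search order is exactly the interposition in question.
        using puts_fn = int (*)(const char *);
        static puts_fn real_puts = (puts_fn)dlsym(RTLD_NEXT, "puts");
        std::fprintf(stderr, "[interposed] ");
        return real_puts(s);
    }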

~~~
gioele
> All this for the possibility of interposition? Yes, it seems so. The impact
> is there for this little-known and little-used feature. Instead of
> optimising for the common-case scenario where the symbols are not
> overridden, the ABI optimises for the corner case.

It also looks like PIE+PIC is required if you want a secure system with ASLR:
<http://blog.flameeyes.eu/2009/11/02/the-pie-is-not-exactly-a-lie>

Flameeyes (a Gentoo dev) has sent patches to all the main library developers to
make sure only the necessary symbols are exposed and as much data as possible
is marked read-only. I think this effort is more valuable than proposing a
very unlikely ABI change.
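
(For the curious, the usual mechanism for that kind of patch is
`-fvisibility=hidden` plus explicit export markings; a minimal sketch with
made-up names, not taken from any of those patches:)

    // mylib.cpp -- build: g++ -shared -fPIC -fvisibility=hidden mylib.cpp -o libmylib.so
    // With -fvisibility=hidden, nothing lands in the dynamic symbol table
    // unless explicitly marked; check the result with: objdump -T libmylib.so
    #define MYLIB_API __attribute__((visibility("default")))

    int internal_detail(int x) {        // extern, but hidden by the flag
        return x + 1;
    }

    MYLIB_API int mylib_entry(int x) {  // the one exported symbol
        return internal_detail(x) * 2;
    }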

~~~
klodolph
Of course, a bunch of this work is sabotaged by the stupid default `LDFLAGS`
you get if you use `pkg-config` and some package (like `gmodule`) throws
random stuff in there like `-Wl,--export-dynamic`, which is totally
unnecessary for at least 99% of executables, since they only ever used
`gmodule` indirectly in the first place...

------
brigade
I started writing a comment before the site went down (hah, I knew it would be
WordPress), and now I'll probably forget to post it there, so I'll just post
it here.

Is the address of externalFunction that is stored in the GOT really resolved
to the address of the stub, and not the final, real address of the function?
It has to be the final address for data (or code simply wouldn't work), and I
don't see why function symbols would be any different.

Also, C and C++ say that a function has the same address in all translation
units, so address comparisons ought to match regardless of PIC, shared
libraries, or symbol overriding; if they don't, it's probably a bug in the
compiler/linker (or you're using a nonstandard option that breaks this
guarantee, e.g. symbol hiding). And that should actually require the address
in the GOT to be the final address of the function.
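
(Easy enough to check; a sketch with hypothetical names, one shared library
plus one executable:)

    // lib.cpp -- build: g++ -shared -fPIC lib.cpp -o libdemo.so
    extern "C" void externalFunction() {}
    extern "C" void *address_in_lib() {
        return (void *)&externalFunction;    // taken via the library's GOT
    }

    // main.cpp -- build: g++ main.cpp ./libdemo.so; run from the same directory
    #include <cstdio>
    extern "C" void externalFunction();
    extern "C" void *address_in_lib();

    int main() {
        // On Linux the executable's &externalFunction is the address of its
        // PLT stub; the dynamic linker resolves the library's GOT entry to
        // that same stub, so the guarantee holds and this prints "same".
        std::printf("%s\n",
                    (void *)&externalFunction == address_in_lib()
                        ? "same" : "different");
    }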

By the way, any CPU with branch destination prediction will predict a double
indirect call just as well as a single indirect call (assuming, of course, the
addresses don't change). The cost is taking up two entries in the branch
prediction tables, another L1I cacheline for the stub, and a hiccup in
instruction decoding that may or may not have a real effect depending on the
code before and after.

> If there’s a reason for getting the address indirectly like this, I have yet
> to find it.

It should be because of PIC, and the fact that PC-relative addressing on
x86_64 only has a ±2GB displacement: if your final binary is over 2GB, the
linker might be unable to put the symbol within range of the offset and would
fail. Whereas for calls the linker can just insert a stub when this happens
and no one's the wiser. Disabling PIC results in "movq $externalFunction,
externalVariable(%rip)" for me.

But -mcmodel=small is the default, which should contradict this explanation...
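
(For what it's worth, the difference shows up directly in the compiler output;
roughly this, on x86-64 with GCC, though the exact spelling varies by version:)

    // t.cpp -- compile with: g++ -S -O2 [-fPIC] t.cpp
    extern "C" void externalFunction();
    void *take_address() { return (void *)&externalFunction; }

    // With -fPIC, the address is loaded from the GOT:
    //     movq externalFunction@GOTPCREL(%rip), %rax
    // Without PIC (small code model), it's a 32-bit absolute immediate:
    //     movl $externalFunction, %eax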

EDIT: so I just tried a test and it appears the GOT on Linux really does
contain the address of the stub. _what the fuck_

On OS X it contains the real address.

~~~
marshray
_C and C++ say that a function has the same address in all translation units_

Until perhaps very recently, the ISO C and C++ standards didn't actually
support dynamic linking. (They didn't officially support multithreaded
concurrency either, but flexibility is one of those languages' strengths.)

 _if your final binary is over 2GB the linker could fail to put the symbol
within range of the offset and fail_

I have had to code around this limitation too, but it didn't turn out to be
that hard in practice.

How about we optimize for the case where the final binary is 2GB or smaller?
:-)

~~~
brigade
> Until perhaps very recently, the ISO C and C++ standards didn't actually
> support dynamic linking. (They didn't officially support multithread
> concurrency either but flexibility is one of those languages' strengths).

How so? They certainly didn't mention it, but they shouldn't have to:
describing the behaviour of the final linked program is enough, and it means
that whether it was statically or dynamically linked doesn't matter as long as
the run-time behaviour is the same. Which resulted in a huge mess in the
linker for C++.

> How about we optimize for the case where the final binary is 2GB or smaller?
> :-)

I agree, but compilers should be standards-compliant by default and any such
optimizations should be under non-default flags
(e.g. -fvisibility-inlines-hidden). But -mcmodel=small is already the
default...

~~~
marshray
I'd seen references to possible standards issues with dynamic linking, but
hadn't really thought about it until you pointed it out: taking the address of
a function or data object with linkage no longer naturally returns the same
address in different translation units.

Believe it or not, in C++ a pointer to an object or function is valid as a
non-type template parameter. Heck, I bet you can even partially specialize on
it.
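
It does work, partial specialization included; a quick sketch:

    #include <cstdio>

    void f() { std::puts("f"); }
    void g() { std::puts("g"); }

    // A function pointer as a non-type template parameter:
    template <void (*Fn)()>
    void call_twice() { Fn(); Fn(); }

    // ...and a partial specialization pinned to one particular pointer value:
    template <void (*Fn)(), typename T>
    struct Tag { static const bool is_f = false; };

    template <typename T>
    struct Tag<&f, T> { static const bool is_f = true; };

    int main() {
        call_twice<&f>();                                  // prints "f" twice
        static_assert( Tag<&f, int>::is_f, "matches the specialization");
        static_assert(!Tag<&g, int>::is_f, "falls back to the primary");
    }

Which is also why the one-address-per-function guarantee matters here: if &f
could differ between translation units, Tag<&f, int> could quietly name
different specializations in different objects.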

------
AceJohnny2
Website down; looks like reddit pummelled his server before HN did.

Google cache:
<http://webcache.googleusercontent.com/search?q=cache:http://www.macieira.org/blog/2012/01/sorry-state-of-dynamic-libraries-on-linux/&strip=1>

G+ discussion with the author:
<https://plus.google.com/108138837678270193032/posts/No8T7VLoF33>

------
psykotic
While we're complaining about the sorry state of Linux, something it's
desperately lacking (for those of us who ship binary-only libraries and
executables) is an equivalent of Windows's PDBs or OS X's dSYMs for post-
mortem debugging without bloating the shipping binaries.

~~~
ice799
Hi. Linux does have this.

You just strip the debug symbols out (and put them somewhere safe), then add a
.gnu_debuglink section to the stripped ELF binary that names the debug file
and carries a CRC of it.

Once something bad happens, you take the core dump plus the symbols you have
tucked away, and you can debug just fine.
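
(Concretely, it's the usual objcopy dance; file names here are placeholders:)

    # Split the debug info into its own file and strip the shipping binary:
    objcopy --only-keep-debug app app.debug
    objcopy --strip-debug app
    # Record app.debug's name and CRC in a .gnu_debuglink section:
    objcopy --add-gnu-debuglink=app.debug app

    # Post-mortem: gdb follows .gnu_debuglink to the symbols automatically
    # (see also gdb's "set debug-file-directory"):
    gdb app core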

~~~
psykotic
Thanks to you and others for jumping in and pointing this out! There's nothing
better than being corrected when it means learning something new and solving a
long-standing problem.

------
wavetossed
For an interesting use of RPATH entries to make portable Linux binaries (and
shared libraries), have a look at this script that I used to build Python
2.7.2 and a whole pile of 3rd-party libraries:
<https://github.com/wavetossed/pybuild>

I think that things like RPATH and LD_PRELOAD are exactly what shared
libraries should be doing. The reason for shared libraries in the modern age
is increased flexibility.
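
(A common trick for this kind of portability, though not necessarily exactly
what the script does, is an $ORIGIN-relative RPATH, so the binary finds its
bundled libraries relative to its own location; paths here are made up:)

    # Embed an RPATH relative to wherever the executable ends up installed:
    g++ main.cpp -o dist/bin/app -Ldist/lib -lfoo '-Wl,-rpath,$ORIGIN/../lib'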

------
bch
I _think_ it was here (on HN) that I read a comment suggesting that shared
libs in Unix originated at Sun Microsystems, due to some politics in their
work w/ the X Window System (iirc). I've searched for that story since, but
not found it -- I'm guessing somebody reading this may know what I'm talking
about. Retell the story?

