
Sorry state of dynamic libraries on Linux - nkurz
http://www.macieira.org/blog/2012/01/sorry-state-of-dynamic-libraries-on-linux/
======
quotemstr
For a long time, I thought that the ELF shared library model was the obviously
correct thing, but over the past few years, I've come to realize that Windows
got shared libraries right after all.

On Windows, all inter-module dependencies are (module, symbol) pairs (written
"module!symbol"), not just bare symbol names as on ELF systems. That is,
modA!fun1 calling modB!fun2 can coexist in the same process with modC!fun1
calling modD!fun2. The Windows model doesn't permit global interposition, but
that's a good thing: the lack of interposition support permits the
optimizations Macieira mentions, but "hooking" is still possible through a
variety of mechanisms, which usually involve either overwriting import and
export tables or the replacement of function preambles with jumps to
trampolines.

Regardless of the performance considerations, I think the Windows DLL approach
is the more robust and conceptually lighter one. That the Windows approach is
also faster is simply a beneficial side effect.

~~~
FooBarWidget
While ELF has its problems, I wouldn't call Windows's approach 'right', not by
a long shot.

* Writing libraries is a big hassle in Windows because you have to explicitly define all exportable symbols.

* DLLs have all kinds of strange boundary rules. C++ exceptions cannot pass DLL boundaries. Heap memory allocated by one DLL cannot always be freed by another. Etc etc.
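
For what it's worth, the explicit-export hassle looks roughly like this (a sketch; the `API` macro name is mine, not any particular project's convention):

```c
/* Sketch of explicit export annotations for a library source file. */
#if defined(_WIN32)
  /* Windows: nothing leaves the DLL unless you say so. */
  #define API __declspec(dllexport)
#else
  /* ELF: everything is exported by default; this attribute only
     matters if you compile with -fvisibility=hidden to opt in to
     the Windows-like private-by-default model. */
  #define API __attribute__((visibility("default")))
#endif

static int helper(int x) { return x; }  /* private on both systems */

API int add(int a, int b)               /* importable by other modules */
{
    return helper(a) + helper(b);
}
```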

~~~
shin_lao
The first problem is actually not a problem but one of the reasons DLLs are
better: private-by-default reduces linking problems.

The second problem has nothing to do with DLLs and everything to do with the
fact that there is no standard C++ ABI.

As for memory allocation, it's not really a good idea to allocate memory in a
library and free it in another, is it?

~~~
npsimons
_As for memory allocation, it's not really a good idea to allocate memory in a
library and free it in another, is it?_

Ah, I guess I'll have to stop using strdup() then.

~~~
shin_lao
strdup(), part of libc, allocates memory with another libc function (malloc),
so you are allocating and freeing memory within the same library.

However, yes, for security reasons strdup should be avoided.

------
lmm
The case for sacrificing flexibility for performance micro-optimizations gets
weaker by the day, and it's certainly not worth the transition cost - another
incompatible ABI change would kill Linux dead.

I suspect that the compiler generates the indirections itself to handle
linkers that can't; making the linker easily replaceable involves a bit of
overhead now but gains us in the long run (e.g. it allows new linkers like
gold to be developed more quickly).

If you want to micro-optimize how the linker loads function addresses on x64,
then be my guest. Just don't expect the devs to treat it as a high priority.

As the article says, visibility modifiers exist and can be used by those who
care about them. I think that will have to be enough, in the interests of
usability.

~~~
api
Not only that, but LD_PRELOAD and the ability to override symbols is a very
valuable debugging tool.

It also has value in other areas. For example, it is possible to make any
binary -- even an entire virtual machine -- send all network traffic through
SOCKS by wrapping it in a socksify library. You can't do that on Windows or
(to my knowledge) OS X.

Part of the strength of Linux is how unbelievably hackable it is. I don't see
the point in sacrificing this flexibility for a tiny micro-optimization. And
it's very tiny... that code has no branches and is just a couple of MOVs, so
it's just going to get pipelined. It would only be worth it if it involved a
bunch of extra conditional branching. Removing a few MOVs per function call is
going to do nothing. It might literally do nothing, given that it might be
buried beneath the waves of branch-misprediction overhead, cache-misalignment
overhead, and so on.

Linux _performance_ is not a problem at the low level. Linux performs very,
very well. Its disk and memory allocation performance is noticeably superior
to any other OS on the same hardware, in my experience.

The problem with Linux is usability and cruft.

~~~
saurik
You can pull that off on either Windows or OS X; it just takes a little more
work. On Mac OS X there is even a supported, fairly simple mechanism for it
called "interposing".

------
Tuna-Fish
> That is, it’s a doubly-indirect call. I seriously doubt any processor can do
> proper branch prediction, speculative execution, and prefetching of code
> under those circumstances.

His doubts are entirely unfounded. Branch prediction of indirect calls is not
based on the target value the call is given, but on the address of the call
instruction and the previous targets called from that instruction. A
one-target trampoline adds one cycle of latency to a path, and is predicted
perfectly every time after the first. The largest costs are that no useful
instructions can be decoded on the cycle in which the trampoline is fetched
(which does not hurt you at all if you have excess decode throughput to make
it up on the cycle before or after), and that the second indirect call takes
another entry in the BTB.

------
dllthomas
It is not optimization of a corner case over optimization of the common case;
it is _enabling_ the corner case versus optimizing the common case. Making the
corner case slightly less efficient so that the common case can be slightly
more efficient is an obvious trade. Making the corner case impossible is not
the same thing, and requires a much stronger argument.

------
gioele
Already discussed on HN: <https://news.ycombinator.com/item?id=3472142> .

Every time this gets discussed, a new security vulnerability caused by the
lack of -fPIE or -fPIC comes up:
<https://news.ycombinator.com/item?id=3473068> .

------
teeeler
> No, I don’t have any measurements

Measure or GTFO.

~~~
batista
How about: watch your language, or get off HN?

No, he doesn't have to measure. He wrote a post in his own blog. You are
neither forced to read it, nor to take action on what it says.

If you are interested in the actual profiling to see the performance penalty,
you can do it yourself. It is enough for him to start the discussion on the
matter, which is far more than you did.

------
npsimons
One thing I find interesting is that he makes a case for optimizing something
without providing any profiling data. As others have pointed out, he's talking
about doing away with something that offers flexibility, and doing away with
it might lead to security problems, while he offers no good reason why it is
necessary. The least he could do is profile a few representative programs
(say, Firefox and Apache) to show that this is a real hot spot that needs
attention. That's why we have things like sysprof and valgrind/cachegrind.

------
klodolph
I remember people complaining loudly about the transition to Mach-O from PEF,
but I don't remember seeing solid data that the situation was a problem. Is
there data here that supports this?

Some of this just makes sense (resolving function addresses through the GOT,
which has an impact on correctness), but for the performance issues it would
be nice to know how much is at stake here.

------
regehr
The right way to do it:

1. Present data showing there is a problem.

2. Propose fixes.

But that's harder than doing it the wrong way:

1. Screed.

2. Propose fixes.

------
rlpb
The only problem described in this article appears to be one of performance,
and I can see no objective figures on what kind of performance improvement
we'd see if the proposed loss of flexibility were accepted.

The title of the article is "Sorry state of dynamic libraries on Linux" which
implies that other OSes do it better. But the article does not cover what
other OSes are even doing.

I'd love to see:

1) Objective and compelling figures showing that changing the current system
would result in a significant improvement.

2) A discussion about what other OSes are doing and justification of why Linux
is in such a "sorry state" (or an article title that isn't linkbait).

~~~
quotemstr
> The only problem described in this article appears to be of performance

Correctness is a serious problem. ELF symbol interposition is dangerous
because it can be done unintentionally. For the most part, the flexibility
afforded by ELF goes unused except for LD_PRELOAD, and LD_PRELOAD can be
accommodated using a safer and less general mechanism.

Systems with module-specific binding don't have to worry about unintentional
symbol interposition. On those systems, adding a module to a process can never
cause another, unrelated module to stop working depending on exact load order.
On ELF systems, this catastrophe happens.

~~~
dllthomas
It would seem that warning about interpositions somehow would be the right
thing. It would make it easier to debug weirdness, and would warn users if
someone surreptitiously set LD_PRELOAD, but wouldn't get in the way when
someone wants to use the functionality.

