
LD_PRELOAD: The Hero We Need and Deserve - ingve
https://blog.jessfraz.com/post/ld_preload/
======
woodruffw
Awesome post! `LD_PRELOAD` is a powerful tool for program instrumentation.

It's worth noting, though, that using `LD_PRELOAD` to intercept syscalls
doesn't actually intercept the syscalls themselves -- it intercepts the
(g)libc _wrappers_ for those calls. As such, an `LD_PRELOAD`ed function for
`open(3)` may actually end up wrapping `openat(2)`. This can produce annoying-
to-debug situations where one function in the target program calls a wrapped
libc function and another doesn't, leaving us to dig through `strace` for who
used `exit(2)` vs. `exit_group(2)` or `fork(2)` vs. `clone(2)` vs. `vfork(2)`.

Similarly, there are myriad cases where `LD_PRELOAD` won't work: statically
linked binaries aren't affected, and any program that uses `syscall(3)` or the
`asm` compiler intrinsic to make direct syscalls will happily do so without
any indication at the loader level. If these are cases that matter to you (and
they might not be!), check out this recent blog post I did on intercepting
_all_ system calls from within a kernel module[1].

[1]: [https://blog.trailofbits.com/2019/01/17/how-to-write-a-
rootk...](https://blog.trailofbits.com/2019/01/17/how-to-write-a-rootkit-
without-really-trying/)
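
To illustrate the direct-syscall case above, here is a minimal sketch of two ways a program can open a file. It assumes Linux with glibc, and both function names are made up for this example. A preloaded library that only exports its own `open` would see the first call but not the second.

```c
#define _GNU_SOURCE          /* for the syscall() declaration on glibc */
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: goes through the libc open() wrapper, so a
 * preloaded library that exports open() gets a chance to intercept it. */
int wrapped_open_readonly(const char *path)
{
    return open(path, O_RDONLY);
}

/* Hypothetical helper: asks the kernel for openat(2) directly. No symbol
 * named open() is looked up, so an open() interposer never sees this. */
int raw_open_readonly(const char *path)
{
    return (int)syscall(SYS_openat, AT_FDCWD, path, O_RDONLY);
}
```

Both helpers return a usable file descriptor; only the path taken to reach the kernel differs, and that is what `strace` would show and an `LD_PRELOAD` shim would miss.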

~~~
roblabla
There's another way to intercept syscalls without going as far as a kernel
module: the ptrace debugging API. There's a pretty neat article about how to
implement custom syscalls using ptrace:
[https://nullprogram.com/blog/2018/06/23/](https://nullprogram.com/blog/2018/06/23/)

~~~
sl1ck731
I've been looking for a while for a way to capture all file opens and network
ops to profile unknown production workloads, similar to Process Explorer on
Windows, which I believe is implemented using ETW. Unfortunately strace seems
to be out of the question purely because of the performance impact. Is the
performance impact due to strace or ptrace itself?

~~~
woodruffw
It's ptrace itself: every traced syscall requires at least one (but usually
3-4) ptrace(2) calls, plus scattered wait(2)/waitpid(2) calls depending on the
operation.

If you want to capture events like file opens and network traffic, I'd take a
look at eBPF or the Linux Audit Framework.

~~~
alexgartrell
I recommend bpftrace as an entry point to working with bpf

[https://github.com/iovisor/bpftrace](https://github.com/iovisor/bpftrace)

~~~
sl1ck731
This is really cool. Unfortunately the 4.x kernel requirement wouldn't work
for the majority of my work since RHEL is still on 3 :|

~~~
moonbug
If you have RHEL 7.6 or later, you have bpf

------
philpem
LD_PRELOAD is a fantastic tool.

At a previous job, we wanted binary reproducibility - that is to say, building
the same source code again should result in the same binary. The problem was,
a lot of programs embed the build or configuration date, and filesystems (e.g.
squashfs) have timestamps too.

Rather than patch a million different packages and create problems, we put
together an LD_PRELOAD which overrode the result of time(). Eventually we
faked the build user and host too.

End result: near perfect reproducibility with no source changes.

I've also used it for reasons similar to the GM Onstar example in the article
-- adding an "interposer" library to log what's going on.

I've pulled similar stunts with pydbg on a Windows XP virtual machine --
sniffing the traffic between applications and driver DLLs (even going as far
as sticking a logger on the ASPI DLLs). That and the manufacturer's debug info
got me enough information to write a new Linux driver for a long-unsupported
SCSI device which only ever had Win9x/XP drivers.

~~~
jacobush
Thank you! I may yet figure out the protocol of my APS film scanner.

~~~
philpem
Well if I could figure out the protocol of the Polaroid Digital Palette
(specifically the HR-6000 but the ProPalette and CI-5000S use the same SCSI
protocol)...

Look for any debug data you can turn on in the driver and correlate that
against whatever you see going to the scanner. Try to save timestamps if you
can, then merge the two logs.

I was a little surprised that while Polaroid had stripped the DLL symbols,
they'd left a "PrintInternalState()" debug function which completely gave away
the majority of the DP_STATE structure fields.

After that, I reverse-engineered and reimplemented the DLL (it's a small DLL),
swapped the ASPI side for Linux and wrote a tool that loaded a PNG file and
spat the pixels at the reimplemented library.

And then someone sent me a copy of the Palette Developer's Kit...

(Incidentally I'd really love to get hold of a copy of the "GENTEST"
calibration tool, which was apparently included on the Service disk and the
ID-4000 ID Card System disks)

~~~
jacobush
Wow, do you use these for anything?

I shoot 135 film and some medium format, I have tried Super 8 and would love
to start shooting 16mm film - but having a film recorder and actually using it
for something?!

:-D What can you do, what would you do?

If I was filthy rich I'd project 35mm movies in my living room. :)

------
bvinc
I'll share my story. I used to work at a popular Linux website hosting control
panel company. Back in the early 2000s, "FrontPage extensions" were a thing
that people used to upload their websites.

Unfortunately, FrontPage extensions required files to exist in people's Linux
home directories, and people would often mess them up or delete them. People
would need their FrontPage extension files "reset" to fix the problem.
Fortunately, Microsoft provided a Linux binary to reset a user's FrontPage
extension files.

Unfortunately, it required root access to run. Also unfortunately, I
discovered that a user could set up symlinks in their home directory to trick
the binary into overwriting files like /etc/passwd.

We ended up actually releasing a code change that would override getuid with
LD_PRELOAD so that the Microsoft binary would think it was running as root,
just to prevent it from being a security hazard.
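
For the curious, the shim for this kind of trick can be tiny. The following is a sketch under assumptions, not the code that shipped: it assumes the binary only checks getuid()/geteuid() through libc. It grants no real privilege, since the kernel still enforces the process's actual UID on every operation.

```c
#include <sys/types.h>
#include <unistd.h>

/* Report UID 0 to any dynamically linked caller. This only changes what the
 * program is told; the process keeps running with its real, unprivileged
 * credentials. */
uid_t getuid(void)
{
    return 0;
}

uid_t geteuid(void)
{
    return 0;
}
```

Run as the unprivileged user with this preloaded, the program passes its "am I root?" check but can only touch files that user could touch anyway, which is what defuses the symlink attack.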

~~~
mixmastamyk
So, it didn’t need root, but insisted on it? A MS binary no less.

~~~
Twirrim
It was very much in keeping with the Microsoft of the era. Not out of
maliciousness. Just a general lack of interest or knowledge of any non-Windows
platform, but a recognition that if Frontpage was going to be as dominant as
they wanted, they at least needed to vaguely support it.

Think the worst case of "Well it works on my machine"

------
segfaultbuserr
There's a well-known library, libfaketime, that can forge the current system
time.

[https://github.com/wolfcw/libfaketime](https://github.com/wolfcw/libfaketime)

Here's my friend's LD_PRELOAD hack, which pushes the idea further: hooking
gettimeofday() to make a program think that time goes faster or slower.
Useful for testing.

[https://github.com/m13253/clockslow](https://github.com/m13253/clockslow)

~~~
roghummal
>Useful for testing.

And as a speed hack for Quake 2!

~~~
kreetx
Yeah, that hack was a great insight!!

------
Slartie
I implemented a sort of "poor man's Docker" using LD_PRELOAD back in 2011,
when Docker wasn't a thing. It works by overriding getaddrinfo (IIRC) and
capturing name lookups of "localhost", which are then answered with an IP
address taken from an env variable.

The intended use is parallelizing the automated testing of a distributed
system: by creating lots of loopback devices with individual IPs and assigning
those to test processes (via the LD_PRELOAD hack), I could suddenly test as
many instances of the software system next to each other as I wanted, on the
same machine (the test machine is a beefy dual-socket server with lots of CPU
cores and RAM). Each instance consists of clients and several processes that
provide server services, which by default are configured to bind to specific
ports on localhost, as is common for dev and test purposes. With the hack,
each instance routes its traffic over its own loopback device, and I was
spared from having to untangle the server ports of all the different services
just to parallelize them on a single machine, along with the configuration
hell that would have come with that.

It helped that processes by default inherit the env variables of the parent
that spawned them: that made it a lot easier to propagate the preload path and
the env variable containing the loopback IP to use. I basically just had to
provide them to the top-most process.

Today, one would use Docker for this exact purpose, putting each test run into
its own container (or even multiple containers). But since the LD_PRELOAD hack
worked so well, the project in which I implemented the above is still using it
(although they're eyeing a switch to Docker, in part because it also makes it
easier to separate non-IP-related resources such as files on the filesystem,
but mostly because knowledge about Docker is more widespread than about such
ancient tech as LD_PRELOAD and how to hack into name resolution of the OS).

------
matthewaveryusa
Here's my LD_PRELOAD hack: rerouting /dev/random to /dev/urandom, because I
disagree with gpg's fears on entropy. Now it's as fast as generating a private
key with ssh-keygen or openssl:

[https://github.com/matthewaveryusa/dev_random_fix](https://github.com/matthewaveryusa/dev_random_fix)
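
The general shape of such a reroute can be sketched as below. This is an illustration, not the code in the linked repository: `reroute_path` is a made-up helper, Linux with glibc is assumed, and only open() is covered. A real program might reach the device through open64(), openat() or fopen() instead, so it is worth checking with strace or ltrace which entry point it uses.

```c
#define _GNU_SOURCE          /* for the syscall() declaration on glibc */
#include <fcntl.h>
#include <stdarg.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map /dev/random to /dev/urandom and leave every
 * other path untouched. */
const char *reroute_path(const char *path)
{
    return strcmp(path, "/dev/random") == 0 ? "/dev/urandom" : path;
}

/* Replacement for the libc open() wrapper. Instead of looking up the real
 * open(), it issues openat(2) directly with the (possibly rewritten) path. */
int open(const char *path, int flags, ...)
{
    mode_t mode = 0;

    /* The third argument is only present when a file may be created.
     * (O_TMPFILE also takes a mode; it is ignored in this sketch.) */
    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }
    return (int)syscall(SYS_openat, AT_FDCWD, reroute_path(path), flags, mode);
}
```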

~~~
nemonemo
I am curious. How do you know if this is secure or not? Is there any
publication or article available for this slightly time-saving but potentially
dangerous choice?

~~~
segfaultbuserr
1\. The official man page.

The /dev/random interface is considered a legacy interface, and /dev/urandom
is preferred and sufficient in all use cases, with the exception of
applications which require randomness during early boot time; for these
applications, getrandom(2) must be used instead, because it will block until
the entropy pool is initialized.

2\. [https://www.2uo.de/myths-about-urandom/](https://www.2uo.de/myths-about-
urandom/)

~~~
tomjakubowski
Not that I disagree with you, but which are the official man pages for
/dev/urandom? It's my recollection that the advice therein varies from OS to
OS.

~~~
segfaultbuserr
This page is part of release 4.16 of the Linux man-pages project. A
description of the project, information about reporting bugs, and the latest
version of this page, can be found at [https://www.kernel.org/doc/man-
pages/](https://www.kernel.org/doc/man-pages/).

And only Linux has /dev/urandom.

~~~
floatboth
BSDs (incl. macOS) have /dev/urandom, but it's the same thing as /dev/random.
Neither ever blocks after being filled initially at boot time.

------
Twirrim
One useful tool for the toolbox, to be used _very_ carefully, after thinking
about the consequences:

libeatmydata:
[https://github.com/stewartsmith/libeatmydata](https://github.com/stewartsmith/libeatmydata)

It disables fsync, O_SYNC etc., making them no-ops, essentially making the
program's writes unsafe. Very dangerous. But very useful when you're trying to
bulk load data into a MySQL database, say as preparation of a new slave
(followed by manual sync commands, and very careful checksumming of tables
before trusting what happened)
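
The core of the idea fits in a few lines. This is a simplified sketch, not libeatmydata's actual source: `pretend_synced` is a made-up helper, and the O_SYNC handling is left out because it would also need an open() wrapper that strips the flag.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: succeed immediately without flushing anything, but
 * still fail with EBADF for descriptors that are not open, so callers that
 * check for errors see plausible behaviour. */
int pretend_synced(int fd)
{
    if (fcntl(fd, F_GETFD) == -1)
        return -1;           /* fcntl() has already set errno to EBADF */
    return 0;
}

/* Replacements for the libc wrappers: both become no-ops. */
int fsync(int fd)
{
    return pretend_synced(fd);
}

int fdatasync(int fd)
{
    return pretend_synced(fd);
}
```

With this preloaded, the data stays in the page cache until the kernel writes it back on its own schedule, which is exactly why a crash mid-load can lose it.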

~~~
nicoburns
Useful for running tests against a throwaway MySQL database too!

------
cryptonector
I've been using LD_PRELOAD for fun and profit for a long, long time. Its
simplicity is due to the simplicity of the C ABI. Its power is due to dynamic
linking.

C is one programming language. C w/ ELF semantics and powerful link-editors
and run-time linker-loaders is a rather different and much more powerful
language.

I won't be sad to see Rust replace C, except for this: LD_PRELOAD is a
fantastic code-injection tool for C that is so dependent on the C ABI being
simple that I'm afraid we'll lose it completely.

~~~
ec109685
Doesn’t rust use the C ABI under the covers?

~~~
tomjakubowski
You can easily write and call functions that abide by the C ABI in Rust, but the
set of types permitted in those signatures is much smaller (only
#[repr(C)]-compatible types) than in ordinary Rust functions. The Rust ABI is
more complicated and won't be stabilized anytime soon.

------
pferde
Holy crap, was that article annoying to read! The author should really cut
back on meme image macros.

~~~
larrywright
This is Jess’ personality - check out her Twitter account. I don’t mind it,
but I’ve followed her for a while so I’m used to it. Honestly I find it to be
a refreshing break from the typically stiff writing I see. She’s smart
and doesn’t need to hide behind stodgy writing in order to make herself seem
smarter.

~~~
pferde
I don't really mind it on twitter, as twitter is anything but serious, and you
can't really have any coherent text there.

But such elements in a regular article simply harm its coherence and
readability for anyone that does not spend much of their time in (rather noisy
and immature, IMHO) communities which feature "meme image macros" heavily.

(Bonus negative points if some of the images are animated. That makes me think
that the author actively hates the readers.)

------
d99kris
I'll also join in and share my projects using LD_PRELOAD. These also work on
macOS through its equivalent DYLD_INSERT_LIBRARIES.

[https://github.com/d99kris/stackusage](https://github.com/d99kris/stackusage)
measures thread stack usage by intercepting calls to pthread_create and
filling the thread stack with a dummy data pattern. It also registers a
callback routine to be called upon thread termination.

[https://github.com/d99kris/heapusage](https://github.com/d99kris/heapusage)
intercepts calls to malloc/free/etc detecting heap memory leaks and providing
simple stats on heap usage.

[https://github.com/d99kris/cpuusage](https://github.com/d99kris/cpuusage) can
intercept calls to POSIX functions (incl. syscall wrappers) and provide
profiling details on the time spent in each call.

------
sherincall
I recently gave a small talk about this and listed the applications I had for
it in the past few years:

\- Test low memory environment

\- Add memory tracking

\- Ignore double frees (fix broken programs)

\- Cache allocations / lookaside lists

\- Trace all file operations

\- Seamlessly open compressed files with fopen()

\- Speed up time() as program sees it

\- Offset time() to bypass evaluation periods

\- Alternative PRNG

\- Intercept/reroute network sockets

\- Trace various API calls (useful when debugging graphics APIs)

\- Force parameters to some API calls

\- Set a custom resolution not supported by program

\- Switch between HW and SW cursor rendering

\- Framelimiting and FPS reporting

\- Replace a library with a different one through a compat layer

\- Frame buffer postprocessing (e.g. reshade.me)

\- Overlays (e.g. steam)

E: format

------
int_19h
For a Windows equivalent:

[https://github.com/Microsoft/Detours/wiki](https://github.com/Microsoft/Detours/wiki)

It's a bit more unwieldy to use, because it doesn't just replace all matching
symbols (it's not how symbol lookup works for DLLs in Win32) - the injected
DLL has to be written specifically with Detours in mind, and has to explicitly
override what it needs to override. But in the end, you can do all the same
stuff with it.

~~~
andrewf
_the injected DLL has to be written specifically with Detours in mind_

Does it? I've _almost_ (ie: haven't :P) used Detours but
[https://github.com/Microsoft/Detours/wiki/OverviewIntercepti...](https://github.com/Microsoft/Detours/wiki/OverviewInterception)
reads like it can rewrite standard function prologues.

~~~
int_19h
I didn't phrase that unambiguously: "injected DLL" in this case means "the DLL
with new code that is injected", not "the DLL that the code is being injected
into". With LD_PRELOAD, all you need to override a symbol is an .so that
exports one with the same name. With Detours, you need to write additional
code that actually registers the override as replacing such-and-such function
from such-and-such DLL. But yes, the code you're overriding doesn't need to
know about any of that.

~~~
andrewf
Ah, I get it. Thanks.

------
kingosticks
Librespot uses LD_PRELOAD to find and patch the encryption/decryption
functions used in Spotify's client so the protocol can be examined in
Wireshark (and ultimately reverse engineered). I am not the original author;
he wrote a macOS version using DYLD_INSERT_LIBRARIES to achieve something
similar.

[https://github.com/librespot-org/spotify-
analyze/blob/master...](https://github.com/librespot-org/spotify-
analyze/blob/master/dump/dump.c)

------
aboutruby
In Ruby world a lot of people use LD_PRELOAD to change the default malloc to
jemalloc (or tcmalloc):
[https://github.com/jemalloc/jemalloc](https://github.com/jemalloc/jemalloc)

------
saagarjha
Another interesting preloaded library is stderred, which turns output to
standard error red:
[https://github.com/sickill/stderred](https://github.com/sickill/stderred)

------
disqard
I once used LD_PRELOAD to utilize an OpenGL "shim" driver (for an automated
test suite). The driver itself was generated automatically from the gl.h
header file.

------
tom_mellior
I like the use of LD_PRELOAD in this paper: Long et al., Automatic Runtime
Error Repair and Containment via Recovery Shepherding, PLDI 2014,
[http://people.csail.mit.edu/rinard/paper/pldi14.pdf](http://people.csail.mit.edu/rinard/paper/pldi14.pdf)

The authors have a small library that sets up some signal handlers for things
like divide by zero and segmentation faults. They LD_PRELOAD this library when
starting a buggy binary (they test things like Chromium and the GIMP), and
when the program tries to divide by zero or read from a null pointer, their
signal handlers step in and pretend that the operation resulted in a value of
0. The program can then carry on without crashing and usually does something
meaningful. Tadaa, automatic runtime error repair!

------
gnufx
If everyone is giving examples of LD_PRELOAD: it has serious production use
at scale in HPC, particularly for profiling and tracing. Runtimes such as MPI
provide a layer designed for instrumentation to be interposed, typically with
LD_PRELOAD (e.g. the standardized PMPI layer for MPI). Another example is the
entirely userspace parallel filesystem that OrangeFS (né PVFS2) provides via
the "userint" layer interposing on Unix i/o routines. That sort of facility is
a major reason for using dynamic linking, despite the overheads of dynamically
loading libraries for parallel applications at scale. I'm not sure if a
solution could be hooked in with LD_PRELOAD, but Spindle actually uses
LD_AUDIT:
[https://computation.llnl.gov/projects/spindle](https://computation.llnl.gov/projects/spindle)

------
floatboth
My favorite:
[https://github.com/musec/libpreopen](https://github.com/musec/libpreopen) is
a library for adapting existing applications that open() and whatnot from all
over everywhere to the super strict capability based Capsicum sandbox on
FreeBSD. I'm working on
[https://github.com/myfreeweb/capsicumizer](https://github.com/myfreeweb/capsicumizer)
which is a little wrapper for launching apps with preloaded access to a list
of directories from an AppArmor-like "profile".

------
raincom
LD_PRELOAD is extremely helpful in troubleshooting libraries. Around 2007,
qsort on RHEL was slower than on SUSE. I raised a case with Red Hat along with
a test case, but Red Hat was not helpful, as it was not reproducible.

So, I copied glibc.so from a SUSE machine to that RHEL machine and ran the
test case with LD_PRELOAD, compared with the RHEL glibc. I showed these
results to Red Hat. Eventually, a patch was applied to glibc on their side.

------
badrabbit
I personally just hate LD_PRELOAD because it's very difficult to turn it off
and keep it off. I am glad others find uses for it, and that's great, but I
hate the privilege escalation attack surface it opens up. I get that it has
uses, but there needs to be a simple way to disable it for hardened systems.

------
djhworld
I feel like I've learned something new from this, I'd never heard of this
before.

Would this work with Go or Rust binaries?

~~~
cyphar
LD_PRELOAD only works for binaries that are dynamically linked (LD_PRELOAD is
actually handled by the link loader not the kernel[1]), and you can only use
it to overwrite dynamic symbols IIRC.

It definitely doesn't work with Go, and Rust _might_ work but I'm not sure
they use the glibc syscall wrappers.

[1]: [http://man7.org/linux/man-
pages/man8/ld.so.8.html](http://man7.org/linux/man-pages/man8/ld.so.8.html)

~~~
steveklabnik
Rust uses glibc by default; you can use musl but you have to opt in.

~~~
cyphar
I'm aware of that, I guess my point was that Rust probably doesn't use a lot
of glibc (like most C programs would) so the utility of LD_PRELOAD is quite
minimal.

I don't know enough about .rlib to know whether you could overwrite Rust
library functions, but that's a different topic.

~~~
steveklabnik
Rust uses glibc to call into the kernel like anything else. The standard
library is built on top of it.

~~~
cyphar
Right, but does that mean it's only used as a way of getting syscall numbers
(without embedding it like Go does) or is it the case that you could actually
LD_PRELOAD random things like nftw(3) and it would actually affect Rust
programs? I'll be honest, I haven't tried it, but it was my impression that
Rust only used glibc for syscall wrappers?

~~~
steveklabnik
We don’t provide nftw like functionality in std, and so you can’t replace it
as it would have never even been called. But for example, malloc and free are
used, not sbrk directly: [https://github.com/rust-
lang/rust/blob/master/src/libstd/sys...](https://github.com/rust-
lang/rust/blob/master/src/libstd/sys/unix/alloc.rs)

------
AndyKelley
Here's mine:
[https://github.com/andrewrk/malcheck/](https://github.com/andrewrk/malcheck/)

It uses LD_PRELOAD and a custom malloc so that you can find out all the
horrible ways that application developers did not plan to run out of memory.

~~~
loeg
We use a sort of similar trick (not via LD_PRELOAD, though) to inject faults
in M_NOWAIT malloc() calls in the FreeBSD kernel. FreeBSD kernel code tends to
be a bit better than most userspace code I've seen as far as considering OOM
conditions, though it is not perfect.

------
hendry
Isn't LD_PRELOAD's hipness outweighed by the Pandora's box of security issues
it gives rise to?

~~~
FartyMcFarter
What security issues are those? Is there anything you can do with LD_PRELOAD
that you cannot do in other ways such as modifying binaries before executing
them?

~~~
craftyguy
As a regular ol' GNU/Linux user, you cannot modify binaries in /usr/bin (or
/bin), but you can definitely influence their behavior by "LD_PRELOAD=blah
/usr/bin/thing".

~~~
yjftsjthsd-h
If you can do that, you can (generally) do `cp /bin/foo ./ && modify foo &&
./foo`

~~~
peterwwillis
It depends on assumptions in the way a system is hardened. For example, a home
directory mounted noexec. In theory, LD_PRELOAD will not mmap a file in a
noexec area. But if you can find an installed library with functions that
mirror some other application you have, and you can LD_PRELOAD that library
before executing the target application, you might be able to force the
library to call unexpected routines. (That's a stretch, granted)

Another would be possible RCE. Say you can get a server-side app to set
environment variables, like via header injection. Then say you can upload a
file. Can you make that server-side app set LD_PRELOAD to the file, and then
wait for it to execute an arbitrary program?

------
G4Vi
I needed to calculate the potential output file size tar would produce, so
what better way than using tar itself to calculate it. It just required
hooking read, write, and close.

[https://github.com/G4Vi/tarsize](https://github.com/G4Vi/tarsize)

------
jamesu
I’ve found LD_PRELOAD immensely useful for patching new code into binaries; it
saves the pain of trying to squeeze code into the existing binary.

------
piyush_soni
Wow, that looks like a security nightmare.

~~~
yjftsjthsd-h
Meh? It only works on your own programs.

~~~
jannes
What do you mean? There are many examples of people using LD_PRELOAD to patch
the behaviour of other's binaries.

~~~
slrz
Sure, but not across a security boundary.

Being able to override some library function such that running my text editor
does $BADTHING isn't very interesting from a security perspective: if I have
the capability to do that, I could also just run a program that does $BADTHING
directly. Why bother with additional contortions to involve the text editor?

------
asynch8
I remember back in the day when LD_PRELOAD first became the go-to userland
rootkitting method

------
tinix
audio generation from malloc and read: [https://github.com/gordol/ld_preload-
sounds](https://github.com/gordol/ld_preload-sounds)

also for OSX: DYLD_INSERT_LIBRARIES

~~~
ndesaulniers
One issue on OSX is two-level namespaces; I had to recompile with a flag to
disable them in order to hook malloc/free, for example.

~~~
Hackbraten
In case you can’t recompile, you should be able to do the same thing with
`DYLD_FORCE_FLAT_NAMESPACE=1`.

~~~
ndesaulniers
Where have you been all my life?

------
nailer
One other useful thing: making a trash can for Linux by wrapping unlink
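
A rough sketch of how that could look, assuming Linux with glibc. The `TRASH_DIR` environment variable is invented for this example, and the trash directory has to live on the same filesystem as the file because rename() cannot cross mounts. Tools that delete via unlinkat() or remove() would bypass it.

```c
#define _GNU_SOURCE          /* for the syscall() declaration on glibc */
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Replacement for the libc unlink() wrapper: move the file into the
 * directory named by the (hypothetical) TRASH_DIR variable instead of
 * deleting it. */
int unlink(const char *path)
{
    const char *trash = getenv("TRASH_DIR");

    /* No trash configured: delete for real via a direct unlinkat(2). */
    if (trash == NULL || *trash == '\0')
        return (int)syscall(SYS_unlinkat, AT_FDCWD, path, 0);

    /* Keep only the final path component as the name inside the trash.
     * A same-named file already in the trash is silently replaced. */
    const char *slash = strrchr(path, '/');
    const char *name = slash ? slash + 1 : path;

    char dest[PATH_MAX];
    int n = snprintf(dest, sizeof dest, "%s/%s", trash, name);
    if (n < 0 || (size_t)n >= sizeof dest) {
        errno = ENAMETOOLONG;
        return -1;
    }
    return rename(path, dest);
}
```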

------
a_sink
If I were to provide LD_PRELOAD-based security cover (take any binary and
secure it with LD_PRELOAD), would that be acceptable to corporates? Or does
that increase the attack surface?

~~~
saagarjha
How do you plan to secure a binary with LD_PRELOAD?

------
vectorEQ
one feature to break all software. superb

------
peterwwillis
Wait until they find out how Go apps work...

~~~
akhilcacharya
...what's this a reference to?

~~~
yjftsjthsd-h
Static linking, which makes this trick not work.

~~~
saagarjha
And the fact that Go will embed raw syscalls in its binaries, which is also
somewhat annoying.

~~~
trasz
Which is unsupported on anything other than Linux and various BSDs.

~~~
floatboth
On FreeBSD, it's supported but kinda sucks. e.g., porting to a new CPU
architecture is hell (I contributed to the FreeBSD/aarch64 go port, someone
else picked it up now…)

The libc is the stable ABI on pretty much any OS that's not called Linux, just
use it.

~~~
yjftsjthsd-h
Okay, but now you're using a C library as the basis for your non-C programming
language. I understand why it is that way, but that kind of sucks.

~~~
floatboth
Ehh… does it kind of suck? The ABI of libc's syscall wrappers is basically
"here's some ELF symbols to call with some arguments using the operating
system's preferred calling convention". The only really "C" thing about it,
other than the name, is struct layouts of various arguments.

