Using LD_PRELOAD to cheat, inject features and investigate programs (rafalcieslak.wordpress.com)
202 points by icyfox on Sept 8, 2023 | 108 comments

This reminds me of the first internship I did. It was at an AI-based anomaly detection startup. Their pipeline for the biggest client used to run in just under 3 hours, most of which was apparently spent extracting meaningful information from sensor data in a proprietary binary format, using a shared library provided by the client. I ran it under perf and found that most of the time was spent calculating a bunch of sines and cosines. I used the LD_PRELOAD trick to log the sine and cosine calls, and it turned out that we were calculating the sines and cosines of the same set of angles over and over again, literally thousands of times. I wrote a wrapper that caches the values in a table and returns the cached value if it has already been calculated, which brought the running time of the pipeline to under 15 minutes. That meant the data science team could run their experiments and iterate on their model much faster and cheaper. Fun times.
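For anyone wondering what such a caching wrapper might look like, here is a minimal sketch. It is not the original code: the table size, the hash constant and the helper names (`cache_lookup`, `cache_store`) are all assumptions made for illustration, only `sin` is shown (`cos` would follow the same pattern), and the cache is not thread-safe.

```c
#define _GNU_SOURCE              /* for RTLD_NEXT */
#include <dlfcn.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

#define SLOT_BITS 12             /* hypothetical size: 4096 direct-mapped slots */
#define SLOTS (1u << SLOT_BITS)

struct slot { uint64_t key; double value; int used; };
static struct slot sin_cache[SLOTS];

/* Key on the raw bit pattern of the angle so that a hit is an exact match. */
static uint64_t bits_of(double x) {
    uint64_t b;
    memcpy(&b, &x, sizeof b);
    return b;
}

/* Multiplicative hash; the top SLOT_BITS bits select the slot. */
static struct slot *slot_for(uint64_t key) {
    return &sin_cache[(key * 0x9E3779B97F4A7C15ull) >> (64 - SLOT_BITS)];
}

/* Returns 1 and fills *out on a hit, 0 on a miss. */
static int cache_lookup(double x, double *out) {
    uint64_t key = bits_of(x);
    struct slot *s = slot_for(key);
    if (s->used && s->key == key) {
        *out = s->value;
        return 1;
    }
    return 0;
}

/* Overwrites whatever previously occupied the slot. */
static void cache_store(double x, double value) {
    uint64_t key = bits_of(x);
    struct slot *s = slot_for(key);
    s->key = key;
    s->value = value;
    s->used = 1;
}

/* Interposed sin(): answer from the table, else ask the real libm once. */
double sin(double x) {
    static double (*real_sin)(double);
    double y;
    if (cache_lookup(x, &y))
        return y;
    if (real_sin == NULL)
        real_sin = (double (*)(double))dlsym(RTLD_NEXT, "sin");
    if (real_sin == NULL)
        return NAN;              /* no next definition found */
    y = real_sin(x);
    cache_store(x, y);
    return y;
}
```

Something like `cc -O2 -shared -fPIC sincache.c -o sincache.so -ldl` followed by `LD_PRELOAD=./sincache.so ./pipeline` would load it, where the file and program names are placeholders. A direct-mapped table keeps each lookup to one hash and one compare, at the cost of evicting entries whose hashes collide.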

It always amazes me when people miss this stuff first time around. “Dynamic programming” was always one of the common optimisations throughout my CS degree.

That sounds more like memoization than dynamic programming.

AIUI, DP is around memorising the results of static functions for a particular input.

“Dynamic Programming is mainly an optimization over plain recursion. Wherever we see a recursive solution that has repeated calls for same inputs, we can optimize it using Dynamic Programming. The idea is to simply store the results of subproblems, so that we do not have to re-compute them when needed later”

Wow, that sounds like a fun place to work at!

Almost any workplace is like that if you can, want to, and are allowed to create value.

> Almost any workplace is like that if (...)

Most people aren't that fortunate. There are good reasons why the word "job" is not synonymous with "fun".

I have assumed a software development workplace here.

It was, and great colleagues too! Although this wasn't "assigned" work. Just me being curious/bored.

On unixy systems, they put a restriction into Nethack to limit the game's "wizard mode" (the fuck-around mode where you can't die and just create all kinds of stuff out of thin air, etc) to the unix user with the username "wizard": https://nethackwiki.com/wiki/Wizard_mode#Unix

The wiki describes a bunch of ways around this, but they all seemed kinda finicky and annoying, so instead I LD_PRELOADed a shim that made getpwent(3) or whatever it used always return "wizard" as the user name.

Fun fact: proxychains uses LD_PRELOAD [0] to hook the necessary syscalls [1] for setting up a "proxy environment" for the wrapped program, e.g. `connect`, `gethostbyname`, `gethostbyaddr`, etc. (Note this also implies that it could be leaky in some cases when applied to a program that uses alternative syscalls to make an external connection, or a program that is not dynamically linked. I would not recommend depending on proxychains for any sort of opsec, even though it's often recommended as such a tool.)

[0] https://github.com/haad/proxychains/blob/master/src/proxycha...

[1] https://github.com/haad/proxychains/blob/master/src/libproxy...

There are a lot of small utilities built upon LD_PRELOAD, for example:

- fakeroot: Gives the running program the impression that it is running as root, often used for example for building debian packages. This will let the build script create a directory tree which it believes is owned by root and then when that directory tree is packed by tar, tar will also see root as the owner and the .tar-archive will have root as the user/group for the files in the final debian package.

- faketime: Gives the running program the impression that it is running at some specific time. Useful for testing code during specific events like leap years, etc.

- eatmydata: Ignores all fsync() and related system calls, which normally ensure that files are written to permanent storage. I used this once to run a database during a test suite, and databases are much faster when they do not have to wait for the data to reach permanent storage.

My favorite tool that (ab)uses LD_PRELOAD to do something useful is `stdbuf`: https://linux.die.net/man/1/stdbuf

The implementation is completely bonkers. The tool sets some environment variables, then spawns a child process with LD_PRELOAD set to load a library (libstdbuf.so). That library has some initialization code that runs when it is loaded and, based on the environment variables, calls setvbuf() from inside the child process to override the buffering behavior.
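The mechanism is small enough to sketch. The code below is not the coreutils implementation, just an illustration of the same idea under stated assumptions: the variable name `DEMO_STDOUT_BUF` and the helper `mode_for` are made up, and only stdout is handled.

```c
#include <stdio.h>
#include <stdlib.h>

/* Map a setting string to a setvbuf() mode:
 * "0" means unbuffered, "L" means line buffered, anything else fully buffered. */
static int mode_for(const char *setting) {
    if (setting[0] == '0' && setting[1] == '\0')
        return _IONBF;
    if (setting[0] == 'L' && setting[1] == '\0')
        return _IOLBF;
    return _IOFBF;
}

/* Runs when the shared object is loaded, i.e. before the target's main(). */
__attribute__((constructor))
static void apply_stdout_buffering(void) {
    const char *setting = getenv("DEMO_STDOUT_BUF");  /* hypothetical name */
    if (setting != NULL && setting[0] != '\0')
        setvbuf(stdout, NULL, mode_for(setting), 0);
}
```

Built as a shared object and loaded with something like `DEMO_STDOUT_BUF=L LD_PRELOAD=./demobuf.so ./program | cat` (file names are placeholders), this would make the program's stdout line buffered even when it writes to a pipe. Nothing gets interposed here at all: the library is preloaded purely so that its constructor runs inside the target process.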

tsocks: route all network traffic through a socks gateway with elegant per-process controls. (i guess a proxychains ancestor?)

faketime could've been used to circumvent time bombs in software. This could for example allow you eternal trial access. Nowadays stuff just requires networking with internet connection though.

You could also use LD_PRELOAD to get a different MAC address. This would've worked with FlexLM.

Though it's probably easier to modify the binary with a bit of hex editing or disassembling, personally I get fuzzy feelings of love from the type of software cracking that does not require modified binaries.

libnaw uses it to wrap and authenticate connect() and accept() calls.

frida uses it to wrap and inject... anything.

These programs have all been around for quite a long time. I think libnaw has been around since early 2000s at least.

> proxychains uses LD_PRELOAD [0] to hook the necessary syscalls [1]

Technically, it uses LD_PRELOAD to hook the necessary libc functions. As, at least on x86-64, a syscall is just a CPU instruction like any other, you can't hook into it through LD_PRELOAD or any other tricks that don't involve the kernel (apart from rewriting the program before you execute it). That's also why it doesn't work on e.g. Go programs, as they don't use libc.
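To make the distinction concrete, here is a small sketch (Linux-specific; `FAKE_PID` and `raw_getpid` are names invented for this example). The interposed `getpid()` catches callers that go through the libc symbol, while a request that reaches the kernel another way still sees the real value.

```c
#define _GNU_SOURCE              /* for syscall() */
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define FAKE_PID 99999999        /* hypothetical marker, above Linux's maximum PID */

/* Interposed libc wrapper: dynamically resolved calls to getpid() land here. */
pid_t getpid(void) {
    return FAKE_PID;
}

/* Goes around the getpid() wrapper and asks the kernel directly. */
pid_t raw_getpid(void) {
    return (pid_t)syscall(SYS_getpid);
}
```

Note that `syscall()` is itself a libc function, so a shim could hook that too. What a shim cannot reach is a program that embeds the syscall instruction itself, which is what the Go runtime and statically linked binaries do.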

> As, at least on x86-64, a syscall is just a CPU instruction like any other, you can't hook into it through LD_PRELOAD or any other tricks that don't involve the kernel (apart from rewriting the program before you execute it).

On most Unix systems, you can use ptrace to intercept system calls from another process. This is how tools like rr or strace work.

Ptrace is one of a number of ways to hook syscalls without LD_PRELOAD.

The gVisor docs list 3 ways: KVM, systrap, ptrace https://gvisor.dev/docs/architecture_guide/platforms/ :

> systrap: The systrap platform relies seccomp’s SECCOMP_RET_TRAP feature in order to intercept system calls. This makes the kernel send SIGSYS to the triggering thread, which hands over control to gVisor to handle the system call. For more details, please see the systrap README file.

> systrap replaced ptrace as the default gVisor platform in mid-2023. If you depend on ptrace, and systrap doesn’t fulfill your needs, please voice your feedback.

> ptrace: The ptrace platform uses PTRACE_SYSEMU to execute user code without allowing it to execute host system calls. This platform can run anywhere that ptrace works (even VMs without nested virtualization), which is ubiquitous.

> Unfortunately, the ptrace platform has high context switch overhead, so system call-heavy applications may pay a performance penalty. For this reason, systrap is almost always the better choice.

The Falco docs list 3 syscall event drivers: Kernel module, Classic eBPF probe, and Modern eBPF probe: https://falco.org/docs/event-sources/kernel/

Dynamic linker > Systems using ELF: https://en.wikipedia.org/wiki/Dynamic_linker#Systems_using_E...

The tup[1] build system also uses ldpreload injection as an aspect of its (additionally(?) FUSE-based) dependency-specification enforcement.

1: https://github.com/gittup/tup/blob/master/src/ldpreload/ldpr...

I implemented a sort of "poor man's Docker" using LD_PRELOAD back in 2011, when Docker wasn't a thing. It works by overriding getaddrinfo (IIRC) and capturing name lookups of "localhost", which are then answered with an IP address taken from an env variable. The intended use was parallelizing the automated testing of a distributed system: by creating lots of loopback devices with individual IPs and assigning those to test processes (via the LD_PRELOAD hack), I could suddenly test as many instances of the software system next to each other as I wanted, on the same machine (the test machine was a beefy dual-socket server with lots of CPU cores and RAM). Each instance (which consists of clients and several processes that provide server services, which by default are configured to bind to specific ports on localhost, as is common for dev and test purposes) would then be able to route its traffic over its own loopback device. That spared me from having to untangle the server ports of all the different services just to parallelize them on a single machine, and from the configuration hell that would have come with it. It helped that processes by default inherit env variables from the parent that spawned them, which made it a lot easier to propagate the preload path and the env variable containing the loopback IP to use. I basically just had to provide them to the top-most process.

Today, one would use Docker for this exact purpose, putting each test run into its own container (or even multiple containers).
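A rough sketch of what such a shim could look like, reconstructed only from the description above and not from the original code. The variable name `TEST_LOOPBACK_IP` and the helper `rewrite_node` are assumptions.

```c
#define _GNU_SOURCE              /* for RTLD_NEXT */
#include <dlfcn.h>
#include <netdb.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Return the name that should actually be resolved. Only "localhost" is
 * rewritten, and only when TEST_LOOPBACK_IP (a hypothetical variable) is set. */
static const char *rewrite_node(const char *node) {
    const char *ip = getenv("TEST_LOOPBACK_IP");
    if (node != NULL && ip != NULL && strcmp(node, "localhost") == 0)
        return ip;
    return node;
}

/* Interposed getaddrinfo(): swap the name, then defer to the real resolver. */
int getaddrinfo(const char *node, const char *service,
                const struct addrinfo *hints, struct addrinfo **res) {
    static int (*real)(const char *, const char *,
                       const struct addrinfo *, struct addrinfo **);
    if (real == NULL)
        real = (int (*)(const char *, const char *, const struct addrinfo *,
                        struct addrinfo **))dlsym(RTLD_NEXT, "getaddrinfo");
    if (real == NULL)
        return EAI_SYSTEM;       /* no next definition found */
    return real(rewrite_node(node), service, hints, res);
}
```

Each test instance would then be started with its own address, e.g. `TEST_LOOPBACK_IP=127.0.0.2 LD_PRELOAD=./lohack.so ./run-tests` (placeholder names), and every child process inherits both variables.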

Worth mentioning that statically linked binaries prevent this attack vector. I won't weigh in on dynamic vs static in general but if you're shipping something you don't want fiddled with then maybe dynamic linking isn't what you're looking for.

My favorite nasty hack along these lines was to inject a new implementation of gethostname via LD_PRELOAD as the simplest path to prevent a CI server from surfacing a hostname in a place it shouldn't be.
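For reference, a replacement like that can be very short. This is only a sketch of the idea, not the code that was actually used; the `FAKE_HOSTNAME` variable and the `build-host` default are made up.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Interposed gethostname(): report a configured name instead of the real one. */
int gethostname(char *name, size_t len) {
    const char *fake = getenv("FAKE_HOSTNAME");   /* hypothetical variable */
    if (fake == NULL)
        fake = "build-host";                      /* hypothetical default */
    size_t n = strlen(fake);
    if (n >= len) {                               /* no room for name plus NUL */
        errno = ENAMETOOLONG;
        return -1;
    }
    memcpy(name, fake, n + 1);
    return 0;
}
```

Since the shim never needs the real function, there is no `dlsym` lookup at all. It only affects code that calls `gethostname()`; anything that reads the name via `uname()` or `/proc` would still see the real one.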

This is not an attack vector and not something programs should try to protect against. If attackers have control to the point that they can run a program with LD_PRELOAD, they've already won.

There are cases where developers need to protect software from a local user with root access, like DRM related software, games that want to defend against pirates, etc.

Those use cases are inherently evil, and the rest of us should go out of our way to make them impossible, or at least as difficult as possible. A local user with root access should always have full control over everything, regardless of the wishes of any hardware manufacturers or software developers.

That's more of a want than a need.

This would not help with that goal.

Won on that node, agreed. I suppose I meant the 'investigate programs' part of the title and if that aids in garnering info for attacking something it interacts with.

But of course it's all splitting hairs. A sufficiently dedicated / motivated / funded person can investigate even the most hardened static position independent binary. Dynamic linking with LD_PRELOAD is like propping the front door open in comparison.

If you're shipping something you don't want fiddled with, don't ship it, because it's an impossible task.

I guess the entire world of online multiplayer games becomes hacker haven.

Well, it is. Most popular online games have a large number of cheaters.

That's a big selling point of cloud gaming.

Even that can't prevent computer vision with a robot mouse.

Sure. We agree. Cat and mouse. Can you help me understand what value you're adding to the discussion though? Low hanging fruit arguments based on semantics might be best suited elsewhere

He/she is mentioning that an obtained client can always be hacked, no matter what, and the reader of the original comment may not realize that.

> He/she is mentioning that an obtained client can always be hacked, no matter what, and the reader of the original comment may not realize that.

That pointless remark misses the whole point. Even though ideally an attack vector would be eliminated, it's already good enough if it becomes unexploitable by the vast majority of potential attackers. That's why there is a whole field called "app hardening" as in a sliding scale instead of "perfect app protection".

I think the vast majority of HN users would realize that. Is it important for the few that don’t? Perhaps

The original commenter clearly didn't understand it, since they asserted that "statically linked libraries prevent this attack vector".

Which is unambiguously not the case, they merely slow it down by a small margin.

It's a true statement; LD_PRELOAD cannot be used with statically linked binaries. You can "fiddle" in other ways, but not by using the LD_PRELOAD attack vector. (Personally I wouldn't call it an "attack vector", although in some cases it could be one, e.g. where you can upload a malicious file and somehow control the environment of another program, or something along those lines.)

>Which is unambiguously not the case, they merely slow it down by a small margin.

When you are using shared libraries, it is fairly trivial to hook into the library calls and replace them with whatever you want. When you are using static libraries, the linker and optimizer could for example inline the machine code directly in the application code. What tools do you have to do similar tricks with statically compiled binaries?

messing with executable binaries is undoubtedly harder but not impossible.

LD_PRELOAD is the specific attack vector that is prevented by static linking. They made no claim that static linking prevents all forms of tampering.

I think this is a great entrypoint into the static/dynamic argument and I'd love to argue with some people about it. I believe dynamic used to make sense but no longer does in the vast majority of cases. Static binaries have their costs, but are so much easier to reason about.

This is a really silly entrypoint into the static/dynamic argument. Static linking does not protect anything here, only makes it harder for developers.

They each have tradeoffs even only considering security.

Consider a situation in which there is a new vulnerability in OpenSSL. You can treat this as a hypothetical question, or just remember any of your past experiences with the many OpenSSL vulns.

How many binaries on your server use the vulnerable version? If all binaries are dynamically linked you can answer this fairly trivially with a shell script to enumerate binaries, pass them to ldd, and a little grepping. If all of your binaries are statically linked what do you do? Ideally pull the build info from your build server that shows you every version of everything that went into the binary.. which is data that just doesn't exist for most people

Maybe you scan the binaries to do some kind of signature analysis... but I would not be confident in the results not having false positives and false negatives.

Now let's patch it. How quickly can you recompile every static binary on your server? Can you even easily cut new builds of these existing versions but with a small patch increment or will your dev teams just rush a new release of any changes they're working on?

Or, with dynamic libraries, you update the library on your server and are done with it

... or so you thought. You didn't check which processes were still running with the old library open in memory and restart them, so you're still vulnerable :)

> I think this is a great entrypoint into the static/dynamic argument and I'd love to argue with some people about it.

I don't think it is. You start from an irrational and unsubstantiated belief that ignores any of the basic usecases of shared libraries.

> I believe dynamic used to make sense but no longer does in the vast majority of cases.

It's your personal belief, and one that's unsubstantiated and is based on ignorance.

> Static binaries have their costs, but are so much easier to reason about.

That assertion is completely irrelevant, as it fails to address any of the usecases for shared libraries. Being able to run code, and other dubious claims of simplicity, don't even qualify as questioning the purpose of shared libraries.

Static binaries are easier to reason about. You’ve provided no evidence to the contrary.

If the dynamic libraries are still compatible with what they were when you developed the program, then why waste disk and RAM?

Because disk and RAM are cheap and that's a huge "If"

There are also other security considerations. As an operator or builder, do you want to patch a library (say OpenSSL) to keep your system up to date, or patch every binary? If changing a dependency requires rebuilding all consumers recursively, then there's not a huge benefit.

I think in the specific case of security issues, more bugs have been fixed by upgrading dynamic dependencies than introduced. That's just my gut feeling though, and I'd like to see data.

> I think in the specific case of security issues, more bugs have been fixed by upgrading dynamic dependencies than introduced.

That's just your personal assertion, which is entirely baseless and unsubstantiated. It's ok to have beliefs, but instead of pushing them as truths you should at least start by doing some cursory research to see if they are even plausible. And yours isn't.

It's not entirely unsubstantiated, as my experience is that the former is very common. The latter is much harder to observe though, so it's just an impression.

I'm very interested in your assertion that my impression is implausible though. What evidence do you have?

He qualified it saying it was a hunch… I don’t see where he pushed it as a truth

It seems to me that deploying a static binary is for situations where one doesn't have control over the underlying system, or where shipping dependencies hasn't been solved, i.e, you just want to ship one binary.

Shipping one binary is much easier and I don’t even want to solve the problem of shipping deps as separate artifacts

Only cheap if you're running on huge servers. End user machines and edge compute are more constrained, so one needs to be more polite with resource use there.

> to prevent a CI server from surfacing a hostname in a place it shouldn't be

I had to deal with the same problem a few years ago, and used the exact same solution.

A great framework for doing something along those lines is Frida (https://github.com/frida/frida). Works on a bunch of stuff, including Android and iOS. Some global-ish certificate pinning bypasses work through Frida, by patching http libraries to not raise exceptions, accept system certificates, etc and just quietly hum along instead. Certificate unpinning in turn enables network MITM with mitmproxy, which makes it a lot quicker and easier to inspect, block, or modify network traffic.

Funnily enough, I've seen much stronger obfuscation from reverse engineering from my cheap Tuya IoT devices app than from my bank app.

> Funnily enough, I've seen much stronger obfuscation from reverse engineering from my cheap Tuya IoT devices app than from my bank app.

IMHO, if the client-side of an IoT service is obfuscated, I'd take that as a sign that they're trying to hide some really insecure API endpoints.

LD_PRELOAD relies on a naive target. A target can be crafted to bypass LD_PRELOAD - it can be as simple as statically linking the target. Also, dynamically linked targets are not guaranteed to be naive. The target can still directly issue syscalls that LD_PRELOAD was intended to interdict (by displacing a higher level function in a library).

ptrace or SECCOMP_RET_TRAP can be used to do syscall interception. But that would be somewhat complex in comparison to the ease of use of LD_PRELOAD.

gVisor[1] is a project which intercepts every syscall with the method above and services the syscall by itself (no passthrough) for the purpose of sandboxing.

You can also use eBPF to audit and meddle with syscalls.

[1] https://gvisor.dev/blog/2023/04/28/systrap-release/

Similarly ldd – which is mentioned in the article – works by setting LD_TRACE_LOADED_OBJECTS, which then gets picked up by Linux's dynamic linker, so any program that doesn't use it can just do whatever instead.

Interesting blog post by Fangrui Song "ELF interposition and -Bsymbolic" [1] - talks about the real world cost of this feature that's imposed by ELF. On one hand it enables incredible extensibility, OTOH codegen is obligated to be pessimistic about resolving symbols.


On Windows, you can start a process suspended, then inject a DLL into it before the entry point even loads. That DLL can basically do anything to override the behavior of the program.

I used LD_PRELOAD to patch an RCE vulnerability in PunkBuster[0]. They did patch the exploit, but the fix didn't cover many of the older games they had dropped support for. The AC itself isn't effective or even operational for the most part in those, but it still serves as a reliable method of identifying players.

Even their server libraries are obfuscated, and hooking open() turned out to be just easier than trying to patch the binaries themselves.

[0] https://medium.com/@prizmant/hacking-punkbuster-e22e6cf2f36e

We've built a tool using LD_PRELOAD that speeds up SAT and QBF solvers (but the same idea could be used to speed up other programs too). The idea is to fork, then LD_PRELOAD the other program and override its read functions (and equivalents, to also capture inlined read functions). The child process loads some solver, which will try to read its input from STDIN. The overridden read is triggered and, instead of just reading, our library shim connects to the parent process. The parent process feeds the child with data; the shim converts the data into text and feeds it to the solver as if it were reading from a file.

Once we are finished stating the problem and want to query it (i.e. send assumptions, in QBF or SAT solver speak), we issue a fork command, which lets the child process fork again into a second process, while the solver program thinks it is still in the read() call. We then feed the assumptions only to this grandchild, close its STDIN, and return the result to the calling parent process. When there's another assumption, we can issue the fork again and send assumptions to the new instance, never having to process the full problem again.

This is nice when the formula is large and the assumptions are small and numerous, which (e.g. for parallelization) was very useful in our research. The copy-on-write nature of fork() of course also helps, effectively reducing the RAM required to keep solvers in memory. Best of all, it works remarkably well on Linux and is even (mostly) POSIX conforming!

Check out our paper: https://ceur-ws.org/Vol-3201/paper1.pdf

Or just the code: https://github.com/maximaximal/quapi

This feature has been used for ages by sudo to implement the noexec option, which prevents dynamically linked executables from executing further programs. At some point I took advantage of this (sudo allows specifying a custom .so file for the noexec function) to implement other restrictions in programs run by sudo (of course, there are always ways to go around them).

Nowadays sudo uses seccomp filtering for its noexec option, which is enforced by the kernel and does not allow the workarounds that userspace-based solutions (including LD_PRELOAD) have.

Didn't know this, thanks for the info!

On a related note, I recently came across this ingenious high-performance technique for system call hooking: https://www.usenix.org/conference/atc23/presentation/yasukat...

Another use of LD_PRELOAD is to alter the shared object loading order. This is particularly useful for working around TLS shortage from shared objects [1].

[1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1722181

I used LD_PRELOAD to test system call failure code paths in a program (https://boston.conman.org/2022/12/21.1) that would otherwise be difficult to test.
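A fault-injecting shim of this kind can be sketched in a few lines. This is not the code from the linked post, only an illustration of the general technique; the `FAIL_WRITE_FD` variable is invented for the example.

```c
#define _GNU_SOURCE              /* for RTLD_NEXT */
#include <dlfcn.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Interposed write(): fail with ENOSPC on one chosen descriptor,
 * pass everything else through to the real libc function. */
ssize_t write(int fd, const void *buf, size_t count) {
    static ssize_t (*real_write)(int, const void *, size_t);
    const char *target = getenv("FAIL_WRITE_FD");   /* hypothetical variable */
    if (target != NULL && atoi(target) == fd) {
        errno = ENOSPC;                             /* pretend the disk is full */
        return -1;
    }
    if (real_write == NULL)
        real_write = (ssize_t (*)(int, const void *, size_t))
                         dlsym(RTLD_NEXT, "write");
    if (real_write == NULL) {
        errno = ENOSYS;                             /* no next definition found */
        return -1;
    }
    return real_write(fd, buf, count);
}
```

Running the program under test with something like `FAIL_WRITE_FD=3 LD_PRELOAD=./failwrite.so ./program` (placeholder names) would then exercise its disk-full handling. One caveat: this only catches calls that go through the dynamic `write` symbol, so writes issued internally by libc's stdio may not be affected.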

Two decades ago, when X Window System support for Indian language display in Unicode was spotty at best, I created an LD_PRELOAD hack, libxindic, to do character reordering for mostly correct display. (It really didn't do the right thing with Tamil, as Tamil ligates the -u and -uu matras to the base characters.) It was useful back then, and even worked with Mozilla (but was a bit unstable).


In ClickHouse, we forbid LD_PRELOAD and every similar variable: https://github.com/ClickHouse/ClickHouse/blob/master/program...

We also forbid dlopen - so even if some third-party library commits such an offense, it will be blocked: https://github.com/ClickHouse/ClickHouse/blob/master/program...

that's not going to block anyone who understands anything about how the dynamic loader works

   /lib64/ld-linux-x86-64.so.2 --preload myevil.so ./your_program

there we go, bypassed

Why can't an attacker just link a new getenv() that pretends that LD_PRELOAD and other variables aren't set?

It is mostly to protect from unusual configurations rather than protecting from an attacker, e.g., https://github.com/ClickHouse/ClickHouse/issues/43933 and https://github.com/ClickHouse/ClickHouse/issues/10505

It almost seems like it gives credence to self-inflicted problems by implementing changes like that.

The creativity of people in breaking things by misusing them is unbounded. Fix your bugs, fix non-bugs that seem like "foreseeable misuse". Don't fix "my users are in outer space" non-bugs.

This is so wrong...

Why the fuck do Azure and Google Colab mess with LD_PRELOAD? Why does ClickHouse crash when they do? Does it rely on unspecified behavior, or are the preloaded libraries problematic or buggy?

It looks like an arms race where some piece of software forces the use of a particular version of a library instead of what's in the system (but without static linking), then someone else uses LD_PRELOAD to force the use of the system library back, and then other software bans LD_PRELOAD to counter the counter. I understand that some ugly things are sometimes needed to make software work, but think of the collateral damage.

You could also modify the source code to remove the check, I'm guessing this is just for people that are accidentally setting LD_PRELOAD.


This is awesome. macOS actually enables the same env var protections by default if your process is opted into the hardened runtime. You can do that by passing --options=runtime to your codesign invocation.

What's the easiest way to use those variables anyway on a binary that's been compiled that way? Does it need SIP to be off?

You could try completely unsigning the binary: https://reverseengineering.stackexchange.com/a/13623

This won't work for everything, and it probably does need SIP to be off (also make a backup!) but it might be a way to get something to work.

Yes, and then patch dyld to enable library insertion again.

I get the first one but not the second one. Since you are just redirecting dlopen, aren't you just rewriting it for people that actually work on your codebase which then compiles?

It doesn't actually block it from 3rd parties.

If you link your executable to a library that references a symbol defined in your executable, that symbol will be added to the executable's dynamic symbol table.

This is for the case where a library already uses dlopen in unusual code paths, and we want to make sure those code paths won't work.

Examples include some authentication plugins.

all they have to do to beat your dlopen blocker is link against libdl statically

one extra argument to the linker when they compile their plugin

I think the parent is just trying to prevent people from shooting themselves in the foot, as opposed to stopping malicious users, i.e. hackers.

You're not supposed to go to such lengths to stop people from shooting themselves in the foot, because when you do, you also stop people from doing clever things.

I don't want my authentication library to do clever things.

Then you don't have to make yours do so. But if other sysadmins disagree, they should be able to make theirs do so.

I'm talking from the point of view of a user of the software.

The same applies. If you don't want it to do clever stuff on your computer, then just configure it not to do so. Don't try to make the software less configurable so that I can't make it do clever stuff on my computer.

Inspecting library calls has been automated in `ltrace`. It works like strace, but tracks all dynamic symbol calls instead of all syscalls. Making your own LD_PRELOAD shim takes more time, though it is more flexible.

LD_PRELOAD can also be used to turn an executable binary into a library. You have to intercept __libc_start_main(), provide your own custom main(), then call whatever functions from the binary your heart desires. You may need to use raw function addresses taken from some IDA or other Ghidra, as the binary is not required to export symbols.

The first security program I remember from when I started playing with Linux around '99 was called libsafe, which used LD_PRELOAD to intercept calls on the fly to prevent buffer overflows.

Windows has "Application Verifier" which can turn out-of-bounds writes that are past the end of memory into access violation exceptions.

Amazing explanation, I've seen this env var in loads of places and I never really understood what it did. Thanks for this!

I used LD_PRELOAD to make things not crash on openpandora, and to make valid EGL context https://gist.github.com/Cloudef/5788729

The macOS equivalent is DYLD_INSERT_LIBRARIES. You can then use Objective C reflection from your library to patch someone else's app by replacing method implementations for example.

Note that darwin prevents dylib injection in certain scenarios.


IIRC it has been further tightened since the above was written. DYLD_INTERPOSE is now a thing, DYLD_LIBRARY_PATH and DYLD_INSERT_LIBRARIES may be silently ignored and dropped from the env.


I suspect the program you’re trying to crack would check that environment variable, refusing to run if it’s set.

This assumes there’s some copy protection on it.

Used to use it all the time for debugging back in the day when I could write Linux apps in C professionally and not just build dumb webpages.

One of my favorite hacks, which started as a joke, is using LD_PRELOAD to generate audio from memory allocation and read calls.


this started out as like 10-20 lines of terrible code originally, and a few people sent merge requests to improve it

I'll die on the hill that LD_PRELOAD not being optional is a glibc bug that should at least be fixed by distros.

Use libmusl instead of libc, and LD_PRELOAD becomes no more; enough said.


Absolutely no relation to the topic whatsoever. Statically linked binaries also prevent this; that's not the point. And musl is far from being a 1:1 replacement for glibc, by design: many of the options just aren't supported, e.g. only a subset of resolv.conf options (and not single-request-reopen, which was very typically used to work around kernel race conditions on DNS resolution in k8s envs).

