This reminds me of the first internship I did. It was at an AI-based anomaly detection startup. Their pipeline for the biggest client used to run in just under 3 hours, most of which was apparently spent extracting meaningful information from sensor data in a proprietary binary format, using a shared library provided by the client. I ran it under perf and found that most of the time was spent calculating a bunch of sines and cosines. I used the LD_PRELOAD trick to log the sine and cosine calls, and it turned out that we were calculating the sines and cosines of the same set of angles over and over again, literally thousands of times. I wrote a wrapper that caches the values in a table and returns the cached value if it has already been calculated, which brought the running time of the pipeline to under 15 minutes. That meant the data science team could run their experiments and iterate on their model much faster and cheaper. Fun times.
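The caching step can be sketched in a few lines. The wrapper described above was presumably a native shim loaded via LD_PRELOAD; the Python below only illustrates the lookup-table idea, and `cached_sin` / `cached_cos` are hypothetical names, not anything from the original pipeline.

```python
import math
from functools import lru_cache


# lru_cache keeps a table keyed by the argument, so each distinct angle is
# computed once and every later call with the same angle is a table lookup.
@lru_cache(maxsize=None)
def cached_sin(angle: float) -> float:
    return math.sin(angle)


@lru_cache(maxsize=None)
def cached_cos(angle: float) -> float:
    return math.cos(angle)


if __name__ == "__main__":
    angles = [0.0, 0.5, 1.0] * 1000  # the same few angles, repeated
    total = sum(cached_sin(a) + cached_cos(a) for a in angles)
    print(cached_sin.cache_info())  # hits should vastly outnumber misses
```

This only pays off when the inputs repeat bit for bit, as they apparently did here; angles that differ in the last float digit would each get their own table entry.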
It always amazes me when people miss this stuff first time around. “Dynamic programming” was always one of the common optimisations throughout my CS degree.
AIUI, DP is about memoising the results of static functions for a particular input.
“Dynamic Programming is mainly an optimization over plain recursion. Wherever we see a recursive solution that has repeated calls for same inputs, we can optimize it using Dynamic Programming. The idea is to simply store the results of subproblems, so that we do not have to re-compute them when needed later”
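To make the quoted definition concrete, here is a small sketch using Fibonacci as a stand-in problem (my choice, not something from the thread). The `calls` counter is only there to show how many times each function body actually runs.

```python
calls = {"naive": 0, "memo": 0}


def fib_naive(n: int) -> int:
    # Plain recursion: the same subproblems are recomputed again and again.
    calls["naive"] += 1
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)


def fib_memo(n: int, table: dict = None) -> int:
    # Same recursion, but each subproblem's result is stored in `table`
    # and reused instead of being recomputed.
    if table is None:
        table = {}
    if n in table:
        return table[n]
    calls["memo"] += 1
    result = n if n < 2 else fib_memo(n - 1, table) + fib_memo(n - 2, table)
    table[n] = result
    return result
```

For `n = 20` both return the same value, but the memoised version does its real work once per distinct subproblem, while the naive one runs its body thousands of times.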
On unixy systems, they put a restriction into Nethack to limit the game's "wizard mode" (the fuck-around mode where you can't die and just create all kinds of stuff out of thin air, etc) to the unix user with the username "wizard": https://nethackwiki.com/wiki/Wizard_mode#Unix
The wiki describes a bunch of ways around this, but they all seemed kinda finicky and annoying, so instead I LD_PRELOADed a shim that made getpwent(3) or whatever it used always return "wizard" as the user name.
Fun fact: proxychains uses LD_PRELOAD [0] to hook the necessary syscalls [1] for setting up a "proxy environment" for the wrapped program, e.g. `connect`, `gethostbyname`, `gethostbyaddr`, etc. (Note this also implies that it could be leaky in some cases when applied to a program that uses alternative syscalls to make an external connection, or a program that is not dynamically linked. I would not recommend depending on proxychains for any sort of opsec, even though it's often recommended as such a tool.)
There are a lot of small utilities built on LD_PRELOAD, for example:
- fakeroot: Gives the running program the impression that it is running as root; often used for building Debian packages. This lets the build script create a directory tree which it believes is owned by root, and when that directory tree is packed by tar, tar will also see root as the owner, so the .tar archive will have root as the user/group for the files in the final Debian package.
- faketime: Gives the running program the impression that it is running at some specific time. Useful for testing code during specific events like leap years, etc.
- eatmydata: Ignores fsync() and the related system calls that ensure files are written to permanent storage. I used this once for running a database during a test suite, and databases are much faster when they do not have to wait for the data to reach permanent storage.
The implementation is completely bonkers. The tool sets some environment variables, then spawns a child process with LD_PRELOAD set to load a library (libstdbuf.so). That library has some initialization code that runs when it is loaded and, based on those environment variables, calls setvbuf() from inside the child process to override the buffering behavior.
faketime could've been used to circumvent time bombs in software. This could, for example, give you eternal trial access. Nowadays most software just requires an internet connection, though.
You could also use LD_PRELOAD to get a different MAC address. This would've worked with FlexLM.
Though it's probably easier to modify the binary with a bit of hex editing or disassembling, personally I get fuzzy feelings of love from the type of software cracking which does not require modified binaries.
> proxychains uses LD_PRELOAD [0] to hook the necessary syscalls [1]
Technically, it uses LD_PRELOAD to hook the necessary libc functions. As, at least on x86-64, a syscall is just a CPU instruction like any other, you can't hook into it through LD_PRELOAD or any other tricks that don't involve the kernel (apart from rewriting the program before you execute it). That's also why it doesn't work on e.g. Go programs, as they don't use libc.
> As, at least on x86-64, a syscall is just a CPU instruction like any other, you can't hook into it through LD_PRELOAD or any other tricks that don't involve the kernel (apart from rewriting the program before you execute it).
On most Unix systems, you can use ptrace to intercept system calls from another process. This is how tools like rr or strace work.
> systrap: The systrap platform relies seccomp’s SECCOMP_RET_TRAP feature in order to intercept system calls. This makes the kernel send SIGSYS to the triggering thread, which hands over control to gVisor to handle the system call. For more details, please see the systrap README file.
> systrap replaced ptrace as the default gVisor platform in mid-2023. If you depend on ptrace, and systrap doesn’t fulfill your needs, please voice your feedback.
> ptrace: The ptrace platform uses PTRACE_SYSEMU to execute user code without allowing it to execute host system calls. This platform can run anywhere that ptrace works (even VMs without nested virtualization), which is ubiquitous.
> Unfortunately, the ptrace platform has high context switch overhead, so system call-heavy applications may pay a performance penalty. For this reason, systrap is almost always the better choice.
I implemented a sort of "poor man's Docker" using LD_PRELOAD back in 2011, when Docker wasn't a thing. It works by overriding getaddrinfo (IIRC) and capturing name lookups of "localhost", which are then answered with an IP address taken from an env variable. The intended use is parallelizing automated tests of a distributed system: by creating lots of loopback devices with individual IPs and assigning those to test processes (via the LD_PRELOAD hack), I could suddenly test as many instances of the software system next to each other as I wanted, on the same machine (the test machine was a beefy dual-socket server with lots of CPU cores and RAM). Each instance (which consists of clients and several processes that provide server services, and which by default bind themselves to specific ports on localhost, as is common for dev and test purposes) could then route its traffic over its own loopback device. I was spared having to untangle the server ports of all the different services just to parallelize them on a single machine, and spared the configuration hell that would have come with it. It helped that processes by default inherit the env variables of the parents that spawned them; that made it a lot easier to propagate the preload path and the env variable containing the loopback IP to use. I basically just had to provide them to the top-most process.
Today, one would use Docker for this exact purpose, putting each test run into its own container (or even multiple containers).
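The environment-inheritance part of that setup is easy to demonstrate. This is a rough sketch, not the original harness: the variable name `TEST_LOOPBACK_IP` and the helper names are made up for illustration, and the shim itself (a native library overriding the resolver) is not shown.

```python
import os
import subprocess
import sys


def preload_env(shim_path: str, loopback_ip: str, base: dict = None) -> dict:
    """Build the environment handed to the top-most test process."""
    env = dict(os.environ if base is None else base)
    existing = env.get("LD_PRELOAD")
    # LD_PRELOAD takes a colon- or space-separated list; keep earlier entries.
    env["LD_PRELOAD"] = shim_path if not existing else f"{shim_path}:{existing}"
    env["TEST_LOOPBACK_IP"] = loopback_ip  # hypothetical variable name
    return env


def grandchild_sees(env: dict, name: str) -> str:
    """Spawn a child that spawns a grandchild; report what the grandchild sees."""
    inner = f"import os; print(os.environ.get({name!r}, ''))"
    outer = (
        "import subprocess, sys; "
        "subprocess.run([sys.executable, '-c', sys.argv[1]], check=True)"
    )
    out = subprocess.run(
        [sys.executable, "-c", outer, inner],
        env=env, capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()
```

Only the top-most process is given the environment explicitly; everything it spawns picks up both variables unless some process in the chain deliberately scrubs them.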
Worth mentioning that statically linked binaries prevent this attack vector. I won't weigh in on dynamic vs static in general but if you're shipping something you don't want fiddled with then maybe dynamic linking isn't what you're looking for.
My favorite nasty hack along these lines was to inject a new implementation of gethostname via LD_PRELOAD as the simplest path to prevent a CI server from surfacing a hostname in a place it shouldn't be.
This is not an attack vector and not something programs should try to protect against. If attackers have control to the point that they can run a program with LD_PRELOAD, they've already won.
There are cases where developers need to protect software from a local user with root access, like DRM related software, games that want to defend against pirates, etc.
Those use cases are inherently evil, and the rest of us should go out of our way to make them impossible, or at least as difficult as possible. A local user with root access should always have full control over everything, regardless of the wishes of any hardware manufacturers or software developers.
Won on that node, agreed. I suppose I meant the 'investigate programs' part of the title and if that aids in garnering info for attacking something it interacts with.
But of course it's all splitting hairs. A sufficiently dedicated / motivated / funded person can investigate even the most hardened static position independent binary. Dynamic linking with LD_PRELOAD is like propping the front door open in comparison.
Sure. We agree. Cat and mouse. Can you help me understand what value you're adding to the discussion though? Low hanging fruit arguments based on semantics might be best suited elsewhere
> He/she is mentioning that an obtained client can always be hacked, no matter what, and the reader of the original comment may not realize that.
That pointless remark misses the whole point. Even though ideally an attack vector would be eliminated, it's already good enough if it becomes unexploitable by the vast majority of potential attackers. That's why there is a whole field called "app hardening" as in a sliding scale instead of "perfect app protection".
It's a true statement; LD_PRELOAD cannot be used with statically linked binaries. You can "fiddle" in other ways, but not by using the LD_PRELOAD attack vector (although personally I wouldn't call it an "attack vector", although in some cases it could be where you can upload a malicious file and control the environment of another program somehow, or something along those lines).
>Which is unambiguously not the case, they merely slow it down by a small margin.
When you are using shared libraries, it is fairly trivial to hook into the library calls and replace them with whatever you want. When you are using static libraries, the linker and optimizer could for example inline the machine code directly in the application code. What tools do you have to do similar tricks with statically compiled binaries?
I think this is a great entrypoint into the static/dynamic argument and I'd love to argue with some people about it. I believe dynamic used to make sense but no longer does in the vast majority of cases. Static binaries have their costs, but are so much easier to reason about.
This is a really silly entrypoint into the static/dynamic argument. Static linking does not protect anything here, only makes it harder for developers.
They each have tradeoffs even only considering security.
Consider a situation in which there is a new vulnerability in openssl. You can treat this as a hypothetical question or just remember any of your past experiences with any of the many openssl vulns.
How many binaries on your server use the vulnerable version? If all binaries are dynamically linked you can answer this fairly trivially with a shell script to enumerate binaries, pass them to ldd, and a little grepping.
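The "enumerate, ldd, grep" step might look roughly like this. It is a sketch under the assumption that `ldd` is on the PATH and prints the usual `name => path (0x...)` lines; the function names are made up for illustration.

```python
import os
import subprocess


def parse_ldd_output(text: str) -> dict:
    """Map library name -> resolved path from ldd-style output."""
    libs = {}
    for line in text.splitlines():
        if "=>" not in line:
            continue  # skips the vdso line, the loader line, and error text
        name, _, rest = line.partition("=>")
        libs[name.strip()] = rest.split(" (0x")[0].strip()
    return libs


def binaries_linking(directory: str, needle: str) -> list:
    """Return executables in `directory` whose ldd output mentions `needle`."""
    matches = []
    for entry in sorted(os.scandir(directory), key=lambda e: e.name):
        if not (entry.is_file() and os.access(entry.path, os.X_OK)):
            continue
        proc = subprocess.run(["ldd", entry.path], capture_output=True, text=True)
        if proc.returncode != 0:
            continue  # static binary, script, or unreadable file
        if any(needle in name for name in parse_ldd_output(proc.stdout)):
            matches.append(entry.path)
    return matches
```

Something like `binaries_linking("/usr/bin", "libssl")` would then give the candidate list. Two caveats: ldd only reports link-time dependencies, so libraries pulled in later via dlopen() are missed, and ldd should not be pointed at untrusted binaries.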
If all of your binaries are statically linked, what do you do? Ideally you pull the build info from your build server that shows you every version of everything that went into the binary, which is data that just doesn't exist for most people.
Maybe you scan the binaries to do some kind of signature analysis... but I would not be confident in the results not having false positives and false negatives.
Now let's patch it. How quickly can you recompile every static binary on your server? Can you even easily cut new builds of these existing versions but with a small patch increment or will your dev teams just rush a new release of any changes they're working on?
Or, with dynamic libraries, you update the library on your server and are done with it
... or so you thought. You didn't check which processes were still running with the old library open in memory and restart them, so you're still vulnerable :)
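On Linux, that last check can be done by looking for mappings marked `(deleted)` in `/proc/<pid>/maps`, which is how a replaced library file typically shows up in processes that still have the old copy mapped. A sketch, with hypothetical function names:

```python
import os

DELETED = " (deleted)"


def deleted_mappings(maps_text: str, needle: str) -> list:
    """Paths in a /proc/<pid>/maps dump that match `needle` and are deleted."""
    hits = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)  # address perms offset dev inode pathname
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname column
        path = fields[5]
        if path.endswith(DELETED) and needle in path:
            hits.add(path[: -len(DELETED)])
    return sorted(hits)


def stale_processes(needle: str) -> dict:
    """Map pid -> stale library paths for every process we are allowed to read."""
    stale = {}
    if not os.path.isdir("/proc"):
        return stale
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue
        try:
            with open(f"/proc/{name}/maps") as f:
                hits = deleted_mappings(f.read(), needle)
        except OSError:
            continue  # process exited or permission denied
        if hits:
            stale[int(name)] = hits
    return stale
```

Run as root, `stale_processes("libssl")` would list the processes that need a restart after the upgrade; without root it only sees your own processes.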
> I think this is a great entrypoint into the static/dynamic argument and I'd love to argue with some people about it.
I don't think it is. You start from an irrational and unsubstantiated belief that ignores any of the basic usecases of shared libraries.
> I believe dynamic used to make sense but no longer does in the vast majority of cases.
It's your personal belief, and one that's unsubstantiated and is based on ignorance.
> Static binaries have their costs, but are so much easier to reason about.
That assertion is completely irrelevant, as it fails to address any of the usecases for shared libraries. Being able to run code, and other dubious claims of simplicity, don't even qualify as questioning the purpose of shared libraries.
There are also other security considerations. As an operator or builder, do you want to patch a library (say OpenSSL) to keep your system up to date, or patch every binary? If changing a dependency requires rebuilding all consumers recursively, then there’s not a huge benefit.
I think in the specific case of security issues, more bugs have been fixed by upgrading dynamic dependencies than introduced. That's just my gut feeling though, and I'd like to see data.
> I think in the specific case of security issues, more bugs have been fixed by upgrading dynamic dependencies than introduced.
That's just your personal assertion, which is entirely baseless and unsubstantiated. It's ok to have beliefs, but instead of pushing them as truths you should at least start by doing some cursory research to see if they are even plausible. And yours isn't.
It's not entirely unsubstantiated, as my experience is that the former is very common. The latter is much harder to observe though, so it's just an impression.
I'm very interested in your assertion that my impression is implausible though. What evidence do you have?
It seems to me that deploying a static binary is for situations where one doesn't have control over the underlying system, or where shipping dependencies hasn't been solved, i.e, you just want to ship one binary.
Only cheap if you're running on huge servers. End user machines and edge compute are more constrained, so one needs to be more polite with resource use there.
A great framework for doing something along those lines is Frida (https://github.com/frida/frida). Works on a bunch of stuff, including Android and iOS. Some global-ish certificate pinning bypasses work through Frida, by patching http libraries to not raise exceptions, accept system certificates, etc and just quietly hum along instead. Certificate unpinning in turn enables network MITM with mitmproxy, which makes it a lot quicker and easier to inspect, block, or modify network traffic.
Funnily enough, I've seen much stronger obfuscation against reverse engineering in the app for my cheap Tuya IoT devices than in my bank's app.
LD_PRELOAD relies on a naive target. A target can be crafted to bypass LD_PRELOAD - it can be as simple as statically linking the target. Also, dynamically linked targets are not guaranteed to be naive. The target can still directly issue syscalls that LD_PRELOAD was intended to interdict (by displacing a higher level function in a library).
ptrace or SECCOMP_RET_TRAP can be used to do syscall interception. But that would be somewhat complex in comparison to the ease of use of LD_PRELOAD.
gVisor[1] is a project which intercepts every syscall with the method above and services the syscall by itself (no passthrough) for the purpose of sandboxing.
You can also use eBPF to audit and meddle with syscalls.
Similarly ldd – which is mentioned in the article – works by setting LD_TRACE_LOADED_OBJECTS, which then gets picked up by Linux's dynamic linker, so any program that doesn't use it can just do whatever instead.
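A rough sketch of what that means in practice, assuming a glibc-style dynamic loader. The helper names are made up, and real ldd also sets a few other variables and handles more cases.

```python
import os
import subprocess


def trace_env(base: dict = None) -> dict:
    """Environment that asks the dynamic loader to list objects and exit."""
    env = dict(os.environ if base is None else base)
    env["LD_TRACE_LOADED_OBJECTS"] = "1"
    return env


def list_loaded_objects(binary: str) -> str:
    # Caution: this really does exec `binary`. If its loader ignores the
    # variable (or the binary is static), the program simply runs, which is
    # exactly the caveat raised in the comment above.
    proc = subprocess.run(
        [binary],
        env=trace_env(),
        capture_output=True,
        text=True,
        stdin=subprocess.DEVNULL,
        timeout=10,
    )
    return proc.stdout
```

On a typical glibc system, calling `list_loaded_objects` on a dynamically linked binary should give output in the same shape as ldd's, but treat that as something to verify locally rather than a guarantee.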
Interesting blog post by Fangrui Song "ELF interposition and -Bsymbolic" [1] - talks about the real world cost of this feature that's imposed by ELF. On one hand it enables incredible extensibility, OTOH codegen is obligated to be pessimistic about resolving symbols.
On Windows, you can start a process suspended, then inject a DLL into it before the entry point even loads. That DLL can basically do anything to override the behavior of the program.
I used LD_PRELOAD for patching RCE vulnerability in PunkBuster[0]. They did patch the exploit, but that didn't involve many of the older games they dropped support for. The AC itself isn't effective or even operational for the most part in those, but it still serves as a reliable method of identifying players.
Even their server libraries are obfuscated, and hooking open() turned out to be just easier than trying to patch the binaries themselves.
We've built a tool using LD_PRELOAD that speeds up SAT and QBF solvers (but the same idea could be used to speed up other programs too). The idea is to fork, then LD_PRELOAD the other program and override its read functions (and equivalents, to also capture inlined read functions). The child process loads some solver, which will try to read its input from STDIN. The overridden read is triggered and, instead of just reading, our library shim connects to the parent process. The parent process feeds the child with data, and the shim converts the data into text, feeding it to the solver as if it were reading from a file. Once we have finished stating the problem and want to query it (i.e. send assumptions, in QBF or SAT solver speak), we issue a fork command, which lets the child process fork again into a second process while the solver program thinks it is still in the read() call. We then feed the assumptions only to this grandchild, close its STDIN, and return the result to the calling parent process. When there's another assumption, we can issue the fork again and send assumptions to the new instance, never having to process the full problem again.
This is nice when the formula is large and the assumptions are small and numerous, which (e.g. for parallelization) was very useful in our research.
The copy on write nature of fork() of course also helps, effectively reducing the required RAM to keep solvers in memory.
Best of all, it works remarkably well on Linux and is even (mostly) POSIX conforming!
Check out our paper: https://ceur-ws.org/Vol-3201/paper1.pdf
Or just the code: https://github.com/maximaximal/quapi
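The fork-per-query idea, stripped of the read() interposition, can be illustrated with a toy. This is not how quapi is implemented; `expensive_load` and `query_in_fork` are stand-ins invented for this sketch, and it assumes a POSIX system with `os.fork`.

```python
import os


def expensive_load(n: int) -> list:
    # Stand-in for parsing a large formula once, up front.
    return [i * i for i in range(n)]


def query_in_fork(state: list, assumption: int) -> int:
    """Answer one query in a forked copy of the already-loaded process."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: sees the parent's `state` via copy-on-write, no reload needed.
        os.close(read_fd)
        try:
            result = sum(x for x in state if x % assumption == 0)
            os.write(write_fd, str(result).encode())
        finally:
            os._exit(0)  # never return into the parent's code path
    # Parent: collect the answer, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        data = pipe.read()
    os.waitpid(pid, 0)
    return int(data)
```

The state is loaded once, and each `query_in_fork(state, k)` call pays only for the fork and the query itself, which mirrors the "large formula, many small assumptions" case described above.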
This feature has been used for ages by sudo to implement the noexec option, to prevent dynamically linked executables from executing further programs. At some point I took advantage of this (sudo allows specifying a custom .so file for the noexec function) to implement other restrictions in programs run by sudo (of course, there are always ways to go around them).
Nowadays sudo uses seccomp filtering for its noexec option, which is enforced by the kernel and does not allow the workarounds that userspace-based solutions (including LD_PRELOAD) have.
Another use of LD_PRELOAD is to alter the shared object loading order. This is particularly useful for working around TLS shortage from shared objects [1].
I used LD_PRELOAD to test system call failure code paths in a program (https://boston.conman.org/2022/12/21.1) that would otherwise be difficult to test.
Two decades ago, when X Window support for Indian-language display in Unicode was spotty at best, I created an LD_PRELOAD hack, libxindic, to do character reordering for mostly correct display. (It really didn't do the right thing with Tamil, as Tamil ligates the -u and -uu matras to the base characters.) It was useful back then, and even worked with Mozilla (but was a bit unstable).
It almost seems like it gives credence to self-inflicted problems by implementing changes like that.
The creativity of people to break things by misusing them is unbounded. Fix your bugs, fix non-bugs that seem like "foreseeable misuse". Don't fix "my users are in outer space" non-bugs.
Why the fuck do Azure and Google Colab mess with LD_PRELOAD? Why does ClickHouse crash when they do? Does it rely on unspecified behavior, or are the preloaded libraries problematic or buggy?
It looks like an arms race where some piece of software forces the use of a particular version of a library instead of what's in the system (but without static linking), then someone else uses LD_PRELOAD to force back the use of the system library, and then other software bans LD_PRELOAD to counter the counter. I understand that some ugly things are sometimes needed to make software work, but think of the collateral damage.
This is awesome. macOS actually enables the same env var protections by default if your process is opted into the hardened runtime. You can do that by passing --options=runtime to your codesign invocation.
I get the first one but not the second one. Since you are just redirecting dlopen, aren't you just rewriting it for people that actually work on your codebase which then compiles?
If you link your executable to a library that references a symbol defined in your executable, that symbol will be added to the executable's dynamic symbol table.
You're not supposed to go to such lengths to stop people from shooting themselves in the foot, because when you do, you also stop people from doing clever things.
The same applies. If you don't want it to do clever stuff on your computer, then just configure it not to do so. Don't try to make the software less configurable so that I can't make it do clever stuff on my computer.
Inspecting library calls has been automated in `ltrace`. It works like strace, but traces all dynamic symbol calls instead of all syscalls. Making your own LD_PRELOAD shim takes more time, though it is more flexible.
LD_PRELOAD can also be used to turn an executable binary into a library. You have to intercept __libc_start_main(), provide your own custom main(), then call whatever functions from the binary your heart desires. You may need to use raw function addresses taken from some IDA or other Ghidra, as the binary is not required to export symbols.
The first security program I remember from when I started playing with Linux around '99 was called libsafe, which used LD_PRELOAD to intercept calls on the fly to prevent buffer overflows.
The macOS equivalent is DYLD_INSERT_LIBRARIES. You can then use Objective-C reflection from your library to patch someone else's app, for example by replacing method implementations.
IIRC it has been further tightened since the above was written. DYLD_INTERPOSE is now a thing, DYLD_LIBRARY_PATH and DYLD_INSERT_LIBRARIES may be silently ignored and dropped from the env.
Absolutely no relation to the topic whatsoever. Statically linked binaries also prevent this; that’s not the point. And musl is far from being a 1:1 replacement for glibc, by design: many of the options just aren’t supported, e.g. only a subset of resolv.conf options (and not single-request-reopen, which was very typically used to work around kernel race conditions on DNS resolution in k8s envs).