It's worth noting, though, that using `LD_PRELOAD` to intercept syscalls doesn't actually intercept the syscalls themselves -- it intercepts the (g)libc wrappers for those calls. As such, an `LD_PRELOAD`ed function for `open(3)` may actually end up wrapping `openat(2)`. This can produce annoying-to-debug situations where one function in the target program calls a wrapped libc function and another doesn't, leaving us to dig through `strace` for who used `exit(2)` vs. `exit_group(2)` or `fork(2)` vs. `clone(2)` vs. `vfork(2)`.
Similarly, there are myriad cases where `LD_PRELOAD` won't work: statically linked binaries aren't affected, and any program that uses `syscall(3)` or inline `asm` to make direct syscalls will happily do so without any indication at the loader level. If these are cases that matter to you (and they might not be!), check out this recent blog post I did on intercepting all system calls from within a kernel module.
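For the cases where the libc wrapper *is* the thing being called, the interposition pattern looks roughly like this (a minimal sketch; the file name and log format are my own invention, and as noted above, `openat()`/direct-syscall callers sail right past it):

```c
/* shim.c -- hedged sketch of an LD_PRELOAD interposer for open(3).
 * This hooks the (g)libc wrapper only; direct syscalls bypass it. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>

int open(const char *path, int flags, ...) {
    static int (*real_open)(const char *, int, ...);
    if (!real_open)
        real_open = (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");

    mode_t mode = 0;
    if (flags & O_CREAT) {   /* the mode argument is only present with O_CREAT */
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);   /* on glibc, mode_t is unsigned int */
        va_end(ap);
    }
    fprintf(stderr, "open(\"%s\", %#x)\n", path, flags);
    return real_open(path, flags, mode);
}
```

Build with `gcc -shared -fPIC shim.c -o shim.so -ldl`, then run `LD_PRELOAD=./shim.so some_program` and watch stderr.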
It's going to see a lot of use in container runtimes like LXC for faking mounts and kernel module loading (and in tools like remainroot for rootless containers), but it will likely also replace lots of uses of LD_PRELOAD.
It's all about the use case: if being constrained to inferior processes and adding 2-3x overhead per syscall doesn't matter, then `ptrace` is an excellent option. OTOH, if you want to instrument all processes and want to keep instrumentation overhead to a bare minimum, you more or less have to go into the kernel.
If you want to capture events like file opens and network traffic, I'd take a look at eBPF or the Linux Audit Framework.
I didn't actually realize it was an Intel project. I wonder how it stacks up against Pin.
At a previous job, we wanted binary reproducibility - that is to say, building the same source code again should result in the same binary. The problem was, a lot of programs embed the build or configuration date, and filesystems (e.g. squashfs) have timestamps too.
Rather than patch a million different packages and create problems, we put together an LD_PRELOAD library which overrode the result of time(). Eventually we faked the build user and host too.
End result: near-perfect reproducibility with no source changes.
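The core of such a shim is about this small (a sketch -- the file name and epoch constant are invented, and a real version also has to cover gettimeofday(), clock_gettime(), and friends):

```c
/* libfixtime.c -- pin the time(2) libc wrapper to a fixed date so that
 * embedded build timestamps come out identical on every rebuild. */
#include <time.h>

#define FAKE_EPOCH ((time_t)1500000000)   /* arbitrary fixed "build date" */

time_t time(time_t *tloc) {
    if (tloc)
        *tloc = FAKE_EPOCH;
    return FAKE_EPOCH;
}
```

Then `LD_PRELOAD=./libfixtime.so make` gives every package the same notion of "now" without touching its source.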
I've also used it for reasons similar to the GM OnStar example in the article -- adding an "interposer" library to log what's going on.
I've pulled similar stunts with pydbg on a Windows XP virtual machine -- sniffing the traffic between applications and driver DLLs (even going as far as sticking a logger on the ASPI DLLs). That and the manufacturer's debug info got me enough information to write a new Linux driver for a long-unsupported SCSI device which only ever had Win9x/XP drivers.
Look for any debug data you can turn on in the driver and correlate that against whatever you see going to the scanner. Try to save timestamps if you can, then merge the two logs.
I was a little surprised that while Polaroid had stripped the DLL symbols, they'd left a "PrintInternalState()" debug function which completely gave away the majority of the DP_STATE structure fields.
After that, I reverse-engineered and reimplemented the DLL (it's a small DLL), swapped the ASPI side for Linux and wrote a tool that loaded a PNG file and spat the pixels at the reimplemented library.
And then someone sent me a copy of the Palette Developer's Kit...
(Incidentally I'd really love to get hold of a copy of the "GENTEST" calibration tool, which was apparently included on the Service disk and the ID-4000 ID Card System disks)
I shoot 135 film and some medium format; I've tried Super 8 and would love to start shooting 16mm film -- but having a film recorder and actually using it for something?!
:-D What can you do, what would you do?
If I was filthy rich I'd project 35mm movies in my living room. :)
Unfortunately, FrontPage extensions required files to exist in people's Linux home directories, and people would often mess them up or delete them. People would need their FrontPage extension files "reset" to fix the problem. Fortunately, Microsoft provided a Linux binary to reset a user's FrontPage extension files.
Unfortunately, it required root access to run. Also unfortunately, I discovered that a user could set up symlinks in their home directory to trick the binary into overwriting files like /etc/passwd.
We ended up actually releasing a code change that would override getuid with LD_PRELOAD so that the Microsoft binary would think it was running as root, just to prevent it from being a security hazard.
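That shim is about as small as LD_PRELOAD hacks get -- something like this (a sketch; whether geteuid() also needed faking is my assumption):

```c
/* fakeroot-uid.c -- make the preloaded binary believe it is root. */
#include <sys/types.h>
#include <unistd.h>

uid_t getuid(void)  { return 0; }   /* the comment above only mentions getuid */
uid_t geteuid(void) { return 0; }   /* assumption: the effective uid is likely checked too */
```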
Think the worst case of "well, it works on my machine."
Debian have a similar tool called "fakeroot" which is part of their packaging process.
Here's my friend's LD_PRELOAD hack, which pushes the idea further: hooking gettimeofday() to make a program think that time goes faster or slower. Useful for testing.
EDIT: Not libfaketime, but the LD_PRELOAD recipe.
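Roughly, the recipe looks like this (a sketch: the SPEED knob and the anchoring logic are my own, and a complete version would also hook time() and clock_gettime()):

```c
/* timewarp.c -- scale wall-clock time so the program thinks it
 * runs SPEED times faster than reality. Not thread-safe as written. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <sys/time.h>

#define SPEED 10.0   /* invented knob: 10x fast-forward */

/* glibc declares the second argument as void *, so we match that */
int gettimeofday(struct timeval *tv, void *tz) {
    static int (*real)(struct timeval *, void *);
    static double t0;   /* real time at first call, in seconds */
    if (!real) {
        real = (int (*)(struct timeval *, void *))dlsym(RTLD_NEXT, "gettimeofday");
        struct timeval start;
        real(&start, NULL);
        t0 = start.tv_sec + start.tv_usec / 1e6;
    }
    struct timeval now;
    int rc = real(&now, tz);
    if (rc == 0 && tv) {
        double real_s = now.tv_sec + now.tv_usec / 1e6;
        double fake_s = t0 + (real_s - t0) * SPEED;   /* stretch elapsed time */
        tv->tv_sec  = (time_t)fake_s;
        tv->tv_usec = (suseconds_t)((fake_s - (time_t)fake_s) * 1e6);
    }
    return rc;
}
```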
And as a speed hack for Quake 2!
Today, one would use Docker for this exact purpose, putting each test run into its own container (or even multiple containers). But since the LD_PRELOAD hack worked so well, the project in which I implemented the above still uses it -- although they're eyeing a switch to Docker, partly because Docker also makes it easier to separate non-IP resources such as files on the filesystem, but mostly because knowledge of Docker is more widespread than knowledge of such ancient tech as LD_PRELOAD and how to hook into the OS's name resolution.
It is incorrect. Both /dev/urandom and /dev/random are connected to a CSPRNG. Once a CSPRNG is initialized with SUFFICIENT unpredictable inputs, it's forever unpredictable for (practically) unlimited output (something like 2^128 bits). If the CSPRNG algorithm is cryptographically secure, and the implementation doesn't leak its internal state, it would be safe to use it for almost all cryptographic purposes.
However, the original design in the Linux kernel was paranoid enough that it blocks /dev/random (even though a CSPRNG can output unlimited random bytes) if the kernel thinks the output has exceeded the estimated uncertainty from all the random events. Most cryptographers believe that if a broken CSPRNG is something you need to protect yourself from, you already have bigger trouble, and that it's unnecessary from a cryptographic point of view to be paranoid about a properly-initialized CSPRNG. /dev/random found on the BSDs is (almost) equivalent to Linux's /dev/urandom.
However, /dev/urandom has its own issues on Linux. Unlike the BSD implementations, it doesn't block even if the CSPRNG is NOT initialized during early boot. If you automatically generate a key for, e.g., SSH at this point, you'll have serious trouble -- predictable keys -- so reading from /dev/random still has a point, although not for 90% of programs. I think it's a perfect example of being overly paranoid about unlikely dangers while overlooking straightforward problems that are likely to occur.
The current recommended practice is to call the getrandom() system call (or arc4random()* on the BSDs) when it's available, instead of reading from raw /dev/random or /dev/urandom. It blocks until the CSPRNG is initialized; after that it always outputs something.
*and no, it's not RC4-based, but ChaCha20-based on new systems.
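For reference, a minimal sketch of that pattern on Linux (glibc 2.25+ exposes the wrapper in <sys/random.h>):

```c
/* keygen.c -- sketch: fetch key material the recommended way.
 * getrandom(2) blocks only until the kernel CSPRNG is seeded, then
 * never blocks for small requests like this one. */
#include <stdio.h>
#include <sys/random.h>

int main(void) {
    unsigned char key[32];
    /* flags=0: urandom pool; requests <= 256 bytes return in full */
    if (getrandom(key, sizeof key, 0) != (ssize_t)sizeof key) {
        perror("getrandom");
        return 1;
    }
    printf("got %zu random bytes\n", sizeof key);
    return 0;
}
```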
This isn't quite true. The BSDs' random (and urandom) block until initially seeded, unlike Linux's urandom. Then they don't block. (Like the getrandom/getentropy behavior.)
> The current recommended practice is to call getrandom() system call (and arc4random() on BSDs) when it's available, instead of reading from raw /dev/random or /dev/urandom. It blocks when the CSPRNG is initialized, but otherwise it always outputs something.
+1 (I'd phrase that as "blocks until the CSPRNG is initialized," which for non-embedded systems will always be before userland programs can even run, and for embedded should not take long after system start either).
Not just that, but if you have a threat model where you actually need information-theoretic security (e.g. you're conjecturing a computationally unbounded attacker, or at least a quantum computer) -- the /dev/random output is _still_ just a CSPRNG, and simply rate-limiting it doesn't actually make a strong guarantee about the information-theoretic randomness of the output. To provide information-theoretic security, the function design would need to guarantee that at least some known fraction of the entropy going in actually made it to the output. Common CSPRNGs don't do this.
So you could debate whether information-theoretic security is something anyone ever actually needs -- but if you do need it, /dev/random doesn't give it to you regardless.
[And as you note, urandom doesn't block when not adequately seeded... so the decision to make /dev/random block probably exposed a lot of parties to exploits while not providing strong protection even against fantasy-land attacks :(]
This is an interesting point I hadn't thought about before, so thanks for that. I suppose if you're generating an OTP or something like that, there might be some small advantage to using /dev/random, but the probability of it making a difference is pretty remote.
The one thing I haven't been able to figure out is why Linux hasn't "fixed" both /dev/random and /dev/urandom to block until they have sufficient entropy at boot and then never block again. That seems like the obviously optimal behavior.
The Real Solution™ is to make /dev/random and /dev/urandom the same thing, and make them both block until properly seeded. And replace the current ad-hoc CSPRNG with a decent one, e.g. Fortuna. There were patches almost 15 years ago implementing this (https://lwn.net/Articles/103653/), but they were rejected.
There's simply no good reason not to fix Linux's CSPRNG.
Unless all of your cryptography is information-theoretically secure, there is no problem using a PRNG.
If you happen to be using an information-theoretically secure algorithm, then you are theoretically weaker using a limited-entropy PRNG; but there are no practical implications of this.
I was writing a dock program over a decade ago, and Java programs didn't put the PID on the window, whereas everything else did.
Had to fix it somehow...
The /dev/random interface is considered a legacy interface, and /dev/urandom is preferred and sufficient in all use cases, with the exception of applications which require randomness during early boot time; for these applications, getrandom(2) must be used instead, because it will block until the entropy pool is initialized.
And only on Linux is /dev/urandom distinct from /dev/random.
The one and only danger is during the machine's boot process, because while /dev/random and /dev/urandom use the same data:
* on Linux, /dev/random has a silly and unfounded entropy estimator and will block at arbitrary points (estimators used to be a fad at some point, but cryptographers have sworn off them -- e.g. Yarrow had an entropy estimator but Fortuna dropped it)
* also on Linux, /dev/urandom never blocks at all, which includes a cold start -- the one point where the device might not be seeded and can return extremely poor data
In fact, the second point is the sole difference between getrandom(2) and /dev/urandom.
If you're in a steady-state scenario (not at machine boot, where the cold-start entropy problem exists), "just use urandom" is the recommendation of pretty much everyone: tptacek, djb, etc…
http://blog.cr.yp.to/20140205-entropy.html (see bottom of page)
AFAIK, there's another important difference: getrandom(2) doesn't use a file descriptor (so it'll work even if you're out of file descriptors, or in other situations where having an open fd is inconvenient), and it doesn't need access to a /dev directory with the urandom device.
(Note, that's from 2014; today I would recommend getrandom() instead.)
So, recompile every program and every subsequent update to use this functionality or...bind mount. One of these sounds easier than the other.
It disables fsync, O_SYNC, etc., making them no-ops, essentially making the program's writes unsafe. Very dangerous. But very useful when you're trying to bulk-load data into a MySQL database, say as preparation of a new slave (followed by manual sync commands, and very careful checksumming of tables before trusting what happened).
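The core of such a shim is tiny (a sketch in the spirit of the above; a fuller version would also wrap open() to strip O_SYNC/O_DSYNC flags):

```c
/* nosync.c -- sketch of the "disable fsync" shim described above:
 * durability calls become no-ops that report success. Writes are
 * unsafe until you sync manually, exactly as the caveats say. */
#include <unistd.h>

int fsync(int fd)     { (void)fd; return 0; }
int fdatasync(int fd) { (void)fd; return 0; }
```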
C is one programming language. C w/ ELF semantics and powerful link-editors and run-time linker-loaders is a rather different and much more powerful language.
I won't be sad to see Rust replace C, except for this: LD_PRELOAD is a fantastic code-injection tool for C that is so dependent on the C ABI being simple that I'm afraid we'll lose it completely.
But such elements in a regular article simply harm its coherence and readability for anyone who does not spend much of their time in (rather noisy and immature, IMHO) communities which feature "meme image macros" heavily.
(Bonus negative points if some of the images are animated. That makes me think that the author actively hates the readers.)
I agree that the article is emotional, but whether that's annoying is very subjective.
https://github.com/d99kris/stackusage measures thread stack usage by intercepting calls to pthread_create and filling the thread stack with a dummy data pattern. It also registers a callback routine to be called upon thread termination.
https://github.com/d99kris/heapusage intercepts calls to malloc/free/etc., detecting heap memory leaks and providing simple stats on heap usage.
https://github.com/d99kris/cpuusage can intercept calls to POSIX functions (incl. syscall wrappers) and provide profiling details on the time spent in each call.
- Test low memory environment (see the sketch after this list)
- Add memory tracking
- Ignore double frees (fix broken programs)
- Cache allocations / lookaside lists
- Trace all file operations
- Seamlessly open compressed files with fopen()
- Speed up time() as program sees it
- Offset time() to bypass evaluation periods
- Alternative PRNG
- Intercept/reroute network sockets
- Trace various API calls (useful when debugging graphics APIs)
- Force parameters to some API calls
- Set a custom resolution not supported by program
- Switch between HW and SW cursor rendering
- Framelimiting and FPS reporting
- Replace a library with a different one through a compat layer
- Frame buffer postprocessing (e.g. reshade.me)
- Overlays (e.g. steam)
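To make the first two items concrete, here's a hedged sketch (the MAX_ALLOCS knob is invented; a real interposer also has to cope with dlsym itself allocating and with threads, both glossed over here):

```c
/* mallocfail.c -- count allocations and fail them beyond a limit,
 * simulating a low-memory environment for the preloaded program. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <stdlib.h>

void *malloc(size_t size) {
    static void *(*real_malloc)(size_t);
    static long count, limit = -1;    /* -1: no limit configured */

    if (!real_malloc) {
        real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
        const char *s = getenv("MAX_ALLOCS");   /* invented knob */
        if (s)
            limit = atol(s);
    }
    if (limit >= 0 && ++count > limit) {
        errno = ENOMEM;               /* pretend memory ran out */
        return NULL;
    }
    return real_malloc(size);
}
```

Run e.g. `MAX_ALLOCS=10000 LD_PRELOAD=./mallocfail.so ./app` and watch which error paths were never written.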
It's a bit more unwieldy to use, because it doesn't just replace all matching symbols (that's not how symbol lookup works for DLLs in Win32) -- the injected DLL has to be written specifically with Detours in mind, and has to explicitly override what it needs to override. But in the end, you can do all the same stuff with it.
Does it? I've almost (i.e. haven't :P) used Detours, but https://github.com/Microsoft/Detours/wiki/OverviewIntercepti... reads like it can rewrite standard function prologues.
The authors have a small library that sets up some signal handlers for things like divide-by-zero and segmentation faults. They LD_PRELOAD this library when starting a buggy binary (they test things like Chromium and the GIMP), and when the program tries to divide by zero or read from a null pointer, their signal handlers step in and pretend that the operation resulted in a value of 0. The program can then carry on without crashing and usually does something meaningful. Tadaa, automatic runtime error repair!
So, I copied glibc.so from a SUSE machine to that RHEL machine and ran the test case with LD_PRELOAD, comparing against the RHEL glibc. I showed these results to Red Hat. Eventually, a patch was applied to glibc on their side.
Would this work with Go or Rust binaries?
It definitely doesn't work with Go, and Rust might work but I'm not sure they use the glibc syscall wrappers.
I don't know enough about .rlib to know whether you could overwrite Rust library functions, but that's a different topic.
It uses LD_PRELOAD and a custom malloc so that you can find out all the horrible ways applications break when their developers did not plan for running out of memory.
Another would be possible RCE. Say you can get a server-side app to set environment variables, like via header injection. Then say you can upload a file. Can you make that server-side app set LD_PRELOAD to the file, and then wait for it to execute an arbitrary program?
Being able to override some library function such that running my text editor does $BADTHING isn't very interesting from a security perspective: if I have the capability to do that, I could also just run a program that does $BADTHING directly. Why bother with additional contortions to involve the text editor?
Also, for OS X: DYLD_INSERT_LIBRARIES
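(A hedged usage note: with two-level namespaces there, interposing symbols typically also needs `DYLD_FORCE_FLAT_NAMESPACE=1` alongside `DYLD_INSERT_LIBRARIES=./shim.dylib` -- the shim path is my placeholder -- or a proper `__interpose` section in the injected dylib.)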
The libc is the stable ABI on pretty much any OS that's not called Linux; just use it.