
x86 API Hooking Demystified (2012) - codesuki
http://jbremer.org/x86-api-hooking-demystified/
======
cecilpl2
I used this technique once to very useful effect in an automated dependency-
checking tool.

Essentially given an idempotent data-transformation process, you can inject a
dll into it and hook its filesystem calls, maintaining a list of all files it
touches. Upon process termination, serialize out the filesystem state (size,
timestamp, MD5) of each of those files, saving the data in a dependency file
whose filename is the hash of the command line.

When you run the same command line again, the dll can first look up the
dependency file from the previous run, and if all files on the filesystem are
in the same state as previously, you can short-circuit the process execution
entirely.

This is orders of magnitude faster in many cases for highly-parallel data
build jobs where only some small percentage of the source data changes each
time you run the build job, and has the advantage that you don't need to manually
maintain a list of dependencies for each process type (no new dependency can
be added without changing one of the existing dependencies).
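
A minimal sketch of the bookkeeping described above, assuming the list of touched files has already been collected (the DLL injection and filesystem hooking are not shown). The function names, the JSON layout, and the use of SHA-256 for the command-line hash are illustrative assumptions, not details of the original tool.

```python
import hashlib
import json
import os


def fingerprint(path):
    """Return the (size, mtime, MD5) state of one file, or None if unreadable."""
    try:
        st = os.stat(path)
        with open(path, "rb") as f:
            digest = hashlib.md5(f.read()).hexdigest()
    except OSError:
        return None
    return {"size": st.st_size, "mtime_ns": st.st_mtime_ns, "md5": digest}


def dep_file_for(command_line, cache_dir):
    """Name the dependency file after a hash of the command line."""
    key = hashlib.sha256(command_line.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, key + ".json")


def save_deps(command_line, touched_files, cache_dir):
    """Call at process exit: record the state of every file that was touched."""
    state = {p: fingerprint(p) for p in touched_files}
    with open(dep_file_for(command_line, cache_dir), "w") as f:
        json.dump(state, f)


def is_up_to_date(command_line, cache_dir):
    """Call at startup: True if a previous run exists and nothing changed."""
    try:
        with open(dep_file_for(command_line, cache_dir)) as f:
            state = json.load(f)
    except (OSError, ValueError):
        return False  # no usable record of a previous run, so we must execute
    return all(fingerprint(p) == old for p, old in state.items())
```

If `is_up_to_date(...)` returns True the process can exit immediately; otherwise it runs normally and calls `save_deps(...)` on the way out.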

~~~
new_realist
On UNIX this sort of thing is typically done with a virtual file system.

~~~
pjc50
Isn't it normally done with the similar technique of LD_PRELOAD, which unlike
a vfs can be done from userland?

~~~
vthriller
> unlike a vfs can be done from userland

You can implement and mount such virtual FS entirely in userspace using FUSE.

------
stevemk14ebr
I wrote a hooking library that implements the methods discussed in the
article, plus a bunch more it doesn't cover. It also handles a bunch of edge
cases not mentioned:
[https://github.com/stevemk14ebr/PolyHook_2_0](https://github.com/stevemk14ebr/PolyHook_2_0)

~~~
jchw
Thanks for linking, this is really cool. Also nice to see AMD64 support, I
never figured out how it can be done. (Is there an absolute jmp _without
destroying registers_ in x64?)

~~~
stevemk14ebr
yes there is. \xff\x25\x00\x00\x00\x00, where the zeros are a 32 bit
displacement to a memory location that contains a 64 bit constant, is what i
use: jmp [rip+disp]. But you need to be able to place the constant within
+-2 GB of the hook, since the displacement is a signed 32 bits. This is really
hard for x64 on windows, since VirtualAlloc gives you no guarantees; i used to
walk pages manually, now i just new/delete in a loop and hope for the best.

There's also:

push rax

mov rax, 0xDEADBEEFDEADBEEF

xchg qword ptr ss:[rsp], rax

ret

but i dont like it as much since it touches stack (technically more detectable
since you overwrite stuff at rsp - 8). RSP & RAX are original values after
that gadget though.
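
For reference, a small sketch of how the two stubs above could be assembled by hand. It only builds the bytes and does not write them into a process; the helper names are made up for illustration, and the encodings are the standard x86-64 ones as far as I know (FF 25 for the RIP-relative indirect jmp, 48 B8 for mov rax, imm64).

```python
import struct

JMP_RIP_LEN = 6  # FF 25 plus a 4-byte displacement


def jmp_indirect(hook_addr, slot_addr):
    """Encode `jmp qword ptr [rip+disp32]` placed at hook_addr.

    slot_addr is where the 64-bit destination is stored. RIP-relative
    displacements are measured from the end of the instruction.
    """
    disp = slot_addr - (hook_addr + JMP_RIP_LEN)
    if not -2**31 <= disp < 2**31:
        raise ValueError("slot is not within +/-2 GiB of the hook site")
    return b"\xff\x25" + struct.pack("<i", disp)


def push_xchg_ret(target):
    """Encode the push/mov/xchg/ret sequence that jumps to target."""
    return (
        b"\x50"                                    # push rax
        + b"\x48\xb8" + struct.pack("<Q", target)  # mov rax, imm64
        + b"\x48\x87\x04\x24"                      # xchg qword ptr [rsp], rax
        + b"\xc3"                                  # ret
    )
```

The first form costs 6 bytes at the hook site but needs a reachable 8-byte slot; the second is 16 bytes and self-contained.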

Also fun fact. My library supports JIT-ing, so you can create the stub that
the hook jmps to at runtime and it will JIT translation logic for the calling
convention and pack the args + ret value into a structure that can be
modified. So you can hook unknown functions at runtime.

~~~
intea
jmp [rip]

.dq 0xCCCCCCCCCCCCCCCC

works too

~~~
jchw
Mixing code and data, right? I think the downside here is if you tried to
install another hook it’d fail because the LDE wouldn’t be able to make heads
or tails of the address. I suggest embedding the value into something with an
imm64 argument (mov?) so that LDEs can handle it.

I guess also though, at ~16 bytes it’s probably deep enough into the function
that it may no longer be position independent, or hell, maybe the function
isn’t even that long to begin with.

~~~
intea
If you're writing a hooking library / a hook you _should_ be keeping track of
where they are. It's a big hook, that is true but it's also one that doesn't
spoil a register and is pretty straightforward to add. It's a tradeoff.

~~~
jchw
Well the bigger problem imo is _other_ hook engines that might also be roaming
around the process space. I think all you need is two extra bytes to make it
valid instructions, and _in theory_ then nested hooking should work fine.
Though it only exacerbates the length issue.
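
One possible reading of the "two extra bytes" idea, sketched as byte builders: wrap the inline address in a `mov rax, imm64` that is never executed, and bump the displacement by 2 so the jmp reads the immediate field. This is my interpretation of the suggestion, not something confirmed in the thread, and the helper names are hypothetical.

```python
import struct


def jmp_inline_data(target):
    """14 bytes: jmp [rip+0] followed directly by the raw 64-bit target."""
    return b"\xff\x25" + struct.pack("<i", 0) + struct.pack("<Q", target)


def jmp_inline_mov(target):
    """16 bytes: jmp [rip+2] followed by `mov rax, imm64`.

    The mov is never executed; its imm64 field holds the jump target.
    The displacement of 2 skips the 2-byte REX.W + opcode prefix (48 B8).
    """
    return (
        b"\xff\x25" + struct.pack("<i", 2)
        + b"\x48\xb8" + struct.pack("<Q", target)
    )
```

A length disassembler walking the second form should see two well-formed instructions (6 bytes, then 10 bytes) instead of a jmp followed by arbitrary data.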

~~~
stevemk14ebr
if you place the data at the end of the trampoline it avoids these issues of
mixing data and code, it's like a little custom data segment you make since
you have to allocate the trampoline anyways. This is what i do in my lib. The
disp is after the jmp the trampoline uses to jmp back to the original. The
original function only has the jmp [disp] and no data is mixed.
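
A rough sketch of that kind of layout, with the data parked at the end of the trampoline. The exact layout here is an assumption for illustration, not necessarily what PolyHook does: stolen bytes, then a jmp back to the original, then the return address and the hook destination as two 8-byte slots. It also assumes the stolen bytes are whole, position-independent instructions and does no relocation.

```python
import struct

JMP_RIP = b"\xff\x25"  # jmp qword ptr [rip+disp32]
JMP_LEN = 6


def build_hook(func_addr, tramp_addr, stolen, hook_dest):
    """Return (patch, trampoline) byte strings for the layout above."""
    if len(stolen) < JMP_LEN:
        raise ValueError("need at least 6 stolen bytes for the jmp")

    # Trampoline: run the stolen instructions, then jump back into the
    # original function just past them via the first data slot.
    return_addr = func_addr + len(stolen)
    jmp_back = JMP_RIP + struct.pack("<i", 0)  # reads the slot right after it
    trampoline = stolen + jmp_back + struct.pack("<QQ", return_addr, hook_dest)

    # The hook destination lives in the second slot, at the very end.
    slot_addr = tramp_addr + len(stolen) + JMP_LEN + 8
    disp = slot_addr - (func_addr + JMP_LEN)
    if not -2**31 <= disp < 2**31:
        raise ValueError("trampoline is not within +/-2 GiB of the function")

    # Patch for the original function: just the jmp, padded with nops so
    # no partial instruction is left behind.
    patch = JMP_RIP + struct.pack("<i", disp)
    patch += b"\x90" * (len(stolen) - len(patch))
    return patch, trampoline
```

With this arrangement the patched function contains only instructions, and the constants sit in memory that was allocated for the trampoline anyway.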

------
MrGilbert
We used hooking to "isolate" the audio output from a certain application on
Windows:

The application would ask the operating system for the default audio device.
By intercepting this request, we were able to re-route it to our own, virtual
audio device. Our program would then fetch the audio data from the virtual
device, and replay it to the "real" audio device. At the same time, the audio
gets saved to ram, and finally to disk.

The benefit of this method was that we were able to actually isolate the audio
from all other sources on the computer. So you could, in theory, mute the
playback, while still being able to let the recording run. Ultimately, we
abandoned this method, as it proved quite unreliable. But it was fun to come
up with, and finally implement.

------
jchw
Oh hey! This is a favorite topic of mine. I wrote myself a hooking library for
a project where I didn’t want to use libc (and it was Win32 by design so I
could just use the equivalent Win32 calls). The hardest part was definitely
the LDE (length disassembly engine) and in the end I found a small header-only
open source library that did it perfectly. The rest was very easy, especially
on x86.

API hooking sometimes even works in hostile environments, like on software
that tries to guard against patches and modification, simply because it can be
challenging to detect, and you can do it early on (like inside a DLL
entrypoint). So if you can do all of your work at API boundaries, you can get
away with a whole lot, even if an app is packed with a strong VM packer.

Worth noting that for many less difficult use cases on Linux, you can use
LD_PRELOAD to somewhat similar effect.

------
userbinator
On Linux (and probably other Unices), another way to hook is via the
interesting default behaviour of the dynamic linker to only use symbol names
instead of qualifying them with the library's filename from which they are to
be imported and prefer symbols in already-loaded libraries; I suspect that
more often than not, this happens by accident instead of deliberation.

~~~
jchw
I mentioned LD_PRELOAD separately in my other comment, but I believe this same
behavior is what enables LD_PRELOAD to work.

(I wonder how this interacts with glibc symbol versioning, now that I think
about it.)

------
djmips
Renderdoc uses hooking to instrument your 3D API calls to provide excellent
graphics debugging. Good example to dive into since it's also cross platform.

[https://github.com/baldurk/renderdoc](https://github.com/baldurk/renderdoc)

I used graphics API hooking in a Source Engine game where I didn't have the
source for the engine but needed to display the 2D Flash based GUI (Iggy) at
the correct time. It was fun to get it working.

Valve's Steam also does something similar since it has the ability to
superimpose its GUI over a running game.

~~~
Const-me
Conceptually yes, but technically renderdoc works differently. On Windows it
patches the IAT (Import Address Table), then for D3D it wraps complete D3D COM
interfaces. It doesn’t use the described trick of patching code with a jmp
instruction, which is not too reliable, IMO.

Similarly on Linux: it doesn’t have an IAT but it has a PLT (Procedure Linkage
Table), which is basically the same thing as the IAT.

------
_0ffh
API hooking used to be super easy and very common under DOS. All the API calls
used to work using traps, so you'd just have to 1) store the original
interrupt vector 2) change it to point at your own interrupt service routine,
3) do whatever when the trap was invoked.

Point 3 might include logging, passing the request through to the original ISR
(possibly with changed parameter values), changing the return values, anything
really. Easy & fun. =)
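
The three steps can be modelled in a few lines. This is only a toy simulation: the vector table is a plain dictionary and the "registers" are a dict, and all names are invented for illustration. On real DOS the get/set steps would go through INT 21h functions 35h and 25h.

```python
class VectorTable:
    """Toy stand-in for the real-mode interrupt vector table."""

    def __init__(self):
        self.vectors = {}

    def get_vector(self, number):
        return self.vectors[number]

    def set_vector(self, number, handler):
        self.vectors[number] = handler

    def interrupt(self, number, regs):
        """Simulate `int number` by calling whatever handler is installed."""
        return self.vectors[number](regs)


def install_logging_hook(table, number, log):
    original = table.get_vector(number)   # 1) store the original vector

    def hooked(regs):                     # 3) runs whenever the trap fires
        log.append(dict(regs))            #    log the request
        return original(regs)             #    pass it through to the old ISR

    table.set_vector(number, hooked)      # 2) point the vector at our ISR
    return original                       # keep this to uninstall later
```

Uninstalling is the reverse: put the saved handler back with `set_vector`, which only works cleanly if nobody else has hooked the same vector in the meantime.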

------
osullivj
I've used MS Detours to hook my own C++ interceptor functions into dispatch
from Excel to XLL extensions via the internal Excel4V interface. It worked
very well, showing me the XLOPER values. But that was for a 32 bit Excel. I
didn't appreciate how much trickier it is with the amd64 instruction set.

------
Ididntdothis
If I remember correctly the old Mac operating systems used this a lot to ship
patches or to change system behavior.

