I presume that is necessary for this in addition to belonging to the same UID?
> As far as I know process_vm_readv isn't even detectable if the agent process is more privileged than the examinee process—so you're free to manipulate your private copy of the application in the comfort of your own address space.
Interesting. This would be really useful in debugging. Many issues only reproduce in specific configurations. Having access to the memory dump of the live process "streamed" to the debugger would be great!
It's the same check as ptrace itself, so the intuition is "can I strace or gdb this process."
(This also means that you neither want nor need CAP_SYS_PTRACE to get gdb/strace working in Docker: that capability lets you ptrace anything and also, coincidentally, turns off the syscall filter. Just turn the filter off; that works without privileging the processes in the container.)
You can also disable ptrace() completely.
(Grep for "ptrace_scope" in the ptrace(2) man page for details.)
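For anyone who wants to check this from a script, here is a minimal Python sketch that reads the Yama setting and maps it to the four modes described in ptrace(2). The helper names are my own invention, and the path assumes a kernel with the Yama LSM enabled; on systems without it the reader simply returns None.

```python
# Sketch only: helper names are hypothetical; the path assumes Yama is enabled.
YAMA_PATH = "/proc/sys/kernel/yama/ptrace_scope"

# Meanings paraphrased from the ptrace(2) man page.
SCOPE_MEANINGS = {
    0: "classic: ordinary ptrace permission checks (same UID, dumpable, ...)",
    1: "restricted: a process may only attach to its descendants or declared tracees",
    2: "admin-only attach: only processes with CAP_SYS_PTRACE may use ptrace",
    3: "no attach: nobody may attach, and the value cannot be changed once written",
}


def describe_ptrace_scope(value):
    """Return a human-readable description of a ptrace_scope value."""
    return SCOPE_MEANINGS.get(value, "unknown value")


def read_ptrace_scope(path=YAMA_PATH):
    """Return the current ptrace_scope as an int, or None if unavailable."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    scope = read_ptrace_scope()
    if scope is None:
        print("ptrace_scope not available on this system")
    else:
        print(scope, describe_ptrace_scope(scope))
```

If this reports 1 (a common default), "can I gdb this process" is only true for your own children unless the target opts in.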
This is also possible with standard debuggers, such as GDB: It can attach to a running process and not only examine the memory, but also debug (stop, pause, skip, ...) the stack trace and control flow. Usage is as simple as gdb -p $(pidof my_running_program)
Taking a snapshot of other processes is also a basic use case of this family of functions.
> that Win32 has already this kind of feature since ages: Process access routines such as OpenProcess()  coupled with ReadProcessMemory() 
And Unix has had ptrace(2) for a very long time too, which will accomplish the same thing. Also you can read another process's memory through /proc. So there are multiple paths.
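To illustrate the /proc path, here is a small Python sketch that reads bytes out of /proc/<pid>/mem with pread. The function names are made up for illustration, the same ptrace-style access check applies to other processes, and to keep it self-contained the demo only reads from its own process.

```python
import ctypes
import os


def read_process_memory(pid, address, length):
    """Read up to `length` bytes at virtual `address` from process `pid`."""
    fd = os.open(f"/proc/{pid}/mem", os.O_RDONLY)
    try:
        # The file offset is interpreted as a virtual address in the target.
        return os.pread(fd, length, address)
    finally:
        os.close(fd)


def demo_read_self():
    """Read a known buffer back out of our own address space."""
    buf = ctypes.create_string_buffer(b"hello from the examinee")
    return read_process_memory(os.getpid(), ctypes.addressof(buf), len(buf.value))


if __name__ == "__main__":
    print(demo_read_self())
```

For a real target you would first look up valid address ranges in /proc/<pid>/maps, since reading an unmapped address fails.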
Also keep in mind, the most "legit" use case for this stuff is for writing a debugger. If you have used a debugger you are already relying on this functionality being there.
Edit: I am not aware of a win32 equivalent of this thing that lets you easily handle another thread's page faults in user mode though. That seems a little wacky. You can use debugger APIs to handle "STATUS_IN_PAGE_ERROR" and "STATUS_ACCESS_VIOLATION", which might get you there.
A flaw in the Linux implementation, though, prevents one from running ptrace on a process that is using ptrace itself.
As more programs use ptrace, this flaw is becoming quite annoying.
What unfork does is more complicated than a mere read, though. I'm still not entirely clear on its use case, but it does all sorts of tampering that the code comments describe as "cursed." It also seems to be specifically targeting applications which have anti-debug measures.
The closest equivalent I can think of in Windows would be to mark pages as no-access and use vectored exception handling to trap access faults. During a fault, the exception handler would fill in the page (e.g. via ReadProcessMemory) and flip the page protection to read or read/write.
Since you wouldn't want to flip the page protection until after the memory had been updated, you would probably have to use a pagefile-backed section to update the memory at a separate virtual address with independent page protections. And unlike the userfaultfd approach, this mechanism would not help for cases where the mirrored memory was being passed to a syscall.
I think Linux could do this too, via a signal handler, but AFAIK the Linux memory manager does not efficiently support per-page access protection (unlike Windows). In the worst case, each page would get its own vma structure in the kernel, which would be quite expensive. So absent userfaultfd, the Windows memory manager probably has the edge.
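The vma cost is easy to observe. Below is a rough Python sketch, assuming Linux: it maps an anonymous region, marks every other page PROT_NONE through libc's mprotect, and counts the lines in /proc/self/maps before and after. The function names are hypothetical and PROT_NONE = 0 is hard-coded as an assumption.

```python
import ctypes
import mmap

PROT_NONE = 0  # assumption: the Linux value of PROT_NONE

_libc = ctypes.CDLL(None, use_errno=True)
_libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
_libc.mprotect.restype = ctypes.c_int


def count_vmas():
    """Each line of /proc/self/maps corresponds to one mapping (vma)."""
    with open("/proc/self/maps") as maps:
        return sum(1 for _ in maps)


def vma_counts_after_striping(pages=64):
    """Return (before, after) vma counts when alternate pages get PROT_NONE."""
    page = mmap.PAGESIZE
    region = mmap.mmap(-1, pages * page,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    view = ctypes.c_char.from_buffer(region)  # only used to learn the address
    try:
        base = ctypes.addressof(view)
        before = count_vmas()
        for index in range(0, pages, 2):
            if _libc.mprotect(base + index * page, page, PROT_NONE) != 0:
                raise OSError(ctypes.get_errno(), "mprotect failed")
        after = count_vmas()
    finally:
        del view          # release the buffer export so the mmap can close
        region.close()    # unmaps the whole region, including PROT_NONE pages
    return before, after


if __name__ == "__main__":
    print(vma_counts_after_striping())
```

On the systems I would expect this to run on, the count grows by roughly one entry per page, which is the worst case described above.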
Glibc used to have unexec(), which is fairly old, but it was removed because nobody used it (except Emacs, and there were better solutions to the problem it was solving).
It's as clean as any official Win32 API which uses their privilege system to restrict/allow accesses to each and any bit of information on the process state and/or memory.
> Can you simply exec the result?
This is possible using CreateRemoteThread(), which creates a thread inside another process's execution context.
> Glibc used to have unexec()
My understanding is that unexec() was more about making a snapshot of the whole process state to an executable on disk.
gdb -batch -n -ex 'call chdir("whatever")' -p $$
It's a simple example of ptrace() though.
It really looks like some overly clever college student's weird trick that somehow managed to survive for decades in an established product.
Also, that machine was being time shared with a dozen or more users.
Launching emacs or TeX on this machine might take tens of seconds without access to unexec(), but only 3 seconds for the freeze dried version.
unexec() was easier at the time. There were no shared libraries, no address space layout randomization. One memory region grew up from the bottom, one down from the top. There was no mmap() jamming mysterious stuff in the middle. Just copy the bottom, copy the top, do magic to adjust the stack for your unexec() call, and write the thing out as an executable.
(Yeah, I excised unexec() from BibTeX back in the ’80s to port it to a 68k Mac for a coworker, then later implemented unexec() for a Motorola 88k based multilevel secure SysV system in the early ’90s because launching emacs was driving me insane. I prefer our shiny new future of stupidly fast computers.)
Description: put thread to sleep as long as there is activity on any fd, wake up only when all fds are inactive.
Useful for: Scheduling work to be performed only when server is idle.
Description: Select a random file, load it into the buffer cache, and remove it from file system.
Useful for: Freeing up some disk space in a pinch.
Description: Resurrects the previous child process.
Useful for: Implementing the !! operator in bash.
Description: Invoke signal handler whenever a given fd activates.
Useful for: User space interrupts.
Description: Create a file which refers to an open fd.
Useful for: Implementing /proc/self/fd functionality.
Yeah that's nice.
> Useful for: User space interrupts.
There is libfam. At least on my system, it doesn't have a manual page.
> Description: Create a file which refers to an open fd.
That does sound useful, and I don't know of any library that does it.
In my silly world, unopen() would just take any fd (socket, file, pipe, etc.) and create a file system binding which anyone could open. Kind of like how /proc works on Linux today.
(Whereas moving to recycle-bin is a manual process you need to remember to do).
alias rm trash
You need linkat() with the AT_SYMLINK_FOLLOW flag enabled.
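Here is a sketch of that linkat() call from Python via ctypes, assuming Linux with /proc mounted. The constants AT_FDCWD = -100 and AT_SYMLINK_FOLLOW = 0x400 are hard-coded Linux values, and link_fd_to_path is a hypothetical name. As far as I know this only works when the fd refers to a file on the same filesystem as the new path, so it does not cover the socket and pipe cases the imaginary unopen() would.

```python
import ctypes
import os
import tempfile

AT_FDCWD = -100             # assumption: Linux value
AT_SYMLINK_FOLLOW = 0x400   # assumption: Linux value

_libc = ctypes.CDLL(None, use_errno=True)
_libc.linkat.argtypes = [ctypes.c_int, ctypes.c_char_p,
                         ctypes.c_int, ctypes.c_char_p, ctypes.c_int]
_libc.linkat.restype = ctypes.c_int


def link_fd_to_path(fd, new_path):
    """Give the file behind an open fd a new name via /proc/self/fd."""
    source = f"/proc/self/fd/{fd}".encode()
    result = _libc.linkat(AT_FDCWD, source, AT_FDCWD,
                          os.fsencode(new_path), AT_SYMLINK_FOLLOW)
    if result != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), new_path)


def demo():
    """Create a file, then re-link it under a second name using only its fd."""
    with tempfile.TemporaryDirectory() as workdir:
        original = os.path.join(workdir, "original.txt")
        with open(original, "w") as handle:
            handle.write("still here")
        fd = os.open(original, os.O_RDONLY)
        try:
            alias = os.path.join(workdir, "alias.txt")
            link_fd_to_path(fd, alias)
            with open(alias) as handle:
                return handle.read()
        finally:
            os.close(fd)


if __name__ == "__main__":
    print(demo())
```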
Famous last words.
> A: It's true that meshing address spaces is much harder than copying them. ... [truncated] ... 64-bit systems with ASLR are far more forgiving. Nevertheless, I think that with some effort two allocators or even dynamic linkers could survive together.
That is a really cool side effect of ASLR!
Freezing the process can affect its correct operation. (Sometimes when I need a memory dump of a production Java app, I can't take one because I cannot afford to freeze it.)
Without the freeze, the memory copy we get can be inconsistent.
One could then take a core dump, java heap dump, or similar, of the paused copy process.
I'm curious, why does the tool try to copy the original process memory into the memory of the tool itself, risking a collision? Is it impossible to create a third process - an exact copy of the original process?
> all while leaving no ptrace and sending no signal
If this is a design goal, I'm afraid it is indeed impossible to take a snapshot of the original process. As far as I know (I researched the status quo 2 years ago when I needed copy-on-write for VM cloning/forking), the only way to make a snapshot of a process' address space is to invoke the clone (fork) system call. If you need to take a snapshot of another process, then you need ptrace.
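To make the fork-as-snapshot idea concrete, here is a minimal Python sketch. It only snapshots the calling process (not another one, which is the hard part discussed here), and fork_snapshot is a hypothetical name. The child keeps the copy-on-write view from the moment of the fork while the parent carries on mutating its state.

```python
import json
import os


def fork_snapshot(state):
    """Fork, mutate `state` in the parent, return (parent_view, child_view)."""
    go_r, go_w = os.pipe()      # parent tells the child when to report
    out_r, out_w = os.pipe()    # child sends its view of `state` back
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            os.close(go_w)
            os.close(out_r)
            os.read(go_r, 1)    # wait until the parent has changed its copy
            os.write(out_w, json.dumps(state).encode())
            status = 0
        finally:
            os._exit(status)    # never return into the caller from the child
    os.close(go_r)
    os.close(out_w)
    state["counter"] += 1       # only the parent's copy changes
    os.write(go_w, b"x")
    os.close(go_w)
    with os.fdopen(out_r, "rb") as reader:
        child_view = json.loads(reader.read())
    os.waitpid(pid, 0)
    return state, child_view


if __name__ == "__main__":
    print(fork_snapshot({"counter": 1}))
```

The child here stands in for the "paused copy" that one could dump at leisure while the original keeps serving traffic.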
But you're absolutely right that the unfork functionality itself can be implemented more robustly by doing this ptrace/fork trick.
Unfortunately I’m not sure that’s a good assumption, due to the stack and heap needing to exist even for statically-linked binaries.
If I understand this right, the process being unforked into you won't notice a thing and will happily chime on.
But you are the debugger…
> If I understand this right,the process being unforked into you won't notice a thing and will happily chime on.
Right, whereas when running an actual debugger you need to deal with signals and making sure you don't touch memory.
Ah thanks. My autocomplete didn't like the word "debuggee". Edited!
I like it.
But that's probably the name of a startup and would confuse people.
English is also not my native language.