
Show HN: A “living” Linux process with no memory - izabera
https://github.com/izabera/zeromaps
======
geofft
I've now seen a similar case multiple times in the wild - if a process has a
thread in uninterruptible sleep (e.g., blocked on a bad disk or a stuck
network filesystem) and you kill it, the process dies, but the kernel waits
forever for that thread before informing the parent process. The parent
doesn't get SIGCHLD, and wait() doesn't return. (So, for example, your
favorite init/supervisor won't restart the process or even realize the process
has died and raise an alert.) If the thread wasn't the first thread, then the
process looks like it's gone and stops having most information in /proc/$pid,
but it's still got the thread in /proc/$pid/task/.

I've taken to calling such a process a "lich," because it's not quite a zombie
- the parent can reap a zombie by calling wait(), but the lich has used magic
to avoid true death.

~~~
btown
Can/do supervisors poll for this situation? Seems like that should be
possible.
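
A minimal sketch of what such a poll might look like, assuming the "lich" shows up as a zombie thread-group leader (state `Z`) with at least one other thread still in state `D` under /proc/$pid/task/. The function names and the `proc_root` parameter are hypothetical, and I haven't checked that every kernel version reports the states exactly this way:

```python
import os


def parse_state(stat_line):
    """Return the one-letter state from a /proc/<pid>/task/<tid>/stat line."""
    # The comm field is wrapped in parentheses and may itself contain ')'
    # or spaces, so split on the LAST ')' before reading the state field.
    return stat_line.rsplit(")", 1)[1].split()[0]


def find_lich_threads(pid, proc_root="/proc"):
    """Return TIDs stuck in D state if the thread-group leader is a zombie.

    Assumption (not from the thread): the dead-but-unreaped leader is
    reported as 'Z' while the blocked thread is reported as 'D'.
    """
    task_dir = os.path.join(proc_root, str(pid), "task")
    states = {}
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as f:
                states[int(tid)] = parse_state(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread went away while we were scanning
    if states.get(pid) != "Z":
        return []
    return sorted(tid for tid, state in states.items() if state == "D")


if __name__ == "__main__":
    import sys

    stuck = find_lich_threads(int(sys.argv[1]))
    print("stuck threads:", stuck if stuck else "none")
```

A supervisor could call something like this on a timer for each child and raise an alert when the list is non-empty, since wait() will not report anything in that state.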

~~~
xenadu02
Also curious why the kernel can't abandon the thread since the process is dead
anyway; set a bit in kernel space so on the walk back from the syscall it
bails out instead of trying to return to userspace.

That doesn't solve the kernel thread being stuck forever problem. I'm not sure
what the fix is there... is it bad drivers with no timeout mechanism? I don't
know how they'd do that unless the kernel IO is all async anyway. Just tearing
down the kernel thread seems likely to leave the locks abandoned in bad
states.

~~~
cryptonector
The kernel can't abandon that thread entirely, but it could "detach" it from
the process it was part of and arrange for it to clean itself up and disappear
if ever it wakes. And so the parent process could get told of the child's
death.

------
croo
> Why? I don't know. I thought it was funny

This is the profit.

While job searching I went on an interview where I got asked why I did a side
gig listed on my résumé. "For friend or for money?" It was neither and I said
"I don't know, because programming is fun?"

I got hired.

~~~
jldugger
> I got hired.

So, for money.

~~~
jshevek
Not all of the outcomes of our actions give evidence for the motivations for
our actions.

~~~
osrec
Especially given how rarely our desired outcomes match our original
motivations... We try to achieve one thing, but often end up achieving
something else!

------
peter_d_sherman
[https://lwn.net/Articles/288056/](https://lwn.net/Articles/288056/)

Excerpt:

"There are advantages and disadvantages to each type of sleep. Interruptible
sleeps enable faster response to signals, but they make the programming
harder. Kernel code which uses interruptible sleeps must always check to see
whether it woke up as a result of a signal, and, if so, clean up whatever it
was doing and return -EINTR back to user space.

The user-space side, too, must realize that a system call was interrupted and
respond accordingly; _not all user-space programmers are known for their
diligence in this regard._

Making a sleep uninterruptible eliminates these problems, but at the cost of
being, well, uninterruptible. If the expected wakeup event does not
materialize, _the process will wait forever and there is usually nothing that
anybody can do about it short of rebooting the system._

This is the source of the dreaded, unkillable process which is shown to be in
the "D" state by ps.

Given the highly obnoxious nature of unkillable processes, one would think
that interruptible sleeps should be used whenever possible. The problem with
that idea is that, in many cases, the introduction of interruptible sleeps is
likely to lead to application bugs.

As recently noted by _Alan Cox:_

 _Unix tradition (and thus almost all applications) believe file store writes
to be non signal interruptible. It would not be safe or practical to change
that guarantee._ "

That's the _whyness_ to all of this...
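
To make the user-space half of that concrete, here is a rough sketch of the retry loop an interruptible call asks for. It is only an illustration: `retry_on_eintr` and `FlakyRead` are made-up names, and since Python 3.5 the standard library already retries most system calls on EINTR by itself, so the pattern matters mainly in C or when calling into lower-level code.

```python
def retry_on_eintr(call, *args, **kwargs):
    """Call `call` again whenever it fails with EINTR (InterruptedError)."""
    while True:
        try:
            return call(*args, **kwargs)
        except InterruptedError:
            # A signal arrived mid-call; nothing was lost, so just retry.
            continue


class FlakyRead:
    """Hypothetical stand-in for a syscall that gets interrupted a few times."""

    def __init__(self, failures):
        self.remaining = failures

    def __call__(self):
        if self.remaining > 0:
            self.remaining -= 1
            raise InterruptedError("interrupted by signal")
        return b"data"


if __name__ == "__main__":
    read = FlakyRead(failures=2)
    print(retry_on_eintr(read))
```

Every interruptible call site needs a loop like this (or a deliberate decision not to retry), which is the diligence the excerpt says is often missing.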

~~~
Reelin
[https://eklitzke.org/uninterruptible-sleep](https://eklitzke.org/uninterruptible-sleep)

> If the process is in uninterruptible sleep then the process can’t be
> interrupted, which will cause the strace process itself to hang forever.
> Remarkably, it appears that the ptrace(2) system call is itself
> uninterruptible, which means that if this happens you may not be able to
> kill the strace process!

Everything about this is broken and makes no sense. The Night Watch is as
relevant as ever.
([https://www.usenix.org/system/files/1311_05-08_mickens.pdf](https://www.usenix.org/system/files/1311_05-08_mickens.pdf))

------
kees99
Did anybody try to run it? Python part of this PoC depends on python2 module
"fuse", which in turn depends on "gunpowder", "a library to facilitate machine
learning on large, multi-dimensional images". What is even going on here?

    
    
      $ pip2 install fuse
      Collecting fuse
        Downloading https://files.pythonhosted.org/packages/c3/f6/82777531d0dd0fa1d1b509258873f4b48e1ec702dcf0258214fafb474895/fuse-0.1.3.tar.gz
      ERROR: Packages installed from PyPI cannot depend on packages which are not also hosted on PyPI.
      fuse depends on gunpowder@ git+https://github.com/funkey/gunpowder@721718b6569b47a2f5d5d6633c76c85f779e25c7

~~~
kristjansson
I think you want
[https://pypi.org/project/fuse-python/](https://pypi.org/project/fuse-python/)
instead. That Python packages can and often do have different names than their
corresponding distributables is ... not great.
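
If you want to see how often the two names differ on your own machine, something like the following should work on Python 3.8+. It is a sketch that assumes the installed distributions ship a `top_level.txt` file (not all do, and those are skipped); `import_names_by_distribution` is a name I made up.

```python
from importlib import metadata


def import_names_by_distribution():
    """Map each installed distribution name to its top-level import names."""
    result = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        top_level = dist.read_text("top_level.txt")
        if not name or not top_level:
            continue  # no usable metadata for this distribution
        result[name] = top_level.split()
    return result


if __name__ == "__main__":
    for dist_name, modules in sorted(import_names_by_distribution().items()):
        # Only show the cases where `pip install X` does not give `import X`.
        if [dist_name.lower()] != [m.lower() for m in modules]:
            print(f"pip install {dist_name}  ->  import {', '.join(modules)}")
```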

~~~
mikepurvis
I agree that it's terrible, though it can be nice in cases where a package
ends up forked— for example allowing Pillow to be a drop-in replacement for
PIL.

------
plumsempy
...and thus pure consciousness was born.

~~~
AgentME
My new meditation strategy is to visualize myself as a process with no memory
waiting in uninterruptible sleep.

------
kdom13
As someone who doesn't understand much of what's going on here, what resources
would you suggest I study to improve?

~~~
kccqzy
A more pragmatic book is The Linux Programming Interface, by Michael Kerrisk.
It is Linux-specific, but that's probably fine.

Uninterruptible process sleep is covered in section 22.3. Threads, in chapters
29–33. Signals, chapters 20–22. The proc file system in section 12.1. Memory
mappings, chapters 49–50.

~~~
aaron_m04
The version I have covers kernel 2.6. Is that new enough to still be useful?

~~~
usr1106
Useful yes, complete no.

Uninterruptible sleep, signals, memory mappings etc have not changed
fundamentally.

Namespaces and cgroups are newer concepts in Linux, so they won't be covered. IIRC
user namespaces were completed in 3.8. The new cgroup hierarchy became usable much
later than that.

The new concepts are needed for containers. Not at all relevant for the
original article here.

~~~
usr1106
Actually he is still maintaining a list of changes since the book has come
out.
[http://man7.org/tlpi/api_changes/index.html](http://man7.org/tlpi/api_changes/index.html)

------
Arch-TK
Really neat!

Not sure why people care what it's written in. I'm pretty sure people wouldn't
complain if this was a blog post written in english rather than a program
written in C and python2.

------
emmelaich
Can it be done without fuse?

Fuse is restricted; probably for reasons like this.

~~~
sargun
Ha. It isn't. User namespaces allow you to do this attack; I have a
reproduction here:
[https://github.com/sargun/fuse-example](https://github.com/sargun/fuse-example)

Basically, what’s happening is that the FUSE daemon, which is the one handling
the FUSE requests, has /dev/fuse open. There is a thread in that FUSE daemon.
Let’s call the FUSE daemon P10, and the thread P11.

P10 wires up a FUSE filesystem on /tmp; it opens /dev/fuse as FD5.

P11 starts an uninterruptible (blocking) operation on /tmp/foo, called OP1;
/tmp/foo is FD6.

P10 reads the operation (OP1) from /dev/fuse, so now OP1 is in userspace.

---- The PID 1 of the namespace is killed ----

P10 just terminated, and cannot make progress. It will never respond to OP1.
FD5 and FD6 remain open, because P11 is in uninterruptible sleep. P11 is in an
uninterruptible disk sleep waiting for OP1 to respond, and the fuse connection
never aborts.

FUSE abortion doesn’t kick in because the FD of `/dev/fuse` that P10
originally opened as the FUSE daemon under FD5 will not be closed until all
threads as part of the process are terminated. The mount namespace won’t be
torn down until P11 is torn down. P11 will never be torn down because it’s
waiting for someone to do something.

------
seqizz
I am not sure if uninterruptible sleep is considered "living", but fun
experiment.

------
anonu
Can't think of any way to profit from this. Just reboot your box.

~~~
taftster
That's the '???' part of it, I would assume.

------
fierro
How does the FUSE filesystem interact with the running C program? I don't see
the C program attempting to read or write to that x directory.

~~~
saagarjha
[https://github.com/izabera/zeromaps/blob/85ffcf82f7eb8365ebf...](https://github.com/izabera/zeromaps/blob/85ffcf82f7eb8365ebffea755f475d97d6d1d889/zeromaps.c#L9)

------
hsnewman
Can this be implemented in Go?

~~~
jerf
I think so, but probably not reliably and/or for the long term.

In terms of the actual unmap call, the payload is in the munmap call, which is
a syscall, and if you do have to set up the registers for that call, Go does
have its assembler support which should be able to do that for you. You should
be able to write Go source code that would perform that call, although you're
going to be poking into some corners of Go most programmers never have to get
into. You can also get to fuse, which in my experience is great at producing
uninterruptible sleeps even when you're _not_ trying to do it on purpose....

However, in terms of reliability and the long term, you have the problem that
you're not the only thread in the program the way a C program can count on
that, and if any of those other threads wake up and their memory is missing,
it'll probably result in the process dying. You can do a couple things to try
to avoid that a little, like pinning yourself to an OS thread and running the
program in one CPU mode, but especially on the latest version of Go where
they've implemented true pre-emption, you can't keep the runtime from running
by just hogging the execution thread anymore.

So I suspect that as long as you win the race to unmap everything before the
runtime wakes up (which may also require you to call munmap through the
assembly code directly, rather than using the syscall libraries as I'm pretty
sure those notify the scheduler of what you're doing), you may be able to
_briefly_ be a process with no RAM mapped, but in human terms it won't be long
before the runtime wakes up to do something, anything, and crash the process
as a result. There's no way to avoid that in Go. You won't be able to have a
process just sitting there indefinitely with no mapped RAM that you can admire
and treasure and hand off to your children as part of their inheritance.

~~~
saagarjha
> you have the problem that you're not the only thread in the program the way
> a C program can count on that

Note that if you link against libc I think it would be legal for it to create
a thread behind your back, so even in this case you're not quite safe.

------
fierro
Why do we need to JIT the munmap code? Can we not just call munmap(2)?

~~~
spc476
The call to munmap() requires both a stack (which will be unmapped at some
point) and the code from libc (which will be unmapped at some time).

The program gets a list of all the allocated pages first, then creates one
more for the JIT code (and a copy of the list) which is unmapped as the last
thing it does. There really is no other way of doing it.
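
The "get a list of all the allocated pages" step can be sketched by parsing /proc/self/maps. This is an illustration in Python rather than the repo's actual C code, `parse_maps` is a hypothetical helper, and it only lists the ranges; it does not unmap anything.

```python
import os


def parse_maps(text):
    """Parse /proc/<pid>/maps content into (start, length, name) tuples."""
    ranges = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Fields: address-range perms offset dev inode [pathname]
        fields = line.split(None, 5)
        start, end = (int(part, 16) for part in fields[0].split("-"))
        name = fields[5].strip() if len(fields) > 5 else ""
        ranges.append((start, end - start, name))
    return ranges


if __name__ == "__main__":
    maps_path = "/proc/self/maps"
    if os.path.exists(maps_path):
        with open(maps_path) as f:
            for start, length, name in parse_maps(f.read()):
                # Each of these is a range that would need its own munmap().
                print(f"{start:#x} +{length:#x} {name}")
```

Each tuple corresponds to one munmap(start, length) call, which is why the list has to be copied somewhere that survives until the very last call.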

