Show HN: A “living” Linux process with no memory (github.com/izabera)
312 points by izabera 11 months ago | hide | past | favorite | 110 comments

I've now seen a similar case multiple times in the wild - if a process has a thread in uninterruptible sleep (e.g., blocked on a bad disk or a stuck network filesystem) and you kill it, the process dies, but the kernel waits forever for that thread before informing the parent process. The parent doesn't get SIGCHLD, and wait() doesn't return. (So, for example, your favorite init/supervisor won't restart the process or even realize the process has died and raise an alert.) If the thread wasn't the first thread, then the process looks like it's gone and stops having most information in /proc/$pid, but it's still got the thread in /proc/$pid/task/.

I've taken to calling such a process a "lich," because it's not quite a zombie - the parent can reap a zombie by calling wait(), but the lich has used magic to avoid true death.
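For anyone who wants to look for a lich from user space, here is a rough sketch in Python 3. It assumes the usual Linux procfs layout (per-thread `stat` files under `/proc/<pid>/task/<tid>/`, with the state letter right after the parenthesized command name). The helper names and the "leader is Z, another thread is D" heuristic are my own guesses at what this state looks like, not anything the kernel documents as a lich.

```python
import os

def parse_state(stat_line):
    """Return the one-letter state from a /proc/.../stat line.

    The command name sits in parentheses and may itself contain spaces
    or ')', so split on the LAST ')' instead of on whitespace.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]

def thread_states(pid):
    """Map each thread ID of `pid` to its state letter ('D' = uninterruptible sleep)."""
    states = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                states[int(tid)] = parse_state(f.read())
        except OSError:
            continue  # thread went away between listdir() and open()
    return states

def looks_like_lich(pid):
    """Hypothetical heuristic: the first thread is a zombie, another thread is stuck in D."""
    states = thread_states(pid)
    others = [state for tid, state in states.items() if tid != pid]
    return states.get(pid) == "Z" and "D" in others
```

Something like `looks_like_lich(1234)` would be the call; a thread in D is often only there briefly, so a single positive sample does not prove anything is stuck.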

The solution to this situation, as I understand it (on Linux), is TASK_KILLABLE, which effectively supersedes uninterruptible sleep as a concept: https://lwn.net/Articles/288056/

But TASK_KILLABLE is not used in most places it should be. Patching your (least) favorite driver to use TASK_KILLABLE could be a good entry point to contributing to the kernel.

This older, tangential HN discussion and the comments on LWN have a bit more info: https://news.ycombinator.com/item?id=18056946

For anyone who was wondering, I found a post explaining why threads can be in "uninterruptible sleep" in the first place:


On this article, one of the tricks you can do to avoid the hung NFS client reboots is to quickly alias the NFS server IP onto the local device (ip addr add ...). This basically causes the networking layer to act like the NFS server went away but also return a RST (reset) network packet to that effect, allowing the failure to propagate upwards to the rest of the stack and unhang.

You could also mount NFS with soft timeouts, which won't wait forever and instead return an error to userspace.

Which is a very good practice if you're in control of the software receiving the error. But so much freaking software doesn't handle this correctly that it's maddening, and so making everything that touches it sleep is the only safe thing to do.

For some network filesystems, I want to mount them "Kill every process with a filehandle open to this on error". Is there a way to do that?

NFS has been converted to the TASK_KILLABLE mentioned elsewhere in the comments, so you could scan for threads stuck in this state, scan proc for open file descriptors pointing to the network filesystems and then kill those processes.

That of course doesn't work if you're dealing with a filesystem that still uses TASK_UNINTERRUPTIBLE
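The "scan proc for open file descriptors" half could look roughly like this in Python 3. It is a sketch, assuming Linux procfs, where each `/proc/<pid>/fd/<n>` is a symlink to the open file's path. The function names and the `proc_root` parameter are made up for illustration, and it only finds PIDs; it does not check thread state or kill anything.

```python
import os

def is_under(path, mount_point):
    """True if `path` is the mount point itself or lives somewhere below it."""
    root = mount_point.rstrip("/")
    return path == root or path.startswith(root + "/")

def pids_with_open_files_under(mount_point, proc_root="/proc"):
    """Return the set of PIDs holding at least one fd that points under `mount_point`."""
    pids = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd was closed while we were looking
            if is_under(target, mount_point):
                pids.add(int(entry))
                break
    return pids
```

From there, something like `os.kill(pid, signal.SIGKILL)` for each PID would be the "kill those processes" step. You need root to see other users' fd directories, and the whole thing is racy by nature.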

Unplugging the ethernet cable usually also works.

This isn't a UNIX thing, it happens on Windows too. See non-alertable kernel-mode waits in [1], [2].

[1] https://docs.microsoft.com/en-us/windows-hardware/drivers/dd...

[2] https://docs.microsoft.com/en-us/windows-hardware/drivers/ke...

Yeah if you've ever used NFS you're well acquainted with this scenario lol

I saw my first lich on SPARCstation 10 (or 20?).

It was a java program that I killed -9. It died but was never reaped.

It prevented a software shutdown so I had to use the ol' STOP-A

Can/do supervisors poll for this situation? Seems like that should be possible.

This becomes a tarpit. Whatever metric you think to measure for this may itself become a source of false positives in the field. Whatever things you choose to try to avoid that becomes another source of falseness in the system. This tends to cascade as people apply patch upon patch until you have a system that basically randomly fails and nobody can tell you why. This is not a theoretical report; in theory this approach can work, if you get it perfect enough. This is a practical report, both of systems I've built myself and of many other such cases I've witnessed, including some things around process management and supervision.

In the end, you really have little choice but to trust what the OS is telling you, and if the OS is lying for whatever reason, to fix the OS. As hard as that last bit may be as a solution, it still turns out to be the easiest, most-effective fix. And it may not be a perfect fix, either... but it'll still be the best.

Also curious why the kernel can't abandon the thread since the process is dead anyway; set a bit in kernel space so on the walk back from the syscall it bails out instead of trying to return to userspace.

That doesn't solve the kernel thread being stuck forever problem. I'm not sure what the fix is there... is it bad drivers with no timeout mechanism? I don't know how they'd do that unless the kernel IO is all async anyway. Just tearing down the kernel thread seems likely to leave the locks abandoned in bad states.

The kernel can't abandon that thread entirely, but it could "detach" it from the process it was part of and arrange for it to clean itself up and disappear if ever it wakes. And so the parent process could get told of the child's death.

Sure supervisors can be made to poll. But a supervisor won’t be able to do anything to recover from this.

>> Can/do supervisors poll for this situation?

Sounds like a hack.

Typically supervisors are monitoring readiness/liveness of the managed process for other reasons; you don't have the number of replicas you've promised if one of the replicas is un-live (perhaps hitting this bug, perhaps internally deadlocked, perhaps internally waiting forever on something the OS doesn't care about, like a certain network request), and you shouldn't be routing network traffic to a service that isn't ready for it.

So while I agree with the other posts that a supervisor can't recover from this state, it can be aware that it's happened. And it's quite common for supervisors to check for readiness and liveness beyond "well the OS says it's running, sounds good to me."

> So, for example, your favorite init/supervisor won't restart the process or even realize the process has died and raise an alert.

That's why a process that wants to stay running has to provide some form of heartbeat API.
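A heartbeat check on the supervisor side can be very small. This is only a sketch in Python 3 with invented names (`HeartbeatMonitor`, `beat`, `is_alive`); how the beat actually travels from the worker to the supervisor (socket, pipe, file timestamp) is left out.

```python
import time

class HeartbeatMonitor:
    """Tracks the last heartbeat and reports whether it is still recent enough."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock          # injectable so the logic can be tested
        self._last_beat = clock()    # treat creation time as the first beat

    def beat(self):
        """Record that the supervised process just checked in."""
        self._last_beat = self._clock()

    def is_alive(self):
        """True while the last beat is no older than the timeout."""
        return (self._clock() - self._last_beat) <= self._timeout_s
```

A process whose threads are all dead or stuck in the kernel stops calling `beat()`, so `is_alive()` turns false even though wait() never returned. The supervisor still can't reap it, but it can raise the alert.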

I think lich fits well :)

what happens to the other threads? Are they killed?

Can a malicious process use this to avoid being killed?

All the other threads are killed.

A malicious process can't do useful work - as soon as the kernel gets unstuck, the signal will get delivered to the thread and kill it.

> Why? I don't know. I thought it was funny

This is the profit.

While job searching I went on an interview where I got asked why I did a side gig listed on my résumé. "For friend or for money?" It was neither and I said "I don't know, because programming is fun?"

I got hired.

People who do stuff like this remind me of the Beethoven quote: "Don't only practice your art, but force your way into its secrets; art deserves that, for it and knowledge can raise man to the Divine."

Why are you learning how to create linux processes with no memory? Why are you hacking overloading into Python? Why are you writing programs for fun? To dig unnecessarily deep and force your way into programming's secrets.

Well said, and good quote.

> I got hired.

So, for money.

Not all of the outcomes of our actions give evidence for the motivations for our actions.

Especially given how rarely our desired outcomes match our original motivations... We try to achieve one thing, but often end up achieving something else!

I've had projects that - eventually and incidentally - brought me money, but were not started or done for money.

The thing about the "money" argument is that one could just as strongly claim that you went to 1st grade "for money".

except that there is a certain lack of autonomy both legally, mentally, and physically that prevents first grade aged children from refusing to go to school in many nations.

In other words: you can't derive a motivation from an action an individual does if it's something that they're forced to do socially or legally.

And at the end, we end up with the capitalist's dream: "I was born... for money"

I was born into debt. Then, I was forced to go to schools that taught obedience to authority, getting a job, and paying taxes. Then, I saw ads that told me I needed to buy products and run up debt. Then, I eventually didn't care about money except where absolutely necessary.

Everything in between made it seem like making money my No. 1 focus was necessary, though. A lifetime of being manipulated and pressured. It all gets better when you can finally pull back the curtain seeing what's really there. :)



"There are advantages and disadvantages to each type of sleep. Interruptible sleeps enable faster response to signals, but they make the programming harder. Kernel code which uses interruptible sleeps must always check to see whether it woke up as a result of a signal, and, if so, clean up whatever it was doing and return -EINTR back to user space.

The user-space side, too, must realize that a system call was interrupted and respond accordingly; not all user-space programmers are known for their diligence in this regard.

Making a sleep uninterruptible eliminates these problems, but at the cost of being, well, uninterruptible. If the expected wakeup event does not materialize, the process will wait forever and there is usually nothing that anybody can do about it short of rebooting the system.

This is the source of the dreaded, unkillable process which is shown to be in the "D" state by ps.

Given the highly obnoxious nature of unkillable processes, one would think that interruptible sleeps should be used whenever possible. The problem with that idea is that, in many cases, the introduction of interruptible sleeps is likely to lead to application bugs.

As recently noted by Alan Cox:

Unix tradition (and thus almost all applications) believe file store writes to be non signal interruptible. It would not be safe or practical to change that guarantee."

That's the whyness to all of this...
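The user-space half of that quote (noticing an interrupted call and responding) usually boils down to a retry loop. Here it is as a Python 3 sketch; `retry_on_eintr` is a made-up helper name. Note that Python 3.5 and later already retry most interrupted system calls internally (PEP 475), so this wrapper is only here to make the pattern visible, not something you normally need.

```python
def retry_on_eintr(call, *args):
    """Call `call(*args)`, retrying for as long as it fails with EINTR.

    Python maps errno EINTR to InterruptedError (a subclass of OSError).
    Any other error is allowed to propagate to the caller.
    """
    while True:
        try:
            return call(*args)
        except InterruptedError:
            continue  # a signal arrived mid-call; just try again
```

In C this is the familiar loop around a call that returns -1 with errno set to EINTR, and forgetting it is exactly the kind of application bug the article worries about.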


> If the process is in uninterruptible sleep then the process can’t be interrupted, which will cause the strace process itself to hang forever. Remarkably, it appears that the ptrace(2) system call is itself uninterruptible, which means that if this happens you may not be able to kill the strace process!

Everything about this is broken and makes no sense. The Night Watch is as relevant as ever. (https://www.usenix.org/system/files/1311_05-08_mickens.pdf)

Did anybody try to run it? Python part of this PoC depends on python2 module "fuse", which in turn depends on "gunpowder", "a library to facilitate machine learning on large, multi-dimensional images". What is even going on here?

  $ pip2 install fuse
  Collecting fuse
    Downloading https://files.pythonhosted.org/packages/c3/f6/82777531d0dd0fa1d1b509258873f4b48e1ec702dcf0258214fafb474895/fuse-0.1.3.tar.gz
  ERROR: Packages installed from PyPI cannot depend on packages which are not also hosted on PyPI.
  fuse depends on gunpowder@ git+https://github.com/funkey/gunpowder@721718b6569b47a2f5d5d6633c76c85f779e25c7

I think you want https://pypi.org/project/fuse-python/ instead. That python packages can and often do have different names than their corresponding distributables is ... not great.

I agree that it's terrible, though it can be nice in cases where a package ends up forked— for example allowing Pillow to be a drop-in replacement for PIL.

I so so strongly hate this feature in python

I think it's meant to depend on fuse-python (https://pypi.org/project/fuse-python/). fuse is something else.

Pretty sure they meant to depend on python-fuse not fuse. The former is python bindings to libfuse.


Use your OS's own packaging system to install python-fuse.

It may already be installed!

The python-fuse package on Ubuntu 18.04 does not seem to work (too old?).

  $ python2 fs.py x
  Traceback (most recent call last):
    File "fs.py", line 8, in <module>
      class fs(fuse.Operations):
  AttributeError: 'module' object has no attribute 'Operations'
  $ python2
  Python 2.7.17 (default, Nov  7 2019, 10:07:09) 
  [GCC 7.4.0] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import fuse
  >>> fuse.__version__

You probably have to change the import to something like

    from fuse import fuse

That’s not it. Grep shows the word Operations isn’t present in the fuse module.

...and thus pure consciousness was born.

My new meditation strategy is to visualize myself as a process with no memory waiting in uninterruptible sleep.

As someone who doesn't understand much of what's going on here, what resources would you suggest I study to improve?

It kind of depends on what exactly you don't understand, but books about operating systems and Linux programming are probably what you want to read. They'll teach you about threads, segfaults, the meaning of uninterruptible sleep, signals, the /proc/ directory, etc.

Can you suggest a book about Operating Systems?

I'd suggest "Operating Systems: Three Easy Pieces". (http://pages.cs.wisc.edu/~remzi/OSTEP/)

I've found it very accessible. Largely, the way that they teach is to describe a system based on a set of assumptions, then slowly relax each of the assumptions one by one until you reach an example of a real system. That style of teaching really works for me.

"Operating System Concepts" by Silberschatz, Galvin and Gagne, is a classic. It covers the basics: processes, syscalls, filesystems. (it's what we read in our OS course in university)

Thank you!

A more pragmatic book is The Linux Programming Interface, by Michael Kerrisk. It is Linux-specific though, but that's probably fine.

Uninterruptible process sleep is covered in section 22.3. Threads, in chapters 29–33. Signals, chapters 20–22. The proc file system in section 12.1. Memory mappings, chapters 49–50.

I haven't read that book, but I enjoyed a conference tutorial about cgroups and namespaces by Michael Kerrisk. One of the best trainers/presenters I have experienced!

The version I have covers kernel 2.6. Is that new enough to still be useful?

Useful yes, complete no.

Uninterruptible sleep, signals, memory mappings etc have not changed fundamentally.

Namespaces and cgroups are newer concepts in Linux; they won't be covered. IIRC user namespaces appeared in 3.14. The new cgroup hierarchy became somewhat usable much later than that.

The new concepts are needed for containers. Not at all relevant for the original article here.

Actually he is still maintaining a list of changes since the book has come out. http://man7.org/tlpi/api_changes/index.html

Of course. Linux famously "doesn't break userspace" so almost anything that works in 2.6 still works today.

Really neat!

Not sure why people care what it's written in. I'm pretty sure people wouldn't complain if this was a blog post written in english rather than a program written in C and python2.

Can it be done without fuse?

Fuse is restricted; probably for reasons like this.

Ha. It isn't. User namespaces allow you to do this attack: I have a reproduction here: https://github.com/sargun/fuse-example

Basically, what’s happening is that the FUSE daemon which is the one handling the FUSE requests has /dev/fuse open. There is a thread in that FUSE daemon. let’s call the FUSE daemon P10, and the thread P11.

P10 wires up a FUSE filesystem on /tmp; it opens /dev/fuse as FD5.

P11 enacts an uninterruptible (blocking) operation on /tmp/foo, called OP1; /tmp/foo is FD6.

P10 reads the operation (OP1) from /dev/fuse, so now OP1 is in userspace.

---- The PID 1 of the namespace is killed ----

P10 just terminated, and cannot make progress. It will never respond to OP1. FD5 and FD6 remain open, because P11 is in uninterruptible sleep. P11 is in an uninterruptible disk sleep waiting for OP1 to respond, and the fuse connection never aborts.

FUSE abortion doesn’t kick in because the FD of `/dev/fuse` that P10 originally opened as the FUSE daemon under FD5 will not be closed until all threads as part of the process are terminated. The mount namespace won't be torn down until P11 is torn down. P11 will never be torn down because it’s waiting for someone to do something.

I am not sure if uninterruptible sleep is considered "living", but fun experiment.

Can't think of any way to profit from this. Just reboot your box.

That's the '???' part of it, I would assume.

how does the FUSE filesystem interact with the c program running? I don't see the C program attempting to r/w to that x directory

Can this be implemented in Go?

I think so, but probably not reliably and/or for the long term.

In terms of the actual unmap call, the payload is in the munmap call, which is a syscall, and if you do have to set up the registers for that call, Go does have its assembler support which should be able to do that for you. You should be able to write Go source code that would perform that call, although you're going to be poking into some corners of Go most programmers never have to get into. You can also get to fuse, which in my experience is great at producing uninterruptible sleeps even when you're not trying to do it on purpose....

However, in terms of reliability and the long term, you have the problem that you're not the only thread in the program the way a C program can count on that, and if any of those other threads wake up and their memory is missing, it'll probably result in the process dying. You can do a couple things to try to avoid that a little, like pinning yourself to an OS thread and running the program in one CPU mode, but especially on the latest version of Go where they've implemented true pre-emption, you can't keep the runtime from running by just hogging the execution thread anymore.

So I suspect that as long as you win the race to unmap everything before the runtime wakes up (which may also require you to call munmap through the assembly code directly, rather than using the syscall libraries as I'm pretty sure those notify the scheduler of what you're doing), you may be able to briefly be a process with no RAM mapped, but in human terms it won't be long before the runtime wakes up to do something, anything, and crash the process as a result. There's no way to avoid that in Go. You won't be able to have a process just sitting there indefinitely with no mapped RAM that you can admire and treasure and hand off to your children as part of their inheritance.

> you have the problem that you're not the only thread in the program the way a C program can count on that

Note that if you link against libc I think it would be legal for it to create a thread behind your back, so even in this case you're not quite safe.

Normally I'm not a fan of people nitpicking language choices... but in this case I upvoted because this is a really weird "stupid program trick" and I'm curious what languages support such shenanigans.

can a bf interpreter be written in sed?


we're programmers, anything can be done with enough time or money

Can you write a program to tell if any given program is going to halt?

Tell you what: you give me time and money, and I'll work on this program. Keep giving me time and money until I tell you I've succeeded.

Can someone write a program to tell me how much money I’d need to give you?

Give me time and I'll tell you at the end how much money you need to give me.

Sounds like a good deal if we don't need to pay until you're done!

No need to be difficult, just give me an estimate. Is this going to be a small, medium or large t-shirt size task?

Agreed, take all the time you need.

Sure, I'll just be giving you half your wage with each coming hour.

As long as linear execution time is acceptable, okay. Give me a couple million dollars and I'll have it to you in a week.

With enough time, sure! [0][1]

[0] Disclaimer: may take slightly longer than heat death of current universe.

[1] Alternative solution: https://m.xkcd.com/1266/


Surely rust is the preferred option for that? Go hasn't been nearly as vocal in my experience.

why do we need to JIT the munmap code? Can we not just call munmap(2)?

The call to munmap() requires both a stack (which will be unmapped at some point) and the code from libc (which will be unmapped at some time).

The program gets a list of all the allocated pages first, then creates one more for the JIT code (and a copy of the list) which is unmapped as the last thing it does. There really is no other way of doing it.
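The "get a list of all the allocated pages" step can be illustrated by reading `/proc/self/maps`. This is a Python 3 sketch of the enumeration only, assuming the standard Linux maps format (`start-end perms offset dev inode [path]`); the helper names are mine, and I'm not claiming this is how the original C code gets its list. The unmapping itself cannot be done from Python, for the same stack-and-libc reasons described above.

```python
def parse_maps_line(line):
    """Parse one /proc/<pid>/maps line into (start, length, perms, path)."""
    fields = line.split(None, 5)
    start_hex, end_hex = fields[0].split("-")
    start, end = int(start_hex, 16), int(end_hex, 16)
    # Anonymous mappings have no path field at all.
    path = fields[5].strip() if len(fields) > 5 else ""
    return start, end - start, fields[1], path

def mapped_regions(maps_path="/proc/self/maps"):
    """Return every mapping of the current process as a list of tuples."""
    with open(maps_path) as f:
        return [parse_maps_line(line) for line in f if line.strip()]
```

Each `(start, length)` pair is what a later `munmap(start, length)` call would take, which is why the list has to be copied somewhere that outlives everything else.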


I thought you were joking, I assumed it would be C. Thanks for getting me to take a look at the code.

(For the lazy: it is in C, but to reliably make a system call that will uninterruptibly block there's a Python FUSE driver that hangs to cause the call from C to block.)

He could. You also could have done it in Python3, and you still can.

"He" is named Isabella.


Why is pointing this detail out being downvoted? It is a polite message and, above all, it is correct. Don't upvote it if you think it's not that relevant (I personally think it is) but don't downvote it either.

Maybe it's a level deeper, the down-voters objecting to the assumption that 'he' can't be called 'Isabella'!

(I haven't voted either way and have no strong feelings on it, just pointing out how doomed this line of objection/correction is. Personally I generally say 'they' or 'the OP', and did long before (all my life) I was aware of anyone having pronoun preferences, it's just correct isn't it? Never used to be a problem. 'Is that a person over there?' 'Yes, they're just in a funny pose.')

You are seriously getting bent out of shape over just 5 freaking lines of code?

There's nothing wrong with using Python 2 for a PoC.

I'd be much more reluctant to get on OP’s case about it than GP, but there’s a strong argument to be made that this hurts the overall python ecosystem.

There's also a strong argument to be made that people shouldn't have to care about ecosystems that are so delicate that a proof of concept in a previous version "hurts the overall [ecosystem]."

"How dare you use a hammer! We're all using Hammer 2 now! I don't care that it was 'Only one nail," put it down and pick up Hammer 2! Your hammer is what, ten years old? That's ancient! It's not even compatible with my Hammer 2! What do you mean, 'They hit the same fucking nails!' That's not the point! I can't track how many nails you're hitting per-minute now, or even take advantage of the nails you've hit! It's barbaric!"

Can you imagine?

To make the analogy work I have to imagine that the two hammers cause horrible problems when used at the same time, and that they're both free with 5 minute shipping.

It's a sucky situation to be in, but we should probably get everyone on the same hammer.

(If we want a somewhat less nonsensical analogy, replace hammer with screwdriver. Two incompatible screw head patterns and you can only use one driver on any particular object.)

So sure don't nitpick a toy project, but if someone's writing significant code in python 2... it's not ideal.


But all of this is a dumb tangent, and we should just appreciate how funny it is to ask that this nonsensical pointless quasi-OS-breaking code be done in python 3 instead.

> But all of this is a dumb tangent, and we should just appreciate how funny it is to ask that this nonsensical pointless quasi-OS-breaking code be done in python 3 instead.

I'll admit that I did not expect it to start a thread.

No "horrible problems" arise from Python 2 being used on the same system that Python 3 is on. The only problems are trivial, which was highlighted in the analogy.

I wrote this reply earlier and deleted it. Part of me feels like it's too far afield, but it's also kind of not, so I'll include it. It's what's on the top of my mind right now:


I'm a volunteer teacher at a Girls Who Code club. We use Python. I spent the last class before the universe collapsed into Coronavirus hell introducing the kids to the concept of functions.

After introducing the concept, I asked the kids to modify a quiz program they'd written with a different teacher to use functions to ask a question and check the answer, instead of copying and pasting the code each time. This was a pretty challenging assignment—these kids hadn't encountered functions before—but one of the girls managed to get it and I was really proud of her.

Except her code wouldn't run because—I realized with horror—the quiz program she'd written with another teacher was actually python 2 code!

I am still, weeks later, so damn frustrated that this happened. I'm trying to get the kids to wrap their heads around functions; I don't want to start explaining the political BS around python 2 and 3.

It's primarily my fault, of course, and secondarily the python foundation's fault for handling this transition so badly. But every new project that's done in python 2—whether serious or a toy—is also contributing to the issue in a very small way.

So while this isn't a conversation I would have personally brought up in this thread, I think it's all just very unfortunate. Python is a nice language and it doesn't deserve this crap.

I saw this reply earlier! It was gone by the time I tried writing a response, but I don't feel like it's too far afield at all: it's a really nice response. (And thanks for doing that! It's always nice to see people volunteering for genuinely helpful things like this.)

It doesn't seem like it was your fault, and it's understandable that you're frustrated.

I don't, however, agree that anything written in Python 2 is contributing to that problem. Just because code exists doesn't mean it has to be used by other people, and this is a proof-of-concept obviously outside of the range of the class you were teaching.

It's the Python Foundation's fault, and it doesn't seem like anyone else's. Python 2 is nicer to write from some perspectives, and, for example, the idea of somebody who didn't like Perl 6's changes being told to write Perl 6 instead of Perl 5 or else they were being harmful doesn't really make sense either.

> Python 2 is nicer to write from some perspectives, and, for example, the idea of somebody who didn't like Perl 6's changes being told to write Perl 6 instead of Perl 5 or else they were being harmful doesn't really make sense either.

Except Perl 6 (Raku) is now marketed as an entirely separate language, whereas Python 3 is the version of Python that has superseded Python 2.

Raku's renaming was recent. It took 19 years to happen. Python 3 still has time to be renamed!

I didn't say same system, I said same time.

As in, trying to run one program with both.

Wouldn't occur in this case, as the script starts with:

    #!/usr/bin/env python2

You're arguing against something I never said.

Yes, this file, completely in isolation, has no problem.

But people want to use different pieces of python code together. If one or the other only works with a specific version, it becomes a problem when you try to combine them.

Therefore, best practice is to get everyone on the same version. We want to stop having to ask which version everything is, and facing the disappointment when things can't go in the same program. We want to get back to the situation where it's just 'hammer', and that's good enough for 95% of people.

...use the hammer that didn't arbitrarily break compatibility, then? (Note that the original analogy pointed out that they couldn't be mixed, anyway: "or even take advantage of the nails you've hit!")

This is like complaining that the well is poisoned when you're the one who put arsenic in it.

This also isn't even an open source project, so it's not like you can mix and match anyway.

I didn't make python 3, and "stick on python 2 forever" doesn't seem like the solution here as far as I can tell.

> This also isn't even an open source project, so it's not like you can mix and match anyway.

Again focusing on the execution of one specific project, as-is, is beside the point of "get everyone on the same hammer".

> I didn't make python 3

No, but you're complaining about a person using Python 2, the one that didn't arbitrarily break compatibility, while promoting the use of Python 3 because not being compatible is bad.

> Again focusing on the execution of one specific project, as-is, is beside the point of "get everyone on the same hammer".

I agree with this.
