I've taken to calling such a process a "lich," because it's not quite a zombie - the parent can reap a zombie by calling wait(), but the lich has used magic to avoid true death.
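For contrast with the lich, here is a minimal sketch of the ordinary zombie case, where the parent's wait() does clean things up. It assumes a Unix-like system with `os.fork`; the function name `spawn_and_reap` is made up for illustration.

```python
import os

def spawn_and_reap(exit_code=7):
    """Fork a child that exits at once, then reap it from the parent."""
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately, skipping Python-level cleanup handlers.
        os._exit(exit_code)
    # Parent: once the child has exited it stays a zombie until we wait on it.
    reaped_pid, status = os.waitpid(pid, 0)
    assert reaped_pid == pid
    code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
    # A second wait fails because the kernel has released the process entry.
    try:
        os.waitpid(pid, 0)
        gone = False
    except ChildProcessError:
        gone = True
    return code, gone
```

A process stuck in uninterruptible sleep never reaches the exit path at all, so there is nothing for the parent to reap in the first place.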
But TASK_KILLABLE is not used in most places it should be. Patching your (least) favorite driver to use TASK_KILLABLE could be a good entry point to contributing to the kernel.
This older, tangential HN discussion and the comments on LWN have a bit more info: https://news.ycombinator.com/item?id=18056946
That of course doesn't work if you're dealing with a filesystem that still uses TASK_UNINTERRUPTIBLE
It was a java program that I killed -9. It died but was never reaped.
It prevented a software shutdown so I had to use the ol' STOP-A
In the end, you really have little choice but to trust what the OS is telling you, and if the OS is lying for whatever reason, to fix the OS. As hard as that last bit may be as a solution, it still turns out to be the easiest, most-effective fix. And it may not be a perfect fix, either... but it'll still be the best.
That doesn't solve the kernel thread being stuck forever problem. I'm not sure what the fix is there... is it bad drivers with no timeout mechanism? I don't know how they'd do that unless the kernel IO is all async anyway. Just tearing down the kernel thread seems likely to leave the locks abandoned in bad states.
Sounds like a hack.
So while I agree with the other posts that a supervisor can't recover from this state, it can be aware that it's happened. And it's quite common for supervisors to check for readiness and liveness beyond "well the OS says it's running, sounds good to me."
That's why a process that wants to stay running has to provide some form of heartbeat API.
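A rough sketch of what such a heartbeat check could look like on the supervisor side. The class name `HeartbeatMonitor`, the `timeout_s` parameter and the injectable `clock` are all assumptions for illustration, not any particular supervisor's API.

```python
import time

class HeartbeatMonitor:
    """Tracks the last heartbeat and reports whether it is recent enough."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock          # injectable so the logic is testable
        self._last_beat = clock()

    def beat(self):
        # Called whenever the supervised process reports in.
        self._last_beat = self._clock()

    def is_alive(self):
        # Alive means a heartbeat arrived within the last timeout_s seconds.
        return (self._clock() - self._last_beat) <= self.timeout_s
```

This only tells the supervisor that something is wrong. If the worker is stuck in uninterruptible sleep, the supervisor still can't kill it; it can only stop routing work to it and raise an alarm.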
Can a malicious process use this to avoid being killed?
A malicious process can't do useful work - as soon as the kernel gets unstuck, the signal will get delivered to the thread and kill it.
This is the profit.
While job searching I went on an interview where I got asked why I did a side gig listed on my résumé. "For a friend or for money?" It was neither and I said "I don't know, because programming is fun?"
I got hired.
Why are you learning how to create linux processes with no memory? Why are you hacking overloading into Python? Why are you writing programs for fun? To dig unnecessarily deep and force your way into programming's secrets.
So, for money.
In other words: you can't infer a motivation from something an individual does if they're forced to do it, socially or legally.
Everything in between made it seem like making money my No. 1 focus was necessary, though. A lifetime of being manipulated and pressured. It all gets better when you can finally pull back the curtain seeing what's really there. :)
"There are advantages and disadvantages to each type of sleep. Interruptible sleeps enable faster response to signals, but they make the programming harder. Kernel code which uses interruptible sleeps must always check to see whether it woke up as a result of a signal, and, if so, clean up whatever it was doing and return -EINTR back to user space.
The user-space side, too, must realize that a system call was interrupted and respond accordingly; not all user-space programmers are known for their diligence in this regard.
Making a sleep uninterruptible eliminates these problems, but at the cost of being, well, uninterruptible. If the expected wakeup event does not materialize, the process will wait forever and there is usually nothing that anybody can do about it short of rebooting the system.
This is the source of the dreaded, unkillable process which is shown to be in the "D" state by ps.
Given the highly obnoxious nature of unkillable processes, one would think that interruptible sleeps should be used whenever possible. The problem with that idea is that, in many cases, the introduction of interruptible sleeps is likely to lead to application bugs.
As recently noted by Alan Cox:
Unix tradition (and thus almost all applications) believe file store writes to be non signal interruptible. It would not be safe or practical to change that guarantee."
That's the whyness to all of this...
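The "D" state that ps shows comes from the state field in /proc/<pid>/stat on Linux, so you can check for it yourself. A small sketch, with hypothetical helper names, assuming the proc(5) layout where the state letter follows the parenthesized command name:

```python
def parse_stat_state(stat_line):
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the LAST closing parenthesis.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]

def is_uninterruptible(stat_line):
    # "D" is uninterruptible (usually disk) sleep.
    return parse_stat_state(stat_line) == "D"

def read_state(pid):
    # Linux only: reads the live stat line for the given pid.
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat_state(f.read())
```

Reading the stat file does not touch the stuck process, so unlike attaching strace, it should not hang along with it.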
> If the process is in uninterruptible sleep then the process can’t be interrupted, which will cause the strace process itself to hang forever. Remarkably, it appears that the ptrace(2) system call is itself uninterruptible, which means that if this happens you may not be able to kill the strace process!
Everything about this is broken and makes no sense. The Night Watch is as relevant as ever. (https://www.usenix.org/system/files/1311_05-08_mickens.pdf)
$ pip2 install fuse
ERROR: Packages installed from PyPI cannot depend on packages which are not also hosted on PyPI.
fuse depends on gunpowder@ git+https://github.com/funkey/gunpowder@721718b6569b47a2f5d5d6633c76c85f779e25c7
It may already be installed!
$ python2 fs.py x
Traceback (most recent call last):
  File "fs.py", line 8, in <module>
AttributeError: 'module' object has no attribute 'Operations'
Python 2.7.17 (default, Nov 7 2019, 10:07:09)
[GCC 7.4.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import fuse
from fuse import fuse
I've found it very accessible. Largely, the way that they teach is to describe a system based on a set of assumptions, then slowly relax each of the assumptions one by one until you reach an example of a real system. That style of teaching really works for me.
Uninterruptible process sleep is covered in section 22.3. Threads, in chapters 29–33. Signals, chapters 20–22. The proc file system in section 12.1. Memory mappings, chapters 49–50.
Uninterruptible sleep, signals, memory mappings etc have not changed fundamentally.
Namespaces and cgroups are newer concepts in Linux, so they won't be covered. IIRC user namespaces appeared in 3.14. The new cgroup hierarchy became reasonably usable much later than that.
The new concepts are needed for containers. Not at all relevant for the original article here.
Not sure why people care what it's written in. I'm pretty sure people wouldn't complain if this was a blog post written in English rather than a program written in C and python2.
Fuse is restricted; probably for reasons like this.
Basically, what’s happening is that the FUSE daemon which is the one
handling the FUSE requests has /dev/fuse open. There is a thread in
that FUSE daemon. let’s call the FUSE daemon P10, and the thread P11.
P10 wires up a FUSE filesystem on /tmp, it opens /dev/fuse with FD5
P11 enacts an uninterruptible (blocking) operation on /tmp/foo, called
OP1, and /tmp/foo is FD6.
P10 reads the operation (OP1) from /dev/fuse, so now OP1 is in userspace
----The Pid 1 of the namespace is killed--
P10 just terminated, and cannot make progress. It will never respond
to OP1. FD5 and FD6 remain open, because P11 is in uninterruptible
P11 is in an uninterruptible disk sleep waiting for OP1 to respond,
and the fuse connection never aborts.
FUSE abortion doesn’t kick in because the FD of `/dev/fuse` that P10
originally opened as the FUSE daemon under FD5 will not be closed
until all threads as part of the process are terminated. The mount
namespace won't be torn down until P11 is torn down. P11 will never be
torn down because it’s waiting for someone to do something.
In terms of the actual unmap call, the payload is in the munmap call, which is a syscall, and if you do have to set up the registers for that call, Go does have its assembler support which should be able to do that for you. You should be able to write Go source code that would perform that call, although you're going to be poking into some corners of Go most programmers never have to get into. You can also get to fuse, which in my experience is great at producing uninterruptible sleeps even when you're not trying to do it on purpose....
However, in terms of reliability and the long term, you have the problem that you're not the only thread in the program the way a C program can count on that, and if any of those other threads wake up and their memory is missing, it'll probably result in the process dying. You can do a couple things to try to avoid that a little, like pinning yourself to an OS thread and running the program in one CPU mode, but especially on the latest version of Go where they've implemented true pre-emption, you can't keep the runtime from running by just hogging the execution thread anymore.
So I suspect that as long as you win the race to unmap everything before the runtime wakes up (which may also require you to call sysmap through the assembly code directly, rather than using the syscall libraries as I'm pretty sure those notify the scheduler of what you're doing), you may be able to briefly be a process with no RAM mapped, but in human terms it won't be long before the runtime wakes up to do something, anything, and crash the process as a result. There's no way to avoid that in Go. You won't be able to have a process just sitting there indefinitely with no mapped RAM that you can admire and treasure and hand off to your children as part of their inheritance.
Note that if you link against libc I think it would be legal for it to create a thread behind your back, so even in this case you're not quite safe.
 Disclaimer: may take slightly longer than heat death of current universe.
 Alternative solution: https://m.xkcd.com/1266/
The program gets a list of all the allocated pages first, then creates one more for the JIT code (and a copy of the list) which is unmapped as the last thing it does. There really is no other way of doing it.
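On Linux, one way to get that list of mappings is to read /proc/self/maps. The sketch below only parses the lines and does not unmap anything. The helper names are made up, and it is an assumption that this matches how the program in the article collects its list.

```python
def parse_maps_line(line):
    """Parse one /proc/<pid>/maps line into (start, end, perms, path)."""
    # Format: address perms offset dev inode [pathname]
    fields = line.split(maxsplit=5)
    start_s, end_s = fields[0].split("-")
    # Anonymous mappings have no pathname field.
    path = fields[5].strip() if len(fields) > 5 else ""
    return int(start_s, 16), int(end_s, 16), fields[1], path

def list_mappings(pid="self"):
    # Linux only: returns every mapping of the given process.
    with open(f"/proc/{pid}/maps") as f:
        return [parse_maps_line(line) for line in f if line.strip()]
```

The list has to be captured before the unmapping starts, since the code doing the unmapping can't open files or allocate once its own heap and libc are gone.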
(I haven't voted either way and have no strong feelings on it, just pointing out how doomed this line of objection/correction is. Personally I generally say 'they' or 'the OP', and did long before (all my life) I was aware of anyone having pronoun preferences, it's just correct isn't it? Never used to be a problem. 'Is that a person over there?' 'Yes, they're just in a funny pose.')
"How dare you use a hammer! We're all using Hammer 2 now! I don't care that it was 'Only one nail," put it down and pick up Hammer 2! Your hammer is what, ten years old? That's ancient! It's not even compatible with my Hammer 2! What do you mean, 'They hit the same fucking nails!' That's not the point! I can't track how many nails you're hitting per-minute now, or even take advantage of the nails you've hit! It's barbaric!"
Can you imagine?
It's a sucky situation to be in, but we should probably get everyone on the same hammer.
(If we want a somewhat less nonsensical analogy, replace hammer with screwdriver. Two incompatible screw head patterns and you can only use one driver on any particular object.)
So sure don't nitpick a toy project, but if someone's writing significant code in python 2... it's not ideal.
But all of this is a dumb tangent, and we should just appreciate how funny it is to ask that this nonsensical pointless quasi-OS-breaking code be done in python 3 instead.
I'll admit that I did not expect it to start a thread.
I'm a volunteer teacher at a Girls Who Code club. We use Python. I spent the last class before the universe collapsed into Coronavirus hell introducing the kids to the concept of functions.
After introducing the concept, I asked the kids to modify a quiz program they'd written with a different teacher to use functions to ask a question and check the answer, instead of copying and pasting the code each time. This was a pretty challenging assignment—these kids hadn't encountered functions before—but one of the girls managed to get it and I was really proud of her.
Except her code wouldn't run because—I realized with horror—the quiz program she'd written with another teacher was actually python 2 code!
I am still, weeks later, so damn frustrated that this happened. I'm trying to get the kids to wrap their heads around functions; I don't want to start explaining the political BS around python 2 and 3.
It's primarily my fault, of course, and secondarily the python foundation's fault for handling this transition so badly. But every new project that's done in python 2—whether serious or a toy—is also contributing to the issue in a very small way.
So while this isn't a conversation I would have personally brought up in this thread, I think it's all just very unfortunate. Python is a nice language and it doesn't deserve this crap.
It doesn't seem like it was your fault, and it's understandable that you're frustrated.
I don't, however, agree that anything written in Python 2 is contributing to that problem. Just because code exists doesn't mean it has to be used by other people, and this is a proof-of-concept obviously outside of the range of the class you were teaching.
It's the Python Foundation's fault, and it doesn't seem like anyone else's. Python 2 is nicer to write from some perspectives, and, for example, the idea of somebody who didn't like Perl 6's changes being told to write Perl 6 instead of Perl 5 or else they were being harmful doesn't really make sense either.
Except Perl 6 (Raku) is now marketed as an entirely separate language, whereas Python 3 is the version of Python that has superseded Python 2.
As in, trying to run one program with both.
Yes, this file, completely in isolation, has no problem.
But people want to use different pieces of Python code together. If one or the other only works with a specific version, it becomes a problem when you try to combine them.
Therefore, best practice is to get everyone on the same version. We want to stop having to ask which version everything is, and facing the disappointment when things can't go in the same program. We want to get back to the situation where it's just 'hammer', and that's good enough for 95% of people.
This is like complaining that the well is poisoned when you're the one who put arsenic in it.
This also isn't even an open source project, so it's not like you can mix and match anyway.
> This also isn't even an open source project, so it's not like you can mix and match anyway.
Again focusing on the execution of one specific project, as-is, is beside the point of "get everyone on the same hammer".
No, but you're complaining about a person using Python 2, the one that didn't arbitrarily break compatibility, while promoting the use of Python 3 because not being compatible is bad.
I agree with this.