Pipes, deadlocks, and strace annoyingly fixing them (complete.org)
57 points by pabs3 on June 20, 2022 | 30 comments



Something hanging is not automatically a "deadlock". If you jiggle it and it starts moving again, that tends to be evidence against the deadlock hypothesis. It could be a lost wakeup problem. If something is blocked on a write due to a lost wakeup, it means the task was preparing to block due to lack of buffer space just around the moment when space became available. Due to a race condition, it blocked anyway, missing the "buffer available" event.

Lost wakeup problems are more readily susceptible to recovery by "jiggling" the situation than deadlocks. If there is a deadly embrace situation, and you randomly wake up some involved task (in such a way that it doesn't just error out) it will typically just spin around and wait again, re-engaging with the deadlock. You can interrupt deadlocks in a breakpoint debugger, and when you hit "go" again, the deadlock typically continues.


People may find the author's bug [1], what I suspect is the first report of this bug to the project [2], the associated workaround [3], or the kernel.org bugzilla entry [4], interesting.

[1] - https://github.com/openzfs/zfs/issues/13571

[2] - https://github.com/openzfs/zfs/issues/13232

[3] - https://github.com/openzfs/zfs/pull/13309

[4] - https://bugzilla.kernel.org/show_bug.cgi?id=212295


If `strace`ing a process fixes your problem you should try `perf trace`ing it instead. It works at a lower level (using kernel tracepoints rather than the ptrace API) so it disturbs the traced process less, at the cost of having less powerful output (it doesn't capture e.g. the actual data being written to a pipe).
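e.g., assuming perf is installed and you have the required privileges, something like:

    # attach to the stuck process and stream its syscalls via kernel
    # tracepoints (the ptrace-based equivalent would be: strace -f -p <pid>)
    sudo perf trace -p <pid>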


This is tangential, but I would say never trust bash pipes.

I wrote a tool to get around the problem:

https://github.com/ThomasHabets/goodpipe

I have more info about the problem in the README and this blog post: https://blog.habets.se/2021/06/The-uselessness-of-bash.html

But tl;dr: if gpg fails in your example, then zfs receive will just think "oh, that was all the data, apparently". Hopefully zfs receive handles that gracefully and notices the truncation. But most tools would not.


"Hey, I want to run N copies of this one program in parallel, and I also would like to be able to press Ctrl+C to stop them all at once, surely it's easy?" Yes, but only if you know the right spell:

    (trap 'kill 0' SIGINT; for i in $(seq 1 "$N"); do this_one_program "$i" & done; wait)
I've seen quite enough coworkers who struggled trying to Ctrl-C the &-ed copies of programs from a single shell, only to give up and run them from several xterm windows/Terminator tabs instead; can't really blame them.


GNU Parallel handles this nicely, and is quite a powerful GNU toolbox of GNU as well. I use it quite a lot. GNU. Don't mix it up with the (IIRC) Suckless tool of the same name, which is significantly less powerful. GNU.
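For the Ctrl-C case above it's something like this (a sketch, from memory):

    # run one copy per argument; a single Ctrl-C stops all of them
    parallel this_one_program ::: $(seq 1 "$N")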

Just be prepared to silence the GNU.


I don't use GNU parallel because it is not installed by default. xargs, on the other hand, is.
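e.g. a rough equivalent of the snippet above:

    # -P runs up to N jobs at once; Ctrl-C signals the whole foreground process group
    seq 1 "$N" | xargs -P "$N" -n 1 this_one_program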


  set -o pipefail
is your friend


I mention this in the blog post. It doesn't fix the big problem. It fixes a smaller problem only.


goodpipe looks interesting! My first thought is that it seems to lack support for bash file descriptor things that I often rely on.


Can't you just refer to these as /proc/self/fd/N ?

Or am I missing what kind of support you need?


I'm not sure, and I'm not great with the semantics of these features. For example there's the process-feeds-a-file type:

command <(other command)

Where "command" sees the output of "other command" as a file.

But other times you do things like renaming descriptors, or using tee to duplicate data to different descriptors, renaming them back and forth so a stream jumps over one sub-command and gets used in a different subshell to produce different output from the same input. I think part of this is called "moving file descriptors". It's one of the things that c-shells totally suck at.

Usually in bash it looks cleanest to break these up into bash functions and then assemble one huge pipe from them. Like: run a command, rename its stdout to, say, file descriptor six to jump over the next command, and then swap them back around.
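Roughly this shape, with made-up stage names (and from memory, so the fd plumbing may be slightly off):

    # copy the stream onto fd 6 so it "jumps over" stage_a and rejoins
    # stdout after the brace group, where stage_b reads both
    { produce | tee /dev/fd/6 | stage_a; } 6>&1 | stage_b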

This mostly comes up when I'm writing things that trawl through very slow archives or filesystems to avoid multiple passes. It's hard to think of a simple example, but being clever often cuts processing time by an order of magnitude.

Anyway often "pipe" is probably a bad description. Pipes often look more like chains but what I really want more often is more of a directed graph where processes are nodes and data flows in the top and out to different file descriptors that connect to different nodes. Conceptually it isn't complex--each node has stream inputs and outputs that just need to be connected. It's the semantics of it that suck.

So I sort of liked the JSON description approach in goodpipe, and it would be cool if there were ways to be more flexible about this. As is, goodpipe seems limited to the c-shell style of simplistic pipes.

But... maybe this is possible with named file descriptors in goodpipe. I'll have to think about it.


Ah, those features. Gotcha.

I only implemented a straight pipeline, no forking, merging, or loops. Mainly because I know enough about pipeline technologies to know that this means creating a language that won't be simple, because the problem isn't simple.

I agree, ideally any pipeline should be expressible. Without turning into something better solved by Apache Beam or other existing streaming or batch pipelines.

The main problem goodpipe tries to solve is the ordered termination of pipeline parts, so that the output end doesn't finalize its output just because its input closed. In the general directed graph case it would be necessary to define the order of termination.

So, in short, the problems here are to define the scope of problems/digraphs, design a language that allows expressing that (yet be simple enough to get right), and then the implementation is the easy part.

If you want to help improve goodpipe to handle more use cases then I'd be happy to work with you. Or if you want to make something new then I can cheer from the sidelines too. :-)


Would $PIPESTATUS be a solution for the 'sort fails' issue in your blog post?


No.

As I say in the blog post, I need to kill all the commands to the right of the failing command, and set -o pipefail and $PIPESTATUS can't do that.


I'd simply run the pipe in a subshell, go over $PIPESTATUS one by one, and kill the subshell if I found a failed one. But if you insist on killing to the right, I'm afraid there's no analogue of the $PIPESTATUS array providing a list of PIDs.
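i.e. something along these lines (a sketch with placeholder commands):

    (
        cmd1 | cmd2 | cmd3
        # PIPESTATUS is expanded before the loop body runs, so this still sees
        # the pipeline's exit codes; leave the subshell on the first failure
        for rc in "${PIPESTATUS[@]}"; do
            if [ "$rc" -ne 0 ]; then exit "$rc"; fi
        done
    )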


> if you insist on killing to the right

Well, as described in the blog post, that's the entire point, the very problem I'm trying to solve: to notify downstream programs that upstream failed, not finished, before they've had time to act on the fact that their input returned EOF.

But I'm open to other solutions to the problem.

Until I hear anything else, though, I'm claiming that bash pipes are unfit for purpose.


> Well, as described in the blog post it's the entire point

Then I'm sorry, but I can't understand your point. I don't disagree that bash's process management facilities are severely lacking in many respects, but your example ain't one of them.

If the middle command has failed, it must have exited, and so did the first and last commands. Because it's a pipe. That is what commands in a pipe do: when one process exits, so do the others. All you need to know is whether one of the commands failed (which pipefail provides, and PIPESTATUS if you want even more granularity), and you can do cleanup if there was a failure.

If for some bizarre reason the rightmost command still continues to run (despite the process to its left having exited), or you want to be extra sure, you may run the pipe in a subshell and end it with 'kill -- -$BASHPID'.

None of these require meticulously setting up named pipes for each command.

I fail to see in what kind of scenario a command on the left hand side of a pipe would be (or you'd want it to be) still running despite the command(s) to its right having exited.

But again, if you insist on doing so, whatever floats your boat.


> I don't disagree bash's process management facilities are severely lacking in many respects, but your example ain't one of them.

It is. I really did upload bad data because of bash's poor pipe handling. Maybe a better example:

generate_data | commit_to_database

commit_to_database reads from stdin and writes to a database. When it reaches EOF it'll COMMIT.

I don't want to commit something half-generated. Maybe commit_to_database replaces the current set of data with the updated version, serving bad data to users.

It needs to know that it's complete.

Another example is the classic "curl https://... | bash". This command is of course a problem. But not for the obvious reason. The obvious reason is that you don't know what it does, yet you just run it. But really most people are not going to audit that shellscript anyway.

So let's say you actually trust that source. There's still a problem. If the file has a command like "rm -fr /tmp/foo", and the TCP connection is cut right after "rm -fr /", then even though there's no end-of-line, that "rm -fr /" is still run.

Bash thinks the file just ended, and runs the command. It's "committing" data that was partial, because it thought it was complete.

(this is why bash scripts often actually create functions to do this, and then have the very last line of the file run the function. But they don't all do that)
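i.e. the pattern looks something like this:

    # if the download is cut anywhere above the last line, the function
    # definition is either incomplete (a syntax error) or never called;
    # either way, nothing runs
    main() {
        rm -fr /tmp/foo
        # ... rest of the installer ...
    }
    main "$@"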

> If for some bizarre reason the rightmost command still continues to run (despite process to its left has exited) or you want to be extra sure, you may run the pipe in a subshell and end it with 'kill -- -$BASHPID'.

That's still a race condition. And that kills the parent process, not the downstream process, so does not prevent the downstream process from committing.

> I fail to see in what kind of a scenario a command in a pipe would be (or you'd want it to be) still running on the left hand side despite command(s) to its right has exited.

I would not. I'm trying to communicate a "failure, abort, do not commit" to downstream commands.


> commit_to_database reads from stdin and writes to a database. When it reaches EOF it'll COMMIT.

> I don't want to commit something half-generated. Maybe commit_to_database replaces the current set of data with the updated version, serving bad data to users.

Thanks, it's clear now.

I (and I assume others recommending pipefail) was puzzled, because the idiomatic way to do your sorting example on a regular filesystem is:

    set -o pipefail
    whatever | false | tee output.tmp
    if [ $? = 0 ]; then
        echo Commit
        mv output.tmp output.sorted
    else
        echo Abort
        rm output.tmp
        exit 1
    fi
...which the gsutil command can't do, because it does the commit step implicitly, with no way to do it explicitly, I gather? But at that point I still don't understand how you can avoid the race condition (regardless of the language) and kill gsutil before it does the implicit commit step.

> that kills the parent process, not the downstream process

It kills all child processes, including commands started in a pipe.


Yes, you can dump all intermediate results into a file. But then it's no longer a pipeline.

Sometimes the intermediate result doesn't even fit on disk.


> Yes, you can dump all intermediate results into a file. But then it's no longer a pipeline.

It was an example, not a recipe.

And why wouldn't it be a pipe? It is a pipe, with an explicit commit step. That's what you asked. It isn't an "intermediate" file, it's a journal/log file before the commit, if you will.

While I've never used gsutil, I guess a similar construct would be "gsutil cp - gs://example/test.data.sorted.tmp" followed by "gsutil mv gs://example/test.data.sorted.tmp gs://example/test.data.sorted".

Updated example:

    set -o pipefail
    gsutil cat gs://example/test.data \
       | sort -S300M \
       | gsutil cp - gs://example/test.data.sorted.tmp

    if [ $? = 0 ]; then
        echo Commit
        gsutil mv gs://example/test.data.sorted.tmp gs://example/test.data.sorted
    else
        echo Abort
        gsutil rm gs://example/test.data.sorted.tmp
        exit 1
    fi

Yes, I know it doesn't abort it early. I'm sure it can be done, I'm not sure it can be done neatly (maybe with coproc?).

And what I still don't understand about your example is how solely depending on killing the gsutil cp command early enough isn't racy. There's gotta be an explicit step, no way around that.


The followup blog post says this is a Linux kernel bug:

https://changelog.complete.org/archives/10390-pipe-issue-lik...


ZFS can do encrypted backups, so … maybe this gpg stuff is unnecessary.


For me, the "-T0" in zstdcat raises major alarm bells. There doesn't exist a simple problem that cannot be made an utter clusterfuck by adding multithreading.

(If you have a single, unseekable stream of bytes on input, and one output stream, the usefulness of threads eludes me.)


Your CPU time processing the bytes greatly exceeds the I/O time reading them off the wire (not to mention generally pipelining I/O and CPU activities).


This "greatly", in case of zstd, is where citation is needed.


My point still stands. Multithreading is not something you "just" throw in. It's riddled with unexpected gotchas you didn't know you had in your own code, plus all the quirkiness of the underlying libc and OS.


The number of unexpected gotchas varies greatly with the programming language you use.


True. Most of my Rust tools are multithreaded, and this has never caused trouble.

But zstd is written in C, is it not?



