You could also just send output to separate files by re-opening stdout/stderr and use https://www.vanheusden.com/multitail/index.html (or GNU screen or tmux or whatever) to multiplex a terminal in a more organized way. This also solves an "open problem" in the article of stray prints.
If you really want to share one terminal / stdout but also prevent timing-based record splitting, you could also send outputs to FIFOs / named pipes and then have a simple record boundary honoring merge program like https://github.com/c-blake/bu/blob/main/funnel.nim with doc at https://github.com/c-blake/bu/blob/main/doc/funnel.md As long as "record format" is shared (e.g. newline-terminated) this can also solve the stray print problem.
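To make the "record boundary" idea concrete, here is a minimal sketch in plain Python (my own illustration, not funnel itself). It leans on the POSIX guarantee that a single write() of at most PIPE_BUF bytes to a pipe or FIFO is atomic:

```python
import os

def put_record(line: str, fd: int = 1) -> None:
    # Emit one newline-terminated record with a single write() call.
    # POSIX guarantees writes of <= PIPE_BUF bytes (at least 512; 4096
    # on Linux) to a pipe/FIFO are atomic, so records from concurrent
    # workers cannot be split mid-line by the kernel.
    data = (line.rstrip("\n") + "\n").encode()
    assert len(data) <= 512, "record exceeds portable PIPE_BUF minimum"
    os.write(fd, data)

put_record("worker 0: started")   # fd 1 = stdout by default
```

With every worker emitting whole records like this, a merge reader such as funnel (or even plain cat on a shared FIFO) only ever sees intact lines; funnel additionally handles multiple pipes and larger records.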
When you say "multiplex a terminal" with tmux, do you mean splitting screens so the same terminal window has multiple shells prompts in it? I'm trying to understand how that would be used to address the problem in the demo at the bottom of the post.
I will try to be more concrete than I was (though I did link to some Nim/shell that "does it all" the 2nd way I mentioned).
j.py:
    import multiprocessing as mp, contextlib as cl, time

    def f(x):
        name = mp.current_process().name
        with open("o." + name, 'a+') as f:
            with cl.redirect_stdout(f):
                print(x * x)
                time.sleep(2)

    if __name__ == "__main__":
        mp.Pool().map(f, range(36))
Then
shell$ python3 j.py & sleep 1; multitail o.*
On Unix that should create $(nproc) panes (i.e. screen areas) of output in your terminal, one pane tailing each worker process's log. (EDIT: For fancier progress, multitail -cT may help; see the man page.)
Some Py MP expert could perhaps show how to do this with only one `open` or a nicer `name`. As a bonus - the log/status outputs are not erased but left in "o.Stuff". The cost of that is having log files to clean up.
Separate files means no contention at all and any stray prints from any libraries go into the log file. It's a separate exercise for the reader to extend this to stderr capture (either in e.* or also in o.*).
And, really, as popular as Python and MP are, all of this is probably written up somewhere.
Someone else should outline imitating `multitail o.*` in tmux, I think, as it is more involved, but a pointer to get started is here: https://github.com/tmux/tmux/wiki/Advanced-Use (That's all I have time for today. Sorry.)
I've got the impression that in practice, what keeps developers from letting trivially parallelizable tasks run in parallel is a) the overhead of dealing with poor parallelization primitives and b) the difficulty in properly showing the status of parallel invocations.
Having good features to support this in standard libraries would go a long way to incentivizing devs to actually parallelize.
In a way, it's a subset of the distributed systems tracing problem - you have multiple tasks running in parallel on the same node, but they will have been initiated as different (sub) tasks, and should be tracked by the specific task via which they were initiated. So systems like OpenTelemetry and Honeycomb can be great for this, allowing you to see events in aggregate as well as in the context of a trace that propagates between different threads and systems.
But there's so much complexity there that IMO it's best left outside of standard libraries - and it's indeed a daunting amount of new vocabulary for newcomers. I'm not aware of simpler abstractions on top of the broader telemetry ecosystem for monitoring simple parallelization, but arguably there should be one that keeps things quite simple.
Agreed. I have been using joblib for a good few months. It is fine, but I still haven't figured out basic things like printing the status of process-based jobs.
[Parsl is much better, e.g., logging is built-in, but it can be a little overwhelming.]
Yes, totally agree. I’ve written some code and I’d rather convert it to C using Cython before I parallelize it. Python is horrible for both of these things, and you may not even get a speed increase because of the overhead. It’s like: use Cython and get a 10-100x speedup with a few lines of code, or spend my whole day in a horrible mess of data structures, getting my functions to work with map properly, with maybe nothing to show for it.
Yes this is 100% the type of thing that should be in a standard library but also the type of thing I have no doubt Python steering would feel better belongs in a 3rd party library.
We do see some cool stuff under the hood from core Python devs but interest in further quality of life features seems to be lacking.
Funnily enough, glibc takes a lock internally for threaded printing. You can disable the lock in glibc with the `__fsetlocking` function and the FSETLOCKING_BYCALLER parameter.
I had a threaded server that we were debugging which would only dump state correctly if we deleted a printf right before it in a different thread. Really confused me until I figured this out.
I like the self-built approach especially for the learning value.
If you’re using this in a CLI tool you’re writing in Python you might be using the library rich anyway, which provides this functionality as well including some extra features.
What I don't like about rich is that, dependencies and all, its installed size comes out to around 20 MB. 9 MB of that is due to its dependency on pygments for syntax highlighting, which a lot of people probably don't even want/need.
If anyone knows of a smaller, more focused library providing something similar to rich's Live Display functionality, I'd appreciate it.
There is an open issue [1] on GitHub to make it more modular and get rid of markdown and syntax highlighting, but I have no hope of rich getting more minimal.
You probably don't want to run this gist directly: depending on the platform, there's a risk of runaway process creation. The process creation code should be guarded by `if __name__ == "__main__"`.
I prefer using a purpose-built tool like GNU parallel. Parallel's only purpose is to run things in parallel and collect the results together. The advantage is you only have to learn to use it once, rather than learn to do this again and again in all the different languages/tools you might use.
I can’t help but notice you have not explained how to handle progress reporting in gnu parallel.
Also, do you really use Parallel from software trying to parallelise its internal workload? Note that in TFA the workers are an implementation detail of a wider program.
> you have not explained how to handle progress reporting in gnu parallel.
`parallel --eta` provides progress reporting over all tasks based upon observed completions.
> in tfa the workers are an implementation detail of a wider program.
TFA could write the workers as standalone processes then subprocess out to GNU Parallel.
At that point, one might not even want the Python outer process in this article. I often write little shell wrappers that establish non-trivial GNU Parallel or Make invocations under the hood. Generally, leaning on common workhorses for multiprocessing seems more sensible than rolling my own one-off.
I don't like the wild use of globals, even if they are "guarded" by locks. And then, oh boy, there are locks!
But, it surely works, so that's nice. It would be cool to have a small lib that solves this nicely :thinking:...
I've used a separate printing thread printing everything from a queue, and had the other threads push everything they want to print to this queue. Is there some advantage to doing it like in the post over the queue method?
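For reference, a sketch of that queue approach (my own illustration): one printer thread owns stdout, workers only q.put() whole records, and a None sentinel shuts it down. One nicety over lock-per-print is that all formatting/cursor movement is centralized in one place.

```python
import threading, queue

def printer(q):
    # Sole owner of stdout: records are serialized by the queue, so no
    # lock around print() and no interleaving mid-record.
    while True:
        msg = q.get()
        if msg is None:          # sentinel: no more output coming
            break
        print(msg)

q = queue.Queue()
t = threading.Thread(target=printer, args=(q,))
t.start()

def work(i):
    q.put(f"worker {i}: done")   # workers never touch stdout directly

workers = [threading.Thread(target=work, args=(i,)) for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
q.put(None)                      # all producers joined; tell printer to exit
t.join()
```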
tqdm has a position parameter which allows offsetting concurrent progress bars. It should work automatically for intra-process concurrency anyway. I don’t know if it works correctly with multiple processes though.
Yeah, my code needs to use multiprocessing, which does not play nicely with tqdm. Thanks for the tip about positions, though; it helped me search more effectively and turn up two promising comments. They're unmerged / require some workarounds, but might just work:
Yeah, frankly, as another commenter noted, I’d probably use the parent process as the orchestrator of IO, in which case tqdm would work fine, for that specific case at least.