You could also just send output to separate files by re-opening stdout/stderr and use https://www.vanheusden.com/multitail/index.html (or GNU screen or tmux or whatever) to multiplex a terminal in a more organized way. This also solves an "open problem" in the article of stray prints.
If you really want to share one terminal / stdout but also prevent timing-based record splitting, you could also send outputs to FIFOs / named pipes and then have a simple record boundary honoring merge program like https://github.com/c-blake/bu/blob/main/funnel.nim with doc at https://github.com/c-blake/bu/blob/main/doc/funnel.md As long as "record format" is shared (e.g. newline-terminated) this can also solve the stray print problem.
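To make the "record boundary" idea concrete, here is a minimal sketch in plain Python (my own illustration, not funnel itself). It leans on the POSIX guarantee that a single write() of at most PIPE_BUF bytes to a pipe or FIFO is atomic:

```python
import os

def put_record(line: str, fd: int = 1) -> None:
    # Emit one newline-terminated record with a single write() call.
    # POSIX guarantees writes of <= PIPE_BUF bytes (at least 512; 4096
    # on Linux) to a pipe/FIFO are atomic, so records from concurrent
    # workers cannot be split mid-line by the kernel.
    data = (line.rstrip("\n") + "\n").encode()
    assert len(data) <= 512, "record exceeds portable PIPE_BUF minimum"
    os.write(fd, data)

put_record("worker 0: started")   # fd 1 = stdout by default
```

With every worker emitting whole records like this, a merge reader such as funnel (or even plain cat on a shared FIFO) only ever sees intact lines; funnel additionally handles multiple pipes and larger records.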
When you say "multiplex a terminal" with tmux, do you mean splitting screens so the same terminal window has multiple shells prompts in it? I'm trying to understand how that would be used to address the problem in the demo at the bottom of the post.
I will try to be more concrete than I was (though I did link to some Nim/shell that "does it all" the 2nd way I mentioned).
j.py:
    import multiprocessing as mp, contextlib as cl, time

    def f(x):
        name = mp.current_process().name
        with open("o." + name, 'a+') as f:
            with cl.redirect_stdout(f):
                print(x * x)
                time.sleep(2)

    if __name__ == "__main__":
        mp.Pool().map(f, range(36))
Then
shell$ python3 j.py & sleep 1; multitail o.*
On Unix that should create $(nproc) panes (i.e. screen areas) of output in your terminal, one pane tailing each worker process's log. (EDIT: For fancier progress, multitail -cT may help; see the man page.)
Some Py MP expert could perhaps show how to do this with only one `open` or a nicer `name`. As a bonus - the log/status outputs are not erased but left in "o.Stuff". The cost of that is having log files to clean up.
Separate files means no contention at all and any stray prints from any libraries go into the log file. It's a separate exercise for the reader to extend this to stderr capture (either in e.* or also in o.*).
And, really, as popular as Python and MP are, all of this is probably written up somewhere.
Someone else should outline imitating `multitail o.*` in tmux, I think, as it is more involved, but a pointer to get started is here: https://github.com/tmux/tmux/wiki/Advanced-Use (That's all I have time for today. Sorry.)
I've got the impression that in practice, what keeps developers from letting trivially parallelizable tasks run in parallel is a) the overhead of dealing with poor parallelization primitives and b) the difficulty in properly showing the status of parallel invocations.
Having good features to support this in standard libraries would go a long way to incentivizing devs to actually parallelize.
In a way, it's a subset of the distributed systems tracing problem - you have multiple tasks running in parallel on the same node, but they will have been initiated as different (sub) tasks, and should be tracked by the specific task via which they were initiated. So systems like OpenTelemetry and Honeycomb can be great for this, allowing you to see events in aggregate as well as in the context of a trace that propagates between different threads and systems.
But there's so much complexity there that IMO it's best left outside of standard libraries - and it's indeed a daunting amount of new vocabulary for newcomers. I'm not aware of simpler abstractions on top of the broader telemetry ecosystem for monitoring simple parallelization, but arguably there should be one that keeps things quite simple.
Agreed. I have been using joblib for a good few months. It is fine, but I still haven't figured out basic things like printing the status of process-based jobs.
[Parsl is much better, e.g., logging is built-in, but it can be a little overwhelming.]
Yes, totally agree. I’ve written some code and I’d rather convert it to C using Cython before I parallelize it. Python is horrible for both of these things, and you may not even get a speed increase because of the overhead. It’s like: use Cython and get a 10-100x speedup with a few lines of code, or spend my whole day in a horrible mess of data structures, getting my functions to work with map properly, with maybe nothing to show for it.
Yes this is 100% the type of thing that should be in a standard library but also the type of thing I have no doubt Python steering would feel better belongs in a 3rd party library.
We do see some cool stuff under the hood from core Python devs but interest in further quality of life features seems to be lacking.
Funnily enough, glibc takes a lock internally for threaded printing. You can disable the lock in glibc with the `__fsetlocking` function and the FSETLOCKING_BYCALLER parameter.
I had a threaded server that we were debugging which would only dump state correctly if we deleted a printf right before it in a different thread. Really confused me until I figured this out.
I like the self-built approach especially for the learning value.
If you’re using this in a CLI tool you’re writing in Python you might be using the library rich anyway, which provides this functionality as well including some extra features.
What I don't like about rich is that, dependencies and all, its installed size comes out to around 20 MB. 9 MB of that is due to its dependency on pygments for syntax highlighting, which a lot of people probably don't even want/need.
If anyone knows of a smaller, more focused library providing something similar to rich's Live Display functionality, I'd appreciate it.
There is an open issue [1] on GitHub to make it more modular and get rid of markdown and syntax highlighting, but I have no hope of rich getting more minimal.
You probably don't want to run this gist directly: depending on the platform, there's a risk of runaway process creation. The process creation code should be guarded by `if __name__ == "__main__"`.
I prefer using a purpose-built tool like GNU parallel. Parallel's only purpose is to run things in parallel and collect the results together. The advantage is you only have to learn to use it once, rather than learn to do this again and again in all the different languages/tools you might use.
I can’t help but notice you have not explained how to handle progress reporting in gnu parallel.
Also, do you really use Parallel from software trying to parallelise its internal workload? Note that in TFA the workers are an implementation detail of a wider program.
> you have not explained how to handle progress reporting in gnu parallel.
`parallel --eta` provides progress reporting over all tasks based upon observed completions.
> in tfa the workers are an implementation detail of a wider program.
TFA could write the workers as standalone processes then subprocess out to GNU Parallel.
At that point, one might not even want the Python outer process in this article. I often write little shell wrappers that establish non-trivial GNU Parallel or Make invocations under the hood. Generally, leaning on common workhorses for multiprocessing seems more sensible than rolling my own one-off.
I don't like the wild use of globals, even if they are "guarded" by locks. And then, oh boy, there are locks!
But, it surely works, so that's nice. It would be cool to have a small lib that solves this nicely :thinking:...
I've used a separate printing thread printing everything from a queue, and had the other threads push everything they want to print to this queue. Is there some advantage to doing it like in the post over the queue method?
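For reference, a sketch of that queue approach (my own illustration): one printer thread owns stdout, workers only q.put() whole records, and a None sentinel shuts it down. One nicety over lock-per-print is that all formatting/cursor movement is centralized in one place.

```python
import threading, queue

def printer(q):
    # Sole owner of stdout: records are serialized by the queue, so no
    # lock around print() and no interleaving mid-record.
    while True:
        msg = q.get()
        if msg is None:          # sentinel: no more output coming
            break
        print(msg)

q = queue.Queue()
t = threading.Thread(target=printer, args=(q,))
t.start()

def work(i):
    q.put(f"worker {i}: done")   # workers never touch stdout directly

workers = [threading.Thread(target=work, args=(i,)) for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
q.put(None)                      # all producers joined; tell printer to exit
t.join()
```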
tqdm has a position parameter which allows offsetting concurrent progress bars. It should work automatically for intra-process concurrency anyway. I don’t know if it works correctly with multiple processes though.
Yeah, my code needs to use multiprocessing, which does not play nicely with tqdm. Thanks for the tip about positions, though; it helped me search more effectively and turn up two promising comments. They're unmerged / require some workarounds, but might just work:
Yeah, frankly, as another commenter noted, I’d probably use the parent process as the orchestrator of IO, in which case tqdm would work fine, for that specific case at least.