
So what is the consensus view on how to do parallelism in Python if you just have something that is embarrassingly parallel, with no communication between processes necessary?



People here mention Pool, and I've seen it many times. It's this: https://docs.python.org/3/library/multiprocessing.html#intro...

  from multiprocessing import Pool

  def f(x):
      return x*x

  if __name__ == '__main__':
      with Pool(5) as p:
          print(p.map(f, [1, 2, 3]))
This starts a pool of 5 worker processes, and f(x) runs in parallel across them. Inputs and outputs are sent between the parent and the workers via pickling.
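
Since everything crossing the process boundary is pickled, the mapped function has to live at module top level, and per-item pickling overhead adds up on large inputs. A minimal sketch of one common mitigation, passing a chunksize so each round-trip carries a batch of items (the function name and the specific numbers here are arbitrary assumptions):

  from multiprocessing import Pool

  def square(x):
      # Top-level function: it must be picklable to reach the workers.
      return x * x

  if __name__ == '__main__':
      with Pool() as p:  # defaults to os.cpu_count() workers
          # chunksize batches inputs so each pickle round-trip
          # carries many items instead of one.
          results = p.map(square, range(100_000), chunksize=1_000)
      print(results[:5])  # [0, 1, 4, 9, 16]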


If you have a task that is easy to split: split it into N subsets, write a Python script that runs on one subset, and have each process write its own output. Once they all complete, join the outputs together. https://docs.dask.org/en/stable/ may be a good start if you want a framework. I don't think there's a consensus; it depends on the problem.
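
A minimal sketch of that split-and-join pattern (the process_chunk worker, the file names, and N=4 are all illustrative assumptions, not from the comment above):

  from multiprocessing import Pool

  def process_chunk(args):
      # Hypothetical worker: handles one subset and writes its own output file.
      chunk_id, items = args
      path = f'out_{chunk_id}.txt'
      with open(path, 'w') as f:
          for x in items:
              f.write(f'{x * x}\n')
      return path

  if __name__ == '__main__':
      N = 4
      data = list(range(100))
      chunks = [data[i::N] for i in range(N)]  # split into N subsets
      with Pool(N) as p:
          paths = p.map(process_chunk, list(enumerate(chunks)))
      # Join the per-process outputs once all workers have finished.
      with open('combined.txt', 'w') as out:
          for path in paths:
              with open(path) as part:
                  out.write(part.read())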



