
So what is the consensus view on how to do parallelism in Python if you just have something that is embarrassingly parallel, with no communication between processes necessary?



People here mention Pool, and I've seen it many times. It's this: https://docs.python.org/3/library/multiprocessing.html#intro...

  from multiprocessing import Pool

  def f(x):
      return x*x

  if __name__ == '__main__':
      with Pool(5) as p:
          print(p.map(f, [1, 2, 3]))
This starts a pool of 5 worker processes, and f(x) runs in parallel across them. Inputs and outputs are sent between the parent and the workers via pickling.
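
Since everything crossing the process boundary is pickled, the mapped function has to live at module top level, and per-item pickling overhead adds up on large inputs. A minimal sketch of one common mitigation, passing a chunksize so each round-trip carries a batch of items (the function name and the specific numbers here are arbitrary assumptions):

  from multiprocessing import Pool

  def square(x):
      # Top-level function: it must be picklable to reach the workers.
      return x * x

  if __name__ == '__main__':
      with Pool() as p:  # defaults to os.cpu_count() workers
          # chunksize batches inputs so each pickle round-trip
          # carries many items instead of one.
          results = p.map(square, range(100_000), chunksize=1_000)
      print(results[:5])  # [0, 1, 4, 9, 16]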


If you have a task that is easy to split: split it into N subsets, write a Python script that runs on one subset, and have each process write its own output. Once they all complete, join the outputs together. https://docs.dask.org/en/stable/ may be a good start if you want a framework. I don't think there's a consensus; it depends on the problem.
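
A minimal sketch of that split-and-join pattern (the process_chunk worker, the file names, and N=4 are all illustrative assumptions, not from the comment above):

  from multiprocessing import Pool

  def process_chunk(args):
      # Hypothetical worker: handles one subset and writes its own output file.
      chunk_id, items = args
      path = f'out_{chunk_id}.txt'
      with open(path, 'w') as f:
          for x in items:
              f.write(f'{x * x}\n')
      return path

  if __name__ == '__main__':
      N = 4
      data = list(range(100))
      chunks = [data[i::N] for i in range(N)]  # split into N subsets
      with Pool(N) as p:
          paths = p.map(process_chunk, list(enumerate(chunks)))
      # Join the per-process outputs once all workers have finished.
      with open('combined.txt', 'w') as out:
          for path in paths:
              with open(path) as part:
                  out.write(part.read())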



