Hacker News

I'd like to point out that the Python standard library offers an abstraction over threads and processes that simplifies the kind of concurrent work described in the article: https://docs.python.org/dev/library/concurrent.futures.html

You can write the threaded example as:

  import concurrent.futures
  import itertools
  import random

  def generate_random(count):
    return [random.random() for _ in range(count)]

  if __name__ == "__main__":
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
      executor.submit(generate_random, 10000000)
      executor.submit(generate_random, 10000000)
    # I guess we don't care about the results...
Changing this to use multiple processes instead of multiple threads is just a matter of s/ThreadPoolExecutor/ProcessPoolExecutor.

You can also write this more idiomatically (and collect the combined results) as:

  if __name__ == "__main__":
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
      out_list = list(
        executor.map(lambda _: random.random(), range(20000000)))
In this example, it will actually be quite a bit slower, because each work item (generating a single random number) is trivial compared to the overhead of maintaining a work queue of 20,000,000 items - but in the more typical case, where each item takes more than a millisecond or so, it is better to let the executor manage the division of labour.



To take advantage of process level parallelism you still have to have a pickle-able function, i.e. defined at the top level in a module.

  In [1]: import concurrent.futures

  In [2]: with concurrent.futures.ProcessPoolExecutor(max_workers=8) as executor:
     ...:     out_list = list(executor.map(lambda _: random.random(), range(1000000)))
     ...:
  Traceback (most recent call last):
    File "/usr/local/Cellar/python/2.7.6_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/queues.py", line 266, in _feed
      send(obj)
  PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed


Good point. A couple of notes on futures: 1) they're backported to Python 2 [1], and 2) to make the example work you need a picklable function, as you say. For example, if you have IPython running in a virtualenv:

    import pip
    pip.main(["install","futures"])

    import random

    def l(_):
        return random.random()

    with f.ProcessPoolExecutor(max_workers=4) as ex:
        out_list = list(ex.map(l, range(1000)))

    len(out_list)
    #> 1000
[1] https://pypi.python.org/pypi/futures


Whoops, forgot to add a line to import futures:

    import futures as f #Include after pip.main(...


Wow - that is significantly more elegant than what I discussed in the article!

I wasn't aware of the concurrent.futures library, thanks for pointing it out.


concurrent.futures is nice, but it's a real shame that ThreadPoolExecutor doesn't take an initializer argument like multiprocessing.Pool does; e.g., if you want a bunch of processes to work on a big data file, it's convenient to have all workers load that file once at initialization. See https://code.google.com/p/pythonfutures/issues/detail?id=11


You should probably file your bugs here: http://bugs.python.org/ with the expectation that fixes will be backported.




