
If you already know Python, the advice in this article is certainly a lot easier and more actionable than "just learn Go or Rust or Zig instead".



Certainly. My point is that if you need to write that much code and/or do that much research, at some point the effort of doing it in another language becomes less than that of insisting on a tool that isn't designed for the job.

It happened to me and to many of my former colleagues.

Though obviously, everyone decides for themselves when that point comes -- or whether it comes at all.


The point of the article is a handful of lines. The rest is accoutrement like the URL list and timing code. But sure, if

    tasks = {}
    for url in URLs:
        future = executor.submit(fetch_url, url)
        tasks[future] = url

bothers you, this is perfectly Pythonic (some would say even more so than the original):

    tasks = {executor.submit(fetch_url, url): url for url in URLs}
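
For context, a minimal sketch of the full pattern being discussed (fetch_url and URLs are assumed to come from the article's code):

    from concurrent.futures import ThreadPoolExecutor, as_completed

    # fetch_url(url) -> title and URLs are assumed from the article.
    with ThreadPoolExecutor() as executor:
        tasks = {executor.submit(fetch_url, url): url for url in URLs}
        # as_completed yields futures in completion order; the dict
        # maps each future back to its URL.
        for future in as_completed(tasks):
            print(f"URL: {tasks[future]}\nTitle: {future.result()}")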


I have found another way in the documentation for `concurrent.futures`. You can use `Executor.map` (https://docs.python.org/3/library/concurrent.futures.html#co...). It eliminates the need to wait on the futures explicitly.

    from concurrent.futures import ThreadPoolExecutor

    def main():
        with ThreadPoolExecutor(max_workers=len(URLs)) as executor:
            for url, title in zip(URLs, executor.map(fetch_url, URLs)):
                print(f"URL: {url}\nTitle: {title}")

The default value of `max_workers` since Python 3.8 has been

    min(32, os.cpu_count() + 4)

You should probably avoid

    max_workers=len(items_to_process)

It will not save memory or CPU time when you have few items (workers are created as necessary) and may waste memory when you have many.
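
Put differently, leaving `max_workers` unset is usually the right call. A minimal sketch, again assuming fetch_url and URLs from the article:

    from concurrent.futures import ThreadPoolExecutor

    # No max_workers argument: defaults to min(32, os.cpu_count() + 4)
    # on Python 3.8+. fetch_url and URLs are assumed from the article.
    with ThreadPoolExecutor() as executor:
        for url, title in zip(URLs, executor.map(fetch_url, URLs)):
            print(f"URL: {url}\nTitle: {title}")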


As a side note, using a future as a map key struck me as a bit weird, though perfectly valid. It'd be more natural IMO to use a list for the futures and have the fetch_url function return a (url, result) tuple, or to use the url as the map key and just iterate over the map items instead of calling as_completed on the keys.
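
A rough sketch of the first variant (fetch_with_url is a hypothetical wrapper; fetch_url and URLs are assumed from the article):

    from concurrent.futures import ThreadPoolExecutor, as_completed

    # Hypothetical wrapper so each result carries its URL along.
    def fetch_with_url(url):
        return url, fetch_url(url)

    with ThreadPoolExecutor() as executor:
        futures = [executor.submit(fetch_with_url, url) for url in URLs]
        for future in as_completed(futures):
            url, title = future.result()
            print(f"URL: {url}\nTitle: {title}")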


What “much research” are you talking about?

The amusing part is that the article explicitly calls out two groups of people, and your advice falls squarely into both.

It’s not that much code; it’s about four lines: creating a “pool” and waiting on the future objects.

This is a perfect solution for Python developers who have been perfectly happy using Django for years, and just need to scrape some API or download multiple files.

No, they shouldn’t switch to a different language the moment they need to optimize something embarrassingly parallel; they can first check whether a simple stdlib solution is enough, and most likely move on.


If this is too much research for you, wait until you have to deal with the many problems of Go channels in the real world. (Reasonably well-known though controversial article: [1]) Don't even get me started on Rust. Concurrency and parallelism are hard.

Yes, I've written a shit ton of code in all of the aforementioned languages.

[1] https://www.jtolio.com/2016/03/go-channels-are-bad-and-you-s...


> and/or do that much research,

Is reading the official docs section on concurrency lots of research?



