
> For those, Python actually comes with pretty decent tools: the pool executors.

Delusion level: max.

You have to be in a very, very bad place for this marginal improvement over the absolute horror show that bare Process offers to seem "pretty decent".

Python doesn't have good tools for parallelism / concurrency. It doesn't have average tools. It doesn't have even bad tools. It has the worst. Though, unfortunately, it's not the only language in this category :(




> It doesn't have even bad tools. It has the worst.

> It's not the only language in this category

Soo....not the worst? :) Or tied for it?

What do you find difficult/wrong with pool executors?

Also, you reference "Process", but FYI the article talks about multiple threads, not multiple processes.


Pool executors only solve one kind of use case. They aren't a general solution to concurrency+parallelism.

And they're still the worst version of this pattern, because despite using multiple OS-level threads with all the associated overhead, the GIL prevents most of the real parallelism from happening. And if you want full parallelism, you have to use multiprocessing.Pool, which adds pickling overhead and incompatibility.
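The pickling restriction can be seen without starting any processes. This is a minimal sketch, assuming Python 3; `can_pickle` is a hypothetical helper written for illustration, not part of any library. A process pool has to pickle the callable and its arguments to send them to a worker, while a thread pool shares memory and accepts any callable:

```python
import math
import pickle
from concurrent.futures import ThreadPoolExecutor


def can_pickle(obj):
    """Hypothetical helper: report whether pickle can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


# Threads share memory, so a thread pool accepts any callable, lambdas included.
with ThreadPoolExecutor(max_workers=2) as pool:
    thread_results = list(pool.map(lambda x: x * x, range(4)))

# Process pools must pickle the callable to ship it to a worker process.
# A function that can be imported by name pickles fine; a lambda does not.
importable_ok = can_pickle(math.sqrt)
lambda_ok = can_pickle(lambda x: x * x)

print(thread_results, importable_ok, lambda_ok)
```

With the standard pickle module, lambdas, nested functions, and objects holding OS resources such as locks or open files all fail this check, which is where the "incompatibility" tends to show up in practice.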


> Soo....not the worst? :)

Yeah... I know, it's hard to imagine that there could be more than one worst. But, as I have to practice these things with my 4 year old, I become more patient with adults who don't get the concept too.

Imagine you are in a class and the teacher gives everyone a pencil and a sheet of paper. Now, you want to find out who has the shortest pencil. All the students compare their pencils, and it turns out that several pencils are exactly the same length, and those also happen to be the shortest ones. So, more than one student has the shortest pencil.

But it doesn't end there. Not all sets which define a "greater than" relationship are totally ordered. In such sets it's possible to have multiple different minimal elements, that is, elements with nothing smaller than them. Trivially, in a set where no two elements are comparable, every element is minimal.

> What do you find difficult/wrong with pool executors?

Difficult? -- I don't know.

Wrong? -- Well, it's pretty worthless... does it make it wrong? -- That's up to you to decide.

The idea of threads is bad for many reasons: one in particular is how exceptions in threads are handled. But this isn't unique to Python. Python just made a bad decision to use threads in a language that's supposed to be "safe". Python's thread implementation craps its pants when dealing with many aspects of threads. For example, thread-local variables. Since threads are objects in Python, you'd expect local variables to be properties on those objects... well, the mechanism to use them is just idiotic and nothing like you would expect. When it comes to interacting with "native" code from Python, you'd expect some interaction with Python's scheduler so that the native code can portion its own execution, allow Python to interrupt it, etc., but there's nothing of the kind.
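For readers who haven't run into these two points, here is a minimal sketch of the standard-library behavior being described, assuming Python 3.8 or later (for `threading.excepthook`). The names `worker`, `failing`, `seen`, and `hook_calls` are made up for illustration. It shows that thread-local storage lives in a separate `threading.local()` object rather than on the `Thread`, and that an exception in a bare `Thread` does not reach the code that calls `join()`, whereas a pool future re-raises it from `result()`:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Thread-local storage is a separate object, not an attribute of Thread.
local_data = threading.local()
seen = {}


def worker(name):
    local_data.value = name          # each thread sees its own "value"
    seen[name] = local_data.value


threads = [threading.Thread(target=worker, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread never set "value", so it does not see the workers' values.
main_has_value = hasattr(local_data, "value")


def failing():
    raise ValueError("boom")


# An exception in a bare Thread goes to threading.excepthook;
# join() returns normally and the caller is not told anything failed.
hook_calls = []
previous_hook = threading.excepthook
threading.excepthook = lambda args: hook_calls.append(args.exc_type)
try:
    t = threading.Thread(target=failing)
    t.start()
    t.join()
finally:
    threading.excepthook = previous_hook

# A pool future stores the exception and re-raises it from result().
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(failing)
    try:
        future.result()
        reraised = False
    except ValueError:
        reraised = True
```

Whether the future-based behavior counts as an improvement or as more machinery on top of the same threads is the point under dispute here; the sketch only shows what the standard library does.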

Even though we haven't even gotten to the pools yet, pools, obviously, don't address any of the thread-related problems. If anything, they only amplify them. Specifically, the pool from the concurrent.futures package is worse than its relative from the multiprocessing package because it uses "futures". The whole idea of "futures" is somehow broken in Python because of the never-ending bugs related to deadlocking. It's been repeatedly "fixed", but every now and then deadlocks still happen. Here's the latest one I know of: https://bugs.python.org/issue46464 .

I once went down the rabbit hole of trying to make a native module work with Python threads... there's no good way to do it, and the pools, be they from concurrent.futures or from multiprocessing, are both very bad for many reasons. I was hoping to give users the ability to control how parallel my native code is through the tools Python already exposes, but that turned out to be such a disaster that I've given up on the idea. Python's thread wrappers are worthless for native code that wants to actually execute concurrently -- they are only designed to execute Python code, non-concurrently. Like I already mentioned, Python has no infrastructure to communicate its scheduling decisions to the native code, no thread-safety in memory allocation, the code is overall poorly written (as in missing const, other imprecise typing, memory-inefficient data structures)... there are no benefits to using that vs rolling your own. Only a struggle with bad decisions.


If it's the worst, how is it not the only language in this category?

How do you rank C, Perl, JavaScript, PHP, ... parallelism compared to execution pool + futures here? The absolute MAX WORST?


It's possible to have more than one worst. In a totally-ordered collection this happens if you have two or more equal elements which happen to be worse than or equal to any other element. In a partially-ordered collection, there could be groups of elements that are not comparable with each other, and so you will potentially have multiple distinct worst elements.

Trivially, in a collection that has no "worse than" relation you can define one that doesn't compare them at all, and declares them all "incomparable" -- which, again, would make them all worst.
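The three cases can be checked mechanically. This is a small sketch; `minimal_elements` and `strictly_divides` are hypothetical helpers written for illustration. It treats an element as "worst" (minimal) when no other element is strictly below it, which is the sense in which there can be several:

```python
def minimal_elements(items, less_than):
    """Return every item that has no other item strictly below it."""
    return [x for x in items if not any(less_than(y, x) for y in items)]


def strictly_divides(a, b):
    """Partial order on positive integers: a is below b if a divides b."""
    return a != b and b % a == 0


# Total order with ties: both copies of the lowest value are minimal.
ties = minimal_elements([3, 1, 2, 1], lambda a, b: a < b)

# Partial order: 2 and 3 are not comparable, so both are minimal.
partial = minimal_elements([2, 3, 4, 6, 9, 12], strictly_divides)

# Empty relation: nothing is below anything, so every element is minimal.
unordered = minimal_elements(["x", "y", "z"], lambda a, b: False)

print(ties, partial, unordered)
```

In order-theory terms these are minimal elements; a single least element, when one exists, is unique up to equality.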

Bonus question: can you imagine a collection where there is no worst element?

> How do you rank C, Perl, JavaScript, PHP

Well, none of these languages have their own parallelism / concurrency aspect. (Except Perl 5 maybe? I'm not really familiar with the language). They all rely on the system running them to do the parallelism.

So... all of these will go roughly into the same bin as Python?

Some languages have libraries that would allow them to do better (eg. you have PThreads in C), but that's not the function of the language.


Well, an execution pool doesn't really even do parallelism, just concurrency for the most part (thanks, GIL). And JavaScript handles concurrency far better than Python; its event loop is designed for just that. JS and Py can also use subprocesses for true parallelism.
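To make the "concurrency, not parallelism" distinction concrete, here is a rough sketch assuming a standard CPython build with the GIL; `fake_io` is a made-up stand-in for a blocking call. Waiting releases the GIL, so four 0.2-second waits overlap in a thread pool instead of adding up to 0.8 seconds, while pure-Python CPU work in the same pool would not get that kind of speedup:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(i):
    """Made-up stand-in for blocking I/O; sleeping releases the GIL."""
    time.sleep(0.2)
    return i


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_io, range(4)))
elapsed = time.perf_counter() - start

# The waits overlap, so elapsed is close to one sleep rather than four.
print(results, round(elapsed, 2))
```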

C and Java threads are better than Python's because, uh, they can actually run in parallel. Rust adds convenience and safety on top, plus its own event loops. Golang has goroutines. Erlang has some very powerful solution that I don't remember.

IDK about PHP and Perl, barely touched them. Maybe they're worse than Python for this. Everything else isn't. Python was not originally built with these use cases in mind, which is totally fine, but I'm not going to pick Python if I'm doing complex concurrency/parallelism. For simple process pools, Python is good enough.


While I'm painting with a broad brush, I'll guess that the parent divides languages into two categories: "Go" and "the worst".


Not really.

There are languages which put at least some effort into parallelism / concurrency (and Go would be one of those along with Java, Erlang, Ada, Clojure, even C++ to some extent).

Then there are languages which outsource everything to the system: eg. Lua, Ruby. They have a way in the language to make a system call, and so if the system can create multiple processes or multiple threads, they can use that.

There are languages that have no way to do even that. For example JavaScript, XSLT or SQL. Surprisingly, a lot of these handle concurrency very well in their runtimes due to automatic parallelization performed by the runtime (not the language).

Python is the language that has neither design nor discernible goals. It has some parallelism in the language, but it's lacking important components which are then either outsourced to the system, or aren't there at all. Because of the randomness of the "design decisions", Python also cannot be reliably parallelized automatically, nor do the developers have reliable tools for building parallel applications, especially not in a modular way, because different modules may not agree on how to go about parallelization.

Python has always been a language where you need to be really knowledgeable about things outside of Python and about Python's own implementation details to get ahead. If all you knew was Python, you'd do very poorly. This is in contrast to languages like Java, which put a great deal of attention towards making sure that even the dumbest programmer will not screw up too much.

Now the people who know how to use Python well are gone, and the language is gradually transforming into Java. But it still has a very long road ahead before it can do enough hand-holding for the losers. Parallelism is one of those things where the goals are very far off and, so far, mostly unattainable.



