Faster Parallel Python Without Python Multiprocessing (towardsdatascience.com)
93 points by Cyranix on May 31, 2019 | hide | past | favorite | 29 comments


I am a terrible python scriptwriter. Still, I use my own scripts, written through trial and error and reading lots of Stack Overflow. I process about 500 images per day, and each one takes about 30 seconds. Adrian Rosebrock has been a real lifesaver. I have a machine dedicated to this. I tried using multiprocessing once and could not get it to work. Being able to process in parallel would be a game changer for me.

The beauty of python for someone like me is that I can get my job done without actually having to do it. I free up my own time to leverage more of my creativity and have a multiplicative effect on my productivity.

The reason I bring all of this up is that so many of the examples for advanced libraries I see are geared towards seasoned software engineers. The examples use fake arrays of data so that you can "just see" how to use the library. I don't think anyone realizes how confusing this is to the guy who is a restaurant manager, or the gal who is a researcher, who just needs to know how to make this work for them.

If it's an image, how about putting image = cv2.imread(r"C:\image.jpg") or whatever?

Anyway, a bit of a rant but there are people who are very thankful that smart people in this world like yourself write libraries we can use to make our daily lives better. Including example code that is stupid simple would make me so much happier.
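For what it's worth, the stdlib pattern for that workload can be this short. A minimal sketch: the file names are made up, and the commented-out cv2 calls stand in for whatever the real per-image work is:

```python
from multiprocessing import Pool

def process_image(path):
    # Stand-in for the real per-image work, e.g.:
    #   image = cv2.imread(path)
    #   result = cv2.GaussianBlur(image, (5, 5), 0)
    #   cv2.imwrite(path.replace(".jpg", "_out.jpg"), result)
    return path.replace(".jpg", "_out.jpg")

if __name__ == "__main__":
    paths = ["img_%03d.jpg" % i for i in range(500)]
    with Pool() as pool:  # one worker process per CPU core by default
        results = pool.map(process_image, paths)
    print(len(results))
```

Since each image is independent, Pool.map splits the list across worker processes, so 30-second images run on all cores at once.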


While I appreciate the authors' efforts and believe in the long-term mission, they don't seem to mention key shortcomings of Ray anywhere, while marketing it pretty hard (e.g. see the paper).

I used Ray (a year ago) in one of the advertised basic applications: parallelising the environments for RL. It was unusable back then, as it was clogging up the memory.

The Plasma store, which is the backend for Arrow, was never cleaned up, which made the computation stop after 3 hours.

Here’s the issue:

https://github.com/ray-project/ray/issues/2128

Or perhaps this has been fixed already?


Every time I see a Python performance issue/solution, I wonder why no company maintains a distribution with a high-performance Python JIT compiler and patched, GIL-free packages. Given the prevalence of Python and what has been achieved on the JVM, it seems like a fruitful business.


The thing is that there's really little benefit in this. Anything performance-critical will be written as a native extension anyway, even if Python speeds up by 10x. And if you're writing a native extension, you can release the GIL and run as much in parallel as you want.
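A concrete case of that already in the stdlib: CPython's hashlib releases the GIL while hashing buffers larger than about 2 KB, so plain threads get real parallelism for that work. A sketch (the actual speedup depends on core count):

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Four 10 MB buffers to hash.
data = [bytes([i]) * 10_000_000 for i in range(4)]

# hashlib drops the GIL during large updates, so these hashes can run
# on separate cores even though we use threads, not processes.
with ThreadPoolExecutor(max_workers=4) as ex:
    digests = list(ex.map(lambda b: hashlib.sha256(b).hexdigest(), data))

print(len(digests))
```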


Isn’t that a case of the tail wagging the dog? If Python code were 10x faster, the need to write performance sensitive code in a native extension would be a lot lower. There is a cognitive cost in having to switch between languages and knowing when to switch.


It's unfortunate that dynamic interpreted languages with easy uptake for small projects get so popular, and then big, serious systems have to eat the performance penalty for years and years. The world could use a language that is 'easy' like Python/Ruby but fast.

Something like a simplified Nim/F# with type inference, so it feels dynamic as you code but isn't really.

Or, yes, if the normal way to run Python were with a top-notch JIT, that would be a big plus.


It's not necessarily C-fast, and it is dynamically typed, but in the ratio of easiness to performance Elixir seems pretty affable, especially for concurrency. It does tend to balk at heavier stuff like number-crunching, and that's when I see people start to use Rust NIFs.


Elixir also has an easy-to-use environment with mix.


> The world could use a language that is 'easy' like python/ruby but fast.

IMHO, Julia is that language: https://julialang.org/


> If Python code were 10x faster

That's a pretty big "if". Unfortunately, making Python a lot faster is difficult because Python is such a dynamic language. The PyPy folks have done their best, with a respectable amount of success.


> That's a pretty big "if".

The 10x figure I was quoting came from the comment I was responding to. I do think people need to look at what was accomplished by the Strongtalk team and various other projects dealing with dynamic languages, including some Lisp implementations. I'm pretty sure Python can be made a lot faster, but the legacy code is going to be a big, big problem.


> various other projects dealing with dynamic languages

Not all dynamic languages are born the same.

> I'm pretty sure Python can be a lot faster

You might like to join the PyPy team then and find out.


You missed the "but the legacy code is going to be a big, big problem" part. If, as other posters point out, those C extensions are not designed to work without the GIL, then it's going to be a 2->3 situation.

Frankly, I honor the Python folks and all their achievements, but I really dislike the language. I'm mostly researching another paradigm.


Most of the time, if performance is a top priority, you'll choose a different language to begin with.

It's nice to have the option to speed up python in certain cases though.


One problem is that, because of the GIL, many Python libraries don't use locking on their data, etc., because they don't need to. That makes them thread-unsafe without the GIL.


If you look more closely, it often makes them thread-unsafe even with the GIL. Take a look some time at what operations the GIL can pre-empt a Python thread in the middle of.
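The dis module makes the granularity visible: even `n += 1` compiles to several bytecode instructions (load, add, store), and the GIL only guarantees atomicity of individual instructions, so a thread switch can land between them:

```python
import dis

# "n += 1" is not one atomic step: it is a load, an add, and a store,
# and the interpreter may switch threads between any two of them.
ops = [ins.opname for ins in dis.get_instructions("n += 1")]
print(ops)
```

The exact opnames vary by CPython version (e.g. INPLACE_ADD vs. BINARY_OP), but the increment is always multiple instructions, which is why two threads doing `n += 1` can lose updates even under the GIL.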


You mean libraries with C extensions?


No, even libraries without. Picking something at random from the Python 3.7 standard library: collections.OrderedDict.__setitem__ doesn't look thread-safe when it updates its linked list. EDIT: well, in fact, the data race in that one is there whether there is a GIL or not...


Just FYI: there are few remaining use cases for OrderedDict, now that the built-in dict keeps insertion order.
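True since CPython 3.6 as an implementation detail, and guaranteed by the language from 3.7. A quick check:

```python
# The built-in dict preserves insertion order (guaranteed since 3.7).
d = {}
d["b"] = 2
d["a"] = 1
print(list(d))  # ['b', 'a'] -- insertion order, not sorted order
```

OrderedDict still differs in a couple of ways: its == comparison is order-sensitive, and it has move_to_end().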


Yes, getting rid of the GIL breaks most C extensions, many of which are very important.


There was/is Stackless, with a(n initial?) focus on game engines, IIRC:

https://github.com/stackless-dev/stackless/wiki

It got less attention than PyPy, but might be more pragmatic w.r.t. parallel performance. There's also some work in PyPy to remove the GIL, but I'm not sure if there's been any news on that lately.


Stackless is wonderful, but it's for concurrency, not parallelism. A long time ago when I checked it out, it was faster than Go at concurrency but not at parallelism.


Because Python is not typed?


Python is strongly typed, but dynamically typed.


From a type-theoretic point of view, Python is unityped: https://existentialtype.wordpress.com/2011/03/19/dynamic-lan...

Python has a lot of merit as an easy-to-use language, a modern BASIC, but "strongly typed" is marketing that confuses rather than enlightens (to quote Prof. Harper). The term "strongly tagged" is perhaps more apt, but sadly that ship has sailed.


So what you and Harper are saying is that Python has only one static type? While that is sort of true, I can't see how it leads anywhere. Many static languages have no dynamic types; I think that's worse.

"Strongly typed" used to mean safe, as in you can't add strings to ints or reinterpret values, as opposed to C and others where anything goes. I think that's a useful distinction.
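That sense of "strong" is easy to demonstrate: Python raises an error instead of silently coercing (compare JavaScript, where "1" + 2 gives "12"):

```python
# Python refuses to coerce between unrelated types.
try:
    "1" + 2
except TypeError as e:
    print(e)  # e.g. can only concatenate str (not "int") to str
```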

Imagine all that lovely energy poured into understanding the compromises involved in designing realistic type systems. But it's messy, and you don't become famous by dealing with messy problems.


> Many static languages have no dynamic types, I think that's worse.

I can't think of a static language that doesn't support a dynamic type; it's just that most require explicit casts or unpacking to use it with existing typed interfaces, e.g. "Object" in Java.

> Strongly typed used to mean safe

It still does, it's just that type theory doesn't distinguish between a runtime failure with a nice error message versus a segfault, both are programs that "go wrong".

I would be in favour of "strong dynamic types" instead of "strongly typed".


I don't know enough Haskell or Rust to claim it's impossible (outside of unsafe, of course), but it clearly goes against the design of the language and is meant as a fallback more than an option.

Yes, you can write Java using nothing but Object. And no, the experience won't be comparable to using a dynamically typed language.

The map is not the terrain; if theory doesn't match reality, then changing the former is the only thing that makes sense.

Categories are generalizations; making them more specific weakens them. A RISC approach works better in my experience, where categories are kept trivial and members belong to multiple categories.


I love Python, but the best way to do parallel Python is to use Go.



