
How can the GIL, which is restricted to a single process, hinder concurrency? It can only impact one form of concurrency: threading.

Why use threads?

No one does parallel compute on CPUs these days, not since GPGPUs rocked up almost five years ago (and we often use Python as the host language, thanks pyCuda!)
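
For the record, the host-language pattern looks roughly like this with pyCuda's gpuarray module. This is only a sketch, assuming a CUDA-capable GPU and the pycuda package are available; the array sizes are arbitrary:

    import numpy as np
    import pycuda.autoinit              # importing this sets up a CUDA context
    import pycuda.gpuarray as gpuarray

    # Host (CPU) side: ordinary NumPy arrays
    a = np.random.randn(1 << 20).astype(np.float32)
    b = np.random.randn(1 << 20).astype(np.float32)

    # Copy to the device, do the elementwise work on the GPU, copy the result back
    a_gpu = gpuarray.to_gpu(a)
    b_gpu = gpuarray.to_gpu(b)
    result = (a_gpu * b_gpu).get()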

Parallel IO then? Well, except that async IO is often far more resource-efficient (at the cost of some complexity, though).
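
To illustrate the resource-efficiency point: a single event-loop thread can multiplex many connections without a dedicated stack per connection. A minimal asyncio sketch (hostnames and the HTTP request are just placeholders):

    import asyncio

    async def fetch_head(host: str) -> bytes:
        # One coroutine per connection; no dedicated OS thread or stack required
        reader, writer = await asyncio.open_connection(host, 80)
        writer.write(b"HEAD / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n")
        await writer.drain()
        data = await reader.read(1024)
        writer.close()
        await writer.wait_closed()
        return data

    async def main() -> None:
        hosts = ["example.com", "example.org", "example.net"]
        # All connections are multiplexed on a single event-loop thread
        results = await asyncio.gather(*(fetch_head(h) for h in hosts))
        for host, data in zip(hosts, results):
            print(host, len(data), "bytes")

    asyncio.run(main())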

Threading is dead(-ish) because it's hard to write, hard to test, and expensive to get right.

Concurrency in Python is very much alive, though.
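
For completeness, the GIL point in code: it only serializes threads inside one interpreter, so a process pool sidesteps it for CPU-bound work. A minimal sketch using the standard library (the workload is a made-up placeholder):

    from concurrent.futures import ProcessPoolExecutor

    def cpu_bound(n: int) -> int:
        # Pure-Python arithmetic: under threads this would be serialized by the GIL
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        with ProcessPoolExecutor() as pool:
            # Each task runs in its own process, each with its own GIL
            results = list(pool.map(cpu_bound, [10**6] * 8))
        print(results)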



> No one does parallel compute on CPUs these days, not since GPGPUs rocked up almost five years ago

I want to live in your world where all you are processing is vectors and FFTs in parallel on GPUs and not doing real work (accessing databases, processing data from sockets, etc).

Threading is not dead. It's only crippled in Python, so everyone wants to invent ways of saying it is dead.

Threading being hard to write is also a fallacy. I use thread-backed dispatch queues, which make concurrency simple in my language of choice right now. Threading like that is easy thanks to closures and good design patterns. My apps are entirely async and run heavily parallel, and they're easy to maintain and write using that approach.
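
In Python terms, a thread-backed serial dispatch queue is roughly a single-worker executor you submit closures to. A rough sketch of the idea (the queue and the closure are invented for illustration):

    from concurrent.futures import ThreadPoolExecutor

    # A serial "dispatch queue": one worker thread drains submitted closures in order
    serial_queue = ThreadPoolExecutor(max_workers=1)

    counter = 0

    def increment() -> None:
        # No lock needed: only the queue's single worker thread ever touches `counter`
        global counter
        counter += 1

    futures = [serial_queue.submit(increment) for _ in range(1000)]
    for f in futures:
        f.result()                     # block until the queued work has run

    print(counter)                     # 1000
    serial_queue.shutdown()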


Accessing databases and processing data from sockets are not CPU-bound activities; I believe you've misread my post.

For all your IO cases (and all your cases are IO), would you, and future maintainers of your code, not be better served by simpler abstractions that permit scaling past a single host?


> Accessing databases and processing data from sockets are not CPU-bound activities

I work on a product (a network device) which involves all of these activities, and they are all memory-latency bound. The overhead of task switching is far too high to recover any benefit from switching during memory stalls.

To top it off, the product performs a significant amount of computation, almost none of which fits a SIMT GPU model (i.e. there is a lot of branching).

The only performance solution for our product available from today's hardware is CPU parallelism.


I wasn't referring to the IO-bound side of it, but to the everyday general-purpose work that a GPU can't do very well. It's silly to say the answer to going parallel is to throw it on the GPU.

But on the IO side of the debate, many of the C libraries you call are inherently blocking by design. 'gethostbyname', for example, is a blocking call; there is no async version of it. To use such calls without stalling your single-threaded application, you have to make them from worker threads.

The common pattern is to spin up a thread to make the call and do the work there. It's often easier to have your workers be thread-bound like that, to simplify your code, and to lock shared resources only when you need them. I could also make a massively async version of all my code that handles everything with async methods, and in many cases that's better, but it's harder to write and not always an option. This is something I deal with daily, because I run into the C10K problem all the time at work (http://en.wikipedia.org/wiki/C10k_problem).
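
The worker-thread pattern described above, sketched in Python (the pool size and hostnames are arbitrary):

    import socket
    from concurrent.futures import ThreadPoolExecutor

    def resolve(host: str) -> str:
        # socket.gethostbyname blocks in C, but CPython releases the GIL around it,
        # so lookups running on different worker threads genuinely overlap
        return socket.gethostbyname(host)

    hosts = ["example.com", "example.org", "example.net"]

    with ThreadPoolExecutor(max_workers=8) as pool:
        for host, addr in zip(hosts, pool.map(resolve, hosts)):
            print(host, "->", addr)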

Even in the async model, though, I still want to be running code in parallel, and I would still rather build that model with threads powering it, not multiple processes and shared memory.


A GPU, as you know, doesn't exist in isolation; it sits in a multicore host. The loading of input data and the writeback of results do not happen from the GPU, as I suspect you know. Maybe in future, with unified memory, this will be possible, but not on current devices.

The actual computation, the bit that was previously multi-threaded (or, more commonly, multi-process) on a CPU, now lives on the GPU. I'm not sure what's silly. The compute-bound workload is now done on the GPU; the IO workload is still done on the CPU, in an inherently single-threaded fashion. Even when the multi-process computation was done on the CPU, load and store operations were still single-threaded. This stands to reason, since there is no advantage in splitting 500 concurrent host connections into 500 × (number of CPU cores) connections hitting a central data repository...

I can't think of any code off the top of my head that calls gethostbyname repeatedly. Maybe a network server of some description doing reverse lookups for logging purposes? That seems inefficient, though: I can't think of a real-time use case for the hostname when you're already in possession of the IP, only logging/reporting use cases, which would be better served by doing the lookup after the fact / offline.

If that's a valid example of what you're suggesting, then would the existing threaded code not be more efficiently implemented asynchronously? There's a finite limit to the number of threads you can create and schedule for these blocking calls; at some point you will have to introduce an async tactic. At that point, why not drop the threading altogether?
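
For the name-lookup example specifically, the fully async tactic could look like the sketch below. One caveat: asyncio's default resolver still hands getaddrinfo to a small internal thread pool, so "no threads" only holds at the application level:

    import asyncio

    async def resolve_all(hosts):
        loop = asyncio.get_running_loop()
        # getaddrinfo is exposed as a coroutine; application code stays single-threaded
        infos = await asyncio.gather(*(loop.getaddrinfo(h, 80) for h in hosts))
        for host, info in zip(hosts, infos):
            print(host, "->", info[0][4][0])   # first resolved address

    asyncio.run(resolve_all(["example.com", "example.org", "example.net"]))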

You say you would rather build a model on top of threads. Why? Does it make your testing simpler? Does it reduce the time for new starters to get up to speed with your code? Does it reduce the SLOC count? Is it simpler to reason about?

I hope you would agree that, in all these cases and many more, threading is at a significant disadvantage. I stand by the assertion that it's dead(-ish).

The -ish qualifier comes from another case we've not discussed yet!


> No one does parallel compute on CPUs these days, not since GPGPUs rocked up almost five years ago (and we often use Python as the host language, thanks pyCuda!)

You'd be surprised. Furthermore, I call BS.


Who can afford to throw CPUs at parallel compute problems today, with GPGPUs available? The oil and gas industry? Nope, the three biggest are nVidia customers in a big way. Finance? Nope, some of the smaller companies here have stepped past even GPGPUs and are now co-locating FPGAs in the exchanges. Big pharma? Not that I know of; they're also onto GPU clusters in the two big cases I know of there.

So yes, I would be surprised. Surprise me?

Calling BS on what?


Calling BS based on the fact that the VAST majority of people doing this kind of work with Python use NumPy and similar tools, not GPUs.

And it's not the "oil and gas industry" or finance, which might be your area of expertise and might use GPUs, but which are nowhere near being even a large minority of Python use.

It's scientific computing. That is Python's largest niche that needs to solve parallel compute problems.
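
To make the CPU side concrete: NumPy's heavy lifting happens in compiled code that releases the GIL, and a BLAS-backed call such as a matrix multiply will usually fan out across all cores on its own, no GPU required. A sketch (matrix sizes are arbitrary, and the actual threading depends on which BLAS NumPy was built against):

    import numpy as np

    a = np.random.rand(4000, 4000)
    b = np.random.rand(4000, 4000)

    # With OpenBLAS or MKL underneath, this single call typically
    # saturates every CPU core on the machine
    c = a.dot(b)
    print(c.shape)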


GPUs are great for array processing. Not all workloads fit that model. That's why computers still come with a CPU.



