

Working simultaneously vs. waiting simultaneously - drjohnson
http://www.yosefk.com/blog/working-simultaneously-vs-waiting-simultaneously.html

======
dkersten
_If you want performance, it's kinda funny to use pure Python and then mourn
the fact that you can't run, on 8 cores, Python code that's 30-50x slower than
C to begin with._

I really dislike this attitude. If Python is much easier, more pleasant and
_safer_ to write than C, then being able to get a ~8x (or even 4x) speedup by
running on an 8 core machine is a perfectly reasonable thing to want to do.

~~~
_yosefk
Well, let's say you can't use numpy or similar - "C written by someone else"
- and further, let's imagine a world where CPython has no GIL. Then there are
2 options:

1. You rewrite things in single-threaded C, getting a speed-up of 30-50x.

2. You rewrite things in multi-threaded Python, getting a speed-up of, say,
4-8x.

I argue that 1 is often actually _safer_ than 2, because parallelism-related
bugs are harder to hunt down than C's memory-overwriting bugs (this can be
mitigated with good APIs and a good race detector working against these APIs -
as in Cilk or
[https://github.com/yosefk/checkedthreads](https://github.com/yosefk/checkedthreads)
- but it still requires good testing coverage).

Now if 2 were more pleasant and safe than 1, I might agree with you even
though 1 gives the better speed-up. But I believe 1 is faster _and_ safer.

~~~
dkersten
It really depends on the problem though. There are plenty of problems which
are reasonably easy to get a parallelism speedup (even if not close to the
_ideal_ speedup) by only rewriting or restructuring small parts of the
program, but where rewriting in C would be a hell of a lot of complex work.

A lot of people write terrible C code with a plethora of hard to find memory
bugs. A lot of other people don't even know C at all. The GIL means that
people can't even try[1] to get some speedup from Python without having to
reach for C.

[1] I know, I know, Python lets you use multiple processes for parallelism. I
also dislike the attitude that this is an adequate solution and that
thread-based parallelism isn't needed, but that's a story for another day.

~~~
sampo
> _" A lot of people write terrible C code with a plethora of hard to find
> memory bugs"_

Well, it wouldn't need to be C. Ada, D, Fortran, Go, Java (Scala?), Lisp,
OCaml, (Rust), or some such would also give speed close to that of C.

~~~
dkersten
Not all of those have easy interop with Python though (since we're talking
about Python here) and it requires people to _know_ one of those languages.

------
justinmk
Rob Pike's "Concurrency is not parallelism"[1] slides enlightened me to the
distinction being made here. I think it's valuable and overdue for the
industry[2] to define the boundaries of these concepts (which are often
regarded as amorphous or synonymous).

[1] [http://blog.golang.org/concurrency-is-not-
parallelism](http://blog.golang.org/concurrency-is-not-parallelism)

[2] or am I conflating "industry" with "me"?

~~~
mkohlmyr
I would guess most programmers understand the distinction between the two.
However, understanding the difference in theory doesn't always mean one
applies it successfully in practice. E.g. if you were unaware of the CPython
GIL you might think threading implies parallelism.
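A minimal sketch of that pitfall (my own example, not from the comment): the two threads below run correctly and concurrently, but on CPython the GIL serializes the pure-Python loop, so you get no CPU parallelism from them.

```python
import threading

N = 100_000
results = [0, 0]

def busy(idx):
    # Pure-Python CPU-bound work; CPython's GIL lets only one
    # thread execute this bytecode at a time.
    total = 0
    for i in range(N):
        total += i
    results[idx] = total

threads = [threading.Thread(target=busy, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Both threads finish with correct results, but wall-clock time is
# roughly the same as running them back to back: concurrency, not
# parallelism.
print(results)
```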

------
rdtsc
I think this is rather confusing and it introduces unnecessary abstractions
-- "working" and "waiting".

It is easier to think of it as:

* concurrency -- a property of the algorithm or the business process where multiple things can happen at the same time. Think of mergesort, for example: you can sort the left and the right parts of the list at the same time. There is no guarantee that when the program executes, concurrent things will actually execute at the same time. Or think of fetching web pages from multiple websites with no need to wait between them.

* parallelism -- a property of the execution environment. Based on the specific hardware, and OS facilities, some of the concurrent things can actually happen at the same time. Can fetch the web pages from 100 sites at the same time. Will actually sort both sides of the array on two separate CPU cores. And so on.
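A sketch of the mergesort example (my own illustration, not from the comment): the structure below expresses the concurrency - both halves _can_ be sorted at once - while the choice of executor and hardware decides how much parallelism you actually get. A thread pool keeps the sketch simple; for pure-Python CPU work you would swap in ProcessPoolExecutor to get real CPU parallelism past the GIL.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def mergesort_two_way(xs):
    # The two halves are independent: that's the concurrency in the
    # algorithm. Whether they actually run at the same time is up to
    # the executor and the hardware (the parallelism).
    mid = len(xs) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        left, right = pool.map(sorted, [xs[:mid], xs[mid:]])
    return list(heapq.merge(left, right))

print(mergesort_two_way([5, 2, 8, 1, 9, 3]))  # → [1, 2, 3, 5, 8, 9]
```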

It is as simple as that. There is some fuzziness around the two - like asking
whether one can have parallelism without concurrency. Maybe your compiler
noticed it could transform your for loop into a SIMD operation. Ok, is that
concurrency and parallelism, or just parallelism? I don't really bother being
that pedantic at that point. Same for questions like whether it is concurrency
only if the user explicitly uses the language's concurrency primitives, or
whether "implicit" (let the compiler figure it out) concurrency also counts.

But, wait, that's not all. There is another axis of comparison that is useful
in practice, and that is CPU vs I/O.

* CPU : The mergesort example above would be an example of mostly CPU concurrency and parallelism. You can spawn two threads to make it explicit, and if you have more than one core, you might get some CPU parallelism.

* I/O : You can fetch 2 pages at once using a thread per page, and if your socket layer works properly you might get good parallelism.

I/O is where the discussion of the GIL comes in. Python has great I/O
parallelism; the GIL doesn't really matter much for it. I have written a
spider like the one I described and it was very performant, very nice, just
using plain old GIL-ified threads.
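A sketch of that kind of I/O parallelism (a hypothetical stand-in, not the commenter's spider): blocking calls release the GIL while they wait - here time.sleep simulates a blocking network read - so many threads can wait at once even in plain CPython.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(url):
    # Stands in for a blocking network read; like real socket I/O,
    # time.sleep releases the GIL while the thread waits.
    time.sleep(0.2)
    return (url, 200)

urls = ["http://example.com/page%d" % i for i in range(10)]

start = time.monotonic()
with ThreadPoolExecutor(max_workers=10) as pool:
    pages = list(pool.map(fake_fetch, urls))
elapsed = time.monotonic() - start

# Ten 0.2s waits overlap: elapsed is close to 0.2s, not 2s, because
# the GIL is released while each thread blocks.
print(len(pages), round(elapsed, 1))
```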

Anyway, it is important to keep these concepts clear and to be aware of them.
They are important today and I feel they are not taught well enough in
academia (not in this explicit, comparative fashion).

~~~
derefr
An interesting property of CPU vs. I/O concurrency, to me, is that while
I/O-heavy concurrency works fine with cooperative or reduction-counting
scheduling, CPU-heavy concurrency basically _requires_ a preemptive scheduling
approach (i.e. multiple native threads.)

A lot of people would assume that on a single-core machine, you could get away
with running, for example, Erlang with only a single native scheduler-thread
for its tasks. This would be true until the first time you called a C-FFI
function that took a full CPU-second to return. Then all the other tasks in
your system would wake up wondering what year it is and how their beards grew
so long. Erlang added "dirty schedulers" (basically, extra scheduler-threads
that will get spawned above-and-beyond the default one-per-core-with-CPU-
affinity set) precisely so the preemption of CPU-heavy tasks could be pushed
off into the OS's capable hands.

~~~
rdtsc
> This would be true until the first time you called a C-FFI function that
> took a full CPU-second to return.

As you said, up until the last release that would be true in general. Even if
you had 16 CPUs, running C NIFs inside it that block for longer than 1ms at a
time would probably be a bad idea.

Word of warning: think really hard before compiling native code (NIF) drivers
into the middle of the Erlang VM. You lose reliability, fault tolerance, and
predictable low-latency response. Good examples of NIFs would be computing a
hash function, or something similar that is constant-time and easily tested -
not something like getting a piece of data from a database.

~~~
derefr
I think the best case for dirty schedulers in Erlang isn't for running CPU-
intensive tasks within the same VM that's doing your IO, though; it's for
running a separate Erlang node, on a separate machine, and sending it CPU-
heavy work to do over the distribution protocol. Effectively, an isolated
Erlang node + dirty schedulers + NIFs is a souped-up, easier-to-code-for C
node.

------
noonat
> Indeed you can hardly do many things at once - not in a single pure Python
> process. One thread doing something takes the GIL and the other thread
> waits.

> You can however wait for many things at once just fine - for example, using
> the multiprocessing module (pool.map), or you could spawn your own thread
> pool to do the same.

This paragraph is a bit misleading. The multiprocessing module does spawn
subprocesses by default, so it can indeed be used to work around a GIL issue
and _do_ many things at once. It's not the same as a thread pool (although the
multiprocessing module does offer a compatible interface which uses threads).
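A sketch of that point (my own illustration): multiprocessing.Pool runs the work in subprocesses, sidestepping the GIL for CPU-bound code, while multiprocessing.dummy exposes the same interface backed by threads.

```python
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool

def square(x):
    # Must be a top-level function so the process pool can pickle it.
    return x * x

def run_demo():
    with Pool(4) as procs:          # subprocesses: CPU work can run in parallel
        a = procs.map(square, range(8))
    with ThreadPool(4) as threads:  # same API, but threads -- GIL still applies
        b = threads.map(square, range(8))
    return a, b

if __name__ == "__main__":
    print(run_demo())
```

Same `map` call both times; only the backing workers differ, which is why the interfaces are compatible but the GIL implications are not.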

~~~
justinmk
> spawn subprocesses by default, so it can indeed be used to work around a GIL
> issue and do many things at once

But that defeats the purpose of the distinction, because when you wait on a
web request, I/O, etc, a machine or peripheral is also doing something on the
other end.

------
cmbaus
This is a really good way to put the distinction. Most applications use
threads because they are waiting on I/O, not because they are doing
concurrent processing.

------
fleitz
ZeroMQ implements this pattern really well.

It's just single threaded code... all the threading happens in the
background...

There's also Apache Storm, which abstracts away all the gory details of
spinning up workers, pools, tracking message failure, etc.

