
Threads Basics (2009) - timClicks
http://www.hpl.hp.com/techreports/2009/HPL-2009-259.html
======
vnorilo
This is all good, but I want to point out that threads are not the unsafe bit,
the unsafe bit is _shared mutable state_. If you can use a model where that is
not a problem (data flow arch, STM), threads cease to be scary.

~~~
gpderetta
Someone has to implement the data flow engine or the STM. The article is about
the low-level details (read: the hardware memory model and basic
synchronisation primitives) required to do so.

~~~
vnorilo
sure! I was just reacting to the general threading angst in the comments.

------
gpderetta
I knew it was Hans Boehm just from the hp.com domain.

For those who do not know him, he is the co-author of the Boehm garbage
collector and one of the primary authors of the C++11/C11 memory model.

~~~
agumonkey
Pretty career:

        Google 
        HP Labs (Researcher, Research Manager) 
        SGI (Software Engineer) 
        Xerox PARC (Researcher)

~~~
gpderetta
IIRC when he was at SGI, he worked very closely with Alexander Stepanov on the
STL.

Edit: right, he implemented the rope data structure in the original STL, which
never made it into the standard.

~~~
agumonkey
Oh yeah, I found the rope paper online

------
taneq
Basic rules for multithreading:

1) Don't. You don't need to.

2) Still don't.

3) If you really, really do need threads, keep it as simple as possible and
make sure that every access that needs to be synchronized, is synchronized.

I've been programming for a long time now and I've seen precious few cases
where multithreading is legitimately a better option, let alone the only
option. Usually even then it's just about using more cores.

~~~
sureaboutthis
So don't, unless you need to, and then it's OK?

~~~
taneq
Yes, but in many, many cases where people think they need to, there's actually
a simpler and safer way to do it. Too many programmers see a long-running task
or an unresponsive UI and immediately declare that threads are the answer,
when they're almost always not.

Admittedly they're much more likely to be useful now, with so many spare cores
on every machine, but they're still a hammer when most problems are screws.

~~~
opencl
How is anything other than threads supposed to be a solution to long-running
tasks blocking the UI thread?

~~~
monocasa
Which long running tasks?

Blocking IO? Switch to non blocking and events.

Long running computationally expensive work? Break it into small jobs and have
your job queue be a priority queue.

At that point you're correct regardless of whether you've got one thread or
many servicing the queue. And generally the best way to take advantage of
multiple cores is to have it be totally agnostic to the number of cores it's
running on, like this, anyway.

------
agumonkey
I now understand the appeal of clojure (or FP concurrency):

- can't mutate

- except for automatically synchronized entities

have fun

------
jammygit
My favourite intro to threads is just the Modern Operating Systems text.
Fantastic author

------
dragontamer
Two big developments in recent years are:

1. SIMD "threads" being programmed with traditional programming languages
(i.e. CUDA C++). These are "false threads": they run in groups of 32 (NVIDIA)
or 64 (AMD) and have unique characteristics compared to traditional threads.

2. The push for "task-based" parallelism. Instead of writing threads, you
should write tasks. Tasks are then run on a thread pool. The difference being:
tasks don't always spin up a new thread. You track "dependencies" between
tasks to minimize communication and synchronization instead. Tasks are both
more efficient than threads AND easier to reason about. The crazy example is
Intel's "Inefficient Fibonacci":
[https://software.intel.com/en-us/node/506102](https://software.intel.com/en-us/node/506102)
. The task-based Fibonacci is both efficient and simple. Threads simply
cannot compete.

-------------------

It turns out that threads themselves are both inefficient and complex. Pure
performance pushes us towards SIMD / GPGPU compute. Even on CPUs, the fastest
computational model is AVX or AVX512.

Simplicity pushes us towards the tasking model, which happens to be more
efficient in practice anyway. It's far cheaper for thread pools to swap tasks
than to spin up threads for the scheduler to pass around and manage.

Classical thread-based programming seems like a dead end to me. It's too
complex and doesn't give enough returns on modern architectures. The task
model is built on top of threads, but I expect most programmers to switch to
the tasking model, leaving "threads" to the OS devs or systems-level devs far
below.

The SIMD model gets everything done efficiently with barriers in most cases
(no need for semaphores or mutexes... indeed, mutexes don't work on
traditional SIMD GPUs like Pascal or AMD Vega, due to how SIMD wavefronts
execute). Other issues are handled with "CUDA Shared Memory" or "OpenCL Local
Memory" and atomics. It's weird to lose the mutex and semaphore as tools, but
you learn that you really didn't need them once you learn to use barriers and
atomics effectively.

The SIMD model gets a LOT done with atomics, popcount, ballot, and various
primitives that are alien to CPU programmers. Watch a "Merge Path"-based
parallel merge sort on a GPU if you wanna see what I mean. It's very alien,
but it works extremely efficiently.

Tasks similarly have a different model compared to Threads. They're the
natural successor. Almost everything I can think of can be more elegantly
expressed as a Task instead of as a Thread. That'd be OpenMP 4.0, Intel TBB,
or Microsoft PPL.

