

Checkedthreads: bug-free shared memory parallelism - suraj
http://www.yosefk.com/blog/checkedthreads-bug-free-shared-memory-parallelism.html

======
zeteo
>In a fork/join program, you need just two orders: run all loops from 0 to N,
and run them all backwards, from N to 0.

Does this hold for the following code:

for (j = 0; j < 3; j++) a = j % 2;

In both forward (0, 1, 2) and backward (2, 1, 0) order, the final value of a
is 0. But if the j==1 iteration runs last, a ends up as 1 instead.
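
To make the three orders concrete, here is a minimal sketch. `final_a` is a hypothetical helper written for this comment, not part of checkedthreads; it replays the loop body in whatever iteration order it is given.

```c
#include <stddef.h>

/* Hypothetical helper: run the loop body `a = j % 2` once for each j,
   in the given order, and return the final value of a. */
int final_a(const int *order, size_t n)
{
    int a = -1;  /* sentinel: no iteration has run yet */
    for (size_t i = 0; i < n; i++) {
        int j = order[i];
        a = j % 2;
    }
    return a;
}
```

Called with the orders {0, 1, 2} and {2, 1, 0} it returns 0 both times; with {0, 2, 1} it returns 1.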

~~~
_yosefk
Depending on how you look at it :)

* It does work in the sense of reordering every pair of instructions that could ever run in parallel.

* It doesn't work in the sense that the results don't change, so the race goes unnoticed; this is true for a lot of more likely programs too - for instance, a+=arr[j], and many others.
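
The a+=arr[j] case can be sketched as follows. Both function names are made up for illustration. The first runs whole iterations forwards or backwards; the second hand-simulates two iterations whose load/add/store sequences overlap, which is the interleaving that whole-iteration reordering never produces.

```c
#include <stddef.h>

/* Run a += arr[j] for every j, forwards or backwards. Each iteration
   completes before the next one starts. */
int sum_in_order(const int *arr, size_t n, int backwards)
{
    int a = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j = backwards ? n - 1 - i : i;
        a += arr[j];
    }
    return a;
}

/* Simulate iterations j1 and j2 racing on a: both load a before
   either one stores its result. */
int sum_with_lost_update(const int *arr, size_t j1, size_t j2)
{
    int a = 0;
    int t1 = a;          /* iteration j1 loads a */
    int t2 = a;          /* iteration j2 loads a */
    a = t1 + arr[j1];    /* j1 stores */
    a = t2 + arr[j2];    /* j2 stores, overwriting j1's update */
    return a;
}
```

For arr = {1, 2, 3}, both orders of `sum_in_order` give 6, so comparing results tells you nothing, while `sum_with_lost_update(arr, 0, 1)` gives 2 because the first update is lost.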

Reordering every pair of independent instructions doesn't guarantee that
you'll find all the races; that's why checkedthreads has the Valgrind tool. It
does find a whole lot of bugs very quickly, and it's also needed for the
Valgrind tool to actually cover all the races - but it's not sufficient by
itself.

To find all the bugs using just event reordering, you'd need to try something
closer to _all possible ways to interleave independent instructions_ and that's
a boatload of orders...
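
To put a rough number on "a boatload": two independent instruction streams of m and n instructions can be interleaved in C(m+n, m) ways. A small sketch that computes this (the function name is just for illustration, and it only covers the two-stream case):

```c
/* Number of ways to interleave two independent instruction streams of
   m and n instructions: the binomial coefficient C(m+n, m).
   Overflows unsigned long long for large inputs; intended only for
   small illustrative values. */
unsigned long long interleavings(unsigned m, unsigned n)
{
    unsigned long long result = 1;
    for (unsigned i = 1; i <= m; i++)
        result = result * (n + i) / i;  /* equals C(n+i, i), always exact */
    return result;
}
```

Two streams of just 10 instructions each already give `interleavings(10, 10) == 184756`.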

~~~
zeteo
More generally the issue seems to be that if two different loop counter values
j1 and j2 write the same thing to shared memory, then all the writes made by
values between j1 and j2 are masked from this kind of testing. This seems like
a potentially common case, especially with long loops and / or few possible
values to write (e.g. booleans).

~~~
_yosefk
Well, yeah. The reason such reordering is worth mentioning is because it's
necessary for instrumentation to do its thing, and because in practice, we've
found that this finds a load of bugs by itself; which is good because
instrumentation is slow. (A compiler pass could be better than runtime
instrumentation perhaps, but then you'd have problems with bugs involving
compiled libraries.)

You can, of course, run with more schedules (checkedthreads lets you do this
using env CT_SCHED=shuffle CT_RAND_SEED=654 or some other number) and then
more bugs would be found, but you quickly reach diminishing returns compared
to just running instrumentation (especially because a lot of bugs are not
found by such coarse-grained reordering at all, for example, anything involving
accumulators, counters, thread-unsafe allocators, setting bits in bit
masks, etc.)
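
For readers wondering what a seeded shuffle schedule amounts to, here is a rough sketch of the idea. It is not checkedthreads' actual implementation; the function names and the choice of random number generator are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Small linear congruential generator, so a given seed reproduces the
   same order everywhere (constants from Numerical Recipes). */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Fill order[0..n-1] with a seeded pseudo-random permutation of
   0..n-1 using a Fisher-Yates shuffle. */
void shuffled_order(int *order, size_t n, uint32_t seed)
{
    uint32_t state = seed;
    for (size_t i = 0; i < n; i++)
        order[i] = (int)i;
    for (size_t i = n; i > 1; i--) {
        size_t k = (lcg_next(&state) >> 16) % i;  /* index in 0..i-1 */
        int tmp = order[i - 1];
        order[i - 1] = order[k];
        order[k] = tmp;
    }
}
```

The point of the seed is reproducibility: the same seed replays the same iteration order, so a failure seen once can be reproduced.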

------
casca
Great writeup and it looks like an excellent tool. We'll definitely give it a
try. Parallelization is really hard to get right and any tools that can help
are much appreciated.

~~~
_yosefk
Glad to hear that - and do contact me (Yossi.Kreinin@gmail.com) if anything
goes wrong, or if you need a feature.

It's working very smoothly for us, to the point where nobody is worried about
parallelism bugs any more - but while it's basically the same approach, the
code itself is new, so I could have new bugs in there as well.

~~~
profquail
Would you mind adding a license to the code (e.g., Apache 2.0)? GitHub's TOS
says that without one, code is considered "all rights reserved" -- putting any
users on shaky legal ground.

Very cool (and practical!) project though.

~~~
_yosefk
My README.md says the code is "free" as in "do whatever you want with it" -
isn't that good enough? As in, does the license have to be in some
specially-named file or what-not?

~~~
profquail
Unfortunately, I don't think "do whatever you want with it" is specific enough
for the lawyers of the world ;)

The license doesn't need to be a specially-named file, though most people use
something sensible like 'license.txt' or 'LICENSE'. For example, here's one of
my in-progress hobby projects:

<https://github.com/jack-pappas/fsharp-tools>

The Apache 2.0 license is a "do whatever you want with it" license (i.e., a
permissive license).

~~~
_yosefk
OK; I kind of thought WTFPL was a good enough license and mine was pretty much
similar except for the F part.

Since I got 3 different people asking for a license, I might as well find one.
I'd like something _really_ permissive that also lets you strip the thing and
doesn't have to be at the top of every file and doesn't require giving
credit... Something that lets you do whatever you want to.

------
octo_t
Do you have any proofs of correctness for this at all? You claim to be able to
detect almost all data race conditions, but I don't see much evidence for that
in the blog post.

Is all of the necessary work being offloaded to Valgrind?

~~~
_yosefk
Well, no formal proof, but give me one counter-example and maybe we can work
from there :)

The post does have an informal kind of "proof" (there are various degrees of
"formality"...) where I cover the various cases.

What's "all of the necessary work"? You mean is there an overhead due to
checking? I think very little - maybe you could count as overhead the fact
that you can swap schedulers at run time; the cost here is a call through a
function pointer. (I could have done it as a compile time option of course, I
just don't think the tiny overhead is worth the rather large trouble to the
user.)
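
As an illustration of the "call through a function pointer" cost, here is a minimal sketch of a run-time swappable scheduler. All names (`parallel_for`, `set_scheduler`, and so on) are hypothetical and not the checkedthreads API, and both schedulers here are serial.

```c
typedef void (*loop_body)(int j, void *ctx);
typedef void (*scheduler_fn)(int n, loop_body body, void *ctx);

/* Serial scheduler: iterations 0..n-1 in ascending order. */
static void run_forward(int n, loop_body body, void *ctx)
{
    for (int j = 0; j < n; j++)
        body(j, ctx);
}

/* Serial scheduler: iterations n-1..0 in descending order. */
static void run_backward(int n, loop_body body, void *ctx)
{
    for (int j = n - 1; j >= 0; j--)
        body(j, ctx);
}

/* The scheduler in use; swapping it at run time is one assignment. */
static scheduler_fn current_scheduler = run_forward;

void set_scheduler(int backwards)
{
    current_scheduler = backwards ? run_backward : run_forward;
}

/* Every loop pays exactly one indirect call here. */
void parallel_for(int n, loop_body body, void *ctx)
{
    current_scheduler(n, body, ctx);
}

/* Example loop body: remember the last index that ran. */
void record_last(int j, void *ctx)
{
    *(int *)ctx = j;
}
```

Running `parallel_for(3, record_last, &last)` leaves `last` at 2 with the forward scheduler and at 0 after `set_scheduler(1)`, without recompiling anything.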

~~~
octo_t
By "necessary work" I meant: Valgrind has associated proofs of correctness, so
is the tool transforming parallelisations into things Valgrind can more easily
diagnose?

(Disclaimer: my research is on safer parallel programming)

~~~
_yosefk
I'd be glad to discuss this over email if you like; the upshot is that I don't
think there's anything formally proven anywhere along the way, but it might be
interesting to discuss whether it theoretically could be (that is, given a
hypothetical imperative language, whether fork/join code is possible to fully
verify, as I believe, or not).

