
Notes on concurrency bugs - joeyespo
http://danluu.com/concurrency-bugs/
======
bluejekyll
> all of the programs studied were written in C or C++, and that this study
> predates C++11. Moving to C++11 and using atomics and scoped locks would
> probably change the numbers substantially

I keep seeing this claim, "C++ is better now!" Does anyone have any experience
that really defends this claim?

One of the reasons that I've become so enthralled with Rust is that it's
adopted a memory access system which aligns with everything that I've learned
over my career in distributed systems:

- all data const/final by default
- No nulls, fully initialized structs
- semantics that require the developer to adopt good practices in accessing memory across threads

I can go on, but it's already obvious to me that Rust is a huge leap forward
in terms of threading and the guarantees it makes.

I left C++ a long time ago, and it happened to coincide with an atomic
increment bug/memory leak that took me 2 weeks to track down in the STL String
library. This was back in 2000, early days of multiprocessor x86. This exact
thing is solvable with the new C++ atomic support, but does it guarantee that
you'll use it across threads like Rust?
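
To make the atomic-increment case concrete, here is a minimal sketch using C++11 `std::atomic`. The helper name `count_with_atomic` and its parameters are made up for illustration and have nothing to do with the original STL string code. Nothing in the language stops someone from declaring the counter as a plain `long` instead, which is exactly the gap the question above is about.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: starts `threads` workers that each increment a shared
// counter `per_thread` times, then returns the final count.
long count_with_atomic(int threads, int per_thread) {
    assert(threads >= 0 && per_thread >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                // Atomic read-modify-write: no increments are lost.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    // join() synchronizes with each worker, so the final load sees every add.
    for (auto& w : workers) w.join();
    return counter.load();
}
```

With a plain `long` the same loop would be a data race, and the compiler would accept it without complaint.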

~~~
MaulingMonkey
> This exact thing is solvable with the new C++ atomic support, but does it
> guarantee that you'll use it across threads like Rust?

No.

That said, stuff like Clang's thread safety annotations can help:
[http://clang.llvm.org/docs/ThreadSafetyAnalysis.html](http://clang.llvm.org/docs/ThreadSafetyAnalysis.html)
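
A rough sketch of what those annotations look like, assuming Clang with `-Wthread-safety`. The macro names follow the style used in the linked docs, but the `Mutex` wrapper and the `Account` class are hypothetical examples, and on other compilers the macros expand to nothing so the code still builds.

```cpp
#include <cassert>
#include <mutex>

// Annotation macros; only Clang understands the attributes.
#if defined(__clang__)
#define CAPABILITY(x) __attribute__((capability(x)))
#define GUARDED_BY(x) __attribute__((guarded_by(x)))
#define ACQUIRE(...) __attribute__((acquire_capability(__VA_ARGS__)))
#define RELEASE(...) __attribute__((release_capability(__VA_ARGS__)))
#else
#define CAPABILITY(x)
#define GUARDED_BY(x)
#define ACQUIRE(...)
#define RELEASE(...)
#endif

// Hypothetical annotated wrapper around std::mutex.
class CAPABILITY("mutex") Mutex {
public:
    void lock() ACQUIRE() { mu_.lock(); }
    void unlock() RELEASE() { mu_.unlock(); }
private:
    std::mutex mu_;
};

// Hypothetical class whose balance_ may only be touched with mu_ held.
class Account {
public:
    void deposit(int amount) {
        assert(amount >= 0);
        mu_.lock();
        balance_ += amount;
        mu_.unlock();
    }
    int balance() {
        mu_.lock();
        int b = balance_;
        mu_.unlock();
        return b;
    }
    // Uncommenting the next line should draw a -Wthread-safety warning on
    // Clang, because balance_ is written without holding mu_:
    // void unsafe_deposit(int amount) { balance_ += amount; }
private:
    Mutex mu_;
    int balance_ GUARDED_BY(mu_) = 0;
};
```

This is opt-in and only a warning, so it narrows the gap rather than closing it.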

> I left C++ a long time ago

Lucky ;).

C++, and all the supporting tooling around it, is better than it used to be,
make no mistake. But C++ is still no Rust - make no mistake on that either.

------
bratfro
The footnotes held quite a gem. It's common knowledge that the patent system
is broken, but I cannot believe for the life of me someone was able to patent
how to swing on a swing. My word.
[https://www.google.com/patents/US6368227](https://www.google.com/patents/US6368227)

~~~
rumcajz
Note that it's sideways swinging, not the normal one that's patented! :)

~~~
DHMO
The rocket-propelled over-the-top one isn't patented, though.

[https://www.youtube.com/watch?v=HrrorPT8jsM&feature=youtu.be...](https://www.youtube.com/watch?v=HrrorPT8jsM&feature=youtu.be&t=45s)

~~~
amelius
Well if the original patent was defined in terms of an abstract force, like it
should have been, then the rocket-propelled one is patented too. All advice
I've heard so far about filing patents is to keep things as abstract as
possible.

------
sitkack
> 70% of bugs had simple fixes

> 30% were fixed by ignoring the badly timed message and 40% were fixed by
> delaying or ignoring the message.

Makes me think our message passing based concurrency frameworks should do this
automatically. This is made even simpler if vast portions of the application
are pure and generate simpler transactions to be applied to a state store.
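
One way a framework could do the "ignore or delay" part automatically, as a minimal sketch. It assumes every message carries a sequence number, which the article does not say; `Message` and `OrderedInbox` are hypothetical names. Stale messages are dropped, early ones are held until their turn.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message type; the sequence number is an assumption.
struct Message {
    std::uint64_t seq;
    std::string payload;
};

class OrderedInbox {
public:
    // Returns the payloads that became deliverable because of this message.
    std::vector<std::string> receive(const Message& m) {
        std::vector<std::string> ready;
        if (m.seq < next_) {
            return ready;  // stale or duplicate: ignore it
        }
        // Early or on time: hold it (emplace keeps the first copy of a seq).
        pending_.emplace(m.seq, m.payload);
        // Deliver everything that is now in order.
        for (auto it = pending_.find(next_); it != pending_.end();
             it = pending_.find(next_)) {
            assert(it->first == next_);
            ready.push_back(std::move(it->second));
            pending_.erase(it);
            ++next_;
        }
        return ready;
    }

private:
    std::uint64_t next_ = 0;                         // next sequence to deliver
    std::map<std::uint64_t, std::string> pending_;   // delayed messages
};
```

The application logic then only ever sees messages in order, so the badly timed cases never reach it.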

~~~
sitkack
We need a type system over the allowable time constraints that a process will
accept.

------
jeffreyrogers
Did Dan change the styling on his site? I remember it being much easier to
read before. Now there is almost no styling at all.

~~~
100k
Yeah, I think he did. Ironically, it reads great on mobile -- the text is full
width (which on my phone is a great line length) and there's no design getting
in the way.

~~~
jeffreyrogers
Yeah, I think it would look fine with a max-width style added. I get the
allure of lightweight design but it doesn't take much effort to make it easier
on the eyes. I know I can override the default styles, but it isn't hard to
get right in the first place.

------
nkurz
Perhaps an editor could mark this (2016) despite this being the opposite of
the usual custom? I've learned not to get excited when someone posts a Dan
Luu article, since it's usually something old that I've already seen. But
despite the lack of a date at the top, and despite starting with references to
2010 and 2008 papers, this one is actually new!

On further thought, maybe it would be better to change the title to claim it's
from (2010), wait for enough people to complain, then "something", then use
that momentum to convince Dan to finally put dates on his articles. Just need
to figure out what that "something" should be...

\--

I looked into Thread Sanitizer (libtsan) recently, and was happy to see that
it's supported on recent GCC as well. Documentation is a little strange, as
it's split between a Google Wiki on Github and Clang, while the source is in
LLVM:

[https://github.com/google/sanitizers/wiki/ThreadSanitizerCpp...](https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual)

[http://clang.llvm.org/docs/ThreadSanitizer.html](http://clang.llvm.org/docs/ThreadSanitizer.html)

[https://llvm.org/svn/llvm-project/llvm/trunk/lib/Transforms/...](https://llvm.org/svn/llvm-project/llvm/trunk/lib/Transforms/Instrumentation/ThreadSanitizer.cpp)

I was spooked by this FAQ on the Google Wiki page, though:

    
    
      Q: My code with C++ exceptions does not work with tsan. 
      A: Tsan does not support C++ exceptions.
    

Does this mean that it does not work at all on code that is written with
exceptions, or that it might have false-positives or false-negatives when
exceptions actually happen at runtime?

\--

For other tools, Intel offers their "Parallel Inspector":
[https://software.intel.com/en-us/intel-inspector-xe](https://software.intel.com/en-us/intel-inspector-xe).
I haven't tried it, but it sounds like it would be useful for these issues:
[https://software.intel.com/en-us/get-started-with-inspector](https://software.intel.com/en-us/get-started-with-inspector).
Does anyone know how it compares with TSan?

\--

    
    
      An example of an atomicity violation is this bug from MySQL:
      Thread 1:
        if (thd->proc_info)
          fputs(thd->proc_info, ...)
      Thread 2:
        thd->proc_info = NULL;
    

While definitely a concurrency bug, I'm surprised that this would happen
frequently enough to create numerous bug reports unless there is also an
undesired compiler optimization that's removing the "guard" in Thread 1. That
is, the window of opportunity seems very small if the code is being executed
as written. I didn't look at the details of the linked bug reports, but I
suspect the compiler is able to reason based on something earlier that
thd->proc_info must be non-null at this point, and thus has omitted the check.

If this is the case, it's possible that "Stack" would have caught this bug as
well, or at least highlighted it as a place where the generated code was
different than the programmer's intent. Stack is painful to install, and seems
abandoned, but does flag some bugs that other tools miss:
[https://github.com/xiw/stack/](https://github.com/xiw/stack/)
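
Going back to the `proc_info` snippet: whatever the compiler does, the usual fix for this check-then-use pattern is to read the pointer once into a local and use only the local. A minimal sketch, where `Thd` and `describe` are hypothetical stand-ins (not MySQL's actual code) and the pointed-to string is assumed to outlive the reader, as a string literal would:

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical stand-in for the thread descriptor in the quoted bug.
struct Thd {
    std::atomic<const char*> proc_info{nullptr};
};

// Loads proc_info exactly once, so the null check and the use see the same
// value even if another thread stores NULL in between.
std::string describe(const Thd& thd) {
    const char* info = thd.proc_info.load(std::memory_order_acquire);
    return info ? std::string(info) : std::string("(none)");
}
```

If the string could be freed by the writer, a single atomic load would not be enough and the check and use would need to sit under the same lock as the write.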

\--

Does anyone know of other tools in this space? I'm still hoping there's a
"silver bullet" I haven't found yet.

~~~
sitkack
The only effective tool I have found is Rust, a correctness checker for race
conditions.

~~~
nickpsecurity
The first, practical one was Concurrent Pascal (1975) used in a number of
OS's:

[http://brinch-hansen.net/papers/](http://brinch-hansen.net/papers/)

Later, Eiffel's SCOOP model in the '90s was immune to races for a long time with
researchers doing mods for better speed, deadlock detection, livelock
detection, etc. It was ported to Java at one point. The research page in the
link below shows they're probably still the top players in this given steady
stream of results.

[https://en.wikipedia.org/wiki/SCOOP_(software)](https://en.wikipedia.org/wiki/SCOOP_\(software\))

Works in combination with Eiffel's Design-by-Contract which can knock out
semantic errors he mentions:

[https://www.eiffel.com/values/design-by-contract/introductio...](https://www.eiffel.com/values/design-by-contract/introduction/)

Ada's Ravenscar also did safe concurrency. Ada 2012 and SPARK have Design-by-
Contract with SPARK also proving absence of common errors in code
automatically. Cyclone was a C variant that used region-based memory
management and analysis to show absence of dangling pointers, etc. Rust
improved on that with a better language, dynamic safety, and race-free
concurrency.

So, there's been stuff resistant to concurrency problems for quite a while
among people using safer languages. Rust is just the latest and most open.

~~~
sitkack
Much of Cyclone was an inspiration for Rust. Digging through the Hansen
papers. Thank you.

------
spudlyo
Why are these academic types analyzing bugs in MySQL!? Is there a reason they
didn't choose PostgreSQL, or is it just because these studies were all done in
2009 and 2010? Surely if the analysis had been done in 2016 they would
have picked a database with higher quality concurrency bugs to study.

~~~
v4tk
PostgreSQL does not have bugs - there is even no bugtracker
[https://lwn.net/Articles/660468/](https://lwn.net/Articles/660468/) . It is
hard to analyze bugs if there is no central place for them.

~~~
jacquesm
> PostgreSQL does not have bugs

What nonsense. Postgres has over 14K issues listed against it, they just
don't use a bug tracker, they use a mailing list instead. It's a holdover from
the old days.

[https://www.postgresql.org/list/pgsql-bugs/2016-07/](https://www.postgresql.org/list/pgsql-bugs/2016-07/)

