
Augmenting Agile with Formal Methods - joshuacc
https://www.hillelwayne.com/post/augmenting-agile/
======
agentultra
I'm experimenting with including formal methods on my team. We're working with
concurrency in an event-sourced system and want to be reasonably certain our
design is correct. We're also working on deeply embedding our specifications
into our test suites to address some of the trade-offs mentioned near the end.
We like to eat our cake.

I'm down with advocating more use of formal methods in industry and open
source. The cost of using these tools has come down at such a rapid pace in
the last decade or so compared to the rising cost of complexity our system
designs are taking on. It's starting to make the trade-off attractive imho.

~~~
jdp23
Agreed - with better and lower-cost tooling the tradeoffs get steadily more
appealing.

Back in 2005ish I put together a panel with Kent Beck of extreme programming
fame and Martyn Thomas of Praxis, which was doing commercial development using
formal methods and Spark/Ada. We were hoping for fireworks but it turns out
they were in agreement on a lot of stuff: formal methods need to be used in
the context of a pragmatic development process, sometimes iteratively
exploring is the right approach, for high reliability you need to complement
testing with other techniques like formal methods. Concurrency was one of the
examples where there was general agreement that testing and code inspection
just isn't enough.

~~~
nradov
Static analysis tools for common languages are getting fairly good at
detecting many types of concurrency defects.

------
daxfohl
Thanks for writing this article. I've been reading TLA+ for a while but the
examples in Lamport's book all seem so irrelevant to my day to day work that I
eventually gave up. I think starting with a motivational example like this is
a better approach. I look forward to the book.

~~~
blub
You might find SPIN more approachable, the syntax is similar to a cross
between C and bash :)

Here's an example from the book "The SPIN model checker", by Holzmann, the
creator of SPIN:

    
    
        mtype = { P, C, N };
    
        mtype turn = P;
        pid who;
    
        inline request(x, y, z) {
            atomic { x == y -> x = z; who = _pid }
        }
    
        inline release(x, y) {
            atomic { x = y; who = 0 }
        }
    
        active [2] proctype producer()
        {
           do
           :: request(turn, P, N) ->
                   printf("P%d\n", _pid);
                   assert(who == _pid);
                   release(turn, C)
           od
        }
        active [2] proctype consumer()
        {
           do
           :: request(turn, C, N) ->
                   printf("C%d\n", _pid);
                   assert(who == _pid);
                   release(turn, P)
           od
        }
    

And this is Dekker's mutual exclusion algorithm from the same book:

    
    
        bit     turn;
        bool    flag[2];
        byte    cnt;
    
        active [2] proctype mutex()
        {  pid i, j;
    
           i = _pid;
           j = 1 - _pid;
        again:
            flag[i] = true;
            do
            :: flag[j] ->
                if
                :: turn == j ->
                    flag[i] = false;
                    (turn != j) ->   /* wait until true  */
                    flag[i] = true
                :: else ->
                    skip               /* do nothing       */
                fi
            :: else ->
                break                  /* break from loop  */
            od;
    
            cnt++;
            assert(cnt == 1);          /* critical section */
            cnt--;
    
            turn = j;
            flag[i] = false;
            goto again
        }
    

I don't like the TLA syntax, which is quite different from a typical
programming language. In this blog's deadlock example it looked much nicer
than usual for some reason.

~~~
pron
TLA+ is intentionally made to look different from a typical programming
language because it _is_ very different. Even a simple programming language is
far more complex than the simple mathematics underlying TLA+. If all you want
to do is to verify some simple assertion about a relatively simple program,
then SPIN might indeed suit you better (assuming you can express what you want
in SPIN). But TLA+'s main contribution is to allow you to design and deeply
understand a complex system, and how and why it works, by reducing it to its
bare essentials, and describing it at whatever level of detail you're
interested in (often in multiple levels of detail, which are then formally
related to one another).

------
blub
Extreme programmers were dogmatic about specific development practices like
unit testing and for that and other reasons XP has fallen out of favor. Scrum
and Kanban seem to be the big winners.

Unit testing is really bad at catching concurrency errors, interactions
between components, domain errors and undefined behavior.

I admit that verification based on temporal logic (like used by TLA or SPIN)
is a solid helper when designing algorithms involving concurrency, but at the
same time I must also admit that I've never seen it used. Typical concurrency
taming strategies I've seen in the past include using higher level primitives
(like a thread pool where jobs can be enqueued) or taking and adapting code
written by an expert (e.g Anthony Williams has some C++ examples in his book).

Race conditions are much more frequent in my experience. In fact I'm so biased
towards finding those that I was frustratedly wondering how a race condition
could happen in the example if everything is synchronized :) Proving that a
multi-threaded work queue is correct and using that might be a good tactic,
assuming there isn't such a primitive available. But if a program is designed
around locks and sharing, that becomes much harder to model. Sooner or later
such a program will be in a broken state.

------
thinkingkong
How much of this is Agile vs just the methodology for writing good tests?

~~~
pmwhite
The second. In the given example, if skilled programmers cannot find the bug
in a couple dozen lines of code within an hour, it is not an Agile vs Better
Agile vs non-Agile question. It is a question of whether there is a member on
your team who has the "toolkit" to test concurrency scenarios sufficiently
well to find this kind of bug.

Of course, the hard part is recognizing when to use formal methods and when
not to. I do wonder if someone who had used TLA+ (frex) in the past could look
at this code and know that now is the time. Or not.

~~~
gbacon
At the bottom of JavaUnitTestChallengeSolved[0], William Underwood posted the
test below and claims that it always catches the `notify()` problem. He did
not mention using TLA+.

    
    
        public static void testConcurrency() throws InterruptedException {
          final int[] threadsFinished = new int[]{0};
          final BoundedBuffer buffer = new BoundedBuffer();
    
          Thread thread1 = new Thread() {
            public void run() {
              synchronized (buffer) {
                synchronized (this) {
                  this.notifyAll();
                }
                try {
                  buffer.wait();
                } catch (InterruptedException e) {}
                threadsFinished[0]++;
              }
            }
          };
          Thread thread2 = new Thread() {
            public void run() {
              synchronized (buffer) {
                synchronized (this) {
                  this.notifyAll();
                }
                try {
                  buffer.wait();
                } catch (InterruptedException e) {}
                threadsFinished[0]++;
              }
            }
          };
    
          synchronized (thread1) {
            thread1.start();
            thread1.wait();
          }
          synchronized (thread2) {
            thread2.start();
            thread2.wait();
          }
          synchronized (buffer) {
            buffer.put("");
            thread1.join(1000);
            thread2.join(1000);
          }
          assertEquals("Not all test threads were awoken!", 2, threadsFinished[0]);
        }
    

[0]:
[http://wiki.c2.com/?JavaUnitTestChallengeSolved](http://wiki.c2.com/?JavaUnitTestChallengeSolved)

~~~
hwayne
That's just testing that all of the test threads are awake after they run
once. It only does one buffer operation (the `put` in `synchronized buffer`),
and we trivially know it's "broken" if there are only puts and no gets. It's
also using two threads for a buffer length of 4. This isn't actually surfacing
the bug, just something that looks somewhat like the bug if you already know
what the bug is.

