
Finding bugs in Haskell code by proving it - based2
https://www.joachim-breitner.de/blog/734-Finding_bugs_in_Haskell_code_by_proving_it
======
namuol
Mostly-off-topic:

The `bisect-binary` project listed in the introduction is almost the _exact_
program I made while I was a devious high school student trying to circumvent
weak (i.e. most widely used) antivirus software by locating a few "key" bytes
in the binary that the AV used to fingerprint a virus.

I would zero out half the file, scan it with the AV, then recursively repeat
the same for any zeroed region whose removal made the AV stop flagging the
file. Essentially, it was a search for the smallest set of bytes that needed
to be changed for the AV to fail detection.

Usually this would result in only a small number of bytes that needed to be
slightly tweaked.
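
The recursive search described above can be sketched generically. This is only an illustration of the bisection idea, not the original program: `flagged` is a hypothetical stand-in for whatever classifier is being probed, and `zero_out` and `find_key_regions` are made-up names.

```python
from typing import Callable, List, Optional, Tuple

Region = Tuple[int, int]  # half-open byte range [start, end)


def zero_out(data: bytes, start: int, end: int) -> bytes:
    """Return a copy of data with the bytes in [start, end) replaced by zeros."""
    return data[:start] + bytes(end - start) + data[end:]


def find_key_regions(
    data: bytes,
    flagged: Callable[[bytes], bool],  # hypothetical classifier, not a real API
    start: int = 0,
    end: Optional[int] = None,
) -> List[Region]:
    """Find regions whose zeroing alone makes `flagged` return False."""
    if end is None:
        end = len(data)
    if start >= end or flagged(zero_out(data, start, end)):
        return []  # zeroing this region changes nothing: no key bytes here
    if end - start == 1:
        return [(start, end)]  # narrowed down to a single key byte
    mid = (start + end) // 2
    found = (find_key_regions(data, flagged, start, mid)
             + find_key_regions(data, flagged, mid, end))
    # If neither half matters on its own, the region cannot be split further.
    return found or [(start, end)]


# Toy classifier: flags any input that still contains the marker b"MAGIC".
sample = b"xxxxMAGICyyyyyyy"
print(find_key_regions(sample, lambda blob: b"MAGIC" in blob))
```

The last line of `find_key_regions` covers the awkward case where zeroing a whole region matters but zeroing either half alone does not: the sketch simply reports the whole region instead of splitting it.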

Pretty unproductive "hobby", but I must admit it was really fun. :)

~~~
maweki
This sounds fun. How did you handle the case where on a bisect both halves
didn't trigger the AV on their own?

~~~
namuol
This was pretty rare for the most popular AVs, but if it didn't work I would
try manually zeroing different sections until I found something. I was
surprised how easy it was to make a virus undetected, and that they didn't
routinely change their fingerprint algorithm as a defense against this attack.

------
tomsmeding
This looks really cool!

I've never worked with Coq myself, and really, the process described here
still looks much too convoluted for practically proving nontrivial amounts of
code correct. But it looks like lots of fun too! And it is sort-of the holy
grail of computer science, writing code and being able to prove it correct
instead of just testing it with maybe-or-maybe-not enough test cases.

In practice I find that Haskell itself already prevents a lot of bugs, but of
course in functions like the one dissected here, logical errors are still
easily made.

I'll definitely look into Coq one of these weeks.

~~~
neilparikh
You might already know about this, but if you're looking to learn Coq, I would
recommend the Software Foundations series
([https://softwarefoundations.cis.upenn.edu/](https://softwarefoundations.cis.upenn.edu/)).

~~~
harveywi
I started with Certified Programming with Dependent Types
([http://adam.chlipala.net/cpdt/](http://adam.chlipala.net/cpdt/)).

If you are more interested in programming, consider trying Idris
([http://idris-lang.org/](http://idris-lang.org/)). Edwin Brady, the language
designer, wrote a great book that I would recommend called Type-Driven
Development with Idris ([https://www.manning.com/books/type-driven-
development-with-i...](https://www.manning.com/books/type-driven-development-
with-idris)).

------
mbid
I have trouble understanding the usefulness of this. Haskell is barely used in
the first place, and even less for highly safety-critical software.

It's also not clear to me why I would use this over writing the program in the
constructive logic of Coq, Agda or Idris and then extracting ("compiling") the
program from the proof itself. Of course, a use case would be to formally
verify "legacy" Haskell code, but then my first point applies -- there is not
that much important Haskell code out there.

The last reason for the existence of this program I could see is that this is
just experimental and the authors want to explore whether it's possible to
formally prove things about Haskell programs. But isn't that obviously true,
given enough time put into the project?

~~~
samgd
Standard Chartered has over 3 million lines of Haskell code and Facebook uses
it to fight spam - there's definitely more than you think out there :-)

[http://hauptwerk.blogspot.co.uk/2017/04/four-openings-for-
ha...](http://hauptwerk.blogspot.co.uk/2017/04/four-openings-for-haskell-
developers-at.html)

[https://code.facebook.com/posts/745068642270222/fighting-
spa...](https://code.facebook.com/posts/745068642270222/fighting-spam-with-
haskell/)

~~~
virtualwhys
> Standard Chartered has over 3 million lines of Haskell code

Not exactly, they have their own closed source fork of GHC, which is strictly
evaluated. Given that the codebase is proprietary it's unknown what other
differences there are beyond having dumped lazy evaluation. It's a Haskell-
like language, but far different than the Haskell that the public has access
to.

------
thethirdone
I have been playing around with an idea for a programming language that
enables practical theorem proving.

The core idea behind it is that statically proven assertions would be really
powerful.

Loop invariants can be simply expressed as an assertion inside the loop.
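
As a rough illustration of an invariant written as an in-loop assertion, here is a minimal Python sketch. Python only checks the `assert` at run time on each iteration; the idea described here is that a compiler would prove it statically. The function `sum_below` is a made-up example, not part of any proposed language.

```python
def sum_below(n: int) -> int:
    """Sum 0 + 1 + ... + (n - 1), with the loop invariant spelled out."""
    assert n >= 0  # precondition
    total = 0
    i = 0
    while i < n:
        # Loop invariant: total is the sum of all non-negative integers below i.
        assert total == i * (i - 1) // 2
        total += i
        i += 1
    # Postcondition: the invariant combined with the exit condition i == n.
    assert total == n * (n - 1) // 2
    return total
```

A static checker would have to show that the invariant holds on entry and is preserved by the loop body; the postcondition then follows once the loop exits.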

Additionally, to give it a bit more power, assertions could be given the
ability to range over sets (say, over all ints).

This might look something like

    
    
      XY = {int, int}
    
      @assert commutative
      fn add(a XY, b XY) XY {
        return (a[0]+b[0], a[1] + b[1])
      }
      
      fn commutative<a,b>(f (a,a)b){
        assert x,y: f(x,y) == f(y,x)
      }
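
For comparison, the `@assert commutative` annotation above can be approximated in Python as a decorator. This is only a sketch under a big assumption: it checks the property on a finite list of sample inputs when the function is defined, which is testing rather than the static proof over all ints that the idea calls for. The names `commutative` and `SAMPLE_PAIRS` are made up for the example.

```python
from itertools import product
from typing import Callable, Sequence, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")
Pair = Tuple[int, int]


def commutative(samples: Sequence[A]) -> Callable[[Callable[[A, A], B]], Callable[[A, A], B]]:
    """Decorator: assert f(x, y) == f(y, x) for every x, y drawn from samples."""
    def check(f: Callable[[A, A], B]) -> Callable[[A, A], B]:
        for x, y in product(samples, repeat=2):
            assert f(x, y) == f(y, x), f"{f.__name__} is not commutative on {x}, {y}"
        return f  # the function itself is returned unchanged
    return check


# A small finite stand-in for "all pairs of ints" (an assumption of this sketch).
SAMPLE_PAIRS = [(a, b) for a in range(-2, 3) for b in range(-2, 3)]


@commutative(SAMPLE_PAIRS)
def add(a: Pair, b: Pair) -> Pair:
    """Component-wise addition of two integer pairs."""
    return (a[0] + b[0], a[1] + b[1])
```

Decorating a component-wise subtraction the same way would raise an `AssertionError` at definition time, which is the run-time analogue of the compile error a prover would give.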
    

And then it would also be helpful to attach assertions to types, so that
assignments to variables also check the assertions

    
    
      order = enum {greater, equal, lesser}
      comparator<a> = (a,a)order
      
      fn sorted<a>(pair ([]a,comparator<a>))bool {
        (l, cmp) = pair
        if l.length() <= 1 {
          return true
        }
        return cmp(l[0],l[1]) != order.greater & sorted((l[1:],cmp))
      }
      
      SortedList<a> = ([]a,(a,a)order) & sorted
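
A run-time approximation of the `SortedList` type above, sketched in Python. The assumption here is that the predicate is checked once in the constructor, whereas the proposed language would discharge it statically at every assignment. `Order`, `is_sorted`, `SortedList` and `compare_ints` are illustrative names only.

```python
from enum import Enum
from typing import Callable, Generic, Sequence, TypeVar

T = TypeVar("T")


class Order(Enum):
    GREATER = 1
    EQUAL = 0
    LESSER = -1


Comparator = Callable[[T, T], Order]


def is_sorted(items: Sequence[T], cmp: Comparator) -> bool:
    """True if no element compares GREATER than its successor."""
    return all(cmp(a, b) != Order.GREATER for a, b in zip(items, items[1:]))


class SortedList(Generic[T]):
    """A list bundled with its comparator; construction fails if unsorted."""

    def __init__(self, items: Sequence[T], cmp: Comparator) -> None:
        if not is_sorted(items, cmp):
            raise ValueError("items are not sorted under the given comparator")
        self.items = list(items)
        self.cmp = cmp


def compare_ints(a: int, b: int) -> Order:
    """Three-way comparison of two ints."""
    if a > b:
        return Order.GREATER
    return Order.EQUAL if a == b else Order.LESSER
```

With this, `SortedList([1, 2, 2, 5], compare_ints)` constructs fine, while `SortedList([3, 1], compare_ints)` raises `ValueError`.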
    

There are a few major issues with my idea currently (it really requires
dependent types to be ergonomic and allocation and pointers could be strange)
which have prevented me from making a quick demo compiler or language spec.
But I would be very excited to discuss it.

~~~
ghkbrew
Have you looked into research regarding refinement types? Because they
describe almost exactly what you're reaching for (types with associated
provable assertions). Specifically, Liquid Haskell[0] extends Haskell with
(dependent) refinement types, but keeps the type checking decidable and the
proof burden relatively light by using a restricted predicate language and an
SMT solver.

[0] [https://ucsd-progsys.github.io/liquidhaskell-blog/](https://ucsd-
progsys.github.io/liquidhaskell-blog/)

~~~
thethirdone
Yeah, I am aware of refinement types. You can do a lot of what I am
envisioning with just refinement types. Expressing loop invariants with just
refinement types is really awkward though.

The main reason why the language I am envisioning is not a functional language
is that Idris, Liquid Haskell, ... have already explored many of the
related ideas. I want to take ideas from them and bring them into an
imperative language, as that hasn't been as well explored.

~~~
shawa_a_a
To throw another one onto the _have you looked at X_ pile, Microsoft Research
has put a lot of effort into the Boogie verification platform, which is the
backend for their (experimental?) verifier-aware, imperative language,
Dafny[1]. It feels very much like writing C#/Java but with verified pre/post
conditions, loop invariants etc.

I took a formal verification course in college that involved writing several
verified sorting, search etc. algorithms in Dafny [2]. I remember it being
somewhat cumbersome to write your assertions in a way that the checker can
check them, but it looks very much like what you're suggesting.

(At the time I didn't realise that the checker not only checks the program
but also compiles it into a CLR-compatible binary, hence you'll see
equivalent C code for comparison)

[1] [https://github.com/Microsoft/dafny](https://github.com/Microsoft/dafny)
[2] [https://github.com/shawa/formal-verification-
project](https://github.com/shawa/formal-verification-project)

~~~
thethirdone
> To throw another one onto the _have you looked at X_ pile,

I am pretty sure I have seen Dafny before, but it wasn't on the top of my
head. Thanks for mentioning it.

> I remember it being somewhat cumbersome to write your assertions in a way
> that the checker can check them, but it looks very much like what you're
> suggesting.

I agree with both of those statements. I am not thinking of something
revolutionary. I have been thinking of a few ways to make it more ergonomic
than Dafny though. Refinement types would be one of the key ways to do that.

I think all of the key capabilities would be the same though. It would
probably make sense to make a transpiler to Dafny to try it out.

------
lcdoutlet
Nice Post!

I think tools that enable formal reasoning about code are incredibly valuable.

Just going through the process the author was able to find and fix three bugs.
Could you imagine a pipeline where people submitted code that had already been
proven to be correct?

Sounds amazing!

------
based2
[https://www.reddit.com/r/programming/comments/7iuakc/finding...](https://www.reddit.com/r/programming/comments/7iuakc/finding_bugs_in_haskell_code_by_proving_it/)

