
Testable IO in Haskell - kilimchoi
http://engineering.imvu.com/2015/06/20/testable-io-in-haskell-2/
======
clarus
We run into the same kind of challenges when testing programs with IO in Coq.io.

To stay pure, we keep the IOs uninterpreted until compilation (to an OCaml
program). We define the tests on the program with uninterpreted IOs. For
clarity of the tests, we use an interactive debugger (reusing the tactical
mode of Coq) to step through the IO operations. The main advantage of using
Coq is that the tests can be made symbolic, thus covering a larger number of
cases (if not all the cases). A simple example is explained here:
[http://coq.io/getting_started.html#use-cases](http://coq.io/getting_started.html#use-cases)
(coincidentally, this is the same example as in this blog post).

------
peteretep
If your tests fail for any of the reasons the author leads with, and don't get
fixed straight away with some soul searching on the part of your developers,
you need to start hiring some engineers.

~~~
chadaustin
Hi, I have personal experience working with some of the best software
engineers on the planet for the last decade and despite everyone's best
efforts, when a unit test suite grows to thousands and thousands of tests,
there will always be some that are flaky. Perhaps they call into a code path
which was unwittingly refactored to access a local database. Perhaps they read
the current time for some time-dependent string formatting. Perhaps they
mutate some global state and forget to clean it up. Perhaps they call into
some code that opens a TCP connection to some third-party service.

Every large test suite I've seen in every language at every company I've
worked with has intermittent or flaky tests. Maybe each test on average is
99.9% reliable. But across the entire suite, you will have some low-level
degree of intermittent test failures. Then you need a team of people whose job
it is to triage, investigate, and either fix or ignore those failures. This is
a complete waste of human capacity, and it's not a very career-enriching task
either. Haskell's ability to define restricted subsets of general IO is
extremely powerful and sidesteps this entire problem.

The power of the technique in the article is that your tests are _guaranteed_
reliable by the type system, because it is no longer possible to accidentally
call into some general IO action. Code under test must go through the World
API, which is fully and deterministically mocked in tests.
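
A minimal sketch of that idea, not the article's actual code: the class name `MonadWorld`, its two operations, and the `Fake` type are hypothetical and much smaller than a real World API. It assumes the `mtl` package for `Control.Monad.State`.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Monad.State (State, execState, gets, modify)

-- Hypothetical whitelist of effects; a real World API would be larger.
class Monad m => MonadWorld m where
  readLine  :: m String
  writeLine :: String -> m ()

-- Production: each operation maps to real IO.
instance MonadWorld IO where
  readLine  = getLine
  writeLine = putStrLn

-- Tests: the "world" is plain data threaded through a State monad.
data FakeWorld = FakeWorld
  { fwInput  :: [String]  -- lines still to be read
  , fwOutput :: [String]  -- lines written so far, newest first
  }

newtype Fake a = Fake (State FakeWorld a)
  deriving (Functor, Applicative, Monad)

instance MonadWorld Fake where
  readLine = Fake $ do
    input <- gets fwInput
    case input of
      []     -> pure ""  -- scripted input exhausted
      (l:ls) -> do
        modify (\w -> w { fwInput = ls })
        pure l
  writeLine s = Fake (modify (\w -> w { fwOutput = s : fwOutput w }))

-- Code under test. It is polymorphic in m, so it can only use the
-- MonadWorld operations; calling an arbitrary IO action here is a type error.
greet :: MonadWorld m => m ()
greet = do
  writeLine "What is your name?"
  name <- readLine
  writeLine ("Hello, " ++ name ++ "!")

-- Run a Fake computation against scripted input and return what it wrote.
runFake :: Fake a -> [String] -> [String]
runFake (Fake m) input = reverse (fwOutput (execState m (FakeWorld input [])))
```

In a test, `runFake greet ["Ada"]` is an ordinary pure expression, while production code runs the same `greet` directly in `IO`.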

~~~
pron
> Haskell's ability to define restricted subsets of general IO is extremely
> powerful and sidesteps this entire problem.

Except it doesn't, precisely where that problem is most annoying. For example,
suppose you want to test a scheduler, or any mechanism that places a timeout
on an operation. You could abstract away the entire timing mechanism, but that
is exactly what you're trying to test: maybe the developer used some system
function wrong, and is now waiting 5 seconds instead of 5 milliseconds?

This approach (which is the same as the mocking approach -- there's nothing
special about Haskell here) doesn't do too well when you want to test timing,
and those tests are usually the most brittle (hosted continuous integration
services often introduce really long pauses). Of course, you could weave
virtual time services throughout your code, but then you have the third-party
library problem again.

Mocking (or IO isolating as you call it in Haskell) ends up producing really
good unit-tests, but often the more important tests span more than one small
unit, and making those non-flaky is very hard.

In theory, you could replace all references to time in a JVM language -- even
in libraries -- using an agent or a bootstrap classloader in order to make
larger tests run predictably, but I'm not aware of people going to such great
lengths (maybe Google does it; I know, for example, that when it comes to
accessing the filesystem, they replace the JVM's filesystem provider with an
in-memory file-system[1] when they don't mock, but I don't know how they deal
with clocks).

EDIT: Apparently somebody has done just that as a research project[2]. It's
actually given me an idea for a nice afternoon-project.

[1]: [https://github.com/google/jimfs](https://github.com/google/jimfs)

[2]:
[http://www3.imperial.ac.uk/pls/portallive/docs/1/55125696.PD...](http://www3.imperial.ac.uk/pls/portallive/docs/1/55125696.PDF)

~~~
chadaustin
Hi pron,

I get the feeling you're spouting very strong opinions without ever having
actually written real production code in Haskell. Because what you're saying
is nonsense.

Anything you can do in an imperative language you can do in Haskell - the only
difference is that Haskell allows you to define subsets of IO that are
enforced by the type system.

Mocking time is crucial for testing timeouts. You don't want your test to
actually take five seconds. You want to rapidly test the code in the case that
a timeout occurs before completion of the action and vice versa. You want
those tests to be 100% deterministic and fast. With the World approach
described you can get that.
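
As a rough illustration of testing both timeout outcomes against a virtual clock: everything below (`MonadClock`, `FakeClock`, `pollWithTimeout`, `readyAfter`) is a hypothetical sketch rather than IMVU's API, and it assumes the `mtl` package.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Monad.State (State, evalState, get, modify)

-- Hypothetical clock interface; all times are in milliseconds.
class Monad m => MonadClock m where
  now   :: m Integer
  sleep :: Integer -> m ()

-- Fake clock: "sleeping" only advances a counter, so tests finish instantly.
newtype FakeClock a = FakeClock (State Integer a)
  deriving (Functor, Applicative, Monad)

instance MonadClock FakeClock where
  now     = FakeClock get
  sleep n = FakeClock (modify (+ n))

-- Run a computation with the virtual clock starting at 0.
runFakeClock :: FakeClock a -> a
runFakeClock (FakeClock m) = evalState m 0

-- Poll `check` every `interval` ms until it yields a result
-- or at least `timeout` ms have passed.
pollWithTimeout :: MonadClock m => Integer -> Integer -> m (Maybe a) -> m (Maybe a)
pollWithTimeout timeout interval check = do
  start <- now
  let loop = do
        result <- check
        t <- now
        case result of
          Just x -> pure (Just x)
          Nothing
            | t - start >= timeout -> pure Nothing
            | otherwise            -> sleep interval >> loop
  loop

-- A fake operation that becomes ready once the clock reaches `readyAt`.
readyAfter :: MonadClock m => Integer -> m (Maybe String)
readyAfter readyAt = do
  t <- now
  pure (if t >= readyAt then Just "done" else Nothing)
```

Here `runFakeClock (pollWithTimeout 5000 100 (readyAfter 300))` exercises the "completed in time" path and `readyAfter 60000` exercises the timeout path, neither of which waits on a real clock.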

Also, there is nothing about the technique described that limits you to unit
tests. IMVU tests entire web services with this approach and it works well.
"World" in that case includes APIs for accessing Redis, MySQL, the customer's
session, time, concurrency, and so on. They're all perfectly faked out in
tests, and the tests are 100% reliable.

~~~
pron
Of course I haven't written production code in Haskell, and I'm not saying
that there's stuff you can't do in Haskell.

So far, I've said two things. One is that the technique described in the post,
which "uses the power of Haskell", is actually trivially done (and has been
used by countless projects for years) in almost all imperative languages, and
less intrusively (it doesn't require introducing new types and works with
third-party code). I don't know much about Ruby and Python, but the JVM also
allows you to define subsets of IO (even restricted by actual files) that are
enforced by the runtime, or subsets of IO operations that will be enforced by
the compiler. This technique has been used in imperative languages for a very
long time. That's not an opinion but a fact.

The second thing I've said _is_ an opinion, and is unrelated to a specific
programming language. I said that a lot of very small, fully-faked unit-tests
only get you so far, and it's fairly easy to get those tests to run 100%
predictably. The more troublesome tests are those that aren't faked, and they
are crucial to expose many bugs. That's it. I don't think I've said anything
too controversial.

~~~
mralston
Haskell lets us have generic "mocks", which is -way- better than any kind of
expectation-based mock. (admittedly, the example in andy's blog post doesn't
show the power of this technique very well)

But my experience with mocks is that expectation-based mocks are fragile and
make it easy to write tests that pass but are fundamentally wrong; if you just
say you expect XYZ to come out of the database, even if your code gets
refactored to not insert XYZ into the database earlier, the test will pass
despite the code being completely broken.

With the Haskell World approach, you don't have that - the fake database is
fake because it's in memory (and thus very fast and uses a blank state for
each test), but it's actually implementing the same semantics as the real
database.

Some of the other IO stuff we've encapsulated is just expectation-based mocks
and as you said that's not new, but it's still clean and doesn't require any
changes to the code under test to make it mockable, which we did when we used
mocks in our PHP.
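
To make the contrast with expectation-based mocks concrete, here is a small sketch of an in-memory fake that really stores what is inserted. The `MonadDB` class, the key-value schema, and the `registerUser`/`emailOf` functions are invented for illustration; it assumes the `mtl` and `containers` packages.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import qualified Data.Map as Map
import Control.Monad.State (State, evalState, gets, modify)

-- Hypothetical key-value "database" interface.
class Monad m => MonadDB m where
  insertRow :: String -> String -> m ()
  lookupRow :: String -> m (Maybe String)

-- The fake keeps rows in a Map, so lookups only see what was inserted.
newtype FakeDB a = FakeDB (State (Map.Map String String) a)
  deriving (Functor, Applicative, Monad)

instance MonadDB FakeDB where
  insertRow k v = FakeDB (modify (Map.insert k v))
  lookupRow k   = FakeDB (gets (Map.lookup k))

-- Every run starts from an empty database.
runFakeDB :: FakeDB a -> a
runFakeDB (FakeDB m) = evalState m Map.empty

-- Code under test (hypothetical application functions).
registerUser :: MonadDB m => String -> String -> m ()
registerUser name email = insertRow name email

emailOf :: MonadDB m => String -> m (Maybe String)
emailOf = lookupRow
```

If `registerUser` were refactored to stop inserting, `runFakeDB (registerUser "ada" "ada@example.com" >> emailOf "ada")` would return `Nothing` and a round-trip test would fail, which is the breakage an expectation-based mock can hide.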

~~~
simplify
How do you verify that your mock behaves exactly as the real database? The
type system certainly helps with function signatures, but what about behavior?

~~~
chadaustin
As with any fake implementation of a service, you definitely need coverage
that verifies that the fake and real have the same behavior.
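
One possible shape for that coverage is a single "contract" written once and run against every implementation. In this sketch the `Store` record and both backends are hypothetical; the `IORef` backend is only a stand-in for a real database client, and the code assumes the `mtl` and `containers` packages.

```haskell
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')
import qualified Data.Map as Map
import Control.Monad.State (State, evalState, gets, modify)

-- Hypothetical store interface, passed around as a record of operations.
data Store m = Store
  { putKey :: String -> String -> m ()
  , getKey :: String -> m (Maybe String)
  }

-- Pure in-memory fake.
fakeStore :: Store (State (Map.Map String String))
fakeStore = Store
  { putKey = \k v -> modify (Map.insert k v)
  , getKey = \k -> gets (Map.lookup k)
  }

-- Stand-in for the real backend (an assumption; a real one would talk to a server).
ioStore :: IORef (Map.Map String String) -> Store IO
ioStore ref = Store
  { putKey = \k v -> modifyIORef' ref (Map.insert k v)
  , getKey = \k -> Map.lookup k <$> readIORef ref
  }

-- Behaviour every implementation must share: a missing key reads as
-- Nothing, and a second write to the same key overwrites the first.
contract :: Monad m => Store m -> m Bool
contract s = do
  missing <- getKey s "k"
  putKey s "k" "v1"
  putKey s "k" "v2"
  found <- getKey s "k"
  pure (missing == Nothing && found == Just "v2")

contractHoldsForFake :: Bool
contractHoldsForFake = evalState (contract fakeStore) Map.empty

contractHoldsForIO :: IO Bool
contractHoldsForIO = do
  ref <- newIORef Map.empty
  contract (ioStore ref)
```

Only the run against the real backend needs real IO, so it can live in a small, separate suite while the bulk of the tests use the fake.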

------
tinco
So let me try to see if I get it. You change the signature of your I/O
functions to return a self defined monad called World. To make sure this works
in production you make World an instance of I/O so it can be run by your
regular main.

Then for the test suite, you make an instance of World that keeps a state
monad and mock implementations of the I/O functions that return/change that
state instead of having nondeterministic side effects.

The whole process seems logical, but the Haskell is rather ugly. Surely
someone has written a DSL or library to make working like this a little
easier? Especially defining the FakeState monad with S.modify calls looks like
a pain.

~~~
mralston
The thing is that while the implementation of the fake can get ugly, that's
effectively library code that's written once - the actual application code is
incredibly clean since all you have to do to it is tag it as being World
instead of IO.

Likewise, for tests, the actual interface in practice is clean - we have a few
fake-World-only setup functions, and when testing things like the database we
can just use the regular insert interface to do our setup.

------
zimbatm
The nice thing about this approach is that it's then possible to use
QuickCheck to generate FakeIO data.

------
deepnet
Can State not be considered as an immutable infinite list, indexed by time?

Does Haskell have a forgettable concept?

State could then be a forgettable infinite list rather than a side/effect.

~~~
wz1000
State is modeled in Haskell as the type

    newtype State s a = State (s -> (a, s))

Where s is the type of your state and a is the type of your result. It's
basically a function that takes the current state and gives you a result as
well as a new state.
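
For illustration, here is a hand-rolled version of that type with the instances needed for `do` notation. The library versions in `transformers`/`mtl` are defined a little differently (via `StateT`), and `fresh` and `threeIds` are made-up examples.

```haskell
newtype State s a = State (s -> (a, s))

-- Unwrap the function and apply it to an initial state.
runState :: State s a -> s -> (a, s)
runState (State f) = f

instance Functor (State s) where
  fmap g (State f) = State $ \s -> let (a, s') = f s in (g a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State ff <*> State fa = State $ \s ->
    let (g, s')  = ff s
        (a, s'') = fa s'
    in (g a, s'')

instance Monad (State s) where
  -- Run the first step, then feed its result and new state to the next.
  State f >>= k = State $ \s ->
    let (a, s') = f s
    in runState (k a) s'

get :: State s s
get = State $ \s -> (s, s)

put :: s -> State s ()
put s = State $ \_ -> ((), s)

-- Return the current counter value and increment the counter.
fresh :: State Int Int
fresh = do
  n <- get
  put (n + 1)
  pure n

threeIds :: State Int [Int]
threeIds = do
  a <- fresh
  b <- fresh
  c <- fresh
  pure [a, b, c]
```

`runState threeIds 10` evaluates to `([10,11,12],13)`. Nothing is mutated along the way: each step receives one state value and returns the next.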

------
pron
To those interested, the _exact_ same technique is achievable in imperative
languages[1] -- only with less effort and in a less intrusive way -- with
mocks.

[1]: That support at least some notion of meta-programming/reflection.

~~~
chadaustin
I think you've misunderstood the implications of the technique described.

It's superficially similar to mocks, but the real power is that it defines
_restricted_ effects, so that it is _impossible_ for the code under test to
access, say, the current system time or to print to stdout. It is _only_
allowed to access APIs which are fully replaced in tests.

The real benefit is that the type system guarantees these tests are not flaky
or intermittent - in unit tests they are guaranteed pure and deterministic.

This technique isn't applicable to languages like Python or C++ or JavaScript
where any computation is free to have arbitrary side effects. In Haskell you
can restrict computations to subsets of side effects, which is an enormously
powerful technique, and is, for example, why Haskell's STM implementation is
so great (and simple) compared to languages with unrestricted effects.
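
A small sketch of what the STM restriction looks like in practice, using the `stm` package; the `transfer` and `demo` functions are invented for illustration.

```haskell
import Control.Concurrent.STM
  (STM, TVar, atomically, modifyTVar', newTVarIO, readTVar, readTVarIO)

-- An STM action can read and write TVars but cannot perform IO, which is
-- what lets the runtime discard and re-run a transaction safely.
transfer :: TVar Int -> TVar Int -> Int -> STM Bool
transfer from to amount = do
  balance <- readTVar from
  if balance < amount
    then pure False
    else do
      modifyTVar' from (subtract amount)
      modifyTVar' to (+ amount)
      -- putStrLn "transferred"  -- would not type-check: IO () is not STM ()
      pure True

-- Run one transfer atomically and report the outcome and both balances.
demo :: IO (Bool, Int, Int)
demo = do
  a  <- newTVarIO 100
  b  <- newTVarIO 0
  ok <- atomically (transfer a b 30)
  (,,) ok <$> readTVarIO a <*> readTVarIO b
```

The only way to run `transfer` is through `atomically`, so the boundary between restricted and unrestricted effects is visible in the types.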

~~~
pron
First, let's separate two kinds of effects: memory effects and IO. While PFP
treats memory effects as side-effects separate from program logic, imperative
languages do not (either approach has its pros and cons). As the post talks
about IO, let's restrict ourselves to that.

If you believe that the _main_ contribution of this approach is by absolutely
preventing _any kind_ of uncaptured IO (I think it is extremely valuable even
without this language-supervised restriction), then this, too, is trivially
possible in, say, Java (or all other JVM languages). Just install a security
manager in your tests, and it will make sure you don't accidentally access IO
by bypassing mocks.

This would still have the advantage of not writing your program in any special
way to accommodate this technique _and_ it would apply to third-party
libraries, too.

~~~
chadaustin
I didn't know you could use Java SecurityManager to implement a similar
system. That's cool. Do you know anyone who does that in their test suite? I'd
love to chat with them.

The imvujs test framework for JavaScript tries to achieve the same effect by
disabling known common sources of test intermittency:
[https://github.com/imvu/imvujs/blob/master/src/imvujstest/te...](https://github.com/imvu/imvujs/blob/master/src/imvujstest/testglobals.js#L24)

Sadly, this technique, like any blacklist, doesn't work for types of IO that
can't be prevented, like mutations to document.location.

The technique described in the article is effectively a named whitelist of IO
operations (called World) that, say, all HTTP request handlers are restricted
to.

~~~
pron
> Do you know anyone who does that in their test suite? I'd love to chat with
> them.

I do. Not often, as I don't have a lot of IO in my tests, but I've found the
security manager useful for that purpose from time to time. I also use it to
help our users enforce global contracts, such as prohibiting IO or any blocking
code in fork-join computations.

~~~
chadaustin
That's cool. I chatted with a friend about this too and they said they've used
SecurityManager in this way. But one thing they said is that SecurityManager
can't be used to restrict things like accessing the current time. Do you know
if that's true or not?

~~~
pron
That's true -- out of the box. As with all things JVM, this, too, is very
pliable. You can create a new permission -- "access clock" -- and then it is
quite easy to use an agent to inject a call for the security manager to check
for this permission whenever the clock is accessed.

In fact, since we started this discussion, I've written a small library that
uses an agent to fake all clock accesses on the JVM with a user-supplied
virtual clock, that can be set globally or per-thread. Injecting a security
check is even easier.

