
DustMite: A General-Purpose Data Reduction Tool - aldacron
https://dlang.org/blog/2020/04/13/dustmite-the-general-purpose-data-reduction-tool/
======
nn3
Seems like a relatively standard test case reduction tool aimed at D.

The tool reduces test cases to make them easier to debug. It removes pieces
from the input data and tests each variant to see if it still reproduces the
bug.
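
To make the remove-and-retest loop concrete, here is a minimal sketch in Python. It is not DustMite's actual algorithm, only a greedy line-based reducer in the same spirit; the names `reduce_lines` and `still_fails` are made up for this example, and the "bug" is a toy predicate rather than a real compiler run.

```python
from typing import Callable, List


def reduce_lines(lines: List[str],
                 still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop chunks of lines while the failure keeps reproducing."""
    if not still_fails(lines):
        raise ValueError("the unreduced input must reproduce the bug")
    chunk = max(len(lines) // 2, 1)
    while True:
        removed_any = False
        i = 0
        while i < len(lines):
            # Candidate variant: the input with one chunk cut out.
            candidate = lines[:i] + lines[i + chunk:]
            if still_fails(candidate):
                lines = candidate      # chunk was irrelevant, keep it removed
                removed_any = True
            else:
                i += chunk             # chunk is needed, move past it
        if chunk > 1:
            chunk //= 2                # retry with finer-grained removals
        elif not removed_any:
            return lines               # no single line can be removed any more


# Toy oracle (an assumption for illustration): the "bug" needs both lines.
def toy_bug(lines: List[str]) -> bool:
    return "x = 1" in lines and "boom(x)" in lines


reduced = reduce_lines(
    ["import a", "x = 1", "y = 2", "boom(x)", "z = 3"], toy_bug)
print(reduced)
```

The result is only a local minimum: removing any one remaining line makes the failure disappear, but a smaller reproducer reachable by a different kind of edit may still exist.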

In the end you have a much smaller reproducer which is easier to debug. Here's
a tutorial for gcc [2].

Most serious compilers use one in some shape or form, e.g. delta[1] (which
was probably the first widely used one) or creduce[3] (which hugely improved
the state of the art for C), and also various descendants that reduce a
different kind of input (like LLVM bugpoint [4]). A lot of the original ideas
go back to Andreas Zeller's delta debugging [5].

Somehow the blog author forgets to mention this rich history.

[1] [http://delta.tigris.org/](http://delta.tigris.org/)

[2]
[https://gcc.gnu.org/wiki/A_guide_to_testcase_reduction](https://gcc.gnu.org/wiki/A_guide_to_testcase_reduction)

[3] [http://embed.cs.utah.edu/creduce/](http://embed.cs.utah.edu/creduce/)

[4] [https://llvm.org/docs/Bugpoint.html](https://llvm.org/docs/Bugpoint.html)

[5]
[https://en.wikipedia.org/wiki/Delta_debugging](https://en.wikipedia.org/wiki/Delta_debugging)

~~~
aldacron
The forum post he linked to, where he announced DustMite, says it was inspired
by Tigris Delta and lists some advantages:

[https://forum.dlang.org/post/op.vvsvhh1ptuzx1w@cybershadow.m...](https://forum.dlang.org/post/op.vvsvhh1ptuzx1w@cybershadow.mshome.net)

------
abathur
This _is_ cool, in any case.

TL;DR: DustMite feeds reductions/variations of a data set (like your source
code) into an oracle which tests if it satisfies some property. The primary
example is reducing your source code to a local minimum that still exhibits
some compiler failure.
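
A rough sketch of what such an oracle can look like, in Python: write the candidate to a file, run an external command on it, and treat the exit status as the answer. The helper name `make_oracle`, the file name, and the "exit status 0 means the property still holds" convention are assumptions of this sketch, and the check command is a stand-in for a real compiler invocation.

```python
import os
import subprocess
import sys
import tempfile
from typing import Callable, List, Sequence


def make_oracle(command: Sequence[str]) -> Callable[[List[str]], bool]:
    """Build a predicate that is True when `command <file>` exits with 0."""
    def oracle(lines: List[str]) -> bool:
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "candidate.txt")
            with open(path, "w", encoding="utf-8") as f:
                f.write("\n".join(lines) + "\n")
            # Output is discarded; only the exit status matters here.
            result = subprocess.run(
                [*command, path],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            return result.returncode == 0
    return oracle


# Stand-in check (hypothetical): "fails" whenever the file mentions boom.
check = [
    sys.executable, "-c",
    "import sys; sys.exit(0 if 'boom' in open(sys.argv[1]).read() else 1)",
]
oracle = make_oracle(check)
print(oracle(["x = 1", "boom(x)"]), oracle(["x = 1"]))
```

Because the oracle is just a yes/no test on a candidate, the property does not have to be a compiler failure, which is what makes the other uses listed in the article possible.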

The article's conclusion notes some other cool uses; my favorites were:

- "reducing a large commit to a minimal diff"

- "reducing a commit list, when git bisect is insufficient due to the problem
being introduced across more than any single commit;"

- "reducing a large data set to a minimal one, resulting in the same code
coverage, with the purpose of creating a test suite;"

- "if you have complete test coverage, it can be used for reducing the source
tree to a minimal tree which includes support for only enabled unittests. This
can be used to create a version of a program or library with a test-defined
subset of features."

------
InfiniteRand
I would be interested in a performance comparison of delta debugging tools. In
my experience the performance of this type of tool has not been great (to be
fair, I have only tried creduce, delta, and my own coding experiments),
although I might give DustMite a try.

------
abathur
Is this the intended entry point? Should this have linked to
[https://dlang.org/blog/2020/04/13/dustmite-the-general-purpo...](https://dlang.org/blog/2020/04/13/dustmite-the-general-purpose-data-reduction-tool/)
or similar?

~~~
dang
We've changed to that from
[https://forum.dlang.org/post/wntuwcsudlzrmkwrsdxe@forum.dlan...](https://forum.dlang.org/post/wntuwcsudlzrmkwrsdxe@forum.dlang.org).
Thanks!

(I also detached your other comment so that it can float to the top and get
more attention.)

