
Lincoln Index: Estimating the number of bugs left to find (2010) - ColinWright
https://www.johndcook.com/blog/2010/07/13/lincoln-index/
======
tgb
I thought this meant insects, and it reminded me of a favorite paper with the
subtitle "A bird in the hand is worth log n in the bush"
[https://arxiv.org/abs/1511.07428](https://arxiv.org/abs/1511.07428) In that
different problem (estimating the number of species you have yet to encounter,
based on the number of individuals of each species you have observed so far),
the cute answer is that you're "done" finding new species once you have found
at least two individuals of each species. Would this generalize to software
bugs: require that at least two testers have found each bug before assuming
the software is bug-free?
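That stopping rule is easy to state in code. A minimal sketch (the species
labels are made up for illustration): stop sampling when no observed species
is still a singleton.

```python
from collections import Counter

def all_seen_twice(observations):
    """True once every species observed so far has been seen at
    least twice -- the stopping rule described above. Takes any
    iterable of labels (species names, bug IDs, ...)."""
    counts = Counter(observations)
    return all(c >= 2 for c in counts.values())

# "sparrow" is still a singleton, so we are not done yet:
all_seen_twice(["crow", "crow", "sparrow"])                 # False
# Every observed species has at least two sightings:
all_seen_twice(["crow", "crow", "sparrow", "sparrow"])      # True
```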

------
pytester
IME unfound bugs tend to cluster in specific areas where testers didn't
realize they needed to look. Once a tester knows where to look, the bugs
quickly evaporate.

This method requires that testers look for bugs randomly and are as likely to
cover any realistic use case as any other. In practice, I find that's almost
never true.

~~~
klodolph
That's a good point, and it's doubly true when you are looking for security
vulnerabilities. If your bugs affect user experience, then a rarely
encountered bug is not much of a problem. If your bugs are security
vulnerabilities, then the bugs may be equally exploitable regardless of how
rare they are to encounter.

------
huhtenberg
As an old adage goes:

    The number of bugs remaining is directly
    proportional to the number of bugs found.

~~~
java-man
and another one:

there are always at least two bugs still present in the code.

------
pmiller2
This is interesting, but it requires an otherwise borderline insane test plan
to get the information necessary to calculate.

~~~
AstralStorm
Yes: the test plan is generally achieved by adversarial fault injection.

That is how you should test your test system.

You should also classify the faults you inject and estimate, preferably from
real-world data, how often each class occurs.

~~~
aiCeivi9
Didn't people mostly give up on that after bugs injected on purpose somehow
kept reaching production, despite various failsafes?

~~~
yorwba
That can only happen if you inject the bugs directly into the code where they
could make it into source control, instead of doing it all on a test
automation server that only ever pulls the code. You need the automation
anyway (you weren't thinking about doing manual fault injection, right?), so
siloing it off from production shouldn't be too hard.

------
krajzeg
Doesn't this also assume that the testers have _exactly_ the same area of
responsibility? If you subdivide the work in any way, the probabilities of
tester 1 and tester 2 finding a specific bug are decidedly not independent,
but very much dependent on which one of them is working on the component with
the bug.

~~~
ALittleLight
I was wondering if we could use this by having two different people execute a
test pass that covers an area of concern and then calculate the Lincoln
estimate for that.
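The calculation itself is a one-liner. A quick sketch (the bug IDs below are
made up): the Lincoln estimate is the product of the two testers' bug counts
divided by the number of bugs they both found.

```python
def lincoln_index(found_by_a, found_by_b):
    """Lincoln index: estimated total bug count from two
    independent testers' reports (iterables of bug IDs)."""
    a, b = set(found_by_a), set(found_by_b)
    overlap = len(a & b)
    if overlap == 0:
        return float("inf")  # no common bugs: the estimate is unbounded
    return len(a) * len(b) / overlap

# Tester A found 20 bugs, tester B found 15, with 6 in common:
# estimate = 20 * 15 / 6 = 50 bugs in total.
estimate = lincoln_index(range(20), range(14, 29))
```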

------
mjlee
Matt Parker made a great video estimating the size of a population using a
version of this technique (in this case, the population is statisticians in
their natural environment).

[https://www.youtube.com/watch?v=MTmnVBJ9gCI](https://www.youtube.com/watch?v=MTmnVBJ9gCI)

------
it
Here's a way to apply this idea to an open source project. The issue tracker
could allow people to submit bugs, but a cron job running once per minute
would move all the externally submitted bugs to an internal bug tracker that
only the developers on the project could see. Then the independence assumption
would be closer to being satisfied.

------
bdavis__
There are several well-known techniques; look under "Software Reliability
Growth Models".

------
edwintorok
Can this be extended to N testers?

~~~
laurentl
That's a really interesting question (and one for which I don't know enough
probability theory to give a good answer).

At the very least, if you have N testers you can apply the technique pairwise
to generate N(N - 1)/2 estimates for the number of bugs. Taking the average or
median of these estimates would probably give you a more robust value for the
total number of bugs.
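The pairwise approach is straightforward to sketch (the tester data below is
made up, and the median is just one reasonable way to combine the estimates):

```python
from itertools import combinations
from statistics import median

def lincoln_index(a, b):
    """Lincoln estimate for one pair of testers (sets of bug IDs)."""
    overlap = len(a & b)
    return len(a) * len(b) / overlap if overlap else float("inf")

def pairwise_estimates(testers):
    """All N(N - 1)/2 pairwise Lincoln estimates for a list of
    bug-ID sets, plus their median as a combined estimate."""
    estimates = [lincoln_index(a, b) for a, b in combinations(testers, 2)]
    return estimates, median(estimates)

# Three hypothetical testers' bug reports:
testers = [set(range(0, 20)), set(range(10, 28)), set(range(5, 23))]
estimates, combined = pairwise_estimates(testers)
```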

You could also build a correlation matrix between testers (i.e. are bugs
really independently found or do some testers tend to find the same bugs) and
you could test the assumption that bugs are equally hard to find (which
realistically isn't the case).

With that new insight into your original data, you could compute an _a
posteriori_ estimate for the total number of bugs, Bayes style.

(In other words, you start with the prior assumption that testers are
independent and bugs are equally hard to find to build a first answer. Based
on this estimate, you revise your assumption with a more realistic probability
distribution for bugs and testers, and use that distribution to compute a new
answer)

Or, maybe just stop messing with R and get on with fixing all the bugs found
by your N testers.

------
Cyphase
This is from 2010.

