
Cryptic genetic variation in software: hunting a buffered 41-year-old bug - mineo
https://cryptogenomicon.wordpress.com/2014/10/13/cryptic-genetic-variation-in-software-hunting-a-buffered-41-year-old-bug/
======
matthewcanty
That was a really interesting read, and very well written. I wonder if anyone
can clear this up though...

I find the terminology of open and closed intervals contradictory to their
meaning. Does anyone know why they are described like this?

`Closed` makes me think shut or not-including - however it includes its
endpoints. `Open` makes me think inclusive - yet does not include its
endpoints.

~~~
saurik
The analogy I've always used is that the closed interval has what amounts to a
lid or a cap on it, while the open interval does not. Another (sort of
related) way of looking at it is that the closed interval has a maximum (or
minimum) value, whereas the open interval, despite only missing a single
value, suddenly has this feeling of continuation, because without that final
value it now asymptotes to the end and you will always be able to find a
larger (or smaller) value than any previous value you examine.

~~~
masklinn
The so-called "french notation" is interesting and — I think — clearer: a
closed interval is [a, b] and an open interval is ]a, b[.

~~~
entropy_
I learned Math in a french-system school and yes, that's definitely a much
better notation in my opinion. The difference between [] and () is not
immediately clear, whereas [a,b] versus ]a,b[ makes it obvious that one
includes a and b while the other does not. It also makes it easy to remember
"open" and "closed" and what they mean in terms of whether or not the interval
bounds are included or excluded.

~~~
cgio
I don't know if it's very intuitive, given that a and b are still inside the
][. Maybe a]..[b would make it clearer?

~~~
munificent

        a[..]b

------
kator
I always cringe when I see or have to write:

    
    
        for ( ; ; )
    

or:

    
    
        while ( true )
    

etc.

You just know you're setting yourself up for undocumented bugs later.

I've been known to build "escape" vars into these like:

    
    
        int attempts = 1000;
        for ( ; attempts > 0; attempts-- ) {
            /* DO SOMETHING */
        }
    
        if (attempts <= 0) {
            /* BAIL */
        }
    

Where 1000 (or whatever number) is a reasonable estimate of how many times the
function should need to loop, times 10 or so.

~~~
munificent
Really? I use "while (true)" pretty frequently in my code. In my mind, all
loops are composed of these parts:

1. Some initialization (optional).
2. Some stuff you do each iteration (optional).
3. Check the exit condition and exit the loop (optional).
4. Some stuff you do each iteration (optional).

"while" loops nicely handle the case where 2 is empty. "do while" loops handle
4 being empty. "for" loops handle both 2 and 4 being non-empty but only really
accommodate 4 being a single expression.

For any other case where 2 is non-empty and 4 is more than a single
expression, I just use a "while (true)" with an "if (...) break;" in the body.

~~~
nknighthb
Burying the only way out in the loop body is more error-prone, especially for
those who come after you. Putting an escape hatch in the loop setup makes it
less likely that an operationally-problematic[0] infinite loop scenario will
be triggered. This is particularly important when your code doesn't have a
runtime environment that can easily kill runaway requests.

[0] Or utterly catastrophic. Think embedded system with limited debugging
facilities. This kind of bug could manifest as the system completely locking
up, and take much, much longer to track down than an "infinite loop detected!"
message in an error log.

~~~
munificent
> Burying the only way out in the loop body is more error-prone, especially
> for those who come after you.

I don't _prefer_ to exit from the middle of a loop, but if that's the most
succinct way to implement the correct behavior, I'll do it. I think short code
that exits from the middle is less error-prone than code that has to be more
convoluted to put the exit at the top or bottom.

> Putting an escape hatch in the loop setup makes it less likely that an
> operationally-problematic[0] infinite loop scenario will be triggered.

An escape hatch is yet more code that has to be tested and debugged. What if
the escape hatch triggers too _early_?

> [0] Or utterly catastrophic. Think embedded system with limited debugging
> facilities.

Sure, but unusual platforms require unusual coding styles. If I was targeting
that I might adjust my practices.

~~~
nknighthb
> _code that has to be more convoluted to put the exit at the top or bottom_

Note I said _only_ way out. The point is that if "the" (intended) exit can't
reasonably be in the loop condition, then adding an escape hatch to have a
second way out is useful.

Assuming C you can even just compress the whole thing to a for_x_tries(n)
macro.

> _What if the escape hatch triggers too early?_

You'll get a helpful message and can bump the count. Easily-isolated errors
are preferable to frozen/DoS'd systems.

------
PhantomGremlin
At least one, and possibly two, other bugs are lurking in the implementation.

1) Algorithm FT says:

    
    
        1. Generate u. Store the first bit of u
        as a sign s (s=0 if u<1/2, s=1 if u>=1/2).
    

and yet the C code implements

    
    
        if ( u <= 0.5 ) s = 0.0;
        else s = 1.0;
    

2) I can't be sure of the following w/o access to doc. But i4_uni() says

    
    
        a uniform distribution over (1, 2147483562)
    

which, offhand, is suspicious. A distribution over positive integers would
probably want to use _all_ available values in a 32-bit signed int, so it
would most likely end at 2^31 - 1, which is 2147483647, and not the value given.

~~~
abecedarius
Good point about the C code. OTOH, 2147483562 is 1 less than a prime, which is
unlikely to be an accident.

------
imurray
It would be safest for most rand() functions to omit both zero and one, unless
a user was _really_ sure they wanted otherwise. If we were generating real
numbers, we'd _never_ see precisely zero or one. The fact that we do is an
artifact of limited precision. These boundary cases cause problems in common
computations like u*log(u) or (1-u)*log(1-u).

~~~
ZoFreX
It's pretty common in practice to want [0,1) (that is, 0 <= x < 1) from your
random number generator. Never generating 0 would be a problem when the range
is small and discrete - generating random letters for example.

~~~
captaincrowbar
If you're generating integers, or selecting from a discrete set with uniform
probabilities, there's no reason to involve floating point at any point in the
process.

------
saurik
(Also posted 20 hours ago, also no comments.) (It occurs to me that if there
are ever comments on this post, my comment will sound really confusing: let it
be clear that this article is #1 currently, was posted 40 minutes ago, and
there are no comments yet. ;P)

[https://news.ycombinator.com/item?id=8453042](https://news.ycombinator.com/item?id=8453042)

~~~
kovrik
Well, this article is so thorough, detailed and well-written, that I don't
know what to say other than "Wow, that was cool and very interesting!".

More stories like that?

~~~
vkolencik
I especially like the clever analogy with genetics. Douglas Hofstadter said
something along the lines of analogy being more powerful when the two concepts
it connects are otherwise very distant from each other.

------
mbq
Thumbs up for not using rand, but assuming that MT is a silver bullet is not
exactly scientific; one should just test a few RNGs, since it may turn out
that the code exploits some ultra-hidden hole in MT, or that some much faster
RNG works equally well. Also, reproducibility of stochastic code means that
the results lead to the same conclusions regardless of the seed, not that you
get bit-for-bit identical output for the same seed. If one assumes only the
latter, it may end in seed cherry-picking, in not optimizing code because it
would "break reproducibility", or in not investigating the natural variation
of the results.

------
jdnier
A great exposition, and worth it just for the introduction to "schrödinbug".
[http://en.wikipedia.org/wiki/Heisenbug#Related_terms](http://en.wikipedia.org/wiki/Heisenbug#Related_terms)

------
rmsaksida
Really interesting read.

I wonder about the resolution: was snorm() reimplemented correctly, or was an
RNG with the 'wrong' interval supplied?

------
menaf
Menaf

~~~
Houshalter
What is the purpose of this?

