
Measuring Information in Millibytes
https://engineering.vena.io/2016/07/12/measuring-information-in-millibytes/
======
eridius
> _This is a little more than eight thousandths of one bit. Therefore, the
> information given by one passing test run is just a little over one
> millibyte. To debug this compiler, we're going to be running hundreds and
> hundreds of experiments, each of which gives us a tiny fraction of one bit,
> until we can finally put together the single byte we need to isolate the
> faulty optimization._

It doesn't work that way. If you're dead set on measuring a single passing
test run as a millibyte's worth of information, you have to realize that
every subsequent passing test run is worth less and less. Otherwise running a
thousand passing tests would give you a full byte of information, and yet a
thousand passing tests does not give you 100% confidence that the enabled
optimizations are not buggy. The only way to get a full byte's worth of
information is to actually hit upon a failing test run.

And in fact even this is mistaken. You don't have a millibyte's worth of
information. You have a lot more. The number of passing test runs at a given
optimization gives you a probability that the optimization is buggy. Claiming
that all this information you have about probabilities of the bugginess of
various optimization passes is really just a millibyte is misleading; you have
a lot of information, you're just ignoring it and saying you only _care about_
a millibyte's worth of it.
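
To put numbers on it, here's a minimal sketch in Python. The per-run failure probability q is an assumption, picked so a passing run is worth the article's ~1 millibyte, and the 50% prior is likewise made up. The naive sum of self-information says 1000 passes = 1 byte, but the posterior probability of bugginess never actually reaches zero:

    import math

    q = 0.0055                    # assumed chance one run fails if the bug is enabled
    info = -math.log2(1 - q)      # self-information of a single passing run
    print(info / 0.008)           # ~1.0 "millibytes" (1 millibyte = 0.008 bits)

    print(1000 * info / 8)        # naive sum: ~1.0 byte after 1000 passes...

    for n in (1, 10, 100, 1000):  # ...but with an assumed 50% prior, the
        like = (1 - q) ** n       # posterior P(buggy | n passes) never hits 0
        print(n, like / (like + 1))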

~~~
hinkley
I run into this with much larger p and have to explain it to coworkers.

Flip the scenario around: the code is fine but the test has a concurrency bug
in it. It fails about 20% of the time. You ask someone to fix it. After six
green test runs they declare the bug fixed. But then build 8 fails and so does
build 10.

Why? Because probabilities aren’t additive, they’re multiplicative. You didn’t
test 6p > 1, you tested (1 - p)^6 = 0, _which will never be true_.

In this case there was a 26% chance the test was still broken after 6 green
tests. You need 11 runs just to get the probability into single digits.

(Another consequence of this is that if you have 5 flaky tests, each failing
20% of the time, the probability of a green build is only about one in three
(0.8^5 ≈ 0.33), and the likelihood of getting multiple consecutive red builds
is high enough that it happens every week or two.)

------
cperciva
Tarsnap storage costs 0.25 picodollars per millibyte-month of storage.

~~~
WJW
And it works great! Well worth my picodollars.

~~~
hinkley
If I write a check with the dollar amount in scientific notation, will my bank
honor it? How about a credit union?

------
infinity0
> You decide to try running with optimizations 1-128 enabled, and 129-256
> disabled. That run fails. Now you know that one of the first 128
> optimizations is to blame. You try again with 1-64 enabled, and 65-128
> disabled, and that run passes. The bad optimization must be between 65 and
> 128. Continuing in this fashion—a binary search—you're able to locate the
> faulty optimization in just eight steps.

This doesn't work for the case where the failure(s) might be caused by some
subset of the optimisations acting in combination. That is,

1000 0100 0010 0001

all succeed, but

0101

fails.

A binary search is not possible for these more complex bugs. We had such cases
in the reproducible builds project.

If you can assume monotonicity (subset A exhibits a failure => any superset of
A exhibits a failure), then you can just go through all the options (in the
OP's case 256) and switch them on, one at a time. Then the running time is
256, not 8.
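
A sketch of that linear scan in Python, assuming "one at a time" means cumulatively, with a made-up failing pair {7, 12} standing in for a real compile-and-test oracle:

    def fails(enabled):                  # stand-in for a real test run
        return {7, 12} <= set(enabled)

    enabled = []
    for opt in range(1, 257):            # switch options on one at a time
        enabled.append(opt)
        if fails(enabled):               # first failure implicates `opt` as a
            print(opt)                   # member of some minimal failing set;
            break                        # pin it on and repeat for the others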

For more complex cases where you can't assume even that, you're indeed stuck
with up to 2^256 test runs.

~~~
hinkley
I think for pairs you can still do a logarithmic search, it’s just not base 2.

A trinary or quaternary search should be able to find two bugs at once (and
intuition tells me a quad search should be able to handle three).

Say you have 16 optimizations and 7 and 12 don’t work together. Eliminate the
first half, the second half, the odds, or the evens, and the build comes up
green every time. But remove just the first third or the first quarter and the
tests still fail.

So in a trinary search you might see 1-16 -> 6-16 -> 6-13 -> 6-8,12-13 ->
6,7,8,12 -> 6,7,12 -> 7,12. Including the first and last sanity checks you’re
in for about 8-14 test runs for 16 optimizations, which is a lot better than
2^16.

But there’s still a binary search you can do, and that’s a bisect of the
commit history. Before some commit this bug didn’t happen, and the commit that
introduced it contains one half of the pair. Then you run a second binary
search to find its complement.
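
Here's a rough sketch of the pair case in Python. The `fails` oracle and the bad pair are hypothetical, and monotonicity is assumed (enabling extra options never hides the failure); it finds the pair in roughly 2 log n runs:

    BAD_PAIR = {7, 12}                    # illustrative bad interaction

    def fails(enabled):                   # stand-in for a real build + test run
        return BAD_PAIR <= set(enabled)

    def find_one(suspects, fixed):
        # bisect `suspects` for one culprit while everything in `fixed` stays on
        while len(suspects) > 1:
            half = suspects[:len(suspects) // 2]
            if fails(fixed + half):
                suspects = half
            else:
                suspects = suspects[len(suspects) // 2:]
        return suspects[0]

    def find_pair(opts):
        mid = len(opts) // 2
        a, b = opts[:mid], opts[mid:]
        if fails(a):                      # pair entirely in the first half
            return find_pair(a)
        if fails(b):                      # pair entirely in the second half
            return find_pair(b)
        # otherwise the pair straddles the split: hold all of `a` on and
        # bisect `b` for one culprit, then hold that culprit on and bisect `a`
        x = find_one(b, a)
        y = find_one(a, [x])
        return (y, x)

    print(find_pair(list(range(1, 17))))  # -> (7, 12)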

------
tingletech
If you use base-e logarithms, you can measure information in natural units,
aka nats: https://en.wikipedia.org/wiki/Nat_(unit)
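
For example, in Python:

    import math
    p = 1 / 256
    print(-math.log(p))    # ~5.545 nats
    print(-math.log2(p))   # 8 bits; 1 nat = 1/ln(2) ~ 1.4427 bits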

------
_nalply
Another example of sub-bit information:

Base85! 85 different ASCII characters carry lb(85) ≈ 6.4094 bits each. With 5
of them you get slightly more than 32 bits. Each of the 5 characters has a
surplus of about 0.41 bits over 6 bits; those five surpluses add up to the
extra 2 bits needed to reach 32.

To encode to Base85, treat a 32-bit entity as a 32-bit unsigned integer, then
divide by 85 five times and take the remainders (from 0 to 84) as indices into
the 85-character string of all Base85 characters.

To decode, view the characters as a number written in base 85. Since 85^5 >
2^32, there are a few 5-character Base85 representations whose value is more
than 2^32 - 1; those are decoding errors.
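
A sketch of both directions in Python (the concrete alphabet is an assumption here; this uses ASCII85's '!' through 'u'):

    B85 = "".join(chr(c) for c in range(33, 118))   # '!' .. 'u', 85 characters

    def encode(n):
        assert 0 <= n < 2 ** 32
        out = []
        for _ in range(5):                # divide by 85 five times
            n, rem = divmod(n, 85)
            out.append(B85[rem])          # remainders index into the alphabet
        return "".join(reversed(out))     # most significant digit first

    def decode(s):
        n = 0
        for ch in s:                      # read the string as a base-85 number
            n = n * 85 + B85.index(ch)
        if n >= 2 ** 32:                  # 85**5 > 2**32, so some 5-char strings
            raise ValueError("overflow")  # do not decode to a 32-bit value
        return n

    print(decode(encode(0xDEADBEEF)) == 0xDEADBEEF)   # True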

------
codethief
To anyone interested in learning more about Shannon entropy, I recommend
reading Shannon's original article. It's highly readable!

There's a link to the PDF on
https://en.m.wikipedia.org/wiki/A_Mathematical_Theory_of_Communication .

------
nayuki
Data compression is another place where measuring in millibytes is feasible.
For example, if a video codec takes in pixels in RGB8.8.8 format (3 bytes per
pixel) and achieves a 100:1 compression ratio, then you could say that the
compressed video uses an average of 30 millibytes per pixel.

------
lucio
> _Why eight steps? Well, to differentiate between 256 equally likely
> possibilities requires eight bits. Each experiment, since it differentiates
> between two equally likely outcomes (pass and fail), provides one bit of
> information. Therefore, we need eight such experiments to give us the eight
> bits we need to determine the faulty optimization._

Is this explanation a little odd?

Each step in a binary search halves the number of suspects; you need 8 steps
because with 8 steps you go from 256 suspects to one.

I can't see it the way the article describes it.

Is it just me? Has anyone else found the article's reasoning for 8 steps a
little odd?

~~~
winceschwab
Yeah, I noticed that too. Treating a byte as eight tally marks misses the
point about the combinatorial nature of place values.

It's not 100% wrong, because bit shifting and bit packing are real-world
techniques used all the time.

But these are not the only possible values:

    
    
    00000001
    00000010
    00000100
    00001000
    00010000
    00100000
    01000000
    10000000

It's not incorrect to make a decision to limit one's input to those values,
but it _is_ incorrect to presume that binary data can only be recorded that
way.

Treating those 8 place values as eight lightbulb circuits that each need an
individual test isn't the right mindset for someone debugging compiler
optimizations.
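
Put differently, each pass/fail experiment in the binary search pins down one place value of the culprit's 8-bit index. A small Python sketch, with a made-up culprit:

    culprit = 137                    # hypothetical faulty optimization, 0-255

    lo, hi, bits = 0, 256, []
    while hi - lo > 1:
        mid = (lo + hi) // 2
        up = culprit >= mid          # one pass/fail experiment
        bits.append(int(up))
        lo, hi = (mid, hi) if up else (lo, mid)

    print(bits, lo)                  # [1, 0, 0, 0, 1, 0, 0, 1] -> 137 in 8 steps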

------
bitwize
I got a paragraph in and was like -- he had to run eight tests a thousand
times to get one yes/no answer each, didn't he.

------
RobIII
Password entropy is another place where fractional bits can be encountered.
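
For example, a passphrase built from the 7776-word Diceware list gets a fractional number of bits per word:

    import math
    print(math.log2(7776))   # ~12.925 bits per Diceware word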

