
I mean run all the shorter bitstrings until time runs out or they terminate.



How does “time run out” in practice, other than by you putting an upper bound on your computation?

Obviously, the code will not halt until it compresses the stream successfully.


Enumerate all bitstrings of length 10.

Run them for N steps.

If none halt with the target bitstring of length 100 as output, then we've eliminated the computation hypothesis for 10 bits and N steps of runtime.

If we continue this approach until we've exhausted the available storage and time, then we eliminate the computation hypothesis altogether for our scenario (see the sketch below).
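
Something like this minimal sketch, where run_program stands in for a step-limited interpreter of whatever reference machine gets fixed (that interpreter is an assumption here, not anything specified above):

    from itertools import product

    def eliminate_hypothesis(target, k, max_steps, run_program):
        # run_program(bits, max_steps) is assumed to be a step-limited
        # interpreter for the fixed reference TM: it returns the output
        # string if the program halts within max_steps, else None.
        for bits in product('01', repeat=k):
            program = ''.join(bits)
            if run_program(program, max_steps) == target:
                return program  # found a k-bit program for the target
        return None  # hypothesis eliminated for this (k, max_steps)

If it returns None for every k up to the storage budget and every max_steps up to the time budget, that's the elimination described above.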


You understand that "Number of bitstrings of length 10" is a function of the cardinality of your alphabet, right?

A binary alphabet has 2^10 such strings. A decimal alphabet has 10^10 such strings.

So before you can make the assertion you've made, you first have to answer two questions:

1. How many symbols does your alphabet contain? 2. How did you choose that number?

Down that path you arrive at the linear speedup theorem. https://en.wikipedia.org/wiki/Linear_speedup_theorem


A bit is base 2, as far as I know.

Yes, there is always a Turing machine that can generate the target with whatever performance characteristics you like.

So the key is to keep the reference Turing machine fixed.


You can’t keep the reference Turing machine fixed in light of the linear speedup theorem.

Hence the argument for compression.

A Turing machine with a better language is faster.

It compresses time.


Yes, I agree there is always a faster Turing machine for a particular target. With this sort of argument, all bitstrings have a compression of zero bits: i.e., you pick a TM that outputs the target bitstring on the null input. And if the TM starts with the target already on its output tape, it also takes zero time. Thus every bitstring can be generated in zero space and time, with some TM.

So that's why, when talking about compression in general, we keep the reference TM fixed.

When testing, there is the possibility that with a single test we happen to have the wrong reference machine. But as the number of tests grows, that probability approaches zero. So, with enough testing we can close the linear speedup theorem loophole, at least with high probability.

As a practical application of this sort of thing, check out the 'normalized information distance'. It is based on algorithmic mutual information, which in theory is not approximable, as you argue. However, in practice it works surprisingly well, even without prior knowledge of the domain.
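
For a concrete feel, here is a toy normalized compression distance, the computable stand-in for the normalized information distance, with zlib as the compressor (the choice of compressor is my assumption; the literature uses bzip2 and others):

    import zlib

    def ncd(x: bytes, y: bytes) -> float:
        # Normalized compression distance: approximate Kolmogorov
        # complexity by compressed length.
        cx = len(zlib.compress(x))
        cy = len(zlib.compress(y))
        cxy = len(zlib.compress(x + y))
        return (cxy - min(cx, cy)) / max(cx, cy)

    # Strings that share structure come out closer than unrelated ones.
    print(ncd(b"the quick brown fox jumps" * 20, b"the quick brown fox leaps" * 20))
    print(ncd(b"the quick brown fox jumps" * 20, bytes(range(256)) * 2))

No domain knowledge goes in; the compressor does all the work.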



