> The overall Kolmogorov complexity of a string s is thus defined as K(s) = min_L |p|, where p is the shortest program string for language L such that L(p) = s, and we consider all programming languages.
This is wrong. The definition of Kolmogorov complexity is always given relative to some fixed programming language. For example, K_perl or K_ruby are just fine, although we usually fix some universal Turing machine. The reason we can prove theorems about Kolmogorov complexity without even specifying which language we fixed is invariance: for any two languages, the complexities they assign to a string differ by at most some fixed constant that depends only on the pair of languages, not on the string. But the point is that you have to take a single fixed programming language. Then the issue raised in the blog post does not arise.
As this post is almost a year old, I doubt it matters much or is worth contacting the author about. But for future reference, if you think you've found such a basic mistake in a foundational concept in math or CS, you can always ask about it on cs.stackexchange.com or math.stackexchange.com rather than making a potentially embarrassing post about how mathematicians are wrong/silly.
First, for any single fixed language, the problem does not exist. That includes any single L_silly language.
Second, when considering the set of all languages, there is indeed an issue. However, it can be remedied by requiring an implementation of the language. We might say that program P_117 has, in language L, the semantics that it emits some particular fixed, very long string, supposedly showing that P_117, which is of length 5, can emit such a long string. But it doesn't really: to implement that language, you would need to encode that long string somehow in the implementation.
In other words: if you consider a single language, that's fine. If you consider all possible languages, you must require an implementation of each of them, not just an abstract definition. The implementation must be shared between all of them, which is equivalent to fixing a single language in the first place. And a single language is really where the benefit and intuition of Kolmogorov complexity lie anyhow.
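A minimal Ruby sketch of that point (the names `run_l_silly`, `P_117`, and the output string are all hypothetical): an interpreter for a silly language whose 5-byte program "P_117" emits a long fixed string must carry that string in its own source.

```ruby
# Hypothetical interpreter for an L_silly-style language. The 5-byte
# program "P_117" is defined to emit a long fixed string, but that
# string has nowhere to live except inside the implementation itself.
LONG_OUTPUT = "ab" * 10_000 # 20,000 bytes baked into the interpreter

def run_l_silly(program)
  program == "P_117" ? LONG_OUTPUT : nil
end
```

So the 5-byte "program" buys nothing: the implementation is at least as long as the string it pretends to compress.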
The confusion arises when we try to prove that the language "doesn't matter" as it's just a constant factor. There is something faulty with that proof, or rather it proves something other than what it is assumed to, but Kolmogorov complexity itself is still valid.
Well, Kolmogorov complexity does depend on the choice of language; there's no way around that, and the invariance theorem doesn't say otherwise. But it's more subtle than that, so we have to be precise.
The invariance theorem shows that given languages L and M, there's a constant c such that for any string x, |K_L(x) - K_M(x)| <= c. Intuitively, that constant c is the program size of an M interpreter in L (or vice versa). It's easy to see how K_L(x) can't be larger than <an M interpreter in L> plus <the shortest M program that outputs x>.
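Spelled out, with c_{L→M} the length of an M-interpreter written in L, the two directions of that bound give the theorem directly:

```latex
K_L(x) \le K_M(x) + c_{L \to M}
\qquad\text{and}\qquad
K_M(x) \le K_L(x) + c_{M \to L}
```

so taking c = max(c_{L→M}, c_{M→L}) yields |K_L(x) - K_M(x)| <= c for every x.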
What this means is that for sufficiently large and complex strings x, K_L(x) is very close to K_M(x) relative to |x|: c is a fixed, finite constant, while you can construct strings x with arbitrarily large K_L(x). This gives us confidence that K_L measures something "real", even though our measurement is relative to the (arbitrary) definition of L.
Of course, for short programs, the constant factors dominate. You won't get a shorter Ruby program for your 30 character string by implementing a full Perl interpreter. In this case, the constant difference c is so large relative to |x| that the invariance theorem, while technically correct, is basically pointless.
In that light, the constant factors are still trivial.
But those are just formalities; the great beauty of Kolmogorov complexity starts to emerge when you consider archive utilities as an approximation of K(x).
Think K(x) ~ |Zip(x)| or K(x) ~ |Rar(x)|: now you have a real tool to approximate K(x), and guess what, it's already installed.
Now what about Kd(x, y) = (K(x||y) - min(K(x), K(y))) / max(K(x), K(y)) ~ (|Zip(x||y)| - min(|Zip(x)|, |Zip(y)|)) / max(|Zip(x)|, |Zip(y)|)?
It is a distance metric for how entropically far x is from y, and that is very interesting and can be very useful, for instance for anomaly detection or comparing password strength.
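A sketch of that idea in Ruby, using Zlib's deflate as the stand-in compressor. The ratio below is the standard normalized-compression-distance form of this idea; the helper names are just illustrative:

```ruby
require "zlib"

# Approximate K(x) by the deflated size of x.
def k(x)
  Zlib::Deflate.deflate(x).bytesize
end

# Normalized compression distance: near 0 for very similar strings,
# near 1 for unrelated ones.
def ncd(x, y)
  kx, ky, kxy = k(x), k(y), k(x + y)
  (kxy - [kx, ky].min).to_f / [kx, ky].max
end

a = "the quick brown fox jumps over the lazy dog " * 20
b = a.sub("fox", "cat")                      # almost identical to a
c = (0...a.size).map { rand(256).chr }.join  # incompressible noise

puts ncd(a, b) # small
puts ncd(a, c) # close to 1
```

Because the compressor can reuse matches from x when compressing x||y, similar strings cost almost nothing extra, while unrelated ones cost their full size.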
> |K_1(s) - K_2(s)| <= c
If you really wanted a practical standard measure, the language would have to be simple Turing machine instructions, so that the language implementation isn't more optimized for expressing certain things (which is the problem discussed in the post).
See https://en.wikipedia.org/wiki/Binary_lambda_calculus for a concrete definition, and a proof that the complexity of the prime numbers is at most 167 bits.
This language can be implemented in only 25 lines of C.
This is a philosophical problem in how we should do induction. It's also a practical one in creating artificial intelligence.
Algorithmic information theory also allows you to re-derive Gödel's incompleteness theorem and the halting theorem; in a sense, it's a superset of those concepts.
For example, we can talk about the compression of knowledge into finite sets of axioms, and about how many bits of information exist in such a system.
Or we can reason about concepts like Omega, a truly random number in mathematics: incompressible in the sense that there is provably no shorter way to encode it than brute-force calculating it, yet a real number whose first few bits we can actually know. (It's also quite an interesting number in that it's an oracle of knowledge: if you had its complete expansion, you could settle many questions in mathematics with it.)
The author defines a language as a mapper from input to output, which is necessary but not sufficient. The post's L_silly (written in Ruby) is not a language on its own, since it uses "eval" and therefore depends on the entirety of Ruby.
But at least with calendars it might be manageable. We want to do operations like finding the lowest number of bits needed to represent some data, or comparing the complexity of different strings. If this varies based on the programming language you use, then it's problematic at the very least.
It deals with complicated information theory topics similar to Kolmogorov Complexity (albeit by a different name) in an easy-to-approach way. Highly recommended.
"We could, alternatively, choose an encoding for Turing machines, where an encoding is a function which associates to each Turing Machine M a bitstring <M>. If M is a Turing Machine which, on input w, outputs string x, then the concatenated string <M> w is a description of x. For theoretical analysis, this approach is more suited for constructing detailed formal proofs and is generally preferred in the research literature. In this article, an informal approach is discussed."
If you consider the length of the Turing machine plus the input string, then you don't get K = 0. The author's approach only counts the size of w, which is always 0.
How big are the machines used by the author's silly approach? If we wanted to describe the machine that prints "abababababbaba", we could write the program/specification as:
(1) Use the machine L_silly_abababababbaba
(2) Use the string ''
How can we encode this into a shorter program/specification? Maybe by concatenating the machine name and ''?
Or we can be clever and realize that the string is always '', so we just need the machine name.
Or we can say we don't need the L_silly_ prefix.
Can we go shorter? What if we specify a gzipped version of the string? We can take the string, unzip it, and use that Turing machine. But most strings are not compressible via gzip. So there will be many strings for which the shortest program via this approach is:
... and some of the other words in this posted comment...
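The incompressibility claim is easy to check with Zlib's deflate (a stand-in for gzip): random bytes come out slightly larger, while a repetitive string shrinks.

```ruby
require "zlib"
require "securerandom"

raw = SecureRandom.random_bytes(10_000) # a "typical", incompressible string
packed = Zlib::Deflate.deflate(raw)

# Deflate adds framing overhead, so random data grows a little;
# only structured strings like "ab" * 20 actually get shorter.
puts packed.bytesize                           # slightly above 10,000
puts Zlib::Deflate.deflate("ab" * 20).bytesize # well under 40
```

This is the counting argument in miniature: almost all strings of a given length are incompressible, so a scheme that names them via a compressor gains nothing on most inputs.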
PS C:\programming\Kolmogorov> ruby .\interpreter.rb "puts(\`"ab\`" * 20)"
So even with this universal language, we still have
So you input the string "puts(\`"ab\`" * 20)" into a Ruby program, and somehow the Kolmogorov complexity is 0 rather than the length of the argument you passed? I don't buy that.
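One way to see it: whatever interpreter.rb does internally, the argument is the program, and under that fixed language K is bounded by its length. A sketch in plain Ruby (without the PowerShell escaping above):

```ruby
program = 'puts("ab" * 20)' # the string handed to the interpreter

# Under the language that interpreter.rb implements, this 15-byte string
# is a description of the 40-character output, so K is at most 15, not 0.
puts program.bytesize # 15
eval(program)         # prints "ab" repeated 20 times
```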
Some of the interesting theorems are found in a chapter of the following book:
It gives a good grasp of how Kolmogorov complexity answers meaningful questions that are not well defined in ordinary information theory: questions of optimality at finite lengths.