Hacker News new | comments | show | ask | jobs | submit login
Kolmogorov Complexity - it's a bit silly (forwardscattering.org)
28 points by wglb on Apr 22, 2014 | hide | past | web | favorite | 36 comments



The post has an incorrect definition of Kolmogorov Complexity.

It says:

> The overall Kolmogorov complexity of a string is thus defined as K(x)=|p| where p is the shortest program string for language L such that L(p)=x and we consider all programming languages.

This is wrong. The definition of Kolmogorov complexity is always given relative to some fixed programming language. For example, K_perl or K_ruby are just fine, although we usually consider some Universal Turing Machine. The reason that we can prove theorems about Kolmogorov Complexity without even specifying which language we fixed is because of invariance (on all but a finite number of strings, any two languages will differ by at most some fixed constant). But the point is that you have to take a single fixed programming language. Then the issue raised in the blog post does not arise.

As this post is almost a year old, I doubt it matters much or is worth contacting the author though. But for future reference, if you think you've found such a basic mistake in a foundational concept in math or CS, you can always ask about it on cs.stackexchange.com or math.stackexchange.com rather than making a potentially embarassing post about how mathematicians are wrong/silly.


The website is down so I have no idea what his argument is, but when I studied the concept I found it alarming they just handwave away the issue that it depends what Turing machine you use. There's no reason the constant can't be arbitrarily huge.


If it helps any, the concept of KC is so impractical that this is probably the least of our worries.... The main important thing is that any theorem you prove will hold no matter which programming language/Universal TM you choose.


As long as arbitrarily huge is finite it doesn't really affect the usefulness of kolmogorov complexity.


How doesn't it? It makes it completely impractical to use because the complexity you get for any string is entirely dependent on the language you use. We have no way of deriving what function set the universe uses from first principles.


Just as for any finite string you chose you can chose a finite language L to trivially produce it, for any finite language L you chose, you can chose a finite string S that can't be trivially produced. If you chose L such that it trivially covers the structures of one universe (or a finite number of universes), there are infinite other universes in the same class of complexity which it doesn't cover.


I'm not sure if you are disagreeing with me or not. The problem is that we have no idea what language or metalanguage we should start with.


The point is entertaining, but does not undermine Kolmogorov complexity.

First, for any single fixed language, the problem does not exist. That includes any single L_silly language.

Second, when considering the set of all languages, there is indeed an issue. However it can be remedied by requiring an implementation of the language - while we might say that program P_117 has, in language L, the semantics that it emits

2983483746923874560238475613849523984560238475620837456198736491378461938764134

(a fixed very long string), supposedly showing that P_117 which is of length 5 can emit such a long string, it doesn't really. To implement that language, you would need to encode that long string somehow in the implementation.

In other words, if you consider a single language, that's fine. If you consider all possible languages, you must require an implementation of them, not just an abstract definition. The implementation must be shared between all of them, which is equivalent to a single language in the first place. And a single language is really where the benefit and intuition of Kolmogorov complexity lies anyhow.

The confusion arises when we try to prove that the language "doesn't matter" as it's just a constant factor. There is something faulty with that proof, or rather it proves something other than what it is assumed to, but Kolmogorov complexity itself is still valid.


> There is something faulty with that proof, or rather it proves something other than what it is assumed to, but Kolmogorov complexity itself is still valid.

Well, Kolmogorov complexity depends on language choice, there's no way around that and the invariance theorem doesn't say anything different. It's more subtle than that, so we have to be precise.

The invariance theorem shows that given languages L and M, there's a constant c such that for any string x, |K_L(x) - K_M(x)| <= c. Intuitively, that constant c is the program size of an M interpreter in L (or vice versa). It's easy to see how K_L(x) can't be larger than <an M interpreter in L> plus <the shortest M program that outputs x>.

What this means is that given sufficiently large and complex strings x, K_L(x) is very close to K_M(x), because c is constant and finite, while you can construct a string x with arbitrarily large K_L(x). This gives us confidence that K_L measures something "real", even though our measurement is relative to the (arbitrary) definition of L.

Of course, for short programs, the constant factors dominate. You won't get a shorter Ruby program for your 30 character string by implementing a full Perl interpreter. In this case, the constant difference c is so large relative to |x| that the invariance theorem, while technically correct, is basically pointless.


Yes, exactly. I think the difficulty is that the invariance theorem talks about any two fixed languages - then we can bound the difference between them - but the article here talks about all possible languages. If you consider them all at once, things do seem peculiar, but it's meaningless: Considering any one is sufficient, and the invariance theorem justifies that by showing that no other one would be better.


This post is a bit silly. Kolmogorov complexity doesn't have anything to do with computer languages, it has to do with instructions to describe something. The instructions include verysilly, the assembler/interpreter implementation of verysilly, and how to build the computer it's running on.

In that light, the constant factors are still trivial.


The major problem with the post is that it does not cover the complexity of the language itself. L(p) should equal to L(d, p) where d is the definition of the language used to write p. So it becomes a whole different question now. And indeed for L(s) where s = (d,p) you can now find the most minimal language definition and an empty p such that it can produce x. K_L(x) = |s| = |d| + |p| And for input receiving programs we need to consider the input as part of the program itself as well.

But those are just formalities, the great beauty of Kolmogorov complexity starts to emerge when you start to consider archive utilities as an approximation for K(x). Think K(x) ~ |Zip(X)| or K(x) ~ |Rar(x)| now you have a real tool to approximate K(x) and guess what it is already installed. Now what about Kd(x,y) = K(x) - K(y)/K(x||y) ~ |Zip(x)| - |Zip(y)| / |Zip(x||y)| It is a distance metric for how entropically far is x from y and that is very interesting and can be very useful for anomaly detection for instance or password strength comparison.


How do you measure the length of the definition of the language used to write p? You just have the same problem all over again, because it depends on the meta-language you use to define your language.


I think the post is correct that we can define a language which makes the algorithmic complexity of a given string zero. This doesn't contradict the invariance theorem, as far as I can tell. The constant c in the invariance theorem

    |K_1(s) - K_2(s)| <= c
will be at least the complexity of the special string in the other language, but it still exists.


If you want to define KC as the minimal length of some set of instructions to describe the string, then you still haven't gotten away from the fact that a common language must still be fixed in advance. You just changed it from a computer language to English (or whatever language your instructions are written in).


The author misses the point that Kolmogorov Complexity isn't meant to be used as a practical measurement, but is just a theoretical way to think about information. There is no way to measure a string's "Kolmogorov Units", but you can just generally agree that "abababababababababababababababababababab" is less complex than "ababaaabbabaabaababbbabaababbbaaababbbba".

If you really wanted a practical standard measure, the language would have to be simple Turing machine instructions, so that the language implementation isn't more optimized for expressing certain things (which is the problem discussed in the post).


Basing a concrete definition of Kolmogorov complexity on the lambda calculus rather than Turing machines has many advantages.

See https://en.wikipedia.org/wiki/Binary_lambda_calculus for a concrete definition, and a proof that the complexity of the prime numbers is at most 167 bits.

This language can be implemented in only 25 lines of C.


The problem the author is hitting on is that there isn't any standard language you can use. Every language will express some things better and other things worse.

This is a philosophical problem in how we should do induction. It's also a practical one in creating artificial intelligence.


This seems to be missing the point of Kolmogorov complexity. In Kolmogorov complexity we aren't concerned with the complexity of some fixed finite string, for this very reason.


If we aren't concerned about the Kolmogorov complexity of any fixed finite strings, then what exactly is it useful for?


It's a mathematical device for reasoning about compressibility. It's like saying "What's the use of Shannon Entropy of fixed strings?"

Algorithmic information theory also allows you to re-derive Godel's Incompleteness Theorem or the Halting Theorem from it, in a sense, it's a super-set of those concepts.

For example, we can talk about the compression of knowledge into finite sets of axioms and many bits of information exist in such a system.

Or we can reason about concepts like Omega, a truly random number in mathematics, incompressible in the sense then there is provably no shorter way to encode it other than brute force calculating it, but which is a real number, and which we can know the first few bits of it. (and a quite interesting number in the sense that it's an Oracle of knowledge and if you had it's complete expansion, you could solve many questions in mathematics with it)


Interesting. Is this the Omega you're referring to?

http://en.wikipedia.org/wiki/Chaitin%27s_constant


I think the author's point is that K(s) is the shortest program in _any_ language. Thus for any string s there exists a language ("silly,s") that produces the program with length 0 (or 1).


But this just misunderstands the definition. In the standard definition you pick _some_ Turing-complete language, and then define K(s) with respect to that language. You can then go on to show that in the limit of longer and longer strings, it does not make much difference which language you pick.


Precisely.

The author defines language as a mapper from input to output, which is necessary but not sufficient. Its L_silly (written in Ruby) is not a language since it uses "eval" which depends on the entirety of Ruby.


Hiding the entropy in the language reminds me of this writeup about the $5000 compression challenge from back in the day: http://www.patrickcraig.co.uk/other/compression.htm


An analogy: there are multiple calendar systems in widespread use. Each one of those defines a different year 0. In fact, how I define time 0 is completely arbitrary, so that for any given moment in time, I can define a calendar system that defines this moment to be exactly 00:00:00 on day 1 of year 0 in my new calendar system. From this fact, using the logic of the article, I can show that any moment is time 0 (in some calendar), and thus measuring time is completely pointless!


It's more like each calendar system maps the time to some completely arbitrary function. So in one calendar system, the number of years between 2000 and 2014 is 14 years, and the distance between 1990 and 1991 is 1,000 years.

But at least with calendars it might be manageable. We want to do operations like find the lowest number of bits needed to represent some data, or to do comparisons between the complexity of different strings. If this varies based on the programming language you use, then it's problematic at the very least.


How can I downvote this post? He uses a non-standard definition with a very obvious flaw, and then proceeds to explain how Kolmogorov complexity is "silly" because it has that flaw.


If you want an actual introduction to Information Theory, I'd recommend "An Introduction to Information Theory: Symbols, Signals and Noise".

It deals with complicated information theory topics similar to Kolmogorov Complexity (albeit by a different name) in an easy-to-approach way. Highly recommended.

[1]: http://www.amazon.com/An-Introduction-Information-Theory-Mat...


The author is not including the machine size.

From wikipedia:

"We could, alternatively, choose an encoding for Turing machines, where an encoding is a function which associates to each Turing Machine M a bitstring <M>. If M is a Turing Machine which, on input w, outputs string x, then the concatenated string <M> w is a description of x. For theoretical analysis, this approach is more suited for constructing detailed formal proofs and is generally preferred in the research literature. In this article, an informal approach is discussed."

If you consider the length of the Turing machine and the input string, then your don't get K = 0. The author's approach only includes the size of w, which is always 0.

How big are the machines used by the author's silly approach? If we wanted to describe the machine that prints "abababababbaba", we could write the program/specification as:

(1) Use the machine L_silly_abababababbaba (2) Use the string ''

How can we encode this into a shorter program/specification? Maybe by concatenating the machine name and ''?

L_silly_abababababbaba ''

Or we can be clever and realize that the string is always '' so we just need to machine name

L_silly_abababababbaba

Or we can say we don't need the L_silly_

abababababbaba

Can we go shorter? What if we specify a gzipped version of the string? We can take the string, unzip it, and use that Turing machine. But most strings are not compressible via gzip. So there will be many strings for which the shortest program via this approach is:

abababababbaba

... and some of the other words in this posted comment...


-----

PS C:\programming\Kolmogorov> ruby .\interpreter.rb "puts(\`"ab\`" * 20)"

abababababababababababababababababababab

So even with this universal language, we still have Ksilly,x2(x2)=0

-----

So you input string "puts(\`"ab\`" * 20)" into a ruby program, and somehow Kolmogorov complexity is 0 and not length of the argument you passed? I don't buy that.


Yea, he's wrong on that. The point of tailoring languages to a string still -- weakly -- stands, though: but he could also do that for a finite number of strings. Weakly also because he completely misses the point of Kolmogorov complexity. It's not meant to be practical in any way. It's a mathematical tool to uncover some facts about the complexity of things (mostly mathematical objects) and a more solid framework for data compression and randomness, which he glanced at the beginning of the article. In a sense it's like other non-computable constructions, it provides truths without specific instances, because their existence would be too powerful.

Some of the interesting theorems are found in a chapter of the following book:

http://www.amazon.com/Elements-Information-Theory-Thomas-Cov...

It gives a good grasp that Kolmogorov complexity answers meaningful questions not well defined in regular information theory: questions of optimality in finite lengths.


This proof doesn't show that Kolmogorov complexity of any fixed string is 0, it shows that Kolmogorov complexity of any fixed string is constant (which is trivial). That is obscured by an abuse of notation which makes arguments about Kolmogorov complexity easier to parse in 99% of cases. So, imho, this post isn't very substantial - it's essentially a garden variety "1 = 2" sort of argument.


If you allow your definition of "language" to include those that output the same fixed string for any input, then yes, much silliness will ensue. But of course, this has nothing to do with Kolmogorov, and everything to do with the fact that you often get silliness when you redefine words into meaninglessness.


Cool, thanks. Weird coincidence... just a few days ago I broke out my old information theory textbook (Cover & Thomas) and worked through this exact same line of reasoning to convince myself that "the Kolmogorov complexity" of a string depends very strongly on the choice of language, and can be brought down to 0 in adversarial cases like what you described.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: