
Kolmogorov Complexity - it's a bit silly - wglb
http://www.forwardscattering.org/post/7
======
bo1024
The post has an incorrect definition of Kolmogorov Complexity.

It says:

> The overall Kolmogorov complexity of a string is thus defined as K(x)=|p|
> where p is the shortest program string for language L such that L(p)=x and
> we consider all programming languages.

This is wrong. The definition of Kolmogorov complexity is always given
relative to some fixed programming language. For example, K_perl or K_ruby are
just fine, although we usually consider some Universal Turing Machine. The
reason that we can prove theorems about Kolmogorov Complexity without even
specifying which language we fixed is because of invariance (on all but a
finite number of strings, any two languages will differ by at most some fixed
constant). But the point is that you have to take a single fixed programming
language. Then the issue raised in the blog post does not arise.

As this post is almost a year old, I doubt it matters much or is worth
contacting the author though. But for future reference, if you think you've
found such a basic mistake in a foundational concept in math or CS, you can
always ask about it on cs.stackexchange.com or math.stackexchange.com rather
than making a potentially embarassing post about how mathematicians are
wrong/silly.

~~~
Houshalter
The website is down so I have no idea what his argument is, but when I studied
the concept I found it alarming they just handwave away the issue that it
depends what Turing machine you use. There's no reason the constant can't be
arbitrarily huge.

~~~
cma
As long as arbitrarily huge is finite it doesn't really affect the usefulness
of kolmogorov complexity.

~~~
Houshalter
How doesn't it? It makes it completely impractical to use because the
complexity you get for any string is entirely dependent on the language you
use. We have no way of deriving what function set the universe uses from first
principles.

~~~
cma
Just as for any finite string you chose you can chose a finite language L to
trivially produce it, for any finite language L you chose, you can chose a
finite string S that can't be trivially produced. If you chose L such that it
trivially covers the structures of one universe (or a finite number of
universes), there are infinite other universes in the same class of complexity
which it doesn't cover.

~~~
Houshalter
I'm not sure if you are disagreeing with me or not. The problem is that we
have no idea what language or metalanguage we should start with.

------
azakai
The point is entertaining, but does not undermine Kolmogorov complexity.

First, for any single fixed language, the problem does not exist. That
includes any single L_silly language.

Second, when considering the set of all languages, there is indeed an issue.
However it can be remedied by requiring an implementation of the language -
while we might say that program P_117 has, in language L, the semantics that
it emits

2983483746923874560238475613849523984560238475620837456198736491378461938764134

(a fixed very long string), supposedly showing that P_117 which is of length 5
can emit such a long string, it doesn't really. To implement that language,
you would need to encode that long string somehow in the implementation.

In other words, if you consider a single language, that's fine. If you
consider all possible languages, you must require an implementation of them,
not just an abstract definition. The implementation must be shared between all
of them, which is equivalent to a single language in the first place. And a
single language is really where the benefit and intuition of Kolmogorov
complexity lies __anyhow __.

The confusion arises when we try to prove that the language "doesn't matter"
as it's just a constant factor. There is something faulty with that proof, or
rather it proves something other than what it is assumed to, but Kolmogorov
complexity itself is still valid.

~~~
codeflo
> There is something faulty with that proof, or rather it proves something
> other than what it is assumed to, but Kolmogorov complexity itself is still
> valid.

Well, Kolmogorov complexity depends on language choice, there's no way around
that and the invariance theorem doesn't say anything different. It's more
subtle than that, so we have to be precise.

The invariance theorem shows that given languages L and M, there's a constant
c such that for any string x, |K_L(x) - K_M(x)| <= c. Intuitively, that
constant c is the program size of an M interpreter in L (or vice versa). It's
easy to see how K_L(x) can't be larger than <an M interpreter in L> plus <the
shortest M program that outputs x>.

What this means is that given sufficiently large and complex strings x, K_L(x)
is very close to K_M(x), because c is constant and finite, while you can
construct a string x with arbitrarily large K_L(x). This gives us confidence
that K_L measures something "real", even though our measurement is relative to
the (arbitrary) definition of L.

Of course, for short programs, the constant factors dominate. You won't get a
shorter Ruby program for your 30 character string by implementing a full Perl
interpreter. In this case, the constant difference c is so large relative to
|x| that the invariance theorem, while technically correct, is basically
pointless.

~~~
azakai
Yes, exactly. I think the difficulty is that the invariance theorem talks
about any two fixed languages - then we can bound the difference between them
- but the article here talks about all possible languages. If you consider
them all at once, things do seem peculiar, but it's meaningless: Considering
any one is sufficient, and the invariance theorem justifies that by showing
that no other one would be better.

------
pessimizer
This post is a bit silly. Kolmogorov complexity doesn't have anything to do
with computer languages, it has to do with instructions to describe something.
The instructions include verysilly, the assembler/interpreter implementation
of verysilly, and how to build the computer it's running on.

In that light, the constant factors are still trivial.

~~~
syahalom
The major problem with the post is that it does not cover the complexity of
the language itself. L(p) should equal to L(d, p) where d is the definition of
the language used to write p. So it becomes a whole different question now.
And indeed for L(s) where s = (d,p) you can now find the most minimal language
definition and an empty p such that it can produce x. K_L(x) = |s| = |d| + |p|
And for input receiving programs we need to consider the input as part of the
program itself as well.

But those are just formalities, the great beauty of Kolmogorov complexity
starts to emerge when you start to consider archive utilities as an
approximation for K(x). Think K(x) ~ |Zip(X)| or K(x) ~ |Rar(x)| now you have
a real tool to approximate K(x) and guess what it is already installed. Now
what about Kd(x,y) = K(x) - K(y)/K(x||y) ~ |Zip(x)| - |Zip(y)| / |Zip(x||y)|
It is a distance metric for how entropically far is x from y and that is very
interesting and can be very useful for anomaly detection for instance or
password strength comparison.

~~~
kcanini
How do you measure the length of the definition of the language used to write
p? You just have the same problem all over again, because it depends on the
meta-language you use to define your language.

------
mappum
The author misses the point that Kolmogorov Complexity isn't meant to be used
as a practical measurement, but is just a theoretical way to think about
information. There is no way to measure a string's "Kolmogorov Units", but you
can just generally agree that "abababababababababababababababababababab" is
less complex than "ababaaabbabaabaababbbabaababbbaaababbbba".

If you really wanted a practical standard measure, the language would have to
be simple Turing machine instructions, so that the language implementation
isn't more optimized for expressing certain things (which is the problem
discussed in the post).

~~~
tromp
Basing a concrete definition of Kolmogorov complexity on the lambda calculus
rather than Turing machines has many advantages.

See
[https://en.wikipedia.org/wiki/Binary_lambda_calculus](https://en.wikipedia.org/wiki/Binary_lambda_calculus)
for a concrete definition, and a proof that the complexity of the prime
numbers is at most 167 bits.

This language can be implemented in only 25 lines of C.

------
odin1415
This seems to be missing the point of Kolmogorov complexity. In Kolmogorov
complexity we aren't concerned with the complexity of some fixed finite
string, for this very reason.

~~~
kcanini
If we aren't concerned about the Kolmogorov complexity of any fixed finite
strings, then what exactly is it useful for?

~~~
cromwellian
It's a mathematical device for reasoning about compressibility. It's like
saying "What's the use of Shannon Entropy of fixed strings?"

Algorithmic information theory also allows you to re-derive Godel's
Incompleteness Theorem or the Halting Theorem from it, in a sense, it's a
super-set of those concepts.

For example, we can talk about the compression of knowledge into finite sets
of axioms and many bits of information exist in such a system.

Or we can reason about concepts like Omega, a truly random number in
mathematics, incompressible in the sense then there is provably no shorter way
to encode it other than brute force calculating it, but which is a real
number, and which we can know the first few bits of it. (and a quite
interesting number in the sense that it's an Oracle of knowledge and if you
had it's complete expansion, you could solve many questions in mathematics
with it)

~~~
kcanini
Interesting. Is this the Omega you're referring to?

[http://en.wikipedia.org/wiki/Chaitin%27s_constant](http://en.wikipedia.org/wiki/Chaitin%27s_constant)

------
tsellon
Hiding the entropy in the language reminds me of this writeup about the $5000
compression challenge from back in the day:
[http://www.patrickcraig.co.uk/other/compression.htm](http://www.patrickcraig.co.uk/other/compression.htm)

------
codeflo
An analogy: there are multiple calendar systems in widespread use. Each one of
those defines a different year 0. In fact, how I define time 0 is completely
arbitrary, so that for any given moment in time, I can define a calendar
system that defines this moment to be exactly 00:00:00 on day 1 of year 0 in
my new calendar system. From this fact, using the logic of the article, I can
show that _any_ moment is time 0 (in some calendar), and thus measuring time
is completely pointless!

~~~
Houshalter
It's more like each calendar system maps the time to some completely arbitrary
function. So in one calendar system, the number of years between 2000 and 2014
is 14 years, and the distance between 1990 and 1991 is 1,000 years.

But at least with calendars it might be manageable. We want to do operations
like find the lowest number of bits needed to represent some data, or to do
comparisons between the complexity of different strings. If this varies based
on the programming language you use, then it's problematic at the very least.

------
gelisam
How can I downvote this post? He uses a non-standard definition with a very
obvious flaw, and then proceeds to explain how Kolmogorov complexity is
"silly" because it has that flaw.

------
bqe
If you want an actual introduction to Information Theory, I'd recommend "An
Introduction to Information Theory: Symbols, Signals and Noise".

It deals with complicated information theory topics similar to Kolmogorov
Complexity (albeit by a different name) in an easy-to-approach way. Highly
recommended.

[1]: [http://www.amazon.com/An-Introduction-Information-Theory-
Mat...](http://www.amazon.com/An-Introduction-Information-Theory-
Mathematics/dp/0486240614)

------
marris
The author is not including the machine size.

From wikipedia:

"We could, alternatively, choose an encoding for Turing machines, where an
encoding is a function which associates to each Turing Machine M a bitstring
<M>. If M is a Turing Machine which, on input w, outputs string x, then the
concatenated string <M> w is a description of x. For theoretical analysis,
this approach is more suited for constructing detailed formal proofs and is
generally preferred in the research literature. In this article, an informal
approach is discussed."

If you consider the length of the Turing machine _and_ the input string, then
your don't get K = 0. The author's approach only includes the size of w, which
is always 0.

How big are the machines used by the author's silly approach? If we wanted to
describe the machine that prints "abababababbaba", we could write the
program/specification as:

(1) Use the machine L_silly_abababababbaba (2) Use the string ''

How can we encode this into a shorter program/specification? Maybe by
concatenating the machine name and ''?

L_silly_abababababbaba ''

Or we can be clever and realize that the string is always '' so we just need
to machine name

L_silly_abababababbaba

Or we can say we don't need the L_silly_

abababababbaba

Can we go shorter? What if we specify a gzipped version of the string? We can
take the string, unzip it, and use that Turing machine. But most strings are
not compressible via gzip. So there will be many strings for which the
shortest program via this approach is:

abababababbaba

... and some of the other words in this posted comment...

------
TrainedMonkey
\-----

PS C:\programming\Kolmogorov> ruby .\interpreter.rb "puts(\\`"ab\\`" * 20)"

abababababababababababababababababababab

So even with this universal language, we still have Ksilly,x2(x2)=0

\-----

So you input string "puts(\\`"ab\\`" * 20)" into a ruby program, and somehow
Kolmogorov complexity is 0 and not length of the argument you passed? I don't
buy that.

~~~
darkmighty
Yea, he's wrong on that. The point of tailoring languages to a string still --
weakly -- stands, though: but he could also do that for a finite number of
strings. Weakly also because he completely misses the point of Kolmogorov
complexity. It's not meant to be practical in any way. It's a mathematical
tool to uncover some facts about the complexity of things (mostly mathematical
objects) and a more solid framework for data compression and randomness, which
he glanced at the beginning of the article. In a sense it's like other non-
computable constructions, it provides truths without specific instances,
because their existence would be too powerful.

Some of the interesting theorems are found in a chapter of the following book:

[http://www.amazon.com/Elements-Information-Theory-Thomas-
Cov...](http://www.amazon.com/Elements-Information-Theory-Thomas-Cover-
ebook/dp/B00HLG9ISQ/ref=sr_1_3?s=books&ie=UTF8&qid=1398217278&sr=1-3&keywords=information+theory)

It gives a good grasp that Kolmogorov complexity answers meaningful questions
not well defined in regular information theory: questions of optimality in
finite lengths.

------
thro2
This proof doesn't show that Kolmogorov complexity of any fixed string is 0,
it shows that Kolmogorov complexity of any fixed string is constant (which is
trivial). That is obscured by an abuse of notation which makes arguments about
Kolmogorov complexity easier to parse in 99% of cases. So, imho, this post
isn't very substantial - it's essentially a garden variety "1 = 2" sort of
argument.

------
javajosh
If you allow your definition of "language" to include those that output the
same fixed string for any input, then yes, much silliness will ensue. But of
course, this has nothing to do with Kolmogorov, and everything to do with the
fact that you often get silliness when you redefine words into
meaninglessness.

------
kcanini
Cool, thanks. Weird coincidence... just a few days ago I broke out my old
information theory textbook (Cover & Thomas) and worked through this exact
same line of reasoning to convince myself that "the Kolmogorov complexity" of
a string depends very strongly on the choice of language, and can be brought
down to 0 in adversarial cases like what you described.

