
Kolmogorov Complexity and Our Search for Meaning - draenei
http://nautil.us/issue/63/horizons/kolmogorov-complexity-and-our-search-for-meaning
======
Xcelerate
While what the author says is true (and I'm sure he knows vastly more about
the subject than I do), I think the way that Kolmogorov complexity is
described in the article is a bit misleading to a general audience. For short
strings, we often can and _do_ compute the Kolmogorov complexity. See some of
Hector Zenil's work, e.g. "Calculating Kolmogorov Complexity from the Output
Frequency Distributions of Small Turing Machines"
([https://arxiv.org/abs/1211.1302](https://arxiv.org/abs/1211.1302)).

What "uncomputable" means is that for the _general_ case, there cannot exist a
program that is guaranteed to find the Kolmogorov complexity of a given input
string. This is because, as we enumerate all Turing machines from
smallest to largest, some will not halt, and we cannot tell whether such a
machine is merely running for a very long time or will actually run
forever. In specific cases for short strings, we can often analytically
determine whether the program will halt or not. This is also why we know the
first few values of the Busy Beaver function, even though it is also
uncomputable.
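
The bounded search described above can be sketched directly. The snippet below (a toy, not Zenil's method) enumerates expressions in a tiny made-up language — short Python expressions over a fixed alphabet — and returns the shortest one that evaluates to a target string. The length cap plays the role of the halting problem: any candidate we refuse to run might still have produced the target, which is why this only ever yields an upper bound on Kolmogorov complexity.

```python
import itertools

# Toy upper-bound search for Kolmogorov complexity (illustrative only).
# "Programs" are short Python expressions over a small alphabet; we try
# them from shortest to longest and return the first whose value equals
# the target. Failing candidates (syntax errors etc.) are skipped, the
# way a real search would have to skip machines it cannot prove halt.
ALPHABET = "0123456789+*'()"

def shortest_expression(target, max_len=4):
    for length in range(1, max_len + 1):
        for chars in itertools.product(ALPHABET, repeat=length):
            expr = "".join(chars)
            try:
                if str(eval(expr, {"__builtins__": {}})) == target:
                    return expr  # an upper bound on K(target) in this toy language
            except Exception:
                pass
    return None  # no witness within the bound -- this says nothing about K

print(shortest_expression("387420489"))  # "9**9": 4 symbols describe 9 digits
```

Note that a `None` result is uninformative, and even a hit only bounds the complexity from above: a still-shorter description might exist in a richer language.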

~~~
jeromebaek
Well, obviously, the shortest program that outputs "0" is "0". And the
shortest program that outputs "1" is "1". We're only interested in the general
case, not specific cases.

Sometimes I think this distinction between "general" and "specific" is as deep
as the whole idea of uncomputability. When someone says "arbitrary Turing
machine" it is very difficult for a layman to wrap their head around what this
means. No wonder: I'm using the word "arbitrary", which means "random", which
means... uncomputable.

~~~
sadgit
> the shortest program that outputs "0" is "0".

I'm tempted to debate this... in any programming language you need to use some
space to differentiate a literal from code to be executed. So, according to
Kolmogorov complexity, wouldn't the shortest program to print a random number
(or 0) be longer than that number itself?

~~~
nl
No, you can conceive of a language which echoes everything unless it is
prefixed with an operator indicator.

~~~
sadgit
But you still seem to be in agreement that the implementation of the language
has a necessary overhead... printing the operator character itself would
require an escape sequence, and the overhead of the operator that identifies
code means that some compressible strings don't benefit from compression.

~~~
svantana
Well of course -- any machine/language is a map from inputs (programs) to
outputs, and the pigeonhole principle ensures that if some strings are
shortened, others must be lengthened.
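
That counting argument takes only a few lines to check. In a hypothetical framing where programs are binary strings too, there are 2^n strings of length n but only 2^n − 1 programs of length strictly less than n, so no lossless scheme can shorten them all:

```python
# Pigeonhole check: binary strings of length n vs. binary programs of
# length strictly less than n (including the empty program). There is
# always at least one length-n string with no shorter description.

def strings_of_length(n):
    return 2 ** n

def programs_shorter_than(n):          # sum_{k=0}^{n-1} 2**k = 2**n - 1
    return sum(2 ** k for k in range(n))

for n in range(1, 31):
    assert programs_shorter_than(n) < strings_of_length(n)

print(strings_of_length(20), programs_shorter_than(20))  # 1048576 1048575
```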

------
dri_ft
"From the earliest days of information theory it has been appreciated that
information per se is not a good measure of message value. For example, a
typical sequence of coin tosses has high information content but little value;
an ephemeris, giving the positions of the moon and planets every day for a
hundred years, has no more information than the equations of motion and
initial conditions from which it was calculated, but saves its owner the
effort of recalculating these positions. The value of a message thus appears
to reside not in its information (its absolutely unpredictable parts), nor in
its obvious redundancy (verbatim repetitions, unequal digit frequencies), but
rather in what might be called its buried redundancy--parts predictable only
with difficulty, things the receiver could in principle have figured out
without being told, but only at considerable cost in money, time, or
computation. In other words, the value of a message is the amount of
mathematical or other work plausibly done by its originator, which its
receiver is saved from having to repeat."

—Bennett, Charles H. "Logical depth and physical complexity." _The Universal
Turing Machine: A Half-Century Survey_.

~~~
emanueldima
Is there a mathematical theory trying to quantify this value?

~~~
sgentle
You might find relative entropy, aka information gain, aka Kullback–Leibler
divergence interesting:
[https://en.m.wikipedia.org/wiki/Kullback–Leibler_divergence](https://en.m.wikipedia.org/wiki/Kullback–Leibler_divergence)

Also this treatment of K-L divergence as a measure of "Bayesian surprise":
[http://ilab.usc.edu/surprise/](http://ilab.usc.edu/surprise/)
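
For concreteness, K–L divergence between two discrete distributions is only a few lines; the coin distributions below are made up for illustration:

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in bits: the average extra message length paid for
    encoding samples from p with a code optimized for q. Zero iff p == q,
    and notably asymmetric in its arguments."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.9, 0.1]  # true distribution: a heavily biased coin
q = [0.5, 0.5]  # assumed model: a fair coin
print(round(kl_divergence(p, q), 3))  # ~0.531 bits of "surprise" per toss
```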

That's all based on Shannon entropy (probabilistic), not Kolmogorov complexity
(algorithmic), but there are a lot of connections between them. This paper is
a pretty thorough summary:
[https://homepages.cwi.nl/~paulv/papers/info.pdf](https://homepages.cwi.nl/~paulv/papers/info.pdf)

And here's a paper defining algorithmic relative complexity by analogy to
relative entropy:
[http://www.mdpi.com/1099-4300/13/4/902/pdf-vor](http://www.mdpi.com/1099-4300/13/4/902/pdf-vor)

"We define the cross-complexity of an object x with respect to another object
y as the amount of computational resources needed to specify x in terms of y,
and the complexity of x related to y as the compression power which is lost
when adopting such a description for x, compared to the shortest
representation of x"

------
mikorym
Good article. I find this to be one of those rarer cases where a popular
article on mathematics is both precise enough to be informative and accessible
enough to maintain the audience.

The only part I dislike is the title. But perhaps that was the editors. I do
commend the author though on managing to write this part:

"The fact that Kolmogorov complexity is not computable is a result in pure
mathematics and we should never confuse that pristine realm with the far more
complicated, and messy, real world. However, there are certain common themes
about Kolmogorov complexity theory that we might take with us when thinking
about the real world."

I am not well versed in any of Kolmogorov's work, but the introduction
certainly makes the spirit of his work much clearer.

------
tromp
My webpage
[http://tromp.github.io/cl/cl.html](http://tromp.github.io/cl/cl.html)
demonstrates a concrete shortest program for the characteristic sequence of
prime numbers in a minimal language (if someone can find a shorter program in
the same language, I'd be happy to pay a $50 reward).

------
mpweiher
First a small question:

"While a computer might find some pattern in a string, it cannot find the best
pattern."

Isn't this wrong? The computer might well have found the best pattern; we just
cannot know that it really is the best one. The two ideas seem close but not
quite the same, and I think Kolmogorov complexity and its proof are about the
latter, not the former.

Anyway, I think the idea of Kolmogorov complexity is absolutely fascinating,
and seems highly applicable to a wide variety of human cognitive/information
processing. For example, scientific theories appear to be algorithms that
"compress" our observations of the real world, and when our theories get
better, the compression gets better.

Or music. I think everyone knows the phenomenon that relative novices find
certain avant-garde styles to be "just noise", whereas experts find that
beautiful and the simpler music boring. Well, if you don't have the "decoding
algorithm" yet, the more complex piece really _is_ noise, whereas if you have
the more complex decoder, you can decode that piece and see the beauty in it
(which appears to be connected to skirting close to the maximum information
density you can find).

And so on.

~~~
jeromebaek
Seems like a problem for philosophy of language... When we say "it cannot find
the best pattern", what does this mean?

1. We assume "the best pattern" _exists_.

But what do we mean here by "exists"? If something is uncomputable does it
exist? In what sense?

~~~
arketyp
The Gödel sentence is unprovable, yet commonly held to be true. A non-
constructivist would probably be more inclined to say that something which is
uncomputable can exist. But it is not obvious (to me). Noteworthy is that Per
Martin-Löf, who is also credited with the accepted definition of algorithmic
randomness for infinite sequences (based on Kolmogorov complexity), developed
intuitionistic type theory, a framework for the foundations of mathematics in
the constructivist school.

~~~
mikorym
What is Martin-Löf's definition of algorithmic randomness (for infinite
sequences)?

~~~
arketyp
I'm not competent enough to attempt my own explanation. The Wikipedia articles
are good:
[https://en.wikipedia.org/wiki/Algorithmically_random_sequenc...](https://en.wikipedia.org/wiki/Algorithmically_random_sequence)

------
YeGoblynQueenne
>> While a computer might find some pattern in a string, it cannot find the
best pattern. We might find some short program that outputs a certain pattern,
but there could exist an even shorter program. We will never know.

If more programmers (especially functional programmers) were aware of this
limitation, we could have avoided countless flamewars about whose language is
best, where "the best language" is the one in which someone has written a
program for some task that is shorter than a program for the same task in
another language, sometimes written by another programmer (but often the same
one).

~~~
jl2718
Well then I have to ask the opposite question. Can a language be made to
maximize the difficulty of all possible tasks?

~~~
im3w1l
If public key cryptography turns out to be mathematically sound (as crazy as
it may seem the functioning of basically our whole tech infrastructure is
based on mathematical conjecture), then we can use that to "preprocess" source
files before sending them to say a C++ compiler. The result should be turing
complete and basically impossible to program.

------
vanderZwan
> _Let us look at the above three strings. The first two strings can be
> described by relatively short computer programs:_

> _1. Print “100” 30 times._

> _2. Print the first 25 prime numbers._

> _The Kolmogorov complexity of the first string is less than the Kolmogorov
> complexity of the second string because the first program is shorter than
> the second program._

The second also abstracts away the concept of prime numbers, which has its own
Kolmogorov complexity, no? Is there such a thing as talking about local versus
externalized Kolmogorov complexity? Actually, both assume an understanding of
multiplication, decimal notation, and printing.
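
The point can be made concrete by writing both of the article's descriptions out as self-contained programs in one reference language (Python here, an arbitrary choice). Once "prime" has to be spelled out rather than assumed, the second program gets noticeably longer; both lengths are only upper bounds on the output strings' Kolmogorov complexity relative to Python:

```python
# The article's two descriptions as self-contained Python programs.
# Their lengths bound the Kolmogorov complexity of their outputs
# *relative to Python*; the prime program is longer partly because the
# concept "prime" must be spelled out as a trial-division test.

prog1 = 'print("100" * 30)'

prog2 = (
    "n, found = 2, 0\n"
    "while found < 25:\n"
    "    if all(n % d for d in range(2, n)):\n"
    "        print(n); found += 1\n"
    "    n += 1\n"
)

print(len(prog1), len(prog2))  # the prime program is several times longer
exec(prog1)   # the first example string
exec(prog2)   # the first 25 primes, one per line
```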

~~~
sadgit
It's a good point, but you wouldn't have to extend the length of the string
very much before including a prime number checker starts to save space.

------
_greim_
This may sound silly, but does any aspect of this take into account which
language is used to print the string? Suppose there's a language where a
single dot "." is the command to print out some particular string. Then that
string's complexity _in that language_ is the shortest possible, and so forth.
I guess that language's implementation would somehow have to contain a
representation of the string, but that just raises the question of what meta-
language the language is implemented in.

~~~
FartyMcFarter
You need to choose a language and use it as the basis to define your
Kolmogorov complexity for _all strings_.

Otherwise, as you said, the definition of Kolmogorov complexity becomes
pointless and vacuous - every string has "Kolmogorov complexity" equal to 1!

Once you have done this and your language is Turing complete, your language's
program sizes can be transformed to other languages' and vice-versa by simply
emulating languages in each other (which does not necessarily give you the
shortest possible program, but just a bound for its size).
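
That emulation argument is the invariance theorem: if an interpreter for language B can be written in language A, every B-program becomes an A-description at a fixed extra cost, so the two languages' complexity measures differ by at most a constant. A toy sketch, with a made-up run-length "language" B:

```python
# Invariance theorem in miniature. Language B's "programs" are lists of
# (character, repeat-count) pairs; INTERPRETER is a B-interpreter written
# in Python. Bolting the interpreter onto any B-program yields a Python
# description, so K_Python(x) <= K_B(x) + len(INTERPRETER) for every x.

INTERPRETER = "lambda prog: ''.join(c * k for c, k in prog)"
run_b = eval(INTERPRETER)

b_program = [("1", 1), ("0", 2)] * 30   # a B-program for "100" repeated 30 times
print(run_b(b_program) == "100" * 30)   # True
print(len(INTERPRETER))                 # the constant emulation overhead
```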

------
Ono-Sendai
In my opinion Kolmogorov complexity is unfortunately limited (if not useless)
for saying anything about fundamental complexity - see

Kolmogorov Complexity - it's a bit silly -
[http://forwardscattering.org/post/7](http://forwardscattering.org/post/7) and
More on Kolmogorov Complexity -
[http://forwardscattering.org/post/14](http://forwardscattering.org/post/14)

------
FartyMcFarter
> Alas, no such computer can exist! As powerful as modern computers are, this
> task cannot be accomplished. This is the content of one of the deepest
> theorems in mathematical logic. Basically, the theorem says that the
> Kolmogorov complexity of a string cannot be computed.

I disagree that this theorem is very deep. Even a CS-grad noob like me can
easily understand several proofs of it.

~~~
throwawaymath
_> I disagree that this theorem is very deep. Even a CS-grad noob like me can
easily understand several proofs of it._

I guess this depends entirely on your definition of "deep", but many extremely
important and far-reaching theorems (and corresponding proofs) in computer
science and mathematics are not particularly difficult to understand for a CS
graduate. Ideally you _should_ understand the proofs of various deep results.

If you already have an undergraduate degree in computer science, you will
likely understand results like the CAP theorem or Noisy-Channel Coding
theorem. If you don't already, you can probably get the idea in an hour or two
of reading. With a little more effort you can more or less fully understand
the proofs of various results in mathematics, like Bolzano-Weierstrass, Bayes,
Fundamental Theorem of Calculus, etc. These are all very important results
that dramatically changed the research landscape at the time they were
discovered. The Pythagorean Theorem and the proof (by infinite descent) of the
irrationality of non-integral square roots are wildly important, but you
typically learn them in elementary or middle school.

Basically, it's a little odd to assert a result is not deep simply because you
can understand its proof. In fact, good students working through a textbook
can often formulate their own proofs of important theorems before they read
the author's if the book is structured especially well.

------
muxator
I just learnt I independently "discovered" Kolmogorov complexity in my
graduation thesis, while comparing the information content of a program and of
its refactored version.

The only obstacle to attaining a Turing award is probably being 50 years late,
and realizing it 10 more years later...

No buono!

------
TomMckenny
It's a nice article and while the example...

"The smallest number that cannot be described in less than 15 words"

...is stimulating, it is an example of a sentence that does not describe a
number, rather than of a number whose description cannot be found.

