
Compressionism: A Theory of Mind Based on Data Compression [pdf] - optimalsolver
http://ceur-ws.org/Vol-1419/paper0045.pdf
======
dharma1
Jürgen Schmidhuber has similar ideas - "History of science is the history of
compression progress"
[https://youtu.be/3FIo6evmweo?t=1537](https://youtu.be/3FIo6evmweo?t=1537)

and (quoted in the paper)

[http://people.idsia.ch/~juergen/creativity.html](http://people.idsia.ch/~juergen/creativity.html)

~~~
optimalsolver
The Arbital entry on Unforeseen Maximums[0] is more pessimistic:

"Juergen Schmidhuber of IDSIA, during the 2009 Singularity Summit, gave a talk
proposing that the best and most moral utility function for an AI was the gain
in compression of sensory data over time. Schmidhuber gave examples of
valuable behaviors he thought this would motivate, like doing science and
understanding the universe, or the construction of art and highly aesthetic
objects.

Yudkowsky in Q&A suggested that this utility function would instead motivate
the construction of external objects that would internally generate random
cryptographic secrets, encrypt highly regular streams of 1s and 0s, and then
reveal the cryptographic secrets to the AI."

[0]
[https://arbital.greaterwrong.com/p/unforeseen_maximum/](https://arbital.greaterwrong.com/p/unforeseen_maximum/)

------
Bodell
I finished a short story this year that explores some of these ideas
heavy-handedly. It is set in a Universal Library, a continuation of
Borges' Library of Babel. In a sense it is both for and against the idea of
compression as a way of gaining knowledge, attempting to make a greyer area
out of it.

"Unlike the entirety of the Library, incomprehensible due to its sheer
vastness, the book itself was much smaller. Any single one could easily be
read hundreds of times over during the lifespan of a librarian. Therefore, the
book’s gibberish nature did not come of its own accord but simply sprouted out
of its relation to us and our lack of knowledge concerning the Library as a
whole. In other words, for any book to make any sense to any librarian, it
must have recognizable patterns, like those in language. And since every
possible book that could exist did, most of them were simply random,
nonsensical strings of characters. Leaving the librarians themselves to search
for the language or sets of languages that could bring meaning to them
(them:themselves or them:the books no one was sure). This was reflected —
maybe the same sort of reflection Tanner was searching for — in the various
languages the librarians did know and use; the ever-evolving nature of them,
incorporating and condensing larger and larger swaths of ideas that then
needed to be condensed even further into shorter acronyms for quicker
reference."

[https://medium.com/@odell.brenden/a-return-to-the-library-of...](https://medium.com/@odell.brenden/a-return-to-the-library-of-babel-cb8f479f6000)

~~~
thrtythreeforty
Your link is dead, here's the Web Archive:

[https://web.archive.org/web/20200623192321/https://medium.co...](https://web.archive.org/web/20200623192321/https://medium.com/@odell.brenden/a-return-to-the-library-of-babel-cb8f479f6000)

~~~
Bodell
I killed it a few days after this post, as I am working on submitting it for
publication. Which is to say, I am a glutton for punishment and rejection,
haha. However, thank you for looking up the archive link. If you do read it,
please email me with any notes you may have. Any input, good or bad, would be
warmly received.

------
Animats
Prediction and compression as a way to encode data cheaply is a reasonable
idea. It's really cheap to encode something that the predictor expected next.
This is essentially what GPT-2 and GPT-3 are doing.
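To put a number on it, here is a minimal sketch of the standard information-theoretic point (my illustration, not from the paper): under an ideal entropy coder, a symbol the predictor assigns probability p costs about -log2(p) bits, so data the model expects is nearly free to encode.

    import math

    def code_length_bits(prob: float) -> float:
        """Ideal entropy-coder cost of a symbol the predictor assigns probability `prob`."""
        return -math.log2(prob)

    print(code_length_bits(0.99))  # confident, correct prediction: ~0.014 bits
    print(code_length_bits(0.01))  # surprising symbol: ~6.64 bits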

The author seems to be missing some implications of that. One is that a
predictor/compressor has a bias towards what it has compressed before. We know
that humans tend to over-generalize, and that that's a survival trait.
Something like this might be the mechanism behind that.

~~~
murbard2
Over-generalizing means that you will compress worse, as does
under-generalizing. You can't rely on the mere fact that "compression is going
on"; you have to look into utility, costs, etc.

~~~
jacobush
How? "All _X_ people behave in _Y_ ways" is an over-generalization and seems
to compress very well.

~~~
murbard2
You need to encode the delta between your model and what you observe, and that
delta will be expensive to encode if your model over-generalizes.
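A toy two-part (MDL-style) accounting makes the point; the numbers below are made up for illustration. Total cost = bits to state the model + bits to encode each observation given the model, and the over-general rule loses because its exceptions are so expensive:

    import math

    def total_bits(model_bits, p_fit, observations):
        """Two-part code: model description plus data encoded under the model.
        `p_fit` is the probability the model assigns to a conforming observation."""
        data_bits = sum(-math.log2(p_fit) if obs else -math.log2(1 - p_fit)
                        for obs in observations)
        return model_bits + data_bits

    data = [True] * 90 + [False] * 10  # 90 of 100 observations fit the rule

    # Over-generalized model, "all X do Y": short to state, but each of the
    # 10 exceptions costs -log2(0.01) ~ 6.6 bits to patch up.
    print(total_bits(10, 0.99, data))  # ~77.7 bits
    # Calibrated model, "90% of X do Y": slightly longer, compresses better.
    print(total_bits(12, 0.90, data))  # ~58.9 bits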

~~~
jacobush
Thanks, your comment made me understand a tiny little bit more of the machine
learning lingo. I think. :-D

------
tcgv
Another useful concept of memory organization is what's called a "schema":

> A schema is any pattern of relationships among data stored in memory. It is
> any set of nodes and links between them in the spider web of memory that
> hang together so strongly that they can be retrieved and used more or less
> as a single unit. [1]

When thinking about something we naturally bring together plenty of related
information, so the "compression" described in the paper must somehow be
capable of folding these relationships into the data as well, prior to
compressing it.

[1] Richards J. Heuer Jr. Psychology of Intelligence Analysis. Center for the
Study of Intelligence, 1999.

~~~
ilaksh
Well I think that the compression part would be storing some type of abstract
subgraph that could then be referenced or re-used for similar situations
rather than being duplicated.

~~~
byteface
If neurons that fire together wire together, can we assume there's a shared
subprocess and that this is just a type of compression?
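One standard formalization of "fire together, wire together" is the Hebbian update; a tiny sketch (my illustration) of how repeated co-activity fuses units into something retrievable as a single chunk:

    import numpy as np

    def hebbian_step(w, pre, post, lr=0.1):
        """Strengthen each weight in proportion to the co-activity of its endpoints."""
        return w + lr * np.outer(post, pre)

    w = np.zeros((3, 3))
    pattern = np.array([1.0, 1.0, 0.0])  # units 0 and 1 habitually fire together
    for _ in range(10):
        w = hebbian_step(w, pattern, pattern)

    print(w)  # strong 0<->1 links: the pair now behaves like one stored unit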

------
greyface-
Related short story: Kolmogorov's AI

[https://www.devever.net/~hl/fi/kolmogorov](https://www.devever.net/~hl/fi/kolmogorov)

------
mannykannot
When I see passages like that quoted below, I wonder if the authors are
falling for the economist's fallacy, assuming that objective, quantifiable,
repeatable measures are _ipso facto_ measuring what they want to measure. I
have not seen anything in this paper persuading me that understanding is a
consequence of compression, rather than vice-versa.

 _A weakness of the Turing test (Turing, 1950) is that a program might pass
the test simply by exploiting weaknesses in human psychology. If a given
system passes the test we cannot be sure if it was because of the quality of
the responses or the gullibility of the judge. In contrast, Hutter’s
compression test is more reliable. The more that data is compressed, the
harder it becomes to compress it further (Chaitin, 2006). Because there is no
way to cheat by using a simple heuristic, data compression presents a reliably
hard standard. We argue that this process of identifying deep patterns through
compression is what people mean when they attribute both ‘intelligence’ and
‘consciousness’._

------
hackinthebochs
Compression is important to processes going on in the mind, but I don't think
it's a central concept for understanding how the mind works. Compression is
relevant inasmuch as our brains perform lossy compression to capture the
relevant regularities in our sense data. But this is only the start of the
analysis; it doesn't explain anything useful about our cognitive architecture.
The problem is that strong compression trades space requirements for
computational requirements, and going all-in on compression is optimizing for
the wrong thing in the context of a bag-of-neurons computational model.

von Neumann architectures are good at fast and precise state transitions. A
bag-of-neurons is good at leveraging relationships to minimize computational
requirements. Thus a plausible cognitive architecture for the mind should
emphasize relationships rather than computation. A good cognitive architecture
is one where relationships in the data are maximally revealed by the
architecture, thus minimizing the computational burden to access and utilize
the relationship. This explains why the visual system takes up so much volume
of the brain. 3-dimensional sense data is packed with interrelationships and
these relationships need support from the cognitive architecture to be
utilized. The compression view is only useful to a point--after that the
constraints on the bag-of-neurons model become dominant and must drive the
architecture search.

When it comes to phenomenal consciousness, the question then becomes: is there
anything it is like to be a process whose topological structure maximally
captures external and internal state relationships? My intuition tells me the
answer is yes. One functional requirement of a human brain is that it
`believes` it is the author of its decisions. To manifest this `belief`
requires a self-representation of itself distinct from the environment, i.e. a
target of the attribution of authorship. Facts of this self-representation
entail what it is like to be this topological structure.

------
visarga
Compression only cares about representing a set of data, while the brain also
needs to account for future experiences. Thus brain representations need to
also include information that is useless in the present but could be useful in
the future. Compression is definitely part of the story, but not the whole story.

------
canjobear
When I see a citation to Integrated Information Theory I usually hit back two
or three times.

~~~
kordlessagain
Its axioms don't look terribly different from those of the Buddhist
functional model:

[https://www.researchgate.net/publication/315940922_Mapping_t...](https://www.researchgate.net/publication/315940922_Mapping_the_Mind_A_Model_Based_on_Theravada_Buddhist_Texts_and_Practices)

------
Kednicma
This seems like a reasonable next step after considering the failure of
Tononi's IIT. IIT has the "problem" that a diode is a little conscious; the
authors embrace and extend this "problematic" viewpoint to cover the informal
popular idea that compression is knowledge.

Their view on the Hard Problem is not new. They say that the main trick to the
Hard Problem is compressing information about a "self" which is repeatedly
re-quantified. However, they then must rely on some notion of human individuality
in order to explain why qualia are subjective.

The only problems I have with this approach are the usual criticisms of
panpsychist approaches: it's hard to observe, there are few useful predictions,
rocks are a little conscious, etc. However, rocks _are_ a little conscious, or
at least a little alive, thanks to lithophile bacteria which form microscopic
filaments threading through the rock; this wasn't known in Berkeley's time!

~~~
TheOtherHobbes
That doesn't mean rocks are conscious; it means the things that live on rocks
are conscious - an interesting idea that appears to scale well.

There is no CS solution to the hard problem, because the hard problem is not
about information, or behaviour, or data compression, or rocks. It's about
subjectivity, and there isn't even a working definition of what subjectivity
is, never mind any experiment that can be done to confirm that it exists.

Subjectivity is _essentially metaphysical_ in the sense that it's outside of
science.

It's impossible to make a testable statement about any phenomenon of any kind
that isn't filtered through human perception and all the layers of human
cognitive processing.

In a very real sense it's objectivity that's the illusion. Try to make a
statement about anything whatsoever that is truly independent of collective
human sense perception and human mental processes and see how far you get.

Whatever you think you're doing in science or math, you're looking out through
a distorted window whose properties you're not aware of, and trying to
correlate your observations with others looking out through windows with
similar distortions.

The best you'll get is an interesting list of distortions which everyone
agrees on. Even if they have predictive power, that doesn't make them truly
objective - it just makes them a spiky median of collective experience instead
of an outlier of individual experience.

~~~
Kednicma
I agree with your direction, but have qualms with your points.

Suppose for absurdity I replay my original argument, but with humans, and you
reply "that doesn't mean humans are conscious; it means that the things which
are housed within skulls are conscious". I understand the nuance you're
introducing, but don't think that it helps. Part of the difficulty of the
Combination Problem is that the boundary of conscious control extends beyond
the nexus of thought. I'll accept that rocks are just the substrate, but
integrated circuits are just carefully cooked rocks;
substrate-vs-consciousness thinking might be wrong.

I agree with you that _science_ is done empirically, and thus doesn't
experience objectivity. However, _maths_ is quite formal, and in the past
century we managed to achieve "formal formality" with category theory. We now
can talk of certain objects as existing universally; in their context, not
only do they exist, but they _uniquely_ exist. For example, `1+1=2`, which is
to say that there's an equivalence between any pair of objects selected one by
one, and any pair taken both at a time; for any particular counting context,
there's only one natural numbers object.
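For what it's worth, that formal reading of `1+1=2` is machine-checkable; in Lean 4, both sides reduce to the same numeral, so reflexivity closes the goal:

    -- `1 + 1` and `2` unfold to the same natural number by computation.
    example : 1 + 1 = 2 := rfl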

I am not quite as pessimistic as you regarding the quality of the conclusions
that we reach. Rather than saying that our filters make us subjective, I would
simply say that the _experiences_ of humanity are essentially human. There's
an anthropic bias because humans can't experience anything that isn't human;
we can't think truly alien ideas or build truly alien artifacts. Everything we
think is non-human is actually human. This doesn't prevent us from drawing
conclusions, but it does forbid objectivity.

~~~
tsimionescu
> I'll accept that rocks are just the substrate, but integrated circuits are
> just carefully cooked rocks; substrate-vs-consciousness thinking might be
> wrong

It's a side point, but I think the important difference is that a rock can
exist even without bacteria (imagine the rock is somewhere close to the
earth's core, just below the temperature that would melt it into lava), while
the bacteria on the rock could also exist independently of the rock (they
could be blown by the wind and moved to a different rock, for example). So it
makes sense to discuss the amount of consciousness in the (sterile) rock
itself, and in the (floating) bacteria as well, though of course the system of
rock+bacteria can have its own amount of consciousness.

------
082349872349872
Sounds plausible, but I'm biased as it agrees with my theory for the origin of
consciousness:
[https://news.ycombinator.com/item?id=23475069](https://news.ycombinator.com/item?id=23475069)

------
j780
This is very interesting and reminds me of something I read 4 years ago. I had
wondered about data compression related to brain research after I had read
this blog: [https://probablydance.com/2016/04/30/neural-networks-are-imp...](https://probablydance.com/2016/04/30/neural-networks-are-impressively-good-at-compression/)

------
niknoble
Ever since I learned about autoencoders, I've been saying compression is at
the heart of intelligence.
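An autoencoder makes the compression reading literal: the bottleneck forces the network to find a short code from which the input can be reconstructed. A minimal linear sketch (my illustration) in numpy:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: 100 points in 10-D that secretly live on a 2-D subspace.
    X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 10))

    # Linear autoencoder: 10-D input -> 2-D bottleneck code -> 10-D reconstruction.
    W_enc = rng.normal(scale=0.1, size=(10, 2))
    W_dec = rng.normal(scale=0.1, size=(2, 10))

    lr = 0.01
    for _ in range(2000):
        Z = X @ W_enc              # compressed codes
        err = Z @ W_dec - X        # reconstruction error
        # Gradient descent on mean squared reconstruction error.
        W_dec -= lr * Z.T @ err / len(X)
        W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)

    # Error shrinks as the 2-D code captures the structure of the 10-D data.
    print(float(np.mean((X @ W_enc @ W_dec - X) ** 2)))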

------
byteface
True that all thought is abstraction. But is it necessarily compression?
Hypostatic abstraction would expand the data, as would building any kind of new
relations. Neuromodulators are brain-wide. We can also change our perception.

------
novia
(2015)

~~~
sjy
The full citation seems to be Maguire, Phil, Mulhall, Oisín, Maguire, Rebecca
and Taylor, Jessica (2015) _Compressionism: A Theory of Mind Based on Data
Compression._ Proceedings of the 11th International Conference on Cognitive
Science. pp. 294-299. ISSN 1613-0073. It’s annoying that there is no date or
publication information in the PDF itself.

------
hoseja
Is it vegan to pirate WinRAR?

