

Information Theory: Obstacles To True Artificial Intelligence - enki
http://paulbohm.com/articles/artificial-intelligence-obstacles-information-theory/

======
platypii
While I believe the author is not wrong that information theory is a bound on
machine intelligence, I think the problem with using information-theoretic
bounds is that they are not tight. In other words, I don't think information
theory is currently the limiting factor.

The author argues that any AI will consist of two parts, the machine learning
program L and the training set T, which combine to form the "intelligent"
program M. Thus, by information theory, k(L) + k(T) >= k(M), where k is the
Kolmogorov complexity. So the information in M is bounded by the information
in L plus the information in T. The author argues that since both of these
depend on humans supplying them, we are limited by the human factor.
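As a toy illustration (not a proof - compressed size is only a crude upper-bound proxy for Kolmogorov complexity, and all the program/data values here are made up), you can sanity-check the inequality with zlib:

```python
import zlib

def k_approx(data: bytes) -> int:
    """Crude upper-bound proxy for Kolmogorov complexity: zlib-compressed size."""
    return len(zlib.compress(data, 9))

L = b"def train(examples): return sorted(set(examples))"  # toy "learning program"
T = b"the quick brown fox jumps over the lazy dog " * 50  # toy "training set"
M = L + b"\n" + T                                         # the combined program

# k(M) <= k(L) + k(T) + O(1): M can be fully described by describing L and T,
# plus a constant amount of "glue" to combine them.
slack = 128  # allowance for the constant glue term
assert k_approx(M) <= k_approx(L) + k_approx(T) + slack
```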

But how much information does one really need for AI? Well, how much
information is necessary for human intelligence? Assume that L is our genome
(ignoring epigenetics) and T is our life experiences. The amount of data in
our genome is on the order of 3 GB. I would argue that's certainly within the
realm of feasibility for programmers' output. How about the training set T?
That's harder to say; does it include video, audio, touch, etc.? How many
years until a human is considered intelligent (by AI standards)? I think it's
safe to say that a 10-year-old blind human could pass a Turing test. So if we
ignore tactile and olfactory feedback, we basically just need 10 years of
compressed audio as the training set. Generously encoding the audio at 128
kbps, 24/7, for 10 years comes to about 4.7 terabytes, which is easily within
the realm of current machine learning. We have far more information than that
(and much more densely encoded as text), but still aren't close to True AI.
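The back-of-envelope arithmetic, assuming binary prefixes (1 kb = 1024 bits, 1 TiB = 1024^4 bytes), works out like this:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600       # 31,536,000 s (ignoring leap years)

bitrate_bps = 128 * 1024                 # 128 kbps audio stream
years = 10

total_bits = bitrate_bps * SECONDS_PER_YEAR * years
total_bytes = total_bits / 8
print(total_bytes / 1024**4)             # ~4.7 TiB of training audio
```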

I think the problem is not that we don't have enough information; it's that we
have not yet searched enough of the problem space. And that's where more
hardware can help us.

~~~
enki
as per <http://paulbohm.com/pdf/10.1.1.76.5543.pdf> and
<http://paulbohm.com/pdf/general_limitations.pdf>, learning by example
requires many more examples than the kolmogorov complexity of the target
concept.

T then is not just the life experience of one human. to learn by observing
examples - to reconstruct a human - you'd need to observe the life experiences
of trillions and trillions of humans to gain enough information to narrow down
the possible implementations that match human behavior, not just mechanically
given the same input, but also given new, previously untested input.
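a rough information-budget sketch of why the example count has to be huge (my toy numbers, not figures from the linked papers): to single out one concept among 2^k candidates you need at least k bits, and a binary-labeled example yields at most 1 bit:

```python
GENOME_BITS = 3 * 10**9 * 8      # ~3 GB genome as a stand-in for k(human)
BITS_PER_EXAMPLE = 1             # a yes/no labeled example conveys at most 1 bit

# information-theoretic floor on the number of labeled examples needed
# to pin down one concept among 2**GENOME_BITS candidates
examples_needed = GENOME_BITS / BITS_PER_EXAMPLE
print(examples_needed)           # 2.4e10 examples, and that is only the floor -
                                 # the linked lower bounds are far worse without
                                 # structural assumptions
```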

at 3gb encoded as dna the search space is already huge, but that ignores that
the genome alone doesn't contain the information needed to read it (e.g. you
need a living thing to use the DNA, for it to make sense).

~~~
platypii
Well, I guess I have some citations to read later, but intuitively I have a
hard time believing that lack of training data is the problem. There's an
incredible amount of data available on the internet. If that were the limiting
factor, it would just be a matter of throwing more data at T to increase the
amount of information in M. But that doesn't help if we haven't found the
right machine learning algorithm L, and searching that space is the real
challenge of AI. To search it we either need human experts, huge amounts of
computational power to brute force, or some combination thereof. We've tried
human experts alone for decades without much success, so I think it's likely
that we will need some computational assistance to find the right algorithms.
That's why AI people love throwing around graphs of Moore's law.

~~~
enki
not if the amount of training data required to learn by example is exponential
or even combinatorial. we might not even be in the same ballpark. all data
ever recorded on digital media might not be enough to learn even a simple
intelligence without assumptions about structure.
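to make the combinatorics concrete (a toy model, not taken from the papers): learning an arbitrary boolean function on n inputs without structural assumptions means each labeled example only fixes one row of the truth table, so the set of consistent hypotheses shrinks far too slowly:

```python
n = 20                                   # binary inputs
rows = 2 ** n                            # 1,048,576 rows in the truth table
hypotheses = 2 ** rows                   # candidate functions: 2**(2**n)

examples = 1000                          # distinct labeled examples observed
# each distinct example fixes one truth-table row, halving the candidates
still_consistent = 2 ** (rows - examples)

# even after 1000 examples, the consistent-hypothesis count remains
# astronomical - you'd need essentially all 2**n rows
print(still_consistent > 10 ** 300000)   # True
```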

a kind of analogous problem is: can you learn how to build a living thing
purely from digitally recorded DNA samples, if you don't have access to the
internal structure of any living thing? How many dna samples would you need?

------
vannevar
As an AI researcher, I've often been asked when I think we will have
sufficient computational power for strong AI. I always answer, "About ten
years ago." And I've been giving that answer for over ten years. I agree with
the author that AI is a software problem rather than a hardware problem. But I
think he misformulates the limitations. In AI, the issue isn't whether we can
brute force a particular transformation (that is the ML approach), it's
whether we can create a self-organizing _system_ that recognizably
approximates human cognition. Growing a redwood versus trying to build one, so
to speak. Not an easier problem, but a different one whose limitations have
not yet been defined.

------
jbattle
This seems to be the core of the argument:

 _To get a better model, we either need make the learning algorithm more
complex/write more code (which is human work), or we need to gather more or
better sample data (which requires human work as well)._

Assuming this is a correct formulation, I don't see why this necessarily poses
problems. The gathering of data in particular seems like a process that can be
supercharged. You aren't restricted to one human speaking into the computer's
ear. You have (to start) the entire internet to consume. If you want/need more
structured data, you could hypothetically organize dozens or hundreds of
individual humans processing information for the 'mind'.

And once the AI has reached some valuable state, you can then start cloning it
(presumably a simple matter of copying electronic state elsewhere). The one
or more limited AIs you've created could then be tasked with generating the
next step up the chain - even if that is simply learning to process and ingest
ever vaster amounts of information.

I'm not a wild-eyed futurist, but I either don't get or don't buy the
fundamental objection here.

~~~
enki
author: there's no fundamental objection. i'm only pointing out that throwing
faster computation at the problem doesn't necessarily solve it. right now all
the work essentially needs to be done by humans.

if you follow the link to the lower bound on learning by example, you'll see
that it's really hard to learn by example if you haven't pre-encoded ideas
about the interpretation into the reading program. without prior knowledge,
you need lots and lots and lots of examples.

All the information that humanity has ever stored on digital media is likely
not remotely enough to reconstruct a human mind from.

