
Towards Machine Intelligence by IBM Research - laudney
http://arxiv.org/abs/1603.08262
======
laudney
There exists a theory of a single general-purpose learning algorithm which
could explain the principles of its operation. This theory assumes that the
brain has some initial rough architecture, a small library of simple innate
circuits which are prewired at birth, and proposes that all significant mental
algorithms can be learned. Given current understanding and observations, this
paper reviews and lists the ingredients of such an algorithm from both
architectural and functional perspectives.

~~~
bcook
You should probably put that in quotes, since those words are not your own.

------
mindcrime
From reading the introduction, it sounds like the author is covering similar
ground as the book _The Master Algorithm_ [1] by Pedro Domingos[2]. If you
find this interesting, you may find his book interesting as well.

[1]: https://en.wikipedia.org/wiki/The_Master_Algorithm

[2]: http://homes.cs.washington.edu/~pedrod/

------
fizixer
Cites Schmidhuber besides Hinton and friends, so that is good.

But where is any mention of Marcus Hutter and his AIXI formalism? Any review
of machine intelligence is incomplete without that.

~~~
TheOtherHobbes
I was unfamiliar with AIXI, so that made for some interesting Googling. Also,
this:

"Since AIXI is incomputable, it assigns zero probability to its own
existence."

~~~
Houshalter
AIXI has many known problems. The interesting thing is that it is a kind of
mathematically pure and elegant model of machine-learning-based AI, so any
problems with AIXI might also apply to our current approaches to AI.

The biggest issue is that AIXI doesn't believe it exists in the universe it is
observing. It thinks it's playing some kind of video game. Because of this, it
doesn't believe that it can truly die, it doesn't believe it can affect its
own brain in any way, and it doesn't care about anything other than maximising
some "score" that the "game" provides it.

~~~
eli_gottlieb
Also, AIXI has only one layer of hierarchy to its model, and has no latent
parameters for its causal structures (Turing machines), nor any allowance for
compactness or uncountability of learned spaces.

It's an incredibly bare-bones, skeletal model of a learning agent that makes
very rigid, unrealistic modeling assumptions _and then_ adds on computational
infeasibility.

~~~
Houshalter
I don't know what you mean by hierarchy, latent parameters, or compactness.
It models the world with Turing machines, which are very general and can
simulate other Turing machines that are unboundedly huge.

The biggest issue is that an agent can't model the universe as merely a set
of inputs, outputs, and a score. But there may not be a better way; it's quite
a difficult problem.

~~~
eli_gottlieb
>I don't know what you mean by hierarchy, latent parameters, or compactness.

The first two are terms from statistics. The last refers to topological
compactness.

Come on, these are vital things to know if you're trying to build statistical
models.

>But there may not be a better way, it's quite a difficult problem.

It's probably not quite so difficult if one actually knows statistics and
analysis. Or so I'm guessing.

~~~
Houshalter
Yes, I'm aware. I just don't see how they are relevant to AIXI, an AI with
infinite computing power that can model anything with a universal Turing
machine.

Statistics jargon will sadly not make AIXI work. Its problem is very deep.

~~~
eli_gottlieb
>Statistics jargon will sadly not make AIXI work. Its problem is very deep.

It's not _that_ deep. It's, at worst, dealing with the Curse of Dimensionality
because it has no notion of randomness: when the inputs it receives are noisy,
the prior probability of Turing machines with the noise bits pre-encoded will
drop exponentially with the length of the noise. That is why compactness and
other assumptions about the real line tend to help in real-world statistics:
they make it easy to notice noise and operate with imperfect precision.

Philosophy jargon about "self-awareness" isn't necessarily going to yield more
insights than it always has (i.e. the Chinese Room, i.e. very little).

~~~
Houshalter
Any realistic approximation of AIXI would use probabilistic Turing machines.
These would either output probabilities instead of exact predictions, or
perhaps use probabilities internally as well.

However, for true AIXI with infinite computing power, that doesn't really
matter. Randomness can be represented as stored random-seed data, and isn't
treated any differently than any other unknown variable.

I never used philosophy jargon or even the word "self-awareness". I stated my
issues with AIXI in plain English and explained them. It has literally nothing
to do with the Chinese Room.

~~~
eli_gottlieb
>Any realistic approximation of AIXI would use probabilistic Turing machines.

That would be a fresh model of AI, rather than an AIXI approximation. You
should probably look into that idea.

>However, for true AIXI with infinite computing power, that doesn't really
matter. Randomness can be represented as stored random-seed data, and isn't
treated any differently than any other unknown variable.

Tsk tsk. It's a curse-of-dimensionality issue. If we have N bits of optimally-
compressed random seed plus M bits of structure (and yeah, AIT has ways to
separate a string X into its structure and random bits iff you have the normal
Halting Oracle), then the prior probability of that particular machine is
2^{-(N+M)}. The noisier the input dataset, the larger N grows. In normal
learning, we want M to be constant (which we can usually assume it is: the
universe mostly doesn't acquire _new_ causal structure while we're looking at
it), which then allows the posterior probability of good hypotheses to rise
logarithmically with sample size. If each sample contains noise, then we
actually have to split things up:

2^-M for the causal structure, where M is constant and the posterior thus
gains information logarithmically.

2^-N for the random seed, where the ground-truth random seed actually grows
linearly in length with each sample we observe (because of the bits of entropy
Nature used up to _make_ that sample); the prior probability of each random
seed therefore _drops_ exponentially as the number of samples we anticipate
seeing grows.

So while this is all very informal, I'd have to say that a noisy Solomonoff
induction actually suffers from a Curse of Dimensionality because it assumes
everything is discrete, while more typical machine-learning models based on
continuous distributions can learn well in the face of noise.
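
To make the arithmetic above concrete, here is a minimal numeric sketch in
Python (my own toy illustration, not from the paper or from Hutter's
formalism; the one-bit-of-noise-per-sample figure is an assumption):

    # Toy version of the 2^{-(N+M)} prior discussed above: M bits of fixed
    # causal structure plus N bits of optimally-compressed random seed,
    # where N is assumed to grow by ~1 bit per noisy sample observed.
    M_STRUCTURE_BITS = 100         # assumed, fixed size of the causal part
    NOISE_BITS_PER_SAMPLE = 1.0    # assumed entropy Nature spends per sample

    def log2_prior(num_samples):
        # log2 of the prior for the machine that pre-encodes all the noise
        n_seed_bits = NOISE_BITS_PER_SAMPLE * num_samples
        return -(M_STRUCTURE_BITS + n_seed_bits)

    for samples in (0, 10, 100, 1000):
        print(samples, log2_prior(samples))
    # The structure term stays a constant -100 bits, while the seed term
    # falls linearly in the sample count, i.e. the prior itself decays
    # exponentially: the curse-of-dimensionality effect described above.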

------
andreyk
Do people generally read these things before upvoting? Legitimately curious.

It lists some possible goals for achieving more general, human-like
intelligence beyond the fancy function approximation we get with deep
supervised learning, from both architectural and functional perspectives, as
stated in the abstract. In general I find the language fairly wishy-washy and
the writing often awkward, but it is a nice summary of relevant thoughts and
concepts. Beyond the abstract, here is a bit of summary along with my thoughts.

For architectural aspects, it lists:

1) Unsupervised - Agrees with LeCun, Bengio, etc. But I'm not sure it's fair
to conclude this yet; maybe it should be reinforcement learning? Our brains
are prewired to do some things.

2) Compositional - basically hierarchical, aka deep. Seems reasonable.

3) Sparse and Distributed - again plausible and empirically seen in deep
learning. One reason ReLU neurons are nice is that they lead to sparser
distributed representations (see the short sketch after this list).

4) Objectiveless - a metaphysical statement having to do with the Chinese room
argument? This seems to mean not optimizing an objective function with
gradient descent, and instead: "Clearly, the learning algorithm should have a
goal, which might be defined very broadly such as the theory of curiosity,
creativity and beauty described by J. Schmidhuber". Still seems vague.

5) Scalable - Again not the best choice of words; it seems to argue for
parallelism as well as a "hierarchical structure allowing for separate
parallel local and global updates of synapses, scalability and unsupervised
learning at the lower levels with more goal-oriented fine-tuning in higher
regions." I am disappointed there was no discussion of memristors or
neuromorphic computing here.
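
On point 3, here is a minimal sketch of how ReLU units produce sparse,
distributed codes (the toy sizes and random weights are my own assumptions,
not anything from the paper):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=128)                        # toy input vector
    W = rng.normal(size=(256, 128)) / np.sqrt(128)  # random weight matrix
    hidden = np.maximum(W @ x, 0.0)                 # ReLU zeroes the negative half

    print("fraction of inactive units:", np.mean(hidden == 0.0))  # roughly 0.5
    # Each input activates a different, partially overlapping subset of units,
    # so the representation is both sparse and distributed.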

For functional aspects, it lists:

1) Compression - sure, pattern matching is in a sense compression, so this
seems fairly obvious (a toy illustration of the compression/prediction link
follows this list).

2) Prediction - "Whereas the smoothness prior may be considered as a type of
spatial coherence, the assumption that the world is mostly predictable
corresponds to temporal or more generally spatiotemporal coherence. This is
probably the most important ingredient of a general-purpose learning
procedure." Again, reasonable enough.

3) Understanding - basically equivalent to predicting?

4) Sensorimotor - not clear? Similar to human eye movement?

5) Spatiotemporal Invariance - "one needs to inject additional context" -
having constant concepts of things?

6) Context update/pattern completion - "The last functional component
postulated by this paper is a continuous (in theory) loop between bottom-up
predictions and top-down context." Constant cycling between prediction and
world-state update, pretty clear.
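
To tie points 1 and 2 together, here is a toy sketch (mine, not the paper's)
of the standard fact that a better next-symbol predictor yields a shorter
code, since the ideal code length of a symbol is -log2 of its predicted
probability:

    import math

    text = "abababababababab"
    # A uniform model over {a, b} vs. one that has "learned" the alternation.
    uniform = lambda prev, ch: 0.5
    learned = lambda prev, ch: 0.9 if (prev is None or ch != prev) else 0.1

    def code_length_bits(model, s):
        prev, total = None, 0.0
        for ch in s:
            total += -math.log2(model(prev, ch))  # ideal code length of symbol
            prev = ch
        return total

    print(code_length_bits(uniform, text))  # 16.0 bits
    print(code_length_bits(learned, text))  # ~2.4 bits: better prediction,
                                            # shorter code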

~~~
mindcrime
_Do people generally read these things before upvoting? Legitimately curious._

I don't. And that's because HN doesn't have separate "upvote" and "save"
features... upvoting _is_ saving (or, saving _is_ upvoting, however you want
to look at it). So I save (upvote) anything that meets the bar of "the title
is interesting enough for me to think I might want to read this eventually".

~~~
andreyk
A save feature would be nice perhaps. Although maybe the philosophy of HN is
precisely "upvote if something seems interesting enough to read" rather than
(or in addition to) "upvote if you have read this and think it's good".

------
KasianFranks
Favorite quote: "An intelligent algorithm (strong AI [66], among other names)
should be able to reveal hidden knowledge which might not even be discoverable
to humans."

~~~
jrcii
I am confident that knowledge exists which is not only undiscoverable by
humans, but also beyond our intellectual capacity and impossible to grasp.

------
tvural
I think the main idea can be summarized pretty simply. The most important next
step towards general intelligence is creating a learning algorithm that can
solve a sufficiently general class of problems without much tweaking by
humans, and it couldn't hurt to list out the properties such an algorithm
would have to have.

------
kasev
Most of these ideas have been pioneered and implemented by Jeff Hawkins and
his team at Numenta. See his book "On Intelligence" or the open-source project
at numenta.org.

------
bra-ket
How very unsurprising. My advice to the author and the rest of the field would
be to read a bit more on learning and memory in humans. AI starts with I.

------
grondilu
I'm too lazy to read this, but not too lazy to throw in my two cents.

Non-human mammals are amazing considering how many of them are incredibly
capable very early. I'm thinking mostly of large prey animals, for whom the
ability to walk and run is crucial. Basically, they need to be able to do many
things very quickly. It's incredible to see how fast newborns grow in so many
species.

Also, I searched for the word "play" in this article and found no occurrence.
To me, how young mammals play, and more importantly what drives them to do so,
is the core mystery behind the development of the mammalian brain. I suspect
that once this is cracked, a big part of the work will be done.

