
The Limits of Machine Learning - dnetesn
http://nautil.us/blog/the-fundamental-limits-of-machine-learning
======
mturmon
Note to casual commenters: the precise real-world implications of the NFL
theorems proved by Wolpert and collaborators have been difficult to
appreciate, even for people well-versed in the computational learning world.

Starting point:
[http://www.santafe.edu/media/workingpapers/12-10-017.pdf](http://www.santafe.edu/media/workingpapers/12-10-017.pdf)

where we read: "However, arguably, much of that research has missed the most
important implications of the theorems."

~~~
Animats
Is that a corollary of the information-theory theorem that for any lossless
compressed representation, there must be some data pattern for which the
compressed representation is bigger?

~~~
xapata
Anything with more information/entropy requires more space to store.
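
The counting argument behind that, as a quick sanity check (a sketch of my
own, not from the article):

    # Pigeonhole: a lossless (injective) compressor cannot shrink every
    # n-bit input, because there are strictly fewer shorter bit strings.
    n = 8
    inputs = 2 ** n                           # 256 distinct n-bit inputs
    shorter = sum(2 ** k for k in range(n))   # 255 strings of length < n
    assert shorter < inputs  # so some input maps to an equal-or-longer output

So yes, it's essentially the same pigeonhole fact you're describing.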

------
Animats
Right, not too helpful.

Also, the machine shown in the picture isn't even a computer. It was a
special-purpose machine used to read microfilms of mark-sense Census forms and
write the results on tape. (I once had a summer job at Census HQ in Suitland
MD, and saw the FOSDIC machine.)

There are fundamental limits to hill-climbing. So far, nobody has something
that just keeps running and continues to get better. Hill-climbing maxes out
after a while and stalls.
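
(A minimal sketch of the stall, with a made-up objective: greedy ascent stops
at the first local maximum it reaches, far below the global one.)

    def objective(x):
        # Made-up bumpy objective: local max near x ≈ -0.7, global max near x ≈ 2.2.
        return -(x ** 4) + 2 * x ** 3 + 3 * x ** 2

    x, step = -2.0, 0.01
    while True:
        best = max((x - step, x + step), key=objective)
        if objective(best) <= objective(x):
            break  # no uphill neighbor: stalled
        x = best
    print(x, objective(x))  # stalls near x ≈ -0.7, nowhere near the global max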

We still need another big idea after deep learning and machine learning in its
present form. No idea where that will come from. Anybody see anything on the
horizon?

~~~
AndrewOMartin
Yes.

The critiques of AI from Hubert Dreyfus have stood the test of time. Those who
want to understand or challenge them directly can read What Computers Can't Do
(1972, 1979), or, even better, the updated reprint What Computers Still Can't
Do (1992). He's a Heideggerian philosopher, but all you need to know is that
modern AI is ignorant of vast swathes of 20th-century investigation into the
human mind and state of being.

Hence, I think the big idea you're talking about is in AI that takes Hubert
Dreyfus's critiques seriously.

Luckily for anyone reading about this for the first time, that process has
already started. Dreyfus wrote a 2007 paper on the successes and failures of
the first few steps of what he called Heideggerian AI, with the snappy name
"Why Heideggerian AI failed and how fixing it would require making it more
Heideggerian".

The "fixing" refers to work of a Neuroanatomist, with a suitable philosophical
background, called Walter Freeman III, and is broadly described in the paper,
but properly investigated in Freeman's (also eminently readable) book How
Brains Make Up Their Minds (2000).

A note of caution: you'll be introduced to concepts that blur the line between
body and environment, subject and object, intention and influence, and
eventually things like relinquishing your belief in causality and an objective
"out there" universe (or at least any value in such a belief), all whilst
staying perfectly scientific and evidence-based.

Finally, bear in mind that if we do create an intelligence worthy of the name,
we have reason to believe it will take about 18 years of "raising" by two
adult humans, after which it will want to do its own thing, and not your dumb
image classification tasks.

~~~
ilaksh
I just read a review of "How Brains Make Up Their Minds". Thanks for mentioning
that. It looks like a good book with mostly correct information (based on the
review).

From the summary, I can tell you that researchers in fields such as AGI, deep
learning, robotics, etc. have absolutely been working from many if not all of
Freeman's assumptions for years and almost all (if not all) of that has been
integrated into various research programs and systems. Freeman's pragmatist
view is now the most popular.

Of course, Freeman's ideas aren't _usually_ all present together in every one
of these systems or the conceptualization of them, but there are at least a
few that have most of them.

Certainly AGI researchers are aware of the concept of higher-level
abstractions being formed, at root, on the basis of sensory input and action
output. And most recent serious AI research, pretty much any NN for example,
demonstrates the idea of meaning from global patterns.

These are some interesting AGI videos in case people haven't seen them.
[https://www.youtube.com/channel/UCCwJ8AV1zMM4j9FTicGimqA/pla...](https://www.youtube.com/channel/UCCwJ8AV1zMM4j9FTicGimqA/playlists)

~~~
AndrewOMartin
I strongly object to the ideas in that book being described with terms such as
"higher-level abstractions", "sensory input and action output", and "meaning
from global patterns".

As I read it, they were precisely the notions Freeman was arguing against.

There is no "input" when the history of a being, its current sensitivities,
and its orientation in the environment are all inseparable and mutually
defining.

There are no higher level abstractions, and none needed, when you're so poised
in a situation with your whole brain and body that any appropriate stimulation
is already meaningful.

Freeman, for me, turned thought upside down. I.e., we don't go, "I've detected
that as being food, so I may eat it"; rather we go, "I've noticed I interact
with all these things in a similar way (e.g. eating); that's how I recognise
them as related".

~~~
ilaksh
Your conclusion aligns with my statements. So I think you are interpreting
what I wrote incorrectly.

------
jeyoor
The author seems to say that the no free lunch theorem (NFL) indicates that
creating a "Universal Learner" (or artificial general intelligence) is an
impossible task.

I disagree based on the following:

1. I think NFL's definition of a universal learner is broader than the
definition used by the average AGI researcher.

In practice, we are interested in algorithms producing behavior at near-human
levels of intelligence, not absolutely universal learning algorithms (a toy
version of NFL's averaging argument is sketched below).

2. NFL does not seem to directly address the overall effectiveness of
applying a combination of algorithms based on real-world experience to
everyday problems.
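
Here is that toy version (my own sketch, hypothetical learners): average over
every possible labeling of a tiny domain, and any two learners score
identically on the unseen point.

    from itertools import product

    # Every possible "world": all binary labelings of the 3-point domain
    # {0, 1, 2}; points 0 and 1 are the training set, point 2 is held out.

    def learner_a(labels):
        # Always guesses 0 on the held-out point.
        return 0

    def learner_b(labels):
        # Copies the label it saw at training point 0.
        return labels[0]

    for learner in (learner_a, learner_b):
        hits = sum(learner(f) == f[2] for f in product([0, 1], repeat=3))
        print(learner.__name__, hits / 8)  # both print 0.5

Uniform averaging over all 8 labelings is exactly the assumption doing the
work; real-world problems are not drawn that way, which is the point of
objection 1.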

------
xapata
I was expecting a discussion of the limits of statistical inference, but
instead read a hand-wavy piece that only briefly touched on _a priori_
knowledge, without even using that term.

------
xg15
Somewhat off-topic, this article mentions a property of samples that has
puzzled me for a long time. Maybe someone can shed light on that?

 _[...] Field A would receive Fertilizer 1, Field B would receive Fertilizer
2, and so on.

But as Fisher pointed out, this type of experimentation was doomed to produce
meaningless results. If the crops in Field A grew better than those in Field
B, was that because Fertilizer 1 was better than Fertilizer 2? Or did Field A
just happen to have richer soil?

[...] The way around the problem, Fisher concluded, was to apply different
fertilizers to different small plots at random. [...] On average, the soil
under Fertilizer 1 ought to look exactly like the soil under Fertilizer 2._

The article talks at length about how _randomisation_ was the revolutionary
new thing that Fisher introduced. Yet in the given example, it's the
_repetition_ of the experiment with different combinations of fields and
fertilizers that does the trick, isn't it? It seems to me I should get the
same results if I repeated the trial 50 times with Fertilizer 1 on Field A and
50 times with Fertilizer 1 on Field B.

So why is adding unpredictability (randomness) so important?
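
(To make the setup concrete, here's a toy simulation with made-up soil and
fertilizer effects: with the fixed assignment the soil difference never
averages out, no matter how often you repeat; random assignment averages it
away.)

    import random

    random.seed(0)
    soil = {"A": 2.0, "B": 0.0}   # made-up: Field A has richer soil
    effect = {1: 0.0, 2: 0.0}     # made-up: the two fertilizers are identical

    def plot_yield(field, fert):
        return soil[field] + effect[fert] + random.gauss(0, 1)

    n = 1000

    # Fixed assignment: Fertilizer 1 always on A, Fertilizer 2 always on B.
    fixed = (sum(plot_yield("A", 1) for _ in range(n)) -
             sum(plot_yield("B", 2) for _ in range(n))) / n

    # Random assignment: which fertilizer lands on which field is drawn per trial.
    diffs = []
    for _ in range(n):
        f1, f2 = random.sample(["A", "B"], 2)
        diffs.append(plot_yield(f1, 1) - plot_yield(f2, 2))
    randomized = sum(diffs) / n

    print(fixed)       # ~2.0: soil masquerades as a fertilizer effect, forever
    print(randomized)  # ~0.0: the soil difference cancels on average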

~~~
xg15
... I didn't mean it to be _that_ off topic. This got posted on the wrong
thread, I'm sorry. If any mod could remove this, I'd be grateful.

------
graycat
> The Fundamental Limits of Machine Learning

Depends on the largely mathematical assumptions one can bring to the data.
What can be done with the variety of assumptions is illustrated in detail
beyond belief in the QA (mathematics) section of most research libraries.

For more, the OP has

> Almost all of the learning we expect our computers to do—and much of the
> learning we ourselves do —is about reducing information to underlying
> patterns, which can then be used to infer the unknown.

Ah, NOW I see! The OP has stated a relatively narrow problem.

E.g., consider arrivals at HN: over each 30 minutes or so, they just about
have to be a sample path of a Poisson process. Why? The renewal theorem, as in
W. Feller's second volume. Can say that without looking at "patterns" in the
data, indeed, without looking at any data at all.

Then from knowing that the arrivals are a Poisson process, there is a nice
stream of results one can get right away, without the data, and even more with
the data. E.g., the sum of two independent Poisson processes is another
Poisson process. Then more generally one can have a continuous-time,
discrete-state-space Markov process subordinated to that or a related Poisson
process. From that one can get some, say, network queuing calculations good
for capacity planning, optimization of capacity planning, stochastic optimal
control, anomaly detection, etc. Have a good shot at using the strong law of
large numbers and the martingale convergence theorem.
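
Can check the superposition claim empirically, too (a toy simulation, made-up
rates):

    import random

    random.seed(0)
    lam1, lam2, horizon = 3.0, 5.0, 10000  # made-up rates, unit-time window count

    def arrivals(lam, horizon):
        # Inter-arrival times of a Poisson process are i.i.d. exponential(lam).
        t, times = 0.0, []
        while True:
            t += random.expovariate(lam)
            if t >= horizon:
                return times
            times.append(t)

    merged = arrivals(lam1, horizon) + arrivals(lam2, horizon)
    counts = [0] * horizon
    for t in merged:
        counts[int(t)] += 1  # arrivals per unit interval

    mean = sum(counts) / horizon
    var = sum((c - mean) ** 2 for c in counts) / horizon
    print(mean, var)  # both near lam1 + lam2 = 8, as a Poisson count should be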

One can say nearly all of this, and more, without looking for "patterns" in
the data or looking at the data at all. Again, looking at the data, one can
say still more.

There's a lot in the QA section of the library!

------
szemet
I don't understand why it says "2 * (5 + _1_ ) = 12" fits equally well as "x +
5 + 2 = 12" to the original pattern:

"5 + _2_ = 12"

It changes one of the terms of the original addition (2 to 1). The second
solution fits better by my judgment; changing the puzzle seems like a bit of
cheating...

~~~
ricksplat
I think what he is saying is: if you remove preconceptions, e.g. that '+'
means "addition", then you can infer that it means "multiply" and that the "+
1" is implied, based on the answer.

As humans, using our prior experience, we will most commonly say that + means
addition and will make that an axiom of our solution, which involves (for me
at least) adding the result of the previous line to the sum of each line.

    
    
        8 + 11 = 19 [ + 21 ] = 40
    

As a machine (or as a mathematician thinking with more flexible abstractions),
the numbers are intractable, but the symbols are not, and perhaps it is more
"logical" to look at each line separately and change the meaning of the
symbols.

    
    
        8 * (11 [ + 1 ]) = 96
    

The point is that humans have a different set of logical precepts to machines.
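
(A toy rendering of the two readings, hypothetical rule names: both reproduce
the lines above, yet disagree about what "8 + 11" is.)

    def rule_add(prev, a, b):
        # '+' is ordinary addition, with the previous line's result carried over.
        return prev + a + b

    def rule_mul(a, b):
        # '+' secretly means a * (b + 1).
        return a * (b + 1)

    print(rule_add(21, 8, 11))  # 40
    print(rule_mul(8, 11))      # 96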

------
ppod
"The universe is no narrow thing and the order within it is not constrained by
any latitude in its conception to repeat what exists in one part in any other
part. Even in this world more things exist without our knowledge than with it
and the order in creation which you see is that which you have put there, like
a string in a maze, so that you shall not lose your way. For existence has its
own order and that no man's mind can compass, that mind itself being but a
fact among others.” Cormac McCarthy

------
jkuria
In response to some comments below, here's a good article on the differences
between ML, deep learning, and AI:

[https://blogs.nvidia.com/blog/2016/07/29/whats-difference-ar...](https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/)

------
wrong_variable
How is this article on nautil.us? Did the author just read the Wikipedia page
on Machine Learning?

There is an entire field in ML called unsupervised learning: labelling data
that do not have labels attached to them.

It's not a _Fundamental_ (uggh) limit; I am not sure the author even knows
what _Fundamental_ means. It's not like the halting problem or the heat death
of the universe.

ML is also a very young field with poor mathematical understanding; optimism
is the best way forward. Christopher Columbus didn't discover an entire NEW
WORLD because he had a pessimistic attitude towards his ideas.

It's going to take time, but historically we are rapidly learning about how
the human brain and intelligence work, similar to how we rapidly learnt a lot
of physics in the 20th century.

~~~
tree_of_item
Christopher Columbus didn't discover "an entire new world", he was just one of
the first Europeans to land on a continent that had already been there, with
plenty of people, for thousands of years.

~~~
visarga
Well, it was a discovery, but just for Europe. American natives discovered
Europe as well.

~~~
Chris2048
Unless American natives knew of Europe, it was a trivial discovery. When one
continent discovers another, the relevance is the new link between them, and
interactions between the two economies.

If we discover aliens on a new planet, the fact that the aliens knew about
themselves before that doesn't change much - first contact may be when they
discover us too.

But let's not pretend this isn't slightly about political correctness, and
sensitivity over colonialism, which ruins objectivity on the subject.

------
ilaksh
AI/AGI is so interesting, everyone (including me) wants to have their own
unique take on it. And comment. Even though we are not really that familiar
with the field.

I think that the group of researchers who have been calling their work AGI
should get a lot of credit, and their research should be a starting point for
discussions. Rather than people just spouting off as I am about to do.

Here are some AGI videos
[https://www.youtube.com/channel/UCCwJ8AV1zMM4j9FTicGimqA/pla...](https://www.youtube.com/channel/UCCwJ8AV1zMM4j9FTicGimqA/playlists)

I think that some type or combination of (deep or otherwise) neural networks,
_in_ a general intelligence framework with things like attention, (virtual?)
embodiment, etc., may be enough for us to more or less emulate (general) human
capabilities and behavior. That's my guess. The biggest issue is slow
learning, or requiring quite a bit of data. If typical artificial neurons
aren't enough, there are some promising advances in spiking neural networks,
some of which are able to perform quite accurately and learn much more
quickly. So that is another possibility. Can't be sure. I think we should be
optimistic that it doesn't require too many more major breakthroughs (if any).

The reason that I think some (many?) people are sooo skeptical still is that
deep down they may believe that the cause or explanation for _animalis_
(animus?) is somehow uniquely human or magic. What I mean is the thing that
makes animals and people seem (or be) alive and conscious. This is somewhat
related to the concept of
[https://en.wikipedia.org/wiki/Panpsychism](https://en.wikipedia.org/wiki/Panpsychism),
which, as I understand it, is somewhat more popular in Asia.

I think that with existing techniques, maybe some type of deep neural net,
combined with more dexterous, anatomically correct, dynamic, and
sensory-integrating robots, we will shortly (if not already) have robots that
do seem to be quite alive. Consider a lizard. How many robots do we have that
can really emulate the dynamic and lithe behavior and interaction of a lizard?
Perhaps none. Our robots are quite slow and generally have limited freedom of
movement. I bet that if we did a good job of emulating most of the entire
complex anatomy of a lizard with some type of robot (including the breathing)
(maybe using some type of EAP muscles like
[https://www.seas.harvard.edu/news/2016/07/artificial-muscle-...](https://www.seas.harvard.edu/news/2016/07/artificial-muscle-for-soft-robotics-low-voltage-high-hopes)),
and then came up with a way to train its behavior generation, based on a deep
neural network, from detailed videos of interactions with human handlers,
people would say "that thing is alive" and change their minds about the
reality of even human-like AIs in the next few decades.

------
bilbobeer
The title of this article should really be "The fundamental limitation of
1950s Perceptron-style ML".

These days (2016) there are thousands of algorithms, all tuned for a specific
problem: image recognition, speech, music transcription from audio, text
'learning' (say, the Bible) to generate automatic text. The algorithms have
names like CNN, RNN, ... again, thousands of them.

All ML is a 'hack': every algorithm has to be tuned and dialed in to get the
coveted 99% 'confirmation'.

It would be easy to fine-tune a machine for both answers to the problem
described; likewise, a human expert might just as well see the two (or maybe
more) correct answers. Then another algorithm could be trained to choose which
'correct' answer is best.

A universal machine that requires an infinite number of hacked machines is
just another 'tower of Babel', not unlike the WWW (port 80/HTML) of today.

The real problem with ML is the holy grail of 99%, which means that 1 in 100
innocent people go to prison, or 1 in 100 children die from robot cars.

A society that allows the technical (rich elite) to govern, a society that
accepts 99% or even 99.9% and says to hell with the 1% or 0.01%: this is the
real problem with judges, executioners, and cops that make decisions fed by
Google, Facebook, and all our other favorite fronts controlled by the CIA/NSA.

