
Artificial Intelligence Still Isn’t All That Smart - benryon
https://www.bloomberg.com/view/articles/2018-08-16/smart-machines-won-t-be-ready-to-do-complex-jobs-anytime-soon
======
YeGoblynQueenne
>> In the business world, machine learning often goes by the annoying moniker
of “artificial intelligence.” That science-fiction buzzword evokes visions of
godlike sentient computers, when in fact, the product is much closer to a
statistical regression. Machine learning is about using algorithms to predict
things — whether a web-security image contains a cat, what a Google user wants
to search for, or whether a self-driving car should brake to avoid a crash. No
one yet knows how to give a single computer system the mental flexibility to
reason and learn like a human being.

While the last sentence is true, the rest of the opening paragraph of this
article commits the typical sin of promulgating a reductionist definition of
"machine learning", probably because of a lack of historical perspective on
the origins and use of the term.

Historically, the broadest definition of "machine learning" is the one given
by Tom Mitchell, according to whom a machine learning system is one whose
performance at a task improves with experience. This covers machine learning
algorithms that are really nothing like statistical regression: decision tree
learners, or the wide array of algorithms and systems that learn logic-based
representations, which predominated in the early years of the discipline (and
are still going, if I may, with Inductive Logic Programming).

In short, neither prediction nor statistical regression is a necessary or
sufficient characteristic of machine learning. Indeed, there is nothing
stopping a system that learns to reason from examples from being categorised
as a "machine learning system".

Nothing, except perhaps the incomplete understanding of experts from other
disciplines, like the author of this piece.

~~~
yters
I've studied machine learning a fair amount, and the algorithms are fancier,
but it's still all model fitting by minimizing error. Regression is also model
fitting (of polynomials) by minimizing mean squared error. Not a huge
difference...

~~~
YeGoblynQueenne
>> I've studied machine learning a fair amount, and the algorithms are
fancier, but it's still all model fitting by minimizing error.

That's only true if you define machine learning as "model fitting", however
there are many algorithms and techniques that have nothing to do with fitting
a curve to data points and are very commonly considered to be machine
learning.

E.g. decision tree learners, nearest-neighbour algorithms, Bayesian inference
algorithms, Expectation-Maximisation (and friends), clustering algorithms:
none of these really, um, fits under "curve fitting", although of course you
can broaden the scope of what is meant by "model fitting" to include
potentially anything that looks kinda like it from a few steps away.
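
To make this concrete, here is a toy sketch (hypothetical code, not any
particular library) of a one-split decision "stump" learner: a greedy search
over thresholds, with no parametric curve and no gradient anywhere in sight:

    # Toy decision stump: greedy search over split thresholds.
    # There is no line function, no parameters of a curve, no gradient.
    def fit_stump(xs, ys):
        best = None
        for t in sorted(set(xs)):  # candidate thresholds
            left = [y for x, y in zip(xs, ys) if x <= t]
            right = [y for x, y in zip(xs, ys) if x > t]
            if not left or not right:
                continue
            l_lab = max(set(left), key=left.count)    # majority vote
            r_lab = max(set(right), key=right.count)
            errs = sum(y != l_lab for y in left) + \
                   sum(y != r_lab for y in right)
            if best is None or errs < best[0]:
                best = (errs, t, l_lab, r_lab)
        _, t, l_lab, r_lab = best
        return lambda x: l_lab if x <= t else r_lab

    classify = fit_stump([1, 2, 3, 4], [0, 0, 1, 1])
    classify(2.5)  # -> 1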

Then of course there is the very broad family of logic-based machine learning
systems, pioneered by Ryszard Michalski and others in the '70s, and Inductive
Logic Programming, which learn logic theories from relational data and do not
perform optimisation or model fitting (they tend to be greedy).

Basically, it's mostly the gradient-descent-based algorithms that do curve
fitting (linear regression, perceptrons, SVMs and, er, that; oh, and gradient
boosting). These approaches are currently dominant, but that's all.
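
For concreteness, here is a minimal sketch of the classic case (toy code, no
particular library): gradient descent on mean squared error for a straight
line y = a*x + b. This, and things shaped like it, is what "curve fitting"
properly describes:

    # Gradient descent on mean squared error for y ~ a*x + b.
    # The parameters a and b are what gets "fit" to the data points.
    def fit_line(xs, ys, lr=0.01, steps=5000):
        a, b = 0.0, 0.0
        n = len(xs)
        for _ in range(steps):
            grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
            grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
            a -= lr * grad_a
            b -= lr * grad_b
        return a, b

    fit_line([0, 1, 2, 3], [1, 3, 5, 7])  # converges towards (2.0, 1.0)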

There is much more to machine learning than curve fitting.

~~~
pas
All of those require a training step, and that is the same error minimization
as curve fitting.

Basically they are curve fitting in a high enough dimension.

Even theory builders can be thought of as curve fitters. The error metric is
greedy, the curve is not differentiable, but that's it.

That said, of course there is much more to it than training. (And as we saw
with OpenAI's Dota bots and DeepMind's AlphaGo Zero, with good feature
engineering and an adversarial learning setup, beating human performance is
only a matter of processing power.)

~~~
YeGoblynQueenne
>> All of those require a training step, and that is the same error
minimization as curve fitting.

>> Basically they are curve fitting in a high enough dimension.

>> Even theory builders can be thought of as curve fitters. The error metric
is greedy, the curve is not differentiable, but that's it.

I don't understand most of what you are saying, and the bits I do understand,
I don't see how they're true. Could you please explain in a little more
detail? For example, how is, say, recursive partitioning, as in decision tree
learners, or storing data points for later comparison, as in KNN, "curve
fitting in a high enough dimension"? What is the function being fit, and where
are its parameters?

Full disclosure: I study ILP algorithms for my PhD and I have implemented a
couple myself. There is nothing like optimisation going on in there, just
(inverse) resolution and some ordering of the search space. Most have nothing
like an "error metric", greedy or not. And I have never heard anyone describe
logic-based learners, including ILP algorithms, as curve fitting before.

~~~
pas
If something requires training, it constructs a model, and the model can be
represented as a probability distribution. I think of that as the curve.

ILP (let's say the simplex algorithm, or other hill climbers) doesn't require
training, but it nevertheless tries to optimize a utility (or goal) function
(which implicitly defines a metric on the space), which is a hypersurface,
which is a curve in an even higher dimension.

Since the cardinality of the set of solutions is not one, and is usually
infinite, picking one solution over another is optimization. (And, if I'm not
mistaken, unless we know that the search space is convex, we don't know
whether there are better solutions or not.)

I don't think anything that doesn't build a model is machine learning.
(Decision trees build a model, Bayes classifiers build a model, etc. And I'd
say all are curve fitting.)

I'm not doing a PhD at all, so I'm not claiming to be an authority on the
correct terminus technicus in AI/ML/optimization math/CompSci, but to me they
all fall under the model/curve-fitting umbrella.

~~~
YeGoblynQueenne
The Simplex algorithm is a _linear programming_ algorithm. ILP is _Inductive
Logic Programming_, not linear programming. ILP algorithms do not perform
linear programming, do not have a utility function and do not use
optimisation. What they do is search a space of hypotheses for a hypothesis
that is consistent with some background theory and that explains some set of
examples. Wikipedia has an introduction:

[https://en.wikipedia.org/wiki/Inductive_logic_programming](https://en.wikipedia.org/wiki/Inductive_logic_programming)

You will not be able to understand the operation of ILP algorithms if you try
to look at them from the point of view of function optimisation or, indeed,
"curve fitting". You need a different set of tools, those of logic programming
and automated theorem proving, because the process is essentially one of
logical deduction. Please let me know if you delve into the subject and have
any questions (see my profile for my email address). I'm always happy to
support attempts to understand my research subject.
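
To give a flavour of what "search, not optimisation" means, here is a grossly
simplified caricature in Python (nothing like a real ILP system such as
Metagol; the names and structure are my own invention, for illustration only):
enumerate candidate rules in some order and return the first one that,
together with the background theory, covers all the positive examples and
none of the negatives. No loss function, no parameters, no gradient:

    # Caricature of ILP-style hypothesis search. A "rule" here is just a
    # predicate over examples; real systems construct clauses by (inverse)
    # resolution, but the generate-and-test shape is the point.
    def search_hypotheses(candidates, background, positives, negatives):
        for rule in candidates:  # an ordered space of hypotheses
            if all(rule(e, background) for e in positives) and \
               not any(rule(e, background) for e in negatives):
                return rule  # first consistent hypothesis; no optimisation
        return None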

The same goes for other machine learning algorithms: if you use the wrong
abstractions to understand them, you'll get the wrong idea about them.

For instance, KNN doesn't "require training" as per your condition for machine
learning. It just memorises all its examples. Is KNN not machine learning? It
is, because it gets better the more examples it has. If you stick to thinking
of everything as "curve fitting", either you'll misunderstand what KNN does,
or you'll start thinking that KNN is not machine learning.
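
A sketch of 1-NN (toy code, assuming numeric feature vectors) makes the
point: the entire "model" is the stored data, and there is no function with
parameters being fit:

    # 1-nearest-neighbour: "learning" is memorising the examples;
    # all the work happens at prediction time. No parameters are fit.
    def knn_fit(xs, ys):
        return list(zip(xs, ys))  # the whole "model" is just the data

    def knn_predict(model, q):
        dist = lambda x: sum((xi - qi) ** 2 for xi, qi in zip(x, q))
        return min(model, key=lambda xy: dist(xy[0]))[1]

    model = knn_fit([(0, 0), (5, 5)], ["a", "b"])
    knn_predict(model, (1, 1))  # -> "a"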

Decision tree nodes are all unique - their probability is always 1/n, where n
is the number of all nodes in the tree. In other words, their distribution is
uniform. If you try to understand decision trees as "a model represented as a
probability distribution" you'll find out that every decision tree learner
builds the same model. Obviously that's not true, because each decision tree
recognises different classes.

And so on.

When we talk of "curve fitting" what we mean is that we have the function of
a line (not necessarily straight), like αx + βy + γz + ..., where the
variables are the attributes of a set of examples and the coefficients are
unknown and we wish to learn them. These are the "parameters" of the function
that are optimised by curve fitting algorithms. We say that a curve is being
"fit" because we assume there is some function, which we don't know, that
describes a line whose points are the entire class of entities from which our
examples are drawn. And since we don't know what this function is, we instead
try to approximate it with a function that "steps" on the few points that we
have, or in other words, "fits" those points.
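
Concretely (a toy illustration, not any particular library): for a straight
line y = αx + β, ordinary least squares even has a closed form, and the two
numbers α and β are exactly the "parameters" I mean:

    # Closed-form least squares for y = alpha*x + beta: the parameters
    # alpha and beta are the unknowns that curve fitting learns.
    def least_squares(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        alpha = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
              / sum((x - mx) ** 2 for x in xs)
        beta = my - alpha * mx
        return alpha, beta

    least_squares([0, 1, 2], [1, 3, 5])  # -> (2.0, 1.0)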

In decision trees, KNN, Bayesian inference, etc., there is no line function
and therefore no curve that we are trying to fit. You won't find a line in
higher dimensions either. While there is probably some curve that can fit the
data points given as training examples to the algorithm, the algorithm,
lacking a line function, cannot represent that curve. And while you can
sometimes _visualise_ the end result as a curve, this is only a convenience
and, more importantly, it is not always possible. For example, decision trees
again (because they're easy): look around for visualisations of decision
trees. You will find tree diagrams aplenty, but not a single line curve,
because you can't really fit a line on a bunch of decision tree nodes. It's a
graph. Where's the line? Where are the parameters?

You've obviously made an effort to organise what you know about machine
learning, the better to understand it. But you're missing lots of details,
and the result is actually hindering your understanding. In a sense, by
looking at everything from the point of view of "curve fitting" you're
missing the forest for the, um, trees.

My advice is: if you really want to understand machine learning (or at least
to sound a bit knowledgeable about it), keep an open mind and don't rush to
conclusions. There is much more to machine learning than what you can get
from reading a few blogs, a couple of Wikipedia pages and a few popular
articles in the tech press. Even experts don't yet have a high-level theory
of what machine learning is and of how to treat different machine learning
algorithms in a uniform manner, as some well-defined class of objects with
common characteristics. That should give you a hint about how close your
all-encompassing "it's all curve fitting" definition is to explaining
anything.

~~~
pas
Interestingly, I thought ILP was integer linear programming. (It's the first
hit for "ILP algorithm".)

And then I read DeepMind's delta-ILP post you linked and was even more
confused, as that talked about gradient descent; and then I looked at your
Metagol example, and ...

So, decision trees are a great example, as they were the first thing I
encountered about machine learning, ~15 years ago, and I think they're a
first-class ML citizen. But in their pure form they are absolutely vulnerable
to noise.

And as soon as noise enters the picture, it becomes optimization. And I think
that's the model fitting that should be the primary concept. (I used
curve-slash-model, but I should have used just model.) [And even if there's
no "fitting", as with kNN, you still have data, and either you use a constant
classification set of points or you'll see bias; but then the question
becomes what set to use.]

And you're right that treating everything as X is folly; after all, it gives
you no information, decreases no entropy. But I'm looking at it from the
perspective of what intelligence is. And it seems always to sit on this
spectrum of how to optimize for something. (Making hard things easy, by
learning rules/algorithms.) After all, that's why we are really interested in
ML, no?

> But you're missing lots of details, and the result is actually hindering
> your understanding. In a sense, by looking at everything from the point of
> view of "curve fitting" you're missing the forest for the, um, trees.

If you would be so kind as to elaborate on this, I'm very interested. And in
general, I'd welcome anything that comes to mind about my ramblings! Thanks
for your detailed comments!

------
Veedrac
AI is a lot dumber than people expect... which is the scary thing. If you look
at Leela Zero's training, you'll see a lot of huge flaws pointing to massive
failures of reason and areas where it is frankly incapable of making basic
generalizations. Yet variants of this same technology beat the best Go
players, synthesize realistic voices, caption images with sentences, drive
cars, translate language, etc. It makes you wonder whether, if such an
unintelligent bit of math can fake so much, intelligence is really that
mysterious, or whether it just seems that way because we don't know how to
write it.

------
Digit-Al
I don't think we are that close to creating artificial intelligence for a very
simple reason.

To solve a problem you must first be able to define the problem to be solved.
So, define intelligence. Take your time, this comment will wait for you. Done?
Good. Now define it in a way that everyone else agrees with and that doesn't
fall to pieces under scrutiny.

Philosophers have been trying this for thousands of years and are still no
closer to an answer. As far as I can tell, we don't have any test for
intelligence that is good enough to establish, with absolute certainty, that
another human being is 'intelligent', let alone anything else.

Define the problem and the requirements to consider the problem solved and
only then do we have any hope of coming up with a solution.

~~~
goolulusaurs
Well, this seems to be the definition DeepMind uses:
[https://arxiv.org/abs/0712.3329](https://arxiv.org/abs/0712.3329) . And
there has been work to establish a quantitative measure of the extent to
which an agent's behavior is intelligent or mechanical:
[https://arxiv.org/abs/1805.12387](https://arxiv.org/abs/1805.12387) .

~~~
mi3law
Thanks for sharing such interesting papers!

May I ask what makes you think that the first paper gives DeepMind's
definition of intelligence? Did DeepMind say so at some point?

~~~
goolulusaurs
Shane Legg, one of the authors of the paper, is also a co-founder of
DeepMind, and the paper is cited in several of their papers. There are some
old videos of him discussing it here:
[https://www.youtube.com/playlist?list=PL985E81376A4D061E](https://www.youtube.com/playlist?list=PL985E81376A4D061E)
.

------
roenxi
Take AlphaGo, because I know Go quite well. A computer is making better
decisions than a human, using a pretty reasonable approximation of the same
mechanism as a human, but scaled up. It needs relatively more resources
(gameplay time, expert attention, etc.) to achieve a more focused result than
a human expert.

At the moment, it is very easy to suspect that it is only the economics of
training a neural net (hardware, specialist attention, data gathering and
processing) that is holding us back from AGI.

Human brains evolved by chance. Nothing fundamental is stopping us from
creating a synthetic one. Natural intelligence isn't that smart either,
realistically.

~~~
xamuel
We're getting really good at climbing trees. We've got guys who can climb even
the tallest trees, no problem. Any day now, we're gonna finally climb to the
moon.

~~~
state_less
The tree climbers may well build a rocket and land on the moon anyhow.

I think it’s a good idea not to add unwarranted mystery to the subject. We
have good reason to believe our computers are underpowered for the task. This
is a challenge, and while I think progress is very fast, our collective sense
of the timeline seems optimistic. In my view, if it happens this century,
humans will have achieved an impressive feat.

If we continue with geometric increases in flops/watt, maybe the rosy
predictions will be closer than expected.

I suppose what I’m implying is that nervous systems are doing computation,
and not something mysterious that we don’t understand. The tree climbers knew
they needed to climb higher and realized they’d need another tool, not a
different understanding of spatial geometry.

~~~
bachbach
> We have good reason to believe our computers are underpowered for the task.

What are those reasons?

It appeared to me as if we had amassed a terrific reservoir of computational
ability, but we wasted nearly all of it, because it was expedient to waste
resources when they were ample.

If I gave you a computer with 1 billion times the processing ability of
current supercomputers, could you convince me that you'd be capable of
replicating the functionality of a biological brain?

~~~
state_less
Do we have a terrific reservoir of computational ability, though? The numbers
matter, because they suggest our processors are a few hundred thousand times
too inefficient.

A human brain might take 36.8e15 computations per second [1] and does all
this using about 20 watts! That's pretty impressive, at around 1.84e+15
ops/watt, vs our current 6e+9 ops/watt from silicon [2]. We would have to
make processors a few hundred thousand times more efficient to match the
brain's efficiency. My 4 lb laptop takes what feels like forever to do a
limited deep learning task, given how inefficient it is.
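
Spelling out the arithmetic, with the figures taken from [1] and [2]:

    # Rough efficiency gap between the brain and current silicon:
    brain_ops_per_s = 36.8e15      # estimated brain computations/second [1]
    brain_watts = 20               # approximate brain power draw
    silicon_ops_per_watt = 6e9     # current silicon efficiency [2]

    brain_ops_per_watt = brain_ops_per_s / brain_watts    # ~1.84e15
    gap = brain_ops_per_watt / silicon_ops_per_watt       # ~3.1e5
    # i.e. a few hundred thousand times, over five orders of magnitude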

If you gave just me the sort of computer needed, that alone would very likely
not be sufficient. I just don't think we should draw too strong a negative
conclusion before we all have the necessary equipment to match the work of
the brain.

But if I and my fellow computer users did have access to efficient, powerful
computers, we'd probably want to write algorithms that let us make many
attempts at a solution per second. I'd imagine writing a simulator that
projects many agents into a world where they compete for resources, with many
of these simulations taking place concurrently. Something of a multiverse of
simulation and optimization.

[1]
[https://en.wikipedia.org/wiki/Computer_performance_by_orders...](https://en.wikipedia.org/wiki/Computer_performance_by_orders_of_magnitude)
[2]
[https://en.wikipedia.org/wiki/Performance_per_watt](https://en.wikipedia.org/wiki/Performance_per_watt)

~~~
bachbach
> Do we have a terrific reservoir of computational ability though?

We do (and I wrote a long post with a list of bullet points proving so before
I just deleted it), but I don't think we should waste time debating whether
this is true or not true. We're on the same side after all, your objective is
my objective.

> But if myself and fellow computer users did have access to efficient
> powerful computers, we'd probably want to write algorithms that allowed us
> to make many attempts at the solution per second. I'd imagine writing a
> simulator to project many agents into where they are competing for resources
> and many of these simulations taking place concurrently.

I recommend you look at David Krakauer's ideas; he has a video called "The
Stupid Ways That We Have Thought About Intelligence".
[https://www.youtube.com/watch?v=pi7h6nmkvAM](https://www.youtube.com/watch?v=pi7h6nmkvAM)

> Something of a multiverse of simulation and optimization.

An artificial imagination really.

I really recommend watching that video.

------
skywhopper
Glad we are finally seeing some gradual pushback on the ridiculous hype of the
past few years. What is currently called “AI” can be useful in some very
narrow scenarios but even then it’s dangerous when people put too much trust
in it (same as any system—computerized, mechanical, or bureaucratic). Blindly
trusting the algorithms that brought you the Facebook and Twitter timelines to
make decisions about real things in human lives would be disastrous, and we
already do far too much of it.

------
sgt101
For an informed opinion see
[https://news.ycombinator.com/item?id=17766473](https://news.ycombinator.com/item?id=17766473)

------
magwa101
It doesn't have to be intelligent to be usefully applied broadly and upend our
economies.

------
wazoox
Even for very low-paid labour labelled as "low skilled", there are tons of
intricate judgement calls. I've read sociology articles on this matter, for
instance about a woman working in a factory arranging cakes in boxes. She
explained all of the small evaluations and complex decisions she had to make
at the quick pace of the production line: how to arrange different cakes to
maximize box fill, how to make sure that all the cakes are properly seated
and won't move in transport, etc. She deemed herself "unskilled", yet at the
same time concluded that it took her many months to master the skill of
arranging cakes in boxes.

Ditto for mailmen: they now use a program called "GeoPost" to "optimize
routes". In practice the program is unable to account for the myriad things
an actual, human postman must take into account. For instance, according to
the software, the postman is supposed to deliver mail on the even side of the
street first, then on the odd side on his return trip an hour later. That
works well for wide avenues, but in small residential areas people will come
out of their houses and call to the postman if he delivers mail to the
opposite-side neighbour but not to them. Silly humans, with their feelings :)

In fact the craze for AI reminds me so much (once again) of good ol' grandpa
Marx. As competition rages, and given the tendency of the rate of profit to
fall, companies try to get rid of living labour and replace it with "dead
labour" (machine labour). But once again it hits the wall of value, which
comes only from living labour. It all adds up perfectly.

------
DanielGee
Sigh. Here they go, alternating between "AI is a super-genius threat" and "it
isn't that smart".

Maybe an objective, honest, non-clickbait "Current State of AI" would mean
they wouldn't have to swing from one extreme to the other every few months.

However you define "smart", current AI is mostly domain-specific "smart".
There have been advances in general AI (non-domain-specific AI), but we are
nowhere close to an autonomous, generalized AI.

~~~
freeone3000
The state of AI is such that it can be used to detect faces in a crowd to
falsely arrest you, deny you a home loan based on nothing, and cut you off
from social media based on a misunderstanding, and it still can't transfer
knowledge between domains.

~~~
EdwardDiego
...and still can't avoid running over pedestrians because they were pushing
bikes, which confused it.

