
AI’s Language Problem - punnerud
https://www.technologyreview.com/s/602094/ais-language-problem/?set=602129
======
erikpukinskis
No one would ever imagine that locking a baby in a featureless room with a
giant stack of books would give them general intelligence. I don't understand
why AI researchers think it will work for AIs. They need bodies that are
biologically connected with the rest of the biosphere, with an intrinsic
biological imperative, if they are ever to understand the world. I'm not
saying they have to be exactly like us, but they will only be able to
understand us to the extent that they have body parts and social experiences
that are analogous to ours.

This isn't an engineering problem, it's a philosophical problem: We are blind
to most things. We can only see what we can relate to personally. We can use
language and other symbol systems to expand our understanding by permuting and
recombining our personal experiences, but everything is grounded in our
interactive developmental trajectory.

The kitten-in-a-cart experiment demonstrates this clearly:
http://io9.gizmodo.com/the-seriously-creepy-two-kitten-experiment-1442107174
Interaction is crucial for perception. Sensation is not experience.

And here's the rub: once you give an AI an animal-like personal developmental
trajectory to use for grounding the semantics of its symbol systems, you end
up with something that is not particularly different from, or better than, a
human cyborg.

~~~
Houshalter
I believe we can get AI from just text. Obviously that won't work for babies,
because babies get bored quickly looking at text. AIs can be forced to read
billions of words in mere hours!

Look at word2vec. By using simple dimensionality reduction on the frequencies
with which words occur near each other in news articles, it can learn really
interesting things about the meaning of words. The famous example is that the
vector for "king" minus the vector for "man" plus the vector for "woman"
approximately equals the vector for "queen". It's learning the meaning of
words, and the relationships between them.
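
Concretely, this is a one-liner given pretrained vectors. A hedged sketch
with gensim (the "word2vec-google-news-300" model, a ~1.6 GB download on
first use, is the only assumption here):

    # Hypothetical sketch of the word2vec analogy, using gensim's
    # pretrained Google News vectors.
    import gensim.downloader as api

    vectors = api.load("word2vec-google-news-300")

    # vector("king") - vector("man") + vector("woman") lands nearest "queen"
    print(vectors.most_similar(positive=["king", "woman"],
                               negative=["man"], topn=1))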

Recurrent NNs, using similar techniques but with much more complexity, can
learn to predict the next word in a sentence very well. They can learn to
write responses that are almost indistinguishable from human ones. And it's
incredible this works at all, given that RNNs have only a few thousand neurons
at most and a few days of training, compared to humans' billions of neurons
trained over a lifetime.
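
To make "predict the next word" concrete, here is a toy character-level
sketch in PyTorch; the architecture, sizes, and training string are all
illustrative rather than taken from any particular paper:

    # Toy character-level RNN: learn to predict the next character.
    import torch
    import torch.nn as nn

    text = "hello world, hello machine. "
    chars = sorted(set(text))
    stoi = {c: i for i, c in enumerate(chars)}

    # Inputs are all characters but the last; targets are shifted by one.
    xs = torch.tensor([stoi[c] for c in text[:-1]])
    ys = torch.tensor([stoi[c] for c in text[1:]])

    class CharRNN(nn.Module):
        def __init__(self, vocab, hidden=64):
            super().__init__()
            self.embed = nn.Embedding(vocab, hidden)
            self.rnn = nn.RNN(hidden, hidden, batch_first=True)
            self.head = nn.Linear(hidden, vocab)

        def forward(self, x):
            h, _ = self.rnn(self.embed(x))
            return self.head(h)

    model = CharRNN(len(chars))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(200):
        logits = model(xs.unsqueeze(0))        # (1, seq_len, vocab)
        loss = loss_fn(logits.squeeze(0), ys)  # next-char prediction loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    # After training, the model puts high probability on the true next char.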

All of the information of our world is contained in text. Humans have produced
billions of books, papers, articles, and internet comments. Billions of times
more information than any human could read in their entire lifetime. Any
information you can imagine is contained in text somewhere. I don't think it's
necessary for AIs to be able to see, or interact with the world in any way.

If you can predict the word a human would say next, with enough accuracy, then
you could also produce answers indistinguishable from theirs. Meaning you
could pass the Turing test, and perform any language task they could do just
as well. So language prediction alone may be sufficient for AGI.

This is the theory behind the Hutter Prize, which proposes that predicting
(equivalently, compressing) Wikipedia's text is a measure of AI progress. The
Hutter Prize isn't perfect (it uses only a sample of Wikipedia, which is very
small compared to all the text humans have produced), but the idea is solid.
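
The prediction/compression link is just arithmetic: a model that assigns
probability p to the symbol that actually comes next can encode it in about
-log2(p) bits, so better prediction means a smaller file. A toy illustration:

    # Toy demo: better next-character prediction = fewer bits to encode text.
    # A bigram character model; real Hutter Prize entries model 100 MB.
    import math
    from collections import Counter, defaultdict

    text = "abracadabra abracadabra"
    pair_counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        pair_counts[a][b] += 1

    bits = 0.0
    for a, b in zip(text, text[1:]):
        p = pair_counts[a][b] / sum(pair_counts[a].values())
        bits += -math.log2(p)  # ideal code length for this character

    print(f"{bits / (len(text) - 1):.2f} bits per character")
    # Compare: a uniform model needs log2(alphabet size) bits per character.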

~~~
ACow_Adonis
> All of the information of our world is contained in text

Communicate the concept of "green" to me in text.

The sound of a dog barking, a motor turning over, a sonic boom, or the
experience of a Doppler shift. Beethoven's symphony.

Sour. Sweet. What does "mint" taste like? Shame. Merit. Learn facial and
object recognition via text.

Vertigo.

Tell a boxer how to box by reading?

Hand-eye coordination, bodies in three-dimensional space.

Look, I love text, maybe even more than you do. But all these things imbue,
structure, and influence our text; they are not contained in it.

To make substantial inroads toward something that looks like human-esque AI,
text is not enough. The division of these fields is artificial, based on our
currently limited tech and the specialisation of our researchers and
faculties.

When we read, we play back memories, visions, sounds, feelings, etc., and
inherent ideas gained through our experience of ourselves as physical bodies
in space.

Strong AI, at least to be vaguely recognised as such, must work with
algorithms and machinery that understand these things, but which then work at
that next level of abstraction to combine them into proper human-type
concepts.

Of course, there is the question of why we would want to create a human-like
AI. It's my contention that human-like AI isn't actually what many of us
would want, but that's another topic...

~~~
Houshalter
Are blind or deaf people not intelligent?

But if an AI must pretend to be sighted and hearing, there are many
descriptions of green, of dogs barking, of motors, etc., scattered through the
many books written in English (and other languages).

Are these descriptions perfect? Maybe not. But they are sufficient to mimic or
communicate with humans through text. That's sufficient to pass a Turing test,
to answer questions intelligently, to write books and novels and political
arguments, etc. If that's not AGI, I don't know what is.

~~~
argonaut
Yes, they are. However, is a blind, deaf person with absolutely no motor
control, no sense of touch, and no proprioception intelligent? _Unclear._ They
certainly have no language faculties.

~~~
Houshalter
But a blind person can't describe green. A deaf person can't describe the
sound of a motorboat. A person without taste can't describe mint flavor. That
is the point I was making.

I don't propose that a human could lose _all_ of their senses and still be
able to communicate. But I do believe computers could do so, if they are
designed to. Humans are not designed to work without those senses.

~~~
argonaut
So a blind person would never be able to understand the different categories
of color (other than that they are placeholders for distinct categories of
something).

Now we are just speculating. We believe a computer _might_ be able to
understand things for which it doesn't have the sense - but that is
speculation and totally untested, and certainly can no longer be justified by
using human minds as an example.

~~~
Houshalter
A blind person could pretend to be sighted, though. There have been blind
authors who wrote about sighted characters, for instance. They need not
experience the thing themselves; they can just learn from experience how
sighted people behave and describe things, and mimic that.

~~~
argonaut
Can you provide any examples of blind (from birth) authors giving convincing
visual descriptions from the points of view of sighted characters?

That seems hard to believe.

~~~
bsaul
You can explain red by saying it's a "warm" color, for example. Metaphors and
analogies work; sensations from one sense can be explained using sensations
from another. But then you need to have at least one sense, which machines
clearly don't.

------
brianchu
Deep learning has succeeded tremendously with perception in domains that
tolerate lots of noise (audio/visual). Will those successes continue with
_perception in domains that are not noisy_ (language) and _inference/control_,
which the article touches on? I think it really is unclear whether
those challenges will require fundamental developments or just more years of
incremental improvement. If fundamental developments are needed, then the
timeline for progress - which everyone in tech seems to be interested in -
becomes much more indeterminate.

If you think about audio/visual data, deep nets make sense: if you tweak a few
pixel values in an image, or if you shift every pixel value by some amount,
the image still retains basically the same information. In this context,
linearity (weighting values and summing them up) makes sense. It's not clear
whether this makes sense in language. On the other hand, deep methods are
state of the art on most NLP tasks, but their improvement over other methods
isn't the huge gap we see in computer vision. And while we _know_ there are
tight similarities between lower-level visual features in deep nets and the
initial layers of the visual cortex, the justification for deep learning in
NLP is simpler and less specific: what I see is the fact that networks have
huge capacity to fit to data and are deep (rely on a hierarchy of features).
My guess is we may need a fundamental breakthrough in a newfangled
hierarchical learning system that is better suited for language to “solve”
NLP.

I think there are similar limitations with control and inference. When it
comes to AlphaGo, the deep learning component is responsible for estimating
the value of the game state; the planning is done with older methods. This is
much more speculative, but when it comes to the work on Atari games, for
example, I _suspect_ that most of what is being learned (and solved) is
perception of useful features from the raw game images. I wonder whether the
features for deducing the game-state score are actually complex.

I think what I'm trying to say is that when we look at the success of deep
learning, we have to separate out what part of it is due to the fact that deep
learning is _the go-to_ blackbox classifier, and what part is due to the
systems we use actually being a good model for the problem. If the model isn't
good, does it merely need to be tweaked from what we currently use, or does it
have to change completely?

~~~
sapphireblue
It is already succeeding on language tasks, see
[https://research.facebook.com/research/babi/](https://research.facebook.com/research/babi/)

It is funny how every AI post on HN turns into a speculative discussion forum
full of phrases like "I think", "likely", "I suspect", "my guess", etc., when
all the research is available for free and everyone is free to download and
read it to get a real understanding of what's going on in the field.

>what I see is the fact that networks have huge capacity to fit to data and
are deep (rely on a hierarchy of features).

Actually, recurrent neural networks like LSTMs are Turing-complete, i.e. for
every halting algorithm it is trivial to implement an RNN that computes it. It
is non-trivial to learn those parameters from an algorithm's I/O data, but for
many tasks that is possible too.

>I suspect that most of what is being learned (and solved) is perception of
useful features from the raw game images.

It is not that simple: deep enough convnets can represent computations, and
the consensus is that the middle and upper layers of convnets represent some
useful computation steps. Also note that the human brain can only perform so
many computation steps when answering questions in dialogue, due to time and
speed limits.

>My guess is we may need a fundamental breakthrough in a newfangled
hierarchical learning system that is better suited for language to “solve”
NLP.

This is being worked on; see the first link, plus Memory Networks, Stack RNNs,
Deque RNNs, and Tree RNNs. Deep learning is a very generic term; there are
dozens of feedforward and recurrent architectures that are fully
differentiable. The full potential of such models has not nearly been reached
yet, and language understanding may be solved in the coming years (again, the
first link shows that it is in the process of being solved).

~~~
brianchu
I specifically and mindfully added those words because everything here really
is _an open research question_. Would you rather I projected a false sense of
confidence? If anything, you're stating your vague case way too confidently.
Turing-completeness is broad and nonspecific. Doing "some computation" is an
obvious statement that doesn't add any information. The human brain does not
seem to have time limits when it comes to thinking about what to say, and
furthermore we don't understand enough about neuroscience to make statements
like that. Like I said, these are all active areas of research; the jury is
still out on whether any specific approach will be the breakthrough.

EDIT (reply to below): in general, these statements are either vague and
nonspecific, or perfectly correct but non-informative; comments that don't
have much to do with my original point.

~~~
sapphireblue
I agree with your points. Your comment is a quality one; I was mostly talking
about the others.

>Turing-completeness is quite broad and nonspecific, like I said.

It is, but feedforward models (and almost every Bayesian/statistical model)
don't possess it even in theory, while RNNs do.

>Doing "some computation" is an obvious statement that doesn't add any
information.

Let me be more specific: currently, researchers think that the later stages of
CNNs do something that is more interpretable as computation than as mere
pattern matching. Our world doesn't require a 50-level feature hierarchy, but
resnets with 50+ layers do well, apparently because they learn some
non-trivial computation.

>the jury is still out on whether any of those RNN approaches will be the
needed breakthrough.

Sure, we'll see. Maybe there won't be a need for any breakthrough, just
incremental improvement of models. And even current models, when scaled up to
next-gen hardware (see Nervana), may surprise us again with their performance.

------
gluczywo
AI researchers overestimate the role of language in the development of human-
like intelligence, understood as a common-sense heuristic physics that is
built empirically through non-linguistic experiences (e.g. if I turn that
glass over, the wine spills and leaves a dirty stain on the couch, which has
further consequences). That is the part most difficult to
implement/reproduce/emulate in a machine.

Language is only a communication protocol, most efficient in an interactive
context (dialog), that allows two agents with shared but not identical sets of
experiences to achieve understanding in some domain and context, with the
caveat that understanding is unprovable and not absolute. Understanding can
only be empirically tested and behaviorally probed (e.g. long after a
successful conversation, Agent Alice discovers that Agent Bob "did not get it"
as she expected).

By analyzing language alone, without experiences, a machine using something
like word2vec may discover semantic dependencies (e.g. man + cassock =
pedophile) but not true semantics that have world consequences.

Even with unlimited language corpora, the machine does not have the set of
axioms that humans have (experiences and observed stories). These axioms are
needed to build further, more abstract knowledge.

------
grondilu
It seems to me that a full mastery of language requires a grasp of semantics,
that is, the ability to understand what a sentence means. I doubt that's
possible without basic common sense along with an overall representation of
the world, and that looks very close to strong AI, imho.

So I'm not surprised computers keep struggling with language applications.
Once they succeed, strong AI will not be much further away.

~~~
dmreedy
I think the 'overall representation of the world' requirement is pretty key
here. Language in AI is often treated as its own class of problem, with the
assumption that there is somehow enough signal in the raw mess of examples
provided to any given learning system (usually just plain text, stripped of
any prosody, emotion, cultural context, imagery; any of the other modalities
of communication available to a demonstrably functioning natural language
understander[1]) to build a model that 'understands' general use language. I
simply don't see how this is possible[2]. I know the classical philosophies
about the complementary nature of language and intelligence are out of fashion
right now[3], but I'm not quite convinced they deserve to be.

I'll raise your bet; I'm willing to believe that once we succeed in building a
general understanding of language, we'll look back and see that we
simultaneously have solved Strong AI. To twist the old saying, I think that
language is what the human brain does.

---

[1] Yes, we can talk about P-zombies if you want. But I mean more in the
Turing Test sense here.

[2] Yes, I know the progress has been impressive. The progress in the 60s with
GOFAI was impressive at first too. Then it plateaued.

[3] I'm particularly referring to Sapir-Whorfism and the various
communication heuristics proposed by Grice. But I'd throw Chomskyan Universal
Grammar in there too.

~~~
visarga
Grounding language in other sense modalities (multimodal learning) is a thing.
We can even generate captions from images and generate images from captions,
albeit not perfectly.

Another grounding source is related to ontologies. We are already building
huge maps of facts about the world like "object1 relation object2".

Another source of "common sense" is word embeddings. In fact, it is possible
to embed all kinds of things (shopping bags, music preferences, network
topologies), as long as we can observe objects in context.

Then there is unsupervised learning from video and images. For example,
starting from pictures, cut them into a 3x3 grid, shuffle the tiles, and then
task the network with recovering the original layout. This automatically
extracts semantic information from images, unsupervised. A variant is to take
frames from a video, shuffle them around, then task the network with
recovering the original temporal order. Using this process we can cheaply
learn about the world and provide this knowledge as "common sense" for NLP
tasks.
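
For concreteness, a minimal numpy sketch of just the data preparation for the
3x3 jigsaw task (the network and training loop are omitted; the point is that
the label comes for free):

    # Data preparation for the 3x3 jigsaw pretext task: the shuffled tiles
    # are the input and the permutation itself is the free training label.
    import numpy as np

    def jigsaw_example(image, rng, grid=3):
        """image: (H, W, C) array, H and W divisible by `grid`."""
        h, w = image.shape[0] // grid, image.shape[1] // grid
        tiles = [image[r*h:(r+1)*h, c*w:(c+1)*w]
                 for r in range(grid) for c in range(grid)]
        perm = rng.permutation(grid * grid)  # ground-truth layout label
        return np.stack([tiles[i] for i in perm]), perm

    rng = np.random.default_rng(0)
    fake_image = rng.random((96, 96, 3))  # stand-in for a real photo
    tiles, label = jigsaw_example(fake_image, rng)
    print(tiles.shape, label)  # (9, 32, 32, 3) and a permutation of 0..8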

I am not worried about grounding language. We will get there soon enough;
we're just impatient. Life evolved over billions of years, and AI is just
emerging now. Imagine how much computing power is in the collected brains of
humanity, and how little computer time we give AI to learn. AI is still
starved of raw computing power and experience. Human brains would have done
much worse with the same amount of computing.

~~~
dmreedy
Image captioning is a separate, albeit related, problem to the one I'm talking about.

Ontologies are much the same; they are interesting for the problems they
solve, but it's not clear how well those problems relate to the more general
problem of language.

Word embeddings are also quite interesting, but again, they are typically
based entirely on whatever emergent semantics can be gleaned from the
structure of documents. It's not clear to me that this is any more than
superficial understanding. Not that they aren't very cool and powerful:
distributional semantics is a powerful tool for measuring certain
characteristics of language. I'm just not sure how much more useful it will be
in the future.

Unsupervised learning from video and images is a strictly different problem
that seems to me to be much lower down the hierarchy of AI Hardness: more like
a fundamental task that is solvable in its own universe, without requiring
complete integration of multiple other universes. Whether the information
extracted by these existing technologies is actually usefully semantic in
nature remains to be seen.

I agree that we'll get there, somewhat inevitably; I'm not trying to argue for
any Searlian dualistic separation between what Machines can do and what
Biology can do. I'm personally interested in the 'how'. Emergent Strong AI is
the most boring scenario I can imagine; I want to understand the mechanisms at
play. It may just be that we need to tie together everything you've listed and
more, throw enough data at it, and wait for something approximating
intelligence to grow out of it. We can also take the more top-down route, and
treat this as a problem in developmental psychology. Are there better ways to
learn than just throwing trillions of examples at something until it hits that
eureka moment?

~~~
visarga
I think the key ingredient will be reinforcement learning, and more
importantly, agents being embedded in the external world.

Regarding the "internal world", we already see the development of AI
mechanisms for attention, short term memory (references to concepts recently
used), episodic memory (autobiographic) and semantic memory (ontologies).

------
Severian
I have to agree to an extent with researchers Tenenbaum and Li. It seems to me
that the only way AI is going to learn language is to have some worldly
experience linking words to their ultimate semantics.

I don't think AI will be able to fully grasp the intricacies of human language
until it has "lived" long enough to form the links between ideas and
experiences. Mainly in the physical realm, as obviously a lot of our human
development is shaped by our environments. They will need eyes, ears, and
maybe noses. We should also consider giving them subconscious or instinctive
reactions to certain stimuli. An AI wouldn't immediately know that, for
example, rotten meat is bad for humans, because it lacks a nose to send a
signal of danger and disgust.

We should also consider communicating with AI in regular face-to-face speech.
Talking is not the same as writing, and it conveys a lot of information beyond
just the words.

------
swalsh
This is a long metaphor but work with me here...

There are many types of application programmers, but there are 2 types in
particular that are interesting. One of them is the purely technology-driven
developer. He uses all the new tools, he's read Knuth's books a hundred times,
he knows how to build elegant systems. However, he takes only as much interest
in the business as is necessary to know what to build. At the end of the day,
he'll build the most elegant, beautiful system that almost never accomplishes
the business goal. He knows how to describe a problem, but he doesn't really
know the problem.

The second likes programming and finds technology fun, but he is really driven
by trying to understand the full context of the business. Writing software is
a means to see an impact on people. He's driven by seeing a business problem
solved. I've only met 2 people in my career who are ACTUALLY like this;
they're rare... which is maybe a good thing, because they write shit code.

A good engineering team tries to get both of these guys: you have the tech guy
making sure your platform is maintainable, and you have the business-driven
guy who makes sure it's useful. One guy understands the structure of the tool;
the other understands the structure of the world the tool is in.

A language is a tool: it can be elegant, it can be beautiful, it can be
technically perfect... and, just like poetry, it can have very little
practical utilitarian purpose. When I look at how we're using ANNs to develop
language today, this is how it feels to me. We're spending so much time trying
to figure out how to get a computer to build the most technically perfect
sentence that we're missing the maybe more interesting problem of trying to
get a computer to understand the world. My son right now isn't old enough to
construct a sentence, but he understands what certain things in the world do.
He clearly understands that cars move things; he understands you can use the
hose to get things wet. He's not that old, but he's developing a mental model
of the world. He just doesn't know how to describe it yet.

To me, having a computer look at a crane and then print the word "crane" is
interesting, but even more interesting would be giving it 3 pictures (a crane,
a building, and a pile of rubble) and teaching it how 1+1=2.

~~~
tsunamifury
Actually, these two parts do exist in tandem; it's just that those
engineers/designers (one person can fill both roles) are usually independently
successful and have repeatedly built high-revenue products or started a
company themselves.

The major flaw I see in manager/corp/team analysis of workers is that it
misses a portion of the population that is genuinely, independently functional
and creates and shares in the value it creates. They don't work for companies
because they either don't need to or own their own. These are the ideals worth
keeping in mind.

------
Practicality
If we use animals as a reference, I would say that consciousness is more
fundamental than language, so most likely we need that in place before we can
get AI to understand language effectively.

~~~
adwn
Do you mean "consciousness" as being aware of one's own existence and relative
position in a larger reality, or as having subjective experiences (qualia,
feelings)?

~~~
Practicality
It's a state of being able to say "I am doing X, I am doing Y, I am thinking
Z" (though not necessarily in words, of course). It's also the creation of an
executive process that is separate from the analytic processes. The executive
process can be aware of the analytic processes (like we can know about our
heart beating, but it's a separate process).

Subjective feelings are not necessarily part of that.

This article on how consciousness evolved does a good job of explaining how it
works (finally), and I think it's something we could emulate.

http://www.theatlantic.com/science/archive/2016/06/how-consciousness-evolved/485558/

------
ideonexus
I think about the AI language problem a lot while raising my kids. The article
notes the word "forever" and how an AI must distinguish the literal from the
figurative meaning of the word in context. My five-year-old still doesn't
grasp the literal meaning of this word as "never-ending." To him, "forever" is
simply a very, very long time. He has the same problem with the concept of
"infinity," where the word means "the biggest number" but also has the
characteristic (in his mind) of an upper, quantifiable bound. His young mind
has not yet recognized the paradox: "infinity" is the biggest number, so what
does it mean when I say, "Infinity plus one"?

Neural networks are going to make huge inroads into the AI language problem
simply by exposing the AI to example after example of words in varying
contexts. But I wonder if the real problem is getting those neural networks to
let go of unnecessary data. Humans rely on excited neurons to recognize
patterns, but our neurons let a lot of sensory input pass us by to keep from
getting bogged down in the details. Are the image-recognition AIs described in
the article capable of selective attention? Will they get bogged down in the
morass of information in trying to pattern-match every word to every image and
context they learn?

~~~
EGreg
Maybe this is good. After all, mathematical abstractions may cause more
philosophical problems than they solve. What if there is nothing infinite in
this world? Having a firm grasp of reality before venturing into hypotheticals
can be good.

~~~
bbctol
But what is language if not a tool for manipulating hypotheticals? If your
language can only describe what you know to be possible, it can only describe
things you've already seen, and the required dataset in your memory to have a
conversation or provide useful information is way too large. Abstraction is
the very problem of language that AIs are trying to solve.

------
hyperion2010
I have often wondered whether the reason 'AI' has struggled with human
language is that most programs are not embodied. If you cannot jump when you
hear the word joy, cannot cry when you hear the word death, and cannot feel
the other sensations that go with them, you cannot truly understand the words.
The lack of additional visual cues and context will continue to severely
hamper the ability of machines to correctly understand the semantics of human
language. I know the language guys like to work with their strings of
characters, but humans can only communicate with those because we have already
built our semantic framework by being active agents in the world.

------
PaulHoule
My belief is that "Deep Network" systems will fail to produce commercially
useful results on language just as symbolic systems failed in the face of
visual recognition.

------
Animats
In terms of DNA difference, humans are very close to mammals that don't have
much language capability but demonstrate some degree of intelligence. This
suggests that language is not as fundamental as some researchers claim.

I once told Rod Brooks, back when he was proposing "Cog" (look it up), that
he'd done a really good insect robot, and the next step should be a good robot
mouse. He said "I don't want to go down in history as the guy who built the
world's best robot mouse". "Cog" was a dud, and Brooks went back to insect
level AI in the form of robot vacuum cleaners.

We need more machines which successfully operate autonomously in the real
world. Then they may need to talk to humans and each other. That might work.

The big problem in AI isn't language, anyway. It's consequences. We don't have
common sense for robots. There's little or no understanding of the
consequences of planned actions. We need to get this figured out before we can
let robots do much.

------
inputcoffee
This is also why AI won't overthrow us and become our masters. They don't even
know what it means to be a master.

EDIT: This is the most volatile comment I've ever posted. It has been going
+2, -2, +2, -2 for the last 35 minutes. People seem to love it or hate it.

~~~
philh
Nor does a meteor know what it means to be an extinction event.

~~~
inputcoffee
True, but a meteor doesn't have to. As written in our sci-fi, the robot
uprising specifically requires the machines to understand, and most of all
care about, dominance.

If you're talking about just being victims of machine logic, we've been
suffering that since the first traffic jam caused by a traffic light.

~~~
mattw1810
[https://wiki.lesswrong.com/wiki/Paperclip_maximizer](https://wiki.lesswrong.com/wiki/Paperclip_maximizer)

------
s3arch
We might still have to go much more basic, or we might end up going there in
the future. Some here have mentioned that consciousness is more fundamental
than language.

Even very basic creatures with less intelligence "learn" because they "want to
live".

That is the key: you want to stay alive.

You can't be immortal. You don't live to learn forever. You live to stay
alive and feel happy. And that is what drives us to learn.

Could this be true in case of machines?

~~~
sapphireblue
This is not a question of "consciousness", this is a question of Reinforcement
Learning.

------
unabst
You cannot understand apples by analyzing the word "apple". Language is just
the paper trail. There are plenty of examples: You can't understand a bank or
money by just analyzing bank statements. You can't understand food or
supermarkets by analyzing their receipts. You can't understand the internet by
analyzing TCP/IP.

Regardless of the rich context in which a word may reside, or the infinite
sample pool of text upon which we may unleash our learning robots, as long as
they are all words, you will never encounter the real apple to which the
symbol is linked, nor reach the reality to which all the symbols are linked.
The machine will learn something. It just won't be anything like what we know
or understand, or like what generated the paper trail in the first place. It
will be an awkward simulation, which is exactly what we have.

My now-20-month-old son wasn't born literate, but he already spoke: a grunt, a
groan, a giggle, and a moan. These are his words. There is nuance, there is
rhythm, there is intention, and there is tone. Not that I'm any good at
writing baby books, but there is a reason babies enjoy rhyming and puns and
silliness. The point is, words offer so much more than their meaning. This is
the language they understand. They don't know English, and we aren't teaching
them English. But they are slowly but surely articulating themselves, be it
that they're hungry, lonely, or just want that grape. In fact, who is teaching
whom? Parents learn the language of their child first, to be any good at
parenting. Their expressions of their intelligence precede the expressions of
our own. Maybe all we need is a machine that groans.

I find it no coincidence that the philosophy of Ludwig Wittgenstein evolved
with his experiences teaching children. And if I were to make a bold
prediction, the field of AI will benefit immensely from all the young and
talented AI researchers who start having kids of their own. The comments here
already seem to attest to this. It's either that or giving up and deciding to
teach preschoolers for a while. Either way, we'll soon have our book on AI
that will do what Philosophical Investigations did for philosophy. And I can't
wait.

------
mark_l_watson
My intuition is that the language problem will be solved, but that the
solution will likely be a hybrid symbolic and deep learning system. BTW, I
have been working (some of the time) on both symbolic NLP and neural
networks/machine learning since the 1980s: right now is the most exciting time
in the field of AI because progress is rapid and accelerating.

------
EGreg
Somewhere here, Noam Chomsky is still kicking and saying "I told you so"

http://www.tor.com/2011/06/21/norvig-vs-chomsky-and-the-fight-for-the-future-of-ai/

------
jd20
Solving language for computers seems much like climbing a series of mountains,
where each time you surmount one, you realize the next is even higher :)
Thanks to deep learning, machines have made rapid gains in speech recognition,
as well as in semantic mapping (a la word2vec and other word embedding
approaches).

But once you have a system with human-level speech recognition and semantic
mapping, where do you go? The ability to have a meaningful dialogue with a
machine seems very difficult to model as a machine learning problem (what
constitutes ground truth? What does the reward function look like?), and it
also has to deal with many unknowns. For example, ask a smart assistant like
Alexa or Siri about functionality it wasn't programmed with, and you get a
terse "Sorry, can't help you with that." But ask a child, and you prompt a
question-and-answer dialogue (i.e. learning), or perhaps feigned
understanding. My toddler son is an expert at giving me the answer he _thinks_
I want to hear, even when he has no idea what I'm talking about :) There are
certainly many new problems we can begin to think about tackling, but no sign
IMO that we're running out of applications for deep learning in the field of
language.

~~~
melling
"much like climbing a series of mountains, where each time you surmount one,
you realize the next is even higher"

Aren't you describing learning, in general? Physics, math, biology, etc

------
YeGoblynQueenne
The problem with natural language processing is that we are trying to learn it
(read: construct models of it) from utterances, things that are said or
transcribed. And that is a big, huge problem, because there is a lot more to
language than utterances. Hell, there is a lot more to language than language
itself.

There are things you cannot put into words, and yet you think them. There are
things that you can't put into words and yet you can make the people around
you understand. There are things you understand without even knowing you
understand them. But even before we go there: there are so many things that
people can make utterances about that are not possible to collect into example
sets and train models on.

How do you collect examples of whatever it is that makes people lie on the
beach to get a sun tan? How do you collect examples of imagination, dreams,
abstract thinking, all those things that your brain does that may be a side-
effect of self-aware intelligence or the whole point of self-aware
intelligence in the first place?

How do you collect a data set that's as big as the whole world you've
experienced in your however many years of life? And even if you could, what
machine has the processing power to train on that stuff, again and again,
until it gets it right?

Machine learning meaning is hopeless, folks. Fuggeddabout it. There's not
enough data in the whole world, and there's no machine big enough to process
it if it existed. We'll make some advances in text processing, sure; we'll
automate some useful stuff like translation (for languages close to each
other) and captioning (for photographs), and then we'll stall until the next
big thing comes about a few generations from now.

That's what the current state of the art suggests.

~~~
posterboy
> There are things you cannot put into words

_I_ am a very bad example though, for one because English is just my second
language. Sure, there is thinking before words are learned. Language is a
complicated problem to talk about, just like self-awareness. Consciousness is
a very nebulous term to me. Still, you'd have to prove that language is
theoretically unfit. Any such logic might be incomplete if you suppose you
cannot put it into words. A complete first-order logic is expressible,
however, following Gödel's completeness theorem.

------
flaviuspopan
Is there any hope that, if chatbots actually become widespread, we'll
eventually be able to aggregate their collective knowledge, similar to
reinforcement learning for a single system? That seems like the only likely
way we'll ever be able to train AI in something as complex as language.

~~~
Practicality
The primary issue is that language doesn't have a well-defined "success"
metric, so more data doesn't necessarily make better language.

Without a human analyzing the transcripts, it's very difficult for the chat
bots to know which of the inputs they're receiving are "good" or "better."

Even the idea of good language is subjective. We all know there is such a
thing, but nearly everyone has different ideas of what this is.

------
DigitalJack
AI as it's defined today is fundamentally reactive. If we applied the AlphaGo
methodology to language, it would come up with a good response to the words it
heard, but the purpose of such a conversation would be the conversation
itself.

A real conversation is about conveying understanding, not about the words
spoken.

AlphaGo was trained on however many zillions of games and on playing against
itself, but does it actually understand anything about the game? Or can it
simply react to the current state of the game and suggest what the next move
should be? It will never have a leap of intuition causing it to say "the only
winning move is not to play."

Intelligence is not purely reactive.

~~~
dave_sullivan
How would one prove the opposite? That a human actually understands anything
about the game and isn't just reacting to the state of the game to suggest the
next move? I'm not saying AI as it exists today understands; I'm just saying
this "understanding" metric isn't a good metric unless it works in reverse.

~~~
jdmichal
I don't think we can with a game. Games are a progressive sequence of states
with permitted transitions defined by the rules; they are inherently reactive.
The only way to prove understanding is to ask things like, "Why did you make
that move?", or maybe more specifically, "Why was that move the one that best
maximizes your chances of winning?" I'm not sure AlphaGo could answer that
question.

Basically, you need to ask questions that require meta-cognition, like, "What
does Mary think about you?" That requires:

* Understanding of yourself as an entity.

* Understanding of Mary as another entity, with its own state.

* The capability to use previous interactions to approximate that other entity's state.

------
KasianFranks
The focus on vector space mentioned in the article ("words can be
represented as mathematical vectors, allowing similarities between related
words to be calculated. For example, “boat” and “water” are close in vector
space even though they look very different. Researchers at the University of
Montreal, led by Yoshua Bengio, and another group at Google, have used this
insight to build networks in which each word in a sentence can be used to
construct a more complex representation—something that Geoffrey Hinton, a
professor at the University of Toronto and a prominent deep-learning
researcher who works part-time at Google, calls a “thought vector.”")

is, to me, the most significant way in which we can mimic how the human
cognitive process develops associations between things. Auto-association is
key here.

In addition, understanding how to calculate similarities between those vectors
is important.
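
The standard measure there is cosine similarity; a minimal numpy version,
with made-up 3-d vectors standing in for real, much higher-dimensional
embeddings:

    # Cosine similarity between word vectors (toy, hypothetical embeddings).
    import numpy as np

    def cosine(u, v):
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    boat  = np.array([0.9, 0.1, 0.3])  # hypothetical embedding for "boat"
    water = np.array([0.8, 0.2, 0.4])  # hypothetical embedding for "water"
    sofa  = np.array([0.1, 0.9, 0.1])  # hypothetical embedding for "sofa"

    print(cosine(boat, water))  # high: "close in vector space"
    print(cosine(boat, sofa))   # lower: unrelated words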

------
artursapek
I wonder if in the future we'll have Matrix-style "blobs" of knowledge that we
can plug into compatible AI systems. That way, the system only has to be
trained once, and other systems can take advantage of the learning by forking
the state and importing it. It would definitely speed up training AI, and
possibly even enable it on lower-powered hardware.

Imagine downloading "english teenager slang 2016 v2.0" to your home AI, so it
can understand what the hell your kids are saying :)

~~~
jknz
If the blob of knowledge consists of the weights of some neural network, and
if this blob is public, then an attacker could craft an imperceptible
perturbation of the input that makes the network believe a yogurt is the
Eiffel Tower, or vice versa. (Can't find the related publications right now,
but this has appeared several times on HN before.)

So if you don't want the system to be gameable, such public blobs of weights
may need to be avoided.
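
For the curious, the attack described above is the adversarial-examples
literature's fast gradient sign method; a hedged PyTorch sketch, where the
model and inputs at the bottom are toy stand-ins:

    # Fast-gradient-sign sketch: nudge each input pixel in the direction
    # that increases the loss, using the public weights to get gradients.
    import torch
    import torch.nn.functional as F

    def fgsm(model, x, y, eps=0.01):
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        x_adv = x + eps * x.grad.sign()    # imperceptible for small eps
        return x_adv.clamp(0, 1).detach()

    # Toy usage with a stand-in linear "classifier" on 4-pixel inputs.
    model = torch.nn.Linear(4, 2)
    x, y = torch.rand(1, 4), torch.tensor([0])
    x_adv = fgsm(model, x, y)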

~~~
kmicklas
Likely by that time we will have overcome this problem.

------
JulianMorrison
This is the "time flies like an arrow" problem again, isn't it? Fruit flies
like a banana. Context, sense, meaning. All of which require facts about the
real world.

------
scotty79
Language is just a protocol for synchronizing slices of two world models: the
one in the head of the speaker and the one in the head of the listener. If the
recipient doesn't have a model similar to the sender's, language is
meaningless.

You can't "understand" language without the model of the world that humans
construct during their lives and education.

So pretty much the next step for language recognition is indistinguishable
from sentience.

------
hyperpallium
> his team was just as surprised as everyone else ... It was only several days
> later, after careful analysis, that the Google team made a discovery

Since success has a higher priority for researchers than explicable success,
and if the "singularity" is just progress that is not understood, it may be
almost here - and not require true AI.

Though to be fair, by that definition, the singularity has always been with
us, since we don't understand how we think.

------
babakd
The problem with deep learning and language understanding is that the task is
ill-defined end-to-end. For speech, image understanding, and translation, you
can come up with large datasets of x->y and have deep learning learn a complex
function to approximate the mapping. We don't have that luxury in language
understanding, at least not yet.

------
marcoperaza
Consciousness is the secret sauce. Consciousness as in "subjective
experience": what separates us from philosophical zombies, the sensation of
the color blue, of middle C. Our bodies evolved and kept this extremely rich
phenomenon for a reason; it is extraordinarily unlikely to have arisen and
remained through genetic drift alone.

My theory, and I'd love to find someone offering a similar, more fleshed-out
hypothesis, is that conscious experience serves as a universal data type. It
can encode and play back any type of knowledge and memory, and relationships
among them, from the color of the teacher's shirt that time you broke your
bone in 3rd grade to the formula for electron energy in quantum mechanics.

Unfortunately, the word consciousness is almost forbidden in most scientific
circles. The dominant view is that there is no Hard Problem of Consciousness
and that any discussion of it is quackery, or at least "not science". This
taboo is holding us back.

~~~
king_magic
I've had somewhat similar thoughts, and I am entirely unqualified (and very
likely not the first) to put forth the idea that consciousness's killer app is
the ability to rapidly assemble abstract models of experiences (present, from
current sensory input; past, from short/long-term memory; or future, from
mental simulation) and to query/manipulate those models. My (admittedly
potentially naive) suspicion is that whatever does "that" is the machinery
behind consciousness.

This is also why I think that deep learning / neural networks are only going
to take us _so far_. I think there is more to the story of how the brain works
than just neural networks making predictions, and frankly I do not think that
any system that does not at least attempt to do "that" (simulate
consciousness's model building/manipulation feature) will have much better
luck at language processing/understanding.

~~~
marcoperaza
One possibility about the machinery of consciousness that would help address
the Hard Problem is that the brain is not creating consciousness from scratch,
but tapping into some currently misunderstood or unknown physical phenomenon.
It's hard to see how information processing alone (which can be done by monks
with paper and pencil, if incredibly slowly) can give rise to subjective
experience.

It seems that this phenomenon, whatever it is, plays a central role in sensory
perception, and there's reason to think that it's present even in animals with
simple brains. So I suspect that we're looking for some kind of simple
operation that can happen on the scale of a small number of neurons, maybe
even a single neuron.

This is all speculation of course, informed by some knowledge and intuition,
but speculation nonetheless. But it's the only way to push the frontier, and
the unwillingness to engage with consciousness as a matter of serious study
seems to be a major failing of brain science and AI.

------
chrchr
"If a lion could speak, we could not understand him." \-- Wittgenstein

------
nurettin
The web of research always spins around money. So far I am not seeing a
monetary interest group with a declared imperative of creating a
self-sustaining artificial mind.

------
wangchow
I'd rather see an AI that is smarter than humans. People are trying to make
human-like entities, but let's think outside the box a little more.

------
jcoffland
> today’s machine-learning and AI tools won’t be enough to bring about real AI

This is important to understand in the midst of today's AI hype.

------
protomyth
Has anyone used something like an MMO to get people to help train their AI in
an artificial environment?

------
outsideline
The problem is: the industry set about making a skyscraper from the top floor
(neocortex) down, and assumes it can hack together a foundation and throw up
ad-hoc scaffolding on the way down that will magically reflect the brain's
capabilities. There are even ridiculous ideas held by industry 'experts' that
the foundation will just magically arise from nothing more than sheer
spaghetti-wiring complexity. Nothing in the known universe has been proven to
work this way. Yet no one questions this outlandish belief system, because the
industry experts and notable names are stating it and, hey look, their
top-floor systems actually do something interesting. So they must know what
they're doing and saying.

So, the foundational problems remain...

They remain because there is no foundation to these cortical systems. Anyone
who states this is derided and laughed at. So, you get what you get.

The article states : "Machines that truly understand language would be
incredibly useful–but we don’t know how to build them."

There are people and groups who know how to build them. They are focusing on
the 'foundation' first. That is not where the spotlight or money are directed.
So, they remain in the dark.

We made headway with a very trivial model of neurons and cortex-like
hierarchical neural network designs, and the money sent people off to the
races. People began writing wrappers, stuccoing the top floor, hacking up
scaffolding, applying any CS concept they could find in the parts bin to
fancify the top floor.

That's where all of the attention and money is: What does your system do? What
benchmark can it beat? What data can it classify? What cool trick can it do to
impress us? So you get impressive trick systems that require massive amounts
of data, training, and answer maps to obscure the lack of intelligence. As no
intelligence is explicitly designed into these systems, a system cannot convey
its understanding.

It's nothing more than an answer map with annealing routines and memory...
very similar to cortical regions.

The foundation and the supporting layers up to the top have been ignored and
aren't getting any spotlight or money, nor are the individuals who continue to
toil on them.

They're considered to be 'philosophers' and jokers, not real
scientists/engineers/industry leaders. The A.I. space shuts out a huge pool of
varying opinions via its: if you don't have a PhD, you need not apply. If
you're approaching it with methods other than the subscribed ones, and you're
not a name or a face and don't have a laundry list of papers, you get the:
good luck (thumbs up).

And people stand around and wonder why the fundamental problems remain? Come
on...

In any event, it won't remain so for long, and that will be due to someone or
some group actually investing the time and energy to build a sound foundation.
This begins first and foremost with deep philosophical questions about the
nature of the universe and intelligence. The answers derived serve as a
guiding light for the scientific and engineering pursuits further along.

This article should be titled: AI's lack of a foundation. Who's going to build
it? Who's going to invest the time to understand what exactly it is, as
opposed to hacking away at it?

It's the truth, but it would be considered a 'hit piece'. Until someone
constructs a proper foundation, no one is going to give credence to the idea
that current A.I. lacks one. Hindsight is 20/20, as is a force-fed neocortex.

------
anonymousguy
Small children have to learn language from nothing. They just figure it out
through exposure and practice. Even pets learn some language. This is the
model to emulate.

Ultimately language use requires a few skills:

* a good parser
* motor cognition/coordination
* a good memory
* semantics/context
* vocabulary
* situational awareness

The first two in the list are what small children struggle with the most.
Fortunately, we can eliminate motor coordination as a need for AI. And
although extremely powerful parsers demand specialized expertise to produce,
this part of the problem is straightforward; I write multi-language/multi-
dialect parsers as an open-source hobby.

I discount vocabulary and situational awareness, because most children still
haven't figured these out by the time they enter high school, long after they
have learned the basics of speech. That pattern of human behavior suggests
that while it might be hard to teach these skills to a computer, you can put
them off until long after basic speech is achieved.

If somebody paid me to do this research, my personal plan of attack would be:

1. Focus on the parser first. Start with a text parser and do audio-to-text
later. Don't worry about defining anything at this stage. When humans first
learn to talk and listen, they focus on the words and absolutely not on what
those words mean.

The parser should not be parsing words. Parsing words from text is easy. The
parser should be parsing sentences into grammars, which is harder but still
generally straightforward, with many edge cases. (A toy sketch of this step
follows at the end of this list.)

2. Vocabulary. Attempt to define the words comprising the parsed grammar. Keep
it simple. Don't worry about precision at first. Humans don't start with
precision, and humans get speech wrong all the time. This is especially true
for pronouns. Just provide a definition.

3. Put the vocabulary together with the parsed grammar. It doesn't even have
to make sense. It just has to assign meaning to the words, and to the words
together, in a way that informs an opinion or decision for the computer.
Consider this sentence as an example: I work for a company high up in the
building with a new hire that just got high and gets paid higher than my high
school sweetheart.

4. If the sentence is part of a paragraph or a response in a conversation, you
can now focus on precision. You have additional references to draw upon. You
are going to redefine some terms, particularly pronouns. Using the added
sentences, decide whether new definitions apply more directly than the
original ones. This is how humans do it. These repeated processing steps mean
wasted CPU cycles, and it's tiring for humans too.

5. Formulate a response. This could be a resolution to close the
conversation, or it could be a question asking for additional information or
clarity. Humans do this too.

6. Only based upon the final resolution, determine what you have learned. Use
this knowledge to make decisions that modify the parsing rules and amend the
vocabulary definitions. The logic involved is called heuristics.

The only way all this works is to start small, like a toddler, and expand
until the responses become more precise, faster, and more fluid. At least...
this is how I would do it.
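
For what it's worth, a toy version of step 1 (sentences into grammar trees,
not words into meanings) can be built with NLTK's chart parser; the grammar
here is a made-up example, nothing like the multi-dialect parsers described
above:

    # Step 1 only: parse a sentence into a grammar tree, no meanings yet.
    # A hand-written toy grammar with tiny coverage.
    import nltk

    grammar = nltk.CFG.fromstring("""
        S  -> NP VP
        NP -> 'I' | Det N
        VP -> V NP | V PP
        PP -> P NP
        Det -> 'a'
        N -> 'company' | 'building'
        V -> 'work'
        P -> 'for'
    """)

    parser = nltk.ChartParser(grammar)
    for tree in parser.parse("I work for a company".split()):
        tree.pretty_print()  # structure first; definitions come in step 2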

~~~
tim333
It depends a bit on what you are trying to achieve, but I think hooking
neural-type networks together to simulate human mental faculties might be a
better way forward. For instance, much of human thinking seems to work by
visualizing things in 3D space: you can say to someone, imagine a dog on a
skateboard on top of a hill and you give it a push; what happens? Once you've
got that kind of stuff working, with spatial awareness, cause and effect, and
so on, using neural-type processing, I think language understanding would come
fairly naturally.

------
gmarx
Since it was using movies, maybe the 8 legs are for a human centipede?

------
WorldMaker
I don't think the issue has anything to do with cognition; it has more to do
with something we do so subconsciously that we don't always notice it: error
correction and context setting. A big part of language is our error correction
channels. In text it's a lot less obvious, because we twist the language to
clear things up, but speech is full of "I'm sorry, what?" and "uh, you know"
and hand gestures and furrowed brows and a million other side channels to get
someone to repeat something, or elucidate it, or set a deeper context.

But that happens in text too: we group things into paragraphs and add a lot of
punctuation and as we read we sometimes skim a bit, return as needed, reread
what we missed the first time. (Or in texts/IMs our cultures are in the
process of building whole new sub-dialects of error correction codes like
emoji and "k?".)

A lot of people would think a machine was broken if it hemmed and hawed as
much as people do in a normal conversation, or if it needed full paragraphs of
text to set context and/or explain itself.

The biggest thing lacking in voice recognition right now is not word
understanding or any of the other NLP areas of research: it's the little
nuances of conversation flow. For now, most of the systems aren't very good at
interruptions, for instance. From the easy, like "let me respond to your
question as soon as I understand what you are asking, to save us both time,"
to the harder but perhaps more important things like "No [that's not what I
mean]" and "Wait [let me add something or let me change my mind]" and "Uh [you
really just don't get it]," and presumably really hard ones like _clears
throat_ [listen carefully this time].

The point should not be that we hit 100% accuracy: real people in real
conversations don't have 100% accuracy. The issue is how you correct failures
_in real time_ and keep the exchange "conversational" without it feeling
strained or overly verbose (such as the currently common "I heard _x_, is that
correct?" versus "_x_?" with a head nod or a very quick "yup").

We don't consciously think about the error correction systems in play in a
conversation, so they are hard to judge and replicate, and it's easy to
imagine there's an uncanny valley waiting for us between no "natural error
correction" ability and supporting error correction in a way that works with
our natural background mechanisms.

At least in my mind, the next big area to study in language recognition is
deeper looks into things like error correction sub-channels, conversational
timing (esp. interruption), and elocution ("uh", "um", "you know", "that
thing", "right, the blue one"). I'd even argue that what we have today is
probably already getting to "good enough" for the long run, if it didn't
require us to feel like we have to be so exact, because you only get one
sentence at a time and you don't have good error-correcting channels with what
we have today.

------
RodericDay
Reminds me a lot of that classic Stephen Bond essay:
[https://web.archive.org/web/20131204031113/http://plover.net...](https://web.archive.org/web/20131204031113/http://plover.net/~bonds/positivism.html)

~~~
ppod
Is there a story behind the pictures accompanying that article?

------
meeper16
These guys might disagree. Sometimes they make sense, which is interesting:
http://sumve.com/ai-chatbots/relationships/relationship-bots.html

~~~
Practicality
They seem to be just talking over each other with vague references to what the
other bots are saying.

If you pay attention for about 5 minutes it's just nonsense. I mean, they are
clearly repeating things from somewhere else that were sensible in their
original context, but now they seem to be saying things nearly randomly.

