
Deep or Shallow, NLP is breaking out - samiur1204
http://cacm.acm.org/magazines/2016/3/198856-deep-or-shallow-nlp-is-breaking-out/fulltext
======
YeGoblynQueenne
>> "Most of our reasoning is by analogy; it's not logical reasoning."

I keep reading this quote, showcased in the article, again and again, and I
still can't believe that people are actually proposing this.

They're basically advocating that we should abandon logic, stop trying to
reason about things, and instead try stuff at random, and why? Because
computers have been shown to do alright with that approach at tasks in which
humans excel.

It is clear as daylight that whatever humans do, it is not what computers do;
otherwise, you'd need to train your baby on the Brown Corpus before it could
figure out you're its mama. We have managed to overcome the limitations of our
primitive computational technology with some clever tricks, and that's amazing.

But to take that rightly celebrated fact and make it into an argument that we
must ourselves now become as dumb as our computers, and also never try to make
the poor things intelligent in the same way we are, that's ... well, it's
dumb. That's what it is.

~~~
jessep
What confuses me about your argument is that you are treating a child as an
individual without context. The human brain, however, has reviewed bajillions
of bajillabytes of training data over the course of its evolution.

The brain is already wired to have certain types of perceptions, so by the
time a baby is born that child is just looking for the entity that fits the
mommy definition, not trying to work out from scratch that it will have a
caregiver and that the optimal strategy is to bond to that caregiver.

Anyway, there are lots of examples where what you are saying holds, with kids
drawing general conclusions from limited data, but we also have to remember
that the human brain is not a blank slate; it is already highly specialized at
the time of birth.

~~~
thaw13579
Consider this... the genome is ~725MB, and 99% of it is shared with
chimpanzees, leaving ~7.25 MB for human-specific traits. While I'm sure that
the mother instinct is strong (and probably encoded in that 99%), it's truly
amazing how little storage that is!

Edit: My point is not about imprinting (which I don't claim is restricted to
humans). I'm claiming that what we consider "human-specific" occupies
remarkably little genetic memory. To understand how human language acquisition
works with so little genetic context, we need to study humans and not NLP
algorithms.

~~~
duaneb
Well, imprinting specifically is likely much, much older than the chimp/homo
divide.

------
n0us
As research in NLP advances, one of the things I am looking out for is how the
field will be broken into separate problems. Are there some problems which are
relatively low-hanging fruit and others that we will be scratching our heads
over 30 years from now?

One thing that I think will be challenging is that language has
observer-dependent meaning. The same statement might have a completely
different meaning to someone with a different experience, or made in a
different context, or stated by a different person. Games like Chess and Go
have observer-independent solutions. The winner is the same no matter who/what
plays the game.

Determining the meaning of a sentence is a problem where the real answer
depends on an observer-dependent perspective, and it will therefore need a
completely different way to measure success compared to more 'mathematical'
tasks like Go. Trying to program a machine to account for this kind of
personal experience that humans have, as well as for individual differences
between people, will be quite challenging, I think. I also think that the most
significant advances will come from cross-cutting academic disciplines like
Psychology, Linguistics, and Philosophy of Language.

~~~
iofj
The answer for most of these problems where humans say "humans do X and X is
difficult because ..." is either:

1) it's not complex at all.

2) same as 1, with "in 99.99...% of the cases" appended to it.

Most of the problems in AI are scaling problems. The reasons you're not yet
seeing androids taking over the economy are twofold:

Portable energy. Our best technologies are not the equal of the human body
when it comes to how much energy can be stored. But they are constantly
improving, even if we'll need another tenfold increase in energy density (less
if we make mobile robots gasoline-powered, which is impractical for other
reasons, more if we do the battery + electric motors thing everyone wants).
This means humans are cheaper and easier for a lot of tasks.

Control. The human body (not including the face and face-adjacent muscles) has
about 300 actuators. That means controlling a human body means controlling 300
individual motors at the same time in a useful way (and every motor affects
every other motor: moving your hand forward means adjusting the power your
little toe is applying to the ground to maintain balance). The state of the
art is maybe 10-dimensional control (10 interdependent actuators), which can
be increased to maybe 20 if the problem can be split into subproblems (e.g.
cooperating robots, or parts of the robot that are attached so imbalance
cannot occur). In some ways every extra dimension adds an order of magnitude
to the complexity, so it's not like we'll get there in 10 years. But in a
century ... probably.

The thing about these problems is that they are problems of degree, just like
today's AI algorithms were known in the 1960s. So why didn't we have live
speech transcription in the 1960s? We all know the answer: processing power
was 50 orders of magnitude less than we needed. It was a problem of degree. We
knew the problem, and had at least a good guess at what to do, but couldn't
contemplate that actually acting on those ideas would make any serious
progress due to the limitations of processing speeds at the time. If you
wanted a 1960s computer to do a 50x50 matrix multiplication, you'd be waiting
days. Doing millions of them was therefore considered useless.
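
To put numbers on that, here is a quick check anyone can run today (the
repetition count is just an illustrative choice of mine):

    import time
    import numpy as np

    # Rough illustration of the "problem of degree": a 50x50 matrix multiply,
    # repeated a million times, finishes in seconds on a typical modern
    # machine, versus the days per multiply claimed above for 1960s hardware.
    a = np.random.rand(50, 50)
    b = np.random.rand(50, 50)

    start = time.time()
    for _ in range(1_000_000):
        a @ b
    print(f"{time.time() - start:.1f} s for a million 50x50 multiplies")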

Even this is being gentle. A good case can be made that every component of
these algorithms was known once differentiation was formalized, but nobody put
them together, not because they didn't realize it could work, but because it
was useless: doing things this way would have been incredibly inefficient
compared to the then-normal ways of doing things. E.g. finding formulas and
constants by making small adjustments to large chains of partial derivatives
(i.e. "deep learning") is something that Isaac Newton knew how to do. It's
just that he would have declared you totally mad for doing it. One might
object that there were a few holes in the mathematical understanding of
matrices in Newton's day, but I don't doubt that, had he had a reason to
really look into those problems, he would have fixed them. He was more than 50
orders of magnitude short of the needed computation, though: doing a single
50x50 matrix multiplication with any amount of resources then available
wouldn't have finished before he died.

Now machine learning tutorials tell you to run 30x30 LSTM expansions on audio
data, unrolled for 50 datapoints, and then to do that millions of times (tens
of thousands of passes over a few thousand samples). You can expect this to
run on the computer you're reading this on in a matter of hours.
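
For concreteness, here is a minimal sketch of that kind of model in PyTorch
(the feature size and batch size are arbitrary choices, not taken from any
specific tutorial):

    import torch
    import torch.nn as nn

    # A 30-unit LSTM unrolled over 50 timesteps of audio-like feature frames.
    # input_size=13 stands in for e.g. MFCC features; the batch of 8 is arbitrary.
    lstm = nn.LSTM(input_size=13, hidden_size=30, batch_first=True)
    x = torch.randn(8, 50, 13)          # (batch, timesteps, features)
    output, (h_n, c_n) = lstm(x)        # output: (8, 50, 30)
    print(output.shape)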

~~~
danieldk
_Just like today's AI algorithms were known in the 1960s._

But they were not. To take recurrent networks as an example, active work on
RNNs only started in the 1980s. LSTMs, which are one of the solutions to the
vanishing gradient problem, were only discovered in 1997 by Sepp Hochreiter.

Of course, computational power is one of the limiting factors, but saying that
it is the only thing that held AI back is disingenuous and dismisses the hard
work of a couple of generations of AI researchers.

Edit: and Schmidhuber

~~~
YeGoblynQueenne
>> LSTMs ... were only discovered in 1997 by Sepp Hochreiter.

Aye. Hochreiter and the One Who Shall Not Be Named. Let's not forget _him
again_, eh?

------
xigency
In terms of shallow natural language processing, some pretty simple
observations can lead to a lot of bang for the buck.

One project I completed was able to really excel at keyword matching simply by
building a huge dictionary of words, in a literal sense: a dictionary mapping
each phrase to contextually relevant words, generated from very large texts.
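
Roughly, the idea was something like the sketch below (the function name and
window size are just illustrative, not the actual project code):

    from collections import Counter, defaultdict

    def build_context_dictionary(tokens, window=5):
        """Map each word to a Counter of words seen near it in a context window."""
        context = defaultdict(Counter)
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for neighbor in tokens[lo:i] + tokens[i + 1:hi]:
                context[word][neighbor] += 1
        return context

    tokens = "the cat sat on the mat while the dog slept on the mat".split()
    ctx = build_context_dictionary(tokens)
    print(ctx["cat"].most_common(3))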

I think baby steps are the key to getting further with NLP. For reference:
[http://nlp.stanford.edu/fsnlp/](http://nlp.stanford.edu/fsnlp/)

~~~
donpdonp
A quick idea of what the Stanford NLP parser is capable of can be gained by
trying a few things in their online gateway. It's impressive.

[http://nlp.stanford.edu:8080/parser/](http://nlp.stanford.edu:8080/parser/)

~~~
charlieegan3
I'd recommend also checking out: [http://corenlp.run/](http://corenlp.run/).
It's a running version of the CoreNLP server. I'm making use of this: I just
have it running in a container and get parses via the API (which is good but a
little odd to use).
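
A minimal sketch of what a call to the server's HTTP API can look like
(assuming a container listening locally on the default port 9000; the
annotators and text are arbitrary examples):

    import json
    import requests

    # POST raw text to a CoreNLP server and pull out the constituency parse.
    props = {"annotators": "tokenize,ssplit,pos,parse", "outputFormat": "json"}
    resp = requests.post(
        "http://localhost:9000/",
        params={"properties": json.dumps(props)},
        data="Deep or shallow, NLP is breaking out.".encode("utf-8"),
    )
    print(resp.json()["sentences"][0]["parse"])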

------
YeGoblynQueenne
_In a recent interview with Communications, Hinton said his own research on
word vectors goes back to the mid-1980s, when he, David Rumelhart, and Ronald
Williams published work in Nature that demonstrated family relationships as
vectors. "The vectors were only six components long because computers were
very small then, but it took a long time for it to catch on," Hinton said._

Yeah, I know the work he's talking about. It's the one related to this
dataset:

[https://archive.ics.uci.edu/ml/datasets/Kinship](https://archive.ics.uci.edu/ml/datasets/Kinship)

From that page:

 _Creator:

Geoff Hinton

Donor:

J. Ross Quinlan

Data Set Information:

This relational database consists of 24 unique names in two families (they
have equivalent structures). Hinton used one unique output unit for each
person and was interested in predicting the following relations: wife,
husband, mother, father, daughter, son, sister, brother, aunt, uncle, niece,
and nephew. Hinton used 104 input-output vector pairs (from a space of
12x24=288 possible pairs). The prediction task is as follows: given a name and
a relation, have the outputs be on for only those individuals (among the 24)
that satisfy the relation. The outputs for all other individuals should be
off.

Hinton's results: Using 100 vectors as input and 4 for testing, his results on
two passes yielded 7 correct responses out of 8. His network of 36 input
units, 3 layers of hidden units, and 24 output units used 500 sweeps of the
training set during training.

Quinlan's results: Using FOIL, he repeated the experiment 20 times (rather
than Hinton's 2 times). FOIL was correct 78 out of 80 times on the test
cases._

And yet, if you have a wee look at Hinton's publication on Rexa, there are 43
citations, while there's a single one for Quinlan's (from Muggleton, duh).

So, you know, maybe it's not logic and reasoning that's the problem here, but
rather a certain tendency to drum up the results of neural models even when
they don't do any better than other techniques.

But, really, it doesn't matter. Google has the airwaves (so to speak). No
matter what happens anywhere else, in academia or business, their stuff is
going to be publicised the most and that's what we all have to deal with.

~~~
romaniv
_> But, really, it doesn't matter. Google has the airwaves (so to speak). No
matter what happens anywhere else, in academia or business, their stuff is
going to be publicised the most and that's what we all have to deal with._

One thing that bothers me about ANN hype is that those models are very
resource and data hungry. Which suits Google perfectly, but will keep that
kind of AI out of people's hands/computers for decades to come.

~~~
wcrichton
This is kept in check by the advent of cloud computing and public datasets,
though. It's super easy for anyone to spin up a thousand nodes and run
TensorFlow. There's a huge number of open-source image and text datasets. The
limiting factor is then money, not hardware.

------
nkurz
_Just as "Hello, World" may be the best-known general programming introductory
example, Mikolov, who was then at Microsoft Research, also introduced what
fast became a benchmark equation in natural language processing at the 2013
proceedings of the North American Association for Computational Linguistics,
the kingman+woman=queen analogy, in which the computer solved the equation
spontaneously._

Since the ACM has professional editors, I was surprised that they would twice
misrepresent the example "king – man + woman = queen" as "kingman+woman=queen"
in the article. At least they spelled "Hello, World" right, even if they
couldn't bring themselves to add the "!".
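
For anyone who hasn't tried it, the analogy is just a nearest-neighbour query
over word vectors; here is a minimal sketch with gensim, where "vectors.bin"
is a placeholder for any pretrained word2vec-format file:

    from gensim.models import KeyedVectors

    # "king - man + woman = queen": add/subtract vectors, find the nearest word.
    # "vectors.bin" is a placeholder for any pretrained word2vec-format model.
    vectors = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)
    print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))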

It looks like the problem is that in the PDF version, the phrase happens to be
hyphenated at the "minus sign" in both usages
([http://delivery.acm.org/10.1145/2880000/2874915/p13-goth.pdf](http://delivery.acm.org/10.1145/2880000/2874915/p13-goth.pdf))
[1] although one might hope this is something an editor would have checked.

[1] Looks like the ACM wants you to click on the PDF link yourself, from the
"View As" bar in the text version.

------
senthil_rajasek
Link to the Google cache:

[http://webcache.googleusercontent.com/search?q=cache:145V9qm...](http://webcache.googleusercontent.com/search?q=cache:145V9qmKz2gJ:cacm.acm.org/magazines/2016/3/198856-deep-
or-shallow-nlp-is-breaking-out/fulltext+&cd=1&hl=en&ct=clnk&gl=us)

------
YeGoblynQueenne
>> reasoning by analogy is the core kind of reasoning we do, and logic is just
a sort of superficial thing on top of it that happens much later

Well, for some that's certainly the case.

------
meeper16
More info on vectors [https://www.kaggle.com/c/word2vec-nlp-
tutorial/forums/t/1234...](https://www.kaggle.com/c/word2vec-nlp-
tutorial/forums/t/12349/word2vec-is-based-on-an-approach-from-lawrence-
berkeley-national-lab)

