
Solving Verbal Questions in IQ Test by Knowledge-Powered Word Embedding [pdf] - ilyaeck
http://arxiv.org/pdf/1505.07909v1.pdf
======
one-more-minute
_> Considering that IQ tests have been widely considered as a measure of
intelligence, we think it is worth making further investigations whether we
can develop an agent that can beat human on solving IQ tests ... we could be a
further step closer to the true human intelligence._

This is a bit like saying, "the best runners tend to be pretty tall, and we've
made a robot which is _really_ tall – so running robots are just around the
corner."

IQ tests certainly correlate well with intelligence ( _in humans_ ), but
they're a metric, not the thing itself. Another metric would be mental
arithmetic; people who can do sums quickly tend to be pretty smart, but that
doesn't mean that calculators are a step away from super-intelligences.

Interesting and cool work but let's be careful in the interpretation (and
remember that people said the same things about chess playing not that long
ago).

~~~
Jimmy
People fall victim to this fallacy in their own thinking all the time. They
want to get good at doing X, but task Y is a lot easier and success at Y is
mildly correlated with success at X, so they start practicing Y instead of X.
It's the reason why things like "brain games" are so popular. In reality,
people's time would almost always be better spent just practicing the primary
task they want to be good at.

That being said, making an AI that does better on IQ tests than humans is a
rather interesting and worthwhile endeavor.

------
dschiptsov
In accuracy of decoding (pattern matching)? What does that have to do with IQ?

The decades-old handwritten digit recognition system for the US Post beat
humans (now it is a few lines of Octave in Andrew Ng's course). Still, it
cannot write a reply to a letter.

~~~
joe_the_user
I'm reading the article and it's talking about solving analogies, antonyms and
similar problems ... as found on a standard IQ test.

I understand this isn't intelligence and the title doesn't imply it but I'm
not sure what your reference to decoding is about.

~~~
dschiptsov
But it still seems like pattern recognition based on a training set only,
which is, in my opinion, a task on a level prior to intelligence, like what
the visual cortex does. It cannot make new references between words to produce
(infer) new antonyms and analogies (not present in the training set), which
is intelligence.

~~~
ganz
The system isn't training on antonyms and analogies - it's training on
wikipedia. It's learning the meaning (and multiple senses) for every word it
can find.

The test they use to see if it actually learned what these words meant, in a
limited sense, is to test it against a subset of verbal IQ tests (not what it
was trained on!). You could ask it the antonym, synonym, or analogy for
anything in English. This is an extension of word2vec / word embeddings.
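
To make the analogy part concrete, here is a minimal sketch of the vector-offset trick usually associated with word2vec. It is not the paper's actual method. The 3-d vectors and the `solve_analogy` helper are made up for illustration; real embeddings are learned from a corpus and have hundreds of dimensions.

```python
import math

# Hypothetical hand-picked embeddings, for illustration only.
# Real vectors would be learned from text such as wikipedia.
EMBEDDINGS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
    "apple": [0.5, 0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def solve_analogy(a, b, c, candidates):
    """Answer 'a is to b as c is to ?' by picking the candidate
    closest to the offset vector b - a + c."""
    target = [
        vb - va + vc
        for va, vb, vc in zip(EMBEDDINGS[a], EMBEDDINGS[b], EMBEDDINGS[c])
    ]
    return max(candidates, key=lambda w: cosine(EMBEDDINGS[w], target))


print(solve_analogy("man", "king", "woman", ["queen", "apple", "king"]))
```

With these toy vectors the call picks "queen". Synonym questions can be handled the same way by ranking candidates on `cosine` alone.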

That it beats the scores of college graduates impresses me.

~~~
fauigerzigerk
_" it's training on wikipedia. It's learning the meaning (and multiple senses)
for every word it can find."_

I don't think that is entirely correct. After cursory reading of the paper, my
understanding is that they look up a list of word senses for each word in a
dictionary (or multiple dictionaries). And then they try to learn something
about each of those word senses from wikipedia (that is, they create separate
word embeddings for each of those senses). So what they do not do is to learn
what senses a word has. That is done by the humans who created the
dictionaries.
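
As a rough sketch of that setup (my illustration of the idea as I described it, not the paper's code): the sense inventory is fixed up front, each sense gets its own vector, and a context only selects among the senses already listed. All names and numbers below are hypothetical.

```python
import math

# Hypothetical sense inventory, standing in for a human-made dictionary.
# Each listed sense has its own embedding (toy 2-d values).
SENSE_VECTORS = {
    "bank": {
        "bank#finance": [0.9, 0.1],
        "bank#river": [0.1, 0.9],
    },
}

# Hypothetical embeddings for context words.
CONTEXT_VECTORS = {
    "money": [1.0, 0.0],
    "loan": [0.8, 0.2],
    "water": [0.0, 1.0],
    "shore": [0.2, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def pick_sense(word, context):
    """Return the listed sense whose vector is closest to the average
    context vector, or None if the word is not in the inventory."""
    known = [CONTEXT_VECTORS[w] for w in context if w in CONTEXT_VECTORS]
    if word not in SENSE_VECTORS or not known:
        return None
    centroid = [sum(dim) / len(known) for dim in zip(*known)]
    senses = SENSE_VECTORS[word]
    return max(senses, key=lambda s: cosine(senses[s], centroid))


print(pick_sense("bank", ["money", "loan"]))
print(pick_sense("bank", ["water", "shore"]))
print(pick_sense("tweet", ["money"]))
```

The last call returns `None`: a word or sense missing from the inventory can never be selected, no matter how much text the vectors were trained on.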

What that means is that they cannot pick up new senses of words, which doesn't
matter for answering IQ test questions because these questions rarely change
and are typically based on well established word meanings.

Unfortunately it makes this approach less than ideal for things like
understanding the news (something I'm working on), where new contexts of words
keep popping up all the time.

------
huac
How reliable is MTurk for an IQ test? I presume the respondents don't put in
that much effort on these quizzes.

~~~
compbio
MTurk is representative of the human population. I often hear people presume
that MTurkers do not put in any effort for a few cents. But for most it is a
hobby or a nice pastime. They do it to make a few bucks, instead of playing
chess or reading a book. Most people do not get a nickel for playing chess, yet
they really put in the effort. MTurkers are similar.

See here for an MIT experiment in blurry text transcription:
[http://groups.csail.mit.edu/uid/deneme/?p=329](http://groups.csail.mit.edu/uid/deneme/?p=329)
for the unexpected accuracy resulting from crowd-sourcing.

------
jyzzmoe
And yet ...

[http://arxiv.org/pdf/1412.1897v1.pdf](http://arxiv.org/pdf/1412.1897v1.pdf)

~~~
JuliaLang
I can break humans too:
[https://en.wikipedia.org/wiki/List_of_optical_illusions](https://en.wikipedia.org/wiki/List_of_optical_illusions)

~~~
breuderink
Further, these inputs were /optimized/ to confuse the particular
classification software. If only we would be able to optimise optical
illusions for a given human viewer...
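
For a feel of what "optimized" means here, a toy sketch: gradient ascent on the input of a fixed classifier until it becomes confident. This is only an illustration of the general idea, not the procedure from the linked paper. The logistic "classifier", its weights, and the `optimise_input` helper are all hypothetical.

```python
import math

# Hypothetical fixed classifier: logistic regression with hand-picked weights.
WEIGHTS = [2.0, -1.0, 0.5]
BIAS = -1.0


def predict(x):
    """Probability the classifier assigns to the positive class."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def optimise_input(x, steps=100, lr=0.1):
    """Nudge the input along the gradient of the classifier's output.
    The classifier itself is never changed, only the input."""
    x = list(x)
    for _ in range(steps):
        p = predict(x)
        # For a logistic model, d p / d x_i = p * (1 - p) * w_i.
        x = [xi + lr * p * (1.0 - p) * w for xi, w in zip(x, WEIGHTS)]
    return x


start = [0.0, 0.0, 0.0]
fooling = optimise_input(start)
print(predict(start), predict(fooling))
```

The optimised input has no meaning of its own. It is just whatever pushes this particular model's score up, which is why such inputs can look like noise to a human.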

~~~
one-more-minute
I bet you could do it with sound, too. Imagine producing what would seem like
abstract noise – unrelated to anything in the natural world – but with
particular structures and sequences of tones optimised to produce particular
emotive responses; pleasure, excitement, calm, energised dancing, romantic
dancing...

... wait, don't we already do that?

------
clickok
Interesting and cool. It's almost like a magic trick, particularly in the
sense that an almost unbelievable result has been achieved through putting in
a serious amount of effort. It's not just a gimmick, though-- projects like
this demonstrate the steady creep of progress on seemingly unsolvable
problems[1]. But of course, there's thousands of such problems out there, with
millions of potential customers ready to make whoever solves one a
billionaire, so even if "true" AI fails to appear, odds are good we'll make a
fair approximation for specific applications.

One thing that popped out at me, however, was the distribution of the human
scores: monotonically increasing with age, which was somewhat odd. Shouldn't
it be more or less normally distributed?

1\. I wouldn't have thought to make a serious attempt towards beating a verbal
comprehension test with deep learning; the possibility of succeeding would've
seemed tiny compared to the work. Similar to the notion that trying to
prove the Riemann Hypothesis is hard to think of as a real pursuit; at best
it's like a quixotic hobby.

------
jphilip147
It is great to know how deep learning is exploring various possibilities.

