
Statistical model of IQ test beats humans at the same IQ test - jacquesm
http://observer.com/2015/06/artificially-intelligent-computer-outperforms-humans-on-iq-test/
======
Houshalter
Here is the most up to date version of the paper:
[http://arxiv.org/pdf/1505.07909.pdf](http://arxiv.org/pdf/1505.07909.pdf)

The questions on the IQ test are very specific. Word analogies, identifying
antonyms, synonyms, and which word isn't like the others.

The method they use is similar to word2vec. That is, it does statistics on
millions of words and how they co-occur in sentences to build a vector
representation of each word. These vector representations capture a lot of
semantic information about the words and what they mean, which makes it
possible for the computer to do these tests automatically.
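The word-vector trick behind this kind of system can be sketched in a few lines. This is a minimal illustration with made-up toy vectors (real word2vec embeddings have hundreds of dimensions learned from large corpora, and this is not the paper's actual model):

```python
from math import sqrt

# Toy word vectors, invented purely for illustration.
vectors = {
    "king":  (0.9, 0.8, 0.1, 0.2),
    "queen": (0.9, 0.1, 0.8, 0.2),
    "man":   (0.1, 0.9, 0.1, 0.1),
    "woman": (0.1, 0.1, 0.9, 0.1),
    "apple": (0.05, 0.05, 0.05, 0.9),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / sqrt(sum(x * x for x in a) * sum(y * y for y in b))

def analogy(a, b, c):
    """Answer 'a is to b as c is to ?' by vector arithmetic: b - a + c."""
    target = [vectors[b][i] - vectors[a][i] + vectors[c][i] for i in range(4)]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("man", "king", "woman"))  # picks "queen" with these toy vectors
```

The same nearest-neighbor machinery handles the other question types: synonyms are high-similarity pairs, antonyms and odd-one-out questions fall out of the similarity structure.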

They then compared its performance on these questions with people they paid
on Mechanical Turk. And the computer did better.

It's really interesting that it can beat humans, and it shows some
understanding of the relationships between words. However, it doesn't mean the
system is intelligent, or even close.

It also doesn't mean IQ tests are worthless. What they measure is variation
between humans. People who do better on these questions also tend to do
better on other, unrelated exercises: puzzles, memory challenges, even
reaction-time tasks. That these things are correlated suggests there is some
single underlying factor which determines some of the variation in mental
ability in humans, and we call that IQ, and can measure it.

However, computers have a totally different set of factors that determine
their strengths and weaknesses, and applying human tests to them tells us
nothing. It doesn't tell us that the tests are worthless because dumb
computers can pass them, because the tests still correlate with intelligence
_in humans_. And it doesn't mean the computer is intelligent, because some of
the tests can be beaten by clever algorithms like this one, which obviously
aren't very general.

~~~
51109
Thanks for the link to the paper. I think the older criticisms still stand:

\- They created their own data set. Instead of a general Q-A system, they may
have overfit to this particular task and question types.

\- They set the human intelligence benchmark by using Mechanical Turk. This
may not be representative of true human intelligence (given the lower quality
of Mechanical Turkers).

As future work, I hope they look at the work the Allen AI Institute is doing
with Aristo [1].

There is a current challenge [2] to beat 8th graders on a standardized 8th
grade science exam. Here the data set is made by someone else, the questions
are closer to real-life questions (vs. simple analogies), progress and results
can be compared to other research teams, and the human benchmark is set by the
actual performance of 8th graders.

For human intelligence, next to accuracy and speed, we also care about
simplicity. This system nails accuracy and speed, but on simplicity it may be
beaten by someone who has never read the entirety of Wikipedia. A deep net
trained on millions of words and their relations may be too complex for this
task (it uses up a lot of energy to train).

As to computer intelligence being different from human intelligence: I once
nearly aced an aptitude test where I had access to a search engine. The test
involved programming languages I had never written a single line in. Yet, by
searching for keywords from the question combined with keywords from the
answers, I could give correct answers merely by comparing page-count
statistics. Like the man in Searle's Chinese Room, I was merely pattern
matching, without any real understanding of or insight into the questions
asked. The result of my test leaked out on the work floor, and for weeks I was
a headline wonder (having beaten all the senior engineers' scores), without
really deserving it.
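The page-count trick described above can be sketched like this; `hit_count` is a hypothetical stand-in for a real search-engine API, and the counts below are made up:

```python
# For each multiple-choice answer, query a search engine with the question's
# keywords plus that answer, and pick whichever query returns the most results.
def pick_answer(question_keywords, candidates, hit_count):
    return max(candidates, key=lambda c: hit_count(question_keywords + " " + c))

# Usage with canned counts standing in for live search results
# (hypothetical numbers, not real page counts):
fake_counts = {
    "python define function def": 500000,
    "python define function fun": 9000,
    "python define function method": 40000,
}
best = pick_answer("python define function", ["def", "fun", "method"],
                   lambda q: fake_counts.get(q, 0))
print(best)  # "def" wins on raw page count, with no understanding involved
```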

[1] [http://allenai.org/aristo.html](http://allenai.org/aristo.html) [2]
[https://www.kaggle.com/c/the-allen-ai-science-
challenge](https://www.kaggle.com/c/the-allen-ai-science-challenge)

~~~
nikkev
To add to your second point, Andrew Gelman had some blog posts earlier this
year detailing the challenges of doing online surveys. He included simple
questions that respondents "should" have been able to answer and found that a
fair percentage (>10%) answered some of them wrong. I am assuming there is no
incentive for answering more questions correctly, so it's possible that some
respondents answered blindly to finish quickly, leading to lower scores.

------
purpled_haze
"The researchers had the deep learning machine and 200 human subjects at
Amazon’s Mechanical Turk crowdsourcing facility answer the same verbal
questions. The result: their system performed better than the average human."

Mechanical Turk represents the average human?

And did they just ask the Turk subjects whether they had a bachelor's or
master's degree or not as proof that they had bachelor's or master's degrees?
Could you count on that being accurate?

~~~
blisterpeanuts
If you match up the best AI systems against the bottom 25% of the U.S.
population, the humans won't come out looking too smart. AI might still fail
the Turing test at least some of the time, but it's getting a little scary.

------
threatofrain
The point of an IQ test is to measure _g_ , or general mental ability. In
other words, if a computer can test well on one IQ test, it may also test well
on other IQ tests, since it should be _general_. Otherwise, it's a domain-
specific ability for a specific IQ test, which wouldn't be as interesting.

If a system could perform generally well, then the implications are
astounding. It could imply general analogizing capability. It would also mean
AI came way faster than anyone thought.

~~~
jakobegger
IQ tests may be designed to measure g, but that doesn't mean they actually do.
The best thing we can hope for is that a test _correlates_ with g. But for
every finite test it will be possible to find an algorithm that will produce
high marks without requiring general intelligence. In the specific case
outlined in the article, it's clear that they only found a method to answer a
very specific type of test.

(And all that's assuming something like g even exists.)

~~~
chii
> But for every finite test it will be possible to find an algorithm that will
> produce high marks without requiring general intelligence.

If you didn't know beforehand what the test was, then this procedure of using
a "specific" algorithm would not be distinguishable from general intelligence!

------
tokenadult
It was amusing to see quotations from Wikipedia articles (which I have
worked on) appear in the authors' paper without any citation to Wikipedia.
They have made some effort to look up the background of IQ testing, to the
extent of cribbing from some Wikipedia articles, but their citation of the
actual peer-reviewed psychology literature on IQ testing is all but
nonexistent (they cite mostly machine-learning papers). And as several astute
comments point out, training a machine (or, for that matter, a human being) to
perform well on the specific item content of a test does NOT support the
inference that the specially trained human or machine is generally smart at
whatever else it does.

Another serious objection to the study, also brought up in some previous
comments, is that IQ scores are norm-based scores (they compare test-takers to
a general population, not to criteria of correct or incorrect answers), and it
is very likely here that the Mechanical Turk recruits found for the study are
NOT a representative sample of the relevant population of human beings for
comparison to the machine's performance.

But my most serious objection to this study, among several, is that the item
content chosen for it was not professionally developed items, painstakingly
tested one by one by psychologists for inclusion in an IQ test battery, but
rather the kind of dreck usually found in unvalidated online IQ tests, which
are simply parlor games and should not be taken seriously. The study to a very
great degree demonstrates the well-known computer science principle of
"garbage in, garbage out," and does regrettably little to advance the
literature on artificial intelligence.

For readers who would like to read some of the legitimate psychology research
literature on IQ testing, I strongly recommend the Wikipedia article "IQ
classification," which has a better bibliography (by far) than any other
Wikipedia article about IQ testing.

[https://en.wikipedia.org/wiki/IQ_classification](https://en.wikipedia.org/wiki/IQ_classification)

------
steinsgate
I don't think machines will ever achieve human-like intelligence in the
current paradigm. A very basic property of human intelligence is that there's
just one program (learning algorithm, if you please) that is capable of
learning all activities: playing chess, playing the guitar, solving math
problems, etc. We are born with this program, and our life is a learning
experience. Computers, on the other hand, have separate programs for different
activities. A chess-playing computer can only play chess. A machine capable of
verbal reasoning (as in this article) is capable of only verbal reasoning. I
do not think human-like intelligence can be achieved by simply assembling
these separate programs together, but this is just my humble opinion, or
intuition.

~~~
sgk284
Your general sentiment was true with old-school AI that was just exploring
trees of options and intelligently pruning branches. However, for quite some
time now we've moved beyond that and into areas that are promising for
generalization.

You may want to explore the work that DeepMind[1] has done. They're starting
with simple universes (Atari games), but have developed a single program that
can learn to play dozens of different games without having ever been told the
rules. They learn by trial and error (specifically, Q-learning[2]) and rely
only on reading pixels and knowing the current score. They learn fairly
sophisticated behaviors and ultimately learn to play these games at levels far
superior to what humans can achieve.
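The trial-and-error rule mentioned above can be sketched as tabular Q-learning on a toy 5-state world. DeepMind's Atari agents replace the table with a deep network reading raw pixels; the environment and hyperparameters here are invented for illustration:

```python
import random

# Tabular Q-learning on a toy 5-state chain: the agent steps left or right
# and receives reward 1 for reaching the rightmost state.
N_STATES = 5
ACTIONS = (-1, +1)                 # step left, step right
alpha, gamma, eps = 0.5, 0.9, 0.1  # learning rate, discount, exploration
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

random.seed(0)
for episode in range(500):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy: mostly exploit current Q-values, sometimes explore
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        # Q-learning update: bootstrap from the best action in the next state
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# The learned greedy policy steps right (+1) in every non-terminal state.
policy = [max(ACTIONS, key=lambda x: Q[(s, x)]) for s in range(N_STATES - 1)]
print(policy)
```

The agent is never told the "rules" (that right leads to the reward); it discovers them purely from reward feedback, which is the same principle at work in the Atari results.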

Many people are now trying to generalize these results to more realistic
worlds, such as 3D games. And ultimately to agents that interact with the real
world.

[1] [http://deepmind.com/](http://deepmind.com/) [2]
[https://en.wikipedia.org/wiki/Q-learning](https://en.wikipedia.org/wiki/Q-learning)

~~~
steinsgate
I am aware of DeepMind. It is surely a step in the right direction. However,
while model-free reinforcement learning is one part of the puzzle, the other
(and equally important) part is the transfer of knowledge between agents
(most human learning happens via interaction between a teacher and a student).
I would really love to see some progress in that area.

------
51109
Please change the link.

This one serves up a malicious advertisement to mobile users in certain
regions (whatsapop browser hijacker with vibration and redirect).

~~~
51109
For the security people at ad networks: The URL was

    whatsapops.com

the mobile device was running the latest Android, and the region was The
Netherlands. It's not a virus on the device itself:

\- No app installs since setting up the device a year ago.

\- A lot of internet forum chatter from people doing clean factory resets and
still getting the same problem.

Besides mining for phone numbers, they try to sucker you into installing a
12-euro-a-week app. It's very similar to the porn-redirect ads from
DoubleClick that plagued mobile devices and tablets around 2012. This page
also abuses the vibrate API and managed to disable the back key by working
with a redirect chain.

Related:
[https://productforums.google.com/forum/#!topic/adsense/KV1nv...](https://productforums.google.com/forum/#!topic/adsense/KV1nvWPVyAc)

------
rohankshir
"but the machine built for this study actually outperformed the average human
on these questions."

A subtle but key statement in the article. If the models are trained for the
test, we're essentially looking at a standard machine learning problem, albeit
with very modern techniques (word vectors, deep nets, etc). The point is that
all of these are optimized towards a goal. In this case, the goal is the IQ
test.

This is not close to being an intelligent being the way humans are. Candidate
optimizations you can say humans are 'trained' for might be survival, finding
meaning, reproduction, etc. All of these goals are extremely broad and
abstract, especially in the context of computers.

I'm not saying this article is sensationalist, but it may be perceived
sensationally. This article merely notes a predictable progression in
artificial intelligence.

------
rdancer
"intelligence level between people with bachelor degrees and people with
master degrees"

This is pure nonsense. Average IQ rises with years spent in education [1], but
the distribution is still bell-shaped, just shifted to the right ever so
slightly.

It sounds like the editor either feels real good that they belong to the class
of educated = high-IQ people, or real bad that they didn't get the next-level
degree and got stuck stupid. A disastrous belief either way.

[1] [https://brainsize.wordpress.com/2014/06/02/iq-years-of-
educa...](https://brainsize.wordpress.com/2014/06/02/iq-years-of-education/)

------
sehugg
I'm uncomfortable with research that merely states "we threw some Turkers at
it". In my experience they are reliable overall, but there's no data on what
qualifications (if any) were required for the jobs, how much was offered, how
long was given to complete the test, etc. Some of these "humans" could
actually have been bots themselves, if insufficient controls were put in
place.

[http://turkernation.com/showthread.php?21352-The-Myth-of-
Low...](http://turkernation.com/showthread.php?21352-The-Myth-of-Low-Cost-
High-Quality-on-Amazon-s-Mechanical-Turk)

~~~
51109
An interesting paper at NIPS aims to combat the lower quality of Turkers with
gamification. Instead of yes/no answers, they offer a choice of: absolutely
sure, yes, don't know, no, absolutely not. The monetary reward is doubled if
they answer a question correctly with high certainty, and zeroed when they are
wrong.
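The incentive rule can be sketched roughly like this (illustrative numbers, not the paper's exact mechanism):

```python
# Per-question base reward: doubled for confident correct answers, zeroed for
# confident mistakes, with "don't know" as the risk-free option. The middle
# tier's multipliers are invented for illustration.
def payout(base, correct, confidence):
    if confidence == "dont_know":
        return base                           # safe: keep the base reward
    if confidence == "sure":
        return 2 * base if correct else 0.0   # double or nothing
    return 1.5 * base if correct else 0.5 * base  # ordinary yes/no answer

print(payout(0.10, True, "sure"), payout(0.10, False, "sure"))
```

The point of the asymmetry is that guessing blindly at high confidence has negative expected value, so rational workers self-report their actual certainty.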

[http://arxiv.org/abs/1408.1387](http://arxiv.org/abs/1408.1387) "Double or
Nothing: Multiplicative Incentive Mechanisms for Crowdsourcing"

------
andy_ppp
I wonder if IQ tests (already discredited, as far as I'm aware?) will come to
be considered less interesting/relevant, the same way chess lost some of its
lustre once computers exceeded human abilities. Eventually the only things
humans will have left to distinguish themselves are _religion_ and
incompetence. I'd love it to happen in our lifetimes, to be able to see what
real AI does. Maybe it's just as likely as judgement day that we simply ignore
each other...

------
lucb1e
Headline: AI outperforms humans on IQ test

Reality: AI scores >100 on IQ test

It's technically correct because it does beat humans on average, but it reads
very differently.

~~~
andrewprock
Actually, it's even less dramatic:

Headline: AI outperforms humans on IQ test

Reality: Trained machine learning algorithm scores better than the average
untrained human.

If the Mechanical Turkers had been allowed to take versions of the test as
many times as the AI was before the final test, the human average would be
significantly higher.

------
fallingfrog
Word analogies are the simplest problems on an aptitude or IQ test, and are
really just vocabulary questions in disguise. I think this result shows the
limited usefulness of such questions in measuring intelligence, rather than
showing that the researchers' approach is a way to create AGI.

------
boznz
Until AI can push the boundaries of science/art/philosophy etc. without
massive direction from humans, some of our jobs are still safe.

I would not be so smug, however, as I expect programming to be one of the jobs
AI becomes better at than humans within the next decade.

------
aidenn0
I think that if you had a person with average IQ spend time studying
vocabulary, they would get a significant boost on an IQ test as well.

------
anocendi
Watched Ex-Machina recently and I've got to drop it here:

I won't be wooed until the AI successfully seduces the human subject(s)
involved.

~~~
kleer001
And I would add, "a human subject who knows they're talking to a machine."

Also, hopefully a machine that's being treated humanely, so as not to start
the robot uprising, or some Skynet biz.

------
li-ch
If people are trained on the correct answers to IQ tests, they will get
better scores.

------
tshadwell
suggested title: 'statistical model of IQ test beats humans at the same IQ
test'

~~~
dang
Ok, we'll go with that for now. Thanks!

------
StavrosK
"IQ-test-taking computer outperforms a sample of humans at IQ test."

