
AI Can Pass Standardized Tests–But It Would Fail Preschool (2019) - memexy
https://www.wired.com/story/ai-can-pass-standardized-testsbut-it-would-fail-preschool/
======
memexy
One interesting point in the article is how language models relate to document
search. The author types a version of each candidate sentence into Google, and
the sentence that gets the most hits tends to be the one a language model would
pick as most probable. This suggests that neural networks could be used to rank
documents by associating a query with a set of documents and then fine-tuning
that association over time.

Here's the relevant section:

> I’ll make a competing hypothesis: Given Aristo’s language model, no such
> knowledge or reasoning is needed to answer this specific question; instead,
> the language model will have captured statistical associations between words
> that allow it to answer the question without any real understanding
> whatsoever. To illustrate, consider the following four sentences.

> 1\. Magnet will best separate a mixture of iron filings and black pepper.

> 2\. Filter paper will best separate a mixture of iron filings and black
> pepper.

> 3\. Triple-beam balance will best separate a mixture of iron filings and
> black pepper.

> 4\. Voltmeter will best separate a mixture of iron filings and black pepper.

> A language model can input each of these sentences and output the sentence’s
> “probability”—how well the sentence fits the word associations the model has
> learned—and choose the option with the highest probability. As a very rough
> simulation, I typed a version of each of these sentences into Google (making
> sure it found no exact matches) and looked at how many “hits” each received.
> Indeed, the sentence beginning with “magnet” got the most hits. My crude
> language model answered the question correctly without any intelligence
> other than word associations on the web.

> I tried this same experiment with other randomly chosen questions from the
> Regents exam and found that the correct answer received the most hits in six
> out of 10 cases. My Googling experiment is just an illustration, not meant
> to be scientific, but it does agree pretty well with the score the Aristo
> team itself reported for “baseline retrieval methods.” It’s far less than 90
> percent, but it highlights that there are “giveaways” that can boost a
> learning system’s performance without requiring any knowledge or reasoning
> at all. Moreover, this may be only the tip of the iceberg of the subtle
> giveaways that a machine-learning system could use to choose an answer.
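
To make the "word associations" idea concrete, here is a minimal sketch of that kind of scoring. It is not Aristo's model and not the author's Google experiment: the tiny corpus is made up for illustration, and the scoring rule (summing smoothed co-occurrence counts between answer words and question words) is just one simple stand-in for a language model's sentence probability. The names `CORPUS`, `cooccurrence_counts`, and `score` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus standing in for "the web"; not taken from the article.
CORPUS = [
    "a magnet attracts iron filings",
    "use a magnet to separate iron from sand",
    "iron filings stick to the magnet",
    "filter paper will separate sand from water",
    "a voltmeter measures voltage in a circuit",
    "a triple-beam balance measures mass",
    "black pepper is a common spice",
]

QUESTION = "will best separate a mixture of iron filings and black pepper"
OPTIONS = ["Magnet", "Filter paper", "Triple-beam balance", "Voltmeter"]


def tokenize(text):
    """Lowercase and split on whitespace, dropping trailing punctuation."""
    return [w.strip(".,").lower() for w in text.split()]


def cooccurrence_counts(corpus):
    """Count, for each unordered word pair, how many documents contain both."""
    counts = Counter()
    for doc in corpus:
        words = sorted(set(tokenize(doc)))
        counts.update(combinations(words, 2))
    return counts


def score(option, question, counts):
    """Sum log(1 + count) over every (option word, question word) pair."""
    total = 0.0
    for a in set(tokenize(option)):
        for b in set(tokenize(question)):
            if a != b:
                pair = tuple(sorted((a, b)))
                total += math.log1p(counts[pair])
    return total


counts = cooccurrence_counts(CORPUS)
# Pick the answer whose words are most associated with the question's words.
best = max(OPTIONS, key=lambda o: score(o, QUESTION, counts))
print(best)
```

On this toy corpus the top-scoring option is the magnet one, purely because "magnet" shows up near "iron" and "filings" more often than the other options do. Nothing in the code knows anything about magnetism, which is the point the quoted passage is making.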

Does anyone know of research along these lines? If you know the right keywords
to search for, that would also be helpful.

