
This article is fantastic in both shape and content, and I got lost in all the examples because there is so much to wonder at.

What hits me most profoundly is that there are so many witty and interesting prompts, yet the purely logical statements fall apart (the black ravens, or the male sister).

This is something that probably does not jump to one's mind as significant, because "technicalities", but to me this is where logic lets us take a step back from our human projection for a second; it's my own anthropomorphism that becomes more obvious to me. I find that ironic, considering a lot of human beings also DO fail such tests, but something hits me about how a supposedly completely logical entity fails at logic more than at poetry. It kind of shakes my own (supposed) humanity.




The logic is weird. Another example is factual question answering. Janelle Shane tried asking basic questions like how many eyes a horse has, and GPT-3 insisted on 4; I retried with somewhat different prompting and sampling settings more finetuned to Q&A (...sampling can reveal the presence of knowledge but not its absence...), and I got perfectly straightforward correct answers: https://twitter.com/gwern/status/1278798196555296771/photo/1 So GPT-3 does know how many eyes a horse has; but why was it also happy to answer '4'?
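
To make that concrete, here is roughly what a Q&A-style prompt with conservative sampling looks like. This is only a minimal sketch, assuming the original pre-1.0 openai Python client and the base "davinci" engine; the few-shot example and the settings are illustrative guesses, not the exact prompt in the linked tweet:

    # Minimal sketch of Q&A-style prompting with near-greedy sampling.
    # Assumes the pre-1.0 openai library and the original "davinci" base model;
    # the few-shot example below is hypothetical, not gwern's actual prompt.
    import openai

    openai.api_key = "YOUR_API_KEY"

    prompt = (
        "Q: How many legs does a dog have?\n"
        "A: A dog has four legs.\n\n"
        "Q: How many eyes does a horse have?\n"
        "A:"
    )

    response = openai.Completion.create(
        engine="davinci",
        prompt=prompt,
        max_tokens=20,
        temperature=0,   # take the most likely continuation rather than sampling broadly
        stop=["\n"],     # cut the completion off at the end of the answer line
    )
    print(response.choices[0].text.strip())

The Q&A framing plus temperature 0 surfaces the model's most probable answer; with a bare question and looser sampling, lower-probability continuations like "4" can also come out, which is the "sampling can reveal the presence of knowledge but not its absence" point.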


Something about the logic being so off is what I intuitively find logical: we're making these AIs "in our image" in a sense (we think of "neural networks", train them with mostly human-generated datasets), and there's a lot of evidence that pure logic evades us without the use of some heavy artillery to address it (cognitive biases, illusions, optimizations for goals that do not necessarily align with "objectively observable" reality). So in a way, I wonder if we'll have to "teach AI logic" at some point too. In this quest of running logic software on logic hardware with... steps... in between, I can't help but think about us humans on our parallel quest when it comes to our brains.


I tend to write it off as less any kind of deep truth about humans (well, maybe the "bachelors can be married" one given that 9/10 students agreed with GPT-3 that bachelors can be married) than just the current weaknesses of how we train NNs like GPT-3 (small, unidirectional, unimodal, not to convergence, missing most of science in PDFs, etc).

In particular, I bet the "how many eyes does a horse have" example would be much less likely with a multimodal model which has actually seen photographs or videos of what the word "horse" describes and can see that, like most mammals, they only have 2 eyes. Think of it as like layers of Swiss cheese: every modality's dataset has its own weird idiosyncrasies and holes where the data is silent & the model learns little, but another modality will have different ones, and the final model trained on them all simultaneously will avoid the flaws of each one in favor of a more correct universal understanding.

I'm very keen to see how much multimodal models can improve on current unimodal models over the next few years.


> fails at logic more than at poetry

This is purely subjective.

Your expectations for poetry might be different from those of other people, or even specialists. I am not particularly good in that domain, but I don't really like some of the results shown.


I agree with you that it's subjective. Testing logic vs. art is going to bring this kind of problem to the surface (how do you test art in a way comparable to how you test logic?). This is why I wrote that noticing these thoughts made me take a step back from my own projections (my subjectivity). That's the whole point.



