Any measure? ChatGPT still can’t solve a sudoku, let alone do any form of reasoning. It’s more of a search engine than “intelligence”, while humans obviously can reason and solve novel tasks.
Now you've got me wondering if there's even an appropriate test set to find out.
When I do a sudoku, it's either printed, in which case I don't know if I've made a mistake until the very end, or it's on a computer that tells me about mistakes as soon as I make them.
I do know I make mistakes when doing them, but I don't know how often, or whether that's more or less frequent than ChatGPT.
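To be fair, that instant mistake-flagging is purely mechanical, just a rule check. Roughly something like this toy sketch, assuming a 9x9 grid stored as a list of lists with 0 for empty cells (the layout and function name are mine, purely for illustration):

```python
def conflicts(grid):
    """Return the set of (row, col) cells whose value clashes with another
    cell in the same row, column, or 3x3 box (0 = empty, ignored)."""
    bad = set()
    for i in range(9):
        for j in range(9):
            v = grid[i][j]
            if v == 0:
                continue
            # row and column clashes
            for k in range(9):
                if k != j and grid[i][k] == v:
                    bad.add((i, j))
                if k != i and grid[k][j] == v:
                    bad.add((i, j))
            # 3x3 box clashes
            r, c = 3 * (i // 3), 3 * (j // 3)
            for a in range(r, r + 3):
                for b in range(c, c + 3):
                    if (a, b) != (i, j) and grid[a][b] == v:
                        bad.add((i, j))
    return bad
```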
(Also, at least one LLM has been demonstrated to do Turing-machine-like operations[0], and there were Sudoku-solving AIs well before LLMs.)
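On the Sudoku-solving point: those classic solvers involve no learning at all. A minimal backtracking sketch, not any particular system's code, again assuming a 9x9 list-of-lists grid with 0 for empty cells:

```python
def legal(grid, i, j, v):
    """True if writing v at (i, j) breaks no row, column, or 3x3 box rule."""
    if any(grid[i][k] == v for k in range(9)):
        return False
    if any(grid[k][j] == v for k in range(9)):
        return False
    r, c = 3 * (i // 3), 3 * (j // 3)
    return all(grid[a][b] != v for a in range(r, r + 3) for b in range(c, c + 3))

def solve(grid):
    """Fill the grid in place by trial and backtracking; returns True on success."""
    for i in range(9):
        for j in range(9):
            if grid[i][j] == 0:
                for v in range(1, 10):
                    if legal(grid, i, j, v):
                        grid[i][j] = v
                        if solve(grid):
                            return True
                        grid[i][j] = 0   # undo and try the next digit
                return False             # nothing fits here: backtrack
    return True                          # no empty cells left
```

Brute-force search plus that legality check solves any valid puzzle; no training data enters into it.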
> let alone do any form of reasoning. It’s more of a search engine than “intelligence”, while humans obviously can reason and solve novel tasks.
I am aware of Clever Hans (the horse that seemed to count but was really reading its handler's cues) and that I may be making the same mistake, but it sure looks like it can do those things as well as humans.
Humans aren't that great either when it comes to genuinely novel tasks, which is why we have to be trained to do science and use the method of falsification.
I have never seen an example where it didn't basically just rehash an answer already found in its training data. Where it excels is language tasks, and knowledge retrieval can be viewed as a language task: it has some internal representation of the recurring textual patterns in its training data, and it can translate those into an answer appropriate for the question.
It fundamentally can’t come up with a novel reply, unlike a human. It breaks spectacularly the instant you go off its training set, which is very easy to notice if you ask it about a topic you are knowledgeable about.