Any measure? ChatGPT still can’t solve a sudoku, let alone do any form of reasoning. It’s more of a search engine than “intelligence”, while humans obviously can reason and solve novel tasks.
Now you've got me wondering if there's even an appropriate test set to find out.
When I do a sudoku, it's either printed, in which case I don't know if I've made a mistake until the very end, or it's on a computer that tells me about mistakes as soon as I make them.
I do know I make mistakes when doing them, but I don't know how often, or whether that's more or less frequent than ChatGPT.
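To be fair, that instant mistake-flagging is purely mechanical, just a rule check. Roughly something like this toy sketch, assuming a 9x9 grid stored as a list of lists with 0 for empty cells (the layout and function name are mine, purely for illustration):

```python
def conflicts(grid):
    """Return the set of (row, col) cells whose value clashes with another
    cell in the same row, column, or 3x3 box (0 = empty, ignored)."""
    bad = set()
    for i in range(9):
        for j in range(9):
            v = grid[i][j]
            if v == 0:
                continue
            # row and column clashes
            for k in range(9):
                if k != j and grid[i][k] == v:
                    bad.add((i, j))
                if k != i and grid[k][j] == v:
                    bad.add((i, j))
            # 3x3 box clashes
            r, c = 3 * (i // 3), 3 * (j // 3)
            for a in range(r, r + 3):
                for b in range(c, c + 3):
                    if (a, b) != (i, j) and grid[a][b] == v:
                        bad.add((i, j))
    return bad
```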
(Also, at least one LLM has been demonstrated to do Turing-machine-like operations[0], and there were Sudoku-solving AIs well before LLMs.)
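On the Sudoku-solving point: those classic solvers involve no learning at all. A minimal backtracking sketch, not any particular system's code, again assuming a 9x9 list-of-lists grid with 0 for empty cells:

```python
def legal(grid, i, j, v):
    """True if writing v at (i, j) breaks no row, column, or 3x3 box rule."""
    if any(grid[i][k] == v for k in range(9)):
        return False
    if any(grid[k][j] == v for k in range(9)):
        return False
    r, c = 3 * (i // 3), 3 * (j // 3)
    return all(grid[a][b] != v for a in range(r, r + 3) for b in range(c, c + 3))

def solve(grid):
    """Fill the grid in place by trial and backtracking; returns True on success."""
    for i in range(9):
        for j in range(9):
            if grid[i][j] == 0:
                for v in range(1, 10):
                    if legal(grid, i, j, v):
                        grid[i][j] = v
                        if solve(grid):
                            return True
                        grid[i][j] = 0   # undo and try the next digit
                return False             # nothing fits here: backtrack
    return True                          # no empty cells left
```

Brute-force search plus that legality check solves any valid puzzle; no training data enters into it.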
> let alone do any form of reasoning. It’s more of a search engine than “intelligence”, while humans obviously can reason and solve novel tasks.
I am aware of Clever Hans (the horse that seemed to count but was really reading its handler's cues) and that I may be making the same mistake, but it sure looks like it can do those things as well as humans.
Humans aren't that great either when it comes to genuinely novel tasks, which is why we have to be trained to do science and use the method of falsification.
I have never seen an example where it didn't basically just rehash an answer already found in its training data. Where it excels is language tasks, and knowledge retrieval can be viewed as a language task: it has some internal representation of the recurring textual patterns in its training data, and it can translate those into an answer appropriate for the question.
It fundamentally can’t come up with a novel reply, unlike a human. It breaks spectacularly the instant you go off its training set, which is very easy to notice if you ask it about a topic you are knowledgeable about.