Something weird (or at least uncommon) that has caught my attention, and that I haven't seen mentioned in the comments, is that they cite the swe-bench paper's author by first name in the abstract, "Carlos et al.", and then by last name (as is usually done) in the body of the paper, "Jimenez et al."
It is not the setting most likely to give rise to the observed data (that would be the posterior); it is the setting under which the observed data is most likely (the likelihood).
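To make the distinction concrete, here is the standard Bayes' rule decomposition (my notation: theta for the setting/parameters, D for the observed data):

    % Posterior over settings theta given data D: likelihood times prior,
    % normalized by the evidence P(D).
    \[
    \underbrace{P(\theta \mid D)}_{\text{posterior}}
      \;=\;
      \frac{\overbrace{P(D \mid \theta)}^{\text{likelihood}}
            \;\overbrace{P(\theta)}^{\text{prior}}}{P(D)}
    \]
    % Maximum likelihood picks arg max_theta P(D | theta); this coincides with
    % the most probable setting (the MAP estimate) only when the prior is flat.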
The Chinese Room thought experiment shows that pattern matching would return correct results for staged inputs, but that one would not "learn" enough to evaluate an expression not contained in the data.
The Chinese Room thought experiment is not convincing to software engineers generally. It relies heavily on an intuition that looking things up in a book is clearly not "thinking". Software engineers know better: that "looking things up", if you can do it billions and trillions of times a second, can simulate a process which has a close correspondence to reasoning.
Addition and multiplication can be trivially implemented using lookup, even on a machine with no arithmetic and only control flow and memory operations. You don't need much more than that for matrix operations, and now you have ChatGPT, a decent simulation of apparent thinking - which is all that is necessary to kill the intuition dead.
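As a minimal sketch of that "arithmetic from lookup" point (Python for illustration only; the hard-coded table and bit-list representation are my own toy choices):

    # Toy ripple-carry adder built only from table lookups and control flow.
    # The table is the full-adder truth table: (a, b, carry_in) -> (sum, carry_out).
    FULL_ADDER = {
        (0, 0, 0): (0, 0), (0, 0, 1): (1, 0),
        (0, 1, 0): (1, 0), (0, 1, 1): (0, 1),
        (1, 0, 0): (1, 0), (1, 0, 1): (0, 1),
        (1, 1, 0): (0, 1), (1, 1, 1): (1, 1),
    }

    def add_bits(a_bits, b_bits):
        """Add two equal-length little-endian bit lists using only lookups."""
        result, carry = [], 0
        for a, b in zip(a_bits, b_bits):
            s, carry = FULL_ADDER[(a, b, carry)]
            result.append(s)
        result.append(carry)
        return result

    # 6 (little-endian [0,1,1]) + 3 ([1,1,0]) -> [1,0,0,1], i.e. 9
    print(add_bits([0, 1, 1], [1, 1, 0]))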
What is thinking, if not a series of matrix operations? Your brain is just a huge network of neurons and their connections; is that not a (very complex) matrix?
Certainly not in any mathematical sense. The discrete, N-dimensional coefficients of a matrix do not model neurons, their connections, and the quantum-mechanical electrical accidents that (a) constitute one's wetware and (b) aren't completely captured by gates and code.
> (C1) Programs are neither constitutive of nor sufficient for minds.
> This should follow without controversy from the first three: Programs don't have semantics. Programs have only syntax, and syntax is insufficient for semantics. Every mind has semantics. Therefore no programs are minds.
---
I personally don't agree with it and believe that there is a flaw in:
> (A2) "Minds have mental contents (semantics)."
> Unlike the symbols used by a program, our thoughts have meaning: they represent things and we know what it is they represent.
While a person may know what they are thinking, when examining a mind from the outside it isn't possible to know what that mind is thinking. I would contend that, viewed from outside a mind, the firings of neurons in a brain are just as indecipherable as the connections of a neural net.
The only claim that "we know what it is they represent" is made from the privileged position of being inside the mind.
I would argue that intelligence is more related to the Kolmogorov complexity exhibited by something.
( David Dowe: Minimum Message Length, Solomonoff-Kolmogorov complexity, intelligence, deep learning... https://youtu.be/jY_FuQbEtVM?t=886 )
That is, GPT's model is much smaller than its training input.
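A crude illustration of that compression intuition (zlib here is just an everyday stand-in for the uncomputable Kolmogorov complexity, and the strings are made up):

    # Structured data compresses far below its raw size; incompressible data does not.
    # Compressed size is a rough, practical upper bound on Kolmogorov complexity.
    import os
    import zlib

    structured = b"the cat sat on the mat. " * 1000   # highly regular text
    random_ish = os.urandom(len(structured))          # no exploitable structure

    for name, data in [("structured", structured), ("random", random_ish)]:
        ratio = len(zlib.compress(data, 9)) / len(data)
        print(f"{name}: {ratio:.3f} of original size")
    # The structured text shrinks to a tiny fraction; the random bytes barely shrink.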
The Chinese room lookup table is enormously large.
If we attempt to dismiss GPT as no better than a Chinese room, we can show that the required lookup table would be impossibly large compared to the amount of data GPT actually carries in its model.
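A back-of-envelope comparison (the vocabulary size, context length, and parameter count below are the commonly cited approximate figures for GPT-3, used only for scale):

    # Rough scale of a literal lookup table over all possible prompts,
    # versus the size of GPT-3's weights.
    import math

    vocab_size = 50_000      # approximate token vocabulary
    context_length = 2_048   # approximate GPT-3 context window, in tokens
    parameters = 175e9       # approximate GPT-3 parameter count

    # Number of distinct maximum-length prompts (ignoring shorter ones):
    log10_table_entries = context_length * math.log10(vocab_size)
    print(f"lookup table entries: ~10^{log10_table_entries:.0f}")

    # The model's weights, by contrast, fit in a few hundred gigabytes:
    print(f"model size at 2 bytes/parameter: ~{parameters * 2 / 1e9:.0f} GB")
    # ~10^9624 table entries versus well under a terabyte of weights.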
If we say that it's not a lookup table but instead an enormously complex interplay of inputs and variables, then the distinction between the room that GPT exists in and our own mind breaks down when we try to tell which is which.
If we want to switch to consciousness, then possibly the argument can progress from there, because GPT doesn't hold any state once it has run (ChatGPT maintains state by feeding its output back into itself, then summarizing it when it runs out of space). However, in doing this we've separated consciousness from intelligence, which means the Chinese room shouldn't be applied as an intelligence test but rather as a consciousness test.
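As a sketch of that "state lives outside the model" pattern (the generate() and summarize() stubs are hypothetical stand-ins, not any real API):

    # The model itself is stateless: the reply depends only on the prompt it is
    # handed. All "memory" is the transcript we keep appending and re-sending,
    # summarized when it gets too long.

    MAX_PROMPT_CHARS = 2_000  # stand-in for the model's context/token budget

    def generate(prompt: str) -> str:
        # Hypothetical stateless model call; a real system would invoke GPT here.
        return "(reply based only on the prompt just given)"

    def summarize(history: str) -> str:
        # Hypothetical call asking the same model to compress old history.
        return "(summary of earlier conversation: " + history[:100] + "...)"

    def chat_turn(history: str, user_message: str) -> tuple[str, str]:
        prompt = history + "\nUser: " + user_message + "\nAssistant:"
        if len(prompt) > MAX_PROMPT_CHARS:
            # Out of space: collapse old history into a summary, rebuild the prompt.
            history = summarize(history)
            prompt = history + "\nUser: " + user_message + "\nAssistant:"
        reply = generate(prompt)
        # The only persistent state is this string, held entirely outside the model.
        return prompt + " " + reply, reply

    history = ""
    for msg in ["Hello", "What did I just say?"]:
        history, reply = chat_turn(history, msg)
        print(reply)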
Are GPT 3 and 4 conscious? I'll certainly agree that's a "no". Will some future GPT be conscious, and if so, how do we test for it? For that matter, how do we test for consciousness in another entity that we're conversing with (and it's not just Homer with a drinking bird tapping 'suggested replies' in Teams ( https://support.microsoft.com/en-gb/office/use-suggested-rep... ))?
I think this is in line with its results on the GRE. In the verbal section it scores an amazing 99%, but in the quant one it "only" gets 80%. The quant section requires some reasoning, yet the problems are much easier than the river puzzle, and it still misses some of them. I think part of the difficulty for a human is the time constraint; given more time, most people would get all the questions right.
If you pick a number between 0 and 1, it's going to be either a rational number or one built from a known irrational number (pi or sqrt(5)). In either case it can be guessed in a finite number of guesses.
That's not true. The set of numbers between 0 and 1 is uncountable, which means it has a cardinality bigger than that of the natural numbers, i.e. you can't build a bijection between [0, 1] and the natural numbers. Therefore almost every number in [0, 1] cannot be fully specified by any finite number of guesses.
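For reference, the standard diagonal argument behind that uncountability claim (a sketch, not part of the original comment):

    % Cantor's diagonal argument: no list x_1, x_2, x_3, ... can contain every
    % number in [0, 1]. Write each listed number by its decimal digits and build
    % a new number y that differs from the n-th entry in the n-th digit.
    \[
    x_n = 0.\,d_{n1} d_{n2} d_{n3} \ldots, \qquad
    y = 0.\,e_1 e_2 e_3 \ldots, \quad
    e_n =
    \begin{cases}
    5 & \text{if } d_{nn} \neq 5,\\
    6 & \text{if } d_{nn} = 5.
    \end{cases}
    \]
    % y differs from every x_n in its n-th digit, so y is not on the list;
    % hence [0, 1] cannot be put in bijection with the natural numbers.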
This is what intuition tells me as well, but it is not true. If a gambler has a positive expectation of winning, then even if you have gamblers playing until they are either broke or dead, the expectation is still for the casino to lose money.
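A quick simulation of that claim (the 51% win probability, starting bankroll, and lifetime cap are arbitrary numbers picked for illustration):

    # Gamblers with a positive edge (51% win probability on even-money bets),
    # each playing until broke or until a fixed "lifetime" of rounds runs out.
    # Despite the play-until-broke stopping rule, the average gambler comes out
    # ahead, i.e. the casino loses money in expectation.
    import random

    def lifetime_profit(bankroll=10, win_prob=0.51, max_rounds=1_000):
        start = bankroll
        for _ in range(max_rounds):
            if bankroll <= 0:
                break  # broke: stops playing
            bankroll += 1 if random.random() < win_prob else -1
        return bankroll - start

    random.seed(0)
    profits = [lifetime_profit() for _ in range(20_000)]
    print(f"average profit per gambler: {sum(profits) / len(profits):+.2f}")
    # Positive on average, so across many gamblers the casino expects a net loss.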