"Does X look like Y" is always a continum. In test environments, players are jud...

"Does X look like Y" is always a continum. In test environments, players are judged human with these rates:

* Real humans: 66%

* GPT-4: 49.7%

* ELIZA: 22%

* GPT-3.5: 20%

(I'm rather surprised by ELIZA beating 3.5, as were the researchers).

When Turing introduced the test, his benchmark was that an average interrogator would have no more than a 70% chance of spotting the AI after 5 minutes of questioning.