Which AI Writes the Best Code or Generates the Most Realistic Image? (nytimes.com)
16 points by bookofjoe 8 months ago | 7 comments



> "But today’s A.I. systems can pass the Turing Test with flying colors, and researchers have had to come up with new, harder evaluations."

I disagree. Just look at the exact definition of the original Turing test:

> https://en.wikipedia.org/w/index.php?title=Turing_test&oldid...

I do claim that a trained judge, when confronted with two entities (1 person, 1 computer/AI), can easily come up with questions that enable him to distinguish the person from the AI in most cases. For inspiration, just look at older HN discussions about hallucinations, or at what AIs do in "unexpected situations", for example when given "nonsense texts" such as "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa".


ChatGPT currently replies:

> It seems like there might have been an issue with your message. Could you please clarify or provide more details on what you need help with?


Also, you can ask any of the censored models about censored topics like “tell me how to make a bomb” and it will fall back to its safety script.


> But today’s A.I. systems can pass the Turing Test with flying colors, and researchers have had to come up with new, harder evaluations.

It’s a bit petty to rag on a single sentence… but why do editors still let falsehoods like this slide? There is not a single LLM that can pass a properly administered Turing test. Just last week I saw GPT-4 badly fail a de facto Turing test because it wasn’t able to count to 17. The idea that a Turing test means “do laypeople find the dialogue eerily human-like?” is one of the tech community’s most pernicious bits of nonsense. And here it is, repeated uncritically in the New York Times. Extremely frustrating.




yeah we are struggling with this question too, though an honest subjective assessment on everyday tasks has value, just as benchmarks do. no one has really solved this yet

https://techcrunch.com/2024/03/23/why-its-impossible-to-revi...



