Hacker News

Seriously. It's worth pausing for a minute to note that the Turing Test has been entirely solved.

In fact, it has been so thoroughly solved that anyone can download an open-source solution and run it on their computer.

And yet, the general reaction of most people seems to be, "That's kind of cool, but why can't it also order me a cheeseburger?"




It has not been solved. Even GPT-4, as impressive as it is for some use cases, is dumb and I can tell the difference between it and a human in a dozen sentences just by demanding sufficient precision.

In some contexts, will some people be caught out? Absolutely. But that's been happening for a while now.


"Dumb" isn't why the Turing Test isn't solved. (Have you seen unmoderated chat with normal people? Heck, even smart people outside their domain of expertise; my mum was smart enough to get into university in the UK in the early 60s, back when that wasn't the default, but still believed in the healing power of crystals, homeopathic sodium chloride and silicon dioxide, and Bach flower remedies…)

ChatGPT (I've not got v4) deliberately fails the test by spewing out "as a large language model…", but also fails incidentally by having an attention span similar to my mother's shortly after her dementia diagnosis.

The problem with 3.5 is that it has simultaneously mastered nothing, and yet beats everyone at whatever they've not mastered — an extremely drunk 50,000-year-old Sherlock Holmes who speaks every language and has read every book just isn't going to pass itself off as Max Mustermann in a blind hour-long trial.


The lack of an ability to take in new information is maybe the crux of my issues with the LLM-to-AGI evolution. To my understanding, the only way to have it even kind of learn something is to include it in a preamble it reprocesses every time, which is maybe workable for small facts but breaks down for updating it from the 202X corpus it was trained on.
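The "preamble" workaround above can be sketched in a few lines. This is a toy illustration, not any real API: `FACTS` and `build_prompt` are invented names, and the point is only that every fact must be re-sent with every single request because the model retains nothing between calls.

```python
# Toy sketch of the "preamble" trick: the model has no persistent memory,
# so each request must re-send everything it is supposed to "know".

FACTS = [
    "The user's dog is named Rex.",
    "The project deadline moved to Friday.",
]

def build_prompt(question: str) -> str:
    # The whole preamble is reprocessed on every call; cost grows with each
    # added fact, which is why this breaks down at anything corpus-sized.
    preamble = "Known facts:\n" + "\n".join(f"- {f}" for f in FACTS)
    return f"{preamble}\n\nQuestion: {question}"

print(build_prompt("What is my dog called?"))
```

This works for a handful of facts; it cannot replace retraining on an updated corpus, because the preamble has to fit inside the model's context window.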


Mmm. Well, possibly.

On the one hand, what I was saying here was more about the Turing Test than about AGI. Sometimes it gets called AGI, sometimes "autocomplete on steroids", but even if it is fancy autocomplete, I think 3.5 has the skill to pass a short Turing Test, just not the personality — and a full Turing Test would need a longer "short-term memory" equivalent than 3.5 has.

On the other hand, as I (sadly) don't get paid to create LLMs, I've only got the kind of superficial awareness of how they work that comes from podcasts and the occasional blog post, which means ChatGPT might very well understand ChatGPT better than I do.

Can GPT-[3.5, 4] be prompted to make itself?


As impressive as GPT4 is, it still makes mistakes humans would not, and so I wouldn’t say it passes the Turing Test.

I generated this just now:

Me: Jack and Jill are sitting next to each other in a room. There is no one else in the room. The person sitting next to Jack is sad. The person sitting next to Jill is angry. Who is sad, Jack or Jill?

GPT4: Based on the information provided, it is not possible to determine who is sad and who is angry, as there is a contradiction in the given statements. If Jack and Jill are the only ones in the room and sitting next to each other, then the person sitting next to Jack would be Jill and the person sitting next to Jill would be Jack. The statements about their emotions conflict with each other, so it is not possible to accurately determine who is sad or angry.


Jack and Jill are sitting next to each other in a room. There is no one else in the room. The person sitting next to Jack is sad. The person sitting next to Jill is angry. Who is sad, Jack or Jill?

ChatGPT: Since Jack and Jill are the only two people in the room and they are sitting next to each other, the person sitting next to Jack is Jill and the person sitting next to Jill is Jack. Given the conditions you provided:

Jill is the one who is sad because she is sitting next to Jack. Jack is the one who is angry because he is sitting next to Jill.
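For what it's worth, the puzzle isn't actually contradictory — with only two people, its constraints have exactly one consistent assignment, which a brute-force check over all possibilities confirms (the variable names below are just for illustration):

```python
from itertools import product

people = ["Jack", "Jill"]
# Only two people in the room, so "next to X" is fully determined.
next_to = {"Jack": "Jill", "Jill": "Jack"}

# Try every (sad, angry) assignment and keep the ones satisfying both clues.
solutions = [
    (sad, angry)
    for sad, angry in product(people, repeat=2)
    if next_to["Jack"] == sad and next_to["Jill"] == angry
]

print(solutions)  # [('Jill', 'Jack')]: Jill is sad, Jack is angry
```

So GPT-4's "contradiction" answer above is simply wrong, while the second transcript gets it right.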


Does the fact that you both got different answers make it better or worse at reasoning?


>And yet, the general reaction of most people seems to be, "That's kind of cool, but why can't it also order me a cheeseburger?”

Well yeah. Imagine you tell a small child who knows about calculators, "Hey, can you work out 18763 + 38284 for me?" They might struggle and then maybe fetch a calculator.

The LLMs attempt to predict the answer. WTF? It's a computer and it can see that's a plus sign. Just understand it's addition, use the rest of your computer brain, and do the sum. Hell, it's connected to the internet and we just taught it everything from before 2021. Just call out to Wolfram and give me the answer.
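The "fetch a calculator" hand-off is trivial to sketch. This isn't how any particular LLM product is wired up — `try_calculator` and its regex are invented for illustration — it just shows that recognized arithmetic can be routed to deterministic code instead of token prediction:

```python
import operator
import re

# Map the recognized operator characters to real arithmetic.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def try_calculator(text: str):
    # Handle simple "A <op> B" expressions exactly; anything else returns
    # None, i.e. would be left for the language model to answer.
    m = re.fullmatch(r"\s*(\d+)\s*([+\-*])\s*(\d+)\s*", text)
    if not m:
        return None
    a, op, b = m.groups()
    return OPS[op](int(a), int(b))

print(try_calculator("18763 + 38284"))  # 57047, computed, not predicted
```

Tool-use plugins (Wolfram, code interpreters) are essentially this idea scaled up: the model decides *when* to call out, and the tool does the actual computing.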

But that’s not how computers work. And we keep saying “AI” but that I is doing a lot of heavy lifting.



