That seems like a pretty testable prediction. If I can't tell the difference between GPT-5 or whatever it is and an adult human native English speaker by the end of 2022, you'll win the bet.
How many questions will you need to determine if it's a bot? How many sessions, and what percentage of correct guesses will determine the outcome of the experiment?
If it were a human rather than a bot, we'd typically have a conversation bounded only by politeness, continuing until I was satisfied one way or the other, so that seems a reasonable protocol here, perhaps with some reasonable upper bound on the time spent (an hour?) to avoid imposing a potentially unbounded commitment on human participants. (Actually, "We've been chatting since four, I have somewhere else to be" is a pretty good signal of humanity... and should help you see why you're not going to win this bet.) I wouldn't expect it to take more than a few minutes, and I would expect to be wrong in (much) less than 1% of sessions.
The more time you spend chatting, the better your chances of guessing correctly. I admit that in two years the models might not be good enough to fool you for hours. If you put enough thought into it, especially knowing what the bot was trained on, you could devise a set of tricky questions that would expose it.
However, I believe the models will be good enough to fool you during, say, a 20-question dialog. They will definitely be able to fool the vast majority of unsuspecting humans. And they will definitely be able to keep track of the conversation (remember what you said previously, and use it to construct responses to follow-up questions).
How will the bot answer "when and where were you born, and how do you know?" How will it answer "what color, besides red, best communicates the flavor of a strawberry, and why?" How will it answer "What historical figure does my communication style make you think of most, and why?", or "Which of your family members comes to your mind first?", or "What do you think the context was in which the following poem was written?" I don't need to know what it was trained on to win this bet, and 20 open-ended questions is more than enough.
Between the vast majority of unsuspecting humans and me there is a considerable gap. Mind the gap!
You are kidding, right? All the questions you provided are extremely simple to answer compared to many other things clever human interrogators might say during a Turing test. I'm starting to doubt your NLP expertise.
The TT-ready model I'm envisioning will be trained on many billions of chat sessions. It will contain dozens of preconstructed graphs and will dynamically construct dozens more (a personality graph, a common-sense knowledge graph, domain-specific knowledge graphs, a causality graph, a dialog-state graph, an emotional-state graph, etc.). It will have a bunch of emotion detectors, humor detectors, inconsistency detectors, lie detectors, praise detectors, etc. It will have the ability to query external sources (e.g. Google search --> web page parsing --> updating the relevant graph). All these modules will filter, cooperate, and vote, providing input to higher-level decision-making blocks. Those blocks will use these inputs to condition and constrain the response-generation process. This is finally where a language model comes in, and this, until recently, was the hardest part: generating coherent, grammatically correct, interesting text that directly addresses a specific prompt. That part has been solved. Until GPT-2 last year we simply could not generate high-quality text. Now we can, and GPT-3 is even better at it. Sure, there are plenty of non-trivial problems left to solve, but I don't view them as being on the same level of difficulty; some of them were already solved in the course of IBM Watson's development, so I'm optimistic. The hardest remaining challenge is probably constructing common-sense graphs. [1] looks promising.
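The architecture described above (detector modules voting, a higher-level block turning the winning votes into constraints, and a language model conditioned on those constraints) can be sketched roughly as follows. This is a toy illustration, not a real system: every function, class, and detector here is hypothetical, and the "language model" is a trivial stub standing in for a GPT-style generator.

```python
# Hypothetical sketch of the modular pipeline described above.
# All names are illustrative; the detectors are toy stand-ins.

from dataclasses import dataclass, field


@dataclass
class Signal:
    """Output of one detector module, with a confidence used for voting."""
    name: str
    value: str
    confidence: float


def emotion_detector(utterance: str) -> Signal:
    # Toy stand-in: a real module would classify the speaker's emotional state.
    angry = any(w in utterance.lower() for w in ("hate", "angry", "terrible"))
    return Signal("emotion", "negative" if angry else "neutral", 0.6)


def inconsistency_detector(utterance: str, history: list) -> Signal:
    # Toy stand-in: a real module would check the claim against a dialog-state
    # graph; here we only flag a literal "not <previous utterance>".
    contradiction = any(utterance.strip() == f"not {h.strip()}" for h in history)
    return Signal("consistency", "contradiction" if contradiction else "ok", 0.8)


@dataclass
class DialogState:
    """Minimal dialog-state 'graph': just the utterance history."""
    history: list = field(default_factory=list)


def decide_constraints(signals):
    """Higher-level decision block: detectors 'vote', and the highest-confidence
    signal of each kind becomes a constraint on the response generator."""
    constraints = {}
    for s in sorted(signals, key=lambda s: s.confidence, reverse=True):
        constraints.setdefault(s.name, s.value)
    return constraints


def generate_response(prompt: str, constraints) -> str:
    # Stub for the language model; in the envisioned system this is where a
    # GPT-style generator would be conditioned on the constraints.
    tone = "Sorry to hear that. " if constraints.get("emotion") == "negative" else ""
    return f"{tone}You said: {prompt!r}"


def respond(state: DialogState, utterance: str) -> str:
    """One turn of the pipeline: detect --> vote --> constrain --> generate."""
    signals = [
        emotion_detector(utterance),
        inconsistency_detector(utterance, state.history),
    ]
    constraints = decide_constraints(signals)
    reply = generate_response(utterance, constraints)
    state.history.append(utterance)  # keep track of the conversation
    return reply
```

The point of the sketch is the shape of the system, not the detectors themselves: each module contributes a weighted signal, the decision block resolves them into constraints, and only then does text generation happen.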
P.S. Your questions are so naive I'm not sure if you're trolling me. A human might answer them like this (and a bot built 50 years ago could easily imitate that):
"when and where were you born, and how do you know?"
- [personality - redneck] I was born on a farm in Oklahoma. How do I know what?
"what color, besides red, best communicates the flavor of a strawberry, and why?"
- Red is the right color for strawberries.
"What historical figure does my communication style make you think of most, and why?"
- You talk like one of them big city hipsters.
"Which of your family members comes to your mind first?"
- My little bro Jimmy, we just went fishing together on Tuesday.
"What do you think the context was in which the following poem was written?"
- [depends on the poem] I don't get this poem. What is it about?
You're betting that in the next roughly two-and-a-half years, the common sense problem will be solved well enough to fool me, despite not having been solved in the entire history of AI up until now. I'll take that bet. How confident are you?
To fool you for 20 questions, yes. I'm ~70% confident, so I'll bet you $100 :)
To clarify, the common-sense problem is a hard one. It is similar to level-5 autonomous driving: that will take a while to solve. But what we are talking about here is kinda like the Waymo cars that can already drive themselves in ideal weather at slow speeds in Arizona. So in 2.5 years I think the best chatbots will be about as far from having common sense as the current Waymo self-driving cars are from level-5 autonomy. Which is to say, they will be pretty good.
You're on for $100. I'm >99% confident that I won't be fooled by any AI before 2023, and I would bet any amount I could afford to set aside.
Ideal weather at slow speeds, with a professional human driver throwing road-condition curveballs at you and challenging your responses? I like my chances.
Sounds good! My email is in my profile, if I'm wrong I'll pay up! :)
Keep in mind that you would have to differentiate the bot's responses from those of an old truck driver, a snobby philosophy student, a conspiracy theorist, a stoner, a bimbo following the latest Kardashian news, etc. Lots of different personalities of real humans can throw curveballs at you during a chat session. I hope you don't expect to only chat with Bay Area devs? :)
I'd be very disappointed if I only got Bay Area devs (since that's a minuscule fraction of the people I've had the pleasure of having interesting discussions with during my life). And indeed, "give me a few sentences about your background, schooling, and interests" is an excellent opening question ;)