Hacker News

I guess we'll just have to wait and see what happens next year :)

p.s. I built my first language model (LSTM-based) back in 2014. Then I built a VAE-based one, then a GAN-based one. None of them was especially good, so I switched to music generation (which actually works pretty well). My most recent project uses sparse transformers for raw audio generation. Building novel DL models is literally in my job description.

Please don't wildly speculate about strangers you meet on HN.




Why just wait when you can bet?


In order to bet, we would have to agree on evaluation criteria. A task like "respond to criticisms of its own argument in a coherent way" is difficult to evaluate. The Turing test is also pretty vague, and some people declared it passed years ago: https://www.bbc.com/news/technology-27762088


Yes, it would be difficult but not impossible to agree on clear criteria. We would also need an impartial third party to make a determination.

The Turing test hasn't been passed if I'm the judge. Supposing I were unable to tell the difference between any AI system and a human interlocutor defending the same argument, at any time before 2023, I'd admit my prediction was wholly incorrect.

I doubt very much I'll find a taker for this bet, however. The AI field has always been bigger on optimism than results, as we both know.


Yes, I agree the TT has not been passed. But probably starting later this year we will see more and more claims that it has. At first it will be clear it hasn't. Then it will be less clear, and then the goalposts will be moved again, so that when GPT-5 is announced and is clearly capable of holding a conversation, the reaction on HN will be the same as the current reaction to GPT-3: "meh".


That seems like a pretty testable prediction. If I can't tell the difference between GPT-5 or whatever it is and an adult human native English speaker by the end of 2022, you'll win the bet.


How many questions will you need to determine if it's a bot? How many sessions, and what percentage of correct guesses will determine the outcome of the experiment?


If it weren't a bot but a human, we'd typically have an unbounded conversation, bounded only by politeness, until I was satisfied one way or the other, so that seems a reasonable protocol here - perhaps with some reasonable upper bound on the time spent (an hour?) to avoid imposing a potentially unbounded commitment on human participants. (Actually, "We've been chatting since four, I have somewhere else to be" is a pretty good signal of humanity... and should help you see why you're not going to win this bet.) I wouldn't expect it to take more than a few minutes, and I would expect to be wrong in (much) less than 1% of sessions.
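For what it's worth, a claimed sub-1% error rate is easy to sanity-check statistically. A minimal sketch (the scoring scheme is my own illustration, not anything agreed in this thread): treat each session as a human/bot guess and ask how likely a given hit rate would be if the judge were guessing at chance.

```python
# Hypothetical scoring sketch: probability of k or more correct guesses
# out of n sessions if the judge were merely guessing at rate p (chance = 0.5).
from math import comb

def p_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Binomial tail: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# e.g. 19 correct out of 20 sessions is vanishingly unlikely by chance:
print(p_at_least(19, 20))  # ~2e-05
```

So even a modest number of sessions would settle the bet decisively either way.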


The more time you spend chatting, the better your chances of guessing correctly. I admit that in two years the models might not be good enough to fool you for hours. If you put enough thought into it, especially knowing what the bot was trained on, you could devise a set of tricky questions that would expose it.

However, I believe the models will be good enough to fool you over, say, a 20-question dialog. They will definitely be able to fool the vast majority of unsuspecting humans. And they will definitely be able to keep track of the conversation (remember what you said previously, and use it to construct responses to follow-up questions).


How will the bot answer "when and where were you born, and how do you know?" How will it answer "what color, besides red, best communicates the flavor of a strawberry, and why?" How will it answer "What historical figure does my communication style make you think of most, and why?", or "Which of your family members comes to your mind first?", or "What do you think the context was in which the following poem was written?" I don't need to know what it was trained on to win this bet, and 20 open-ended questions is more than enough.

Between the vast majority of unsuspecting humans and me there is a considerable gap. Mind the gap!


You're kidding, right? All the questions you provided are extremely simple to answer, compared to many other things a clever human interrogator might say during a TT. I'm starting to doubt your NLP expertise.

The TT-ready model I'm envisioning will be trained on many billions of chat sessions. It will contain dozens of preconstructed graphs and will dynamically construct dozens more (a personality graph, a common-sense knowledge graph, domain-specific knowledge graphs, a causality graph, a dialog state graph, an emotional state graph, etc.), and it will have a bunch of emotion detectors, humor detectors, inconsistency detectors, lie detectors, praise detectors, etc. It will have the ability to query external sources (e.g. Google search --> web page parsing --> updating the relevant graph). All these modules will filter, cooperate, and vote, providing input to higher-level decision-making blocks. These blocks will use those inputs to condition and constrain the response generation process.

This is finally where a language model comes in, and this, until recently, has been the hardest part: generating coherent, grammatically correct, interesting text that directly addresses a specific prompt. That part has been solved. Until GPT-2 last year we simply could not generate high-quality text; now we can, and GPT-3 is even better at it.

Sure, there are plenty of non-trivial problems left to solve, but I don't view them as being on the same level of difficulty - some of them were already solved in the course of IBM Watson's development, so I'm optimistic. The hardest remaining challenge is probably constructing the common-sense graphs. [1] looks promising.
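To make the "filter, cooperate, and vote" idea concrete, here is a toy sketch (the module names and the additive voting scheme are my own illustration, not an actual system): detector and graph modules each score candidate responses against the dialog state, and a decision block picks the highest-voted candidate before generation is conditioned on it.

```python
# Toy module-voting sketch: scorers stand in for the detectors/graphs
# described above; vote() is the higher-level decision block.
from typing import Callable

Scorer = Callable[[str, str], float]  # (dialog_state, candidate) -> score

def consistency_scorer(state: str, candidate: str) -> float:
    # stand-in for an inconsistency detector: zero out candidates that
    # contradict something already established in the dialog state
    if "born in Paris" in state and "born in Oklahoma" in candidate:
        return 0.0
    return 1.0

def emotion_scorer(state: str, candidate: str) -> float:
    # stand-in for an emotion detector: mildly penalize hostile candidates
    return 0.5 if "shut up" in candidate else 1.0

def vote(state: str, candidates: list[str], scorers: list[Scorer]) -> str:
    # decision block: highest total score wins; ties broken by list order
    return max(candidates, key=lambda c: sum(s(state, c) for s in scorers))

state = "User said: I was born in Paris."
candidates = ["I was born in Oklahoma too!", "Paris is lovely. What part?"]
print(vote(state, candidates, [consistency_scorer, emotion_scorer]))
# -> Paris is lovely. What part?
```

The real difficulty, of course, is in the scorers themselves, not the voting.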

p.s. Your questions are so naive I'm not sure if you're trolling me. A human might answer them like this (and a bot built 50 years ago could easily imitate that):

"when and where were you born, and how do you know?"

- [personality - redneck] I was born on a farm in Oklahoma. How do I know what?

"what color, besides red, best communicates the flavor of a strawberry, and why?"

- Red is the right color for strawberries.

"What historical figure does my communication style make you think of most, and why?"

- You talk like one of them big city hipsters.

"Which of your family members comes to your mind first?"

- My little bro Jimmy, we just went fishing together on Tuesday.

"What do you think the context was in which the following poem was written?"

- [depends on the poem] I don't get this poem. What is it about?

[1] https://arxiv.org/abs/1906.05317


You're betting that in the next roughly two-and-a-half years, the common sense problem will be solved well enough to fool me, despite not having been solved in the entire history of AI up until now. I'll take that bet. How confident are you?


To fool you for 20 questions, yes. I'm ~70% confident, so I'll bet you $100 :)

To clarify, the common-sense problem is a hard one. It's similar to level 5 autonomous driving: that will take a while to solve. But what we're talking about here is more like the Waymo cars that can drive themselves in ideal weather, at slow speeds, in Arizona. So in 2.5 years I think the best chatbots will be about as far from having common sense as current Waymo self-driving cars are from level 5 autonomy. Which is to say, they will be pretty good.


You're on for $100. I'm > 99% confident that I won't be fooled by any AI before 2023, and would have bet any amount that I could afford to set aside.

Ideal weather at slow speeds, with a professional human driver throwing road-condition curveballs at you and challenging your responses? I like my chances.


Sounds good! My email is in my profile, if I'm wrong I'll pay up! :)

Keep in mind that you would have to differentiate a bot's responses from those of an old truck driver, a snobby philosophy student, a conspiracy theorist, a stoner, a bimbo following the latest Kardashian news, etc. - lots of different personalities of real humans who can throw curveballs at you during a chat session. I hope you don't expect to chat only with Bay Area devs? :)


I'd be very disappointed if I only got Bay Area devs (since that's a minuscule fraction of the people I've had the pleasure of having interesting discussions with during my life). And indeed, "give me a few sentences about your background, schooling, and interests" is an excellent opening question ;)





