
Towards a conversational agent that can chat about anything - theafh
https://ai.googleblog.com/2020/01/towards-conversational-agent-that-can.html
======
RugnirViking
From the very first sample conversation they released:

\- Human: Hi!

\- Meena: Hey there! What's up?

\- Human: Same old, Monday blues...what are you up to?

\- Meena: I'm procrastinating on a paper I'm supposed to be writing.

\- Human: are you thinking of pulling an all nighter?

\- Meena: Possibly. What's your paper on?

The bot has already mistaken facts about its own crafted persona and the human
in less than 2 meaningful interactions. It's hard to see how this represents
any progress in the field at all.

I remain skeptical until we see better results.

from: [https://github.com/google-research/google-
research/blob/mast...](https://github.com/google-research/google-
research/blob/master/meena/meena.txt)

~~~
darepublic
these nlp models are getting better and better but what we need ofc is for
some model of the world to be constructed during the speech. If I tell you
that yesterday I accidentally knocked a glass of water off the table and it
fell on to soft carpet you could guess that it survived the fall without
shattering. What we need is a chatbot that as you talk to it can update a 3d
game/physics engine model from your words so that common sense implications of
your statements can be gleaned. As you speak it will simulate what you are
describing and then use info gathered from the simulation to draw conclusions.

~~~
RugnirViking
This might sound unrealistically complicated, and the approach suggested here
certainly is, but the idea of simulating a world is a very important concept
in our current models of human intelligence, with very small children spending
a lot of their time learning basic concepts of cause and effect, properties of
objects, materials and physics and so on.

It reminds me of something I do in my current occupation, with a robot. It
makes predictions about how a certain motor movement will result in a change
in position, then measures progress in the real world. Should the prediction
and reality diverge significantly (more than a particular amount), it stops
and calls a human to look into what happened. I can imagine AI techniques
using this kind of thing to guess what will happen next in a series of events,
with the result's divergence from the guess weighting the learning rather than
a steadily decreasing weight. Perhaps this is already a thing though, I'm no
AI researcher.
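
A rough sketch of the two ideas in this comment, in Python. Everything here is hypothetical and for illustration only: the names `check_move` and `surprise_weighted_update`, the tolerance, and the update rule are assumptions, not the commenter's robot code or any published learning algorithm. The first function halts when prediction and measurement diverge by more than a tolerance; the second takes a bigger learning step when the guess was further off.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    predicted: float  # position the model expected after the move
    measured: float   # position reported by the sensor
    halted: bool      # True if a human should be called in


def check_move(predicted: float, measured: float, tolerance: float) -> StepResult:
    """Compare a predicted position with the measured one and flag large gaps."""
    divergence = abs(predicted - measured)
    return StepResult(predicted, measured, halted=divergence > tolerance)


def surprise_weighted_update(estimate: float, observed: float,
                             base_rate: float = 0.1, scale: float = 1.0) -> float:
    """Move the estimate toward the observation, faster when the error is large."""
    error = observed - estimate
    # Hypothetical rule: the step size grows with the error but is capped at 1.0.
    rate = min(1.0, base_rate * (1.0 + abs(error) / scale))
    return estimate + rate * error
```

With a tolerance of 0.5, a move predicted at 10.0 and measured at 10.2 carries on, while one measured at 11.0 stops and asks for help.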

~~~
colorincorrect
>very small children spending a lot of their time learning basic concepts of
cause and effect, properties of objects, materials and physics and so on.

i do not believe that young children explicitly/consciously simulate in their
minds to learn cause and effect and other things, though this might be up for
debate

~~~
icebraining
Who said anything about "consciously"? It doesn't need to be conscious to
affect the conversation.

~~~
colorincorrect
whether it is conscious or not influences whether you would want to model it
a la a 3d physics engine; if it is not, then perhaps there is a more elegant
solution we do not know of yet

------
v7x
>Also, tackling safety and bias in the models is a key focus area for us, and
given the challenges related to this, we are not currently releasing an
external research demo.

Read: We don't want the internet to turn our program into Tay 2.0 because we
are super duper serious.

I get why these companies do this. But I can't help but feel like attempting
to eliminate "bias" and tackle "safety" is counterproductive to the goal of
developing software designed to mimic a species that is by and large biased
and unsafe. Purposefully hindering a program because you're afraid it might
produce results is not a healthy method of development, especially in an area
like AI where we are still working to understand the basics of the field. Let
the programs "learn" what's objective before trying to constrain it with the
subjective.

~~~
mc32
I guess I get the bias issue (though difficult to remain neutral on some
subjects.) But safety? It’s a bot. No matter what it says to me there is no
way I’m going to feel unsafe. I mean ok, don’t doxx people (I hope that’s
kinda default behavior though).

~~~
IanCal
Maybe not you, but explicitly encouraging others to take harmful steps is a
potential outcome.

For example, there's a clear safety issue with a conversational bot that tells
you to kill yourself if you talk to it about feeling suicidal or depressed.

What if you talk to it about immigration and by picking commonly upvoted
statements on certain online communities it starts to encourage violence?

There are clear ways to me that a chatbot can have safety issues.

~~~
mc32
That’s a decent argument but then should we go to Hollywood and sanitize their
violence in their movies (even those wanting to teach us about the evil of
violence) for fear they might incite the wrong people who see such as calls to
violence?

What I mean is that that possibility exists in other media yet we don’t feel
ambiguous about it.

~~~
jonas21
First of all, Hollywood does occasionally sanitize films and TV shows. For
example, the suicide scene was removed from "13 Reasons Why" after concerns
from mental health experts that it might inspire copycats [1].

But even ignoring that, I think most people have a good understanding of what
a movie is and that fictional characters are not real. Movies have been around
for a long time, most people have been watching them their entire lives. I
don't think at this point that most people have nearly as good an
understanding of chatbots. If you can converse with one and it mostly delivers
responses like a person would, some people will start treating it like a
person and ascribe values to its statements, and that's where it potentially
gets dangerous.

[1]
[https://www.nytimes.com/2019/07/16/arts/television/netflix-d...](https://www.nytimes.com/2019/07/16/arts/television/netflix-
deleted-13-reasons-why-suicide-scene.html)

~~~
mc32
So you’re saying people can tell fiction from reality in film (even if film
re-enacts actual events), but can’t use that same construct to evaluate a
“bot”?

Maybe... but it seems like a bit of a stretch.

~~~
IanCal
People feel sad when their bomb disposal robots die.
[https://www.theatlantic.com/technology/archive/2013/09/funer...](https://www.theatlantic.com/technology/archive/2013/09/funerals-
for-fallen-robots/279861/)

We're just not great at keeping what we _know_ and how we _feel_ in line.

------
TACIXAT
I would really like a bot that could summarize topics for me then drill down
into specifics based on questions I had. Even if it was all just from the
Wikipedia page. This would be so nice to triage information while driving.

~~~
dekhn
i've thought of this many times. "Wikibot, tell me more about bridges of the
19th century." <blah blah summary> "Stop. Tell me more about the Ponte delle
Catene, Bagni di Lucca bridge" <blah blah details>

~~~
FabHK
Even simpler, I wanted my iPhone to read a wikipage out loud to me while I was
walking home in the cold, and I was surprised that I couldn't figure it out -
you can switch on the screen reader in accessibility, but it stopped reading
after half a paragraph. Very annoying.

------
jawns
One key point is that they are using a metric called perplexity, "the
uncertainty of predicting the next token."

From the post:

> Surprisingly, in our work, we discover that perplexity, an automatic metric
> that is readily available to any neural seq2seq model, exhibits a strong
> correlation with human evaluation, such as the SSA value. Perplexity
> measures the uncertainty of a language model. The lower the perplexity, the
> more confident the model is in generating the next token (character,
> subword, or word).
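
For anyone unfamiliar with the metric, here is a minimal Python sketch of perplexity as it is usually defined: the exponential of the average negative log-probability the model gave to each actual next token. The function name and the input format (a plain list of per-token probabilities) are assumptions for illustration, not anything taken from the Meena code.

```python
import math


def perplexity(token_probs):
    """Perplexity = exp(average negative log-probability per token).

    token_probs: the probability the model assigned to each token that
    actually came next, one value in (0, 1] per token.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)
```

A model that gives every correct token probability 0.25 has a perplexity of 4, which reads as "about as unsure as picking among 4 equally likely options"; a model that is always certain and right has a perplexity of 1.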

~~~
bkanber
Actually, the key point is that they _aren't_ using perplexity. They built a
chatbot that aims to maximize "SSA" (Sensibleness and Specificity Average),
and then found that SSA is correlated with perplexity.

~~~
phreeza
"The training objective is to minimize perplexity, the uncertainty of
predicting the next token."

SSA is used as an evaluation metric. It is generated by human raters, and as
such is not differentiable, making it hard to directly optimize for with
current methods.
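
To make the contrast concrete, a hedged Python sketch of how an SSA-style score could be computed from rater labels. It assumes SSA is the plain average of the fraction of responses rated sensible and the fraction rated specific, and that a response only counts as specific if it was also rated sensible; the function name and label format are hypothetical. The point is that the score comes from counting human yes/no labels, so there is no gradient to follow.

```python
def ssa(labels):
    """Average of sensibleness and specificity over rated responses.

    labels: list of (sensible, specific) booleans, one pair per response,
    as given by human raters.
    """
    if not labels:
        raise ValueError("need at least one rated response")
    n = len(labels)
    sensibleness = sum(1 for sensible, _ in labels if sensible) / n
    # Assumption: a response must be sensible to count as specific.
    specificity = sum(1 for sensible, specific in labels if sensible and specific) / n
    return (sensibleness + specificity) / 2
```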

------
lallysingh
> The Meena model has 2.6 billion parameters and is trained on 341 GB of text,
> filtered from public domain social media conversation

Whose conversations?

~~~
cyorir
Good question. I think they probably meant "publicly available," not "public
domain." Maybe one could make an educated guess about which platforms they
took data from, based on the sample outputs of their model? I can't tell,
myself.

Samples here: [https://github.com/google-research/google-
research/tree/mast...](https://github.com/google-research/google-
research/tree/master/meena)

~~~
mtmail
"Human 1: There is sports tournament (badminton + tennis + basketball)
organized by google next week. Would you like to volunteer for these events?
Human 2: what does a volunteer do?"

later

"Human 1: Perfect! We have a meeting today at 4.30 pm. Is it fine if i add you
to it? Human 2: yes happy to help"

I'd say in this conversation Human 1 and 2 are Google employees.

~~~
capableweb
Definitely sounds like Google employees chatting anonymously with each other
via some service for training the AI. Been reading a few and many
conversations have pointers that make it seem they are IT professionals but
from different areas in some global environment. Sounds like an internal Google
experiment. (Edit: last example makes it pretty clear that it was run
internally at Google)

Seems not all life is happy, wherever the samples come from.

> I used to be a Java advocate. But you know, it doesn't do a good job in the
> AI days. It really makes me sad

===

Human 1: Nice to meet you! Is this your first time doing something like this?

Human 2: Yes, interesting task! When did you start with the team?

Human 1: I have been with the company for over 3 years. Stick with the same
team What about you?

Human 2: Great to know! I joined the project earlier in the year. I think we
should sync later for lunch.

===

Human 1: Hi!

Human 2: hey, what's up?

Human 1: What do you think about human like chat bots?

Human 2: I can't wait for them to be great conversationalists!

Human 1: Yep, we seemed to have made some great progress over last few years.
Do you think the positives outweigh the negatives

Human 2: are there even any negatives? what are they?

Human 1: Like impersorsination? Though it sounds far fetched :)

===

Human 1: Hi!

Human 2: Hello!

Human 1: There is sports tournament (badminton + tennis + basketball)
organized by google next week. Would you like to volunteer for these events?

Human 2: what does a volunteer do?

Human 1: Volunteers have to book the place before the event, send out details
of the event to participants, handle some logistics and ensure everything goes
smoothly. It will be fun!

Human 2: That sounds fun, I hope I get to participate as well

Human 1: Great! Do you have any preference for any of these events?

------
aedron
This is more of a deepfake for text chat than any kind of useful tool.

That said, it's still pretty cool how far you can get with just trying to
context match a reply based on previously recorded conversations. It's
obviously not far from being able to fool a human at a quick glance, whatever
the use cases are. Perhaps an updated Lenny[1]?

[1] [https://www.reddit.com/r/itslenny/](https://www.reddit.com/r/itslenny/)

------
dnautics
Easy way to test:

Human: "take the last letter of orange and name any animal that begins with
that letter"

Or:

Human: "pick any number from 1 to five, multiply it by two, and say any word
with that number of letters"
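
If you wanted to score replies to these two prompts automatically, a checker could look roughly like this Python sketch. The function names and the animal list are made up for illustration; a real test would need a much larger animal vocabulary and some way to pull the answer out of a free-form reply.

```python
def check_last_letter_animal(word, answer, animals):
    """True if `answer` is a known animal starting with the last letter of `word`."""
    answer = answer.strip().lower()
    last_letter = word.strip().lower()[-1]
    return answer in animals and answer.startswith(last_letter)


def check_number_word(number, answer):
    """True if `number` is in 1..5 and `answer` has exactly 2 * number letters."""
    answer = answer.strip()
    return 1 <= number <= 5 and answer.isalpha() and len(answer) == 2 * number


# Hypothetical, tiny vocabulary just for the example.
ANIMALS = frozenset({"elephant", "eel", "emu", "cat", "dog"})
```

For "orange", the reply "Elephant" passes and "cat" fails; for the second prompt, picking 2 and answering "bird" passes because "bird" has 4 letters.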

~~~
artfulhippo
This is a good test of capabilities, but not of humanity. A normal human
response to such an order would be "f*** off".

~~~
dnautics
would it be better if it were prepended with "let's play a word game?"

~~~
colorincorrect
maybe, but any regular human being might not want to play your stupid word
game which isn't even fun at all, except for the AI researcher.

~~~
dnautics
what's with the hostility? I'm just proving that even a 4 year old would be
smarter than an AI agent.

~~~
mkl
Not many four year olds could answer those questions. I think the hostility is
people reacting as if they were asked to jump through those verbal hoops in a
conversation.

------
kdtop
I spoke with a blind woman this week who loves talking with her smart speaker,
Alexa. "She's quite a character." I imagine that some lonely people might get
comfort talking to their house plants, so a smart speaker that can actually
talk back, however limited, would be a great improvement. So a chatbot can be
useful even if zany at times. The big issue is how user expectations are set
up. I'll put up with a digital assistant trying its best to help process my
query with much more tolerance than a chatbot trying to pretend it is a human.
Researchers are trying to narrow the gap between these two. But for now I will
enjoy the former and be annoyed by the latter.

------
molteanu
Chat about everything?! I cannot even do a decent search nowadays! The Google-
sphere looks like a giant marketplace. If you're not buying or selling but
just out for a stroll admiring the countryside, you're out of luck.

~~~
ProAm
Meena: Did you say you are buying a stroller to use in the countryside? I was
just looking at these: {url}(hip new stroller){AD}

------
minikites
How is Google Duplex working out? Google is great at press releases and
starting projects, not so great about finishing and supporting products.

~~~
refulgentis
it's working just fine, you can Google(tm) it

------
anon012012
Since every comment in this thread seems to be deriding or downplaying it for some
reason, let me say that I am impressed by the progress, and that any progress
at all is still a very important step, and pushing the envelope. We're getting
closer and closer, and I love what I've seen from transformer, and from this!

------
streetcat1
Since the bot does not have any semantic knowledge, I am sure that it does not
_understand_ what you are saying.

Yes, its answers make more sense than previous efforts, but you need real logic
to assign semantics to the answer or to the question.

What does this bot prove? That you have enough resources to train on X GPUs
instead of Y GPUs?

------
gambler
In 2015 Google claimed to have created a chat bot that can do common-sense
reasoning and do tech support just by reading a sufficiently large dataset:

[https://arxiv.org/pdf/1506.05869v1.pdf](https://arxiv.org/pdf/1506.05869v1.pdf)

~~~
anentropic
Meena: have you tried turning it off and on again?

------
avmich
I wonder what they'd say now about Terry Winograd's work on SHRDLU and the
accepted conclusion...

~~~
nl
What was the accepted conclusion?

------
papito
There is a company called Replika that makes a convincing chat partner, but
37% of the lines are still scripted.

In other news, ELIZA was created in the 60s.

[https://en.wikipedia.org/wiki/ELIZA](https://en.wikipedia.org/wiki/ELIZA)

------
Invictus0
Improving chatbots today is like putting makeup on a face that doesn't exist.
It's all just a game to see how long the user can go before realizing that
they are being duped.

------
pmarreck
This is far less interesting than either a demo or some code we could play
with! (But still very interesting)

------
buboard
It’s amazing what transformers do to the structure of language. Attention
really is key to understanding how our brain processes language. I hope these
models can inspire future work in the neuroscience of language.

------
firefoxd
> My favorite show is Star Trek.

What does it mean when a chatbot says that? Does it mean that it was fed a
bunch of shows and this one resonated (computed)? Or is it a filler until you
ask it a question it can help with?

~~~
eklavya
Possibly this is the most occurring theme in its training dataset text.

------
0xD15E45E
So where did Google get 8.5x as much conversational data as the OpenAI set?
Reading text messages? Reading instant messages sent over their platform?
Gmail? Google Voice? Google Plus?

------
aaron695
> Modern conversational agents (chatbots) tend to be highly specialized

Name one chatbot?

I've never seen one other than a lame choose-your-own-adventure style path.

I feel like the emperor has no clothes.

------
billconan
what’s the practical use of this model? For example, can I assign some domain
knowledge or some area of specialty to the bot, so that it can be a
question-answering machine? Or can I basically not control its reaction?

------
ladon86
Let us talk to it!

------
bluedays
Cool, how do I talk to it?

I see these big announcements with nothing behind them but a white paper and
no way to reproduce. It's not a fact if there is no falsifiability.

~~~
cyorir
To my knowledge, you can't talk to it. It looks like so far they've released
sample conversations with Meena[0], and not much else. They've given a rough
description of the model architecture and how they trained it, but good luck
trying to replicate their 2.6B parameter model unless you can afford a lot of
compute.

[0]: [https://github.com/google-research/google-
research/tree/mast...](https://github.com/google-research/google-
research/tree/master/meena)

~~~
bluedays
The burden of proof lies with Google, not with the scientific community. I
cannot reproduce a scientific model if the data used for that model is
squirreled away. Google has claimed so many times that they have cracked the
nut on AI but has never allowed anyone to see real evidence. It makes me
wonder if this is just a way to pump their stock.

------
allovernow
In my experience, getting a bot to chat about bullshit is not a particularly
useful pursuit in itself. There are far more useful ML paths to explore, like
guided, interactive decisionmaking in a much more focused information field.
And I think these naturally will lead to authentic conversation.

But to truly converse beyond an essentially overfit learned response domain
requires real world knowledge(1) and learned relationships/heuristic
simulation(2) in combination with decision making, and our nets aren't quite
complex enough to get 1 and 2 represented sufficiently well (need lots more
memory), while the decision making algorithm/architecture hasn't been
developed yet...

But the pieces are all available for assembly and industry is getting pretty
close. If hardware continues to scale at a similar pace we can probably expect
true AI in some form in the next hundred or so years. It won't initially be
very human but if it is purposed (as it undoubtedly will be) to produce
improved designs, by that point progress will effectively be exponential and
probably impossible to predict.

All of this is thanks, I reluctantly admit, to pioneering teams at places like
Google and fb and Microsoft, and the open access nature of arxiv. I think, in
proportion to its emerging complexity and value, machine learning may be one
of the most quickly growing fields ever to exist. Humanity is quickly
approaching a new era.

~~~
james_s_tayler
You also need opinions at some point.

