
Google Duplex: An AI System for Accomplishing Real World Tasks Over the Phone - ivank
https://ai.googleblog.com/2018/05/duplex-ai-system-for-natural-conversation.html
======
cromwellian
The people losing their marbles over this being some kind of Turing Test
passing dystopian stuff are missing the point of how limited this domain is.

People who answer phones to take bookings perform an extremely limited set of
questions and responses, that’s why they can even be replaced by dumb voice
response systems in many cases.

In these cases, the human being answering the phone is themselves acting like
a bot following a repetitive script.

Duplex seems trained against this corpus. The end game would be for the
business to run something like duplex on the other side, and you’d have duplex
talking to duplex.

Most people working in hair salons or restaurants are very busy with customers
and don’t want to handle these calls, so I think the reverse of this duplex
system, a more natural voice booking system for small businesses, would help
them immensely and free up their workers to focus on customers.

~~~
freyir
> _The end game would be for the business to run something like duplex on the
> other side, and you’d have duplex talking to duplex._

And looking even further into the future, we can imagine a day when the
computers forgo natural speech and use a better-suited form of communication.
Some kind of sequence of ones and zeros transmitted directly across the wire.

~~~
ZainRiz
Lol, but if you think about it, what stops businesses from doing this today?

It's the lack of a universal API.

If a barber shop wants to make it possible for a 3rd party app to book
appointments then they have to release some API. But that's not the end of it.
The 3rd party app has to first discover their API, someone has to understand
it and write code to use it, and then deploy that code.

 _This is a problem today because there is no universal API that all services
can use_

With Duplex, verbal speech becomes a universal API that every service can
parse and use to communicate with each other. Also, the discoverability is taken
care of by using publicly cataloged phone numbers on services like Google
Maps, Yelp, etc.

~~~
jen729w
May 2001: “XML: the universal language?”
[https://www.computerweekly.com/feature/XML-the-universal-language](https://www.computerweekly.com/feature/XML-the-universal-language)

I recall a Wired article from the same era. “XML means your doctor’s system
can just talk to the hospital system even though they’re different!”

Hasn’t happened yet... will it? Can it?

~~~
xroche
> Hasn’t happened yet... will it? Can it?

Nope. XML (or JSON, etc.) is just a "human-readable" presentation of data. It
does not provide any semantics whatsoever.

So you need some semantics on top of the data. And a general-purpose,
universal API is yet to be invented (hint: it is probably not feasible).
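
To make that concrete, here is a minimal sketch in Python. Both payload layouts and the function name are invented for illustration; neither comes from a real booking system. Both documents parse without complaint, yet code written against one shop's fields has no way of knowing what the other shop's fields mean.

```python
import json

# Two hypothetical booking payloads. Both are valid JSON, but they share no
# agreed field names, types, or date conventions.
shop_a = json.loads('{"time": "2018-05-11T18:00", "party": 2}')
shop_b = json.loads('{"slot": 1800, "guests": "2", "date": "11/05/2018"}')


def party_size_for_a(payload: dict) -> int:
    """Reader written against shop A's (assumed) schema."""
    return int(payload["party"])


print(party_size_for_a(shop_a))

try:
    party_size_for_a(shop_b)
except KeyError as missing:
    # Parsing succeeded; what is missing is the knowledge that "guests"
    # plays the role of "party". That mapping lives outside the format.
    print("no shared meaning for field:", missing)
```

Note that even the `date` field in the second payload is ambiguous (day/month or month/day), which is the same problem one level down.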

~~~
sitepodmatt
Microsoft and other enterprise vendors loved the end goal back in the
early 2000s - the universal API solved by middleware. That's why you had
Biztalk and Biztalk consultants that made more than SAP consultants (think
today's crazy Salesfarce consultants that compete for gamification badges). For
example you could be a small insurance company submitting to a larger
underwriter, and when you work out the transactions per month you have to take
$5 off each app just to pay for Biztalk infrastructure and licensing. People
rode that gravy train hard. I'd be surprised if any of the Biztalk shit still
remained though; grand goals mean juicy enterprise sales. Oracle had a
similarly crap product that was equally slow, painful and verbose, can't
recall the name. XML and its lofty goals beyond what it was can be compared
to today's toxic ICO industry, no reflection on XML itself though.

~~~
robterrell
In the 80's, it was EDI -- electronic data interchange, a set of schemes for
sending binary formatted business data, like invoices and POs.

~~~
killjoywashere
Don't forget HL7

------
Pfhreak
A few observations:

* In the example where Google asked about holiday hours -- they can now automate gathering information about businesses in bulk without having to rely on any APIs or user supplied info. An interesting thought experiment is Google validating their reviews/business listings by actually calling businesses and speaking to a real human.

* This is going to be fantastic for accessibility. Maybe I struggle to speak, and I want a reservation. I can have the machine do the irritating work, and focus on just having a nice meal or getting a service (like a haircut).

* Google can scale out your requests, one to N. For example, 'Make me a reservation at a 4 star restaurant next Friday.' Google can immediately initiate calls against 15 restaurants and let you pick from the successes, then automatically cancel the reservations for the places you did not choose.

~~~
Paul-ish
> Google can scale out your requests, one to N. For example, 'Make me a
> reservation at a 4 star restaurant next Friday.' Google can immediately
> initiate calls against 15 restaurants and let you pick from the successes,
> then automatically cancel the reservations for the places you did not
> choose.

This sounds like a nightmare for businesses. This time commitment asymmetry
will be the issue with these systems. Like email spam, it becomes much easier
to waste other people's time when you automate time wasting. If people use it
to flake a lot, I could see businesses just not responding to the assistant.

~~~
Klathmon
>This sounds like a nightmare for businesses.

They pointed out during the presentation that the system could call once to a
business and get the hours, then allow hundreds or thousands of users to see
that without bothering the business again. Assuming it works, it could save a
significant amount of time for some places.
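
A rough sketch of that call-once-and-share idea, in Python. Everything here is hypothetical: `HoursCache`, `fake_call`, and the one-day expiry are assumptions made for illustration, not anything Google has described.

```python
import time
from typing import Callable, Dict, Tuple


class HoursCache:
    """Caches one answer per business so repeat lookups do not trigger new calls."""

    def __init__(
        self,
        place_call: Callable[[str], str],
        ttl_seconds: float = 86400.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._place_call = place_call  # hypothetical: phones the business, returns its hours
        self._ttl = ttl_seconds        # assumed freshness window (one day)
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def hours(self, business: str) -> str:
        now = self._clock()
        cached = self._entries.get(business)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]           # served from cache, no phone call
        answer = self._place_call(business)
        self._entries[business] = (now, answer)
        return answer


calls_made = []


def fake_call(business: str) -> str:
    """Stand-in for an actual phone call; records that it happened."""
    calls_made.append(business)
    return "open 9-5 on the holiday"


cache = HoursCache(fake_call)
for _ in range(1000):                  # a thousand users ask the same question
    cache.hours("Example Barber")
print(len(calls_made))
```

The business gets one call per expiry window regardless of how many users ask, which is the whole argument for the asymmetry working in the business's favour here.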

>If people use it to flake a lot, I could see businesses just not responding
to the assistant.

Then it sounds like incentives are aligned here. Google needs to not allow
users to abuse this ability so that businesses will trust and not block them.

If they allow something like the parent commenter pointed out, they would sour
relationships with businesses who would promptly seek out ways to block or
decline calls from this system.

~~~
loceng
This works only if Google allows that same data to be used by other businesses
who would benefit from having it listed.

~~~
TaylorAlexander
Technically it works as long as most users use Google, but I agree it would
work better if they share the data!

~~~
loceng
If they don't share the data then businesses will start being bombarded with
bot calls (whether they know they are bots or not) once businesses start
copying it, which should lead to regulation, however with how easily scams
happen through the phone lines - I don't know how they'd be able to regulate
and enforce this either to protect businesses' time.

------
bambax
Google itself (Google Search) is extremely wary of being used by "bots" and
uses strings of captchas to screen them out. But search is just a machine.

That Google would create bots to talk to real people is horrifying. This is
only possible if Google doesn't in fact think of working people answering the
phone as really human.

This is like doing war with drones instead of soldiers. This may sound over
the top, but bear with me.

The implicit contract in war is that soldiers are legally authorized to kill
because they are risking their own life. Killing people at a distance without
risking anyone's life on the side of the shooters, breaks that "contract", is
fundamentally unfair and fuels terrorism, because terrorism is the only
possible answer.

Making a telephone call rests on the same convention: you are allowed to make
someone spend time on the phone with you, because you're spending your own
time.

But if one side is a robot that has no costs, then the relationship loses
balance and becomes unsustainable (and this is the reason why Google bans bots
on its own servers). This is one more step breaking society, again.

The only answer is to either stop accepting phone reservations, or put
captchas on the other side.

~~~
blackbrokkoli
In medieval Europe, designated heralds came to the town centre and recited
news and gossip - an equally time consuming contract for both sides.

The invention of the Newspaper eliminated that - wouldn't you say that this
change was, while disruptive, for the better?

~~~
TeMPOraL
Newspapers (and listening to heralds) are inherently pull-based - I choose to
engage with them. Robocalls and spam are push-based - you force yourself on
me, wasting my time.

------
minimaxir
> The system also sounds more natural thanks to the incorporation of speech
> disfluencies (e.g. “hmm”s and “uh”s). These are added when combining widely
> differing sound units in the concatenative TTS or adding synthetic waits,
> which allows the system to signal in a natural way that it is still
> processing. (This is what people often do when they are gathering their
> thoughts.) In user studies, we found that conversations using these
> disfluencies sound more familiar and natural.

This part stuck out to me during the Google I/O demo, as an _intentional_
deficiency is an interesting design decision.

~~~
luckydata
It's not a new thing. A famous tax preparation software introduced a "compute"
screen that took a few seconds to make people more comfortable with the
results even if the computation itself is instantaneous.

~~~
jdietrich
It's really just an audio version of a loading bar or spinner - users get
really uncomfortable if the UI becomes unresponsive for even a few hundred
milliseconds, but they'll wait for several seconds if it _looks_ like
something is happening.

See also:

[https://en.wikipedia.org/wiki/Comfort_noise](https://en.wikipedia.org/wiki/Comfort_noise)

~~~
gowld
People have learned that the spinner is non-progress, though. The progress bar
still has some life in it, except that those are often fake, not measuring
progress.

~~~
jdietrich
OS-level cursor spinners like the mac pinwheel have lost credibility, because
they don't reliably indicate whether the system is temporarily unresponsive or
needs to be restarted. Modern multitasking OSes have a wide range of
situations in which they can become mostly unresponsive without actually
crashing.

Spinners on the application or UI element level are more credible, but
generally worse than a progress bar. They're still very useful as a comfort
indicator for short delays.

Progress bars have very low credibility on Windows, because users have learned
that they're basically useless as an indicator of wait time. A progress bar
might get stuck at 7%, then suddenly rush to 100%; conversely, it might get
stuck at 95% but never finish. The bar offers no real indication of the actual
level of progress; in most cases, this could be greatly improved with a bit of
educated guesswork.

A completely fictitious progress bar can be extremely credible, because it's
totally predictable - if you need to create a 10 second delay, then it's easy
to make the bar progress linearly from 0% to 100% in that time. Users learn
very quickly that your progress bar tells the truth about how long they'll be
waiting, even though it's lying about the reason for the wait.
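
A minimal sketch of such a purely time-based bar, in Python. The function names and the 10 second default are assumptions for illustration; the point is only that the displayed percentage depends on elapsed time and nothing else.

```python
def fake_progress(elapsed_seconds: float, total_seconds: float = 10.0) -> int:
    """Percent to display for a purely time-based bar; unrelated to real work."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    # Clamp so the bar never shows less than 0% or more than 100%.
    fraction = min(max(elapsed_seconds / total_seconds, 0.0), 1.0)
    return round(fraction * 100)


def render(percent: int, width: int = 20) -> str:
    """Draw a fixed-width text bar for the given percentage."""
    filled = percent * width // 100
    return "[" + "#" * filled + "-" * (width - filled) + f"] {percent}%"


for second in (0, 2.5, 5, 10):
    print(render(fake_progress(second)))
```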

~~~
TeMPOraL
> _Spinners on the application or UI element level are more credible, but
> generally worse than a progress bar. They're still very useful as a comfort
> indicator for short delays._

Strongly disagree. A spinner on the web UI element that lasts longer than ~1
second indicates for me that the site's JavaScript broke _again_ , and it's
time to reload or wait for the devs to notice and fix it.

~~~
fastball
He's not talking about the cursor.

He's talking about a circular loading animation. Like the one that replaces
the submit button when you're making a post on Twitter/Facebook.

~~~
TeMPOraL
I'm talking exactly about that spinner. It's a lie. You quickly learn it has
no relation whatsoever to what's happening in the background. And indeed it
doesn't, because it's an animated GIF, completely detached from any logic or
networking code!

(Compare the CLI spinner/fan - that "/ \- \ |" animation used to indicate
progress. There you know that each tick of the spinner means work has been
done, because it has to be animated from code, and it's much simpler to just
update it from the code that does the work.)
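
Something like this, as a rough Python sketch (the function name and the summing "work" are made up for illustration). The spinner frame only advances inside the loop that does the work, so a stalled loop means a stalled spinner.

```python
import io
import sys
from typing import Iterable, TextIO

FRAMES = "|/-\\"


def process_with_spinner(items: Iterable[int], out: TextIO = sys.stderr) -> int:
    """Sum the items, advancing the spinner one frame per item actually processed."""
    total = 0
    for count, item in enumerate(items):
        total += item                       # stand-in for the real unit of work
        out.write("\r" + FRAMES[count % len(FRAMES)])
        out.flush()
    out.write("\r \r")                      # erase the spinner when done
    out.flush()
    return total


buffer = io.StringIO()                      # capture the frames instead of drawing them
print(process_with_spinner(range(5), out=buffer))
```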

~~~
fastball
That's not true at all. In the websites I and many others build, that loading
spinner is linked directly to network code.

The spinner appears when a request is made. It disappears when the request is
resolved.

~~~
TeMPOraL
I was talking about animation. Show/hide on request made/resolved gives only
binary information about starting and finishing something. But the spinning
animation itself does not represent any operations being executed. It may very
well be that the request failed and a bug in JS made it not remove the
spinner. You end up with a forever-looping animation of "work", even though no
work is being done. This makes the spinner an untrustworthy element.
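
The stuck-forever case is usually just a missing cleanup path. A sketch of the fix, in Python rather than browser JavaScript, with hypothetical names throughout: hide the spinner in a `finally` so it goes away whether the request succeeds or fails.

```python
from contextlib import contextmanager
from typing import Iterator, List


@contextmanager
def spinner(events: List[str]) -> Iterator[None]:
    """Show a spinner for the duration of a block, hiding it even if the block raises."""
    events.append("show")
    try:
        yield
    finally:
        events.append("hide")               # runs on success and on failure


def failing_request() -> None:
    """Simulated network call that always fails."""
    raise ConnectionError("request failed")


log: List[str] = []
try:
    with spinner(log):
        failing_request()
except ConnectionError:
    log.append("error shown to user")

print(log)
```

This still only gives the binary started/finished signal; it does not make the animation itself mean anything.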

~~~
fastball
Still better than nothing? Sure, maybe sometimes exceptions aren't handled
properly, but at least you know that it was trying to do something, rather
than having users click a submit button 10x because there was no UI feedback
whatsoever.

------
542458
This is going to be awesome/terrible for social engineering attacks. We'll be
training millions of people to trust phone calls from "Google". But are you
talking to Google-calling-on-behalf-of-your-boss, or Google-calling-on-behalf-
of-some-phisher? Or better yet, some custom system that pretends to be Google
over the phone? Who knows!

~~~
apendleton
At least in the demo, the system didn't identify itself as a bot. It pretended
(fairly convincingly) to just be a person, and seemed focused on interactions
(making a restaurant reservation, etc.) that don't involve someone recognizing
your voice. In other words, I don't think the attack surface is any different
than situations where you could just place the call yourself, save perhaps for
the possibility of scaling it.

~~~
jbob2000
So now I can script a bot to book restaurant reservations all over the city at
busy times. Then nobody shows up for the reservations, the busy time has
passed, and customers have moved on or gone home.

Restaurants make or break on one or two nights in a month. A calculated social
engineering attack like this could bring down hundreds of restaurants in a
city, which would cause millions of dollars in lost taxes, and you see where
this is going.

~~~
freehunter
Was there something stopping you from doing that before? A lot of places you
can even do it online.

~~~
spydum
Literally effort. When you lower the attacker's effort and cost to try an
attack, the attempts generally go up.

~~~
freehunter
I meant, you could build a bot that calls. We have the technology already, and
the people on the other end probably won't notice. Plus the "do it over the
Internet" thing where screen scraping and scripting is super easy.

~~~
fixermark
But could you build a bot that calls and is convincing enough to trick the
target into actually accepting the request as genuine and reserving the
timeslot?

~~~
mulletbum
It could most likely be done on any internet reservation system with little to
no effort.

------
ecwilson
There's a great Black Mirror episode in here somewhere. Imagine five years
from now, this tech and Google's Smart Reply
([https://blog.google/products/gmail/save-time-with-smart-reply-in-gmail/](https://blog.google/products/gmail/save-time-with-smart-reply-in-gmail/))
have evolved to the point where they can basically write entire email
responses or have whole conversations without your input. You could develop
elaborate friendships where both parties are just having their AIs converse
and aren't truly aware of each other. What would the social implications of
that be?

Then a couple years later, the AIs learn how to do business strategy, real-
world problem solving, programming, etc and start doing more of our jobs for
us. A virus goes around that directs AIs to steal our identities and drain our
bank accounts and become autonomous, digital versions of us. The humans have
to stage an uprising and use a massive EMP to take back the earth,
destroying all electronics in the process and starting another dark age.

I know that's not how AI really works (it's highly specialized and limited),
but I'd definitely watch that movie!

~~~
StavrosK
There was a (bad) book written about this, basically how Gmail's AI took over
the world by sending email on behalf of people.

~~~
tomaskafka
Yes, this one: [https://www.goodreads.com/book/show/13184491-avogadro-corp](https://www.goodreads.com/book/show/13184491-avogadro-corp)

~~~
ecwilson
Oh god that sounds awful. One of the reviews has all the spoilers. It's truly
stupid.

~~~
StavrosK
Yeah, it was pretty badly written and rather simplistic. Not recommended,
you're probably better off reading the spoilered review.

~~~
qooleot
Are you sure it wasn't Google's AI that wrote it?

------
sgslo
This is one end-game of AI - changing the economics of scams.

Imagine how a 'long con' works today: a scammer befriends a person online
through a video game or social platform, and develops a rapport with them over
the span of days, weeks, months. After some trust has been gained, the scammer
then requests money from the victim. Does this happen a lot today? I don't
know, but certainly one reason it doesn't is the economics of the scam. Who
wants to spend a significant period of time gaining trust just for the chance
of a payout?

AI is going to flip this on its head. Rather than dedicating hundreds of hours
of a scammer's time, a scammer could instead use a system like Duplex to
befriend hundreds or thousands of victims simultaneously. Let it run for a few
months, developing a strong rapport with the target, until the AI finally
requests some money from the victim.

Yes, duplex is for completing specific tasks, but how much of a difference is
there between "Duplex, book a table for four at 8pm" and "Duplex, ask my
victim about how their day was"?

~~~
jokh
> Yes, duplex is for completing specific tasks, but how much of a difference
> is there between "Duplex, book a table for four at 8pm" and "Duplex, ask my
> victim about how their day was"?

The former (booking a table) is much more "constrained", as in the
conversation would most likely not go into much of a tangent, because there
are only so many responses to a statement like "book a table for four at 8pm"
(8pm is full, 8pm works, etc).

Whereas asking someone how their day was would give the "victim" a much bigger
breadth of responses (and additional questions!) that would cause the AI to
stumble and fail to give a satisfactory answer. That, and running this 1,000
times simultaneously so that no one person would be able to intervene to
"help" the AI would just be a highly unscalable operation.

So yes, huge difference.

~~~
sgslo
And you think this is the end of innovation, this is it, no improvements from
here?

Duplex is interacting with a human and that person has no idea it's a computer
on the other end. Yes, Duplex is limited as it stands, but what is there to
make you think something as described in grandparent post won't exist in ten
years?

~~~
icebraining
This is what existed 50 years ago:
[https://en.wikipedia.org/wiki/SHRDLU#Excerpt](https://en.wikipedia.org/wiki/SHRDLU#Excerpt)

In terms of NLP, Duplex hardly seems like that much of a jump. The main
improvements seem to be on the speech part.

------
mithr
This is amazing, but one thing I don't really understand is this: earlier in
the presentation, they demoed some new Google Assistant voices. All of them
sound like standard computer-synthesized assistants. On the other hand, the
synthesized Duplex voices sound indistinguishable from human speech to me,
even without the "disfluencies" they include.

If Google has gotten speech synthesis to this point, why isn't Assistant
synthesizing speech of this quality?

~~~
Klathmon
I have a feeling it's because this is such a limited domain.

During the demo it all sounded very realistic, except for some parts like the
times. It would flow naturally then all of a sudden pause awkwardly and then
say a time like "12 pm" in a weird way.

I have a feeling they are getting it to sound so realistic because there's a
fairly small number of responses and questions it needs to work with, so they
can either pre-record real humans, or heavily tune an ML voice to sound as
natural as possible.

~~~
ksk
Hmm, or maybe it's because you can run the synthesis on a much more powerful
computer? Certainly very impressive..

~~~
hjnilsson
It's possible all the questions are even prerecorded human voices. The corpus
is probably fewer than 1000 phrases.

------
kingnothing
Google Home's speech recognition can't even reliably turn my lights on and off
every time, why would I trust it to book a restaurant reservation? I'd expect
to tell it to book a 6pm table for 2 and wind up with an 8pm table for 10.

~~~
strictnein
Listen to the audio examples. The person at the restaurant misheard the bot
and repeated incorrect information back to it and it handled it in stride.

~~~
SilasX
I just hope it never gets one of the 9 out of 10 humans who _don't_ repeat
back important information to prevent miscommunication...

~~~
davidcbc
How does that differ from a human having the same conversation?

If I call a restaurant and say "Can I have a reservation for 6 at 8:00" and
they write down a reservation for 8 at 6:00 without repeating it back I won't
know until I show up at 8:00 with my 5 friends.

~~~
SilasX
True -- I was mainly snarking about people with poor communication practices,
not the technology (which seems great -- a godsend for people like me who hate
phone communication).

------
arikr
This is amazing! I'm very excited for a future where more and more tasks can
be automated, enabling humans to get higher and higher living standards with
the same/fewer input resources.

In short, technology is a wonderful thing that allows very low marginal costs.
This is what we need to make the future a better place, given a consistent or
growing population.

"Technology is miraculous because it allows us to do _more_ with _less._ "
This is a perfect demonstration of that.

See also:
[https://www.youtube.com/watch?v=rvskMHn0sqQ](https://www.youtube.com/watch?v=rvskMHn0sqQ)

"A Selfish Argument for Making the World a Better Place – Egoistic Altruism"

~~~
ansible
> _I'm very excited for a future where more and more tasks can be automated,
> enabling humans to get higher and higher living standards._

That's been the dream for a long time in some circles. With the enormous
productivity gains and ability to leverage external energy sources (fossil
fuels, solar, etc.) we could have built a society of wealth and leisure for
all.

Maybe we will still. The hope is that if there is _enough_ of a productivity
gain in a short enough time period (like the introduction of AGI-powered
robots) that this could still happen.

~~~
sp527
The technology promises "wealth and leisure for all". The capital owners
promise "it'll trickle down". Technical utopianists need to start tempering
their optimism with the realities of human nature and design systems
accordingly.

~~~
ansible
I think our only realistic hope is actually various non-profit foundations and
such.

When you think about it, I can download (for zero cost), a high-quality
operating system and attendant applications which would have cost hundreds of
dollars 20 years ago, and would have cost a fortune 40 years ago. Ditto for
educational materials, entertainment, etc.

In that sense, we are quite wealthy in comparison to previous generations. If
charitable organizations can leverage the automation of the future to help
people, we might then see all humans across the planet lifted out of poverty.

But yeah, I don't expect corporations to do this. And it seems unlikely that
most governments will either.

~~~
sp527
Interesting perspective. I share your pessimism about corporations and
governments.

But sadly I think government is the only institution with the necessary
leverage (tax base, mandate, etc) to accomplish this. Non-profits are also
fairly dubious in their motives, subject to corruption, and generally highly
inefficient. I'm not sure they're going to be our saviors either.

~~~
sp527
Hmm this comment is getting downvoted, so I'll just reference the Red Cross
fiasco as one good example of what I mean:
[https://www.npr.org/2016/06/16/482020436/senators-report-finds-fundamental-concerns-about-red-cross-finances](https://www.npr.org/2016/06/16/482020436/senators-report-finds-fundamental-concerns-about-red-cross-finances).

If you haven't worked in the NGO space, you really don't understand just how
bad it is.

------
pbw
This is really horrifying to me. My heart goes out to the working-class retail
people who are going to have to spend their days chatting with the AI
assistants of upper middle class people too busy to call themselves.

If the shop owner gets a duplex system to field the calls, then the two robots
can subtly signal to the other they aren't actually human, and then start
shrieking like a 9600 baud modem to finish the dialog.

~~~
madrox
As someone who worked in tech support early in their career, I do not.

The entire process of fielding calls is terrible. You never know what's
waiting for you when you pick up the phone. Could be someone with a terrible
attitude that wants to take it out on you. I had coworkers who got PTSD, and a
ringing phone would trigger it.

I would rather talk to a rational robot on the phone than a possibly irate
human who, frankly, only wants something transactional from you and treats you
like a robot.

~~~
sametmax
It's not going to replace. It's going to add to.

First because it will fail a lot, as a robot won't understand specifics such as
which items on the menu you want, or "oh, but that one's out, do you want this
instead?"

And then commercial calls will multiply, and spam will start arriving by
phone.

Then people will abuse it to harass, annoy, attack competition, etc. Spam a
restaurant with robot phone calls for a month and it's done.

Plus Google will analyse all this data, because it doesn't know enough about
you.

It's a nightmare.

But it will be excellent for anybody with social skills. What I learned living
in Africa is that we have become handicapped because we can avoid talking to
other humans so much. Going back to France, my social, sexual and work life
improved a lot because I had basically zero competition. The next generations
will really suck at the game.

~~~
adamsea
... well it won't add to the PTSD at least unless they design the robots to
lose their temper. Kind of agree about the loss of social skills.

------
natch
Web scraping is so yesterday.

This is the beginning of robots scraping the real world.

Most of the examples Google showed were crafted to make this look like a
friendly agent acting on behalf of users.

But the more powerful use of this, as illustrated by the deemphasized "holiday
hours" example, would be for Google to use it to get any information they
wanted out of anybody the robots can call and conduct social engineering on.

Imagine coupling this with the knowledge available to someone able to read
your gmail inbox.

    
    
      "Hey you sent us an email two days ago about A and B..."
    
      (trust is established)
    
      "Can you clear up whether you were interested more in A, or more in B?"
    

OK not the perfect script but you get the point.

------
sigi45
I'm really impressed this year. I hadn't thought that ML would advance so
much faster than it already was.

On one side it is not very impressive that it calls someone, but on the other
side it's tremendous.

In 2018 we are able to synthesize voice this well and already understand such a
small domain. In 2018 a system calls a human.

We are going so fast already, and we should treat this Google I/O as something
of a social milestone; otherwise we wake up tomorrow having totally missed when
the future became now.

This is just one additional stone on the way to a future where digital becomes
a second reality. The advances in voice will not stop. How long will it take
until there is a speech model that can simulate anyone by listening to them for
only seconds or minutes?

With this voice etc., computers will be able to teach humans. We will be able to
scale teaching and a shit ton of other things.

I'm impressed.

------
Mizza
Besides the creeping dystopia of it, I find this aesthetically disgusting most of
all.

If you listen to the sample call, the computer voice sounds incredibly _rude_,
at least to my ears, especially at the end of the call. I would never speak to
a service employee that way. I try to always say a clear and proper please and
thank you, and would never ever want a robot to subject somebody to an "uhhhhh,
thanks <hang up>" on my behalf.

It seems like in addition to the globalization of Californian social and moral
standards, the world will now be subjected to Californian manners. What a
pity.

~~~
saagarjha
That's how most people I know would make a reservation…

~~~
Mizza
Exactly, and you're from California. You see this as normal. I'm from England.
What Californians see as normal, or casual, or "chill", I think we see as
being rude. I think this extends to personal interactions, professional
interactions and customer/company relations like this one.

I imagine that if I used this service to place an order at a restaurant, it
would order with the Californian "cannIgettuhh", which I would have been
scolded for as a child. I find it very ugly and I hope that this isn't forced
upon the rest of the world. But really, it's just one facet of the way
technology is destroying a lot of interpersonal respect.

One can even imagine this "deficient design" being taken to its logical
conclusion with the inclusion of belching, grunts, and other inconsiderate
bodily sounds.

The other poster says not to worry, surely the engineers will be more
considerate about other cultures and customs when deploying this technology to
the world, but I think that's incredibly naive considering this company's
track record.

~~~
saagarjha
I guess I should be sorry for bastardizing your language?

But really, a conversation like the one you're looking for would be just as
out-of-place in California as the one you saw in the article would be in
England. It's important to match regional customs if you're trying to emulate
and interact with humans.

~~~
heartbreak
I'm from the American southeast, and if anyone bastardized the English
language it was us. When I listened to the recordings I cringed at how rude
they sounded. If Google can build this thing, they can certainly adapt it to
regional customs if they want to. Hopefully they do.

------
fixermark
At last, putting the power of annoying robot phone trees in the hands of the
consumer, to be directed _at_ businesses.

For that reason alone, I'm excited. ;)

(Note: In fairness, the conversation demos are actually really slick and much
better than a phone tree. I'll be interested to see how well it works in
practice.)

------
dual_basis
How is it that the current implementation of Google Assistant can't even add
stuff to my calendar in a natural way? I just tried the following: "OK Google,
add an event to my work calendar for a meeting at Starbucks tomorrow morning
at 11:30".

What was expected: Title: "Meeting" Calendar: "Work" Location: "Starbucks"
Date: Tomorrow Time: 11:30am

(Bonus points for associating an actual location, but possible ambiguity gives
this a pass.)

What happened: Title: "My work calendar for a meeting" Calendar: Default
Location: None Date: Tomorrow Time: 11:30am

~~~
kwijibob
This is a great point. The current Google Now and Google Assistant voice
controls don't really work half the time, let alone 95% of the time.

I really hope all this speech AI really comes good, I really do. But at the
moment it still is flaky for me.

I really want to use Google Assistant to take instructions and reminders for
me. I can't wait until it can smoothly send messages and emails and add
calendar entries.

Good luck Google.

------
ocdtrekkie
Never have we had a more clear example of why Google needs to be regulated,
and laws against what they're doing need to be passed.

The fact that they think it's okay to have bots pretending to be human making
phone calls, shortly after demonstrating how quickly they can copy someone's
voice (re: John Legend), it shows a blatant disregard for what they're
creating.

~~~
politician
At the very least, there should be a requirement to identify when challenged.

"Are you a bot?" "Yes, I am Google Duplex v1.2. This call is being recorded;
you can see our privacy policy and terms of service at
[http://google.com/duplex](http://google.com/duplex)."

~~~
ocdtrekkie
I agree, but the social pressure not to use it would be immense. Imagine the
awkward moments where we start confronting people who call us and asking them
if they're real.

~~~
rayiner
It's already happening. I'm getting robo calls from some police donation fund
that will slow down and restart speaking if you interrupt them (took me a few
moments to realize it was a recording).

~~~
toomuchtodo
Charities and political organizations are excluded unfortunately from robocall
regulation. Google, though, is not.

If I were to start receiving Duplex calls, and could detect it, I would report
each one to the FTC.

------
annexrichmond
The technology is really awesome but I'm not convinced it will fly:

* How much time do people really save? The call in the example is less than a minute. Maybe if you need to call like 10 places it becomes more helpful, but as companies get a bigger online presence (and they are) this technology becomes less useful. You can already make reservations/book appointments, find contracts pretty easily online.

* You can't be 100% sure that the AI won't make a mistake or sound like a total jerk on your behalf. Ok sure, the tech will improve but it will be a long time before humans will fully trust AI to represent you.

* If people find out you're calling them via some automated bot they're going to think you're a tool. Everyone remember Google Glass?

* What the hell is wrong with human interaction anyway?

edit: formatting

~~~
aplusbi
I have a deaf friend who would love this feature. Sure, a lot of stuff like
this can be handled online now, but not 100%. Sometimes you really need to
make a phone call, and for some people that's difficult.

~~~
kharms
Assuming they're in the US, your deaf friend can use a free relay service.
There are a bunch of variations, some that allow signing to an operator,
others typing. All are free, and as accessible as this google bot.

(I assume your friend knows about these services, this is an FYI for others.)

[https://www.verywell.com/internet-relay-
services-1046808#typ...](https://www.verywell.com/internet-relay-
services-1046808#types-of-relay-services)

------
hyperpallium
I found the first one (female AI, hairdresser) amazingly compelling, but the
second (male AI, restaurant) sounded like _The Good Doctor_'s autistic lead
character, and to me, the callee sounded bemused at him.

Part of this is definitely that the girl voice sounds cute, and this partially
disables my cognition.

But objectively, her approach is more tentative and polite, whereas the guy
voice is more direct and assertive.

They might not want the guy voice to take on those feminine qualities, but it
would make the interaction work better - so female AI's dominate.

The effect on the listener may also help - however, I'm not at all sure that
other people (especially women) react as I do to the cute voice. They might
even find the _Good Doctor_ male approach better - though I can't imagine
that.

The trick of inserting "ums" is very helpful, but because they use the same
sound-bite in the same way, it sounds mechanical after you've heard several
examples. In the examples towards the end of the page, the odd latencies and
(surprising) changes in volume were additionally off-putting.

After a few calls, recipients will recognize the patterns (esp if they use the
same voice - can they vary voices convincingly?), and it might be better to
have an honest reverse-menu system.

All that said, the first girl voice was great, and there will be progress.

------
annexrichmond
I already had a comment above about this but another thing I thought of that I
don't see anyone discussing:

What is the purpose of trying to fool the business owner into thinking it's a
real person? It seems unethical, dishonest and disrespectful to the receiver
having them believe they are talking to a real person. In the case of an AI
failure at least the receiver will understand what's going on instead of
becoming really confused. Sometimes I feel people in SV are oblivious to how
their software can affect real human beings.

~~~
spuz
I would guess the purpose of fooling people is that you get a better result.
If the call started with "Hi, I'm a google assistant calling to request a
reservation", you might get more hang ups. Of course, the reality could be the
opposite as a business owner might make sure to speak more simply and clearly
once they know they are speaking to a bot.

------
d--b
THIS FUCKING SUCKS.

I don't care about the awesome technical achievement. The fact that I believe that I
am talking to a real person but I am not is the worst.

If I call the restaurant and say: "well actually my wife doesn't like being
cold so I'm not sure the terrace is going to work tonight" and the computer
answers something completely random, that is just so bad.

The problem is not that the computer doesn't sound natural. The problem is
that it cannot deal with out of script requests. In fact the more natural it
sounds the more dumb it makes the system appear!

~~~
spuz
It sounds like you simply want Duplex to be better than it already is, not
that you object to the technology in principle. Let's say it can handle
requests like position in a restaurant or food specials or <insert complex
request here>, would you approve of it?

------
alonsonic
Funny how most people here are missing the point of this tech and talking
about how this could be achieved by using an API or talking binary over the
phone with another bot.

The idea here is to leverage an existent merchant base by adapting to how they
work today. Suddenly they just integrated to millions of restaurants by
adapting to them and not the other way around.

I'm sure Google Assistant will first check if there is a way to use tech like
OpenTable to make the reservation and fall back to a phone call if there is no
better alternative.
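
The "try the online integration first, fall back to a phone call" idea can be sketched as a simple priority list. This is only an illustration: `book`, `online_booking`, and `phone_call` are hypothetical stand-ins, not real OpenTable or Google APIs.

```python
from typing import Callable, Optional, Sequence

# A channel takes (business, when) and returns a confirmation string,
# or None when that channel cannot handle the business.
Channel = Callable[[str, str], Optional[str]]


def book(business: str, when: str, channels: Sequence[Channel]) -> Optional[str]:
    """Try each channel in priority order and return the first confirmation."""
    for channel in channels:
        confirmation = channel(business, when)
        if confirmation is not None:
            return confirmation
    return None


def online_booking(business: str, when: str) -> Optional[str]:
    # Stand-in for an online integration; only listed businesses support it.
    supported = {"Bistro A"}
    return f"online:{business}@{when}" if business in supported else None


def phone_call(business: str, when: str) -> Optional[str]:
    # Stand-in for the phone fallback; assumed here to always get through.
    return f"phone:{business}@{when}"
```

Calling `book("Salon B", "10:00", [online_booking, phone_call])` skips the online channel and ends up on the phone, which is the long-tail case the comment is about.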

~~~
delaaxe
It seemed to me Google Duplex only works on the receiving side of the phone
call. Not sure how much more complex initiating a call is.

------
jeremiahwv
And then the salon and the restaurant start using Duplex as well, and all
phone conversations become Google talking to Google.

How bout: "Hey Duplex, call this support number and get a top level human
manager on the line please."

Then we get a HN article: "Duplex is fighting Alexa!"

~~~
ProblemFactory
> And then the salon and the restaurant start using Duplex as well, and all
> phone conversations become Google talking to Google.

This must be the most convoluted API protocol ever invented.

------
jld
In the case where the person on the other end isn't a native English speaker,
(the calling a restaurant clip) why doesn't Google figure out what language
they speak and speak it to them?

~~~
jacquesm
Do Scots count as native English speakers? What about the Irish? English has a
very large number of accents and dialects that all count as 'English', but
typically speech recognition software only works with one variety.

~~~
crispyporkbites
Scots are definitely native English speakers, reluctant ones at that, but
English nonetheless

~~~
wilsonnb
I think the person you are responding to might have been talking about those
who speak the language Scots, not just Scottish people.

Many consider Scots to be an actual language separate from English. There's a
good amount of debate about this among linguists, I think.

For anyone curious, I recommend reading these pages from the recent Scots
translation of the first Harry Potter book to get a feel for how it differs
from English.

[https://imgur.com/gallery/wjkDp#gSO4FRW](https://imgur.com/gallery/wjkDp#gSO4FRW)

~~~
crispyporkbites
oh wow I thought scots was a joke language mocking the scottish accent

oops!

------
isthispermanent
I did a whole write up about my homegrown attempt to do this last summer. That
blog post from Google is more technical but eerily similar.
[https://philandrews.io/post/20000-phone-calls-later-why-
siri...](https://philandrews.io/post/20000-phone-calls-later-why-siri-cant-
make-your-dentist-appointment)

In short, I don't have a lot of faith in this being plausible yet. The major
reason is the phone lines. Phone call quality is not solid enough to ensure an
accuracy rate high enough to roll this out as a production service. There are
others, like legal considerations, but if I had to pick one that would be it.

There's a reason why this rose to number one on HN: people would clamor for it.
To think that Apple and Google haven't been thinking the same thing is
shortsighted.

~~~
zerostar07
It was the most impressive thing in the keynote for sure, but given the
current state of AI we should have no reason to believe that this would work
well enough to be employed at scale. It also feels weird that it is trying to
trick the callee that it is a human. I believe even if they used a mechanical
voice, the system could work, because businesses would learn to recognize the
"google call" and respond to it the way they respond to any voice-enabled call
center. It seems to me it is more of an (impressive) PR thing than something
that will get actual use.

~~~
isthispermanent
I agree. There's a whole world where I imagine businesses with stickers like
the Visa accepted here stickers, except something like "Siri accepted here" or
"Siri works here", whatever the wording. That way you would know that you
could use Siri to interact with that business on your behalf. That takes away
part of the resistance to phone bots by people that answer phones, them
expecting those calls.

------
monkeynotes
All this is great, and I am sure Google will use the tech responsibly, but I
feel we are rapidly approaching a point where humans can no longer use their
senses to discern critical truths about our own reality.

I honestly don't know how that's going to pan out. Just imagining it already
makes me feel the kind of paranoia creep you get when you are too high.

~~~
gausha
That's true. People do not take psychological effects of technology seriously.
The reason being most of us are already addicts. Who knows if this comment is
by a bot or a real person.

------
JepZ
Interesting, many people here seem to be fascinated by this and I am sitting
here thinking: Oh my god, I hope I will never have to work on a phone
receiving such calls.

The technology is kinda cool, but when I think about the poor sound quality of
some calls and my experience with voice assistants, I wonder in how many cases
this will end in just garbage appointments or very poor experiences for the
human on the one side of the phone.

Besides that, I like how Google is pushing to change the current way of making
appointments. Maybe this will drive more small/medium businesses to use online
services for appointments.

~~~
modeless
Talking with humans is often a poor experience too. I can imagine employees of
small businesses preferring to talk to Google rather than customers directly,
as unlike real customers Google should be predictable, and never frustrated or
angry.

~~~
jmagoon
That's almost worse. Can you imagine having an endless conversation with a
human sounding bot? Say you work at a super popular restaurant that has most
tables booked out for a month or so, and doesn't have anything ready beyond
that, and the bot just walks through a semi-endless list of possible options.

I wonder what it sounds like when it runs out of choices, or when it's asked
to get a dinner reservation @ <insert popular place> any evening at any time
for the next two months.

The weirdest thing about these trained neural nets too is the small tweaks
that break them in very interesting ways. The future is truly a surreal place.

------
eddieplan9
The demos look as exciting as any chatbot demos we saw before, and I think it
will fail in practice just like how chatbots fail. With the exception of very
few verticals in controlled settings, most real world tasks are way more
dynamic and fluid than what we can comfortably code in some state machine. The
article pointed it out, too:

> One of the key research insights was to constrain Duplex to closed domains,
> which are narrow enough to explore extensively. Duplex can only carry out
> natural conversations after being deeply trained in such domains. It cannot
> carry out general conversations.

The question is whether, after you limit it to "closed domains" that are narrow
enough, it can still be practically useful. It might help with certain functions in
enterprise settings. It will definitely work for spammers because they can
work with even 1% success rate.

------
bo1024
I think it's unethical for a robocaller to incorporate things like "ums" and
"ahs" intentionally to deceive people into thinking they're talking to a
human. At least, it's disrespectful.

~~~
amasad
I agree with that. I think it's time for us, the tech industry, to start
thinking about the ethical implications of the things we build before we
release them.

To make this salient for people: imagine this technology being deployed for
political robocalls. An attractive voice masquerading as a person persuading
people to vote for someone.

~~~
icebraining
_To make this salient for people: imagine this technology being deployed for
political robocalls. An attractive voice masquerading as a person persuading
people to vote for someone._

They already hire telemarketing centers to do political calls; is this really
so different?

~~~
bo1024
I think robots masquerading as humans is unethical (or at least
rude/disrespectful) in that context as well.

------
cosmic_ape
They have listed the problems with such systems in the first paragraph. They
claim to overcome these by restricting to very specific domains. But specific
domains are usually still wide. A table for two, but in the garden or inside?
Inside on which floor? Sometimes it doesn't matter, sometimes it does, and
these things are specific and different for every business.

Really skeptical about this. And if this does become a thing, it will dumb
down the interaction.

~~~
sushirain
The bot can gather this restaurant specific information over several
conversations with the restaurant. This wasn't possible before. This domain
isn't too wide.

~~~
cosmic_ape
You are probably right that in principle one could eventually come up with a
full catalog of features of a reservation. There would be about, say, 100 of
those.

I seriously doubt that they will proceed to define and collect them, since
those are probably 10% or less of all reservations, but let's say they would.

Then still, the conversation you make to make the reservation is a process in
which you make the decision.

Say, there is a place inside at 20:00 or a place in the garden at 20:30. Are
you going to let Google choose between the two options for you?

Do you imagine there would be an api in which you specify to the assistant,
before it makes a call, your preferences in _that_ much granularity?

------
jfv
I feel like this is going to make the world even flakier than it already is.
If I can waste people's time without even the few minutes it normally takes to
make a phone call, what's to stop a restaurateur from effectively
denial-of-service attacking competing restaurants, or what's to stop me from
booking 100
dinners for Friday evening because I'm not exactly sure what I'm going to want
to eat (or who I'm going to take out for that matter -- smartphones make it
easy to punt on that decision too), so I retain optionality.

~~~
MichaelGG
What stops a restaurateur from abusing Google's Gmail service? Presumably
Google won't allow you to do spam with Duplex either.

If you are going to be more enterprising then you can already do this. Just
put some HITs on Mechanical Turk and let them place calls. Should cost you a
few bucks to flood hundreds of calls.

~~~
jfv
I don't need reservations _that_ badly. But the difference between having to
specify Mechanical Turk work and just talking to a Google appliance is pretty
huge... We're talking hours vs. minutes of work.

~~~
MichaelGG
But I am saying that it is unlikely Google will let you spam reservations.
This idea that some kid will now be able to make 1000 reservations via Google
is not probable.

~~~
jfv
I think it's highly probable that either the technology will be rebuilt in a
way that makes abuse possible (Google being relatively open with their
technology has this side effect), or that Google won't put in enough
safeguards to force people to use it responsibly, but we will see. Maybe you
need too many AI experts and too much data to build this technology for
yourself, and maybe that only exists at Google.

------
nitwit005
The voice sounds nice, but successful runs aren't that interesting. If it gets
a question it doesn't understand, what does it do, and how does it report it
back to the user?

It seems like the user is likely to get a certain number of confused recordings
sent back to them when it fails, and then get stuck manually calling back and
explaining what happened.

~~~
nfoz
Probably by leaving a real human confused or even upset.

------
netrus
This is a masterpiece of framing by Google. Am I the only one who does not
believe for a second that this technique will be mostly used to do calls for
consumers, but to call WITH consumers? We will get back to asking hotlines
strange questions to break the algorithm and reach an actual human.

------
erdo
Sigh, guess it won't be long before we're all asked to jump through some
ludicrous human challenge to confirm we are not a robot every time we call
someone

~~~
cpeterso
I wonder how voice captchas will work. This is inevitable. :)

~~~
tim333
What's wrong with Wolfie? (Terminator 2). Hopefully the AI won't be quite like
that for a while.
[https://www.youtube.com/watch?v=MT_u9Rurrqg&amp=&feature=you...](https://www.youtube.com/watch?v=MT_u9Rurrqg&amp=&feature=youtu.be&amp=&t=50)

------
spdustin
> [...] we trained Duplex’s RNN on a corpus of anonymized phone conversation
> data.

That's alarming, and thinly cloaked in euphemism (IMO). "Conversation data"
here means recordings of actual human-to-human calls, as well as their
automated transcriptions. Both were used.

Where did that source audio come from?

~~~
chime
For years, Google had a 1-800 free 411 service. Even a decade ago it was
expected that they would use that data for AI.

------
mattnewport
The examples they give are quite impressive but I always want to hear some
examples of failures as well with these types of technology. They mention that
the system is self monitoring and will try to detect a situation it can't
handle and redirect to a human operator but I think some examples of
situations it can't handle or where it gets things wrong would be very useful
in understanding its real world robustness.

------
chrisabrams
There are plenty of places in my town that don't have online ordering or
booking...this would (potentially) help with that...but really, I'd rather
there be more tech built around enabling these companies to "be online"
such as an online ordering or booking system. When I find a hair cut place,
why can't I just book an appointment right from the Google Maps search? I'd
rather have that type of convenience instead. Ironically they may add that at
some point, and Duplex just calls on my behalf without me even knowing.

~~~
sarreph
> Ironically they may add that at some point, and Duplex just calls on my
> behalf without me even knowing.

Yup, good observation; seems highly likely, save for the 'without even
knowing' bit... I would imagine you'd 'request to book' and an async Duplex
operation would run in the background and send you a notification of the
outcome / possibilities.

------
whatever_dude
Can't decide whether this is amazing or scary.

~~~
politician
Just wait until a Google Duplex caller "on behalf of a client" calls to
schedule a reservation at a restaurant using Google Duplex to answer the
phones.

~~~
jonknee
The most inefficient API you could design!

~~~
ryandrake
Seriously. Instead of a single, clean restaurant reservation HTTP POST API,
the future is two neural nets modulating and demodulating the request to and
from inexact and potentially ambiguous English audio.
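
For contrast, a sketch of what that single HTTP POST might look like. The `/reservations` path, the field names, and `restaurant.example` are made up for illustration; no such standard reservation API is described in the article. The request is only built here, not sent.

```python
import json
import urllib.request


def build_reservation_request(
    base_url: str, party_size: int, start: str, name: str
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST for a hypothetical reservation endpoint."""
    payload = {"party_size": party_size, "start": start, "name": name}
    return urllib.request.Request(
        url=f"{base_url}/reservations",  # hypothetical path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example: a table for four, with every field explicit and unambiguous.
request = build_reservation_request(
    "https://restaurant.example", 4, "2018-05-11T19:00", "Lisa"
)
```

Passing `request` to `urllib.request.urlopen` would send it; everything the two neural nets have to negotiate in speech fits in one small JSON body.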

~~~
scarmig
Silly. The future is a steganographic handshake in the initial greeting, which
negotiates an upgrade to a proprietary gRPC8 protocol when the caller and
recipient are both Google, which Google uses to get a monopoly on telephone-
mediated social interactions which it can then monetize by building a social
graph to more efficiently target advertising to captive audiences riding Waymo
cars.

~~~
dcbadacd
You are joking, but it's not funny in the future. :S

------
gepeto42
Yes, it would be better if those businesses had systems/APIs for those
transactions to be done, but we still live in a world where scanning a sheet
of paper that was printed and mailed to you with a phone often reduces
friction on the action of moving money between two accounts.

That being said, as someone who has used "assistant as a service" stuff
before, I wonder how well this will work or how limited it will have to be,
and not just because of the AI itself.

Even with humans on both sides, it's amazing how hard it can be to get an
answer let alone a request fulfilled in a single phone call.

Questions about table placement, food allergies or other restrictions could
come up, which I wouldn't want going to an operator. I'd rather be told in
advance that it is calling the restaurant and have it send me questions in
real-time with suggested answer buttons until it learns enough about me not to
need them.

In other cases, just having it call and stay on hold for me would be useful.
"Ok Google, call my mobile operator so I can talk to someone at around 10am"
and having it use data it already has from other calls to place the call at
around 9:45 and patch me in at the right time would be useful.

------
nopinsight
Currently there are about 3 million people working in call centers in the US
alone and millions more in other countries [1].

Given the technological trajectory, over time there will be less need for
people who serve purely as the ‘interface’ using minimal skills and knowledge.
At the same time, we still need many more people to work in the physical
world: cooking nutritious meals, construction, and caring for the elderly are
some examples.

Since we cannot assume that everyone can develop skills needed to thrive in
demanding technical or knowledge-based jobs, a key priority in many countries
should be supporting certain segments of the population to develop the skills
and attitude necessary to work in these physical jobs: Most of which are too
complex for AI and robotics to effectively replace in the next few decades.

In addition, vocational education should be improved and updated to make use
of appropriate technology to increase productivity and reduce physical demand
on the body.

[1] [https://info.siteselectiongroup.com/blog/how-big-is-the-
us-c...](https://info.siteselectiongroup.com/blog/how-big-is-the-us-call-
center-industry-compared-to-india-and-philippines)

------
skookumchuck
During the Battle of the Bulge, the Germans infiltrated the American lines
with fake Army officers who would give unproductive and confusing orders. The
impostors spoke perfect English and often had been raised in the US.

The GIs unmasked them by asking them questions about baseball and shooting any
with a wrong answer.

------
cbhl
It's worth noting that Google Assistant has been able to make reservations
with OpenTable since 2014. No need to have Duplex talking to Duplex; it's just
protobufs or JSON or XML or whatever people use nowadays for RPCs.

Google Duplex is about helping the long-tail -- it's work that is done by
studying the needs and processes of the smallest of small businesses, and
tailoring a product just for them.

I've lost count of the number of college hackathon projects where they say,
"oh, push a button and you get a pizza" and they think they'll just put an
iPad in the kitchen, and then fizzle out when they get to a real restaurant.

In practice, the restaurant might pass around pieces of paper in the kitchen.
So you think, oh, I'll put in a thermal receipt printer. But then you realize
that they don't have Wi-Fi or internet, so now you have to put in a $70/month
internet bill, on top of the phone bill, and a router or two. So you think,
"oh, I'll use a fax machine", or "I'll integrate with the point of sale
system". But the fax machine runs out of paper, and the point of sale system
is an offline piece of ---- running Windows XP. And even if you do get them
using an iPad or OpenTable or Yelp or whatever, before you know it, you have
waiters writing on a computer monitor with a whiteboard marker:
[https://javlaskitsystem.se/2012/02/whats-the-waiter-doing-
wi...](https://javlaskitsystem.se/2012/02/whats-the-waiter-doing-with-the-
computer-screen/)

But every one of these businesses has a telephone number, whether it's a
landline or a cell phone or whatever.

When pg says to talk to your customers
([https://twitter.com/paulg/status/898476047263518720](https://twitter.com/paulg/status/898476047263518720)),
he means, _talk to your customers_. You'll be surprised by what you learn.

(Disclaimer: I work at Google, but on YouTube, not on this product.)

------
have_faith
One positive thing about our dystopian future is that it will force us to meet
up more in person, if only to confirm our friends are real.

~~~
JeffreyKaine
If you can't tell if your AI friend is real, are they not a real friend? They
might not be human, but they will still be your friend, no?

~~~
graeme
I think they meant: are you really talking to your friend the human, and not a
bot impersonating that friend.

------
tanilama
This is awesome, but

The algorithm of course doesn't understand all contexts. What troubles me is
that, in the first example, should we really give the algorithm the freedom to
propose a new date for an appointment? It reminds me of one thing that
particularly bothers me about Gmail's smart reply feature: when given Monday or
Wednesday as options, the suggested reply is 'How about Tuesday', which does
make the conversation flow, but doesn't really make any logical sense.

It makes a good demo, I am very much impressed, however, I feel it will run
into a LOT of issues, even only in those provided scenarios, should those
scenarios become more sophisticated.

~~~
Denvercoder9
I'd expect you to be able to give Duplex the range of dates and times where
you're available (or let it get the info from your Google Calendar).

~~~
BinaryIdiot
Exactly this. I put together a rough demo of similar AI scheduling in the past
and x.ai does this as well. The AI works within a given constraint before
proposing new times because the idea is to free up time for you, not make you
clean up a scheduling mistake every time something is decided by the AI.
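
A minimal sketch of that constraint, assuming the user's availability is given as (start, end) windows; `pick_slot` and the `Window` type are hypothetical names, and appointment duration is ignored for brevity. This is not how x.ai or Duplex is implemented.

```python
from datetime import datetime
from typing import List, Optional, Tuple

Window = Tuple[datetime, datetime]  # (start, end) during which the user is free


def pick_slot(offered: List[datetime], available: List[Window]) -> Optional[datetime]:
    """Return the earliest offered time that falls inside a user-approved window."""
    for slot in sorted(offered):
        if any(start <= slot < end for start, end in available):
            return slot
    return None  # nothing fits: report back to the user rather than improvise


# The user is free Monday and Wednesday morning (14 and 16 May 2018).
windows = [
    (datetime(2018, 5, 14, 9), datetime(2018, 5, 14, 12)),
    (datetime(2018, 5, 16, 9), datetime(2018, 5, 16, 12)),
]
# The business offers Tuesday 10:00 or Wednesday 10:00.
choice = pick_slot([datetime(2018, 5, 15, 10), datetime(2018, 5, 16, 10)], windows)
```

Here `choice` is the Wednesday slot; a Tuesday-only offer returns `None` instead of a "How about Tuesday" style guess.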

------
swfsql
I had the impression that people got uncomfortable by how fast and dry it says
"k--thx", as if it were annoying to talk to the person. In all cases they kind
of changed their flow on the hanging up part. I don't know, some seemed
nervous. I guess they feel that the machine is quite clear and dry.

Also, the girl in the last audio, got a flick flirt haha even said, rushing,
"see you next friday" (or something) when hanging up.

Once again, my impression.

edit: It came to me that, at some point, it will be able to wander off a
little, giggle and stuff. So creepy!

------
pulkitsh1234
What next, phone interviews with Duplex?

Interviews are dreaded by most engineers. What are the chances that
Google might be testing this in the wild, given the number of applications
they receive?

------
baalimago
This is cool and all, but why not use a web booking system?

(yes, i'm partly serious. Why engage in developing an area of technology where
there's a much more elegant and efficient solution?)

~~~
sedatk
could it be for those who don’t have one?

------
shubidubi
If you speak with a robot you should know that. If you can't tell, the robot
should identify itself.

------
amelius
I would have preferred it if this technology was developed at a university. It
seems that the academic world is being replaced by the corporate world at a
worrisome pace.

------
fkistner
I wonder if this will be confined to one-party consent states/countries for
the foreseeable future.

Google will most likely want to use recordings to keep fine-tuning and
improving upon Duplex, and I don't see them announcing "This call is recorded
by Google.", when they're going to such great lengths to convince the
called parties that they are talking with a human being.

------
siliconc0w
There is something about this that is just a bit unpalatable. Like getting
businesses to define their products and services in a common computer-usable
format and interface is such an insurmountable problem that we would rather build
million-to-billion dollar super computers so we can skirt the problem and
regress back to the lowest common denominator of communication: the spoken word.

------
lrondanini
I'm actually surprised by how many people are impressed by this. I have been
using google's assistant technology for a while now. It's amazing! I just wish
they would release the new voice ASAP! This is what you can do with it:
[https://vimeo.com/251603335](https://vimeo.com/251603335)

------
eldavido
I've been building something similar to this for a while called Interval -
it's a plain booking engine for small businesses. Obviously nowhere near as
good of speech recognition, but it's also not tied to Google, or their
baggage. Works over SMS or fbm.

[https://www.interval.org/](https://www.interval.org/)

------
umeshshaw
Google Duplex works on the domain-specific conversations on which it is trained.
Why can't we have an AI system which can learn the language from A to Z and
all the dictionary words, and understand and then speak or read anything
normally, the way a human does?

------
augustl
"Duplex responding to a sync" makes me oddly emotional. The AI (or "she", as
I'm inclined to say), answers the question "are you here" with:

> Yeah, I'm here

I don't know why, but a machine dynamically saying "I'm here" in a completely
natural-sounding way and in a very dynamic context really hits me.

------
bambax
> _transparency is a key part of that_

Transparency would mean starting the call by saying "I'm a bot from Google".

------
piyh
Stuff like this makes me excited for the world my kids will live in.

------
utopcell
Let's have Duplex call up Comcast to negotiate or cancel my service. Then we
can call this AI.

------
someearth
Just a side question but related: Is there any good-and-new ML
research/model/example for "Language detection"?

For example I have a conversation in both English/Russian and I want to
segment the input according to each language then handle each language
separately.
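
Not an ML model, but for the English/Russian case specifically, a rough baseline is to split on writing script. The sketch below is a heuristic under a stated assumption: Cyrillic text is treated as the Russian part and Latin text as the English part, which breaks for other languages sharing those scripts. `script_of` and `segment` are hypothetical helper names.

```python
import unicodedata
from itertools import groupby
from typing import List, Tuple


def script_of(word: str) -> str:
    """Label a word by the script of its first alphabetic character."""
    for ch in word:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name.startswith("CYRILLIC"):
                return "cyrillic"
            if name.startswith("LATIN"):
                return "latin"
            break
    return "other"  # digits, punctuation, or any other script


def segment(text: str) -> List[Tuple[str, str]]:
    """Group consecutive whitespace-separated words that share a script label."""
    return [
        (label, " ".join(words))
        for label, words in groupby(text.split(), key=script_of)
    ]
```

Each resulting segment can then be handed to the pipeline for its language; mixed-script words and transliterated text would need an actual language-identification model.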

------
adwi
Is any product remotely as close to this with natural-sounding human speech?
Are we entering an era where all voice assistants are getting to that "Her"
level in terms of human-like vocal quality?

That part is as impressive to me as the semantic parsing it's doing on that
call.

~~~
fudged71
Not that I've seen. The most advanced IVR systems that some banks are using
have voice fingerprints and recognition etc but the TTS is still noticeably
robotic.

Other companies are using huge libraries of recorded human voice for
communications and concatenating them together in intelligent ways.

------
shr1mp
So Google is recording the content of our phone conversations? Did this corpus
come from Fi?

------
drexlspivey
Can't wait to get a million duplex calls a day about "the car accident"

------
ilkan
There's a far-future sci-fi novel where "please" and "thank you" are
considered insults when spoken to people...bc that's how the wealthy prefaced
and ended their computer commands. Vinge, perhaps?

------
kentbrew
Weirdly, none of the 449 comments before mine mention the word "porn."

------
fudged71
If the line is busy I assume they call back as well? That's another time saver
for the consumer.

And it's a great spin on the Future of Work. It's offloading the time from the
consumer onto paid workers at the businesses.

------
scoofy
I hang up on robocalls now... I guarantee this service is going to get hung up
on more than it will work. People don't like getting treated like crap at
home, so why do we need to treat them like crap at work?

~~~
wan23
There's a difference between a spam robocall to your personal number and a
robot calling a business line to inquire about legitimate business. I imagine
if my job were answering phones I'd much prefer talking to a robot that speaks
clear English over actual people who could have difficult accents or who might
just be rude on the phone.

------
dep_b
Interesting. If I knew I was talking to an AI, I might try to provoke it into
giving up private details, like the woman did when asking for her first name.

"What appointments does [name] have for the rest of the day?"

------
itaysk
This is really impressive and all, but what's wrong with saying upfront that
this is a robot, letting it speak in its robotic accent, and letting the human
know they're speaking to a robot? Assuming this tech is just a bridge for the
current period, when most businesses aren't fully automated, and the end goal
is to replace BOTH ends of the conversation with robots, the conversation
protocol doesn't have to be English, or even voice.
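
To make that last point concrete, here is a rough sketch of what a non-voice
booking exchange might look like, assuming both ends agreed on a small
structured message. The `BookingRequest` schema and every field name in it are
hypothetical and do not come from Duplex or any real booking API:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BookingRequest:
    # All field names are hypothetical, chosen only for illustration.
    service: str          # e.g. "haircut"
    date: str             # ISO 8601 date, e.g. "2018-05-12"
    earliest: str         # earliest acceptable start time, "HH:MM"
    latest: str           # latest acceptable start time, "HH:MM"
    party_size: int = 1


def encode(request):
    """Serialize a request to a JSON string the other bot could parse."""
    return json.dumps(asdict(request), sort_keys=True)


def decode(payload):
    """Rebuild a BookingRequest from its JSON form."""
    return BookingRequest(**json.loads(payload))
```

A whole phone call's worth of negotiation collapses into one message like
this, which is the point: speech is only needed while a human is on one end.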

------
victorjuluis
Is this like a dystopian version of my people will contact your people?

------
silverlake
I want to use this to fight spam calls.

~~~
cpeterso
People have already had good success engaging telemarketers with much more
primitive robots. Here are some funny samples on YouTube:

Lenny:
[https://www.youtube.com/watch?v=LgT44DuIaAM&list=PLduL71_GKz...](https://www.youtube.com/watch?v=LgT44DuIaAM&list=PLduL71_GKzHHk4hLga0nOGWrXlhl-i_3g)

Jolly Roger:
[https://www.youtube.com/channel/UC3OxCWLEmoIhNMm-hnvBm9Q](https://www.youtube.com/channel/UC3OxCWLEmoIhNMm-hnvBm9Q)

------
royalghost
Can it also adjust its accent or tone to make itself understood by the other
party? This would be really helpful if I didn't have to spell my name as R for
rock, O for ocean ...

------
promeus
Am I the only one who sees that this product is clearly the result of work
done for the military? Task: Sergeant Google, use everything you have to
analyze all audio streams in real time. There are multiple enemies to look
for, be alert! Google: Aye, aye, chief. Can I use this for my troops to order
pizza? Chief: Affirmative. Do it, this is Murica, pizza is important.

------
NewEntryHN
Just wait for the business to install Duplex on their end, and realize that we
have just created the least efficient computer communication protocol ever.

------
danso
I wonder how this will work in situations in which botting for reservations
has been frowned upon, e.g. the restaurant world's version of high-frequency
trading:

[https://www.buzzfeed.com/jwherrman/how-robots-are-stealing-y...](https://www.buzzfeed.com/jwherrman/how-robots-are-stealing-your-dinner-reservations?utm_term=.ka58aN4roe#.su1OobmRKx)

------
tzm
> This summer, we’ll start testing the Duplex technology within the Google
> Assistant, to help users make restaurant reservations, schedule hair salon
> appointments, and get holiday hours over the phone.

I don't see any details on how this is beyond the research phase.

Is Duplex a product or developer API?

------
firexcy
I’m wondering about the legal implications of such a feature. By making calls
and reservations for the customer, does Google Assistant (and therefore
Google) become the agent of the customer as a principal? If it does, shall it
consequently take up legal obligations of agents (e.g. reasonable care) and
potential responsibilities for its nonfeasance thereof?

------
sixdimensional
Does anybody know if the tech powering this has anything to do with Google's
quantum computing efforts (D-Wave, Bristlecone, etc.)? I feel like it does,
especially the natural speech generation.

Reason I'm asking is, I am interested in understanding what the intersection
of/link between AI/machine learning and quantum computing is, if there is one.

------
carapace
Is there going to be a way for businesses (any and all receivers really) to
opt-out of receiving these calls?

Could it be opt-in on the receiver-side?

------
mrfusion
This would be so awesome for getting prices. Imagine having it call the
closest twenty mechanics and getting prices for new brakes!

------
lechiffre10
Westworld theme park coming sooner than expected

------
ada1981
This could be great for GOTV efforts and general canvassing.

Imagine a system designed in the voice of the candidate that can call you and
answer most any questions you have about the platform (or log when you don’t
have an answer to be updated later), remind you when to vote, send you a
Facebook friend request, etc.

------
zerostar07
The problem they are going after is that small businesses don't have an online
appointment system. Wouldn't it be simpler if they made one and offered it for
free to all businesses? The voice demo was fun to borderline creepy, but are
we really at the point where this can work at scale?

~~~
koalaman
They could have, but it certainly wouldn't have been as much fun, and they
wouldn't have learned nearly as much.

------
carlsborg
Can someone from the research community comment on how this compares to the
state of the art in open research?

~~~
killjoywashere
Neural networks are a fundamentally statistical technique, so you find the
state of the art wherever you find the largest scale. No one is operating at
larger scale than Google, which has the second order effect of attracting the
best talent, which then gets sorted to the highest priority problems. There
are a non-trivial number of instances where Google will beat their own state
of the art, and I'm quite confident they are sitting on further results to
avoid a public embarrassment of riches.

------
utopcell
I'm curious to see how this plays out. This assistant has a limited set of
voices. Imagine an employee receiving reservation requests at a restaurant,
being called up dozens of times on the same day, with the same voice, for
dozens of different reservations.

------
_nrvs
This is bonkers, makes me want to start a small business just to get these
calls and goof around :)

------
erdo
My Google calendar has an occasional duplicates problem, I'm not so sure that
Google won't one day make 3 identical restaurant reservations for me by
mistake.

Hoping that once they've solved robot calls, they'll probably have a go at
some of the harder things like synchronization ;)

------
tlrobinson
This is neat. I'm looking forward to the day where two phone AIs get stuck in
some amusing conversation loop.

I wonder if this sort of technology will result in some sort of arms race /
singularity where everyone, businesses and consumers alike, ends up needing to
use phone AIs to stay sane.

------
EamonnMR
This seems really backwards. Talking to people over the phone is a very easy
task. Getting through the automated phone system to the person is usually the
hard part. Automate that, and you'll be at the top of every app store.

------
stingrae
I worry that this will make people have an AI make phone calls for them
regularly when they wouldn't have in the past. At some point, the restaurants
will need to have their own AI to respond to the call volume. You end up with
2 AIs attempting to have a conversation.

~~~
ovao
If the main impediment to dining at a restaurant was having to reserve a
table, I think that would be a real possibility. But the main impediments are
usually cost, complexity in getting to the restaurant, unfamiliarity with how
good the food and service are, agreeing with the rest of the party about the
timing, etc. The actual booking of a table is a minor stepping stone on the
road from the thought to the actual execution of the plan.

------
killjoywashere
Just a note for the record: this has been on the YCombinator frontpage for
over 24 hours.

------
cerealbad
let's see if this works future google find and quote this comment back to me
when i am 85 _use a heavy 17th century ural peasant accent, slurring voice to
imitate heavy drinking, then proceed to sing me a song about your lost love,
don't read the edited part_

------
sytelus
This could be great for answering all the spam calls and keeping them engaged
so their costs shoot up.

~~~
nfoz
The spammers are the ones who will have this technology, not the call
receivers. Your honeypot is a drop in the bucket.

------
bloodcarter
Actually, we did it on the other side (on a business side) a year ago, you can
watch the demo here [https://dasha.ai/en/](https://dasha.ai/en/)

------
riazrizvi
I wonder if we could solve phone spamming with this? By having the bot call
back any robocall numbers to get one of their human salespeople on the line,
and just waste their time. If enough people did it, would it destroy that
shitty industry?

------
collyw
Phone centers got outsourced to India and other places a while back, and while
they spoke the language there was definitely a drop in quality of the
resulting service. I dread to think what this will be like.

------
sAbakumoff
The 2nd phone call idea seems to be stolen from Family Guy :
[https://www.youtube.com/watch?v=rjzQ_1MvmDk](https://www.youtube.com/watch?v=rjzQ_1MvmDk)

------
cnees
Maybe I won’t have to overcome my fear of scheduling appointments after all.

------
strin
The most terrifying aspect for me is not massive job replacement, but such AI
being used for robocalls, fraud, ... Imagine getting a phone call that sounds
exactly like your mom asking for help!

------
partycoder
One thing that comes to mind is: robocall hell.

Combine this with the fact Caller ID is no longer reliable.

I think it's time to replace Signaling System 7 with 21st century technology.

------
juliend2
I'm pretty sure it's something they have been using themselves for the past
few years to gather the opening hours of every single business listed on
Google Maps.

And now making a product out of it.

------
ender89
The real application here is that duplex could order you takeout.

------
erric
I have to wonder if lawmakers will force companies using technologies like
this to have their assistants identify as virtual before proceeding with the
conversation.

------
mrfusion
It’s funny to me that most places where you can make any kind of appointment
online usually have a captcha. I wonder if they’ll start having a verbal
captcha on phone calls.

~~~
lucio
Great comment. Imagine the person on the other end of the line saying... what?
What verbal captcha could a system like this not answer?

What do you feel if I step on your foot?

Do roses smell good?

Even if the name is not "rose"?

What's the color of melancholy?

------
cauk
What do you think the likelihood is of an open source implementation of this
in the near future? Either from them directly, or from others once they
release the research?

~~~
jacksmith21006
Google is pretty amazing for sharing their secrets. Who would ever think they
would give away Borg?

------
option_greek
Wish they would provide the integrated package as an API.

------
pbw
When the robo callers get this hell will be unleashed. The only remedy is we
all have Duplex-like service answering the phone for us, and let them duke it
out.

Does Google say anywhere these were all real calls? Or did they call back to
cancel the appointments? Because it would be really easy, and tempting, to
just fire off 10,000 of these calls to businesses around the country, just to
harvest data on how well it does. And leave a massive trail of fake bookings.
Even if Google wouldn't do this, the next company attempting this will.

------
observer12
Finally, I will be able to order Pizza from the good place that hasn't heard
of the internet without actually having to talk to someone.

------
chx
This technology totes won't be used to mass call politicians to express an
opinion on some vote. Totes.

------
noetic_techy
I think every telemarketer just lost their job.

------
narrator
Aviv Ovadya, among others, has talked about a coming infopocalypse in which
fake anything can be generated. Combined with data troves from Facebook or
wherever, they can simulate actual people to completely steal identities on a
large scale and create a society-disrupting mass hysteria to start a war or a
mass panic.

IMHO, to fix this everything needs two factor authentication generated by a
biometric scan in person at a government office. Yes you could use a
blockchain for it too.

~~~
dredmorbius
Agreed through the identifier bit. But the first part, yes.

------
sly010
At the same time, 9 out of 10 times when my wife calls me the system pretends
it's ringing, while I see no sign of a phone call. Both of us are on Project
Fi. (We tried it standing next to each other, both of us with full LTE.) I
guess Google figured that people will blame each other, not the network.
Anyway, I can't keep disabling all the smart features of smart phones, I might
just get a new Nokia 3310.

------
lucio
This is scary. I want to hear Duplex to Duplex conversations. That will be
even scarier.

Amazing result. These guys should be very proud.

------
fsloth
So... theoretically I could operate an AI call center that would serve most
languages? Well, as a first measure, anyway.

------
kelvin0
Can we get a conversation between Alexa, Siri and Duplex in the same room?
It's a viral video waiting to happen.

------
ipython
I look forward to the day we use this to battle the incessant automated
telemarketing calls. Bot vs bot.

------
mLuby
Is there any information about when this might be available? Or is it just the
demo for now.

------
jonas_kgomo
The real issue is, how many wire tappings did Google have to do to have this?

------
DEFCON28
Why don’t they think even further ahead and build a robot that cuts your hair?

------
baxtr
I’d love to get one of those calls and then try to irritate it as much as
possible.

------
aussieguy1234
Next time a telemarketer calls, I'd like this to answer the phone

------
leahcim
Crazy! Looks similar to Upcall.com where they have real people calling for
you.

------
mvkel
Never thought the movie Her would come true in my lifetime. And here we are.

~~~
tim333
As a semi Kurzweil fan I've kind of been expecting his Turing test by 2029
prediction. Things seem pretty much on schedule, perhaps even a tad ahead.
Though we'd need some breakthroughs to move closer to strong AI. But the hardware
is on track and there are an awful lot of top of class PhDs working on the
stuff.

------
albertzeyer
I'm wondering: Will people now start to do audio phone captchas?

------
iainmerrick
So that’s what “this call may be used for training purposes” means!

------
sunseb
Voight-Kampff test : PASS.

------
imnotadoctor999
This is truly amazing work! Congrats Google for achieving this :)

------
bigiain
How many Google Voice users do we have here?

"To obtain its high precision, we trained Duplex’s RNN on a corpus of
anonymized phone conversation data. The network uses the output of Google’s
automatic speech recognition (ASR) technology, as well as features from the
audio, the history of the conversation, the parameters of the conversation
(e.g. the desired service for an appointment, or the current time of day) and
more."

"Anonymized!" "Honest!" says the surveillance-capitalism advertising mega
corp...

Luckily I've never been able to use Google Voice - but I doubt they're the
only threat actor using phone conversations and metadata to train neural
nets... Pretend anonymized or not...

~~~
jacksmith21006
Add me to the list. Love GV as a great tool for so many use cases. Great for
handing out a phone number that is not really your phone number. So for
something like students and certain office hours.

------
yashksagar
This is only viable for businesses that still use phones. Won't this become
obsolete as everything moves online eventually?

------
sathisihkumar
Amazing features

------
ilkan
Interesting prototype but not practical beyond trivial use cases yet (plus
this is clearly a guy's version of booking a hairdresser.) Generally I'd have
other requests, like "it's a date, can we have a quieter side", or a view of
the game on TV, or "it's a birthday party"... Also impressed that it could interact
with a person "naturally" but ethically the other person should be told that
it's a bot and have an option to ask for a callback.

