Google Duplex: An AI System for Accomplishing Real World Tasks Over the Phone (googleblog.com)
1875 points by ivank 7 months ago | 750 comments



The people losing their marbles over this being some kind of Turing-Test-passing dystopian stuff are missing how limited this domain is.

People who answer phones to take bookings handle an extremely limited set of questions and responses; that’s why they can even be replaced by dumb voice response systems in many cases.

In these cases, the human being answering the phone is themselves acting like a bot following a repetitive script.

Duplex seems trained against this corpus. The end game would be for the business to run something like duplex on the other side, and you’d have duplex talking to duplex.

Most people working in hair salons or restaurants are very busy with customers and don’t want to handle these calls, so I think the reverse of this Duplex system, a more natural voice booking system for small businesses, would help them immensely by freeing up their workers to focus on customers.


> The end game would be for the business to run something like duplex on the other side, and you’d have duplex talking to duplex.

And looking even further into the future, we can imagine a day when the computers forgo natural speech and use a better-suited form of communication. Some kind of sequence of ones and zeros transmitted directly across the wire.


Lol, but if you think about it, what stops businesses from doing this today?

It's the lack of a universal API.

If a barber shop wants to make it possible for a 3rd party app to book appointments then they have to release some API. But that's not the end of it. The 3rd party app has to first discover their API, someone has to understand it and write code to use it, and then deploy that code.

This is a problem today because there is no universal API that all services can use.

With Duplex, verbal speech becomes a universal API that every service can parse and use to communicate with each other. Also, discoverability is taken care of by publicly cataloged phone numbers on services like Google Maps, Yelp, etc.


May 2001: “XML: the universal language?” https://www.computerweekly.com/feature/XML-the-universal-lan...

I recall a Wired article from the same era. “XML means your doctor’s system can just talk to the hospital system even though they’re different!”

Hasn’t happened yet... will it? Can it?


> Hasn’t happened yet... will it? Can it?

Nope. XML (or JSON, etc.) is just a "human-readable" presentation of data. It does not provide any semantics whatsoever.

So you need some semantics on top of the data. And a general-purpose, universal API is yet to be invented (hint: it is probably not feasible).
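
To make the syntax-versus-semantics point concrete, here is a minimal sketch in Python. Both payloads, the field names and the service code table are made up for illustration. The two documents parse equally well as JSON, yet a client cannot treat them as the same booking without a hand-written adapter.

```python
import json
from datetime import datetime, timezone

# Two hypothetical salons describe the same booking; both are valid JSON.
salon_a = json.loads('{"start": "2018-05-10T14:00", "service": "haircut"}')
salon_b = json.loads('{"when": 1525960800, "type": 3}')

# Assumed vendor-specific code table: nothing in the JSON itself says
# that type 3 means a haircut or that "when" is a Unix timestamp in UTC.
SERVICE_CODES_B = {3: "haircut"}

def normalize_b(payload):
    """Translate salon B's fields into salon A's vocabulary."""
    start = datetime.fromtimestamp(payload["when"], tz=timezone.utc)
    return {
        "start": start.strftime("%Y-%m-%dT%H:%M"),
        "service": SERVICE_CODES_B[payload["type"]],
    }

# The raw documents differ; only the adapter makes them comparable.
print(salon_b == salon_a)
print(normalize_b(salon_b) == salon_a)
```

The format got both sides as far as a parsed dictionary. Everything after that is agreement between humans, encoded by hand in `normalize_b`.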


Microsoft and other enterprise vendors loved the end goal back in the early 2000s: the universal API, solved by middleware. That's why you had BizTalk, and BizTalk consultants who made more than SAP consultants (think today's crazy Salesfarce consultants who compete for gamification badges). For example, you could be a small insurance company submitting to a larger underwriter, and when you worked out the transactions per month you had to take $5 off each app just to pay for BizTalk infrastructure and licensing. People rode that gravy train hard. I'd be surprised if any of the BizTalk shit still remained though; grand goals mean juicy enterprise sales. Oracle had a similarly crap product that was equally slow, painful and verbose; I can't recall the name. XML and its lofty goals, beyond what it actually was, can be compared to today's toxic ICO industry; no reflection on XML itself though.


In the 80's, it was EDI -- electronic data interchange, a set of schemes for sending binary formatted business data, like invoices and POs.


Don't forget HL7


The "badges" are not won in competitions; they are watered-down tutorial gold stars. Instead of targeting real programmers, Salesforce built "Trailhead" for John in finance who wrote some Excel macro and decided he should become a Salesforce "developer". That's why the small percentage of us consultants who actually come from a CS background can charge so much.


I think I agree with your sentiment, but I think you'd be surprised how much a click-next Salesfarce developer charges...


Of course it's feasible. It's called English!


Quoi? ("...fetchez la vache!!!")


I read a great article about XML that spelled out that XML isn't a "language" or protocol, it's an "alphabet". It gives you the building blocks, but what you build with them is up to you, and it doesn't translate from one language to another. (I guess that's what XSLT is meant to do but it's still not magic.)


JSON? And no doubt in another ten years, something else!


Luckily, the English language never changes. Wait.


But compared to the rate of change in technology, it's practically written in stone!


Much of the "semantic web" work was directed at similar things. The reason we don't have it already after 20 years of web commerce is that it's an adversarial process. The business wants to mislead, upsell, or discourage a customer from asking for support; and likewise some small fraction of customers are looking to exploit the business.

Bots that automate UI tend to get banned.


English, an API with over 1.5 billion clients in the wild


If you think about it, conversation is just a loosely defined API with an extremely high degree of tolerance for poorly formed input.


I mean we are all just emotional processing silos that trade information and process it right :) ?


That’s a very Wittgensteinian view.


This thread now contains a loop. Granted, it's a really long loop, all the way back to 1956, but it's a loop all right: you've essentially summarized the Dartmouth Conference and its assumption "gee, that ought to be just a minor research subject - easy, right?"

https://en.wikipedia.org/wiki/History_of_artificial_intellig...


If it was that straightforward, we’d already have bots that pass the Turing Test.

Whatever natural language is, it’s not an API. Might have some overlap, but it’s different.


Natural language is not an API. But a business can assume (or weed out quickly) that a caller is calling to hit a specific API endpoint. And as such their language will eventually lead to one of those endpoints.

And similarly, a caller can assume the business they are calling is trying to lead them to an endpoint.

In both cases, a set of assumptions lets natural language act as an API, even if neither end could pass a Turing test with someone who wasn't interested in any of the endpoints.

edit: I read your point about language changing, and that is true. But if we only have machines using the language with each other (no more training with humans), we can also assume the language won't change.


Nobody said it's easy to implement :)

I'm curious - what differences do you have in mind?


APIs map to a single canonical concept. A "class" in Java actually has a correct definition. There are a nontrivial number of philosophers who believe that this isn't true for language.

To simplify that and put it in more technical terms, an API is prescriptive and a language is descriptive.

If a bunch of coders decide to start capitalizing "Class" their code won't compile. If enough people start using the word "ain't" it becomes a word, regardless of what the dictionary says (see "irregardless"). There is no single authority that can decide what is and isn't a canonical definition.

This is why spoken languages evolve so much. Even languages where we've explicitly tried to go the opposite direction (like Esperanto) have evolved into multiple dialects, where subsets of the community simply ignore the standards and still communicate with each other just fine.

Note that this is the opposite of what you want with a federated, universal API. The whole point of an API is to standardize between unfamiliar devices. Language is actually pretty bad at standardizing communication between unfamiliar people. Even in the US, different regions and communities use different euphemisms, terms, and definitions.


You're right that the language is not the API, it is the medium. The "appointment booking API" exists only in our heads, it is a set of mutually understood conventions for interpreting a subset of language in order to create a record of the appointment somewhere.

If you call up a hair salon and start reciting poetry you'll get an error response in the same way as you would if you had sent malformed JSON to an endpoint. If you stick to the expected script you'll achieve success almost all of the time.

I wouldn't be surprised if a majority of human communication works this way, especially when it involves individuals who do not know each other. We have agreed upon limits, key phrases and words, and expected responses that allow most of the unpredictable stuff to be ruled out. All of that favours automation.
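
The poetry-versus-malformed-JSON comparison can be sketched in a few lines of Python. This is only an illustration: the `handle_booking` function, the required fields and the status codes are assumptions, not anything a real salon or Duplex exposes.

```python
import json

# Hypothetical "script" for a booking call: the fields both sides expect.
REQUIRED_FIELDS = {"service", "date", "time"}

def handle_booking(raw):
    """Return (status, message) for a raw request body."""
    try:
        request = json.loads(raw)
    except json.JSONDecodeError:
        # Off-script input: the equivalent of "Sorry, what?"
        return 400, "Sorry, I didn't understand that."
    if not isinstance(request, dict):
        return 400, "Sorry, I didn't understand that."
    missing = REQUIRED_FIELDS - set(request)
    if missing:
        # On-script but incomplete: ask a follow-up question.
        return 422, "Missing: " + ", ".join(sorted(missing))
    return 200, "Booked {service} on {date} at {time}".format(**request)

print(handle_booking("Shall I compare thee to a summer's day?"))
print(handle_booking('{"service": "haircut"}'))
print(handle_booking('{"service": "haircut", "date": "Tuesday", "time": "10am"}'))
```

The three calls correspond to reciting poetry, starting the script but leaving out details, and completing the script.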


> We have agreed upon limits, key phrases and words, and expected responses that allow most of the unpredictable stuff to be ruled out. All of that favours automation.

I'm not sure I'd disagree, but it seems you're just describing a domain specific language in a more roundabout way. We have a ton of protocols that introduce a set of limits, key phrases, words, and responses. Java, Network protocols, XML, JSON, etc...

Assuming that you're correct, does it make sense to then assume that it'll be an improvement to standardize English rather than a set of IP headers? The agreed upon standards in English are (for the most part) informal, evolve constantly, and are hard to teach to computers. Automation favors predictability, and even the most generous interpretation of a natural language leaves me feeling like it's a step backwards.

We're going to standardize on an appointment booking API that exists only in our heads, that can only be taught using ML, and that is guaranteed to change over time in unpredictable ways? That seems wrong to me.


Just to add a fun example to demonstrate your point, the original name for Esperanto wasn’t “Esperanto”, it was (translated) “the international language”. It was given the nickname “Esperanto” after the chosen name of the creator, and the word itself is supposed to mean “one who hopes”.


Pedantry ahead: the various implementations of (J(ava)?|ECMA)Script tend to disagree with the point on pREscription and non-divergence (yay 20 MB of stubs and libraries for sort-of-coherent behavior). Moreover, the example of "Class" is just a pre-compile check convention, the compiler doesn't care.

(Postel w/r/t API - Does that make any sense? Or is that a fancy way to say DWIM?)


:) I don't think that's pedantic, it's a really good point to bring up.

On the web side of things, the W3C often describes their role as being partially descriptive.

From their doc on the Web of Things[0]: "The Web of Things is descriptive, not prescriptive, and so is generally designed to support the security models and mechanisms of the systems it describes, not introduce new ones... while we provide examples and recommendations based on the best available practices in the industry, this document contains informative statements only."

This is exactly for the reason you mention - if browsers collectively decide to go in a different direction, what the W3C says doesn't matter. The web standard is what the browsers do.

However, two things to keep in mind:

Even where browsers are concerned, there is still an API and a canonical version of "correct" for each browser. What we're trying to do is get those APIs to be compatible and consistent with each other.

Many people believe that language even on an individual level doesn't directly map to an actual reality; in web standards that would be like the browsers themselves not having their own consistent API.

But assume those people are wrong for a sec. Let's assume that language is just a standardization problem between different communities and individuals. Well, the W3C should teach us that even in the realm of computing, standardization is really stinking hard.

So even in that scenario, we have to ask whether standardization becomes easier or harder when every single individual in a community has the ability to change norms or introduce more language. We can't even get 3-4 browser manufacturers to agree on a single API, now imagine if every single hair salon owner could increase divergence whenever they wanted just by answering phone calls differently.

[0]: https://www.w3.org/TR/wot-security/


True, many people know English, but that doesn't mean they know statistics or some other complex domain. "Show me a k-means cluster of this dataset" is likely to be parsed but not understood by many English speakers.


Fortunately, Duplex's purpose is booking salon appointments, not to provide a conversation partner about statistical theory.


I was attempting to communicate to the parent and grandparent that transcribing/parsing English and communicating are different things. We both need to share the model being spoken of to transfer or communicate the knowledge.


yep, english is the protocol (like http) and the domain in question (for example booking an appointment) defines the (very loose) api


My god no one in this thread has a clue about the complexities of natural language. English is not a protocol like HTTP, it's a natural language.


I think you may be confused. I suspect everyone in the thread knows that English is a natural language.


Optional functionality :P


word


In a sense you are correct, but remember that Duplex only works because it is limited to a very strict, well-specified domain: scheduling for restaurants and hair salons. That is, Duplex only works because there already is a de facto specification for these transactions. It would require a lot less effort overall to just formally specify this de facto standard and deploy it once and for all, but of course a top-down approach rarely works. This actually reminds me of the failure of the industry to adopt semantic web technologies. Many organizations are providing the same services and could easily adopt a common “API.”


> Many organizations are providing the same services and could easily adopt a common “API.”

It's not in the interest of those organizations to adopt a common API. Everyone wants to suck in data and be the platform; nobody wants to give data away.


But that's the reason why Duplex is interesting, and why there's a grain of truth in what ZainRiz is saying:

Humans have settled on a de facto API for scheduling appointments. It uses the telephone as its interface, speech as its medium, and Duplex is exploiting it.


...but Google is one of the giants that could pull that off and become The Platform.


You first need "ye old barber shop" and "big corp barber shop" to all agree on what this common API should be. That's the old standards proliferation problem https://xkcd.com/927/

Which is what makes it very hard to define a new common API

However, they all already agree on the standard for natural language communication (in the context of a strict, well defined domain). That's the pre-existing common API which Duplex is using


It's basically what 'pjc50 said - it's not in the interest of businesses to expose their data via a common API.

WRT using English as a universal API, I think this is just dumb. You solve exactly zero problems by going that route, because the actual problems to solve (beyond no incentive for businesses to care) are exactly the same as you have with XML APIs, or any other APIs. The problems of discoverability and machine understanding are something the Semantic Web space has been dealing with for quite a while, and other people before that. Adding natural language to the mix only makes the job significantly more difficult, because you now have to deal with natural language parsing/understanding.


> It's the lack of a universal API.

This sort of thing is exactly why the healthcare industry still uses faxes, even going electronic charts -> pdf -> fax -> pdf -> electronic charts in some cases.


This is even more fun because in the modern age, what very often ends up happening is:

electronic charts -> pdf -> fax -> fax machine as a service -> unsecured email -> pdf -> electronic charts

Compliance can sometimes help, but ultimately the data needs to flow, and people will do whatever it takes to make that happen. Until security is so easy that it's the default, these little loopholes will continue to be abused.


Phaxio co-founder here. We do a _ton_ of healthcare faxing and we're starting to see a shift away from the "unsecure email" in applications. Granted, we can't see what our users are doing at all times, but being HIPAA compliant ourselves, we often work with our users to understand their systems and guide them towards compliance.

>> Until security is so easy that it's the default, these little loopholes will continue to be abused.

The simple way to think about this is that the government is more worried about unsecure email/email spoofing than it is about wiretapping.


To be fair, you’ll notice 150 million faxes going off much sooner than you’ll notice someone abusing your API.


Healthcare uses faxes mostly because HIPAA rules particular to format and security of electronic communications don't apply to faxes; it's a compliance hack.


That sounds a bit too juicy to be true. Any citations?


I've literally been in the room when legal and compliance offices gave the advice on both the construction of the relevant regulations and industry practices on which a payer relied in deciding to use a process that created paper documents then faxed them for certain purposes, but, no, there's nothing published I can link to as to that being the reason industry players make that decision.

I can, however, point you to the relevant section of HIPAA regulations on which it rests, the definition of “electronic media” at 45 CFR § 160.103, specifically this bit: “Certain transmissions, including of paper, via facsimile, and of voice, via telephone, are not considered to be transmissions via electronic media if the information being exchanged did not exist in electronic form immediately before the transmission.”


And in a way it is justified, you need a warrant to wiretap a phone line but no such constraint on eavesdropping on TCP/IP communication.


> if the information being exchanged did not exist in electronic form immediately before the transmission

So you need to print them out before faxing? PDF->Fax wouldn't work with that definition.


One or both speakers could use a handshake noise at the start of the call to tell the receiver that it's capable of "speaking" a modem protocol. It might change a little every time, or be of an especially low or high frequency so that a person doesn't realize they're talking with a computer. After handshaking, the receiver could send a URL that would allow the channel to be upgraded to the Internet... or not. English is a good fallback if both people speak it and you can't find a more efficient channel.


Revenge of Cap'n'Crunch :D


Now that you mention it, this super advanced AI project sounds like a failure of software standardization rather than a triumph of technology, lol.


> The 3rd party app has to first discover their API, someone has to understand it and write code to use it, and then deploy that code.

I think this would be a better integration point for AI. It could look at the fields and learn to fill them out automatically (name, age) and prompt the user for anything missing. Then, instead of the barber shop needing a universal AI, users just need their personal AI (or a script) to interact with the API.


Re businesses doing this today: I believe WeChat handles this very well in China.


It reminds me of the maybe apocryphal story how NASA invested time, money, and effort to develop a pen so astronauts could write in zero gravity, and the Soviets used pencils.


A great story, but you are correct about it being apocryphal: https://www.snopes.com/fact-check/the-write-stuff/


And one internet binge later, I am now the proud owner of a matte black Fisher 400B Space Bullet Space Pen. Cool story. (The real one, I mean.)


Sometimes the simple, old, and reliable tool isn't the best one for your environment.


Both used pencils initially, and both switched to this same pen.


Pencils have the disadvantage of putting out conductive graphite dust, which, in 0g, will float around for a very long time.


"Lol, but if you think about it, what stops businesses from doing this today?"

Systems like that are much more expensive than paying a receptionist?


That doesn't sound right. I think you'd pay something like $99 per month for a SaaS product that manages bookings and provides an API. That's how much the average receptionist earns in one day ($12.25 per hour.)
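
Checking the arithmetic, assuming an eight-hour working day (the comment does not say how many hours it has in mind):

```python
HOURLY_WAGE = 12.25       # figure quoted in the comment above
HOURS_PER_DAY = 8         # assumption: a standard eight-hour shift
SAAS_MONTHLY_FEE = 99.00  # figure quoted in the comment above

daily_wage = HOURLY_WAGE * HOURS_PER_DAY
print(daily_wage)                      # 98.0
print(SAAS_MONTHLY_FEE - daily_wage)   # 1.0
```

So under that assumption one day of receptionist wages is within a dollar of the monthly subscription.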


Things that you appear to be missing from your post (but probably totally already know and are just not bothering to mention):

* Most businesses already have a receptionist.

* Most receptionists do not spend measurable fractions of their day answering phonecalls asking when the business is open.

* Taking bookings is really also not the majority of their day.

* Receptionists are capable of a bunch of things that your SaaS booking program is not. Like ordering catering and picking up office supplies.

* A SaaS booking program that is looked over by a human doesn't have to have AI-systems, because they just have I-systems. A human receptionist.

* The inevitable job-post catchall "Other duties as required."

* I had a receptionist bring me a beer once while I was waiting and I'm pretty sure none of your SaaS solutions will do that.


And $20,000 to get someone who knows how to install such a system to do the job.


What is installed when using a SaaS solution?


Well there is initial setup at the very least. Hooking up whatever landline the client has with the SaaS solution.

Then whenever a significant change is required, you need to call back your "expert". New location? xk$, etc.

But I think the biggest concern is that suddenly the owner does not understand how his reservation system works. He used to be able to call Joe and know what's going on...


Also, whatever the SaaS system is, I bet you it's significantly less efficient than a receptionist using paper and desktop software.

Web systems are almost never built for productivity.


It is customised for each business model. Some are the same, some are different.

The issue isn't a universal API but universal data models, which are probably impossible.


Someone needs to maintain that API; update it with the schedule of the stylists or update it for holiday hours, or remove availability for bookings done the old fashioned way.


"This is a problem today because there is no universal Api that all services can use" "It's the lack of a universal API."

I disagree with that. We already have universal APIs. Adopting a newly established universal API is far more painful, and has a slower adoption rate, than using an existing, globally reachable one like the telephone. Systems like Google Duplex address the broader scope of verbal communication with computers, and that feels like a step in the right direction.


What's a universal API that everyone agrees on?

It's the old Standards Proliferation problem: https://xkcd.com/927/


If both sides were to use Duplex, it would already know to just send 1010 instead of verbal communications.

Also, if it was unknown whether the other party was a bot, a bot could first send a common test signal according to some protocol, some kind of sound that asks whether the other side is a bot too. If so, both would start sending machine-readable information to each other.
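
A rough sketch of that probe-then-upgrade idea in Python. Nothing here is part of Duplex or any telephony standard: the signal names, the `negotiate` function and the `send_probe` callback are all hypothetical stand-ins for whatever in-band sound the two sides would agree on.

```python
from enum import Enum

class Channel(Enum):
    SPEECH = "speech"   # the fallback that always works
    DATA = "data"       # machine-readable exchange

# Hypothetical in-band signals; real ones would be sounds on the line.
PROBE = "are-you-a-bot"
ACK = "yes-i-am-a-bot"

def negotiate(send_probe):
    """Send the probe and pick a channel based on the reply.

    send_probe(signal) returns whatever came back on the line, or None
    if nothing recognisable was heard.
    """
    reply = send_probe(PROBE)
    return Channel.DATA if reply == ACK else Channel.SPEECH

# A bot on the other end answers the probe; a human just says hello.
print(negotiate(lambda signal: ACK))
print(negotiate(lambda signal: "Hello, Joe's Barber Shop?"))
print(negotiate(lambda signal: None))
```

The important property is the default: anything other than a clean acknowledgement leaves the call in ordinary speech, so a human on the other end never has to know the probe happened.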


Sounds a lot like a dialup handshake.


Or just hang up automatically.


If I remember correctly, CORBA was all about standardizing APIs. You'd have your distributed "CORBA Objects" whose methods anyone who knew the unique object ID could call. The idea was that there would be a "standard library" for each industry. So all the barbers would implement a standard library "Appointment Schedule" object, all the exchanges would implement a stdlib "Orderbook" object, etc.

CORBA would generate RPC stub objects for you in various OOP languages, and potentially automate discovery, so you could say, give me an array of all the orderbooks of all the bitcoin exchanges, and ask each for the last price.
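
For readers who never used CORBA, the shape of the idea looks roughly like this in plain Python. To be clear, this is not CORBA IDL and not any real ORB API; the `Orderbook` interface, the exchange classes and the prices are invented for illustration, and there is no remote call or discovery step.

```python
from typing import List, Protocol

class Orderbook(Protocol):
    """Hypothetical industry-standard interface every exchange implements."""
    def last_price(self) -> float: ...

class ExchangeA:
    def last_price(self) -> float:
        return 9300.0   # made-up price

class ExchangeB:
    def last_price(self) -> float:
        return 9310.5   # made-up price

def last_prices(orderbooks: List[Orderbook]) -> List[float]:
    """Ask each orderbook for its last price through the shared interface."""
    return [book.last_price() for book in orderbooks]

print(last_prices([ExchangeA(), ExchangeB()]))  # [9300.0, 9310.5]
```

The caller only knows the shared interface, which is exactly the part that requires every exchange (or every barber) to agree in advance.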


Lol imagine my phone talking with an automated customer service line. Two machines, talking to one another, not using any of the existing protocols. My phone would have a database of questions to ask, form it into an English sentence, run it through a text-to-speech, transmit this to the other phone. Their "phone" would run a speech-to-text, run NLP, match it with its own database, and do the whole thing again in the opposite direction.

This gives a whole new meaning to "all of UI/UX is basically prettifying database queries".


"all of UI/UX is basically prettifying database queries" - are you saying that generally speaking or is that a reference to an article or articles?

Would love to read more about it if so-


Pretty sure this is just the idea behind MVC.


me too.


Potentially there would be a way for both sides to recognize that the other side is a machine and they could switch to binary communication.


Binary communication isn't useful by itself. You need a compatible protocol and API.

I can give 010101010101101011111 to a machine all I want; if it doesn't know how it's formatted, it's useless.

Conversational English is a format.
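
A small Python illustration of why raw bits are useless without an agreed format: the same four bytes read as text, as an integer, or as a floating-point number, depending purely on what the receiver assumes. The byte values are arbitrary and chosen only so that the text reading is legible.

```python
import struct

payload = bytes([0x42, 0x4F, 0x4F, 0x4B])  # four arbitrary bytes

as_text = payload.decode("ascii")           # read as ASCII characters
as_uint = struct.unpack(">I", payload)[0]   # read as big-endian unsigned 32-bit int
as_float = struct.unpack(">f", payload)[0]  # read as big-endian 32-bit float

print(as_text)
print(as_uint)
print(as_float)
```

All three readings are "correct" decodings of the same bits. Only a shared format tells the receiver which one was meant, and only shared semantics tell it what to do with the result.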


> Conversational English is a format.

And as you said, format isn't enough. You need semantics.

If two applications know enough about the other side to know how to formulate their voice queries, they know at least enough to exchange those same queries as text, and skip the stupidly wasteful text->speech->text process.

(And if the world weren't so full of adversarial practices driving engineering stupidity, the developers would agree on an efficient binary format beforehand.)


The simplest "API" would be skipping the text-to-speech-to-text process, and send the text string in binary.


Well they could at least continue doing the same thing, at 100x the speed.


Over the phone? Voice recognition would fail even more. Google also mentioned some of the latency in answering is required by their processing.


Does this sound familiar? ;)

https://www.youtube.com/watch?v=vvr9AMWEU-c

In other words, if M2M handshake works, switch away from voice.


A large part of introducing new tech is enabling the transition from existing tech. Maybe if we scrapped all old cars and allowed only autonomous vehicles we’d be sleeping at 120 km/h next week. But some businesses still run on COBOL.


There are already apps/technologies that transmit information through audio at frequencies not audible to humans. It should be trivial to adapt this so that if two AI systems are interacting they can perform an "AI-handshake" in the audio at the start and then switch to a more efficient form of communication.


but audio telephony equipment generally assumes the signal is in the human vocal range, right?


Correct. There are several levels at which this applies:

Phone hardware (microphones, speakers) is only calibrated to detect 'useful' frequencies for human speech.

The sampling rates used by audio codecs tend to cut off _before_ the human ear's limits, e.g. sampling at 8kHz or 16kHz, which can only represent content up to half that rate. They aren't even trying to reproduce everything the ear can detect; just human speech to decent quality.

Codecs are optimized to make human speech intelligible. The person listening to you on the phone isn't receiving a complete waveform for the recorded frequency range. The signal has been compressed to reduce the bandwidth required, where the goal isn't e.g. lossless compression; it's decent quality speech after decompression.

It's completely possible to play tones alongside speech that we won't notice, but in the general case, not tones that the human ear can't detect.
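
The sampling-rate point can be put in numbers. A minimal sketch, assuming ideal sampling with no anti-aliasing filter or lossy codec in the path (real phone networks have both, which only makes things worse for out-of-band tones); the function names are made up for illustration.

```python
def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency a given sampling rate can represent."""
    return sample_rate_hz / 2

def alias_frequency_hz(tone_hz, sample_rate_hz):
    """Frequency at which a pure tone appears after unfiltered sampling."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

SAMPLE_RATE = 8000       # narrowband telephony sampling rate, in Hz
ULTRASONIC_TONE = 19000  # a tone most adults cannot hear, in Hz

print(nyquist_limit_hz(SAMPLE_RATE))                     # 4000.0
print(alias_frequency_hz(ULTRASONIC_TONE, SAMPLE_RATE))  # 3000
print(alias_frequency_hz(3000, SAMPLE_RATE))             # 3000
```

At an 8 kHz sampling rate nothing above 4 kHz survives as itself: a 19 kHz tone would either be filtered out before sampling or fold down to 3 kHz, squarely inside the speech band where a human would hear it.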


Google Duplex is just a backwards compatibility shim...


Backwards compatibility is not bad, quite the opposite. There are far too many dead data silos, abandoned for a new, shiny and incompatible protocol.


The person who invents a way to replace the little green phone icon with a little human icon for when they want to talk to a human will be a zillionaire.

But that is the far future. Realistically, I just don't see this as feasible any time soon.


This doesn't even need to happen on the voice connection. A register of "does this number map to a known system" would be enough. Then it's just up to common APIs.


The issue here is interfacing with ancient telephone systems.

If the two bots were to slip in some subliminal beeps and boops to recognize each other; then they could change their speech to very quick binary communication.


You mean AT codes.. sounds familiar


You can't just drop compatibility. We will have A.I. trained voice systems that mimic natural speech just enough to be understood by duplex while compressing the exchange to a minimum. Data transfer will be measured in microwords per second. Future versions of duplex will of course detect this kind of compressed speech and reply in kind, falling back to normal speech only if the immediate response is similar to north american confusion.


...somewhere, Kevin Mitnick just smiled.

I like to think it was a smile of renewed relevance due to unbelievably poor technical decisions.


This really made me laugh. You are also totally right, we have better solutions to this problem already.


I can already hear the modem handshake playing in my head...


That form of communication exists. It's called REST.


you mean the entire tcp stack and network to network communication?


To save CPU/GPU/TPU there should be a high-frequency sound, one that people can’t hear, so that computers talking to each other can recognize it and switch to a faster way to communicate. If this is included you also have a way to detect whether you are talking to a bot/Duplex.


Don't most carriers heavily "compress" the sound, removing all sounds/frequencies that a human can't hear, etc? https://www.youtube.com/watch?v=w2A8q3XIhu0


Yes, but it could be very subtle and low bandwidth at first, and once both sides were convinced the other was a machine switch to a full speed screeching 56k modem [1].

Or just communicate "hey actually connect to this HTTP/XMPP/whatever address on the internet and we'll continue this from there"

1. Probably a bit slower, I've heard modern VoIP lines don't work well with traditional modems?


Damn, that sounds even more dystopian, can you imagine it?

"Hello how can I help you? - Hi, beep, I'd like to reserve a table? - Ok, beep, beep, one second - Mhm-mm beep, beep, screech, 011000101010...."



Also a great story along very similar lines:

http://archive.today/txrAd

(Archive link because that blog now requires authorization to view for some reason.)


This made my day


This is basically what 56k modems do, isn't it?



The whole context of Google introducing this functionality was for the 60% of businesses that don't yet have an online presence at all.


Sure, but they should consider the future. That number will only get smaller, especially if Duplex or other services say "we'll handle all your phone and online bookings for you for $SMALL_FEE, and still forward other inquiries to your phone as before".


[smallprint]...and we'll abruptly shut it down in 18 months.[/smallprint]


So just put it in the hearable spectrum. Phones already make all kinds of sounds that no one under the age of 35 has any clue what they mean or why they are needed, and frankly they aren't.


Perfect use case for the endangered fax answering sound.


beep boop.


Yes, and lots of sounds that the human ear can hear but are not used to decode speech. Also the audio is frequently recoded as calls pass from infrastructure to infrastructure.

Good times !


You’re totally right.

However, while this is useful to bootstrap a new technology rollout, 10 years on it's just technical debt.

The amount of tech debt in the system behind credit cards is crazy, because originally charges were phoned in to the card issuer manually, and everything from then on - magstripe, chip & PIN, online only transactions, etc, has all been built on top, and the leaky abstractions show through in daily difficulties with the card system for end users, like lack of real-time balance (in some cases), lack of transaction metadata, etc.


On the other hand, the credit card system's backcompat does mean that you can still accept credit cards when the power's out. You just write down the number (or use an imprint machine) and let the customer go. And the semantics of credit mean that you can still make that charge even if an online transaction would have resulted in a decline—offline transactions are never declined, they just cause overdrafts.


I wonder if that resilience is worth the immense amount of infrastructure and engineering that is spent on maintenance of the technical debt. Does that maintenance drive up processing fees? I suspect it does, but not in any amount sufficient to explain the size of those fees.


Yeah, but around here, most places just say the system is down and require cash.


True! Although I think that's more of a byproduct, rather than something designed into the system at the moment, and I suspect we could do better with designing it. For example, I doubt many shops have those imprint machines any more.

I also think the tech debt is holding us back a long way. For example, why can't I see itemised receipts in my card statement? Paper receipts are on their way out, email receipts aren't linked to anything or structured data, but being able to see that I've spent $120 on shipping with Amazon in the last 12 months, so a Prime subscription would make sense, would be a great sort of financial tool to have. That isn't possible in the card network at the moment.


The imprint "machines" are still issued - but all* of them are just tossed away into storage, and never ever used (Training users on those? Pointless).


Instead of a high frequency tone, just watermark the background noise or the speech pattern. You could watermark the background static, the voice samples, or even the speech patterns. All you really need is something like 30 bits of data to identify a call as a Duplex call with very high probability, and I’m certain you can find a way to imprint that many bits into the frequency spectrum of your background noise.
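To put a rough number on the "30 bits" claim, here is a minimal sketch of the false-positive arithmetic. It assumes the bits recovered from an unmarked call behave like fair coin flips, and `max_bit_errors` is a hypothetical tolerance for bits damaged in transit; neither assumption comes from Duplex itself.

```python
from math import comb


def false_positive_probability(n_bits: int, max_bit_errors: int = 0) -> float:
    """Chance that n_bits of random noise match a fixed tag.

    A match is declared when at most max_bit_errors bits differ
    (assumption: unmarked audio yields uniformly random bits).
    """
    # Count the bit patterns within the allowed Hamming distance of the tag.
    matching_patterns = sum(comb(n_bits, k) for k in range(max_bit_errors + 1))
    return matching_patterns / 2 ** n_bits


exact = false_positive_probability(30)
tolerant = false_positive_probability(30, max_bit_errors=3)
print(f"exact 30-bit match:   {exact:.2e}")
print(f"up to 3 flipped bits: {tolerant:.2e}")
```

With an exact match the odds of a random call looking like a tagged one are about one in a billion (2^30 is roughly 1.07 billion). Allowing a few flipped bits costs several orders of magnitude but still leaves the false-positive rate very small.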


I like this. So basically the old school modem sound, but in frequency that can't be heard. It would only take a fraction of a second to send out the feeler, and would not be noticed if a live human picked up. Could even detect a human and send the call over to a live representative without anyone noticing.


It doesn't have to be out of frequency (since that's probably filtered anyway), could be just a really quick burst handshake identifier which could encode an IP address to communicate over instead of a crummy phone line.

Duplex: <beep beep> (I'm available to chat)

Other bot: <boop boop> (Oh hai! Wanna get intimate?)

Duplex: <blaaaaaaaart> (Come find me on duplex://64.233.160.0)

<insert hack attacks and other nonsense here>
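For what it's worth, the "burst that encodes an IP address" part is tiny. A rough sketch, assuming a made-up alphabet of four tones carrying 2 bits each (the frequencies in `TONES_HZ` and both function names are hypothetical, and no real audio is generated or decoded here):

```python
import ipaddress

# Hypothetical 4-tone alphabet in Hz; each tone stands for one 2-bit symbol.
TONES_HZ = [1000, 1400, 1800, 2200]


def address_to_tones(address: str) -> list[int]:
    """Map an IPv4 address to 16 tones, most significant bits first."""
    value = int(ipaddress.IPv4Address(address))
    # 32 bits split into sixteen 2-bit symbols.
    symbols = [(value >> shift) & 0b11 for shift in range(30, -2, -2)]
    return [TONES_HZ[symbol] for symbol in symbols]


def tones_to_address(tones: list[int]) -> str:
    """Inverse of address_to_tones; raises ValueError on an unknown tone."""
    value = 0
    for tone in tones:
        value = (value << 2) | TONES_HZ.index(tone)
    return str(ipaddress.IPv4Address(value))


burst = address_to_tones("64.233.160.0")
print(len(burst), "tones:", burst)
print("decoded:", tones_to_address(burst))
```

Sixteen short tones cover a full IPv4 address, so the handshake itself is cheap. The hard parts are the ones joked about above: surviving the carrier's codec and trusting whatever address comes out the other end.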


<hacker voice="1">I'm in.</hacker>


As an end user, picking up the phone to hear a beep is not pleasant. I'm likely as not to immediately hang up, as I've come to associate beeps at the start of calls with scammers.


What about if the caller makes no such sounds and the recipient makes the offer to handshake?

Anyways, this aspect is more amusing to just think about than anything else. That said, I really hope companies who produce these next-gen AI robo-callers actually have the courtesy of identifying themselves as such. I want to know if I am talking to a human or Duplex. Yes, I may hang up, but I feel uncomfortable being fooled into thinking I am talking to a human when I am not.


There's no reason why it can't be encoded as elevator music - you already hear it all the time, they might even throw in a looping "Thank you for calling. Your call is important to us" to keep you from freaking out.


Hence the idea of doing it in a frequency humans can't hear.


Hence the:

> (since that's probably filtered anyway)

Phone lines are optimized for frequencies humans can hear, though I'm guessing you could get enough bandwidth out of the edges to convince the other side you're a machine without bothering a human too much.


No, let's make it fun. "How can I help you?" "Get off the phone you damn bot!" "Why I oughta...<modem sounds>"


Did we just reinvent dialup?


Human-readable modem handshake.


I want to hear a Duplex voice giving verbal AT commands.


+++ATH0


Yes, it would be nice if in parallel Google came up with open machine-friendly protocols for each of the use-cases Duplex supported, with a clear migration path away (e.g. businesses started publishing the endpoints and protocols they supported alongside their phone number so you could skip the call completely)


Or a way to book via a website or app...


"Ba weep granna weep ninny bong"

It is the universal greeting for cybernetic organisms, after all.


maybe just cut to the chase and do an API for reservation?


Why not just a pattern of umms and arrs that it already seems to add into output. Easier to detect for it and harder for a human to recognise.


The problem is that computers will now be interacting with people and people will become unsure of whether they are talking to a computer or a person. It will create a bewildering world full of mistrust. I would argue that there should be a law proclaiming that computers must identify themselves.


They're not unaware of this concern. From the article:

> The Google Duplex technology is built to sound natural, to make the conversation experience comfortable. It’s important to us that users and businesses have a good experience with this service, and transparency is a key part of that. We want to be clear about the intent of the call so businesses understand the context. We’ll be experimenting with the right approach over the coming months.


So they indicate that they are aware of the problem - and instead of doing a straight-forward "hey, I'm a bot", their suggested strategies are "being clear about the intent of the call" and "experiment with the right approach over the coming months"?

To me that quote sounds more like a polite way of saying they definitely won't reveal to callers that they are talking with a bot than them taking the concern seriously.

Some of the conversation examples on the blog page where they invent a sort of story for the caller ("I'm calling for a client") would fit that theory.


People are prejudiced against talking to bots because of how bad they are currently. I despise calling services that have voice recognition, often times it's easier to have an options menu.

If they can get a way of saying "I'm a bot" without people hanging up the calls, I'm all for it -- otherwise, "I'm calling for a client" or similar is the best for everyone involved (assuming everything works).

Businesses also need to have a way to report problems to Google, like if they are getting spammed by Duplex or want to opt out.


> People are prejudiced against talking to bots because of how bad they are currently

I'm prejudiced against talking to bots because they're bots. They don't have empathy, whereas from voice interaction I expect a human I can relate to and desire to help and be courteous with. It's a fundamentally different type of interaction and I will be annoyed anytime that one is confused for the other.


Well, bad news: all the (presumed) humans I've talked to on any scripted call are worse than bots: they do have empathy, but the script forbids them to use it. An out-and-out bot is free from this prison, at the very least.


How about something like "I'm an automated agent calling for a client" - correct, not misleading, but using terminology which isn't likely to be immediately disconnected _right now_.

Of course, if they screw it up, they'll burn that terminology too.


Yeah, something like "Hello, I'm Rogers' Assistant..." to fit current Google naming conventions would probably be fine.


From the samples I've heard, it's trying to sound natural. Hello Uncanny valley, long time no hear.


If we accept your premise that bots become so close to humans that they are virtually indistinguishable, then at that point does it matter who the "person" on the other end of the line is? I'd argue it doesn't because the outcome would be the same.


I think there will always be a difference. When I'm talking to someone, there's an emotional connection and responsiveness there. I'm trying to help a human out -- I'm putting effort into being polite, into considering their point of view etc.

If I found out that they were a robot (this is probably unpreventable; even if the technology gets amazing, surely there will be edge-case breakdowns/bugs/etc.), my trust is broken. That would have an emotional consequence e.g. frustration.

There will always be amazing technology wielded by awful developers, and in this case the outcome is emotionally hazardous. The impact of that is not easy to quantify e.g. by any economic indicators, but it's there.

Also, it's likely that robots will not be as polite back, so we're degrading society's trust and empathy all around. For example, Google's AI call to a restaurant was rude, and not for reasons it seems to yet understand.


What happens then if I'm a human that masquerades as a computer? Seems like a neat way of explaining away a number of social faux pas or drunk-dialling. "Me? Noooo... that must have been my PhoneBot 2000. Supposedly the next firmware update should solve that kind of problem."

That said, I don't necessarily disagree; there are going to be lots of these kinds of issues that need sorting out before we reach a Culture-level of AI interaction.


Phoning like this you already have to wonder if the person on the end is an idiot who will screw stuff up. I'm not sure it maybe being a google bot will make much difference.


If you can't tell whether you're talking to a computer, does it matter?


Not sure why you're using future tense here.


I love how no matter how amazing something is, someone will eventually say it's trivial, even though it's taken the smartest people on earth decades to figure out how to do this.

>The end game would be for the business to run something like duplex on the other side, and you’d have duplex talking to duplex.

The end game is clearly to use an api, not this.


If you ever needed proof of the negativity in the tech community, this thread is it.

Google actually delivers a real world product incorporating the most advanced AI we've had the chance to experience, and half the HackerNews comments are "Wow, this is so dumb, can't wait for it to become technical debt in 10 years".


One time, a giant tech company staffed by geniuses released a super-useful tool that saw massive adoption. 10 years later that tool became technical debt.

Wait, it was way less than 10 years.

I'm talking about Google Realtime. Or reader? Or buzz?

No, wait, I'm talking about aggressive Twitter API deprecation/removal.

Wait, nevermind, I'm talking about Facebook.

You get the idea. What's revolutionary today sometimes becomes the substrate for future innovation. Sometimes it gets cast by the wayside, even in the face of significant "user" (developer) popularity.

That's not proof of negativity; just realism. Negativity would be "no new innovation will ever get traction". Optimism would be "all new technologies will change the world" (c.f. https://www.npmjs.com/browse/depended). This is neither.


Because the idea of running machine2machine communication through Duplex, going binary -> text -> voice -> text -> binary, is just fucking dumb.

It's not proof of negativity in the tech community. If anything, it's a proof that tech community often can't pause and look if a particular idea makes engineering sense.

That's how we ended up with Electron.


> The end game is clearly to use an api, not this.

My understanding is API integration is what wechat is in china -- every hair salon and equivalent-of-corner-pizza shop has some wechat integration, payment and all.

Voice bots like this will have the advantage of ubiquity. At least a couple years ago before every restaurant had 5 tablets for all their seamless/grubhub/chowhound/whatever apps, pretty much the only reason the fax machine was still around was for restaurant ordering. Although there were clearly better ways of doing it (see how Dominos reinvented itself as a tech company), the sheer ubiquity of fax as the lowest common denominator kept the tech around.

In that light, it's kinda like the cell-phones-leapfrogging-landlines-in-developing-countries argument... part of the wechat story involves a massive population entering the consumer class at a time when everything was digital. Call me out if this is a gross over-generalization, but in a way, the wechat population never had to deal with the backwards-compatibility of people growing up ordering a pizza over the phone.

It'll be interesting to see how the API-centric approach (wechat) plays out versus the lowest-common-denominator ubiquity approach (voicebots). I'd stop short of calling API's the end game though.


Also, in the WeChat model, everyone is tied to WeChat and can't go around it.

This voice based model can be integrated into any existing system. It already has the network effect going for it and it's not tied to the fate of any one company


No, you just become tied to Google. You can't reimplement Duplex yourself without reimplementing both their API and their voice recognition verbatim, and the best way to do that is to simply use their product.


Not necessarily. Given the rate of progress in AI and the number of companies working on it, it's only a matter of time until Duplex-like tech is reimplemented by other large corps like Amazon and Microsoft, and eventually it could even be implemented by startups if there is a decent business case for it


> My understanding is API integration is what wechat is in china -- every hair salon and equivalent-of-corner-pizza shop has some wechat integration, payment and all.

Meanwhile in NYC, good luck getting the bodega on the corner to even take your debit card.


Eh? The vast majority of bodegas in NYC take credit cards. Many have minimums, or charge a fee if you don't hit the minimum. But I've not been to a bodega in the last ~5 years that didn't accept credit cards in one form or another. > 5 years ago, sure, but now pretty much everyone's got them (even in the rougher neighborhoods).

If they don't accept cards, they almost always have an ATM.

My complaint in NYC is the uptick of "cashless" places, that don't accept legal US tender. I like using cash, I don't want it to go away.


Agree api is the end goal and the wechat model is amazing.


> pretty much the only reason the fax machine was still around was for restaurant ordering

The fax machine is still around now, and heavily used in the medical context: https://www.vox.com/health-care/2017/10/30/16228054/american...


Please go and convince the mom and pop hardware store or bakery in your town to use an API instead of answering the phone.


They’ll do that before getting a robot to answer the phone for them.


Why do you say that?


Because they do it all the time. OpenTable already solved this problem for 90 percent of restaurants.


Ya, I agree that an API is the right solution but the benefit of this is that both sides aren't forced to adopt it at the same time, it's more resilient to changes on either side...


The benefit is that a human could jump in and take over either side of the conversation.

Anyone can have a conversation, but not everyone can author an API.


Where did I say it’s trivial? What I’m saying is, people are anthropomorphizing and extrapolating what this system is doing far above what it is actually doing, and then using it to justify fears that Skynet’s around the corner.

This system can’t pass the Turing test, it would be fooled probably by a simple question about itself or a subject outside the domain, like the kind of food you like.

You’ve got people in this thread hyperventilating about AI duping your voice and then becoming a doppelgänger, and therefore we need laws immediately to stop this dystopia? Let’s calm your cortisol levels for a second and stop acting like Thanos just got the last gem.


I don't think any technically minded people on HN are extrapolating what this system is capable of doing, but are (rightly IMO) extrapolating what kind of systems will be announced in 2, 5, 10, etc. I think even HN is greatly underestimating what world class researchers paired with an army of world class engineering talent are capable of.


I think it passes a limited Turing test at the domain it’s trained in. I doubt any of the people on the other end of the call would even suspect it’s a computer. That’s an amazing achievement.


"Duplex seems trained against this corpus. The end game would be for the business to run something like duplex on the other side, and you’d have duplex talking to duplex."

So, you would just have an automated booking system API which is better handled by not placing calls as its form of communication. Right?


The point being is that it's a human speech API which is easier for humans to utilize "manually" if they don't have a robot servant.


I prefer not to call them servants, it would be better to start calling them partners actually, just in case we call them master some day.


Yes!

This is an API that requires no computer on the user's end and is portable across different implementations from different companies.

It's not ideal. Actual standardized APIs are better. But, uh, have you ever worked with industry standard APIs? I have, and standardized is not how I would describe them.


I still think there's a need for standardized APIs in this situation. At some point, the context constraints mentioned in the blog post have to get translated into some action with parameters. I'm guessing that action will be API calls to other Google products behind the Duplex Google Assistant UX.

"Ok Google, can you reschedule my Dr. Appointment this Friday for next week? I have a conflict." -> calls the Dr and reschedules -> adapts result to rebooking action with partners (ie, an api call to your Google calendar) -> applies action and responds to you.

There is still quite a bit missing from this to be a useful AI product. It's getting really close though. I can't wait until this makes it into Google Assistant and it can call a restaurant to ask about gluten free options while I'm driving.
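As a sketch of what "some action with parameters" might look like once the spoken request has been parsed, here is a hypothetical structured record for the rescheduling example. The `RescheduleAction` class and all of its field names are invented for illustration; nothing here reflects an actual Google Assistant or Duplex API.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RescheduleAction:
    """Hypothetical structured form of a parsed voice request."""
    action: str
    appointment: str
    original_day: str
    target_window: str
    reason: str


def to_payload(action: RescheduleAction) -> str:
    """Serialize the action so any downstream system could consume it."""
    return json.dumps(asdict(action), sort_keys=True)


request = RescheduleAction(
    action="reschedule",
    appointment="doctor",
    original_day="Friday",
    target_window="next week",
    reason="conflict",
)
print(to_payload(request))
```

Whatever the real representation is, something of this shape has to exist between the speech layer and the calendar update, which is why a standardized version of it would still be useful.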


You're absolutely right! There's a huge need for standardized APIs for interacting with outside systems for this use-case. In your example, your doctor's office.

In practical terms, there may be some minor issues such as incompatible multiple implementations and adoption costs. But that's made much easier to handle by a very small number of expected consumer systems.

As for interactions with end-result partners, well. I've worked with standards designed to represent such highly general cases (xcbl and cxml). They're invariably rife with interoperability problems and other issues arising from overly broad standards. These tend to not get better over time as much as one might hope, as it's not easy to continuously update standards at a reasonable speed across N target types of partners. Keeping up with how usage evolves is never easy.

The best approaches to this that I've seen in use are those that focus on providing a vehicle for arbitrary data for delivery to the app - like HTTP or TCP. Getting more specific is the route to madness. Which, unfortunately, is probably precisely the bit you'd most like standards around.

You're completely right. There's a very real and very important need for standards here. There just might be some issues worth mentioning that might arise from the attempt to create and rely on them.


It's not just about the user. There's no open, universally agreed upon booking system API...except for human voice.

This is creating a natural-language based booking API that any system or business can tap into


This is my initial reaction too, but APIs need work for coding, integration, testing, make sure data is sent in the right format, etc, while voice-robot-to-voice-robot will just work out of the box.


You're completely abstracting away all the coding, integration, testing, and "data formatting" (read: grammar) involved in Duplex, which seems to be much more complex than a REST API.


But each side only needs to do it once, right? At least once per language. No need to deal with interoperability with multiple APIs and integrations.


Opentable


And NASDAQ:BKNG in general (includes OpenTable).


That end game seems very wasteful and Rube Goldbergian. Why use an 1800s technology as the transport layer when salons, yoga studios, and more already use things like MindBody, which already has an appointments API? I’d honestly be way more interested if this integrated with MindBody, OpenTable, DMV websites, car dealer appointment systems, medical office scheduling systems — all of which already have APIs or at least web pages. But then, saying that you wrote and will maintain some WWW mechanize stuff that posts forms is way less marketable to the general population who see this as magic.

Also, they’ll discontinue it after a year once it gets enough negative press about how it doesn’t work well and loses business for businesses.


Because there are lots of businesses that don't have a booking API and don't see the need for one, or can't afford one. This kind of technology allows interaction with them, because it's easier to interact on a common transport protocol than to expect everyone to change to your preferred one. That being said I feel like it won't be long until this tech is used for scamming, phishing and pranks.


In the second example they ask what's the line likely to be like on Wednesday. Show me one API that does that. English is richer than APIs.


Google does provide that information when you look up a restaurant, under "Popular times": https://imgur.com/a/ssdHxn3


Because businesses can offer Duplex for customers to call and speak to after hours if they still want to keep a human phone assistant. It's modular, not Rube Goldbergian.


You also seem to completely miss that outside of silicon valley, large portions of the population actually want to interact with things by phone.

That's changing for sure, but the demand for phone based services is still very high.


Likely it'll be a transition technology for businesses still using phone appointments as their primary API, and eventually it'll ease the path for more direct integration.


It's specifically for the salons, yoga studios, restaurants, etc that don't use any kind of online booking system.


Humans can't speak api


It's unfortunate that in 2018 we still have to resort to klugey work arounds like this with so many restaurants and hair salons instead of being able to make reservations online. There are services like OpenTable but even here in Silicon Valley only a small minority of restaurants use them. It seems like there's a huge opportunity if someone can crack the market.


> There are services like OpenTable but even here in Silicon Valley only a small minority of restaurants use them.

OpenTable takes a cut, no? That's always going to limit availability.


I think it's more unfortunate that so many people are just so opposed to picking up the phone and talking to someone.


That's the universal criticism of every technology.

"I think it's more unfortunate that so many people are just so opposed to looking up directions to wherever they're driving before they get in the car."

"I think it's more unfortunate that so many people are just so opposed to paying their bills every month."

"I think it's more unfortunate that so many people are just so opposed to carrying cash around and counting change."

"I think it's more unfortunate that so many people are just so opposed to coming over and talking in person."

"I think it's more unfortunate that so many people are just so opposed to washing their dishes by hand."

"I think it's more unfortunate that so many people are just so opposed to doing long division."


I don't mind talking to "someone". But I sure mind talking to customer service folks who have absolutely no interest in talking to me and make it as difficult as realistically possible.

Worse, they're probably gonna spend 3/4th of the time trying to sell me shit I don't want and make me fight against it.

Online, I can ignore any prompt and just click next next next finish, and the form won't be in a bad mood. I have no interest in talking to an annoyed clerk, and they obviously don't want to talk to me, so we can just avoid each other.


Story time!

When I was signing up for Internet at my new apartment, there were 3 ways I could do so: by contacting my apartment's official representative, online, and through the regular phone system.

I used all three. First, I contacted the representative, who gave me a price. Then I looked online and found the actual price (considerably lower). When I tried to sign up online, I was told I'd need to provide an extra security deposit because I have my credit reports frozen.

So I called the generic phone system. The agent gave me another price (lower than my official representative, but still higher than the website). I pointed out the website price, and the agent switched me to that price. I asked if I'd need to provide a security deposit and they said no. They finished signing me up, and everything was fine.

The whole process was annoying, I would have loved to have someone else do it for me. This was the perfect time for a phone assistant to step in. But that would have been a really bad idea with Duplex.

The point is - an automated call system probably doesn't protect you from an abusive representative. If I had Google Duplex handle either of my calls, I'd be paying more for my Internet right now, because I guarantee Duplex isn't smart enough to determine if a representative is lying about an advertised price.

95% of the time this probably doesn't matter, because most people I talk to on the phone aren't abusive. But if someone does want to upsell you or bury you in service fees or waste your time, Google Duplex is probably making their job easier, not harder.


Good luck doing that at 3AM in the morning, because that's when you have the time to plan your weekend activities.


It’s the lack of indication that you’re talking to an automated assistant and the fact that it uses human affectations in its speech that creeps me out hardcore.


As crazy as it seems at first glance, a double Duplex system would be a really beneficial result.

What stops businesses from setting up APIs for scheduling services today? It's the lack of a universal API.

If a barber shop wants to make it possible for a 3rd party app to book appointments then they have to release some API. But that's not the end of it. The 3rd party app has to first discover their API, someone has to understand it and write code to use it, and then deploy that code.

I'll repeat for emphasis: This is mainly a problem today because there is no universal API that all services can use

With Duplex, verbal speech becomes a universal API that every service can parse and use to communicate with each other. Also, discoverability is taken care of by using publicly cataloged phone numbers on services like Google Maps, Yelp, etc


No, it doesn't.

The problem of universal API is entirely orthogonal to voice communications. Duplex is not a Turing-complete system, it's just an API behind a voice recognition layer. All the important problems for universal APIs happen after that layer.

Ultimately, what you describe can work perfectly only when everyone is using Duplex, which is equivalent to everyone using Google-defined API. That's not universal, because you have one entity behind it.

The only way this brings us somewhat closer to universal API is that if you expect it to handle humans as well, it introduces some constraints to the space of possible APIs, which could make it easier for everyone to agree on a common format. Constraints of natural language processing without a human-level AI require your API to be very fuzzy and very lenient. There's nothing stopping one from implementing those same constraints over a text or binary protocol. Nothing except no reason for businesses to do it.


Everyone doesn't have to use Duplex, everyone would have to use a system that allows it to seem like a human is talking to the other person, provided you limit the context to a particular domain (like taking reservations).

This system could be developed by any company with sufficiently advanced ML chops


You gotta be kidding. If it was that easy why didn't you build it yourself? Easy billion right there.


cromwellian didn't say it was easy. It's still hard to do something limited.


Well, cromwellian works at Google, so maybe he did build it himself. :)


>The end game would be for the business to run something like duplex on the other side, and you’d have duplex talking to duplex.

Is this satire? If this is indeed the future, I wonder if there is an irresistible urge to make systems as inefficient as possible. Kind of "like gases expand to fill the container, applications become as inefficient as the power of the hardware allows".


Dude, this system has better conversation skills than me. I mean literally. Well, I'm autistic + esl (+ kinda too why). But still it's kinda incredible that a system is actually better.


Yes, that first restaurant reservation would have taken me several times as long to complete.


"The funny thing about AI is that it’s a moving target. In the seventies, someone might ask “what are the goals of AI?” And you might say, “Oh, we want a computer who can beat a chess master, or who can understand actual language speech, or who can search a whole database very quickly.” We do all that now, like face recognition. All these things that we thought were AI, we can do them. But once you do them, you don’t think of them as AI. It has this connotation of some mysterious magical component to it, but when you actually solve one of these problems, you don’t solve it using magic, you solve it using clever mathematics. It’s no longer magical. It becomes science, and then you don’t think of it as AI anymore. It’s amazing how you can speak into your phone and ask for the nearest Thai restaurant, and it will find it. This would have been called AI, but we don’t think about it like that anymore. So I think, almost by definition, we will never have AI because we’ll never achieve the goals of AI or cease to be caught up with it."


Who in the 70's thought AI would be defined by being good at chess? I mean it can just about brute force the game; how is that intelligent?

The Turing test is way older and seems to have been the standard measure since its inception.


First of all you can't brute force chess. Algorithms that beat high level chess players are non-trivial, even with today's computers. In fact top engines today all use carefully designed heuristics hand-crafted by experts -- this notion that Deep Blue, Deep Fritz, etc were dumb "brute force search engines" is a misleading tale.
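A back-of-the-envelope sketch of why literal brute force is out of reach. It uses the commonly cited ballpark figures of about 35 legal moves per position and about 80 plies per game; the search speed of one billion positions per second is simply an assumption picked for illustration.

```python
from math import log10

BRANCHING_FACTOR = 35       # commonly cited average number of legal moves
GAME_LENGTH_PLIES = 80      # roughly 40 moves per side
POSITIONS_PER_SECOND = 1e9  # assumed search speed, for illustration only
SECONDS_PER_YEAR = 3600 * 24 * 365

# Work in log10 throughout, since the raw numbers are astronomically large.
tree_size_log10 = GAME_LENGTH_PLIES * log10(BRANCHING_FACTOR)
years_log10 = tree_size_log10 - log10(POSITIONS_PER_SECOND) - log10(SECONDS_PER_YEAR)

print(f"game tree size: about 10^{tree_size_log10:.0f} lines of play")
print(f"time to enumerate: about 10^{years_log10:.0f} years")
```

Under these assumptions the full game tree has more than 10^120 lines of play, so engines have to prune and evaluate heuristically rather than enumerate.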

Second, in the 70s there was no computer power even for quite clever algorithms (that probably didn't exist yet) to beat top chess players. Chess was seen as a grand goal requiring utmost intelligence -- while it is obvious in hindsight, at the time the intuition was probably that extremely "intelligent" humans were required to play chess, and in fact the best chess players were among the most "intelligent" persons -- it was a clear exclusively intellectual task that few people were competent at. So many believed that chess would be one of the greatest challenges to AI (the clarity of the rules added convenience of research and implementation). Things like walking didn't seem intellectually demanding, so the common sense was that it is probably "easy". In fact today we know that navigating a bipedal robot in a simple environment through visual recognition is vastly more difficult computationally than playing chess well; it is only easier for us because we have highly specialized circuitry in our brain that is well matched to those tasks. Our brain wetware is not very well matched to playing chess.

Also, chatbots have been doing pretty well on Turing's original definition of a Turing test for about 10 years now. But now people argue that Turing didn't really foresee the "loopholes" they believe the bots are exploiting, and they are coming up with stricter requirements for a Turing test.

That's totally in line with Tao's argument that every time we approach a major AI goal, suddenly it is not AI anymore, because there's nothing magical about it, just boring old technology. And human brains are magical, right?

Until every obscure niche capability of humans has been dominated in every possible way by AIs many won't want to concede that it really is AI. And even when it does become better than us in every possible way, I suspect a few will still find arbitrary reasons why it really isn't AI/AGI, e.g. because it is not organic, because the computer lacks a body, because it lacks a "soul", etc.


You might be right about chess, but I can't understand how you think chat bots are "doing pretty well". I've never seen a conversation with one that held up to even the most tolerant hand-holding for more than a few sentences.


I mean they're doing pretty well by Turing's original definition. I agree chatbots using traditional techniques (not sure about newer LSTM chatbots) are not too impressive, just illustrating that we've had to adjust the definition. That's by definition moving the target.

In fact I'm quite sure Turing would be quite impressed by good recent chatbots.

Try this one: https://www.pandorabots.com/mitsuku/

From the point of view of the 1940s, this would seem really close to a veritable "thinking machine"! Although I'm sure he'd recognize a few things are still missing for fully replicating human behavior (or going beyond).


"who can understand actual language speech" - "we do that now" We have mood classification, but understanding? [citation-needed]


By understanding he means translating speech to text, I guess. We have speech-to-text systems that are better than the median human in the native language now. Quite amazing, given how central auditory language processing is in our cognition. And most people don't think it's "AI" (and certainly not anywhere near AGI). That's a good example of how AI is a moving target IMO.


Small businesses do have email. I use email to do this kind of thing all the time and it works extremely well. I feel like this is a case of Silicon Valley solving marginal problems while the world burns.


You have to see it in use too. The Milo demo from Microsoft was much more impressive, but completely fake.


To the extent that the primary application for this is call support, I don't agree with your proposal. This is supposed to close the gap between a tech-savvy group that would be using Duplex and tech-handicapped small businesses. It is much easier and more effective for a restaurant, for example, to hook up with OpenTable than to deploy something like business-Duplex.


Exactly. If a small business doesn't use open table or duplex, no problem - I can just use duplex to schedule the reservation for me. Open table requires buy in from the restaurant, duplex doesn't.


The Turing Test is irrelevant, the "dystopian" stuff is mostly irrelevant, the ethics are highly relevant. It is simply unethical for a computer to converse with a human while misleading them into believing they are talking to a human. There are a zillion reasons why this is so, if none of them seem obvious then I'd suggest investing some time to take an ethics course.


> Most people working in hair salons or restaurants are very busy with customers and don’t want to handle these calls

And those will have online booking systems already - I don't see how this technology is still relevant nowadays. Maybe it was back in the 90's when the internet (and online booking) was a new thing, but now? I can't see there's a big market for this application.


> Most people working in hair salons or restaurants are very busy with customers and don’t want to handle these calls, so I think the reverse of this duplex system, a more natural voice booking system for small businesses would help the immensely free up their workers to focus on customers.

I think this perspective is very short-sighted. You will lose customers to automation, but businesses won't turn away customers because of automation.

Customers and prospects don't want to interact with machines, but businesses should be willing to give customers what they want.

The idea that a tool can be rolled out to millions of consumers, even with limited use cases, and not have to get adoption from businesses to be useful is IMHO a much bigger opportunity and a much better use case than rolling out a tool to businesses that makes the interaction less personal.

Customers need to trust businesses; businesses only need to collect money from customers.

I think everyone who focuses on chatbots from the business use case perspective is missing the bigger opportunity.

A technology that can give a consumer access to ALL businesses, not just the ones who adopt a new technology, offers much more utility than serving businesses or the shortsighted use cases like saving time and money for the business.


Uh...businesses have been automating phone calls for decades now


That's partially my point. Everyone is thinking about the business case for cost saving around automation for businesses, but that lacks imagination. Automation tools for consumers to interact with humans at businesses are where the real opportunity lies.

Would you ever voluntarily use an IVR? I wouldn't. If I am going to interact with automation for a business, I want to do it with a different interface than voice... all the hype around NLP and chatbots was uninspired and focused on the wrong side of the interaction... Building conversational interfaces for consumers to use to interact with businesses is a much better use case.


I feel like a lot of people who are pointing out the limitations show a failure of imagination about where this will go in the future. I don't care about the obscure technical limitations now; those are just engineering problems that will be solved in a very short time.

I'm not afraid of the machines going all singularity or skynet or whatever, becoming sentient and taking over the world as some kind of robo-Hitler. That's moronic. But what does worry me is the normalization of having a machine do everything for you, plan your whole life, access every little detail of every bit of your personal data and lifestyle.

Of course we've already had that for a while with the way phones work. But this is another step towards getting public consensus for using it in new ways. Once people are used to this, we'll have more and more systems with conversational software that manages your life for you. Speaks on your behalf. Interfaces with the world for you because doing it yourself is far too stressful and inconvenient.

And of course it'll be a free, advertising-supported model so all that data will have to be shared with, among many things, shady political organizations to try to gain every little advantage possible to manipulate public opinion and steer themselves into enormous power.

Think of where the cell phone started off: just a phone in your pocket. It's so much more now. Remember that when thinking about these AI assistants and what they will develop into. I'm not afraid of the classical AI apocalypse. I'm afraid that these systems will do exactly what they're designed to do. That people are underestimating just how much power lies in these little inconveniences in life, once they're all added up and analyzed and tallied.


> The people losing their marbles over this being some kind of Turing Test passing distopian stuff are missing the point at how limited this domain is.

A colleague of mine is just going on about this.


You’re only looking at what it is today. You just need to extrapolate enough into the future to see how deeply disturbing this really is.


As long as there's a way out of the IVR system... too often, you get trapped when you want something that isn't an option.


When you mash 0 with Comcast, it kind of scoffs at you and tells you that it needs some information before it can help you. If you keep hitting 0 it just hangs up. Pretty terrible experience, but they don’t have to care because they have monopolized so many neighborhoods.


I imagined both the booker and bookee using a call automator / reception automator to make the appointment in the future.


> People who answer phones to take bookings perform an extremely limited set of questions and responses,

But that in itself is not even true across the industry, some(most) phone bookings are very complex, otherwise they would just use a web interface.


> But that in itself is not even true across the industry, some(most) phone bookings are very complex

Citation needed for that "(most)". I work for a company with a call center and a large part of calls are simple ones that could be easily answered by just reading the FAQ page on our website.

> otherwise they would just use a web interface.

I think the problem is more about resources. My local hairdresser uses his phone and a notebook to take bookings. It takes a bit of his time and could easily be replaced by a web interface, but he doesn’t have any resources for that (and some people still prefer using their phones).


> I think the problem is more about resources. My local hairdresser uses his phone and a notebook to take bookings. It takes a bit of his time and could easily be replaced by a web interface, but he doesn’t have any resources for that

By "resources", do you mean money? Because if so I can't imagine the purchase and training of Duplex on the business side would come cheap either.


Absolutely correct. And so-called general AI may never happen. Regardless, this is shocking. It immediately needs to be factored into any speculation about what the world will look like in 20 years. Innumerable questions.


Maybe a general AI is just a very big collection of many smaller abilities.


The older I get, the more true I suspect this is.


Of course there is more to human intelligence than a list of abilities like "booking a reservation" or "translate French to English".


Umm...hmm...I got you.


> The people losing their marbles over this being some kind of Turing Test passing distopian stuff are missing the point at how limited this domain is.

Right, and these kinds of comments will continue for a while.

The question is: when will the "this is trivial, move on" type of comments start to fade out? Five years? Ten?
