The difference is that people are OK with a human asking for clarification, but systems like Siri need to have a near-zero error rate before people will consider them good (a person who has to repeat themselves once every 20 times will consider it bad, or at least not good enough).
I'm not sure people expect super-human performance out of Siri. An important difference is that a human who doesn't understand will say so, and ask you to repeat the relevant part (or to choose between two alternatives), conversationally; or they will pick an interpretation that is not the intended one but is an understandable misunderstanding.
Contrast this with speech recognition, which will often substitute words that are nonsensical in context, making it look silly from a human perspective...
I think another important difference is that humans won't get stuck in a loop asking you for clarification the same way several times; after 2 or 3 tries they'll typically change behavior. E.g. they'll ask you to spell the word, or repeat the word they didn't catch back with a questioning tone to signal that that's the part they didn't understand.
This could be implemented, though: based on the part of the sentence that is understood, figure out the most likely words for the missing part and ask a specific question about it to fill the gap.
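Something like this toy sketch (the function, scores, and n-best data here are all made up; a real system would use the recognizer's word lattice and a proper language model, not a hand-rolled bigram table):

    from typing import List, Tuple

    def rank_gap_candidates(context: List[str],
                            candidates: List[Tuple[str, float]],
                            bigram_scores: dict) -> List[str]:
        """Re-rank the recognizer's candidate words for the missing slot
        using a toy bigram score keyed on the previous recognized word."""
        prev = context[-1] if context else "<s>"
        scored = [(word, asr_conf * bigram_scores.get((prev, word), 0.01))
                  for word, asr_conf in candidates]
        scored.sort(key=lambda x: x[1], reverse=True)
        return [word for word, _ in scored]

    # "Call ??? and tell her I'm late": the low-confidence gap is the name.
    context = ["call"]
    candidates = [("betty", 0.41), ("benny", 0.39), ("bed", 0.20)]  # ASR n-best for the gap
    bigrams = {("call", "betty"): 0.5, ("call", "benny"): 0.5, ("call", "bed"): 0.001}

    top_two = rank_gap_candidates(context, candidates, bigrams)[:2]
    print(f"Did you mean {top_two[0]} or {top_two[1]}?")  # targeted question, not "please repeat"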
See, it's not about hard-coding such behavior. I would say it reaches a human level of understanding if it automatically learns these ways of solving the problem. Asking relevant questions can be hard-coded, but that doesn't equal "understanding" the problem.
I think the Chinese Room thought experiment overlooks this part of "understanding".
Exactly. When SR has a low confidence level, it needs to ask you to repeat yourself, not just choose the highest-confidence match and hope for the best.
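In code, the basic idea is just a threshold check before acting on the top hypothesis; the threshold and function here are hypothetical, not anything Siri exposes:

    CONFIDENCE_THRESHOLD = 0.85  # arbitrary; a real system would tune this

    def handle_hypothesis(text: str, confidence: float) -> str:
        # Only act when the recognizer is reasonably sure; otherwise ask again
        # instead of silently running the highest-scoring guess.
        if confidence >= CONFIDENCE_THRESHOLD:
            return f"EXECUTING: {text}"
        return "Sorry, I didn't catch that. Could you say it again?"

    print(handle_hypothesis("call betty", 0.93))
    print(handle_hypothesis("call bed he", 0.41))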
That's a good start, but probably the wrong interface for it: it feels "non-native" in this context. A command initiated by voice should present the options by voice.
It's a valid HCI solution to a technical failure mode. Once the software has advanced to the point where the AI is truly conversational, it will be a watershed moment.
The important thing here, IMO, is going to be how the system asks for clarification. Hearing the same canned "I'm sorry, I didn't quite get that, can you repeat?" phrase 20 times in a row is annoying. Having the computer say "I'm sorry, what was that last word?" or "I didn't quite catch that, did you want me to call Benny or Betty?" would be far more acceptable.
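A rough sketch of that kind of prompt selection, with made-up confidences and alternatives (this is not how Siri actually works, just an illustration of varying the question based on what failed and how many times):

    import random

    def clarification_prompt(words, confidences, alternatives, attempts):
        """Pick a prompt that targets the uncertain part of the utterance.
        `alternatives` maps a word index to the recognizer's other candidates."""
        low = [i for i, c in enumerate(confidences) if c < 0.6]
        if not low:
            return None                               # nothing to clarify
        i = low[-1]
        if len(alternatives.get(i, [])) >= 2:         # two plausible names: offer a choice
            a, b = alternatives[i][:2]
            return f"Did you want me to call {a} or {b}?"
        if attempts == 0 and i > 0:                   # first failure: ask about that one word
            return f"Sorry, what was that after '{words[i - 1]}'?"
        return random.choice([                        # repeated failures: change strategy
            "Could you spell that for me?",
            f"I heard '{' '.join(words)}', is that right?",
        ])

    print(clarification_prompt(["call", "benny"], [0.95, 0.45],
                               {1: ["Benny", "Betty"]}, attempts=0))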
As someone else mentioned, how it makes sense of the words is much more important than a zero error rate.
The understanding rate is less than 10%. If you don't match a keyword, it gives you a useless web search.
Personally I don't think understanding rate is the whole issue so much as the reaction to errors (which is partly understanding). You can't say "no, that's not what I said", and Siri et al. never keep enough context to say "huh? What did you say?" or "I didn't get that last part, can you repeat it?"
It's that errors in understanding or accuracy turn the whole thing into a complete shitshow.
One failure and you might as well pull over and type what you want.
Remember this is with low-quality audio; accuracy could be much higher under better conditions. Amazon's Echo relies on good hardware as much as software, with an array of good mics.
One big problem with Siri is that it has zero sense of humor. That is, imho, what makes people feel tired talking to it. It's like talking to a boring civil servant.
>> Judging by my everyday interactions, a 6% error rate is lower than human error rates in casual conversation.
It's better to avoid throwing around numbers like that, but even if that were the case, you have to remember that humans understand speech. The speech recognition task performed by AI systems, on the other hand, is more akin to transcription: the system takes sound as input and produces text as output. Any sort of "understanding" a) is extremely difficult to do well and b) must be performed by a different component of the system (a different algorithm, trained on different data).
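To illustrate that separation with stubbed, made-up components (not any real assistant's pipeline):

    from dataclasses import dataclass

    @dataclass
    class Intent:
        action: str
        slots: dict

    def recognize_speech(audio: bytes) -> str:
        """Component 1, acoustic + language model: sound in, text out, no 'meaning'."""
        return "set a timer for ten minutes"          # stubbed recognizer output

    def understand(text: str) -> Intent:
        """Component 2, a different model trained on different data: text in, intent out."""
        if "timer" in text:
            return Intent(action="set_timer", slots={"duration": "ten minutes"})
        return Intent(action="web_search", slots={"query": text})  # keyword miss: useless web search

    transcript = recognize_speech(b"...")
    print(transcript, "->", understand(transcript))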
> People regularly ask each other, "sorry, what did you say?", "wait, what did she say?", "would you repeat that please?", "huh?", etc.
For humans, isn't this due to a combination of factors rather than comprehension alone? Humans who ask "sorry, what did you say?", "would you repeat that please?", or even just "huh?" usually aren't paying attention at all. For most people it's not a problem of comprehension, sound quality, or surrounding noise, except when they aren't fluent in the particular language, dialect, or accent, or when the surrounding noise overwhelms their hearing.
Most people also tend to judge what the other person is saying and construct a counterpoint while listening, which impairs their ability to listen and understand well.
On the other hand, a computer could be expected to, and made to, pay attention far better and in a predictable way, which isn't possible with humans.
Given the other reply above about people's expectations of humans vs. computers, shouldn't we also consider the computer's strengths when making these comparisons?
That's mostly because people are thinking about other things. We understand that and anticipate it. If my computer doesn't understand me, it has no excuse as it can't distract itself. It isn't going to hear me next time by "concentrating harder" like a human can. It's going to keep failing.
I have a different experience - many people speak with a mumble or a mushmouth and no amount of concentration helps me disentangle it until I can get them to speak more clearly.
Sure, but if you repeat your utterance there's a good chance the conditions will have changed the second time around: maybe the background noise will have subsided, or you'll have swallowed that bit you were chewing on, and so on. It makes sense to ask you to repeat a couple of times even if it's a computer you're talking to.
I'm also not seeing anything close to 6% in any public implementations. The voicemail transcription emails I get are often so bad that it's impossible to discern even the gist of what the caller is talking about.