

40+ voice searches thrown at Google on Jelly Bean - bane
https://plus.google.com/100130762972482716067/posts/BN5qjTEN62r

======
pilif
(disclaimer: this is based on demonstrations I have seen - I don't have access
to JB and thus I can only speak of Siri):

In many of these demonstrations I noticed one thing that was bugging me: Even
though the voice recognition in Android seems really cool (I don't have JB
yet), it doesn't give definite audible confirmation of the command in many
cases and sometimes it even requires user interaction with the screen.

Now personally, I already believe that speech input is kind of a gimmick in
itself (try using English voice recognition with my address book filled with
German names...). To even have a chance to move from gimmick to
useful feature, it must work without user interaction on the screen.

"Play <whatever band name>" followed by "beep" and then a button to press on
the screen doesn't help me. A useful response to "play <insert band name
here>" is "playing <insert band name here>" followed by actually playing it.

Or "call <some name>" - if you just get back "calling" or even just a _beep_
\- how would you know whether the recognition was successful or not and the
correct name has been recognized?

Some commands on Android seem to be doing fine (the weather example), but
others fail in one way (the play example seems to require user interaction on
the screen) or another (the "turn on wifi" command doesn't produce any audible
confirmation or error message - just the same beep sound as if it worked).

Siri, while it might not have as good a recognition as the Android solution,
is much better in that regard: It always confirms your command. As such Siri
seems moderately more useful as an additional input method whereas Android, by
forcing you to look at the screen when inputting a voice command, reduces this
to a gimmick and nothing else.

~~~
srj
It appears that for queries that don't have direct "answer" type response and
will perform an action it displays a progress bar with the query it
understood. Presumably this is so that you can cancel the action if voice
recognition was wrong.

It doesn't require any input though - I just tested it. Once the progress bar
reaches the end (it seems to take ~7 seconds) it will complete the action.
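
A rough sketch of that behaviour, as I understand it from the test above: the recognized query is shown, a timer runs, and the action fires unless it is cancelled first. The `PendingAction` class, its fields, and the 7-second default are all made up for illustration; this is not how Android actually implements it.

```python
from dataclasses import dataclass


@dataclass
class PendingAction:
    """A recognized voice command waiting out a cancel window (hypothetical model)."""
    recognized_text: str
    window_seconds: float = 7.0  # assumption, taken from the ~7 s estimate above
    elapsed: float = 0.0
    state: str = "pending"  # one of "pending", "cancelled", "done"

    def tick(self, seconds: float) -> None:
        """Advance the progress bar; complete the action once the window is used up."""
        if self.state != "pending":
            return
        self.elapsed += seconds
        if self.elapsed >= self.window_seconds:
            self.state = "done"

    def cancel(self) -> None:
        """The user spotted a misrecognition before the bar filled."""
        if self.state == "pending":
            self.state = "cancelled"


action = PendingAction("call the Drake Hotel in Toronto")
action.tick(3.0)   # bar is partly filled, nothing has happened yet
action.cancel()    # user cancels in time
action.tick(10.0)  # further ticks have no effect on a cancelled action
print(action.state)
```

The point of the window is that no input is needed for the happy path, while a wrong recognition can still be stopped.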

~~~
pilif
That means that if it misunderstood you, you have to wait ~7 seconds
before you notice that it was wrong.

If it immediately confirmed like Siri, you would know right then and could re-
issue the command.

~~~
kefs
Are you trolling? I understand you said in your parent that you don't have JB
and are strictly going by the video and demo.. but how could you miss this?
It's in the first minute.. multiple times.

You do not have to "wait ~7 seconds before you notice that it was wrong" so
you can "re-issue the command". The result is displayed immediately. You can
cancel the auto-action (which you first said didn't even exist), or force it
through before the ~5 seconds (not ~7) elapses.

Go re-watch the entire video in the foreground, please.

~~~
taylorfausak
I think pilif meant that you'd have to wait for the progress bar (about seven
seconds) to finish before you noticed something was wrong. For example:

    
    
      "Call the Drake Hotel in Toronto."
      *bling* "Calling..."
      (wait seven seconds)
      "Hey, this is Drake. What's up?"
    

Versus what Siri does:

    
    
      "Call the Drake Hotel in Toronto."
      *bling* "Calling Drake Smith..."
      "No, wait! Stop!"
    

Think about using the voice commands when you can't see the device. Like when
you're driving or running. It's useful to have the audible feedback in
addition to whatever's displayed on the screen.

------
vibrunazo
I was impressed that most of these searches already work on regular google
search (just tried them). I had no idea the knowledge graph could do those
already. This is not only a great demo of android voice search. It's a great
demo of google search.

~~~
vibragiel
However, at least some searches from the video seem a bit cherrypicked.

"How much is Angelina Jolie worth?" is throwing an answer:
[https://www.google.com/search?q=how+much+is+angelina+jolie+w...](https://www.google.com/search?q=how+much+is+angelina+jolie+worth%3F)
But "How much is Brad Pitt worth?" isn't:
[https://www.google.com/search?q=how%20much%20is%20brad%20pit...](https://www.google.com/search?q=how%20much%20is%20brad%20pitt%20worth)?

"Fish species in Lake Tahoe":
[https://www.google.com/search?q=fish%20species%20in%20lake%2...](https://www.google.com/search?q=fish%20species%20in%20lake%20tahoe)
But "fish species in mississippi river":
[https://www.google.com/search?q=fish%20species%20in%20missis...](https://www.google.com/search?q=fish%20species%20in%20mississippi%20river)

Edit: changed ".es" by ".com" in all the links.

~~~
abrahamsen
I had to change ".es" to ".com" in all those links to get anything.

~~~
vibragiel
Oops, weird. Changed them.

------
bsimpson
Google Now is awesome, but it's way worse at calling my friends than Voice
Search was. It's like it doesn't index my address book.

The other day, I told my phone "Call Nico Thornley." Instead, it searched for
"call me-so-lonely." Not the first time Google Now has completely botched a
friend search either.

~~~
lukeschlather
Voice search seemed to quit indexing contacts with the 2.1 upgrade. Which is
too bad, since putting addresses in my contacts made it remarkably efficient
for driving to a friend's house on a lark.

~~~
nazgulnarsil
This is what pisses me off. We clearly have the capability for really obvious
stuff like this to just work ("navigate to wendy's house"), yet it still often
doesn't.

------
gurkendoktor
I think these are great times to be a gadget consumer. Apple made the Siri
interface 'mainstream', and Google is great at throwing brainpower at their
own version of the interface. (I hope I'm not mistaken that Apple integrated
the 'vertical list' interface first.) The fact that JB's dictation works
offline whereas Siri needs a connection is the icing on the cake.

Same for the race to having better maps or the better browser.

~~~
greggman
What's the "vertical list" interface? Android had lists of results on voice
search for a while.

Is this what you mean? <http://www.youtube.com/watch?v=0L_IhqGcRM8#t=8m36s>

Here's the navigate by voice from nearly 4 years ago
<http://www.youtube.com/watch?v=jLXZ5BHeDFg>

Here's the original voice actions video by Google from 2 years ago
<http://www.youtube.com/watch?v=gGbYVvU0Z5>

Siri was introed 1 year ago?

I agree though, it's a great time to be a gadget consumer. I love Siri's
conversational style. It actually doesn't seem too far off before I can
actually start having a conversation with a computer through something like
Siri which is both awesome and terrifying at the same time.

~~~
gurkendoktor
I meant that the interaction with Siri looks basically like a chatlog
(conversational).

Here's what I misunderstood - I thought Jelly Bean was the same, but he would
always just tap away the last item in the video so we don't see it. Maybe
there is no scrollable backlog in Jelly Bean.

------
zmanian
Anyone have a hypothesis on how Google's offline voice recognition works?
It definitely seems like they have moved more voice recognition work onto the
client even in the online mode. My understanding of Google's approach to voice
recognition was that it was big data dependent. This would make it hard to
move to client devices....

------
sxp
There is also this test of 1600 queries:
[http://gizmodo.com/5922332/google-search-beats-the-crap-out-...](http://gizmodo.com/5922332/google-search-beats-the-crap-out-of-siri-in-1600-question-test)
Final result is 86% to 68% for Google.

------
bmj1
Looks good - 1 hiccup: the calculations aren't rounded correctly. (time: 4.10)

Rounded to 2 decimal places: 11.65600 is 11.66, not 11.65; 80.4672 is 80.47, not
80.46.

Sorry to be pedantic...

~~~
bmelton
I don't believe they're rounding at all, rather truncating.

~~~
bmj1
Good point - however they should be rounding in this context.
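
For anyone who wants to check the two values quoted above, here is a minimal sketch of the difference between truncating and rounding to 2 decimal places. The helper names are mine, and I'm only guessing that the demo truncates; this just shows that truncation reproduces the numbers in the video.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

TWO_PLACES = Decimal("0.01")


def truncate_2dp(value: str) -> Decimal:
    """Drop everything after the second decimal place."""
    return Decimal(value).quantize(TWO_PLACES, rounding=ROUND_DOWN)


def round_2dp(value: str) -> Decimal:
    """Round half up to the second decimal place."""
    return Decimal(value).quantize(TWO_PLACES, rounding=ROUND_HALF_UP)


for raw in ("11.65600", "80.4672"):
    print(raw, "truncated:", truncate_2dp(raw), "rounded:", round_2dp(raw))
# 11.65600 truncated: 11.65 rounded: 11.66
# 80.4672 truncated: 80.46 rounded: 80.47
```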

------
Zenst
When I see 40+ voice searches in various accents, including Scottish - THEN I
will be amazed.

------
switch007
I assume that's over wifi? I'd like to see its performance over 3G.

~~~
mise
On a related note, does the voice get interpreted on the device, or on Google
servers?

~~~
ConstantineXVI
As of JB, it can do both. Though I believe the offline mode is only used as a
backup if the network's acting up.
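
As a sketch of what "offline as a backup" could look like (pure guesswork on my part, not Google's code): try the server first and fall back to the on-device recognizer only when the network call fails. Both recognizers here are hypothetical stand-ins.

```python
from typing import Callable, Tuple

Recognizer = Callable[[bytes], str]


def recognize(audio: bytes, online: Recognizer, offline: Recognizer) -> Tuple[str, str]:
    """Return (transcript, source), preferring the server-side recognizer."""
    try:
        return online(audio), "online"
    except ConnectionError:
        # Network is acting up: use the on-device model instead.
        return offline(audio), "offline"


def flaky_online(audio: bytes) -> str:
    """Hypothetical server call that always fails, to exercise the fallback."""
    raise ConnectionError("network is acting up")


def on_device(audio: bytes) -> str:
    """Hypothetical on-device recognizer returning a canned transcript."""
    return "turn on wifi"


print(recognize(b"", flaky_online, on_device))
```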

~~~
Tyrannosaurs
I wonder whether the offline performance is comparable?

If it was then surely you'd do all the voice parsing locally so I'm guessing
that it's not. Unless anyone can think of another reason you'd push it through
the servers?

~~~
ConstantineXVI
Two factors:

\- online should have more training data and can be improved much more easily.
Don't see that advantage going away.

\- power efficiency and/or speed. Sending 5s of audio across the net can be
less strain on the device (esp. older ones) versus parsing the audio locally.

------
mrich
Well done, unfortunately all devices using this will be sued out of the market
by Apple.

------
shawndumas
What happens with Jelly Bean when you make an appointment for a time slot and
you already have one at that time slot?

------
fabiandesimone
212 is also the area code for Caracas, Venezuela ;)

~~~
itp
It'll be interesting to see if the response changes based on localization
(once this is available in more languages than US English, of course). Having
the response to that query change if the locale is es_VE would be pretty
slick!

Thinking about this, do systems like Android typically offer localization that
specific?
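
To make the idea concrete, a toy sketch of a locale-dependent answer with a fallback to a default locale. The table and function are hypothetical and say nothing about how Google or Android actually resolve such queries.

```python
# Hypothetical answer table for the "212 area code" query, keyed by locale.
AREA_CODE_212 = {
    "en_US": "New York City (Manhattan)",
    "es_VE": "Caracas, Venezuela",
}


def describe_area_code_212(locale: str, default_locale: str = "en_US") -> str:
    """Pick the answer for the user's locale, else the default locale's answer."""
    return AREA_CODE_212.get(locale, AREA_CODE_212[default_locale])


print(describe_area_code_212("es_VE"))
print(describe_area_code_212("de_DE"))  # unknown locale falls back to en_US
```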

~~~
raldi
Indeed; do questions about the Cardinals give different answers during
football season? And if it's the time of year that people are playing both
baseball _and_ football, does it distinguish by whether you live in St. Louis
or Arizona?

Edit: or Vatican City?

------
jarin
What's up with it showing directions to Moscone Center and then when he closes
it there is something about Wooster College? I don't wanna call fake, but
that's definitely strange… Then later on he asks a question about Wooster
College.

Edit: Also, he says "Where is that museum with Egyptian stuff in San Jose?"
and after he closes it, it shows "where is the tallest building in the world"

~~~
itp
From the link text on the post:

 _Disclaimer: the only edits I made were to cut time between each of my
queries, as well as re-order some of the demos from the original order I
recorded them in, so they would fit into categories. None of the queries
themselves have been edited or cut down, and the sequences are intact. The
processing time happened exactly as you see. This demo is made on the early
build of Android 4.1 (JRN84D, takju build for Galaxy Nexus I/O edition), on a
wifi connection. Consider this beta._

So what you're seeing are places where the queries were reordered.

~~~
jarin
Ah, I did not notice the cuts.

