I find voice assistants rather underwhelming. I tend to lob them really specific questions they should be able to answer, like:
"call $business_thats_part_of_a_national_chain in $town $state." That's a batting practice pitch right over the plate. The "in $town $state" tells it to use Google maps search. From there it should have no problem finding the business and phone number. Instead it tells me "here's what we found for $business in $town $state." Of course the first result is the business I'm looking for and I can find and call the number somewhere on the page but it's almost easier to just Google it directly.
The only time I'm willing to put up with this is when I'm driving and want to minimize interaction with the phone.
Maybe nobody uses them in public because it's faster, more precise and less obnoxious to type.
Looking up car parts would be nice ("does $chain_store in $town have $partnum" and "does $other_chain_store in $next_town_over have $partnum"). It's hilariously easy from an implementation perspective, since everything but slapping the voice recognition on top and mapping keywords to software commands has been solved for a long time (e.g. http://showmetheparts.com/).
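A minimal sketch of what "mapping keywords to software commands" could look like once the speech is already transcribed to text. The query pattern, store names, and part number here are all illustrative assumptions, not any real assistant's grammar:

```python
import re

# Hypothetical slot-filling pattern for a parts-availability query.
# Everything downstream (the actual inventory lookup) is assumed solved,
# as the comment above notes.
QUERY = re.compile(
    r"does (?P<store>.+?) in (?P<town>.+?) have (?P<part>\S+)",
    re.IGNORECASE,
)

def parse_parts_query(utterance: str):
    """Extract (store, town, part number) slots from a transcribed utterance,
    or return None if the utterance doesn't match the pattern."""
    m = QUERY.search(utterance)
    if not m:
        return None
    return m.group("store"), m.group("town"), m.group("part")

print(parse_parts_query("does AutoZone in Springfield have BR-549"))
# → ('AutoZone', 'Springfield', 'BR-549')
```

The extracted slots would then be fed to an existing inventory backend; the hard part in practice is covering the thousands of phrasings people actually use, not this one happy path.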
When I watched the Google Home video from the I/O event, I cringed very hard. And I am not a skeptic: I generally am a believer in new tech, in natural language tech, etc., but that video looked really goofy and improbable. Do they really think that families are going to change their morning routines to interact via voice with a device? Just the "Hello Google" command is awkward enough; people aren't going to walk down the street talking nerdy Google commands out loud. I don't see it happening.
And from a foreign language speaker standpoint, I find it even harder to imagine. In Spanish there is not really a consensus about how to pronounce 'Google'. I can't imagine people on the streets speaking Googlish.
On the other hand, everyone googles, for sure, everyone uses search and mobile. I have the feeling that these two interactions -text/type search and mobile- are too hard to improve on. I don't think voice has a chance to become bigger than text.
Have you observed anyone interact with Alexa? I have, and it seems like people really love it. So, is there a big difference between Alexa and Google Home that you find dealbreaking, or do you challenge the notion that Alexa has been successful?
I haven't. I would like to try it out, maybe I will check the browser emulation they released a couple of weeks ago. I would like to try it both in English and Spanish and compare the experiences.
> hilariously easy from an implementation perspective since everything but slapping the voice recognition on top and mapping keywords to software commands has been solved for a long time.
Yes, any single capability is 'hilariously easy' to implement - the problem with these systems is that there are thousands of capabilities to implement before people can consider them close to doing 'everything'.
I remember this discussion from school. Textbooks have defined most of the experiments and tools of the AI boom of the '60s as "AI": things like neural nets and even basic "expert systems" such as today's phone trees. Individually, these things aren't really AI, but even today most of them are fundamental component parts in the larger discussion...
What is the "atomic theory" of AI? Are all of these pieces and components evolutionary explorations into AI or something less or something different? What do we call them along the way other than just "AI"?
Right. The voice assistant doesn't currently solve a real problem I have, except in a few limited situations (in the car, when my hands are full, etc.). And even then I don't trust it enough to do the right thing.
I'm surprised they didn't mention that in areas with a lot of background noise, e.g. public places, voice recognition doesn't work as well. I think that is also a reason why people refrain from using it in public. From personal experience I can say it's very frustrating to repeat a command to your phone multiple times because it's sorry it didn't get that, or because it calls your ex-girlfriend instead of playing some music.
My experience is totally opposite. I've found Google voice search very effective, even in a crowded bar. When I listen to some of the recordings that were successful[1], I'm frankly amazed at what a good job it does.
That URL is the most amazing thing I have ever seen! Unfortunately, it does not convince me of the power of speech operated interfaces. My history tends to consist of a series of multi-part voice commands with the first or second phrase being repeated multiple times, in progressively more angry tones of voice, until it seems I give up. It looks like the most common thing I do, interestingly, is set timers for myself.
The most hilarious clip I found was one where Google had transcribed the request as "bbbbbbb bbbbbbb bbbbbbb" and on playing it back, I heard the sound of my alarm clock going off with a steady beep, beep, beep...!
Google voice used to work poorly in noisy environments, then one day it didn't. I was at a trivia contest in a noisy bar a few months ago. When no one could answer a question off the top of their head, the question would go to a phone search permitted round. Our table won the contest handily because I could use voice search more quickly than any one else could manually type in queries.
Indeed - this is a good point. I travel a lot on the subway in New York and by bus to my home in the suburbs. Apart from not wanting to disturb other passengers, the noise levels are generally so high that it's hard to get your words recognized. I need to use a pair of noise-cancelling headphones to listen to anything at all.
Interesting. I always believed that an important part of the voice was created in the mouth (tongue, soft palate, lips), and the larynx was only responsible for producing the base frequency. But apparently, attaching a microphone to the larynx works as a means to recognizably capture the voice. I might have to adjust my understanding of the voice :)
One aspect which drives me nuts about using Siri in public, is her inability to match my own volume. Often I'm in a relatively quiet office, or somewhere else where shouting would be inappropriate. I bring the phone close to my mouth and speak my command softly.
At the loudest volume setting, Siri replies "ALRIGHT, STEPHEN, I'LL REMIND YOU TO WHATEVER WHATEVER".
The one feature that would get me to use Siri more is if she would speak more quietly!
1: You can adjust the Siri app's volume the same way you can any other. Activate Siri, and then use the volume rocker button to adjust it up/down.
2: You can turn off voice responses altogether! In Settings>General>Siri, change "Voice Feedback" to "handsfree only". Now Siri will stay completely silent if you activate it with the home button ("Hey Siri" is hands-free, so you still get voice responses there).
In public I want a reverse voice assistant. Think the movie Her but I don't have to talk back. I want hardware that is in my ear but still lets me hear normally. New text message spoken to me, same for email, calendar, etc. I don't want to even have to look at my phone to get information, and I want it to be invisible. Augmented, I didn't ask for this, well actually I did.
The technology to achieve custom molded Bluetooth earbuds like you're describing has been around for a few years, but no one is making them. There are a few startups and old dogs (hearing aid industry) trying to move into the space, but damn are they slow to the punch. I certainly blame Apple for locking down Bluetooth LE and not releasing a product.
Can you be more specific on how Apple locked down Bluetooth LE? From my understanding it was accepted into the Bluetooth 4.0 standard in 2010 and has since seen widespread adoption in Bluetooth devices.
"Accessories that use only Bluetooth Low Energy (BTLE) (note: BTLE-enabled HomeKit accessories and BTLE-enabled MFi Hearing Aids are part of the MFi Program)"
Have fun with MIDI quality. The only way BTLE is sending mp3 or higher quality is through the MFi Hearing aid protocol Apple owns and keeps under wraps.
Well there is the Moto Hint, as well as Samsung's new Bluetooth headphones. Battery life is measured in minutes though. That's the Achilles' heel of what I want. I also don't want crap in my ears. I basically want bone-conducting headphones that are invisible and powered by my body heat. It's a pipe dream.
As soon as I got my Apple Watch, I started using voice assistants way more; both in private and public. "Hey Siri remind me to ___ at (location or date/time). "Hey Siri, set an alarm at ____". "Hey Siri, how many grams in 3 ounces?" "Hey Siri, set a timer for 25 minutes".
Also, a couple of months ago I enabled the "Listen for 'Hey Siri' when plugged in" option on my iPhone and it's been super helpful. My phone's typically plugged in right in front of me when I'm working, as well as in the car. I use it frequently to start a phone call ("Hey Siri, call ___") and it works wonderfully. It works WAY better than the hands-free setting in my car.
Car:
**Press Button**
Me: Call Home
**3 second pause**
Car: Calling Home at Home, is this correct?
Me: Yes
Car: Dialing....
**2 second pause while it initiates the dial**
I wonder if someday our in-home voice assistants should be aware of this. If a guest is there, my voice assistant should know, and it might respond differently in that context, like providing less detailed information in voice responses.
There are probably even ways a home assistant could specifically help my guests. It could know where I store things they might need, like cups, plates, and silverware. Or it could tell them where my bathroom is.
I actually really like "OK Google", but my reasons for using or not using it in particular contexts are not at all what the author here states. Invariably, about one second after I say "OK Google" or hit the mic on the search screen, some knucklehead (or one of my kids) starts talking in the background, making my search useless. Background noise is a killjoy. I don't want to be the guy repeating the same thing louder and more clearly over and over in the food court. I have no problem using it "in front of people", I just have no faith that it will work.
I'm always "OK, Google"ing at my Android Wear watch at work. I've been doing this for a year or so, and at no point has anyone gotten used to it. They all still look at me like I'm a little crazy.
Of course, when the voice recognition completely botches what I asked of it and I get frustrated trying to make it work right, I probably do look crazy.
Having had the same experience at my work (but from the other side), they possibly don't think you are crazy but would just like you to shut the fuck up.
This is entirely different since the person speaking your example has to speak that out loud as part of their job. Talking out loud to Siri or Google is absolutely not necessary in an office setting. Both are annoying, but only one is avoidable (and thus rude).
Once speaking to a voice assistant is as accurate as talking to a human on the phone, why not? It'll look like (and be about as anti-social as) a normal phone call. Fine on the street, probably not in a crowded train or in the office.
Where it gets annoying currently is repeating OK Google/Alexa/Siri five times, then repeating what you want the damn thing to do another four times, each time a bit louder.
Voice assistants are the new bluetooth headsets. They are not socially acceptable most of the time.
Personally, I only use Siri in my car as a last resort (I think CarPlay is amazing; I don't use my built-in car software anymore). I think talking to a device is awkward and starting a "conversation" by yelling "Hey Siri" is even worse, it feels embarrassing.
Obviously, for everyday casual use that would require some design work. Maybe integrate it into a jacket or shirt, if the hardware is not too expensive, with Bluetooth for communications. Battery is a bit of a problem; having to constantly charge my jacket is not something I would like to do. Maybe there would be some opportunity to build it so that it would not be active all the time.
Or maybe one should actually rethink the whole concept. Maybe it would be better to just capture the vibrations from a few places around the throat and then use machine learning to make sense of them, instead of trying to use traditional speech recognition.
"A set of electrodes are attached to the skin of the throat and, without opening the mouth or uttering a sound, the words are recognized by a computer."
https://en.wikipedia.org/wiki/Subvocal_recognition
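As a purely illustrative sketch of the first step such a vibration-based pipeline would need: turning a raw sensor trace into per-frame features a model could learn from. The feature choices (RMS energy, zero crossings) and frame size are generic signal-processing assumptions on my part, not taken from any real subvocal-recognition system:

```python
import math

def frame_features(samples, frame_size=64):
    """Split a 1-D sensor trace into fixed-size frames and compute two
    simple features per frame: RMS energy and zero-crossing count."""
    features = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(x * x for x in frame) / frame_size)
        zero_crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        features.append((rms, zero_crossings))
    return features

# A synthetic 50 Hz sinusoid stands in for a real electrode trace.
trace = [math.sin(2 * math.pi * 50 * t / 8000) for t in range(256)]
feats = frame_features(trace)
print(len(feats))  # 256 samples / 64 per frame → 4 feature frames
```

A classifier trained on frames like these (rather than on audio) is the basic idea behind the subvocal systems described in the link above.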
I've tried OK Google recently under my motorcycle helmet, just to change music. It wasn't up to the task. Somehow Google Music unchecked the "downloaded music only" option, and OK Google played radio, racking up a few hundred MB of data before I could change it. It didn't work when all I was saying was "next song." I went back to using the Bluetooth button I was using before.
Bringing up music with Google Now is also notoriously frustrating because it's designed to force you into Google Play. It's purposefully awkward to get it to use a different player.
Could you share a product link for whatever Bluetooth button you were using? That sounds like a product I've wished existed for a long time and failed to find in the market. I remember an old wireless 30-pin connector for my iPod, from the company iMonster a long long time ago, that came with a weather-proof armband with controls. I loved it, but snapped many a connector off in my leathers!
This is the one you want. I'm on my second now from losing the first. It comes with a handlebar mount but mine were too thin. I layered up some electrical tape and the holder works even at high speeds.
I would assume that, from a UX perspective, the movement from handset position back to the smartphone screen is seen as an inconvenience that interrupts the flow of the process. Add in that many phones have screen behavior tuned to the context in which the phone is being used, and you increase the chance of interrupting the process when the mechanisms within the phone misinterpret the phone's current position and orientation. (E.g., iOS screens enter a "call mode" when held up to the ear, or screen rotation will change what options are available. Even on new iOS and Android devices, sometimes the OS just seems to get "stuck" in a particular orientation, and only after considerable fiddling with the phone will it rotate to the intended orientation.)
Likely, designers made the call that they could just avoid the entire mess by operating the entire voice assistant in speaker mode instead of trying to compete with other UX/UI features.
Additionally, I would assume that the idea of simply talking aloud to a device replicates the Sci-Fi computer experience for computer interaction, most notably in Star Trek in which crew members simply summoned the computer with "Computer: [plain language command or query]". Even outside of Star Trek, this was a fairly common idea for interaction with futuristic computer systems, and for enough of the population, this method of interaction is part of the cultural memory.
Siri used to have "raise to talk" that would activate Siri automatically if you put the phone to your ear outside a phone call, but from some googling it was removed in an update.
If you activate Google's voice recognition from a bluetooth headset, it will respond to you through the headset -- but you often have to interact with the device anyway, to get it to do more than the most simple of things.
Actually next time just try talking to it to complete the task.
I use Google's voice recognition a ton when on my motorcycle with a Bluetooth helmet. I've gotten in the habit of just trying things and to my surprise many have worked.
Play/pause music, mute/unmute voice navigation, ask how long until my next turn, ask it to navigate to somewhere else, call/text/send a hangouts message to a contact by their nickname, play a specific band or album of music, etc...
> The high proportion of usage in the car would suggest it has more to do with the hands-free law that regulate driving and texting vs. a free choice by consumers to embrace this technology.
I doubt this conclusion. For one, given how many people I see everyday driving while holding a smartphone to their ear, it does not seem that too many people are concerned with that particular law. Couldn't it be instead that driving is a situation where not having to use your hands is perceived as an actual benefit?
I found one good use for Siri so far: as a calculator. Much easier to say "square root of seventeen times five-eighths?" than to find the calculator app and key that in there.
It's funny you mention that, the cheekiness is one of the things that bugs me about Siri. Seems like every time I ask about the current temperature (one of my most common uses), Siri has to add some sort of snide comment about how hot or cold it is. It's cute the first hundred times, but then it gets old. I think it's to the point now where it actually makes me use it less because I just don't want to hear it.
Your point stands, and the cheekiness gets to me a after a while as well, but let me be clear that I embellished a little, as she really just says "14.5". If Siri were as clever as I described, I might not mind it as much.
Right, I know it's not that cheeky, but it reminded me of the real comments.
I agree that it would be better if she were that clever. Part of the problem is that it's just so predictable. Whenever it's above room temperature, she'll say "Hot!" If it's below room temperature, she'll say "Brr." Some variety would really help.
I don't like using voice assistants in public for the same reasons I don't like talking on the phone in public: it's loud, so I wouldn't hear the voice feedback, and I don't like disturbing the people around me. However, I use Cortana on my phone all the time in public by typing my sentence instead of saying it out loud. I assume Siri and OK Google let you do this as well; otherwise, how do people with speaking/hearing disabilities use these assistants in general?
I use Siri constantly, to remind me to check pages the next day, timers while cooking, setting alarms, for playing basically anything via Apple Music or a quick search. But people are constantly weirded out. "Why are you reminding yourself to hot glue the thing with the stuff when you get home", well, so it comes up to remind me when I get home...
Why not a quick, instantly opening keyboard when the user holds down the home button (not sure what the Android equivalent is)? Have it suspend voice recognition if the user is typing, so the user doesn't have to manually select an option. Then feed the results into the same NLP as used by the voice stuff, and Bob's your uncle.
Google Now kind of supports this. To activate the assistant thingy, you hold down the home button and press the mic icon. But in the same screen there are also suggestions based on what's on the screen (for example, if I'm looking at a forum where someone's talking about an album it'll suggest finding it on Spotify), as well as a plain Google search box.
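The idea of typed input feeding the same NLP layer as voice could be sketched like this. The intent table, phrases, and function names are illustrative assumptions, not any vendor's real API:

```python
# One shared intent layer: transcribed speech and typed text both land here,
# so the keyboard path gets the same understanding "for free."
INTENTS = {
    "set a timer": "timer.start",
    "remind me": "reminder.create",
    "call": "phone.dial",
}

def interpret(text: str) -> str:
    """Map free text to a hypothetical intent id; falls back to search."""
    lowered = text.lower()
    for phrase, intent in INTENTS.items():
        if phrase in lowered:
            return intent
    return "search.fallback"

def handle_voice(transcript: str) -> str:
    return interpret(transcript)      # output of the speech recognizer

def handle_typed(keyboard_input: str) -> str:
    return interpret(keyboard_input)  # output of the quick keyboard

print(handle_typed("Remind me to buy glue"))  # → reminder.create
```

The point of the sketch is only architectural: once both input modes converge on `interpret`, adding the keyboard costs almost nothing beyond the UI.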
Ever since the Apple Watch came out I've been wondering if there's a market for an always-on throat mic fashionably packaged as a new kind of jewelry or something. Works fine for the military it seems.
Subvocalization is your "internal me," the not-sound you make when you talk to yourself.
Apparently this internal voice actually results in some muscle movements, and it's thought that with the right sensors and software it would be possible to recognize it as speech. Nobody has quite gotten it to work yet, though.