I'd love it if HomePod recognized who was speaking, but I'd settle for Siri just recognizing "this is a kid's voice" and ignoring it!
I think a retroactive activation feature would be the way to go, i.e. being able to say “Siri” at the end of a sentence and have it parse the phrase that came before. It feels more natural to say “skip this please Siri” than “Hey Siri, skip” because it lets you fix the mistake of not identifying the subject earlier in the sentence, rather than going through the phrase, exasperation, “Hey Siri”, phrase routine.
I asked Siri to DJ and my daughter said “hey Siri, I don’t like that song”. Siri said “I’ll remember that” and played something else. Now I have no idea how to correct Siri’s notion that I dislike one of my favourite tracks. There should really be a HomePod app where you can see recent interactions and correct misinterpreted ones.
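To make the idea concrete, here is a minimal sketch of what I mean, assuming the device already turns speech into a stream of words. The class name `RetroactiveListener`, the `hear` method and the eight-word window are all made up for illustration; this is not how Siri or any shipping assistant works.

```python
from collections import deque


class RetroactiveListener:
    """Hypothetical sketch: keep the last few words heard and treat
    them as the command when the wake word arrives at the end."""

    def __init__(self, wake_word="siri", max_words=8):
        self.wake_word = wake_word
        # Rolling window of the most recent words; older ones fall off.
        self.recent = deque(maxlen=max_words)

    def hear(self, word):
        """Feed one recognized word; return a command string if the
        wake word just closed a phrase, otherwise None."""
        token = word.lower().strip(".,!?")
        if not token:
            return None
        if token == self.wake_word:
            # Everything buffered before the wake word is the command.
            command = " ".join(self.recent)
            self.recent.clear()
            return command or None
        self.recent.append(token)
        return None


listener = RetroactiveListener()
results = [listener.hear(w) for w in "skip this please Siri".split()]
```

Here the last call returns the buffered phrase and the earlier ones return nothing. A real device would also need to know where the sentence started, otherwise unrelated chatter from before the command ends up in the window, and it would have to buffer audio continuously, which has obvious privacy and power costs.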
There are also some artists and tracks with unusual spellings that are always going to cause problems for a voice interface and so they need a solution for that.
Try saying that three times fast. Actually, try saying it just once fast.
I always end up spitting out something like "Okaygggeellggglle"
People want to believe they're talking to a personal assistant, even if that isn't true, and Google is stupidly stripping them of that.
Amazon has it right on this with Alexa.
We experimented with this extensively.
I'd also be interested in the differences between a collectively-optimized wake word (without personal training) and one which the user personally trains against their voice à la the parent article. To what extent does that personal training supplement/supplant (or fail to do so) the "good enough for everyone" wake word kernels?
We considered literally thousands of potential words. We also really didn't want anything like "hey" or "ok" to be required.
Note how long it took for "Computer" to be available - and you can imagine how many of us wanted that. I think it gives an idea of the difficulty.
But thanks to the hubris of corporate branding and marketing departments, we are unlikely to ever get to use this universally understandable command. Instead, to use any light switch, every consumer will need to ask what bloody setup the person has...
The HomePod knows to activate when I say "Hey Siri", but my friend's phone activates instead of the HomePod about 50% of the time when they say "Hey Siri".
"I helped Apple recognize speech" for those struggling.
† With “isle of”, I have two syllables and a velarised (“dark”) L in “isle”, a mid-central vowel (/ə/) in “of”; with “I love”, I have one syllable in “I”, a non-velarised (“light”) L and an open-mid back unrounded vowel (/ʌ/) in “love”.
Without a special mode, you'd likely say "Hey, Siri, play Isle..." "Okay, I found--" "Dammit, Siri, I wasn't done!"
Yep, tried that too... but admittedly they sound very similar. I keep getting "I'll of dogs" and "I love dogs".
I'm excited for the day Siri can differentiate my voice and my wife's so we can use a single HomePod with two iCloud accounts. For example, Siri reading off calendar appointments from an iCloud account based on who's asking.
Should one be able to personalize his/her own phrase per device, such as "Hey Commander Data" for an iPhone 6, "Hey Counselor Troi" for an iPad Pro or "OK Skynet" for the Google phone?
That should also solve the problem where someone says "OK Google" in a TV show or podcast and wakes up all the Google devices.
With the latest "Deep Voice" tech, the device should be able to respond in the actor's voice too.
I remember seeing a clip of a Chinese TV show where the actress said (in Chinese) "Honey, Honey, where are you?" and the phone on the chair woke up and said "Honey, I am here, I am here." Tons of folks in the YouTube comments asked "What kind of phone is that? I want it!" I told my wife about it and she immediately said she wanted it too.
I still don't know which brand of Chinese phone actually has that feature, or if it was just the imagination of the TV show's writer. Someone should build it, and I know my wife wants it.
There are many, many, many samples of people saying the approved trigger phrases. Many, many, many samples. To use your own custom trigger phrase, you would likely need to say it thousands of times to begin to approach the accuracy the current systems ship with out of the box. Maybe tens of thousands.
That's just my conjecture, though.