I just don't get this, or any other "VUI"/voice-centric platform for that matter. The killer feature of the smartphone or watch isn't that it's the most convenient (which it is); it's that whatever you want to do on it is at least somewhat private. I don't want the guy next to me on the train to know I'm messaging Andrew, and he doesn't want to hear me message Andrew either. Asking me to speak these commands out loud removes that privacy. I think this type of "out loud interface" is the wrong direction for personal devices... forcing us to expose our "private selves", or conflate them with our "public selves", is really an area where humans need to draw the line, IMO.
This is why Alexa (and other voice assistants) are only really valuable in the home, and typically as communal devices... their mode is public by default. "What's the weather?", "Play The Beatles", "Add milk to my shopping list" are not expected to be private. How does a device like Humane offer us an "incognito mode", so that everyone within earshot doesn't know exactly what I'm doing?
I agree with you. The killer innovation that will flip this is the ability to interact via hardware that takes sub-vocalisation as input. There's work actively being done in this area:
Because talking is a function of conversation between two or more people; it isn't a one-way carrier of speech. I've been able to dictate to my computer for years but still prefer to type. Dictation is actually a skill one has to learn, as with lawyers and actuaries who don't type their own letters; it doesn't come very naturally to most people.
Because people find the concept of a person talking to themselves in public a bit weird, and talking into a completely unthinking machine is basically that. Maybe perceptions could change once low-latency conversational AI is very widespread, but I think for the medium term, unless there's a second human involved, people will still instinctively see it as talking to yourself, not talking to "someone".
Siri/Google/Alexa generally understand less without repetition than our fellow humans do, especially as complexity increases, which makes them doubly annoying to use in public.
Yes, but we lack good input methods for on the go. Typing on the phone is OK for quick texts but not much more. AR has this issue as well. I've tried things like chording keyboards, etc., but nothing really works, so lacking that, it's going to be voice…