Typing is faster than speaking and less disruptive. I can't really imagine how everybody on a train in Delhi is going to talk to their digital assistants without a noise-proof helmet on their head (or going crazy).
Also, we've found many ways to condense information in written form. I can write "IIRC", but it's hugely uncomfortable to speak this way. You can't use smileys or emojis either when talking. In a way, spoken language is much less feature-rich.
It's like that Obama quote about the difference between government and tech companies: "Government will never run the way Silicon Valley runs because, by definition, democracy is messy. This is a big, diverse country with a lot of interests and a lot of disparate points of view. And part of government’s job, by the way, is dealing with problems that nobody else wants to deal with."
Try arguing this with the NFB...
National Film Bureau?
National Federation of Builders ?
National Facility for Biopharmaceuticals?
It seems ridiculous to suggest that the people creating this voice-controlled UX aren't aware of its inherent limitations; it's more that there is a massive gap between the genuinely beneficial uses that have emerged and the areas we're all still trying to get right.
Personally, I never wish for a fully voice-controlled experience. I could just be old, but voice control is almost never an option for me.
I don't know what the correct solution is, but voice can definitely be a valid option.
Mac OS also has transliteration tools built-in for typing Hindi and most other Indian languages (though, again, not Marathi - you have to use Hindi transliteration and hope you get the right word suggestions). You can use the Caps Lock key to quickly switch between English and Hindi. I don't know if Windows has a similar feature but I wouldn't be surprised if it did - it's even more popular in India than OS X.
Also, typing in Hindi using transliteration is as fast as English for me.
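For the curious, here's a toy version of the idea: map roman syllables to Devanagari with a greedy longest-prefix match. The mapping table below is a tiny made-up subset; real input methods (the macOS one, Google Input Tools) use much larger tables plus statistical word suggestions.

    # toy roman-to-Devanagari transliterator: greedy longest-prefix match
    MAP = {
        "namaste": "नमस्ते",   # whole-word entry just for this demo
        "aa": "आ", "a": "अ", "i": "इ",
        "ka": "क", "kha": "ख", "ga": "ग",
    }
    MAX_KEY = max(len(k) for k in MAP)

    def transliterate(text):
        out, i = [], 0
        while i < len(text):
            # try the longest mapping that matches at position i
            for length in range(min(MAX_KEY, len(text) - i), 0, -1):
                chunk = text[i:i + length]
                if chunk in MAP:
                    out.append(MAP[chunk])
                    i += length
                    break
            else:
                out.append(text[i])   # pass unknown characters through
                i += 1
        return "".join(out)

    print(transliterate("namaste"))   # नमस्ते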
Also, Hindi-language search lacks many features that are taken for granted in English. The situation for other languages is even worse.
Look at the results for the same search in English and Bengali.
That's true. I hadn't thought of it. OTOH transliteration tools are more for desktop/laptop users who use the keyboard that comes with their machine. On mobile (which is where most of these "next billion" users are going to be) it'll be a language-specific keyboard with no knowledge of English letters needed.
Voice will be the next GUI.
If we're talking really sophisticated interfaces like in "Her" I agree. If the interface is working so flawlessly, it's difficult to see why you'd still want to occupy your hands with querying some knowledge base, composing blog posts/letters, etc.
However, I don't think we'll get there with current technology. When I think of the often awkward and cumbersome interactions I have with Siri, I find it really hard to imagine how this will evolve to a 'I don't need to worry about this at all anymore'-level in the next 5, 10, maybe even 20 years.
I suspect the giants of today won't be around anymore when truly voice-controlled interfaces come around.
Bandwidth. You have several degrees of freedom with each hand, with each finger. You have one linear stream with voice.
Latency. You can flick a switch in a few milliseconds, but saying "turn off the lights" or "lights off" takes half a second or more.
Privacy. You can overhear a voice command. You can't overhear (very easily) a button press or touchscreen swipe.
Accuracy. Even with perfect voice transcription, people misspeak more readily than they mistype. And mistyping can be corrected within a few letters, while misspeaking requires interrupting the stream to switch the voice UI into an editing mode or something.
Same for turning off the lights: sure, it's faster if you only consider the flicking of the switch. It's another thing if you also account for the time it takes to get up from your couch/bed/wherever you are without a light switch within arm's reach.
You definitely have a point with privacy, though. Also, the noise of everyone on a train talking to their smart assistants would be a problem (though the question remains whether this is really so different from people talking to other people on a train).
EDIT: I misread your point about mistyping vs. misspeaking. Still, the interface I'm talking about does not work in modes. It's able to truly understand you and interpret your commands appropriately.
It still really bugs me how bad UI systems are at error correction.
We have a series of conventions to do this efficiently in spoken English: inflections, quick utterances that call out specific ambiguous syllables, context (the hard one, sure).
Even on a smartphone, if the system guessed a certain word when I swiped it, then I delete the word and enter it again, maybe stop guessing the same word every time?
I think you're right that typing will be more effective in general, but I still see so many areas where the gap could be closed a little more.
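For instance, here's a dumb sketch of closing one of those gaps: demote a swipe guess the user has explicitly rejected (deleted and retyped). Purely hypothetical, not how Gboard or any real keyboard engine actually works.

    from collections import defaultdict

    # (swipe_pattern, word) -> how many times the user deleted this guess
    rejections = defaultdict(int)

    def on_user_rejects(swipe_pattern, word):
        # call when the user deletes the guess and types something else
        rejections[(swipe_pattern, word)] += 1

    def rank(candidates, swipe_pattern):
        # candidates: (word, score) pairs from the gesture decoder;
        # previously rejected guesses get pushed down the list
        def adjusted(item):
            word, score = item
            return score - 2.0 * rejections[(swipe_pattern, word)]
        return sorted(candidates, key=adjusted, reverse=True)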
Correcting typing on a phone is very hard.
Voice assistants as solutions for people with no hands or eyes, or people who are at some distance from a keyboard and/or monitor - fine. Otherwise, they're CLIs without persistent displays (making even simple multiple choice branching far more difficult: see automated telephone helplines; try to remember what the first option was.)
They seem to be good for setting alarms and sending and reading emails (if you receive very few, very simple emails.) That's something. Otherwise their major use is as assistants for catalog shopping, which is why all of these companies want to own them.
Simple voice interfaces suffer the same problems as command line interfaces while being less flexible and slower to use than even GUIs.
The best voice interfaces have made good progress on most fronts, but discoverability is still a big problem. Instead of reading a list of buttons, you usually have to guess what features might be implemented. Or you go the route of phone-system menus, but everyone hates those.
AI will make computers able to understand people perfectly.
Gesturing, talking, writing, any input. The semantic gap will eventually be automatically traversed by understood intent.
From the article:
"To my mind, translation is an incredibly subtle art that draws constantly on one’s many years of experience in life, and on one’s creative imagination."
This is true, I just think that we'll arrive at the tools to accomplish this in the future.
Specifically, I think (hope) it will be through clever application of GANs and reinforcement learning, after a few more applications of Moore's law.
Advanced AI would be able to learn about us through replaying years of possible generated experiences.
Eventually, “our” devices will be able to read lips and recognize other subvocal gestures.
Another thing I'd add to that is "what about languages that don't have a good keyboard input story?"
I find the qualities of someone’s voice, intonation and hesitation much more feature rich than emojis, in terms of transmitting information.
Personally, I think it's even more uncomfortable to hear people speaking that way. But that's just me!
You could do ten searches a day without spending much time at all on the phone. That's probably just as much value as other people get spending all their time on games or social networks.
If you're a company like Google, well positioned to last another couple of decades, you'd be foolhardy not to ride this massive wave.
Right but aggregate GDP is not as important to these companies as GDP per capita.
> If you're a company like Google, well positioned to last another couple of decades, you'd be foolhardy not to ride this massive wave.
Actually I'm pretty sure the big tech companies can make most of their revenue from high GDP-per-capita users. Not saying this is the moral choice, but financially I'm not sure that these less wealthy users will move the needle appreciably for the giants.
For example, the offline maps feature Google built for India is now used all over the world and has turned out to be very useful.
Except the next billion is already using Weixin.
They are so concerned because they see the next 1B as the only remaining reserve of users that could feed a potential challenger.
> for example, asking “Do I need an umbrella today in Delhi?” rather than typing “Delhi weather forecast.”
I routinely ask Google in the USA, "will it snow tonight?" and it is nearly always wrong. I don't mean that the data it accesses is wrong, or that the prediction was wrong. That stuff is usually right. I mean that Google will read the weather forecast's 45% chance of snow and then confidently say, "No, it will not snow tonight" as there is a near-blizzard taking place outside my window (because the snow was not "scheduled" to start for another 20 minutes, at a 45% chance, despite the fact that the snow is plowing down outside). Or if there is a 100% chance of snow for 15 hours in a row, and it starts at 1am, but you ask at 11:55pm if it will snow tonight, it will say "No it will not snow tonight."
Or if it is going to snow between 1am and 5am and you ask at 11:55pm the night before if it will snow tomorrow, it will say "No". You have to ask, "Okay Google, will it snow early in the morning tomorrow?" if you want to know if it will snow overnight.
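My guess at the kind of naive logic that produces this (completely hypothetical, obviously not Google's real code): a hard probability cutoff plus a "tonight" window that stops at midnight.

    from datetime import datetime

    def will_it_snow_tonight(hourly_forecast, now):
        # hourly_forecast: list of (hour_start, snow_probability) pairs
        end_of_tonight = now.replace(hour=23, minute=59)   # "tonight" cut off at midnight
        window = [p for (t, p) in hourly_forecast if now <= t <= end_of_tonight]
        # hard cutoff: a 45% chance becomes a confident "No, it will not snow tonight",
        # and snow starting at 1am is invisible when you ask at 11:55pm
        return any(p >= 0.5 for p in window)

    # 45% chance every hour from 6pm to 11pm, asked at 8pm during the near-blizzard:
    forecast = [(datetime(2018, 1, 15, h), 0.45) for h in range(18, 24)]
    print(will_it_snow_tonight(forecast, datetime(2018, 1, 15, 20, 0)))  # False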
Google's example query for the Next Billion needs work. "Okay Google, Weather forecast" and then reading the results like on the web produces a 100x faster experience sometimes with much much much better results.
(Aside: has it snowed even once at Google HQ since it was founded?)
Edit: And don't get me started on temperature. On the Home Mini, "Okay Google, what temperature is it?" .... "35." 35 what?! Yes, I know you have a settings/units preference somewhere that I cannot see or access right now (that's why I'm asking the magic box for the answer!). But 35 is not a temperature, unless you are giving it to me in Kelvin and choosing not to say the unit, which is WTF too.
If you say "Set alarm for 7", it will do 7pm, even if you use 24-hour time. And I know no one says "19 o'clock" when speaking, but it is interesting.
"Set alarm for 7 tomorrow" is 7am.
"Set alarm for 7"
"Do you mean 7 in the morning or 7 in the afternoon?"
"Setting alarm for 7 AM."
Everything on the internet is also becoming like everything on TV/print: it's just the same drivel. Where once there was a sense of community, there now resides a vast echo chamber.
the fact that typing is difficult for people who never grew up with a computer keyboard
Point being, the AI doesn't know why I'm asking. It could make some reasonable guesses, but currently it doesn't even do that.
I could deal with that by learning the whole landscape of what it guesses about my question: when it's smart and anticipates my needs and when it doesn't, when I can and can't rely on it to do the right thing magically. Or I could just learn how to ask the question the way it needs me to, and that is a lot less to learn and worry about.
What rides on top of the internet is what will change with the next billion users but the infrastructure will stay more or less the same. Also, just because the developing world works differently doesn't mean I'll throw away my keyboard.
Mobile is not a replacement for that, yet.
But you need to learn it to program the internet. Get your muscle memory familiar with the hundred+ year old QWERTY layout.
> The next billion users are not becoming more like us. We are becoming more like them.
If we are to take his words on face value, then I would imagine so ...
Rather than understand and control our technology, we are forfeiting privacy and determinism to global megacorps larger than half the nations on Earth while they figure out what you should want rather than doing what you do want.
Let's just deep-dive into how horribly dystopic the "dream" of the Brazilian cellphone user in 2022 is:
Proprietary mobile handset manufactured in Chinese sweatshops, with pre-installed state backdoors that the user has no idea about because they don't even know what a computer is. But the state agents who can turn their microphones and cameras on on demand to spy on them, and the ISPs recording and mining all their communications, are fully aware of what they are doing.
Said device will have a solid plastic body with no removable battery, such that replacing the battery requires a soldering iron and a semester of community-college engineering. If any individual part breaks (camera, gyro, GPS, WiFi, the screen, etc.), you live with the broken hardware, because repairs are so catastrophically expensive or near impossible, and parts are deliberately made unavailable, that the device is designed to be disposable: to rot in a landfill where it can leak toxic chemicals from its manufacturing into the soil and air.
Locked bootloader to prevent anyone from running the software they want to on hardware they own. No documentation on how that bootloader works, no ability to payload alternative software from ROM. The SoC is proprietary, the baseband is a trade secret, nobody can operate their own cellular network because all the IP is top down policed by the state and privileged ISPs.
Kinda-open base OS. I'm still to this day not sure how Google messed up making Android a proprietary hellscape in the early days, but it's unlikely Android 12 will have a new kernel or bionic / other base userspace. However...
Wholly proprietary app stack. Google apps are all proprietary, Google Play Services controls about 80% of the functionality of the device, and from power-on to shutdown everything is logged and sent to Google: GPS positioning, website browsing, microphone recordings, photos and videos taken, and all touch events / key presses.
Social media / ad farming shoved down the throat from day 1. Preinstalled Facebook, Instagram, maybe Twitter and / or Snapchat on the homescreen. No instruction manual on the technical details of how the device works. No means to even know you can or how to program the device yourself. No IDE, no compiler, no shell. No ability to install apps outside the Google Play store without knowing to navigate settings to toggle off the app lock.
And how would you even begin to start searching for information on the device you have? Oh right, Google. Who modifies your search results based on what they think you need, rather than what you know you want.
It's entirely meant to prey on the poor and ignorant. They aren't worth much to global corps, but advertising is damn good at operating on fractions of a cent, and they are already doing a good job draining the creative and emotional health of the first world into addiction-rattled social hell.
And really, nothing summarizes more how horrible this is than in how they want you to use Google Assistant as your input - yes, Ma Google is here to record everything you say and interpret the best thing to give you based on your words since you cannot directly interact with the device because you are illiterate. You have to use our proprietary remote listening service to use your computer.
RISC-V, save us.
We, ourselves are "early to the game" of the next wave, and "late to the game" of previous waves (e.g. personal computer wars, mainframe era, industrial revolution, etc).
See "Facebook and the New Colonialism", https://www.theatlantic.com/technology/archive/2016/02/faceb...