In a way, I think it's even worse than that. I've used Alexa since the Echo was a relatively new product. Back then, I experimented with new phrases and commands often, but was frequently greeted with wrong answers or "Sorry, I don't know how to help with that." Over time, I stopped trying those commands. Skip forward to today--the backend has been improving for years, and many of those commands now work, but it's too late. Their users have already been taught that they don't, so folks stop trying to use those features. Not only can you not discover new commands easily, you might mentally blacklist useful commands permanently.
Perhaps more frustratingly, the "What's New" emails they send out don't help with this. They never say "Hey, we know you tried to ask your Echo to report its volume level before and it didn't work, but it does now." They always say "Ask Alexa to tell you an Arbor Day joke!" -_-
TBH, I don't understand why they couldn't add a circuit that checks if the microphone is being activated by the device's own speaker, and ignore it. Some sort of simple phase inverting circuit from the speaker that gets summed with the microphone input ought to do it. It's not exactly rocket science and they're doing complex phased-array/beam-forming stuff with the microphones already, so it's not like the engineers are idiots?
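For what it's worth, a straight phase-inverted sum would only cancel the direct speaker signal, not the room reflections of it, which is presumably why real devices use adaptive echo cancellation instead: a filter that learns the speaker-to-mic path and subtracts the predicted echo. A minimal NLMS sketch of the idea (all parameters illustrative, not what any shipping device does):

```python
import numpy as np

def lms_echo_cancel(mic, speaker, taps=64, mu=0.1):
    """Adaptive (NLMS) echo canceller sketch: learn the speaker->mic
    path as an FIR filter and subtract the estimated echo, leaving
    the near-end (user) speech as the residual."""
    w = np.zeros(taps)                        # estimated echo path
    out = np.zeros_like(mic)
    for n in range(taps, len(mic)):
        x = speaker[n - taps:n][::-1]         # recent speaker samples
        echo_est = w @ x                      # predicted echo at the mic
        e = mic[n] - echo_est                 # residual = near-end speech
        w += mu * e * x / (x @ x + 1e-8)      # normalized LMS update
        out[n] = e
    return out
```

The point is that the filter adapts to whatever the room does to the speaker signal, which a fixed inverter can't.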
It's a very impressive infrastructure, and works pretty well too.
"Hey, 3 months ago you asked for the forecast according to the ECMWF weather model, and I couldn't answer you. But now I can, so go ahead and try!".
> "Hey, 3 months ago you asked for the forecast according to the ECMWF weather model, and I couldn't answer you. But now I can, so go ahead and try!".
What might work more effectively is
"We've been improving our weather forecasting and you can now get the forecast in your area according to a range of new models, including the often requested ECMWF weather model! Give it a try now!"
Thinking back to this classic where Target figures out the trick to not creeping out someone they suspect may be pregnant is to shuffle the maternity coupons in with a bunch of other random stuff: https://www.forbes.com/sites/kashmirhill/2012/02/16/how-targ...
What Target discovered fairly quickly is that it creeped people out that the company knew about their pregnancies in advance.
“If we send someone a catalog and say, ‘Congratulations on your first child!’ and they’ve never told us they’re pregnant, that’s going to make some people uncomfortable,” [snip]
So Target got sneakier about sending the coupons. The company can create personalized booklets; [snip]
“Then we started mixing in all these ads for things we knew pregnant women would never buy, so the baby ads looked random. We’d put an ad for a lawn mower next to diapers. We’d put a coupon for wineglasses next to infant clothes. That way, it looked like all the products were chosen by chance. And we found out that as long as a pregnant woman thinks she hasn’t been spied on, she’ll use the coupons. She just assumes that everyone else on her block got the same mailer for diapers and cribs. As long as we don’t spook her, it works.”
Sounds much closer to hunting; literally "targeted" marketing.
I mean, you gotta listen to every word that comes out of the customer's mouth right?
Their Principles are too reductive to even guarantee that copies of other people's ideas are successful. For as much of a genius as he may be, Jeff Bezos has created an organization of automatons.
[obligatory disclaimer: I worked there years ago, but no longer do.]
I'd argue this is the wrong thing to solve for. The best skills take a long time to make and require privileged access. Sonos, Spotify, and others have this, and they work amazingly well. It's the large mass of Alexa skills that were made quickly and that don't have privileged access that are dragging the whole experience down.
And this is entirely Amazon's doing.
It's humbling (and annoying) when you think of yourself as a power user of some application to find out from some semi-new user of it that there's a new feature that would have made your life much easier if you had only known it was added a year or two ago.
There's only so much time to learn about the tools you use, much less the changes in them over time. How do I find out if GNU grep has some interesting new feature in the version that ships with the next version of the distro I use? Or rsync? Or tmux/screen?
This problem definitely seems worse with smaller tools, since there are a lot of them and they are often packaged together by some other party. Not every project is as large, or used by as large a population, as Firefox, which arguably does a very good job of advertising new features. But you can definitely tell it takes them a lot of time and effort to do so, and a smaller project may not find much traction if it tried the same strategy.
This makes me shudder. Such a functionality would mean that they stored your failed request in some sort of database, a database that they later used to send you a personalized marketing email. A machine cataloging our voice for later inspection is a very dark future. Alexa should delete and scrub any iota of voice that it doesn't instantly understand.
"Oh, remember last week when I thought you were asking me to buy you a pot plant? I now realize you were asking me to buy you some pot from Canada. It will be arriving in two days."
"I didn't understand it at the time, but I now realize that you were yelling at your husband Alex, not Alexa. Your social credit score has been adjusted to reflect this negative interaction."
I don't think this kind of thing would need (or even benefit from) actual voice recordings when it comes to the "new functionality" part of things. I would expect something more like a running tally of the X most common failed interactions after running the transcript through a fuzzy-match filter, and a long list of accounts associated with each.
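A rough sketch of that tally, assuming transcripts of failed requests already exist as text (difflib's similarity ratio stands in here for whatever fuzzy matcher a real pipeline would use):

```python
import difflib

def tally_failures(transcripts, threshold=0.8):
    """Bucket near-duplicate failed utterances and count each bucket.
    Each transcript joins the first bucket whose representative it
    resembles closely enough; otherwise it starts a new bucket."""
    buckets = {}
    for t in transcripts:
        t = t.lower().strip()
        for rep in buckets:
            if difflib.SequenceMatcher(None, rep, t).ratio() >= threshold:
                buckets[rep] += 1
                break
        else:
            buckets[t] = 1
    # most common failure first
    return sorted(buckets.items(), key=lambda kv: -kv[1])
```

The top buckets are exactly the "X most common failed interactions" a product team could later match against newly shipped features, without keeping any audio around.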
This alone would make for a much more usable experience, and yet no voice assistant has implemented it yet, which flabbergasts me. I don’t have much knowledge in this particular realm so maybe I’m missing some big blocker that prevents it from being possible, but to me it seems like such an obvious thing to do.
This is a really important point, and both a strength and a weakness.
The iPhone began as essentially a front end to existing services (with visual discovery, which a voice interface inherently lacks). I used to name mine "FEP" (as in Front End Processor) -- a front end to a subset of "real" computing or as I think of it these days: multiple windows into a shared computing space.
A watch (like the apple one) is really a crappy general UI device; discoverability is pretty bad because of the limited area and speed. But it's great in the role the phone had: subsetted interface to a limited number of "real" computing tasks (yes, it has a few of its own tricks too but mainly as data collection for apps on your phone).
Thompson captured this issue by talking about devices and software in terms of "the task it was hired to do". The problem is the voice assistants haven't figured that out yet.
The thing about this approach is it creates pressure to make more functionality available at the edge.
Alexa and google home tried to jump right to the edge in one go, which skips too much phylogeny.
Apple seems to understand this, but gets it wrong in the opposite direction: the iPad hasn't moved far beyond being "most of an iPhone but with a larger screen". And if you have an apple speaker, a phone, and iPad and call out "hey Siri, set an alarm for 10 minutes" you may get three devices chiming in 10 minutes. They don't act like a single device.
Edit: there's also a third-party skill that will let you play old-school adventure/interactive fiction games. https://www.amazon.com/Vitaly-Lishchenko-Interactive-Fiction...
1. An alarm clock / kitchen timer
2. A thing that tells me the weather report during morning coffee, so I know what to wear
3. A DJ that my kids yell at to play pop music
If it died tomorrow, I'd probably just go back to using my phone for these 3 things rather than buy a new one. I can't imagine getting into it enough to explore third-party "skills".
1. Metric / imperial conversions, especially in the kitchen - this is one feature that is miles better than using a phone or computer if my hands are dirty cooking something and I want to know how many grams 12 ounces is or something like that.
2. Intercom between my 2 google homes, one in my kitchen and one in our converted attic playroom - it's so much nicer to use Google home to call my kids down for dinner vs. screaming up two flights of stairs.
3. Making quick phone calls
4. Finding my phone - I lose my phone constantly, and it's super handy that I can get my google home to make it ring even if it's on silent.
* Setting multiple alarms (why can't apple do this??)
* Food questions
I think it would be rad if I could feed it a recipe and then have it read me the ingredients and instructions for each step. Maybe that exists?
It launched without that ability but an update sometime last year lets you set multiple alarms.
3. "Hey Google, call my wife". or mom or person's name.
4. "Hey Google, find my phone".
4. Shopping List.
And that's a killer feature for me; it's so damn convenient to use a voice interface to add items to the shopping list as you run out of them that I'd probably replace it just for that.
As a plus, he babbles at the Echo when he wants to listen to music.
I use it for:
- I have to leave in an hour but only need 30 minutes to get ready, so: a timer for 30 minutes
- Pizza / food
- short nap
- for learning
And quite often.
Find the phone, unlock the phone, find and open the app, find and click the control. That requires some effort.
Saying commands, having physical buttons at the expected location in your home, NFC tags.. there's a bunch of more convenient ways to do extremely repetitive tasks that would otherwise take you more time to do.
Some people don't consistently have their phones on hand when they're at home, which would do it. And some people have problems with unwanted voice activation - my experience was that the voice-printing on "OK Google" was not actually all that personalized, and sometimes it did totally unexpected things like breaking out of navigation to make a phone call. Last time I tested it, keeping Google voice recognition on was also a massive battery drain, but I assume that's improved.
(That phone call was particularly ludicrous: Android offered me an unprompted 'helpful tip' that I could say things like "call mom", but since the tip fired when it was already listening for a navigation command, it accepted its own instruction.)
We also use Audible heavily. The kids love the Boxcar Children readings.
The first thing I wanted to do was to add a feature so I could add a task to my to-do list software which is not supported by Alexa. It turns out that you cannot construct a sentence along the lines of "Tell Asana to add a task: <task>". You can't actually have a 'slot' which contains a freeform piece of text, even if it is the last piece of text in the sentence.
The Alexa API differs between regions, so the North America version of the API supports this but the EU version doesn't. It was removed from the NA version for a while but placed back after a bit of an uproar.
I think you could develop some more useful stuff using Alexa if only this feature was consistently available. I cannot think of a good reason why it isn't.
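For reference, the missing piece is roughly what the AMAZON.SearchQuery slot type provides in regions where it's available: a slot that swallows freeform trailing text. A sketch of what the interaction model might look like (intent, slot, and invocation names are made up here; as I understand it, sample utterances using SearchQuery must also include a carrier phrase and can't mix in other slots):

```json
{
  "interactionModel": {
    "languageModel": {
      "invocationName": "my task list",
      "intents": [
        {
          "name": "AddTaskIntent",
          "slots": [
            { "name": "taskText", "type": "AMAZON.SearchQuery" }
          ],
          "samples": [
            "add a task {taskText}",
            "create a task {taskText}"
          ]
        }
      ]
    }
  }
}
```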
I now just mainly have it as a Spotify speaker, and occasionally I use it as an expensive egg timer. I normally have it on mute because I find it activates and starts recording private conversations. I don't see it getting better.
The most useful 'basic' behavior I could think of for a smart home device was to check the weather and trigger my alarm earlier in bad weather or heavy traffic. I knew IFTTT was capable of running scripts when a trigger happened, checking the weather and traffic, and setting off an alarm, so it seemed obvious. I literally wanted "if this (or this), then that"!
No such luck. The basic IFTTT setup couldn't do it at all, no existing app could do it, and the developer program was invite-only. I got in after quite a long time, and even then it wasn't obvious. IFTTT wanted to treat 'check weather' as a script output exclusively, which I couldn't feed into any other system. The best I could do was be told the weather when I woke up. So, I stuck with the phone that could already do that.
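The rule being described really is a single conditional; something like this, with the weather and traffic checks stubbed out as hypothetical booleans that a real setup would fetch from some API:

```python
from datetime import date, datetime, time, timedelta

def alarm_time(base=time(7, 0), bad_weather=False, heavy_traffic=False,
               early_by=timedelta(minutes=30)):
    """'If this (or this), then that': move the alarm earlier when
    either condition holds. The condition inputs are placeholders for
    whatever weather/traffic source the platform exposes."""
    t = datetime.combine(date.today(), base)
    if bad_weather or heavy_traffic:
        t -= early_by
    return t.time()
```

That the ecosystem made this hard to express is the whole complaint.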
That specific situation might have improved, but the general ecosystem doesn't seem much better. The only smart-home features I see available that I would use are list-making, music/media playing, and quick reference. But the features I would actually value are dense integration between apps, floating scripts like the one you describe, and non-user-triggered events to turn active tasks into passive ones. Those seem to be the features which are least available, even when they would be easy to implement.
What about AMAZON.SearchQuery?
However I'm just thinking of getting rid of my Alexa devices anyway - I was discussing political events with my partner last night, spotted the Alexa device light up (I think I said "a letter") and noticed shortly after that I was moderating my speech and not referencing controversial topics. I later just set it on mute.
I'm very anti these devices, and now I'm even more so!
Is that a misfire? It reacted to the trigger phrase. Have you trained it to only react to your voice? (I know that's an option for Google Assistant, but I didn't do it for mine.)
You can at least change Alexa's wake word to a couple other options ("Echo" or "Computer").
It's the same for Siri, the API is so restrictive.
The whole "tell XX to YY" isn't convenient.
One workaround on Google Home is to use IFTTT, and in that case you can customize the whole phrase and response, and then it's pretty nice, even though the latency is high.
"-Okay Google, open the blinds -Sure thing Commander" never gets old.
You don't have to do that - the system can usually infer what skill you want from your utterance. And the "open the blinds" thing can be done on Alexa with a custom routine (Alexa-specific IFTTT).
Disclaimer - interned on the team that built this
I don't see a great future for voice assistants anytime soon, not until we truly solve the problem of intent (in a natural way) and can respond accordingly.
I can't imagine Amazon would hand the "Alexa, order more bread" keyword over to Instacart/Wal-Mart/whoever without a fight.
The whole skills ecosystem feels like an awkward stopgap on the path to AI. The language required to invoke them feels particularly clunky - "Alexa, ask ThingFinder about a thing" - and then, is the user supposed to be talking to ThingFinder now, or Alexa? She still sounds like Alexa, but she doesn't seem quite herself.
As a developer, choosing an invocation name is fraught with difficulty. There are only so many natural-sounding names for something which does a particular thing, without incongruously inserting some invented branding word into it - "Alexa, open Tidy Tide Tables". If someone's already using the most natural name, you are free to use exactly the same one, but then who knows whose skill will be launched? And you'd better make sure your skill's name doesn't clash with anything else in the world at large, like the entire history of music for example: "Alexa, play Wicked Game". It's all a bit of a mess.
Kinda reminds me of Neuromancer, where a superintelligent AI was broken into two pieces to avoid detection. One part was good at personality, and the other part had to mimic people in order to communicate. It was very unnerving for people to talk to an AI that was copying the personality of someone they knew.
It's like saying 'There are 80k mouse enabled apps and no runaway hit'.
What it can do is have an app for nearly anything, though, that makes sense for the form factor. It's on its way. The next step of its evolution is to look at what apps work and what conventions can be pulled from those as a general standard. Users will be much happier when they can nearly instantly download an Alexa enabled interface for an app and have it work intuitively. And this is doubly important for Alexa because there isn't deep feedback like you have with a mouse where you can see the things you aren't doing to gather hints at what's possible -- you just have to know or, at least know how to find the answer, like a command line.
Which gives me an idea 'man for Alexa'. At least we can standardize a help menu.
That being said, it took 22 years to get from Engelbart's original mouse to the first version of Windows that really took off (3.0), so perhaps we're just too early on in this product cycle for the hit to have emerged yet.
Now that companies realize there's a product here, there will be an arms race. Look for big improvements in the next decade. Lots of people and billions of dollars are about to go into making these products better.
There were advancements in text-to-speech and speech recognition stretching back to the early days of the PC/Mac, e.g. DragonDictate using Hidden Markov Models, IBM ViaVoice, and latterly Nuance Dragon NaturallySpeaking. They were not perfect in their early iterations, but they got progressively better and were fairly impressive once trained properly.
I would rather draw comparisons with their current day counterparts like Amazon Polly or Lyrebird et al., than associate voice assistants with a paradigm shift.
Like taking the Internet away?
At any rate, being connected to the cloud will allow for faster iteration. Once it becomes a solved problem, then you can more easily remove the network.
Google, Apple, Amazon can update their voice devices continuously without a local software release. If the next generation voice algorithms needs twice as much hardware, your $30 device will still work because the processing is in the cloud.
Video games, for example, have been trying to move to the cloud. Put all the code in the cloud and just send the pixels.
"The cloud" is little more than a hyped-up, glorified business objective; human beings shouldn't have to be connected to the hive mind to enjoy the full benefits of technology. Nothing smacks of SV-style elitism more than the proliferation of "the cloud".
And you can pry my locally-installed videogames from my cold-dead hands. I hope every single streaming startup in that sector fails spectacularly.
Hopefully, with 100 million Amazon devices, and growing fast, you’ll have your product within a decade or two.
And it's 100 million amazon devices because Amazon pushes them relentlessly and has been for some time. Their utility is questionable, even the article mentions that.
Did you know over 3 billion devices run Java?
Trying and failing for a decade or more, because people like low latency and being able to customise things.
I like the way you say that. My Google Home is a remote control for my mouth. It's like licking a keyboard one key at a time in the dark. Simple queries are easy, but any non-trivial query just ain't gonna happen at the moment.
Ewww. Good analogy, but eww.
I do agree with your latter sentiment. It does feel like a command line at times. More so like trying to figure out a text adventure game. Zork would be very difficult if you didn't know the basic functions/words. That's what Alexa feels like most of the time for me. I'd love a help menu.
Only if you conveniently ignore the AWS services it uses.
I don't know how much I'd use Alexa if it weren't for Spotify. There are some native apps but its real value is in the 3rd party apps it connects to.
Plus you would still have the entire Amazon app ecosystem. While their apps may not seem perfect, they do have a free music section that could at least partially replace Spotify.
Other first-party apps on Alexa let you access lots of other Amazon services, like purchasing items from their website. That's a lot of functionality even if we are excluding third-party applications.
I mean, look at VR and AR and, hell, even AI.
But AI is not useless even though it hasn't reached its generalized-intelligence promise. It is adding tremendous value even though it has landed in the limited middle.
I don't mind learning Alexa's syntax and what it expects of me. I get value from what it can do well. As long as that's true, I think it can miss its more grandiose promises and still be a huge success.
I've been working on an audio app for iOS/Android that reads any article to you, and I planned to bring it to the Echo, but he seems to suggest my efforts are totally wasted. Although their market reach seems massive, likely related to their cheap price, I'm not surprised by this article's claim that their actual usage is incredibly low. Most people seem to buy them and then forget about them.
As a shameless self plug, if you would like to check out my app that reads articles to you using beautiful sounding AI/ML, find it here:
Wife uses the daily news rundown.
Kids ask Alexa questions (what’s the fastest bird), have it make fart sounds, and that’s about it.
I got a pair of buzzers to play the quiz game; it was so horribly janky they've been used twice. I scan the apps list and nothing strikes me as worth even trying.
Whenever my wife and I go away we usually remark to each other that it suddenly feels very backwards to have to do things without the Echo. That might not be the best word, but it definitely feels like we're missing something integral to our home life when we don't have them around.
That being said, it's expensive to set up Hue lights everywhere and an Echo in every room. My setup might be one of the reasons I use it so much.
Those are by far my killer features why i use it and why i like it.
I haven't looked at any amazon skill store as i don't see any reason for it.
The inability of any of these solutions to identify the user in a meaningful way or interact usefully without involving the whole room kneecaps the utility of the products beyond actions that you take in a public space.
Turning on/off lights, wireless speaker and replacing the landline are long term probably the killer apps. Not trivial, but no smartphone either.
EDIT: Is there a way to buy the app, or do I really have to rent it for $60 a year?
I would like it to work as a room of human experts works with a moderator.
Alexa, how long will it take me to cycle to Netto?
Alexa sends this parsed query out to all apps it thinks can answer.
They respond with a confidence level and answer / follow up question.
Alexa decides which app to choose (the 'PageRank') based on a number of factors like has the query been seen before and how did each app perform for it, the app's answer success rate, has the user been given a response from this app before etc. etc.
The user should not have to install apps. Alexa should know all of the apps in the room and what they're good/bad at and select one to respond.
Surely discovery is the wrong way to think about this problem and Alexa needs to be elevated from a good speech parser.
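The arbitration loop described above might look something like this (every interface here is invented for illustration; nothing like it exists in the real Alexa API):

```python
class EchoApp:
    """Toy app interface: bids on a query with (confidence, answer)."""
    def __init__(self, name, confidence, answer):
        self.name, self.confidence, self.answer = name, confidence, answer

    def handle(self, query):
        return self.confidence, self.answer

def route(query, apps, success_rate):
    """Broadcast the query to all apps, weight each bid by the app's
    historical success rate (the 'PageRank' part), and return the
    winning (app name, answer) pair."""
    best, best_score = None, 0.0
    for app in apps:
        conf, answer = app.handle(query)
        score = conf * success_rate.get(app.name, 0.5)
        if score > best_score:
            best, best_score = (app.name, answer), score
    return best
```

In a real system the success-rate prior would itself be updated from whether users accepted each answer, which is what lets the router learn without anyone installing anything.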
But what if it /is/ just a toaster? Tech people tend to have high hopes for technology, but technology must serve a purpose, and it will go the way of the dodo if the purpose is not strong or clear. For me, having a voice interface for music and calls is enough justification for such a device, and I assume this is true for others as well. It's a classic case of over-engineering and then shopping around for a use case for the 25-use Swiss army knife that no one asked for.
I use it for, in this order
* Kitchen timers while cooking
* Setting simple reminders without having to go into a phone app or some crap (alexa, remind me to do X at 10am)
* Kids asking random questions (alexa, how many moons does jupiter have)
Other random stuff like weather maybe, when a sports game is on...
But this is all just pretty basic stuff you would expect from a voice interface... to me, it has all the "killer apps" built in with the above.
This was "the future" a few years ago, and we are living in it now :)
Your phone needs to be in range of the 'dumb' speaker, needs to be paired to it / connected to the same wireless network, can't be used for any other audio functions, and probably can't be in your pocket if you want it to be listening for voice commands. It now also can't be trained to your voice specifically if you have others in your home who want to use it, which makes it ripe for abuse whenever you leave the home.
I wouldn't assume anything like that.
A big part of the problem for a long time with developing Alexa skills is that if you needed to know where the user was, having them enable location permissions was insane. It required the user to have the Alexa app installed (which almost no one does), then navigate 3-4 fairly technical looking screens and finally flip a switch.
I recently wrote a demo app for a friend's company that needed to know where the user was, and I jumped through a bunch of hoops for an otherwise simple task to avoid the app-permissions-dance. The app I wrote was demoed to the Alexa team by my friend (his company was doing a deal for the company's data to be supplied to Amazon), and the feedback he got from the Alexa execs was they know the location permissions process was a big issue.
I think it's gotten better recently with the redesign of the Alexa app, but I haven't checked it out yet - I don't have any desire to release a public Alexa app. I just keep my stuff in developer mode and customized to my needs.
"Alexa set a timer for 15 minutes." (She's a cooking timer)
"Alexa what time is it" (She's a clock)
"Alexa switch the tv off" (She can't switch it on)
"Alexa, meow meow/Pika/fart" (Novelty nonsense)
"Alexa, turn the heating up in the living room" (Nest integration)
Adding things to a shopping list is the other one that works nicely.
That music integration with sonos sounds really nice.
Nothing groundbreaking though.
A device with multiple microphones is more useful from the couch, especially when I need to yell into the kitchen. I don’t want to feel the need to keep my phone within earshot at all times.
I wish Apple had such a $30 device for Siri but it’s unlikely.
edit Also, I completely agree. I wish there was a HomePod Dot type offering. I've been sprinkling HomePods throughout the home, but it's expensive and also obtrusive in places where you don't want something that bulky.
Personally, I think that the challenges that disabled people face are great for all people. Thinking through all the permutations that real people with disabilities face opens up tech to new ways of interaction and design. The canonical example is the little ramp on the street corner, where the sidewalk and street interface. It was designed for wheelchairs originally, but the elderly, bicyclists, delivery men with trolleys, and all manner of people use them as an improvement in their lives.
If Amazon focused on the disabled and their use cases for the Echo-dot, I think that they would find many other applications that enhance all of our lives. It may not be an 'essential' thing for the fully-abled, at least not consciously, but the enhancements are enjoyed by all and well worth the costs. Perhaps tech that focuses on the hearing impaired (the number one disability in the world), or the voice impaired, or amputees, may help all of us lead richer and fuller lives.
There's a sentiment that everyone should use them, because it normalizes it. Even if it's not verboten for non-disabled people to use the door button, there seems to be a feeling that "I'm breaking a rule" when I use it, and that is changing, I think.
This is small, but I think it defines the missing design principle of Alexa. Yes, I want the music to be low when I'm doing activity 1, but when a new action like hearing the weather is introduced, I need to be able to hear it. These are the little "smart" features that are needed for the kind of product Amazon is striving for.
Edit: I actually say Echo because I think we should stop personifying technology.
Here is a full rubric for whether or not a product is well designed: https://uxdesign.cc/the-design-critique-rubric-how-to-determ...
The hardest part might be that most apps for Alexa aren't privileged. You can't just ask Alexa to do the new skill. You have to first open the skill and then ask that skill to do something. "Alexa, open the Food Planner app." This hitch in the process makes memorability very poor.
I am an advanced Alexa user, and I almost never use any skills. They just aren't integrated very well. The one I do use is Sonos (and Spotify to power it), which is given privileged access and allows me to attach an Echo Dot and Sonos speakers to the same room in my house, so that when I say, "Alexa, play The Beatles," Alexa plays the Beatles in the room it hears me in, on my Sonos speakers, with music from Spotify.
When you use a combo like this, it feels pretty magical, but most of Alexa can't operate like this, so it feels really stilted, and the user experience is quite poor.
* Asking for the weather
* Play some (desired) music
Are there other things that these devices ("smart speakers") are really useful for?
A voice-activated browser or computer, on the other hand, is extremely useful for those who don't know how to type or use a browser (or a computer). It could be people who are not privileged enough to learn, people who find these very difficult to learn, or even small kids. These systems allow the user to accomplish what they want (like searching, or opening a website or a game, for example).
Think, for example, sorting your email inbox by "importance", not based on some flags or fixed heuristics but an actual, intelligent scan of the contents. Think of having reasonable phone conversations that result in scheduling appointments and answering concrete questions.
But as I said, they all fall under the category of "mildly useful" and not "groundbreaking".
* controlling smart-home stuff
* child/baby monitor
* home telephone
The opportunity is to figure out how to better utilize the voice based medium. No one has done it yet. When they do, it will also likely improve the experience around screen readers and accessibility.
I think the new feature in iOS where it guesses what I want to do (send a message to ABC, for example) based on previous patterns is promising. A whole screen of these actions would be great.
Sure, I won't use a conversational interface to build the next Photoshop.
I could tell it stuff like, what I ate and it calculates my kcals or macros and tells me how much I have left to eat this day or what other stuff I should eat to hit my macros etc.
I could tell it what I bought and it would categorize the bills.
Some things are just too bothersome to do with my hands.
Generally, however, I think we've overestimated the utility of a voice command line with no actual intelligence behind it. Unless you are using Alexa to control actual devices like lights or a Roomba vacuum, most of the apps for it are either useless or better suited to a visual interface.
I’d like to ask it the meaning of Latin phrases or other words, but it just hears the words used 99% of the time despite being clearly pronounced differently.
Ask it what bus and at which time I should leave to go somewhere. It doesn’t have any transportation info and can’t do basic logic.
Duolingo would be a great integration for conversations while learning another language, but it can barely understand English.
I'm building something that I think is much more powerful than smart speakers for the following reasons.
1. It's software and lives on the hardware that you already have: your computer. It's not a standalone hardware device siloed from the existing tooling you use daily.
2. It provides visual feedback (live transcription, with colors indicating what's understood), because it can. Smart speakers can't.
3. It's a browser extension, so it easily integrates with all the webapps that you use daily: gmail, google docs, hacker news... any webapp you want it to.
4. Plugins are intuitive to develop and open-source so anyone can build off and improve them (https://github.com/lipsurf/plugins)
It's a work in progress, but in case anyone is interested: https://chrome.google.com/webstore/detail/lipsurf-voice-cont...
* Sell Audio (Spotify, Audible, MP3 style): If the goal is to get into this business then you'll have much better reach if you sell on smartphones. The voice-assist app will be supplementary at best.
* Sell products (Amazon/Ebay style): People aren't generally comfortable buying products with voice-assist. You'll have to compete directly with Amazon.
* Market products (In-line advertising style): You'll have to generate a lot of original content to plug sponsored products. The original content will have to be very compelling. It's probably more appealing to host the original content somewhere else primarily.
* Sell audio advertisement: What makes the voice-assist platform compelling for content consumption? How can you create a very engaging voice-assist app? Will people tolerate audio advertisements? They tend to consume more time and be less interesting than visual ads.
I also often can't remember what I'm supposed to say, and I sometimes forget my words as if I've been put on the spot somehow. Overall it's not a pleasant user experience.
Having played with the Alexa and Google Home, my take on why there are no hits is that it's very clunky to invoke apps, especially for one-off queries.
"Hey, Alexa/Google, Ask <keyword> to do <blah>" or some variation on that. Starting an app and then interacting with it works fine for some use cases but not all.
It really needs to be a flow of: install an app for a given type of task, say searching for a hotel, and then, like Android, there are defaults. So I say "hey alexa, find me a hotel room in blah" and it just knows to hand off to whatever my favourite is.
So long as all the core easy commands are only accessible to Google/Amazon, this is going to continue to be a problem.
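The "defaults" idea above — say the task type once, and let the platform route to your preferred skill without naming it — could be sketched as a small intent router. All skill and intent names here are made up for illustration:

```python
# Sketch of default-skill routing: the platform maps an intent type to the
# user's preferred skill, so "find me a hotel in X" needs no skill name.
# Skill and intent names are hypothetical.

def booking_example(slots):
    return f"BookingExample: 3 hotels in {slots['city']}"

def rival_hotels(slots):
    return f"RivalHotels: 5 hotels in {slots['city']}"

# Installed skills, grouped by the intent type they can handle.
SKILLS = {
    "find_hotel": {
        "BookingExample": booking_example,
        "RivalHotels": rival_hotels,
    },
}

# Per-user defaults, set once ("use BookingExample for hotels from now on").
USER_DEFAULTS = {"find_hotel": "BookingExample"}

def route(intent, slots):
    handlers = SKILLS[intent]
    # Use the user's default if set; otherwise fall back to any handler.
    chosen = USER_DEFAULTS.get(intent) or next(iter(handlers))
    return handlers[chosen](slots)

print(route("find_hotel", {"city": "Destin"}))
# -> "BookingExample: 3 hotels in Destin"
```

The hard part isn't the dispatch table, of course; it's the platform owner agreeing on shared intent types and letting third parties claim them as defaults.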
I think the real problem though is that people aren't good at remembering voice commands, either how to invoke the command or that the command even exists. Without visual cues it's easy to forget everything.
You're not going to have a runaway hit when it's predicated on everyone remembering that special thing to say. It's like trying to have a runaway hit that had to be based on a keyboard shortcut.
All of those services are great for the basics, e.g. How's the weather / set a timer / remind me in N minutes / order toilet paper ... Everything else is a bit tougher in my opinion.
"Me: Alexa, find me a hotel near the beach in Destin, Florida that is available next week"
"Alexa: Hotels.com has a three star hotel available for $987 for six nights, and Airbnb has a four bedroom condo available for the same dates. Would you like to book one of these, or hear more options?"
"Me: Book the first hotel."
...and so on...
In this instance, Amazon has to define the schema for user intention, and hotels.com and Airbnb must provide interfaces to that intent, responding with normalized/standardized data that Alexa can then reconfigure into an appropriate response. Amazon has to throw their weight around to get other companies to play the game, and it's on them to organize the returned data into an appropriate response. This is probably where a discussion about the semantic web would be useful :)
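That schema-plus-providers arrangement might look something like this in outline. Provider names, fields, and prices are invented; the point is only that every provider answers the same intent with the same normalized shape:

```python
from dataclasses import dataclass

# Sketch of a platform-defined intent schema that third parties implement.
# Provider names and fields are illustrative, not a real Alexa API.

@dataclass
class HotelQuery:          # the platform-defined "user intention" schema
    city: str
    max_price: float

@dataclass
class Offer:               # normalized response every provider must return
    provider: str
    name: str
    stars: int
    total_price: float

def hotels_example_provider(q: HotelQuery):
    offers = [Offer("HotelsExample", "Beachside Inn", 3, 987.0)]
    return [o for o in offers if o.total_price <= q.max_price]

def rentals_example_provider(q: HotelQuery):
    offers = [Offer("RentalsExample", "4BR Condo", 4, 987.0)]
    return [o for o in offers if o.total_price <= q.max_price]

def answer(q: HotelQuery, providers):
    # The platform merges normalized offers and renders them as speech.
    offers = [o for p in providers for o in p(q)]
    offers.sort(key=lambda o: o.total_price)
    return "; ".join(
        f"{o.provider} has {o.name} ({o.stars} stars) for ${o.total_price:.0f}"
        for o in offers
    )

q = HotelQuery(city="Destin", max_price=1000.0)
print(answer(q, [hotels_example_provider, rentals_example_provider]))
```

The interesting work is all in the `Offer` type: once responses are normalized, composing them into a spoken sentence is trivial, which is exactly the semantic-web argument.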
Saying "Alexa, ask [Whatever skill name] to do something" is really unnatural and ruins the whole user experience.
They need sort out the grammar to make it actually feel conversational.
And of course, although stricter curation and quality control would fix this, it would cut the number of skills down hugely, maybe even to 1% of the current size, so the app store looks tiny and people don't bother with the platform. See also Metcalfe's law and network effects...
• as an alarm clock
• as a radio in the morning, podcast/audiobook player in the evening
• arithmetic - e.g. when planning trips we might ask "Alexa, what's 100 Malaysian Ringgit in pounds sterling?", more conveniently than flipping to xe.com or a phone app, and with an answer we both hear. When cooking we might ask for weight conversions, during DIY we might ask for cm to inches, etc
• to control Hue lighting in two rooms
• to control our TV/amp/satellite ("Alexa, pause|mute the TV/set TV to channel 503/etc"). I used to love my Harmony universal remote but found it pretty cumbersome to manage, and my gf could _never_ get the hang of it like she can "Alexa, turn on/off Sky"
• to play music when I'm wfh
Rather than a hit skill/application, I'd say voice is a superior interface when doing other, dextrous stuff - getting dressed, using the hair dryer, feeding the cat, cooking, washing up, gardening, DIY, etc. Asking for the lights to turn on (the switches are not near the front door) when arriving home in the dark, saying "Alexa, goodnight" rather than turning off 4 lights and 3 devices, etc.
The only thing I dislike about it is how it fires up when I'm watching WWE and the commentators use Alexa Bliss's name.
Furthermore, we need to look at what makes voice less restrictive, and that's making it instant. They are still reliant on it being cloud/web based, which creates a lag and makes it awkward and nothing like 'talking to something or someone'.
(But yeah, i agree - definitely nothing very innovative about it. I don't really think voice control is ever going to be the next big thing)
My guess is that it and its competition are more or less an MVP to use audio for input and output, and to understand how users interact with this "new" form of input. In a similar way to how Google Search and the various add-ins they implemented for it along the way inform Google Assistant, so too I think these are stepping stones for something else.
So I don't think the expectation is to have a runaway hit app, but rather to have a product that users will actively engage with. And considering they're making huge pushes into smart home product integration with Alexa, I really don't see how any of it could be viewed as cautionary... from Amazon's viewpoint anyway.
I only use the speaker for casting music/podcasts, listening to news briefings, and timer/alarm/weather queries. Amazon Echo and Google Home do these basic functions well but I'd imagine they weren't the only actions these companies wanted us as consumers to do.
A good toolbox is filled with balanced tools. That there isn't some hyper-addictive time-suck application isn't a bad thing.
However, if it's a case of no one actually using the device, then there might be cause for concern.
2. TV control (with Harmony) - great for the kids so they don't break the remotes
3. Home monitoring when we are away (drop in on the echo spot).
But for me home security seems to be an unexplored opportunity for Amazon (and Google). Those microphones are very sensitive - and if paired with motion sense on other devices (a Nest or something) start to give you powerful intrusion detection tools and alerting. How hard would it be for them to recognise the sound of breaking glass or the presence of someone in the room at a time they shouldn't be there?
Obviously solving the pet issue is a challenge, but I think it's an interesting use case to explore.
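A toy version of "recognise the sound of breaking glass" — flagging audio frames that are both loud and broadband — can be sketched with short-time energy plus zero-crossing rate. The thresholds are made up and a real product would use a trained classifier, but it shows why those sensitive microphones are a natural fit:

```python
import math
import random

# Toy acoustic-event detector: glass breaking is loud AND broadband, so we
# flag frames with high RMS energy and a high zero-crossing rate (ZCR).
# Thresholds are illustrative; a real system would use a trained model.

def rms(frame):
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def zcr(frame):
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return crossings / (len(frame) - 1)

def looks_like_glass(frame, rms_thresh=0.3, zcr_thresh=0.3):
    return rms(frame) > rms_thresh and zcr(frame) > zcr_thresh

# Synthetic test signals (1000 samples each):
# a quiet low-frequency hum vs. a loud wideband noise burst.
hum = [0.2 * math.sin(2 * math.pi * 2 * i / 1000) for i in range(1000)]
rng = random.Random(0)
burst = [rng.uniform(-1, 1) for _ in range(1000)]

print(looks_like_glass(hum))    # quiet and narrowband -> not flagged
print(looks_like_glass(burst))  # loud and broadband -> flagged
```

Telling a cat knocking over a vase from an intruder's footsteps is, as noted, the genuinely hard part.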
At this stage I'd be very surprised if there was some new amazing use case. There are always going to be niche uses that some people really like. I create notes in Drafts on iOS via Siri all the time, for instance.
While gold rush is an accurate analogy, I'm not so sure that's a positive. I remember a couple+ years ago the NYT doing an article / series on (native) app developers and how - contrary to the hype - there were significantly more "losers" than winners.
So has Alexa made it worse? Now there are no winners at all? (Editorial: Well, sans Amazon and its shareholders.)
Every use case I hear parroted to justify the installation of audio surveillance devices in our homes sounds like a solution desperately in search of a problem. What do you plan to do with the milliseconds of time you may have saved?
And while I can set a single timer on my microwave/oven, or set them on my phone, being able to set a timer, hands free, while cooking is pretty damn convenient. I can set a timer while my hands are dirty from mixing meat for meatloaf or handling chicken.
There's something to be said about the convenience factor, and that's what most technological innovations are about, making life easier or more convenient.
I can imagine a future where Magic Door or something like it becomes a big hit. BTW, there's a web site for the game: https://www.themagicdoor.org/
Outside of that, Alexa isn't the platform the industry thinks it is but rather the equivalent of the touch screen on a smart phone. It's another form of input.
There needs to be much better and more seamless integration with 3rd party apps for it to really be useful.
If you have a child, it's nearly a nanny that helps your child do his/her homework. Plus you can connect several Alexas (from friends') and interact with them, and if you have several Alexas yourself you can communicate between different rooms (speak/broadcast).
- landline dial by name
- radio station by name
- TV channel by name
- play/resume audiobook by name
- microwave for duration
- device power on/off
- time, weather, etc
If we did have a UI model, I imagine we'd be controlling car stereos and driver-mode mobile phones with voice already.
I wonder if any of these 80k apps have the "magic sauce" but just haven't applied it to the right problem.
And I suspect if you have your phone hooked up to your car stereo, or integrated using Google or Apple's in-car docks, then using your voice to play music is also possible.
I think your point about not having a good model is correct, though, for anything other than basic uses (phoning, messaging, playing music, navigation, weather, etc)
That is, provide a traditional desktop PC experience that is augmented and improved by Alexa. It should always be more productive to ask an AI than to Google, if this is to have a chance of success.
The problem was jumping immediately to a new modality instead of bridging toward it from the present.
Android assistant has the right idea.
But (at least then) there was no real way to monetize anything using it. Making an interesting skill that people use will just cost you money.
BigCos or VCCos can use skills for generating good will I guess. But independent developers, not so much.
Perhaps, but by eyeball, no less than 95% of those are stupid useless trivia games.