Amazon’s Alexa Has 80k Apps and No Runaway Hit (bloomberg.com)
151 points by kristianc 12 days ago | 224 comments

The problem is that Alexa is just a consumer voice command line. UI discoverability is impossible, and everything that you can do is just a utility that does something else. There are no native apps because everything that it can do is just an IO to something else. If they actually get conversational, this will change, but until then, it’s just a command line with no man page.

> The problem is that Alexa is just a consumer voice command line. UI discoverability is impossible, and everything that you can do is just a utility that does something else.

In a way, I think it's even worse than that. I've used Alexa since the Echo was a relatively new product. Back then, I experimented with new phrases and commands often, but was frequently greeted with wrong answers or "Sorry, I don't know how to help with that." Over time, I stopped trying those commands. Skip forward to today--the backend has been improving for years, and many of those commands now work, but it's too late. Their users have already been taught that they don't, so folks stop trying to use those features. Not only can you not discover new commands easily, you might mentally blacklist useful commands permanently.

Perhaps more frustratingly, the "What's New" emails they send out don't help with this. They never say "Hey, we know you tried to ask your Echo to report its volume level before and it didn't work, but it does now." They always say "Ask Alexa to tell you an Arbor Day joke!" -_-

These devices have a huge marketing problem. Like the Alexa commercial that shows somebody pausing a Prime TV show to order something from the Amazon store. Cool, I guess--except I can already order stuff while watching TV by using my smartphone, so the value proposition is completely absent.

It's also annoying for Echo owners, since the word 'Alexa' triggers the device and thus mutes/drops the volume so that you can't hear what is said next... I guess it could be worse, since the volume reduction means the device doesn't hear the command and try to execute it as well!

TBH, I don't understand why they couldn't add a circuit that checks if the microphone is being activated by the device's own speaker, and ignore it. Some sort of simple phase inverting circuit from the speaker that gets summed with the microphone input ought to do it. It's not exactly rocket science and they're doing complex phased-array/beam-forming stuff with the microphones already, so it's not like the engineers are idiots?
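In practice a fixed phase-inverting circuit isn't quite enough, because the echo path (speaker, room reflections, microphone) varies from room to room, so devices do this digitally with adaptive echo cancellation. A minimal sketch of the idea in Python, assuming a single echo path and ignoring real-time constraints (this is a textbook NLMS filter for illustration, not what Amazon actually ships):

```python
import numpy as np

def nlms_echo_cancel(mic, speaker, taps=64, mu=0.5, eps=1e-8):
    """Subtract an adaptively filtered copy of the speaker signal
    from the microphone signal (normalized LMS)."""
    w = np.zeros(taps)           # adaptive filter weights
    out = np.zeros(len(mic))     # echo-cancelled output
    for n in range(taps, len(mic)):
        x = speaker[n - taps:n][::-1]       # recent speaker samples
        echo_est = w @ x                    # estimated echo at the mic
        e = mic[n] - echo_est               # residual = speech + noise
        out[n] = e
        w += (mu / (x @ x + eps)) * e * x   # NLMS weight update
    return out
```

The filter learns the speaker-to-mic transfer function on the fly, which is why this is done in DSP rather than with a fixed analog inverter.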

Google detects if it's being activated by a TV show or recording by fingerprinting the audio and disabling the hotword if millions of devices all activate at the same time or with the same fingerprint.
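The mechanism described could be sketched roughly like this (a guess at the shape of such a system, not Google's actual implementation): bucket activations by a coarse audio fingerprint and suppress any fingerprint that fires across many devices within a short window.

```python
import time
from collections import defaultdict

class HotwordDeduper:
    """Suppress activations when many devices report the same audio
    fingerprint in a short window (TV ads, broadcasts)."""
    def __init__(self, window_s=2.0, threshold=1000):
        self.window_s = window_s
        self.threshold = threshold
        self.seen = defaultdict(list)  # fingerprint -> activation timestamps

    def should_suppress(self, fingerprint, now=None):
        now = time.time() if now is None else now
        # drop activations that fell out of the window
        hits = [t for t in self.seen[fingerprint] if now - t <= self.window_s]
        hits.append(now)
        self.seen[fingerprint] = hits
        return len(hits) >= self.threshold
```

A real system would shard this counter across servers, but the core "millions at once means it's the TV" heuristic is this simple.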

It's a very impressive infrastructure, and works pretty well too.

It would be pretty cool if it could update you on things you've asked for in the past that were not possible then but are now possible.

"Hey, 3 months ago you asked for the forecast according to the ECMWF weather model, and I couldn't answer you. But now I can, so go ahead and try!".

I wonder if that is too far on the "creepy scale". Everyone probably knows that Amazon has a full record of everything they have asked their Alexa, but I am not sure it is smart to repeatedly remind people of this.

Phrasing comes into play here. Instead of:

> "Hey, 3 months ago you asked for the forecast according to the ECMWF weather model, and I couldn't answer you. But now I can, so go ahead and try!".

What might work more effectively is

"We've been improving our weather forecasting and you can now get the forecast in your area according to a range of new models, including the often requested ECMWF weather model! Give it a try now!"

This attempt to obscure the fact that it's actually super-targeted can backfire too, though. It wouldn't take too many of them for a savvy user to be like... now wait a minute.

Thinking back to this classic where Target figures out the trick to not creeping out someone they suspect may be pregnant is to shuffle the maternity coupons in with a bunch of other random stuff: https://www.forbes.com/sites/kashmirhill/2012/02/16/how-targ...

Relevant quotes:

What Target discovered fairly quickly is that it creeped people out that the company knew about their pregnancies in advance.

“If we send someone a catalog and say, ‘Congratulations on your first child!’ and they’ve never told us they’re pregnant, that’s going to make some people uncomfortable,” [snip] So Target got sneakier about sending the coupons. The company can create personalized booklets; [snip]

“Then we started mixing in all these ads for things we knew pregnant women would never buy, so the baby ads looked random. We’d put an ad for a lawn mower next to diapers. We’d put a coupon for wineglasses next to infant clothes. That way, it looked like all the products were chosen by chance. And we found out that as long as a pregnant woman thinks she hasn’t been spied on, she’ll use the coupons. She just assumes that everyone else on her block got the same mailer for diapers and cribs. As long as we don’t spook her, it works.”

> As long as we don't spook her, it works.

Sounds much closer to hunting, literally targeted marketing.

The problem with Alexa is that Amazon runs their R&D team like a consumer research organization.

I mean, you gotta listen to every word that comes out of the customer's mouth right?

Their Principles are too reductive to even guarantee that copies of other people's ideas are successful. For as much of a genius as he may be, Jeff Bezos has created an organization of automatons.

Lots of people working there have good ideas that they never bring up, because past experience has taught them that it's a waste of time. So it's not that the workers are automatons, but rather that the workers have learned what the organization wants and what it doesn't want, and don't waste their time on what the org doesn't want. The individuals who comprise the organization are fine, but the meta-organism that is the organization is diseased.

[obligatory disclaimer: I worked there years ago, but no longer do.]

So an organisation full of automatons?

Effectively yes, but with a defense of the people.

If it did that unprompted, it would be a privacy issue. You may have asked it something that you don't want people currently in the room with you to know about.

It doesn't need to announce it verbally, just including capability updates in the (currently useless) emails that they already send their customers would do.

Discoverability is solvable. Amazon has chosen not to focus on solving it; instead they are focused on sheer volume of skills and ease of making them. A lot of Alexa skills can be made in a day.

I'd argue this is the wrong thing to solve for. The best skills take a long time to make and require privileged access. Sonos, Spotify and others have this, and they work amazingly well. It's the large mass of Alexa skills that were quickly made and that don't have privileged access that are dragging the whole experience down.

And this is entirely Amazon's doing.

So, just like every app I use that doesn't put effort into making me aware of their new features (but probably worse, as it likely takes more effort to find those features).

It's humbling (and annoying) when you think of yourself as a power user of some application to find out from some semi-new user of it that there's a new feature that would have made your life much easier if you had only known it was added a year or two ago.

There's only so much time to learn about the tools you use, much less the changes in them over time. How do I find out if GNU grep has some interesting new feature in the version that ships with the next version of the distro I use? Or rsync? Or tmux/screen?

This problem definitely seems worse with smaller tools, since there's a lot of them and they are often packaged together by some other party. Not every project is as large, or used by such a large population, as Firefox, which arguably does a very good job of advertising new features. But you can definitely tell it takes them a lot of time and effort to do so, and a smaller project may not find much traction if they tried the same strategy.

>> "Hey, we know you tried to ask your Echo to report its volume level before and it didn't work

This makes me shudder. Such a functionality would mean that they stored your failed request in some sort of database, a database that they later used to send you a personalized marketing email. A machine cataloging our voice for later inspection is a very dark future. Alexa should delete and scrub any iota of voice that it doesn't instantly understand.

"Oh, remember last week when I thought you were asking me to buy you a pot plant? I now realize you were asking me to buy you some pot from Canada. It will be arriving in two days."

"I didn't understand it at the time, but I now realize that you were yelling at your husband Alex, not Alexa. Your social credit score has been adjusted to reflect this negative interaction."

They are 100% storing everything and so is google. You can actually go to google and replay all the recordings they have saved of you.

Some of the recordings are unintentionally hilarious though, it turns out. When I looked at the history page, there were a bunch of times when Google seemed to think I was saying "b b b b b b b b b" repeatedly? Anyway, when I played the audio back, it seems my alarm clock's beeping had triggered the Google Assistant, which then obviously failed to understand the beeps!

You can also replay the audio from all your Alexa commands in the app. In iOS it’s in Settings -> Alexa Account -> History.

> Such a functionality would mean that they stored your failed request in some sort of database, a database that they later used to send you a personalized marketing email.

I don't think this kind of thing would need (or even benefit from) actual voice recordings when it comes to the "new functionality" part of things. I would expect something more like a running tally of the X most common failed interactions after running the transcript through a fuzzy-match filter, and a long list of accounts associated with each.
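The running-tally idea above could be sketched like so, with `difflib` standing in for the fuzzy-match filter (the function name and threshold are made up for illustration):

```python
from collections import Counter
from difflib import SequenceMatcher

def bucket_failures(transcripts, threshold=0.8):
    """Group failed-utterance transcripts into fuzzy buckets and
    count how often each bucket occurs."""
    buckets = []        # canonical transcript per bucket
    counts = Counter()
    for t in transcripts:
        t_norm = t.lower().strip()
        for canon in buckets:
            # near-duplicate of an existing bucket: count it there
            if SequenceMatcher(None, t_norm, canon).ratio() >= threshold:
                counts[canon] += 1
                break
        else:
            buckets.append(t_norm)
            counts[t_norm] += 1
    return counts.most_common()
```

The point is that only aggregate counts over normalized transcripts need to be stored to drive a "this now works" email, not the recordings themselves.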

A great step that could be taken before full on conversation ability would be the simple ability for the assistant to ask questions and seek basic clarification when encountering ambiguity and then using the user’s answer to learn so clarification won’t be needed next time. This way, over time the voice assistant would refine itself and acclimate itself to its user’s way of communicating.

This alone would make for a much more usable experience, and yet no voice assistant has implemented it yet, which flabbergasts me. I don’t have much knowledge in this particular realm so maybe I’m missing some big blocker that prevents it from being possible, but to me it seems like such an obvious thing to do.

> The problem is that Alexa is just a consumer voice command line. UI discoverability is impossible, and everything that you can do is just a utility that does something else

This is a really important point, and both a strength and a weakness.

The iPhone began as essentially a front end to existing services (with visual discovery, which a voice interface inherently lacks). I used to name mine "FEP" (as in Front End Processor) -- a front end to a subset of "real" computing or as I think of it these days: multiple windows into a shared computing space.

A watch (like the apple one) is really a crappy general UI device; discoverability is pretty bad because of the limited area and speed. But it's great in the role the phone had: subsetted interface to a limited number of "real" computing tasks (yes, it has a few of its own tricks too but mainly as data collection for apps on your phone).

Thompson captured this issue by talking about devices and software in terms of "the task it was hired to do". The problem is the voice assistants haven't figured that out yet.

The thing about this approach is it creates pressure to make more functionality available at the edge.

Alexa and google home tried to jump right to the edge in one go, which skips too much phylogeny.

Apple seems to understand this, but gets it wrong in the opposite direction: the iPad hasn't moved far beyond being "most of an iPhone but with a larger screen". And if you have an apple speaker, a phone, and iPad and call out "hey Siri, set an alarm for 10 minutes" you may get three devices chiming in 10 minutes. They don't act like a single device.

On the final point: they're actually pretty good about letting just one device respond.

Sheer volume of commands is a pretty good solution to the lack of discoverability - if it does enough, you're probably going to try things without knowing if it works. I got a Google Home recently and have been impressed at the complexity and range of commands it supports. Obviously there's a limit, but it's a lot "fuzzier" than a command line.

There are native apps, but I don't know how compelling they are yet. https://medium.com/@james3burke/creating-stories-using-amazo...

Edit: there's also a third-party skill that will let you play old-school adventure/interactive fiction games. https://www.amazon.com/Vitaly-Lishchenko-Interactive-Fiction...

My Echo was an interesting novelty at first. But a couple years in now, it is essentially (in order of usefulness):

1. An alarm clock / kitchen timer

2. A thing that tells me the weather report during morning coffee, so I know what to wear

3. A DJ that my kids yell at to play pop music

If it died tomorrow, I'd probably just go back to using my phone for these 3 things rather than buy a new one. I can't imagine getting into it enough to explore third-party "skills".

Pretty much the same for me, although I'll add (I have Google Homes but pretty much the same thing):

1. Metric / imperial conversions, especially in the kitchen - this is one feature that is miles better than using a phone or computer if my hands are dirty cooking something and I want to know how many grams 12 ounces is or something like that.

2. Intercom between my 2 google homes, one in my kitchen and one in our converted attic playroom - it's so much nicer to use Google home to call my kids down for dinner vs. screaming up two flights of stairs.

3. Making quick phone calls

4. Finding my phone - I lose my phone constantly, and it's super handy that I can get my google home to make it ring even if it's on silent.

Agreed on the kitchen part. Once I moved my google home (mini) into the kitchen it got more useful.

* Setting multiple alarms (why can't apple do this??)

* Conversions

* Food questions

I think it would be rad if I could feed it a recipe and then have it read me the ingredients and instructions for each step. Maybe that exists?

> * Setting multiple alarms (why can't apple do this??)

It launched without that ability but an update sometime last year lets you set multiple alarms.

Sorry that should have been timers. If I tell Siri to set a timer (while a timer is running) it still asks if I want to change the timer or leave it alone.

You can set multiple timers in Siri. At least on a Homepod.

It can definitely read recipes, although I've only ever had it read me recipes I've found by voice; I don't know how to explicitly tell it the recipe I want. It's still kind of cool as it goes step by step, although that can be a bit buggy, especially if you're also listening to music. I still use it from time to time.

If I google "burger recipe" on my phone (Pixel), I get some cards of recipes for burgers. Each of the cards has a "send to google home" button.

Works on my galaxy s8 as well, in Chrome at least. Great tip, thanks!

How do you do 2-4? I'm in OP's camp, where I've tried these things in the past and never knew I can do them now.

2. "Hey Google, announce dinner is ready". It handles standard meals with a standard message. Otherwise it will play back a recording of what you said. You can also use "broadcast" instead of "announce"

3. "Hey Google, call my wife". or mom or person's name.

4. "Hey Google, find my phone".

For me add:

4. Shopping List.

And that's a killer feature for me; it's so damn convenient to use a voice interface to add items to the shopping list as you run out of them that I'd probably replace it just for that.

Same here. One feature I would like is to be able to change where your shopping list goes as I do not particularly enjoy using the Alexa app.

I had the same problem with google home. I already had a shared (with the SO) google keep note for my shopping list. I wrote a google keep integration with an IFTTT app to solve this, letting me do a "OK Google, we need X" to add X to the list. https://github.com/Resinderate/shopping-list/blob/master/sho...

When I first got a Google home, the keep integration was built in. And then they moved it to Google Express shopping list, suddenly, which was infuriating. Ended up using ifttt with Asana. Will try with keep as I preferred keep for shared shopping and Todo lists.

It's possible to access the list via their API so theoretically you could build your own solution. I agree the Alexa app is not good for accessing the lists.

You can do this. I have mine set up to use Todoist. You need to add the appropriate skill and then link the account.

try https://getbring.com/#!/app - it has an Alexa integration. Me and my wife use it together.

Same, although I'd say a timer and weather machine that I talk to everyday is a runaway hit. There are few other things I own besides my bed, phone, clothes and a couple major appliances that I use every single day.

I came to say the same. For my 15mo old son, I prefer asking Alexa to play music on the Echo because I don't have to ignore him (in a way) to go tapping on my phone -- which makes him very curious about what I'm doing.

As a plus, he babbles at the Echo when he wants to listen to music.

4. An interface that allows me to listen to the radio or to podcasts without having to touch (and potentially get distracted by) my cellphone.

For an elderly relative of mine who no longer has the physical dexterity they used to, this has been a killer feature of voice assistants for them. No longer do they need to fiddle with awkward buttons to change the radio station or find out what the weather is, they just... ask and it happens.

Novelty for sure, but I use the timer/alarm clock so frequently that I will rebuy a new one if the old one breaks.

I use it for:

- "I have to leave in 1h but only need 30 minutes to get ready" → timer for 30 minutes

- pizza / food

- short naps

- learning

And quite often.

I can turn the lights off without getting out of bed, and damn, has it made my life crazy simple. I was in India for a few months last year and I realized a large part of getting up early was just to switch on the geyser, open the door for the cook and turn the lights on. I would get back in bed afterwards and check Facebook till the water got hot. Just the fact that the Echo can do these things without making me get out of bed is a crazy value addition to my current, decently privileged life.

I see the house DJ speaker aspect, but why does anyone need the first two when their phones accommodate those functions?

Same thing: convenience.

Find the phone, unlock the phone, find and open the app, find and click the control. That requires some effort.

Saying commands, having physical buttons at the expected location in your home, NFC tags.. there's a bunch of more convenient ways to do extremely repetitive tasks that would otherwise take you more time to do.

Hasn't this been minimized by 'Hey Siri' and 'OK Google'?

I guess it still has value if you have a non-privacy reason not to use voice control on your phone.

Some people don't consistently have their phones on hand when they're at home, which would do it. And some people have problems with unwanted voice activation - my experience was that the voice-printing on "OK Google" was not actually all that personalized, and sometimes it did totally unexpected things like breaking out of navigation to make a phone call. Last time I tested it, keeping Google voice recognition on was also a massive battery drain, but I assume that's improved.

(That phone call was particularly ludicrous: Android offered me an unprompted 'helpful tip' that I could say things like "call mom", but since the tip fired when it was already listening for a navigation command, it accepted its own instruction.)

It has become ubiquitous. With a speaker in every room, I don't have to find my phone to start a timer.

We also use Audible heavily. The kids love the Boxcar Children readings.

I bought an Echo speaker when they first came out. The sound quality is impressive (to my untrained ear), but the development experience is not.

The first thing I wanted to do was to add a feature so I could add a task to my to-do list software which is not supported by Alexa. It turns out that you cannot construct a sentence along the lines of "Tell Asana to add a task: <task>". You can't actually have a 'slot' which contains a freeform piece of text, even if it is the last piece of text in the sentence.

The Alexa API differs between regions, so the North America version of the API supports this but the EU version doesn't. It was removed from the NA version for a while but placed back after a bit of an uproar.

I think you could develop some more useful stuff using Alexa if only this feature was consistently available. I cannot think of a good reason why it isn't.

I now just mainly have it as a Spotify speaker, and occasionally I use it as an expensive egg timer. I normally have it on mute because I find it activates and starts recording private conversations. I don't see it getting better.

This issue has killed my interest in basically all smart-home products I can find on the market, even ignoring their massive privacy issues.

The most useful 'basic' behavior I could think of for a smart home device was to check the weather and trigger my alarm earlier in bad weather or heavy traffic. I knew IFTTT was capable of running scripts when a trigger happened, checking the weather and traffic, and setting off an alarm, so it seemed obvious. I literally wanted "if this (or this), then that"!

No such luck. The basic IFTTT setup couldn't do it at all, no existing app could do it, and the developer program was invite-only. I got in after quite a long time, and even then it wasn't obvious. IFTTT wanted to treat 'check weather' as a script output exclusively, which I couldn't feed into any other system. The best I could do was be told the weather when I woke up. So, I stuck with the phone that could already do that.
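The wished-for rule itself is tiny; the hard part was never the logic, just the ecosystem. A sketch, with the weather/traffic booleans assumed to come from whatever source you can actually query (hypothetical inputs here):

```python
from datetime import datetime, timedelta

def adjusted_alarm(base_alarm, bad_weather, heavy_traffic,
                   early_by=timedelta(minutes=30)):
    """Return the alarm time, moved earlier when either condition holds.

    `bad_weather` and `heavy_traffic` are booleans supplied by whatever
    weather/traffic API is available; the point is how little glue
    logic the "if this (or this), then that" rule actually needs."""
    if bad_weather or heavy_traffic:
        return base_alarm - early_by
    return base_alarm
```

Usage: `adjusted_alarm(datetime(2019, 5, 1, 7, 0), bad_weather=True, heavy_traffic=False)` moves a 7:00 alarm to 6:30.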

That specific situation might have improved, but the general ecosystem doesn't seem much better. The only smart-home features I see available that I would use are list-making, music/media playing, and quick reference. But the features I would actually value are dense integration between apps, floating scripts like the one you describe, and non-user-triggered events to turn active tasks into passive ones. Those seem to be the features which are least available, even when they would be easy to implement.

"You can't actually have a 'slot' which contains a freeform piece of text"

What about AMAZON.SearchQuery?

This is the correct solution. The Alexa Skills Kit has the AMAZON.SearchQuery slot, which meets your needs, but Lex doesn't; it's one of the two Alexa slots missing from Lex. If you read articles about Lex slots you might get inaccurate info, as they are slightly differing products.
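For illustration, an interaction-model fragment using that slot might look something like this (normally JSON in the Alexa developer console, shown here as a Python dict; the intent and slot names are made up, and note that a SearchQuery slot has to sit inside a carrier phrase in each sample utterance):

```python
# Minimal interaction-model fragment for a free-form "add task" intent.
intent = {
    "name": "AddTaskIntent",
    "slots": [
        {"name": "task", "type": "AMAZON.SearchQuery"},
    ],
    # SearchQuery captures free-form text, but each sample must wrap it
    # in a carrier phrase, and only one SearchQuery slot per utterance.
    "samples": [
        "add a task {task}",
        "add {task} to my list",
    ],
}
```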

Thanks, I just did some searching and found that this was added in February 2018 - I think I gave up on my project in 2017 so it now looks like I could program something like what I wanted. I still have the source code so that's nice.

However I'm just thinking of getting rid of my Alexa devices anyway - I was discussing political events with my partner last night, spotted the Alexa device light up (I think I said "a letter") and noticed shortly after that I was moderating my speech and not referencing controversial topics. I later just set it on mute.

For research: have another device perform "audio fuzzing" of Alexa, trying random phrases. Use a camera to look for Alexa turning on. Bonus points, check the list of Alexa recordings (via web interface) to confirm 1:1 relationship between device activation and recorded file. Publish the inadvertent "watchlist" of phrases, monitor over time.

That's a relatively new slot type.

Ah okay, I had the impression it was the replacement for LITERAL

> I normally have it on mute because I find it activates and starts recording private conversations.


It can mistrigger on phrases like "take a left on" or "they got a Lexus". My parents have one and get a kick out of the misactivations.

Wow. I was under the mistaken impression that the devices were more robust than that.

I'm very anti these devices, and now I'm even more so!

For what it's worth, in two years our Google Home has only 'misfired' once, and that was when someone on TV said 'Hey Google'. It is in the living room and sees heavy use by a family of five.

> our Google Home has only 'misfired' once ... when someone on TV said 'Hey Google'

Is that a misfire? It reacted to the trigger phrase. Have you trained it to only react to your voice (I know that's an option for Google assistant, but I didn't do it for mine).

Amazon added a real-time sanity check, so if too many Echo devices are triggered with similar sounds within a few hundred milliseconds, it will ignore them. This was after some TV ads or shows had intentionally abused it.

Wow! That feels kind of insane, but makes so much sense.

Is this true? Triggering is done locally in the device.

But they can ignore/cancel the request server-side, so technically the detection triggers, but no command does.

My roommate and I have a Google Home Mini in the living room, and to make the accidental activations more noticeable, my roommate set the accessibility "ding" sound for when it (thinks it) heard the trigger phrase. It gets activated a lot by television (maybe once every few hours of TV), and we generally have no idea what phrase accidentally triggered it.

I can't imagine how frustrating it is for folks named Siri.

You can at least change Alexa's wake word to a couple other options ("Echo" or "Computer").

One of my coworkers has a friend named Siri. When Apple's Siri first came out, somehow she kept getting texts from friends like "remind me Tuesday to call the doctor". She didn't realize what was happening, so she kept track of those reminders and texted her friends back until everyone sorted their technology out.

Great story!

I have several, along with Google Home too. They all suffer from the same issue: it's awkward to use with third party skills. Unless they are the highly integrated ones (Spotify, Logitech Harmony) in which case it's great.

It's the same for Siri, the API is so restrictive.

The whole "tell XX to YY" isn't convenient.

One workaround on Google Home is to use IFTTT, and in that case you can customize the whole phrase and response, and then it's pretty nice, even though the latency is high.

"Okay Google, open the blinds." "Sure thing, Commander." Never gets old.

>The whole "tell XX to YY" isn't convenient.

You don't have to do that — the system can usually infer what skill you want from your utterance [0]. And the "open the blinds" thing can be done on Alexa with a custom routine (Alexa-specific IFTTT).

[0] https://developer.amazon.com/blogs/alexa/post/c870fd31-4f91-...

Disclaimer - interned on the team that built this

Absolutely. Until the voice assistants become a bit more intelligent with the language used to activate them, it feels so rigid and unnatural communicating with them.

I don't see a great future for voice assistants in the near term, until we truly solve the problem of understanding intent (in a natural way) and responding accordingly.

I'm not sure if it's a question of intelligence, or a question of business.

I can't imagine Amazon would hand the "Alexa, order more bread" keyword over to Instacart/Wal-Mart/whoever without a fight.

Hopefully google or apple might.

You don't really need ifttt for this, both platforms have some form of "shortcuts"

Despite often seeming like a thin veneer of natural language processing on top of a search engine, I'm sure most users think of their voice assistants as, or are willing them to be, a general AI agent. They don't want to have to care what app/skill/web-scrape is involved in enabling a response.

The whole skills ecosystem feels like an awkward stopgap on the path to AI. The language required to invoke them feels particularly clunky - "Alexa, ask ThingFinder about a thing" - and then, is the user supposed to be talking to ThingFinder now, or Alexa? She still sounds like Alexa, but she doesn't seem quite herself.

As a developer, choosing an invocation name is fraught with difficulty. There are only so many natural-sounding names for something which does a particular thing, without incongruously inserting some invented branding word into it — "Alexa open Tidy Tide Tables". If someone's already using the most natural name you are free to use exactly the same one, but then who knows whose skill will be launched? And you'd better make sure your skill's name doesn't clash with anything else in the world at large, like the entire history of music for example: "Alexa play Wicked Game". It's all a bit of a mess.

Yeah, that's a good point. People are good at mentally compartmentalizing behaviors with personalities. If Alexa doesn't allow each agent to have a personality, people are going to have a hard time keeping the behaviors straight.

Kinda reminds me of Neuromancer, where a superintelligent AI was broken into two pieces to avoid detection. One part was good at personality, and the other part had to mimic people in order to communicate. It was very unnerving for people to talk to an AI that was copying the personality of someone they knew.

Alexa is just an input/output device. It is limited, which is OK but that means there will not be a 'runaway hit'. There's just not enough of a surface area for something so comprehensive.

It's like saying 'There are 80k mouse enabled apps and no runaway hit'.

What it can do is have an app for nearly anything, though, that makes sense for the form factor. It's on its way. The next step of its evolution is to look at what apps work and what conventions can be pulled from those as a general standard. Users will be much happier when they can nearly instantly download an Alexa enabled interface for an app and have it work intuitively. And this is doubly important for Alexa because there isn't deep feedback like you have with a mouse where you can see the things you aren't doing to gather hints at what's possible -- you just have to know or, at least know how to find the answer, like a command line.

Which gives me an idea: 'man for Alexa'. At least we could standardize a help menu.
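Even without Amazon standardizing anything, a skill author could approximate a 'man page' with a help intent that enumerates its own phrasings. A tiny sketch (the command table is hypothetical, and real skills would hook this into the Alexa Skills Kit's help intent rather than a bare function):

```python
# Hypothetical command table for one skill: spoken phrase -> description.
COMMANDS = {
    "add a task <text>": "Adds a free-form task to your list",
    "what's on my list": "Reads back your current tasks",
    "clear my list": "Removes every task",
}

def help_response():
    """Build the spoken 'man page': enumerate supported phrasings
    so users can discover commands instead of guessing."""
    lines = [f"Say: {phrase}. {desc}." for phrase, desc in COMMANDS.items()]
    return "Here's what I can do. " + " ".join(lines)
```

The discoverability win is that users learn the exact utterances the skill accepts, instead of mentally blacklisting commands that silently fail.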

But the mouse does have a runaway hit: Microsoft Windows.

That being said, it took 22 years to get from Engelbart's original mouse to the first version of Windows that really took off (3.0), so perhaps we're just too early on in this product cycle for the hit to have emerged yet.

I'd say Alexa is more like the birth of the personal computer. Computers weren't very good in the mid-1970s either.

Now that companies realize there's a product here, there will be an arms race. Look for big improvements in the next decade. Lots of people and billions of dollars are about to go into making these products better.

To take this analogy further, it is interesting that by say 1976, we seem to have already realized that what you need to make voice devices actually useful is a screen and a GUI.

In 1976 every office had perfectly functional systems that could be controlled with voice alone. They were called "secretaries."

>I'd say the Alexa is more like birth of the personal computer.

There were advancements in text-to-speech and speech recognition stretching back to the early days of the PC/Mac, e.g. DragonDictate[1] using hidden Markov models, IBM ViaVoice[2], and latterly Nuance Dragon NaturallySpeaking[3]. They were not perfect in their early iterations, but they got progressively better and became fairly impressive once trained properly.

I would rather draw comparisons with their current day counterparts like Amazon Polly or Lyrebird[4] et al., than associate voice assistants with a paradigm shift.






It isn't that revolutionary. When they can do all that processing locally, without cloud assistance, then maybe...

You mean the opposite of networking computers?

Like taking the Internet away?

At any rate, being connected to the cloud will allow for faster iteration. Once it becomes a solved problem, then you can more easily remove the network.

Yup, which is how much of the world still lives: spotty internet access at best.

That’s why you don’t see rapid iteration in the rest of the world.

Google, Apple, Amazon can update their voice devices continuously without a local software release. If the next generation voice algorithms needs twice as much hardware, your $30 device will still work because the processing is in the cloud.

Video games, for example, have been trying to move to the cloud: put all the code in the cloud and just send the pixels.

I don't want rapid (rabid?) iteration, I want shit that works.

"The cloud" is little more than a hyped-up, glorified business objective; human beings shouldn't have to be connected to the hive mind to enjoy the full benefits of technology. Nothing smacks of SV-style elitism more than the proliferation of "the cloud"

And you can pry my locally-installed videogames from my cold-dead hands. I hope every single streaming startup in that sector fails spectacularly.

I guess you’ll have to wait until someone serves your market niche.

Hopefully, with 100 million Amazon devices, and growing fast, you’ll have your product within a decade or two.

Which is fine, as my life is perfectly great without. I like your assumption that all innovation is supposedly good...

And it's 100 million amazon devices because Amazon pushes them relentlessly and has been for some time. Their utility is questionable, even the article mentions that.

Did you know over 3 billion devices run Java?

Yes, I do know. Not sure how that’s relevant.

Shit that works is a niche market now?

> Video games, for example, have been trying to move to the cloud

Trying and failing for a decade or more, because people like low latency and being able to customise things.

> Alexa is just an input/output device. It is limited, which is OK but that means there will not be a 'runaway hit'.

I like the way you say that. My Google Home is a remote control for my mouth. It's like licking a keyboard one key at a time in the dark. Simple queries are easy, but any non-trivial query just ain't gonna happen at the moment.

> It's like licking a keyboard one key at a time in the dark.

Ewww. Good analogy, but eww.

I don't believe your comparison really works. A mouse is useless on its own. An Alexa device can operate on its own.

I do agree with your latter sentiment. It does feel like a command line at times. More so like trying to figure out a text adventure game. Zork would be very difficult if you didn't know the basic functions/words. That's what Alexa feels like most of the time for me. I'd love a help menu.

> An Alexa device can operate on its own.

Only if you conveniently ignore the AWS services it uses.

Pedantic. I conveniently ignored the electricity it uses as well. :/

>An Alexa device can operate on its own.

Can it?

I don't know how much I'd use Alexa if it weren't for Spotify. There are some native apps but its real value is in the 3rd party apps it connects to.

It's more useful with third-party apps, but those are still part of the Alexa system. That's like saying a computer is useless without software installed. I wouldn't use my computer very often if I wasn't able to install software, but that doesn't mean it isn't functional without it.

Plus you would still have the entire Amazon app ecosystem. While their apps may not seem perfect, they do have a free music section that could at least partially replace Spotify. Other first-party apps on Alexa let you access lots of other Amazon services, like purchasing items from their website. That's a lot of functionality even if we exclude third-party applications.

But that is not the original (or publicly perceived) USP. If I need to learn or memorize the interface it's not what has been promised.

That's the thing about USPs...most are idealized possibilities not yet realized.

I mean, look at VR and AR and, hell, even AI.

But AI is not useless even though it hasn't reached its generalized-intelligence promise. It is adding tremendous value even though it has landed in the limited middle.

I don't mind learning Alexa's syntax and what it expects of me. I get value from what it can do well. As long as that's true, I think it can miss its more grandiose promises and still be a huge success.

Anecdotally, a developer friend of mine strongly suspects most people are like him and not actually using these home speakers very much.

I’ve been working on an audio app for iOS/Android that reads any article to you, and I planned to bring it to the Echo, but he seems to suggest my efforts are totally wasted. Although their market reach seems massive, likely thanks to their cheap price, I'm not surprised by this article's finding that their actual usage is incredibly low. Most people seem to buy them and then forget about them.

As a shameless self plug, if you would like to check out my app that reads articles to you using beautiful sounding AI/ML, find it here:


Pretty much the same: the Wink integration lets us turn lights on/off and do other things like “Turn on Movie Theater”, which dims certain lights and turns off all others. We use Alexa to play music when we just want something going in the background as well. Additionally we use timers and the grocery list.

Wife uses the daily news rundown.

Kids ask Alexa questions (what’s the fastest bird), have it make fart sounds, and that’s about it.

I got a pair of buzzers to play the quiz game; it was so horribly janky they’ve been used twice. I scan the apps list and nothing strikes me as worth even trying.

I use the flash briefing thing every day. I suspect most people don’t know about it. I wake up, stumble into the shower, and say “Alexa, play the news” on the way. By the time I finish brushing my teeth, I’ve had my curated news read to me. Pretty nice.

As a counterpoint, thanks to an Echo in most rooms of my house and Hue lights everywhere I almost never touch a light switch. Likewise I generally use the Echo timers instead of oven timers for cooking, and I like using them as Spotify speakers. And I can't remember the last time I left my house before I asked the Echo about the weather. Sometimes my wife even uses it as an intercom if I'm in my office, but I concede that feature is a bit janky.

Whenever my wife and I go away we usually remark to each other that it suddenly feels very backwards to have to do things without the Echo. That might not be the best word, but it definitely feels like we're missing something integral to our home life when we don't have them around.

That being said, it's expensive to set up Hue lights everywhere and an Echo in every room. My setup might be one of the reasons I use it so much.

My favorite behavior was setting up a lamp near the downstairs landing with a smart bulb, and a motion sensor in the upstairs hallway/landing (through SmartThings). When someone approaches the stairs from either direction, the lamp flips on with a low brightness (in red), allowing easy navigation of the stairway.

I think the problem with your app on Alexa is the interaction model - if your app could push to my Alexa so I can open an article on my phone and then say "Alexa read this article", or use the iOS share menu and share to your app on my Alexa that'd be pretty nice. But I don't see myself saying "Alexa, tell Articulu to read ...." if that's the interaction model it'd force.

Makes sense thanks for that feedback.

Totally agreed with GP. Needing to somehow uniquely identify an article over voice assistant will be painful. However, something like Pocket, where you can push articles/pages into the record and call them back by voice or tag would be cool, especially with the ability to queue articles in a desired order.

I'm using the timer, alarm, hue integration for my lights, music and weather.

Those are by far my killer features why i use it and why i like it.

I haven't looked at any amazon skill store as i don't see any reason for it.

Playing songs/podcasts from Spotify on Echo while controlling it from my phone/computer is my favourite use of the speaker now. Ironically this action involves no voice commands whatsoever except occasionally telling the speaker to pause/resume/adjust volume when I'm away from my phone/computer.

I use the same features in Google Home but long for proper integration with Google Keep. I had used Home to populate my shopping list, but now that has disappeared from the Home app and you have to go to shoppinglist.google.com. I don't understand why I can't tell Home to add an item to a specific Keep list.

I agree with you -- it's a fad product with limited utility.

The inability of any of these solutions to identify the user in a meaningful way or interact usefully without involving the whole room kneecaps the utility of the products beyond actions that you take in a public space.

Turning on/off lights, wireless speaker and replacing the landline are long term probably the killer apps. Not trivial, but no smartphone either.

I am surprised there isn't a demo available on the site, at least there wasn't an obvious one I saw. I imagine your conversion rate would improve from that landing page if customers could hear a sample!

Yeah, I really wonder if these speakers are a sort of fad that folks feel is super cool and gravitate to ... and pretty quickly find there isn't a lot of use for.

Thank you for the plug, I was looking for something like this.

EDIT: Is there a way to buy the app, or do I really have to rent it for $60 a year?

Curious, are the articles dictated by a human or a machine?

My question as well. I imagine it's a machine. I'm not gonna download/register just so I can listen to a sample. I find robotic voices to be annoying.

Alexa needs a PageRank and an 'I'm feeling lucky' that is successful the majority of the time.

I would like it to work as a room of human experts works with a moderator.

Alexa, how long will it take me to cycle to Netto?

Alexa sends this parsed query out to all apps it thinks can answer.

They respond with a confidence level and answer / follow up question.

Alexa decides which app to choose (the 'PageRank') based on a number of factors like has the query been seen before and how did each app perform for it, the app's answer success rate, has the user been given a response from this app before etc. etc.

The user should not have to install apps. Alexa should know all of the apps in the room and what they're good/bad at and select one to respond.

Surely discovery is the wrong way to think about this problem and Alexa needs to be elevated from a good speech parser.
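To make the idea concrete, here's a toy sketch of that arbitration loop in Python. All names here are invented for illustration; this isn't how Alexa actually works internally (the closest real mechanism is the CanFulfillIntentRequest API mentioned below), and the scoring is deliberately naive:

```python
from dataclasses import dataclass

@dataclass
class Bid:
    skill: str
    confidence: float  # 0.0-1.0, self-reported by the skill
    answer: str

def arbitrate(bids, history):
    """Pick a winning skill for a query: weight each skill's
    self-reported confidence by its historical success rate."""
    def score(bid):
        past = history.get(bid.skill, {"asked": 0, "succeeded": 0})
        # Laplace-smoothed success rate so new skills aren't shut out
        success_rate = (past["succeeded"] + 1) / (past["asked"] + 2)
        return bid.confidence * success_rate
    return max(bids, key=score)

# Hypothetical responses to "how long will it take me to cycle to Netto?"
bids = [
    Bid("CyclingApp", confidence=0.9, answer="About 12 minutes"),
    Bid("GenericSearch", confidence=0.6, answer="Here's what I found..."),
]
history = {"CyclingApp": {"asked": 10, "succeeded": 9},
           "GenericSearch": {"asked": 100, "succeeded": 40}}
winner = arbitrate(bids, history)  # CyclingApp wins on track record
```

The interesting design question is in `score`: how much a skill's self-reported confidence should count versus what the platform has observed about it.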

The “canfulfill” API for skills is available. Alexa scientists are constantly working on improving the arbitration experience but it’s a very difficult problem, especially due to security concerns.


Oh I like this twist on things. The difficulty is how do you prevent your data/query going to a place you wouldn't want it to go to?

Very true, the privacy implications would need to be carefully considered.

I was talking with a friend who said, "People use Alexa like a toaster; we haven't found what this interface will be really good at."

But what if it /is/ just a toaster? Tech people tend to have high hopes for technology, but technology must serve a purpose, and it will go the way of the dodo if the purpose is not strong or clear. For me, having a voice interface for music and calls is enough justification for such a device, and I assume this is true for others as well. It's a classic case of over-engineering and then shopping around for a use case for the 25-use Swiss army knife that no one asked for.

I agree, why does it have to do a billion things?

I use it for, in this order

* Music

* Kitchen timers while cooking

* Setting simple reminders without having to go into a phone app or some crap (alexa, remind me to do X at 10am)

* Kids asking random questions (alexa, how many moons does jupiter have)

Other random stuff like weather maybe, when a sports game is on...

But this is all just pretty basic stuff you would expect from a voice interface... to me, it has all the "killer apps" built in with the above.

I agree with all of the above. Setting a timer was never as easy as it is now with Google (I don't have Alexa). I tell my 5 year old to set a timer for me and all he has to do is say "OK Google set a timer for 15 minutes"

This was "the future" a few years ago and we are living in it now :)

I don't think it's enough justification--you can get a "dumb" bluetooth- or wifi-enabled speaker to hook up with the voice-enabled smartphone you already have in your pocket. The value proposition for an Alexa or Google Home has to be over and above that functionality.

The smart speaker is always fixed in place, always listening, always ready to play and independent of anything else.

Your phone needs to be in range of the 'dumb' speaker, needs to be paired to it / connected to the same wireless network, can't be used for any other audio functions, and probably can't be in your pocket if you want it to be listening for voice commands. It now also can't be trained to your voice specifically if you have others in your home who want to use it, which makes it ripe for abuse whenever you leave the home.

I agree that the current solution is unfortunate, but most customers are seemingly blind to that or more generally demand seems inelastic to that aspect so I didn't address it.

> For me, having a voice interface for music and calls is enough justification for such a device, and I assume this is true for others as well.

I wouldn't assume anything like that.

I've written 15-20 Alexa apps, some public, most for running parts of my life / house.

A big part of the problem for a long time with developing Alexa skills is that if you needed to know where the user was, having them enable location permissions was insane. It required the user to have the Alexa app installed (which almost no one does), then navigate 3-4 fairly technical looking screens and finally flip a switch.

I recently wrote a demo app for a friend's company that needed to know where the user was, and I jumped through a bunch of hoops for an otherwise simple task to avoid the app-permissions-dance. The app I wrote was demoed to the Alexa team by my friend (his company was doing a deal for the company's data to be supplied to Amazon), and the feedback he got from the Alexa execs was they know the location permissions process was a big issue.

I think it's gotten better recently with the redesign of the Alexa app, but I haven't checked it out yet - I don't have any desire to release a public Alexa app. I just keep my stuff in developer mode and customized to my needs.

The thing I’m missing is location within the house. If I’m in the bedroom I should be able to tell my bedroom Alexa “turn on the lights”, and the same for the kitchen Alexa. Having to repeat the same incantation over and over is frustrating.

Alexa is a glorified lighting management system for me. Given its current capabilities, that’s all it’s going to be for the foreseeable future. Oh, I’ll ask it the weather sometimes, but come on - what can Alexa really do that’s all that compelling that isn’t already a capability of Siri/Google?

  "Alexa set a timer for 15 minutes." (She's a cooking timer)
  "Alexa what time is it" (She's a clock)
  "Alexa switch the tv off" (She can't switch it on)
  "Alexa, meow meow/Pika/fart" (Novelty nonsense)
  "Alexa, turn the heating up in the living room" (Nest integration)
Yeah, that's about it. The only really useful thing she does for us is music control on our Sonos system - "Alexa, play the album X by Y in the kitchen". That's quite neat.

Calling between rooms has been the main reason I got a second device (first one for tinkering). Got a young baby so being able to wake up the partner across the other side of the house without needing hands when there's been an ... explosion is useful.

Adding things to a shopping list is the other one that works nicely.

That music integration with sonos sounds really nice.

Nothing groundbreaking though.

Try customizing your daily briefing. Really nice for a curated news briefing while you’re getting ready in the morning, or making dinner.

Two timers at once. Great in the kitchen. But you’re right... Why can’t Siri do that?

A device with multiple microphones is more useful from the couch, especially when I need to yell into the kitchen. I don’t want to feel the need to keep my phone within earshot at all times.

I wish Apple had such a $30 device for Siri but it’s unlikely.

As of iOS 12 HomePod can set multiple timers too.

edit Also, I completely agree. I wish there was a HomePod Dot type offering. I've been sprinkling HomePods throughout the home, but it's expensive and also obtrusive in places where you don't want something that bulky.

Yeah, they instead have a $349 speaker with Siri. I'd rather just buy two Mycroft Mark 1's for that chunk of change.

I don't have an Alexa myself, I'm a Google Home user. One thing that's always impressed me is the ability of Alexa to order things from Amazon just by issuing voice commands. Although, I'm sure that comes with its own host of problems, especially if you're not just ordering a generic product.

Ordering via Alexa is something I would definitely never do, in any scenario.

You can order things from Amazon using Alexa but you can’t delete items from your cart using Alexa. For me this makes it useless. Not being able to delete items makes it user unfriendly. Unless Amazon changes this I’m not going to use Alexa.

Nor add to wish lists, or move to saved items, or move from saved items to cart, etc. The UI interaction model is in early days for sure.

Alexa's timeline is likely much longer and deeper than a lot of the commenters here are aware of. Specifically, the lives of disabled people have been greatly enhanced by the Echo Dot. The vision-impaired/blind community has been very receptive to voice command tech [0][1]. Though these devices are not the commercial hit that many expect them to be, any nod towards the disabled community is greatly appreciated.

Personally, I think that the challenges that disabled people face are great for all people. Thinking through all the permutations that real people with disabilities face opens up tech to new ways of interaction and design. The canonical example is the little ramp on the street corner, where the sidewalk and street meet. It was designed for wheelchairs originally, but the elderly, bicyclists, delivery men with trolleys, and all manner of people use these ramps as an improvement in their lives.

If Amazon focused on the disabled and their use cases for the Echo Dot, I think they would find many other applications that enhance all of our lives. It may not be an 'essential' thing for the fully abled, at least not consciously, but the enhancements are enjoyed by all and well worth the costs. Perhaps tech that focuses on the hearing impaired (the most common disability in the world), or the voice impaired, or amputees, may help all of us lead richer and fuller lives.

[0] https://www.youtube.com/watch?v=SDcvqfwOxOE

[1] https://www.pcmag.com/news/358338/why-amazons-alexa-is-life-...

But what about the privacy concerns of this technology, especially from Amazon? My fiancée needs voice-to-text software for accessibility reasons -- does that mean that in order to get her the best accessibility software, we have to sacrifice our privacy by placing Amazon's always-listening microphones around the house? I know she won't let me do that; even if I'm willing to compromise on my principles for her sake, she won't compromise hers.

Yes exactly! The handicap door opener is another great example, I used to use it because I carry 12' ladders at work, but it was so convenient I just always use it now.

There's a sentiment that everyone should use them, because it normalizes it. Even though it's not verboten for non-disabled people to use the door button, there's a feeling that "I'm breaking a rule" when I use it, and I think that is changing.

When I turn Alexa's volume down because of music and then ask what the weather is, the answer comes out at the same low volume and I can't hear it. Then I have to say "increase volume" and ask Alexa again.

This is small, but I think it defines the missing design principle of Alexa. Yes, I want the music to be low while I'm doing activity 1, but when I introduce a new action, like asking for the weather, I need to hear the answer. These are the little "smart" touches that a product like the one Amazon is striving for needs.

Edit: I actually say Echo because I think we should stop personifying technology.

How is the device or software supposed to deduce your wants without your expressing them? How does it know what you can and cannot hear, and that you chose to turn down the music (rather than explicitly turn it off) because it was distracting to some other activity rather than because that was the optimal setting for hearing the music while not waking a sleeping baby in your arms? And when it decides to shout the weather rather than voice it at the same level of the music which results in waking the baby, how "smart" is that?

Same thing with the Google Homes. The first step in our good-morning routine is to set the volume low, and the last step is to increase the volume again at the end.

A lot of people are mentioning discoverability as an issue. That is a core issue. Same with understanding (the idea of once you discover something exists, how easy is it to understand how to fully use it?). Beyond that, we have issues with memorability. It's hard to memorize how to properly even use these voice apps once you learn for the first time.

Here is a full rubric for whether or not a product is well designed: https://uxdesign.cc/the-design-critique-rubric-how-to-determ...

The hardest part might be that most apps for Alexa aren't privileged. You can't just ask Alexa to do the new skill. You have to first open the skill and then ask that skill to do something: "Alexa, open the Food Planner app." This hitch in the process makes memorability very poor.

I am an advanced Alexa user, and I almost never use any skills. They just aren't integrated very well. The one I do use is Sonos (with Spotify to power it), which is given privileged access and allows me to attach an Echo Dot and Sonos speakers to the same room in my house, so that when I say, "Alexa, play The Beatles," Alexa plays the Beatles on my Sonos speakers in the room it hears me in, with music from Spotify.

When you use a combo like this, it feels pretty magical, but most of Alexa can't operate like this, so it feels really stilted, and the user experience is quite poor.

* Setting a timer or an alarm

* Asking for the weather

* Play some (desired) music

Are there other things that these devices ("smart speakers") are really useful for?

A voice-activated browser or computer, on the other hand, is extremely useful for those who don't know how to type or use a browser (or a computer). It could be people who are not privileged enough to learn, people who find these very difficult to learn, or even small kids. These systems allow the user to accomplish what they want (like searching, or opening a website or a game, for example).

I'm very skeptical about this as well but an argument I saw on here that I found kinda convincing was a comparison to the usual job of a "secretary". Most interactions are verbal and there's a wide variety of tasks that can be solved that way. The real problem is trust in an automated system handling those correctly, which is solvable.

Think, for example, sorting your email inbox by "importance", not based on some flags or fixed heuristics but an actual, intelligent scan of the contents. Think of having reasonable phone conversations that result in scheduling appointments and answering concrete questions.

Though it's in the same vein as "Asking for the weather" I find a lot of use for finding out when my next bus is coming, how the traffic is, and other things related to my commute. And calling for Ubers.

But as I said, they all fall under the category of "mildly useful" and not "ground breaking"

The only other thing we really use it for is calling from room to room, as a glorified intercom.

For me the big one is easily adding items to lists (e.g., Wunderlist) without having to stop what I'm doing.

* unit conversions

* controlling smart-home stuff

* child/baby monitor

* intercom

* home telephone

The problem is the interface. Voice commands and their responses are linear, one dimensional. It’s difficult to represent complex interaction within that scope. Think of all the investment that has gone into telephone based automated customer support. The best interface conceived so far is the dreaded phone tree. That’s essentially the same interface smart speakers are exposing.

The opportunity is to figure out how to better utilize the voice based medium. No one has done it yet. When they do, it will also likely improve the experience around screen readers and accessibility.

Well, imagine a goal-based system. The first utterance to a voice assistant provides some inputs and a goal, like booking an Uber, but if you haven't provided all the inputs it needs to meet the goal, the assistant knows how to ask for more information, and also saves enough state to let you modify (or cancel) requests after the fact. This is arguably more advanced than a phone tree; you're not just giving keywords to advance along the branches.
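What's described here is essentially slot filling. A minimal sketch, with every name invented for illustration:

```python
# Toy slot-filling dialog: the first utterance supplies a goal plus
# some inputs; the assistant asks follow-ups for whatever is still
# missing, and keeps state so the request can be modified or cancelled.
REQUIRED_SLOTS = {"book_ride": ["pickup", "destination"]}

def next_prompt(state):
    """Return the next follow-up question, or None when ready."""
    for slot in REQUIRED_SLOTS[state["goal"]]:
        if slot not in state["slots"]:
            return f"Where is your {slot}?"
    return None  # all slots filled; the goal can be executed

# "Book me a ride to the airport" -> goal plus one slot
state = {"goal": "book_ride", "slots": {"destination": "the airport"}}
prompt = next_prompt(state)        # assistant asks for the pickup
state["slots"]["pickup"] = "home"  # user answers the follow-up
ready = next_prompt(state) is None # now the booking can proceed
```

Unlike a phone tree, the user can supply slots in any order or several at once, and because `state` persists, "actually, change the destination" after the fact is just a slot update rather than starting over.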

I don't find it appealing to use voice as an interface. I'd much rather have a button to press.

I think the new feature in iOS where it guesses what I want to do (send a message to ABC, for example) based on previous patterns is promising. A whole screen of these actions would be great.

This is right. No one has yet written “The Design of Everyday Things” for voice interaction. We just don’t know what works yet, so we are redoing what was done before.

I think you're right, but on the other hand, a skill is faster to create than a complex 2D UI.

Sure, I won't use a conversational interface to build the next Photoshop.

I could tell it stuff like what I ate, and it would calculate my kcals or macros and tell me how much I have left to eat that day, or what else I should eat to hit my macros, etc.

I could tell it what I bought and it would categorize the bills.

Some things are just too bothersome to do with my hands.

The medium of voice is very rich and capable when talking to a human, so this is something limited by the intelligence of the system you're interacting with.

Is it? I mean, some folks do great talking. Lots of them. Most are in narrative communication. If you get much beyond that, you jump to physical interaction quickly. Even explaining benefits from gestures. Consider how vague most spoken directions are; when you can augment them with pointing, things are easier.

Maybe the problem is with the entire concept of distinct, discrete apps. Voice as an “interface” is based around conversation, not functional blocks and clickable items. A truly enjoyable and useful Alexa would have a conversation with you - even if a practical and transactional one - and the various functions and “apps” would be woven seamlessly into that, just as a human assistant might learn a new skill and work it helpfully into the conversation.

A lot of people don't even realize the Alexa mobile app exists, despite the fact that they must have installed it at one point in order to set up their Echo device. Hence, discoverability is pretty low. I don't know why Amazon still hasn't attempted to address this issue. If all they want is for people to buy stuff from Amazon through it, why bother with all these custom "skills"?

Generally, however, I think we've overestimated the utility of a voice command line with no actual intelligence behind it. Unless you are using Alexa to control actual devices like lights or a Roomba vacuum, most of the apps for it are either useless or better suited to a visual interface.

I think it could be incredibly useful if it wasn’t like talking to someone who is hard of hearing, comically misinterprets similarly sounding words and can’t understand basic logic in sentences (what’s the weather _and_ traffic, then find my phone and turn off the lights).

I’d like to ask it the meaning of Latin phrases or other words, but 99% of the time it just hears similar English words, despite the Latin being clearly pronounced differently.

Ask it which bus to take and when I should leave to go somewhere? It doesn’t have any transportation info and can’t do basic logic.

Duolingo would be a great integration for conversation practice while learning another language, but it can barely understand English.

I think voice is underutilized as a third, powerful complementary interface, mouse and keyboard being the first and second. A smart speaker doesn't enable me to be more productive on the tool I use most: my computer.

I'm building something that I think is much more powerful than smart speakers for the following reasons.

1. It's software and lives on the hardware that you already have... your computer. It's not a standalone hardware device siloed from the existing tooling you use daily.

2. It provides visual feedback (live transcription, with colors indicating what's understood), because it can. Smart speakers can't.

3. It's a browser extension, so it easily integrates with all the webapps that you use daily: gmail, google docs, hacker news... any webapp you want it to.

4. Plugins are intuitive to develop and open-source so anyone can build off and improve them (https://github.com/lipsurf/plugins)

It's a work in progress, but in case anyone is interested: https://chrome.google.com/webstore/detail/lipsurf-voice-cont...

I've spent a lot of time thinking about monetization of voice-assist apps. This is a challenging problem, and I suspect it's holding back AAA-quality third-party support for these platforms. No standalone voice-assist app (independent of a mobile/web app) has brought in significant revenue. I'm going to break down some of the most common attempts at revenue and describe their pitfalls.

* Sell audio (Spotify, Audible, MP3 style): If the goal is to get into this business, then you'll have much better reach selling on smartphones. The voice-assist app will be supplementary at best.

* Sell products (Amazon/Ebay style): People aren't generally comfortable buying products with voice-assist. You'll have to compete directly with Amazon.

* Market products (In-line advertising style): You'll have to generate a lot of original content to plug sponsored products. The original content will have to be very compelling. It's probably more appealing to host the original content somewhere else primarily.

* Sell audio advertisements: What makes the voice-assist platform compelling for content consumption? How can you create a very engaging voice-assist app? Will people tolerate audio advertisements? They tend to consume more time and be less interesting than visual ads.

I find it quicker to enter commands on my phone for most things. One skill I thought would be really useful was "Alexa, how's my commute?", but without a way to go in and 'star' specific trains, it would tell me about trains leaving in the next few minutes, which isn't helpful!

I also often can't remember what I'm supposed to say, and I sometimes forget my words as if I've been put on the spot somehow. Overall, it's not a pleasant user experience.

Disclaimer: I haven't played with this since 2016, so may be out of date.

Having played with both Alexa and Google Home, my take on why there are no hits is that it's very clunky to invoke apps, especially for one-off queries.

"Hey, Alexa/Google, Ask <keyword> to do <blah>" or some variation on that. Starting an app and then interacting with it works fine for some use cases but not all.

It really needs to be a flow where you install an app for a given type of task, say searching for a hotel, and then, like Android, there are defaults. So I say "hey alexa, find me a hotel room in blah" and it just knows to hand off to whatever my favourite is.

So long as all the core easy commands are only accessible to Google/Amazon, this is going to continue to be a problem.
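That Android-style "defaults" idea amounts to a small routing table between task types and user-preferred skills. A minimal sketch of what that might look like (the task names and skill names here are all made up for illustration):

```python
# Hypothetical user preferences, analogous to Android's default apps:
# each task type maps to the skill the user wants to handle it.
DEFAULT_SKILLS = {
    "hotel_search": "MyFavouriteHotelSkill",
    "music": "Spotify",
}

def route(task_type, installed_skills):
    """Return the skill to hand a task to: the user's default if it's
    installed, otherwise any installed skill, otherwise None."""
    preferred = DEFAULT_SKILLS.get(task_type)
    if preferred in installed_skills:
        return preferred
    return next(iter(installed_skills), None)
```

With something like this, "find me a hotel room in blah" resolves against a `hotel_search` task type without the user ever naming a skill.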

And it hasn't really changed since 2016 so that is still mostly spot on.

I think the real problem though is that people aren't good at remembering voice commands, either how to invoke the command or that the command even exists. Without visual cues it's easy to forget everything.

You're not going to have a runaway hit when it's predicated on everyone remembering a special thing to say. It's like trying to build a runaway hit around a keyboard shortcut.

All of those services are great for the basics, e.g. How's the weather / set a timer / remind me in N minutes / order toilet paper ... Everything else is a bit tougher in my opinion.

Agreed, there needs to be a paradigm shift in registering intents. I should not create a skill that the user needs to remember a command for. Instead, my skill should be able to hook into intents and return structured data. For example:

"Me: Alexa, find me a hotel near the beach in Destin, Florida that is available next week"

"Alexa: Hotels.com has a three star hotel available for $987 for six nights, and Airbnb has a four bedroom condo available for the same dates. Would you like to book one of these, or hear more options?"

"Me: Book the first hotel."

...and so on...

In this instance, Amazon has to define the schema for user intention, and hotels.com and Airbnb must provide interfaces to that intent, responding with normalized/standardized data that Alexa can then reconfigure into an appropriate response. Amazon has to throw their weight around to get other companies to play the game, and it's on them to organize the returned data into an appropriate response. This is probably where a discussion about the semantic web would be useful :)
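To make the shape of that concrete, here is a toy version of a platform-defined intent plus normalized provider responses. Everything here is invented for illustration (the intent name, the field names, the two handler functions, the prices): the point is only that providers return structured data against a shared schema and the assistant merges it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# The platform (Amazon, in this scenario) defines the intent schema...
@dataclass
class FindLodgingIntent:
    location: str
    nights: int

# ...and the normalized result shape every provider must return.
@dataclass
class LodgingOffer:
    provider: str
    description: str
    total_price_usd: float
    stars: Optional[int] = None

# Providers "hook into" the intent by registering a handler that
# returns structured data instead of free-form speech.
def hotels_com_handler(intent: FindLodgingIntent) -> List[LodgingOffer]:
    return [LodgingOffer("Hotels.com", "three star hotel", 987.0, stars=3)]

def airbnb_handler(intent: FindLodgingIntent) -> List[LodgingOffer]:
    return [LodgingOffer("Airbnb", "four bedroom condo", 987.0)]

HANDLERS: List[Callable[[FindLodgingIntent], List[LodgingOffer]]] = [
    hotels_com_handler,
    airbnb_handler,
]

def resolve(intent: FindLodgingIntent) -> List[LodgingOffer]:
    """Fan the intent out to all registered providers and merge the
    normalized offers; the assistant then phrases them as one response."""
    offers = [offer for handler in HANDLERS for offer in handler(intent)]
    return sorted(offers, key=lambda o: o.total_price_usd)
```

The hard part isn't the code, of course; it's getting Hotels.com and Airbnb to agree on the schema in the first place.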

The problem is that the trigger for 3rd party skills sucks.

Saying "Alexa, ask [Whatever skill name] to do something" is really unnatural and ruins the whole user experience.

They need sort out the grammar to make it actually feel conversational.

Yep. It's a pain to have to say "Alexa, ask Chevrolet to start my car". Why can't I just say "Alexa, start my car"? Clearly it's the only car-related skill...

The problem seems to be there is no barrier to entry. Say what you like about Apple's app store, at least they have standards that are enforced, and to a certain extent Google as well. Windows, and the Microsoft app store, on the other hand, suffers the same problem of quality control and lack of curation as Alexa skills. There are some gems, and Amazon have put effort into making Alexa and the Echo devices useful out of the box with their default skills/commands. I've also never had issues with the smart home skills for various devices I have around the house. However, it's a telling statistic that there are over 50 'fart' skills, [0] half of which are rated at three stars or above.

And of course, although stricter curation and quality control would fix this, it would cut the number of skills down hugely, maybe even to 1% of the current size, so the app store looks tiny and people don't bother with the platform. See also Metcalfe's law [1] and network effects...

0. https://www.amazon.com/s?k=alexa+skill+fart&crid=2KFJ9DO3N51...

1. https://en.wikipedia.org/wiki/Metcalfe%27s_law

Amazon constantly runs “hackathons” with thousands of dollars in prizes for the winners and free Echos just for playing. When I did one, there were probably a hundred projects along the lines of “hi Alexa my name is John” / “hi John I like you” (or similar effort). And all of them got into the store, and all of them got free Echos.

I couldn't name a "runaway hit/killer app", but my gf and I get plenty of use from our Echo(s). Nothing we couldn't or didn't do prior to getting them, but we genuinely find the voice interface often wins out over reaching for our phones. We use it:

• as an alarm clock

• as a radio in the morning, podcast/audiobook player in the evening

• arithmetic and conversions - e.g. when planning trips we might ask "Alexa, what's 100 Malaysian Ringgit in pounds sterling?", which is more convenient than flipping to xe.com or a phone app, and gives an answer we both hear. When cooking we might ask for weight conversions, during DIY we might ask for cm to inches, etc

• to control Hue lighting in two rooms

• to control our TV/amp/satellite ("Alexa, pause|mute the TV/set TV to channel 503/etc"). I used to love my Harmony universal remote but found it pretty cumbersome to manage, and my gf could _never_ get the hang of it like she can "Alexa, turn on/off Sky"

• to play music when I'm wfh

Rather than a hit skill/application, I'd say voice is a superior interface when doing other, dextrous stuff - getting dressed, using the hair dryer, feeding the cat, cooking, washing up, gardening, DIY, etc. Asking for the lights to turn on (the switches are not near the front door) when arriving home in the dark, saying "Alexa, goodnight" rather than turning off 4 lights and 3 devices, etc.

The only thing I dislike about it is how it fires up when I'm watching WWE and the commentators use Alexa Bliss's name.

I think the whole voice-activation hype inflated itself: people naturally want to see the next big thing, and of course they used this as an example. The title says it all.

Furthermore, we need to look at what makes voice less restrictive, and that's making it instant. These devices are still reliant on being cloud/web based, which creates a lag and makes the interaction awkward and nothing like 'talking to something or someone'.

It does. Skyrim for Alexa. https://youtu.be/BQl3T0uD5Aw

I followed and even used the tool StoryLine (which is now the enterprise tool https://www.invocable.com/). They had a very active Facebook group that I watched for a while. The problem I found was that the tool made it too easy to build apps for Alexa. People would put out really junky apps that were the equivalent of the fart apps of long ago. Now, some may argue that all app platforms need to go through this stage, but I'd argue Alexa is different. The iPhone, pre-App Store, provided other utility that was the driving force behind buying the phone. Alexa is seemingly reliant on 3rd-party apps (with a few very minor exceptions, e.g. timers, weather), and therefore if there is no "killer app", what is the motivation to buy one?

"Alexa, set timer for 14 minutes." Great for keeping my iPhone clean while I'm cooking. Otherwise I'd have to try and convince Siri to set a timer...

(But yeah, i agree - definitely nothing very innovative about it. I don't really think voice control is ever going to be the next big thing)

This is looking at Alexa, most likely in smart speaker or some variant form, as the endgame which I don't think it is.

My guess is that it and its competitors are more or less MVPs: a way to use audio for input and output and to understand how users interact with this "new" form of input. In a similar way to how Google Search, and the various add-ins implemented for it along the way, informs Google Assistant, I think these are stepping stones for something else.

So I don't think the expectation is to have a runaway hit app, but rather to have a product that users will actively engage with. And considering they're making huge pushes into smart home product integration with Alexa, I really don't see how any of it could be viewed as cautionary... from Amazon's viewpoint anyway.

For nearly two years after my family and I got Amazon Echo speakers, the Android app was practically just a browser wrapper that frequently loaded slowly or wouldn't load at all (this issue is now mostly addressed with a redesigned native-ish Android app, but I'm afraid it may be too late). Looking for new apps was an excruciating experience, and navigating menus inside the app was bad. I gave up soon after and have since only used the app to reconfigure network settings for the speaker.

I only use the speaker for casting music/podcasts, listening to news briefings, and timer/alarm/weather queries. Amazon Echo and Google Home do these basic functions well but I'd imagine they weren't the only actions these companies wanted us as consumers to do.

It's a utility device, not an entertainment device.

A good toolbox is filled with balanced tools. That there isn't some hyper-addictive time-suck application isn't a bad thing.

However, if it's a case of no one actually using the device, then there might be cause for concern.

The most frequent use cases for us, in order of usefulness:

1. Spotify

2. TV control (with Harmony) - great for the kids so they don't break the remotes

3. Home monitoring when we are away (drop in on the echo spot).

But for me, home security seems to be an unexplored opportunity for Amazon (and Google). Those microphones are very sensitive, and if paired with motion sensing on other devices (a Nest or something) they start to give you powerful intrusion-detection and alerting tools. How hard would it be for them to recognise the sound of breaking glass or the presence of someone in the room at a time they shouldn't be there?

Obviously solving the pet issue is a challenge, but I think it's an interesting use case to explore.
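A production glass-break detector would classify spectral features with a trained model, but the basic alerting idea above can be illustrated with a naive energy-spike detector over raw audio frames. This is a toy sketch: the frame size, the 4x threshold, and the synthetic samples are all arbitrary choices, not how any real device works.

```python
import math

def frame_rms(samples, frame_size=1024):
    """Yield the RMS energy of each consecutive fixed-size frame."""
    for i in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[i:i + frame_size]
        yield math.sqrt(sum(s * s for s in frame) / frame_size)

def detect_transients(samples, frame_size=1024, threshold_ratio=4.0):
    """Return indices of frames whose energy spikes well above the
    average background level -- a crude stand-in for 'breaking glass'."""
    energies = list(frame_rms(samples, frame_size))
    if not energies:
        return []
    background = sum(energies) / len(energies)
    return [i for i, e in enumerate(energies)
            if e > threshold_ratio * background]
```

Nine quiet frames followed by one loud burst would flag only the burst frame; telling a burglar from a cat knocking a glass off the counter is, as noted, the hard part.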

Voice is mostly inconvenient and slow unless you have certain disabilities, at times when your hands are otherwise occupied (driving, doing the dishes, maybe when out-and-about via Siri or Google Assistant) and for the basic in-home stuff that people have already discovered (music, home automation, news, weather, timers, random fact from wikipedia).

At this stage I'd be very surprised if there was some new amazing use case. There are always going to be niche uses that some people really like. I create notes in Drafts on iOS via Siri all the time, for instance.

> "The advent of the smartphone triggered an app gold rush. So far that hasn’t happened with Alexa."

While gold rush is an accurate analogy, I'm not so sure that's a positive. I remember a couple+ years ago the NYT doing an article / series on (native) app developers and how - contrary to the hype - there were significantly more "losers" than winners.

So has Alexa made it worse? Now there are no winners at all? (Editorial: Well, sans Amazon and its shareholder.)

We use it to "set a timer for 10 minutes" when cooking something and "what's the weather for tomorrow" to know whether to bring a coat or not.

Either of which could be done just as quickly by setting a timer on your oven/microwave or pulling out your phone.

Using Alexa for tasks like this is a qualitatively different experience. Hands-free/voice control can be very useful in that your hands can be busy, and you don't need to find or walk to and manipulate a device or remote.

If you're cooking, the timer on your oven should literally be right in front of your face.

Every use case I hear parroted to justify the installation of audio surveillance devices in our homes sounds like a solution desperately in search of a problem. What do you plan to do with the milliseconds of time you may have saved?

Sure, but I can ask Alexa what the weather is like while actively doing something else, like say before brushing my teeth or while in the shower. Or while in my kids closet, picking out clothes for the next day.

And while I can set a single timer on my microwave/oven, or set them on my phone, being able to set a timer, hands free, while cooking is pretty damn convenient. I can set a timer while my hands are dirty from mixing meat for meatloaf or handling chicken.

There's something to be said about the convenience factor, and that's what most technological innovations are about, making life easier or more convenient.

It may not be a runaway hit, but if you have kids and an Echo device, you should ask Alexa to "Open the Magic Door". It's essentially a text adventure with fun sound effects and good storytelling.

I can imagine a future where Magic Door or something like it becomes a big hit. BTW, there's a web site for the game: https://www.themagicdoor.org/

I disagree...the killer app is a smart speaker (i.e. Amazon Music + voice control). That in and of itself is pretty useful, especially when you can get a reasonably good one for under a hundred bucks now.

Outside of that, Alexa isn't the platform the industry thinks it is but rather the equivalent of the touch screen on a smart phone. It's another form of input.

There needs to be much better and more seamless integration with 3rd party apps for it to really be useful.

You're right; I would say (Music + Voice Control), since you can configure other services (Spotify, for instance). There are a lot of negative comments here; I wonder if that's due to the population of commenters on Hacker News, who tend to be younger.

If you have a child, it's nearly a nanny that helps your child with his/her homework. Plus, you can connect to several Alexas (at friends' homes) and interact with them, and if you have several Alexas yourself, you can communicate between rooms (speak/broadcast).

Alexa is useful to visually impaired people and seniors:

  - landline dial by name
  - radio station by name
  - TV channel by name
  - play/resume audiobook by name
  - microwave for duration 
  - device power on/off 
  - time, weather, etc
It takes some setup to get all devices integrated, but once it all works, it's borderline magic to a non-technical user.

I think we still don't have a good model for how voice UI needs to work. The equivalent of GUI for mouse-and-keyboard machines or tap-and-swipe interfaces for mobile.

If we did have a UI model, I imagine we'd be controlling car stereos and driver-mode mobile phones with voice already.

I wonder if any of these 80k apps have the "magic sauce" but just haven't applied it to the right problem.

Controlling a mobile phone while driving while using your voice is definitely a thing.

And I suspect if you have your phone hooked up to your car stereo, or integrated using Google or Apple's in-car docks, then using your voice to play music is also possible.

I think your point about not having a good model is correct, though, for anything other than basic uses (phoning, messaging, playing music, navigation, weather, etc)

I think the proper solution is to go old-school.

That is, provide a traditional desktop PC experience that is augmented and improved by Alexa. It should always be more productive to ask an AI than to Google, if this is to have a chance of success.

The problem was jumping immediately to a new modality instead of bridging toward it from the present.

Android assistant has the right idea.

I bought an echo on an Amazon sale to explore making "skills" for it.

But (at least then) there was no real way to monetize anything using it. Making an interesting skill that people use will just cost you money.

BigCos or VCCos can use skills for generating good will I guess. But independent developers, not so much.

Linking my Apple Music to Alexa was huge for me. It’s nice to be able to play my playlists or pick an album.

Shame that we can't do the same in Canada

As a techie, I have barely used it: weather, NPR, WSJ, that's about it. Maybe a timer sometimes. I guess dialogue as an interface is a little ahead of its time.

While I feel Alexa has a lot of potential from the couple POC apps I've developed, the device class itself simply isn't where I need it to be to fully capitalize on it yet.

Alexa was pretty mediocre. I use my Google Home to play Netflix on my TV (including rewind), play music, turn off my TV, and preheat my Tesla.

If you have an Alexa you really need to try playing Jeopardy on it. The Jeopardy Alexa skill and Akinator skill are fairly quick casual games that are very worthwhile.

I don't know anyone that has one of these assistants in their home. If you have one, do you like it and why? What are the main benefits to having one?

How is the discovery of Alexa apps? Other platforms often had hits because they had some viral component that helped the apps spread.

Skills targeted at children are the worst. I am not a child! Why should I have to jump through hoops because other people have children?

Amazon’s Alexa Has 80k Apps...

Perhaps, but by eyeball, no less than 95% of those are stupid useless trivia games.

“Computer Play ____” from Spotify is great. Everything else feels like tedious novelty.

Today we use several Echos only as time/voice-activated clocks and Spotify players. They will be thrown out if we move, and replaced with non-cloud WLAN speakers.

The runaway hit is probably Amazon buying. Miss that so much on my Google Home.
