
I ask 100 information questions to four digital assistants - forgot-my-pw
https://vlad.d2dx.com/the-great-assistant-skills-comparison-google-alexa-siri-and-cortana-head-off/
======
dahart
What's interesting to me reading this is the expectation that voice search
should do things that text search currently doesn't do. If I ask Google about
Bill Murray running for president, I presumably am looking for articles that
mention the words "Bill", "Murray", and "President". I would expect it to
return articles that best match the query, and I would never expect Google to be able
to tell me the difference between real and fake articles. For a server to
answer the author's question it has to understand the question so well it can
change the question into "What is the list of people that have run for U.S.
president?" and then check if Bill Murray is in that list. That's a _tall_
order.

We are moving past expecting the most word matches and into expecting the
server to understand what we really want. Voice search is more like the "I'm
Feeling Lucky" button: because it takes longer, you only have time for one
answer, and the first answer has to be right. But it comes without the
expectation that you're merely _lucky_ if the answer happens to be right; now
we _need_ the first result to be the best result there is.

So the glass is half empty. I personally prefer to see it as half full, but
the critique is also true, and it's more valuable and interesting than
optimism.

~~~
acdha
It seems like there's something akin to the uncanny valley effect going on
here, where a voice UI invites people to think of the other end of the
conversation as a person, and then disappoints them when they hit the edges of
what it's designed to do.

There's a really interesting discussion to be had about how UI decisions can
make that process smoother — I really liked
[https://bigmedium.com/speaking/design-in-the-era-of-the-algorithm.html](https://bigmedium.com/speaking/design-in-the-era-of-the-algorithm.html)
as a call for making the failure modes of the system more graceful. I think a
lot of the success in the next decade or so is going to come from the places
that figure out how to avoid building a system that seems to promise more than
it can deliver.

~~~
redler
The fact that we interact by voice leads toward a sort of inadvertent theory-
of-mind about the other party, which makes the pulling away of the curtain
with so many of the answers much more jarring. Voice interaction seems to
recruit a much deeper evolutionary expectation than the much more recent
phenomenon of typing and reading.

~~~
Retra
That's some heavy speculation.

~~~
redler
Well, this is Hacker News.

------
GCA10
As much fun as this test is, it dodges the most interesting question of all:
"Are these machines supposed to be talking search engines?"

I increasingly believe that the answer is: "No." These machines
(especially Alexa) are rapidly gaining popularity while still providing pretty
ragged answers to search queries. So we should start asking: "Are they taking
on a different function that didn't match our early expectations?"

In a word, yeah. Alexa is a really nifty jukebox for those of us that don't
have the good sense to create formal playlists. It's a handy kitchen timer,
especially if you've got multiple pots doing different things. It's a better
alarm clock and a better purveyor of soothing bedtime sounds. (If you're
asking: Good god, how many people really want or need that, think: Fussing
infants.)

Smartphones already provide pretty excellent search results on the fly. I'm
not sure voice-powered assistants will re-solve that problem with great
success. But there are a surprising number of rudimentary needs around the
house for which a voice-enabled device becomes quite handy.

~~~
stephengillie
These digital assistants are just begging for an app store. Search is just the
first app; jokes and weather are other useful apps. These could easily follow
a product life cycle similar to that of smartphones.

~~~
rrdharan
They have app stores:
[https://www.amazon.com/b?node=13727921011](https://www.amazon.com/b?node=13727921011)

~~~
coryfklein
* One has an app store

------
IanCal
I tried the conversational weather one on google assistant on my phone.

"Should I take an umbrella tomorrow?"

"No..." and shows me tomorrow's forecast.

"What about the day after?"

"No..." and shows a forecast that, when I look at it more closely, I notice is
for _today_.

Neither of these spotted that I'm heading to another city tomorrow, which is
in my calendar. If I change it to "do I need to take an umbrella for my trip
tomorrow?" it just searches Google and gives me a search result suggesting I
take a small folding umbrella... for a trip to Thailand.

~~~
visarga
It might be so, but remember that experts were predicting it would take
another 10 years for computers to beat top humans at Go. Maybe next month
there will be a breakthrough
with NLP. The amount of research and compute going into this problem is
amazing, and we don't know what's around the corner.

Also, it might be possible to create a much better assistant today, but it
would be too expensive to offer to the public for free. What if it requires
100 TPUs to run?

~~~
opportune
This isn't an NLP problem, it's a coding problem. These solutions already
exist; Google and the other assistant providers just need to dedicate the
man-hours to make it happen.

~~~
nl
No, they don't (generally). I do NLP for work, and the edge cases are so
frequent, and they aggregate across processing pipelines.

That means the end-to-end performance of the system is bad, and it isn't clear
how to systematically fix it.

In this particular case there is the added system integration too, which is
"just programming".

~~~
IanCal
Perhaps, but these just feel like such common things to ask and do that I
don't get why they're not planned for. The conversational side works for the
weather; the remaining problems are:

1. There is a built-in assumption that I am always where I currently am. This
part has nothing to do with the NLP.

2. "the day after" is translated to "today", possibly. [edit: see below, it
is in only one case]

3. This is more of an NLP one: it understands that the context of "weather"
carries from one question to the next, but not the timing. So asking for the
weather tomorrow and then "one day later" gets tomorrow's as well. [edit: more
complex than this; it's actually working in some areas and not in others]

I'd like to see what user stories it's trying to solve, because apart from
setting timers and alarms it's been massively hit and miss for me.

I tried to repeat what I'd put in and this time I had:

"What's the weather the day after?" - translated to tomorrow; with no context,
that makes sense.

"What's the weather today?" - weather today, followed by "What about the day
after?", which gave me results about the film.

"What's the weather tomorrow?" - weather tomorrow, followed by "What about
the day after?", which then worked.

So it works just fine for "weather" but not for asking if I need to take an
umbrella. And it doesn't work if I ask for today then the day after, but does
for tomorrow and the day after.

Why does the context get passed on for the day correctly for "tomorrow" but
not for "today"? Why can it get that "the day after" means tomorrow, unless
I've asked about an umbrella in which case it means today? At the core of my
question is _how is it this inconsistent_?

What tests have they got around this?
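
To make the inconsistency concrete, here is a toy relative-date resolver and
the kind of tests one might expect around it. This is purely illustrative:
every name is invented, and none of it reflects how Assistant actually works.

```python
from datetime import date, timedelta

def resolve_day(phrase, today, context_day=None):
    """Resolve a spoken day reference to a concrete date.

    context_day is the date the previous question in the
    conversation referred to, if any.
    """
    phrase = phrase.lower()
    if phrase == "today":
        return today
    if phrase == "tomorrow":
        return today + timedelta(days=1)
    if phrase == "the day after":
        # Relative phrases should build on the conversation's
        # context, falling back to today when there is none.
        base = context_day if context_day is not None else today
        return base + timedelta(days=1)
    raise ValueError(f"unrecognized phrase: {phrase}")

today = date(2017, 3, 1)
# No context: "the day after" means tomorrow.
assert resolve_day("the day after", today) == date(2017, 3, 2)
# Context "today": "the day after" should mean tomorrow, not today.
assert resolve_day("the day after", today, context_day=today) == date(2017, 3, 2)
# Context "tomorrow": "the day after" should mean two days out.
assert resolve_day("the day after", today,
                   context_day=today + timedelta(days=1)) == date(2017, 3, 3)
```

The point of the sketch is that the "today" and "tomorrow" context cases go
through the exact same code path, which is what makes the observed asymmetry
so puzzling.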

------
greggman
I found it curious that the author didn't like that Google was quoting
webpages. I actually liked that Google was not taking credit for the info and
was indicating where the info came from, so I can decide whether or not to
trust it.

Of course if I ask for a fact like "How many inches in a meter" I just want
the answer. But, if I ask "what's the weather going to be tomorrow" I might
prefer an answer like "badweather.com says it's going to rain tomorrow" so I
can then think (ugh, badweather.com is always wrong) and ask "What does
goodweather.com say about tomorrow's weather". Ideally I could ask the
assistant to use a particular site by default. This is especially true for me
because Siri's default doesn't seem very accurate, given that I live on the
other side of the world from the offices of the company they use for weather
info.

~~~
ancalimon
Heya,

I'm sorry for the confusion. My issue with quoting webpages is not that it
quotes them -- that's fine -- but that it does so in a very verbose manner.
This leads to information overload. For example:

Me: What is the boiling point of water at an altitude of 1km?

Google: At sea level, water boils at 212 °F. With each 500-feet increase in
elevation, the boiling point of water is lowered by just under 1 °F. At 7,500
feet, for example, water boils at about 198 °F. Because water boils at a lower
temperature at higher elevations, foods that are prepared by boiling or
simmering will cook at

The problem is with the vast quantity of information, and the fact that some
is both irrelevant and truncated. The last sentence is incomplete and cut off,
yet as a listener I have no way of knowing this. I will thus try to remember
it, at the expense of the facts that came previously.

When reading from a webpage, the important thing is to read only the specific
parts of interest, and not overload the user. If it can't do that, it risks
providing irrelevant or, quite frankly, confusing data (such as the odd answer
to how much a Dreamliner weighs). I don't know if that's better than not
providing an answer at all.

------
kelchm
One of the most 'magical' experiences I've ever had using Google Assistant was
the following exchange:

Me: "Okay Google, what's the latest album by Death Cab for Cutie?"

Google: "The latest album by Death Cab for Cutie is Kintsugi."

Me: "Okay Google, play the album Kintsugi on Spotify."

Google: "Okay, asking to play Kintsugi." [album starts playing]

~~~
sib
Wouldn't it have been a lot more magical if it had simply asked, "Would you
like me to play it?" (knowing that you have a Spotify subscription) at the end
of answering your question?

Or, at least, for you to be able to say "Play it!" rather than the unnatural
"Okay Google, play the album Kintsugi on Spotify"...

~~~
tomc1985
I would have preferred a straight answer, without a servile question at the
end.

------
elicash
I got 22 out of the "40 verbose Assistant questions" correct. Not bad! I beat
them all (as a percentage).

Maybe not a bad idea for a gameshow.

~~~
chris_overseas
That's similar to how The Chase[1] works, except contestants go head to head
with a professional quiz master instead of a digital assistant.

[1]
[https://en.wikipedia.org/wiki/The_Chase_(UK_game_show)](https://en.wikipedia.org/wiki/The_Chase_\(UK_game_show\))

------
codekilla
More people are starting to appreciate that AGI-ish stuff is actually really,
really hard.

~~~
ghaff
The ability to function as a virtual assistant, even at the level of a not-so-
sharp intern [1], would be a killer app.

Give it some parameters for a trip you're taking. It comes back with some
options and follow-up questions. We are a long way from that point. Even that
not-so-sharp intern has a huge amount of internalized knowledge about general
preferences, cities, airports, etc. and probably knows questions to ask to
narrow things down.

I strongly suspect there are other domains where a lot of people are assuming
we're 90% there and we're not.

[1] Not to insult interns or any other group. I just mean you don't need to be
at experienced executive assistant level to be really useful.

------
notadoc
I find Google Assistant to be very good at answering most questions.

Also, I can accurately get the weather from Siri most of the time.

------
51Cards
I notice some different answers on my devices. For example on the "Are
tomatoes vegetables?" question my Google Home states that they are definitely
a fruit. (quoting Oxford Dictionary)

Edit: And "What's the height in meters of the Empire State Building?" gets me
"381 meters, 443 meters to tip"

------
nmstoker
I'm surprised by a number of the failures, to the point that I wondered
whether the failure might be happening on the speech recognition side rather
than the response generation side.

Both Google Home and Alexa have very little problem recognising my speech,
whilst friends often struggle even when they seem to say precisely the same
phrase, to the point that it's mildly entertaining. With Google I suspect
they've tailored it to my voice (I've used voice commands extensively for
several years), but I've only had a Dot briefly and it worked well from the
start. Another surprise is that they cope well with my peculiarly English
English phrasing and pronunciation, but I'm sure there are lots of less widely
spoken dialects that would throw them.

~~~
ghaff
Alexa is the first thing I've owned that really does quite a solid job in the
voice recognition department (whatever its failings to return something useful
based on that recognition). Siri on my phone is rather hit or miss by
contrast.

I suspect that the microphone array has a lot to do with it. Anecdotally, I've
read pieces by people saying that homebrew "Echos" together with the Alexa
APIs aren't as good as an actual Alexa.

------
rojobuffalo
I keep coming back to the idea that progress towards AGI might be made by
someone working on a "coordinator" agent. We might have several narrowly
focused agents with deep knowledge in particular domains: a mathematician, a
fact-checker, a botanist, a structural engineer, etc.; then have an agent that
broadly understands how to route requests to the right vertical. Maybe that's
already descriptive of the underlying architecture for some of these agents.
The alternative might be that we interface with several different
conversational agents, and like interfacing with people, we use our judgement
to decide which specialist to ask.
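
A minimal sketch of the coordinator idea, with keyword overlap standing in
for a real intent classifier. Every specialist name, keyword set, and answer
string here is made up for illustration:

```python
# Toy "coordinator" agent: score each narrow specialist by keyword
# overlap with the query, then dispatch to the best match. A real
# system would use a learned intent classifier, not keyword sets.
SPECIALISTS = {
    "mathematician": ({"sum", "integral", "prime"}, lambda q: "math answer"),
    "botanist": ({"plant", "fruit", "tomato"}, lambda q: "botany answer"),
    "engineer": ({"beam", "load", "truss"}, lambda q: "engineering answer"),
}

def route(query):
    words = set(query.lower().split())
    scores = {name: len(keywords & words)
              for name, (keywords, _) in SPECIALISTS.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        # No vertical claims the query; fall back to general search.
        return "No specialist claims this query."
    return SPECIALISTS[best][1](query)

print(route("is a tomato a fruit"))        # routed to the botanist
print(route("what is the sum of primes"))  # routed to the mathematician
```

The interesting design question is the fallback branch: a coordinator needs a
confident way to say "none of my specialists should answer this", which is
exactly the failure mode the assistants in the article handle badly.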

~~~
Bjartr
That's kind of what Watson did, but that level of architecture hasn't made it
into personal assistants yet.

------
majani
I'm curious about how accurate the assistants were at listening. That used to
be the most pressing issue with voice commands that had relegated the
technology to a running joke. It appears that's understandably what the
companies have been focusing on so far, but there's still work to be done to
get to 100% listening accuracy, especially when you take into account exotic
names and switching between languages and slang.

------
rojobuffalo
Might be interesting to test with Wolfram Alpha as well. It looks like some of
the questions wouldn't fit the WA API, but I'm curious how it would score.

------
colinbartlett
Forget about questions, I cannot even get Siri on Apple TV to recognize what I
am saying. I have often wanted to keep a kind of journal like this poster but
I suspect it would recognize the correct words about 30% of the time.

My wife who, unlike me, is not a native English speaker has probably a 10%
success rate. This is why any kind of forthcoming voice-response Apple device
is completely a nonstarter to me.

~~~
Domenic_S
What a weird conclusion, that future -- and presumably better -- tech would be
a nonstarter because current tech doesn't work for you.

~~~
michaelmrose
The parent poster is dubious that improvements in features will coincide with
improvements in recognizing his voice. Voice assistant functionality could be
200% more awesome, but if it specifically isn't good at recognizing what he
has said, that functionality is useless to him. This isn't terribly strange at
all.

------
paradite
Ensemble for the win: I suspect they could perform better than the best
individual assistant if they shared training resources and models.

------
jandrese
> Siri took the crown on factual questions, but surprisingly did poorly on
> reasoning (“Queries”) where I expected the Wolfram Alpha-backed service to
> get flying colours.

Didn't Apple ditch the WolframAlpha integration pretty quickly after Siri was
released? I remember a lot of the Wolfram type queries stopped working shortly
after release.

~~~
yosito
They didn't ditch it, but they did fuck it up and seem to prefer Bing searches
now over Wolfram Alpha queries.

------
myrandomcomment
So I decided to ask Siri some of the questions he listed as giving a Bing
search answer, ones I felt Wolfram Alpha would have answered correctly for
Siri. In my case I did get the correct answer, not a Bing result.

Where does the Jackfruit grow?

What is the boiling point of water at an altitude of 1km?

What is 1km in feet?

How far away is Disneyland?

For "km" I said "kilometer" and not "km".

------
LesZedCB
I wonder if there would be any use in services that don't respond in real
time.

I think these digital assistants are nerfed by the real-time response
requirement. I'd be happy to ask some of those questions, and get a pop up in
a few minutes. And they could be of much higher quality as they can be
processed and better researched.

------
benzoate
I’d be interested to see how Siri performs on different devices. The Siri on
your phone is not the same Siri that you have on your Apple TV. The Apple TV
version is much better at giving information about TV shows, movies, and
music, but terrible in comparison for everything else.

------
contingencies
"Okay Google, spend my money."

~~~
glitcher
The Alexa version :) [https://xkcd.com/1807/](https://xkcd.com/1807/)

~~~
spurlock
"Okay Google, infer semantic meaning from my words"

------
EGreg
Why isn't Siri even half as smart as Wolfram Alpha's box? Someone should
license it!

I want to be able to ask basic factual questions while driving, get the
answers and dig deeper.

Until then, I would like an on-demand audio service like Google Helpouts used
to be. Like the Magic service.

~~~
freeone3000
Wolfram Alpha is slow. Even if it's right, it's only good for knowledge
questions out of its database - things like public figures ("how many children
does Barack Obama have"), physical statistics ("what is the melting point of
tungsten"), and so on work fine. However, topical ("what about aluminium?"),
temporal ("what's the weather?"), and location-based ("show me restaurants
nearby") queries are entirely outside Wolfram Alpha's scope - so a given app
must aggregate.

Why don't apps aggregate? The "can you handle this?" API endpoint frequently
returns false positives, and the full API is really slow (multiple _seconds_)
for negatives. If we get a false positive, or something hard to detect as a
negative, that's the only answer we can show. And since a voice assistant is
expected to return one answer quickly, this is straight out.
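
The deadline problem above can be sketched with stand-in backends. Both
backends and their answers are invented; the slow one simulates the
multi-second negative response just described:

```python
import concurrent.futures
import time

def knowledge_backend(query):
    # Pretend database hit, returning (confidence, answer).
    if "melting point" in query:
        return (0.9, "Tungsten melts at about 3422 degrees Celsius.")
    return (0.0, None)

def slow_backend(query):
    time.sleep(2)  # negatives can take multiple seconds
    return (0.0, None)

def answer(query, timeout=0.5):
    """Fan the query out to every backend, keep whatever returns
    within the time budget, and speak the most confident answer.
    Backends that blow the budget are dropped: a voice UI cannot
    keep the user waiting on a slow negative."""
    pool = concurrent.futures.ThreadPoolExecutor()
    futures = [pool.submit(b, query) for b in (knowledge_backend, slow_backend)]
    done, _ = concurrent.futures.wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # abandon the stragglers
    answers = [f.result() for f in done if f.result()[1] is not None]
    if not answers:
        return "Sorry, I don't know that one."
    return max(answers)[1]  # highest confidence wins
```

The trouble freeone3000 describes lives in the `done` set: a slow negative
never makes it in before the deadline, so a fast false positive can end up as
the only answer available to speak.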

------
ksk
One problem is that the computing power dedicated to each user is minuscule.
If you could dedicate a super computer for processing every input, you could
have a much more sophisticated system that could easily deal with all of those
queries.

------
csomar
It's interesting that while many fail, there is still one that wins. That is,
if you combined the efforts of these 4 digital assistants, you'd get a much
smarter one. Do they have APIs? Can you query Siri, Cortana, etc.?

~~~
giobox
There are APIs for both Amazon's and Google's voice assistant services.
Unsurprisingly, Siri doesn't expose a public one; I've no idea about Cortana.
I've messed around a little with them on the Raspberry Pi.

This idea, while simple in principle, might be kinda annoying in practice.
You're still left with similar issues: how do you decide which talking
cylinder service answered the question best? Do you play all of the answers?
For me, I'm fairly sure listening to all of them in a row would frustrate me
even further - just waiting for Alexa to finish telling me the news headlines
is sometimes kinda annoying, especially when that information in visual form
can be grokked almost instantly.

Many of these devices, especially the Google one, are getting better at
context-based follow-up questions, so managing which service to send your
follow-up question to could be kinda crappy as well. I suppose you could have
one device that asks each service individually ("Alexa...", "OK Google..."),
but in my experience, as soon as I get one bad answer I inevitably just use
google.com to find what I need rather than risk wasting my time on another
failed conversation.

The main part that I've found hard to do in home-rolled voice assistants is
the microphone array. Almost all these devices use pretty sophisticated
microphone technologies for things like noise cancelling, subject isolation,
etc., which so far have been non-trivial to replicate to a similar standard in
homemade versions. It also certainly used to be the case that creating your
own "hotword" system to call the Alexa API was technically against the ToS
(which allowed you to use a button press to call Alexa instead), as naturally
Amazon would rather you buy a real Echo. No idea if this is still the case,
and at any rate Amazon can't really enforce it either, but worth mentioning.

~~~
forgot-my-pw
A funny popular experiment is the seebotschat Twitch account, which
livestreamed two Google Homes running the Cleverbot API talking to each other.

Here's a short highlight:
[https://www.youtube.com/watch?v=WoI6_z2mfdY](https://www.youtube.com/watch?v=WoI6_z2mfdY)
Some implementation details in AMA:
[https://redd.it/5nz3eb](https://redd.it/5nz3eb)

------
lowbloodsugar
Man asks Amazon digital assistant about the price of Lay's chips, is
disappointed when it "has little interest in having a conversation about it"
and wants to sell it to him instead. o_0

------
boznz
English is a terrible language for this; unfortunately, it's the only one I
speak.

Not sure how other languages cope. I suspect the simpler ones cope much
better. We almost need a spoken equivalent of SQL.

~~~
thaumasiotes
English is best known for being simpler than average, not more complex. It's
one of the flagships (along with Latin / Mandarin Chinese / Swahili) for the
theory "languages which are widely learned by adults become simplified over
time".

------
davidw
Google's thing can't even figure out my wife's Italian name most of the time.
It's quite frustrating.

~~~
octalmage
My girlfriend has the name Taryn and Siri really struggles with it, usually
correcting it to Karen or Terell (both names in my address book). Alexa does
better but probably because it doesn't know about the other names.

~~~
evilduck
Not that it excuses Siri's shortcomings, or helps if you're referring to her
by name mid-sentence while dictating a text to someone else, but you can
assign nicknames in your Contacts app. It might make it less frustrating to
dictate texts or start calls. So instead of saying "Call Taryn" and getting it
misheard, you could say "Call my girlfriend".

------
dheera
It's far less than half for me. Maybe less than 5%. Examples of queries that I
frequently need that won't work:

- OK Google, please download an offline maps area for Yosemite National Park
and about 80 kilometers around it.

- OK Google, navigate to Yosemite National Park. Highway 120 is closed, so
please remember that when we get to an area without reception, do not route me
via 120 in the offline maps.

- OK Google, let me know when we reach a place along our route where I can
buy an SD card.

- OK Google, let me know when we reach the last Trader Joe's or Safeway along
our route that is still open.

- OK Google, when is the next Caltrain arriving in Palo Alto that stops in
San Bruno?

- OK Google, if I miss that train, when is the next train?

- OK Google, get me an UberPool for 2 people to Castro St. in Mountain View
as long as it's under $10 and would arrive within 30 minutes.

- OK Google, what is the name of the driver and what is the license plate?

- OK Google, navigate to my friend's party on Facebook.

- OK Google, call my friend 5 minutes before we arrive. Their number is on
the wall of the event page for my friend's party.

- OK Google, find me a restaurant that has non-Americanized Chinese food, is
recommended more frequently on Chinese-language websites than English ones,
and has vegetarian options.

- OK Google, connect to my Nest thermostat. My username is XXX and my
password is YYY. Turn on the air conditioner 30 minutes before we arrive home
based on the navigation.

- OK Google, close that Java error that popped up and is covering the
navigation.

- OK Google, please zoom out the map slightly so I can see how far we are
from the destination.

- OK Google, please go back to the normal navigation view.

- OK Google, navigate to my next calendar event's location.

- OK Google, install Facebook Messenger, log in as XXX with password YYY, and
message ZZZ saying that I'll be late by however much the Uber app estimates.

- OK Google, turn off my alarm clock whenever I am biking or driving.

- OK Google, let me know if I get an e-mail from XXX in the next 2 hours.

- OK Google, please block all calls except from XXX for the next 1 hour.
XXX's phone number is in their e-mail signature.

Yeah. We're a LONG way from assistants being useful. The pieces are all there.
It's not a machine learning problem anymore. It's just that there are way too
many walled gardens between the various parties that hold the data necessary
to be useful.

~~~
spurlock
> there are just way too many walled gardens between the various parties that
> hold the data necessary to implement any of the above.

This is changing. If Google had their way, they would be assigning IPv6
addresses to bits of dust lying around in your house and trying to assign
semantic meaning to them. _If_ they had their way.

------
maerF0x0
I'm gonna make a service where you ask it a question and it gives you the 4
answers from these assistants :D

------
coldcode
I would like to know how much the pack of lies costs.

~~~
anentropic
"It’s right there on their website. These numbers do not accurately represent
the price you will pay"

------
netvarun
Shameless Plug: I work at Semantics3
[[https://semantics3.com/](https://semantics3.com/)] - an API for product and
pricing data.

These 4 digital assistants should partner with us to help their users find the
prices for a pack of Lays chips, iphone, etc. ;)

