Google Now vs. Siri vs. Cortana – The Great Knowledge Box Showdown (stonetemple.com)
298 points by nreece on Oct 9, 2014 | 168 comments

Having used both of them, I really couldn't live without Siri to be honest. I'm quadriplegic, and whilst obviously I wouldn't die without Siri my life is immeasurably better with her.

Some of the killer features of Siri for me are being able to write emails, send messages and get quick answers to general questions throughout the day. She is much better at the first two than at the last, but even then she is not too bad.

And now that Apple has hooked her into much more of iOS in iOS 8, more of the OS is open to me than before, simply by using my voice. And the "Hey Siri!" feature is one of the best accessibility features they've added so far.

I never considered using Siri this way, which feels strange to me since I've spent a lot of time thinking about the other Accessibility features of iOS (mostly VoiceOver). Have you ever written up your experiences with iPhone accessibility? I'd be interested to read it.

You know, I haven't got round to it just yet but I definitely will do. My blog[0] is focused on using robotics to help quadriplegics like me, and that generally has some accessibility stuff on it. Will definitely post some iOS-based stuff now I know there is interest in it though!


> I never considered using Siri this way

In what way? The OP seems to use it in the most basic way possible--dictation and simple searches.

I can't speak for the other two systems compared, but to be clear, OP's examples involve more than "dictation and simple searches"; Siri also controls the device. Without entering the text messaging or email apps, one can tell the virtual secretary to read and write messages and emails. Assuming Siri understands your voice well enough.

Fun specific example at the lock screen after holding down the iPhone's button for a couple seconds: "Read the latest message from my wife to me." Then, still at the lock screen, "Reply to my wife I love you".

Maybe it's just my experience, but I did not care much for Siri until she allowed me greater control over my device. When she first came out, you could not ask her to read the screen; later, the program was able to toggle assistive settings like VoiceOver by command.


Siri was essentially a curiosity for me until they allowed me greater control over the device, but I use my iPhone in exactly the way you've just described.

Along with "am I meeting with my physiotherapist today?", and "arrange an appointment with my doctor, physiotherapist and partner" at which point Siri creates the appointment in the calendar and sends an email invite to the parties mentioned. Assuming those people are in your contacts obviously.

Sending email and text messages has been part of Siri since it was first bundled with iOS...

> In what way? The OP seems to use it in the most basic way possible--dictation and simple searches.

I wouldn't want to put words into his mouth, but I would suggest he meant using Siri as a disability aid. Might be wrong though! :-)

More or less. I meant Siri as the primary/sole way of interacting with the iPhone.

I mean this in a very honest way: How are you posting here? Dictation software or something similar?

Then that deserves an honest answer!

I'm using a device called a TrackerPro[0], which tracks a small reflective dot on my glasses and translates my head movements into cursor movements. To initiate a left mouse click I use a buddy button[1] under my right index finger, which means that using just my head and right index finger I can control the mouse just like anybody else.

To input any kind of text into the computer I use DragonDictate for Mac[2], which has the dual function of enabling really good speech-to-text and also triggering little shell scripts and AppleScripts I've written.

So, if I say "Xylophone Google" Dragon will recognise that as a command and jump to the Google homepage. That's just a really simple example of what you can do, but as you might imagine it's one I use a lot each day. :-)

[0]:http://www.ablenetinc.com/Assistive-Technology/Computer-Acce... [1]:http://www.ablenetinc.com/Assistive-Technology/Switches/Budd... [2]:http://www.nuance.com/for-individuals/by-product/dragon-for-...
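Conceptually, a Dragon command setup like this is just a table mapping recognised phrases to small scripts. A minimal Python sketch of that idea (the phrases and actions here are invented for illustration, not anyone's real configuration; the unusual trigger word keeps commands from colliding with ordinary dictation):

```python
import subprocess

# Illustrative phrase-to-action table; "open" is the macOS launcher command.
COMMANDS = {
    "xylophone google": ["open", "https://www.google.com"],
    "xylophone mail":   ["open", "-a", "Mail"],
}

def lookup(phrase):
    """Return the registered action for a phrase, or None for plain dictation."""
    return COMMANDS.get(phrase.strip().lower())

def dispatch(phrase):
    """Run the action for a recognised command; report whether one matched."""
    action = lookup(phrase)
    if action is not None:
        subprocess.run(action, check=False)
    return action is not None
```

Anything not found in the table falls through to normal dictation, which is essentially how Dragon distinguishes commands from text.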

Very neat. Just out of curiosity, how long did it take you to write the above comment?

Thanks! The above comment took me probably about three minutes to write, and that includes thinking time. :-)

Very cool!

I completely agree. I commute by running, and it's unsafe to look at a screen. So I frequently issue commands into my headphone mic like "read my texts" and "text <name> I'll be there in five" or "shuffle playlist 180 BPM". Many people dismiss Siri on account of mediocre answers to open ended questions, but it's quite effective at carrying out precise instructions.

I really like the "Hey Siri!" feature, although why it's only usable while the phone is charging is beyond me. I thought maybe battery concerns since it has to be constantly listening, but I wouldn't think it would be that taxing on the battery.

How do you handle typos in iOS as a quadriplegic? They're a pain in the ass to fix by hand and I didn't even know you could do it by voice.

I couldn't agree with this more. You're absolutely correct that battery performance is the reason they don't switch it on by default. However, I really wish they would give me the option of whether it's turned on or not. I would gladly take the battery hit to use this feature, Apple!

I also use the Apple Switch Control[0] feature, which enables me to navigate text and correct typos. This is done through a series of sequential button presses that let me use menus to access all of the features of the phone. I use Switch Control in conjunction with the Tecla Shield[1], which uses Bluetooth to connect a button by my cheek to the iPhone.

So yes, fixing typos when you're quadriplegic takes a little while but compared to the alternative it's awesome!

[0]:http://support.apple.com/kb/HT5886 [1]:http://gettecla.com/pages/tecla

Edit: Stupid voice dictation mistake!

You can do voice activation for very little power if you have dedicated hardware for it. The iPhone doesn't have such dedicated hardware (presumably they decided to add that feature after most of the current hardware shipped) so the constant listening has to run on the main CPU. That in turn means that the main CPU never gets to idle for more than a few milliseconds while the feature is active, so it'll suck up a lot of power compared to just sleeping normally.

Thanks for the detailed answer, I did wonder what the exact reason was. Do you think there would be much overhead - financially speaking - to adding the dedicated hardware? I ask because I can only see them doing it if it's financially viable; as in, will enough disabled people, or those interested in voice recognition, buy the iPhone to make it worth their while?

Also, does the CPU get hot whilst it's plugged in and constantly listening every few milliseconds?

I doubt the CPU gets too hot. We're talking about something like 5% utilization. There's a pretty large difference in power consumption between a constant 5% and zero, but there's also a pretty large difference between 5% and 100%. If you run an iPhone at full blast it'll last a few hours before the battery dies. If it's completely asleep then it'll last days. Constant listening would probably cut the standby time down to, fairly wild guess here, a day or two. Which when combined with normal usage over the course of a day could make the difference between making it until bed time and needing to recharge in the afternoon.

For the iPhone 6, Apple claims 10 days standby time, 50 hours audio playback time (which uses dedicated hardware for most of the work) and 10-11 hours of internet use. Which doesn't isolate the CPU, of course, what with the screen and radios, but should give some idea of what's going on.
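A back-of-the-envelope version of that guess, with every figure an assumption rather than an Apple spec:

```python
# All figures are rough assumptions for illustration, not Apple specifications.
BATTERY_WH  = 6.9    # approximate iPhone 6 battery capacity
STANDBY_H   = 240.0  # ~10 days claimed standby
CPU_FULL_W  = 1.5    # assumed SoC draw at full blast
LISTEN_UTIL = 0.05   # ~5% constant CPU utilisation for keyword spotting

sleep_w  = BATTERY_WH / STANDBY_H              # implied average sleep draw
listen_w = sleep_w + LISTEN_UTIL * CPU_FULL_W  # sleep draw plus listening load

print(f"sleep draw        ~ {sleep_w * 1000:.0f} mW")
print(f"listening standby ~ {BATTERY_WH / listen_w / 24:.1f} days")
```

With these assumed numbers, always-on listening cuts standby from ten days to roughly three, the same order of magnitude as the guess above.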

cool, thanks for the explanation.

I wonder if it would be possible to use that dedicated audio playback hardware for this purpose. I would definitely pay a relatively small sum towards the R&D if anybody is interested!

I also wonder if there's a worry that average users might turn this on accidentally, running their battery out and then shouting at Apple? If that's the case, it could be happily buried in the Accessibility menu I think. Seriously, this feature would enable me to keep my iPhone in my top pocket while I go out and be able to access Siri rather than cart about all the other equipment. It would be great!

Edit: Why isn't accessibility in Dragon's menu?!

Apple typically likes to avoid any feature that might cause trouble, even if the usefulness could easily outweigh the problems. Their motto is "It Just Works" after all. Problems with accidental activation and such wouldn't surprise me as being part of their rationale.

I'd bet that allowing "Hey, Siri" on battery will be one of the first tweaks available once an iOS 8 jailbreak is released. Might be something to keep an eye out for if you're brave, jailbreak-wise.

Also, have you considered an external battery providing power over USB? The phone will be in a charging state that way. Sort of a silly workaround, but maybe worth it.

I think an add-on for this would be fairly easy. All you need to do to activate Siri is long-press the home button, which you can do externally through the microphone port or over Bluetooth. A widget that listens for the magic phrase and then simulates a button press would be doable. No idea how much it would cost or how hard it would be to build....

We had a quadriplegic friend live with us when I was a kid for several years in the 80s. He spent a lot of time watching TV... there just weren't a lot of options. We had a clapper set up that he could activate by making sounds with his tongue, and a stick he could hold in his mouth to manipulate the remote control, and also do things like make phone calls.

He was a smart guy and was really interested in all sorts of things, but he was sadly limited. Posts like yours remind me of him and make me wonder how he'd get on with modern technology. I do recall investigating speech recognition back in the day, I think it was an early version of Dragon, but the cost was just too high and the uses, with 80s-era computers and no internet or even nearby BBSs to connect to, were just too little. Anyway, thanks for sharing your experience.

You know, if that occurred I would get down on my hands and knees and hug that lovely blue piece of ethernet coming into my laptop. Not joking

I can only imagine what I would have done to indulge my intellectual curiosity without the Internet, sure I could have waited for people to bring me books, but that just makes me shudder. I also tried the early versions of Dragon and they were great, as long as you could also use a mouse to correct little typos and similar. So yes, not much use to a quadriplegic.

Edit: I forgot the: shudder

Yeah, I vaguely recall him reading a fair amount as well (I was pretty young, so details are hazy) but there are limits on that, especially when you're in rural Virginia dozens of miles from the nearest bookstore or library.

Hey, thanks for pointing out the impact of this technology from the perspective of a quadriplegic. It's always eye opening when a technology I really like turns out to be a lifesaver (hopefully not putting words in your mouth) for people with disabilities. Had a similar revelation re amazon home delivery recently. Great to see these technologies making meaningful improvements in peoples' lives.

I am not afraid of a little hyperbole where technology is concerned; iOS 7-8 really were a lifesaver. They enabled me to carry on an intellectual life with the outside world in a way that wasn't possible before. So again, not dead without it, but really f-ing bored!

FWIW, I just switched from Windows Phone to iOS -- from the 1020 to the iPhone 6+. Personally, I preferred Cortana over Siri, and it basically comes down to speed and responsiveness. Cortana is much quicker to launch, and much quicker to find results. I haven't noticed any discernible difference in their query results. And actually, for one of my most used queries -- "What is the weather today?" -- I don't even use it on iOS because the stock weather app is disappointing, I use Yahoo. Cortana telling me the weather was much more usable. Siri is just so freaking slow that I rarely use her.

Agreed on Cortana.

There are some things I feel Cortana does that Siri and Google Now don't.

Every morning Cortana gives me the weather, appointments on my calendar, news stories and how long my commute will take given the current traffic conditions. It also learns when I normally leave for work, so about 10 minutes before I usually leave, I get an update on how long it will take to get home given the current conditions.

To me, Cortana is more like an assistant. You can set stuff so it's off limits and she will ignore it, and likewise, tell her to remember certain reminders like you pointed out.

Do either Google Now or Siri have these features that actively learn stuff about your preferences?

Google Now does the same with commute times, and seems to automatically work out where you live and work. Same with appointment reminders, assuming you added a specific location to your calendar event - it's a bit hit and miss for me whether these reminders actually come up, but sometimes it tells me "Leave now to get to this appointment on time".

Things like these are a much more useful part of it than voice detection, I very rarely interact with it via voice, and tend to just do searches through my normal browser. It also hooks into Gmail and can give you reminders about things like flights you've booked, although that raises questions about privacy, and whether you really want Google to be trawling through your emails.

> Google Now does the same with commute times

Eh, it also does it at really inappropriate times.

About 30 minutes after I get into the office it starts telling me how long it'd take me to get home, and that card stays up in Google Now until I actually do go home!

> Personally, I preferred Cortana over Siri. And it basically comes down to speed and responsiveness. Cortana is much quicker to launch, and much quicker to find results.

I haven't used Cortana, and have only had disappointing experiences with Siri (after coming from Google Voice Search); how does the performance of Google Voice Search compare? I wonder if the performance rankings are just a complete reversal of the quality rankings; if so, I wonder who's closest to the sweet spot?

I have used Google Now briefly with the Moto X. It is definitely faster than Siri in my experience. Speed is so important when it comes to these virtual assistants. I sincerely believe Siri is the worst of the bunch, which really stinks because I expected a lot more from a $900 iPhone.

Siri is entirely in The Cloud™ so your phone doesn't matter much, aside from the speed of its internet connection.

Have you tried it on iOS 8 since it came out? They finally added streaming voice recognition, which speeds it up a lot in my experience. Previously, it waited until you were done speaking to initiate the upload, which added a pretty big delay to any response.

Holy cow, they JUST added streaming voice recognition? I haven't really followed Siri's progress since the early days, but I thought that was table stakes, sheesh. I can't even imagine a voice service being usable in 2014 without it. I might just be completely out of touch with the current landscape for voice services, but Cortana isn't similarly hobbled, right?

Yeah, it was a pretty glaring omission. You'd think it would be there from the start. I don't know about the others but I can't imagine they're not streaming as well.

That's really interesting to me, because my daughter (6yo) asks Siri that question about 10 times a day on the 5S and the answer comes back in about a second. Whether we're on LTE or wifi. I wonder if there's other factors at play.

I believe Cortana would get that info back in less than a second. It seems Siri takes more than a second for me just to activate, before it even begins listening... Maybe this is an iOS 8 issue for the time being...?

These were not random queries. In fact, they were picked because we felt they were likely to trigger a knowledge panel.

For a credible test, your sampling methodology is very important. You want a random sample that is free of biases, not cherry-picked queries. Perhaps you could take a random weighted sample of queries, with weights equal to frequency, which represents usage patterns more closely. In any case, it's very important to describe your sampling methodology; otherwise this kind of testing has little value.
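A frequency-weighted sample is only a few lines of code. For instance (the query log here is invented for illustration):

```python
import random

# Invented query log for illustration: (query, observed frequency).
query_log = [
    ("what is the weather today", 120),
    ("read my texts", 75),
    ("how tall is the eiffel tower", 30),
    ("how much is a quarter cup of butter", 5),
]

queries = [q for q, _ in query_log]
weights = [w for _, w in query_log]

random.seed(0)  # fix the seed so the published test set is reproducible
sample = random.choices(queries, weights=weights, k=3)  # draws proportional to frequency
```

Each draw then reflects how often real users actually issue the query, rather than the tester's intuition about what will trigger a knowledge panel.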

Out of curiosity, how would you propose taking a random sample of all possible questions?

If you want to do this kind of test with more discipline, you would give out random phones to N randomly selected people who have never used any of these services before, then log each query they make for a week or two. Afterwards you can run the same queries through all three services, generate results, and have humans judge on Mechanical Turk which one is best. The science of measuring this stuff is complex and I've omitted many complexities here: for example, you want multiple judgements for the same query, you need some way to measure judgement quality itself, you need great judgement guidelines that cover edge cases, etc. Usually in organizations that work with big data, you will find a competent measurement team building tools for these kinds of measurements, with years of investment.

I don't think it's possible; people on different platforms issue different queries, partially due to marketing and instincts about what works and what doesn't.

Also, commands that already sort of work lead to more queries being issued, which leads to better quality.

Even though the study (albeit informal) did not include personal assistant functionality, that's where Google Now (as manifested in an Android phone connected to a google account) really shines, in that like a real personal assistant, it brings you relevant information before you even ask. Around this time last year, I used to be startled by a timely reminder on the home screen widget about the next thing I was about to do (reminder to leave now for a meeting across town scheduled for a half hour from now, or automagic notification about a flight cancellation etc). But by now, I'm so used to this functionality that I feel slightly annoyed if I actually have to ask for such information (which is almost always a voice query to my phone at this point).

I used to be a heavy Android user until I bought my first Windows Phone about six months ago. So far, Cortana has been one of the biggest[1] reasons I haven't switched back to Android, despite what feels like a lack of commitment and focus from Microsoft[2].

Her location-based and people-based reminders have been a killer feature for me. I can open Cortana and say "Next time I talk to my mother, remind me to ask her for her cheesy corn recipe". Then, next time I open the messaging app to text my mother, or open the email app to email her, or when the phone's GPS realizes I'm over at her house, Cortana will show that reminder[3].

It also works for location-based reminders: "Next time I'm at the hardware store, remind me to buy softener salt". It's been about six months since I've last used Android, but I don't think Google Now could do these types of reminders when I switched.

[1]. Two other reasons: I much prefer "Live Tiles" to widgets or app buttons, and the Lumia line's cameras blow most other phones out of the water.

[2]. According to the /r/WindowsPhone subreddit, Microsoft tends to update their apps on Android and iOS long before they update them on Windows Phone. Additionally, apps on the other two platforms tend to be more feature-complete. This is all anecdotal though, so take it with a grain of salt.

[3]. http://imgur.com/DBQ9sjB http://imgur.com/fll3uCi

Google Now does do location-based reminders. I've never tried the "Next time I'm talking to my mom" thing, though.

I'd be interested to know if Google Now can do people reminders. Unfortunately I sold my last Android device so I'm unable to test it myself, but a quick search brings up this article[1] which suggests that it doesn't exist yet. Either way, I'm sure it's only a matter of time until both Siri and Google Now can do it too (which is a good thing, I've found it incredibly useful).

[1]. https://support.google.com/websearch/answer/3122344?hl=en

It can't do reminders when you are around people yet but there have been clues[1] that it's coming found when poking inside the Google Search app. It's expected to make use of the leaked Google Nearby[2] which will "...periodically turn on the mic, Wi-Fi, Bluetooth, and similar features..." to aid with proximity.

1. http://www.androidpolice.com/2014/03/17/rumor-remind-me-when... 2. http://www.androidpolice.com/2014/06/06/exclusive-google-wil...

For the record, Siri also does location-based reminders and they sync across all of your devices (e.g. "When I get home, remind me to take fish oil").

Thanks for clearing that up, I've never used Siri so I'm not qualified to talk about Apple's personal assistant features.

Agreed. I've always been pleasantly surprised by things that Google Now has brought to my attention.

Some examples of things I've done nothing to explicitly be told about, but that have been relevant when shown:

- Directions to a restaurant I'm about to go to

- Flight delayed notification

- Traffic time abnormalities going to/from work

- Package shipping updates

- Sports scores for teams I'm interested in

Having used a Nexus 5 for a year, I strongly disagree. Pulling stuff from your gmail is a cute trick, but it never happened reliably enough where I would actually trust it. Especially when you're not otherwise bought into the Google ecosystem. So with either, I'm in the mode of dictating things manually. And Siri shines at that. At least for me, voice recognition on my 6+ is much faster than on my Nexus 5. It always feels like Google Now does an internet query even though I've got the local recognition enabled. And Siri is much better at following the context of what you're saying to it.

"Pulling stuff from your gmail is a cute trick" ...

It has proven to be an incredibly powerful feature at least for me. I could repeat the list of things others have listed (e.g. https://news.ycombinator.com/item?id=8435239 ) but why be redundant.

"Especially when you're not otherwise bought into the Google ecosystem"

Well yeah. I have to admit that at work I'm a heavy user of Google Calendar and email, and in personal life maps, navigation, email, calendar and Google search. Honestly, that's what I see most iPhone users around me use too. The point here is that if Google cloud services are already in your life (as they are in hundreds of millions of people's lives), Google Now on Android puts a nice face on them and brings them together in a very user-friendly and intuitive way.

"I'm in the mode of dictating things manually. And Siri shines at that"

Perhaps you didn't read the original posting that we're discussing here. It shows how far Siri has to go to catch up with Google's Now's voice search.

I don't see the utility, honestly. The stuff I need to remember is in my company's Exchange server, not in my gmail. All I get in Google Now is reminders of people's birthdays who I don't care about.

Who owns the key patents for voice recognition - AT&T, Nuance, IBM? When will they expire and be available for use in open-source speech recognition?

Edit: some data at http://www.quora.com/Is-anyone-working-on-an-open-source-ver... & https://news.ycombinator.com/item?id=4987875

Actually I think most of the work has been done in academia --- certainly that's where the recent deep learning stuff has come from. So, I don't think the important stuff is patented. (In general it's very hard to lock down ML improvements under patent. Once we can do it one way, and understand a little bit about what's working, we can usually replicate that performance with another technique.)

The big problem for open source speech recognition is training data.

First, these are two different problems to solve. Voice recognition and deep learning are different fields.

Is training really the issue for voice recognition? It's a problem that has been almost solved for over a decade. Last year I saw this impressive use of Dragon NaturallySpeaking for the PC, running in a VM on a Mac, that pretty much worked for coding by voice.


The developer mentioned that he didn't have any luck with Sphinx.

Xah Lee summarized the talk here: http://ergoemacs.org/emacs/using_voice_to_code.html

...Of course deep learning is not voice recognition. But the most recent advances in speech recognition have been from deep learning models, which have come from academia.

The system in the video you link is single speaker, closed vocabulary. You need massive training data for multi-speaker, open vocabulary.

I've tried to use sphinx, but the problem was lack of training data (you have to supply it yourself pretty much!). It did have some data that was supposed to recognise numbers, but it didn't work (I mean, it ran, but the recognition was awful even when it only had to pick between 10 options).

Training is a huge issue for voice recognition. It's the only way Google and Apple have managed to take voice recognition from "works 80% of the time, but that is still bad enough to be totally unusable" to "this actually works!". Maybe you don't remember how bad voice recognition was 10 years ago.

To give you an idea how important it is, on OSX you have the option to download data to improve offline voice recognition. It's something like 500 MB. And that's the result of the training.

I think there may be some confusion as to what "training" means. When it comes to voice recognition, it makes me (and I suspect others) think of the older software which required a user to read a bunch of text to it to train the software to your specific voice before it could do any kind of decent job understanding you. Now, everything is speaker-agnostic and works out of the box for anybody. Different kind of training.

Thanks for that pointer to the Python library which integrates with Dragon/Nuance to enable arbitrary commands, https://pypi.python.org/pypi/dragonfly/

This video is not going to convince anyone to use voice recognition.

> The big problem for open source speech recognition is training data.

What exactly is needed for training - audio recordings with transcripts, human validation of recognized text?

There are successful crowdsourced efforts for proofreading of OCR'ed text. Archive.org could host a CC-licensed archive of sound & transcripts.

Recognition of the human voice is almost like writing, hopefully everyone could have access.

Edit: how much disk space would be needed - TB or PB?

For example, the Switchboard corpus (300h, 8khz, transcribed audio) is about 16GB.

That is a common size for LVCSR, and you need something around that area to get good performance (maybe minimum 100h). In academic papers by Google, they usually use their own private training data set, with e.g. 1900h. (E.g.: http://arxiv.org/pdf/1402.1128.pdf)
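The ~16GB figure is easy to sanity-check, assuming the audio is 16-bit mono PCM:

```python
# 300 hours of 8 kHz, 16-bit (2-byte) mono PCM audio
hours, sample_rate_hz, bytes_per_sample = 300, 8_000, 2
size_bytes = hours * 3600 * sample_rate_hz * bytes_per_sample
print(size_bytes / 2**30)  # ~16.1 GiB, matching the quoted size
```

So a hypothetical 1900-hour Google-scale corpus at the same encoding would be on the order of 100 GiB before compression.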

Some crowdsourced effort to collect transcribed audio under a CC-licence would be great!

Maybe this? http://www.voxforge.org/home - "VoxForge was set up to collect transcribed speech for use with Free and Open Source Speech Recognition Engines (on Linux, Windows and Mac)." (caveat: I have not recorded on this from (any) of my machines - I don't have the right plugin apparently)

Maybe also: https://librivox.org - has audiobooks read by volunteers, plus the book text.

The more data the better although the relationship isn't linear.

One state-of-the-art framework is Kaldi, which is Open Source: http://kaldi.sourceforge.net/

You can even download trained models: http://kaldi-asr.org/

It supports many state-of-the-art methods, like DNNs, sequence training, etc. So you can get quite good results with it. To train it yourself, of course you need some good training data from somewhere.

A patent has a life of about 20 years, and after that companies try to revive it to block the competition. I don't think IBM or AT&T or any company will lose their hold on any of their patents.

Isn't there a 5-year limit on the extension?

A maximum of 5 years can be restored to the patent.

In all cases, the total patent life for the product with the patent extension cannot exceed 14 years from the product's approval date, or in other words, 14 years of potential marketing time. If the patent life of the product after approval is 14 or more years, the product would not be eligible for patent extension.

All regulatory review periods are divided into a testing phase and an agency approval phase. The regulatory review period that occurs after the patent to be extended was issued is eligible to be counted towards the following calculation:

First, each phase of the regulatory review period is reduced by any time that the applicant did not act with due diligence during that phase. The reduction in time would only occur after an FDA finding that the company did not act with due diligence.

Second, after any such reduction, one-half of the time remaining in the testing phase would be added to the time remaining in the approval phase to comprise the total period eligible for extension.

Third, all of the eligible period can be counted unless to do so would result in a total remaining patent term from the date of approval of a marketing application of more than fourteen years. An additional limitation on the period of extension is that the extension cannot exceed five years. For example, if an approved drug product which is eligible for the maximum of five years of extension had ten years of original patent term left at the end of its regulatory review period, then only four of the five years could be counted towards extension. The Patent Trademark Office is responsible for determining the period of extension.

you can read more here - http://www.fda.gov/Drugs/DevelopmentApprovalProcess/SmallBus...
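The three steps above reduce to a small calculation. A sketch, treating the due-diligence deductions as already applied to the phase lengths:

```python
def extension_years(testing, approval, remaining_at_approval):
    """Patent-term extension per the three steps above: half the testing
    phase plus the approval phase, capped by the 14-years-from-approval
    limit and the 5-year absolute limit. All arguments are in years."""
    eligible = testing / 2 + approval              # step two
    cap_14 = max(0.0, 14 - remaining_at_approval)  # step three: 14-year cap
    return min(eligible, 5.0, cap_14)              # 5-year absolute cap

# The worked example from the text: eligible for the full five years, but
# ten years of original term remain at approval, so only four can be used.
print(extension_years(testing=6, approval=2, remaining_at_approval=10))  # 4.0
```

The phase lengths in the example call are assumed values chosen to make the eligible period come out to five years, matching the scenario described.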

This just reminds me that patent terms are too long ... but perhaps save that for all the other threads complaining about IP systems across the world.

Move along, nothing to see here.

Some query patterns I use in Google that you might be interested in

    Translate {words} to {language}
    My flights
    My schedule
    My packages
    What time is it in {city}?
    How tall/old/heavy is {important person}?
    {description of a photos} in my photos (like 'ocean in my photos')
    Note: if you have G+ photos. description doesn't need to be typed in the photo meta info
    compare {food} and {food} (nutrition)

> How tall/old/heavy is {important person}?


My love affair with Google Now started with exactly this query. Me and my friends were debating over Hina Rabbani Khar's height[1].

[1] https://www.google.co.in/search?q=How+tall+is+Hina+Rabbani+K...

time in {city|country}

123 {currency} in {currency}

alarm for 5 minutes

"Wake me at 6 AM"

"Remind me to check my air filter when I get home"

Oh awesome, let's see... how many atoms are there in the universe?

"The number of atoms in the entire observable universe is estimated to be within the range of 10^78 to 10^82."

Thank you Google! http://www.google.com/search?q=how+many+atoms+are+in+the+uni...

Try it on Wolfram Alpha.

Actually, they all seem to me pretty limited compared to the answers you get from Wolfram Alpha.

Also cf. the responses to one of their test questions, "how much is a quarter cup of butter?". Google makes fun of the inquiry. Wolfram Alpha gives you a thorough nutritional profile, and links to variations based on international cup sizing and different types of butter.

Wolfram Alpha is amazing at discerning the intention of the question.

In the article, the question "How old is the Lincoln Tunnel" struck me as incorrectly formatted for the parser (I know, that's the point), so I asked Siri, "When was the Lincoln Tunnel built." The Wikipedia article on the Lincoln Tunnel was returned. Wolfram Alpha was listed under other sources, so I chose that. The response? "1937"

I've noticed lately that many times things I know Alpha will slam-dunk don't get routed to it by Siri. I'm not aware of the details of the deal we have with them but from my observations of Siri it looks like Apple might be looking for certain keywords (such as how, what, why, etc) before it tries routing anything to Alpha. I hope they can relax that in future.

Luckily if you say "Wolfram XXX" instead of just "XXX" Siri will route your question straight to Alpha no-questions-asked.

Google isn't making fun of the inquiry; it simply brings up the relevant snippet from the provided web page.

The result Google gives you is also wrong (since it's also just a snippet).

I'm not sure the query is really that easy to understand. How much it costs? I assume it's to figure out which marking on the stick's wrapper to cut at?

I just learned that Canadian cups != American cups. (227g vs 218g) Who would have thought....

Wait a second. Cups are a measure of volume and not weight, though. Grams is not the right unit to compare here.

There are approximate conversions for recipes, since many American recipes use volume measures and expect you to have measuring cups, while European recipes expect you to have a kitchen scale. But yes, there isn't any single conversion, since density varies: there's one cups/grams ratio for granulated sugar, one for powdered sugar, one for sifted flour, one for water, etc.

Which, as a geek, I find super confusing and generally insane. I wish cooking were treated as chemistry (which it de facto is) and at least used precise units and proper measuring tools.

Outside of baking, you really don't need to be that precise. Bakers typically weigh their ingredients to get the correct measurements.

Siri is based on Wolfram Alpha

If you look at what the answer is linking to (or in the snippet of the first result), it seems like that's just a bad parse. The answer should be "... the range of 10^78 to 10^82".

Siri gets this right: http://i.imgur.com/ueKduWW.jpg

It's also worth noting that Siri gives more verbose answers in "Hey Siri" mode, presumably because it assumes you're not looking at the screen.

Are you referring to the effect of the 'Voice Feedback' always/handsfree setting? I believe you can get the "more verbose" mode by just setting it to 'always' http://drop.petec.io/151n0

10^78 and 10^82

Just a formatting issue.

Formatting and pronunciation issues are a major stumbling block for these interfaces, though. If the device (or service behind it) can't identify "ten to the power of" when it sees 10^, I can't rely on being able to converse with my device for significant problem sets.

Solving it isn't really easy, either. When reading a document is a caret a power, a regular expression operator, an exclusive or, or a nose on a smiley? :^)

A lot of context identification work has to be sorted out for smart agents to succeed.
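As a toy illustration of what that context identification might look like, a few character-level heuristics already separate the caret cases above reasonably well. Everything here is invented for illustration; a real system would classify over full tokens, document type, and surrounding language:

```python
import re

def classify_caret(text, i):
    """Guess what the '^' at index i of `text` means, from local context.
    Heuristics only -- an illustrative sketch, not a real disambiguator."""
    before, after = text[:i].rstrip(), text[i + 1:]
    # A colon/semicolon/equals right before the caret smells like a smiley.
    if re.search(r"[:;=]$", before):
        return "smiley nose"
    # Digits on both sides: almost certainly a power, as in 10^78.
    if re.search(r"\d$", before) and re.match(r"\s*-?\d", after):
        return "exponent"
    # Identifier-ish material on both sides: XOR or power, language-dependent.
    if re.search(r"[\w)\]]$", before) and re.match(r"\s*[\w(\[]", after):
        return "xor or exponent (language-dependent)"
    return "unknown"
```

The hard part, as the parent says, is that the residual "language-dependent" bucket is exactly where real context modeling has to take over.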

Google doesn't seem to understand the sup element.

I imagine that they kill all html to prevent xss and styling issues (they want to decide which text to bold, for instance.)

It seemed to me that Cortana returned the most correct answers without any extra fluff. The Google results seemed to have the same accuracy, but the phone said so much extra information in that monotone voice that the clarity of the result was lost in the noise.

It makes sense that it would answer the question first and then give relevant info. Most of the factual questions that I ask would be related to a broader context.

As an Android user, I've found that Siri tends to be generally better than Google Now. I don't usually ask my phone for trivia, but rather to do something. Most of the time I'll get routed to a search query when I'm looking to command the phone instead.

This is somewhat frustrating if you don't know the magic incantation to make Google Now do what you are asking it to do. Siri's engineers have done a better job anticipating the various forms of the commands and handling nearly all of them.

I am an Android phone user, past iPhone user and current iPad user and my experience has been the exact opposite. Google Now always understands me better than Siri.

When Siri fails for me, I sometimes ask my 7 year old to talk the same thing to Siri and she gets better results. My daughter has a more 'American' accent than me so I have concluded that Android is better at hearing through accents than iOS.

(I haven't yet read the original article but wanted to quickly comment since our observations are completely opposite)

EDIT - iOS tablet user = iPad user.

A while back I created a toy IRC bot that answers questions like this just by searching reddit and taking the top comment. It works surprisingly well for questions that are likely to have been asked before. And when it does work, the result is much better than a dry Wikipedia excerpt.

I then added some simple machine learning to filter search results for the most relevant threads which improved it quite a bit.

This is called the "Take the First" heuristic. If you want to learn more about "fast and frugal" heuristics (which work surprisingly well in a lot of cases), read "Simple Heuristics that Make Us Smart". It explains how to get programs to give good answers when time and processing power are limited (similar to your case).


Interesting! Care to share a sample question that is likely to be answered by reddit? I tried it manually and can't find answers to any question I can think of right now (looks like I could be failing a Turing test any time soon).

You can try it at https://kiwiirc.com/client/snoonet.org##bottest, if it's online just do "!ask question". Examples:

>!ask what is the largest prime number?

For primes of the form 2^n - 1 (known as [Mersenne primes](http://en.wikipedia.org/wiki/Mersenne_prime)), a very fast primality test known as the [Lucas-Lehmer test](http://en.wikipedia.org/wiki/Lucas%E2%80%93Lehmer_primality_...) is available. The ten largest currently known prime numbers are all Mersenne primes.

>!ask what's the airspeed velocity of an unladen swallow?

African or European?

> !ask tell me a joke

I was eating chicken tonight last night, and got a little bone in one of the fillets. So I said "Fillet? More like fill-it with bones! Haha!" Then I looked around at the empty table and quietly sobbed into my bowl.

That might actually make a new kind of search engine. You should put it on the web?

That would be huge--after all these years a Reddit search engine that works!

It's currently running on a few IRC channels on snoonet. People seem to have lost interest in it though.

As I said, it only works for questions that are likely to have been posted on reddit before. Not a general search engine.

Sounds interesting. Do you mind sharing how you trained your machine learning algorithm? It seems like an awfully hard task, since in a sense you are competing with Google's algorithm and the ordering it gave you.

The reddit search engine isn't very good. It's just logistic regression with a number of features, like the percent of 1, 2, and 3-gram matches, its score, number of comments, and whether it's a selfpost.
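A scorer along those lines can be tiny. Below is a minimal sketch (not the bot's actual code) of the feature extraction plus a logistic scoring function; the feature list follows the parent's description, and the weights in the usage example are made up for illustration (they would normally come from training):

```python
import math

def ngram_set(text, n):
    """All word n-grams of a lowercased text, as a set of tuples."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(query, title, n):
    """Fraction of the query's n-grams that also appear in the title."""
    q = ngram_set(query, n)
    if not q:
        return 0.0
    return len(q & ngram_set(title, n)) / len(q)

def features(query, thread):
    """Feature vector: 1/2/3-gram match rates, score, comments, selfpost flag."""
    return [
        overlap(query, thread["title"], 1),
        overlap(query, thread["title"], 2),
        overlap(query, thread["title"], 3),
        math.log1p(thread["score"]),         # compress the heavy-tailed score
        math.log1p(thread["num_comments"]),
        1.0 if thread["is_self"] else 0.0,
    ]

def relevance(query, thread, weights, bias=0.0):
    """Logistic regression: sigmoid of the weighted feature sum."""
    z = bias + sum(w * f for w, f in zip(weights, features(query, thread)))
    return 1.0 / (1.0 + math.exp(-z))
```

Ranking the search results is then just `max(threads, key=lambda t: relevance(query, t, weights))`, and the top comment of the winning thread becomes the answer.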

All things considered, no surprises here. The vast majority of Google's profits depends on how well their search works. The same is not true for Apple with Siri, or Microsoft with Bing.

I think these results were slightly biased because questions seemed to have been chosen to trigger Google search infobox results.

BTW, I worked with Knowledge Graph last year when I consulted at Google. It is an incredibly nice project. The team, who helped me when I needed help, was great - constantly improving the platform.

I am also using the IBM Watson APIs right now while helping another customer, so I feel like I am getting a broad view of what is available.

I expect that Knowledge Box, Cortana, Siri, IBM Watson, etc. are all going to get much, much better in the coming years and will change the way most people use computing devices. Exciting times!

How does Watson compare to the Google technology, in your opinion?

I think they are very different.

Knowledge Graph builds on linked data and semantic web technologies to encode knowledge that is served on an efficient scalable platform.

IBM Watson is a system for ingesting large amounts of text and for then allowing natural language queries on the information in the text.

Both are valuable properties.


One of the killer features of Cortana is that she gets back to you if she wants additional information. You can almost have something like a dialogue. I don't think that test has taken that into account at all.

When you're looking for the answer to something quickly why would you want to have a back and forth game vs. just getting the answer immediately?

Questions are often ambiguous.

"What's the population of New York?"

Google Now tells me the population of New York City, which is probably what most people want to know. If I wanted to know New York State, I ask it again what the population of New York State is and it tells me. This is far better than every single query for the population of New York asking to clarify whether you mean city or state, IMHO.

The second question doesn't even have to be "What is the population of New York State" – a follow-up of just "How about the State" / "I meant the State" is transformed on-screen into "What is the population of New York State" (despite "New York" and "population" not being mentioned the second time). Similarly "where's the nearest Chili's" followed by "phone them", for example.

Siri did that from day one.

I think this analysis is kind of pointless. Who asks their phone trivia questions? You want directions, reminders, etc. Siri seems a lot smarter about following spoken context in that situation.

I can't speak extensively for Siri (infrequent use on my iPad) or Google Now (no Android), but I've never had to repeat an instruction to Cortana for a reminder or directions. Not repeating myself when giving voice instruction is essentially my holy grail.

I haven't tried cortana. Google now is really awful at being a personal assistant because it doesn't keep much context.

I haven't used Siri in ages; what kind of context does it keep? I know that Google Voice Search will do coreference across consecutive queries (i.e. resolving "it" and "him" accurately).

Just tried this:

Me: "Siri, find me a target."

<finds several targets>

Me: "Directions."

Siri: "Which target?"

Me: "Third one."

<gives directions to third one in list>


Me: "Siri, find me restaurants."

<lists restaurants>

Me: "Review for Blue Duck Tavern"

<lists review>

Me: "Other restaurants"

<lists restaurants>

Me: "Reservation for Founding Farmers."

Google Now will do some coreferences, but Siri is almost modal. You can talk to it like an assistant instead of trying to formulate everything as a search query. I had a Nexus 5 almost a year before getting my 6+, and I was always jealous of how much more practically useful Siri was on my wife's iPhone 5.

Just tried this on my Moto X:

Me: "Find me a target"

<nearest target shows up>

Me: "Directions"

<changes to "directions to target">

<shows a list of targets to select>

Me: "first one"

<chooses Portland Galleria Target>

That is, indeed, fantastic. Does it hold context? I mean, can you say "Find a gas station." and then "Find a McDonald's near that" and have it work? That would be super.

Strongly agree. Isn't this testing the "knowledge box", voice recognition, and NLP at once? It seems like the bulk of this could have been done in a desktop web browser.

I would rank Cortana as the best and Siri as the worst.

Slightly OT, but in school I remember a teacher talking about how good voice-to-text had gotten but that it was stuck, at roughly 70% accuracy, and that we were pretty much at a wall there and hadn't moved much in the last decade.

Have there been any major advances in voice recognition other than just growing your speech corpus?

I've actually taken to dictating most of my texts on Android. It's faster for me than typing or swype-style keyboards and it's really very accurate until I need to use a strange proper name it doesn't understand. I'd say it easily gets about 90% of what I say, and it figures out the correct context for words like "there" "their" "they're" and "to" and "too" so far completely correct.

edit: thought I'd add this. I just had a conversation with a United agent and they probably understood less than 70% of what I was saying. It was beyond frustrating; I wish I had actually been talking to my phone instead.

How do you dictate text messages in private without having your messages overheard? I think I would be quite self-conscious talking into my phone without directing it at anyone (I rarely use Siri for this reason).

Yea I'm with you. I do voice searches/typing CONSTANTLY when I'm at home, particularly on weekend mornings when I'm checking my schedule/weather/texting friends to organize my activities for the day while getting ready. Barring the occasional query when I'm on the sidewalk and not too near anyone, I don't really use them outside much.

well, I'll type it then, of course. But I'm usually locked away working in an office or working at home most of the time and not around too many people.

The biggest improvement in the past few years was deep neural networks. These were behind Google Brain and Microsoft's Adam.



I think it's been mostly throwing hardware at the problem. Server hardware, backed by terabytes of context data.

I remember back in the day, most of the errors were the exact same errors that a human would make. Even you and I only really hear 95% or so of the words someone says. But we can fill in the rest from the context. It always amazes me when I dictate some sentence to my phone, and at the end I see one of the words change to another that sounds almost identical but makes much more sense in the entire context of the sentence.

Not to mention mouth shapes.


Deep neural networks have helped a lot in improving accuracy (at least in the last few years).

Just machine learning algorithms and more data.

Side note: it's frustrating how long it takes Siri to launch. Sometimes I feel like I'm holding down the home button for five seconds before Siri comes up.

And usually I'm trying to decide between a one sentence typing task or asking Siri to do it. The five second wait really throws off my time "profit margin".

As a huge Halo fan, it makes me happy that Cortana is slowly becoming a well-known name =). Having said that, I'd really like to give her a try; I'd hope she would exceed Google Now on my Moto X, which for the most part has been very useful while driving.

I recently bought an iPhone 6, and have really started using Siri for the first time. And Siri has actually surprised me in a good way at how well she understands what I say. I don't do anything too crazy with her. Set alarm at x time, remind me to do x when I get home, Call so and so, send text message to x.

I work on a UX team of three. One of our guys has a MotoX, One guy has a Windows Phone, I have an iPhone 6, and based off of what happens at work it seems that Google Now and Siri are slightly more functional.

This is merely from observation, but it seems like Google Now is faster than Siri, and Siri is better at accurately hearing/understanding the words you say.

In regards to understanding voice input, I have been thoroughly impressed with Google's ability to do this ever since I started using their 1-800-GOOG-411 service. I didn't get my first smartphone with data until 2009 (Droid), so the service was really handy for me before that. I'm sure it also gave Google a ton of data to help them improve their capabilities (even mentions that in the Wikipedia article [1]).


The article concludes that Google is dramatically ahead, which is no surprise,

but I AM surprised, anecdotally, at how good Siri is vs. previous iterations on my iPhone 6 and iOS 8

It feels close to "good enough" for the majority of functions I actually use, e.g. dictate an email, get directions, look up a contact, and dial a phone number.

My sense is that Apple will nail the base functions so that most users, self included, won't notice a difference between the two.

50% is good enough if it's done by Apple. 90% is not good enough if it's not done by Apple.

This is how I interpret most responses when people talk about Apple.

Well, as accurate as I feel your statement is, I'd also say sometimes it depends on the 10% missing from the products not done by Apple. If it's an important 10% that Apple has nailed (or close enough) then sometimes the 50% is more valuable than the 90%.

But we all have our biases as well. (Even as an Apple user I fully expect Google to always win the Knowledge Vault race. Its in their wheelhouse more so than Apple's.)

"Well, as accurate as I feel your statement is, I'd also say sometimes it depends on the 10% missing from the products not done by Apple. If it's an important 10% that Apple has nailed (or close enough) then sometimes the 50% is more valuable than the 90%"

Heh, you not only proved kumarm's point about apple fanboi'sm but served the proof on a silver platter with a little side of dessert.

I believe that is fanboi'sm in general, but if it makes you feel better about yourself you can keep thinking it is just Apple users.

> Its in their wheelhouse more so than Apple's.

This is exactly why I'm shocked that Cortana's accuracy is so much lower than Siri's (under the assumption that this test is reasonably valid). They've got a world-class research arm and they've run a search engine for yeaaars, and somehow they dramatically underperform a company whose biggest fans would even admit has a spotty record when it comes to services. I guess the "time on the market" advantage is a lot more dramatic than I would've thought.

You may have missed this sentence:

> In addition, this was a straight up knowledge box comparison, not a personal assistant comparison. For purposes of this study, a “knowledge box” or “knowledge panel” is defined as content in the search results that attempts to directly answer a question asked in a search query.

I would like to see a comparison of the personal assistant functionalities that you are speaking of. This study did not cover them at all. I feel like they are a much more powerful use of these types of technology than just searching the internet for some factual data. I would not be surprised if the tables were completely flipped on a personal assistant test, with Cortana on top and Google Now on the bottom (based purely on second-hand knowledge of Cortana).

Thank you. Good luck for us iPhone users that we won't need to buy a new phone :-).

In terms of 80% functionality I think all three are quite close. In terms of general features I think Android is already light-years ahead. But I need a solid day-to-day phone with consistent usability concepts, and I'm very tempted to give WP a try as it seems to be the cleanest here. Let's see what WP 10 brings ...

Anecdotal I know, but it's strange that so many responses in this thread find Siri more useful than Google. I use an iPad and a Galaxy S3, and when I ask Siri questions, more often than not she fails to give me an answer or provide meaningful search results. I can ask Google to find me any retail business locally and get proper results, call them with voice commands, etc. Siri chokes on a lot of those requests.

Someone submitted this and deleted it a bit before this submission.


Anyway, there I was trying to find out which speech recognition software was more accurate. How do these compare to Dragon?

It seems like we're quite close to being able to actually use voice dictation without the frustration.

I use Google's voice dictation frequently for text and email, because my fat fingers don't like onscreen keyboards. It often gets the dictation of an entire paragraph totally right. It's impressive.

How do such personal agents work? I mean after the speech recognition and NLP (part-of-speech tagging, named-entity recognition, etc):

Do they use template databases for different topics? (like afaik WolframAlpha) Do they use (scientific) ontologies or are the templates more flat and stored in a SQL database? [and web search results as fallback]

> Do they use template databases for different topics? (like afaik WolframAlpha)

What do you mean by a template database?

NLP sentence templates:

"How tall [object]?" / "[object] height"

"Becoming a [job]"

"How much is a [unit] of [object]?"

"How old is [object]"

"When is the next [event]"

My question is: do Siri/Cortana/Google Now use such flat templates (per topic) or do they use an IR ontology? http://en.wikipedia.org/wiki/Ontology_(information_science)
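For what it's worth, the flat-template approach described above can be sketched in a few lines: each template is just a pattern that yields an intent plus slots. Everything below is illustrative; it is not how any of the three assistants actually works:

```python
import re

# Flat templates in the spirit of the examples above.  Each entry maps a
# regex over the raw utterance to an intent name; named groups become slots.
TEMPLATES = [
    (re.compile(r"how tall is (?P<object>.+?)\??$", re.I), "height"),
    (re.compile(r"how old is (?P<object>.+?)\??$", re.I), "age"),
    (re.compile(r"how much is a (?P<unit>\S+) of (?P<object>.+?)\??$", re.I),
     "quantity"),
    (re.compile(r"when is the next (?P<event>.+?)\??$", re.I), "next_event"),
]

def parse(utterance):
    """Return (intent, slots) for the first matching template, else None."""
    for pattern, intent in TEMPLATES:
        m = pattern.match(utterance.strip())
        if m:
            return intent, m.groupdict()
    return None
```

So `parse("How tall is the Eiffel Tower?")` gives `("height", {"object": "the Eiffel Tower"})`, and anything that matches no template returns `None`, which is exactly the point where a system would fall back to plain web search.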

We don't use flat templates in Alpha. We use a full context-free grammar that turns the question into a symbolic representation of what the user wants, which can naturally involve chaining. E.g. "when was the president of the US born".

Then we execute that symbolic representation using a variety of strategies. There are funnily enough quite a few cases where we can understand the question but don't have the curated algorithms to actually answer it.

I haven't used Cortana, but in my experience Siri has been much more accurate at guessing what I said. Google Now on my S4 is horrible when I try to use it. Most of the time I have to talk like a robot to get anything right.

That being said I really like Google Now but the accuracy of speech to text is what killed it for me.

That's interesting. What accent do you have? I am consistently surprised at how well my S3 reads my voice. It can get it even when I am tired and half-pronouncing words or even singing them (at least with "What does the fox say"). But I also generally use it at my apartment (very quiet) and I have a pretty standard American accent.

My accent is "standard American"; if you met me you wouldn't be able to tell where I was from unless I said certain words. I'm from the South, but by no means do I have any sort of Southern drawl.

I use it frequently in the car so maybe the background noise is affecting it.

IBM's Watson would knock 'em out instantly if it were incorporated into a mobile platform: https://twitter.com/allanbritto_/status/520053077110300672

I just wish there was some way to have this sort of functionality, without handing over all that data to various companies who in turn use it in unpredictable ways so that they can monetize the service.

IBM should have an Ask Watson app. From what I've read it seems like Watson would run circles around all three of them.

edit: And #12 on the front page right now appears to be essentially just that.

Can someone tell me which Android phone they used in the video?

There's no Android phone, Google Now is running on an iPhone 5(s).

Jane > all three

My test for when these are truly advanced:

"What's going to be on my ballot?"

Can Watson even answer this?
