
How Siri Works - J3L2404
http://www.jeffwofford.com/?p=817
======
colinhevans
I worked at SRI on the CALO project, and built prototypes of the system that
was spun off into SIRI. The system uses a simple semantic task model to map
language to actions. There is no deep parsing - the model does simple keyword
matching and slot filling, and it turns out that with some clever engineering,
this is enough to make a very compelling system. It is great to see it launch
as a built-in feature on the iPhone.
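
For anyone wondering what "keyword matching and slot filling" looks like in practice, here is a minimal Python sketch. The task names, trigger keywords, and regular expressions are all invented for illustration. This is not code from CALO or Siri, just the general shape of the technique.

```python
import re

# Hypothetical task model: each task has trigger keywords and slot patterns.
# None of these names come from Siri or CALO; they are illustrative only.
TASKS = {
    "send_text": {
        "keywords": {"text", "message"},
        "slots": {
            "recipient": re.compile(r"\bto (\w+)"),
            "body": re.compile(r"\b(?:that says|saying) (.+)$"),
        },
    },
    "set_alarm": {
        "keywords": {"alarm", "wake"},
        "slots": {
            "time": re.compile(r"\b(?:at|for) (\d{1,2}(?::\d{2})?\s?(?:am|pm)?)"),
        },
    },
}


def interpret(utterance):
    """Pick the task whose keywords best match, then fill its slots."""
    text = utterance.lower().strip()
    words = set(re.findall(r"\w+", text))
    # Keyword matching: score each task by how many trigger words appear.
    best = max(TASKS, key=lambda name: len(TASKS[name]["keywords"] & words))
    if not TASKS[best]["keywords"] & words:
        return None  # no task matched at all
    # Slot filling: run each slot's pattern; slots with no match stay None.
    slots = {}
    for slot, pattern in TASKS[best]["slots"].items():
        match = pattern.search(text)
        slots[slot] = match.group(1).strip() if match else None
    return {"task": best, "slots": slots}


print(interpret("Send a text to Andrea that says I love you"))
# {'task': 'send_text', 'slots': {'recipient': 'andrea', 'body': 'i love you'}}
```

A slot left as `None` is the natural place for a follow-up question ("What do you want to say?"), which is presumably where the clever engineering starts.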

The NLP approach is based on work at Dejima, an NLP startup:
[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.70.5...](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.70.5385)

A lot of the work is grounded in Adam Cheyer's (CTO of SIRI) work on the Open
Agent Architecture: <http://www.ai.sri.com/~oaa/>

A more recent publication from Adam and Didier Guzzoni on the Active
architecture, which is probably the closest you'll come to a public
explanation of how SIRI works:
[https://www.aaai.org/Papers/Symposia/Spring/2007/SS-07-04/SS...](https://www.aaai.org/Papers/Symposia/Spring/2007/SS-07-04/SS07-04-009.pdf)

~~~
Caligula
SRI is a great organization. They make SRILM, the best language modeling
toolkit available to developers: <http://www.speech.sri.com/projects/srilm/>

I am curious whether it played a role in SIRI and, if so, what that role was.

------
tansey
_> Virtually everything we call “AI” today is either a theatrical display of
essentially scripted behavior (that’s how most game AI works), a massive
database (such as Google Suggestions and expert systems) or a vague and
decidedly unintelligent jumble of neural networks and genetic algorithms._

Okay, I can understand the first two being less attractive, but evolutionary
neural networks-- really? I mean, what more do you want than an artificial
brain created through simulated evolution? What characteristics would satisfy
people like this that something is "AI" [1]? It seems a bit like the old
saying that once it's discovered, it's no longer AI.

[1] It's worth noting that I'm strongly biased since I work in the Neural
Network Research Group at UT Austin.

~~~
tel
I rather dislike ANNs. I recognize that ANNs _perform_ well and find research
to improve their performance interesting, but honestly they're completely
opaque black boxes with structure that only plausibly mimics the absolute
simplest of biological neural networks.

I want a "brain" which was built from principles that allow its workings to be
intelligible and efficient based on a technique which reflects the structure
of the problem of learning, generalization, and hypothesis search and
illuminates it.

So I'll buy (and often use) your ANNs, but I can't help but feel a little bit
worried about the state of affairs. It's probably a whole lot like that
automatically enumerated proof of the 4-color theorem. It's definitely
important, but there is something unsatisfying about it. We aren't learning as
much as we felt entitled to because the destination was only a small part of
the journey.

~~~
tansey
_> they're completely opaque black boxes with structure that only plausibly
mimics the absolute simplest of biological neural networks_

I guess this depends on how advanced your toolkit is. If you're simply using
feedforward nets with fixed weights, fixed topology, and sigmoid activation,
then yes that's probably right.

However, there are a lot of advanced techniques in ANNs that most people never
use. Lots of more biologically-inspired approaches have been explored, such as
leaky integrator neurons, Hebbian learning, and indirect encodings.
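
For readers who haven't met those terms, here is a rough Python sketch of two of them in their simplest textbook forms. The function names and default constants are made up for this example, and real research models are considerably more elaborate.

```python
def leaky_integrator_step(state, input_current, leak=0.1):
    """One discrete step: the state decays toward zero while integrating input."""
    return state + (-leak * state + input_current)


def hebbian_update(weight, pre, post, learning_rate=0.01):
    """Plain Hebb rule: the weight grows when pre and post are active together."""
    return weight + learning_rate * pre * post


# A neuron driven by a constant input settles near input_current / leak.
state = 0.0
for _ in range(200):
    state = leaky_integrator_step(state, input_current=0.5, leak=0.1)
print(round(state, 3))
```

Unlike a fixed-weight feedforward unit, the first function carries state from one step to the next, and the second changes the connection strength based only on local activity.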

 _> I want a "brain" which was built from principles_

See, but the problem is that we don't even have the knowledge of how _real_
brains work to start forming such principles. In fact, we're starting to see
computational models used to form principles of biological brains (for
example, [1]).

 _> that allow its workings to be intelligible_

And what does that mean? You want to be able to discern some rule base from
the system that you can understand? Even biological brains do not have that
property.

 _> based on a technique which reflects the structure of the problem of
learning, generalization, and hypothesis search and illuminates it._

This is very vague. Happy to dive into it if you are willing to expound a bit.

 _> It's probably a whole lot like that automatically enumerated proof of the
4-color theorem. It's definitely important, but there is something
unsatisfying about it. We aren't learning as much as we felt entitled to
because the destination was only a small part of the journey._

Again, very vague. Do you think having children is unsatisfying because you
can't understand their actions most of the time? Maybe I'm misunderstanding
the issue here.

Seriously though, I want to know what will impress people here. It's not going
to be a fully general, "human-level" AI, obviously, as that requires huge
amounts of semantic knowledge about the world that was encoded through
billions of years of evolution, but that to me is not really a necessary
condition for AI.

[1] <http://nn.cs.utexas.edu/?hoffman:biopsych11>

~~~
RobertKohr
Thank you for your solid reply. I think ANN is the real thing, and most of the
other "AI" solutions that we have give the impression of intelligence, but are
simply algorithms that solve problems we have in very well defined ways.

ANN is a general intelligence that gets shaped by experience and adapts to
figure out its own solutions. This is the path that will lead to our concept of AI
presented in sci-fi and what the general public actually thinks of. It isn't
some coder banging on the keyboard trying to replicate what a human would do
under a specific situation, it is something that truly learns.

We have a long way to go before this becomes a reality, but it will happen.

~~~
tel
That's very romantic, but I think instead we're going to learn that
intelligence is not such a well-defined idea as to be engineered into anything
imagined by science fiction.

Further, there are plenty of techniques which aren't ANNs but still are
"shaped by experience" and "figure out their own solutions".

I side pretty strongly with the idea of building tools to bring the powerful
modes of computerized learning close to the powerful modes of human learning.
I think we're inordinately far away from replicating one with the other.

------
ErrantX
OK, so, just got me a shiny new 4S and have been playing with Siri.

It quickly became apparent that it is "limited" to this list - but that's OK.
It does it very well - it recognised all of my family members and workmates
with 95% (or more) accuracy, and without any training.

OK so it is a modest start - but a quality one, and the options are useful.

The thing that _blows me away_ though is the speech to text. It's very
accurate, as good as my 2 year old trained Dragon Naturally Speaking install.
Sadly this is texts only at the moment, but I imagine we will see it in email
eventually (hope so anyway).

Dictating texts is great; I just sent a few instead of emails because it is so
frickin quick :)

Quality product.

(as to how it knows about stuff like "my wife" - it asks you first time.)

EDIT: I've just found out I can dictate email with it... awesome! But the icon
seems inconsistent about whether it appears (if not going through Siri
directly). Clicking reply on an existing email doesn't seem to give me the
option.

------
mortenjorck
This is a good start, but it's all analysis from the outside looking in. The
author is probably right about the overall structure, but there are already a
few things that have come out since he posted this (it's dated last week,
probably right before Steve Jobs' passing): Relationships, such as "my wife"
are not actually something you type into new fields in the address book. You
actually tell Siri yourself, in what appears to be a sort of natural-language
@define statement.

Once a much larger audience has been playing with Siri for a few weeks, we
should start to get a much clearer picture.

------
thoradam
Every time someone analyzes an AI system, they invariably conclude that it
isn't really AI, but rather just a complicated system of different, strung-
together technologies.

At what point do systems of sophisticated text-to-speech and grammar analysis
technologies actually become AI?

~~~
bitwize
When they can feel emotions.

The one place where AI actually is a term of art -- gaming -- is also uniquely
distinguished in that for agents both human- and computer-controlled, the
goals are definite and the number of choices of action limited.

Out here in the real world, things are a lot more fuzzy and complicated.
Accordingly we are not willing to give a bot AI status until it can
demonstrate competence at "real world" goals (presumably including having
relationships with other sapient beings in the world).

~~~
MrScruff
What do emotions have to do with intelligence?

------
rocha
Some background on Siri:

"In 2007, SRI [1] spun off Siri, Inc. Siri was born from SRI's work on the
DARPA-funded CALO [2] project, described by SRI as the largest artificial
intelligence project ever launched. Siri was acquired by Apple in 2010."

[1] <http://en.wikipedia.org/wiki/SRI_International>

[2] <http://en.wikipedia.org/wiki/CALO>

------
fleitz
Bullshit. AI works every day and VERY well. AI does a fantastic job tuning SQL
queries. A* does a fantastic job finding reasonable routes between
destinations. AI finds optimizations to massively speed up software. AI lays
out the very circuitry of most of the ICs in your computer. AI routes the
packets between you and HN. It seems that as soon as something in AI works it
becomes not AI.

I guarantee you that when we find out _exactly_ how a neuron and neural
networks work the human brain will be equally unimpressive. Each person's
brain is the product of an uninterrupted chain of evolution stretching back 4
billion years. The field of AI has been around for about 60 years, sorry it
hasn't quite managed to best the human brain in every task imaginable.

Also, if one looks at "natural intelligence" they can see some pretty stupid
things going on. Losing the Mars Climate Orbiter due to a metric / US
customary unit mix-up, wtf?

Ask the average person to name the country that lies between Iraq and
Afghanistan.

[http://www.wolframalpha.com/input/?i=what+country+borders+on...](http://www.wolframalpha.com/input/?i=what+country+borders+on+iraq+and+afganistan)

~~~
finnw1
Wolfram Alpha doesn't actually answer the question that was asked. Instead it
answers two questions (1) what countries share borders with Iraq; (2) what
countries share borders with Afghanistan. The final step (find the
intersection of those sets) is a task that I would expect a computer to be
better at than a human, but that is the step that Wolfram Alpha misses out.
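
To show how small that missing step is, here is a Python sketch. The border lists are typed in by hand for this example (they are not pulled from Wolfram Alpha or any API), and the function name is made up.

```python
# Land neighbours as commonly listed; entered by hand for this example.
BORDERS = {
    "Iraq": {"Iran", "Jordan", "Kuwait", "Saudi Arabia", "Syria", "Turkey"},
    "Afghanistan": {
        "China", "Iran", "Pakistan", "Tajikistan", "Turkmenistan", "Uzbekistan",
    },
}


def countries_bordering_both(a, b):
    """Answer the question as asked: intersect the two neighbour sets."""
    return BORDERS[a] & BORDERS[b]


print(countries_bordering_both("Iraq", "Afghanistan"))  # {'Iran'}
```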

------
glhaynes
_But how did Siri learn who Scott’s wife was? The demo didn’t show us, but I
have a suspicion about how it works.

The Mac Address Book has long had an entry for setting up relationships
between contacts. I can indicate who my spouse is in Address Book. I suspect
that the iPhone Contacts app will gain similar new fields in iOS 5._

I haven't used Siri so I don't know whether it'll use those fields, but I do
know that if it didn't know who Scott's wife was, it'd have asked him and then
remembered his answer.

I also don't know (but would like to!) whether it does said remembering by
updating the contact.

Edit: seen on Twitter, "Siri doesn't just remember your mother, father, and
other significant people, it adds them to your contact details." So: it's not
a separate store.

------
skeletonjelly
But none of this is how it works. I was expecting to read about the company
they acquired and how they set about integrating and negotiating arrangements
with service providers.

------
itsnotvalid
When Siri was first available, there was almost no sign that it would be the
next big thing, or of how important it would be. Now, after this general
release on limited hardware, we suddenly find it the next big thing.

~~~
athst
When it was first released, a lot of people said that it was the next best
thing. Take a look at Robert Scoble's initial coverage. The only problem was
that, since it was an app, I don't think they got the level of attention that
they were due. If you used it back then, it really did feel magical.

Now that Apple acquired them, they can use their platform to really promote it
and get it on everyone's device by default, which will help a lot more people
to find and use it.

~~~
itsnotvalid
Robert's article:
[http://scobleizer.com/2010/02/08/why-if-you-miss-siri-youll-...](http://scobleizer.com/2010/02/08/why-if-you-miss-siri-youll-miss-the-future-of-the-web/)

I guess most of us here on Hacker News would have known Siri (from SRI) from
its first public release. Many of us here also would have watched that YouTube
video describing this amazing combination of technology. I guess the reason I
didn't feel any magic is that I am not living in the US (so the app means
nothing to me).

------
stingraycharles
The site is currently down, here is a google cache:
[http://webcache.googleusercontent.com/search?gcx=c&sourc...](http://webcache.googleusercontent.com/search?gcx=c&sourceid=chrome&ie=UTF-8&q=cache%3Ahttp%3A%2F%2Fwww.jeffwofford.com%2F%3Fp%3D817)

------
saint-loup
>>> Just think how much fun it will be when I say, “Send a text to Andrea that
says ‘I love you,’” and Siri hears, “Send a text to Andrew that says ‘I love
you.’” I look forward to seeing how reliable it really is.

If some threshold of certainty isn't met, Siri will certainly ask "Did you
mean Andrew or Andrea". Doesn't seem like a really hard problem.
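
Something like the following Python sketch, where the recognizer scores, the thresholds, and the function name are all hypothetical (this is a guess at the general idea, not how Siri actually does it):

```python
def resolve_contact(candidates, min_confidence=0.8, min_margin=0.15):
    """Decide whether to act on a recognized name or ask the user.

    candidates: list of (name, score) pairs from a hypothetical recognizer.
    Returns ("accept", name) or ("ask", question).
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if not ranked:
        return ("ask", "Who do you want to send it to?")
    best_name, best_score = ranked[0]
    runner_up = ranked[1] if len(ranked) > 1 else None
    # Margin over the next-best guess; with a single guess, use its own score.
    margin = best_score - runner_up[1] if runner_up else best_score
    if best_score >= min_confidence and margin >= min_margin:
        return ("accept", best_name)
    if runner_up:
        return ("ask", f"Did you mean {best_name} or {runner_up[0]}?")
    return ("ask", f"Did you mean {best_name}?")


print(resolve_contact([("Andrea", 0.55), ("Andrew", 0.45)]))
# ('ask', 'Did you mean Andrea or Andrew?')
```

The hard part is getting well-calibrated scores out of the recognizer in the first place; the decision on top of them is the easy bit.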

~~~
glhaynes
It always (from what I've seen) confirms before doing any not-easily-undoable
action.

------
swdunlop
From the article:

"if anyone has the motivation, the resources, and the smarts to get AI right,
the iPhone dev team is it."

Which is why they acquired Siri, instead of writing their own? :)

[http://techcrunch.com/2010/04/28/apple-buys-virtual-personal...](http://techcrunch.com/2010/04/28/apple-buys-virtual-personal-assistant-startup-siri/)

------
pkulak
I think this is all going to hinge on whether Apple can match the speech
recognition bar that Google has set. Without that (like the article mentions)
all the AI stuff doesn't matter. Voice Actions can't use arbitrary grammar,
and it's not as pretty as Apple's solution, or as integrated, but for
dictation, it's pretty good; honed by years of Google 411 and who knows what
other masses of input. Is Apple using a datacenter with petabytes of voice
training, or is it Dragon in the cloud? I've got an Android phone now, and as
soon as I get my hands on a 4S I'm going to put them through their paces
dictating text and compare. From the reviews, it looks like Apple has at least
matched Google in dictation, which means that Siri has a huge lead over Voice
Actions due to all the other improvements.

------
nihilocrat
I've always been curious, regarding "speaking with a Southern drawl, with a
stuffed-up nose from a bad cold" ... are there non-native English speakers
here who have had particular difficulty using speech-to-text because of an
accent? Have the British, Irish, Scottish, Australians, Canadians, Jamaicans,
or South Africans had trouble because of their dialect?

Speech-to-text sounds like it will be a constantly tough problem. Even humans
aren't 100% accurate, or even 99% accurate, depending on the circumstances.

~~~
gbog
French accent: my Nexus never understands me when I ask for a gas station, or
for the names of my Chinese or French contacts.

------
pazimzadeh
If we created an AI that was actually intelligent enough to understand all of
our queries and sort through our junk to find the answers we need, it probably
wouldn't want to do it.

~~~
grecy
or... once we had an AI capable of sorting through all that junk, we'd pile
endlessly more junk on there until it couldn't :)

------
powertower
TL;DR;

Siri is a simple application with constrained functionality that uses existing
technology, all dressed up to make it seem much smarter than it really is.

~~~
khafra
If it's successful enough at seeming smart, in a large range of different
circumstances, how is that different from actually being smart?

~~~
powertower
seeming smart != being smart

Though I see what you're saying.

------
rsolomakhin
google cache:
[http://webcache.googleusercontent.com/search?q=cache:SvjDtey...](http://webcache.googleusercontent.com/search?q=cache:SvjDteykXA4J:www.jeffwofford.com/%3Fp%3D817+how+siri+works&cd=1&hl=en&ct=clnk&gl=us)

------
smackfu
Honestly, if you wrote Siri as a text line parser, would anyone be impressed
by all the fancy features? It's more that current voice response systems are
so incredibly primitive.

~~~
epistasis
I think most people would be impressed if they could just type in what they
wanted as an English sentence. Most users are familiar with GUIs, and may have
heard of command line interfaces, but an English interface would be new. And
maybe a plain English typed interface would even be useful. Some percentage of
people type full English sentences into Google.

~~~
mason55
Like AppleScript.

God I hate AppleScript.

------
nirvana
Ok, I know it has entered the popular culture and become a myth, and thus
fighting it is kinda pointless, but the Newton was not a failure, neither
technologically nor from a business perspective.

First off, technologically. It worked. I have terrible handwriting, and the
Newton I had was the 100, the original MessagePad. It could recognize my
handwriting just fine, well north of 90% of the time. When it got a word
wrong, it was trivial to correct. I was able to write on it at the same speed
that I would be writing on a pen and paper. I could increase this recognition
rate to 100% by altering the way I wrote, if I wanted to, but it was more
convenient to write my normal way.

Further, this was all done on an ARM 610 RISC CPU running at 20 MHz with 640 KB
of RAM!

It was "technology from the future", and unfortunately it came at a time when,
at least in America, "computer literacy" was a big thing. Many people weren't
really computer literate, and having a computer in the home was not rare, but
not exactly common yet.

Secondly, the Newton was not a failure in the marketplace. The device found
wide adoption in industry where its unique features provided exceptional
value. In places where a Palm could compete (because the Newton's superior
technology wasn't significantly more valuable) it didn't do as well... but
that doesn't mean it was a failure as a product or was losing money.
(Interestingly, NeXT, widely also regarded as a "failure" was also profitable
when Apple bought them.)

The Newton was building momentum and was about to be spun out in an IPO as
Newton Inc, when Steve Jobs returned to Apple. I don't know the reasons that
Steve Jobs killed the IPO and the product. It seems to me that letting it spin
out would have been the wiser move.

I understand why it was popular for Doonesbury to make fun of it, and The
Simpsons, etc. It exhibited a kind of hubris. Handwriting recognition? At a
time when people are barely grasping the value of a computer to begin with?

I think the lesson to learn with AI-like products is to make sure that the
target market is receptive to the idea.

I hope people are now more receptive to Siri than they were to Newton.

~~~
wanorris

        Teas Willis, and the sticky tours 
        Did gym and Gibbs in the wake.
        All mimes were the borrowers,
        And the moderate Belgrade.
    
        "Beware the tablespoon my son,
        The teeth that bite, the Claus that catch.
        Beware the Subjects bird, and shred
        The serious Bandwidth!"
    
        He took his Verbal sword in hand: 
        Long time the monitors fog he sought, 
        So rested he by the Tumbled tree,
        And stood a while in thought.
    
        And as in selfish thought he stood,
        The tablespoon, with eyes of Flame,
        Came stifling through the trigger wood,
        And troubled as it came!
    
        One, two! One, two! And through and though,
        The Verbal blade went thicker shade.
        He left it dead, and with its head,
        He went gambling back.
    
        "And host Thai slash the tablespoon?
        Come to my arms my bearish boy.
        Oh various day! Cartoon! Cathay!"
        He charted in his joy.
    
        Teas Willis, and the sticky tours
        Did gym and Gibbs in the wake.
        All mimes were the borrowers,
        And the moderate Belgrade.

~~~
pak
OK, I get your point, but transcribing e.e. cummings poems is kind of a corner
case for voice recognition, no?

What Siri is going to test is whether the 80/20 rule applies to a voice
recognition based personal assistant. By constraining it to assistant-type
tasks, and with what seems to be the most intelligent design we've seen yet,
it has a better shot of achieving it than anything else out there. Previous
entrants have been tripped up by the lack of a decent UX, failure to integrate
with other data sources, or any number of shortcomings in the long chain from
microphone to software back to speaker. Apple is the first company to have
such precise control over every component in that chain. (For instance, I
wouldn't be surprised if part of the reason Siri is 4S-only is because
hardware has been added for smarter noise or echo cancellation, or other
subtle design changes.) And the fact that it can learn from the internet, and
presumably submit its training data back to the cloud, means that maybe it
will eventually be able to handle not just 80%, but 90% or 99% of total
inputs.

~~~
trafficlight
It's Lewis Carroll, but your point stands.

------
chugger
OK, I've been playing with Siri for a few hours now. This thing is awesome!

