
Ask HN: What are the challenges of building a personal assistant like Siri? - mandela
I'm thinking of building a language-specific personal assistant, like Siri or Cortana. I'm excited about the technical challenge but wondering about its feasibility.

Some of the challenges are:

1. NLP: I am not aware of any non-English library and don't have much background on the subject.

2. Voice recognition: same as the above.

3. Web crawling: there are tons of libraries for crawling and I have a decent understanding of the subject.

So a few questions:

1. Are there any other challenges that I have not considered?

2. Is the project feasible?

3. Will it succeed?
======
skewart
Most of the comments here are pretty discouraging, which is reasonable. It
sounds like an insanely ambitious project for a single non-expert to take on.
But I completely disagree.

You should go for it. Even if you never really get it to work you'll most
likely learn a ton along the way. And if you even halfway pull it off it would
make an epic "Show HN". Also, a lot of the required technologies are advancing
at an incredible rate (both the state of the art and public accessibility). It
may well be an order of magnitude easier now than it would have been a few
years ago.

I don't have any specific advice other than the usual for any large project:
start small and focused, then take it one step at a time. Oh, and read lots of
academic research papers.

Good luck!

~~~
nxzero
>> "start small and focused"

I'd even go as far as to say: use an existing platform like Amazon's Alexa with
their hardware to start.

Reading books like "Calm Technology" and books on lean startups would be
good too.

Books: [http://www.amazon.com/Calm-Technology-Principles-Patterns-Non-Intrusive/dp/1491925884](http://www.amazon.com/Calm-Technology-Principles-Patterns-Non-Intrusive/dp/1491925884)

[http://www.amazon.com/Running-Lean-Iterate-Works-OReilly/dp/1449305172](http://www.amazon.com/Running-Lean-Iterate-Works-OReilly/dp/1449305172)

------
hellcow
I'm building Abot, an open-source framework that makes it easy and fun to
build digital assistants. We try to handle a lot of the heavy lifting you
describe above, leaving you to create some cool things! You can see it here:
[https://github.com/itsabot/abot](https://github.com/itsabot/abot)

I set out six months ago to do exactly what you describe (build my own digital
assistant) but decided it would be more helpful to convert it into a framework
and open-source it.

I'm happy to help you get started via our IRC or the mailing list, or discuss
the challenges we've faced and specific solutions we chose and why.

~~~
skewart
That's awesome. I'm on my phone and just quickly read through the readme, but
it seems a bit like Hubot on steroids. In other words, as easy to deploy and
extend as Hubot, but with much more powerful NLP (and voice recognition too,
no?)

It's a fantastic idea. I wonder if there's a way to share the data generated
as people use it. I would imagine that everyone would benefit from this,
though it would have to be anonymized effectively.

~~~
hellcow
Thank you :) No voice support yet but it's easy to add your own drivers to
support external APIs. Official support for voice is coming in v.2 (ETA June
1st).

Yes, we will absolutely make it easy to share training data via opt-in! If I
train a plugin, everyone should benefit without needing a GPU rig or 2 TB of
data.

------
BinaryIdiot
Your challenges are almost immaterial compared with what goes into something
like Siri.

First you need a way to get input. Is that a text input or do you need voice
to text? Google and others can provide a voice to text API to help along here
at least at first.

Second you need to make that text meaningful. But what the heck is
"meaningful" in this context? Generically running an NLP over this is going to
get you sentence structure, etc which may or may not be all you need.

Essentially you need to take this text and turn it into a graph of decisions
which can then be executed.

Basically, distill the sentence "send a text to Josh saying hi" into a graph
that might have a branch like: Action -> Send -> SMS ->
("joshbsmith@yahoo.com", "hi")

Categorization is hard. If you can get it reasonable (85% solution) then
you'll be pretty good. This is how Siri essentially works.

Things to consider:

- "How long is the Titanic?" Is this referring to the movie or the ship?

- "Where is Aaron's?" Is this referring to the furniture store chain, a
friend in your contact list, or even a street name?

Source: been looking into this a ton. Wrote this during a hackathon:
[http://devpost.com/software/sim](http://devpost.com/software/sim). So this is
all very possible :)
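The "sentence to decision graph" step above can be sketched with a toy rule-based parser. This is only an illustration of the idea, not how Siri works: the regex pattern, the `CONTACTS` lookup, and the action-tuple shape are all made up for the example.

```python
import re

# Toy contact book; in a real assistant this would come from the device.
CONTACTS = {"josh": "joshbsmith@yahoo.com"}

def parse_command(text):
    """Distill a sentence like 'send a text to Josh saying hi' into a
    small action branch: ("Send", "SMS", recipient, body)."""
    m = re.match(r"send a text to (\w+) saying (.+)", text, re.IGNORECASE)
    if m:
        name, body = m.group(1).lower(), m.group(2)
        if name in CONTACTS:
            return ("Send", "SMS", CONTACTS[name], body)
    return None  # unrecognized -> fall back to search, clarification, etc.

print(parse_command("send a text to Josh saying hi"))
# ('Send', 'SMS', 'joshbsmith@yahoo.com', 'hi')
```

Real systems replace the regex with a trained intent classifier, but the shape of the output (an executable action with arguments) is the same.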

------
ecesena
If you'd like to start simple, you could skip the voice interface and go for a
text-based interface like Messenger.

Voice can be a layer that you put on top later, with the extra complexity that
the recognition may not be perfect. I guess it really depends which is the
most interesting aspect for you.

~~~
afrancis
With all the voice APIs out there, I would recommend choosing one and
diving in. One of the skills is getting comfortable with the interactions
between the voice API and the back-end.

------
doke01
I would check out Sirius (open-source Siri) if I were you:

[http://motherboard.vice.com/read/sirius-is-the-google-backed-open-source-siri?utm_source=howtogeek&utm_medium=email&utm_campaign=newsletter](http://motherboard.vice.com/read/sirius-is-the-google-backed-open-source-siri?utm_source=howtogeek&utm_medium=email&utm_campaign=newsletter)

[https://github.com/jhauswald/lucida](https://github.com/jhauswald/lucida)

[http://sirius.clarity-lab.org/](http://sirius.clarity-lab.org/)

------
dudurocha
If George Hotz came on HN and asked about the feasibility of building a self-
driving car by himself, people would say the same thing and tell him it was
impossible. Luckily for us, he never asked HN whether he could hack the iPhone
or the PlayStation 3; if he had, he might have been discouraged.

Anyway, geohot built a self-driving car and received investment from a16z to
hire more people and expand it further.
[http://www.bloomberg.com/features/2015-george-hotz-self-driving-car/](http://www.bloomberg.com/features/2015-george-hotz-self-driving-car/)

My advice to you is to study really hard and make it happen. Use other
people's work to leverage your software and go faster. And after that, show me
and the others what you have made.

------
falcolas
You can offload both 1 and 2 to third-party services. One of the major ones
with a properly public API that officially supports use by other parties (and
thus charges for high-volume use) is Microsoft's Project Oxford [0].

You could also use some of the Google APIs, but those are more-or-less
unsupported, and subject to change as Google needs to change them.

From there, it's a matter of transforming intent into action.

It's entirely possible to do, but you'll have a lot of learning to do to
implement it. One place you might start looking is at the Microsoft Python
blog [1].

[0] [https://www.microsoft.com/cognitive-services/](https://www.microsoft.com/cognitive-services/)

[1] [https://blogs.msdn.microsoft.com/pythonengineering/2016/02/15/talking-with-python-fabrikam-pizza-shack/](https://blogs.msdn.microsoft.com/pythonengineering/2016/02/15/talking-with-python-fabrikam-pizza-shack/)

~~~
web007
As much as possible, this.

If you, singular, are building Siri v2, you need to do as little as possible
yourself. Find an OSS library / tool that does the thing you want, use
Caffe/Theano/etc. instead of rolling your own code, use a commercial package
if you absolutely have to. Concentrate on the "hard" parts, understanding and
translating language to action. Build that part, and glue on all the other
stuff. After you get a working thing you can peel off the layers and replace
them with something of your own, maybe, or more likely something better will
come out by that time and you can plug it in with minimal effort.

------
ttn
I was one of those who wanted to build their own Jarvis. If you want it to do
certain pre-defined tasks, like opening Facebook or a website, it is not that
hard. However, if you want to create something like Siri, you need to use
almost everything humanity has in AI/ML to date.

~~~
existencebox
It seems to me though that there is a continuum between "multi-channel
interactive semi-declarative rule engine" (Frankly, the continuum probably
starts back in oldschool IRC bots and expert systems) and "strong AI", wherein
the bulk of the value, for me at least, is on the former side of that scale.

The vast majority of the pragmatic functionality I'd use on a day-to-day basis
could probably be built by someone with the willingness to learn about the
systems involved; no rocket science or all of humanity's ML required :) (I
certainly didn't know anything about scraping, relay control, or information
retrieval at the start of my pet project.)

I guess I'm saying this to address your comment and some other child comments
that seemed more dismissive of the idea due to its complexity (I responded to
yours largely because you had gone through the steps of "try it yourself",
which I see as the crux of all of this). To the OP: try it! What does it
matter if it's out of your grasp? Learn new shit. Realize the extents of your
knowledge and, in the process, push them a bit further. I would be hesitant to
ever call it failure if you learned even a little; I can't tell you the number
of pipe-dream impossible projects (microkernels, distributed OSes, rewriting
AML on my own platform, etc.) I've taken on, and despite failing at many of
them, the learnings were well worth the time, not to mention the occasional
surprising success at something you thought was out of your reach.

------
fourdoorsaloon
1\. Are there any other challenges that I have not considered?

Always. NLP is a valid way to accept requests. Web crawling is a valid way to
get data into the system. How do you tie these two together? How is the data
stored? How do you create relationships between the data in your system?

2\. Is the project feasible?

You have not yet defined your project beyond "like Siri or Cortana". What does
that mean to you? How will you know when that has been accomplished? Can you
first define a more limited scope for testing the feasibility of the different
parts of your system?

3\. Will it succeed?

First you must define success.

------
superice
At my company we did it reasonably well; we just released our product to the
market. We do have the advantage of an app-based model, which means that every
app the user installs on our product improves the speech recognition and
expands the set of available actions.

Getting started is not that hard; getting good is. It's a hard problem to
parse speech correctly. Take numbers, for example: "nineteen eighty-four" can
be parsed as 19 and 84, or as 1984, or as 9 10 80 4. There are challenges,
certainly, but creating a simple program with things like wit.ai is doable.
Prepare to write a lot of speech-parsing logic, and to implement every piece
of functionality by hand. Magic as Siri, Google Now, and Cortana may seem,
most of it is just hardcoded responses and actions. That need not be a
problem, but I can promise you, smart assistants will lose a lot of their
magic once you realize it's just a bunch of hardcoded responses and actions.
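The number ambiguity above can be made concrete with a tiny sketch. The word table and the two "readers" are purely illustrative, assuming already-tokenized input, and nowhere near a real spoken-number parser:

```python
# Map number words to values, then show two plausible readings of
# "nineteen eighty four": the year 1984 vs. a list of separate numbers.
WORDS = {"nineteen": 19, "eighty": 80, "four": 4}

def as_year(tokens):
    # Read "nineteen" as the century pair: 19 * 100 + 80 + 4 -> 1984
    return WORDS[tokens[0]] * 100 + WORDS[tokens[1]] + WORDS[tokens[2]]

def as_list(tokens):
    # Read each word independently -> [19, 80, 4]
    return [WORDS[t] for t in tokens]

tokens = "nineteen eighty four".split()
print(as_year(tokens))   # 1984
print(as_list(tokens))   # [19, 80, 4]
```

Picking between such readings is exactly the kind of hand-written disambiguation logic the comment warns about.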

Anyway, I don't want to discourage you from trying, because it's really
interesting to try and see what challenges you're going to encounter. The
getting started pack for speech recognition is mostly wit.ai, or the Google
STT engine. Keep in mind: none of the big companies are doing everything from
scratch. Sure, Google has their own speech recognition, but recognizing the
trigger phrase (jargon for 'OK Google', or 'Hey Siri') is outsourced. Every
piece of software that has such a trigger uses the same library. Remember:
using libraries is not cheating, it's just focusing on your core task, which
is writing the parsing and actions.

------
pjc50
"I would like to build a space programme in my back garden, but I'm wondering
about its feasibility and I don't have a background in rocketry"

Yes, no and no.

~~~
gamerDude
Your fake quote sounds a bit like what Elon Musk might've said when he started
SpaceX.

~~~
pjc50
He didn't start by asking around on HN. Generally people aren't Elon Musk.
Even he probably isn't Elon Musk all of the time.

~~~
awinter-py
A young composer met Mozart and said, 'I want to write a symphony, where do I
start?' Mozart spins him a story about starting with short pieces, then
apprenticing to a serious composer, mastering the different instruments, and
gradually working up to it.

The young composer is surprised: 'You didn't do all that.' 'Ah,' says Mozart,
'but I didn't ask anybody how.'

~~~
NhanH
I used to think that all the "geniuses" never asked anyone how to do things
and just went for it. It turns out that they do, and luckily for us, in the
internet age we have history.

[https://groups.google.com/forum/#!msg/comp.lang.java/aSPAJO05LIU/ushhUIQQ-ogJ](https://groups.google.com/forum/#!msg/comp.lang.java/aSPAJO05LIU/ushhUIQQ-ogJ)

~~~
awinter-py
I like how your interpretation of 'do things' is 'set the UA string in Java
spiders'. Einstein visited Bohr to rap about the speed of light, and Larry
Page asked a mailing list how to use Java libraries.

------
larryfreeman
The possibility of deep NLP suggests that there are tremendous opportunities
for building a personal assistant. For deep NLP, I would suggest course CS224D
at Stanford on YouTube (going on now, but also delivered last year):
[https://www.youtube.com/watch?v=kZteabVD8sU](https://www.youtube.com/watch?v=kZteabVD8sU)

For deep NLP, you will need to be solid on linear algebra and machine
learning. For an introduction to machine learning, check out Andrew Ng on
Coursera: [https://www.coursera.org/learn/machine-learning](https://www.coursera.org/learn/machine-learning). My favorite
talks on linear algebra are the ones by Gilbert Strang:
[http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/](http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/)

For web crawling, there are plenty of open source libraries. If you are not
familiar with it, check out the Common Crawl:
[http://commoncrawl.org/](http://commoncrawl.org/) This is a great source of
data to crawl.

If you focus solely on NLP and data from the Common Crawl (or even Wikipedia),
you will see where you stand as the smoke clears, and you'll become
comfortable with the state-of-the-art techniques.

Ignore all the naysayers. The good news is that it has never been easier to
get started in deep NLP. Once you have the experience of training a model and
seeing how well it works, you can decide on the next steps. Perhaps, you will
find a niche that is not well-served that you can go after as a first step of
a personal assistant.

Good luck.

------
blcArmadillo
To get an idea of where to start you could check out Sirius [1] which is "an
open end-to-end standalone speech and vision based intelligent personal
assistant (IPA) similar to Apple’s Siri, Google’s Google Now, Microsoft’s
Cortana, and Amazon’s Echo."

[1] [http://sirius.clarity-lab.org/](http://sirius.clarity-lab.org/)

------
galistoca
In this specific case, I would start by trying to come up with a solution to
an existing problem instead of building a solution looking for a problem.
Trying to build just another Siri would be a waste of your time, because
you're just one person competing with pretty much all the big tech companies;
whereas if you build a useful technology for a specific use case and people
like it, you can then build your platform on top of it. I've never seen any
small startup succeed by trying to build an ambitious "platform" from scratch.
Most successful platform companies started out as small application providers
and went from there. Some people are encouraging you to do it because that's
how you learn things, but I would rather make something people like AND learn
something new at the same time, instead of spending the same amount of time
working on something that will never see the light of day.

------
bpyne
Take a few minutes to discern your motivation.

If you're motivated by learning, you'll certainly do so. You'll probably learn
about topics you hadn't considered learning about. Start small and build
incrementally. Be patient, you're in for a lot of reading.

If you're motivated by profit, i.e. starting a company, then you need to be
strategic. The Big Guys - Apple, Google, MS, etc. - are mostly looking at
"general purpose" assistants. Keep in mind that they're putting massive effort
into these assistants because the technology will become a central part of
products like automobiles.

Perhaps you can find a niche market they're missing so that you can target
your research and coding efforts.

As others said, pick a small, focused goal in building an assistant and add
goals as you succeed.

------
V-2
For what it's worth, the BotBuilder from Microsoft is open-source:
[https://github.com/Microsoft/BotBuilder](https://github.com/Microsoft/BotBuilder)
\- could be worth a peek.

As the website states, this SDK is only "one of three main components of the
Microsoft Bot Framework", and I'm not sure if the other two are open-source,
but it shouldn't be all that significant given that it's just integration with
Microsoft products, like Office 365, plus the bot directory, which I imagine
consists of some ready-to-use bots one could possibly piggyback on.

This is not voice recognition, but once you're past that stage what you have
on your hands is a piece of text anyway - which has to be parsed and
processed.

~~~
frik
Please read up on what an SDK is. It's worthless in the OP's case.

------
raitom
If you don't want to start from scratch, you can use
[http://api.ai](http://api.ai) or [http://wit.ai](http://wit.ai).

------
amelius
The biggest challenge is probably covering all the corner cases. In this
regard, building such a thing is more about "caring/nursing" than it is about
"engineering".

But of course, there are some engineering challenges as well, although you
might want to use existing solutions (open source or APIs) for that. For
speech-to-text you can find many APIs. Here's a nice demo for doing NLP:
[http://nlp.stanford.edu:8080/corenlp/](http://nlp.stanford.edu:8080/corenlp/)

------
kleiba
That mostly depends on what scale you have in mind.

If you simply want to make an app for your own personal use, and you imagine a
restricted form of dialog (by which I mean e.g. "query/reply" or
"command"-type of dialogs as opposed to open discussions) to trigger a limited
set of actions (say, verbatim web searches, control the built-in functionality
of a smart phone, etc.) it is feasible.

That doesn't mean it's easy. But for a project to hack on, why not?

The good news is that there are a lot of tools that can do some of the heavy
lifting for you, especially if you restrict yourself to English. You are right
that the situation for other languages is not quite as luxurious, but there
are tools (of varying quality) for other languages as well, especially
Western European languages.

However, because it's a complex subject matter, expect that you might need to
first dig into some linguistic and/or NLP theory in order to get the most out
of these tools.

For instance, the Kaldi speech recognition toolkit is a state-of-the-art
research software for automatic speech recognition (ASR), and it's open
source. The thing is, to get really good recognition results, you might need
to train your own acoustic and language models. Hence, you'd need to learn
about these things.

For NLU (natural language understanding) there are also a bunch of free
software packages available; however, they often follow completely different
philosophies and goals. Thus, in order to make an informed decision which one
would be the best for you, you'd again have to be prepared to do some reading.

One quite user-friendly service for NLU you might want to check out is wit.ai
which was acquired by Facebook last year. They focus on setting the entrance
barrier really low for the task of turning spoken input into a domain
representation. For example, you can quite easily define rules that turn the
utterance "turn down the radio please" into a symbolic representation that you
can use in your downstream processing. The big plus here is that they do the
ASR for you, so you don't have to worry about that.

If you prefer to have more control over your tool chain, there are a wide
variety of scripting languages that you can use to get your feet wet. AIML is
sort of popular for writing bots, but it's quite limited and you have to write
rules in XML. VoiceXML is a standard that is great for form-filling
applications, i.e., situations where your system needs to elicit a specific
set of information required to run a task. A classic example would be
traveling: for your system to find a flight for you, it needs to know (a)
point of departure, (b) destination, (c) preferred date and time (perhaps
others). So you need to tell the system, or it has to ask for this
information.
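The form-filling loop described above can be sketched in a few lines. The slot names and prompts are taken from the flight example; everything else (the function names, the dict-of-slots representation) is an illustrative assumption, not VoiceXML itself:

```python
# A form-filling dialog keeps a set of required slots and asks about
# whichever one is still empty -- the flight-booking slots from the example.
REQUIRED = ["departure", "destination", "date"]

def next_prompt(slots):
    """Return the question for the first missing slot, or None when done."""
    for name in REQUIRED:
        if name not in slots:
            return f"What is your {name}?"
    return None  # all slots filled -> the system can run the task

slots = {"departure": "Berlin"}
print(next_prompt(slots))   # "What is your destination?"
slots["destination"] = "Oslo"
slots["date"] = "2016-06-01"
print(next_prompt(slots))   # None
```

VoiceXML adds a lot on top (grammars, barge-in, re-prompts), but this fill-the-missing-slot loop is the core of the pattern.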

There are also domain-specific languages like Platon
([https://github.com/uds-lsv/platon](https://github.com/uds-lsv/platon)) that, again, give you more
control but also try to make it quite easy to write a simple application.

The next aspect more complex dialog systems typically care about is the intent
of a specific user utterance. Say you ask your personal assistant, "do you
know when the next bus comes?"; you don't want it to answer "yes". That's
because your (what is called) "dialog act" was not a yes/no question, but a
request for information. So you might want to care about how to detect the
correct dialog act. Well, first you might want to care about what kinds of
dialog acts there are and which of them your system should be prepared to
handle.
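A crude version of dialog-act detection can be done with surface cues alone. The act labels and the prefix lists below are invented for the sketch; real systems use much richer act sets (e.g. ISO 24617-2) and statistical classifiers:

```python
def dialog_act(utterance):
    """Classify an utterance into a (very) coarse dialog act."""
    u = utterance.lower().strip()
    # "Do you know..." has yes/no surface form but is really a request,
    # so it must be checked before the generic yes/no patterns.
    if u.startswith(("do you know", "can you tell me", "could you tell me")):
        return "info-request"
    if u.startswith(("do ", "is ", "are ", "can ", "does ")):
        return "yes-no-question"
    if u.startswith(("when", "where", "what", "who", "how")):
        return "wh-question"
    return "statement"

print(dialog_act("Do you know when the next bus comes?"))  # info-request
```

The ordering of the rules is the whole point: the bus example only gets the right act because the indirect-request check fires before the yes/no check.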

There are many different dialog act sets developed for different domains and
situations. There's also an ISO standard (ISO 24617-2) that defines such a
set, but then you'd go into more advanced areas again.

Next, say your system has done all of the above processing, recognized speech,
analyzed the meaning, etc. -- now your system has to make the next move! So
how does it decide what's the best reaction? What's by some considered the
state-of-the-art for dialog management these days runs under the label POMDP
-- Partially Observable Markov Decision Processes. These are systems that
learn the best strategy on how to behave from data, typically using
reinforcement learning. But you also still have the more traditional
approaches in which an "expert" (in this case: you) authors the dialog
behavior somehow, and there are tools for that as well.

But again, the simpler languages mentioned above, e.g. Platon, also cover this
in a way. So don't get discouraged just because you've never heard of POMDPs,
or because you don't have the large data set that the machine-learning
approach requires: as with all of the different tasks here, there are always
alternatives.

Once your assistant has made up its mind about what to do and what to say, you
need to turn that into an actual utterance, right? If you just want to start,
having a large-ish set of canned sentences that you simply need to select from
can get you a long way. The next step would be to insert some variables into
those canned sentences that your system can fill depending on the situation.
That's called template-based natural language generation (NLG). More recently,
machine learning has also been applied with some success to NLG, but that's
(a) still researchy and (b) not even necessary for a first stab at writing a
dialog system.
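Template-based NLG as described above fits in a handful of lines. The template keys and slot names here are made-up examples:

```python
# Canned sentences with variables the system fills per situation.
TEMPLATES = {
    "next_bus": "The next bus to {destination} leaves at {time}.",
    "weather":  "It will be {condition} in {city} today.",
}

def generate(act, **slots):
    """Pick the canned sentence for this system act and fill its slots."""
    return TEMPLATES[act].format(**slots)

print(generate("next_bus", destination="downtown", time="14:05"))
# The next bus to downtown leaves at 14:05.
```

That really is all there is to a first template-based generator; the work goes into writing enough templates and choosing which one to use.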

Unless you just want to display the system utterance on the screen, you'd
finally need to use some text-to-speech (TTS) component to vocalize the system
utterance. There are some free options, such as Festival or MaryTTS, but
unfortunately, they don't quite reach the quality of commercial solutions yet.
But hey, who cares, right?

One topic I haven't talked about at all yet is uncertainty. Typically, a lot
of the steps on the input side of a dialog system use probabilistic approaches
because, starting from the audio signal, there's inevitably noise in the input
and so the outputs produced on the input side should always be taken with a
grain of salt. For ASR, you can often get not just one recognized utterance,
but a whole list of hypotheses on what it was the user actually said. Each of
these alternatives might come with a confidence score.

That, of course, has implications on all the processing that comes afterwards.
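One concrete way downstream processing can use an n-best list with confidence scores is to prefer the best-scoring hypothesis it can actually act on. The hypotheses, scores, and the `KNOWN_PHRASES` set below are invented for the sketch:

```python
# An ASR n-best list: each hypothesis carries a confidence score, and the
# understanding step should weigh alternatives, not blindly trust the top one.
hypotheses = [
    ("cod bay door", 0.48),
    ("pod bay door", 0.46),
    ("rod bay door", 0.06),
]

KNOWN_PHRASES = {"pod bay door"}  # things the back-end can actually handle

def pick(hyps):
    """Return the highest-confidence hypothesis the system understands,
    falling back to the raw top-1 if none of them are known."""
    for text, score in sorted(hyps, key=lambda h: -h[1]):
        if text in KNOWN_PHRASES:
            return text, score
    return max(hyps, key=lambda h: h[1])

print(pick(hypotheses))  # ('pod bay door', 0.46)
```

Here the top-1 result ("cod bay") is a misrecognition, and the slightly lower-scoring alternative wins because the system knows what to do with it.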

Now, I've written a whole lot -- and yet, there's so much more I haven't
touched yet, such as e.g. prosody processing, multimodality (e.g., using
(touch) gestures together with speech), handling of speech disfluencies,
barge-in, etc.

But I think that shouldn't keep you from just giving it a try. You don't have
to write a Siri clone in one weekend. Just like the first video game you write
doesn't have to be the next "Last of Us". You can start with Pac-Man just
fine, and likewise you can write your first small voice-based assistant that
cannot do half the stuff Siri can, and yet have a great time.

~~~
jesuslop
Wow, very thorough answer. I wonder what the reward is in a dialog POMDP;
where does it come from?

------
rocgf
Apple and Microsoft are investing many millions of dollars [citation needed]
in this, there's just no way that you can compete. Not that miracles don't
happen and it's only huge corporations that build amazing things, but
statistically speaking, the answer is a clear "No, it's not feasible".

If you, however, want to learn a lot, then it's a definite "Go for it".

------
facepalm
I think the hardest part might be getting lots of context information. Google
has a huge database, you probably don't. And Google has access to your
calendar, your contacts, your search requests, your location history - and the
same for your contacts. They can infer a lot from that.

------
leemalmac
The most epic challenge is to start, and then to get back to work when things
go wrong. Do it, you'll learn a lot. It will be hard. Post it on GitHub, ask
questions on SO, and I hope to see it here as 'Show HN: My f*ckn awesome
personal assistant' =)

------
kespindler
Email me kespindler at gmail. I have relevant work that you'd be interested to
hear about.

------
wslh
Re 2) Speech Recognition: it seems like using the new Google Speech API is the
way to go but I wonder if you can obtain access or if it is very expensive for
small companies. I applied a few weeks ago and haven't heard anything from
them.

------
engineer_steve
What is the scope of your project? I mean, is this an embedded system
(hardware/software)? Are you thinking of running it on a DSP/Arduino/Raspberry
Pi? There are a lot of open questions. I'm working on something like that.
Good luck, the road is long.

------
aryamaan
Hey, I am also thinking to work on something similar (as a side project). Do
let me know if you want to collaborate.

------
discordance
It's tricky but there are some nice ways to check the water quickly. Check out
LUIS - [https://www.luis.ai](https://www.luis.ai)

------
lumberjack
[https://en.wikipedia.org/wiki/Siri#Supported_languages](https://en.wikipedia.org/wiki/Siri#Supported_languages)

------
sidcool
To add a few, in addition to the ones you mentioned:

1\. Infrastructure

2\. Scaling

3\. Huge datasets for machine learning

------
coreyp_1
1\. Do you know what you are doing? (Not trying to be harsh; it's a serious
question.)

2\. Have you read any of the latest research in this area? (Think Google
Scholar.)

3\. Often, apps like Siri and Cortana do not process the sound on the device,
but rather send it to a server for processing. There are, of course,
exceptions in cases of small, well-defined applications where only limited
pattern matching is required.

4\. Whether it is feasible and/or will succeed depends on your ability,
performance, and approach. We cannot predict any of that here.

------
afrancis
Hi Mandela:

By Siri, I am assuming you mean an application that understands speech and
carries out commands, like answering queries? Answering queries is a specific
domain of NLP, as is developing the acoustic model (roughly, the part that
links the audio with linguistic units). You will find that there are many
sub-problems, and you have to decide which ones to tackle.

I have very limited experience in this field. If I didn't try out Nuance
NLU/MIX at a hackathon a few months ago (and get a lot of help from the Nuance
representatives), I would have never taken a stab at writing a voice
application.

I am currently building a simple request/response system. Requests of a "HAL,
open the pod bay door" nature. As opposed to something more free form and
conversational. One has to start somewhere.

_2\. Voice recognition: same as the above._

As many posters have commented, there are many speech recognition and
synthesis systems out there that handle the acoustic model. Some of these
systems handle languages other than English.

I have tried the Amazon Alexa SDK, Wit.ai and Nuance NLU/MIX. For the most
part, they are essentially a client/server model.

1\. At "compile" time, one builds up an acoustic model/language model with
samples (at least this is how Nuance works). Or maybe the language model
already exists, and training through additional use makes it better.

2\. At "runtime," a client such as a mobile application interacts with the
speech recognition system via an API. Audio is sent. The speech system parses
the audio stream and returns some data structure (or audio, if speech
synthesis is done). Or, in a slight variation, the speech system sends the AST
to an endpoint for back-end processing. Most of your application will be the
back-end that does the meaningful, domain-specific stuff. For instance, in
Alexa one develops "Skills".
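The back-end side of that runtime flow can be sketched generically. The JSON shape below (an `intent` plus `slots`) and the mapping logic are hypothetical, loosely modeled on what such services return, not any one vendor's actual API:

```python
import json

def handle_parse(parse):
    """Back-end: map the intent/slots returned by a speech service to an
    action. The intent and slot names here are made up for illustration."""
    if parse["intent"] == "OpenDoor" and parse["slots"]["door"] == "pod bay":
        return "opening pod bay door"
    return "sorry, I can't do that"

# What a speech service's response might look like for
# "open the pod bay door":
response = json.loads('{"intent": "OpenDoor", "slots": {"door": "pod bay"}}')
print(handle_parse(response))  # opening pod bay door
```

In Alexa terms, `handle_parse` is the body of a Skill; the service handles the audio and hands you structured data like this.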

One of the things I am learning is that there is not a clear cut distinction
between what is in the language model and in the back-end. Although the
language model allows it, I don't necessarily want to bake in business logic
(i.e. Pod bay Door maps onto part number 42). Also the back-end may have to do
additional NLP related processing ("Cod bay" is probably "Pod bay").

_NLP: I am not aware of any non-English library and don't have much
background on the subject._

I found the Stanford course cs224n to be super useful
([https://www.coursera.org/course/nlp](https://www.coursera.org/course/nlp)).
There is a book (Speech and Language Processing) associated with the course.

The Peter Norvig paper "How to write a Spell Corrector"
([http://norvig.com/spell-correct.html](http://norvig.com/spell-correct.html))
is very helpful.

Since I am working with Python, I use the book "Natural Language Processing
with Python" by Bird, Klein, and Loper and the Python NLTK.

Finally, I find that reading the documentation associated with the various
speech systems is very helpful. Adapt it to suit your needs. For instance,
Alexa provides UX guidelines like Voice Design Best Practices
([https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/alexa-skills-kit-voice-design-best-practices](https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/alexa-skills-kit-voice-design-best-practices)).

_3\. Web crawling: there are tons of libraries for doing crawling and I have
a decent understanding of the subject._

I think crawling is the least of your problems. Look at Week 8, Information
Retrieval of the Stanford NLP Course.

Mandela, at this point all I can say is: go for it! Look at the speech SDKs
out there and pick one, preferably one with a large community. Build something
small. See if you can work in a language and system you are comfortable with
(with Nuance, I could work mostly with Python and avoid Android and Java).

Have fun!!!

