
Show HN: An open-source, Raspberry-Pi-based Siri alternative - shbhrsaha
http://jasperproject.github.io/
======
apendleton
It's a shame that the quality of open-source text-to-speech engines is so much
worse than the current commercial state of the art, as that's the most notable
difference in the demo videos between this and something like Siri. Would
fixing that just be a matter of recording more high-quality free sample
libraries, etc., or are there fundamental technical challenges to solve?

~~~
ar7hur
It's mostly a question of training data. Google trains its acoustic models on
thousands of hours of annotated audio samples. It's very hard for an open
source project to (i) get enough data and (ii) have the computing power to
actually train with it.

~~~
jpalomaki
Sounds like this field could use some crowd sourcing.

What kind of data would be the most useful for learning? Would it be the same
phrases read by different people, single words, or is any text good? Do we need
lots of recordings from a single person, or many smaller samples from different
people?

I'm thinking of a web site where people could contribute to the project by
joining and then reading out phrases the system shows them.

~~~
cf
You mean something like [http://www.voxforge.org/](http://www.voxforge.org/)

Seriously, there is no massive CC-licensed source of audio data out there. Most
of the fancy algorithms for doing speech recognition are on GitHub. What isn't
is a massive and diverse dataset. I encourage others to reply if they have
seen otherwise.

------
scribu
It says it's 100% open-source, but I can't seem to find the sources for the
Jasper platform (not the Jasper client).

Even the "compile Jasper from scratch" installation method involves
downloading some binaries:
[http://jasperproject.github.io/documentation/software/#insta...](http://jasperproject.github.io/documentation/software/#install-
binaries)

edit: more specific link

~~~
shbhrsaha
The client code at [https://github.com/jasperproject/jasper-
client](https://github.com/jasperproject/jasper-client) comprises all of the
source. The modified Raspbian distribution linked to in the Software Guide
only includes supporting libraries and some configuration files.

~~~
scribu
Ok, but why do I have to both install some binaries and clone jasper-client?

[http://jasperproject.github.io/documentation/software/#insta...](http://jasperproject.github.io/documentation/software/#install-
binaries)

[http://jasperproject.github.io/documentation/software/#confi...](http://jasperproject.github.io/documentation/software/#configure-
jasper)

I think it would help if you briefly described in the docs what
[https://sourceforge.net/projects/jasperproject/files/usrloca...](https://sourceforge.net/projects/jasperproject/files/usrlocallib_binaries.tar.gz/download)
contains, since it's fairly large (75 MB).

Although smaller, the same goes for
[https://sourceforge.net/projects/jasperproject/files/usrloca...](https://sourceforge.net/projects/jasperproject/files/usrlocalbin_binaries.tar.gz/download)

Otherwise, great initiative!

~~~
shbhrsaha
Thanks! Yes, we'll be sure to clear that up -- the binaries were for a few
CMUCLMTK and Phonetisaurus libraries that were difficult to compile on the
Raspberry Pi.

------
kyle_martin1
I actually made this exact thing Saturday afternoon. Great job!

The first prototype was on an Arduino, and it eventually ran out of firmware
space. So I upgraded to the RPi, and from there it was a breeze.

Suggestions:

(1) Use Wit.ai for NLP. There is some added latency, but the capabilities far
outstrip Sphinx's in the long run. It's free. Less code to maintain. Easier to
deploy and distribute.

(2) Try to find a small mic so that you can put everything in a sleek package.

(3) Add support for bluetooth speakers (you're on a RPi, it's basically done
for you)

(4) 3D print a custom case, throw some 3M tape on it and it's ready to be
wall-mounted!

~~~
crm416
Great suggestions. I've been meaning to take a deeper dive into Wit.ai for a
while now. It seems like their intents-entities architecture would actually
fit in pretty cleanly with Jasper.

As an aside: I don't think it'd be difficult to develop Jasper modules that
use Wit without modifying much of the original source (as long as the speech-
to-text systems pick up on the text you'd need to pass to Wit).

------
tdicola
Nice project--I like the code, it's very clean and well structured. You should
check out the subversion trunk of pocketsphinx: it has support for keyword
spotting built in, so you can do things like instantly recognize the persona
keyword to enable the system, instead of running the transcription through
pocketsphinx and hoping for the best.

Unfortunately the keyword spotting stuff isn't documented yet, but check out
the code for my Demolition Man swear detector project which is using it:
[http://hackaday.io/project/531-Demolition-Man-Verbal-
Moralit...](http://hackaday.io/project/531-Demolition-Man-Verbal-Morality-
Statute-Monitor) The important bit is the ps_set_kws function call, which takes
either a text keyword or a filename with a list of keywords. Then, after
processing audio, call ps_get_hyp and it will return any spotted keywords. Check
out the code here in PocketSphinxKWS.h/cpp:
[https://github.com/tdicola/DemoManMonitor](https://github.com/tdicola/DemoManMonitor)
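The baseline approach this comment contrasts with (transcribe everything, then scan the text for the persona keyword) can be sketched in a few lines of Python; the names here are illustrative and not taken from the Jasper source:

```python
import re

# Illustrative sketch of the "transcribe and hope for the best" baseline:
# run full transcription, then look for the persona keyword in the result.
PERSONA = "JASPER"

def contains_persona(transcription, persona=PERSONA):
    """Return True if the persona keyword appears as a whole word in the text."""
    return re.search(r"\b" + re.escape(persona) + r"\b",
                     transcription, re.IGNORECASE) is not None
```

With true keyword spotting (ps_set_kws), the decoder watches the audio stream for the keyword directly, which avoids the full-transcription step entirely.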

------
sarvagyavaish
I am looking to use this / something similar at the startup I work at to
toggle our robot's operating modes. I have a question regarding the voice
recognition. Is the voice recognition stuff done on the Pi itself, or is there
a service that Jasper taps into to perform voice recognition?

~~~
fixedd
It's using Pocketsphinx:
[http://cmusphinx.sourceforge.net/](http://cmusphinx.sourceforge.net/)

------
atmosx
That's amazing, I'm impressed. I have no idea how voice recognition software
works, but is it possible, using this open-source project, to add other non-
Latin languages (e.g. Greek, Arabic, Japanese)?

~~~
gallamine
It uses the CMUSphinx project
([http://cmusphinx.sourceforge.net/wiki/](http://cmusphinx.sourceforge.net/wiki/)),
which is language-independent. You can download language models for English,
Chinese, French, Spanish, German, and Russian.

------
m4r71n
I'm not entirely clear on why it requires a WiFi adapter. Can you not use the
wired connection? Module writing looks pretty nifty, though; can't wait to give
it a try over the weekend.

~~~
shbhrsaha
Yes, you're right, it works absolutely fine over a wired connection; the WiFi
adapter is not required.

~~~
radiorental
How are you able to achieve this? My install says it's 'attempting to connect
with -SSID-' and fails, even with the WiFi adapter removed, and I've confirmed
it's got an Ethernet-issued IP address.

------
lukasm
Guys, could you tell me where you got the music for your demo? How much did
you pay? What was the process? Did you use something like Movie Maker?

I'm building a tool (prototype phase) for creating trail videos. One of the use
cases would be to create the demo movie for a product.

~~~
shbhrsaha
Sure, the music is from [http://www.jamendo.com/](http://www.jamendo.com/),
where some tracks can be used for non-commercial purposes.

~~~
lukasm
So you downloaded the mp3 and used a tool like Movie Maker to make the video?
Thanks!

------
nl
This looks pretty good.

I've been working on a freetext question answering service, but more the
question answering part (as opposed to the voice recognition side).

Looking at the documentation[1], it appears there is no way for it to handle
free-text questions ("What is the population of X?", where X is any country),
since all words need to be defined in advance. Is that correct, or am I missing
something?

[1]
[http://jasperproject.github.io/documentation/api/standard/](http://jasperproject.github.io/documentation/api/standard/)

------
ohblahitsme
This looks great! I'm only missing a USB microphone. I'll be sure to make this
once I get my hands on one.

It also seems pretty trivial to set up Wolfram Alpha on here. From what it
looks like, you'd just have to:

1) get a developer account at Wolfram Alpha

2) download this promising-looking module:
[https://pypi.python.org/pypi/wolframalpha/1.0.2](https://pypi.python.org/pypi/wolframalpha/1.0.2)

3) integrate it into Jasper (create a module)

I'll be sure to try it once I get it set up.
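Step 3 above might look roughly like the following. This is a hedged sketch that assumes Jasper's standard-module layout (WORDS, isValid, handle) and the `wolframalpha` package's Client/query interface; the `wolfram_app_id` profile key is a made-up name for illustration:

```python
import re

# Sketch of a Jasper standard module wrapping the `wolframalpha` package.
# The WORDS/isValid/handle layout follows Jasper's standard-module API;
# everything else here is an illustrative assumption.
WORDS = ["WOLFRAM", "CALCULATE"]

def isValid(text):
    """Jasper calls this to decide whether this module should handle the input."""
    return bool(re.search(r"\b(wolfram|calculate)\b", text, re.IGNORECASE))

def handle(text, mic, profile):
    """Send the query to Wolfram Alpha and speak back the first result."""
    import wolframalpha  # imported lazily so the module loads without the package
    client = wolframalpha.Client(profile["wolfram_app_id"])  # hypothetical key
    res = client.query(text)
    try:
        answer = next(res.results).text
    except StopIteration:
        answer = "I couldn't find an answer to that."
    mic.say(answer)
```

The lazy import keeps Jasper bootable even when the package isn't installed; only isValid runs on every utterance.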

------
izqui
This is pretty cool.

I would use Android's TTS (picotts). The audio quality is better.

------
cscurmudgeon
Serious question: Why does everyone seem to confuse speech recognition with
other parts of NLP (e.g. parsing)?

I can understand CNN or TechCrunch getting confused, but there seems to be a
universal confusion here on HN too.

Not ranting. It is a bit exasperating to read comments and articles addressing
only speech recognition. Siri is more than that.

------
murali44
Sweet! This is just what I was looking for to command my Sonos speaker to play
some music.

~~~
bambax
I'd be very interested in this; I've been looking for a Sonos API for a long
time and could only find an old Perl script that doesn't seem to be
maintained.

How do you plan on doing this?

Thanks!

~~~
murali44
This is a nice python library for controlling a Sonos speaker.

[https://github.com/SoCo/SoCo](https://github.com/SoCo/SoCo)

------
samstave
This would be an awesome voice-control addon for XBMC/media player services
from a Pi!

~~~
yaeger
I would love it if that worked offline, too. Even if it were just a very
limited use of voice recognition.

Siri has to be online because everything is processed on Apple's servers, since
there are so many things Siri could do. But if all I want is to be able to say
"play <NameOfMovie/TVShow>", I would love it if that could be done offline,
even if I had to train the system myself.

Say I have 30 TV shows in the library: I could see myself training the system
on the names of all of them if that meant I was able to actually start them via
voice.
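With a vocabulary that small, the matching side of this idea needs no cloud service at all. A toy sketch, with a made-up title list, using stdlib fuzzy matching to tolerate slightly garbled transcriptions:

```python
import difflib

# Illustrative library of known titles; in practice this would come from
# the media center's database.
LIBRARY = ["Breaking Bad", "The Wire", "True Detective"]

def match_play_command(transcription, library=LIBRARY):
    """Return the best-matching title for a 'play <title>' command, or None."""
    text = transcription.lower()
    if not text.startswith("play "):
        return None
    query = text[len("play "):]
    matches = difflib.get_close_matches(
        query, [t.lower() for t in library], n=1, cutoff=0.6)
    if not matches:
        return None
    # Map the lowercase match back to the original title.
    return next(t for t in library if t.lower() == matches[0])
```

The recognizer would still need a matching language model, but constraining it to ~30 titles plus a few verbs is exactly the kind of small-grammar task offline engines like pocketsphinx handle well.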

------
tylercrumpton
Great work! What sort of recognition distance were you able to get with that
microphone?

~~~
shbhrsaha
Thanks. It depends on the conditions, but it works most of the time from
around 10-15 feet away!

------
jkldotio
Excellent work. I've always wanted to see a real-world use of pocketsphinx
with Python. When I looked a year ago, the documentation was lacking. The
module system looks nicely extensible as well.

------
rob_mccann
I had a very similar (albeit less-complete) hack a while back:
[https://github.com/rob-mccann/Pi-Voice](https://github.com/rob-mccann/Pi-
Voice)

------
sp332
Would it be easy to get this running on a normal PC, without a RPi?

~~~
0X1A
The RPi is essentially a PC; the only real difference is that it uses an ARM
processor. So yes, I'd assume it would be rather easy.

~~~
sp332
Thanks. I've been looking at the code, and aside from a list of dependencies
in client/requirements.txt, it looks pretty simple.

------
endeavour
Is there a way to make this subtract any noise being output by the Pi's
speaker, so that it can still understand me if I'm playing music or watching a
movie?

~~~
kyle_martin1
Well, yes, this can be done. Noise cancellation and voice extraction
algorithms aren't implemented here, but it's totally possible. It wouldn't be
too bad to implement if you have a decent understanding of DSP.
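The core trick is that the Pi knows exactly what it is playing, so it can adaptively estimate the echo of that signal in the mic input and subtract it. A toy sketch of the idea using a normalized LMS filter (real acoustic echo cancellation is considerably more involved):

```python
def lms_cancel(mic, playback, taps=4, mu=0.5):
    """Subtract an adaptively estimated echo of `playback` from `mic`.

    mic:      list of mic samples (speech + echo of playback)
    playback: list of samples the Pi is known to be playing
    Returns the echo-suppressed signal (the filter's error signal).
    """
    w = [0.0] * taps       # adaptive filter weights
    out = []
    for n in range(len(mic)):
        # Recent playback samples seen by the filter.
        x = [playback[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        est = sum(wk * xk for wk, xk in zip(w, x))   # estimated echo
        e = mic[n] - est                             # residual = output
        out.append(e)
        # Normalized LMS weight update.
        norm = sum(xk * xk for xk in x) + 1e-9
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, x)]
    return out
```

In a real setup the echo path includes the speaker, room, and mic responses, so many more taps and double-talk detection would be needed, but the adaptation loop is the same shape.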

------
gyosko
It would be cool to trigger commands with my smartphone: I say "Open the
door", it sends it to my Raspberry Pi, which then opens my door. Is this
possible?
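One way to wire this up: let the phone do the speech recognition and send the recognized text to a tiny HTTP endpoint on the Pi, which maps known phrases to actions. Everything below (the phrase table, the `say` query parameter, the action names) is made up for illustration; the phone-side app is assumed:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

# Hypothetical mapping from recognized phrases to actions on the Pi.
COMMANDS = {"open the door": "unlock", "close the door": "lock"}

def route_command(text):
    """Map a recognized phrase to an action name, or None if unknown."""
    return COMMANDS.get(text.strip().lower())

class CommandHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Expects requests like: GET /?say=open+the+door
        query = parse_qs(urlparse(self.path).query)
        action = route_command(query.get("say", [""])[0])
        self.send_response(200 if action else 404)
        self.end_headers()
        self.wfile.write((action or "unknown").encode())

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), CommandHandler).serve_forever()
```

The actual door-opening hook (GPIO, relay, etc.) would go wherever the action name is resolved; authentication would also be essential before pointing anything lock-related at the open network.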

------
drincognito
How accurate is the recognition (could you use it for dictation?), and how
long does it take between the end of a command and the recognition/parsing of
said command?

~~~
JetSpiegel
If you want to do dictation, you can use the Sphinx4 project. It's in Java,
and you have to write a bit of code, but it's an offline recognizer: you
record the audio first, and it doesn't run in real time.

------
kylemaxwell
Excellent, I've pondered doing something like this with a BeagleBone Black.
Can't wait to try it out and see what I can do.

~~~
gourneau
I want to get this running on my BBB. Hopefully it won't be that hard. It
looks like mostly Python code, plus a mountain of dependencies.

------
achalkley
An open source project pitched like a product. This is how you do it, peeps!

I like this a lot and I'm encouraged to see it.

------
raghavsethi
Great work guys! Looking forward to playing with this!

------
nmadhavan
Seems clean, useful, and well-documented!

------
smallfluffycat
I need a French acoustic model, if anyone has one...

~~~
nshm
A French acoustic model for CMUSphinx is available in the downloads:
[https://sourceforge.net/projects/cmusphinx/files/Acoustic%20...](https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/French%20F0%20Broadcast%20News%20Acoustic%20Model/)
You need to use it with the French dictionary:
[http://sourceforge.net/projects/cmusphinx/files/Acoustic%20a...](http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/French%20Language%20Model/frenchWords62K.dic/download)

------
bliker
I think this is a great project! But the Raspberry Pi might be a bit
overpriced/overpowered for this task. Maybe something like the Arduino Yún
would be a more appropriate choice. I am really hoping this movement of small
GNU/Linux-based home appliances takes off and lowers the price.

~~~
tylercrumpton
How would using a Yun be cheaper than a Raspberry Pi? The Yun is currently
twice the cost of the RPi, and even taking into account the WiFi adapter and
SD card, the RPi comes out far ahead.

You would also need to take the time to write drivers for the USB microphone
and do realtime speech recognition, web traffic, and text-to-speech on the
Yun's little 16MHz/400MHz processors.

This project is perfect for single-board computers like the RPi, and I can't
imagine that you would get that far with a microcontroller.

