
Show HN: JuliusJS – Speech recognition in JavaScript - zzmp
https://github.com/zzmp/juliusjs
======
zzmp
There is now a (very rudimentary) demo on the GitHub page:
zzmp.github.io/juliusjs

Much thanks to @iffy for writing the first pass.

It uses voxforge's sample vocabulary, so you'll need to say things like "Dial
1 2 3" or "Call Kenneth McDougall" for it to understand you, but the
vocabulary is easily swapped out for your own projects, as explained in the
README.

------
ar7hur
Thanks for sharing, nice work!

Quick question: the Julius website says there is no English acoustic model
available [1]. How did you solve this? Do you provide a default acoustic
model?

[1]
[http://julius.sourceforge.jp/en_index.php?q=en_grammar.html](http://julius.sourceforge.jp/en_index.php?q=en_grammar.html)

~~~
zzmp
I used voxforge[1], a project made specifically to solve this problem:

> VoxForge was set up to collect transcribed speech for use with Free and Open
> Source Speech Recognition Engines (on Linux, Windows and Mac).

VoxForge's sample grammar is provided as a default, in its own folder [2]. It
would be nice to get a high-quality acoustic model, as voxforge's is not that
comprehensive yet, but I couldn't find anything with the right licensing and
zero cost. If anyone knows of one, I'd love to hear about it.

[1] [http://www.voxforge.org/](http://www.voxforge.org/) [2]
[https://github.com/zzmp/juliusjs/tree/master/dist/voxforge](https://github.com/zzmp/juliusjs/tree/master/dist/voxforge)

------
zzmp
Creator here - I ported this over from the open-source Julius using
emscripten. AMA

~~~
cjbprime
Nice, thanks for doing this! Are you aware of any sites using speech rec for
e.g. page navigation?

~~~
zzmp
No, but I think it's a great idea. If you end up using JuliusJS for this,
please let me know. I still need to make some example applications to showcase
what JuliusJS can do - I'll have them listed in the README when they're done.

I also came across PocketSphinx, which has been around a little longer - that
may have some users in the wild already, and I wouldn't be surprised if it was
used for navigation somewhere.

[PocketSphinx]
[https://github.com/syl22-00/pocketsphinx.js/](https://github.com/syl22-00/pocketsphinx.js/)

~~~
cjbprime
Sounds good, thanks! Also, I think you'd get more interest here if there were
a simple demo page that people could click through to -- that's the nice thing
about JS, after all. Maybe something hosted on a github.io page that just
transcribes into a textarea?

~~~
WhitneyLand
Yes, don't worry too much about fancy samples; a textarea would still be great.
Nice work.

------
danso
This is sweet...to get an idea of how much fun this could be for web apps,
check out the Annyang library
([https://www.talater.com/annyang/](https://www.talater.com/annyang/)), which
wraps around the Google Web voice recognition API...it works very well, but of
course, is subject to Google's terms...so an open source system is very
welcome

~~~
_pius
Strictly speaking, Annyang wraps the HTML5 speech recognition API, not
Google's specifically.

------
yeukhon
Pretty cool! When I did my project I had to use
[https://github.com/kn/speak.js](https://github.com/kn/speak.js), which is an
amazing library. It still worked on Firefox 30 and 31 when I
finished my project (and the project itself hasn't changed much in a year or
two!).

I would definitely give this JuliusJS library a try. I am actually amazed that
JuliusJS doesn't carry all the heavy data that speak.js does (though speak.js
does support multiple languages). I love the fact that you state it's 100%
client side!

------
hugozap
Great work, this will be another cool library I star on GitHub and never do
anything with :/

~~~
cue232s
LOL!! IKR

------
bubee
Nice work. Can it return confidence scores? Say I want to load 3 commands in
my page:

1. Click blue button
2. Scroll down in the yellow text area
3. Expand image of man

I feed those to the engine, and when somebody speaks, I get a confidence score
on each word so I can determine, with a configurable level of certainty, that
the user is using the command: {click: 0.9878 confidence, blue: 0.8789
confidence, button: 0.1889 confidence}

Something like that...
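The matching described above can be sketched in a few lines. This is a minimal, hypothetical sketch: the per-word scores follow the shape in the comment (fractions between 0 and 1), while `matchCommand`, the command table, and the thresholds are made up for illustration and are not part of the JuliusJS API. A command is treated as only as certain as its least-certain keyword.

```javascript
// Hypothetical command table: each command lists the keywords that must be heard.
// Not part of JuliusJS; illustrative only.
const commands = {
  clickBlueButton: ["click", "blue", "button"],
  scrollYellowArea: ["scroll", "down", "yellow", "text", "area"],
  expandManImage: ["expand", "image", "man"],
};

// Returns the command whose weakest keyword score is highest, provided that
// weakest score reaches `threshold`; returns null if no command qualifies.
function matchCommand(wordScores, commands, threshold) {
  let best = null;
  let bestScore = -Infinity;
  for (const [name, words] of Object.entries(commands)) {
    // Words that were not heard at all count as confidence 0.
    const weakest = Math.min(...words.map((w) => wordScores[w] ?? 0));
    if (weakest >= threshold && weakest > bestScore) {
      best = name;
      bestScore = weakest;
    }
  }
  return best;
}

// Per-word scores in the shape from the comment above.
const heard = { click: 0.9878, blue: 0.8789, button: 0.1889 };
console.log(matchCommand(heard, commands, 0.8)); // null: "button" is below 0.8
console.log(matchCommand(heard, commands, 0.15)); // "clickBlueButton"
```

With a threshold of 0.8 nothing matches because "button" scored only 0.1889; lowering it to 0.15 accepts `clickBlueButton`. Taking the minimum is a conservative choice; averaging the keyword scores would be more forgiving.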

~~~
zzmp
It does post them back from the worker, but the Julius interface doesn't
expose them (yet). The way that Julius deals with confidence scores is also a
little different (they're not fractional), so you'd need to account for that.

I'll be sure to include them soon - it's probably just a few more lines of
code, so you can expect them in the onrecognition function this afternoon.
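If the raw scores are not fractions, one common way to account for that is to normalize them across the competing candidates. The sketch below assumes, without confirming Julius's exact score semantics, that the scores behave like log-likelihoods (larger means a better match); the sentences and numbers in `rawScores` are made up. A softmax turns such scores into fractions that sum to 1.

```javascript
// Hypothetical raw scores for two competing sentences. Assumption: they act
// like log-likelihoods, so larger (less negative) means a better match.
const rawScores = {
  "DIAL ONE TWO THREE": -3120.5,
  "CALL KENNETH MCDOUGALL": -3135.2,
};

// Softmax: converts log-domain scores into fractions that sum to 1.
// Subtracting the maximum first keeps Math.exp from underflowing to 0.
function softmax(scoresByLabel) {
  const labels = Object.keys(scoresByLabel);
  const values = labels.map((label) => scoresByLabel[label]);
  const max = Math.max(...values);
  const exps = values.map((v) => Math.exp(v - max));
  const total = exps.reduce((sum, e) => sum + e, 0);
  const result = {};
  labels.forEach((label, i) => {
    result[label] = exps[i] / total;
  });
  return result;
}

console.log(softmax(rawScores));
```

Because whole-utterance log-likelihoods can differ by large amounts, the resulting fractions are often very close to 0 or 1; dividing the scores by a tuning constant before the softmax is one way to soften that.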

~~~
zzmp
OK it's done! Not well documented yet - that will wait for another day - but
you can now access the score through the `onrecognition` event.

~~~
bdevani
Fantastic project and response time. Congrats on this.

~~~
zzmp
I'm literally sitting here hitting refresh. Thanks.

------
jergason
I've played with pocketsphinx.js a fair amount, but this looks WAAAAAAAAAY
easier to set up and consume. Nice work.

~~~
zzmp
pocketsphinx.js looks amazing, but there's definitely a barrier to entry if
you've never worked with speech recognition before. That was my biggest goal
in porting this tool - a nice, abstracted API. Glad you like it :)

------
Gonzih
Is there an online demo anywhere on the web?

~~~
zzmp
I'll try to whip one up today and post it to the README.

~~~
Gonzih
Great, thanks!

------
sunsu
Can you use any of the CMUSphinx compatible language models with this, or is
there a tool to convert them to something Julius supports?

~~~
zzmp
I believe that Julius offers their own tools for conversions like these, but
the best place to consult would be the JuliusBook [1].

[1]
[http://julius.sourceforge.jp/juliusbook/en/](http://julius.sourceforge.jp/juliusbook/en/)

------
bikamonki
Can this be used to detect a voice's unique digital signature? For example, I
just say my name to log in to a website?

~~~
crimsonalucard
I don't think voices have unique signatures. Voices vary widely but they can
still share identical properties and be imitated.

~~~
ar7hur
Voices do have a unique signature. The technology to identify it is called
Speaker Identification. See for instance
[http://research.microsoft.com/en-us/projects/whisperid](http://research.microsoft.com/en-us/projects/whisperid)

> Each person's voice is different. Some sounds, like "s", sound about the
> same no matter who says them, but other sounds, like vowels, tend to differ
> a lot from person to person. We use a special way of representing sound, the
> cepstrum, that captures lots of information, including the characteristic
> way you pronounce your vowels. Of course, someone could imitate the way you
> talk; fortunately, the cepstrum also captures certain fundamental
> characteristics of voices that are impossible to change. For instance, the
> length of your vocal tract -- the place where sound is produced in your body
> -- cannot be changed, and different length vocal tracts tend to produce
> cepstra with different characteristics. By identifying both the way you
> talk, and the way your body produces sound, WhisperID can do a great job of
> figuring out who you are.
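To make the quote concrete, here is a textbook real cepstrum: the inverse Fourier transform of the log magnitude spectrum. Taking the log turns the product of the source spectrum and the vocal-tract filter response into a sum, which is why cepstral coefficients help separate the two. This only illustrates the representation and is not WhisperID's pipeline; practical systems add framing, windowing, and mel-scaled filterbanks (MFCCs), and use an FFT instead of this naive DFT.

```javascript
// Naive O(n^2) discrete Fourier transform of a complex sequence (re, im).
// Fine for a short illustrative frame; real code would use an FFT.
function dft(re, im) {
  const n = re.length;
  const outRe = new Array(n).fill(0);
  const outIm = new Array(n).fill(0);
  for (let k = 0; k < n; k++) {
    for (let t = 0; t < n; t++) {
      const angle = (-2 * Math.PI * k * t) / n;
      outRe[k] += re[t] * Math.cos(angle) - im[t] * Math.sin(angle);
      outIm[k] += re[t] * Math.sin(angle) + im[t] * Math.cos(angle);
    }
  }
  return [outRe, outIm];
}

// Real cepstrum: inverse DFT of the log magnitude spectrum.
function realCepstrum(frame) {
  const n = frame.length;
  const [re, im] = dft(frame, new Array(n).fill(0));
  // Small epsilon avoids log(0) for bins with no energy.
  const logMag = re.map((r, k) => Math.log(Math.hypot(r, im[k]) + 1e-12));
  // The log magnitude of a real frame's spectrum is real and even, so its
  // inverse DFT is real and only the cosine terms survive.
  const ceps = new Array(n).fill(0);
  for (let t = 0; t < n; t++) {
    for (let k = 0; k < n; k++) {
      ceps[t] += logMag[k] * Math.cos((2 * Math.PI * k * t) / n);
    }
    ceps[t] /= n;
  }
  return ceps;
}

const frame = [1, 0.5, 0.25, 0.125, 0.0625, 0, 0, 0];
const louder = frame.map((x) => 2 * x);
console.log(realCepstrum(frame));
console.log(realCepstrum(louder)); // differs from the line above only in coefficient 0 (up to rounding)
```

One easy property to check: scaling a frame by a constant gain g only adds log(g) to coefficient 0, so overall loudness is isolated in a single coefficient while the rest describe the spectral shape.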

~~~
woodson
As someone working in that area, I have to disagree. Sure, speaker recognition
can work very well under certain circumstances, but there is no _unique_
signature. (that would imply that you could successfully discriminate between
any two speakers in the world, irrespective of any other factors)

~~~
ar7hur
OK, maybe _unique_ was a strong word. But banks are starting to use speaker ID
to log in to mobile apps, so the signature is, let's say, unique enough for
practical applications.
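The disagreement above maps onto how speaker verification is usually framed: the system compares features from a login attempt with an enrolled template and accepts if a similarity score clears a threshold. The sketch below uses cosine similarity on made-up feature vectors; `verifySpeaker` and the threshold are hypothetical, and real systems derive these vectors with far more elaborate models. Raising the threshold reduces false accepts at the cost of more false rejects, which is the practical sense of "unique enough".

```javascript
// Cosine similarity between two equal-length feature vectors (1 = same direction).
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical verification step: accept the claimed identity if the attempt
// is similar enough to the enrolled template. The threshold is a tuning knob.
function verifySpeaker(enrolled, attempt, threshold) {
  return cosineSimilarity(enrolled, attempt) >= threshold;
}

// Made-up feature vectors for illustration only.
const enrolled = [0.9, 0.1, 0.4, 0.3];
const attempt = [0.85, 0.15, 0.35, 0.3];
console.log(verifySpeaker(enrolled, attempt, 0.95)); // true
console.log(verifySpeaker(enrolled, [0.1, 0.9, 0.2, 0.6], 0.95)); // false
```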

------
cue232s
Does your application need a nodejs backend for this library to work?

~~~
prezjordan
Looks like they're just providing examples on serving the library on a
webserver. It's all static JavaScript.

~~~
zzmp
Right, it's all done on the client, so it is server agnostic.

~~~
cue232s
Thanks for this lib!

------
borplk
Genuinely not sure if the demo is a joke or not.

I said "hello" it said "DIAL OH OH".

I said "Apple" it said "GET KENT".

WTF?

~~~
zzmp
In order to get a more comprehensive vocabulary/grammar, you need to
substitute out the sample that it comes with. There are instructions in the
README. For the demo, it just uses the sample grammar that voxforge provides,
which is (as you can see) fairly limited.
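The "DIAL OH OH" behavior follows from using a closed grammar: the recognizer can only answer with a sentence the grammar allows, so out-of-grammar speech gets forced onto the closest allowed sentence. The toy below is an analogy, not Julius's decoder: it compares text with Levenshtein edit distance rather than scoring audio, and the grammar phrases are hypothetical.

```javascript
// Classic dynamic-programming Levenshtein (edit) distance between two strings.
function levenshtein(a, b) {
  // dp[i][j] = edits needed to turn the first i chars of a into the first j chars of b.
  const dp = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = 0; i <= a.length; i++) dp[i][0] = i;
  for (let j = 0; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + cost // substitution (or match)
      );
    }
  }
  return dp[a.length][b.length];
}

// Toy closed-grammar "recognizer": it can only ever answer with a phrase from
// `grammar`, so it returns the nearest one no matter how far off the input is.
function closestInGrammar(input, grammar) {
  let best = grammar[0];
  let bestDist = Infinity;
  for (const phrase of grammar) {
    const dist = levenshtein(input.toUpperCase(), phrase);
    if (dist < bestDist) {
      best = phrase;
      bestDist = dist;
    }
  }
  return best;
}

// Hypothetical grammar, loosely modeled on phrases mentioned in this thread.
const grammar = ["DIAL ONE TWO THREE", "CALL KENNETH MCDOUGALL", "DIAL OH OH"];
console.log(closestInGrammar("hello", grammar));
console.log(closestInGrammar("apple", grammar));
```

Whatever the input, `closestInGrammar` returns one of the three phrases, which is why swapping in a grammar that covers your own vocabulary matters.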

------
cssandjs
This is awesome - who cares about Windows 10 or Linux? Javascript is the new
OS.

~~~
zzmp
It's definitely a fun little toy...

[http://node-os.com/](http://node-os.com/)

[http://osjsv2.0o.no/](http://osjsv2.0o.no/)

------
CmonDev
Wow, speech recognition in a Turing-complete language. Amazing.

------
kelvin0
This is the kind of technological challenge which must be fun to complete. And
it must be quite satisfying for the author. However, whenever I see a 'XYZ in
pure javascript', I keep getting the impression we are only delaying the
inevitable moment when browsers have to step up to a superior language. Kinda
like how quickly ripping off a bandaid is better than slooowwwwllly removing
it ....

~~~
evan_
Seeing cool things written in JavaScript makes you think that JavaScript is
doomed?

~~~
dubcanada
Not to be that guy, but it wasn't written in JS it was transpiled from C to
JS.

~~~
kelvin0
My point exactly. The end result being we are solidly entrenching Javascript
into browsers, not the contrary. So instead of starting with a proper hammer
for nails, we are slowly turning a screwdriver into a blunt object which can
neither screw nor hammer nails properly.

