Show HN: JuliusJS – Speech recognition in JavaScript (github.com/zzmp)
202 points by zzmp on Oct 3, 2014 | 52 comments



There is now a (very rudimentary) demo on the GitHub page: zzmp.github.io/juliusjs

Many thanks to @iffy for writing the first pass.

It uses voxforge's sample vocabulary, so you'll need to say things like "Dial 1 2 3" or "Call Kenneth McDougall" for it to understand you, but the vocabulary is easily swapped out for your own projects, as explained in the README.


Thanks for sharing, nice work!

Quick question: the Julius website says there is no English acoustic model available [1]. How did you solve this? Do you provide a default acoustic model?

[1] http://julius.sourceforge.jp/en_index.php?q=en_grammar.html


I used voxforge[1], a project made specifically to solve this problem:

> VoxForge was set up to collect transcribed speech for use with Free and Open Source Speech Recognition Engines (on Linux, Windows and Mac).

VoxForge's sample grammar is provided as a default, in its own folder [2]. It would be nice to get a high-quality acoustic model, as voxforge's is not that comprehensive yet, but I couldn't find anything with the right licensing and zero cost. If anyone knows of one, I'd love to hear about it.

[1] http://www.voxforge.org/ [2] https://github.com/zzmp/juliusjs/tree/master/dist/voxforge


Creator here - I ported this over from the open-source Julius using emscripten. AMA


Hi! Just curious on general emscripten work - how long does it take to port something like this? What are the main hurdles, and things that you can get stuck on?


This was my first emscripten project - I would definitely recommend it. I found it to be a great tool. Two things to point out:

- If something is not documented well, it is probably tested well. If you can find the tests that cover it, you can usually get a good idea of how it works.

- There is no multithreading. I had to fake it by breaking up my loops with setTimeouts.
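For anyone curious, the setTimeout trick looks roughly like this - a sketch, not the actual JuliusJS code. A long-running loop is split into bounded chunks, and setTimeout yields control back to the event loop between chunks:

```javascript
// Sketch of faking cooperative multithreading (illustrative workload:
// summing a large array without blocking the event loop).
function sumInChunks(data, chunkSize, done) {
  let total = 0;
  let i = 0;
  function step() {
    // Do a bounded slice of work synchronously...
    const end = Math.min(i + chunkSize, data.length);
    for (; i < end; i++) total += data[i];
    if (i < data.length) {
      setTimeout(step, 0); // ...then yield and resume on the next tick
    } else {
      done(total);
    }
  }
  step();
}
```

With the chunk size tuned to a few milliseconds of work, audio callbacks and UI events get a chance to run between chunks.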

There is an IRC channel if you need a place to go for help. Also, feel free to PM.


This is absolutely amazing, great work. This will open up a whole new world of possibilities.

In my time using offline speech-recognition tech, I never was able to use julius properly (docs were a little lacking) so I jumped on to CMUSphinx/pocketsphinx, but I think you've just done a huge amount of work to bring Julius out of obscurity (at least in my mind). Thanks very much

[EDIT] - I really can't get across enough how awesome this is, please add a gittip link or bitcoin address or something


Nice, thanks for doing this! Are you aware of any sites using speech rec for e.g. page navigation?


No, but I think it's a great idea. If you end up using JuliusJS for this, please let me know. I still need to make some example applications to showcase what JuliusJS can do - I'll have them listed in the README when they're done.

I also came across PocketSphinx, which has been around a little longer - that may have some users in the wild already, and I wouldn't be surprised if it was used for navigation somewhere.

[PocketSphinx] https://github.com/syl22-00/pocketsphinx.js/


Sounds good, thanks! Also, I think you'd get more interest here if there were a simple demo page that people could click through to -- that's the nice thing about JS, after all. Maybe something hosted on a github.io page that just transcribes into a textarea?


Yes, don't worry too much about fancy samples - a textarea would still be great. Nice work.
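For reference, the wiring for such a demo could be as small as this - assuming the Julius instance and `onrecognition` callback described elsewhere in this thread (names not verified against the source):

```javascript
// Sketch: pipe each recognized sentence into a <textarea>.
// `julius.onrecognition` is assumed from this thread's description of the API.
function wireTranscript(julius, textarea) {
  julius.onrecognition = (sentence) => {
    textarea.value += sentence + '\n';
  };
}

// In a page, something like:
//   wireTranscript(new Julius(), document.querySelector('textarea'));
```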


This is the ultimate accessibility. Imagine a blind person speaking to the website and having the website talk back.


I'm using Chrome's speech recognition engine for a virtual-reality-in-WebGL thing I'm building. It's a bit annoying, as it requires a network connection, and is rather buggy (I end up crashing Chrome about one in every ten sessions when using it - all of Chrome, every tab). Something like this would fit my needs a lot better.


The PIA Chrome extension enables you to open and manage some sites with speech. It was done at a hackathon using wit.ai

http://getpia.com


This is sweet...to get an idea of how much fun this could be for web apps, check out the Annyang library (https://www.talater.com/annyang/), which wraps around the Google Web voice recognition API...it works very well, but of course, is subject to Google's terms...so an open source system is very welcome


Strictly speaking, Annyang wraps the HTML5 speech recognition API, not Google's specifically.
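For comparison, a minimal use of that API looks like this. It is browser-only; Chrome exposes it under the `webkitSpeechRecognition` prefix and streams the audio to Google's servers, which is the network dependency mentioned above:

```javascript
// Sketch: the HTML5 Web Speech API that Annyang wraps (browser-only).
function listenOnce(onResult) {
  const SR = window.SpeechRecognition || window.webkitSpeechRecognition;
  const rec = new SR();
  rec.lang = 'en-US';
  // Fires with the best transcript for the first recognized phrase.
  rec.onresult = (e) => onResult(e.results[0][0].transcript);
  rec.start(); // prompts for microphone access
}
```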


Pretty cool! When I did my project I had to use https://github.com/kn/speak.js which is an amazing library. The library still worked on Firefox 30 and 31 by the time I finished my project (and the project itself hasn't changed much in a year or two!).

I would definitely give this JuliusJS library a try. I am actually amazed that JuliusJS doesn't carry all the heavy data like speak.js does (speak.js does support multiple languages, though). I love the fact that you state it's 100% client-side!


Great work, this will be another cool library i star on GitHub and never do anything about it :/


LOL!! IKR


Nice work. Can it return confidence scores? Say I want to load 3 commands in my page:

1. Click blue button
2. Scroll down in the yellow text area
3. Expand image of man

I feed those to the engine, and when somebody speaks, I get a confidence score on each word so I can determine with a configurable level of certainty that the user is using the command: {click: 0.9878 confidence, blue: 0.8789 confidence, button: 0.1889 confidence}

Something like that...


It does post them back from the worker, but the Julius interface doesn't expose them (yet). The way that Julius deals with confidence scores is also a little different (they're not fractional), so you'd need to account for that.

I'll be sure to include them soon - it's probably just a few more lines of code, so you can expect them in the onrecognition function this afternoon.


OK it's done! Not well documented yet - that will wait for another day - but you can now access the score through the `onrecognition` event.
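As a sketch of the use case above: once per-word scores come through `onrecognition`, command matching could look like the following. The `{word: score}` payload shape is an assumption for illustration, not the documented API (and since Julius's scores aren't fractional, the threshold would need adjusting):

```javascript
// Sketch: match a recognition against page commands using per-word
// confidence scores. The {word: score} shape is illustrative only.
function matchCommand(scores, commands, threshold) {
  // A command matches when every word in it was heard with
  // at least `threshold` confidence.
  return commands.find((cmd) =>
    cmd.split(' ').every((w) => (scores[w] || 0) >= threshold)
  ) || null;
}
```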


Wow, awesome! I will start playing with it today. Thanks!


Fantastic project and response time. Congrats on this.


I'm literally sitting here hitting refresh. Thanks.


I've played with pocketsphinx.js a fair amount, but this looks WAAAAAAAAAY easier to set up and consume. Nice work.


pocketsphinx.js looks amazing, but there's definitely a barrier to entry if you've never worked with speech recognition before. That was my biggest goal in porting this tool - a nice, abstracted API. Glad you like it :)


Is there an online demo anywhere on the web?


I'll try to whip one up today and post it to the README.


Great, thanks!


Can you use any of the CMUSphinx compatible language models with this, or is there a tool to convert them to something Julius supports?


I believe that Julius offers their own tools for conversions like these, but the best place to consult would be the JuliusBook [1].

[1] http://julius.sourceforge.jp/juliusbook/en/


Can this be used to detect a voice's unique digital signature? For example I just say my name to login into a website?


I don't think voices have unique signatures. Voices vary widely but they can still share identical properties and be imitated.


Voices do have a unique signature. The technology to identify it is called Speaker Identification. See for instance http://research.microsoft.com/en-us/projects/whisperid

> Each person's voice is different. Some sounds, like "s", sound about the same no matter who says them, but other sounds, like vowels, tend to differ a lot from person to person. We use a special way of representing sound, the cepstrum, that captures lots of information, including the characteristic way you pronounce your vowels. Of course, someone could imitate the way you talk; fortunately, the cepstrum also captures certain fundamental characteristics of voices that are impossible to change. For instance, the length of your vocal tract -- the place where sound is produced in your body -- cannot be changed, and different length vocal tracts tend to produce cepstra with different characteristics. By identifying both the way you talk, and the way your body produces sound, WhisperID can do a great job of figuring out who you are.


As someone working in that area, I have to disagree. Sure, speaker recognition can work very well under certain circumstances, but there is no unique signature. (that would imply that you could successfully discriminate between any two speakers in the world, irrespective of any other factors)


OK, maybe unique was a strong word. But banks are starting to use speaker ID to log in to mobile apps, so the signature is, let's say, unique enough for practical applications.


No, speech recognition systems are not necessarily good at that. You need a speaker recognition system for that. http://en.wikipedia.org/wiki/Speaker_recognition


Does your application need a nodejs backend for this library to work?


Looks like they're just providing examples on serving the library on a webserver. It's all static JavaScript.


Right, it's all done on the client, so it is server agnostic.


Thanks for this lib!


Genuinely not sure if the demo is a joke or not.

I said "hello" it said "DIAL OH OH".

I said "Apple" it said "GET KENT".

WTF?


In order to get a more comprehensive vocabulary/grammar, you need to substitute out the sample that it comes with. There are instructions in the README. For the demo, it just uses the sample grammar that voxforge provides, which is (as you can see) fairly limited.


This is awesome - who cares about Windows 10 or Linux, Javascript is the new OS.


It's definitely a fun little toy...

http://node-os.com/

http://osjsv2.0o.no/


Wow, speech recognition in a Turing-complete language. Amazing.


This is the kind of technological challenge which must be fun to complete, and it must be quite satisfying for the author. However, whenever I see an 'XYZ in pure JavaScript', I keep getting the impression we are only delaying the inevitable moment browsers have to step up to a superior language. Kinda like how quickly ripping off a bandaid is better than slooowwwwllly removing it...


Seeing cool things written in JavaScript makes you think that JavaScript is doomed?


Not to be that guy, but it wasn't written in JS; it was transpiled from C to JS.


My point exactly. The end result being we are solidly entrenching JavaScript into browsers, not the contrary. So instead of starting with a proper hammer for nails, we are slowly turning a screwdriver into a blunt object which can neither screw nor hammer nails properly.


True. Transpiled, with a few abstractions written over it (such as a worker script), and some tweaks to fake multithreading so that it can coexist with the Web Audio API.

