You have to use a decent acoustic model, not the one in the demo. If you do, I think it works 'pretty well' as a proof of concept. That said, I'm not recommending Sphinx as a recognition framework; it is way behind the times in 2016. But this is the only 'in the wild' demo of this I've seen on the web, so I felt it was worth mentioning.
https://syl22-00.github.io/pocketsphinx.js/live-demo.html
This works pretty well, all in the browser, especially if you drop in some better acoustic models.