Some of that isn't too hard if you can narrow down the set of words you need to recognize. For example, if you want a hot word of "computer", like in Star Trek, you can literally filter the output of the recognizer (e.g. pocketsphinx) with grep and sed, and it works not too badly. For the natural language part, you can get pretty far with a simple parser like the old Infocom games used, especially if your domain is limited. I'm making an open source multiplayer networked starship bridge simulator, kind of like Star Trek, using pocketsphinx for speech recognition, and it's working OK (not perfect, but OK). Here is a demo: https://www.youtube.com/watch?v=tfcme7maygw
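To make the hot-word filter and the Infocom-style parser above a bit more concrete, here is a rough sketch in Python rather than grep and sed. It is not the code from the simulator: the vocabulary, the function names, and the assumption that the recognizer hands you one plain-text line per utterance are all made up for illustration.

```python
import re

HOT_WORD = "computer"  # the hot word from the example above

# Hypothetical, deliberately tiny vocabulary (not from any real project).
VERBS = {"set", "engage", "raise", "lower", "fire"}
NOUNS = {"course", "warp", "shields", "torpedoes", "phasers"}
FILLER = {"the", "a", "an", "to", "please"}


def strip_hot_word(line):
    """Return the words after the hot word, or None if it was not heard."""
    words = re.findall(r"[a-z0-9']+", line.lower())
    if HOT_WORD not in words:
        return None
    return words[words.index(HOT_WORD) + 1:]


def parse_command(words):
    """Verb-noun parser: first known verb, then first known noun, rest are args."""
    verb = None
    noun = None
    args = []
    for word in words:
        if word in FILLER:
            continue  # drop words that carry no meaning in this small domain
        if verb is None:
            if word in VERBS:
                verb = word
            # anything before the verb is ignored
        elif noun is None and word in NOUNS:
            noun = word
        else:
            args.append(word)
    if verb is None:
        return None
    return {"verb": verb, "noun": noun, "args": args}


def handle(line):
    """Ignore utterances without the hot word; parse the rest."""
    words = strip_hot_word(line)
    if words is None:
        return None
    return parse_command(words)


if __name__ == "__main__":
    print(handle("computer set course to 270"))
    print(handle("raise shields"))  # no hot word, so this is ignored
```

Restricting the parser to a closed list of verbs and nouns is what keeps this workable: anything the recognizer mishears outside that list simply fails to parse instead of triggering the wrong command.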
You have to use a decent acoustic model, not the one in the demo. If you do, I think it works 'pretty well' as a proof of concept. That said, I'm not recommending Sphinx as a recognition framework; it is way behind the times in 2016. But this is the only 'in the wild' demo of this I've seen on the web, so I felt it was worth mentioning.