A little off-topic, but do you know of any recent work or papers on speech recognition for language teaching? (I mean analysing and rating a speaker's accuracy, detecting mispronounced phones, and so on.)
What you're describing is called "speech verification". Language education is an application I'm personally very interested in, and one that almost no one discusses in the speech community (I assume because of machine translation), so if you find any research papers please let me know! I wrote a little about it: http://breandan.net/2014/02/09/the-end-of-illiteracy/
The task is actually much simpler than STT. You display some text on the screen, wait for an audio sample, then check the model's confidence that the sample matches the text. If the confidence is lower than some threshold, then you play the correct pronunciation through the speaker. The trick is doing this rapidly, so a fast local recognizer is key. I've got a little prototype on Android, and it's pretty neat for learning new words. I'd like to get it working for reading recitation, but that's a lot of work.
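The display/record/check/replay loop described above can be sketched roughly like this. It is a minimal sketch, not the Android prototype: `score`, `record` and `play_reference` are hypothetical callbacks standing in for a local recognizer that can report its confidence for a given text, a microphone capture, and playback of the correct pronunciation. The 0.7 threshold is an arbitrary placeholder.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    accepted: bool
    confidence: float


def verify_utterance(
    audio: bytes,
    expected_text: str,
    score: Callable[[bytes, str], float],  # hypothetical: recognizer confidence in [0, 1]
    threshold: float = 0.7,                # placeholder value, tune per model/learner
) -> Verdict:
    """Accept the attempt if the confidence that `audio` matches `expected_text` reaches `threshold`."""
    confidence = score(audio, expected_text)
    return Verdict(accepted=confidence >= threshold, confidence=confidence)


def practice_item(
    expected_text: str,
    record: Callable[[], bytes],            # hypothetical: capture one utterance
    score: Callable[[bytes, str], float],
    play_reference: Callable[[str], None],  # hypothetical: play the correct pronunciation
    threshold: float = 0.7,
) -> Verdict:
    print(f"Say: {expected_text}")          # 1. display the text
    audio = record()                        # 2. wait for an audio sample
    verdict = verify_utterance(audio, expected_text, score, threshold)  # 3. check confidence
    if not verdict.accepted:
        play_reference(expected_text)       # 4. below threshold: play the correct pronunciation
    return verdict
```

Keeping `score` as a plain callback makes it easy to swap in whichever local recognizer is fast enough, since latency is the constraint that matters most for this kind of drill.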
Actually, checking confidence is something we've tried to play with, but to my knowledge there is no model that lets you compute a speech confidence against a specific text. Public APIs like MS ProjectOxford.ai can return a confidence, but only for the "recognised" text, not for a predefined one.
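One way to get a confidence against a predefined text, rather than against whatever the recognizer decoded, is to score the expected transcript directly. The sketch below assumes a CTC-trained acoustic model that outputs per-frame log-probabilities over characters or phones, with the blank symbol at index 0 (an assumption; check your model). It sums over all alignments of the target using the standard CTC forward algorithm, then converts that into a relative score against a small, hand-picked set of competing texts (for example, common confusions). The competitor comparison is one heuristic among several, not a standard API.

```python
import numpy as np

BLANK = 0  # assumption: the CTC blank symbol sits at index 0


def ctc_log_likelihood(log_probs: np.ndarray, target: list[int]) -> float:
    """log P(target | audio) under CTC, summed over all alignments.

    log_probs: (T, V) per-frame log-probabilities (log-softmax output).
    target: label indices of the expected text, without blanks.
    """
    T = log_probs.shape[0]
    # Interleave blanks: [b, l1, b, l2, ..., b]
    ext = [BLANK]
    for label in target:
        ext += [label, BLANK]
    S = len(ext)

    alpha = np.full(S, -np.inf)
    alpha[0] = log_probs[0, BLANK]
    if S > 1:
        alpha[1] = log_probs[0, ext[1]]

    for t in range(1, T):
        prev = alpha
        alpha = np.full(S, -np.inf)
        for s in range(S):
            a = prev[s]                               # stay on the same symbol
            if s >= 1:
                a = np.logaddexp(a, prev[s - 1])      # advance by one
            # Skipping a blank is only allowed between two different labels
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                a = np.logaddexp(a, prev[s - 2])
            alpha[s] = a + log_probs[t, ext[s]]

    # Valid paths end on the last label or on the trailing blank
    if S > 1:
        return float(np.logaddexp(alpha[-1], alpha[-2]))
    return float(alpha[-1])


def target_posterior(log_probs: np.ndarray, target: list[int],
                     competitors: list[list[int]]) -> float:
    """Share of probability mass on `target` within a closed set of candidate texts."""
    scores = np.array([ctc_log_likelihood(log_probs, c) for c in [target, *competitors]])
    return float(np.exp(scores[0] - np.logaddexp.reduce(scores)))
```

Thresholding `target_posterior` then plays the role of the "confidence against the expected text"; which competitors to include is a design choice that depends on the learner's likely errors.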
Going further, this kind of approach can be very effective for words and short sentences, but I'd really love to see which specific phones the learner is getting wrong, which would help in analysing full speaking exercises.
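For phone-level feedback, one commonly used family of measures is Goodness of Pronunciation (GOP). The sketch below is a simplified posterior-based variant, not a specific published recipe. It assumes you already have per-frame log phone posteriors from an acoustic model and a forced alignment of the expected phone sequence (e.g. from an aligner run on the prompt text); both inputs, the hypothetical `Segment` type and the -1.0 threshold are illustrative assumptions. Each expected phone is scored by how far its posterior falls below the best phone in each frame, averaged over its frames, so 0 means it was the top candidate throughout.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Segment:
    phone: str  # phone expected here according to the forced alignment
    start: int  # first frame (inclusive)
    end: int    # last frame (exclusive)


def gop_scores(log_post: np.ndarray, phones: list[str],
               segments: list[Segment]) -> list[tuple[str, float]]:
    """Mean per-frame gap between the expected phone's log posterior and the best phone's."""
    idx = {p: i for i, p in enumerate(phones)}
    out = []
    for seg in segments:
        if seg.end <= seg.start:
            raise ValueError(f"empty segment for phone {seg.phone!r}")
        frames = log_post[seg.start:seg.end]   # (frames, num_phones)
        expected = frames[:, idx[seg.phone]]   # log P(expected phone | frame)
        best = frames.max(axis=1)              # log P(best phone | frame)
        out.append((seg.phone, float(np.mean(expected - best))))
    return out


def flag_mispronounced(scores: list[tuple[str, float]],
                       threshold: float = -1.0) -> list[str]:
    """Phones whose GOP falls below a (hypothetical, to-be-tuned) threshold."""
    return [phone for phone, gop in scores if gop < threshold]
```

Because the scores are per phone, the same output can be aggregated over a longer recitation to show which phones a learner misses most often, rather than only accepting or rejecting a whole word.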
It works, but I'm sure it's possible to do better.