
Attempt at Lip Reading with AI - vinchuco
https://www.technologyreview.com/s/602949/ai-has-beaten-humans-at-lip-reading/
======
vinchuco
From a link in the article: "There is a significant challenge here. During speech,
the mouth forms between 10 and 14 different shapes, known as visemes. By
contrast, speech contains around 50 individual sounds known as phonemes. So a
single viseme can represent several different phonemes."
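To make the many-to-one problem concrete, here is a minimal sketch that inverts a phoneme-to-viseme table so each viseme lists the phonemes it could stand for. The grouping below is an illustrative assumption (a small, partial set of well-known look-alikes such as p/b/m), not a standard viseme inventory, and the function names are hypothetical.

```python
from collections import defaultdict

# Illustrative, partial grouping (an assumption, not a standard table):
# phonemes that look alike on the lips share one viseme label.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "t": "alveolar", "d": "alveolar", "n": "alveolar",
    "k": "velar", "g": "velar",
}


def viseme_to_phonemes(mapping):
    """Invert a many-to-one phoneme->viseme map into viseme->candidate phonemes."""
    inverse = defaultdict(list)
    for phoneme, viseme in mapping.items():
        inverse[viseme].append(phoneme)
    return dict(inverse)


def candidate_phonemes(visemes, inverse):
    """For a sequence of observed visemes, list the phonemes each one could be."""
    return [inverse[v] for v in visemes]


inverse = viseme_to_phonemes(PHONEME_TO_VISEME)
print(candidate_phonemes(["bilabial", "velar"], inverse))
# [['p', 'b', 'm'], ['k', 'g']]
```

Even this tiny example shows why video alone is ambiguous: two observed mouth shapes already admit 3 × 2 = 6 phoneme sequences, so some language context has to pick between them.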

Seems like facial recognition has advanced enough that mapping video to
visemes could be done similarly to the Portland Digit Recognition Test (plus
face direction).

For the phoneme mapping problem, I'm just wondering how videos with subtitles
could be used as training data.
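One rough way to start: turn subtitle cues into labeled frame ranges. The sketch below assumes SubRip (.srt) subtitles and a constant frame rate, and `srt_to_frame_labels` is a hypothetical helper. Subtitle timing is usually only loosely aligned with actual speech and gives words rather than phonemes, so this yields weak, clip-level labels; getting down to phonemes or visemes would need an extra step (for example, forced alignment against the audio track), which is not shown here.

```python
import math
import re
from typing import List, Tuple

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,500".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def srt_to_frame_labels(srt_text: str, fps: float) -> List[Tuple[int, int, str]]:
    """Turn SRT cues into (first_frame, end_frame_exclusive, caption_text) tuples."""
    labels = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                start = to_seconds(*g[:4])
                end = to_seconds(*g[4:])
                text = " ".join(lines[i + 1:]).strip()
                # Widen to whole frames so the cue interval is fully covered.
                labels.append((math.floor(start * fps), math.ceil(end * fps), text))
                break
    return labels


SAMPLE_SRT = """1
00:00:01,000 --> 00:00:02,500
Hello there

2
00:00:03,000 --> 00:00:04,000
How are you?
"""

labels = srt_to_frame_labels(SAMPLE_SRT, fps=25)
print(labels)
# [(25, 63, 'Hello there'), (75, 100, 'How are you?')]
```

Each tuple could then be used to cut the matching frames (cropped to the mouth region) out of the video, giving pairs of short clips and the text spoken in them.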

