...although its OCR engine (Tesseract) can be a little sketchy.
I don't think the metadata is quite there yet.
If you have an accurate text transcript but the timings aren't detailed enough, you can run speech recognition on the audio, and it will be very accurate because you already know exactly what is being said (unlike speech recognition on arbitrary speech, this is more like command-and-control). Pairing an accurate transcript with the original audio can give you word-level or even phoneme-level timing.
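To make the transcript-plus-audio idea concrete, here is a rough Python sketch. It is not the constrained recognition described above, which needs a real speech engine. It is a simpler stand-in: take whatever word timings an ordinary recognizer gives you and transfer them onto the trusted transcript by aligning the two word sequences with the standard library's difflib. The function name and the (word, start, end) tuple format are assumptions made for illustration, not the output of any particular engine.

```python
from difflib import SequenceMatcher


def transfer_timings(transcript_words, recognized):
    """Attach recognizer timings to the words of a trusted transcript.

    recognized is a list of (word, start_sec, end_sec) tuples. That format
    is hypothetical; adapt it to whatever your recognizer actually emits.
    Returns a list of (word, start_sec, end_sec) in transcript order, with
    None timings for words the recognizer missed or misheard.
    """
    ref = [w.lower() for w in transcript_words]
    hyp = [w.lower() for w, _, _ in recognized]

    # Start with no timings, then fill in every word both sequences agree on.
    timed = [(w, None, None) for w in transcript_words]
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            _, start, end = recognized[block.b + k]
            timed[block.a + k] = (transcript_words[block.a + k], start, end)
    return timed


if __name__ == "__main__":
    transcript = "the quick brown fox jumps".split()
    # Made-up recognizer output in which "brown" was misheard as "crown".
    recognized = [
        ("the", 0.0, 0.2),
        ("quick", 0.2, 0.5),
        ("crown", 0.5, 0.9),
        ("fox", 0.9, 1.2),
        ("jumps", 1.2, 1.6),
    ]
    for word, start, end in transfer_timings(transcript, recognized):
        print(word, start, end)
```

Words left with None timings could be filled in by interpolating between their timed neighbours. A real aligner would instead constrain the recognizer to the known text, which is where the accuracy comes from.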
The metadata isn't there in regular subtitles, but you can certainly get it there with some post-processing.
I used to have some links but the companies went out of business.
Annosoft's SDK: http://www.annosoft.com/prices
Annosoft made a command-line front-end to the Microsoft Speech API, which many of these other Windows-based systems may also use, and which I used in a project in 1999-2001: http://www.annosoft.com/sapi_lipsync/docs/ (There are other SAPI front-ends if you dig around online, too.)
Others, including open-source ones: http://en.wikipedia.org/wiki/List_of_speech_recognition_soft...
Magpie, used in animation and gaming: http://www.thirdwishsoftware.com/magpiepro.html
Crazytalk, used in animation, uses SAPI: http://www.reallusion.com/crazytalk/crazytalk.aspx
FaceFX, used in gaming: http://www.facefx.com/documentation/2013.2/W194
Source Filmmaker includes it, although I'd be surprised if it wasn't Sphinx or SAPI or some other existing library: https://developer.valvesoftware.com/wiki/SFM/Lip-sync_animat...
Using common Python NLP techniques, you could very easily search for every instance of a phrase across a massive corpus of subtitles.
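A minimal sketch of that kind of search, using only the Python standard library rather than an NLP toolkit. It assumes the subtitles are SubRip (.srt) files; the function names and the directory layout are made up for illustration. Matching is done on lowercased words with punctuation stripped, so "Hello, world!" matches the phrase "hello world".

```python
import re
from pathlib import Path

# Matches an SRT timing line such as "00:01:02,500 --> 00:01:04,000".
TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}[,.]\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2}[,.]\d{3})"
)


def parse_srt(text):
    """Yield (start, end, cue_text) for each cue in SRT-formatted text."""
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            m = TIMING.search(line)
            if m:
                # Everything after the timing line is the cue's text.
                cue = " ".join(part.strip() for part in lines[i + 1:])
                yield m.group(1), m.group(2), cue
                break


def normalize(s):
    """Lowercase and drop punctuation so matching is forgiving."""
    return " ".join(re.findall(r"[a-z0-9']+", s.lower()))


def find_phrase(phrase, srt_text):
    """Return (start, end, text) for cues containing the phrase as whole words."""
    needle = f" {normalize(phrase)} "
    return [
        (start, end, cue)
        for start, end, cue in parse_srt(srt_text)
        if needle in f" {normalize(cue)} "
    ]


def search_directory(phrase, directory):
    """Search every .srt file under directory; yield (path, start, end, text)."""
    for path in sorted(Path(directory).rglob("*.srt")):
        text = path.read_text(encoding="utf-8-sig", errors="replace")
        for start, end, cue in find_phrase(phrase, text):
            yield path, start, end, cue
```

Something like `for hit in search_directory("hello world", "subs/"): print(*hit)` would list each file and timestamp where the phrase occurs. A phrase that straddles two cues is missed by this sketch; catching those would mean joining adjacent cues before matching.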
If you had a large enough collection of subtitles and videos in a single directory, this tool would do what you are asking.
Lets you search video/audio for spoken words.
Some places call this "closed captioning" (i.e. the target audience is deaf), versus "subtitles" (the target audience can hear but doesn't understand the language).
please somebody make an automatic rap impersonation generator
can make use of karaoke youtube clips for the background music...