For sure, that's an interesting idea, but potentially very costly for longer videos. An upside of this strategy is that the transcription gets cleaned up a lot and the math notation gets fixed up too. So you end up with cleaner, well-formatted text for people who would rather read a video than mindlessly watch it.
We at Emergent Mind are working on providing pieces of a technical transcript to a model and then asking follow-up questions. You can check it out at http://emergentmind.com if you're curious.