This approach really doesn't make sense to me. The model has to output the entire transcript token by token, instead of simply adding it to the context window...
A more interesting idea would be a browser extension that lets you open a chat window from within YouTube, letting you ask it questions about certain parts of the transcript with full context in the system prompt.
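Something like this, roughly (a minimal sketch in TypeScript; it assumes the OpenAI chat completions endpoint, and the function name and prompt wording are just illustrative):

    // Sketch: put the transcript in the system prompt once, then ask
    // questions against it -- no need for the model to re-emit it.
    // Assumes the OpenAI chat completions API; names are illustrative.
    const OPENAI_API_KEY = "sk-..."; // placeholder

    async function askAboutTranscript(
      transcript: string,
      question: string,
    ): Promise<string> {
      const res = await fetch("https://api.openai.com/v1/chat/completions", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${OPENAI_API_KEY}`,
        },
        body: JSON.stringify({
          model: "gpt-4o-mini",
          messages: [
            {
              role: "system",
              content: `Answer questions about this YouTube transcript:\n\n${transcript}`,
            },
            { role: "user", content: question },
          ],
        }),
      });
      const data = await res.json();
      return data.choices[0].message.content;
    }

The extension would grab the transcript once when you open the chat window, and every follow-up question reuses it from the context rather than paying to generate it again.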
That's initially what I thought this was. It seems somebody had the same concept: there's an extension called "AskTube" that looks like it does exactly this.
For sure, that's an interesting idea, but potentially very costly for longer videos. A plus side of this strategy is that the transcription gets cleaned up a lot, and the math notation gets fixed up too. So you end up with cleaner, well-formatted text for people who prefer reading a video to mindlessly watching it.
We at Emergent Mind are working on providing bits of a technical transcript to a model and then asking follow-up questions. You can check it out at http://emergentmind.com if curious.
Until I read the other comments here, I assumed that's what they were doing, since it bugged out on me: it never regurgitated the transcript back to me, yet it still let me ask questions about it.