Developed https://www.videototextai.com/ exactly for this reason as it was quite impossible to search videos otherwise. Also you can copy the transcript into a LLM and ask questions from video content like that.
Right now: voice-bot telephony. So, immediately have a twilio number that you can call/text and talk to a GPT much like I do with the ChatGPT app interface. This rapidly could couple into actions API for doordash amazon etc…so Twilio would be handling all this scalable AI-Telephony data. They have the infrastructure and engineering for it.
Long term: own the infrastructure for infinitely scaling call centers for any task you would need to do over sms/gsm voice
While it seems YouTube's auto-generated are hit or miss, I wonder if feeding them through an LLM can fix the mistakes and still get the video's idea out of them
I've found that to be the case. I typically don't want a full transcript -- I want the materials list, or a summary, or a counterargument. I've found it is totally sufficient to just plop the transcript into an LLM and ask for my desired output. No need to clean of the transcript ahead of time.
Wow, why are they so expensive? Like even the regular whisperAPI by OpenAI is less expensive.
This is also why I decided to create https://www.betterwhisperapi.com/ . I believe most of the companies are charging pretty insane amounts for transcriptions...