Hi HN, I built VectorVid after repeatedly seeing teams hack together Whisper + a vector DB to search inside webinars and demos.
The problem: You have 100+ hours of videos. You want to index them for RAG. But the pipeline is messy: transcription, frame sampling, OCR, chunking, embeddings, then wiring it all into your own vector DB.
VectorVid does one thing: video → RAG-ready JSON.
Input: Video URL (webinar, lecture, demo)
Output: { chunks: [{ start_sec, end_sec, text, scene_description, ocr_text, embedding }] }
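To make the output shape concrete, here's a toy sketch of consuming those chunks for retrieval with a plain cosine-similarity ranking (the helper names and the 2-dim embeddings are mine for illustration; real embeddings would come from the OpenAI API and you'd likely query pgvector instead):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_chunks(query_embedding, chunks, k=3):
    """Rank chunks by cosine similarity of their embeddings."""
    ranked = sorted(chunks,
                    key=lambda c: cosine(query_embedding, c["embedding"]),
                    reverse=True)
    return ranked[:k]

# Toy chunks in the VectorVid output shape (embeddings shortened for illustration)
chunks = [
    {"start_sec": 0,  "end_sec": 30, "text": "welcome everyone",      "embedding": [0.9, 0.1]},
    {"start_sec": 30, "end_sec": 60, "text": "pricing starts at $99", "embedding": [0.1, 0.9]},
]
hits = top_chunks([0.2, 0.8], chunks, k=1)
print(hits[0]["start_sec"])  # → 30, the pricing chunk
```

Because every chunk carries start_sec/end_sec, the top hit doubles as a deep link back into the video.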
How it works:
Transcript + speaker diarization (Whisper/Deepgram)
Frame sampling (1 frame per 5s) + OCR (EasyOCR/Claude)
Scene descriptions for visual context
Embeddings included (OpenAI)
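The fiddly part of a pipeline like this is aligning transcript segments and OCR'd frames into the same time-windowed chunks. A minimal sketch of that step, assuming both carry timestamps (function and field names are mine, not VectorVid's actual internals):

```python
def build_chunks(segments, frames, window=30):
    """Bucket transcript segments and frame OCR into fixed-width time windows."""
    chunks = {}
    for seg in segments:
        key = int(seg["start"] // window)
        c = chunks.setdefault(key, {"start_sec": key * window,
                                    "end_sec": (key + 1) * window,
                                    "text": [], "ocr_text": []})
        c["text"].append(seg["text"])
    for frame in frames:
        key = int(frame["t"] // window)
        if key in chunks:
            chunks[key]["ocr_text"].append(frame["ocr"])
    # Join the buckets into flat strings, ordered by time
    return [{**c, "text": " ".join(c["text"]), "ocr_text": " ".join(c["ocr_text"])}
            for _, c in sorted(chunks.items())]

segments = [{"start": 2,  "text": "let's talk pricing"},
            {"start": 35, "text": "and now the roadmap"}]
frames   = [{"t": 5, "ocr": "$99/mo"}, {"t": 40, "ocr": "Q3 2025"}]
print(build_chunks(segments, frames))
```

Fixed windows are the naive choice; scene-boundary or semantic chunking would respect slide transitions better, at the cost of variable chunk lengths.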
The MVP is a live demo: you can search inside the 2007 iPhone keynote and see the exact JSON the API returns.
Tech: Next.js frontend, async processing, Supabase pgvector, deployed on Vercel.
Use cases I'm seeing:
SaaS teams: "Search our help videos" → power internal search/chat
EdTech: "Students find specific slides" → jump straight to diagrams
Sales: "Did the pricing slide appear?" → automated demo auditing
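The demo-auditing case doesn't even need embeddings: since each chunk carries ocr_text, a plain keyword scan answers "did slide X appear, and when?" (helper name is mine, just to show the idea):

```python
def slide_appearances(chunks, keyword):
    """Return (start_sec, end_sec) spans whose on-screen OCR text mentions the keyword."""
    return [(c["start_sec"], c["end_sec"])
            for c in chunks
            if keyword.lower() in c["ocr_text"].lower()]

chunks = [
    {"start_sec": 0,   "end_sec": 30,  "ocr_text": "Agenda"},
    {"start_sec": 600, "end_sec": 630, "ocr_text": "Pricing: $99/mo"},
]
print(slide_appearances(chunks, "pricing"))  # → [(600, 630)]
```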
Early feedback wanted. Try the demo and let me know what you'd build on top of this.