Claude-real-video － any LLM can watch a video

cortexosmain · 2026-07-02T19:10:12 1783019412

Hi HN! I built this because I was frustrated that no LLM actually "sees" a video — Claude won't accept video files, ChatGPT reads the transcript only, and Gemini samples at a fixed 1fps (missing fast cuts, over-sampling static slides).

claude-real-video takes a URL or local file and:

1. Extracts frames at every scene change (not fixed intervals) + a density floor 2. Deduplicates with a sliding-window pixel-diff algorithm (so A-B-A interview cutaways don't re-send the same shot) 3. Transcribes audio (prefers embedded subtitles, falls back to Whisper) 4. Optionally keeps the full soundtrack for audio-capable models 5. Writes a clean MANIFEST.txt you can drop into any LLM chat

A 10-min presentation goes from ~600 fixed-interval frames to 5-15 meaningful keyframes. 90%+ token savings with better comprehension.

The dedup approach (v0.2.0) uses real pixel difference on 16x16 RGB thumbnails against a sliding window of the last N kept frames — inspired by videostil's pixelmatch, but simpler and self-contained.

`--report` generates a self-contained HTML showing every keep/drop decision with diff percentages, so you can tune the threshold visually.

pip install claude-real-video && crv "https://youtube.com/watch?v=..." --report

MIT licensed, pure Python + ffmpeg. Happy to answer questions!

ProofHouse · 2026-07-02T21:15:32 1783026932

Very cool I have something that does this as well along these lines. I’ll dig into yours over the next few days and contribute where and if I can too, awesome to see!