So far it’s basically a heuristic: the video has to have captions, and the token count for generation has to be reliably under the limit. I’m working on making it work for arbitrary-length videos, though! I ran some tests on 2-3 hour podcasts and it worked pretty well.
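
A rough sketch of what a check like that could look like, in case it helps picture it. Everything here is an assumption for illustration and not the actual code: `estimate_tokens`, `is_eligible`, the 4-characters-per-token ratio, the `TOKEN_LIMIT` value, and the safety margin are all made up.

```python
from typing import Optional

# Hypothetical values; neither is taken from the actual project.
CHARS_PER_TOKEN = 4   # crude average for English text, not a real tokenizer
TOKEN_LIMIT = 8000    # placeholder generation limit


def estimate_tokens(text: str) -> int:
    """Rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def is_eligible(captions: Optional[str],
                token_limit: int = TOKEN_LIMIT,
                safety_margin: float = 0.9) -> bool:
    """Return True if captions exist and fit under the limit with some headroom."""
    # No captions (missing or blank) means the video is skipped.
    if not captions or not captions.strip():
        return False
    # Keep some headroom so the estimate is "reliably" under the limit.
    return estimate_tokens(captions) <= token_limit * safety_margin
```

The safety margin is there because a character-based estimate can be off, so staying well below the limit is safer than cutting it close.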