Instead of taking an image from the video every 5 seconds and embedding it, you could detect when enough has changed between frames and only then embed. One frame, one scene, one vector.
For instance, FFmpeg can do that with the filter `select=gt(scene,0.3)`. It selects the frames whose scene change detection score is greater than 0.3 (the score is a value between 0 and 1).
https://ffmpeg.org//ffmpeg-filters.html#select_002c-aselect
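
As a rough sketch of how that could be wired up, here is a small Python wrapper that builds the `ffmpeg` command and writes one image per detected scene change. The function names, the output file pattern, and the use of `-vsync vfr` are my own assumptions rather than anything prescribed by the docs (newer FFmpeg builds may prefer `-fps_mode vfr`), and the 0.3 threshold will likely need tuning for your footage.

```python
import subprocess
from pathlib import Path


def scene_frames_command(video, out_dir, threshold=0.3):
    """Build an ffmpeg argv that keeps only frames above the scene threshold.

    Hypothetical helper: the name and output pattern are illustrative.
    """
    out_pattern = str(Path(out_dir) / "scene_%04d.jpg")
    return [
        "ffmpeg",
        "-i", str(video),
        # The single quotes protect the comma inside the filter expression.
        "-vf", f"select='gt(scene,{threshold})'",
        # Assumption: variable frame rate output, so the selected frames
        # are not duplicated to fill the gaps left by dropped frames.
        "-vsync", "vfr",
        out_pattern,
    ]


def extract_scene_frames(video, out_dir, threshold=0.3):
    """Run ffmpeg and return the extracted image paths, sorted by name."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(scene_frames_command(video, out_dir, threshold), check=True)
    return sorted(Path(out_dir).glob("scene_*.jpg"))
```

Calling `extract_scene_frames("input.mp4", "frames")` would then give you the list of images to embed, one per detected scene change. A lower threshold yields more frames, a higher one fewer.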