Hacker News
Non-Rectangular Video Cropping with FFMpeg (dragonquest64.blogspot.com)
68 points by lipeltgm on Dec 29, 2019 | 32 comments



Note that a few video codecs also support alpha these days, which could be used to preserve the mask and overlay the video over some background.

It even works on the web: https://files.catbox.moe/uu2ze0.webm (needs a non-black background)
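
Roughly like this, if anyone wants to try (filenames made up; mask.png is assumed to be a grayscale image the same size as the video, white = opaque):

    # merge a grayscale mask into the alpha channel and encode with
    # VP9, which supports transparency in WebM
    ffmpeg -i input.mp4 -loop 1 -i mask.png \
        -filter_complex "[0:v][1:v]alphamerge" \
        -shortest -c:v libvpx-vp9 -pix_fmt yuva420p output.webm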


Here with a yellow background (edit: changed to a yellow-to-green circle-shaped radial gradient because why not): https://codepen.io/mdesantis/pen/RwNZEZJ


Maybe, https://codepen.io/pbhj/pen/dyPzwEj instead - centre the video and put the gradient on the body to emphasise the matting?


Firefox (Linux): I had to open dev tools, remove the background image that Firefox shows around media links, and then adjust the background colour to see the effect. Otherwise it just looked like a normal video.


Wow. Pretty cool. Had no idea about this.


It has some aliasing artefacts around the edge of the circle; you can see them very clearly during the fade-in. When the video is playing you don't really notice it. My guess is that this is caused by the way the mask is created. I can't try it out right now, but it could probably be solved by setting the RGB channels of the mask image to black.


See https://superuser.com/questions/1270950 for a way to do it without needing to create a masking image.


Agreed. The 'geq' filter can readily accomplish the masking if the shape can be represented as an X,Y equation. This method does, however, work for a more comprehensive set of shapes; the post is titled with circular in mind since that seems to be a highly desired form of cropping. Thank you for pointing it out; it would likely also address the aliasing another reader referenced.
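
For a centred circle, an untested sketch (the hard 0/255 alpha will still alias at the edge):

    # punch a circular alpha matte directly, no mask image needed:
    # pixels farther than H/2 from the centre become transparent
    ffmpeg -i input.mp4 -vf \
        "format=yuva420p,geq=lum='p(X,Y)':a='if(lt(hypot(X-W/2,Y-H/2),H/2),255,0)'" \
        -c:v libvpx-vp9 output.webm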


I think it's more appropriate to call this matting. I came in thinking they'd figured out how to coax FFmpeg into encoding non-rectangular frame geometries.


Is there any way to provide a dynamic mask? For instance, if I wanted to blur a dynamic portion of each frame, like a face or logo?

I've been trying to find higher-performance methods of doing this with live footage that don't require OpenCV. It seems like FFmpeg doesn't have a great way to do this, or I just haven't found it yet?

I guess I'm also looking for frameworks or tools that can handle live video "manipulation" like rendering shapes or graphics. OpenCV is painfully slow for most of this stuff, and everything else is seemingly completely proprietary. But Nvidia seems to have some incredible tools baked into their latest stream-manipulation APIs.
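
For a fixed region something like this should work (coordinates made up), but the region can't track a face without coordinates coming from outside, e.g. fed in via the sendcmd filter:

    # blur a fixed 200x200 region at (300,100): crop it out,
    # blur it, and overlay it back in place
    ffmpeg -i input.mp4 -filter_complex \
        "[0:v]crop=200:200:300:100,boxblur=10[blur];[0:v][blur]overlay=300:100" \
        -c:a copy output.mp4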


Some day someone will write a simpler replacement for ffmpeg's user interface. A tool whose manual you have to consult every time you want to write (or even just understand) a command is not a good tool.


The CLI syntax design reflects the N-inputs --> N-outputs capability, and the number of options reflects the number of components, e.g. there are 300+ demuxers and 400+ filters available.

This is not to say there isn't scope for improvement. A couple of areas for improvement are the consistency of option naming across filters that implement the same feature, and a clearer syntax to identify the target of an option, i.e. protocol, demuxer, decoder, etc.


Perhaps a declarative config file instead of CLI arguments would help. With a config file you could lay things out in a more tree-like manner, have lists, etc.

And it's not just N inputs, M outputs. You also have a processing graph for the intermediate filters. So ideally you want graph nodes represented by some identifier which can then be referenced by other nodes.
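
Sketching the idea (completely made-up syntax, not anything ffmpeg reads today):

    # hypothetical.yaml -- invented, NOT a real ffmpeg feature;
    # nodes get names that other nodes reference
    inputs:
      main: {url: input.mp4, seek: 5.0}
    filters:
      blurred: {filter: boxblur, input: main, luma_radius: 10}
    outputs:
      output.mp4:
        video: {from: blurred, codec: libx264, crf: 23}
        audio: {from: main, codec: copy}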


You also have a processing graph for the intermediate filters. So ideally you want graph nodes represented by some identifier which can then be referenced by other nodes.

Unless I misunderstand you, this is already the case. Simple or complex filtergraphs are declared as a string* by the user as an arg to -vf/-af (per-output) or -filter_complex/-lavfi (global). Filter outputs can have link labels, which allow their consumption by other filters.

*can also be read from a file
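
For example (names made up): [a] and [b] label the two outputs of split; [flipped] is produced by hflip and consumed by hstack alongside [b]:

    ffmpeg -i input.mp4 -filter_complex \
        "[0:v]split[a][b];[a]hflip[flipped];[b][flipped]hstack" \
        output.mp4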


Yeah, but that's just part of the overall parameters. With a structured file format (TOML, YAML, JSON... pick your poison) the filter arguments could also be structured themselves instead of having to be a single string, just another nesting level among the other params. The shell only has so many ways of structuring things before you run out of separator chars. And it already has to deal with the impedance mismatch of translating space-separated args into an array of zero-terminated strings, which results in its own escaping rules.


Filtergraph declarations in files* can have whitespace so you can format it logically. Each option of a filter can be on different lines, if you like.

*via shell too, if the arg is quoted/escaped.
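
e.g. the same graph as above, one filter per line, read via -filter_complex_script (illustrative):

    # the filtergraph parser ignores whitespace and newlines
    cat > graph.txt <<'EOF'
    [0:v] split [a][b];
    [a] hflip [flipped];
    [b][flipped] hstack
    EOF
    ffmpeg -i input.mp4 -filter_complex_script graph.txt output.mp4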


You may be missing the forest for the trees. I am not talking about the filter graph in isolation; I am talking about specifying all command line arguments (inputs, seeking, outputs, filters, codec options, etc.) in a structured way. Filters would only be a part of that.

I think this might make ffmpeg less daunting than battling a single-line shell input, ordering-sensitive parameters and escaping.

Well, maybe it's redundant; there already is VapourSynth, after all.


Actually, someone already has: https://kkroening.github.io/ffmpeg-python/


My new year's resolution might just be to gain mastery of ffmpeg. It's incredibly versatile. Combine it with Python bindings, ImageMagick and OpenCV and you have a powerful weapon in your arsenal.

But I keep running out of memory. I like that the process just dies, and it can usually be solved by sequential batching. But how can I automate that via predictive instrumentation?


Learning ffmpeg can be daunting but it is a highly useful tool to know. The option intricacies and peculiarities make it extremely hard to “master” though. It requires so much experimentation and experience (trial and error) that achieving mastery in a year seems nigh impossible!


ffmpeg can operate on streams of frames; you don't have to keep everything in memory or operate on discrete batches.


I don't know what you mean by predictive instrumentation or why you are running out of memory, but don't forget the original interprocess communication, the file system.


Thanks for the replies the8472 and BubRoss!

I just mean I'd like to "cap" ffmpeg's memory usage to ensure it never crashes.

I'll provide a simple example: creating an image slideshow from a directory of huge images. I can downscale and compress to JPEG before processing, but even then, besides batching, it always fails. And this is just raw, without any filters or image processing. I have tried all the I/O techniques from the wiki:

https://trac.ffmpeg.org/wiki/Slideshow

All this is local and laptop-based (4GB RAM). I just want to get a better handle on it before devoting cloud resources (ffmpeg is standard on gcloud serverless) ;)
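
For concreteness, the kind of command involved (paths made up; the scale/pad combination is the wiki's recipe for normalizing mixed image sizes):

    # read the directory as a 1 fps image sequence; frames should
    # stream through one at a time rather than load all at once
    ffmpeg -framerate 1 -pattern_type glob -i 'images/*.jpg' \
        -vf "scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:(ow-iw)/2:(oh-ih)/2" \
        -c:v libx264 -pix_fmt yuv420p slideshow.mp4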


I think there's something vital missing from this story.

I regularly use a laptop with 4GB RAM to generate videos using the sequential method listed there, with 4-8k sized frames, producing video files of hundreds of GB, without a crash. There can be hundreds of thousands of them as well.

No batching required.

The final command in my pipeline is simply:

    ffmpeg \
    -threads "$threads" \
    -y \
    -start_number 0 \
    -i 'build_frames'"$unique"'/%09d.png' \
    -c:v libvpx-vp9 \
    -lossless 1 \
    -qscale:v 2 \
    -r "$fps" \
    build_tmp"$unique"/media.webm

But, if you really need to limit ffmpeg's memory consumption, you probably need to look at -max_alloc and -bits_per_raw_sample. It'll be highly specific to your own hardware.
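
e.g. (value made up; note -max_alloc caps the size of any single allocation, not total usage):

    # refuse any single allocation above ~512 MiB so oversized
    # frames fail fast instead of exhausting RAM
    ffmpeg -max_alloc 536870912 -framerate 1 -i 'frames/%09d.png' \
        -c:v libx264 -pix_fmt yuv420p out.mp4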


There's also a word of warning here - the fps filter (which gets auto-inserted with -r) has a tendency to buffer the duplicated frames between two input frames.

This can cause large memory spikes if your input has low fps (or is generated from JPEGs with large time difference), since frames between two input frames will be generated and then sent to the output in a single operation, spiking memory usage.


the fps filter (which gets auto-inserted with -r)

Which ffmpeg version are you using? This hasn't been the case since 2012. Both input and output -r are effected through fftools and not libs.


Thanks for the replies, all ;)

Switching to vp9 (from x264) worked for me!


Sounds like there's a feature missing from the CLI tool. You'll have to write your own; pick a language, and it probably has low-level ffmpeg bindings sufficient to do this. The library supports streaming reads and writes very well, though you may have to provide a custom reader callback or use the old loop APIs.


ffmpeg from the command line is very powerful and people use it for what is being described here all the time. Why do you think it is necessary to have to resort to the library and API?


I'm not sure why you would need to use a server when you have your laptop, ffmpeg should be able to chew through this stuff faster than you could upload it.

Something is likely very off with how you are using ffmpeg. If you force more keyframes, use good quality settings etc. it should make short work of it.


You should be able to, for example, divide the slideshow into smaller segments, create those, and then use the concat capability to join them back together. Maybe that's what you were already referring to by 'batching'? It's a completely valid approach.
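
i.e. something along these lines (filenames made up):

    # list.txt names the segments; the concat demuxer joins them
    # without re-encoding
    printf "file 'part1.mp4'\nfile 'part2.mp4'\n" > list.txt
    ffmpeg -f concat -safe 0 -i list.txt -c copy joined.mp4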


Fill and key is basic video manipulation. As to the aliasing, they also need to apply some filtering...



