
Learn FFmpeg the hard way - dreampeppers99
https://github.com/leandromoreira/ffmpeg-libav-tutorial#learn-ffmpeg-libav-the-hard-way
======
anonymfus
>In summary this is the very basic idea behind a video: a series of pictures /
frames running at a given rate.

>Therefore we need to introduce some logic to play each frame smoothly. For
that matter, each frame has a presentation timestamp (PTS) which is an
increasing number factored in a timebase that is a rational number (where the
denominator is known as timescale) divisible by the frame rate (fps).

Constant framerate is a very dangerous assumption to make. It's true for video
recorded by professional cameras, but if you check video recorded by most
mobile phones you'll notice that it's wrong.

~~~
orev
No tutorial will get very far if every corner case is spelled out every
time. The point of “learning” documents is to give enough concepts and useful
information for the reader to have a basis to start building additional
knowledge on top of. Getting into the weeds quickly derails this process.

~~~
ouid
what's wrong with presenting everything in the correct abstraction to start
with?

Whether you're in a public school math class or following a git tutorial
online, you always get the same thing: a list of procedures for using the
tools in the most common cases you are likely to encounter. This seems like a
good idea in the
move fast and break things ideology, but what do we notice about people's
skills in the real world? No one knows how git works, and no one knows any
math by the time they leave high school.

Give people a set of tools and _prove_ to them that it solves every problem in
some domain. If they can't solve the problems using the most primitive,
complete toolset you can give them, then the case where they solve the problem
_is_ the edge case.

As an example, we often have students write fractions in "lowest terms", then
only present them with fractions whose numerators and denominators share a
few small common prime factors. But checking prime factors is absolutely the
hardest way of solving this problem, and unless you're exhaustive in your
search, you can't have any confidence that you have actually solved it.

Students are intrinsically aware that this would be tedious in general, and it
gives them anxiety to know that they could have missed something, or not tried
sufficiently large divisors. They have no confidence in their tools, and
rightly so. The only reason they can solve these problems at all is because
the instructor gives them problems which can be solved this way.

The origin of the question "when will I use this in the real world?" is the
instinct that the tool you have been given is only for edge cases, and cannot
be relied upon.

I think you are exactly wrong in your assessment. Getting into the weeds is
the only way to learn, because almost everything in the world is weeds, some
of those weeds just happen to be called crops.

~~~
dfabulich
> _what's wrong with presenting everything in the correct abstraction to
> start with?_

Nobody can retain the material that way. Each thing you learn has to be
attached to other concepts you already have, and your skills build on one
another.

Nobody teaches Peano's axioms of arithmetic to kindergarteners; they're still
learning how to identify shapes and compare quantities. When they eventually
learn even the simplest proofs, they'll build on the basis of careful
attention to detail that they learned while mastering basic arithmetic.

Nobody teaches general relativity on the first day of college-level physics
class. Students are still learning calculus at that point; even if they
already had a calculus course, this is their first opportunity to apply it.

And _nobody_ teaches the fundamental proofs of calculus on the first day of
calculus class; you can use sloppy language like "infinitesimals" to establish
a good intuition for how to use derivatives and integrals.

If you tried introducing this material from the bottom up, it wouldn't take.
But if I'm wrong, sure, go try it and see how it works.

~~~
ouid
You're right, I shouldn't have said the correct abstraction, but you should at
least provide a complete one. You don't have to build from the most
fundamental possible assumptions to build from a set of tools that is
_provably complete for some domain of problems_.

Doing otherwise is like giving someone a screwdriver that only turns
clockwise.

Also, I challenge the notion that "no one can retain material that way". Have
you ever met anyone who has retained the material any other way who didn't do
it despite the system?

~~~
CrystalLangUser
Essentially, this is impossible for anything that’s not easily explainable all
at once, so it’s useless for any advanced domain.

You can’t just give people a firehose of information. When you learn acoustic
engineering, you have to first learn the prerequisite math needed to
understand later concepts such as room architecture. It simply does not make
sense to jam pack it all in at once, because 1) it won’t make sense and 2)
nobody has that kind of memory.

Next, if you want to cover all possible use cases, that could take forever in
certain domains.

Last, some of this stuff is subjective as to what’s necessary, or what cases
need to be covered.

No, what’s better is to give general knowledge as needed, and the student can
seek out knowledge for various corner cases. It would be ridiculous otherwise.

------
fndrplayer13
Having built programs that used the ffmpeg libraries (as well as x264), I
have to say that an up-to-date tutorial like this would have been very
helpful at the time. Glad to see somebody is undertaking this effort.

~~~
epberry
I agree. ffmpeg can be inscrutable sometimes. Understanding the internal
concepts is a big boost when using it. For example I've used "-avcodec"
hundreds of times in the command line but I just now understood where that fit
in.

------
opticalflow
All in all a good intro tutorial that gets into some of the common
professional use cases. On "constant bitrate" assumptions and some of the
subsequent discussion here: ANY transform-and-entropy codec like VP9 or H264
will ALWAYS be variable bitrate internally. In the pro broadcast world, where
you can't go over your FCC-allocated radio-frequency bandwidth by even one
iota (or nastygrams and fines ensue), this is "handled" by having the encoder
undershoot the actual target (say it's 5 Mbit/s), and then padding the stream
with null packets to get a nice even 5 Mbit/s. This happens with LTE
broadcast as well. The encoders that do this reliably are fabulously
expensive and of a rackmount flavor.

~~~
bluedino
I'd love to hear more details about those things. I'm guessing it's not as
simple as wiggling the quality around to keep the output within a certain
size.

~~~
toast0
That's basically it. All of the digital broadcast streams (over the air,
cable, satellite) use a fixed bitrate per physical channel and send an MPEG
transport stream. A transport stream is built of fixed-length packets, and
within it you can multiplex different programs. OTA gets about 20 Mbps per
channel; if you use that for one program, you likely won't use the whole
thing, so you include null packets as needed to fill the stream. If you send
multiple programs, you probably will fill the stream, so you have to reduce
quality and/or play games with the timing of i-frames, possibly adjusting
program start times or commercial break lengths, to avoid having multiple
programs needing high bitrate at the same time.

------
spython
I wrote about creating adaptive streaming over HTTP (MPEG-DASH) using ffmpeg
and mp4box before [1]. It is nice to see that ffmpeg is incorporating more and
more of the same features.

Has anybody got ffmpeg to create HLS out of mp4 fragments as well? It would
save some conversion time and space, since the same MP4 fragments could be
used with the two different manifest files, DASH and HLS. Mp4box is not quite
there yet, sadly.

[1] [https://rybakov.com/blog/mpeg-dash/](https://rybakov.com/blog/mpeg-dash/)

~~~
RBO2
I think they're almost there in two ways. 1) is
[https://github.com/gpac/gpac/issues/790](https://github.com/gpac/gpac/issues/790).
2) is to extend FFmpeg to include the MP4Box muxer: [http://www.gpac-
licensing.com/signals/](http://www.gpac-licensing.com/signals/)

~~~
spython
Thanks for the links. Yes, the gpac dev version is close to having a working
solution -
[https://github.com/gpac/gpac/issues/772](https://github.com/gpac/gpac/issues/772)

Not quite sure what the signals project strives to be. Is it a framework for
commercial applications built on top of gpac?

------
qwerty456127
Why exactly do people call good explanatory manuals "the hard way"? The hard
way to learn something is using nothing but the official reference manual or
the man pages. What I found at the link is what I would rather call "for
dummies" :-)

~~~
the8472
Consider GUI wrappers the "for dummies" version.

~~~
qwerty456127
I actually find the ones I tried rather unfriendly. Every time I have to
convert something I start by looking for a quick-and-easy GUI solution but
end up using the command line because it's more intuitive (!), more flexible,
and it actually works (GUI wrappers don't always: it often happens that you
click "start" or whatever and nothing happens, or a nonsensical error pops
up).

By the way, it's always easier to teach a not particularly bright person to
use a textual command-line dialogue than a complex GUI: you just tell them
(and let them write down) what they have to type, what response they can
expect, and how to respond if it's this or that. I remember how easy it was
to teach my granny to use the UUPC e-mail system under DOS and how much
harder it was to teach GUIs.

~~~
blt
Handbrake is a pretty solid and easy to use GUI wrapper of ffmpeg.

~~~
mulvya
It handles a few (of the most common) use cases, but it is not a generic
interface for all of ffmpeg's functionality.

And strictly speaking, it is built on top of Libav, although they are
considering switching over after 1.2.

------
sbarre
This looks pretty awesome.

I was hoping this was about the command-line tool, which is quite daunting to
learn once you start getting into the filters and combining/merging multiple
audio and video sources.

In a recent project I eventually had to write a script to do 2 passes of
ffmpeg after spending a good day trying to figure my way through all the
filter documentation to do a 5-step process in one pass.

This isn't a knock against ffmpeg, which is an amazing open source package,
but it's complicated, so I'm sure this repo will be of great help to many.

~~~
exikyut
The ffmpeg commandline tool is ultimately just a frontend to all the
internals.

So understanding how everything works internally will translate directly to a
better grasp on how to use ffmpeg at the commandline.

Even though this document is very new, I now understand why fiddling with the
PTS value in the filtergraph would slow down/speed up some video I was
tinkering with a while back, for example.

------
pselbert
I’ve spent a large portion of the past month working with ffmpeg. We are using
it to repair, concat, and mux an arbitrary number of overlapping audio and
video streams recorded via WebRTC. While this sounds straightforward on the
face of it, the interplay of filters and controls is impossible to predict
without extensive experience.

To my knowledge ffmpeg is the only tool that could possibly do this. It took a
whole lot of reading man pages, bug trackers, stack exchange and various
online sources to stitch together the necessary info. This seems like a great
resource to learn what the tool is really doing and skip scrambling for
information so much.

~~~
mythrwy
FWIW, I found the Python library MoviePy a lot easier to work with (although
sometimes you still have to drop down to ffmpeg, as it won't do everything
ffmpeg can do).

MoviePy is basically an easy-to-use wrapper around ffmpeg and some other
utilities like ImageMagick. Not as full-featured, but one can get quite a
long way with it.

------
amelius
Sidenote: over the years I haven't seen a Linux utility crash as often as
ffmpeg; not by a wide margin. So when calling this library, it is probably
wise to assume that it may not return.

~~~
barrkel
I use ffmpeg to transcode video, perhaps 10,000 times in the past year (part
of an automated pipeline). It has failed about 50 times or so, almost every
failure down to corrupted input and not a crash per se. It has deadlocked
about 200 times, often enough that my tools monitor the log file I redirect
output to and restart the job if too long passes without progress.

I'm amazed it's as stable as it is, given the complexity of video formats. Of
course crashes are also potential security problems - I sometimes wonder how
far you could get with a malicious video spread virally (in the social sense).

~~~
srcmap
Probably good to document those use cases, collect sample clips, and
contribute back to the project as test cases.

Help improve the project for everyone.

~~~
pfranz
I've made efforts in the past, but often I can't legally redistribute the
problematic data. If the problem is obvious enough, I try to clean-room
recreate it.

------
dvirsky
About 10 years ago I wrote a video encoding app with libavcodec/libavformat.
Beyond very simple examples with everything set to default, there was almost
zero material on how to use it. There was maybe one trivial example of
decoding a file where all the parameters are guessed automatically. But I
needed to encode a live video stream from a webcam. I spent a couple of weeks
reading the code and hacking away until I got it right. I wish there had been
stuff like this back then.

------
ourcat
When ffmpeg came along nearly 20 years ago, it seemed to be the absolute video
toolbox I was looking for.

It was, but you needed what seemed like 'dark arts' to use it. And that took
a long, long time.

I owe years of my career to working out how to use this.

------
Xeoncross
I would gladly donate to anyone that wants to statically bind this for use
with Go. I really would love to see the power of FFmpeg open to more apps.

I've used FFmpeg libs dynamically linked, but it requires FFmpeg be installed
on the system.

~~~
xmichael99
I would gladly donate to anyone that wants to statically bind this for use
with c# with mono multiplatform support. I really would love to see the power
of FFmpeg open to more apps. I've used FFmpeg libs dynamically linked, but it
requires FFmpeg be installed on the system.

~~~
etaioinshrdlu
Indeed, this would be great.

This actually came up for me a couple of days ago, and my ham-fisted solution
was to _bundle_ ffmpeg.exe and ffplay.exe in the DLL and extract them at
runtime :)

It solved the problem but is so gross!

------
billconan
This is great! My experience with libav was very frustrating. There was no
documentation or tutorial; the most useful thing was the example programs,
but if a problem wasn't covered by the examples, I was doomed.

The worst part is that when the api doesn't work, you receive meaningless
error messages. You don't know what's wrong.

------
aurelian15
I wrote a small wrapper library a decade ago that wraps the decoding
capabilities of libavcodec/libavformat in a way that makes it relatively easy
to use from other programming languages (Pascal in this particular case):

[https://github.com/astoeckel/acinerella](https://github.com/astoeckel/acinerella)

Note that this was one of the first C programs I ever wrote and the API is
suboptimal (relies on structs being passed around instead of providing access
via getter/setter functions). I don't really recommend that people use it, yet
looking at the code might help people to get started with ffmpeg.

Also note that the libavcodec/libavformat libraries have come a long way in
terms of ease of use. If you have a look at the first versions of my wrapper
library, it required really weird hacks (registering a protocol) to get a VIO
interface (i.e. have callbacks for read, write, seek).

All that being said, today I usually just spawn subprocesses for
ffmpeg/ffprobe if I need to read multimedia-files, and I think that for most
server-side applications this is the best method (it also allows to sandbox
ffmpeg/ffprobe).

~~~
blt
Based on the code in this tutorial, it seems like it would already be easy to
use from other languages. What did you need to change?

~~~
aurelian15
By "being easy to use from other languages" I refer to the size of the header
that needs to be translated.

------
mdip
About a year ago I wrote a zsh script to analyse files in my collection and
format/selectively re-encode them based on a set of rules. Basically, I wanted
.mp4 files that the Roku Plex app could play back without having to do any
kind of re-encoding (stream/container or otherwise).

That started me on a 2-week mission of understanding the command-line. Good
God is it a nightmare! All my script was doing was reading the existing file,
deciding on the "correct" video and audio track, creating an AC-3 if only a
DTS existed, and adding a two-channel AAC track. It would also re-encode the
video track if the bitrate was too high or the profile was greater than the
Roku could handle.

Here's the thing I discovered: as much as I swore at the command-line
interface, I couldn't come up with a more elegant solution that would still
allow the application to be controlled entirely from the CLI. FFmpeg is
capable of _so much_ that figuring out a way to just "tell it what to do"
from a CLI inevitably ends up with an interface this difficult to use. The program,
very nearly, handles _everything_ [0] as it relates to media files and
simplifying the application to improve the CLI would necessarily involve
eliminating features. It's one of those cases where a light-weight DSL would
provide a superior interface (and to a certain extent, the command-line of
ffmpeg nearly qualifies as a DSL).

[0] The "very nearly" was necessary because of the _one_ feature I found that
it didn't handle. It's not possible in the current version to convert subtitle
tracks from DVDs to a format that is "legal" in an mp4 file because the DVD
subtitles are images rather than text, whereas the only formats that are legal
for the mp4 container are text. I found some utilities that can handle this,
but didn't try any, opting instead to simply download them when I encountered
a file with subtitles in this format.

~~~
mrunkel
Would you be willing to share this script?

~~~
mdip
I'm not entirely sure what state it's in (and the code is _brutally hacked
together_ -- I spent far more time reading up on _how_ to mangle the ffmpeg
CLI to get what I wanted than I did bothering to write anything resembling
quality shell code).

I don't use it any longer, but I'll see if it's still sitting on my media
server (but probably not until the weekend) - if you're on keybase, you can
find me @mdip; I'll toss it in my public keybase filesystem folder
(keybase.pub/mdip, I think) if I can locate it! :)

------
Teknoman117
Always wondered why there were never any stellar guides for the underlying
libraries of FFmpeg. Had to do a couple of projects with hardware encoding and
decoding and the best example I found of how to use it was ripping apart mpv.
At least it has samples of how to use the hardware encode features...

------
jsdir
My project ran into a limitation that required only the use of FFmpeg libav
without the command line tool. This is exactly what I needed. I'm eager to
read the rest.

------
seertaak
This is awesome -- I read the first few pages and I'm really looking forward
to reading the rest. Very clearly explained; thanks for creating/sharing!

------
FraKtus
I have used FFmpeg for more than 10 years and I am amazed at how stable it
is. Even though we tell our users to drop anything into our apps and we'll do
our best to handle it, it's exceptional to have an issue with it. Because the
API is always evolving, the best advice is to read the headers: the best
documentation is there, and it's accurate.

------
mmanfrin
Question related to FFmpeg: in the first few examples, there are input
options defining the video and audio encoding -- why is this needed?
Shouldn't a video container be self-evident about things like that?

~~~
kalimoxto
The video container of course tells you what codec the video is, but it can
lie or be otherwise incorrect. FFMpeg will use what it says as the default,
but you can override it if you'd like (or if you have multiple decoders for a
given codec).

But yes, you're right. 99.9% of the time there's no need to specify the input
decoder.

------
jakalah
I have used it on the command line; it really gives you fine-grained control.
WinFF is an easy front end and almost as good. Nice article on it.

------
imagetic
Thank you!

