
Remove Moving Objects from Video - Yuqing7
https://medium.com/syncedreview/magically-remove-moving-objects-from-video-dfc789fe092d
======
kaibee
A related, better implementation of the inpainting, though I wonder how it
would look with the automasking from this one.

[https://nbei.github.io/video-inpainting.html](https://nbei.github.io/video-inpainting.html)

------
tareqak
There is a related issue that I have some personal interest in: removing
specific audio from video.

Background:

There are streamers on Twitch who will play copyrighted music as part of their
streams. Twitch allows them to do this, but then Twitch will scan these videos
for copyrighted music against some music fingerprint database and mute that
section of video entirely (the part of the video that played that audio). The
other parts are unaffected unless they played some other copyrighted audio.
YouTube does something similar in terms of recognizing copyrighted music, but
will demonetize the entire video as a result. Needless to say, demonetization
does hurt the streamer.

Alternatives:

I don't want to get into whether or not Twitch and YouTube are right in doing
the copyrighted audio matching and the subsequent actions they take. Some
streamers who've been affected by this have started playing
royalty-free/copyright-free music or music from lesser-known artists that are less
likely to be in these music fingerprint databases.

My question:

Is it possible to just _subtract_ the audio of a copyrighted track from a
video after it has been detected as having been played in that video?

~~~
Hydraulix989
It's possible, but the quality isn't that great even with state-of-the-art
machine learning:

[https://towardsdatascience.com/audio-ai-isolating-vocals-fro...](https://towardsdatascience.com/audio-ai-isolating-vocals-from-stereo-music-using-convolutional-neural-networks-210532383785)

~~~
canada_dry
The only tool in the audio toolbox is essentially the Fourier transform.

It would be a game changer if someone were to come up with a novel method of
decomposing audio into discrete components (e.g. people speaking, specific
instruments, background noise).

Likely this would require completely new hardware to capture different audio
attributes in addition to simply capturing a stream of vibrations from a
microphone.

~~~
kyzyl
The tools to do this exist. It's usually called 'blind source separation', as
in "What are the N distinct audio signals which sum up to best explain a given
compound signal, without knowing the possible source signals ahead of time."
Usually it's done with some sort of matrix factorization, Principal Component
Analysis, and/or Independent Component Analysis. It's also used for non-audio
signals, like pulling the discrete firings out of noisy EEG signals. It's
definitely not a foolproof solution but in a lot of applications it can get
you going, at least.
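
As a rough illustration of the ICA route, here is a minimal sketch in plain NumPy. Everything in it is an assumption made for the demo: the function name `fast_ica`, the two synthetic sources (a sine tone and Laplacian noise standing in for speech), the two "microphones", and the mixing matrix. It is a toy symmetric FastICA with a tanh nonlinearity, not a production separator.

```python
import numpy as np

def fast_ica(mixed, n_iter=200, seed=0):
    """Toy symmetric FastICA. `mixed` has shape (channels, samples)."""
    x = mixed - mixed.mean(axis=1, keepdims=True)
    # Whiten: rotate and scale so the channels are uncorrelated with unit variance.
    cov = x @ x.T / x.shape[1]
    eigval, eigvec = np.linalg.eigh(cov)
    z = (eigvec / np.sqrt(eigval)) @ eigvec.T @ x
    # Start from a random orthogonal unmixing matrix.
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    u, _, vt = np.linalg.svd(rng.standard_normal((n, n)))
    w = u @ vt
    for _ in range(n_iter):
        g = np.tanh(w @ z)
        # Fixed-point update per row: E[z g(w.z)] - E[g'(w.z)] w
        w_new = g @ z.T / z.shape[1] - (1.0 - g ** 2).mean(axis=1)[:, None] * w
        # Symmetric decorrelation keeps the rows orthonormal.
        u, _, vt = np.linalg.svd(w_new)
        w = u @ vt
    return w @ z

# Synthetic demo (assumed signals): two sources, two mixed recordings.
rng = np.random.default_rng(1)
n_samples = 5000
t = np.arange(n_samples) / 1000.0
sources = np.vstack([np.sin(2 * np.pi * 7 * t), rng.laplace(size=n_samples)])
mixing = np.array([[1.0, 0.6], [0.4, 1.0]])  # hypothetical mixing matrix
mixed = mixing @ sources
estimated = fast_ica(mixed)

# Absolute correlation of each true source with each estimated component.
match = np.abs(np.corrcoef(sources, estimated)[:2, 2:])
print(match.round(3))
```

ICA recovers the sources only up to order, sign, and scale, which is why the check above looks at absolute correlations rather than comparing waveforms directly. It also needs at least as many recordings as sources, so a single mono track would call for a different factorization.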

~~~
salty_biscuits
By the problem setup, it isn't blind source separation. It is sound = song
plus other: a mixture model with 2 components.

Edit. If you know the song, it should be something simple: cross-correlate the
audio with the known song, find the peak, solve for the gain, and subtract the
scaled and shifted song from the original track. The result will be rubbish if
the gain and timing have errors, so you might need to do it in little chunks
and interpolate the gain and shifts.
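
A minimal sketch of those steps, assuming a mono track, a single global shift and gain, and synthetic signals (the "song" is white noise and the "voice" is a sine tone). The function name `subtract_known_song` is made up for illustration.

```python
import numpy as np

def subtract_known_song(mix, song):
    """Align `song` inside `mix` by cross-correlation, fit one gain, subtract."""
    # Index 0 of the full cross-correlation corresponds to lag -(len(song) - 1).
    xcorr = np.correlate(mix, song, mode="full")
    lag = int(np.argmax(np.abs(xcorr))) - (len(song) - 1)
    # Overlapping region between the shifted song and the mix.
    start = max(lag, 0)
    stop = min(lag + len(song), len(mix))
    segment = song[start - lag:stop - lag]
    # Least-squares gain that best explains the mix with the shifted song.
    gain = np.dot(mix[start:stop], segment) / np.dot(segment, segment)
    cleaned = mix.copy()
    cleaned[start:stop] -= gain * segment
    return cleaned, lag, gain

# Synthetic demo (assumed values): the song starts 700 samples in, at gain 0.8.
rng = np.random.default_rng(0)
song = rng.standard_normal(1000)
voice = 0.3 * np.sin(2 * np.pi * np.arange(3000) / 50.0)
mix = voice.copy()
mix[700:1700] += 0.8 * song

cleaned, lag, gain = subtract_known_song(mix, song)
residual_rms = np.sqrt(np.mean((cleaned - voice) ** 2))
print(lag, round(gain, 3), round(residual_rms, 4))
```

`np.correlate` computes the correlation directly, which is fine for a short demo. For full-length tracks an FFT-based correlation would be the more practical choice, and the chunked version would repeat the same fit per window.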

Edit 2. More generally, you might want to worry about the song having passed
through some unknown transfer function (i.e. it is being played and recorded
through shitty equipment). Then you have an interesting inverse problem. If
everything is linear it will involve a regularized deconvolution. Will be
tricky then.
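
One hedged way to read the linear case: treat the unknown transfer function as a short FIR filter and estimate it with ridge (Tikhonov) regularized least squares, then subtract the filtered song. The sketch below assumes the song is already time-aligned with the mix and the same length; `remove_filtered_song`, the tap count, the ridge weight, and the test filter are all illustrative choices.

```python
import numpy as np

def remove_filtered_song(mix, song, taps=8, ridge=1e-3):
    """Fit a short FIR filter h so that mix ~ conv(song, h) + other, then subtract."""
    n = len(mix)
    # Column k holds the song delayed by k samples (a convolution matrix).
    delayed = np.zeros((n, taps))
    for k in range(taps):
        delayed[k:, k] = song[:n - k]
    # Ridge-regularized normal equations: (A^T A + ridge * I) h = A^T mix
    gram = delayed.T @ delayed + ridge * np.eye(taps)
    h = np.linalg.solve(gram, delayed.T @ mix)
    return mix - delayed @ h, h

# Synthetic demo (assumed values): the song is played through a 4-tap "room".
rng = np.random.default_rng(0)
n = 4000
song = rng.standard_normal(n)
true_h = np.array([0.9, 0.4, -0.2, 0.1])
played = np.convolve(song, true_h)[:n]
voice = 0.3 * np.sin(2 * np.pi * np.arange(n) / 50.0)
mix = played + voice

cleaned, h = remove_filtered_song(mix, song)
residual_rms = np.sqrt(np.mean((cleaned - voice) ** 2))
print(h.round(3), round(residual_rms, 4))
```

Keeping the filter short matters here: an unconstrained per-frequency estimate could explain the whole mix, voice included, and subtract everything. Real recordings would add nonlinearities and drift that this sketch ignores.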

~~~
Hydraulix989
It is still reducible to the more general blind source problem, right? We
can conveniently "forget" that we know what the sources are so now we are
blindfolded and can still use the same techniques to solve it.

~~~
salty_biscuits
It will do worse with fewer assumptions. The more you know, the better you can
estimate.

~~~
kyzyl
Sorry I didn't see your responses until now. Indeed, there are many ways to
slice the specific problem. I was specifically responding to the parent's
statement:

> It would be a game changer if someone were to come up with a novel method of
> decomposing audio into discrete components

It's something that has been generally addressed and ~works. It will obviously
depend on the specifics of the application, and yes if you can constrain the
problem space further you ought to do better!

------
kumarm
I was looking for a decent inpainting implementation for photos earlier in
the week, and it appears there are no decent open source implementations.

------
tw1010
Second thing that's felt like a genuine, useful innovation coming out of the
last wave of AI hype. (The first being deep-fakes.)

------
somada141
That looks fantastic! I believe that Adobe Premiere and After Effects already
offer that feature, called content-aware fill, for video and it seems to work
very well [1].

Of course those are not open-source but it's really inspiring to see such uses
of AI. Another very interesting one when it comes to video has been [2]

[1]
[https://www.youtube.com/watch?v=25ltIoHtiO4](https://www.youtube.com/watch?v=25ltIoHtiO4)
[2] [https://github.com/avinashpaliwal/Super-SloMo](https://github.com/avinashpaliwal/Super-SloMo)

~~~
tasty_freeze
The technique you refer to is "inpainting", but such sophistication isn't
necessary. If the object of interest is moving across a largely static
background (pans and rotations are easily compensated for), then the missing
background image you need in frame N is available in adjacent frames.
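
A minimal sketch of that idea, assuming a perfectly static camera and grayscale frames stored as a NumPy array: a per-pixel temporal median recovers the background as long as the moving object covers each pixel in fewer than half of the frames. The function name and the synthetic clip are made up for illustration, and this is not the algorithm of any particular filter.

```python
import numpy as np

def temporal_median_background(frames):
    """frames: array of shape (num_frames, height, width). Per-pixel median over time."""
    return np.median(frames, axis=0)

# Synthetic clip (assumed data): a static background with a bright 4x4 square
# moving 3 pixels to the right per frame, so each pixel is covered at most twice.
height, width, num_frames = 20, 30, 9
background = (np.arange(height * width).reshape(height, width) % 200).astype(float)
frames = np.repeat(background[None, :, :], num_frames, axis=0)
object_mask = np.zeros((num_frames, height, width), dtype=bool)
for i in range(num_frames):
    object_mask[i, 8:12, 3 * i:3 * i + 4] = True
frames[object_mask] = 255.0

recovered = temporal_median_background(frames)
# Patch only the masked pixels of one frame with the recovered background.
repaired_middle = np.where(object_mask[4], recovered, frames[4])
print(np.array_equal(recovered, background))
```

With pans or rotations the frames would first need to be registered to a common reference, which this sketch leaves out.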

I've used such a filter to remove dirt and dust from 8mm films via avisynth, a
program that goes back more than 20 years, though the filter in question is
not quite that old -- at least 10 years.

Here is the filter in question:
[http://avisynth.nl/index.php/RemoveDirt](http://avisynth.nl/index.php/RemoveDirt)

------
Causality1
Fascinating. Is this based on an earlier still-frame method or is it entirely
unique to video?

------
pndy
Just like with that face blending (is that the correct term?), seeing this I'm
getting even more afraid of how this technology can be used for malicious
purposes, especially in media.

It still looks amazing, of course.

------
trilila
This is pretty awesome, to say the least.

