
ImageMagick, four point perspective distortion in a video - silveira
http://silveiraneto.net/2014/12/07/imagemagick-four-point-perspective-distortion-in-a-video/
======
ux
Well... you can do that with only ffmpeg.

ffmpeg -i zelda_720p.mp4 -vf perspective=60:90:589:147:50:415:582:418
output.mp4

Edit: BTW on your picture it's not (50,145) but (50,415) for the bottom left
corner.

Edit: if quvi support still works, you can ffplay
"https://www.youtube.com/watch?v=rMXD5DxbXog"
-vf perspective=60:90:589:147:50:415:582:418 (otherwise something similar
should be doable with youtube-dl)

Edit: you can also do better and enable the perspective only in the part where
it matters. Typically with something like -vf
"perspective=60:90:589:147:50:415:582:418:enable='not(between(t,0,5))'"

~~~
silveira
Wow, impressive, I think I know more about ffmpeg now. And I fixed the
(50,415), thanks.

~~~
barrkel
ffmpeg is the ImageMagick of video.

(GraphicsMagick is the ImageMagick of pictures, for me. IM has a few more
features, but GM is more stable and usually much faster.)

~~~
troels
Completely unrelated, but what do you think about vips
([http://www.vips.ecs.soton.ac.uk/](http://www.vips.ecs.soton.ac.uk/))? I
recently discovered the difference in performance between im and gm, and that
led me to investigate vips, which is presumably (a lot) faster than both.

~~~
barrkel
In principle, it looks promising (in particular, using ORC to compile image
manipulation kernels to SIMD), but it is a different thing to im or gm. It's a
library and a GUI. The library is slightly too low-level for most things I'd
use gm for, but a GUI is too laborious.

Perhaps it could be integrated into GM or something similar, and used as a
back end for certain operations.

------
abalone
If you're dumb like me and don't realize what's going on at first: the source
clip was of some guys talking about a game being shown on a video monitor at
an angle. But he just wanted to see the game, not the guys. So he extracted
the video and made it fullscreen. He did it by writing a script to _turn every
frame into a PNG_, run them through an image-processing tool, and then
recompress them into a new video. Thankfully the monitor did not move much, so
some fixed distortion parameters worked.

~~~
sillysaurus3
_Thankfully the monitor did not move much so some fixed distortion parameters
worked._

Even if the monitor moved alot
([http://hyperboleandahalf.blogspot.com/2010/04/alot-is-
better...](http://hyperboleandahalf.blogspot.com/2010/04/alot-is-better-than-
you-at-everything.html)) it'd be fairly easy to write an algorithm to detect
the four corners of the video. You basically want to throw some edge detection
at it, and then look for anything that seems like corners.
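A toy sketch of that idea in Python, skipping real edge detection entirely and
just assuming the screen is the bright region: threshold the frame, then take
the bright pixels that are extremal along the two diagonals as the corners.
(The function name and threshold are made up for illustration, and this
shortcut only holds up when the quad isn't rotated too far.)

```python
import numpy as np

def find_quad_corners(img, threshold=128):
    """Crude corner finder: threshold, then pick the bright pixels that
    are extreme along the diagonals x+y and x-y."""
    ys, xs = np.nonzero(img > threshold)
    s, d = xs + ys, xs - ys
    i_tl, i_br = s.argmin(), s.argmax()   # top-left / bottom-right
    i_tr, i_bl = d.argmax(), d.argmin()   # top-right / bottom-left
    return tuple((int(xs[i]), int(ys[i])) for i in (i_tl, i_tr, i_bl, i_br))

# Synthetic frame: a bright rectangle on a dark background
img = np.zeros((100, 100))
img[20:80, 30:90] = 255
print(find_quad_corners(img))  # ((30, 20), (89, 20), (30, 79), (89, 79))
```

The output order matches ffmpeg's perspective filter arguments (top-left,
top-right, bottom-left, bottom-right).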

~~~
girvo
_> You basically want to throw some edge detection at it, and then look for
anything that seems like corners._

Which, if you're a web dev like I am, seems scary at first, especially when
you're (like me) lacking a CS education. However, as it turns out,
understanding edge detection and Haar cascades (for feature detection, that
was the problem I was solving) enough to be dangerous with OpenCV is
surprisingly easy! I recently built some facial feature detection stuff in it
that is in production right now, and it only took me a couple weeks :) So,
have a play!

------
mxfh
With some fiddling in matrices this could also be done live in the browser as
a CSS 3D transform using matrix3d or in a WebGL context on the iframe or video
element.

I have no time to do the rectifying math on the components today, but
basically it would be some kind of inverse of this projective transformation:

[http://franklinta.com/2014/09/08/computing-css-
matrix3d-tran...](http://franklinta.com/2014/09/08/computing-css-
matrix3d-transforms/)

A quick non-inverse transformation adapted from
[http://math.stackexchange.com/a/339033/70086](http://math.stackexchange.com/a/339033/70086)

[http://jsfiddle.net/w4bkmeaq/](http://jsfiddle.net/w4bkmeaq/)
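The computation itself is small. Here is a hypothetical Python sketch of the
same approach (the linked posts work in JavaScript): solve an 8x8 linear
system for the homography taking the unit square to four target corners, then
lay the 3x3 result out in the column-major 4x4 form that CSS matrix3d expects.

```python
import numpy as np

def square_to_quad(corners):
    """3x3 homography mapping the unit square corners (0,0), (1,0),
    (0,1), (1,1) to the four given (x, y) points."""
    src = [(0, 0), (1, 0), (0, 1), (1, 1)]
    A, b = [], []
    for (sx, sy), (dx, dy) in zip(src, corners):
        # Two equations per correspondence; h33 is fixed to 1.
        A.append([sx, sy, 1, 0, 0, 0, -sx * dx, -sy * dx]); b.append(dx)
        A.append([0, 0, 0, sx, sy, 1, -sx * dy, -sy * dy]); b.append(dy)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def to_matrix3d(H):
    """Lay a 2D homography out as the column-major 4x4 of CSS matrix3d."""
    a, b, c = H[0]; d, e, f = H[1]; g, h, i = H[2]
    vals = [a, d, 0, g,  b, e, 0, h,  0, 0, 1, 0,  c, f, 0, i]
    return "matrix3d(" + ", ".join(f"{v:.6f}" for v in vals) + ")"

# TV corners from the post, in top-left, top-right,
# bottom-left, bottom-right order
H = square_to_quad([(60, 90), (589, 147), (50, 415), (582, 418)])
p = H @ np.array([1.0, 1.0, 1.0])  # image of unit-square corner (1,1)
print(p[:2] / p[2])                # ≈ [582. 418.]
```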

~~~
nilknarf
The math for working out the transform is actually the same (not an inverse!)
since you are still just trying to map 4 points to 4 points.

I had a demo for the 'inverse' transform also:
[http://codepen.io/fta/pen/LHonf](http://codepen.io/fta/pen/LHonf)

Here is the same thing except using the video from this post:
[http://codepen.io/fta/pen/JoGybG](http://codepen.io/fta/pen/JoGybG)

------
revelation
If you want to do this much faster than realtime, calculate the transform
once and apply it, e.g. using OpenCV's remap (1).

1:
[http://docs.opencv.org/doc/tutorials/imgproc/imgtrans/remap/...](http://docs.opencv.org/doc/tutorials/imgproc/imgtrans/remap/remap.html)
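The reason remap is fast is that once the maps are precomputed, every output
pixel is a plain table lookup. A pure-NumPy toy version of the same idea
(nearest-neighbour only, no OpenCV required; for the perspective case you'd
fill map_x/map_y once by pushing each destination pixel through the inverse
homography):

```python
import numpy as np

def remap_nearest(src, map_x, map_y):
    """Nearest-neighbour take on cv2.remap: destination pixel (y, x)
    is fetched from src[map_y[y, x], map_x[y, x]]."""
    xi = np.clip(np.round(map_x).astype(int), 0, src.shape[1] - 1)
    yi = np.clip(np.round(map_y).astype(int), 0, src.shape[0] - 1)
    return src[yi, xi]

# Precompute the maps once (here: a horizontal flip), reuse for every frame.
src = np.arange(12).reshape(3, 4)
map_x, map_y = np.meshgrid(np.arange(4)[::-1], np.arange(3))
print(remap_nearest(src, map_x, map_y))  # each row of src reversed
```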

~~~
silveira
I have played a little bit with OpenCV in the past
([http://silveiraneto.net/tag/opencv/](http://silveiraneto.net/tag/opencv/))
but I wouldn't know how to do it with OpenCV, while with ImageMagick it seems
simple. I'd love to see an example with OpenCV as I could use it one day in
real time with the camera input.

~~~
revelation
Here is how it would work in OpenCV:

[https://gist.github.com/stschake/445aea35a3c9846573ad](https://gist.github.com/stschake/445aea35a3c9846573ad)

I'm getting 50fps with the imshow, 100 without on an ooold Q6600. That said,
remap is basically memory-bandwidth limited.

~~~
lobster_johnson
I always assumed OpenCV was about computer vision, and I'm (pleasantly!)
surprised it's this extensive. Does this mean it's good as a general-purpose
image processing library? Not a fan of ImageMagick and have been looking for
something better and faster.

------
LeoPanthera
What's the point of the PNG-to-JPEG step? Can't FFMPEG use PNG frames to make
the final video?

~~~
silveira
I think I could. I'm encoding one to compare. I converted to JPEG because I
originally was going to use an encoding that required JPEG input.

------
derefr
Aw, when I clicked this I expected it to be something much cooler: taking a
video of one of the old, 2D-tile-based Zelda games, and then doing recognition
on the video against the game's tile "alphabet" in order to correct for both
perspective and noise. (Basically, a crude implementation of a live-video to
machinima converter.)

------
brentm
If you're interested in ImageMagick take a look at Fred's ImageMagick Scripts
(link below). There is some really interesting stuff there. I spent a good 2
hours on Friday evening just trying to digest some of it.

[http://www.fmwconcepts.com/imagemagick/](http://www.fmwconcepts.com/imagemagick/)

------
Daiz
With someone else mentioning Avisynth here in the comments (which is actually
still used widely by video enthusiasts today), I became curious if there was a
plugin for this sort of thing, and sure enough, I found Reformer[1]. With
that, I figured I'd try reproducing OP's results. (By the way, while Avisynth
is Windows software, as far as I know it works quite well under Wine).

Step 1: Download the video with youtube-dl. This gives us zelda.mp4 (original
video) and zelda.m4a (extracted audio, requires ffmpeg)

    
    
        youtube-dl -x -k -o "zelda.%(ext)s" https://www.youtube.com/watch?v=rMXD5DxbXog
    

Step 2: Write the Avisynth script (in AvsPmod). Plugins used: LSMASHSource,
Reformer, RemapFrames

    
    
        lwlibavvideosource("zelda.mp4")
        deskewed = q2r(last,"lanczos",60,90,51,412,588,148,581,417)
        normal = "[0 149] [1488 2063]" # frame ranges where to use original video
        ReplaceFramesSimple(deskewed,last,mappings=normal)
    


Step 3: Pipe it to x264 with avs2yuv (necessary under Wine and with 32-bit
Avisynth & 64-bit x264) to encode the video.

    
    
        avs2yuv zelda.avs -o - | x264 --preset slower --crf 16 -o zelda.mkv - --demuxer y4m
    

Step 4: Merge encoded video and original audio back together with mkvmerge
from mkvtoolnix. You could use ffmpeg here as well, but I find mkvmerge much
nicer for simple muxing like this.

    
    
        mkvmerge -o zelda_muxed.mkv zelda.mkv zelda.m4a
    

And with that, we're done. The whole process took about 20 minutes (of which
~13min was spent encoding the video) and a few hundred megabytes of space
(since there's no need to have all the frames as individual image files
several times over). Other benefits include having the video run at the
original framerate of 29.970 (OP's video runs at 25.000 since he forgot to set
the framerate when encoding the processed images), including the audio, as
well as not having a distorted picture when the TV isn't visible (which was
simple enough to do with the ReplaceFramesSimple function from RemapFrames). You can
see the end result here:

[https://www.youtube.com/watch?v=Jk_z4TiweHs](https://www.youtube.com/watch?v=Jk_z4TiweHs)

[1]
[http://www.avisynth.nl/users/vcmohan/Reformer/Reformer.html](http://www.avisynth.nl/users/vcmohan/Reformer/Reformer.html)

~~~
ux
Thanks for the frame ranges. So...

ffmpeg -i zelda.mp4 -vf
"perspective=60:90:589:147:50:415:582:418:enable='not(between(n,0,149)+between(n,1488,2063))'"
-c:a copy -c:v libx264 zelda.mkv

~~~
Daiz
>-vf
"perspective=60:90:589:147:50:415:582:418:enable='not(between(n,0,149)+between(n,1488,2063))'"

Incidentally, this is a pretty big reason why I'd pick Avisynth over ffmpeg
for video filtering any day of the week.

~~~
pkroll
As a fan of AviSynth, the scripts that I build often make that line look so,
so simple. :) And you can use ffmpeg (and a couple of other tools) in
VirtualDub 1.10+ to directly render to MP4 with no intermediate file (it'll
handle all the pipelining for you). They're all useful tools.

~~~
Daiz
_> As a fan of AviSynth, the scripts that I build often make that line look
so, so simple. :)_

And now imagine what those scripts would look like as an ffmpeg -vf command!
That was basically the point - the -vf line is already pretty messy with just
one range-applied command, and would become even more so if you started doing
something more complicated. Avisynth on the other hand has actual scripts for
the video processing, which scales to much more complex processing while still
remaining accessible.

ffmpeg -vf might be good for doing one or two simple things to the whole
video, but for anything more complex than that you really should use an actual
video processing solution instead.

~~~
ux
That's why you have -filter_script and -filter_complex_script options.

------
natch
Since you're using ImageMagick, you could do without mplayer and use a simpler
command for the first step:

    
    
        convert infile.mp4 %08d.png
    

Add -verbose in there if you want to see the progress as it goes.

------
MrBuddyCasino
Back in the day there was AviSynth to do this kind of thing. Unfortunately the
version 3 rewrite which was supposed to use GStreamer and Ruby never went
anywhere. Is there finally something similar for Linux?

~~~
marios
VapourSynth[1] is what the AviSynth rewrite was supposed to be from what I
have gathered. Hence the name ;). OTOH, it is still for the Windows platform,
so not really what you were after but I thought I'd mention it anyway.

[1] [http://www.vapoursynth.com/](http://www.vapoursynth.com/)

------
KobaQ
DON'T FOLLOW THIS TUTORIAL!

Unless you like adware or worse ...

(JDownloader installs some crap adware.)

~~~
silveira
Hi, I'm the author of the post. I've been using JDownloader on Linux for years
and never noticed a problem with Adware. Can you please elaborate? I'd really
like to know more.

~~~
nkuttler
Never heard of JDownloader, but you might also want to look into
[http://rg3.github.io/youtube-dl/](http://rg3.github.io/youtube-dl/)

------
sergiotapia
Can you upload the resulting video as a private YouTube video? Would love to
see this video proper. :)

~~~
silveira
Original video:
[https://www.youtube.com/watch?v=rMXD5DxbXog](https://www.youtube.com/watch?v=rMXD5DxbXog)
Final video: [https://www.youtube.com/watch?v=WdB28QD-
QSY](https://www.youtube.com/watch?v=WdB28QD-QSY)

------
patejam
My favorite use of this technique was at Hack Princeton a while back. Some
people made an app that lets you take pictures of blackboards, automatically
cropping and fixing the perspective of the board for later use.

~~~
pyre
I don't know if it was based on that, but a co-worker had a point-and-shoot
digital camera a few years back (2009~2010) that had a similar feature for
taking photos of black-/white-boards.

~~~
RyJones
I had a Canon P&S that did this in 2003 or so. It was nice - whiteboard mode
was used after every meeting.

------
cedrosaure
For Windows users (and with a simple video like this one) you can do this in
two minutes.

Drop the video in VirtualDub 1.10.4 and use its built-in "perspective" filter.

Using Blend mode and the curves editor, you can apply the filter only to the
parts of the video you want, so the part where the gamepad is shown is not
affected. (Tutorial is here:
[https://www.youtube.com/watch?v=2MWoVY9mYbk](https://www.youtube.com/watch?v=2MWoVY9mYbk))

------
tambourine_man
Is the merit of the article in not using closed-source applications? Because
that can be done in 3 steps in Photoshop, After Effects, etc.

~~~
zo1
Or OpenCV even, which is open source. So no, I don't think the plus of the
article is that it's avoiding the use of closed source. If you read some of
the comments by the poster, you'll see that it's simply him solving his
problem using the tools he knew, spliced together.

------
anigbrowl
This seems a bit tortuous. You can do this sort of thing in Blender or After
Effects (even very old versions that you can get cheap or free) very easily,
and skip the PNG conversion stage altogether.

------
turnip1979
a) I totally knew what this was going to be since I am a Zelda fan and was
surprised at the way the original video was shown

b) We live in an amazing time for software and computing

Fantastic job!

------
sitkack
There was a similar story a couple weeks ago on HN where someone remapped the
video from a time lapse of hand drawing. Can't find the link.

~~~
sitkack
The story wasn't on HN

[http://uberhip.com/python/image-
processing/opencv/2014/10/26...](http://uberhip.com/python/image-
processing/opencv/2014/10/26/warping-brien/)

youtube link,
[https://www.youtube.com/watch?v=BPijRAK2NHg](https://www.youtube.com/watch?v=BPijRAK2NHg)

This second video better shows the effect with uncorrected and corrected video
side by side.
[https://www.youtube.com/watch?v=7xQ0WDmTyVY](https://www.youtube.com/watch?v=7xQ0WDmTyVY)

------
papaver
It's called homography... fairly straightforward to code...

[http://en.wikipedia.org/wiki/Homography_(computer_vision)](http://en.wikipedia.org/wiki/Homography_\(computer_vision\))
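For a sense of how straightforward: a minimal NumPy sketch that builds the
homography from four point correspondences by writing two linear equations
per point and solving the resulting 8x8 system (with the matrix's
bottom-right entry fixed to 1). The source corners reuse the numbers from the
post; the upright 640x480 target is an arbitrary choice.

```python
import numpy as np

def homography(src, dst):
    """3x3 matrix H with H @ (x, y, 1) proportional to (x', y', 1) for
    the four (x, y) -> (x', y') correspondences; h33 normalised to 1."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp]); b.append(xp)
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp]); b.append(yp)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

# Skewed TV corners from the post -> corners of an upright 640x480 frame
H = homography([(60, 90), (589, 147), (50, 415), (582, 418)],
               [(0, 0), (640, 0), (0, 480), (640, 480)])
v = H @ np.array([589.0, 147.0, 1.0])
print(v[:2] / v[2])  # ≈ [640. 0.]
```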

------
ctrijueque
Lots of useful info about ImageMagick, ffmpeg, etc. in the comments.

Thanks everybody.

