
Videogrep: Automatic Supercuts with Python - dvanduzer
http://lav.io/2014/06/videogrep-automatic-supercuts-with-python/
======
Osmium
Very cool. What's the current best tool to help make the .srt file? e.g. the
current best-of-breed text recognition / text alignment tool. Last time I
looked at something like this there didn't seem to be a particularly robust
solution, especially for text alignment.

~~~
saaaam
Hello! Original author here. I'm not sure about the best tool for creating new
.srt files, but for films, you can find pretty much anything on
opensubtitles.org or subscene.com (although the quality varies). You can also
download .srts for youtube videos (again, quality varies).

~~~
isomorphic
I will caution people that downloaded subs are very often misaligned (timing)
with your video, due to various cuts of videos, intro logos, disc versions,
framerates, etc., etc. Leave your download page open and check one-by-one
against the source video.
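
If the mismatch turns out to be a constant offset (an intro logo, say), shifting every cue by the same amount may be enough. Here is a minimal Python 3 sketch, assuming standard SRT timing lines of the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`. The function names are hypothetical, and it will not fix framerate drift, which needs scaling rather than shifting.

```python
import re

# Matches one SRT timestamp such as 00:01:02,345
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_timestamp(match, offset_ms):
    """Return the matched timestamp moved by offset_ms, clamped at zero."""
    h, m, s, ms = (int(g) for g in match.groups())
    total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
    total = max(total, 0)  # a cue cannot start before the video does
    h, rem = divmod(total, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def shift_srt(text, offset_ms):
    """Shift every timing line in SRT text by offset_ms (may be negative)."""
    out = []
    for line in text.splitlines():
        # Only touch timing lines, so timestamps quoted in dialogue survive.
        if "-->" in line:
            line = TIMESTAMP.sub(lambda mt: shift_timestamp(mt, offset_ms), line)
        out.append(line)
    return "\n".join(out)
```

For example, `shift_srt(open("movie.srt").read(), 1500)` would delay every cue by 1.5 seconds (`movie.srt` is a placeholder file name).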

------
nsxwolf
I wanted to do exactly this, but at word-level granularity, so that you could
input any text and get a video of random clips from many sources, each saying
exactly the words in the input string.

I don't think the metadata is quite there yet.

~~~
vitovito
Word (and phoneme) level granularity is usually used for lip-synching (CG,
video games) and karaoke-type applications.

If you have an accurate text transcript, but not detailed enough timings, you
can use speech recognition on the audio and it will be very accurate, since
you know exactly what is being said (unlike speech recognition on arbitrary
speech, this is more like command-and-control). You can get word-level or
phoneme-level timing granularity by pairing an accurate transcript with the
original audio.

The metadata isn't there in regular subtitles, but you can certainly get it
there with some post-processing.
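
As a rough illustration of what that post-processing has to produce, here is a crude stand-in in Python. It is not speech-recognition alignment: it only spreads one subtitle cue's duration across its words in proportion to their length, so the timings are guesses. The function name and the (word, start, end) tuple layout are assumptions made for this sketch.

```python
def estimate_word_timings(text, start, end):
    """Guess per-word (word, start, end) times inside one subtitle cue.

    Each word gets a share of the interval [start, end] (in seconds)
    proportional to its character count. A real aligner would use the audio.
    """
    words = text.split()
    total_chars = sum(len(w) for w in words)
    if total_chars == 0:
        return []
    timings = []
    cursor = start
    for word in words:
        duration = (end - start) * len(word) / total_chars
        timings.append((word, cursor, cursor + duration))
        cursor += duration
    return timings
```

An aligner driven by the actual audio would replace the proportional guess with measured boundaries, but the output shape could stay the same.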

~~~
arafalov
Is there a good easy-to-use software/service that does that kind of alignment
of accurate text transcript to audio/video?

I used to have some links but the companies went out of business.

~~~
vitovito
Probably not for free. I also haven't done this kind of work in over a decade.
The current links I have are:

Annosoft's SDK:
[http://www.annosoft.com/prices](http://www.annosoft.com/prices)

Annosoft made a command-line front-end to the Microsoft Speech API, which is
what many of these other Windows-based systems may also use, and which I used
in a project in 1999-2001:
[http://www.annosoft.com/sapi_lipsync/docs/](http://www.annosoft.com/sapi_lipsync/docs/)
(There are other SAPI front-ends if you dig around online, too.)

Others, including open-source ones:
[http://en.wikipedia.org/wiki/List_of_speech_recognition_soft...](http://en.wikipedia.org/wiki/List_of_speech_recognition_software)

Magpie, used in animation and gaming:
[http://www.thirdwishsoftware.com/magpiepro.html](http://www.thirdwishsoftware.com/magpiepro.html)

Crazytalk, used in animation, uses SAPI:
[http://www.reallusion.com/crazytalk/crazytalk.aspx](http://www.reallusion.com/crazytalk/crazytalk.aspx)

FaceFX, used in gaming:
[http://www.facefx.com/documentation/2013.2/W194](http://www.facefx.com/documentation/2013.2/W194)

Source Filmmaker includes it, although I'd be surprised if it wasn't Sphinx or
SAPI or some other existing library:
[https://developer.valvesoftware.com/wiki/SFM/Lip-sync_animat...](https://developer.valvesoftware.com/wiki/SFM/Lip-sync_animation_and_Extract_Phonemes)

------
tdicola
Nice tool, also I had no idea about the moviepy library used by the tool.
Looks like a really nice little library for making small video edits in
python. Cool!

------
jpdlla
Really impressed with the example of instances of specific grammatical
structures. Really great application of something useful with this script.

------
chatmasta
I remember reading a while back that employees at big news networks (think
Fox, CNBC, CNN, etc.) had access to some massive database of broadcast videos,
and tooling built around the database to do exactly this. I can't find the
source at the moment, but if anyone knows it, a link could be relevant.

~~~
cambo
I'm assuming you meant this article at ars
[http://arstechnica.com/gadgets/2013/09/with-30-tuners-and-30...](http://arstechnica.com/gadgets/2013/09/with-30-tuners-and-30-tb-of-storage-snapstream-make-tivos-look-like-toys/)

------
manish_gill
One thing that annoys me with subtitles is when they include all the sound
effects: [SCREAMS LOUDLY], [OMINOUS MUSIC PLAYS], etc. So something like the
Total Recall silence thing probably won't work to a great degree of accuracy
in those cases.
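
One possible workaround is to filter those cues out before looking for silences. This is a minimal Python sketch, assuming sound effects are always wrapped in square brackets as in the examples above; real subtitle files vary, and the function names are hypothetical.

```python
import re

# Assumption: sound-effect captions look like "[SCREAMS LOUDLY]".
SOUND_EFFECT = re.compile(r"\[[^\]]*\]")


def strip_sound_effects(line):
    """Remove bracketed descriptions and collapse leftover whitespace."""
    cleaned = SOUND_EFFECT.sub("", line)
    return " ".join(cleaned.split())


def is_silent(line):
    """True when a caption line has no spoken text once effects are removed."""
    return strip_sound_effects(line) == ""
```

Cues for which `is_silent` returns True could then be treated as gaps rather than dialogue.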

~~~
rmc
Those are for people who are deaf.

Some places call this "closed captioning" (i.e. deaf target audience), versus
"subtitles" (target audience is people who can hear, but not understand the
language).

------
derpplease
[http://www.youtube.com/watch?v=Wpd2VaFt5iY](http://www.youtube.com/watch?v=Wpd2VaFt5iY)

please somebody make an automatic rap impersonation generator

can make use of karaoke youtube clips for the background music...

------
captaincrowbar
I can't find, anywhere in the documentation or a quick skim of the source
code, any clue as to which version of Python this requires.

~~~
gdw2
Based on the style of print statements (no parens), I'd say python2.

------
LeicaLatte
Fascinating!

------
nobody_nowhere
very cool!

------
finnn
Absolutely irrelevant correction: Jay Carney is the current, not former, press
secretary.

[https://en.wikipedia.org/wiki/Jay_Carney](https://en.wikipedia.org/wiki/Jay_Carney)

~~~
sanityinc
Well, yes, until tomorrow.

~~~
finnn
Ah, didn't realize that. Everything is clear now.

------
notastartup
The video produced was extremely entertaining and insightful. I can imagine
this tool being very useful for big data analysis.

