

Audiogrep transcribes audio files and then creates “audio supercuts” - albertzeyer
http://antiboredom.github.io/audiogrep/

======
saaaam
Hey all - I made this and am wondering if anyone here has any experience with
pocketsphinx and could lend a hand in making the transcriptions more accurate.
Let me know! (Or just make a pull request.)

~~~
th-ai
Can't help with pocketsphinx, but I do work on transcription sync, where accuracy
depends on the source. Google/YouTube ASR is above 90% accurate, at least where
well-recorded people speak at an even pace with minimal accent. Otherwise, where
vocals are hard to hear, ASR isn't good enough. Human-corrected transcripts cost
$1/minute today, and will be 10x more affordable soon.

~~~
dubeye
Why will human-corrected transcripts be 10x cheaper soon? Because of
improving ASR? What about poorly-recorded people?

------
albertzeyer
Also check out the examples: [http://lav.io/2015/02/audiogrep-automatic-audio-supercuts/](http://lav.io/2015/02/audiogrep-automatic-audio-supercuts/)

And this related project:
[https://github.com/antiboredom/videogrep](https://github.com/antiboredom/videogrep)

He already mentions the obvious idea of integrating audiogrep into videogrep,
which at the moment relies only on subtitle files.

> All the instances of the phrase "time" in the movie "In Time":
> [https://www.youtube.com/watch?v=PQMzOUeprlk](https://www.youtube.com/watch?v=PQMzOUeprlk)

> All the one to two second silences in "Total Recall":
> [https://www.youtube.com/watch?v=qEtEbXVbYJQ](https://www.youtube.com/watch?v=qEtEbXVbYJQ)

> The President's former press secretary telling us what he can tell us:
> [https://www.youtube.com/watch?v=D7pymdCU5NQ](https://www.youtube.com/watch?v=D7pymdCU5NQ)
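How audiogrep actually finds those silences isn't described here. As a hedged sketch only, assuming mono audio already decoded into floats in [-1.0, 1.0], the "one to two second silences" idea could be as simple as scanning for runs of quiet samples and keeping the runs whose length falls in range. The function name, the per-sample `threshold`, and the input format are all hypothetical; real tools usually measure loudness over short windows (e.g. RMS in dBFS) rather than per sample.

```python
from typing import List, Optional, Sequence, Tuple


def find_silences(
    samples: Sequence[float],
    sample_rate: int,
    threshold: float = 0.01,  # hypothetical amplitude cutoff for "silent"
    min_len: float = 1.0,     # shortest silence to keep, in seconds
    max_len: float = 2.0,     # longest silence to keep, in seconds
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of quiet runs lasting min_len..max_len."""
    silences: List[Tuple[float, float]] = []
    run_start: Optional[int] = None
    # Step one index past the end so a quiet run reaching the last sample is closed.
    for i in range(len(samples) + 1):
        quiet = i < len(samples) and abs(samples[i]) < threshold
        if quiet and run_start is None:
            run_start = i  # a quiet run begins
        elif not quiet and run_start is not None:
            duration = (i - run_start) / sample_rate
            if min_len <= duration <= max_len:
                silences.append((run_start / sample_rate, i / sample_rate))
            run_start = None  # the run has ended
    return silences
```

Each returned span could then be cut out and concatenated to build a silence supercut like the "Total Recall" one.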

~~~
saaaam
As an intermediate step toward merging this into videogrep, I've been using
moviepy/audiogrep to make these somewhat unnerving condensed C-SPAN videos:

[http://lav.io/2015/02/c-span-excerpts/](http://lav.io/2015/02/c-span-excerpts/)

~~~
rasur
These are really quite compelling, thanks for sharing!

------
georgehm
Awesome! Will it be possible to do a reverse map from a sentence back to the
approximate time in the original clip?
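One way to picture that reverse mapping, as a sketch and not audiogrep's actual code: assuming the recognizer can emit word-level timestamps, stored here in a hypothetical list of `(word, start_seconds, end_seconds)` tuples, finding a phrase means sliding a window over the words and reporting the first word's start and the last word's end for each match. The tuple format and `find_phrase` are illustrative assumptions; the transcript format audiogrep writes is not shown in this thread.

```python
from typing import List, Sequence, Tuple

# Hypothetical word-level transcript entry: (word, start_seconds, end_seconds)
Word = Tuple[str, float, float]


def find_phrase(words: Sequence[Word], phrase: str) -> List[Tuple[float, float]]:
    """Return (start, end) times for every occurrence of `phrase` in the transcript."""
    target = phrase.lower().split()
    if not target:
        return []  # an empty phrase matches nothing
    n = len(target)
    hits: List[Tuple[float, float]] = []
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        # Case-insensitive comparison of the window against the target words.
        if [w.lower() for w, _, _ in window] == target:
            # The span runs from the first matched word's start to the last one's end.
            hits.append((window[0][1], window[-1][2]))
    return hits
```

Because recognizer timestamps are approximate, a real supercut would probably pad each span by a few tens of milliseconds before cutting.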

------
jaideepsingh
Nice! I started making a Python radio that used pydub and transcribed text
(just like this) from different sources a few years ago, but abandoned it
prematurely. I'll look at your code; hopefully it will kickstart the project again!
Thanks!
[https://bitbucket.org/jaideepsingh/isodi/](https://bitbucket.org/jaideepsingh/isodi/)

------
hardwaresofton
Nice! Glad to see more work done with CMU Sphinx (and I'd be equally excited
if this was done with Julius) -- so many possibilities!

I assume this will be useful to data scientists who want to process lyrics?
What other intended or near-at-hand use cases are there?

~~~
anigbrowl
It would be handy as a starting point for ADR, automated dialog replacement.
In films you don't always get what you want in production sound - maybe a
plane was flying overhead as the sun was setting on the last day that your
Famous Actress was available, so you make the most of the visual opportunities
and accept the inadequate sound. Then you bring the actors back to a recording
studio later and have them re-read their lines. On large films, a _lot_ of the
dialog that you hear in the final version is recorded this way, as much as 50%
in action movies (because you have all this noisy equipment going on around
the set, and getting good-quality sound recordings always has a lower
priority). On indie films there's more location shooting and smaller
post-production budgets, so you aim to minimize ADR requirements, to 0% if at
all possible.

Actors hate doing ADR and it's time-consuming and annoying for editors. This
wouldn't automatically solve the problem because you wouldn't have a good
match between dialog recorded in different acoustic environments, but it does
have the potential to save a lot of grunt work, especially for background
dialog where you can compromise on quality a bit.

Also, in post-production you often find yourself wanting to edit just one or
two words in a scene and you'd rather not bring the actor back for such a
small problem, so you look for other scenes and other takes of the same scene
where the same word or syllable appears, and do a little cut-and-paste and
blending, the audio equivalent of Photoshop retouching. It would be _very_
useful for that.

------
Mithaldu
I'd like to see this applied to this video, targeting "hushed":
[https://www.youtube.com/watch?v=BpsMkLaEiOY](https://www.youtube.com/watch?v=BpsMkLaEiOY)

~~~
ouchy
Heads-up: this video begins with loud audio from a smoke alarm.

------
niknamelogin
I know of a similar project,
[http://LaconiaTrimVideo.com](http://LaconiaTrimVideo.com), which trims silence
from video.

------
jaywunder
This is a really cool tool. Just curious what the "real world" use case would
be for it.

------
archimedespi
I'm going to have so much fun with this (evil grin).

