
My Eight-Year Quest to Digitize 45 Videotapes - mtlynch
https://mtlynch.io/digitizing-1/
======
allanrbo
Thanks for posting - great read! And felt like déjà vu to me :-) I did pretty
much the same project some years back with 34 tapes dating back to 1988 or so.
I guess I was lucky that the actual digitization was a bit easier for me,
because all my tapes were using Sony's 8mm tape system (Video8 / Hi8 / and
Digital8). They were all playable on a 2005 Sony camcorder with a Firewire
output, providing good quality DV files and no audio sync issues. However,
deinterlacing was the big challenge for me. I remember
[http://www.100fps.com](http://www.100fps.com) being a great resource on
understanding the problem. Spent a lot of hours learning and perfecting my
setup using VirtualDub, x264vfw (with as little compression as possible),
AviSynth and QTGMC for deinterlacing
[http://avisynth.nl/index.php/QTGMC](http://avisynth.nl/index.php/QTGMC) , and
then another pass through Handbrake to get a much more reasonable file size
and streamable format. I used Nextcloud for the web interface, but might
switch to MediaGoblin now after reading your post. Didn't get around to
splitting into clips actually - my family is happy enough with just being able
to scrub/skip through tapes in Nextcloud's video player. I think it was about
a 3 year on-and-off project.

Did the shop provide you with deinterlaced video?

~~~
mtlynch
Thanks for reading!

> _Did the shop provide you with deinterlaced video?_

I think so. The video they gave me is 29.97 FPS, which I think means they
deinterlaced it, but my knowledge of video technicals is pretty weak. I didn't
realize just how weak until I read a lot of the comments on here and
/r/DataHoarders about other stuff I could have tried.
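
For anyone wanting to check this on their own files: NTSC video is 29.97 frames per second whether or not it is interlaced, so the frame rate alone does not settle the question. A minimal sketch, assuming ffprobe is installed, that reads the stream's `field_order` flag. The helper names are made up for illustration, and the flag only reflects how the file was tagged, not what the frames actually look like.

```python
import json
import subprocess
from typing import List, Optional


def build_probe_command(path: str) -> List[str]:
    """Build an ffprobe call that reports field order for the first video stream."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=field_order,r_frame_rate",
        "-of", "json",
        path,
    ]


def looks_interlaced(probe_json: str) -> Optional[bool]:
    """Interpret ffprobe JSON output.

    Returns True for an interlaced field order (e.g. "tt", "bb"),
    False for "progressive", and None when the flag is missing or unknown.
    """
    streams = json.loads(probe_json).get("streams", [])
    if not streams:
        return None
    order = streams[0].get("field_order")
    if order is None or order == "unknown":
        return None
    return order != "progressive"


def probe_file(path: str) -> Optional[bool]:
    """Run ffprobe on a real file (requires ffprobe on PATH)."""
    result = subprocess.run(
        build_probe_command(path), capture_output=True, text=True, check=True
    )
    return looks_interlaced(result.stdout)
```

If the flag comes back missing or unknown, stepping through a scene with fast motion and looking for comb-like horizontal lines is a more reliable check.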

Let me know if you have any questions about switching to MediaGoblin. I'm bad
at video but good at software.

------
mrguyorama
Worth noting: I had a $10 video capture device hooked up to a shitty VCR as
well, but I recorded into OBS and did not have the audio skew problem. I
wonder why that is...

My process involved writing a giant python script to open up each giant tape
rip in ffmpeg, cut out a part, attach metadata, and save it. To my horror, I
found out that cutting up an h264 video stream and creating proper I-frames
for the start and end is non-trivial in ffmpeg without fully re-encoding, so I
bit the bullet and ran it overnight. I was lucky and only had about 20 hours
to encode and it gave me the chance to drop the quality on some to save space.
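
A rough sketch of what one step of a script like that could look like (not the actual script; the function names, the CRF value, and the file names are assumptions for illustration). It re-encodes the cut-out section, so the clip starts on a fresh keyframe, and attaches metadata with ffmpeg's `-metadata` option.

```python
import subprocess
from typing import Dict, List


def build_cut_command(
    src: str,
    start: float,
    end: float,
    dest: str,
    metadata: Dict[str, str],
    crf: int = 20,
) -> List[str]:
    """Build an ffmpeg command that cuts [start, end) seconds out of src.

    The clip is re-encoded with libx264 rather than stream-copied, which
    avoids broken starts when the cut point is not on a keyframe.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    cmd = [
        "ffmpeg",
        "-ss", f"{start:.3f}",        # seek in the input before decoding
        "-i", src,
        "-t", f"{end - start:.3f}",   # clip duration in seconds
        "-c:v", "libx264",
        "-crf", str(crf),             # higher CRF = smaller file, lower quality
        "-c:a", "aac",
    ]
    for key, value in sorted(metadata.items()):
        cmd += ["-metadata", f"{key}={value}"]
    cmd.append(dest)
    return cmd


def cut_clip(src: str, start: float, end: float, dest: str,
             metadata: Dict[str, str], crf: int = 20) -> None:
    """Run the cut (requires ffmpeg on PATH)."""
    subprocess.run(build_cut_command(src, start, end, dest, metadata, crf),
                   check=True)
```

Looping `cut_clip` over a list of (start, end, title) rows is then all the script needs; raising `crf` for less important clips is one way to trade quality for space.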

The hardest part of the project, for anyone looking to take this on, is the
data-entry of finding where to split each tape, what's in each video chunk,
and whether you even can identify it. In comparison, the hardware and software
side is piss easy.

~~~
mtlynch
Oh, interesting. I didn't know about OBS until a year or so ago, long after
I'd outsourced the capture. A lot of the feedback I'm hearing is that the skew
might have been on the software side, so maybe OBS would have solved a big
pain point for me.

The encoding step for me takes about 20 hours because I'm maximizing quality.
That part I don't mind though because I can just run it in the background and
grab it when it's done. I've only had to do that 2-3 times, so I don't mind a
longer encode job for higher quality.

------
iforgotpassword
I'm currently doing this, but with close to 16k slides. As I also felt
uncomfortable with sending those to someone, plus the (small) risk of
something getting lost while shipping, I decided to try it myself. The price
difference wasn't that great at that point either.

Turns out pretty much all the professional scan services use the same scanner,
capable of scanning a whole magazine automatically, since there basically is
just that one device that can scan magazines and delivers good quality.

But as I slowly learned, this is a typical niche market with no competition,
so you can ask any price you want.

The device is made in China and sold under different brands in the west. In
the US it's Pacific Image and costs only around 1200$. In Germany it's either
Reflecta or Braun, but they want around 1900€.

The device feels cheap, sounds cheap, moving parts feel flimsy. I spent two
months tinkering with different software, different settings. It has some
quirks. It's slow. If the image is too bright, the included software sets
exposure so low that it's scanning too fast for the internal processor and
stops every second or so, creating a tiny but noticeable seam in the image. It's
USB 2.0 but definitely not coming close to saturating that bandwidth.

After about 1000 slides, the scanner produced serious artifacts in every scan.
Had to send it in for repair.

I don't know how you can make money using this device for scanning services.
Not at the prices they charge. You barely cover the retail price once you
reach the advertised life span of 20k scans.

------
laurex
Having paid for part of this service (pre-hosting) from a pro shop (film
industry-focused), I am wondering what the comparison might be between what
you ended up with and what you might get for paying for this service from a
consumer service (the Walgreens type services) and from a professional
service. And did you try any of those for comparison?

~~~
mtlynch
Thanks for reading!

I only tried digitization from one shop, so I can't really compare. The
service I used was targeted toward home consumers, so I'd imagine that
higher-end shops that serve clients in TV or film could do more to improve quality.

------
kiddico
In part 2 you mention the use of sub_filter (never heard of that. adding it to
my toolbelt!) and said it was kind of ugly.

Honestly given the circumstance, that's a really elegant solution. Didn't even
need to touch the internals!

~~~
mtlynch
Haha, yeah, there's something that just feels dirty to me about doing a search
and replace on HTML output.

The gcsfuse solution to me felt more elegant even though it functioned poorly
because everything still worked and all I had to do was mount GCS and add some
symlinks where MediaGoblin expected to open files. But it is hard to beat the
simplicity of a two-line config change.

------
mtlynch
Author here. As you might imagine, this was a long passion project for me, so
I'm happy to answer any questions or take any feedback about this post.

~~~
tasogare
I’m surprised you only tried to automate some tasks after putting hundreds of
hours in. I wouldn’t have the patience for that. Anyway, your post is
interesting because it shows how important it is to prototype the whole
processing chain instead of trying to get one phase complete then the next.

~~~
mtlynch
Thanks for reading!

Yeah, I think it was a case of just not thinking critically enough about what
I was doing. I knew it was taking a ton of time to edit, but I thought, "Well,
that's just an inescapably manual task because you can't automate finding the
starts and ends of clips." The silly part was not realizing that I could batch
process the clip boundary finding if I used different tools.

------
toomuchtodo
Great post, awesome project to hack on, thank you for sharing the code.

~~~
mtlynch
Thanks for reading!

