My Eight-Year Quest to Digitize 45 Videotapes (mtlynch.io)
34 points by mtlynch 77 days ago | 14 comments

Thanks for posting, great read! And it felt like déjà vu to me :-) I did pretty much the same project some years back with 34 tapes dating back to 1988 or so. I guess I was lucky that the actual digitization was a bit easier for me, because all my tapes used Sony's 8mm tape system (Video8, Hi8, and Digital8). They were all playable on a 2005 Sony camcorder with a FireWire output, which produced good-quality DV files with no audio sync issues. However, deinterlacing was the big challenge for me. I remember http://www.100fps.com being a great resource for understanding the problem. I spent a lot of hours learning and perfecting my setup: VirtualDub, x264vfw (with as little compression as possible), AviSynth, and QTGMC for deinterlacing (http://avisynth.nl/index.php/QTGMC), and then another pass through Handbrake to get a much more reasonable file size and a streamable format. I used Nextcloud for the web interface, but might switch to MediaGoblin now after reading your post. I didn't get around to splitting the tapes into clips, actually; my family is happy enough just being able to scrub and skip through tapes in Nextcloud's video player. I think it was about a 3-year on-and-off project.
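The chain above runs through VirtualDub, AviSynth/QTGMC, x264vfw, and Handbrake. For anyone without an AviSynth setup, a rough single-tool approximation is ffmpeg's built-in yadif deinterlacer followed by an x264 encode. This is a swapped-in technique (yadif is a simpler filter than QTGMC), and the sketch only builds the argument list; the file names, CRF value, and preset are illustrative assumptions, not the commenter's settings.

```python
import shlex

def deinterlace_command(src: str, dst: str, crf: int = 18) -> list[str]:
    """Build an ffmpeg command that deinterlaces a DV capture and encodes it to H.264."""
    return [
        "ffmpeg",
        "-i", src,
        # send_field = double-rate deinterlace: 29.97 interlaced frames/s in,
        # 59.94 progressive frames/s out (one output frame per field).
        "-vf", "yadif=mode=send_field",
        "-c:v", "libx264",
        "-crf", str(crf),  # lower CRF = higher quality, larger file
        "-preset", "slow",
        "-c:a", "aac",
        "-b:a", "192k",
        # Move the MP4 index to the front so browsers can start playback before the download finishes.
        "-movflags", "+faststart",
        dst,
    ]

print(shlex.join(deinterlace_command("tape01.dv", "tape01.mp4")))
```

Once the paths point at real files, the list can be passed to `subprocess.run(cmd, check=True)` directly, which avoids shell-quoting issues.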

Did the shop provide you with deinterlaced video?

Thanks for reading!

>Did the shop provide you with deinterlaced video?

I think so. The video they gave me is 29.97 FPS, which I think means they deinterlaced it, but my knowledge of video technicalities is pretty weak. I didn't realize just how weak until I read a lot of the comments on here and on /r/DataHoarder about other things I could have tried.
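For reference, NTSC video runs at 30000/1001 (about 29.97) frames per second whether or not it is interlaced, so the frame rate alone doesn't show whether the shop deinterlaced anything. Each interlaced frame carries two fields, so a double-rate ("bob") deinterlace would instead produce about 59.94 frames per second, while a single-rate deinterlace stays at 29.97. ffmpeg's `idet` filter can estimate whether the frames themselves are still interlaced. The rate arithmetic:

```python
from fractions import Fraction

# NTSC video runs at exactly 30000/1001 frames per second ("29.97").
NTSC_FRAME_RATE = Fraction(30000, 1001)

# An interlaced frame holds two fields captured at different moments,
# so the field rate is twice the frame rate.
NTSC_FIELD_RATE = 2 * NTSC_FRAME_RATE

print(f"frames/s: {float(NTSC_FRAME_RATE):.3f}")  # frames/s: 29.970
print(f"fields/s: {float(NTSC_FIELD_RATE):.3f}")  # fields/s: 59.940
```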

Let me know if you have any questions about switching to MediaGoblin. I'm bad at video but good at software.

Worth noting: I had a $10 video capture device hooked up to a shitty VCR as well, but I recorded into OBS and did not have the audio skew problem. I wonder why that is...

My process involved writing a giant Python script to open each giant tape rip in ffmpeg, cut out a part, attach metadata, and save it. To my horror, I found out that cutting up an H.264 video stream and creating proper I-frames for the start and end is non-trivial in ffmpeg without fully re-encoding, so I bit the bullet and ran it overnight. I was lucky and only had about 20 hours of encoding, and it gave me the chance to drop the quality on some clips to save space.
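A minimal sketch of the per-clip step described above, assuming each clip is already known as a start and end time in seconds. It builds an ffmpeg command that seeks, fully re-encodes the video (so the first output frame becomes a fresh I-frame), and attaches a title tag. The `Clip` type, file names, and CRF default are hypothetical; the commenter's actual script isn't shown.

```python
import shlex
from dataclasses import dataclass

@dataclass
class Clip:
    start: float  # seconds from the beginning of the tape rip
    end: float
    title: str

def cut_command(src: str, clip: Clip, dst: str, crf: int = 20) -> list[str]:
    """Build an ffmpeg command that extracts one clip, re-encoding so it starts on a fresh I-frame."""
    if clip.end <= clip.start:
        raise ValueError(f"clip {clip.title!r} ends before it starts")
    return [
        "ffmpeg",
        # As an input option, -ss jumps to the nearest earlier seek point; because we
        # re-encode, ffmpeg decodes and discards frames up to the exact start time.
        "-ss", f"{clip.start:.3f}",
        "-i", src,
        "-t", f"{clip.end - clip.start:.3f}",  # clip duration
        "-c:v", "libx264", "-crf", str(crf),  # full video re-encode
        "-c:a", "aac",
        "-metadata", f"title={clip.title}",
        dst,
    ]

cmd = cut_command("tape07.mp4", Clip(125.0, 300.5, "Birthday party"), "tape07-birthday.mp4")
print(shlex.join(cmd))
```

Keeping the command as a list (rather than one shell string) means titles with spaces or quotes need no escaping when passed to `subprocess.run`.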

The hardest part of the project, for anyone looking to take this on, is the data entry: finding where to split each tape, what's in each video chunk, and whether you can even identify it. In comparison, the hardware and software side is piss easy.
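Since the data entry is the hard part, it helps to record it in a plain file that a script can consume. This sketch assumes a hypothetical CSV manifest with `tape`, `start`, `end`, and `title` columns and `H:MM:SS` timestamps; it converts each row to seconds and rejects ranges that end before they start. The column names and sample rows are made up for illustration.

```python
import csv
import io

def parse_timestamp(ts: str) -> float:
    """Convert 'H:MM:SS', 'MM:SS', or plain seconds (fractions allowed) to seconds."""
    seconds = 0.0
    for part in ts.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

def read_manifest(text: str) -> list[dict]:
    """Parse a CSV manifest with tape,start,end,title columns into clip records."""
    clips = []
    for row in csv.DictReader(io.StringIO(text)):
        start = parse_timestamp(row["start"])
        end = parse_timestamp(row["end"])
        if end <= start:
            raise ValueError(f"{row['tape']}: {row['title']!r} ends before it starts")
        clips.append({"tape": row["tape"], "start": start, "end": end, "title": row["title"]})
    return clips

# Hypothetical example rows; real entries come from watching each tape.
SAMPLE = """tape,start,end,title
tape07,0:02:05,0:05:00.5,Birthday party
tape07,0:05:10,0:12:00,Beach trip
"""

for clip in read_manifest(SAMPLE):
    print(clip)
```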

Oh, interesting. I didn't know about OBS until a year or so ago, long after I'd outsourced the capture. A lot of the feedback I'm hearing is that the skew might have been on the software side, so maybe OBS would have solved a big pain point for me.

The encoding step takes me about 20 hours because I'm maximizing quality. I don't mind that part, though, because I can just run it in the background and grab the output when it's done. I've only had to do it 2-3 times, so a longer encode for higher quality is worth it.

I'm currently doing this, but with close to 16k slides. I also felt uncomfortable sending them to someone else, and there's the (small) risk of something getting lost in shipping, so I decided to try it myself. The price difference wasn't that great at that point either.

It turns out pretty much all the professional scanning services use the same scanner, which can scan a whole slide magazine automatically. There's basically just that one device that can scan magazines and deliver good quality.

But as I slowly learned, this is a typical niche market with no competition, so you can ask any price you want.

The device is made in China and sold under different brands in the West. In the US it's sold as Pacific Image and costs only around $1,200. In Germany it's either Reflecta or Braun, but they want around €1,900.

The device feels cheap, sounds cheap, and its moving parts feel flimsy. I spent two months tinkering with different software and different settings. It has some quirks. It's slow. If the image is too bright, the included software sets the exposure so low that the scanner runs too fast for the internal processor and stops every second or so, creating a tiny but noticeable seam in the image. It's USB 2.0, but it doesn't come anywhere close to saturating that bandwidth.

After about 1000 slides, the scanner produced serious artifacts in every scan. Had to send it in for repair.

I don't know how you can make money using this device for scanning services, at least not at the prices they charge. You barely cover the retail price by the time you reach the advertised lifespan of 20k scans.
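To put the lifespan figure in perspective, spreading the quoted retail prices over the advertised 20k scans gives the hardware cost per slide. This ignores repairs (like the one mentioned above), labor, and overhead, so it is only a lower bound on what each scan really costs.

```python
def cost_per_scan(device_price: float, rated_scans: int) -> float:
    """Spread the device's purchase price evenly over its rated lifetime."""
    return device_price / rated_scans

# Prices and the 20k-scan rating are the figures quoted in the comments above.
for label, price in [("US (USD)", 1200), ("Germany (EUR)", 1900)]:
    print(f"{label}: {cost_per_scan(price, 20_000):.3f} per scan")
# US (USD): 0.060 per scan
# Germany (EUR): 0.095 per scan
```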

Having paid a pro shop (film-industry-focused) for part of this service (everything before hosting), I'm wondering how what you ended up with compares to what you'd get from a consumer service (the Walgreens type) and from a professional service. Did you try any of those for comparison?

Thanks for reading!

I only tried digitization from one shop, so I can't really compare. The service I used was targeted toward home consumers, so I'd imagine that higher-end shops that serve clients in TV or film could do more to improve quality.

In part 2 you mention using sub_filter (never heard of it; adding it to my toolbelt!) and said it was kind of ugly.
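For anyone else new to it: sub_filter is an nginx directive (from `ngx_http_sub_module`) that does a literal search-and-replace on response bodies as nginx serves them. The Python function below is a simplified model of that behavior, not nginx code: matching ignores case, and only the first match is replaced unless `once` is False, mirroring nginx's default of `sub_filter_once on`. The `/media/` prefix and `storage.example.com` URL are made-up placeholders, not the paths from the post.

```python
import re

def sub_filter(body: str, pattern: str, replacement: str, once: bool = True) -> str:
    """Simplified model of nginx's sub_filter: literal, case-insensitive replacement."""
    return re.sub(
        re.escape(pattern),  # treat the pattern as literal text, not a regex
        lambda _match: replacement,  # a function sidesteps backslash handling in the replacement
        body,
        count=1 if once else 0,  # count=0 means "replace every match"
        flags=re.IGNORECASE,
    )

html = '<img src="/media/a.jpg"><img src="/media/b.jpg">'
print(sub_filter(html, 'src="/media/', 'src="https://storage.example.com/media/', once=False))
```

With the default `once=True`, only the first image would be rewritten, which is why nginx configs that rewrite every URL on a page typically also set `sub_filter_once off;`.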

Honestly, given the circumstances, that's a really elegant solution. You didn't even need to touch the internals!

Haha, yeah, there's something that just feels dirty to me about doing a search and replace on HTML output.

To me, the gcsfuse solution felt more elegant, even though it performed poorly, because everything still worked and all I had to do was mount GCS and add some symlinks where MediaGoblin expected to open files. But it's hard to beat the simplicity of a two-line config change.
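The symlink half of the gcsfuse approach can be sketched as below, assuming the bucket is already mounted with something like `gcsfuse BUCKET /mnt/gcs`. The mount point, the MediaGoblin media directory, and the subdirectory names are hypothetical placeholders; check where your MediaGoblin install actually reads media from before linking anything. The demo runs in a temporary directory rather than against real paths.

```python
import tempfile
from pathlib import Path

def link_media_dirs(mount_root: Path, media_root: Path, subdirs: list[str]) -> list[Path]:
    """Create a symlink in media_root for each subdir, pointing at the same name under mount_root."""
    media_root.mkdir(parents=True, exist_ok=True)
    created = []
    for name in subdirs:
        link = media_root / name
        if link.is_symlink() or link.exists():
            continue  # never clobber an existing file, directory, or link
        link.symlink_to(mount_root / name, target_is_directory=True)
        created.append(link)
    return created

if __name__ == "__main__":
    # Throwaway directories stand in for the gcsfuse mount and MediaGoblin's media dir.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "gcs" / "media_entries").mkdir(parents=True)
        links = link_media_dirs(root / "gcs", root / "mediagoblin_media", ["media_entries"])
        print([str(p.relative_to(root)) for p in links])
```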

Author here. As you might imagine, this was a long passion project for me, so I'm happy to answer any questions or take any feedback about this post.

I’m surprised you only tried to automate some tasks after putting in hundreds of hours. I wouldn’t have the patience for that. Anyway, your post is interesting because it shows how important it is to prototype the whole processing chain instead of trying to get one phase complete before starting the next.

Thanks for reading!

Yeah, I think it was a case of just not thinking critically enough about what I was doing. I knew it was taking a ton of time to edit, but I thought, "Well, that's just an inescapably manual task because you can't automate finding the starts and ends of clips." The silly part was not realizing that I could batch-process the clip boundary finding if I used different tools.
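The thread doesn't say which tools made batch boundary-finding possible, so this is only one swapped-in technique: scan a per-frame brightness trace for runs of near-black frames, which can separate recordings on home tapes, and treat each run as a candidate clip boundary for a human to confirm. It assumes the per-frame mean luma has already been extracted (for example with ffmpeg's `signalstats` filter, not shown), and the threshold and minimum gap length are guesses to tune per tape. ffmpeg's own `blackdetect` filter implements a similar idea.

```python
def find_dark_gaps(luma: list[float], fps: float, threshold: float = 20.0,
                   min_seconds: float = 0.5) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds of runs of near-black frames.

    luma holds the mean brightness (0-255) of each frame. Limited-range video
    puts black at 16, so the default threshold leaves a little headroom. Runs
    shorter than min_seconds are ignored so one dark frame doesn't split a clip.
    """
    gaps = []
    run_start = None
    # Append one bright sentinel frame so a run at the very end is still closed.
    for i, value in enumerate(luma + [threshold + 1]):
        if value <= threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if (i - run_start) / fps >= min_seconds:
                gaps.append((run_start / fps, i / fps))
            run_start = None
    return gaps

# Made-up brightness trace at 10 fps: 2 s of picture, 1 s of black, 2 s of picture.
trace = [110.0] * 20 + [16.0] * 10 + [95.0] * 20
print(find_dark_gaps(trace, fps=10))  # [(2.0, 3.0)]
```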

Great post, awesome project to hack on, thank you for sharing the code.

Thanks for reading!
