Hacker News

I wrote a command-line app that takes a YouTube URL (or a path to any audio or video file that ffmpeg can read) and converts it into a transcript using the Whisper model, then optionally translates or summarizes it using OpenAI. It's been incredibly useful for chewing through my YouTube backlog, but it might also be hugely useful for the deaf or hearing-impaired. It uses Nix to manage dependencies, although I got clever about making that optional (I don't like forcing Nix on people until they're ready for it).
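The overall pipeline can be sketched roughly like this: fetch or convert the audio, then feed it to Whisper. This is a minimal illustration, not the actual implementation; I'm assuming `yt-dlp` for the download step and the `openai-whisper` CLI for transcription, which the repo may wire up differently.

```python
import subprocess
from pathlib import Path

def build_pipeline(source: str, workdir: Path) -> list[list[str]]:
    """Build the commands for: fetch/extract audio -> transcribe.

    `yt-dlp` and the `whisper` CLI here are assumptions for
    illustration; the actual tool does its own wrapping of
    ffmpeg and Whisper.
    """
    wav = workdir / "audio.wav"
    cmds = []
    if source.startswith(("http://", "https://")):
        # Download only the audio track and convert it to WAV.
        cmds.append([
            "yt-dlp", "-x", "--audio-format", "wav",
            "-o", str(wav), source,
        ])
    else:
        # Local file: let ffmpeg extract 16 kHz mono audio,
        # the sample rate Whisper operates on.
        cmds.append([
            "ffmpeg", "-i", source, "-ar", "16000", "-ac", "1", str(wav),
        ])
    # Transcribe with Whisper, writing a plain-text transcript.
    cmds.append([
        "whisper", str(wav), "--model", "base",
        "--output_format", "txt", "--output_dir", str(workdir),
    ])
    return cmds

def run_pipeline(source: str, workdir: Path) -> None:
    # Run each stage, failing fast if any command errors out.
    for cmd in build_pipeline(source, workdir):
        subprocess.run(cmd, check=True)
```

The optional translate/summarize step would then just send the resulting text file to the OpenAI API.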

https://github.com/pmarreck/yt-transcriber

This is mainly useful for single-speaker videos that convey information.

Most other solutions out there that claim to do this only download the closed-captioning track and summarize that, but MANY YouTube videos don't have a good closed-captioning track, in which case my method still works. (Note: I'm aiming for Linux/Mac compatibility but have only tested it on Mac so far.)

I next want to turn it into a simple web service and/or a Docker image to make it available to everyone. (I don't know if I could afford to host it, since the CPU/GPU cost of running Whisper on spoken audio is not insignificant, but it should work fine on anyone's local machine, assuming they have the hardware for it.)

I also want to add speaker identification (known as "diarization"), possibly by switching to WhisperX or another existing solution, which would make this more useful for multi-party conversation audio.

In other news, I'm looking for contract work (I'm just doing side projects like the above to keep myself busy and, ideally, useful). My last job was Director of Engineering for a startup, but due to having a toddler I wish to remain work-flexible for the time being. https://www.linkedin.com/in/petermarreck/



