Inspired by My Wife's Unique Interests, I Created a Transcription Platform
2 points by terryops 87 days ago | 1 comment
Last year, I got the amazing news that I was going to be a dad. My wife, who’s really into TV dramas—especially those focused on Japanese urban life and ancient Chinese romance—started watching a lot of shows while waiting for our baby to arrive. She couldn't get enough from Netflix and even subscribed to a 'local Netflix'. But she quickly ran into a problem: there were no translated subtitles for the shows she wanted to watch.

Being the go-getter she is, she tried various platforms like Happyscribe and Notta. Unfortunately, the transcription quality for Asian languages was just terrible, not to mention the translation quality. That’s when I decided to step in, get my hands dirty and fix it.

First Attempt

The solution seemed straightforward to me: first, get the video and extract the audio. Then, transcribe the audio into subtitles with high accuracy. Finally, translate those subtitles and watch the video with the synced, translated subtitles.
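
The first step (pulling the audio out of the video) can be sketched as below. This is only a minimal illustration, assuming ffmpeg is the extraction tool and that 16 kHz mono WAV is the target format; the post does not say which tool or format was actually used, and the function name `build_extract_audio_cmd` is hypothetical.

```python
from pathlib import Path


def build_extract_audio_cmd(video_path: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that extracts a mono WAV track from a video.

    Assumption: ffmpeg is installed and on PATH. The helper only builds the
    argument list; run it with subprocess.run(cmd, check=True).
    """
    audio_path = str(Path(video_path).with_suffix(".wav"))
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", video_path,          # input video
        "-vn",                     # drop the video stream
        "-ac", "1",                # downmix to a single (mono) channel
        "-ar", str(sample_rate),   # resample to the requested rate
        audio_path,
    ]


cmd = build_extract_audio_cmd("episode01.mp4")
print(" ".join(cmd))
```

The resulting WAV file is what would then be handed to the transcription step.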

With the AI boom happening, I came across OpenAI's Whisper automatic speech recognition model and started thinking about how I could integrate it with ChatGPT.

For those who aren’t familiar, Whisper is renowned for its accuracy in most commonly used languages—though it struggles with unclear audio, which can lead to some hilarious hallucinations.

Setting it up was straightforward. It’s easy to use Whisper to transcribe audio into text with timestamps. I then used GPT for translation, synced the subtitle file with the original video in Adobe Premiere Pro for a quick check, and exported the video. This is how I translated and localized an episode of a Chinese TV series. The first show I experimented with was "Joy of Life."
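
For readers who want to try the transcription step, here is a rough sketch of turning timestamped segments into an SRT subtitle file. It assumes the segments are dictionaries with `start`, `end` (in seconds) and `text` keys, which is the shape the open-source `openai-whisper` package returns under `result["segments"]`; the helper names are made up for this example and are not the scripts described in the post.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp format HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render a list of {'start', 'end', 'text'} segments as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Cues are separated by a blank line
    return "\n".join(blocks)


# With openai-whisper installed (an assumption), the segments could come from:
#   import whisper
#   result = whisper.load_model("medium").transcribe("episode01.wav")
#   srt_text = segments_to_srt(result["segments"])
demo = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 5.0, "text": " Welcome back."},
]
print(segments_to_srt(demo))
```

The translated text can replace each segment's `text` while the timestamps stay untouched, which keeps the subtitles in sync with the video.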

By the way, watching a few episodes together was quite enjoyable.

However, Whisper ran slowly on my M1 Pro, and since I was also experimenting with generating images using Stable Diffusion, I decided to set up a PC with an NVIDIA RTX 4090. I configured a few scripts to download, convert to text, and translate in one go, speeding up the whole process significantly.

One day, while my wife was watching an episode, she suggested, "Why don’t you just make this into a product? Others might find it useful too, and you could even make some extra money for baby formula."

It was a eureka moment for me.

At the time, AI products were emerging rapidly, and with my previous project in the maintenance stage, I was eager to create something new. I invited two friends to join me, and we formed a small team. Thus, a new product was born.

Choices to Make

PC or Mobile: I chose to develop for PC instead of mobile. Subtitle editing and translation typically involve long videos, which are better handled on a PC.

Web or Client: I opted for a web-based interface over a client-side application. Client applications need to be compatible with both Windows and Mac, and with different versions of these systems. Moreover, I was fed up with Whisper's slow speed and various model limitations. Offloading the computation to the cloud allows users with any computer configuration to use this service smoothly.

I decided to name the product "SubEasy" because it makes creating subtitles easy.

So, it's SubEasy.ai

see comments below to continue...




Breakdown of SubEasy.ai

SubEasy.ai is an all-in-one platform where you can generate automatic subtitles, AI translations, and transcriptions with speaker names, chat with the transcript, and export the result as a video or text document.

Transcribe:

1. Powered by Whisper: We leverage OpenAI’s Whisper model, which supports many languages with high accuracy, especially in multilingual scenarios. This gives us a competitive edge over ‘traditional’ transcription services.
2. Enhanced Accuracy and Readability: Whisper isn’t perfect, so we aimed to maximize its potential. We implemented the following:

  - Clear +: Whisper can pick up background noise in audio and video, like passersby's voices, music, and even honking. Using Clear +, we remove this noise with Demucs and normalize the audio before sending it to Whisper for transcription.

  - Subtitle Reflow: Many audio/video-to-subtitle applications group large blocks of text within the same timeframe, resulting in overly long subtitles on the screen. With our exclusive Subtitle Reflow feature, you get context-aware cutting and time-aware segmentation, improving the viewing experience. If you’re interested in the technical details, we actually use smaller NLP models to achieve this. (Just to say: don’t use LLMs everywhere; they are too expensive and very unpredictable.)
3. Enhanced Transcription View: We turn audio into well-constructed articles with punctuation, sentences, and paragraphs, useful for previewing podcasts, long audio and video files, and meeting minutes.

  - Speaker Recognition: This feature identifies different speakers in a multi-speaker conversation, making it easier to follow who’s speaking. We use the NVIDIA NeMo toolkit for state-of-the-art accuracy in Speaker Recognition.
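
To make the Subtitle Reflow idea above a bit more concrete, here is a deliberately simple, rule-based sketch. It is not what we run in production (that uses smaller NLP models, as mentioned); it only illustrates the general shape of the problem: take words with timestamps and cut them into short cues at sentence ends or when a line gets too long. The `reflow` function, the `(text, start, end)` tuple layout, and the 42-character limit are all assumptions made for this example.

```python
def reflow(words, max_chars=42):
    """Group (text, start, end) word tuples into short subtitle cues.

    A cue is closed when a word ends a sentence, or before a word that
    would push the cue past max_chars. Returns (text, start, end) tuples.
    """
    cues, current = [], []

    def flush():
        # Emit the words collected so far as one cue, then start a new one
        if current:
            text = " ".join(w[0] for w in current)
            cues.append((text, current[0][1], current[-1][2]))
            current.clear()

    for word in words:
        candidate = " ".join(w[0] for w in current + [word])
        if current and len(candidate) > max_chars:
            flush()  # the next word would make the line too long
        current.append(word)
        if word[0].endswith((".", "?", "!")):
            flush()  # sentence boundary: a natural place to cut

    flush()
    return cues


words = [
    ("Hello", 0.0, 0.4), ("there.", 0.4, 0.9),
    ("How", 1.0, 1.2), ("are", 1.2, 1.4), ("you?", 1.4, 1.8),
]
for cue in reflow(words):
    print(cue)
```

A real system would also look at pauses between words and at clause boundaries, which is where a small NLP model earns its keep.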

What Makes it Next-Gen?

1. Context-Aware AI Translation: Most translation services work sentence by sentence, missing context-specific meanings. Using modern AI models, we create context-aware and highly accurate translations. We also introduced a second round of refinement and proofreading, launching AI Plus translation, which can sometimes outperform human translators.

2. Chat with the Transcript: We integrated GPTs with our platform, allowing users to interact with their documents in natural language. You can summarize and rewrite transcripts, and much more, in ChatGPT. Since ChatGPT has now rolled out many previously Plus-only features to free users, you can actually use this feature at no extra cost!

3. Integrated AI Companion: You can create summaries, meeting minutes, show notes, and social media content with one click without leaving the page. Regardless of the transcript language, you can always get AI content in English (or another language you prefer).
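
As a rough illustration of the context-aware translation idea in item 1 above: instead of sending one subtitle line at a time, each request can carry a few preceding lines as context. The sketch below only prepares the batches; the function name, batch size, and context size are assumptions for this example, and the actual call to a translation model is left out on purpose.

```python
def build_translation_batches(lines, batch_size=10, context_size=3):
    """Split subtitle lines into batches, each with preceding lines as context.

    Each batch is a dict with:
      - "context": up to context_size lines that come right before the batch
      - "to_translate": the lines this request should translate
    """
    batches = []
    for start in range(0, len(lines), batch_size):
        batches.append({
            "context": lines[max(0, start - context_size):start],
            "to_translate": lines[start:start + batch_size],
        })
    return batches


lines = [f"line {i}" for i in range(5)]
for batch in build_translation_batches(lines, batch_size=2, context_size=1):
    print(batch)
```

Every line is translated exactly once, while the model still sees what was said just before it, which helps with pronouns, names, and tone.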

What Makes the Product More Than Good:

We offer a WYSIWYG video preview with multiple subtitle styles, a lightning-fast subtitle/transcript editing interface, a document management system, search, video output, multi-format document output, and more. We believe we have the best overall performance and experience in this specific field.

Final Thoughts

Creating SubEasy.ai has been an incredible journey, inspired by a simple yet profound desire to make my wife's viewing experience more enjoyable. It started as a personal project but quickly evolved into something much larger, driven by the potential to help others facing similar challenges with transcriptions and subtitle translations.

For those who need reliable transcription and translation services, I invite you to give SubEasy.ai a try. You might be pleasantly surprised by its capabilities and the seamless experience it offers. Whether you're curious about the technical aspects, the cost, or just want to provide feedback, I'd love to hear from you. Your insights will help us continue to improve and innovate.

Thank you for taking the time to read about our journey and the creation of SubEasy.ai!



