Show HN: DeepSpeech based automated transcription service
67 points by braindead_in 4 months ago | 26 comments
We have been building a DeepSpeech model with our data for the past year, and we have recently hit 95% accuracy on the LibriSpeech dataset. That puts us close to the published results for DeepSpeech 2. However, our data is conversational audio, and on our own internal dataset we do much better than PaddlePaddle. Here's a blog post on the method we followed to build our models.


We have been using this internally in our service, and it saves a ton of time and effort during the typing stage. It is nowhere near the accuracy our transcribers can achieve, but we are getting close. We are offering automated transcripts free for a limited time. Please do try it out.


Thanks in advance!

How does this compare to Google's speech API?

We are planning to do a benchmark against Google Web Speech once it adds support for multi-speaker files. We did try our internal test set on Google Web Speech when we started building this, and the WER came out to around 18%.

I would recommend also testing with IBM's Watson Speech. In my usage it was a lot more accurate than Google and Azure. I also did a couple of tests with AWS and Watson was always ahead. All these tests were with American and British English.

In general we're pretty happy with Trint for American & British accents for our stuff (though not to say we won't take a look at what you've got :)). They usually require a bit of tweaking, but it's pretty good. The killer feature for us would be training against people with other accents. You'll notice our transcripts really constitute a pretty big part of what we do, so a good quality transcription service for people with different accents would be an awesome thing.

e.g. clearly once this course leaves early access we'll want to get this copy-edited, and Yan here is British, so even here Trint's not always great :) https://livevideo.manning.com/module/38_1_1/production-ready...?

We do provide an option to have the transcripts corrected manually by our transcribers. Would that work for you?

By "95% accuracy on the LibriSpeech" you mean a word error rate of 5% on that test set, using your own training data?

The WER on the LibriSpeech clean test set is around 0.087 and the CER is 0.030. We trained on a dataset of around 5,000 hours, which included the LibriSpeech train set.
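For context on the numbers being debated here: WER is the word-level edit distance (substitutions + deletions + insertions) normalized by the number of reference words, and "accuracy" is conventionally 1 − WER, which is how a WER of 0.087 corresponds to roughly 91.3% accuracy. A minimal sketch of the standard computation (not the OP's code; whitespace tokenization assumed):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between the first i ref words and first j hyp words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j          # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

# One deleted word out of six reference words -> WER of 1/6
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

CER is the same computation over characters instead of words.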

So where did 95% come from?

I think I made a mistake there. It should have been 92% accuracy. Can't find the edit button now.

Have you considered using a more traditional HMM-style recognizer? The stock Kaldi chain model recipe should get you to more like 4.5–5% WER on LibriSpeech.

We actually started our experiments with Kaldi and even built a dataset out of our files to train on. But we found that Kaldi required a lot of data prep and a long lead time. Our internal dataset is quite large, and data prep for our pipeline is much easier than it is for Kaldi.

You mean you meant to type 91.3% but accidentally typed 95%? Were you using your own transcription software?

This is pretty great, I'm going to show it to our UX person who works with transcripts from user testing. The editor feature is pretty great for cleaning up transcripts, and I think it'd be faster to do that than to manually do it as we have been.

The automated process is pretty funny when working with Australian English though!

> Upset is gonna record you forget it out, so... alright, so I just wanted to start saying, What is your family technology? And I can send a man and I can use Excel and Word, such the internet use as post things of baseball pesticides that you... absolutely, absolutely, but a part of not a great deal.

It's pretty good despite the chaotic stop start of conversations between two or three people.

Yeah, the predictions are bad around speaker turns right now. We are working on a better speaker-turn model, and that should fix this issue.

You said it's free, but I got a $5.90 charge!

Try choosing "Auto transcribe" from the dropdown menu next to the "Order Transcript" button.

Consider stating explicitly which languages your service supports, because it's not written anywhere in the blog post.

Done. Thanks!

How does this compare to trint.com?

We are pretty much on par with Trint. They use the API from Speechmatics, and by our benchmark, we are better than Speechmatics on conversational audio. We will be posting proper benchmark numbers soon. We are building a podcast dataset for testing right now.

Have you checked out http://otter.ai? They do real-time transcription in the browser and on iOS and Android.

I'm always looking to speed up the transcription process. I'll be sure to give this a try! The editor is especially useful.

Seems to be tuned to American English - coped poorly with my British accent.

I can't see a way to delete uploaded files. Am I missing something?

The dropdown menu next to the Edit Transcript button has the delete option. The service works best for North American files with clean audio. We will eventually get it just as good for British, Australian, and all other accents. We build a new model almost every month, based on the corrections our transcribers make.

What about real-time audio transcription? Will you add support for this?

Real-time transcription is not something we are looking at right now. Our goal with this is to see if we can assist our transcribers and improve the efficiency of our system, so our focus is offline transcription for now.
