
Show HN: DeepSpeech based automated transcription service - braindead_in
We have been building a DeepSpeech model with our data for the past year and we have recently hit 95% accuracy on the LibriSpeech dataset. That puts us close to the published results for DeepSpeech 2. However, our dataset is conversational audio, and we do much better on our own internal dataset compared to PaddlePaddle. Here's a blog post on the method we followed to build our models.

https://scribie.com/blog/2018/03/continual-learning-for-speech-to-text/

We have been using this internally in our service and it saves a ton of time and effort during the typing stage. It is nowhere near the accuracy our transcribers can achieve, but we are getting close. We are offering automated transcripts free for a limited time. Please do try it out.

https://scribie.com/transcription/free

Thanks in advance!
======
bob_theslob646
How does this compare to Google's speech API?

~~~
braindead_in
We are planning to do a benchmark against Google Web Speech once it adds support
for multi-speaker files. When we started building this, we tried our internal
test set on Google Web Speech and the WER came out to around 18%.

~~~
pell
I would recommend also testing with IBM's Watson Speech. In my usage it was a
lot more accurate than Google and Azure. I also did a couple of tests with AWS
and Watson was always ahead. All these tests were with American and British
English.

------
gregjwild
In general we're pretty happy with Trint for American & British accents for
our stuff (though not to say we won't take a look at what you've got :)). They
usually require a bit of tweaking, but it's pretty good. The killer feature
for us would be training against people with other accents. You'll notice our
transcripts really constitute a pretty big part of what we do, so a good
quality transcription service for people with different accents would be an
awesome thing.

e.g. clearly once this course leaves early access we'll want to get this
copy-edited, and Yan here is British, so even here Trint's not always great :)
[https://livevideo.manning.com/module/38_1_1/production-ready...](https://livevideo.manning.com/module/38_1_1/production-ready-serverless/introduction/introduction-to-course)

~~~
braindead_in
We do provide an option to get the transcripts corrected manually by our
transcribers. Would that work for you?

------
gok
By "95% accuracy on the LibriSpeech" you mean a word error rate of 5% on that
test set, using your own training data?

~~~
braindead_in
The WER on the LibriSpeech clean test set is around 0.087 and the CER is 0.030.
We trained on a dataset of around 5,000 hours, which included the LibriSpeech
train set.
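(For anyone unfamiliar with the metrics: "accuracy" in this thread means 1 − WER, where WER is the word-level edit distance between reference and hypothesis divided by the reference length, and CER is the same computed over characters. A minimal sketch of the standard computation, not Scribie's actual evaluation code:)

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (classic DP)."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)]

def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def cer(reference, hypothesis):
    """Character error rate: char-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

So a WER of 0.087 corresponds to roughly 91% word accuracy on that test set.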

~~~
gok
So where did 95% come from?

~~~
braindead_in
I think I made a mistake there. It should have been 92% accuracy. Can't find
the edit button now.

~~~
gok
Have you considered using a more traditional HMM style recognizer? The stock
Kaldi chain model recipe should get you more like 5 or 4.5% WER on
LibriSpeech.

~~~
braindead_in
We actually started our experiments with Kaldi and even built a dataset out of
our files to train on. But we found that Kaldi required a lot of data prep and
had a long lead time. Our internal dataset is quite large, and data prep for it
is much easier with DeepSpeech than with Kaldi.

------
ryan-allen
This is pretty great, I'm going to show it to our UX person who works with
transcripts from user testing. The editor feature is pretty great for cleaning
up transcripts, and I think it'd be faster to do that than to manually do it
as we have been.

The automated process is pretty funny when working with Australian English
though!

> Upset is gonna record you forget it out, so... alright, so I just wanted to
> start saying, What is your family technology? And I can send a man and I can
> use Excel and Word, such the internet use as post things of baseball
> pesticides that you... absolutely, absolutely, but a part of not a great
> deal.

It's pretty good despite the chaotic stop start of conversations between two
or three people.

~~~
braindead_in
Yeah, the predictions are bad around the speaker turns right now. We are
working on a better turns model and that should fix this issue.

------
goesprotocall
You said it's free but I got a $5.90 charge!

~~~
braindead_in
Try choosing Auto Transcribe from the dropdown menu next to the Order
Transcript button.

------
titanix2
Consider saying explicitly which languages are supported by your service,
because it is not written anywhere in the blog post.

~~~
braindead_in
Done. Thanks!

------
kvz
How does this compare to trint.com?

~~~
braindead_in
We are pretty much on par with Trint. They use the API from Speechmatics, and
by our benchmark we are better than Speechmatics on conversational audio. We
will be posting proper benchmark numbers soon. We are building a podcast
dataset for testing right now.

~~~
bwill94070
Have you checked out [http://otter.ai](http://otter.ai)? They do real-time
transcription in the browser and on iOS and Android.

------
Arn_Thor
I'm always looking to speed up the transcription process. I'll be sure to give
this a try! The editor is especially useful.

------
edent
Seems to be tuned to American English - coped poorly with my British accent.

I can't see a way to delete uploaded files. Am I missing something?

~~~
braindead_in
The dropdown menu next to the Edit Transcript button has the delete option. It
works best for North American files with clean audio. We will eventually get
it as good for British, Australian, and all other accents as well. We build a
new model almost every month, based on the corrections our transcribers make.
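(To make the monthly retraining idea concrete: each correction a transcriber makes pairs an audio file with a human-verified transcript, which can then replace the machine output in the next training set. A hypothetical sketch; `Sample`, `build_training_set`, and the merge policy are my assumptions, not Scribie's actual pipeline:)

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sample:
    audio_path: str
    transcript: str

def build_training_set(base, corrections):
    """Merge the base dataset with human-corrected transcripts.

    A corrected transcript for the same audio file replaces the
    machine-generated one, so each monthly model trains on the
    latest human-verified text.
    """
    by_audio = {s.audio_path: s for s in base}
    for c in corrections:
        by_audio[c.audio_path] = c  # human correction wins
    return sorted(by_audio.values(), key=lambda s: s.audio_path)
```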

------
CommanderData
What about real-time audio transcription? Will you add support for this?

~~~
braindead_in
Real-time transcription is not something we are looking at right now. Our goal
with this is to see if we can assist our transcribers and improve the
efficiency of our system, so our focus is offline transcription for now.

