Hacker News
Podcast Transcription with Amazon Transcribe (ipfs.io)
40 points by markhenderson on Jan 18, 2018 | 18 comments



For reference, here is an example of the end product of a (in my opinion) very good transcription service: https://www.grc.com/sn/sn-645.txt

I do not know how much Elaine (who makes those transcriptions) costs, but of course it's going to be more than Amazon Transcribe's cost. What I wonder is, what would be the cost to take the output of Amazon Transcribe, and fix/format it into something like what I linked?

Regardless, one thing that (I think) is important, is that having at least the output of Amazon Transcribe (or something similar) will make it easier for people to find something that was said in your podcast. At least, until Google/etc. start transcribing audio they find, and indexing that…


I’m surprised they don’t already. The ECM at the company I used to work for transcribed video and audio files to make their content searchable.


YouTube already offers auto-generated subtitles. It's likely that the content of those transcriptions is part of the search index for the video.


Very true, but I’ve found no way to search a video and jump to the section that matches what I’m after.

It’s especially useful for long talks, etc.


Very cool. I love transcription projects.

The only problem is that he didn’t really go over the quality of the transcription. Transcription by robot is still AWFUL; a human can generally do wayyy better.

It’s hilarious how bad it is in this age of AI we supposedly live in.


It's not even a question of AI. In the past epoch, when we used to own (rather than rent/subscribe to) software, it was common to use desktop apps such as Dragon NaturallySpeaking. They didn't pretend they could do a great job out of the box: the untrained result would be just like what you get with Google, Amazon and the rest. In order to get good results, you needed to train it first. The training didn't take long, and after that it produced exactly what you said (with exceptions such as unusual proper names). I wouldn't ever accept the quality of all these "modern" transcription APIs; they look like a step backward to me.


Does anyone know how this differs in quality/speed/price from existing transcription APIs from Google, Microsoft and/or IBM?


I haven't had a chance to test AWS Transcribe yet, but I run a weekly benchmark of the IBM, Microsoft and Google transcription APIs.

So far, IBM Watson works best for English, and Google for other languages. For example, transcribing Trump's inauguration speech (16 minutes 38 seconds, 8371 characters) gives me the following Levenshtein distances to the official transcript (smaller is better):

- IBM : 898 (566130ms)

- Google : 1249 (301635ms)

- Microsoft : 5233 (109391ms)
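
For anyone who wants to reproduce this kind of score, here is a minimal sketch of a character-level Levenshtein distance in Python. It is not the benchmark code behind the numbers above; the function name and the two sample strings are made up for illustration, and a real comparison would probably normalize case and punctuation first.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance: the minimum number of single-character
    insertions, deletions and substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short

    # previous[j] = distance between the processed prefix of `a` and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


# Hypothetical strings, not taken from any real transcript.
official = "the quick brown fox"
automated = "the quick brown box"
print(levenshtein(official, automated))
```

This keeps only two rows of the table in memory, so a transcript of a few thousand characters is fine, although pure Python will be slow on much longer texts.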


This is not a fair comparison. IBM is the leader because they have a product that is specifically designed for this whereas the others do not.


Do you have a GitHub repo or a link? It would be interesting to follow the quality of these transcription services.


I tried to use this, but it looks like they have to review me for access.

I've tried the IBM and Google transcription APIs on a podcast with overlapping speakers, and they turned out dadaist poetry. I've been meaning to check out other solutions.

* Google: http://mefiwiki.com/wiki/Podcast_107_Transcript_(automated)

* IBM Watson: http://mefiwiki.com/wiki/Podcast_117_Transcript_(automated)


Where's the audio? We also provide automated transcripts and I'd love to compare it.


Click the "Episode ___" link in the first paragraph, then search for "Direct mp3 download" on that page.

I'd love to see more details on what you do. I haven't found a good automated solution yet. It seems like the solutions with acceptable quality involve Mechanical Turk.

Edit: I added a Bing attempt, and I'll link the list, rather than individual eps: http://mefiwiki.com/wiki/Podcast#automated_transcripts


Here's the automated transcript from our system. I checked a minute or so of it, and the accuracy looks to be around 80%.

https://scribie.com/transcript/c5ae0522950c421c9d939dc8a17f4...
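
One common way to put a number on a figure like "around 80%" is word accuracy, i.e. 1 minus the word error rate (word-level edit distance divided by the number of words in the reference). The sketch below assumes that definition; it is not necessarily how the estimate above was made, and the function name is hypothetical.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - WER, where WER is the word-level edit distance between
    the two transcripts divided by the number of reference words.
    Can be negative if the hypothesis has many extra words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # Same dynamic program as Levenshtein, but over words instead of characters.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # reference word missing from hypothesis
                current[j - 1] + 1,      # extra word in hypothesis
                previous[j - 1] + cost,  # wrong word (or match)
            ))
        previous = current

    return 1.0 - previous[-1] / len(ref)
```

Splitting on whitespace and lowercasing is a crude normalization; punctuation and number formatting will count as errors unless they are stripped first.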


Hmm. Is it $0.60/min for the automated solution?


Automated transcripts are free. That's the pricing for our manual transcription.

https://scribie.com/transcription/free


I've tested the Google API and I'd say its quality is similar to what I see in this example result. In other words, not great.

For me, this isn't close enough to be considered a "valid" transcription. Too much information is lost. Too much context is lost.

Truly looking forward to a day when ML / AI can do this with high degrees of accuracy, but we're not there yet.


I need a site that can transcribe very well.



