
Amazon Transcribe - Automatic speech recognition - irs
https://aws.amazon.com/transcribe/
======
mkempe
I've been looking for, and testing, various automated transcription APIs over
the years and _never_ found one that was high-quality for videos of interviews
(background noise, people don't talk in full sentences, people use filler
sounds). I'd love to find something usable -- and plan to try this one as
well. Human transcription is laborious, slow, and expensive. I've toyed with
the idea of a human clean-up pass after the automated transcription, but
that's still labor intensive.

~~~
tabeth
Perhaps the task you're trying to complete is just inherently laborious, slow
and expensive, then?

Given that humans _frequently_ have trouble understanding one another, it's
very unlikely you'll ever encounter an API that can do 100%, or even 90%
accurate transcription across environments.

---

Isn't transcription inherently an opinionated activity, as well? In terms of
how to transcribe utterances, words spoken in other languages when the main
language is another, etc?

~~~
ghaff
Even inexpensive human transcription works pretty well, though, assuming good
audio and recordings that aren't difficult for other reasons (strong accents,
etc.). I usually get my podcasts transcribed by a service that is basically a
quality control front-end for Mechanical Turk, and I need to do minimal
cleanup.

I'll be interested to see how this works in comparison when it's more broadly
available. I'm guessing it will be sufficiently worse that it won't be worth
using for this purpose. If it's cheap enough though I could see using it for
other recordings that I don't normally get transcribed today.

~~~
mkempe
Would you mind sharing which MT-based service you're using for podcast
transcription?

~~~
ghaff
CastingWords. I've been very happy with them for my purposes. They tend to
even get various CamelCase product spellings and the like correct even if I
forget to put them in the transcriber notes. I usually get the 6-day $1.50/min
service partly because I need the turnaround and partly because, anecdotally,
the quality also goes up a bit when you pay for the premium rates. I do maybe
a couple dozen 20 minute podcasts a year so the costs are negligible.

~~~
mkempe
Thanks. I'm dealing with hours of interviews per day.

~~~
mslate
Can you share more about your use case? I have a very similar transcription
need. Feel free to contact me at the email in my profile.

------
wpietri
Wow! Circa $1.50 per hour. Whereas human transcription services are more like
$45 per hour.

This kind of price will open up entirely new applications. E.g., for ~$10/day,
you could transcribe every conversation you have at work. Combine that with
good search and your phone becomes a supplemental, artificial memory. "I know
I talked about that with somebody but I can't remember who" becomes a thing of
the past.
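
A rough check of those figures, as a sketch only: the rates are the
approximate ones quoted in this comment (about $1.50 and $45 per hour of
audio), and the function and constant names are made up for illustration.

```python
# Approximate rates quoted in the comment above (USD per hour of audio).
AUTOMATED_RATE = 1.50
HUMAN_RATE = 45.00
DAILY_BUDGET = 10.00


def hours_covered(budget, hourly_rate):
    """Hours of audio a given budget pays for at a flat hourly rate."""
    return budget / hourly_rate


def cost_for_hours(hours, hourly_rate):
    """Cost of transcribing a number of hours at a flat hourly rate."""
    return hours * hourly_rate


hours = hours_covered(DAILY_BUDGET, AUTOMATED_RATE)
print(f"{hours:.1f} hours of audio per day for ${DAILY_BUDGET:.0f}")
print(f"${cost_for_hours(hours, HUMAN_RATE):.0f} for the same hours at the human rate")
```

So ~$10/day covers a bit under seven hours of talking, which is roughly a
workday of conversation.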

~~~
zerostar07
I thought you were going to say that the human translator would use the
service, add his own surcharge of $30 and still be cheaper than the
competition :)

Seriously, this will sell like crazy to public offices and services that keep
transcripts of conversations, courts of law, etc.

~~~
wpietri
I'm sure that will happen. Automating a first pass and saving the humans for
the second, smarter pass will be a great combination. Higher quality, lower
cost.

But that's only if you care about human-readable quality. I'm especially
interested in the applications where we don't.

------
nairboon
I've been looking into podcast transcription services, but I haven't really
found a service built on APIs like this one. Can anyone shed some light
on the transcription business? Is the transcription quality not yet good
enough, or is there another reason we don't see many of these services?

~~~
simonturvey
Shameless plug follows...!

I'd love it if you'd take a look at [https://trint.com](https://trint.com) -
we're designed exactly for what you're looking for. We use machine learning to
deliver a highly accurate initial transcript that you can polish to perfection
with our in-browser editor.

~~~
Just1689
HN might be DDOSing you at the moment. Perhaps consider putting Cloudflare or
similar in front of your landing page?

------
mrep
Same price as Google ($0.006 per 15 second increment) but more expensive than
Microsoft ($0.004 per 15 second increment). However, this does charge by the
second after the first 15 seconds, which is nice.

I just wish they would lower the minimum threshold of 15 second intervals.
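
To make the difference between the two billing styles concrete, here is a
minimal sketch. It assumes the models are exactly as described in this
comment (whole 15-second increments rounded up, versus per-second billing
with a 15-second minimum); the function names are hypothetical and the rates
are the ones quoted above, not verified against any vendor's price list.

```python
import math


def cost_fixed_increments(seconds, rate_per_increment, increment=15):
    """Bill in whole increments, rounding any partial increment up."""
    return math.ceil(seconds / increment) * rate_per_increment


def cost_per_second_with_minimum(seconds, rate_per_increment, increment=15):
    """Bill per second, but never for less than one full increment."""
    billable_seconds = max(seconds, increment)
    return billable_seconds * rate_per_increment / increment


# A 16-second clip: increment billing pays for 30 s, per-second billing for 16 s.
print(cost_fixed_increments(16, 0.006))
print(cost_per_second_with_minimum(16, 0.006))

# One hour of audio at each quoted rate.
print(round(cost_per_second_with_minimum(3600, 0.006), 2))
print(round(cost_fixed_increments(3600, 0.004), 2))
```

For long recordings the rounding barely matters; it only adds up when you
send lots of short clips, which is where the 15-second minimum hurts.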

~~~
dvfjsdhgfv
Is the quality so much better than CMU Sphinx that it's worth paying for it?

~~~
mrep
According to others, yes:
[https://news.ycombinator.com/item?id=11820837](https://news.ycombinator.com/item?id=11820837)

~~~
nmcfarl
From personal experience, we can get pretty good results on our workflow from
Sphinx, but the amount of time it took to get those results was massive and
spread over years. It's probably worth it for most organizations to skip that
and go for a vendor.

Results from Amazon are an unknown right now, but the other vendors vary a
good bit and are focused on particular kinds of audio (kinds vary by: the
number of speakers, the duration of the audio, the vocabulary, language).

------
bluddy
Given Alexa's performance, this is the last thing I'd want to use.

