Show HN: Podscripter – Automated Transcription for Podcasters (podscripter.co)
153 points by craigcannon on June 4, 2018 | 59 comments



Hey HN!

Craig from YC here. This project is a follow-up to SpeechBoard, which was a text-based audio editor - https://news.ycombinator.com/item?id=15670827. Thanks for all the feedback there :)

We were surprised to find that many users just wanted transcripts, so Podscripter is an attempt to solve that.

Here's how it works: every time you publish an episode (or give us a file) we run it through a speech-to-text service. Then we split up the speakers by hand, which ends up being a fair bit of work and is why it takes 24 hours instead of minutes. Then we email you the transcript.

Before I was podcasting at YC I had my own podcast and couldn't justify paying $1 a minute for transcripts. These machine-generated transcripts get you most of the way there for a lot less money :)

Let me know what you think!


Hey Craig,

I wrote a service (nowhere close to public release) that segments audio based on speakers. You have to identify one speech segment, and it is then capable of labelling the others. It uses GMM and MFCC. Is something like this in the works? Cool idea! I consume a fair number of podcasts, and I can affirm that there is definitely a need for this.
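
For anyone curious what "identify one segment, then label the others" can look like, here is a rough sketch, not the commenter's actual code. It makes several assumptions: the feature frames are synthetic random vectors standing in for real MFCC frames, and a single diagonal-covariance Gaussian stands in for a full multi-component GMM. The function names (`fit_diag_gaussian`, `avg_log_likelihood`, `label_segments`, `fake_frames`) and the threshold value are all hypothetical.

```python
import math
import random


def fit_diag_gaussian(frames):
    """Fit a per-dimension mean and variance to a list of feature frames."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    variances = [
        # Floor the variance so the log-likelihood never divides by zero.
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, 1e-6)
        for d in range(dims)
    ]
    return means, variances


def avg_log_likelihood(frames, model):
    """Mean per-frame log-likelihood of frames under the fitted Gaussian."""
    means, variances = model
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, means, variances):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


def label_segments(enrolled, segments, threshold):
    """Return True for each segment that scores like the enrolled speaker."""
    model = fit_diag_gaussian(enrolled)
    return [avg_log_likelihood(seg, model) >= threshold for seg in segments]


def fake_frames(rng, center, n=200, dims=4):
    """Synthetic stand-in for MFCC frames: Gaussian noise around a center."""
    return [[rng.gauss(center, 1.0) for _ in range(dims)] for _ in range(n)]


rng = random.Random(0)
enrolled = fake_frames(rng, 0.0)      # the one segment the user identified
segments = [
    fake_frames(rng, 0.0),            # same "speaker"
    fake_frames(rng, 5.0),            # a different "speaker"
    fake_frames(rng, 0.0),            # same "speaker" again
]
labels = label_segments(enrolled, segments, threshold=-10.0)
print(labels)
```

With real audio you would swap `fake_frames` for MFCC extraction from an audio library and would likely need several mixture components per speaker, plus a threshold tuned on held-out data.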


Nice! Let me know when it's ready :)

I've tried what's out there and still haven't found a solution that can consistently diarize well. So I'm doing some experiments on my end too.


Do try out https://scribie.com/transcription/free as well. Our diarisation system is around 90% accurate on longer paragraphs and will be out this week.


For $10, I'd expect above 99% accuracy, especially since I can get an automated transcript for $5 for 60 minutes that presumably has a similar error rate to what you are offering.

Also, I'd expect it sooner than 24 hours, since I can get automated ones back in under an hour.

Not trying to be cold water. I am actually interested, but what sets you apart from the other cheaper automated solutions? Am I wrong about the error rate I can expect elsewhere?


By way of context, human transcriptions cost about $1 to $1.50 per [EDIT: minute] --which is what I use for my podcasts. Accuracy is extremely good, especially if you flag obscure terms when you submit.


For a podcast with good audio quality (which is usually the case for podcasts) you can probably use speech recognition tools that will cost you a fraction of this price. They are pretty good nowadays.


They're really not--at least for my purposes. And $25-50 for an episode requiring minimal cleanup versus spending 30-60 minutes going back through the audio and fixing things up is a no-brainer for me.

ADDED: Machine transcriptions are pretty good for a lot of things such as search and quickly skimming content. But if you want something that people can read as an alternative to listening to the podcast, you pretty much have to use human transcription or budget a bunch of time to fix up.


Is there a way you can record then edit a podcast but keep track of which microphone different voices are coming from? Seems like you could make speaker identification easier that way?


Yeah, if you record on multiple tracks it's pretty easy but podcasting setups vary a ton.
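
To make the "pretty easy" part concrete, here is a minimal sketch, assuming each speaker has their own time-aligned track of equal sample rate. It labels each window with whichever track is loudest. The function names, the window size, and the `silence_floor` value are all made up for illustration, and this is not how Podscripter does it.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of samples (0.0 for an empty list)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def active_speakers(tracks, window, silence_floor=0.01):
    """Label each window with the loudest track, or None if all are quiet.

    tracks maps a speaker name to a list of samples; tracks are assumed
    to be time-aligned.
    """
    length = min(len(samples) for samples in tracks.values())
    labels = []
    for start in range(0, length, window):
        best_name, best_level = None, silence_floor
        for name, samples in tracks.items():
            level = rms(samples[start:start + window])
            if level > best_level:
                best_name, best_level = name, level
        labels.append(best_name)
    return labels


def merge_turns(labels, window):
    """Collapse consecutive same-speaker windows into (name, start, end).

    start and end are sample offsets; end is start + window for the last
    window of a turn, so it can overshoot if the track length is not a
    multiple of window. Silent windows are dropped.
    """
    turns = []
    for i, name in enumerate(labels):
        start = i * window
        if turns and turns[-1][0] == name:
            turns[-1] = (name, turns[-1][1], start + window)
        else:
            turns.append((name, start, start + window))
    return [turn for turn in turns if turn[0] is not None]


tracks = {
    "host": [0.5] * 4 + [0.0] * 4 + [0.0] * 4,
    "guest": [0.0] * 4 + [0.4] * 4 + [0.0] * 4,
}
labels = active_speakers(tracks, window=4)
turns = merge_turns(labels, window=4)
print(turns)
```

This only works when each microphone is reasonably clean while its speaker is silent. With bleed between microphones or crosstalk, "loudest track wins" will mislabel overlapping speech.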


That's super cheap! Who do you use?


Uggh. Per minute. Sorry. So it costs me about $25-30 per podcast typically. Which is pretty reasonable but obviously adds up for longer transcriptions.


Thanks for the feedback. Ours are diarized accurately. I've yet to see an automated service that does that well.


That's a valuable piece of info that I didn't see mentioned on the site. I'd add it if I were you.


Updated. Thanks for the feedback.


What automated service do you use?


Google


Would this integrate directly with zencastr? They record tracks individually - and then mix them - so you'd have direct access to the individual tracks.

Seems like if you did that or made a way for people to upload their unmixed tracks, you could save some time on the whole thing?


Yeah, that's an awesome idea.

I had kicked it around with some potential customers but went with this simpler model just to see if anyone was interested.

Will most likely go for that next because you're right, diarization is insane :)


Hey Craig, I appreciate the hard work! Is there any way to demo this service for free, or any plans to in the future? The $10 isn't a big hit or anything, but I, like many others I'm sure, prefer to try a service before handing over my credit card info.


Yeah, we'll definitely have something free in the future. :)

For the launch we wanted to make sure we could handle all the first orders quickly.


I think this is super neat. The podcasts I am on do not make $10 an episode, so sadly, I can't justify it. But if I could, I would. I shared it with my podcasting friends.


Thanks!


Would you also do YouTube videos? There are many lectures online and it would be nice if more people transcribed them.


Yeah, we could definitely do videos.


I'm able to upload one WAV file per speaker, like most podcasters, I would presume. Would that make it easier to automate the split and thus make the service cheaper?


I need to test this out more thoroughly. Is your speaker audio totally clean when they're not talking?


For all practical purposes I'd say yes. I'm using https://auphonic.com/, which uses machine learning to mute the parts where no human speaks. They then send this audio to a third party (like yours) https://auphonic.com/blog/2016/12/02/make-podcasts-searchabl... and get the files back so Auphonic can bundle them for download with the rest.


So the real value add is when you have multiple speakers? Any suggestions on a reliable service that can handle a single speaker?


You wouldn't need a "service", you can just run it through a TTS program locally.


True. Although I assume you mean speech recognition and not TTS. Good recommendations there would be great :)


Oops, yeah, STT.


Can your service handle corrupted files? I record on the fly and I have some great interviews that I cannot play.


Are you using AWS Polly?


Google actually


Hello,

Just curious whether you are able to do speech-to-text on phone calls, or if at any point you plan to do so.

Thanks


Great project!

My podcast search engine project Listen Notes ( https://www.listennotes.com/ ) does transcription as well.

It's not as accurate as Podscripter, but good enough for in-audio search. Example: https://www.listennotes.com/e/1dae4f4c2c0d4202a1180bd9c9f17d...

Website visitors can request transcription of episodes on the Listen Notes website.


Listen Notes is awesome. Just found out about it recently.


To drum up business you should just do all the Joe Rogan podcasts for free.


Given the hand-curated speaking order, that seems hard.

There are services out there that will do quality transcription that is completely automated (e.g., Bitplatter's FluidData), and IIRC, they're already doing most of the podcasting world's transcriptions for free right now, including Joe Rogan's.

This seems like the more niche market of those who want last-mile, extra-high-quality transcriptions to sell, for which I think they should be charging more than $10.


FluidDATA has the Joe Rogan podcasts for free at https://fluiddata.com/search?channel_id=9853

Not to mention, FluidDATA has transcribed over 8.2 million podcast episodes from over 230,000 podcast feeds.


I can't find a single transcript at that website, though it's a cool service, kind of like Google for searching within audio (Podcasts).


FluidDATA definitely has a different model than Podscripter.

FluidDATA doesn't expose the entire transcript. It currently only exposes the ability to search the transcripts of millions of podcasts.

For example, you can find podcasts that talk about SpeechBoard and Craig by searching: "speech board" + "craig canon"

https://fluiddata.com/search?term=%22speech%20board%22%20%2B...


That's a really cool idea. However the search (or maybe the transcript?) is truly terrible. I can't find anything at all, or it's grossly inaccurate.


Only the Joey Diaz episodes :)


I really would love something like this: to transcribe Chinese podcasts into pinyin and characters. This would really help me learn the language better, as listening skills are the hardest to acquire when learning a foreign language.


Wouldn't you be concerned about transcription errors interfering with your learning?

A lot of movies have Chinese subtitles. Pick an action movie and the dialog is quite easy.


That's a neat idea!


What distinguishes it from popular competitors like Trint, Temi, etc. who also do speaker identification?


We're hoping to provide better speaker detection and an easy workflow for podcasters.


This brings up a point that has long puzzled me: why is it so uncommon for podcasters to write out what they intend to say? It seems like it would eliminate a lot of the misspeech, circumlocution, and unclearness that make podcasts so frustrating to listen to for me. It would also eliminate the need for transcription after the fact.


That might make sense for one person basically reading a script. (Which, with a few exceptions, isn't a very good format.)

But most podcasts are interviews/conversations. You're not going to get most podcast guests to write out full responses in advance.

I do usually review topics and some potential questions for a few minutes with my guest before we get started and do editing if a question or answer goes off the rails or there's an error. I also do some light editing to cut down on umms, you knows, etc. But a lot of casual podcasts created as sidelines wouldn't make sense if they were going to take a week to put together.


A lot of podcasts are ad hoc conversations between multiple people.


I've never listened to a podcast, but does this work on YouTube and TED talks?


TED talks and YouTube videos already have closed captions/subtitles.


YouTube captioning tends to range from bad to horrible. TED talks are probably hand-transcribed, as I remember them as being higher quality.


> YouTube captioning tends to range from bad to horrible.

Yes, and I mentioned YouTube specifically because it's representative of the best machine transcription (which this service is) can offer.

TED talks are indeed transcribed by professionals, and so the quality is an order of magnitude better than what this service can provide. TEDx talks are transcribed by volunteers, so their quality is more variable.[1]

[1] https://www.ted.com/participate/translate/transcribe


Yeah, they most likely pay for professional transcripts. As do orgs like NPR and bigger companies using podcasts for content marketing.


I don't want that.



