
A bit sad that this interesting content is not available for audio/video-impaired readers.



FYI if you weren't aware: most podcasts don't have transcripts of the audio because high-quality (accurate) transcription costs money. Example rates: https://www.google.com/search?q=podcast+transcription+servic...

So this thread's podcast, 52 minutes of a complex technical topic with multiple speakers, could cost ~$200 to transcribe (roughly $4 per audio minute). A programming-related podcast is already a niche topic with a tiny audience, and an Array Languages podcast is an even tinier subset of that, so the cost might not be justified.

I suppose podcasts could be uploaded to YouTube to let its speech-to-text algorithm do an auto-transcribe. However, the AI algorithm is not good at tech topics with industry jargon/acronyms, and the resulting transcription will be inaccurate.
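
If you did want to go the automated route programmatically, some speech-to-text APIs let you bias the model toward domain vocabulary, which helps with exactly the jargon problem. A rough sketch using Google's Cloud Speech-to-Text Python client (the bucket path and phrase list are made up for illustration; assumes the google-cloud-speech package is installed and GCP credentials are configured):

    from google.cloud import speech

    client = speech.SpeechClient()

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
        enable_automatic_punctuation=True,
        # Nudge recognition toward niche vocabulary the generic model mangles
        speech_contexts=[speech.SpeechContext(
            phrases=["APL", "array languages"],
            boost=15.0,
        )],
    )

    # Hypothetical GCS path; audio longer than ~1 minute needs the async API
    audio = speech.RecognitionAudio(uri="gs://my-bucket/episode.wav")
    operation = client.long_running_recognize(config=config, audio=audio)

    for result in operation.result(timeout=3600).results:
        print(result.alternatives[0].transcript)

Phrase hints won't make the output publication-ready, but they raise the floor on niche terms.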


I make transcripts of all my work using Descript. It uses Google's speech-to-text algorithm (same as the one in YouTube, presumably) and gives you a transcript you can then edit. It costs $15/month, I believe, and you have to spend some time editing a transcript that realistically won't be read by many, but it works pretty well in my experience. (No affiliation besides being a happy customer.)


Right. A high-quality podcast is already lots of pre- and post-production work on just the audio. I use Rev, which hires captioners on my behalf [0], but it's also expensive, so I use it sparingly.

[0] https://www.rev.com/


Thanks for bringing Descript to my attention. Do you use any of the production aspects of it?


Yeah, it works really well, it's basically completely replaced what I used to use Audacity and Premiere for.


Presumably there was a script or at least a summary. Why not publish that as well as any slides used?


Why would there be? It's a recorded conversation between a group of people. Some of them may have some rough notes but maybe not even that.


There was no script or summary, nor any slides. It was a completely organic conversation.


I have just finished transcribing it (quite roughly): https://gist.github.com/rak1507/3aec8c0b720e6d8a9ef121fc14e4...


wow! much appreciated, many thanks!


No problem


Chrome now provides on-device powered live captions (which hook into any Chrome-originating audio): chrome://settings/accessibility -> toggle "Live Captions" [1], which could help alleviate some of the limitations for audio-impaired viewers.

1: https://support.google.com/chrome/answer/10538231?hl=en


>Chrome now provides on-device powered live captions [...] which could help alleviate some of the limitations for audio-impaired viewers

That's a great feature! But it also highlights the limited accuracy of machine-learning transcription for technical topics with jargon. E.g., at 27m00s, the caption algorithm incorrectly transcribes it as "APL is joked about as a right only language", but we know the speaker actually said, "APL is joked about as a write-only language". It also incorrectly transcribes "oversonian languages" when it's actually "Iversonian languages".

The algorithm also doesn't differentiate between multiple speakers; the generated text is just continuously concatenated even as the voices change, so an audio-impaired reader wouldn't know which person said a particular string of words.
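
For what it's worth, speaker separation can be automated to a degree. Google's Cloud Speech-to-Text offers diarization; a rough sketch (hypothetical bucket path, assumes the google-cloud-speech Python package and GCP credentials):

    from google.cloud import speech

    client = speech.SpeechClient()

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
        diarization_config=speech.SpeakerDiarizationConfig(
            enable_speaker_diarization=True,
            min_speaker_count=2,
            max_speaker_count=6,  # guess at the panel size
        ),
    )

    audio = speech.RecognitionAudio(uri="gs://my-bucket/episode.wav")
    response = client.long_running_recognize(config=config, audio=audio).result(timeout=3600)

    # The final result aggregates word-level speaker tags for the whole file
    for w in response.results[-1].alternatives[0].words:
        print(f"speaker {w.speaker_tag}: {w.word}")

The tags still get shaky when voices overlap, though.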

This is why podcasters still have to pay humans (sometimes with domain knowledge) to carefully listen to the audio and accurately transcribe it.


I know the joke is that APL is a write-only language, but it somehow seems more true to say it is a right-only language.

I am unimpressed by AI/ML in general, but this accidental wisdom is worth meditating on even if it didn't come from a human.


Samsung's bastard version of Android has a similar "Automated Subtitles" feature. It's decent for watching videos with the phone on silent, but it's pretty crap when there are lots of proper nouns and unusual jargon, as I imagine this podcast has.


So does stock Android, at least the second (?) latest version. (I can never keep track, but I think my phone was EOL'd before the latest version...)


> on-device powered live captions

I hate this. What were they thinking? Why not a damn text file that people can grep?


probably because that's a very niche use case and most people just want some video captions :)

(don't get me wrong, what you describe would be cool and useful! but i can't imagine a lot of people would use it)


Or patience impaired. It's a whole hour.



