I've been using whisper to get transcripts from my local radio stations. I know it's out of scope for the original project but I hope someone can build a streaming input around it in the future. I currently pipe in and save 10-minute chunks that get sent off for processing.
The streaming I was speaking of is from a URL or network resource. I keep a directory of m3u files, each of which is just a URL inside that you can open in VLC etc. So `ffmpeg -i [url] -c copy [file-name].mp3` does the trick for now. `mpv` can also do this while saving to a file, but this is a nice point to start from as I'm already getting some ideas for how I could use this or roll my own, thanks!
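For anyone who wants to script that step, here's a minimal Python sketch of the same idea: pull the stream URL out of an .m3u file and hand it to ffmpeg. The station file name, helper name, and output path are placeholder assumptions, not part of my actual setup.

# minimal sketch: read the stream URL from an .m3u and let ffmpeg capture it;
# the station file and output name below are placeholder assumptions
import subprocess
from pathlib import Path

def capture_station(m3u_path: str, out_path: str) -> None:
    # an .m3u here is assumed to hold a single stream URL (comment lines start with '#')
    url = next(
        line.strip()
        for line in Path(m3u_path).read_text().splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )
    # same as `ffmpeg -i [url] -c copy [file-name].mp3`, just driven from a script
    subprocess.run(["ffmpeg", "-i", url, "-c", "copy", out_path], check=True)

capture_station("stations/example.m3u", "example.mp3")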
whisper.cpp provides similar functionality via the `livestream.sh` script that performs transcription of a remote stream [0]. For example, you can transcribe BBC radio in 10s chunks like this:
$ make base.en
$ ./examples/livestream.sh http://a.files.bbci.co.uk/media/live/manifesto/audio/simulcast/hls/nonuk/sbr_low/ak/bbc_world_service.m3u8 10
[+] Transcribing stream with model 'base.en', step_s 10 (press Ctrl+C to stop):
Buffering audio. Please wait...
here at the BBC in London. This is Gordon Brown, a former British Prime Minister, who since 2012 has been UN Special Envoy-
Lemboy for global education. We were speaking just after he'd issued a rallying cry on the eve of the 2022 Football World Cup. For
Governments around the world to pressure Afghanistan to let girls go to school. What human rights abuses are what are being discussed as we...
run up and start and have happened and have seen the World Cup matches begin. And it's important to draw attention to one human rights abuse.
that everyone and that includes Qatar, the UAE, the Islamic organization of countries, the Gulf Corps...
Wow! Thanks for sharing. I didn't explore the repo much beyond that link, but this looks very promising. I was going to check it out tomorrow, but I'm grabbing the src and building now! The confidence color coding, btw, is chef's kiss. Great job with this.
The codecs and narrow frequencies used on most public safety trunked radio networks are truly terrible. IMBE and AMBE+2 should never have made it past Y2K, yet Motorola and Project 25 have ensured these remain in widespread use on these networks: https://en.wikipedia.org/wiki/Project_25
If Whisper achieves 85% or higher accuracy on this audio, it would be a miracle. Garbage in, garbage out tbh. Project 25 needs to move to a modern codec, ideally not one that sees little development and is maintained by a single small company.
Being a very constrained domain with its own speech rules, transcribing ATC conversations would probably benefit a lot from fine-tuning on ATC speech data.
Yeah, the tech is there for radio, and the same goes for POTS (still 8-bit/8000 Hz at its core). I do listen on the scanner from time to time, and assuming a clear signal, traditional NFM always beats digital in terms of intelligibility.
Speaking of the telephone, they definitely could make improved audio quality part of the 4G and 5G specs, but they don't. A modern telephone network is all IP anyway, and backwards compatibility can be maintained.
I wonder how much training on degraded/radio-encoded samples would be needed to improve performance in this area. You’re probably not the only person who wants to monitor police radios in their city.
Great question. I've not encountered it too much, as I'm looking for specific stuff in the transcript based on keywords, so some loss is acceptable, and I don't always need the data immediately, so it hasn't been an issue so far. What I do to alleviate it somewhat is have "stitched passes" at set intervals.
News, betting games, and some shows HAVE to happen at exactly a certain time and are very rarely late, so you can use these known times as checkpoints with some bias. Say I run `ffmpeg -i http://someexamplesite.fm/8b0hqm93yceuv -c copy -f segment -segment_time 60 -reset_timestamps 1 zip-%03d.mp3`, which automatically chunks the stream into 1-minute files, and I know the news gets read at 1pm: I can merge everything from 10am to 1pm into, say, "Segment B" and then process that. You get the idea (rough sketch below).
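To make that concrete, here's a rough Python sketch of one such stitched pass, assuming the zip-%03d.mp3 chunks from the command above and selecting purely by file modification time; the directory name and checkpoint times are made up for the example.

# rough sketch of a "stitched pass": gather the zip-%03d.mp3 chunks recorded
# between two known checkpoint times (by file mtime) and concatenate them with
# ffmpeg's concat demuxer; directory name and times are made up for the example
import subprocess
from datetime import datetime, time
from pathlib import Path

def stitch_window(chunk_dir: str, start: time, end: time, out_path: str) -> None:
    chunks = sorted(
        p for p in Path(chunk_dir).glob("zip-*.mp3")
        if start <= datetime.fromtimestamp(p.stat().st_mtime).time() <= end
    )
    # the concat demuxer reads a list file with lines like: file '/path/chunk.mp3'
    list_file = Path(chunk_dir) / "segment.txt"
    list_file.write_text("\n".join(f"file '{p.resolve()}'" for p in chunks))
    subprocess.run(
        ["ffmpeg", "-f", "concat", "-safe", "0", "-i", str(list_file),
         "-c", "copy", out_path],
        check=True,
    )

# e.g. everything between the 10am checkpoint and the 1pm news becomes "Segment B"
stitch_window("chunks", time(10, 0), time(13, 0), "segment-b.mp3")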
I also have a step later in the pipeline that tries to summarize what it can from the transcript, so any small gaps would likely be inferred; I have not noticed anything too wonky, and the summaries and text I get so far have been pretty clean and good enough for my needs. Once I bench this some more, however, I'm sure I will have this and other interesting problems to solve. The one I'm fighting a bit right now is ads with music and fast talking.
I came up with this specifically because of whisper. This looks nice, but it doesn't suit my use case, as I'm "listening" to over a dozen stations at once and I also stitch the chunks together at the end of the day into a fat wav file.
Stupid simple setup until I streamline the process some more: just running on my PC (i7 8700K, GTX 1080), using ffmpeg to copy the stream to disk and a simple Python script that creates the chunks and transcripts.
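A minimal sketch of what that kind of loop can look like, using the openai-whisper Python package for the transcripts and ffmpeg's concat demuxer for the end-of-day wav; the model choice, paths, and file naming are placeholder assumptions rather than my exact script.

# minimal sketch: transcribe each saved chunk with openai-whisper, append to a
# daily transcript, then stitch all chunks into one end-of-day wav with ffmpeg;
# model choice, paths, and naming are placeholder assumptions
import subprocess
from pathlib import Path

import whisper  # pip install openai-whisper

model = whisper.load_model("base.en")  # uses the GPU if torch sees CUDA

def transcribe_chunks(chunk_dir: str, transcript_path: str) -> None:
    with open(transcript_path, "a") as out:
        for chunk in sorted(Path(chunk_dir).glob("*.mp3")):
            result = model.transcribe(str(chunk))
            out.write(f"--- {chunk.name} ---\n{result['text'].strip()}\n")

def stitch_day(chunk_dir: str, out_wav: str) -> None:
    # join every chunk of the day into one big wav via the concat demuxer
    chunks = sorted(Path(chunk_dir).glob("*.mp3"))
    list_file = Path(chunk_dir) / "all.txt"
    list_file.write_text("\n".join(f"file '{p.resolve()}'" for p in chunks))
    subprocess.run(
        ["ffmpeg", "-f", "concat", "-safe", "0", "-i", str(list_file), out_wav],
        check=True,
    )

transcribe_chunks("chunks", "transcript-today.txt")
stitch_day("chunks", "end-of-day.wav")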