
Codec2: A Whole Podcast on a Floppy Disk - ericdanielski
https://auphonic.com/blog/2018/06/01/codec2-podcast-on-floppy-disk/
======
dahauns
Aside from the seriously impressive WaveNet based results, I think the article
doesn't do the codec itself enough justice. I mean, low-bitrate speech codecs
have been around for some time (hey, vocoders are the oldest kind of audio
codecs in history!), and I grew skeptical when they started to compare with
mp3 and opus.

But looking at this page, Codec2 really holds its own when compared to AMBE and
especially MELP, two of the most prominent ultra low-bandwidth speech codecs
used today: [https://www.rowetel.com/?p=5520](https://www.rowetel.com/?p=5520)

~~~
gourneau
Here is a fascinating video history of the vocoder, complete with coverage of
the early room-sized machines.
[https://video.newyorker.com/watch/object-of-interest-the-voc...](https://video.newyorker.com/watch/object-of-interest-the-vocoder)

------
bcaa7f3a8bbc
The article failed to mention the original reason why Codec2 was invented.

In digital amateur radio communication, currently the most widely-used codec
is AMBE. But AMBE is a proprietary codec, covered by patents, unhackable - the
antithesis of amateur radio. Codec2 was born to bring freedom to digital
amateur radio communication, and is technically even better than AMBE.

~~~
boomlinde
The article does mention why Codec2 was invented, under "Background".

------
MrRadar
Codec2 is also fully open source and patent-free, in contrast to virtually
every other ultra-low-bitrate voice codec (which are proprietary and have
expensive patent licensing attached). Its author, David Rowe, has a Patreon if
you want to support him in the ongoing development of Codec2 and his SDR
modems to enable use of it in amateur radio:
[https://www.patreon.com/drowe67](https://www.patreon.com/drowe67)

~~~
gwern
Codec2 might be patent-free, but Codec2 with a WaveNet decoder isn't because
WaveNet (convolutional neural networks for generating audio sequence data) is
patented:
[https://patents.justia.com/patent/20180075343](https://patents.justia.com/patent/20180075343)

~~~
merinowool
When was it patented? When I was working with AI about 15 years ago I was
experimenting with conv nets to generate audio. I wouldn't have expected
this to be patented, as it is such a friggin obvious thing to do. It is like
patenting 2+2=4 once you discover numbers.

~~~
andai
[Serious question] Does your prior art invalidate the patent?

~~~
merinowool
I am not a scientist; I was just very interested in that space, and it would
be a long way from my experiments to a scientific paper. Since patent law
has been created for the privileged to reap profits, I wouldn't stand a chance
contesting that.

------
corruptio
Having grown accustomed to MP3 artifacts, it's strange to hear artifacts that
are natural, but just aren't quite right. More specifically, in the male voice
sample: "sold about seventy-seven", I received it as "sold about
se_th_enty-seven".

~~~
mrob
If we're abandoning accurate reproduction of sound and just making up anything
that sounds plausible, there's already a far more efficient codec: plain text.

Assuming 150wpm and an average 2 bytes per word (with lossless compression),
we get about 5 bytes per second, or 40bps, which makes 2400bps look much less
impressive. Add some markup for prosody and it will still be much lower.

This codec also has the great advantage that you can turn off the speech
synthesis and just read it, which is much more convenient than listening to a
linear sound file.
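
A quick check of that arithmetic, as a rough sketch only. The 150 words per minute, 2 bytes per word, and 2400 bit/s figures come from the comment; the function name is made up for illustration. At those numbers, text needs 5 bytes per second, which is 40 bits per second.

```python
# Figures taken from the comment above; nothing here is measured.
WORDS_PER_MINUTE = 150   # assumed speaking rate
BYTES_PER_WORD = 2       # assumed size after lossless compression
CODEC2_BPS = 2400        # Codec2 bit rate being compared against


def text_bitrate_bps(wpm: float, bytes_per_word: float) -> float:
    """Bit rate, in bits per second, of speech transmitted as compressed text."""
    bytes_per_second = wpm * bytes_per_word / 60
    return bytes_per_second * 8


text_bps = text_bitrate_bps(WORDS_PER_MINUTE, BYTES_PER_WORD)
ratio = CODEC2_BPS / text_bps
print(f"text: {text_bps:.0f} bit/s, Codec2 at 2400 bit/s is {ratio:.0f}x larger")
```

Prosody markup would raise the text figure, but it would have to grow sixtyfold before it matched 2400 bit/s.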

~~~
peterbmarks
Speech to text is certainly getting better but it makes mistakes. If the
transcribed text was sent over the link and then a text-to-speech engine spoke
it at the other end, you'd lose one of the great things about codec2 - the
voice that comes out is recognisable as it sounds a bit like the person.

A few of us have a contact on Sunday mornings here in Eastern Australia and
it's amazing how the ear gets used to the sound and it quickly becomes quite
listenable and easy to understand.

~~~
andai
Could you elaborate on "a contact"?

Are you using Codec2 over radio?

~~~
baobrien
Yeah, the main use case for codec2 right now is over ham radio. David Rowe,
along with a few others, also developed a couple of modems and a GUI
program[1]. On Sunday mornings, around 10AM, they do a broadcast of something
from the WIA and answer callbacks.

[1] - [https://freedv.org/](https://freedv.org/)

------
Ambroos
That is very impressive! I wonder if a WaveNet decoder could be built for
phone calls, as those still sound awful. If it's possible to do this only on
the decoder side you don't have to wait for your network to start supporting
HD voice or VoLTE to get better quality audio!

~~~
IshKebab
Actually if you're lucky and make a phone call with HDVoice, or whatever
they're calling it, the quality is excellent. It makes a huge difference.
Unfortunately the place where you really want good quality is call centres -
it's often hard to hear people and half of the reason is the shitty POTS
quality - and call centres will probably get HDVoice in about 40-50 years.
Maybe.

Edit: nm should have read all of your comment before replying!

~~~
gsich
What do you mean with "HDVoice"? On landline connections this usually means
G722. G711u/a is definitely not "HD".

~~~
IshKebab
I don't know what technology it is specifically, but it's a brand name they
used for actual high quality calls. Think 128 kbps MP3, rather than the
standard cups-and-string quality.

It only seems to work on mobile.

~~~
gsich
I know the difference, I've used G722. On mobile it's G722.2, a totally
different codec, but with the same ~7 kHz range.

But there were some companies that advertised a lower frequency range as "HD".

------
childintime
Everything spoken in a whole life could fit on a 128GB pendrive (assuming 5%
talk time). Astounding.
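
A back-of-the-envelope check, as a sketch under stated assumptions. The 5% talk time and the 128GB drive are from the comment; the 80-year lifespan and the choice of 2400 and 3200 bit/s (both rates mentioned elsewhere in this thread) are my own assumptions, and the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
DRIVE_BYTES = 128e9  # 128 GB pendrive, decimal gigabytes


def lifetime_speech_bytes(bitrate_bps: float,
                          years: float = 80.0,          # assumed lifespan
                          talk_fraction: float = 0.05   # 5% talk time
                          ) -> float:
    """Bytes needed to store everything one person says at a given bit rate."""
    talk_seconds = years * SECONDS_PER_YEAR * talk_fraction
    return talk_seconds * bitrate_bps / 8


for bitrate in (2400, 3200):
    size = lifetime_speech_bytes(bitrate)
    print(f"{bitrate} bit/s: {size / 1e9:.1f} GB, "
          f"{100 * size / DRIVE_BYTES:.0f}% of the drive")
```

Under these assumptions both rates land well under 128 GB, so the claim holds with room to spare.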

~~~
JetSpiegel
Black Mirror is now technically possible.

------
tommoor
Make sure you get to the end and listen to the WaveNet samples, amazing stuff.

------
ksec
Let's say we have Codec2 with WaveNet: at 3.2kbps it now sounds similar to
maybe 16kbps EVS. (EVS being the codec used in VoLTE, which is slightly better
than even Opus for speech.)

What "value" / "uses" does this bring us?

It can't be used for podcasts because, as shown, it isn't very good with
music, and many podcasts have music in them.

While Codec2 with WaveNet can give a 2-4x reduction in bitrate, I can't think
of an application that benefits from this immediately.

The other thing I keep having in my mind is convolutional neural networks for
codecs in general: music, movies, etc. What sort of benefits would they bring
us?

~~~
perlgeek
> What "value" / "uses" does this bring us?

Maybe not too much for "us" with LTE and 128GB storage on our phones, but in
cases of low bandwidth (think digital police radio), or when you have low
storage availability, that's really awesome.

------
mmastrac
Seriously impressive and game-changing results, especially when you take
Wavenet into account. I'm curious to see how Wavenet would perform w/Opus.

------
sbr464
I've become almost entranced with the concept of comparing things to the size
of a Floppy Disk. I'm actually planning to get a tattoo of one on my right
forearm. I've been working on a large business management platform for the
last couple of years and noticed that after investing $500k (salaries/etc) and
building a huge amount of functionality, the frontend and backend codebases
are still under 1.5MB. Pretty amazing.

~~~
calabin
I actually got a floppy disk tattoo on my foot in a moment of spontaneity
(bottomless mimosas).
[https://imgur.com/a/slCG519](https://imgur.com/a/slCG519)

~~~
sbr464
nice haha

------
jancsika
Would be a fun experiment to use something like 3 or even 1 sine to get
unintelligible speech, but then pair it with subtitles where each syllable of
the text is animated synchronized with the speech. (Like the "follow the
bouncing ball" song lyric animations.)

By pairing the audio with the text, you would almost certainly convince the
listener that they can understand it.

Edit: typo

~~~
carapace
;-)

Sine-Wave Speech Demonstration
[https://youtu.be/EWzt1bI8AZ0?t=74](https://youtu.be/EWzt1bI8AZ0?t=74)

> Sine-wave speech is an intelligible synthetic acoustic signal composed of
> three or four time-varying sinusoids. Together, these few sinusoids
> replicate the estimated frequency and amplitude pattern of the resonance
> peaks of a natural utterance (Remez et al., 1981). The intelligibility of
> sine-wave speech, stripped of the acoustic constituents of natural speech,
> cannot depend on simple recognition of familiar momentary acoustic
> correlates of phonemes. In consequence, proof of the intelligibility of such
> signals refutes many descriptions of speech perception that feature
> canonical acoustic cues to phonemes. The perception of the linguistic
> properties of sine-wave speech is said to depend instead on sensitivity to
> acoustic modulation independent of the elements composing the signal and
> their specific auditory effects.

~ [http://www.scholarpedia.org/article/Sine-wave_speech](http://www.scholarpedia.org/article/Sine-wave_speech)
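
To make the quoted description concrete, here is a minimal sketch of the synthesis side only: three sinusoids whose frequencies glide over time, summed into one signal. The frequency tracks below are made-up placeholders, not data from any utterance; real sine-wave speech takes them from formant analysis of a recording, which this sketch does not attempt. All names are hypothetical.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 8000  # Hz; an assumed narrowband speech rate

# Hypothetical tracks: (start_hz, end_hz, amplitude) for each sinusoid.
TRACKS = [
    (300.0, 700.0, 1.0),
    (2200.0, 1100.0, 0.5),
    (3000.0, 2500.0, 0.25),
]


def sine_wave_speech(tracks, duration_s, sample_rate=SAMPLE_RATE):
    """Sum a few sinusoids whose frequencies glide linearly over the duration."""
    n = int(duration_s * sample_rate)
    phases = [0.0] * len(tracks)
    total_amp = sum(amp for _, _, amp in tracks)
    samples = []
    for i in range(n):
        t = i / n  # position in the utterance, 0.0 to just under 1.0
        value = 0.0
        for k, (f0, f1, amp) in enumerate(tracks):
            freq = f0 + (f1 - f0) * t
            # Accumulate phase so a changing frequency does not cause clicks.
            phases[k] += 2 * math.pi * freq / sample_rate
            value += amp * math.sin(phases[k])
        samples.append(value / total_amp)  # keep the sum within [-1, 1]
    return samples


def to_wav_bytes(samples, sample_rate=SAMPLE_RATE):
    """Encode float samples in [-1, 1] as a 16-bit mono WAV held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))
    return buf.getvalue()


samples = sine_wave_speech(TRACKS, duration_s=0.5)
wav_data = to_wav_bytes(samples)
print(f"{len(samples)} samples, {len(wav_data)} bytes of WAV data")
```

Writing `wav_data` to a file gives something playable, though with placeholder tracks it will sound like gliding whistles rather than words.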

~~~
andai
To anyone who listens to this, I recommend rewinding to the segment starting
at 1:23 a few times and not letting it reach the spoilers. After a few rounds,
my brain adjusted to the distortion and I could make it out perfectly, without
ever hearing the original.

------
mwcampbell
The WaveNet demos are indeed impressive. But I wonder if the WaveNet decoder
needed to be trained for those specific voices.

------
_emacsomancer_
On a related note, I wish more (any!) podcasts were distributed in opus.

~~~
geofft
As far as I know, enough podcast apps require MP3 (and not even VBR!) that you
have to use MP3, and you can't have multiple <enclosure>s, so how would you do
this? A separate RSS feed for Opus, linked only on the website and not
submitted to aggregators?

~~~
CharlesW
> _As far as I know, enough podcast apps require MP3 (and not even VBR!) that
> you have to use MP3…_

Nope! Podcast episodes can be encoded using AAC (which is as ubiquitous as
MP3) without issue.

That won't realistically be possible with Opus until Opus hardware decoding
has been available in mobile devices for 5-10 years.

~~~
Hello71
I highly doubt there are any devices that are capable of accessing the modern
web, with all its JavaScript bloat, yet cannot decode a simple audio codec.
Even when Apple was installing AAC hardware decoders, they were already almost
obsolete by modern embedded CPU development (especially the rise of medium-
power ARM SoCs). I highly doubt any devices released in the past 5 years have
any sort of fixed-function audio decoder. Maybe an _en_coder, possibly some
general-purpose DSPs, but not a format-specific decoder.

~~~
floatboth
Yeah, the last time hardware audio decoders were relevant was like... back in
the Nokia N-Gage days.

The N-Gage QD removed the MP3 decoder that was present in the original model.
And you could install a software player, and it would struggle with bitrates
above 128kbps :D

Modern phones can decode _video_ in software (sucks for battery life, and
framerate/resolution are more limited than with hardware, but it's
_possible_). Audio is _nothing_ for them.

~~~
CharlesW
> _Yeah, the last time hardware audio decoders were relevant was like... back
> in the Nokia N-Gage days._

I guess it's irrelevant if you feel overwhelmed by how long your phone can go on
a charge. Plus, low-power/low-CPU requirements are an order of magnitude more
critical in devices like smartwatches.

------
WhiteNoiz3
The Wavenet stuff sounds great, but I'm curious how big the model is. The
audio files may be tiny, but you may need a huge neural network to decode
them.

------
Apocryphon
"The man behind it, David Rowe, is an electronic engineer currently living in
South Australia. He started the project in September 2009, with the main aim
of improving low-cost radio communication for people living in remote areas of
the world. With this in mind, he set out to develop a codec that would
significantly reduce file sizes and the bandwidth required when streaming."

What do you know, it's sort of like Pied Piper without the magical compression
or cloud handwaving.

~~~
LeonM
I've been reading David Rowe's blog [0] since 2008, there are some other
really interesting projects and products on it. One of my favorites back then
was his home-built electric car.

[0] [https://www.rowetel.com/](https://www.rowetel.com/)

------
codedokode
I noticed that when you first listen to compressed audio, you hear the
unnaturalness of the voice and clicks (probably where one frame's ending
doesn't match the next frame's start). But in a few seconds you adapt to it and
the voice sounds pretty clear.

It is impressive how far one can compress speech.

------
dredmorbius
I read, and listen, to this, and am impressed.

Then I think of the possible negative applications.

A nation of 100m people, talking an hour per day on phone or other audio
channel, could be stored on 100m * 365 * 1.5 MB of storage annually: about
55 PB.

In raw storage, that's less than $2 million. Far below national actor budgets.
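
The estimate can be reproduced in a few lines, as a sketch using only the figures in the comment: 100 million people, one hour per day, 1.5 MB per hour, and a $2 million budget. The product comes to 54.75 PB. No storage price is assumed here; the code instead works out what price per terabyte the $2 million figure implies.

```python
PEOPLE = 100_000_000
DAYS_PER_YEAR = 365
BYTES_PER_PERSON_DAY = 1_500_000   # 1.5 MB for one hour of speech
BUDGET_USD = 2_000_000             # the "less than $2 million" figure

total_bytes = PEOPLE * DAYS_PER_YEAR * BYTES_PER_PERSON_DAY
total_pb = total_bytes / 1e15

# Bit rate implied by 1.5 MB per hour of speech.
implied_bitrate_bps = BYTES_PER_PERSON_DAY * 8 / 3600

# Highest price per decimal terabyte that keeps the total under the budget.
max_usd_per_tb = BUDGET_USD / (total_bytes / 1e12)

print(f"annual storage: {total_pb:.2f} PB")
print(f"implied bit rate: {implied_bitrate_bps:.0f} bit/s")
print(f"budget holds if storage costs under ${max_usd_per_tb:.2f} per TB")
```

The implied bit rate is a little over 3.3 kbit/s, in the same range as the Codec2 rates discussed in this thread, and the cost claim holds as long as raw storage is cheaper than roughly $36 per terabyte.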

------
samps
> However, where it starts to get more interesting is the work done by W.
> Bastiaan Kleijn from Cornell University Library.

The authors are not from Cornell. I think the author made this mistake because
the paper is posted on arXiv, and that's what it says at the top of every
page?

------
mr_donk
This is amazing! With this codec and enough processing power, you could do
this bidirectionally and have enough bandwidth to stream a two way realtime
voice chat using 2400bps modems over a standard analog phone line!!! ... Oh...
Wait a minute...

------
bitwize
The plain Codec2 decoder sounds like a TI-99/4A (and works on somewhat similar
principles). If I hook a TI-99/4A to the WaveNet decoder, will it sound
natural?

------
gigatexal
Buy this guy a beer. What a feat!

------
hatsunearu
Side note: I'm still waiting for an open source, cheap way to do FreeDV/Codec2
on VHF either with a dongle that goes between a raspi/SBC or a laptop and a
cheap ass radio like a baofeng, or an inexpensive radio with Codec2 support.

~~~
baobrien
I think 2400B support is coming to the FreeDV GUI soon. I've seen some work
done on that. That'll let you use a cheap FM radio and a laptop to get on the
air with something codec2 based. I'm slowly chipping away at a TDMA mode for
SDRs, but that's still probably a ways off.

------
madengr
Would be interesting to combine this Codec2 with LoRa modulation. Of course
the latter is patented, but it combines both chirped and direct sequence
spread spectrum to yield some very resilient modulation.

------
danschumann
"Enhance" - said every movie guy ever.

------
smooc
Ikzmjzn nsh

------
mockery
None of the audio samples play for me (In neither Chrome nor Edge... Other
sites play just fine.)

Makes it very hard to evaluate claims of codec quality, which seems like the
primary purpose of the blog post. :(

~~~
jimnotgym
Working on Firefox Android

~~~
S3raph
confirm, works fine on latest Firefox stable Android

