

CELT: Next-generation low-latency audio codec from xiph.org - mbrubeck
http://people.xiph.org/~xiphmont/demo/celt/demo.html

======
Groxx
Having listened to the 64kbps comparisons of the audio on the page (that's a
particularly cruel sample set; is it a common one?):

Wow. The worst audio chunks on CELT are significantly better than the worst on
the other codecs. Overall, I hear a bit of a loss on the low-end of wider
sounds (the one that pops out to my ear is the entry starting at 0:37), but
for such an _incredible_ improvement in the high ranges that's a wonderful
price to pay. Still a decent bit of that "anti-pre-echo", which I _despise_,
but generally less than the others.

All in all: an epic improvement over the other samples, especially when taken
across the board. All the other encoders had _huge_ distortion on one or more
of the sound bites (especially the third, 0:22); CELT had very little, even at
its worst. Phenomenal work.

/listens to 32kbps. shall update!

edit: yep, it hurts most on that 0:37 entry. Unfortunately, I rate it as
_noticeably_ worse than all but the AAC-LC (but that one's horrible in
_everything_) at 32kbps. It does handle the voice at the end very well,
though. That, with the latency, means it'd probably be great for voice use.

/tries 48; maybe it survives at the middle?

edit2: yeah, about middle. Almost catches up to the HE-AAC entries, maybe
passes Vorbis.

Voice-only or required-low-latency aside (both important qualities), I think
I'm not going to use or recommend this for sub-64kbps, though it could of
course improve yet. Vorbis and HE-AACs beat it. At 64 though, maybe above, or
for respectable quality telepresence, it sounds like a clear winner.

~~~
nullc
FWIW, mono at 32kbit/sec is fairly close to stereo at 64. (Of course, there is
some joint coding gain from stereo, but stereo's worst case is just as bad as
2x mono, and it's often the worst cases that limit the quality.)

For example, mono speech at 32kbit/sec:
<https://people.xiph.org/~greg/celt/spec.orig.wav>
<https://people.xiph.org/~greg/celt/spec.MP3.wav>
<https://people.xiph.org/~greg/celt/spec.vorbis.wav>
<https://people.xiph.org/~greg/celt/spec.CELT.wav>

The SILK+CELT hybrid does better still, but even CELT alone is still fairly
useful at lower rates.

------
anigbrowl
Great work and great documentation - as usual from Xiph. CELT looks like a big
step forward for real time internet audio. Sub-10ms latency is very impressive
even if the quality is somewhat compromised - for many purposes it will be
sufficient, and where it's not, lossless recordings can be transferred later.

The only thing that I wish they would change is the name 'Ogg' for their
container format. It sounds like a character in a bad children's movie and I
always feel slightly embarrassed introducing it into conversation.

~~~
InclinedPlane
<http://www.amazon.com/Secret-World-OG/dp/B000MX7UEY>

>_>

~~~
anigbrowl
o_O

------
cheald
Mumble (<http://mumble.sourceforge.net/>) uses CELT, and it's amazing - when I
first started using it, I was a little unsettled, because I could hear
inflections and intonations in others' voices that I didn't realize were being
stripped out by Speex.

CELT is an amazing product, and Mumble has built an amazing product on top of
it. I'm pushing very hard to try to help Mumble topple Ventrilo/Skype, at
least for gamers. I'm really excited to see how CELT has grown and changed
over the last couple of years, and it's just getting better.

(Full disclosure: I run a Mumble hosting service. But that's mostly because I
love the product so much.)

------
Groxx
> _low-bitrate performance ('sweet spot' >= 32kbps for 48kHz stereo)_

That seems to be contradictory, or is it just me? It seems to imply the "sweet
spot" (which I'm interpreting as the minimum good-sounding point) exists only
at or _above_ 32kbps, which is low-_ish_, but not all _that_ low. And why the
">="? Surely an upper bound is more useful to people interested in low
bitrates, not a _lower_ bound.

> _flexible streaming with the ability to change most codec parameters mid-stream_

Fantastic news. As to the rest... I'd _love_ to understand all that. I'll have
to read through with Wikipedia some time.

~~~
anigbrowl
Sweet spot is more 'best bang for the buck' - you can go lower than that, but
you'll have to sacrifice either latency or frequency resolution. Have a look
at the comparison chart for some context:
<http://www.celt-codec.org/comparison/>

32kbps is not the lowest possible, obviously, but it's still very, very low.
That's near-real-time encoding at quite high quality, at a bitrate low
enough to go over an old modem. Normal uncompressed 16-bit 48 kHz stereo audio
(the most popular yardstick since the establishment of DVD) is 1536 kbps.
Remember that's kiloBITS per second, and two channels of 16-bit audio
take up 32 bits per sample. At 32 kbps without any compression you'd be
limited to a 1 kHz sample rate, which is about as smooth as a cheese grater.
Compressing audio in almost real time by a factor of 48 and still having it
sound this good is astonishing, trust me. If you haven't done so already, find
the spectrogram picture and look at the uncompressed and MP3 plots for
a while. Then look at how much more faithful the CELT codec is to the source
material. Their insight about explicitly preserving the energy of each coding
band (and coding the band's normalized, unit-norm shape separately) is
_extremely_ impressive, one of the cleverest things I've seen in DSP since
perceptual coding (which is what MP3 does).
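The bitrate arithmetic above is easy to check. A minimal sketch, assuming plain linear PCM with no container or packet overhead; the function names are made up for illustration:

```python
# Back-of-the-envelope PCM bitrate arithmetic (assumes raw linear PCM, no overhead).

def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bitrate of uncompressed PCM, in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def max_sample_rate_hz(bitrate_kbps: float, bits_per_sample: int, channels: int) -> float:
    """Highest sample rate uncompressed PCM could use at a given bitrate."""
    return bitrate_kbps * 1000 / (bits_per_sample * channels)


dvd_stereo = pcm_bitrate_kbps(48_000, 16, 2)  # 16-bit, 48 kHz, stereo
print(f"uncompressed: {dvd_stereo:g} kbps")                      # 1536 kbps
print(f"compression factor at 32 kbps: {dvd_stereo / 32:g}x")    # 48x
print(f"raw PCM at 32 kbps: {max_sample_rate_hz(32, 16, 2):g} Hz")  # 1000 Hz
```

The same two functions cover the mono case by passing `channels=1`.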

If digital signal processing and the like stimulates your intellectual
curiosity then I urge you to learn more about it - it's a really interesting
and very useful field of study, with all sorts of interesting applications and
lots of territory still unexplored. The Scientist's and Engineer's Guide to
DSP is a fairly basic introductory text, but has two massive advantages over all
its competitors: it is available for free at the author's website, and it is
extremely well written. Other books can tell you what you need to know. The
DSP guide tells you why you need to know, and why the fundamental algorithms
are so elegant. <http://www.dspguide.com/>

~~~
Groxx
Comparing it to uncompressed audio is a bit of a red herring. MP3s can easily
do 32kbps - they sound like crap from a musical standpoint, but they do it
just fine. Heck, mp3s at 8kbps still sound significantly better than my cell
phone - you could run _7_ audio streams at the same time on dial-up with that
quality. Similarly, uncompressed video is _huge_, when a _high_-quality
H.264 pass will look almost identical with _massive_ size savings. From the
several charts, it looks like CELT could be a very large improvement over the
encoders they compared against (I'm assuming a relevant sampling), especially
where speed is concerned, and I'll definitely poke at it and see what I think.

I very much liked the spectrograms, that looks to be a _massive_ improvement.
I've got to test it on my good pair of headphones to see just what it sounds
like.

Nearly everything stimulates my intellectual curiosity; this is high-ish on my
list, but that might mean years yet. Many thanks for the suggestions on info
though, I'll most certainly keep that handy!

~~~
nullc
Exactly.

CELT at 2.5kbit/sec:
[http://myrandomnode.dyndns.org:8080/~gmaxwell/celt/16k_60ms_...](http://myrandomnode.dyndns.org:8080/~gmaxwell/celt/16k_60ms_20-2.wav)

So sure, CELT can be coerced to run at obnoxiously low rates— it's far more
flexible than MP3 in this regard, every frame size which is an integer number
of bytes greater than 8 or so should more or less work (and it should use
every bit effectively).

This doesn't mean that it'll actually be useful at very low rates. With 20ms
frames, 32kbps is about the limit below which things really start to come
apart and everything starts sounding pretty poor.
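Because every frame is a whole number of bytes, a fixed frame duration pins the bitrate to steps of 8 bits per frame. The arithmetic below, assuming constant-size 20ms frames, shows what the 32kbps figure and the "8 bytes or so" floor mean per frame. The helper names are hypothetical; this is plain arithmetic, not the libcelt API:

```python
# Hypothetical helpers: per-frame byte budgets for fixed-duration frames.

def bytes_per_frame(bitrate_bps: float, frame_ms: float) -> float:
    """Bytes available to one frame at a constant bitrate."""
    return bitrate_bps * frame_ms / 1000 / 8


def frame_bitrate_bps(frame_bytes: int, frame_ms: float) -> float:
    """Constant bitrate produced by frames of a fixed byte size."""
    return frame_bytes * 8 * 1000 / frame_ms


print(bytes_per_frame(32_000, 20))   # 80.0 bytes per 20 ms frame
print(frame_bitrate_bps(8, 20))      # 3200.0 bps at an 8-byte frame
# Whole-byte frames make the rate move in 400 bps steps at 20 ms:
print([frame_bitrate_bps(n, 20) for n in range(79, 82)])  # [31600.0, 32000.0, 32400.0]
```

Longer frames shrink the step size and the floor, which is one reason very low rates pair naturally with longer frames (at the cost of latency).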

The CELT _decoder_ is more computationally complex than you might guess. We
started the design in 2007, and its decoding cost is pretty comparable to MP3
or Vorbis (though it requires a _lot_ less memory). Using substantially less
CPU than that would have been a sin if there was quality to be gained by
spending it. So although the overall design is nice and simple, we use a
number of 'more optimal' techniques which are decent CPU sinks. E.g.,
everything is range coded (so we're not constrained to coding things with
probabilities of the 1/2^n form) and we use high-dimensionality vector
quantization.
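The "1/2^n" remark refers to prefix codes such as Huffman, which spend a whole number of bits per symbol and so implicitly model each probability as a power of 1/2. A range coder can spend fractional bits and approach the entropy. This sketch only compares the two costs for a hypothetical skewed binary decision; it does not implement a range coder:

```python
import heapq
import itertools
import math


def huffman_code_lengths(probs):
    """Code length in bits for each symbol of an optimal prefix (Huffman) code."""
    if len(probs) == 1:
        return [1]
    lengths = [0] * len(probs)
    tie = itertools.count()  # tie-breaker so heapq never compares the lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1  # each merge adds one bit to every symbol below it
        heapq.heappush(heap, (p1 + p2, next(tie), syms1 + syms2))
    return lengths


def prefix_code_bits(probs):
    """Average bits per symbol spent by the Huffman code."""
    return sum(p * n for p, n in zip(probs, huffman_code_lengths(probs)))


def entropy_bits(probs):
    """Shannon entropy: the average cost an ideal range coder approaches."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


probs = [0.9, 0.1]  # hypothetical skewed yes/no decision
print(f"Huffman: {prefix_code_bits(probs):.3f} bits/symbol")  # 1.000
print(f"entropy: {entropy_bits(probs):.3f} bits/symbol")      # 0.469
```

For dyadic probabilities such as `[0.5, 0.25, 0.25]` the two costs coincide; the gap only appears when probabilities are not of the 1/2^n form.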

The encoder, however, doesn't need a psy-model at all, nor does the current
one have much of a psy-model (it has some simple hacks for a couple of
psychoacoustic tweaks, but nothing too complex). Reasonable perceptual
performance is implicit in the format. This means that a good CELT encoder can
be much faster than, say, a good Vorbis encoder.
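Much of that implicit perceptual behavior comes from the band-energy idea mentioned earlier in the thread: each band's energy is coded explicitly and only its normalized shape is vector-quantized, so even a very coarse shape keeps the band's loudness. The sketch below illustrates that split on one hypothetical band of coefficients. The greedy pulse search is a simplified stand-in for CELT's pyramid vector quantizer, not libcelt's algorithm, and it skips energy quantization, band layout, and the MDCT entirely:

```python
import math


def split_band(coeffs):
    """Split a band into its energy (L2 norm) and a unit-norm shape."""
    energy = math.sqrt(sum(c * c for c in coeffs))
    if energy == 0:
        return 0.0, [0.0] * len(coeffs)
    return energy, [c / energy for c in coeffs]


def pvq_quantize(shape, pulses):
    """Greedy pyramid-VQ-style search (simplified illustration).

    Places `pulses` integer unit pulses to best match `shape`, then returns
    the signed pulse vector renormalized to unit L2 norm.
    """
    if pulses < 1:
        raise ValueError("need at least one pulse")
    mags = [abs(v) for v in shape]
    counts = [0] * len(shape)
    corr = 0.0  # sum of mags[i] * counts[i]
    power = 0   # sum of counts[i] ** 2
    for _ in range(pulses):
        # Add the pulse where it maximizes the normalized correlation.
        best = max(
            range(len(shape)),
            key=lambda i: (corr + mags[i]) ** 2 / (power + 2 * counts[i] + 1),
        )
        corr += mags[best]
        power += 2 * counts[best] + 1
        counts[best] += 1
    norm = math.sqrt(power)
    return [math.copysign(n, v) / norm for n, v in zip(counts, shape)]


band = [0.9, -0.4, 0.2, 0.1]  # hypothetical coefficients for one band
energy, shape = split_band(band)
for k in (1, 2, 4, 8):
    decoded = [energy * s for s in pvq_quantize(shape, k)]
    decoded_energy = math.sqrt(sum(d * d for d in decoded))
    # The shape gets closer to the input as k grows; the energy matches at every k.
    print(k, [round(d, 3) for d in decoded], round(decoded_energy, 4), round(energy, 4))
```

The encoder never has to decide how much energy a band "deserves"; the format guarantees it, and the bits spent on pulses only refine the spectral detail within the band.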

On the subject of DSP stimulating curiosity, check out
<http://www.xiph.org/video/>

~~~
charlesdm
Where/when can I get a library for this?

~~~
anigbrowl
<http://www.celt-codec.org/downloads/>

------
slug
<http://www.ekiga.org/> (Ekiga / Ubuntu Maverick) comes with support for CELT,
among others (Speex, etc.), although I didn't compare them for quality.

