
An Introduction to Sine-Wave Speech (2007) - doener
http://www.mrc-cbu.cam.ac.uk/personal/matt.davis/sine-wave-speech/
======
kragen
[http://www.haskins.yale.edu/featured/sws/sws.html](http://www.haskins.yale.edu/featured/sws/sws.html)
is a page the Haskins Lab folks have put up about this; unfortunately at some
point they switched it to use Flash or something, so now it's unusable. The
archived version at
[https://web.archive.org/web/20060828113107/www.haskins.yale....](https://web.archive.org/web/20060828113107/www.haskins.yale.edu/featured/sws/sws.html)
directly links to AIFF files; by telling Firefox to open them with
/usr/bin/play or /usr/bin/vlc I can hear them.

The original experiments were done in the 1970s, I think, and the papers on
the page are I think from 1981–1994. It would have been interesting to see how
things would have developed if this formant-centric viewpoint had become
central to speech codec research in the 80s and 90s. In a sense LPC codecs
like GSM RPE-LTP are formant-centric, since the linear prediction coefficients
in some sense encode the formant structure of the speech, but (and I'm getting
out of my depth here) as I understand it the impulse response of the linear
prediction filter is pretty broadband.

We really need a better medium than the current WWW for this kind of archival,
because the researchers are going to retire or die, and then they will
probably stop updating their pages, which will then apparently become
inaccessible with modern browsers.

~~~
unlikelymordant
It is my understanding that most voice codecs are still essentially LPC-based;
it is just that LPC coefficients are unstable when quantised, so they get
converted to things like line spectral pairs, and then all sorts of different
excitations and codebooks are used to eke out every bit of savings. But the core
idea of LPC is still in there. Happy to be corrected by someone who knows
better!
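
A toy illustration of the "unstable when quantised" part, as far as I understand it. This is only a sketch under loud assumptions: a single second-order section instead of the order-10 or so filter a real codec uses, plain fixed-point rounding as the quantiser, and made-up helper names. It does not implement line spectral pairs; it only shows why rounding the direct-form coefficients is risky.

```python
import math

def direct_form(radius, freq_hz, sample_rate):
    """Coefficients (b1, b2) of 1 + b1*z^-1 + b2*z^-2 for one conjugate pole pair."""
    theta = 2 * math.pi * freq_hz / sample_rate
    return -2 * radius * math.cos(theta), radius * radius

def is_stable(b1, b2):
    """Second-order stability triangle: both poles strictly inside the unit circle."""
    return abs(b2) < 1 and abs(b1) < 1 + b2

def quantise(x, frac_bits):
    """Round to a fixed-point grid with frac_bits fractional bits (assumed quantiser)."""
    step = 2.0 ** -frac_bits
    return round(x / step) * step

# A sharp, low-frequency resonance: poles at radius 0.99, 300 Hz, 8 kHz sampling.
b1, b2 = direct_form(radius=0.99, freq_hz=300, sample_rate=8000)
for bits in (8, 4):
    q1, q2 = quantise(b1, bits), quantise(b2, bits)
    print(bits, "fractional bits:", "stable" if is_stable(q1, q2) else "unstable")
```

With these particular numbers the 8-bit version stays stable, while 4 bits round b2 up to 1.0 and push the poles onto the unit circle. Representations such as line spectral pairs are chosen, as I understand it, because stability is much easier to preserve under quantisation.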

~~~
hprotagonist
it’s still LPC-ish, yes.

a lot of the work in modern codecs goes into good VQ for the codebook, and
better bit allocations.

LPC10 sounds like ass, but damn is it cheap on bits!

~~~
kragen
Mmm, ass! My favorite!

The RPE in RPE-LTP — which is also one of the two excitations in LPC-10 — is
indeed fairly precisely a fart sound, although the usual euphemism is "quasi-
periodic pulses". That's because your larynx sounds like a fart if it doesn't
have a vocal tract glommed onto the end of it to filter it with its recursive
linear resonances.
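
A minimal Python sketch of that source-filter idea, assuming a single two-pole resonator as a stand-in for the vocal tract. A real tract has several formants, and the function names, the 700 Hz centre frequency and the 100 Hz bandwidth are illustrative choices, not anything taken from LPC-10 or RPE-LTP.

```python
import math

def pulse_train(n_samples, period):
    """Excitation: one unit pulse every `period` samples, zero elsewhere."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(x, freq_hz, bandwidth_hz, sample_rate):
    """Two-pole recursive filter: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius, < 1
    theta = 2 * math.pi * freq_hz / sample_rate          # pole angle
    a1, a2 = 2 * r * math.cos(theta), -r * r
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = s + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

# 100 ms at 8 kHz with a 100 Hz pulse rate (one pulse every 80 samples).
buzz = pulse_train(800, 80)
voiced = resonator(buzz, freq_hz=700, bandwidth_hz=100, sample_rate=8000)
```

Here `buzz` is the bare pulse train and `voiced` is the same excitation after one formant-like resonance; cascading a few resonators at different frequencies would get closer to a vowel.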

Fortunately sox has LPC-10 built in, so you can enjoy the tasty tasty 80s ass
quite simply if you install sox:

    
    
        sox foo.wav foo.lpc10
        play foo.lpc10
    

The LPC-10 code in Sox says it was translated from Fortran with f2c sometime
around 1998, and as far as I can tell, nobody's been changing it since then.
That's a nice counterpoint to the fucking "this web page doesn't work because
nobody has changed it in the last five years" WWW.

[https://web.archive.org/web/20200420003350/https://www.jwz.o...](https://web.archive.org/web/20200420003350/https://www.jwz.org/doc/cadt.html)

~~~
hprotagonist
ah good stuff.

I studied with one of Rabiner’s old coworkers, so this is a fun trip.

------
andrelaszlo
I listened to only the "degraded" sample, and I could understand it on the
second listen. English is not my native language and I have pretty severe
hearing loss. I suspect the latter makes it _easier_ for me in some situations
to decode language with a lot of noise - I have a lot of practice guessing, I
suppose. It would be interesting to see some research on this.

------
crazygringo
What I find interesting is that I only need priming on one, then I can
understand all of them once I realize I'm supposed to be listening for speech
that's behind some kind of high-pitched distortion.

It reminds me somewhat of when I'm in a bilingual conversation and somebody
unexpectedly switches to the other language for just a quick word or two when
you don't expect it. Your brain tries to interpret the sounds in the first
language and they're just gibberish, even though you speak the second
language. Then if the speaker continues in the second language quickly, you
"retroactively" re-interpret what you just heard and understand it.

------
tomxor
After hearing the first two examples with clear speech I could identify most
of the last three without it; it took a few listens before the last one clicked.

------
Tomte
I saw the other submission yesterday and thought it was very cool. Today it's
more meh.

The priming didn't hold for me from yesterday. I clicked the first of the four
extra examples and didn't understand anything.

After listening to the clear version and trying again I heard it.

But again priming did not hold for me (unlike what's described on the page):
each of the other three examples were unintelligible to me (not even
recognizable as speech sounds). Again, after listening to the clear version,
it became easy, but only for the corresponding sine-wave encoding.

I still like that idea, if only for nostalgic reasons. We did some formant
analysis at university, up to trying to "read" a sonagram, i.e. deducing from
the sonagram as shown on the page what was being said.

~~~
hprotagonist
ever played with praat? it’s the pinnacle of highly specialized totally non-
ux-designed software.

~~~
yorwba
Its source code is also highly idiosyncratic with lots of macro use, including
definitions like "our" for "this ->"
[https://github.com/praat/praat/blob/9d20024e2f68f84bf23ff49b...](https://github.com/praat/praat/blob/9d20024e2f68f84bf23ff49bf313b2e7ba24fb42/melder/melder.h#L41)

~~~
hprotagonist
weirdo academic code at its “finest”

------
tauntz
Random data point but as a non-native speaker I could understand about 80% of
the words on the first go. Listening to the clear audio didn't really make me
"hear" the correct word on the 2nd try. It just made me realize that "yes,
that word that I know is correct here, might indeed sound like this". IDK how
to explain it. Take for example, the "The camel was kept in a cage at the zoo"
sentence. I heard "The owl was kept in a cage at the zoo" on the first go and
after listening to the clear version, I still hear it's an "owl". I now know
that "camel" is also reasonable there and makes sense, but I still hear "owl".
Am I wired incorrectly? :)

~~~
swixmix
> Am I wired incorrectly? :)

Doubtful. Reminds me of when Grover was accused of saying a bad word.

[https://youtu.be/HfwF5cuAMsk](https://youtu.be/HfwF5cuAMsk)

------
carapace
DIBS! I called it! Dibs on this for robot voices! H.E.L.P.eR-style

\- - - -

Unrelated, there are at least thirty-five Venture Bro.'s (so far). I don't
want to show my work here because it would be a spoiler.

\- - - -

Seriously though: this is how "droids"(tm)(R) should talk, no?

\- - - -

Also unrelated, the only speech act of H.E.L.P.eR's that I can understand in
the entire series is when he and Brock are discussing poetry and he asks
Brock, "Maya Angelou?". That's it. :-)

~~~
kragen
Awesome :)

You could probably turn up the comprehensibility knob by adding a fourth
formant or a little bit of vocoded white noise. Or a sawtooth or FM or AM tone
or something.
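
For what it's worth, the synthesis side is simple enough to sketch. This assumes you already have per-frame formant frequency and amplitude estimates; the numbers below are made up, and `sine_wave_speech` is a hypothetical helper, not code from the linked page. Each formant becomes one sinusoid, so a fourth formant is just one more track.

```python
import math

def sine_wave_speech(tracks, frame_len, sample_rate):
    """Sum one sinusoid per formant track.

    tracks: list of tracks, each a list of (freq_hz, amplitude) per frame.
    """
    n_frames = len(tracks[0])
    out = [0.0] * (n_frames * frame_len)
    for track in tracks:
        phase = 0.0  # accumulated across frames so the tone stays continuous
        for i, (freq, amp) in enumerate(track):
            for k in range(frame_len):
                phase += 2 * math.pi * freq / sample_rate
                out[i * frame_len + k] += amp * math.sin(phase)
    return out

# Three invented formant tracks, two 10 ms frames each at 8 kHz.
tracks = [
    [(500, 1.0), (600, 1.0)],     # F1
    [(1500, 0.5), (1400, 0.5)],   # F2
    [(2500, 0.25), (2600, 0.25)]  # F3
]
samples = sine_wave_speech(tracks, frame_len=80, sample_rate=8000)

# Turning the knob: append a fourth track and resynthesise.
tracks.append([(3500, 0.1), (3500, 0.1)])
samples4 = sine_wave_speech(tracks, frame_len=80, sample_rate=8000)
```

The hard part is the formant tracking that produces the tracks in the first place, which this sketch skips entirely.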

------
bzb3
This reminds me of voice inversion, a scrambling method used by some analogue
portable radios.

[https://en.m.wikipedia.org/wiki/Voice_inversion](https://en.m.wikipedia.org/wiki/Voice_inversion)
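
A rough sketch of the idea in Python, assuming the simplest discrete-time variant: negating every other sample mirrors the spectrum around a quarter of the sample rate, so a tone at f comes out at fs/2 - f. Actual radios mix with a carrier and filter the result, so treat this as an illustration only; the helper names are made up.

```python
import math

def invert_spectrum(samples):
    """Mirror the spectrum around fs/4 by negating every other sample."""
    return [s if n % 2 == 0 else -s for n, s in enumerate(samples)]

def tone_level(samples, freq_hz, sample_rate):
    """Magnitude of one DFT bin, scaled so a unit-amplitude sine gives about 1."""
    w = 2 * math.pi * freq_hz / sample_rate
    re = sum(s * math.cos(w * n) for n, s in enumerate(samples))
    im = sum(s * math.sin(w * n) for n, s in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)

fs = 8000
tone = [math.sin(2 * math.pi * 500 * n / fs) for n in range(800)]  # 500 Hz
scrambled = invert_spectrum(tone)

print("500 Hz level after inversion: ", tone_level(scrambled, 500, fs))
print("3500 Hz level after inversion:", tone_level(scrambled, 3500, fs))
```

Applying `invert_spectrum` a second time restores the original samples, which is why this kind of scrambling offers essentially no security.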

------
tricolon
The last time I listened to these was ten years ago. I understood all of them
just now.

------
smashah
Whoa. Spooky. I wonder if this works with a text spoiler also.

~~~
Evidlo
There's the classic chain mail that claims you can scramble the inner letters
of all words in a paragraph and it still remains legible. Turns out some of the
words are cherry picked, but it still works somewhat.

    
    
        Aoccdrnig to a rscheearch at Cmabrigde
        Uinervtisy, it deosn't mttaer in waht oredr 
        the ltteers in a wrod are, the olny 
        iprmoetnt tihng is taht the frist and lsat 
        ltteer be at the rghit pclae. The rset can 
        be a toatl mses and you can sitll raed it 
        wouthit porbelm. Tihs is bcuseae the huamn 
        mnid deos not raed ervey lteter by istlef, 
        but the wrod as a wlohe.
    

[http://www.mrc-cbu.cam.ac.uk/people/matt-davis/cmabridge/](http://www.mrc-cbu.cam.ac.uk/people/matt-davis/cmabridge/)
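
If you want to try it on your own text, here is a small sketch. It assumes words are runs of ASCII letters (so "doesn't" is treated as "doesn" plus "t") and leaves words of three letters or fewer alone. `scramble_text` is a made-up helper, not the script behind the original chain mail.

```python
import random
import re

def scramble_word(word, rng):
    """Shuffle the inner letters, keeping the first and last in place."""
    if len(word) <= 3:
        return word  # nothing to shuffle
    inner = list(word[1:-1])
    rng.shuffle(inner)
    return word[0] + "".join(inner) + word[-1]

def scramble_text(text, seed=0):
    """Scramble every run of letters; punctuation and spacing are untouched."""
    rng = random.Random(seed)  # seeded so the result is repeatable
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(0), rng), text)

print(scramble_text("According to a researcher at Cambridge University"))
```

A random shuffle tends to produce harder text than the famous paragraph, which fits the point that its words were partly cherry picked.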

~~~
smashah
No I mean send someone a sine wave speech file and via another method the
transcription of the speech. Would you hear it after reading it?

