
Voc: a physical model of the vocal tract, written in ANSI C - adamnemecek
http://pbat.ch/proj/voc/
======
zebproj
Hey! I'm the author of this thing. Let me know if you have any questions about
it.

FYI, the generated C code is now part of the dev branch of Soundpipe here:
[http://pbat.ch/proj/soundpipe.html](http://pbat.ch/proj/soundpipe.html), my
music DSP library, and it has also made its way into the develop branch of
AudioKit: [http://audiokit.io](http://audiokit.io).

Also, check out the original implementation Pink Trombone:
[https://dood.al/pinktrombone/](https://dood.al/pinktrombone/). It's the
perfect interface for this kind of model.

~~~
TheNewAndy
Funnily enough, I also made a C90 port of the same code, and also named it
"voc". Here is an oldish version:

[http://ultra-premium.com/scratch/voc.zip](http://ultra-premium.com/scratch/voc.zip)

I would completely expect yours to be better in all ways (mine was basically a
line-for-line transcription of all the audio and stripping out the UI stuff,
whereas you actually went to the effort of understanding the thing being
ported).

~~~
zebproj
I found your code last night, and it really freaked me out that there would be
another person in the world who would want to make a C port of pink trombone
and call it Voc as well.

Voc is pretty much a line-for-line port of PT as well, but I removed some bits
like the simplex noise. I also wrote some small utilities to go with Voc, like
small plotting programs and some plugins for my audio language for both the
whole source-filter model and just the filter.

Still sinking my teeth into the literature. Voice synthesis has a very rich
history!

------
userbinator
Interesting use of Knuth's literate programming concept. I remember reading
about it and the source code of TeX in PDF format a long time ago, but now
that I have a chance to read another piece of code written in it, I'm finding
it _less_ readable because of the proportional fonts than if it were equally-
commented but in a more conventional monospace programming font.

Also, some samples of making it talk would be good, like:
[https://en.wikipedia.org/wiki/Voder](https://en.wikipedia.org/wiki/Voder)

~~~
zebproj
> I'm finding it less readable because of the proportional fonts than if it
> were equally-commented but in a more conventional monospace programming
> font.

My initial motivation for using literate programming with Voc was to take
advantage of TeX's math mode to express what was happening numerically in the
code, as well as to be able to use BibTeX inside the code.

> Also, some samples of making it talk would be good, like:
> [https://en.wikipedia.org/wiki/Voder](https://en.wikipedia.org/wiki/Voder)

At the bottom of the page, there are music examples on Vimeo, with plots of
the 44 vocal tract diameters being manipulated in realtime:

[https://vimeo.com/220091107](https://vimeo.com/220091107)

[https://vimeo.com/220091290](https://vimeo.com/220091290)

[https://vimeo.com/220091487](https://vimeo.com/220091487)

My goal was really to build vocalizations, and not necessarily to produce
speech. This engine is a bit more low level than that. It could be possible to
build a speech engine on top of Voc though... next steps perhaps?

------
jarmitage
Great! Also inspired by Pink Trombone, we ported the same model to our maker
platform and then to a modular synth:

[https://www.youtube.com/watch?v=bo5ZEgBEapk](https://www.youtube.com/watch?v=bo5ZEgBEapk)

[https://twitter.com/BelaPlatform/status/856110345332674561](https://twitter.com/BelaPlatform/status/856110345332674561)

[https://github.com/giuliomoro/pink-trombone](https://github.com/giuliomoro/pink-trombone)

~~~
zebproj
Hey yeah! I actually came across that when I first set out to make my project.

I was going to actually fork off your project, but decided it would be
cleaner/faster to do it off the original code since I wanted to write it in
ANSI C.

~~~
Sean1708
Slightly off-topic sorry, but why ANSI C and not C99 (or C11)? Is it common to
come across systems where C99 isn't supported, or are there reasons why you
prefer C89 to C99?

~~~
zebproj
I don't have any real reasons for choosing C89 over C99. Both tend to be very
portable, which is very nice if you aren't sure what operating system you are
running on (if any, in many situations). I still write many programs using the
"-std=c99" flag, but I never find myself in dire need of the extensions, so
those programs end up being basically honorary ANSI C. For projects like this
that just do numerical processing, C89 really isn't that much more of a hassle.

------
macawfish
Would this be a good target for deep learning? It's low level in some sense,
but still nicely parametric. It strikes me that this could be a good synth for
some neural nets to learn how to play.

~~~
vortico
Yes, speech recognition and speech generation would be easier to implement if
you used neural networks that were trained on these vocal cord inputs rather
than audio samples. In either case, you'd need to solve the inverse problem to
generate vocal cord parameters given an audio sample. This seems difficult but
I'd imagine some commercial software packages do it to some extent.

~~~
skykooler
I would imagine a neural network that fed parameter values to Voc would be far
faster (real-time?) than something like WaveNet which needs to sample the
output thousands of times a second.

~~~
zebproj
Correct. WaveNet is a very brute force approach to speech synthesis.

------
anirudt
This is quite cool!

On a side note, can this be used to train or obtain one's own voice parameters
for use in programs like Espeak?

~~~
zebproj
Not directly, no. IIRC, programs like Espeak and Festival use formant
synthesis, which would require explicit formant values. Voc models the tract
itself... the main parameters are diameters in the vocal tract (which
implicitly produce vowel sounds).

It may be possible to go the other way around and analytically derive
parameters for Voc that match target formant frequencies. Not sure though...
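As a rough illustration of how tract geometry implies formants, the simplest textbook case is a uniform tube that is closed at the glottis and open at the lips, which resonates at odd quarter-wavelength frequencies, F_n = (2n - 1) * c / (4 * L). The sketch below is not how Voc computes anything; `tube_formant_hz` is a hypothetical helper, and the tract length and speed of sound used in the note after it are assumed round numbers:

```c
/* Resonance n (1-based) of an idealized uniform tube, closed at one end
 * and open at the other: F_n = (2n - 1) * c / (4 * L).
 * Hypothetical helper for illustration only; it is not part of Voc. */
static double tube_formant_hz(int n, double length_cm, double c_cm_per_s)
{
    return (2.0 * n - 1.0) * c_cm_per_s / (4.0 * length_cm);
}
```

With an assumed length of 17.5 cm and an assumed speed of sound of 35000 cm/s, this gives 500, 1500 and 2500 Hz for the first three formants, roughly a neutral vowel. Going from non-uniform diameters to formants (or back) is the harder problem discussed above.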

~~~
marmaduke
There are other ways, though. For example, if features like formants produced
by the model can be differentiated with respect to the vocal tract parameters,
the latter could perhaps be estimated from real data.

------
adamnemecek
Here's an example of it in action
[https://vimeo.com/221310975](https://vimeo.com/221310975)

~~~
ruste
That was... very strange. I'm not sure how else you'd demonstrate a physical
vocal tract model though.

------
mrstone
This is super cool. I seem to remember a JS version of this floating around
earlier this year. Does anyone have it?

~~~
ajacksified
It's the first link on the page

------
MrBuddyCasino
Is there something that does the opposite? Feed vocals, get jaw movements?
I've got a weird little project, think "talking skeleton".

~~~
coldtea
There are numerous programs that translate spoken words to mouth/jaw movements,
usually for animation (to match the 3D/2D model with the right movements for
what it's supposed to say).

IIRC, Adobe Animator does that too.

~~~
MrBuddyCasino
Thanks. Unfortunately, I need something suitable for embedded hardware.

~~~
coldtea
Check this maybe:

[https://github.com/DanielSWolf/rhubarb-lip-sync](https://github.com/DanielSWolf/rhubarb-lip-sync)

------
Keyframe
Wouldn't you also need to model tongue and mouth chamber (with teeth) and all
of their movements?

~~~
zebproj
Yes and no. Perceptually, you don't really need to model everything to get
convincing speech sounds. Most of the realism actually comes from performance,
and not the mathematical model.

In a way, lips and mouth are accounted for here, but in a more abstract way.
The KL model approximates the vocal tract as a series of cylindrical tubes
with varying diameters. Segments of the tubes actually correspond to things
like the tongue and mouth somewhat. In this model there is a really neat
tongue control that manipulates these segments. It's quite expressive!

This model is a 1d waveguide, so it doesn't account for things like the
curvature of the tract. More modern vocal modelling techniques include
implementing a 2-dimensional waveguide, which _does_ allow for this control.
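To make the tube picture concrete: in a Kelly-Lochbaum style model, each junction between two adjacent segments partially reflects the travelling wave, and the amount depends on the ratio of the cross-sectional areas. The following is a minimal sketch of that one step, not Voc's actual code. `kl_reflection` is a hypothetical name, area is assumed proportional to diameter squared, and the sign convention varies between texts:

```c
#include <stddef.h>

/* Fill k[0..n-2] with junction reflection coefficients for a tube made of
 * n cylindrical segments with diameters diam[0..n-1].
 * Area is taken as diameter squared; the constant pi/4 cancels in the ratio.
 * Hypothetical helper for illustration only; it is not part of Voc. */
static void kl_reflection(const double *diam, size_t n, double *k)
{
    size_t i;
    double a0, a1, sum;

    for (i = 0; i + 1 < n; i++) {
        a0 = diam[i] * diam[i];
        a1 = diam[i + 1] * diam[i + 1];
        sum = a0 + a1;
        /* Guard against two fully closed neighbours (0/0). */
        k[i] = (sum == 0.0) ? 0.0 : (a0 - a1) / sum;
    }
}
```

Two neighbouring segments of equal diameter give a coefficient of 0 (no reflection), while a sharp constriction or widening pushes the magnitude toward 1. Moving the tongue control changes the diameters, which in turn changes these coefficients.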

------
traverseda
Title says "Vox". Page says "Voc".

~~~
adamnemecek
fixed

