FYI, the generated C code is now part of the dev branch of Soundpipe, my music DSP library: http://pbat.ch/proj/soundpipe.html. It has also made its way into the develop branch of AudioKit: http://audiokit.io.
Also, check out the original implementation Pink Trombone: https://dood.al/pinktrombone/. It's the perfect interface for this kind of model.
I would completely expect yours to be better in all ways (mine was basically a line-for-line transcription of all the audio code with the UI stuff stripped out, whereas you actually went to the effort of understanding the thing being ported).
Voc is pretty much a line-for-line port of PT as well, but I removed some bits like the simplex noise. I also wrote some small utilities to go with Voc, like small plotting programs and some plugins for my audio language for both the whole source-filter model and just the filter.
Still sinking my teeth into the literature. Voice synthesis has a very rich history!
Also, build a language. Mine is Sporth: http://pbat.ch/proj/sporth.html
More recently, I've been building UIs on top of Sporth called Spigot: http://pbat.ch/proj/spigot
I think pinktrombone would make an excellent resource for learning how to make certain sounds when learning a foreign language.
The KL model does sing! Max Mathews and Bell Labs produced "Daisy Bell" using a very similar model in 1961:
http://www.cs.princeton.edu/~prc/Daisy.mp3
This was the inspiration for HAL to sing Daisy in 2001: A Space Odyssey.
Also, some samples of making it talk would be good, like: https://en.wikipedia.org/wiki/Voder
My initial motivation for using literate programming with Voc was to take advantage of TeX's math mode to express what was happening numerically in the code, as well as to be able to use BibTeX inside the code.
> Also, some samples of making it talk would be good, like: https://en.wikipedia.org/wiki/Voder
At the bottom of the page, there are music examples on Vimeo, with plots of the 44 vocal tract diameters being manipulated in realtime:
https://vimeo.com/220091107
https://vimeo.com/220091290
https://vimeo.com/220091487
My goal was really to build vocalizations, and not necessarily to produce speech. This engine is a bit more low level than that. It could be possible to build a speech engine on top of Voc though... next steps perhaps?
https://www.youtube.com/watch?v=bo5ZEgBEapk
https://twitter.com/BelaPlatform/status/856110345332674561
https://github.com/giuliomoro/pink-trombone
I was going to actually fork off your project, but decided it would be cleaner/faster to do it off the original code since I wanted to write it in ANSI C.
It was very encouraging for me to see some ports to C/C++ already in progress. At the time, it was definitely an overwhelming notion. That chunk of JS code looked impenetrable to me.
"The Bela Modular breaks out the audio, analog and digital I/Os to jacks, and handles voltage scaling for Eurorack-compatible CV levels, providing a total of 2 audio in, 2 audio out, 8 analog in, 8 analog out, 4 digital in, 4 digital out and 4 LEDs over two modules, 12HP and 10HP wide.
We are planning to do a small production run of Bela Modular units later this year. Please contact us[0] directly if you think you would like one of these, or stay tuned here and on the forum.[1]"
[0] info at bela dot io
[1] http://forum.bela.io
What you put into the filter is important. The LF glottal pulse model used here is a pretty good excitation signal... aspiration noise REALLY makes a difference. It would still sound artificial, but it definitely wouldn't sound metallic.
On a side note, can this be used to train or obtain one's own voice parameters for use in software like Espeak?
It may be possible to go the other way around and analytically derive parameters for Voc that match target formant frequencies. Not sure though...
IIRC, Adobe Animator does that too.
https://github.com/DanielSWolf/rhubarb-lip-sync
In a way, lips and mouth are accounted for here, but in a more abstract way. The KL model approximates the vocal tract as a series of cylindrical tubes with varying diameters. Segments of the tubes roughly correspond to things like the tongue and mouth. In this model there is a really neat tongue control that manipulates these segments. It's quite expressive!
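To make the tube picture a bit more concrete, here is a minimal sketch in ANSI C of how a list of tube diameters can be turned into junction reflection coefficients, using the textbook Kelly-Lochbaum relation r = (A0 - A1) / (A0 + A1) between adjacent cross-sectional areas. The function name `kl_reflections` and its signature are made up for illustration and are not taken from Voc or Pink Trombone; sign conventions and the handling of fully closed sections also differ between implementations.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of Voc): given n tube diameters, write the
 * n - 1 junction reflection coefficients into refl.
 *
 * The area of a cylindrical section is proportional to diameter squared;
 * the constant factor pi/4 cancels in the ratio, so it is left out.
 */
void kl_reflections(const double *diam, size_t n, double *refl)
{
    size_t i;

    assert(n >= 2);

    for (i = 0; i + 1 < n; i++) {
        double a0 = diam[i] * diam[i];         /* area left of the junction  */
        double a1 = diam[i + 1] * diam[i + 1]; /* area right of the junction */
        double sum = a0 + a1;

        /* Assumption: treat two fully closed neighbours as "no reflection"
           to avoid dividing by zero. */
        refl[i] = (sum > 0.0) ? (a0 - a1) / sum : 0.0;
    }
}
```

Two neighbouring sections with equal diameters give a coefficient of 0 (nothing is reflected), while a sudden change in diameter pushes the coefficient toward +1 or -1. Moving the tongue control reshapes the diameters, which in turn changes these coefficients and therefore the resonances of the tract.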
This model is a 1D waveguide, so it doesn't account for things like the curvature of the tract. More modern vocal modelling techniques use a 2D waveguide, which does allow for this control.
