zebproj's comments | Hacker News

oh yeah, I made that.

Sporth is a stack-based language I wrote a few years ago. Stack-based languages are a great way to build up sound structures. I highly recommend trying it.

Chorth may need some fixes before it can run again. I haven't looked at it in a while, but I had a lot of fun using it when I was in SLOrk.


If you compare codebases, SuperCollider is definitely the more "modern" of the two. SC is written in a reasonably modern version of C++, and over the years it has gone through significant refactoring. Csound is mostly implemented in C, with some of the newer bits written in C++. Many parts of Csound have been virtually untouched since the 90s.

Syntax-wise, Csound very closely resembles the MUSIC-N languages used by early computer musicians in the 60s. "Trapped in Convert" by Richard Boulanger was originally written in 1979 (in MUSIC 11, a direct ancestor of Csound), and it still runs on the latest version of Csound today.

Csound and SC are both very capable DSP engines with a good core set of DSP algorithms. You can get a "good" sound out of either if you know what you are doing.

I find people who are more CS-inclined tend to prefer SuperCollider over Csound because it's actually a programming language you can be expressive in. While there have been significant syntax improvements in Csound 6, I'd still call Csound a "text-based synthesizer" rather than a "programming language".

That being said, I also think Csound lends itself to those who have more of a formal background in music. Making an instrument in a Csound orchestra is just like making a synthesizer patch, and creating events in a Csound score is just like composing notes for an instrument to play.

FWIW, I've never managed to get SuperCollider to stick for me. The orchestra/score paradigm of Csound just seems to fit better with how I think about music. It's also easier to render WAV files offline in Csound, which has been quite helpful for me.


I have programming experience, but that's actually why I prefer Csound. Since Csound's engine is effectively oriented around building up instruments in a modular way, it can simply be wrapped up in a more general-purpose programming language, giving you the expressiveness of that language together with the power of a modular synth engine.
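
Just to make that concrete, here's a minimal sketch of driving Csound from a C host program with the Csound 6 API, rendering offline to a WAV file (the orchestra/score strings and the build line are just illustrative):

    /* Minimal sketch: embed Csound in a C host and render offline to a
     * WAV file. Build with something like: cc host.c -o host -lcsound64 */
    #include <csound/csound.h>

    static const char *orc =
        "sr = 44100\n"
        "ksmps = 32\n"
        "nchnls = 2\n"
        "0dbfs = 1\n"
        "instr 1\n"
        "  a1 oscili 0.2, p4, 1\n"
        "  outs a1, a1\n"
        "endin\n";

    static const char *sco =
        "f 1 0 16384 10 1\n"  /* sine table */
        "i 1 0 2 440\n"       /* two seconds of A440 */
        "e\n";

    int main(void)
    {
        CSOUND *cs = csoundCreate(NULL);
        csoundSetOption(cs, "-o out.wav");  /* render to disk, not the DAC */
        csoundCompileOrc(cs, orc);
        csoundReadScore(cs, sco);
        csoundStart(cs);
        while (csoundPerformKsmps(cs) == 0) { /* run until the score ends */ }
        csoundDestroy(cs);
        return 0;
    }

Once you're at this point, the host language is free to generate the orchestra and score strings however it likes, which is where the wrapping really pays off.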


You might enjoy my project called sndkit [0]. It's a collection of DSP algorithms implemented in C, written in a literate programming style, and presented inside of a static wiki. There's also a tiny TCL-like scripting language included that allows one to build up patches. This track [1] was made entirely using sndkit.

0: https://pbat.ch/sndkit/

1: https://soundcloud.com/patchlore/synthwave


Sounds interesting. I'll have a look.


I actually met BT and asked him about this track.

While it's mostly written in Csound, he "cheated" with the guitar track, which was a recorded sample brought into Csound.


I can't think of another Csound user that would call that cheating. Function tables and audio file loaders are in there for a reason. Sample away!


See my other comments here for more info about the underlying technology.

It is pretty incredible that sophisticated digital physical models of the human vocal tract were being built in the early 60s. This was possible largely thanks to the deep pockets of Bell Labs, which poured a lot of R&D into voice and voice transmission.


The singing synthesizer used a surprisingly sophisticated physical model of the human voice [1].

The music was most likely created using some variant of MUSIC-N [2], the first family of computer music languages. The syntax and design of Csound [3] were based on MUSIC-N, and I believe the older Csound opcodes are either ported from or based on those found in MUSIC-N.

Apparently the sources for MUSIC-V (the last major iteration of the MUSIC language) can be found on GitHub [4], though I haven't tried to run it yet.

1: https://ccrma.stanford.edu/~jos/pasp/Singing_Kelly_Lochbaum_...

2: https://en.wikipedia.org/wiki/MUSIC-N

3: https://en.wikipedia.org/wiki/Csound

4: https://github.com/vlazzarini/MUSICV


I guess they built upon the Voder [1] (Homer Dudley, also at Bell Labs, 1939). But that one was played manually. An amazing 'instrument'!

1: https://youtu.be/5hyI_dM5cGo


Sort of. Both use articulatory synthesis, which attempts to model speech by breaking it up into components and using some coordinated, multi-dimensional continuous control to perform phonemes (the articulation aspect). The Voder uses analog electronics, while Daisy does it digitally (and without a human performer).

The underlying signal processing used for both is different, but both use a source-filter mechanism.


The synthetic Voder output sounds more or less exactly like the output of a vocoder where the input is a human voice and the carrier is a sawtooth. Not surprising, given that the Voder was made by the same people.

But I'm still unsure why those two things sound so similar to each other, and formant/LPC chips sound so similar to each other, but the two groups of things sound so dissimilar (at least, IMO).

I have a background in electronic music, so I'm pretty familiar with additive, subtractive, and other types of synthesis.

I'm especially surprised about the physical modelling sounding more like a formant chip, because a guitar "talk box" gives a sound exactly like a vocoder, and that should be almost the same thing, just with a real human mouth instead of a model.


The vo(co)der uses banks of fixed filters to apply the broad shape of a spectrum to an input signal. It's basically an automated graphic EQ. The level of each fixed band in the modulator is copied to the equivalent band in the carrier.

The bandpass filters have a steeper cutoff than usual and are flatter at the top of the passband than usual. And the centre frequencies aren't linearly spaced. But otherwise - it's just a fancy graphic EQ.
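
To make that a bit more concrete, one band of such a vocoder might look something like this in C (a rough sketch, not production DSP: standard cookbook bandpass biquads, with made-up values for the Q and the envelope smoothing time):

    /* One band of a channel vocoder: bandpass the modulator (voice) and
     * the carrier (e.g. a sawtooth) around the same centre frequency,
     * follow the level of the modulator band, and use that level to
     * scale the carrier band. A full vocoder runs a dozen or more of
     * these in parallel and sums the results. */
    #include <math.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    typedef struct { float b0, b1, b2, a1, a2, x1, x2, y1, y2; } biquad;

    static void bp_init(biquad *f, float fc, float q, float sr) {
        float w0 = 2.0f * (float)M_PI * fc / sr;
        float alpha = sinf(w0) / (2.0f * q);
        float a0 = 1.0f + alpha;
        f->b0 = alpha / a0; f->b1 = 0.0f; f->b2 = -alpha / a0;
        f->a1 = -2.0f * cosf(w0) / a0; f->a2 = (1.0f - alpha) / a0;
        f->x1 = f->x2 = f->y1 = f->y2 = 0.0f;
    }

    static float bp_tick(biquad *f, float x) {
        float y = f->b0 * x + f->b1 * f->x1 + f->b2 * f->x2
                - f->a1 * f->y1 - f->a2 * f->y2;
        f->x2 = f->x1; f->x1 = x;
        f->y2 = f->y1; f->y1 = y;
        return y;
    }

    typedef struct { biquad mod_bp, car_bp; float env, coef; } voc_band;

    static void band_init(voc_band *b, float fc, float sr) {
        bp_init(&b->mod_bp, fc, 6.0f, sr);  /* fairly narrow bands */
        bp_init(&b->car_bp, fc, 6.0f, sr);
        b->env = 0.0f;
        b->coef = 1.0f - expf(-1.0f / (0.010f * sr)); /* ~10 ms smoothing */
    }

    static float band_tick(voc_band *b, float mod_in, float car_in) {
        float m = bp_tick(&b->mod_bp, mod_in);        /* analyse the voice band */
        b->env += b->coef * (fabsf(m) - b->env);      /* its level */
        return bp_tick(&b->car_bp, car_in) * b->env;  /* impose it on the carrier */
    }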

The formant approach uses dynamic filters. It's more like an automated parametric EQ. Each formant is modelled with a variable BPF with its own time-varying level, frequency, and possibly Q. You apply that to a simple buzzy waveform and get speech-like sounds out. If you vary the pitch of the buzz you can make the output "sing."

LPC uses a similar source-filter model, but the filter is estimated from the speech signal itself by linear prediction, which acts as a form of data compression. So instead of having to control all the parameters at or near audio rate, you can drop the control rate right down and still get something that can be understood.

There are more modern systems. FOF and FOG use granular synthesis to create formant sounds directly. Controlling the frequency and envelope of the grains is equivalent to filtering a raw sound, but is more efficient.

FOF and FOG evolved into PSOLA, which is basically real-time granulated formant synthesis and pitch shifting.
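
And the dynamic-filter ("formant") approach described above, again as a rough C sketch (parallel rather than cascaded formant filters, reusing bp_init/bp_tick from the vocoder sketch; the formant frequencies are ballpark values for an "ah" vowel):

    /* Formant synthesis: a buzzy source (here a naive, aliasing sawtooth)
     * run through a few bandpass resonators. In a real synthesizer the
     * centre frequencies, levels, and Qs would be swept over time to
     * articulate different vowels; here they're frozen on an "ah". */
    #define SR 44100.0f

    static float saw_tick(float *phase, float freq) {
        *phase += freq / SR;
        if (*phase >= 1.0f) *phase -= 1.0f;
        return 2.0f * *phase - 1.0f;
    }

    void render_ah(float *out, int n) {
        biquad f1, f2, f3;
        float phase = 0.0f;
        bp_init(&f1,  700.0f, 8.0f, SR);   /* F1 */
        bp_init(&f2, 1100.0f, 8.0f, SR);   /* F2 */
        bp_init(&f3, 2600.0f, 8.0f, SR);   /* F3 */
        for (int i = 0; i < n; i++) {
            float buzz = saw_tick(&phase, 110.0f);  /* the pitch of the "voice" */
            out[i] = 0.5f * bp_tick(&f1, buzz)
                   + 0.3f * bp_tick(&f2, buzz)
                   + 0.2f * bp_tick(&f3, buzz);
        }
    }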


Many of the simpler vocal tract physical models are very similar to the lattice (ladder) filter topologies found in LPC speech synthesizers.

In general, tract physical models have never sounded all that realistic. The one big thing they have going for them is control. Compared to other speech synthesis techniques, they can be quite malleable. Pink Trombone [1] uses a physical model under the hood. While it's not realistic sounding, the interface is quite compelling.

1: https://dood.al/pinktrombone/


Thank you! Seems like that project was incredibly far ahead of its time.

The physical-modelling aspect is super interesting. Does that mean that the similarity in sound to formant-based speech synthesis is because they're both using a sawtooth wave, noise, or other relatively simple sound as the raw input? I always imagined that a physical-modelling speech synthesizer fed by a sawtooth wave would sound more like a vocoder than Votrax or TI LPC output does, but I guess not.


> Does that mean that the similarity in sound to formant-based speech synthesis is because they're both using a sawtooth wave, noise, or other relatively simple sound as the raw input?

Essentially, yes. Both are known as "source-filter" models. A sawtooth, narrow pulse, or impulse wave is a good approximation of the glottal excitation for the source signal, though many articulatory speech models use a more specialized source model that's analytically derived from real waveforms produced by the glottis. The Liljencrants-Fant (LF) derivative glottal waveform model is the most common, but a few others exist.

In formant synthesis, the formant frequencies are known ahead of time and are explicitly added to the spectrum using some kind of peak filter. With waveguides, those formants are implicitly created based on the shape of the vocal tract (the vocal tract here is approximated as a series of cylindrical tubes with varying diameters).
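
The tube approximation boils down to surprisingly little code. Here's a stripped-down C sketch of the Kelly-Lochbaum idea (no losses, nasal tract, or fractional tube lengths; the end reflection values are ballpark figures similar to Pink Trombone's, and the areas are placeholders for measured or articulated data):

    /* Bare-bones Kelly-Lochbaum tract: NSEC cylindrical sections, a
     * right-going (R) and left-going (L) pressure wave per section,
     * one sample of delay per section per direction, and a scattering
     * junction between each pair of sections. The reflection coefficient
     * at each junction comes from the ratio of neighbouring areas. */
    #define NSEC 16

    typedef struct {
        float area[NSEC];        /* cross-sectional areas (must be > 0) */
        float k[NSEC - 1];       /* reflection coeffs between sections  */
        float R[NSEC], L[NSEC];  /* travelling waves                    */
    } kl_tract;

    static void kl_update_coeffs(kl_tract *t) {
        for (int i = 0; i < NSEC - 1; i++)
            t->k[i] = (t->area[i] - t->area[i + 1]) /
                      (t->area[i] + t->area[i + 1]);
    }

    static void kl_init(kl_tract *t) {
        for (int i = 0; i < NSEC; i++) {
            t->area[i] = 1.0f;             /* uniform tube to start with */
            t->R[i] = t->L[i] = 0.0f;
        }
        kl_update_coeffs(t);
    }

    /* feed in one glottal-source sample, get one lip-output sample back */
    static float kl_tick(kl_tract *t, float glottal) {
        float Rj[NSEC], Lj[NSEC];

        /* glottal end: inject the source plus a partial reflection */
        Rj[0] = glottal + 0.75f * t->L[0];

        /* scattering at the junction between section i and i+1 */
        for (int i = 0; i < NSEC - 1; i++) {
            float w = t->k[i] * (t->R[i] + t->L[i + 1]);
            Rj[i + 1] = t->R[i] - w;
            Lj[i]     = t->L[i + 1] + w;
        }

        /* lip end: most of the wave radiates out, a little reflects back */
        float out = t->R[NSEC - 1];
        Lj[NSEC - 1] = -0.85f * t->R[NSEC - 1];

        for (int i = 0; i < NSEC; i++) { t->R[i] = Rj[i]; t->L[i] = Lj[i]; }
        return out;
    }

Change the area[] profile over time (and call kl_update_coeffs again) and the formants move around; that's the articulation.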


Human speech production/perception works by articulation changing the shape, hence resonant frequencies (formants), of the vocal tract, and our ear/auditory cortex then picking up these changing formants. We're especially attuned to changes in the formants since those correspond to changes in articulation. The specific resonant frequency values of the formants vary from individual to individual and aren't so important.

Similarly, the sound source (aka the voice) for human speech can vary a lot from individual to individual, so it serves more to communicate age/sex, emotion, identity, etc., not the actual speech content (formant changes).

The reason articulatory synthesis (whether based on a physical model of the vocal tract or a software simulation of one) and formant synthesis sound so similar is that both are designed to emphasize the formants (resonant frequencies) in a somewhat overly precise way, and neither typically does a good job of accurately modelling the voice source and the other factors that would make it sound more natural. The ultimate form of formant synthesis just uses sine waves (not a source + filter model) to model the changing formant frequencies, and is still quite intelligible.
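
A toy version of that sine-wave idea, just to show how little is involved (the formant tracks f1..f3 are assumed to come from elsewhere, e.g. measured from a recorded utterance):

    /* "Sine-wave speech": three sine oscillators that simply track the
     * formant frequencies F1-F3 over time. No source, no filter, and
     * yet the result is often still intelligible. */
    #include <math.h>

    void sine_speech(float *out, int n, float sr,
                     const float *f1, const float *f2, const float *f3)
    {
        const double twopi = 6.283185307179586;
        double p1 = 0.0, p2 = 0.0, p3 = 0.0;
        for (int i = 0; i < n; i++) {
            p1 += twopi * f1[i] / sr;   /* per-sample formant tracks */
            p2 += twopi * f2[i] / sr;
            p3 += twopi * f3[i] / sr;
            out[i] = 0.5f * (float)sin(p1)
                   + 0.3f * (float)sin(p2)
                   + 0.2f * (float)sin(p3);
        }
    }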

The "Daisy" song somehow became a staple for computer speech, and can be heard here in the 1984 DECtalk formant-synthesizer version. You can still pick up DECtalks on eBay - an impressive large VCR-sized box with a 3" 68000 processor inside.

https://en.wikipedia.org/wiki/Daisy_Bell


The neat thing about this particular singing synthesizer is that it used a surprisingly sophisticated (especially for the 60s) physical model of the human vocal tract [1], and was perhaps the first use of physical modeling sound synthesis. Vowel shapes were obtained through physical measurements of an actual vocal tract via x-rays. In this case, they were Russian vowels, but were close enough for English.

While this particular kind of speech synthesis [2] isn't really used anymore, it's still fun to play around with. Pink Trombone [3] is a good example of a fun toy that uses a waveguide physical model, similar to the Kelly-Lochbaum model above. I've adapted some of the DSP in Pink Trombone a few times [4][5][6], and used it in some music [7] and projects [8] of mine.

For more in-depth information about doing singing synthesis specifically (as opposed to general speech synthesis) using waveguide physical models, Perry Cook's dissertation [9] is still considered a seminal work. In the early 2000s, there were a handful of follow-ups on physically based singing synthesis at CCRMA. Hui-Ling Lu's dissertation [10] on glottal source modelling for singing purposes comes to mind.

1: https://ccrma.stanford.edu/~jos/pasp/Singing_Kelly_Lochbaum_...

2: https://en.wikipedia.org/wiki/Articulatory_synthesis

3: https://dood.al/pinktrombone/

4: https://pbat.ch/proj/voc/

5: https://pbat.ch/sndkit/tract/

6: https://pbat.ch/sndkit/glottis/

7: https://soundcloud.com/patchlore/sets/looptober-2021

8: https://pbat.ch/wiki/vocshape/

9: https://www.cs.princeton.edu/~prc/SingingSynth.html

10: https://web.archive.org/web/20080725195347/http://ccrma-www....


Another excellent, but quite dense, resource I've found helpful for implementing my own waveguide models is Physical Audio Signal Processing, a book available as a hard copy and online [1]. There is also an absolute ton of research on these topics that has never been summarized anywhere or cited outside a small circle of researchers, so a lot of institutional knowledge about physical modeling is locked up in academic papers that aren't very accessible.

1: https://ccrma.stanford.edu/~jos/pasp/


I've been fascinated by the simplicity of this since I ran into SAM (Software Automatic Mouth) on the C64, but never really taken the time to delve into it. Your links are an amazing resource...


From the website:

> Note: aubio is not MIT or BSD licensed. Contact the author if you need it in your commercial product.


Module file formats [0] have been around since the 80s, and they can kind of be thought of as MIDI files with sound samples embedded in them. You can get some very impressive sounding full-length tracks in only a few kilobytes.

0: https://en.wikipedia.org/wiki/Module_file
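
For a sense of how simple these formats are, here's a rough C sketch that dumps the song title and sample table of a classic 31-sample ProTracker .mod (assuming the common 4-channel "M.K." layout; other variants shuffle things around a bit):

    /* Dump the header of a ProTracker-style .mod: a 20-byte title,
     * 31 sample records of 30 bytes each (22-byte name, length in
     * 16-bit words, finetune, volume, loop start, loop length), the
     * pattern order list, the "M.K." tag at offset 1080, then pattern
     * data followed by the raw 8-bit sample data itself. */
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        unsigned char hdr[1084];
        FILE *fp;
        if (argc < 2) return 1;
        fp = fopen(argv[1], "rb");
        if (fp == NULL || fread(hdr, 1, sizeof hdr, fp) != sizeof hdr) {
            fprintf(stderr, "not a .mod file?\n");
            return 1;
        }
        printf("title: %.20s\n", (const char *)hdr);
        printf("tag:   %.4s\n", (const char *)hdr + 1080); /* "M.K." */
        for (int i = 0; i < 31; i++) {
            const unsigned char *s = hdr + 20 + 30 * i;
            unsigned len_words = (unsigned)((s[22] << 8) | s[23]); /* big-endian */
            if (len_words > 1)
                printf("sample %2d: %-22.22s %6u bytes\n",
                       i + 1, (const char *)s, 2 * len_words);
        }
        fclose(fp);
        return 0;
    }

The pattern data after the header is essentially note/instrument/effect events, which is why "MIDI file with the samples baked in" is a pretty fair description.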


Do you know if they have a .js implementation? Or some lightweight C one?


Are you asking for library recommendations for a simple standard used extensively for the last 35 years, when there are already six libraries linked on the Wikipedia page you were given?


I appreciate the shoutout to Sporth! Admittedly, I haven't used it for quite a few years. But it still works just fine.

In addition to being a part of AudioKit, it also has its own repository as a self-contained command line program:

https://github.com/PaulBatchelor/Sporth/

I used to use a live-coding setup with Sporth centered around Vim, though it has never been added to the codebase. If anyone is interested in this, please feel free to email me at thisispaulbatchelor at gmail dot com.


When I was pursuing Forth and about to give up on doing anything beyond a file munger I had written, I discovered Sporth, had a blast with it for at least two or three weeks, and kept Forth on my radar! A lot of complexity from a few lines of Sporth! What are you up to nowadays?

