Pink Trombone (dood.al)
575 points by errozero 3 months ago | 75 comments

In a similar vein: http://www.adultswim.com/etcetera/choir/

The most interesting thing about this one is the chord progressions it generates.

If you enjoyed that, the developer, David Li, has a lot of other great projects: http://david.li/

The same team did a squishy Morty head for Adult Swim: http://www.adultswim.com/etcetera/elastic-man/ Chris Heinrichs (Dolphin Club Audio, formerly of Enzien Audio, https://dolphinclub.website/) does the procedural audio models.

In a historical vein, the absolutely magnificent Voder, from the late 1930s:



I found it difficult to tell how well it's actually speaking because the announcer is priming the audience for every utterance. Very cool though!

After your comment, I muted the audio whenever the announcer told the audience what the machine would say. I couldn't get it straight away, but when they emphasised different words in the sentence I could usually tell what was being said, e.g. "Greetings everybody!".

If you're interested in chord generation, I've used Cthulhu[0] by Xfer Records to experiment with chord progressions quickly.

[0] https://xferrecords.com/products/cthulhu

Cubase Chord Pads are also pretty sweet!

Praise Him.

I didn't expect that to be so beautiful.

You’re right, the progressions are beautiful! Do you know how they are generated?

Very impressive indeed. I dug around and found his tweet[0]: "uses a neural network I trained on choral pieces to generate the harmonization for your melody"

[0] https://twitter.com/daviddotli/status/1075068713936830464

Man, shoot. I compose a bit myself and am really into vocal harmony and I was hoping that he'd hard-coded some theory rules.

I did notice that it seems to vary with the past 3 notes you've moved to. If you continue repeating the same pattern of jumps among the scale, you can replicate the chords the voices slide to.

> I compose a bit myself and am really into vocal harmony and I was hoping that he'd hard-coded some theory rules.

Would you find this useful? Is this not already available somewhere? Sounds like a fun project!

Yeah, I think I'd find it useful. I'd love to be able to ape a few of the tricks that the four voices use to slide to pleasing chords. The music theory "rules" are certainly available, but only in the same sense that the rules of calculus are available somewhere.

Either way I agree, definitely a fun project.

that is incredibly satisfying to play with

A little while back I used this combined with a physics simulator to make a toy where you throw polygons in the air and they scream: https://ohgodwhathaveidone.stackblitz.io/

Code's here if anyone wants to play: https://stackblitz.com/edit/ohgodwhathaveidone – I did a fairly medium job of abstracting the synthesis engine away from the UI, but it might be a decent starting point if you're looking to make other Trombone-based web silliness.

This reminds me of the Sprechmaschine [1] ("speaking machine") built at the end of the 18th century by Wolfgang von Kempelen (the guy who built the original Mechanical Turk [2]). Here is a YouTube video showing it in action (for example, the machine says "Mama" around 1:14): https://www.youtube.com/watch?v=k_YUB_S6Gpo

[1] https://de.wikipedia.org/wiki/Wolfgang_von_Kempelen#Die_Spre...

[2] https://en.wikipedia.org/wiki/The_Turk

Lest we forget that speech synthesis is not just for grotesque but amusing semi-real vocal synths like this, here's a BBC Radio 4 history of speech synthesis as an assistive technology - Klatt's Last Tapes, by Stephen Hawking's daughter, Lucy:


Man this brings back memories. We used this program for my linguistics homework in college. In 1996. Although I think it was an app then not a web page.


I'm guessing the thing from '96 was a Java applet, so indeed an app.

or a desktop/windows app. Back then we'd be more likely to call them programs.

And also sometimes referred to as applications. And while the abbreviation “app” only became mainstream around the time of the first iPhone, the warez scene had a habit of referring to software applications as “appz” since way before then, which admittedly is not the exact same word as “app”, appz being an intentional misspelling of a pluralization of the abbreviation, but it’s close to it. They sure loved the letter z. Appz, crackz, mp3z, moviez, gamez, ebookz.

This way of writing gave people some unique words to query search engines for, making it easier to find warez sites, torrent trackers and FTP servers hosting pirated files, but I think it originated long before the web was born, even before any sort of networked computing existed.

Consider the word “phreaking”, which was invented at the time of mechanical telephone switching systems. This word came from combining the words “phone” and “freaking”. I think that word could have inspired hackers to use ph in place of f in other words, and then once you start making that substitution, other substitutions follow, like using z instead of s.

I dunno, I grew up in the 90s so there is a lot of hacker culture that precedes my time. What I do know is that a lot of early hacker culture spawned other subcultures, and that several influences of the origins remain central in these. For example, the demoscene.

How were you involved in the warez scene? How did things work, and how have they changed? Stories that are part of the history of the internet are always intriguing to me so I'm really interested in hearing about that.

To be fair, we also said progz.

We had applications back in the 1990s. I know, it seems unfathomable now but applications preceded the App Store.

Could you explain in what sense?

This would be useful to demonstrate the difference between p/f and l/r for those brought up without those distinctions.

I'd also (as an English speaker) like to see/hear Dutch g and Xhosa clicks.

yes, I think it's time for this to be connected to text input for seeing how any word is pronounced.

specifically: https://twitter.com/shaunlebron/status/989192507828432896


Would it be possible to use Reinforcement Learning + Speech Recognition to turn this thing into a real voice synthesizer?

No need. This thing is already a voice synthesizer. This is how modern synthesizers work, more or less: by generating a sine wave and then modifying it in the same way as the vocal tract does.

Not a sine wave. The vocal tract is subtractive; the larynx has to generate a waveform with lots of harmonics some of which the vocal tract can then remove.
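
To make the subtractive point concrete, here's a minimal Python sketch. It is not Pink Trombone's actual code; the function names, the 100 Hz pitch, and the single 700 Hz formant with 100 Hz bandwidth are all assumptions picked for illustration. It builds a harmonic-rich source, runs it through one two-pole resonator standing in for a vocal tract formant, and also builds a pure sine for comparison:

```python
import math

SAMPLE_RATE = 16000  # Hz; arbitrary choice for this sketch


def harmonic_source(f0, n_samples, sample_rate=SAMPLE_RATE):
    """Band-limited sawtooth: harmonics of f0 with amplitudes 1/k.

    Stands in for the larynx, which has to supply many harmonics
    for the vocal tract to shape.
    """
    n_harmonics = int((sample_rate / 2 - 1) // f0)
    return [
        sum(math.sin(2 * math.pi * k * f0 * n / sample_rate) / k
            for k in range(1, n_harmonics + 1))
        for n in range(n_samples)
    ]


def formant_filter(signal, center_hz, bandwidth_hz, sample_rate=SAMPLE_RATE):
    """Two-pole resonator: emphasises energy near center_hz over the rest."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    theta = 2 * math.pi * center_hz / sample_rate
    a1, a2 = 2 * r * math.cos(theta), -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def magnitude_at(signal, freq_hz, sample_rate=SAMPLE_RATE):
    """Magnitude of the signal's correlation with a sinusoid at freq_hz."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * n / sample_rate)
             for n, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * n / sample_rate)
             for n, x in enumerate(signal))
    return math.hypot(re, im) / len(signal)


source = harmonic_source(100, 1600)                        # 0.1 s at 100 Hz
voiced = formant_filter(source, center_hz=700, bandwidth_hz=100)
sine = [math.sin(2 * math.pi * 100 * n / SAMPLE_RATE) for n in range(1600)]
```

Comparing `magnitude_at(voiced, 700)` against `magnitude_at(voiced, 2100)` shows the filter favouring the harmonic near the formant, while `magnitude_at(sine, 700)` is essentially zero: a pure sine gives the filter nothing to shape.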

I meant "text to speech engine" then I guess.

Yeah, I know what you meant. You use a tool like the CMU pronunciation dictionary[1] to turn words into phonemes, and then you use a model similar to the pink trombone to turn the phoneme string into sound, including the transitions between different phones (which, it turns out, actually matter more than the phones themselves for making it understandable). This is how TTS works.

[1] http://www.speech.cs.cmu.edu/cgi-bin/cmudict
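
As a rough sketch of the front half of that pipeline (assumptions: the two-word `TOY_DICT` is a hand-written excerpt in CMUdict-style ARPAbet notation, and `to_phonemes`, `transitions`, and the "SIL" silence marker are hypothetical names, not part of any real TTS library), words become phonemes, and phonemes become the adjacent pairs a synthesizer would have to glide between:

```python
# Tiny hand-written excerpt in CMUdict-style notation (illustrative only;
# a real system would load the full dictionary file).
TOY_DICT = {
    "hello": ["HH", "AH0", "L", "OW1"],
    "world": ["W", "ER1", "L", "D"],
}


def to_phonemes(text, lexicon=TOY_DICT):
    """Look up each word; unknown words raise KeyError in this sketch."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(lexicon[word])
    return phonemes


def transitions(phonemes):
    """Adjacent phone pairs, padded with silence ("SIL") at both ends."""
    padded = ["SIL"] + phonemes + ["SIL"]
    return list(zip(padded, padded[1:]))


phones = to_phonemes("hello world")
pairs = transitions(phones)
```

Each pair in `pairs` is one transition; a vocal tract model would interpolate its control parameters across each of them rather than jumping from one static shape to the next.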

I wonder if this sort of model would lend itself well to program induction similar to this paper: https://web.mit.edu/cocosci/Papers/Science-2015-Lake-1332-8.... Having a mouth seems like it would enforce a strong inductive bias.

Reminds me of Xiph's Speex/CELP model of speech as a mix of noise and frequency to achieve high compression, requiring as little as 2.15 kilobits (275 bytes) per second. It sounds perceptibly similar to the original recording, even though the difference between the input and output sampled data may be high:


Bitrate comparison:




Maybe higher compression can be achieved with better prediction, aka machine learning.

I've actually been looking for the opposite of this (i.e. sound in, mouth representation out) for a while. Does anyone know of such a thing?

Yes! Oculus makes an SDK for this. You can use it in Unity 3D, Unreal, or directly in a native app. https://developer.oculus.com/documentation/audiosdk/latest/c...

Thanks! That's like 80% of the way there. It looks to be missing a lot of state internal to the mouth (understandable given that it's targeting avatar lipsyncing), and appears to discretize the values somewhat, making it less useful for linguistics practice. But I bet the underlying technology could be adapted easily.

Wow! I was able to successfully recreate all sorts of letters and sounds just by imagining how my own mouth works, and then manipulating the different components on the pink trombone in the same way. I'm impressed!

Pondering what makes this sound "male".

I have actually used this very tool back in the day to help learn how to speak in a male or female voice. One of the five things I do is manipulation of my tongue to change the cavity of my mouth to make the space bigger (more masculine) or smaller (more feminine) which this tool demonstrates very well.

Edit: to be clear, I used to sound male 24/7 and now I sound female 24/7. Rather than thinking you are speaking male or female, it helps if you think you are playing a musical instrument with a number of controls that you control (with your mind whahahaha). Then it is just about learning what each control does and how to play them so you get the result you want.

Your voice is muscle memory so while at the start I had to actively "play" a female voice that is no longer the case and now if I ever want to "play" a male voice I have to actively think about how I am going to speak each word to make it male.

This is also exactly how male countertenor singers produce a female-sounding voice, by making the vocal cavity smaller to adjust the formants upward. If you don’t do that, it just sounds like a male head voice or falsetto.

You can make it sound female by dragging the voicebox control to the right and down. It requires both higher pitch, and less power in the odd harmonics.

I agree it gets closer, but as with much of speech synthesis, it still ends up sounding to me like "a male talking in a high-pitched voice" and not "a female".

In addition, this would imply that only males can talk in a low register, which is patently false. Low register female voices are fairly common.

It's a bit of a subtle distinction. A lot of voice acting on cartoon boys is done by women (classic example: Bart Simpson)

So they can make it sound more masculine but still high pitched

It has to do with formants. It's briefly explained in this video, which also demonstrates changing a female voice to male: https://youtu.be/nPAINeIGxMc.

Vocal chord.

Sudden sound warning.

Sudden disconcerting sound warning!

This is amazing. I can get it to make almost any speech sound, but one I can't get is [s], because the model lacks teeth!

Disable "always voice", and click a bit below the hard palate, slightly to the right towards the lip (below the "at" in "palate"), so there is only a small gap for the air to go through.

You're right. It sounds to me like [s] with lip rounding, where the teeth don't contribute as much acoustically, but it does indeed sound like an [s].

The model also lacks mouth width/lip shape, which is crucial for differentiating between Swedish vowels (for example i vs y, which have most of the mouth the same, except that y has the lips protruding and i is more of a smile).

Were you able to get "m" and "n"?

Yes, just click in the nasal cavity above 'lip' and 'hard palate', respectively, to get [m] and [n].

I feel like I am simulating orgasmic responses!

What someone needs to do is put sensors in people's mouths, record them saying known phrases, then stick the sensor data+phrases into some AI and see if we can't get that Trombone talkin'!

The shape of the tongue control reminds me a lot of the rhombus in https://en.wikipedia.org/wiki/International_Phonetic_Alphabe...

... which is because the latter was patterned after the shape of the mouth.

I feel embarrassed playing with this

You should have seen the looks from my coworkers when my computer started shouting AAAAAHHHHHHH

That's nice and everything, but until I have a hardware version where I can just switch a button on and do this and that, I'm not really going to be truly satisfied. Please hardware'ify.

how do i use this to answer the phone?

Is the author aware of the other meaning of "pink trombone"?[1]

[1] https://www.urbandictionary.com/define.php?term=pink%20tromb...

I'm fairly sure there is no other meaning than the one to which you refer, so it's perhaps more likely that you are missing the risqué point being made by assuming the prude levels are higher than you might think. A lot of the folks who make these kinds of hacks are perfectly fine with the obscure, obscene, perverse nature of their naming of things ... Those of the anthropological inclination may decide that, in fact, an obscene name for something like this is a requisite.

Like how in jargon.txt they used 69 as an example of a big number:

"69 adj. Large quantity. Usage: Exclusive to MIT-AI. "Go away, I have 69 things to do to DDT before worrying about fixing the bug in the phase of the moon output routine..." (Note: Actually, any number less than 100 but large enough to have no obvious magic properties will be recognized as a "large number". There is no denying that "69" is the local favorite. I don't know whether its origins are related to the obscene interpretation, but I do know that 69 decimal = 105 octal, and 69 hexadecimal = 105 decimal, which is a nice property. - GLS)"
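
The "nice property" in that entry is easy to check. A quick Python sketch (the variable names are just mine):

```python
n = 69
as_octal = format(n, "o")   # decimal 69 written in base 8
hex_value = int("69", 16)   # the digits "69" read as base 16
print(as_octal, hex_value)  # prints "105 105"
```

So 69 decimal is 105 in octal (1*64 + 0*8 + 5), and 69 hexadecimal is 105 in decimal (6*16 + 9).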

I was scared to click on the parent link at work because I KNEW this term had to have a second "urban dictionary" meaning...

For some reason, dirty names are common practice in audio tools. Some examples: "Rectal Anarchy" is another vocal synth for Buzz, grANALizer (emphasis not mine) is a popular granular audio effect, and even in the commercial world, it gets more subtle but it still exists: Image Line sells a plugin called "Gross Beat", which is a bilingual en/fr dirty joke.

How anglocentric

The words "pink" and "trombone" are English words, so there's that...
