I've always wondered if there is something like this for the voice of Majel Barrett-Roddenberry (Computer). I know her voice was recorded phonetically before she died (https://io9.gizmodo.com/the-voice-of-star-treks-computers-co...), but I don't think there's public access to that.
Perhaps transfer learning could be used to copy the style, using something like SV2TTS.
As far as I know they have not released those phonetic recordings. But even without those recordings, it might be possible to use one of those deepfake voice-cloning systems to build a TTS engine from sound clips from the show.
Where I'm at now: there's an audio book she recorded, which is on the edge of being a large enough corpus for a third party to generate a model. I haven't listened to it yet though, so I don't know if it's in the right flat intonation to be easily usable.
There’s an HN thread within the last few weeks discussing research showing an ML model that needs only five seconds of someone’s speech to replicate their voice.
Teensy audio board is here: https://www.pjrc.com/store/teensy3_audio.html (heads up, there's a rev D board to use with the Teensy 4, has the same functionality but different pinouts to match the Teensy 4).
I have a friend who is very handy, very into building vintage stuff. Cars, bikes, etc. One day he rang on some unrelated subject, I could hear he was on the motorway, I asked him where he was going.
"Oh I'm just off to get a voice box for my dalek". Like that's the most normal thing in the world.
Sure enough, next time I visited, a certain iconic malevolent alien, that fears only stairs, was sitting in the corner of the garage, happily murderously chuntering away every time he pressed the button.
This is very spot on, especially if you use some standard Dalek lines. A good example to test: "The Doctor is detected: terminate, terminate! Seek, locate, destroy."
Makes it more interesting, in a way - who knew there was an uncanny valley between humans trying to sound cybernetic and software trying to sound like humans trying to sound cybernetic.
Same here. It's obvious from the other clip provided that the voice evolved over time; but it's not something I noticed at the time. The voice you linked to is the one that is most memorable to me.
Safari has all kinds of issues with audio. Every report just leads to an rdar: url and then silence, both from Apple and from the Safari team haha (cry)
For a while you couldn't send streamed-but-redirected audio through the Web Audio API on Safari only. The workaround was to manually catch the redirect, but in the latest Safari even that doesn't help.
Like with WebGL, I don't think Apple wants Web Audio to work. They've got several outstanding WebGL bugs (3+ years old), and their non-existent WebGL2 support has not seen a single commit in over 3 years. Web Audio appears to be the same. It's frustrating.
I've had trouble identifying why. Safari has a largely compatible AudioContext API, and there are no errors, but the audio never starts playing and there's no "onended" event when it's supposed to be done. So I'm a bit stumped at the moment.
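One thing that can cause exactly this symptom (no error, no playback, no "onended") is Safari's autoplay policy: it often leaves a freshly created AudioContext in the "suspended" state until a user gesture, so nothing ever reaches the speakers. A minimal sketch of a check/resume helper, assuming you pass in your own AudioContext instance:

```javascript
// Sketch: if the context is "suspended" (a common Safari state until a
// user gesture), try to resume it before starting any sources.
// Per the Web Audio spec, ctx.state is "suspended", "running", or "closed".
async function ensureRunning(ctx) {
  if (ctx.state === "suspended") {
    // In Safari this call generally needs to happen inside a user-gesture
    // handler (click, tap, keypress) to actually succeed.
    await ctx.resume();
  }
  return ctx.state;
}
```

Calling this from the first click handler on the page, before wiring up buffer sources, is worth trying as a diagnostic even if it turns out not to be the cause here.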
Which voices are available is browser- and OS-dependent, and there's no "borg" voice anymore. There used to be several alien and/or non-human voices, but Apple removed them from the OS, and most browsers just call the OS's text-to-speech API
--correction--
You need to go into VoiceOver Utility and add all the novelty voices back in
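To see what your particular browser/OS combination actually exposes, you can enumerate the voices through the Web Speech API and search by name. A small sketch (the voice name "Zarvox" is just an example of one of the old Apple novelty voices and may not be present):

```javascript
// Sketch: find a speech-synthesis voice by (partial, case-insensitive) name.
// Takes the voices array explicitly so availability checks are easy to test.
function findVoiceByName(voices, name) {
  const needle = name.toLowerCase();
  return voices.find((v) => v.name.toLowerCase().includes(needle)) || null;
}

// In a browser (speechSynthesis is browser-only; the list can load
// asynchronously, so it may be empty until the "voiceschanged" event fires):
// const voice = findVoiceByName(speechSynthesis.getVoices(), "Zarvox");
```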
This wasn't made with the window.speechSynthesis API, it's using 2 older systems (espeak and sam) that have been ported to JavaScript. They don't sound as good, but they generate AudioContext data which can be processed, mixed, and visualized in the browser. I don't think it would be possible to make this kind of Borg voice using the speechSynthesis API -- I did it by generating the speech using 6 voices, 3 in each channel.
I totally agree the built-in OS speech systems sound better overall, and I may end up adding window.speechSynthesis support to the API I made so it'll expose more voice profiles, but those ones will lack the visualization ability.
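The "6 voices, 3 in each channel" layering can be sketched with a ChannelMergerNode. This is just an illustration of the idea, not the project's actual code; "voiceBuffers" stands in for the AudioBuffers the espeak/sam JS ports would produce, and that generation step is not shown:

```javascript
// Sketch: play N pre-rendered voice buffers, routing the first half
// hard-left and the second half hard-right via a 2-channel merger.
function playLayeredVoices(ctx, voiceBuffers) {
  const merger = ctx.createChannelMerger(2); // stereo output
  merger.connect(ctx.destination);
  voiceBuffers.forEach((buffer, i) => {
    const src = ctx.createBufferSource();
    src.buffer = buffer;
    // first half of the voices -> left channel (0), rest -> right (1)
    const channel = i < voiceBuffers.length / 2 ? 0 : 1;
    src.connect(merger, 0, channel);
    src.start();
  });
}
```

Because everything stays inside the AudioContext graph, an AnalyserNode could be tapped in before the destination for the visualization, which is exactly what the speechSynthesis API can't offer.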
It would be fun if you could generate a link with a hash of a message so you can send it to your friends and coworkers with a silly message that autoplays.
It already does this, when you click the "say" button it generates a base64 url. You can share that url. The problem is when someone loads the URL the browser will not autoplay the clip. You have to click a button (or some other user interaction) to start the Web Audio API, it's a really annoying limitation that I wish Firefox and Chrome would change to a one-time popup confirmation. So what I did was hide the text box until after playing the audio.
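The base64-in-the-URL idea can be sketched in a few lines (this is an illustration of the general scheme, not necessarily the site's exact encoding; the encodeURIComponent step is there because btoa/atob only handle Latin-1 text):

```javascript
// Sketch: pack a message into a shareable URL fragment and unpack it on load.
function encodeMessage(text) {
  // percent-encode first so non-Latin-1 characters survive btoa
  return "#" + btoa(encodeURIComponent(text));
}

function decodeMessage(hash) {
  // strip the leading "#" and reverse both encoding steps
  return decodeURIComponent(atob(hash.slice(1)));
}

// In a browser you'd read location.hash on load:
// if (location.hash) speakAfterUserGesture(decodeMessage(location.hash));
```

Keeping the payload in the fragment (after "#") also means the message never hits the server logs, since fragments aren't sent in HTTP requests.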