JavaScript singing synthesis library (github.com/usdivad)
91 points by gattilorenz on Oct 12, 2016 | 26 comments

Cute as a JavaScript hack, but it's not going to compete with Vocaloid or Festival Singer.

Somebody really needs to crack singing synthesis. Vocaloid from Yamaha is good, but it works by having a live singer sing a prescribed set of phrases, which are then reassembled. Automatic singer generation is needed.

Figure out some way to use machine learning to extract a singer model from recorded music and generate cover songs automatically. Drive the RIAA nuts. Get rich.

Thanks for the feedback, and yeah I agree that it's pretty primitive at the moment (it can't even do legato!). I'm working on improving it though, and hopefully it demonstrates some of the potential for Web Audio applications; I actually originally made this for a demo session at this year's Web Audio Conference [0].

Modeling singers using machine learning would be really neat; I'm not too hip to the current research around that, although the idea brings to mind WaveNet [1], which seems like it'd be absolutely fascinating to try with pitched audio / using musical parameters.

[0] http://webaudio.gatech.edu/

[1] https://deepmind.com/blog/wavenet-generative-model-raw-audio...

Well, something better than Vocaloid does exist: http://mtg.upf.edu/research/labs/mir-lab/vap

It's from the same university that helped develop Vocaloid, and if you look at (well, listen to) their INTERSPEECH 2016 submission you'll find it's in another league entirely ( https://chanter.limsi.fr/doku.php?id=winner_of_the_singing_s... )

Related software for iOS musicians: https://klevgrand.se/products/jussi/


Way back in the mid-1980s in the United Kingdom, and there were few places more 80s than that, Superior Software produced Speech!, a software speech synthesis program for the BBC Micro, a 6502-based machine running at 2MHz which didn't have PCM audio. It could reasonably reliably read out ordinary English text in a fairly robotic voice.

It was 7.5kB of 6502 machine code.

There's a writeup from the author here: http://rk.nvg.ntnu.no/bbc/doc/Speech.html and a demo here: https://www.youtube.com/watch?v=t8wyUsaDAyI

It was an utter sensation (featuring, among other places, as the computer voice in Roger Waters' Radio K.A.O.S.).

It's obviously not going to win awards, being barely intelligible, but if you can achieve that with a table of 49 phonemes, each of 128 4-bit samples, then producing basic speech isn't that hard. I think that meSpeak.js, which this demo is based on (and which is pretty cool, BTW), works on the same principle, although obviously with better samples.

(Unlike producing human-sounding speech, which is appallingly difficult.)
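To put the "49 phonemes each of 128 4-bit samples" figure in perspective, here is a rough back-of-the-envelope sketch. The counts come from the comment above; the packing scheme (two samples per byte, high nibble first) and the function names are assumptions for illustration, not a description of how Speech! actually stored its data.

```javascript
// Figures quoted in the comment above.
const PHONEME_COUNT = 49;
const SAMPLES_PER_PHONEME = 128;
const BITS_PER_SAMPLE = 4;

// Total table size in bytes, assuming samples are stored with no padding.
function phonemeTableBytes(phonemes, samplesPerPhoneme, bitsPerSample) {
  return (phonemes * samplesPerPhoneme * bitsPerSample) / 8;
}

// Hypothetical packing: two 4-bit samples per byte, high nibble first.
function packNibbles(samples) {
  const packed = new Uint8Array(Math.ceil(samples.length / 2));
  samples.forEach((sample, i) => {
    const shift = i % 2 === 0 ? 4 : 0;
    packed[i >> 1] |= (sample & 0x0f) << shift;
  });
  return packed;
}

// Inverse of packNibbles: recover `sampleCount` 4-bit values.
function unpackNibbles(packed, sampleCount) {
  const samples = new Uint8Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    const byte = packed[i >> 1];
    samples[i] = i % 2 === 0 ? byte >> 4 : byte & 0x0f;
  }
  return samples;
}

const tableBytes = phonemeTableBytes(
  PHONEME_COUNT,
  SAMPLES_PER_PHONEME,
  BITS_PER_SAMPLE
);
console.log(tableBytes); // 3136
```

Under those assumptions the whole sample table comes to 3136 bytes, which is less than half of the 7.5kB quoted for the program.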

The demo doesn't let you put in your own lyrics; it keeps loading exactly the same set of words. Really awesome project, though.

Original author here; good catch, I've fixed it now so it'll sing whatever lyrics + notes are in the grid when you press "Set Voices". Thanks for pointing that out!

Glad to hear you got it fixed, I'll play around with it on my lunch break

This is great, nice job! I'm working on a MIDI player in JavaScript; it would be interesting to use this as the sound font, maybe assigning certain words to certain pitches. https://github.com/grimmdude/MidiPlayerJS

What back end code have you integrated this with? Have you tried "flocking"? http://flockingjs.org/

(I have not, I'm just wondering about somebody else's experience with this sort of thing in JS)

I'm still pretty new to this (modern computer synthesis). I had a couple electronic music (synthesis) classes back in the day, but that day was back in the late 80s. We didn't even have any digital equipment the first time I took the class - it was analog gear with literal patch cords between LFOs, envelope generators, oscillators, filters and such. The second time we actually had some digital stuff to do FM and sampling layers.

There's no back end code used with this (assuming you mean server side). I've never tried flocking but it looks interesting. The library I wrote, MidiPlayerJS, just emits JSON events and in my demo I'm feeding those into a sound font library (https://github.com/danigb/soundfont-player). But my thought was to switch that out for this singing synth library and see what kind of sounds it can produce.
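The "swap the sound font library for something else" idea could look roughly like the sketch below. The event shape and the `play(frequency, velocity)` sink interface are hypothetical stand-ins chosen for illustration; they are not taken from MidiPlayerJS, soundfont-player, or the singing synth library. The only fixed fact used is the standard equal-temperament conversion where MIDI note 69 is A4 at 440 Hz.

```javascript
// Standard equal-temperament conversion: MIDI note 69 is A4 = 440 Hz.
function midiToFrequency(noteNumber) {
  return 440 * Math.pow(2, (noteNumber - 69) / 12);
}

// Hypothetical event shape: { name: 'Note on', noteNumber: 60, velocity: 100 }.
// `sink` is any object with a play(frequency, velocity) method (also hypothetical),
// so a sound font player and a singing synth could be swapped freely.
function createEventRouter(sink) {
  return function handleEvent(event) {
    if (event.name !== 'Note on' || event.velocity === 0) {
      return false; // ignore everything except audible note-on events
    }
    sink.play(midiToFrequency(event.noteNumber), event.velocity);
    return true;
  };
}

// Stand-in sink that records calls instead of producing sound.
const played = [];
const route = createEventRouter({
  play: (frequency, velocity) => played.push({ frequency, velocity }),
});

route({ name: 'Note on', noteNumber: 69, velocity: 100 });
route({ name: 'Note off', noteNumber: 69, velocity: 0 });
```

Because the router only depends on the `play` method, trying a different tone generator means writing one small adapter rather than touching the event handling.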

I don't know how to tell the author about the 404 error I get when trying to load the soundfont player.


You could submit an issue on the repo https://github.com/gleitz/midi-js-soundfonts/

Sorry about the confusion. By "back end" I meant the tone generator(s) that your "sequencer" (?) is calling.

Oh, by default the library doesn't implement any tone generator. But, in my demo (http://grimmdude.com/MidiPlayerJS/) I'm using https://github.com/danigb/soundfont-player to generate the sounds.

Thanks! And that would be awesome, definitely interested in that. MidiPlayerJS looks great. Let's try it out and see what we can make happen!

Looks cool! Combine this with Web MIDI and you could make a reasonable DAW!

Thanks for posting this!

Project author here, just want to say thanks gattilorenz for sharing (was quite the pleasant surprise to see this on the front page!) and everyone for the feedback + fascinating projects, ideas, links etc. Really cool to see so much enthusiasm for speech+singing synthesis and Web Audio!

It's been a good year for the English singing synthesizer world, with the launch of chipspeech (https://www.plogue.com/products/chipspeech). But I'm pretty interested in whether more realistic singing synthesizers will be made, since there are a few recent new voices by Acapela Group and others developed for non-singing speech.

For some reason, I found the sample audio unnerving. I don't know. I guess there's also an uncanny valley for synthesized speech?

This is great! I'm in the very early stages [0] of creating a framework to automate and control physical instruments through hardware & software. Never thought voice would be possible, I'll have to check out integrating this! Thanks!

[0] https://github.com/fotijr/MetroDom

On Safari, instead of using the normal "AudioContext" constructor, you must create a "webkitAudioContext"; a feature-detection check for this would be a nice addition.

EDIT: This issue has now been fixed. However, it's led me to notice some (unrelated) timing problems in both Safari and Firefox, which will take some deeper digging to figure out. Seems like the browser-compatibility rabbit hole never ends!
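For anyone hitting the same thing, a feature-detection check for the prefixed constructor might look like the minimal sketch below. The helper names are made up for illustration and this is not the fix that went into the library; passing the global object in as a parameter is just a choice that makes the check easy to exercise outside a browser.

```javascript
// Returns whichever AudioContext constructor exists on `globalObj`, or null.
function resolveAudioContext(globalObj) {
  return globalObj.AudioContext || globalObj.webkitAudioContext || null;
}

// Creates a context, failing loudly when Web Audio is unavailable.
function createAudioContext(globalObj) {
  const AudioContextCtor = resolveAudioContext(globalObj);
  if (!AudioContextCtor) {
    throw new Error('Web Audio API is not supported in this environment');
  }
  return new AudioContextCtor();
}
```

In a browser this would be called as `createAudioContext(window)`.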


Thanks for this; I've fixed that issue and started on getting it compatible with Safari, but it turns out there are some other errors regarding Float32Array mapping and support for AudioBuffer.copyToChannel(). I'll have to look more into this, but rest assured I'll push the changes when I get it working in Safari!
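One common way to work around a missing AudioBuffer.copyToChannel() is to write into the Float32Array returned by getChannelData() instead. The sketch below shows that approach under the assumption that `source` is a Float32Array; the helper name is hypothetical and this is not necessarily how the library ended up handling it.

```javascript
// Copies `source` (a Float32Array) into one channel of an AudioBuffer-like
// object, using copyToChannel when present and getChannelData otherwise.
function copyToChannelCompat(buffer, source, channelNumber, startInChannel = 0) {
  if (typeof buffer.copyToChannel === 'function') {
    buffer.copyToChannel(source, channelNumber, startInChannel);
    return;
  }
  // Fallback: write directly into the channel's Float32Array.
  const channel = buffer.getChannelData(channelNumber);
  const room = channel.length - startInChannel;
  if (room <= 0) {
    return; // nothing fits past the end of the channel
  }
  // Clamp the source so TypedArray.set() does not throw a RangeError.
  channel.set(source.subarray(0, room), startInChannel);
}
```

The clamp means samples that would not fit in the channel are dropped rather than raising an error.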

Excellent. Thanks for going the extra mile to chase down these compatibility issues. I know from experience how frustrating this stuff can be.

Sounds pretty creepy, especially right after watching "Westworld" :)
