

Text-To-Speech engine in JS using eSpeak - tilt
http://syntensity.com/static/espeak.html

======
dstein
YES!! I've been clamoring for client-side TTS for a while now. This particular
synth isn't great. Many words don't come out sounding right. But being able to
do TTS client-side in a web page has about a thousand times more usability
than any built-in OS level TTS. I can have an extremely high level of control
over what is being said and when, and I can feed it custom replacements ("for
tea" instead of "forty") and so on.
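
A minimal sketch of that kind of custom replacement, applied to the text before it is handed to whatever synth is in use. The helper name `applyReplacements` and the shape of the replacement table are assumptions for illustration, not part of the demo's code.

```javascript
// Hypothetical helper: swap words the synth mispronounces for spellings
// that sound right, before the text is passed to the TTS engine.
function applyReplacements(text, replacements) {
  let result = text;
  for (const [word, spoken] of Object.entries(replacements)) {
    // Escape regex metacharacters so the key is matched literally.
    const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    // \b keeps the match to whole words; "gi" matches every occurrence,
    // ignoring case. A function replacement keeps "$" in `spoken` literal.
    const pattern = new RegExp(`\\b${escaped}\\b`, "gi");
    result = result.replace(pattern, () => spoken);
  }
  return result;
}

const spokenText = applyReplacements("I am forty years old", {
  forty: "for tea",
});
console.log(spokenText); // prints "I am for tea years old"
```

Replacements are applied in table order, so a later entry can also rewrite the output of an earlier one; keep the table small or order it with that in mind.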

I was a little disappointed after hacking Google's Translate API to get TTS
working. Downloading an MP3 from a remote server just felt like an awful
kludge. And there were HTTP delays and cut-off audio clips, and trying to
queue up and play multiple <audio> clips never worked right because of the
delays between each clip.

~~~
chime
While I think this client library is brilliant, this is the worst possible
approach to the problem. I'm working on an iPad app using PhoneGap to help
people with speech+motor disabilities communicate (<http://ktype.net>). All I
want is a window.speak() function that works with minimal delay. Give me the
ability to select one of the standard voices, pitch/tempo, volume, and feed it
the text to say. Callbacks/events would be nice too.

This kind of stuff should be part of HTML5 and not just input[type=range].
Most OSs from Windows to iOS include text-to-speech for accessibility. Why
can't we hook into it from the browser? Give me a list of available
voices/types (gender, country, accent), make all the other parameters 0.0-1.0,
and I'll be a happy dev.
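
To make the wish list concrete, here is a rough sketch of the option handling such an API might need. Everything in it is hypothetical: the option names (`voice`, `pitch`, `tempo`, `volume`), the defaults, and the voice fields (`gender`, `country`, `accent`) are only taken from the description above, and no existing browser API is implied.

```javascript
// Hypothetical defaults; every numeric parameter lives in the 0.0-1.0 range.
const DEFAULTS = { voice: null, pitch: 0.5, tempo: 0.5, volume: 1.0 };

// Clamp a number into [0, 1]; fall back to the default for non-numbers.
function clamp01(value, fallback) {
  if (!Number.isFinite(value)) return fallback;
  return Math.min(1, Math.max(0, value));
}

// Merge caller options over the defaults and normalize the numeric ones.
function normalizeSpeakOptions(options = {}) {
  const merged = { ...DEFAULTS, ...options };
  return {
    voice: merged.voice,
    pitch: clamp01(merged.pitch, DEFAULTS.pitch),
    tempo: clamp01(merged.tempo, DEFAULTS.tempo),
    volume: clamp01(merged.volume, DEFAULTS.volume),
  };
}

// Pick the first voice whose fields match everything in `wanted`,
// or null when the platform offers nothing suitable.
function pickVoice(voices, wanted) {
  const match = voices.find((voice) =>
    Object.entries(wanted).every(([key, value]) => voice[key] === value)
  );
  return match ?? null;
}

// Made-up voice list standing in for whatever a platform would report.
const voices = [
  { name: "voice-a", gender: "male", country: "US" },
  { name: "voice-b", gender: "female", country: "GB" },
];
const options = normalizeSpeakOptions({
  voice: pickVoice(voices, { gender: "female", country: "GB" }),
  pitch: 2, // out of range, clamped to 1
});
```

Returning null from `pickVoice` leaves the caller to decide on a fallback, since the list of voices would differ from one platform to the next.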

~~~
azakai
I am not sure using the system speech libraries is the best solution. You will
get different results depending on the OS. Which means you will need to test
with all OSes (and all versions of those OSes). And each will have its own
list of available voices and languages, etc.

Whereas if you use a JS library in your project, you will know exactly how
things will sound, you can include exactly the voices and languages you want,
and only have to test once.

~~~
chime
> You will get different results depending on the OS.

That would be no different from making web pages. Even JS isn't the same
across browsers. The big question is whether I want to use a JS TTS library or
native, and I think native will always be better performance- and
resource-wise.

> And each will have its own list of available voices and languages, etc.

That can easily be solved by window.speech.listvoices like I mentioned in my
post above.

> exactly how things will sound, you can include exactly the voices and
> languages you want, and only have to test once.

I've been programming for 20 years and that hasn't happened for any system,
ever, not even once. You always have to test on different devices, OSs,
platforms, browsers etc.

~~~
azakai
I agree that making web pages takes a lot of testing on different browsers and
OSes. And that, as you said, there is no write-once-run-anywhere in software.
But I think we should minimize the problem, not make it worse. Having the same
JS TTS everywhere gets you much closer to that ideal; using system TTS moves
you farther from it.

I do agree that a native library will have better performance; I estimate
something like 3-5X faster in the near future. So that is a benefit of the
system TTS approach. But I don't think even the fairly unoptimized version in
this online demo is too slow to be useful, and it can be made much faster if
necessary. I haven't focused on speed yet.

My concern with each OS having its own voices is that the names of the voices
aren't enough to know what your users will hear. Unless we have a standard for
TTS that includes the actual voice data, a voice labeled, say, "male UK
English" may sound very different on different platforms.

It's clear there are tradeoffs here, both ways, and you make sound points. But
I prefer the JS TTS approach, unless you are writing something like an iPad-
specific app. If you don't care about other platforms, then I agree, system
TTS is better.

------
eisbaw
Awesome!

Please consider how much the JavaScript code can be minified.

------
swood33
Sounds like the TTS engines from the 60s...how about one like the computer on
Star Trek..."You have 60 seconds before the Enterprise will self-destruct."

------
Sephr
This should use object URLs instead of data: URIs when supported (Firefox 7,
Chrome 11) due to the extreme inefficiency of data: URIs.

~~~
azakai
Thanks for the feedback!

Can you direct me to some docs for object URLs? I tried searching for it but
can't find anything.

~~~
Sephr
You'll want to read <http://dev.w3.org/2006/webapi/FileAPI/#creating-revoking>
and use BlobBuilder.js (<https://github.com/eligrey/BlobBuilder.js>) to smooth
out the inconsistencies (e.g. createObjectURL is on window on older WebKits)
and add BlobBuilder support to browsers that only support data: URIs.
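
A minimal sketch of the feature detection described above: prefer an object URL where one can be created, and fall back to a data: URI otherwise. The function names are made up for illustration, and the check for `createObjectURL` directly on the global object reflects the older-WebKit inconsistency mentioned above. The sketch assumes a `Blob` constructor is available whenever object URLs are, which is the gap BlobBuilder.js is meant to cover on older browsers.

```javascript
// Find a way to create object URLs on the given global-like object.
// Returns a function (blob) => url, or null if none is available.
function findCreateObjectURL(scope) {
  if (scope.URL && typeof scope.URL.createObjectURL === "function") {
    return (blob) => scope.URL.createObjectURL(blob);
  }
  if (scope.webkitURL && typeof scope.webkitURL.createObjectURL === "function") {
    return (blob) => scope.webkitURL.createObjectURL(blob);
  }
  if (typeof scope.createObjectURL === "function") {
    return (blob) => scope.createObjectURL(blob);
  }
  return null;
}

// Fallback: base64-encode the bytes into a data: URI. The encoded text is
// about a third larger than the raw bytes and lives inside a long string.
function toDataURI(bytes, mimeType) {
  let binary = "";
  for (const byte of bytes) binary += String.fromCharCode(byte);
  const base64 =
    typeof btoa === "function"
      ? btoa(binary)
      : Buffer.from(bytes).toString("base64"); // non-browser fallback
  return `data:${mimeType};base64,${base64}`;
}

// Return a URL suitable for an <audio> element's src attribute.
function audioUrlFor(bytes, mimeType, scope = globalThis) {
  const create = findCreateObjectURL(scope);
  if (create && typeof scope.Blob === "function") {
    return create(new scope.Blob([bytes], { type: mimeType }));
  }
  return toDataURI(bytes, mimeType);
}
```

An object URL keeps the underlying data alive until it is released, so once playback is finished the URL should be passed to the matching `revokeObjectURL`.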

~~~
azakai
Thanks!

------
fungi
i love it!

2 useless notes

"shit" doesn't sound right (but sounds fine from espeak on the command line)

Japanese characters come out as random English characters, e.g. す (su) comes
out as "y"

you're fucking awesome :P

------
swood33
testing 1 2 3

------
swood33
test 1 2 3

