Show HN: TTS-API - Text-to-speech API (tts-api.com)
111 points by typeofNaN 1804 days ago | 48 comments

Quite cool... But if this is generated by the text-to-speech engine from OS X, then I am afraid it goes beyond the license that comes with OS X. I remember reading through that license, and it clearly stated that the OS X TTS was only for local use on your Mac.

So I am extremely curious about the license behind tts-api. Can the OP share it, or describe some of the tech behind the service?

In case anyone else is curious, the section you're thinking of is:

"F. Voices. Subject to the terms and conditions of this License, you may use the system voices included in the Apple Software (“System Voices”) (i) while running the Apple Software and (ii) to create your own original content and projects for your personal, non-commercial use. No other use of the System Voices is permitted by this License, including but not limited to the use, reproduction, display, performance, recording, publishing or redistribution of any of the System Voices in a profit, non-profit, public sharing or commercial context."

Thank you! Exactly what I was referring to.

People may be interested in an unofficial Google API that does the same job: http://translate.google.com/translate_tts?tl=en&q=Hello+...

It's limited to 150 characters, IIRC.
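For anyone who wants to work around that limit, here is a minimal sketch that splits longer text into pieces on word boundaries and builds one request URL per piece. The base URL and the `tl`/`q` parameters are the ones quoted above; the 150-character figure is only the recollection from this thread, and since the endpoint is unofficial it may change or stop working. The function names are made up for illustration.

```python
from urllib.parse import urlencode

# Base URL as quoted in the thread; unofficial, so it may change or disappear.
GOOGLE_TTS = "http://translate.google.com/translate_tts"
MAX_CHARS = 150  # assumed limit, taken from the "IIRC" comment above


def chunk_text(text, limit=MAX_CHARS):
    """Split text on whitespace into pieces of at most `limit` characters."""
    chunks, current = [], ""
    for word in text.split():
        # A single word longer than the limit is hard-split.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def tts_urls(text, lang="en"):
    """Return one URL per chunk; urlencode turns spaces into '+'."""
    return [f"{GOOGLE_TTS}?{urlencode({'tl': lang, 'q': c})}"
            for c in chunk_text(text)]
```

Each URL would then be fetched separately and the audio clips played or concatenated in order.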

Bing's got an API -- you need to sign up for a key but you get a very large number of free uses and I don't think there's a char limit.

For those who want to run their own copy of this, here's how to do it:

1. Find a Mac-based server (a co-located Mac Mini will be fine)

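2. Run `say -o output.wav --data-format=LEI16@22050 "$TEXT"` to generate the voice (WAV output needs an explicit data format, and the quotes keep multi-word text as one argument)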

3. Compress the WAVE file with `lame` to get the MP3 file (the system builtin `afconvert` can produce AAC, but has no MP3 encoder).

The `say` command supports multiple languages and dialects, but you'll have to install the necessary voices in OS X 10.8. The man page for `say` can be found here: http://pastebin.com/nWbvJAAX
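The steps above can be sketched as a small Python wrapper. This is only an illustration under a few assumptions: `say` and `lame` are on the PATH, the script runs on a Mac, and the function names and the 22.05 kHz 16-bit data format are choices made for this example. Passing the arguments as a list avoids shell-quoting problems with the text.

```python
import subprocess
import sys


def say_command(text, wav_path, voice=None):
    """Build the argv for `say`, writing 16-bit little-endian PCM at 22.05 kHz."""
    cmd = ["say", "-o", wav_path, "--data-format=LEI16@22050"]
    if voice:
        cmd += ["-v", voice]  # e.g. "Alex"; the voice must be installed
    cmd.append(text)
    return cmd


def lame_command(wav_path, mp3_path):
    """Build the argv for encoding a WAV file to MP3 with lame's defaults."""
    return ["lame", wav_path, mp3_path]


def synthesize(text, mp3_path, voice=None):
    """Run both steps; raises if not on macOS or if either tool fails."""
    if sys.platform != "darwin":
        raise RuntimeError("the `say` command is only available on OS X")
    wav_path = mp3_path + ".wav"  # intermediate file, left in place for simplicity
    subprocess.run(say_command(text, wav_path, voice), check=True)
    subprocess.run(lame_command(wav_path, mp3_path), check=True)
    return mp3_path
```

A call such as `synthesize("Hello there", "hello.mp3", voice="Alex")` would leave `hello.mp3` next to the intermediate WAV file.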

The complete list of voices/languages supported so far:

* English (Australia): 2 voices

* English (India): 1 voice

* English (Ireland): 1 voice

* English (Scottish): 1 voice

* English (South Africa): 1 voice

* English (UK): 3 voices

* English (US - Female): 7 voices

* English (US - Male): 6 voices

* English (US - Novelty): 14 voices

* Arabic (Saudi Arabia): 1 voice

* Chinese (China): 1 voice

* Chinese (HK): 1 voice

* Chinese (Taiwan): 1 voice

* Czech: 1 voice

* Danish: 1 voice

* Dutch (Belgium): 1 voice

* Dutch (Netherlands): 2 voices

* Finnish: 1 voice

* French (Canada): 2 voices

* French (France): 4 voices

* German (Germany): 3 voices

* Greek: 2 voices

* Hindi: 1 voice

* Hungarian: 1 voice

* Indonesian: 1 voice

* Italian: 3 voices

* Japanese: 1 voice

* Korean: 2 voices

* Norwegian Bokmål: 1 voice

* Polish: 1 voice

* Portuguese (Brazil): 1 voice

* Portuguese (Portugal): 1 voice

* Romanian: 1 voice

* Russian: 1 voice

* Slovak: 1 voice

* Spanish (Mexico): 2 voices

* Spanish (Spain): 2 voices

* Swedish: 2 voices

* Thai: 1 voice

* Turkish: 1 voice

Careful, though. This is expressly against the license agreement for Mac OS X.

If you use Ubuntu or similar, there is also eSpeak:

    espeak -w output.wav 'I love jiggy'

or on Windows: create a new file, paste

    createobject("sapi.spvoice").speak("hi world")

save it as 1.vbs, then double-click it.
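A rough sketch of how the Linux and Windows variants could be driven from one script. The helper names and default file names are invented for this example; it assumes `espeak` is installed on Linux and that `cscript` (the Windows Script Host command-line runner) is available on Windows. Non-ASCII text in the generated script may need a different file encoding, which is not handled here.

```python
import sys


def vbs_speak_script(text):
    """Return a one-line VBScript that speaks `text` through SAPI."""
    # VBScript escapes a double quote by doubling it, and a string literal
    # cannot span lines, so whitespace runs are collapsed to single spaces.
    escaped = " ".join(text.split()).replace('"', '""')
    return f'CreateObject("SAPI.SpVoice").Speak "{escaped}"\n'


def local_tts_command(text, platform=sys.platform,
                      out_path="output.wav", script_path="speak.vbs"):
    """Return the argv that would speak or render `text` on this platform.

    On Windows the caller must first write vbs_speak_script(text)
    to `script_path`.
    """
    if platform.startswith("linux"):
        return ["espeak", "-w", out_path, text]
    if platform == "win32":
        return ["cscript", "//nologo", script_path]
    raise ValueError(f"no local engine configured for {platform!r}")
```

The returned list can be handed to `subprocess.run(..., check=True)`.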

I'm pretty sure if you have lame installed, say can output directly to mp3.

The open-source, cross-platform equivalent of `say` is a piece of software called "SVOX Pico". There is also a Python-based wrapper for it called picospeaker. Relevant AUR link for ArchLinux users:


EDIT: SVOX Pico is a component of the Android OS.

This is a nice one; however, I'm still confounded by the lack of progress since Bell Labs made an online text-to-speech converter many years ago. In particular, the notion that the interpretation of each sentence is idempotent is just wrong. Want to see what I mean? A human would not speak like the following; there should be differences in intonation, "emotion" (sounding bored, angry, excited, etc., varying with the number of times "dogs" has been said), speed, and delay. In addition, you have to breathe at some point, and even the best audiobooks have some level of breath noise.


It takes a breath for blank lines or new paragraphs. http://tts-api.com/tts.mp3?q=High%20Quality%0AWe%20believe%2...!
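To build such a URL from multi-paragraph text, the newline just needs to be percent-encoded as `%0A`. A minimal sketch, assuming the `tts.mp3?q=` endpoint shown in the link above is the whole interface (the function name is made up):

```python
from urllib.parse import quote

# Endpoint as it appears in the link above.
TTS_API = "http://tts-api.com/tts.mp3"


def tts_api_url(text):
    """Percent-encode the text, so spaces become %20 and newlines %0A."""
    return f"{TTS_API}?q={quote(text, safe='')}"


# A newline between the two lines marks the paragraph break.
url = tts_api_url("High Quality\nWe believe")
```

The same encoding takes care of punctuation such as `?`, which would otherwise be read as part of the URL syntax.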

This is a bit off topic, but a related question: I have been looking for a "bad" text-to-speech library that produces Stephen Hawking-style audio, similar to what's found in old 1970s/80s electronics. Examples:




On OS X, the "Fred" voice is pretty close.

    say -v Fred "Hello. My name is Fred."

Thanks, but I need a library I can use in an app (and not just on OS X).

Excellent and dead easy to use. Great work on making it simple.

I was actually looking for an API like this just a few hours ago, but with some other languages as well. What's the TTS engine driving this?

BTW, one small critique of the page copy: "You expect" could be expressed more politely and in terms of the user's POV/benefit.

> Excellent and dead easy to use. Great work on making it simple.

The acronym should reflect this ease of use for proper pronunciation. How about Text Intelligently To Speech?

just wanted to add: you can now do this all in the browser (100% client side), too -> http://lalo.li/ (it's forkable)

Well, not really; the quality is nothing like as good. It's a nifty trick being able to do synthesis at all in Javascript, though.

And, of course, it presumably doesn't have the licensing issues this other approach would appear to, if it really is using Apple's voices.

Pretty impressive. I've given it a go with a few of the more technical terms that I come across at work, which other TTS engines have difficulty handling, and it dictated them flawlessly. Very interested to see where this goes!


If you're looking for a client side solution, here's espeak compiled to JS using emscripten.

Neat, once again, emscripten proves useful. I do find it important though to point out the lack of a good open text-to-speech engine.

Here is a speech as rendered by tts-api.com (http://goo.gl/PoZc4). Now, for speak.js [1], to make a comparison, paste in the first few of the top paragraphs from here [2] and compare the quality between the two.

There really is a gap to fill for a good open-source alternative here. But I suspect the main barrier is that there is a large amount of data needed to generate good voices. Still, a worthy target.

[1]: I tried to make a URL for this too, but despite the URL looking as if it could take arguments it refused to work, at least for me under Firefox and Chrome.

[2]: http://www.nytimes.com/2008/09/25/business/worldbusiness/25i...

Does it support IPA or SSML[1]? I ask because AT&T's TTS API[2] does, but it kind of sucks!

For example,

    <phoneme alphabet="ipa" ph="/ˈkreɪp/"></phoneme>

[1] http://en.wikipedia.org/wiki/Speech_Synthesis_Markup_Languag...

[2] http://www2.research.att.com/~ttsweb/tts/demo.php
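For reference, a small sketch of how a `<phoneme>` element like the one above could be generated rather than typed by hand. Whether tts-api.com accepts SSML is not established in this thread; the function name, the sample sentence, and the word "crepe" are illustrative only, and a full SSML document also carries version and namespace attributes on `<speak>`, which are omitted here.

```python
import xml.etree.ElementTree as ET


def ssml_with_phoneme(before, word, ipa, after=""):
    """Wrap `word` in an SSML <phoneme> element inside a <speak> root."""
    speak = ET.Element("speak")
    speak.text = before
    phoneme = ET.SubElement(speak, "phoneme", {"alphabet": "ipa", "ph": ipa})
    phoneme.text = word   # fallback text for engines that ignore the hint
    phoneme.tail = after  # text that follows the element
    return ET.tostring(speak, encoding="unicode")


markup = ssml_with_phoneme("I would like a ", "crepe", "ˈkreɪp", ".")
```

Building the element through an XML library means quotes and ampersands in the text are escaped automatically.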

You might be able to do non-English pronunciations by trying phonetic spellings, which can be tricky. The best I could get for "felicidades" was this: fell isseedadesh.

Yes, it supports the subset of IPA that more or less maps to English phonemes (getting a voice talent to produce clicks would be a neat exercise).

Where are the voices from, and how are they licensed? Could I use the output from this for commercial purposes?

Sounds like Alex from OS X.

Great, simply great API that just works. Keep up the good work!

What I would love to see now would be the ability to send compressed text to shorten the URL.
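One way this could work, purely as a sketch: the client deflates the text and sends it as URL-safe base64, and the server reverses the two steps. Nothing here is part of the actual API; the function names are hypothetical, and the service would need to add a parameter that accepts the packed form.

```python
import base64
import zlib


def pack_text(text):
    """Compress text and encode it with the URL-safe base64 alphabet."""
    raw = zlib.compress(text.encode("utf-8"), 9)
    # Padding is dropped to save characters; unpack_text restores it.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def unpack_text(token):
    """Reverse pack_text: restore padding, decode, and decompress."""
    padded = token + "=" * (-len(token) % 4)
    return zlib.decompress(base64.urlsafe_b64decode(padded)).decode("utf-8")
```

This only pays off for longer or repetitive text. For a short phrase the zlib header plus base64 overhead can make the token longer than the percent-encoded original.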


twitter tracker using tts :)

thanks, such an amazing service

Is this piping to the Mac OS X "say" CLI command? Neat. I'd love to see the source behind this, if you felt like putting it on Github.

Wow, very impressed!

Try "Hello.", "Hello!" and "Hello?"

Hi, would like to have this API on Mashape.com ... I think our community of developers will like it.

Very nice. Would be good to have the option of other formats, specifically Vorbis and/or Opus.

Awesome! Will you release a speech-to-text API as well, or know of a good one? Thanks!

I've had some success with the web API provided by AT&T: http://developer.att.com/developer/forward.jsp?passedItemId=... It's in beta, I believe.

Great job! Something that was really needed. I would love to see this open sourced :)

Very smooth and simple, as it should be. Are you going to implement other languages?

Would be great if you could share your code on Github

App idea! Now listen to your tweets from timeline!

Thank you :) Really needed something like this!

Pretty good. A nice-to-have would be letting the audio play in the browser as well, instead of just providing a link.

What about a speech-to-text API?

Speech to text is a far more computationally difficult problem. Google has an unofficial one -- you can curl FLAC voice files to them, but even their transcription is not terrific. (They use it for automatic captions on YouTube -- use that to judge...)
