Ask YC: Question(s) for speech technology experts.
4 points by opportunity on April 3, 2008 | 1 comment
I would really appreciate it if someone working on speech technologies (speech-to-text or text-to-speech) could provide some insight on this.

For the past couple of months, I have been completely fascinated by what speech technologies (specifically speech recognition and speech synthesis) can do and how they can enhance the user experience. I decided to delve deep into speech synthesis technologies. From my research into the available solutions, there is a huge difference between the open source solutions and the expensive commercial ones.

From what I have read about speech recognition, the open source solutions perform extremely poorly when compared to their commercial counterparts.

Has anyone else here looked at the possibility of improving any of the available open source speech technologies to a level close to the commercial ones? Is it even possible to improve Sphinx or Festival to a level where they can be used commercially without developing everything from scratch? Is it even worth investigating?

Could someone working in this area articulate the challenges (technical, monetary, etc.) involved?

Okay, thanks a lot for reading this. Looking forward to your comments.

P.S.:

I would really like to hear from someone who has worked, or is working, in this area about their experiences. I am located in the South Bay. I am also attending Startup School this month.



I worked at Nuance Communications for about a year doing voice application development. I'm not familiar with Sphinx, but I did talk to a number of people at the company about it (this was 2004-2005).

From what I gathered, it would be pretty difficult to get speech synthesis up to their level. A single "voice" is generated by recording tens of hours of audio and using algorithms to splice segments of it together based on the text.
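
As a rough illustration of the splicing idea, here is a toy sketch in Python. It is not how Nuance's or any production engine works: real systems typically choose among many candidate recordings of sub-word units and smooth the joins far more carefully. The names `crossfade`, `synthesize`, and `UNITS`, and the sample values, are all made up for the example.

```python
# Toy illustration of concatenative synthesis: look up a recorded snippet
# for each word and splice the snippets together with a short crossfade.
# All names and sample data here are hypothetical.

def crossfade(a, b, overlap):
    """Join two sample lists, blending the last `overlap` samples of `a`
    with the first `overlap` samples of `b`."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    head, tail = a[:-overlap], a[-overlap:]
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight ramps from `a` toward `b`
        blended.append((1 - w) * tail[i] + w * b[i])
    return head + blended + b[overlap:]


def synthesize(text, units, overlap=2):
    """Build one waveform by splicing the stored recording for each word."""
    out = []
    for word in text.lower().split():
        if word not in units:
            raise KeyError(f"no recording for {word!r}")
        out = crossfade(out, units[word], overlap) if out else list(units[word])
    return out


# Stand-in for the "tens of hours of audio": a tiny table of fake samples.
UNITS = {
    "hello": [0.0, 0.5, 1.0, 0.5],
    "world": [0.2, 0.4, 0.6, 0.8],
}

waveform = synthesize("hello world", UNITS)
```

Even in this toy version, any word missing from `UNITS` cannot be spoken at all, which hints at why so much recorded audio is needed for a single voice.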

The only significant monetary cost would be hiring a real voice talent to do your recording in a studio. That said, I wouldn't try to tackle the technical issues without a subject matter expert.



