

Ask HN: Review my idea: Python programming by voice - jcrocholl

I want to solve speech recognition for programmers, so that we can create and edit source code efficiently without using a keyboard or a mouse. This will be great for RSI sufferers (unless voice also fails because of overuse).

Accurate speech recognition is a hard problem because human language is very ambiguous. As far as I know, there's currently no working solution for programming by voice in any programming language. DragonDictate or NaturallySpeaking with custom macros may come close, but it's difficult to set up. I found some related material, but most of it seems to be either academic research or unmaintained software.

Initially, I'm going to focus on speech recognition for Python. I know there's probably a bigger market for C++ or Java programmers, but Python code is more similar to human speech and is rather concise, using fewer lines of code than Java for the same task. Python has a large standard library which we can pre-parse and digest to reduce ambiguity during recognition.

Python's interactive interpreter with speech recognition and voice output would make an awesome demo. You could say "three times five" and the computer would respond with "fifteen". Or you could say "from time import localtime (pause) call localtime without parameters slice the first three elements" and the system would say "two thousand nine eight seventeen".

The entire speech recognition software could be free open source, maybe based on CMU Sphinx-4 (which is in Java). The business model could revolve around a web service that lets people upload their utterances (snippets of recorded speech) during or after their programming session. We can use these files to improve the recognition engine and train the speaker-independent acoustic model. So the recognition would get better over time, but speaker-independent models only work for some "standard" pronunciation without too much accent.

For a small fee (e.g. $49) users could download their personal acoustic model for improved accuracy, which would be generated from the voice snippets that they have uploaded. The model training process needs several minutes or even hours of CPU time, but an email could be sent to the user when the model is ready for download. As the software improves over time and they have recorded more utterances, they can pay another fee and generate an even better model.

If I can get speech recognition for Python code to work, maybe SQL or bash could work too (both support auto-completion, which can be useful for reducing ambiguity).

Please let me know what you think. I'm planning to implement a simple demo in Seattle during the next few weeks. Want to brainstorm with me over beer or coffee? We could be co-founders if we work well together.
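For concreteness, here's roughly the interpreter session the spoken demo above would map to (a sketch only; localtime() returns the current date when actually run, so the (2009, 8, 17) result is just the example date from the dictation):

```python
from time import localtime

# "three times five" -> the interpreter evaluates and speaks "fifteen"
3 * 5

# "call localtime without parameters slice the first three elements"
# localtime() returns a struct_time; the first three fields are
# (year, month, day), e.g. (2009, 8, 17) on the day of this post
localtime()[:3]
```

The nice thing about the interpreter demo is that the grammar can stay small: numbers, operators, and whatever names have been imported so far.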
======
Chickencha
It's an interesting idea, but this seems more like something that could be a
neat tech demonstration rather than a product that people would want to use.

First, speech recognition is very difficult. Fortunately you have a limited
number of key words you have to support. Unfortunately, there's an almost
infinite number of symbol names available, many of which can't be pronounced
in a definite way. Variables created through speech recognition are going to
be pronounceable, but what if you're working on a project with someone else
who's not using speech recognition? What if you're working on existing code
and don't want to refactor all the variable names? It seems to me that this
could be even more difficult than natural language speech recognition.

Second, what problem does this solve? You mentioned RSI sufferers, which is
certainly a problem, but how large is the market of Python programmers
suffering from RSI? I'm guessing not that large. The software would have to be
extremely accurate for me not to get frustrated quickly. On top of that, it
would have to offer some benefit over typing, and I don't see one in your
pitch.

Third, I'm not so sure about your business model. I think in order for people
to want to pay to download their personal acoustic model, the default model
would already have to work really well. If it didn't, programmers would
probably find themselves thinking "If the default model works so poorly, will
the custom model really be that much better?" However, if it's too good, then
there's no incentive to get the custom model. You're creating a balancing act
that you'll have to perform in order for anyone to want to give you money.

A more interesting goal would be to supplement the keyboard and mouse rather
than replace them. Think of ways that any programmer might become more
productive by using voice in addition to their hands. That would probably be
much simpler, allowing you to release more quickly. I think it would also
increase your potential user base, and eventually you could expand it to do
everything if need be.

------
JimmyL
I think it's a neat idea, but I don't think I personally would use it much - a
lot of how I think and organize code is visual. For example, I'll write half
of an invocation of a function, move it around a bit, and then complete the
signature. Or I'll write out something three ways, jiggle them around a bit,
combine two, and erase the third, and then move the block back to where it
actually belongs.

It's similar to math - when I was in university, I broke several bones in my
writing hand about two months before I had a calculus exam. I knew I wouldn't
be able to write for my finals, and I spoke to the Disability office to make a
reservation for someone to write for me. So I decided to hire a guy for a few
weeks to get comfortable with doing math verbally, like I would have to do for
the final.

After about a week I decided it was pretty hopeless - my whole workflow was
gone, I couldn't sketch out ideas since there was such a mental load in
"writing down" half an idea, etc. As a result, I just learned to write semi-
legibly with my off hand, and had the guy whose job it was to transcribe my
math recopy what I scribbled with my left into something readable.

Moral of the story, at least for how I work, is that you should have very good
support for moving text around.

------
makecheck
I think it's a neat concept, but keep in mind that many programmers can type
very quickly; speech should probably deliver something that is difficult to
achieve in as much time with a keyboard.

For instance, "begin decorator" might write the entire function-wrapping-
function code that is typical for producing a decorator, and that would
certainly be faster to say than to type. Whereas, the example of simply
importing a library is less useful (to me, anyway).
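To make that concrete: "begin decorator" might expand to roughly this standard wrapping boilerplate, leaving only the interesting parts to dictate (the names here are just placeholders):

```python
import functools

def my_decorator(func):
    # functools.wraps copies the wrapped function's name and docstring
    # onto the wrapper, which the template should include by default
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # ... pre-call logic dictated here ...
        result = func(*args, **kwargs)
        # ... post-call logic dictated here ...
        return result
    return wrapper

@my_decorator
def greet(name):
    return "hello " + name
```

That's a dozen lines of fiddly, symbol-heavy code from a two-word utterance, which is where voice could actually beat typing.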

------
drobilla
Efficiently? This would be FAR slower and FAR less accurate than typing.

Frankly, I can't see any programmer taking this idea seriously at all. Go look
at a piece of code, then dictate it PRECISELY in a way that could be
unambiguously interpreted by such a thing. There's no way...

------
lpellis
I don't think there will be a big market for speaking out Python code, but I
can think of a few other uses for specialized voice recognition. (In fact, I'm
working on something vaguely similar, but in a very different direction.) Do
you have an email address? Maybe we can get in contact.

~~~
jcrocholl
Yes, my email address is now on my profile.

