
Show HN: CodeSpeak, the beginnings of a web-based speech-to-code tool - eternalcow
https://github.com/sethwilsonUS/codespeak
======
derekja
Big fan of the attempt! It's a tough area, though. A couple links that might
be of interest:

A group of folks using mostly Dragon and plugins
[https://groups.yahoo.com/neo/groups/VoiceCoder/info](https://groups.yahoo.com/neo/groups/VoiceCoder/info)

A great pycon talk by Travis Rudd a couple years ago
[https://www.youtube.com/watch?v=7F4ylvA0Dh0](https://www.youtube.com/watch?v=7F4ylvA0Dh0)

Best wishes for a successful project!

------
drusepth
I started something like this long ago, but translated into JavaScript rather
than Python. It was fun to work on, but I pretty quickly ran into issues with
ambiguity in the spoken word.

For example, say you want to loop from 1 to 10, print each number, and then
print "done". The spoken-code equivalent might be something like "from i
equals 1 to 10, print i, then print 'done'". When you translate that into
code, it could reasonably end up as either of the two following snippets:

for(i = 1; i <= 10; i++) { print(i) } print("done")

or

for(i = 1; i <= 10; i++) { print(i) print("done") }
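One way to resolve that ambiguity (purely a hypothetical sketch, not how CodeSpeak or voice2code actually work, and with made-up command phrases) is an explicit "end loop" terminator token, so the transcriber always knows which block a statement belongs to:

```javascript
// Toy transcriber: spoken phrases in, code out. An explicit "end loop"
// token removes the ambiguity between the two snippets above.
function transcribe(tokens) {
  let code = "";
  for (const t of tokens) {
    if (t.startsWith("from ")) {
      // e.g. "from i equals 1 to 10"
      const [, v, lo, hi] = t.match(/from (\w+) equals (\d+) to (\d+)/);
      code += `for (let ${v} = ${lo}; ${v} <= ${hi}; ${v}++) { `;
    } else if (t === "end loop") {
      code += "} ";
    } else if (t.startsWith("print ")) {
      code += `print(${t.slice(6)}); `;
    }
  }
  return code.trim();
}

transcribe(["from i equals 1 to 10", "print i", "end loop", "print 'done'"]);
// -> "for (let i = 1; i <= 10; i++) { print(i); } print('done');"
```

The cost, of course, is exactly the problem discussed below: the speaker now has to voice a "silent" part of the code.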

The worst part about designing a grammar for spoken language is figuring out
intuitive natural language to substitute for the "silent" parts of code --
braces, semicolons, whitespace, caret position, etc. Obviously Python doesn't
have those symbols specifically, but the concepts still exist. If you start
defining a function (with "create function [function_name]" in your code),
what extra language do you have/need to say "okay, we're done defining this
function and ready to go back to writing code wherever we were before defining
this function"?

Say you line up "create function foo", "set variable x 5", and "set variable y
8" voice commands. Will y get set to 8 within that function you created? If
so, how do you signify the end of the function? How do you know where your
caret jumps back to after finishing a function?

This is a neat project and I think it has a lot of potential if done
correctly, but it's also insanely difficult to get something both powerful and
intuitive when you're lossily translating spoken code to executable code,
because you're striking a balance between saying something natural and saying
literally every character you would otherwise write.

In case it's helpful, here's my now-seven-year-old attempt and proof-of-
concept in JavaScript:
[https://github.com/drusepth/voice2code](https://github.com/drusepth/voice2code)

~~~
fouc
You make me think that a whole new syntax for spoken programming would be key.

Would be interesting to get a bunch of people to read out loud various
programs in various languages and in pseudocode and hammer down what might be
the most natural way to do spoken programming.

~~~
shakna
I've actually found that Lua is one of the least painful languages to write
with voice control, as it has so few symbols - they tend to be only operators
so you can switch them out easily enough in your engine. (And a few extras for
moving around in your text editor, like "com jump $n upwards").

But you still do need to replace all sorts of brackets, like any language. At
the moment I've got:

"bracket" == (

"brace" == {

"index" == [

(And "end $x" for the other side of it.)

Clearly not ideal, but it works well enough for now; certainly a lot less
painful than trying to maintain indentation in Python with voice only. ("com
select 20 lines upwards, reverse tab"... followed by swearing as it tabs
instead of untabbing.)
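For what it's worth, the substitution scheme described above could be sketched roughly like this (the keywords are taken from the comment, but the code itself is a made-up illustration, not shakna's actual setup):

```javascript
// Spoken-word to bracket substitution, with "end $x" for closers.
const OPEN  = { bracket: "(", brace: "{", index: "[" };
const CLOSE = { bracket: ")", brace: "}", index: "]" };

function substitute(words) {
  return words.map(w => {
    if (w.startsWith("end ")) return CLOSE[w.slice(4)] ?? w;
    return OPEN[w] ?? w;
  }).join("");
}

substitute(["bracket", "x", "end bracket"]); // -> "(x)"
```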

~~~
Cogito
For what it's worth, I've always known them as:

( parenthesis (paren), alt. round bracket

{ brace, alt. curly bracket

[ bracket, alt. square bracket

The biggest confusion is between bracket and parentheses, in that people will
use bracket to mean both. If I think there might be confusion I use the
alternatives as they seem to remove all ambiguity.

~~~
shakna
Yeah, unfortunately "parenthesis" is a hard word for most of the Speech-To-
Text engines I tried, at least with my accent. ("parent thesis") And it is
somewhat of a long word to be using all the time.

~~~
Cogito
And for that reason, if I was using speech-to-code, I would probably use the
alternative forms of 'round bracket', 'square bracket', and 'curly bracket'
when I need to specify the type.

------
bobajeff
I plan to do something similar to this for a mobile code editor. If it came
out the way I was imagining, it'd be very convenient and intuitive: it could
create detailed patterns in code from a description, know your naming
patterns, and be aware of the functions/classes/variables available and
included in the project. I figured I'd have to put together some really great
ML models to pull it off the way I want.

So my plan was to first start with something using computer vision: writing
draft code by hand, taking a picture of it, and having it inserted into an
editor. I figured I could have something really useful that I'd use early on,
versus working with voice.

I've got so many intermediary things before I can even start on that. I'm glad
I'm not the only one who's heading in that direction.

------
TBF-RnD
My God, I was searching for contact information for the author on GitHub and
found out that the author, Seth Wilson, is legally blind!

In any case, if you are reading this, please comment: I have some ideas and
resources on this that you'll find interesting for sure.

~~~
eternalcow
Thanks! Yes, I'd love any ideas/suggestions you have!

~~~
TBF-RnD
The most important things I've come up with while researching this area are
the following.

The error recovery process on a failure in the system is simply too bad. It
takes quite a long time to correct an error in speech recognition. So the idea
is that a high error rate is actually acceptable if the recovery delay is
small. While typing on a keyboard the error rate is quite high, yet the user
doesn't notice, since fixing the error is not that painstaking. The same
cannot be said for speech recognition.

As such, I feel that speech recognition would need a better recovery process.
How and why exactly is beyond me; however, the Dasher project has some ideas
on this by using a hybrid system. For the fully blind, though, this is a
non-option.

The next point I'd like to make is that the most common source of errors
seems to be where the prediction algorithm has two or three probable
alternatives for a word. When predicting data in a narrow scope, such as a
programming language, this is greatly mitigated.

To keep these ambiguity errors to a bare minimum, the speech recognizer ought
to be fed the alternatives that are possible. These can quite easily be taken
from the AST, and they are strictly defined, as the syntax is much more
restricted than commonly spoken languages.

I am currently trying to compile a list of alternative input methods. As
such, I find your work really interesting. I intend to do a chapter on speech
recognition soon and would love to have your feedback on it.

In conjunction with this, I want to build a resource of boilerplate code for
controlling various operating systems, and to have a single resource for
state-of-the-art prediction models.

If you are interested please don't hesitate to contact me at:
trbefr@protonmail.com

------
danenania
Speaking as an engineer with chronic wrist issues, I would definitely pay for
a production-ready version of this. Have you considered setting up a landing
page with an email capture? I’d be interested in following your progress
somehow.

~~~
TBF-RnD
Would you consider using a gamepad a possible solution? Maybe specialized
hardware further down the road, but at the moment I think market forces have
optimized the devices sufficiently.

The reason I'm asking is that I'm doing some experiments on making gamepads
usable for productive purposes. While practicing, I noticed that my posture
improved automatically. You see, you are sort of automatically drawn towards
the keyboard, giving you a Mr. Burns-like posture, as if there were a
gravitational pull in the screen.

As I was discussing this with another person online I realized that it might
be of use for people with wrist pains as well.

My reasoning is this (bear in mind that I am by no means well read on biology,
but I think it might be given some thought): while using a keyboard you are
forced into a certain plane, whereas with a gamepad you can move your hands
and arms more freely.

Along with that I am looking into chorded keyboard methods and eyetracking.

Please respond to the comment if you think this sounds interesting!

~~~
danenania
It sounds interesting for sure, but it seems like figuring out the UX for a
complex text-based activity like programming would be very challenging.

For example, I do searches of the codebase constantly while coding, so I
imagine I'd be reaching for the keyboard a lot anyway since there's no way a
gamepad could compete on efficiency when doing something text-heavy like a
search.

~~~
TBF-RnD
Challenging for sure, but then again, if a theory based on state-of-the-art
research is properly implemented, there is a chance that a better solution
gets made as well. One not burdened by backwards compatibility.

Think of it this way: you type with 10 fingers, and each can give an on or
off signal. With a PS3 controller you lose 4 fingers to holding the device.
Still, the thumbs are tied to 14-bits-per-second movement sensors plus a
button. Then you also have motion sensors and so forth, and two or more
fingers have analog input. So the bandwidth is there, really. Also, what you
gain from more buttons you lose in movement time and errors, due to Fitts's
law.
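For reference, Fitts's law predicts movement time from target distance and size; a tiny sketch (the constants here are arbitrary assumptions, not measured values for any device):

```javascript
// Fitts's law: MT = a + b * log2(2D / W).
// a, b are device-specific constants; D is distance to the target,
// W is the target's width along the axis of motion.
function fittsMT(a, b, distance, width) {
  return a + b * Math.log2(2 * distance / width);
}

// Packing more buttons into the same area shrinks each target,
// which raises the predicted movement time:
fittsMT(0.1, 0.2, 100, 20); // larger target
fittsMT(0.1, 0.2, 100, 5);  // smaller target, slower to hit
```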

And notice how I haven't gotten into Valve's "knuckle controllers" or glove
solutions yet.

If you are interested I'd love to work with your particular use case and
problems to get a working solution that suits your particular needs.

[http://tbf-rnd.life/](http://tbf-rnd.life/) or mail me at
trbefr@protonmail.com

~~~
danenania
Unfortunately I don't have much time to spare currently for being a guinea
pig, as much as I would like to help. But if you're able to create something
that's really as efficient as a keyboard for coding (and can demonstrate that
in a video), I'll be a customer. dane@envkey.com

------
sbhn
The Gherkin syntax would be the best place to start.
[https://cucumber.io/docs/gherkin/reference/](https://cucumber.io/docs/gherkin/reference/)

------
w_t_payne
I'm suffering from pretty bad arthritis ... and I would absolutely welcome
something like this.

Typing has become agony over the past few years.

------
peterdz
This sounds like the one moment where VB.NET, with its verbose syntax, is a
really good choice.

------
cutler
For desktop, Talon (https://talon.com) is currently the state of the art, I
believe.

------
anonymous5133
Love the idea, even with the simple demo setup. Lots of potential here!

------
revskill
Awesome!

This also helps me practice spelling English better as an English learner.

------
floki999
Fantastic idea

