
Speaking in code: hands-free programming - sohkamyung
https://www.nature.com/articles/d41586-018-05588-x
======
maoeurk
As a developer with carpal and cubital tunnel syndrome, I'm extremely excited
about the possibility of voice programming. However, when I last tried it, my
experience was that it's somewhere between non-existent and impractical.

It was actually the inspiration for my undergraduate research which explored
(scratched the surface, really) the use of Lojban[0] as an "interactive
metaprogramming language". The benefits of Lojban are that it is significantly
more expressive than making sounds that are mapped to vim-like commands via a
hacked natural language recognition engine, is isomorphic between text and
speech (you can speak mathematical expressions with precision), is
phonologically and grammatically unambiguous, and is standardized/has a
community -- even if it's a small one.

I still think it's a paradigm shift and would be indescribably more efficient
and powerful than traditional code exploration and editing. And I would love
to continue, but it's such a massive project and working on something that
foreign alone for long enough is really isolating because it's hard to
explain. Lojban also has its fair share of problems for this and would need to
be further standardized and scoped out for computers, so there's really not a
clear path forward in any case.

Regardless, for those _currently_ suffering from hand and arm issues
preventing them from coding and/or working, my advice is:

1) don't panic. Go on antidepressants and/or see a therapist if you need to;
it's going to take a while to recover

2) rest as much as possible and make sure to get a lot of sleep

3) wear wrist braces to sleep to prevent further damage in the short term
(consult a doctor and all that, you want to avoid muscle atrophy so don't use
them for too long without starting some kind of muscle strengthening program)

4) invest in proper tools (standing desk + ergonomic keyboard is like magic
for me, I can actually type again)

5) gain weight if you're on the low side of normal weight -- this helped my
cubital tunnel syndrome quite a bit by giving my ulnar nerve more protection

And finally, don't give up hope; I'm able to work full time and don't wear
wrist braces to sleep at all anymore after a little over a year.

[0]:
[https://en.wikipedia.org/wiki/Lojban](https://en.wikipedia.org/wiki/Lojban)

~~~
DoingIsLearning
Just as anecdotal evidence, I was surprised to find out that a radiologist
friend of mine spends the majority of his work day 'writing' MR reports by
speaking to (from what I understood) some modified Dragon NaturallySpeaking
software. He also assured me this is pretty much the norm for all his
colleagues as well.

Although anecdotal, this is real-world evidence that you can write something
highly technical, with a lot of domain-specific knowledge, to a professional
standard via voice dictation.

The question then becomes: is there a market, for example, for a "C++-aware"
or "Python-aware" modded voice application? How big would the market be? How
much would people pay? Could it be covered by health insurance in an RSI case?

~~~
maoeurk
That's a great point! There are quite a few specialized products for voice
recognition, and in fact domain-specific voice recognition should surely be
easier than trying to recognize general natural language. This is especially
true in the case of programming languages, because they tend to be so
structured. There are a few major differences between most uses of voice
recognition and programming, though:

The biggest issue is that most coding is not producing code; it's navigating
and editing existing code in a non-linear way. I think that makes it extremely
difficult because it needs to be integrated with static (or dynamic!) analysis
of a codebase. And once we take that step, we should be able to turn
programming from shuffling text around into something more conversational
(not unlike REPL-based development).

There are also issues with pronouncing identifiers in some programming
languages and with abstractly referring to these (often non-linear) things.
Because voice recognition is slower in terms of commands per second
(as far as I've seen), we'd need more powerful and expressive ways of
communicating intent to reach parity with keyboard-based programming.

What I was hoping for with "interactive metaprogramming" was to be able to do
project-wide queries and edits. It seemed that trying to find a voice-based
equivalent to the way we program with text and keyboards was never going to be
a great fit and that, really, we should have more powerful tools for
interacting with code anyways. Lojban provides a lot of fertile ground for
these ideas, I think. For example, Lojban has a really powerful way of talking
about space, time, relative positions, anaphora, and meta-linguistic
correction/negation. I can imagine that being incredibly useful for
programming by voice.

And I think the conversational style is key, because the biggest issue with
voice-based interfaces is the presentation of relevant information. Lojban is
based on relationships, so it would be very natural to query the language
itself with datalog-like evaluation, which provides an elegant solution to the
hidden-commands problem.
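
To make the "datalog-like evaluation" idea a bit more concrete, here's a tiny
sketch (my own made-up names and facts, nothing from the actual research) of
the kind of project-wide query a spoken command could compile down to:

    # Hypothetical sketch: project facts plus a datalog-style query.
    facts = {
        ("defines", "parser.py", "parse_expr"),
        ("calls", "parse_expr", "tokenize"),
        ("calls", "repl.py", "parse_expr"),
    }

    def query(relation, *pattern):
        """Yield facts matching the pattern; None acts as a wildcard."""
        for rel, a, b in facts:
            if rel == relation and all(p is None or p == v
                                       for p, v in zip(pattern, (a, b))):
                yield (a, b)

    # "What calls parse_expr?" -- asked out loud instead of grepping.
    print(list(query("calls", None, "parse_expr")))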

edit:

Regarding the price, I'd be willing to pay quite a lot. Losing the ability
to program was one of the darkest times in my life. Soul-crushing, really. I
was completely lost, unable to work and unable to enjoy myself. I remember
when I thought $300-350 for a keyboard was expensive when looking at the
Kinesis Advantage 2, but today, I'd be happy to pay 10 or 100 times the price
knowing what it did for me. It's the difference between being able to work and
enjoy myself (not being in constant pain from trying to work anyway) and
lying around feeling hopeless and worthless.

------
onion2k
I've tried to dictate blog posts on many occasions, and there seems to be
something weird about my brain (and those of a fairly large number of other
people I've mentioned this to) that means I can't think creatively and speak
at the same time. I suspect this means I wouldn't be able to dictate code
either.

~~~
lunixbochs
Personally it was something I had to learn. Like I imagine it's harder to type
creatively when you need to look at the keyboard for a few seconds to find
each letter. I think your brain is doing something similar when you're
searching for commands and trying to mentally model the dictation system to
get the right words out.

------
LyndsySimon
I believe this is the PyCon 2013 talk to which TFA refers:

[https://www.youtube.com/watch?v=8SkdfdXWYaI](https://www.youtube.com/watch?v=8SkdfdXWYaI)

------
khitchdee
This article is skewed towards the use of hands-free programming interfaces
for people who find typing inconvenient. Also, it appears that the tools for
the voice coding part have been designed generically, so as to make them
extensible and applicable in broader contexts. Two things could be done
differently.

First, voice coding could provide a far more efficient interface than typing
because of the inherent difference in speed between typing out instructions
and speaking them. All programming languages to date have been designed for
typing-based input. However, it should not be difficult to design one for
speech-based input. In other words, it would make sense to design such
interfaces for all users instead of only those users who have problems using
typing-based interfaces.

Second, instead of creating generic, one-size-fits-all solutions for the voice-
to-code interface, it would make more sense to design such interfaces on a per-
language and per-platform basis. From a coding perspective, the language and
the platform or system software you plan to use are what contribute to the
palette of programming instructions -- Ruby on Rails, for example. A voice
interface could be designed better with this in mind.

~~~
lunixbochs
I'm the creator of Talon. You're exactly right about context sensitivity, and
you're wrong that I'm not designing for it. One of Talon's secret missions is
to make _extreme_ context sensitivity possible. I've done a lot of things at a
low technical level to make this possible eventually, such as:

\- You can change the entire voice grammar from anything to anything in a
couple of milliseconds.

\- I'm building a scope system, kinda like syntax scopes in a text editor like
Sublime, but for arbitrary systemwide stuff. So you'll be able to say "this
command activates on scope 'code.ruby.method' and filename matches 'test_*.rb'"
(which takes advantage of the fast grammar swap feature). A rough sketch of
the idea is below.
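
Purely as an illustration (this is not Talon's real syntax, just pseudocode
for the concept), a scope-matched rule could look something like:

    # Illustrative only -- not Talon's actual API.
    import fnmatch

    rule = {
        "command": "add assertion",          # spoken phrase
        "scopes": {"code.ruby.method"},      # required syntax scope
        "filename": "test_*.rb",             # glob on the current file name
        "action": "assert_equal(expected, actual)",  # text the command inserts
    }

    def rule_is_active(rule, context):
        """The rule only joins the live grammar when its context matches."""
        return (rule["scopes"] <= context["scopes"]
                and fnmatch.fnmatch(context["filename"], rule["filename"]))

    context = {"scopes": {"code.ruby", "code.ruby.method"},
               "filename": "test_user.rb"}
    print(rule_is_active(rule, context))  # True -> swap this command in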

Getting really specific with commands and action implementations means heavy
lifting can be done for you in a lot of really cool cases, and recognition
accuracy goes way up. It's also hard to remember all this stuff; one fix is a
floating UI that shows you the most relevant context-specific commands.

As far as "generally accessible", another goal of Talon is to preempt RSI and
find ways to convince people who can type fine to use it anyway. Two ways are
"make many things way faster/smoother than typing" and "lots of cool eye
tracking features".

~~~
khitchdee
I think the scope system is a step in a useful direction. The speed at which
you can change a grammar is not really that relevant because, typically, you
don't program in more than one language; being able to do it at all is what
matters. When you start with a generic system that can be configured down to
specialized use-cases, sometimes your focus on the end user's perspective of
your tool gets clouded by your own desire to engineer in lots of bells and
whistles, since you have a system that can do that. From the end user's
perspective, the specifics matter first.

Sounds like you're making it easy to create the equivalent of macros to
abbreviate the input of common tasks.

It would be interesting to see if you could make this system function without
a screen so even a blind person could produce code.

~~~
lunixbochs
I've definitely considered blind UI; it's a different sort of paradigm. It
should be nice for other cases too, like using a computer with just an
earpiece. It turns out some of my approach will make fully blind voice UI much
easier, even on existing apps / environments.

You definitely don't understand the implications of highly dynamic grammars,
then. A couple of instances where rapid grammar changes matter: "I want to
switch between my text editor, a terminal, and a browser, and have exactly the
most specific and relevant commands available at all times with no delay", and
"I want to have syntax-aware voice grammars that are very in tune with how my
programming language and framework operate".

You appear to be talking about this without any significant research, sources,
or knowledge of my software (about which you have multiple times made fairly
confident statements that are not remotely true). Please stop generalizing
about what I'm doing without some examples to back it up.

~~~
khitchdee
I guess the fact that you have a delay associated with the recognition of
commands not loaded into the recognizers current vocabulary is an artifact
associated with the design of the recognizer which was probably tuned towards
continuous speech. In other words your dynamic grammars idea was a way to work
around a limitation of the speech recognizer you were using.

Part of the benefit of designing a language specifically for speech-based
coding is that the design of the associated recognizer could be tuned to that
language. Compared to a brute-force translation of the keyboard-input-based
developer workflow, a specifically designed language would likely have a far
narrower vocabulary. This would make it possible to tune a recognizer for
better recognition accuracy and speed.

------
sebastienrocks
I personally have had some serious wrist pain recently (I do have a doctor's
appointment planned to check that out).

This has caused me great stress because I am an avid gamer, and also a
developer.

Knowing that there are solutions really reassures me, at least for my day
job.

~~~
shoo
Professional advice is a good idea.

One suggestion that helped an acquaintance: if it's RSI from typing on a
standard keyboard, but isn't too severe, switch to a fancy split ergonomic
keyboard (provided that's different from what you currently use), change the
key layout to Dvorak, throw away your existing typing habits, and slowly learn
to touch type again.

Another relative was diagnosed with
[https://en.m.wikipedia.org/wiki/Carpal_tunnel_syndrome](https://en.m.wikipedia.org/wiki/Carpal_tunnel_syndrome)
. Surgery can help.

~~~
InclinedPlane
Also, switch to a trackball instead of a mouse. I'm not terribly sensitive to
RSI (I've worked data-entry jobs where I sometimes spent 12 hours a day, 6
days a week, typing), but even so I've found that using a regular mouse for a
full day of work is one of the easiest ways to get RSI. I switched to a
trackball maybe 10 years ago, and it's very noticeable how much it improves
things.

------
DoctorOetker
Wouldn't Forth be more obvious for hands-free programming? The REPL command
line you use to run functions or programs is the same REPL command line you
use to define words, i.e. the IDE.

Apart from a handful of unspeakable words which would need to be renamed
speakably... The bigger reason I stay away from Forth is mostly the
memory/paging/record system.

------
singularity2001
I liked their video: [http://voicecode.io/#](http://voicecode.io/#) /
[https://www.youtube.com/watch?v=FlluHR6pgHc](https://www.youtube.com/watch?v=FlluHR6pgHc)

------
singularity2001
Ruby and Python are relatively well suited for voice programming, however a
programming language with optional sigils would be ideal:

[https://github.com/pannous/english-script](https://github.com/pannous/english-script)

------
hermitdev
I feel like something like this may be in my future. I have osteoarthritis in
every cervical & thoracic vertebra. On good days, that means constant pain in
the upper back & neck. Bad days mean numbness in my hands/legs, possibly
spasms and/or shooting pains down my limbs. Every year it gets worse, and
yeah, I'm still in my 30s.

I was disappointed that the article basically amounted to little more than an
endorsement of talk-to-text. It lacked details. Coding? It hardly mentioned
navigating a document beyond a few "line up", "line down" quips. Coding isn't
like dictating an essay; it requires constant back-and-forth navigation of a
document (source file). Just this week, I've probably touched 100+ source
files because we decided to change how common objects are allocated (yes, the
source is poorly arranged and has way too much coupling and very poor
cohesion).

~~~
lunixbochs
You're seeing text input demos because conventional english dictation systems
have been _horrible_ at fast and precise technical text input. This is a
classic example:
[https://www.youtube.com/watch?v=MzJ0CytAsec](https://www.youtube.com/watch?v=MzJ0CytAsec)

I believe it's far easier to solve navigation than text input. I will gladly
record for you (or anyone here who has an idea) a synthetic (pre-planned) demo
video of some more complex task. Want to come up with a specific workflow to
test? Like maybe "find <some project> on github, fork it, clone it to my
machine, perform <some specific refactor> / code editing task, commit, push"
all without my hands, and I'll upload the first take.

The underlying systems are far more powerful than the article covered, e.g. in
Talon you can write an editor plugin that collects filenames and turns them
into commands in realtime to switch between files. Someone did a PoC with
Talon's eye tracking to specify a symbol for insertion by looking at it. There
have been demos of scope-based navigation / selection.
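
As a rough illustration of the filename idea (made-up names, not Talon's
actual plugin API), the plugin just derives a speakable phrase per file and
rebuilds the command list whenever the file set changes:

    # Illustrative only: open files -> spoken "jump <name>" commands.
    import os

    open_files = ["/proj/src/parser.py",
                  "/proj/src/lexer.py",
                  "/proj/tests/test_parser.py"]

    def spoken_name(path):
        """Turn a filename into a speakable phrase, e.g. test_parser.py -> 'test parser'."""
        return os.path.splitext(os.path.basename(path))[0].replace("_", " ")

    # Rebuilt in realtime, then swapped into the live grammar.
    commands = {"jump " + spoken_name(p): p for p in open_files}
    print(commands["jump parser"])  # -> /proj/src/parser.py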

Personally I heavily optimized my alphabet/number input case (there's a
specially designed alphabet that can do about 280 letters per minute at 100%
accuracy), and use Vim commands directly, but many of the users use more
conventional editors.

I plan to get into the habit of live streaming hands-free coding at some point
so people can get a better idea of what it looks like to use this sort of
thing, and so I can go back and watch and realize that I could be doing some
things much better.

~~~
fouc
Sounds like a great idea for a demo video. I'd watch it.

------
gfodor
Hands free coding would be a nice win for VR, where you are typically standing
or moving around and typically want to use your hands to manipulate objects.

------
Dinius
> if you use an English word as a command, such as ‘return’, it means you can
> never type out that word.

Not exactly true. The word by itself would be recognized as a command, but in
a sentence it’ll be treated as any other word, assuming you don’t pause for
too long before it.

~~~
lunixbochs
You're probably thinking of some other voice command system, or Dragon's
built-in commands (which are bad because they have no continuous recognition,
requiring constant pauses while talking). Voicecode behaves exactly as
described in the article.

I just defined a command "return" in Voicecode which presses enter.

"testing return testing test test return test return test" spoken in one
breath types the following:

    
    
      testing
      testing test test
      test
      test
    

Voicecode defines commands in Dragon by just adding words to the English
vocabulary, leaving Dragon in dictation mode, and running a parser on the
English output. Commands are executed anywhere they land in the phrase. The
Voicecode grammar makes this somewhat less painful by using lots of non-
English words for commands... but this approach is convoluted and hurts
accuracy quite a bit imo.
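
In other words, the pipeline is roughly this (a simplified sketch of the
idea, not Voicecode's actual code):

    # Scan the dictated phrase and execute command words wherever they land.
    COMMANDS = {"return": "\n"}   # command word -> keystroke it produces

    def render(phrase):
        out = []
        for word in phrase.split():
            if word in COMMANDS:
                out.append(COMMANDS[word])   # execute the command
            else:
                out.append(word + " ")       # pass dictation through as text
        return "".join(out)

    print(render("testing return testing test test return test return test"))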

~~~
foxxed
The pausing solution is incorrect, but to say you could never use the word is
also incorrect. See the "keeper" prefix, as in "keeper return".

------
avodonosov
Seems convenient. Eyes-free programming would be desirable too.

