Speaking in code: hands-free programming (nature.com)
91 points by sohkamyung on July 3, 2018 | 37 comments



As a developer with carpal and cubital tunnel syndrome, I'm extremely excited about the possibility of voice programming. However, when I last tried it, my experience was that it was somewhere between non-existent and impractical.

It was actually the inspiration for my undergraduate research which explored (scratched the surface, really) the use of Lojban[0] as an "interactive metaprogramming language". The benefits of Lojban are that it is significantly more expressive than making sounds that are mapped to vim-like commands via a hacked natural language recognition engine, is isomorphic between text and speech (you can speak mathematical expressions with precision), is phonologically and grammatically unambiguous, and is standardized/has a community -- even if it's a small one.

I still think it's a paradigm shift and would be indescribably more efficient and powerful than traditional code exploration and editing. And I would love to continue, but it's such a massive project and working on something that foreign alone for long enough is really isolating because it's hard to explain. Lojban also has its fair share of problems for this and would need to be further standardized and scoped out for computers, so there's really not a clear path forward in any case.

Regardless, for those _currently_ suffering from hand and arm issues preventing them from coding and/or working, my advice is:

1) don't panic. Go on antidepressants and/or see a therapist if you need to; it's going to take a while to recover

2) rest as much as possible and make sure to get a lot of sleep

3) wear wrist braces to sleep to prevent further damage in the short term (consult a doctor and all that, you want to avoid muscle atrophy so don't use them for too long without starting some kind of muscle strengthening program)

4) invest in proper tools (standing desk + ergonomic keyboard is like magic for me, I can actually type again)

5) gain weight if you're on the low side of normal weight -- this helped my cubital tunnel syndrome quite a bit by giving my ulnar nerve more protection

And finally, don't give up hope; I'm able to work full time and don't wear wrist braces to sleep at all anymore after a little over a year.

[0]: https://en.wikipedia.org/wiki/Lojban


Just as anecdotal evidence, I was surprised to find out that a radiologist friend of mine spends the majority of his work day 'writing' MR reports by speaking to (from what I understood) some modded Dragon NaturallySpeaking software. He also assured me this is pretty much the norm for all his colleagues as well.

Although anecdotal, this is real-world evidence that you can write something highly technical, with a lot of domain-specific knowledge, to a professional standard, via voice dictation.

The question then becomes: is there a market, for example, for a "C++-aware" or "Python-aware" modded voice application? How big would the market be? How much would people pay? Could it be covered by health insurance in an RSI case?


That's a great point! There are quite a few specialized products for voice recognition, and in fact domain-specific voice recognition should surely be easier than trying to recognize general natural language. This is especially true for programming languages, because they tend to be so structured. There are a few major differences between most uses of voice recognition and programming, though:

The biggest issue is that most coding is not producing code; it's navigating and editing existing code in a non-linear way. I think that makes it extremely difficult because it needs to be integrated with static (or dynamic!) analysis of a codebase. And once we take that step, we should be able to turn programming from shuffling text around into something more conversational (not unlike REPL-based development).

There are also issues with being able to pronounce things in some programming languages, and with being able to refer abstractly to these (often non-linear) things. Because voice recognition is slower in terms of commands per second (as far as I've seen), we'd need more powerful and expressive ways of communicating intent to reach parity with keyboard-based programming.

What I was hoping for with "interactive metaprogramming" was to be able to do project-wide queries and edits. It seemed that trying to find a voice-based equivalent to the way we program with text and keyboards was never going to be a great fit and that, really, we should have more powerful tools for interacting with code anyways. Lojban provides a lot of fertile ground for these ideas, I think. For example, Lojban has a really powerful way of talking about space, time, relative positions, anaphora, and meta-linguistic correction/negation. I can imagine that being incredibly useful for programming by voice.
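To give a concrete flavour of the "project-wide queries" part: a spoken request like "who calls parse_config" could be backed by something as simple as an AST walk. This is just a rough sketch of mine in Python (the function name and project path are placeholders), nothing like a real implementation:

  import ast
  from pathlib import Path

  def find_callers(project_root, target):
      """Yield (file, line) for every direct call to `target` in a project.

      A voice command like "who calls parse_config" could map to
      find_callers(".", "parse_config") and read the results back.
      """
      for path in Path(project_root).rglob("*.py"):
          try:
              tree = ast.parse(path.read_text())
          except SyntaxError:
              continue  # skip files that don't parse
          for node in ast.walk(tree):
              if (isinstance(node, ast.Call)
                      and isinstance(node.func, ast.Name)
                      and node.func.id == target):
                  yield path, node.lineno

  for path, line in find_callers(".", "parse_config"):
      print(f"{path}:{line}")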

And I think the conversational style is key because the biggest issue with voice-based interfaces is presentation of relevant information. Lojban is based on relationships so it would be very natural to query the language itself with datalog-like evaluation, which provides an elegant solution to the hidden commands problem.

edit:

Regarding the price, I'd be willing to pay quite a lot. Losing the ability to program was one of the darkest times in my life. Soul-crushing, really. I was completely lost, unable to work and unable to enjoy myself. I remember thinking $300-350 for a keyboard was expensive when looking at the Kinesis Advantage 2, but today I'd be happy to pay 10 or 100 times the price knowing what it did for me. It's the difference between being able to work and enjoy myself (not being in constant pain from trying to work anyway) and lying around feeling hopeless and worthless.


How long ago was this?

It certainly is not impractical now. I wrote my PhD thesis, along with most of the software using VoiceCode.io. These days I’m on Talon. Dictation has basically been my sole key input device for three years. Like anything, it takes practice.


About 2-3 years ago. Thank you! I had not seen either of these; I'll look into them and start a deep dive trying them out this weekend!


I can fully support advice 2) and 4). An ergonomic keyboard will work wonders.

1) NEVER take meds that mess with your brain! That's such an American vice. 3) ? 5) Yes, but building muscle is better than building fat (the two can be combined ;)


Those are fair points! I got too excited and didn't explain what I was talking about clearly -- so it isn't great general advice. To clarify for those who may need it:

I think my case was a bit exceptional in how bad it got. I went from having no hand problems at all to having acute pain to the point where I couldn't function in daily life in a matter of weeks. Things such as working, driving, or opening doors with doorknobs became essentially impossible for me. I couldn't even sit in most chairs or sleep without my wrists being in constant pain.

2, 4, and 5 are the most useful because they're preventative. 3 is to deal with the physical trauma and 1 is to deal with the mental trauma.

I agree with you that being on psychoactive drugs is not ideal, but when your life becomes nothing but sitting around in pain, I think that they're a reasonable short-term solution. Although non-hand based exercise is surely better if you can manage it, I found that running with my wrists taped up was fine and helped as much or more than the drugs, and I stopped the drugs after a few months when I felt more able to deal with it.


I've tried to dictate blog posts on many occasions, and there seems to be something weird about my brain (and a fairly large number of other people I've mentioned this to) that means I can't think creatively and speak at the same time. I suspect this means I wouldn't be able to dictate code either.


Personally it was something I had to learn. Like I imagine it's harder to type creatively when you need to look at the keyboard for a few seconds to find each letter. I think your brain is doing something similar when you're searching for commands and trying to mentally model the dictation system to get the right words out.


I believe this is the PyCon 2013 talk to which TFA refers:

https://www.youtube.com/watch?v=8SkdfdXWYaI


This article is skewed towards the use of hands-free programming interfaces by people who find typing inconvenient. It also appears that the voice-coding tools have been designed generically, to make them extensible and applicable in broader contexts. Two things could be done differently.

First, voice coding could provide a far more efficient interface than typing because of the inherent difference in speed between typing out instructions and speaking them. All programming languages to date have been designed for typed input. However, it should not be difficult to design one for speech-based input. In other words, it would make sense to design such interfaces for all users instead of only those who have problems using typing-based interfaces.

Second, instead of creating generic, one-size-fits-all solutions for the voice-to-code interface, it would make more sense to design such interfaces on a per-language and per-platform basis. From a coding perspective, the language and the platform or system software you plan to use are what contribute to the palette of programming instructions -- Ruby on Rails, for example. A voice interface could be designed better keeping this in mind.
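To make that concrete, here's a rough sketch in Python of what a per-framework palette might look like (the spoken phrases and the commands they expand to are made up purely for illustration):

  import subprocess

  # Hypothetical spoken-phrase palette for a Rails project; the phrases
  # and the commands they expand to are purely illustrative.
  RAILS_PALETTE = {
      "generate model":      ["rails", "generate", "model"],
      "generate controller": ["rails", "generate", "controller"],
      "run migrations":      ["rails", "db:migrate"],
      "run tests":           ["rails", "test"],
  }

  def run_spoken(phrase, *args):
      """Look up a recognized phrase and run the matching shell command."""
      command = RAILS_PALETTE.get(phrase)
      if command is None:
          print(f"no command bound to: {phrase!r}")
          return
      subprocess.run(list(command) + list(args))

  # e.g. the recognizer hears "generate model" followed by a dictated "User"
  run_spoken("generate model", "User")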


I'm the creator of Talon. You're exactly right about context sensitivity, and you're wrong that I'm not designing for it. One of Talon's secret missions is to make extreme context sensitivity possible. I've done a lot of things at a low technical level to make this possible eventually, such as:

- You can change the entire voice grammar from anything to anything in a couple of milliseconds.

- I'm building a scope system, kinda like syntax scopes in a text editor like Sublime, but for arbitrary systemwide stuff. So you'll be able to say "this command activates on scope 'code.ruby.method' and filename matches 'test_*.rb'" (which takes advantage of the fast grammar swap feature).
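Roughly, the matching idea looks like this (a plain-Python sketch of the concept only, not the actual API or rule format):

  from fnmatch import fnmatch

  # Conceptual sketch only: activate a command when the syntax scope starts
  # with "code.ruby.method" and the file name matches "test_*.rb".
  RULE = {"scope": "code.ruby.method", "filename": "test_*.rb"}

  def rule_matches(rule, context):
      """Check a command's activation rule against the current context."""
      scope_ok = context["scope"].startswith(rule["scope"])
      file_ok = fnmatch(context["filename"], rule["filename"])
      return scope_ok and file_ok

  print(rule_matches(RULE, {"scope": "code.ruby.method.body",
                            "filename": "test_users.rb"}))   # True
  print(rule_matches(RULE, {"scope": "code.python.function",
                            "filename": "views.py"}))        # False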

Getting really specific with commands and action implementations means heavy lifting can be done for you in a lot of really cool cases, and recognition accuracy goes way up. It's also hard to remember all this stuff; one fix is to use a floating UI that shows you the most relevant context-specific commands.

As far as "generally accessible", another goal of Talon is to preempt RSI and find ways to convince people who can type fine to use it anyway. Two ways are "make many things way faster/smoother than typing" and "lots of cool eye tracking features".


I think the scope system is a step in a useful direction. The speed at which you change a grammar is not really that relevant, because typically you don't program in more than one language; being able to change it at all is. When you start with a generic system that can be configured down to specialized use cases, your focus on the end user's perspective of your tool can get clouded by the desire to engineer in lots of bells and whistles, since you have a system that can do that. From the end user's perspective, the specifics matter first.

Sounds like you're making it easy to create the equivalent of macros to abbreviate the input of common tasks.

It would be interesting to see if you could make this system function without a screen so even a blind person could produce code.


I've definitely considered blind UI; it's a different sort of paradigm. It should be nice for other cases, like using a computer with just an earpiece. It turns out some of my approach will make fully blind voice UI much easier, even on existing apps / environments.

You definitely don't understand the implications of highly dynamic grammars then. A couple instances where rapid grammar changes matter are: "I want to switch between my text editor, a terminal, and a browser, and have exactly the most specific and relevant commands available at all times with no delay", and "I want to have syntax-aware voice grammars that are very in tune with how my programming language and framework operate"

You appear to be talking about this without any significant research, sources, or knowledge of my software (about which you have multiple times made fairly confident statements that are not remotely true). Please stop generalizing about what I'm doing without some examples to back it up.


I guess the fact that there is a delay associated with recognizing commands not loaded into the recognizer's current vocabulary is an artifact of the design of the recognizer, which was probably tuned towards continuous speech. In other words, your dynamic grammars idea was a way to work around a limitation of the speech recognizer you were using.

Part of the benefit of designing a language specifically for speech-based coding is that the associated recognizer could be tuned to that language. Compared to a brute-force translation of the keyboard-input-based developer workflow, a specifically designed language would likely have a far narrower vocabulary. This would make it possible to tune a recognizer for better recognition accuracy and speed.


Re: grammar changes: I assumed each language was associated with its grammar and it was not more fine grained than that. This makes sense though in the context where you're simply trying to replicate the keyboard-window-mouse based tool design with a speech-window-eye-tracking one. I guess this makes it easy for a user on the former UI to switch over.

You will note my comments were not about your software, but hands-free programming interfaces in general. They continue to be this way, you will note. I am addressing the subject of this topic at a high level. I also have a fairly good idea on where your software stands within that context. Don't need to do any research for that.


> However, it should not be difficult to design one for speech-based input.

Personally I feel that this shouldn't even be necessary. In theory, all you need to be able to vocalise is what you want the code to achieve; that could then be mapped to something similar to a snippet based on what language you are using. Most programming languages share the same core abilities; it's just the syntax to achieve the task that matters once you know what the task is. So this shouldn't need to be handled by a dedicated programming language for voice coding, it could just be handled by a wrapper (a text editor for your voice with an advanced NLP-based snippet system).

Something like: "Check if A is equal to B, if it is, return A, if not, return B" would be much more efficient than saying "if parenthesis capital A equals equals capital B..." and so on.
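As a toy sketch of that idea (the pattern table is deliberately tiny and hand-written; real NLP would be far more involved), a wrapper could match dictated phrases against templates and emit language-specific snippets:

  import re

  # Hypothetical phrase template -> per-language snippet table.
  TEMPLATES = [
      (re.compile(r"check if (\w+) is equal to (\w+), "
                  r"if it is, return \1, if not, return \2", re.IGNORECASE),
       {"python": "return {0} if {0} == {1} else {1}",
        "c":      "return ({0} == {1}) ? {0} : {1};"}),
  ]

  def phrase_to_snippet(phrase, language):
      """Turn a dictated phrase into a code snippet for the target language."""
      for pattern, snippets in TEMPLATES:
          match = pattern.fullmatch(phrase.strip())
          if match:
              return snippets[language].format(*match.groups())
      return None  # unrecognized phrase: fall back to plain dictation

  print(phrase_to_snippet(
      "Check if a is equal to b, if it is, return a, if not, return b",
      "python"))
  # -> return a if a == b else b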


I agree. I think a wrapper would be a good way to go about it with the voice based programming interface left only for a high level description of what the programmer is trying to do.

Sort of akin to how many of today's new high-level languages are actually implemented by translating to C and then using a C compiler.


I personally have had some serious wrist pain recently (I do have a doctor's appointment planned to check that out).

This caused me some great stress because I am an avid gamer, and also a developer.

Knowing that there are solutions really reassures me, at least for my day job.


The thing that helped me the most was realizing that my pain might not be located at the problem site. I had a nerve impingement due to https://en.wikipedia.org/wiki/Thoracic_outlet_syndrome which was causing wrist pain, even though nothing was wrong with my wrists at all.

Physical therapy can help a lot.


Professional advice is a good idea.

One suggestion that helped for an acquaintance: if it's RSI from typing on a standard keyboard, but isn't too severe, switch to a fancy split ergonomic keyboard (provided that's different to what you currently use), change the key layout to Dvorak, throw away your existing typing habits and slowly learn to touch type again.

Another relative was diagnosed with https://en.m.wikipedia.org/wiki/Carpal_tunnel_syndrome . Surgery can help.


Also, switch to a trackball instead of a mouse. I'm not terribly sensitive to RSI (I've worked data entry jobs where I sometimes spent 12 hours a day, 6 days a week typing), but even so I've found that using a regular mouse for a full day of work is one of the easiest ways to get RSI. I switched to using a trackball maybe 10 years ago, and it's very noticeable how much it improves things.


My wife went through a lot of work to find ergonomics to mitigate RSI. In the end:

1. Adjustable desk height. 2. Tried several keyboards before settling on the best split. 3. Tried several mice.

All helped, but... symptoms were also confusingly intermingled with a pinched nerve. PT helps with that. See a doctor, yes, and also keep your mind open to other/multiple confounding causes.


You may want to keep an eye out for the Xbox Adaptive Controller for games [1]

[1] https://www.xbox.com/en-US/xbox-one/accessories/controllers/...


Wouldn't Forth be more obvious for hands-free programming? The REPL command line used to run functions or programs is the same REPL command line used to define words, i.e. the IDE.

Apart from a handful of unspeakable words which would need to be given speakable names... the bigger reason I stay away from Forth is mostly the memory/paging/record system.



Ruby and Python are relatively well suited for voice programming; however, a programming language with optional sigils would be ideal:

https://github.com/pannous/english-script


I feel like something like this may be in my future. I have osteoarthritis in every cervical and thoracic vertebra. Good days, it means constant pain in the upper back and neck. Bad days mean numbness in my hands/legs, possibly spasms and/or shooting pains down my limbs. Every year it gets worse, and yeah, I'm still in my 30s.

I was disappointed that the article basically amounted to little more than an endorsement of talk-to-text. It lacked details. Coding? It hardly even mentioned navigating a document, other than a few "line up", "line down" quips. Coding isn't like dictating an essay; it requires constant back-and-forth navigation of a document (source file). Just this week, I've probably touched 100+ source files because we decided to change how common objects are allocated (yes, the source is poorly arranged and has way too much coupling and very poor cohesion).


You're seeing text input demos because conventional english dictation systems have been horrible at fast and precise technical text input. This is a classic example: https://www.youtube.com/watch?v=MzJ0CytAsec

I believe it's far easier to solve navigation than text input. I will gladly record for you (or anyone here who has an idea) a synthetic (pre-planned) demo video of some more complex task. Want to come up with a specific workflow for me to test? Like maybe "find <some project> on GitHub, fork it, clone it to my machine, perform <some specific refactor> / code editing task, commit, push", all without my hands, and I'll upload the first take.

The underlying systems are far more powerful than the article covered, e.g. in Talon you can write an editor plugin that collects filenames and turns them into commands in realtime to switch between files. Someone did a PoC with Talon's eye tracking to specify a symbol for insertion by looking at it. There have been demos of scope-based navigation / selection.
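The shape of that filename-to-command idea is roughly this (a simplified plain-Python sketch of the concept, not the actual plugin code; the speak-able name mangling and open_file callback are just for illustration):

  from pathlib import Path

  def speakable(filename):
      """Turn "user_profile_test.py" into the phrase "user profile test"."""
      return Path(filename).stem.replace("_", " ").replace("-", " ")

  def build_file_commands(project_root, open_file):
      """Map spoken phrases to actions that open the matching file.

      Re-running this whenever the file list changes is the "realtime"
      part: the voice grammar is regenerated from the project state.
      """
      commands = {}
      for path in Path(project_root).rglob("*.py"):
          commands["open " + speakable(path.name)] = (lambda p=path: open_file(p))
      return commands

  # Example: print the path instead of actually switching editor buffers.
  commands = build_file_commands(".", open_file=print)
  for phrase in sorted(commands):
      print(phrase)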

Personally I heavily optimized my alphabet/number input case (there's a specially designed alphabet that can do about 280 letters per minute at 100% accuracy), and use Vim commands directly, but many of the users use more conventional editors.

I plan to get into the habit of live streaming hands-free coding at some point so people can get a better idea of what it looks like to use this sort of thing, and so I can go back and watch and realize that I could be doing some things much better.


Sounds like a great idea for a demo video. I'd watch it.


Hands free coding would be a nice win for VR, where you are typically standing or moving around and typically want to use your hands to manipulate objects.


> if you use an English word as a command, such as ‘return’, it means you can never type out that word.

Not exactly true. The word by itself would be recognized as a command, but in a sentence it’ll be treated as any other word, assuming you don’t pause for too long before it.


You're probably thinking of some other voice command system, or Dragon's built-in commands (which are bad because they have no continuous recognition, requiring constant pauses while talking). Voicecode behaves exactly as described in the article.

I just defined a command "return" in Voicecode which presses enter.

"testing return testing test test return test return test" spoken in one breath types the following:

  testing
  testing test test
  test
  test

Voicecode defines commands in Dragon by just adding words to the English vocabulary, leaving Dragon in dictation mode, and running a parser on the English output. Commands are executed anywhere they land in the phrase. The Voicecode grammar makes this somewhat less painful by using lots of non-English words for commands... but this approach is convoluted and hurts accuracy quite a bit imo.
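The parse-the-dictation-output approach is easy to sketch; this toy version (my own illustration, not Voicecode's actual code) reproduces the behaviour above:

  # Command words act wherever they appear in the recognized phrase;
  # everything else is typed out as plain text.
  COMMANDS = {"return": lambda buf: buf.append("\n")}

  def parse_dictation(transcript):
      buf = []
      for word in transcript.split():
          if word in COMMANDS:
              COMMANDS[word](buf)          # execute the command
          else:
              if buf and buf[-1] != "\n":  # join plain words with spaces
                  buf.append(" ")
              buf.append(word)
      return "".join(buf)

  print(parse_dictation(
      "testing return testing test test return test return test"))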


The pausing solution is incorrect, but to say you could never use the word is also incorrect. See "keeper" such as "keeper return".


No, pausing will not prevent the word being recognized as a command in Voicecode. However, you can tell VC to ignore commands and just treat what's spoken as regular text. The command "keeper" will do this.


However, Voicecode's use of dictation, as I outlined in my sibling comment, means "keeper" will type out all the nonsense words too (which is some of what I meant by Voicecode's approach affecting accuracy).


Seems convenient. Eyes-free programming would be desirable too.



