A few months ago I had an RSI problem so bad - I could type only a minute at a time, and even sitting with my hands on the keyboard hurt - that I started down this route. This video was, literally, a life-altering motivator for me, and I was quite obsessed with it.
Ironically, after seeing a physical therapist - which, let me tell you, you should do at the first sign of pain, because while they can't help some people I personally am batting 1.000 with PTs for RSI over my many-year career - my recovery is now so complete that I've totally fallen off the voice-computing path... for now. But I intend to keep going, not just because it is hilarious but because, well, RSI happens and it really pays to vary the routine sooner rather than later. There is nothing like trying to do a ton of emergency scripting in Python and emacs at the lowest possible point of your productivity.
The most important hint I have so far is: do not waste time with Mac OS. You need a PC running the Windows version of Dragon. The Mac version is pretty good for occasional email but lousy for emacs because it doesn't have the Python hook into the event loop that a saint hacked into the PC version years ago before leaving Dragon.
The speechcomputing.com forums are your friend.
Yeah, they say there is an open-source recognition engine that works okay, and time spent improving free recognition engines is time that really improves the world for all kinds of injured people, but here's the problem: when you need a speech system you really need it, and there are a lot of moving parts. Dragon, and Windows, and a super PC to run it on are super cheap compared to your time, especially when your time is in six-minute increments punctuated by pain.
As someone with a disability (quadriplegic), who types/codes with one finger, I find it appalling that Nuance, Apple and Google haven't opened up their speech recognition systems through a rudimentary API that would allow innovation that would _directly_ help the lives of me and many other disabled people whether it's RSI or worse.
It was a shock to me to discover that the livelihood and happiness of so many people depends on a dubiously-reliable unofficial API that was hacked into Dragon years ago and that has been lovingly preserved ever since, just below the radar. It feels like being critically dependent on Windows 95.
I guess it depends on the type of software you're working on, but input speed has never been close to being the bottleneck with coding for me...
Most of the time I'm trying to figure out what to do or how to implement an algorithm. Rarely do I get those mad-scientist frenzies where I'm typing away frantically trying to get all the words down as they come into my mind in a flash of inspiration.
I've worked with people who are skilled developers and who can't even touch type. They have a slow-paced, methodical way of working. Many look at the keyboard over their glasses and hunt and peck. The professor who ported Plan 9 to the Raspberry Pi (recent video here) is an example of this approach.
On the other hand, I have a shocking memory and can't hold context for long. Sometimes I come to write a piece of code and find that I wrote it last week and can't remember a thing about it.
I work by crashing through. I stalk the problem, procrastinate, drink tea, write short essays about what's stopping me from getting started. Eventually I get the whole problem in my head, and then need to get it down and done before I get tired. When I'm in this state and I need to solve a problem that I could use a standard library function for, often I'll just hammer out code to make the problem go away (list comprehension, string manipulation and the like) in order not to put any extra load on my short-term memory or create a distraction. Raw typing speed is very important. A drop in pace would hurt a lot.
Have you tried literate programming? It is a way to download your thought process into the "code". Particularly good for those who like to write.
An example tool to create it is my own program at https://github.com/jostylr/literate-programming which uses markdown as the syntax. While the examples are web language-flavored, it can be used with any language.
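A generic illustration of the style (deliberately not claiming to be the exact syntax of the tool above): a markdown file where the prose carries the reasoning and the indented code chunks get assembled into the program file.

    # Mean of a list

    We want the arithmetic mean, and we guard the empty case explicitly
    so callers get a clear error rather than a ZeroDivisionError.

        def mean(xs):
            if not xs:
                raise ValueError("mean of an empty list")
            return sum(xs) / len(xs)

    A literate tool then extracts the code chunks, in the order the
    narrative dictates, into mean.py.

The point is that the document reads top to bottom as an essay about the problem, and the runnable source falls out as a by-product.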
It's one thing to not be able to write code as fast as you can type. It's another to use a speech to text input method that's designed for long-form prose and try to use it to code. Can you imagine the frustration of trying to enter longCamelCaseVariableNames without a special macro to do so? I don't know the usual commands in Dragon, but I imagine it would be something like: "long delete space uppercase camel delete space uppercase case delete space upper case variable delete space uppercase names", possibly with a few false starts and undos in there as it interprets some of your words as commands rather than code.
To experience something like it, try using your phone keyboard, with word prediction on, to write code. It will be slow, and frustrating, and have a lot of false starts.
There's a big difference between "not the fastest way to enter text" and "so slow it's unusable", and the impression I get is that without extensive macros like this, most speech to text systems are so slow as to be unusable for writing code.
That's kinda the point of this article. He's got a bunch of macros and idiosyncratic commands.
At 11:30 in the video:
"Camel this is a test" -> thisIsATest
"Studly this is a test" -> ThisIsATest
"Jive this is a test" -> "this-is-a-test"
"Dot word this is a test" -> "this.is.a.test"
"Score this is a test" -> "this_is_a_test"
"Mara" -> selects all text on screen
"Chik" -> delete
Yes, that was my point. I watched the video, and picked the camel case example from it.
I was replying to someone who said that input speed is not the main bottleneck in coding, hence implying that it's not all that useful to do things to improve input speed. While I concede that input speed is not the primary bottleneck, my point is that without macros like this to speed it up, voice input would be way too slow to do anything useful.
I frequently have times when I'm doing things like "writing articles on what's stopping me from coding" and the like. But for me, when I'm on, I'm ON, and in those periods, being able to put code together reliably and quickly is of utmost importance.
Even with the macros and shortcuts he shows, I still would be slower using a system like that. When I'm typing in a good editor, I can blast out code VERY quickly, and when I've typed it, I KNOW it's what I meant. When he says it, he has to stop and look to ensure the code matches what he said.
Yes, he can say a phrase like "camel someVariableName" quickly, and sometimes it Just Works, but when it doesn't, he has to back up and say it again. That kind of distraction can throw me off my train of thought, and the damage to my productivity would be profound.
That said, it still IS great for anyone with an RSI as an alternate way to enter code. I just don't buy the "it could be better even for people who don't need it" argument. Especially with his claim that I would need to abandon my modern editor with awesome language support for one of those relics that relies on CTAGS.
#3 is often the equivalent of taking a walk or a shower, or walking in the shower. It's enough of a context shift that your brain will forget the inessential and you'll notice the pattern you were hoping to extract.
I think it's one of the great things about working with extensible tools and having a tool-building mindset. You can maintain momentum while relaxing your brain from working on a seemingly intractable problem.
No, but the less time you spend going through the mechanical action of translating thought to code, the more time you have to focus on solving problems.
It usually happens during refactoring, or when you're doing something you've done before, so you already know pretty much exactly what needs to be done and you're just executing.
It also helps to stop coding for a minute, think what you want to do, then code until you stop typing for more than 5 seconds, repeat.
When refactoring: Instead of typing fast, use ST multicursors or VI/emacs macros in an intelligent way. And I really really recommend ST if you don't want to debug your editor macros before debugging your build macros before debugging your code macros before debugging your code (yo dawg etcetera).
On the other end are us visual thinkers. I could do all of that proficiently just fine. In fact I do plenty of macros and scripts and so on. But at the end of the day, I think in pictures. I end up using a lot of editor short cuts and "lots of keypresses" style refactoring while I work out the shape I really want. Then, when I get that, fire off some macros to deal with the rest, cleanup, etc.
People say the same thing about learning a good text editor. Personally I find that while I spend most of my time thinking, when it comes time to enter or edit code it helps a lot if I can do it as quickly as possible. That way I stay in flow instead of getting bored and clicking over to HN.
I was never sure which side of this argument I came down on, and then I switched to a Kinesis Advantage keyboard.
I had to slow down my typing for a couple weeks to get the finger positions right. The whole time, I felt like I was coding with a hangover. I felt like I couldn't think properly, just because of the reduced brain->computer bandwidth.
Yes. After 2 weeks, I didn't feel handicapped. After 4 weeks I could type as fast as before (75-80wpm). Now, about 6 months later, I can type 95wpm on a good day.
Tangentially related, but I'll throw it in here, since so many developers aren't taking ergonomics seriously. RSI can happen to you if you are not careful, and it can wreck your career (almost happened to me). Several years ago, I started having aches in my arms. Over half a year it got gradually worse, until it was so bad, I thought I had to give up coding altogether. Fortunately, I managed to get it under control, mostly with the aid of a break program, and an ergonomic keyboard and mouse. I'm now completely over it, but I still need to be careful not to get it back. A lot more details in this post: http://henrikwarne.com/2012/02/18/how-i-beat-rsi/
Personal anecdote: I correlated my RSI directly to drinking coffee (tea is okay). I notice when I'm caffeinated that my posture is very different and I hold postures (e.g. holding down the shift key) for much longer. If RSI starts to blight you, try substituting your morning coffee for tea or water. For me, a break program just increased the stress levels of 'wanting to get something done', which I think is the root cause of RSI (stress).
> try substituting your morning coffee for tea or water.
Syntax [edit:] tip:
"try substituting tea or water for your morning coffee"
or
"try replacing your morning coffee with tea or water"
EDIT: For the downvoters: Fairly or unfairly, in the non-tech world people judge you by your choice and arrangement of words. (Compilers do much the same thing, of course.)
Nice people also judge those who try to shame people (not all of whom are native speakers of English) into silence on health-related forum threads by picking on irrelevancies.
Yes, let's ignore non-native English speakers' mistakes. That way, they will never learn, and we can continue to subjugate them, along with those for whom English is a first language, but cannot speak it correctly, probably because they were never taught that "should have" is not spelt "should of" and that their "they're"s aren't quite there.
"Spelt", in my dialect, is incorrectly spelled, and is a noun referring to a variety of wheat.
Now, was the "correction" I just offered you
effective and useful, or was it merely irrelevant, provincial, chauvinistic, uninvited, uninviting, and just plain rude?
(A note to downthread grammar trolls: I just used an Oxford comma, boldly, without apology. Have fun.)
> those who try to shame people ... into silence on health-related forum threads
That's a bit overstated. But mechanical_fish is right that my own choice of words could have been more tactful. That's why I changed "Syntax correction" to "Syntax tip" in the GP.
I like to deliver my occasional spelling correction comments like this: "Polite spelling correction: word, not werd". Opening the comment with "polite" seems to be a very good way to flag that you're not trying to engage in any power games or whatnot.
For the life of me, I can't understand what you found wrong with that use of "substituting". It's correct. It's clear. It's perhaps less colloquial, but it's hardly inscrutable tech jargon.
Surely there are better targets for your editor's urges...
> For the life of me, I can't understand what you found wrong with that use of "substituting". It's correct. It's clear. It's perhaps less colloquial, but it's hardly inscrutable tech jargon.
It's an issue of standard word meaning, not tech jargon. In the context of what 'muxxa appeared to be saying, his (or her?) use of substituting was exactly backwards.
What 'muxxa said was that caffeine seemed to exacerbate his RSI, and that substituting his morning coffee for tea or water helped. But the conventional use of the verb to substitute is to put or use in the place of another [1].
According to that conventional usage, therefore, 'muxxa was recommending putting his morning coffee in the place of tea or water. That seems to be exactly the opposite of what he was saying in the rest of the paragraph about the adverse effect of caffeine on his RSI.
Interestingly, when my RSI was bad and I was writing my PhD thesis with NaturallySpeaking, I noticed that voice fatigue was directly related to drinking coffee, too. The more coffee I drank, the more tired my voice would be at the end of the day. Then I mentioned this to a singer friend and she basically said "of course, every singer knows that caffeine is bad for your voice."
My counter-argument to voice-driven coding has been primarily around the input bandwidth and the fact that you must work from home with that kind of setup.
I guess the presenter conducted the "faster than the keyboard" test under very controlled circumstances (e.g. only working on his own code, so one doesn't have to deal with non-english-word variables/functions).
I don't mean to be a hater, because that was an _amazing_ demo, but I don't believe it's the holy grail the title implies it is.
It is a limitation, but when your other choice is "not working at all, pain, depression, despair" having to work at home is the least of your problems.
I have a grimmer point to make: Working out of crappy half-assed "startup incubators" with lousy desks, lousy seating, and an atmosphere flavored with stress was a direct contributor to my own RSI problems. You might not want to wait until you have symptoms to conclude that having an actual desk and some quiet is a good idea.
A high-quality headset, maybe even with some noise-canceling features, should work okay, too. It wouldn't be that much different from a call center, and those don't usually get their own offices, either.
Sure, not the ideal, distraction-free environment, but neither is a cubicle farm.
Really, Dragon can't cope with someone sitting ten feet away and speaking at the same time? So I guess no listening to the radio, either. Is it just the specifics of speech, or is it that noise-sensitive?
Maybe someone should do some kind of voice rec "groupware" then, where the relatively louder results of the other person are used to filter out false positives on my end...
The mic I used in the video can actually cope with very noisy environments. With lesser mics, speech recognition is useless with even mild background noise.
Mentioned this on HN previously but as a nearly 40 year old developer who has been developing professionally for nearly 20 years -- it used to be the norm for programmers to get their own offices, even just the regular joe programmers... Places really tight for space might put two guys in a very spacious shared corner office...
A few years into my career the idea of cubicles caught on and quickly became the norm, and now of course we're stuck with these horrible open offices that are, in my experience, just absolutely dreadful for productivity; but since everyone is doing it nobody really notices anymore.
> A few years into my career the idea of cubicles caught on and quickly became the norm, and now of course we're stuck with these horrible open offices that are, in my experience, just absolutely dreadful for productivity; but since everyone is doing it nobody really notices anymore.
You can largely thank Jim McCarthy of Microsoft fame for that; in the mid-90s he coined the concept "beware of a guy in a room".
It's funny. At the place I'm at now - a successful post-IPO SaaS company in the valley - management believes that open space and engineers running around, yelling and waving hands are a sign of productivity.
I have to escape to the kitchen to get anything requiring the tiniest level of concentration.
Yep. The difference: in one role (management), the conditions you're describing read as signals of activity. In the other role (development), they read as noise interfering with the activity you're working on.
A "successful" company probably already has a culture that's going to be hard to change, but where management is trainable, you can sometimes improve things by giving them something else like else to focus on, like commit logs, test suites, or ticket updates.
Glad you bring that up. I'm a manager. I do mostly the things you mentioned to give my team some breathing room, and have to WFH when I want anything serious done.
As you can imagine, if the ambience is bad for me, it is horrible for my team. I try to help with some WFH days here and there. But it is a culture thing. It's in the freaking DNA of the place. There's only so much I can change.
>My counter-argument to voice-driven coding has been primarily around the input bandwidth and the fact that you must work from home with that kind of setup.
I wonder how long it will take for reliable subvocal speech reading a-la [1] to become available in consumer products. It could potentially solve not only this problem but a lot of problems related to the use of cell phones in public spaces.
"Emacs pinkie" is a non-issue if you use a keyboard with thumb clusters, e.g a Maltron or a Kinesis model. Investing in a good keyboard is just as crucial as investing in a good chair, especially if you make a living by coding. The time that you spend compensating for a bad input device by hacking your own workarounds can be more costly then spending money on a proper solution.
Once you are an adequate touch typist typing speed is only beneficial if you use a language that requires you to type a lot of boilerplate. Even then, you can use an IDE for auto-completion. I can type at very high speeds — as fast as others can input text by using their voice — but I can't remember the last time I needed to type for more than a minute at a time. If you use a language that requires you to spend more time thinking about code than it does to actually type it, typing speed really doesn't matter. Code is like speech in that it is judged by the eloquence, not the speed, of its delivery.
It's similarly much less an issue when you map your keys correctly. Control goes to the left of "A", meta below "/". Much less pinky travel. Sun got this right way back in the 80's with the Type 3 keyboard (vi users prefer its placement of ESC too).
If you're using X11, you can go nuts with xmodmap and get it functioning at least as well as it did on Solaris.
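For example, the classic remap that puts Control on the key to the left of "A" (the Caps Lock position) looks like this in an ~/.Xmodmap, loaded with xmodmap ~/.Xmodmap; the meta-below-"/" half depends on which physical key your board has there, so it's left out of this sketch.

    ! Put Control where Caps Lock normally lives (left of "A")
    remove Lock = Caps_Lock
    keysym Caps_Lock = Control_L
    add Control = Control_L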
I think getting a genuine Sun keyboard beats just remapping keys on a 101/104-key PC keyboard. There are 12 additional keys at the left and top-left of the keyboard just begging to be remapped for your own nefarious purposes. You also get meta keys that are separate from the Alt key, as well as Compose and AltGr keys for your åçcéñtêd character needs.
Plus when you look down and see the Sun logo, you can reminisce about the old days and have a good cry at your desk.
Apple also does well in their modifier key placement by having a narrower space bar that extends from "C" to "M" on most keyboards, meaning the modifier keys next to the space bar are easily reachable with thumbs.
I had VERY bad RSI and had tried everything under the sun. Moving to the Kinesis stopped it dead. No more typing pain. Warning: it does take a bit to get used to.
I was trying to work something like this out about a month ago but had to put it aside for later. Running my speech recognition inside a virtual machine was a dealbreaker, but that's not all that uncommon for people doing this sort of thing. I really, really wanted to get Julius[1] running in OS X but after a couple of tries I couldn't get it to build (a problem on my end - this is a good reminder to get it sorted out). If you're looking for an alternative to CMU Sphinx that's still FOSS, you really should check Julius out. There are plenty of docs on getting it running with languages other than Japanese. If you're curious about how well it can work, check out this[2] demo (requires Chrome).
It seems like this demo is not using Julius, but it's mixing messages a bit. The bottom of the page says "Service provided by Google Inc.", but the link right next to it (for downloadable software, also apparently called "kiku"?) says Julius etc.
The OS X "version" is a nightmare. It's guaranteed to break with every major OS release. Nuance takes months to release working versions. When it does work, it's hostile to any other apps that use the accessibility hooks, such as Text Expander, Alfred, etc., which would be awesome with speech input.
The history of the Mac version (acquisition of a company that licensed the Dragon engine) means that it and the Windows versions are very likely permanently divergent. Given the relative market sizes, the Windows version has the best development, the best recognition, and the least schizophrenic product support.
I am glad that dictation (apparently powered by Nuance's engine anyway) is to be included in Mavericks, including a disconnected (i.e., non-Siri) mode. Maintaining an application with a skeleton crew and relying on system services that change at a fundamental level every couple years is not a path to customer satisfaction.
> I am glad that dictation (apparently powered by Nuance's engine anyway) is to be included in Mavericks
I'd missed that, very interesting. I need a disconnected mode, as only being able to dictate short passages - and especially using an online system that doesn't learn from corrections - is a pain.
Where is it backed up that it's faster than the keyboard?
For the couple of minutes I watched of him demoing it... I type waaaay faster than that. In fact, I can't possibly imagine how I could speak faster than I can code on the keyboard.
(Regular English sentences are another story, but code is full of important punctuation, exact cursor positioning, single characters, etc...)
I mean, this is awesome for people with trouble typing (which was my own case a few months back), but I don't think it needs to be over-sold by being "better"...
I think this is a silly point of contention. If I recall correctly, it's established that for English-language prose, speech recognition is easily faster (300+ wpm) than typing (150-200 wpm if you're good; 20-50 wpm typical, IIRC).
All he needs to establish is that he can do things like type aVariableNameLikeThis in six words (16% overhead) instead of fifteen[0] (200% overhead) and the rest of the claim follows.
[0] If you tried to type it using the out-of-box dictation in, say, Android or Dragon, you'd probably start with something like "lowercase a backspace uppercase variable backspace uppercase name..."
Whenever I see posts about voice controlling your computer, I spontaneously think "thank the heavens I don't have to share an office with you." I realize some people work alone, at home or in a sound proof office, but every work environment I've worked in has had a shared acoustic space.
These voice control schemes almost always end up as a cool gimmick, and rarely as a productivity boosting solution.
Because you're thinking about it wrong. Together with a HUD, it will be a godsend for anybody who needs to have hands free and yet work with a computer. And if the microphone is close enough to your mouth, you won't have to talk loudly to it.
For example, I could go out to tend the garden and still think about some problem, take notes, even code. Or check email, browse the internet. I can work on a hardware thing and have schematics or specifications appear in front of my eyes. I can have a walk and take notes. I can eat while working.
Eventually, no office will be required. You can just stroll in the park and get the work done.
None of those use cases seems like something I would find useful, and talking with my mouth full doesn't seem convenient; I'm guessing your recognition rate would go way down.
While I've never been able to adapt to using voice to code, what I have done successfully is use Dragon to document my code. I set up some macros that could move forwards and backwards between methods in Eclipse, added a "start doc" macro...Eclipse does a lot of very smart completion so basic features in Dragon handled it without difficulty.
I have a relatively small working memory, and I've been coding since I was a little kid. Coding is like thinking out loud for me.
My default way to work is to bang some stuff into an editor and then constantly revise and reshape it. I'll draw diagrams on paper or white-board as necessary. I also tend to cut and paste "code notes" into a separate window so I don't have to keep that in my head.
I like it a lot. I wish there were a solution to tie this to, say, Google Glass, and be able to go on a walk or sit in the woods and code or take notes with it, hands-free. Or while cooking or doing laundry, etc.
It's unfortunate he couldn't get the OSS speech recognition to work, though.
Yea, Google Glass would be ideal for DoucheScript Brogramming. Everyone could listen to you reindent your code while you held up the line at Starbucks.
Was just thinking of a way to be able to code on the subway. While it could annoy some, I'm often annoyed by stupid conversations on the subway. Can't close ears.
Just watched it and I find it awesome, not just for the voice recognition but also as a nicely narrated video of VIM usage. I learned some nice things that I will now use more regularly in VIM.
I disagree. You could argue that a musician probably thinks "I have to play a D# for one and a half beats" as well. Or they can draw a dotted quarter on the sheet. We have symbolic languages for a reason - they are, once learnt, superior. If anything code needs to move further away from spoken language, more in the direction of APL and its descendants.
A skilled musician likely doesn't engage the speech centres of their brain, they see a note on the sheet and translate it to motion. You should be able to take in the symbol for "apply a function to each item in a vector" at a glance without any clumsy English getting in the way. APL had it right, but coding has been crippled by catering to the lowest common denominator.
"they see a note on the sheet and translate it to motion"
Indeed. I think notes are more 'human' than most programming languages.
If the music goes up, the notes go up. If the notes are short they look short (and more dense).
But I agree that typing "let a be the substring of b from 1 to the end" is no fun. So I'm glad we have symbolic languages. But I think they could be made more 'human'.
It isn't about English, but getting closer to the way programmers think. Most people don't think b.substring(1) natively any more than a musician would think "Da Capo al Coda". There are good parts of course; b[1:] is about as natural as ♩. for notation.
That was a fun talk to watch. Someone should try something similar using some kind of brainwave detecting glass gear to make it possible to code by simply thinking. That'd be awesome.
Brainwave tech doesn't really get that kind of bandwidth without implants (and even then, interpreting the signals usefully is decades out). The skull is an unfortunately effective Faraday cage, and it makes it impossible to get appropriately high-resolution and low-latency data. Maybe we'll figure it out eventually, but we're not even close right now.
Who makes the best speech recognition software in the world? Regardless of whether it is available to consumers ... who is the best at it?
In particular, how do Apple (Siri) and Google (Google Now) compare to Nuance's stuff? Is Nuance so far ahead of everyone else that they're the clear leader? Or is their codebase "legacy" and vulnerable to better, more accurate software which can be built now due to better algorithms and approaches?
Wow! That would lead one to speculate that perhaps they haven't had the best of engineering teams focused on improving the product over the years! Which means there might be a huge opportunity here.
A word of warning -- I started dictating all of my email and Facebook replies on my Android using Google's voice keyboard on my Nexus One a few years ago in response to RSI pain in my hands from overusing my cell phone. Within a month, I started losing my voice.
RSI comes in multiple forms; using your voice exclusively is not going to fix the problem. The trick is to switch things up, which involves having alternatives in the first place.
Those vocal exercises singers do seem silly until you run into a problem such as this. They've been working on getting more mileage out of their larynxes for hundreds of years and have some pragmatic practices that can help.
Lots of water, avoiding nastiness in the air, learning the bare minimum volume of air you can push through your throat and still get results, and taking breaks when your body (either by feel or sound) tells you that it's tired.
In this specific case, adding leverage with short macros such as "laip" and "slap" is essential. There's no way you could work a full day spelling everything that wasn't in the recognizer's dictionary.
In the video he mentions that he wishes he had known about the previous talk. Looked it up - http://pyvideo.org/video/1706/plover-thought-to-text-at-240-.... Pretty interesting. They are applying court reporter techniques to coding, cutting down on the keystrokes immensely.
If you could speak a bit softer with this, maybe throw in some noise-cancelling headphones, I could totally see this being useful even in an office situation.
I could see a potential pseudo-language developing out of this to abstract a lot of the individual characters, functions and common invocations used while coding.
Here's an open source Python script I wrote a few years ago that allows you to type with your voice. It's based on CMU Sphinx. The accuracy is almost certainly not as good as Dragon, and it doesn't have a macro facility, so you cannot code as fast as typing. I haven't improved it much over the past few years because my hands got better and I don't need it anymore.
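For anyone who wants to experiment with the same general approach (offline recognition feeding keystrokes), a bare-bones sketch using the SpeechRecognition package's PocketSphinx backend plus pyautogui might look roughly like this. It is purely illustrative and not the script mentioned above; accuracy and microphone setup will vary a lot.

    # Rough sketch: offline voice-to-keystrokes via CMU PocketSphinx.
    # Assumes: pip install SpeechRecognition pocketsphinx pyautogui
    import speech_recognition as sr
    import pyautogui

    recognizer = sr.Recognizer()

    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        while True:
            audio = recognizer.listen(source)
            try:
                text = recognizer.recognize_sphinx(audio)  # offline decode
            except sr.UnknownValueError:
                continue  # couldn't understand; listen again
            pyautogui.typewrite(text + " ")  # type the result wherever focus is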
Hi, I'm the guy in the video. You might also be interested in a presentation I gave last Sept at Strangeloop with a much longer demo of coding in Clojure and Elisp: http://www.infoq.com/presentations/Programming-Voice
What's the next big leap for speech to text programming? A language designed specifically to be speakable, ie, all keywords and no symbols?
I mean, I'd like speech recognition to get more natural error correction, drawing more from the way we use inflection to give feedback about which syllables to correct. (I love how Google on mobile now gives visual indication of which syllables it heard clearly, and which it didn't. I just wish it would understand when I shout "No, X not Y" to replace just that one misheard word.)
It'd be interesting to hear about where voice is heading from someone who uses the technology far more.
There's a lot of potential for multimodal gamified programming using tablets. A combination of gesturing, shaking the tablet, facial expressions, hand drawing, Myo sensing, and speech, in addition to machine learning in the compiler and for regular expression building. Within the next year a whole raft of apps along these lines will be coming online in the app stores. Big opportunity for indie developers on the app store; you can easily charge $20+ if they're good and disrupt the emacs/vi/eclipse monopoly/monotony.
This is a cool project, as I think a voice interface would be the ultimate in computing, something like in "2001: A Space Odyssey" or "Star Trek."
I remember first playing with voice recognition and voice command on a PPC Mac back in 1994.
That the technology hasn't progressed along the same lines as cell phones and processors is testament to how difficult voice recognition actually is when dealing with a wide variation of dialect within any given language.
I would love to be able to use my voice as my main input to my computers and other devices.
Interesting talk. Naturally it made me think about steps I should take to prevent any kind of RSI. Should I be seriously concerned if I type for about 4-5 hours on average per day? How can I prevent it?
Anecdotal: I was getting soreness in my finger joints, and about that time went to a presentation talking about repetitive motion causing arthritis for a lot of typists. It was pretty grisly. Padding in finger joints wears down, and little chips of bone start breaking off, causing pain from bone chips and realignment of fingers to fit the new bone faces. Padding restores with rest, so it helps a lot to catch it early.
I bought a couple nice mechanical keyboards with Cherry switches (red and brown). I type very lightly on them, seldom bottoming out the keys. Finger troubles went away.
I wonder if we should also be voice coding in a language drastically different from, for example, C++? Maybe a language more syntactically friendly for voice?
One of the rules of Forth was that you had to provide a standard pronunciation with the documentation of all your words, so you could speak Forth code over the phone. That was important when words consist of any sequence of characters or punctuation, delimited by spaces.