A few months ago I had an RSI problem so bad - I could type only a minute at a time, and even sitting with my hands on the keyboard hurt - that I started down this route. This video was, literally, a life-altering motivator for me, and I was quite obsessed with it.
Ironically, after seeing a physical therapist - which, let me tell you, you should do at the first sign of pain, because while they can't help some people I personally am batting 1.000 with PTs for RSI over my many-year career - my recovery is now so complete that I've totally fallen off the voice-computing path... for now. But I intend to keep going, not just because it is hilarious but because, well, RSI happens and it really pays to vary the routine sooner rather than later. There is nothing like trying to do a ton of emergency scripting in Python and emacs at the lowest possible point of your productivity.
The most important hint I have so far is: do not waste time with Mac OS. You need a PC running the Windows version of Dragon. The Mac version is pretty good for occasional email but lousy for emacs because it doesn't have the Python hook into the event loop that a saint hacked into the PC version years ago before leaving Dragon.
The speechcomputing.com forums are your friend.
Yeah, they say there is an open-source recognition engine that works okay, and time spent improving free recognition engines is time that really improves the world for all kinds of injured people, but here's the problem: when you need a speech system you really need it, and there are a lot of moving parts. Dragon, and Windows, and a super PC to run it on are super cheap compared to your time, especially when your time is in six-minute increments punctuated by pain.
As someone with a disability (quadriplegic), who types/codes with one finger, I find it appalling that Nuance, Apple and Google haven't opened up their speech recognition systems through a rudimentary API that would allow innovation that would _directly_ help the lives of me and many other disabled people whether it's RSI or worse.
It was a shock to me to discover that the livelihood and happiness of so many people depends on a dubiously-reliable unofficial API that was hacked into Dragon years ago and that has been lovingly preserved ever since, just below the radar. It feels like being critically dependent on Windows 95.
I guess it depends on the type of software you're working on, but input speed has never been close to being the bottleneck with coding for me...
Most of the time I'm trying to figure out what to do or how to implement an algorithm. Rarely do I get those mad-scientist frenzies where I'm typing away frantically trying to get all the words down as they come into my mind in a flash of inspiration.
I've worked with people who are skilled developers and who can't even touch type. They have a slow-paced, methodical way of working. Many look at the keyboard over their glasses and hunt and peck. The professor who ported Plan 9 to the Raspberry Pi (recent video here) is an example of this approach.
On the other hand, I have a shocking memory and can't hold context for long. Sometimes I come to write a piece of code and find that I wrote it last week and can't remember a thing about it.
I work by crashing through. I stalk the problem, procrastinate, drink tea, write short essays about what's stopping me from getting started. Eventually I get the whole problem in my head, and then need to get it down and done before I get tired. When I'm in this state and I need to solve a problem that I could use a standard library function for, often I'll just hammer out code to make the problem go away (list comprehension, string manipulation and the like) in order not to put any extra load on my short-term memory or create a distraction. Raw typing speed is very important. A drop in pace would hurt a lot.
Have you tried literate programming? It is a way to download your thought process into the "code". Particularly good for those who like to write.
An example tool to create it is my own program at https://github.com/jostylr/literate-programming which uses markdown as the syntax. While the examples are web language-flavored, it can be used with any language.
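A generic illustration of the style (deliberately not claiming to be the exact syntax of the tool above): a markdown file where the prose carries the reasoning and the indented code chunks get assembled into the program file.

    # Mean of a list

    We want the arithmetic mean, and we guard the empty case explicitly
    so callers get a clear error rather than a ZeroDivisionError.

        def mean(xs):
            if not xs:
                raise ValueError("mean of an empty list")
            return sum(xs) / len(xs)

    A literate tool then extracts the code chunks, in the order the
    narrative dictates, into mean.py.

The point is that the document reads top to bottom as an essay about the problem, and the runnable source falls out as a by-product.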
It's one thing to not be able to write code as fast as you can type. It's another to use a speech to text input method that's designed for long-form prose and try to use it to code. Can you imagine the frustration of trying to enter longCamelCaseVariableNames without a special macro to do so? I don't know the usual commands in Dragon, but I imagine it would be something like: "long delete space uppercase camel delete space uppercase case delete space upper case variable delete space uppercase names", possibly with a few false starts and undos in there as it interprets some of your words as commands rather than code.
To experience something like it, try using your phone keyboard, with word prediction on, to write code. It will be slow, and frustrating, and have a lot of false starts.
There's a big difference between "not the fastest way to enter text" and "so slow it's unusable", and the impression I get is that without extensive macros like this, most speech to text systems are so slow as to be unusable for writing code.
That's kinda the point of this article. He's got a bunch of macros and idiosyncratic commands.
At 11:30 in the video:
"Camel this is a test" -> thisIsATest
"Studly this is a test" -> ThisIsATest
"Jive this is a test" -> "this-is-a-test"
"Dot word this is a test" -> "this.is.a.test"
"Score this is a test" -> "this_is_a_test"
"Mara" -> selects all text on screen
"Chik" -> delete
Yes, that was my point. I watched the video, and picked the camel case example from it.
I was replying to someone who said that input speed is not the main bottleneck in coding, hence implying that it's not all that useful to do things to improve input speed. While I concede that input speed is not the primary bottleneck, my point is that without macros like this to speed it up, voice input would be way too slow to do anything useful.
I frequently have times when I'm doing things like "writing articles on what's stopping me from coding" and the like. But for me, when I'm on, I'm ON, and in those periods, being able to put code together reliably and quickly is of utmost importance.
Even with the macros and shortcuts he shows, I still would be slower using a system like that. When I'm typing in a good editor, I can blast out code VERY quickly, and when I've typed it, I KNOW it's what I meant. When he says it, he has to stop and look to ensure the code matches what he said.
Yes, he can say a phrase like "camel someVariableName" quickly, and sometimes it Just Works, but when it doesn't, he has to back up and say it again. That kind of distraction can throw me off my train of thought, and the damage to my productivity would be profound.
That said, it still IS great for anyone with an RSI as an alternate way to enter code. I just don't buy the "it could be better even for people who don't need it" argument. Especially with his claim that I would need to abandon my modern editor with awesome language support for one of those relics that relies on CTAGS.
#3 is often the equivalent of taking a walk or a shower, or walking in the shower. It's enough of a context shift that your brain will forget the inessential and you'll notice the pattern you were hoping to extract.
I think it's one of the great things about working with extensible tools and having a tool-building mindset. You can maintain momentum while relaxing your brain from working on a seemingly intractable problem.
No, but the less time you spend going through the mechanical action of translating thought to code, the more time you have to focus on solving problems.
It usually happens during refactoring, or when you're doing something you've done before, so you already know pretty much exactly what needs to be done and you're just executing.
It also helps to stop coding for a minute, think what you want to do, then code until you stop typing for more than 5 seconds, repeat.
When refactoring: Instead of typing fast, use ST multicursors or VI/emacs macros in an intelligent way. And I really really recommend ST if you don't want to debug your editor macros before debugging your build macros before debugging your code macros before debugging your code (yo dawg etcetera).
On the other end are us visual thinkers. I could do all of that proficiently just fine. In fact I do plenty of macros and scripts and so on. But at the end of the day, I think in pictures. I end up using a lot of editor short cuts and "lots of keypresses" style refactoring while I work out the shape I really want. Then, when I get that, fire off some macros to deal with the rest, cleanup, etc.
People say the same thing about learning a good text editor. Personally I find that while I spend most of my time thinking, when it comes time to enter or edit code it helps a lot if I can do it as quickly as possible. That way I stay in flow instead of getting bored and clicking over to HN.
I was never sure which side of this argument I came down on, and then I switched to a Kinesis Advantage keyboard.
I had to slow down my typing for a couple weeks to get the finger positions right. The whole time, I felt like I was coding with a hangover. I felt like I couldn't think properly, just because of the reduced brain->computer bandwidth.
Yes. After 2 weeks, I didn't feel handicapped. After 4 weeks I could type as fast as before (75-80wpm). Now, about 6 months later, I can type 95wpm on a good day.
Tangentially related, but I'll throw it in here, since so many developers aren't taking ergonomics seriously. RSI can happen to you if you are not careful, and it can wreck your career (almost happened to me). Several years ago, I started having aches in my arms. Over half a year it got gradually worse, until it was so bad, I thought I had to give up coding altogether. Fortunately, I managed to get it under control, mostly with the aid of a break program, and an ergonomic keyboard and mouse. I'm now completely over it, but I still need to be careful not to get it back. A lot more details in this post: http://henrikwarne.com/2012/02/18/how-i-beat-rsi/
Personal anecdote: I correlated my RSI directly to drinking coffee (tea is okay). I notice when I'm caffeinated that my posture is very different and I hold postures (e.g. holding down the shift key) for much longer. If RSI starts to blight you, try substituting your morning coffee for tea or water. For me, a break program just increased the stress levels of 'wanting to get something done', which I think is the root cause of RSI (stress).
> try substituting your morning coffee for tea or water.
Syntax [edit:] tip:
"try substituting tea or water for your morning coffee"
or
"try replacing your morning coffee with tea or water"
EDIT: For the downvoters: Fairly or unfairly, in the non-tech world people judge you by your choice and arrangement of words. (Compilers do much the same thing, of course.)
Nice people also judge those who try to shame people (not all of whom are native speakers of English) into silence on health-related forum threads by picking on irrelevancies.
Yes, let's ignore non-native English speakers' mistakes. That way, they will never learn, and we can continue to subjugate them, along with those for whom English is a first language, but cannot speak it correctly, probably because they were never taught that "should have" is not spelt "should of" and that their "they're"s aren't quite there.
"Spelt", in my dialect, is incorrectly spelled, and is a noun referring to a variety of wheat.
Now, was the "correction" I just offered you
effective and useful, or was it merely irrelevant, provincial, chauvinistic, uninvited, uninviting, and just plain rude?
(A note to downthread grammar trolls: I just used an Oxford comma, boldly, without apology. Have fun.)
> those who try to shame people ... into silence on health-related forum threads
That's a bit overstated. But mechanical_fish is right that my own choice of words could have been more tactful. That's why I changed "Syntax correction" to "Syntax tip" in the GP.
I like to deliver my occasional spelling correction comments like this: "Polite spelling correction: word, not werd". Opening the comment with "polite" seems to be a very good way to flag that you're not trying to engage in any power games or whatnot.
For the life of me, I can't understand what you found wrong with that use of "substituting". It's correct. It's clear. It's perhaps less colloquial, but it's hardly inscrutable tech jargon.
Surely there are better targets for your editor's urges...
> For the life of me, I can't understand what you found wrong with that use of "substituting". It's correct. It's clear. It's perhaps less colloquial, but it's hardly inscrutable tech jargon.
It's an issue of standard word meaning, not tech jargon. In the context of what 'muxxa appeared to be saying, his (or her?) use of substituting was exactly backwards.
What 'muxxa said was that caffeine seemed to exacerbate his RSI, and that substituting his morning coffee for tea or water helped. But the conventional use of the verb to substitute is to put or use in the place of another [1].
According to that conventional usage, therefore, 'muxxa was recommending putting his morning coffee in the place of tea or water. That seems to be exactly the opposite of what he was saying in the rest of the paragraph about the adverse effect of caffeine on his RSI.
Interestingly, when my RSI was bad and I was writing my PhD thesis with NaturallySpeaking, I noticed that voice fatigue was directly related to drinking coffee, too. The more coffee I drank, the more tired my voice would be at the end of the day. Then I mentioned this to a singer friend and she basically said "of course, every singer knows that caffeine is bad for your voice."
My counter-argument to voice-driven coding has been primarily around the input bandwidth and the fact that you must work from home with that kind of setup.
I guess the presenter conducted the "faster than the keyboard" test under very controlled circumstances (e.g. only working on his own code, so one doesn't have to deal with non-english-word variables/functions).
I don't mean to be a hater, because that was an _amazing_ demo, but I don't believe it's the holy grail the title implies it is.
It is a limitation, but when your other choice is "not working at all, pain, depression, despair" having to work at home is the least of your problems.
I have a grimmer point to make: Working out of crappy half-assed "startup incubators" with lousy desks, lousy seating, and an atmosphere flavored with stress was a direct contributor to my own RSI problems. You might not want to wait until you have symptoms to conclude that having an actual desk and some quiet is a good idea.
A high-quality headset, maybe even with some noise-canceling features, should work okay, too. It wouldn't be that much different from a call center, and those don't usually get their own offices, either.
Sure, not the ideal, distraction-free environment, but neither is a cubicle farm.
Really, Dragon can't cope with someone sitting ten feet away and speaking at the same time? So I guess no listening to the radio, either. Is it just the specifics of speech, or is it that noise-sensitive?
Maybe someone should do some kind of voice rec "groupware" then, where the relatively louder results of the other person are used to filter out false positives on my end...
The mic I used in the video can actually cope with very noisy environments. With lesser mics, speech recognition is useless with even mild background noise.
Mentioned this on HN previously but as a nearly 40 year old developer who has been developing professionally for nearly 20 years -- it used to be the norm for programmers to get their own offices, even just the regular joe programmers... Places really tight for space might put two guys in a very spacious shared corner office...
A few years into my career the idea of cubicles caught on and quickly became the norm, and now of course we're stuck with these horrible open offices that are, in my experience, just absolutely dreadful for productivity; but since everyone is doing it nobody really notices anymore.
> A few years into my career the idea of cubicles caught on and quickly became the norm, and now of course we're stuck with these horrible open offices that are, in my experience, just absolutely dreadful for productivity; but since everyone is doing it nobody really notices anymore.
You can largely thank Jim McCarthy of Microsoft fame for that; in the mid-90s he coined the concept "beware of a guy in a room".
It's funny. At the place I'm at now - a successful post-IPO SaaS company in the valley - management believes that open space and engineers running around, yelling and waving hands are a sign of productivity.
I have to escape to the kitchen to get anything requiring the tiniest level of concentration.
Yep. The difference: in one role (management), the conditions you're describing read as signals of activity. In the other role (development), they read as noise interfering with the activity you're working on.
A "successful" company probably already has a culture that's going to be hard to change, but where management is trainable, you can sometimes improve things by giving them something else like else to focus on, like commit logs, test suites, or ticket updates.
Glad you bring that up. I'm a manager. I do mostly the things you mentioned to give my team some breathing room, and have to WFH when I want anything serious done.
As you can imagine, if the ambience is bad for me, it is horrible for my team. I try to help with some WFH days here and there. But it is a culture thing. It's in the freaking DNA of the place. There's only so much I can change.
>My counter-argument to voice-driven coding has been primarily around the input bandwidth and the fact that you must work from home with that kind of setup.
I wonder how long it will take for reliable subvocal speech reading a-la [1] to become available in consumer products. It could potentially solve not only this problem but a lot of problems related to the use of cell phones in public spaces.
"Emacs pinkie" is a non-issue if you use a keyboard with thumb clusters, e.g a Maltron or a Kinesis model. Investing in a good keyboard is just as crucial as investing in a good chair, especially if you make a living by coding. The time that you spend compensating for a bad input device by hacking your own workarounds can be more costly then spending money on a proper solution.
Once you are an adequate touch typist typing speed is only beneficial if you use a language that requires you to type a lot of boilerplate. Even then, you can use an IDE for auto-completion. I can type at very high speeds — as fast as others can input text by using their voice — but I can't remember the last time I needed to type for more than a minute at a time. If you use a language that requires you to spend more time thinking about code than it does to actually type it, typing speed really doesn't matter. Code is like speech in that it is judged by the eloquence, not the speed, of its delivery.
It's similarly much less an issue when you map your keys correctly. Control goes to the left of "A", meta below "/". Much less pinky travel. Sun got this right way back in the 80's with the Type 3 keyboard (vi users prefer its placement of ESC too).
If you're using X11, you can go nuts with xmodmap and get it functioning at least as well as it did on Solaris.
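For example, the classic remap that puts Control on the key to the left of "A" (the Caps Lock position) looks like this in an ~/.Xmodmap, loaded with xmodmap ~/.Xmodmap; the meta-below-"/" half depends on which physical key your board has there, so it's left out of this sketch.

    ! Put Control where Caps Lock normally lives (left of "A")
    remove Lock = Caps_Lock
    keysym Caps_Lock = Control_L
    add Control = Control_L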
I think getting a genuine Sun keyboard beats just remapping keys on a 101/104-key PC keyboard. There are 12 additional keys at the left and top-left of the keyboard just begging to be remapped for your own nefarious purposes. You also get meta keys that are separate from the Alt key, as well as Compose and AltGr keys for your åçcéñtêd character needs.
Plus when you look down and see the Sun logo, you can reminisce about the old days and have a good cry at your desk.
Apple also does well in their modifier key placement by having a narrower space bar that extends from "C" to "M" on most keyboards, meaning the modifier keys next to the space bar are easily reachable with thumbs.
I had VERY bad RSI and had tried everything under the sun. Moving to the Kinesis stopped it dead. No more typing pain. Warning: it does take a bit to get used to.
I was trying to work something like this out about a month ago but had to put it aside for later. Running my speech recognition inside a virtual machine was a dealbreaker, but that's not all that uncommon for people doing this sort of thing. I really, really wanted to get Julius[1] running in OS X but after a couple of tries I couldn't get it to build (a problem on my end - this is a good reminder to get it sorted out). If you're looking for an alternative to CMU Sphinx that's still FOSS, you really should check Julius out. There are plenty of docs on getting it running with languages other than Japanese. If you're curious about how well it can work, check out this[2] demo (requires Chrome).
It seems like this demo is not using Julius, but it's mixing messages a bit. The bottom of the page says "Service provided by Google Inc.", but the link right next to it (for downloadable software, also apparently called "kiku"?) says Julius etc.
The OS X "version" is a nightmare. It's guaranteed to break with every major OS release. Nuance takes months to release working versions. When it does work, it's hostile to any other apps that use the accessibility hooks, such as Text Expander, Alfred, etc., which would be awesome with speech input.
The history of the Mac version (acquisition of a company that licensed the Dragon engine) means that it and the Windows versions are very likely permanently divergent. Given the relative market sizes, the Windows version has the best development, the best recognition, and the least schizophrenic product support.
I am glad that dictation (apparently powered by Nuance's engine anyway) is to be included in Mavericks, including a disconnected (i.e., non-Siri) mode. Maintaining an application with a skeleton crew and relying on system services that change at a fundamental level every couple years is not a path to customer satisfaction.
> I am glad that dictation (apparently powered by Nuance's engine anyway) is to be included in Mavericks
I'd missed that, very interesting. I need a disconnected mode, as only being able to dictate short passages - and especially using an online system that doesn't learn from corrections - is a pain.
Where is it backed up that it's faster than the keyboard?
For the couple of minutes I watched of him demoing it... I type waaaay faster than that. In fact, I can't possibly imagine how I could speak faster than I can code on the keyboard.
(Regular English sentences are another story, but code is full of important punctuation, exact cursor positioning, single characters, etc...)
I mean, this is awesome for people with trouble typing (which was my own case a few months back), but I don't think it needs to be over-sold by being "better"...
I think this is a silly point of contention. If I recall correctly, it's established that for English-language prose, speech recognition is easily faster (300+ wpm) than typing (150-200 wpm if you're good; 20-50 wpm typical, IIRC).
All he needs to establish is that he can do things like type aVariableNameLikeThis in six words (16% overhead) instead of fifteen[0] (200% overhead) and the rest of the claim follows.
[0] If you tried to type it using the out-of-box dictation in, say, Android or Dragon, you'd probably start with something like "lowercase a backspace uppercase variable backspace uppercase name..."
Whenever I see posts about voice controlling your computer, I spontaneously think "thank the heavens I don't have to share an office with you." I realize some people work alone, at home or in a sound proof office, but every work environment I've worked in has had a shared acoustic space.
These voice control schemes almost always end up as a cool gimmick, and rarely as a productivity boosting solution.
Because you're thinking about it wrong. Together with a HUD, it will be a godsend for anybody who needs to have hands free and yet work with a computer. And if the microphone is close enough to your mouth, you won't have to talk loudly to it.
For example, I could go out to tend the garden and still think about some problem, take notes, even code. Or check email, browse the internet. I can work on a hardware thing and have schematics or specifications appear in front of my eyes. I can have a walk and take notes. I can eat while working.
Eventually, no office will be required. You can just stroll in the park and get the work done.
None of those use cases seems like something I would find useful, and talking with my mouth full doesn't seem convenient; I'm guessing your recognition rate would go way down.
While I've never been able to adapt to using voice to code, what I have done successfully is use Dragon to document my code. I set up some macros that could move forwards and backwards between methods in Eclipse, added a "start doc" macro...Eclipse does a lot of very smart completion so basic features in Dragon handled it without difficulty.
I have a relatively small working memory, and I've been coding since I was a little kid. Coding is like thinking out loud for me.
My default way to work is to bang some stuff into an editor and then constantly revise and reshape it. I'll draw diagrams on paper or white-board as necessary. I also tend to cut and paste "code notes" into a separate window so I don't have to keep that in my head.
I like it a lot. I wish there were a solution to tie this to, say, Google Glass, and be able to go on a walk or sit in the woods and code or take notes with it, hands-free. Or while cooking or doing laundry, etc.
It's unfortunate he couldn't get the OSS speech recognition to work, though.
Yea, Google Glass would be ideal for DoucheScript Brogramming. Everyone could listen to you reindent your code while you held up the line at Starbucks.
Was just thinking of a way to be able to code on the subway. While it could annoy some, I'm often annoyed by stupid conversations on the subway. Can't close ears.
Just watched it and I find it awesome, not just for the voice recognition but also as a nicely narrated video of VIM usage. I learned some nice things that I will now use more regularly in VIM.
I disagree. You could argue that a musician probably thinks "I have to play a D# for one and a half beats" as well. Or they can draw a dotted quarter on the sheet. We have symbolic languages for a reason - they are, once learnt, superior. If anything code needs to move further away from spoken language, more in the direction of APL and its descendants.
A skilled musician likely doesn't engage the speech centres of their brain, they see a note on the sheet and translate it to motion. You should be able to take in the symbol for "apply a function to each item in a vector" at a glance without any clumsy English getting in the way. APL had it right, but coding has been crippled by catering to the lowest common denominator.
"they see a note on the sheet and translate it to motion"
Indeed. I think notes are more 'human' than most programming languages.
If the music goes up, the notes go up. If the notes are short they look short (and more dense).
But I agree that typing "let a be the substring of b from 1 to the end" is no fun. So I'm glad we have symbolic languages. But I think they could be made more 'human'.
It isn't about English, but getting closer to the way programmers think. Most people don't think b.substring(1) natively any more than a musician would think "Da Capo al Coda". There are good parts of course; b[1:] is about as natural as ♩. for notation.
That was a fun talk to watch. Someone should try something similar using some kind of brainwave detecting glass gear to make it possible to code by simply thinking. That'd be awesome.
Brainwave tech doesn't really get that kind of bandwidth without implants (and even then, interpreting the signals usefully is decades out). The skull is an unfortunately effective Faraday cage, and it makes it impossible to get appropriately high-resolution and low-latency data. Maybe we'll figure it out eventually, but we're not even close right now.
Who makes the best speech recognition software in the world? Regardless of whether it is available to consumers ... who is the best at it?
In particular, how do Apple (Siri) and Google (Google Now) compare to Nuance's stuff? Is Nuance so far ahead of everyone else that they're the clear leader? Or is their codebase "legacy" and vulnerable to better, more accurate software which can be built now due to better algorithms and approaches?
Wow! That would lead one to speculate that perhaps they haven't had the best of engineering teams focused on improving the product over the years! Which means there might be a huge opportunity here.
A word of warning -- I started dictating all of my email and Facebook replies on my Android using Google's voice keyboard on my Nexus One a few years ago in response to RSI pain in my hands from overusing my cell phone. Within a month, I started losing my voice.
RSI comes in multiple forms; using your voice exclusively is not going to fix the problem. The trick is to switch things up, which involves having alternatives in the first place.
Those vocal exercises singers do seem silly until you run into a problem such as this. They've been working on getting more mileage out of their larynxes for hundreds of years and have some pragmatic practices that can help.
Lots of water, avoiding nastiness in the air, learning the bare minimum volume of air you can push through your throat and still get results, and taking breaks when your body (either by feel or sound) tells you that it's tired.
In this specific case, adding leverage with short macros such as "laip" and "slap" is essential. There's no way you could work a full day spelling everything that wasn't in the recognizer's dictionary.
In the video he mentions that he wishes he had known about the previous talk. Looked it up - http://pyvideo.org/video/1706/plover-thought-to-text-at-240-.... Pretty interesting. They are applying court reporter techniques to coding, cutting down on the keystrokes immensely.
If you could speak a bit softer with this, maybe throw in some noise-cancelling headphones, I could totally see this being useful even in an office situation.
I could see a potential pseudo-language developing out of this to abstract a lot of the individual characters, functions and common invocations used while coding.
Here's an open source Python script I wrote a few years ago that allows you to type with your voice. It's based on CMU Sphinx. The accuracy is almost certainly not as good as Dragon, and it doesn't have a macro facility, so you cannot code as fast as typing. I haven't improved it much over the past few years because my hands got better and I don't need it anymore.
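For anyone who wants to experiment with the same general approach (offline recognition feeding keystrokes), a bare-bones sketch using the SpeechRecognition package's PocketSphinx backend plus pyautogui might look roughly like this. It is purely illustrative and not the script mentioned above; accuracy and microphone setup will vary a lot.

    # Rough sketch: offline voice-to-keystrokes via CMU PocketSphinx.
    # Assumes: pip install SpeechRecognition pocketsphinx pyautogui
    import speech_recognition as sr
    import pyautogui

    recognizer = sr.Recognizer()

    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        while True:
            audio = recognizer.listen(source)
            try:
                text = recognizer.recognize_sphinx(audio)  # offline decode
            except sr.UnknownValueError:
                continue  # couldn't understand; listen again
            pyautogui.typewrite(text + " ")  # type the result wherever focus is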
Hi, I'm the guy in the video. You might also be interested in a presentation I gave last Sept at Strangeloop with a much longer demo of coding in Clojure and Elisp: http://www.infoq.com/presentations/Programming-Voice
What's the next big leap for speech to text programming? A language designed specifically to be speakable, ie, all keywords and no symbols?
I mean, I'd like speech recognition to get more natural error correction, drawing more from the way we use inflection to give feedback about which syllables to correct. (I love how Google on mobile now gives visual indication of which syllables it heard clearly, and which it didn't. I just wish it would understand when I shout "No, X not Y" to replace just that one misheard word.)
It'd be interesting to hear about where voice is heading from someone who uses the technology far more.
There's a lot of potential for multimodal gamified programming using tablets. A combination of gesturing, shaking the tablet, facial expressions, hand drawing, Myo sensing, and speech, in addition to machine learning in the compiler and for regular expression building. Within the next year a whole raft of apps along these lines will be coming online in the app stores. Big opportunity for indie developers on the app store; you can easily charge $20+ if they're good and disrupt the emacs/vi/eclipse monopoly/monotony.
This is a cool project, as I think a voice interface would be the ultimate in computing, something like in "2001: A Space Odyssey" or "Star Trek."
I remember first playing with voice recognition and voice command on a PPC Mac back in 1994.
That the technology hasn't progressed along the same lines as cell phones and processors is testament to how difficult voice recognition actually is when dealing with a wide variation of dialect within any given language.
I would love to be able to use my voice as my main input to my computers and other devices.
Interesting talk. Naturally it made me think about steps I should take to prevent any kind of RSI. Should I be seriously concerned if I type for about 4-5 hours on average per day? How can I prevent it?
Anecdotal: I was getting soreness in my finger joints, and about that time went to a presentation talking about repetitive motion causing arthritis for a lot of typists. It was pretty grisly. Padding in finger joints wears down, and little chips of bone start breaking off, causing pain from bone chips and realignment of fingers to fit the new bone faces. Padding restores with rest, so it helps a lot to catch it early.
I bought a couple nice mechanical keyboards with Cherry switches (red and brown). I type very lightly on them, seldom bottoming out the keys. Finger troubles went away.
I wonder if we should also be voice coding in a language drastically different from, for example, C++? Maybe a language more syntactically friendly for voice?
One of the rules of Forth was that you had to provide a standard pronunciation with the documentation of all your words, so you could speak Forth code over the phone. That was important when words consist of any sequence of characters or punctuation, delimited by spaces.