Hands-Free Coding: How I develop software using dictation and eye-tracking (joshwcomeau.com)
608 points by joshwcomeau on Oct 21, 2020 | 137 comments

Appreciate all the discussion!

I touch on this in the article, but I should say: I'm an edge-case when it comes to CTS. Most cases resolve spontaneously. Part of it is that my nerve dislocates when I bend my elbow, and this mobility causes a bunch of additional friction / inflammation. This happens naturally to about 13% of the population, and mine is particularly pronounced.

All this to say, if you ever do start to get a burning or tingling in your elbows or wrists, or get numbness in the hands, it'll likely go away on its own, or with conservative treatment (a physical therapist is a great first step!).

And, for those who also fall into my unlucky percentile, hopefully it helps to know that there are tools and a community that exist to let you keep working without needing to use your hands at all!

Ah, that sounds like what I have! Subluxation of the ulnar nerve is what they told me it was - I could feel the nerve popping over my medial epicondyle. We tried steroid shots to reduce inflammation, but it came back after about nine months. It got so bad that I couldn't fully extend my arm or bring it in past ~45 degrees from extended. The inflammation showed up on an MRI. I had surgery to relocate the nerve to the inside of my elbow and have largely recovered from it, though it does flare up every so often.

The scare got me very interested in accessibility, and I've looked into using Talon with eye tracking and stenography via Plover to improve my ergonomics. I've never fully committed to any of them, but I'd love to know what communities exist. A Kinesis Advantage and a vertical mouse seem to keep it at bay on all but the longest of work days.

Does this sort of thing show up on an MRI? (nerve inflammation/mobility)

Mine did. I eventually suffered very limited mobility due to the pain until I got surgery to move the nerve out of the cubital tunnel.

What microphone are you using?

Not sure if this helps but I had a similar issue with my joints (arm, shoulder, wrist) where I would scream in pain when lifting a small book or anything above 3 lbs.

The issue was caused by my immune system acting up, which would cause my tendons to become inflamed. After a year of misdiagnosis, doctors found some heavy medication (methotrexate balanced with plaquenil) to help regulate it. One of the medicine's side effects was listed as ‘death’ on the label.

After a year of experimenting, I found that major diet changes (lots more hearty greens, way less sugar and carbs, no caffeine or alcohol), improved sleep, reduced stress (quit my stressful job) completely alleviated my symptoms. I would still have flare ups from time to time which I reduced via physical therapy / exercise (to strengthen muscles supporting my tendons).

Just sharing as I had a somewhat similar condition and was surprised that the fix didn’t have to be a pill.

Drastically reducing “added” sugars (which includes alcohol), whether end-user added or added as part of the industrial food “manufacturing” process does wonders for one’s health.

Reducing sugar intake reduces inflammation (https://www.health.harvard.edu/heart-health/the-sweet-danger...).

Can confirm. Cut out almost all sugar from my diet. I've seen dramatic improvements in my health with the most significant being the elimination of brain fog.

I had pain and weakness in my right wrist for years, and also had many rounds of misdiagnosis.

It turned out to be a benign tumour. Surgery was my answer, but it's only a partial one. It's a soft tissue tumour, the surgeon didn't expect to get all of it, and it's growing back now. It's been about seven years, and I expect I'll need surgery again within seven more years to be able to do simple tasks like lift objects with that hand still.

Sometimes it's not diet - but it's like the old climate change joke. What if I switched to a better diet and a healthier lifestyle, for no good reason?

Would this happen to be a schwannoma? If so, this was the case for me as well. Took a long time and a lot of sleepless nights before an MRI found it.

Hemangioma in my case. I had an MRI but in the end it was a surgeon with the right experience looking at an earlier ultrasound that diagnosed it.

I had trouble believing he'd diagnosed it when he glanced at the ultrasound and calmly said what it was. Given I'd had medical practitioners send me for neurological studies to check it wasn't all in my head, suggest fusing my wrist to stop the pain, tell me there was nothing visible in X-ray, MRI, or the same ultrasound report the surgeon diagnosed me from... It seemed unlikely anyone could do an apparently simple diagnosis like that.

I was misdiagnosed with inflamed tendons and it turned out to be something else, hah

HN reader, take this message: medicine can be a to and fro. Go back if you're not convinced, and get a second opinion if you think something's still wrong :)

It sounds like I was in the same boat as you but I had multiple issues with my left leg.

Saw multiple specialists and had been prescribed multiple different medications all with the same result of nothing but with new issues thanks to some side effects.

One day I decided I wanted to lose some weight and cut my carb intake down to a very low amount per day. Shortly after making this change to my diet I noticed the problems that I was experiencing with my leg slowly started to go away.

I wouldn't say the problems with my legs have completely gone away due to my diet change but it has improved my quality of life significantly.

Is that "Ankylosing Spondylitis" ? I have a friend that has that...if it's the same thing I'll pass on your advice, and thank you!

Are you me?

Maybe we're the HN equivalent of Fight Club?! Am I Brad Pitt or Edward Norton

Exploratory building of rich non-traditional (not necessarily handless) user interfaces is becoming increasingly accessible. For instance, here's a web demo of Google's MediaPipe's face and iris tracking[1]. And hand tracking[2] with a downward-facing camera, permits enriching keyboard and touchpad/tablets: annotating events with which finger was used, and where on a keycap it was pressed; hand and finger gestures in 2D and 3D; and positional touchless events. And speech to text... sigh.

But doing sensor fusion is hard. And strongly impacts system architecture. "Launch the missiles"... 1000 ms later... oh, nope, that was "Lunch is mussels in butter". "Spacebar keypress event"... 50 ms later... "btw, that was a thumb, at the 20% position". "Ok, so 2000 ms ago, just before the foo, there was a 3D gesture bar". So app state needs to easily roll backward and forward, because you won't fully know what happened now until seconds from now. Upside is traditional "have to wait a bit until we know whether mumble" latencies can be optimistically speculated away.

[1] https://viz.mediapipe.dev/demo/iris_tracking ("run" button is top right) [2] https://viz.mediapipe.dev/demo/hand_tracking
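The "roll backward and forward" idea above can be sketched roughly as an event log that is replayed whenever a late correction arrives. This is only an illustration under stated assumptions: `Event`, `SpeculativeLog`, `upsert`, and `append_payload` are hypothetical names made up for the sketch, not part of MediaPipe or any speech engine, and a real system would probably checkpoint state rather than replay everything each time.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Event:
    timestamp_ms: int  # when the input happened, not when we learned about it
    payload: str


State = Tuple[str, ...]


class SpeculativeLog:
    """Hypothetical log: keeps every event and rebuilds state by replaying them."""

    def __init__(self, reducer: Callable[[State, Event], State]) -> None:
        self._reducer = reducer
        self._events: Dict[str, Event] = {}

    def upsert(self, event_id: str, event: Event) -> None:
        # Recording an existing id revises the earlier guess;
        # a new id may carry a timestamp that lies in the past.
        self._events[event_id] = event

    def state(self) -> State:
        # Replay all known events in the order they actually happened.
        state: State = ()
        for event in sorted(self._events.values(), key=lambda e: e.timestamp_ms):
            state = self._reducer(state, event)
        return state


def append_payload(state: State, event: Event) -> State:
    # Toy reducer: the "app state" is just the ordered list of payloads.
    return state + (event.payload,)


log = SpeculativeLog(append_payload)
log.upsert("speech-1", Event(0, "launch the missiles"))  # optimistic first guess
log.upsert("key-1", Event(1200, "spacebar"))
optimistic = log.state()

# Later the recognizer changes its mind, and a gesture is reported after the fact.
log.upsert("speech-1", Event(0, "lunch is mussels in butter"))
log.upsert("gesture-1", Event(-800, "3D gesture bar"))
corrected = log.state()

print(optimistic)
print(corrected)
```

The design choice here is that state is never mutated in place: it is always derived from the log, so the optimistic result can be shown immediately and replaced once the sensors agree.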

> I've heard that learning Vim can make this much more effective.

This doesn’t surprise me; Talon is a lot like Vim.

They’re both modal: Talon’s command mode roughly matches Vim’s normal mode, and its dictation mode roughly matches Vim’s insert mode.

Talon’s ordinals apply to Vim operations (normal mode) as well: operations can be preceded by a number which normally says how many times to do it. Talon’s “go left ninth” would be 9h or 9← in Vim. (h goes left one character.) “One zero third” is more cumbersome because in Vim it entails switching between modes (Talon and Vim modes don’t match precisely), but starting in normal mode, a1<Esc>3a0<Esc> works to append 1000 where the cursor is. (a to enter insert mode after the current position, 1 to type a 1, Escape to return to normal mode, 3 to say “do the next thing three times”, the next thing being appending a zero.)
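To make the count-prefix mechanics concrete, here is a minimal sketch in Python of a toy interpreter for just two normal-mode commands, `[count]h` and `[count]a{text}<Esc>`. It is an assumption-laden model, not Vim's or Talon's actual implementation: `run_vim_keys` is a made-up helper, the buffer is a single line, and nothing beyond these two commands is handled.

```python
from typing import Tuple

ESC = "<Esc>"


def run_vim_keys(buffer: str, cursor: int, keys: str) -> Tuple[str, int]:
    """Apply a tiny subset of Vim normal-mode keys to a one-line buffer.

    Supported: [count]h (move left) and [count]a{text}<Esc> (append text
    after the cursor, repeated count times). Returns (buffer, cursor).
    """
    i = 0
    count_digits = ""
    while i < len(keys):
        ch = keys[i]
        # Digits before a command build up the count (a leading 0 is not a count).
        if ch.isdigit() and (count_digits or ch != "0"):
            count_digits += ch
            i += 1
            continue
        count = int(count_digits) if count_digits else 1
        count_digits = ""
        if ch == "h":
            cursor = max(0, cursor - count)
            i += 1
        elif ch == "a":
            end = keys.index(ESC, i + 1)  # insert mode lasts until <Esc>
            text = keys[i + 1:end] * count  # the count repeats the typed text
            insert_at = cursor + 1 if buffer else 0
            buffer = buffer[:insert_at] + text + buffer[insert_at:]
            if text:
                cursor = insert_at + len(text) - 1  # land on the last inserted char
            i = end + len(ESC)
        else:
            raise ValueError(f"unsupported key: {ch!r}")
    return buffer, cursor


# "One zero third": append 1, then append 0 three times.
print(run_vim_keys("x = ", 3, "a1<Esc>3a0<Esc>"))  # ('x = 1000', 7)
# "Go left ninth": move nine characters to the left.
print(run_vim_keys("hello world", 10, "9h"))  # ('hello world', 1)
```

The point of the sketch is that the count is parsed once and handed to whichever command follows, which is the same composition that lets Talon attach an ordinal to an arbitrary command.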

Both Talon and Vim value this sort of consistency in being able to build bigger things from smaller pieces.

So I imagine Talon would work quite nicely with Vim, once you get talon to shift out of the way in some cases and let Vim handle those things. (It’s similar with browser extensions that let you interact with the browser Vim-style: when the page in consideration has extensive key bindings, especially Vim-like things like j/k for down/up, the extension actually gets in the way.) I expect you’d find that having used one made learning the other a good deal easier, since they’re working in just the same ways.

I developed much of the original Talon code using a single Python file (the combination of std.py + basic_keys.py) and Vim.

> The biggest issue I've found so far is voice strain

My singing teacher started out as a voice therapist. He still works with a lot of people with damaged voices who are there for therapeutic reasons, as well as professional opera singers who are there to level up their game.

He says, the best thing you can do for your voice is to sing.

Singing will give you the proper "support" from your lungs and train the vocal muscles without wearing them out. One of the key aspects here is producing high quality vowels and not letting the sound be disturbed by consonants. This is the opposite of how the voice is used when speaking (in most languages), where consonants are of primary importance for people to understand what you are saying. However, consonants are interruptions in the air-flow and can easily wear the voice out and cause strain.

Interesting! Maybe I'll look into a voice trainer.

I have suffered with thoracic outlet syndrome for almost 2 years. Countless physical therapy sessions and deep tissue massages helped alleviate the pain, but nothing seems to solve the root cause. I am 26 and the thought of not being able to use my hands to do something has frightened the hell out of me.

That's when I stumbled upon John Sarno and his book "The mindbody prescription". I thought it was pseudoscience and I read it with very low expectations. And it's been almost four months since I last had trouble using my computer.

In a nutshell, most RSI injuries are not just physical problems; they are tightly coupled with your mind. The most important step is to acknowledge the subconscious thoughts related to pain and work along with them. Whenever I feel a tingling sensation, I yell at myself (in my head) that it's all in my mind, and I have started to feel normal. Of course, this might not work for everyone, but it's definitely worth a try.

And I also do a lot of strength training and climbing, which really helped strengthen my hands

100% think that all of us should know the technique that Dr. Sarno has discovered! I have had great success with neck, upper back/shoulder and lower back pain using the essence of what is going on here with pain.

The latest research I have had success with is based on Sarno's work and started with Dr. Schubiner's "Unlearn Your Pain" workbook and the "Unlearn Your Pain" podcast by Alan Gordon via Curable, plus all the Curable content has been GOLD for eliminating 90% of my pain so far. And I am at the tail end; it gets less and less every day (I can finally sleep through the night without having to move because of pain).

Life changing information here!!!

Some success stories using these techniques specifically on RSI and carpal tunnel are here:



And the Seattle Repetitive Injury Support Team even states on their website: "we think we have finally found a cure for most RSI cases". http://www.satori.org/rist/

I once had a very severe case of insomnia I thought was caused by some medication.

It was such a difficult time for me, and after going through so many different solutions, I eventually discovered a book written by a Harvard doctor that had successfully stopped insomnia in hundreds of patients over several years.

Basically, he discovered that most insomnia is caused by belief. It becomes a vicious cycle that feeds into itself.

I didn't believe it at first. It seems so real, so profoundly difficult for me, but I was desperate and willing to try anything.

So I did. I followed the program precisely as laid out in the book. Made an Excel sheet and everything and gradually through cognitive behavioral therapy, I challenged the thoughts that I would never be able to successfully sleep again (along with careful measuring).

We tense up when we feel anxious. I can imagine that during times of great stress, the body reacts in certain ways that cause certain muscles to stay stuck or not heal, or even just cause perceptual problems.

There's already a precedent for it with insomnia. It makes sense.

Thanks for the book recommendation.

Umm... book name would help.

Apologies. Here’s the book.

Say Good Night to Insomnia: The Six-Week, Drug-Free Program Developed At Harvard Medical School https://www.amazon.com/dp/0805089586/ref=cm_sw_r_cp_api_i_hS...

I think “say goodnight to insomnia” is most likely the one they mean.

I'm not so sure about this. My understanding is that RSI is caused by inflammation, and ignoring it will only make it worse, to the point of needing surgery. Although it sounds like you've been dealing with this longer, so I'll give you the benefit of the doubt

I see where you are coming from, and I strongly suggest trying the conventional route to eliminate any serious physical problems before ending up here. My point is that RSI problems are not just physical but tightly coupled with your mind. I wish someone could do a better job of explaining the neuromuscular connection so that this wouldn't be considered pseudoscience anymore.

Unfortunately the conventional route can end up in unnecessary surgery and a hopeless cycle of disappointment. I would recommend trying this crazy-hard-to-believe (but is legit) route first, as there is no risk in learning about it and trying it. No drugs, no diet changes, no surgery etc. Just retraining the brain.

I recommend rsa25519 to give this a read, http://www.satori.org/rist/ (pasted below, keep in mind the cause is slightly out-of-date and not current thinking in this field, but the solution is the same)

> Welcome! The Seattle Repetitive Injury Support Team is a group of individuals who want to wipe out chronic Repetitive Strain Injuries (RSIs) such as tendinitis, thoracic outlet, and carpal tunnel. After many years of meeting to discuss treatment and coping approaches, share stories and ideas, listen to invited speakers and discuss good ergonomics, we are excited to announce that we think we have finally found a cure for most RSI cases! We know that sounds too good to be true, but please read on.

Over the years our members have tried just about everything to cure their RSI, yet we saw the same faces at meetings. People were finding coping mechanisms (massage therapy being the most recommended), but every month there we were, still in pain and not recovered -- and this went on for literally years.

Then, out of the blue, Nate McNamara contacted us about the approach he had used to cure his RSI problem. He graciously flew up to Seattle at his own expense to speak to our group. Here was someone who had been in the same situation as ourselves, and now was back to a normal, pain free life, even back working at a computer.

The essence of his approach was to recognize RSI as a form of TMS (Tension Myositis Syndrome). TMS, coined by Dr. John Sarno to describe common back pain, has a variety of possible physical symptoms. The theory is that the subconscious discovers that by reducing blood flow to various body parts it can cause pain and achieve its goal of suppressing unwanted emotions (through distraction). Sarno talks about suppressed anger, but stress and other related emotions can be a huge factor. Once you are in pain, the fear of becoming permanently disabled produces a vicious circle of fear and pain.

RSI probably starts with a case of physical overuse, which leads to pain. Normally the body would heal within about 48 hours, but the subconscious, by reducing blood flow to the areas involved prolongs this. The pain and tightness are real, but the cause is not physical but mental. It's important to stress that the sufferer is completely unaware that the subconscious is causing the problems.

As you may have guessed, the treatment is not physical, but involves foremost becoming convinced that you just have TMS and not a physical problem. This liberates you from your fear of hurting yourself further. Stopping all other treatment is also important to prove to yourself that you really believe in this solution. Various visualization exercises can help change your subconscious reactions (since you don't have direct control over these).

Amazingly, some people recover within a few hours, but most take a few weeks or months.

After Nate's talk, a number of us began to take this mind-body approach to our RSI problems. The current score to our knowledge is three cured and one doing much better! That may not sound like much, but to our knowledge, no other treatment has cured any of our members, although it is certainly possible they just never let us know.

We followed up on Nate's talk with videos and meetings on this topic, but it's now been more than a year and we feel that most of our members aren't taking these important steps to cure themselves, for whatever reasons.

Therefore we have decided to stop meeting, and instead focus on a mentoring program where those who are pursuing the TMS approach assist others directly who also want to follow this treatment philosophy.

If you are interested in seriously pursuing this approach, please contact us. To be honest, we aren't sure how good we can mentor, but we feel that it's the best way to get people cured.

To start this approach do as much research as you can to truly convince yourself that you have TMS. See the recommended background information below. The Amir book is highly recommended for it's visualization exercises.

All we have is years of personal experience to go on here, but we really believe we have found a solution to RSI. Our challenge is to figure out how to turn this into large numbers of cases cured or prevented. If you can help in any way please contact us!

As a developer who also has cubital tunnel syndrome, I find that a keyboard that places commonly used modifier keys [0] on your thumbs, such as the Kinesis Advantage or ErgoDox, helps reduce the fatigue and symptoms immensely.

[0] it is called Emacs pinkies for a reason, and yes, my CTS is totally emacs’s fault as well.

It's pretty telling that RMS had to stop coding because CTS.

Edit: Not carpal tunnel syndrome, as he explains in https://stallman.org/stallman-computing.html

"""In the mid 90s I had bad hand pain, so bad that most of the day I could only type with one finger. The FSF hired typists for me part of the day, and part of the day I tolerated the pain. After a few years I found out that this was due to the hard keys of my keyboard. I switched to a keyboard with lighter key pressure and the problem mostly went away.

My problem was not carpal tunnel syndrome: I avoid that by keeping my wrists pretty straight as I type. There are several kinds of hand injuries that can be caused by repetitive stress; don't assume you have the one you heard of."""

> The FSF hired typists for me part of the day...

That must have been a job from hell, probably went through a lot of them!

Manual transcription is still a thing today. Stenographers being one of the more well-known examples.

Stenotype keyboards are way better from an RSI point of view, though. Unfortunately, no one has come up with the "theories" or systems that would allow use of them for specialized input like programming.

I've seen a few conference presentations about using steno for programming. It does seem to involve building your own library of commands, but people seem to have done it and use it daily.

Had pain a few years ago, switched to Ergo 4000's everywhere and have never had any pain since - depending on desk height I'll use the front riser but generally don't need it.

I love them so much I have spare new-in-box ones stashed as a hedge for the day they stop selling them.

They are about the cheapest good quality ergo keyboard I could find.

Not to flame-bait, but this is actually one of the reasons I've doubled down on vim. I like that I can function perfectly fine with one finger, I even managed to write a bunch of code using vim on my phone, and was surprised how well it translated.

You could definitely set up toggled modifier keys and do the same with emacs, FWIW.

ye, modifier keys suck. This is why I prefer snake-case over the other casings because it needs no modifier key.

What languages do you use?

Also along those lines: what keyboard layout? I've never seen a (QWERTY) layout where underscore isn't a shift-level key.

The key issue (pun intended) with a normal keyboard is that you use your weakest fingers for ALL of your modifier keys. By shifting that responsibility to your thumbs you greatly reduce your little fingers’ workload (and hence chance of CTS)

Switching to a planck keyboard would let you customize all the positions of buttons for less strain.

The planck keyboard is a great keyboard, but possibly on par with the worst in terms of ergonomics.

Ergodox, Diverge, Keyboard.io, really any split key keyboard does wonders for keeping hands straight, and they are fully programmable to a very high degree.

Am I the only one who just keeps their hands/wrists straight on a regular unsplit keyboard? I've always naturally done this and never had any problems with strain.

The best way to describe it is that the home keys aren't ASDF, but more like WEFV with my forearm angled so it's straight from elbow to fingertip. This exaggerates the effect but demonstrates it clearly; the real home positions are more like intermediate points between keys.

Yeah, same. I find keeping my fingers on the home keys makes me twist my wrists outwards, which gets uncomfortable pretty fast. I mostly try to minimise wrist action in general.

I've actually had a Planck EZ for a few weeks now and I agree it's probably not very ergonomic for the classic qwerty touch typist, or at least not much better than a normal keyboard barring the programmability.

But since I don't do that anyway I find the keyboard to be pretty nice in terms of customisability and avoiding stretching.

In general I feel my hands are used most naturally in close proximity to each other (at roughly abdomen height) so I'm drawn to small keyboards with lots of modifier keys. A spherical keyboard would be pretty interesting to try out.

On my keyboard - which is.. I don't want to say 'full size', because maybe it's '80%' or whatever, but it's a regular external keyboard with numpad - f & j are centre to centre 60mm apart.

If I outstretch my arms, straight, my index fingers are pretty much in line with my shoulders.

So no, when I use my regular unsplit keyboard my hands/wrists are not straight, because my shoulders are not 60mm apart..? Am I missing something?

I believe GP was talking about elbow-hand being completely straight. That would make sense if their elbows are on arm rests or some such thing.

It does sound interesting. I did type this comment trying it, and it was pretty comfortable, with the exception of my right wrist. That being said, I've got a kind of weird seating setup.

Oh.. then I'm not sure what we're doing differently, or what the difference is in hardware, but my arms/hands are straight from elbow to fingertip when I rest on 'home row' 'asdf'/'jkl;' - if I change to 'wefv' it's a significant contortion. If that's what it looks/feels like for some people using 'asdf' then I understand the problem and am not surprised they don't put up with it!

If I stick my elbow way out I can make 'wefv' as natural to me as 'asdf'. So I suppose either my elbows are more tucked in, or my keyboard's wider ('dfgh' for example contorts the other way, and analogously to 'wefv', 'rthn', or better but less analogously 'erth', is more comfortable there).

Yes, your last paragraph is my natural position, with the elbows farther out. 'wefv' exaggerates the effect to demonstrate; the real position is about halfway between 'wefv' and 'sdfv'.

I have a planck keyboard, but I think its small size isn't fantastic for hand position. I think I may eventually purchase an ergodox ez to get better hand positions.

The ErgoDox already allows arbitrary repositioning of the keys.

The Planck keyboard is cramped. It's worse than a cheap Microsoft ergonomic keyboard.

I’ve been having similar issues lately, and after looking at talon, serenade, and caster, I ended up using caster [0]. The different programs all have significant differences in usability, and have clever ideas behind them, but unfortunately automatic speech recognition is still bad enough that the primary factor is which has a better ASR engine. Caster supports Dragon NaturallySpeaking, which is expensive, but enough better to make it worthwhile.

There are moments where I think this is going to be the future of programming, since code as text is only the easiest for as long as typing is the easiest way to record things unambiguously.

But for the most part it’s still pretty frustrating. If ASR systems can get the sentence error rate down by an order of magnitude or two, I sincerely think this will take off not just for accessibility, but for normal use.

Until then, it’s a PITA that is saving my career nonetheless.

[0] https://caster.readthedocs.io/en/latest/

Talon actually can work with Dragon; it started out requiring Dragon but has since grown its own voice recognition. Anecdotally, a lot of folks on the talon slack seem to be refugees from Dragon-based solutions, and a common observation is that Talon's voice recognition is much lower-latency than Dragon. Unfortunately I can't confirm this as I haven't used Dragon myself.

My impression of talon's speech recognition is that it's good enough for voice coding, but could be improved when it comes to dictating English prose. That said, there are promising avenues of improvement. If you pay for the Talon beta, there's a more advanced voice recognition engine available that's much better at English, and you can also hook into Chrome's voice recognition for dictation.

> Talon actually can work with Dragon

More specifically, "can work" means: if you run Talon on Windows or Mac alongside Dragon, and don't configure Talon to use its own engine, Talon will automatically use Dragon as the recognition engine.

> or Mac alongside Dragon

Unfortunately it looks like Dragon on Mac isn't a thing anymore, which is a shame because my main dev machine is a Mac. Or at least it used to be, before I needed to start dictating everything.

Talon can also use Dragon remotely from a Windows machine or VM, kind of like aenea but more tightly integrated (the host is driving instead of the guest).

And I do have some users voluntarily switching from Mac Dragon to the improved speech engine in the Talon beta. Mac Dragon has been kind of buggy for the last few years so you're not missing much.

Any chance you have pointers on how to set that up? You'd probably laugh/cry to see my setup right now, with my Windows desktop on the left monitors and my MacBook on the right monitors because I need both... purely because Dragon has only been sold on Windows since this started being an issue for me. A more tightly coupled super-aenea sounds pretty fantastic.

Sure: First run Talon on both sides. Then go to your talon home directory (click talon icon in tray -> scripting -> open ~/talon). There's a draconity.toml file there.

On the Dragon side, you need to uncomment and update the `[[socket]]` section to listen on an accessible IP.

On the client side (Mac in your case), uncomment / update the `[[remote]]` section to point at your other machine.

You also need to make sure both configs have the same secret value.

From there, restart Dragon and look in the Talon log on the Mac side for a line like "activating speech engine: Dragon".

To prevent command conflicts, I recommend setting Dragon to "command mode" (if you have DPI), and only adding scripts to Talon on the Mac side.

If it doesn't work, you can uncomment the `logfile` line in draconity.toml on the Dragon side, restart Dragon, and look in the log to see what's going on.

Do you know of any workflows like this using entirely open source software?

EDIT: Seems like caster itself has instructions for an open source recognition engine to pair with it. Not sure how accurate it'll be but I'm going to give it a shot!

I hate the idea of relying on closed source software to be able to continue in my profession. If this works I'm definitely going to be donating to the FOSS options

Yeah there is the Kaldi backend for caster, which I've tried on my Mac machine, since Dragon isn't a thing on Mac. Unfortunately it's not nearly as good :-(

I'd like to try to record my usage of Dragon so that I could fine-tune my own model, but it's harder to get around to hobby coding projects like that now that coding is more of a pain in the ass.

The unfortunate reality in my case is that using Dragon is the least frustrating way to keep working. I don't think closed paid ASR models will stop being noticeably better than the open ones until the state-of-the-art error rate is basically zero.

I do like that caster is open source where talon isn't, but preferences like that are pretty low priority when, say, your career is on the line.

> I do like that caster is open source where talon isn't, but preferences like that are pretty low priority when say your career is on the line.

For sure, if it's on the line right now you use what is available and works best right now, but the reality of niche software is that the companies/people backing it tend to close down or move on as they realize the market is too small to sustain them. Or Microsoft updates its OS in a way that forces them to expend considerable time making it functional again and they just don't have the bandwidth for it, so you're stuck for some amount of time or indefinitely.

If you have no other choice, it's definitely better than nothing and you want to use what enables you to do your job the best. But in terms of where I'd be willing to throw my financial support? It'd need to be something for the betterment of the coding community as a whole. For me, this means an open source tool or set of tools, not a proprietary system.

Fwiw talon has a setting to record everything locally, annotated with the recognized words. In the next release it will also work when using dragon.

My understanding is that Talon is built off of wav2letter's inference framework for ASR.

I do not use the wav2letter@anywhere inference frontend - I trained the acoustic model using the facebook upstream code, but the decoder is almost entirely new, and on Windows I use Pytorch for inference.

Talon ships with a libw2l.so/dylib on Linux/Mac built from my open source repos.

Feels like wav2letter will not be actively developed anymore. Understandable since it is hard to compete with Pytorch with a custom NN toolkit. Any plans to move to Pytorch/Tensorflow?

I don’t think that’s right at all. They moved development to the Flashlight repo. It seems very actively developed to me. Last commit 3 hours ago. wav2vec 2.0 blog post also went up last month (September) and the current state of the art iirc is a Google model based on Facebook’s Wav2vec 2.0 work.

For my own use I’ve already built Pytorch and CoreML frontends, with a shared model format (can convert models to/from wav2letter format and my custom format), and I have the ability to create new models in these frameworks from wav2letter architecture files.

I still run my training in the wav2letter framework, but for compatible training in Pytorch I would mostly just need criterion implementations. I assume warpCTC is fine for the CTC models. There’s also a third party Pytorch ASG criterion package but I haven’t tried it yet.

Interesting. What model are you using for your acoustic - the streaming convnet?

I didn't know that there was a pytorch implementation of the w2l architectures..

When I started my undergraduate degree, there was a PhD student everyone knew -- he looked like he ran the Linux User Group, which he did, he coded his own window manager, wrote games in Haskell etc.

At some point, he got some hand injury that meant he couldn't type normally for months. He would only have been 21 or 22.

This scared me, so since then I've followed the standard advice on avoiding computer-related injuries very carefully. I use a desktop computer with an external, adjustable monitor. I use ergonomic keyboards, and use a mouse with either hand. I learned to touch-type with the Dvorak layout, I have an adjustable desk, etc.

I haven't had any problems, and I don't know which (if any) of these actions helped, but I'm surprised when I see other developers apparently happy to hunch over a laptop keyboard for 8 hours, while sitting on an awkward chair in a coffee shop. Why do this to yourself?

A few years back I started buying disability insurance, in addition to paying more attention to my ergonomics.

My greatest economic asset is the years left in my career. For ~$1500/year I insure that. It's not cheap, but it would pay out until I was 65 at some reasonable percentage of my normal salary in the case I was severely injured, or otherwise unable to work. Anything from an ergonomics-caused hand/arm injury to a bike-riding-caused head injury, wood-shop-caused eye damage, etc.

Consider grabbing it while you're young and working in a lucrative career.

Not sure how old you are but I imagine over the next couple of decades we’ll eventually get more advanced computer interfaces.

Gestures, voice, eye tracking, ...

Being a software developer could be a job for those who physically can’t do other jobs.

Subtle input gestures with Soli would be interesting.


Assuming physical injury (like RSI or trauma), yes, it's getting easier and easier.

But there are many other ways you could be injured and unable to work. My straw-man injury when I bought the policy was: "I'm a new skier, which is famous for major head trauma". Similarly, cancer, which directly or via treatment causes exhaustion, is another example of a disability that is not solved with "if only I could just input text easier".

I'm glad there's so much work in accessibility going on, and that hardware, machine learning, and social awareness have gotten widespread enough to let non-huge companies dive in and make real progress. It's important to open the field to as many people as we can.

I just want to be sure people don't rely on technology being advanced enough at the exact time you may need it. You can easily get unlucky, and insurance exists to pay somebody else to take the risk.

Yes, that makes sense. However, the older I get the more I think society should provide some sort of safety net for the unlucky.

It’s inefficient for everyone to do this and many cannot.

The premiums could jump 50% in a decade, for example. I’ve known situations somewhat like yours where premiums went up a lot.

I have a similar story. When I was in college I got a repetitive stress injury from practicing piano too much. I ended up becoming a programmer but I have followed ergonomic practices religiously in the 40 years since then. My kids make affectionate fun of my perfect posture. But I have worked with the computer keyboard for going on five decades and have never come close to suffering a repetitive stress injury.

I think ..... most people have no idea what they actually look like when hunched over a keyboard. It would do wonders to have a large mirror next to our workspace so we could see just how preposterously we're contorting our bodies.

I had a bit of a scare around the same time. I was following dual passions of computer science and piano playing, meaning around 12 hours a day of typing and/or piano-ing. I started to develop regular tendinitis in my wrists and fell down a rabbit hole of ergonomics and posture. With computers you can replace keyboards if you have money, but I was a musician in college. Also, you can't really change a piano keyboard all that much (and you really wouldn't want to), so I did a real deep dive into posture and getting the most out of what I already had.

Especially the macbook pro keyboards. Those really hurt my hands.

I don't know why, but I've often thought about the situation Josh finds himself in, i.e. having to find a different way of building software if I couldn't use a keyboard and mouse due to a medical condition.

The solution I always imagine is paying someone or having my employer pay someone to strong style pair program with me. Perhaps a student, junior developer or even someone unfamiliar with software development entirely.

For those unfamiliar with strong style, this rule sums it up: "For an idea to go from your head into the computer it MUST go through someone else's hands". Like the standard driver / navigator pair programming technique but with the navigator never touching the keyboard.

In the case of someone completely unfamiliar with software development I imagine that there would initially be a dramatic high / low skill gradient between us, with the person essentially transcribing. However given the intensity of the practise I think this gradient would level out quite quickly.

> The solution I always imagine is paying someone or having my employer pay someone to strong style pair program with me

Wow, that's some impressive out-of-the-box thinking!

The OP's situation is something I worry about, as I'm worried my neuropathy will reach a point where I can't use my arms to type/mouse all day. I spent some time researching speech input for coding a year ago, and found a few fledgling solutions, but I'd need to expend a lot of time to get something remotely workable.

Your idea is, if I may say so, genius, and having that as a backup sets my mind greatly at ease, so thanks.

I'm sorry to hear about your neuropathy, I'm glad my suggestion eases your mind!

On pairing in general: For a long period of time when I worked at bigger companies (before co-founding my current venture) I pair programmed about 90% of the time.

For the most part I really enjoyed it, we alternated pairs quite frequently and I worked with people at a range of seniorities. I really enjoyed alternating between playing the role of teacher vs learning from someone much better at the (insert programming language, database, business domain).

I think during that time I also had some of the most stimulating and deep conversations about software development. When pairing, pairs have a deep shared context that's hard to replicate in other scenarios.

There are some drawbacks though, the main one being it's extremely full on and emotionally tiring. I'd class myself as a high functioning introvert and after a day of pairing I needed some serious regenerative quiet time!

My advice to someone considering this suggestion would be to try some pairing now with friends or colleagues and get a feel for it and see if it works for your personality and working style.

> I'd class myself as a high functioning introvert

Hah, I'm stealing that phrase to describe myself too!

I've never "formally" done pair programming, but have sat and coded/reviewed with colleagues for short stints, and generally found it OK.

From time to time I have to do full days (or several days) of workshops, and find those really tiring and emotionally draining. I always think I fare better 1-1 with people though, so if pairing was a necessity, I'm sure I could adapt.

The author links to this other project, which I’ve never heard of:


He also references Tavis Rudd’s viral voice coding video, which is already 7 years old.


The future is arriving slower than I would have guessed. I thought we’d be developing on the iPad using voice, gestures, and eye tracking by now.

Can I at least get “build and run” with my voice soon?

I'm very surprised none of the videos on the serenade landing page have audio.

I would love to see/hear what it's like. I'd suspect adding audio to those videos would be a win on conversions.

Thanks for the suggestion! We recently revamped our homepage, and we're in the process of recording some new videos. Will keep that in mind!

In the meantime, here's a video that a developer using Serenade recently made about their first experience! https://www.youtube.com/watch?v=Pc-EbY1fRWk

I love the eye tracking. Even without having any issues with my hands, I'd like to have quick mouse movements without having to take a hand off of the keyboard. Zooming in to have more precision is clever.

It seems the current generation systems aren't working for Linux yet, but that gives me hope to have a way to work with it at some point in the future.

Keeping the hands on the keyboard would really increase productivity and reduce repetitive motion injuries. I know that editors like vim and emacs let you keep your hands on the keyboard, but interacting with anything else requires the mouse/touchpad.

At least 10 years ago, I imagined that the mouse could be replaced with something like a brainwave reader. You can train your brain to emit simple patterns that can be picked up either with a headband or remotely. It would take individual training, but with a few basic patterns, one should be able to do mouse movement and clicks. The eye tracking seems like the next best thing, especially since it is already implemented.

> It seems the current generation systems aren't working for Linux yet

This isn’t correct. Talon supports eye tracking on all of Windows, Linux, and Mac with the same hardware.

I think parent is talking about the Tobii. I was also very very excited about it (I can’t find a mouse or trackball that won’t give me hand or shoulder or wrist pain) but then I saw it only works on windows... and nothing else. No Linux support. So that’s a no go.

If someone knows of such a device for Linux, I’d love to hear about it!

Let me be more specific. The Tobii 4c, PCEye Mini, and Tobii 5* work approximately the same on Linux, Mac, and Windows with Talon out of the box.

* The Tobii 5 does work, but is a work in progress and has some caveats until I do more work on it.

Would you mind elaborating on this? I'm ready to buy a Tobii but would like to know what the model 5 caveats are and how long you think it'll be WIP. And if you think the improved specs on the 5 make it worth waiting for. Thanks!

The caveats make it short term worse than the 4C for some mouse modes, long term it should be the best option overall once I fix some things.

Oh, that's great news. I thought Talon needed the drivers to work and was constrained in that regard. I will need to free up some time and try it out.

I personally have had carpal tunnel release surgery and bilateral rib resection for thoracic outlet syndrome. These surgeries fixed most of my issues but for the last couple of years I haven't tolerated sitting well because of my back so I use a standing desk. I've thought about a computer interface that is the opposite of this article. Something that requires more non-repetitive physical movement so that you're exercising while you program. Sure it would take longer to type but we all know working on a computer all day is bad for our health.

Sounds like an interesting idea for a VR/AR game. I'm picturing somewhere between Fruit Ninja and a Vanna White simulator.

Thanks for this! Sorry it had to come about as a result of injury/disease. Hopefully you're not going to hurt your vocal tract -- are there precautions you take against this?

My interest currently is in voice-assistants and developing something like a personal assistant / code-monkey / grad-student.

I did have a stroke a few years ago, from which I've recovered, but it made me realize that almost all of my work, hobbies, aspirations, are tied to fairly extensive hand-eye integration. It's good to start building a Plan B before I need it.

On a(nother?) side-note, my 88 year old dad still codes and his experiences have really shown me how bad we are at accessibility, as a developer community.

One final thing -- watching you work in this way highlights for me how low-level a lot of the stuff we do as developer is. There's a certain amount of just text massaging that seems irrelevant to what we're actually trying to get done (although it can be fun/soothing/aesthetic).

EDIT> The last point is why I can't give a fark about Vim/Emacs/whatever debates and obsessing. It seems like it might be fertile ground, but IMNSHO it's a trap.

I have had an issue with dictation software... I want a dictation software to read me what it typed on screen. Right now the software expects you to check what it wrote and you have to necessarily keep looking at the screen.

Call me lazy but I wish for a method where I could lay back, speak a sentence, wait for transcribing, listen to what was written and ahead I go.

when I was writing my last novel draft using nuance dragon 15, it had the 'read that' command which would read the last phrase I inputted, but that wasn't enough since I had a wireless headset that let me wander the house talking to the computer, but still needed to see the screen to know if it got the dictation correctly.

I ended up fishing out a cheap $120 portable walmart windows laptop 2 in 1 and VNCing it into the desktop so i could see the screen, carrying that around with me worked.

my point exactly. all dictation softwares expect you to keep looking at the screen. your "I had a wireless headset that let me wander the house talking to the computer" is what i want a dictation software to do. sadly like you i am left unsatisfied

The eye tracking sounds like it could be incredibly useful as a 3rd input.

Very cool stuff, thanks for sharing

In addition to a pointer replacement, there is an idea of using eye tracking on a virtual keyboard with a swype type entry.

I dunno... "swyping" your eyes around a virtual keyboard seems like it'd cause a lot of strain on your eyes.

There are alternative virtual keyboard styles (typically made for use with a mouse and impaired motor control) which would work better. Fewer "fine" movements.

Combined with predictive typing, and you could go pretty far.

Do you have any examples?

Not the OP, but Dasher is a classic example that works very well for a single 2D input like using a joystick: https://youtu.be/0d6yIquOKQ0

I don't know about that, but I'd love to have eye tracking for focusing windows. Since I don't generally maximize windows, I find that I'm often typing hotkeys for the window I'm looking at, not the window my mouse is on (or was last focused, depending on your wm)
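A minimal sketch of the gaze-to-focus idea, assuming the window manager can hand over window rectangles in front-to-back order and the eye tracker reports a gaze point in screen pixels. The `Window` type and `window_under_gaze` function are hypothetical names for illustration, not part of Talon, Tobii's SDK, or any window manager.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Window:
    # Hypothetical window record: title plus its screen rectangle in pixels.
    title: str
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        """True if the point (px, py) lies inside this window's rectangle."""
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def window_under_gaze(windows: list[Window], gx: int, gy: int) -> Optional[Window]:
    """Return the topmost window containing the gaze point, or None.

    Assumes `windows` is ordered front-to-back (topmost first), so the
    first hit is the window the user can actually see at that point.
    """
    for win in windows:
        if win.contains(gx, gy):
            return win
    return None


# Two overlapping windows; the editor is stacked on top of the terminal.
windows = [
    Window("editor", 0, 0, 800, 600),
    Window("terminal", 600, 0, 800, 600),
]
target = window_under_gaze(windows, 1000, 100)
```

A real version would probably also want some smoothing or a short dwell before switching focus, since raw gaze data jumps around.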

That's a great idea! This would really push up input speed. I'm surprised I haven't seen any implementations in the wild yet, although research has been done to explore this [0]

[0] https://pdfs.semanticscholar.org/1170/44feb247a82ab81a30b57f...

Windows 10's Eye Control has something like this, called "Shape Writing". See the second GIF here: https://www.theverge.com/2017/8/2/16087368/microsoft-eye-con...

Long before that, Optikey has had "multi-key selection" which works in the same way, see https://youtu.be/HLkyORh7vKk?t=10

In my experience (as a developer of eye gaze interfaces, not an everyday end user) it is often more efficient to have really good next-word-prediction (such as with the Presage engine) combined with single-letter dwells, rather than using "swipe-like" spelling, where you're committed to tracing out the whole word.
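A rough sketch of how single-letter dwells and word suggestions could fit together. The assumptions are that the tracker yields `(timestamp_ms, key)` samples, where `key` is the on-screen key under the gaze, and that plain prefix completion over a word list stands in for real next-word prediction (an engine like Presage does far more). The function names and the 600 ms threshold are made up for illustration.

```python
from collections import Counter


def dwell_select(samples, dwell_ms=600):
    """Return the keys selected by dwelling, in order.

    samples: list of (timestamp_ms, key) pairs; key is None when the gaze
    is not over any key. A key fires once when the gaze has stayed on it
    for at least dwell_ms, and cannot fire again until the gaze leaves.
    """
    selected = []
    current, start, fired = None, 0, False
    for t, key in samples:
        if key != current:
            # Gaze moved to a different key (or off the keyboard): restart the timer.
            current, start, fired = key, t, False
        elif key is not None and not fired and t - start >= dwell_ms:
            selected.append(key)
            fired = True
    return selected


def predict(prefix, corpus_words, limit=3):
    """Suggest the most frequent words in corpus_words starting with prefix."""
    counts = Counter(w for w in corpus_words if w.startswith(prefix))
    return [word for word, _ in counts.most_common(limit)]


samples = [(0, "t"), (300, "t"), (600, "t"), (700, "t"),
           (800, "h"), (1000, "h"), (1500, "h")]
typed = "".join(dwell_select(samples))
suggestions = predict(typed, ["the", "the", "this", "that", "the", "this", "tab"])
```

With good suggestions, the user dwells on two or three letters and then on a whole word, rather than tracing out every letter.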

The eye tracking looks great, just as a regular additional way of using the mouse for things like focusing windows or tapping dialogs. I'm very tempted to get one of those.

Does anyone with experience of them know what the multi-monitor support is like? I have 3 screens side by side and one below, so quite a large range to cover that requires some head movement to go across all of them. Will it work with viewing over that wide of a space where I'm not always directly facing the primary screen?

If you're using an eye-tracking mouse, you might enjoy Precision Gaze Mouse. It's a much faster way to click because you can skip the zoom-in step. It uses eye tracking for large movements, and head tracking for small precision movements. I'm also a developer with an injury and I made it to help myself, but several other people are using it now too.
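One way to picture the "eyes for large movements, head for small ones" split is below. This is only an illustrative sketch, not how Precision Gaze Mouse is actually implemented: the `next_cursor` function, the `warp_radius` threshold, and the assumption that head movement arrives as a per-frame pixel delta are all hypothetical.

```python
import math


def next_cursor(cursor, gaze, head_delta, warp_radius=150.0):
    """Pick the next cursor position from gaze and head input.

    cursor, gaze: (x, y) screen positions in pixels.
    head_delta:   (dx, dy) head movement since the last frame, already
                  scaled to pixels.
    warp_radius:  assumed threshold in pixels; beyond it the gaze is
                  trusted for a coarse jump.
    """
    distance = math.hypot(gaze[0] - cursor[0], gaze[1] - cursor[1])
    if distance > warp_radius:
        # Large movement: jump to where the user is looking.
        return gaze
    # Small movement: ignore the noisy gaze estimate and nudge with the head.
    return (cursor[0] + head_delta[0], cursor[1] + head_delta[1])


far = next_cursor((100, 100), (900, 500), (2, -1))
near = next_cursor((100, 100), (120, 110), (2, -1))
```

The threshold is the interesting design choice: too small and gaze jitter drags the cursor around, too large and you end up craning your neck for medium-sized moves.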


I've been wanting something like Talon for years. Easily customized voice recognition for commands (you hear me say 'blah', you type/start a program/click a button labelled 'blah') is something I've not had since I was very young and tried to trick out LCARS x32 using only Windows XP's speech recognition software and the crappy stick mics that lived on top of every single beige CRT monitor at school.

I also bookmarked https://serenade.ai/ after seeing it here on HN. 10 years ago I shattered my hand and collarbone and used Dragon which sort of worked but was not specific to coding. These new tools look really neat.

Redox keyboard is amazing, never had any pain since starting to use one about 18 months ago.

Tavis Rudd, "Using Python to Code by Voice", 2013: https://www.youtube.com/watch?v=8SkdfdXWYaI . Old but fun.

Since I don't have any problems writing, I would be interested if there was some sort of system that was multi-modal in its working, or that maybe I could turn on dictation based coding with some specific rules for what I am doing at the moment.

For example if I am reading some code and I see

const updatedSortFields = Object.values(options).map(({ value: id, label }) => ({ id, label }));

It would be great if I could then say "format line 130 multilines" and have it turned into something nicely indented over multiple lines.
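The command-parsing half of that wish can be sketched as follows, assuming a fixed grammar of the form "<action> line <number> <style>" and a recognizer that emits the line number as digits. The `EditCommand` type and the grammar are hypothetical; the actual reformatting would be handed off to whatever formatter the editor already uses.

```python
import re
from typing import NamedTuple, Optional


class EditCommand(NamedTuple):
    # Structured form of a spoken editing command (hypothetical).
    action: str
    line: int
    style: str


# Assumed grammar: "<action> line <number> <style>", e.g. "format line 130 multilines".
PATTERN = re.compile(r"^(?P<action>\w+) line (?P<line>\d+) (?P<style>\w+)$")


def parse(utterance: str) -> Optional[EditCommand]:
    """Parse a recognized phrase into an EditCommand, or None if it doesn't match."""
    match = PATTERN.match(utterance.strip().lower())
    if match is None:
        return None
    return EditCommand(match["action"], int(match["line"]), match["style"])


command = parse("format line 130 multilines")
```

Returning `None` for anything that doesn't match lets the same phrase fall through to ordinary dictation.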

Side comment, but relevant for anyone with wrist/arm RSI reading this. I published a tactical guide that includes examples, references, and tactics to manage this injury.


The one major problem with these solutions is how expensive it all is. If you don't find out about these programs BEFORE your finances get destroyed by health issues (both in cost of lost work time and in the medical care; most insurance will only cover SOME of the medical bills), you'll never be able to afford the hardware and software to work around them.

> In English, an ordinal number is one used to describe order, like "fifth" or "ninth" or "three hundredth". In Talon, they're used to repeat commands. If I wanted to go left by 9 spaces, I would say go left ninth.

Right! Like 9l in vim, but swapped.

I've always thought vim-like interfaces would be a good way to structure vocal input and I'm glad there are solutions operating in that space.

Edit: beaten to the punch :)
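The "command, then ordinal as repeat count" pattern from the quote can be sketched in a few lines. This is not Talon's actual grammar or API; the vocabulary tables and the `expand` function are made up for illustration, and the ordinals only go up to "tenth".

```python
# Hypothetical vocabulary: spoken ordinal -> repeat count.
ORDINALS = {
    "first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5,
    "sixth": 6, "seventh": 7, "eighth": 8, "ninth": 9, "tenth": 10,
}

# Hypothetical vocabulary: spoken command -> key name.
COMMANDS = {
    "go left": "left",
    "go right": "right",
    "go up": "up",
    "go down": "down",
}


def expand(utterance: str) -> list[str]:
    """Turn a phrase like 'go left ninth' into a list of key presses."""
    words = utterance.lower().split()
    count = 1
    # A trailing ordinal is a repeat count; without one, run the command once.
    if words and words[-1] in ORDINALS:
        count = ORDINALS[words.pop()]
    phrase = " ".join(words)
    if phrase not in COMMANDS:
        raise ValueError(f"unknown command: {phrase!r}")
    return [COMMANDS[phrase]] * count


keys = expand("go left ninth")
```

Putting the count after the command (the reverse of vim's `9l`) means the recognizer can act on "go left" as soon as it hears it and treat the ordinal as an optional suffix.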

An alternative to eye tracking, that uses your webcam (move your head to move the mouse, smile to click):


Demo: https://www.youtube.com/watch?v=fg6Q3r2p_yE

This might be able to be a bit faster if you use a virtual QWERTY keyboard displayed with eye tracking acting as a cursor, with select being a consonant, preferably /d/. You can map certain phonemes for other shortcuts, including one for click/right-click/vim-commands.

Articulating words takes too long; even subvocalization does, though it has the advantage of only using non-voiced phonemes.

This is really, really interesting -- I've often wondered about how the accessibility space is meeting the needs of disabled programmers, especially after I briefly lost my vision due to an eye injury.

The eye-tracking setup in particular looks really nice. Even if I stuck to a regular keyboard flow I feel like that would be incredibly useful in reducing strain.

The demo is incredible and heartening. I did not know the state of the art had improved so much. Thank you for sharing this, Josh.

as a dev with severe hand pain I would love more tools and help like this

Is there any way to write code on iPhone? Would a similar approach work or there is something much better?

> Honestly, it's just been such a relief to discover that my hands aren't needed for me to do my work.

This describes my reaction exactly. I'm noticing my rsi getting better, which I'm grateful for, but the gift of not being so scared feels like the bigger deal.

Has anyone with joint pain tried acupuncture? Friend swears by it when she went to Tokyo, Japan to get it done a couple years back, 2018 if I recall correctly.

She recovered without taking any medication!

Edit: updated typo and extra info about not taking any medication.

For RSI folks, my experience is that RSI can “move” to other parts of the body pretty easily, including eyes and vocal cords. Somehow doing “work” with a part of the body can strain it, even when talking all day long socially wouldn’t.

I wonder if the author has considered integrating his voice interface with an AI resource like GPT-3. I could see future development being done in this fashion.

hands free and perl out loud here https://www.youtube.com/watch?v=Mz3JeYfBTcY

Seems like it would be a useful feature in voice recognition to have a "whisper" mode to prevent strain.

I once had laryngitis for about a week, and during this time found out that for most people whispering puts more strain on the vocal cords than regular speaking: https://www.nytimes.com/2011/02/08/health/08really.html

But a whisper mode might prevent ear strain for those working around you :)

Hmm, I take it back then. Maybe a sub-vocalization mode then...

There are some high tech breakthroughs in this space. You can have microphones that hear the faintest of internal voices and even non-invasive ways to hook on the nerves to get your internal monologue.


Reminds me of this story I encountered the other day. I have been on the same track, successfully too, for other types of pain (neck, shoulder and back pain). I post this here because this is an evidence based technique and think people should at least know that this is a possibility.


"Hi all,

I have been battling computer related RSI that runs from elbows to fingers for 10 years. It started 2 weeks into my first software programming job out of college, and has been a constant disability since. I went through ergonomics, biofeedback, drugs, acupuncture, physical therapy, 5 jobs, Carpal tunnel surgery, a year of rest, voice Software, foot pedals, hands free mice, you name it.

I finally gave up software and changed careers 2 years ago to a more Sales focused job and still couldn't keep up with the typing. The only thing that has worked is having a full time computer assistant who sits with me and does what I say paired with a tablet PC that does handwriting recognition. It's dreadfully embarrassing in the workplace, slow, frustrating and just plain no fun. I got a rhythm though, got back on my feet and all went well for the last 2 years until a few months ago my throat began hurting (I talk all day now) and I was diagnosed with acid reflux.

With no arms and no voice, my future looks bleak and it has already been bad for almost a decade now.

I have officially given up on western medicine (my parents are doctors) and begun to explore alternatives. I stumbled on to Sarno and this newsgroup on Monday, read Sarno's book today. Does it really work? I mean, I have had non-stop, permanent, 24/7, life altering pain for 10 years. It has been my constant nemesis, battle, and focus.

It seems too good to be true.

I definitely have the personality traits, the onset of pain coincides with a rough first job out of college and my history of back spasms, shin splints and now reflux seems to fit the symptom imperative. I want to believe but can someone recover after this long of continuous computer pain?


source: http://www.tmshelp.com/forum/topic.asp?TOPIC_ID=5187


12/12/2008 (8 weeks later)


I had 10 years of 24 hour a day, 7 days a week chronic pain in my fingers, wrists, forearms, arms, elbows from typing, mousing, and using a computer up until 12/12/2008 as a computer science student and as a software engineer. I believed I would live with it the rest of my life. It took me 8 weeks to fully recover.

I am an interesting case because I don't have any major traumas in my childhood or daily life that led to TMS but just a generally stressful family and a high achieving, perfectionist attitude.

I grew up with a solid family unit, a very close identical twin brother, I was good at sports, school, had high self esteem. I went to college and was the captain of the track team. I found computer science and loved it. Things were on the up and up.

My parents got divorced when I was in college but it didn't seem to bother me, and I transitioned to the working world. I was in a cubicle typing away and I didn't like it but I "sucked it up" and did my job the best I could. I started getting pain while typing/mousing and nothing ergonomic or medical seemed to help. I plowed through the pain and kept working. I tried everything for 10 years. It was awful, bad, horrific and depressing all wrapped up in one. I finally gave up and left computers behind to be a project manager, and then a headhunter. But the pain was permanent and it still followed me no matter what job I did and how little I used a computer. My quality of life was pretty low. I found TMS 6-7 weeks ago and the recovery began.

I wrote an entry the day I found the TMS diagnosis and checked in along the way so the best way to learn my story is to read it below:


I think I have a personality conducive to this type of pain defense mechanism. I am high-achieving, I have an intense and high achieving family, but I have a sensitivity to what others are feeling/thinking that governs much of my decision making.

I think the key is reading enough information (posts, books, etc) that you are 90% sure that this is possible and could be your diagnosis and then get ready to do some serious crying. It's a personal journey that ends really well and it starts with writing down or talking about all the crap that has ever happened in your life. I started with when my twin brother got surgery when we were 4 years old - my earliest memory - and ended with my boss telling me that the phone system I installed sucks. And it had 1500 items in between. I feel flushed out now and clean and ready to tackle the world. This newsgroup has been helpful and the process is hard but I honestly believed I would have to live with this my whole life, and now I don't. My phone number is at the end of my thread listed above. Call me, I would be happy to tell you about it live.

- dan"

source: http://www.tmshelp.com/forum/topic.asp?TOPIC_ID=5301
