
Ask HN: Is there any work being done in speech-to-code with deep learning? - raidicy
 Is there any work being done on speech-to-code in the deep learning area? I have severe RSI which prevents me from coding at all. I have tried speech recognition software such as Vocola and the Windows speech engine, but it required me to speak in a way that always hurt my throat. I have also injured my throat multiple times, so I am searching for a solution that is more conversational than command-driven. I have written over 10,000 lines of commands for Vocola, and there are still too many edge cases that require me to continually speak in an abrupt manner that strains my throat.
======
daanzu
Windows Speech Recognition is far from the best, so perhaps your trouble could
be partly caused by how you had to speak in order to be understood, rather
than the command style? I used to use WSR to code by voice, and it was far
more laborious than my current setup.

I develop kaldi-active-grammar [0]. The Kaldi engine is state of the art for
command and control. Although I don't have the data and resources of
Microsoft/Nuance/Google for training a model, being an open rather than a
closed system allows me to train models that are far more personalized than
the large commercial/generic ones you are used to. For example, see the video
of me using it [1], where I can speak in a relaxed manner without having to
over-enunciate and strain my voice.

Gathering the data for such training does take some time, but the results can
be huge [2]. Performing the actual training is currently complicated; I am
working on making it portable and more turnkey, but it's not ready yet.
However, I am running test training for some people. Contact me if you want me
to use you as a guinea pig.

[0] [https://github.com/daanzu/kaldi-active-grammar](https://github.com/daanzu/kaldi-active-grammar)

[1] [https://youtu.be/Qk1mGbIJx3s](https://youtu.be/Qk1mGbIJx3s)

[2] [https://github.com/daanzu/kaldi-active-grammar/blob/master/d...](https://github.com/daanzu/kaldi-active-grammar/blob/master/docs/models.md#fine-tuning-for-individual-speakers)

~~~
StavrosK
It looks like Kaldi can use different backends, which I imagine have very
different performance characteristics. Can you rank them from best to worst,
with relative distances?

~~~
daanzu
Just to be clear, the Dragonfly speech recognition command and control
framework has multiple "backends" (speech recognition engines), including my
Kaldi one. Probably the most used one currently is the Dragon Naturally
Speaking backend.

The Kaldi engine, being developed primarily for speech recognition research,
can support a huge variety of "models". I think the consensus best choice for
most use cases (particularly for real-time, low-latency, streaming use) is
currently the "nnet3 chain" models, which are what my kaldi-active-grammar
uses/supports.
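
To make the engine/model distinction concrete: Dragonfly is the command
framework, the engine (Kaldi, WSR, Dragon...) does the recognition, and the
model is the trained network the engine loads. A minimal sketch with the
Kaldi backend (the model path and the commands are illustrative):

```python
from dragonfly import Grammar, Key, MappingRule, Text, get_engine

# Start the Kaldi backend; model_dir points at whichever pretrained model
# you downloaded (path here is illustrative).
engine = get_engine("kaldi", model_dir="kaldi_model")
engine.connect()

# A strict command grammar: only these phrases can be recognized.
class CodingRule(MappingRule):
    mapping = {
        "new function": Text("def "),
        "print it": Text("print()") + Key("left"),
    }

grammar = Grammar("coding")
grammar.add_rule(CodingRule())
grammar.load()

engine.do_recognition()  # block and listen for the commands above
```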

~~~
StavrosK
Thank you, I think I understand partially, but not fully, as I'm not very well
versed in speech recognition software.

Basically, my question (and I assume many other users') is "I run
<Linux/Windows/Mac OS>, what are my options and how good will my recognition
be with each?". Your answer above helps, but it doesn't entirely satisfy me,
as I'm not sure if a model is the recognition engine, or if the engine uses
the model, or how I can use it, etc.

------
apeddle
I came across Serenade ([https://serenade.ai/](https://serenade.ai/))
recently. It's still beta but I was very impressed. In the past I've used
vocola, and a few other open-source options. Serenade felt much more natural
and powerful. The founders are also super hands-on and genuinely seem to care
about the problem.

~~~
salt-licker
Yes, one of the founders is a coder who developed RSI and couldn’t find a good
tool. In a form of UX bootstrapping, I’m pretty sure the product is being used
to build itself.

------
tbabej
While it (likely) doesn't use deep learning directly, I found the following
talk [1] by Emily Shea on her code dictation setup (based on Talon Voice) both
insightful and impressive.

EDIT: The actual coding demo starts at 18:00:
[https://youtu.be/YKuRkGkf5HU?t=1076](https://youtu.be/YKuRkGkf5HU?t=1076)

[1]
[https://www.youtube.com/watch?v=YKuRkGkf5HU](https://www.youtube.com/watch?v=YKuRkGkf5HU)

------
bmc7505
Shameless plug, but I have been working on an open source IDE plugin [1] for
the IntelliJ Platform which attempts to do this. Previously, we used an older
HMM-based speech toolkit called CMUSphinx [2], but are currently transitioning
to a deep speech recognition system. We also tried a number of cloud APIs
including Amazon Lex and Google Cloud Speech, but they were too slow --
offline STT is really important for low latency UX applications. For
navigation and voice typing, we need something customizable and fairly
responsive. Custom grammars would be nice for various contexts and programming
languages.

There are a few good OSS offline deep speech libraries including Mozilla
DeepSpeech [3], but their resource footprint is too high. We settled on the
currently less mature vosk [4], which is based on Kaldi [5] (a more popular
deep speech pipeline), and includes a number of low-footprint, pretrained
language models for real-time streaming inference. Research has shown how to
deploy efficient deep speech models on CPUs [6], so we're hoping those gains
will translate to faster performance on commodity laptops soon. You can follow
this issue [7] for updates on our progress. Contributions are welcome!
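
For a sense of scale, vosk's streaming API is tiny; a minimal real-time loop
looks roughly like this (model path illustrative, microphone input via
pyaudio):

```python
import json

import pyaudio
from vosk import Model, KaldiRecognizer

model = Model("model")  # e.g. an unpacked vosk-model-small-en-us directory
rec = KaldiRecognizer(model, 16000)

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
                 input=True, frames_per_buffer=4000)

while True:
    data = stream.read(4000)
    if rec.AcceptWaveform(data):  # recognizer decided the utterance ended
        print(json.loads(rec.Result())["text"])
    # rec.PartialResult() is also available mid-utterance for live feedback
```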

[1]: [https://github.com/OpenASR/idear/](https://github.com/OpenASR/idear/)

[2]: [https://cmusphinx.github.io/](https://cmusphinx.github.io/)

[3]:
[https://github.com/mozilla/DeepSpeech](https://github.com/mozilla/DeepSpeech)

[4] [https://github.com/alphacep/vosk-api](https://github.com/alphacep/vosk-api)

[5]: [https://github.com/kaldi-asr/kaldi](https://github.com/kaldi-asr/kaldi)

[6] [https://ai.facebook.com/blog/a-highly-efficient-real-time-te...](https://ai.facebook.com/blog/a-highly-efficient-real-time-text-to-speech-system-deployed-on-cpus/)

[7]:
[https://github.com/OpenASR/idear/issues/52](https://github.com/OpenASR/idear/issues/52)

~~~
daanzu
I agree with everything you said, but I would add that a critical component of
voice command and control is strict grammars. There is so much structure and
context in what we speak, and being able to limit what can be recognized to
only what can be reasonably spoken (based on the current context) can allow
massive increases in accuracy. (EDIT: ah, you edited to add a mention of this
as well.)

And one shameless plug deserves another! Vosk is a great project, but my
kaldi-active-grammar [0] (mentioned in another comment here) uses the same
Kaldi engine while extending it specifically for this use case. It supports
defining many grammars, in any combination, and activating/deactivating them
at will, instantly, per utterance. I think it's probably a better fit as a
backend for your project than vosk. My work focuses on the backend technology,
so it would be great to have more front ends using it to put it within users'
reach (so to speak).

[0] [https://github.com/daanzu/kaldi-active-grammar](https://github.com/daanzu/kaldi-active-grammar)
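
From the Dragonfly side, that toggling is just a couple of calls; a rough
sketch (grammar names invented, rule contents omitted):

```python
from dragonfly import Grammar

# One grammar per context; each would get its own MappingRules and a
# grammar.load() call before use.
python_grammar = Grammar("python editing")
shell_grammar = Grammar("shell commands")

# Swap active grammars between utterances; per daanzu above, the
# kaldi-active-grammar backend applies this instantly for the next
# recognition.
python_grammar.disable()
shell_grammar.enable()
```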

------
mkl
I use dictation a bit for prose, but my voice wouldn't be able to handle more
than a couple of hours a day of that.

Can you use a touch screen or mouse? I went ~13 years without using a
keyboard, and typed with mice (some customised), trackballs, and touch
screens, mostly using predictive typing software I wrote. In that time I did a
lot of programming, including a whole applied maths PhD.

One of the best mouse setups I came up with (in several versions) was
moving the cursor with one hand and clicking with the other. Holding the
mouse still to click the button accurately is a surprisingly problematic
movement. I made a button-less mouse with just a flat top to rest the side of
my hand on, with a bit sticking up to grip. Standalone USB numeric keypads can
be remapped to mouse clicks and common keys.

Touch screens can also be very good, if set up right, as all the movement can
come from the big muscles and joints of your upper arm and shoulder, and your
fingers and wrist don't need to do much. The screen needs to be positioned
well, not out in front of you, but down close and angled in a comfortable
position to hold your arm for long periods.

~~~
daanzu
For me at least, dictation is actually the more straining mode of speech
recognition, as compared to using my command grammars. With dictation, you
might say anything, so the computer is given wide leeway in what to recognize,
and so you must speak as clearly as possible. With commands (especially a nice
simple command grammar), however, what you can say is greatly restricted,
which allows you the freedom to speak indistinctly and still be understood by
the computer. This can even be magnified by personalized training of the
speech recognition model.

When using commands at my computer, I frequently find myself muttering and
grunting things that even I consider utterly unintelligible, yet the computer
understands just fine. Dictating for prolonged periods can be tiring for me,
but I can happily code by voice commands all night.

More info in my other comments here:
[https://news.ycombinator.com/item?id=23507363](https://news.ycombinator.com/item?id=23507363)
[https://news.ycombinator.com/item?id=23507829](https://news.ycombinator.com/item?id=23507829)
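
(For a concrete demonstration of the restriction effect outside daanzu's
setup: vosk accepts an optional phrase list that constrains what the
recognizer may output, so mumbled audio tends to snap to a valid command. The
phrases below are invented for illustration.)

```python
import json

from vosk import Model, KaldiRecognizer

model = Model("model")  # path illustrative

# Unrestricted dictation: anything in the model's vocabulary may come out.
dictation = KaldiRecognizer(model, 16000)

# Restricted recognition: only these phrases (plus an unknown-word token)
# can be recognized, which is what buys the accuracy described above.
commands = json.dumps(["save file", "run tests", "next window", "[unk]"])
restricted = KaldiRecognizer(model, 16000, commands)
```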

------
downerending
Not sure exactly how bad "severe" is, but I had a lot of luck with my RSI
switching to two-fingered typing for a (long) while. It's crucial to keep
everything below your elbows utterly relaxed, like a pianist, sort of.

Also, I bought a keyboard tray that supported a deep negative angle, which
helped me keep a very anatomical (relaxed and natural) position.

Also, figure out that mouse, somehow. Something like the above, plus switch
sides frequently.

I've no idea if that could help you, but after a few years, I'm largely in
remission.

I know this isn't really what you were asking, but I'm somewhat hopeful you
can find relief. Good luck.

------
xenonite
Is the Hacker News process broken? I currently see three comments being
downvoted without any apparent reason:
[https://news.ycombinator.com/item?id=23507041](https://news.ycombinator.com/item?id=23507041)
[https://news.ycombinator.com/item?id=23506992](https://news.ycombinator.com/item?id=23506992)
[https://news.ycombinator.com/item?id=23507486](https://news.ycombinator.com/item?id=23507486)

------
setzer22
I've been working on a similar use case at work (going from discursive speech
to CLI-like commands, using a semi-rigid language), and I didn't find any
off-the-shelf, purely ML-based solution that would work for us.

In my experience, services claiming to do deep learning produced far worse
results than what we could get with simple approaches, at least when faced
with non-grammatical sentences (or rather, sentences with a different grammar
than English's). Of course that's because models are not typically trained
with this use case in mind! But the fact that you need a huge amount of data
to even slightly alter the expected inputs of the system was, to me, a deal
breaker.

For the specific case of programming with voice, Silvius comes to mind. It's
built and used by a developer with this same problem. It's a bit wonky having
to spell words sometimes with alpha-beta-gamma speech, and it won't work
without some customization, but on the other hand it's completely free and
open source: [https://github.com/dwks/silvius](https://github.com/dwks/silvius)

~~~
lunixbochs
You've seen OpenAI's new English -> bash demo, right?

That said, Silvius is more of a demo than a product; IMO the best voice
programming options right now are (in alphabetical order):

- Caster/Dragonfly (fully open-source if you use daanzu's Kaldi engine, which
is way better than Silvius afaik; I think even the creator of Silvius uses
Dragonfly with Dragon instead of using Silvius)

- Serenade (fully commercial; I haven't looked at it much recently, but the
biggest caveats afaik are accuracy, the fact that speech recognition is
web-based, and that it's restricted to specific languages and IDEs, while
Caster/Talon are for full system control and not just programming)

- Talon (my project; semi-commercial in that I work on it full time and draw
income from it, but I aim to give all necessary features away for free. Some
benefits include a fully offline and open-source speech recognition engine,
plus other bonuses like eye tracking and noise recognition)

~~~
setzer22
> You've seen OpenAI's new English -> bash demo, right?

Not yet, but will do, thanks!

However, I'd still be hesitant to build a product on top of that: does
voice-to-bash help us if we now want to do, say, voice-to-Python? At the
least, we'd need to re-train the system with completely new data, and even if
we use transfer learning to our advantage, it's not an easy task. There's also
no guarantee that a neural network architecture that works for bash will work
the same for any programming language (think of a radically different syntax,
like Lisp for example).

The training must also be re-done, to some extent, for any variation in the
input format: accent, expected background noise levels, and of course the
(human) speaker's language.

ML has its use case, but I typically see these nice demos as that, demos. When
you have to build a real product and solve user problems, you can't rely on a
black box doing what you want.

~~~
lunixbochs
I think some of your comment does not apply to GPT3 in the conventional sense;
they did not do any specialized training for text2bash afaik. They've been
touting "one-shot learning". If their demo is to be believed, text2bash is
just their _massive_ generic model + a few lines of examples.

Also they do have a related Python demo:
[https://news.ycombinator.com/item?id=23507145](https://news.ycombinator.com/item?id=23507145)

Speech is a completely different stack from this, but honestly (English)
speech is much more of a solved problem here than general knowledge.

------
O_H_E
Related: there was a famous thread here a few months ago that would be very
helpful.

Ask HN: I'm a software engineer going blind, how should I prepare?
([https://news.ycombinator.com/item?id=22918980](https://news.ycombinator.com/item?id=22918980))

------
byteface
On a Mac there is a tool called 'Voice Control' which can trigger custom
'commands' or keyboard shortcuts. You can use it to trigger shortcuts in any
IDE. So if your IDE supports custom shortcuts for templating, you're away.

~~~
xenonite
Indeed. And I really don't get why this has been downvoted. Yes, the OP wrote
he didn't like a command-driven approach. But that's on Windows, which comes
with its own problems; see e.g.
[https://news.ycombinator.com/item?id=23507363](https://news.ycombinator.com/item?id=23507363).

~~~
lunixbochs
Voice Control is much worse than WSR at this task, not for accuracy reasons,
but for API and extensibility reasons. (I went above and beyond trying to make
it work; the underlying APIs crash and/or hang if you try to load in large
grammars, too much custom vocab, or repeatable commands.) In Voice Control you
can basically only define simple commands that require pauses, and there's no
way to build a system for something like spelling words or inserting specific
special characters without requiring a large. pause. between. every. single.
thing. you. say.

------
suby
I'm also curious about this.

The best project I've seen for voice coding is Talon Voice, but I doubt
anything novel is being done with it and deep learning. I'd suggest trying it
out if you haven't. They also have a pretty active Slack channel; you might
have some luck asking them if they know about anything on the horizon.

[https://talonvoice.com/](https://talonvoice.com/)

~~~
lunixbochs
Talon uses deep learning for the speech part, but not so much (yet) the code
part. However, the continuous command recognition it uses can cause less
strain than individual abrupt commands. You can string a dozen command words
together more like a sentence instead of repeatedly saying a command then
abruptly stopping and waiting to say the next command (which is how many older
systems worked).

I'm definitely open to incorporating deep learning directly. I've already
signed up for the GPT3 API waiting list and I have some ideas on how to use
it, and I generally have some ideas on how I might otherwise approach more
natural feeling voice programming down the line.
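
For context on what that looks like in practice: Talon commands live in plain
`.talon` files, and several commands can be chained in a single utterance
(e.g. saying "go line end new line say hello" in one breath). The bindings
below are a hypothetical sketch, not from any shipped command set:

```
# example.talon (hypothetical bindings)
go line end: key(end)
new line: key(enter)
say hello: insert("hello")
```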

------
gtmtg
Check out [https://serenade.ai](https://serenade.ai) - another startup
working on this!

------
cellis
I really hate to see programmers/typists suffering from RSI when it is
entirely preventable with the right ergonomics. Having worked on production
NLP systems, I have to say I think typing will remain a more effective way of
coding for many years to come (for many reasons, but primarily because
syntaxes change often and training for syntax and context is hard). I also
had RSI for many years, and it eventually started to affect me when playing
sports and esports.

So first, I switched my mouse to my non-dominant hand (left hand for me), as
my dominant hand already has plenty to deal with. I'm also using a workstation
that allows me to mount my displays at eye level while sitting or standing;
not hunching over is ergonomics 101. Second, I switched from a standard
keyboard to a split keyboard. I tried many -- Goldtouch, Kinesis Advantage2,
Kinesis Freestyle -- and ultimately settled on the Ultimate Hacking Keyboard.

I could write many more paragraphs on how I customized it and why it won out,
but the most important thing is that it is split and it "felt" best once I
mastered the key placements (the arrows are in different places).

Third, I started learning Vim. Vim is really awesome, but until recently it
didn't have great IDE or other editor support. Now it does, so there's no
reason not to use it. I mostly use it for quickly jumping around files and
going to line numbers.

Fourth, I'm always looking to optimize non-vim shortcuts in my editor. For
example, expand-region (now standard in VSCode) is one of my favorite
plugins.

Fifth, I'm very conscious of using my laptop for long stretches of time.
Mousing on the trackpad is much more RSI-inducing than using a nice gaming
mouse and the UHK keyboard.

All of this is to say that RSI doesn't have to be career-ending. If you're
doing software work and you have functioning hands and wrists, you should
definitely look to optimize typing before looking to speech-to-code. Good
luck!

------
tluyben2
Probably not good for your case, but at the end of the summer we are launching
the beta of our product, a visual + speech-controlled programming language.
It's very niche, as it's a new language and IDE built from scratch, but so far
it's been fun working on it.

~~~
raidicy
I'd be interested in seeing it. Can I follow you somewhere?

------
mk4
Not coding per se - but OpenAI's API beta has an English-to-bash demo:
[https://openai.com/blog/openai-api/](https://openai.com/blog/openai-api/)

~~~
raidicy
Thank you. I ended up signing up for the waitlist. I saw the demo of the
English-to-bash one-shot learning; if I could somehow use that in combination
with speech recognition software, I might be able to achieve something along
the lines of what I'm looking for.

~~~
mk4
Not to be too forward - but maybe ask them on Twitter. There are a few
conversations around accessibility, and perhaps you could give them feedback
based on your needs.

~~~
raidicy
Thank you for the suggestion. I've never really used Twitter much, but it is
worth a shot. If I just reply to their thread, will they see it?

------
xchaotic
Not sure if this is off topic, but it wasn’t RSI that put the nail in the
coffin of my programming; it was a spine injury, and I had to have surgery.
There are lots of jobs around programming that don’t require as much typing,
and even when you do type, it’s easier to dictate email than code. Basically,
you got RSI from coding and generally spending too much time with the
keyboard, so maybe at least consider alternatives where you are not spending
lots of screen time again.

------
netman21
I was hoping this was a question about applying NLP to coding tasks, but based
on the answers it is about voice to text for the special use case of coders.

I am not a coder, I am a writer. I wonder why all these AI people are trying
to create things that will displace my means of earning a living instead of
something that will create applications?

Why can't I tell my Mac: "Computer: take this collection of files and extract
all the addresses of people in Indiana."

~~~
smt88
> _I wonder why all these AI people are trying to create things that will
> displace my means of earning a living instead of something that will create
> applications?_

AI writing is currently terrible. If your writing income can be replaced by
AI, you're likely summarizing simple topics (e.g. unemotional, analysis-free
facts of a recent event, like a change in a stock price). In that case, your
job isn't safe anyway, because there are $2/hr humans in other countries that
can also take your job.

Note that all software is capable of eliminating jobs, whether it's automating
writing or something else.

> _Why can't I tell my Mac: "Computer: take this collection of files and
> extract all the addresses of people in Indiana."_

NLP is nowhere near being able to generally understand requests like this. Try
talking to your Google Assistant and doing anything other than a simple
lookup. It fails spectacularly.

Think about all the cognition in this task: first it has to understand what
you're asking it. Then it has to figure out which collection of files you
mean. Then it has to understand what an "address" is and what it looks like.
Finally, it has to have a concept of "Indiana" and an understanding that an
address can be in Indiana.

In terms of effort, it's far more efficient for you to learn the simple
programming skills to do this yourself (or to use a purpose-built tool) than
to create AI that does this generally, which will cost at least billions of
dollars more in research time.

~~~
sdenton4
I dunno... English to SQL query is probably easier than English to French. You
don't actually need to understand that Indiana is a place, just recognize that
it's one of those tokens that gets stuffed into the WHERE clause...

The tough part is probably the implicit knowledge of what the database
columns are called. Is Indiana a "state" field or a substring in an address
field? It depends on the database.

This mixing of "soft" ml queries with "hard" requirements is a really
interesting space, which I think will produce massive results if we can get it
under control. It's similar to the problems in the AI dungeon master space...
And probably tutoring, as well...
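
A toy sketch of that split between the "hard" SQL skeleton and the "soft"
recognized slot, assuming the schema is already known (table and column names
invented):

```python
import sqlite3

# Hypothetical slot-filling "English to SQL": the query skeleton is fixed;
# only the recognized token gets stuffed into the WHERE clause.
STATES = {"indiana": "IN", "ohio": "OH"}

def query_from_english(text: str):
    for name, code in STATES.items():
        if name in text.lower():
            # Parameterized, so the soft/fuzzy part can't break the SQL.
            return "SELECT address FROM people WHERE state = ?", (code,)
    raise ValueError("no state recognized")

sql, params = query_from_english(
    "extract all the addresses of people in Indiana")
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (address TEXT, state TEXT)")
conn.execute("INSERT INTO people VALUES ('1 Main St', 'IN')")
print(conn.execute(sql, params).fetchall())  # [('1 Main St',)]
```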

~~~
retrac
> I dunno... English to SQL query is probably easier than English to French.

You can have substantial success in transforming English to French _without
understanding what is said_. Some big grammar rulebooks, some dictionaries,
and a few terabytes of sentence-aligned parallel texts and data on statistical
collocations in the languages will get you most of the way. It's far from 100%
accurate but the human brain is surprisingly good at filling in the gaps with
the mangled grammar and odd turns of phrase.

I'm not so sure the same is true about English to SQL query. The exact meaning
of the English must be understood, in context, in order to construct the query
correctly. No room for error or fuzziness. The database engine doesn't have
human cognition to fall back on should the query be slightly malformed or
ambiguous.

------
riedel
Here is an article summarizing some nice stuff:
[https://www.nature.com/articles/d41586-018-05588-x](https://www.nature.com/articles/d41586-018-05588-x)

I always wanted to learn vimspeak:
[https://news.ycombinator.com/item?id=5660633](https://news.ycombinator.com/item?id=5660633)

------
fareesh
I remember watching a video, maybe 10-12 years ago, of a guy who used Dragon
NaturallySpeaking with some kind of vim commands and a shorthand language,
which enabled him to write code faster than he could with the keyboard. It was
a fascinating demo, but the efficiency seemed to come from the short language
he developed for his workflow, like "bop chop hop" etc.

~~~
phrotoma
Probably this clip of Tavis -
[https://youtu.be/8SkdfdXWYaI?t=530](https://youtu.be/8SkdfdXWYaI?t=530)?

------
boomersooner
I have similar issues. I use a combination of kinesis advantage, penguin
mouse, and dragonfly/DNS. Having a good microphone does make a difference, as
does retraining/tweaking command vocab. The biggest thing overall is the
ergonomics of desk work - I take a break every 15 minutes (or try to) by
setting timers.

------
mulmen
I can’t help you with actually converting speech to code but it occurs to me
this would be a benefit to everyone. Speaking the words that are represented
by the code we write would require a much deeper understanding of what we are
doing and why.

Food for thought for sure. Good luck.

~~~
lunixbochs
There's been some pretty cool work in this area recently:
[https://www.youtube.com/watch?v=fZSFNUT6iY8](https://www.youtube.com/watch?v=fZSFNUT6iY8)

~~~
mulmen
Wow. This is exactly what I imagined but it already exists. This is a great
illustration of how we as humans could use our abilities to disambiguate to
collaborate with a computer and write code. Very impressive stuff.

To me it seems like learning how to talk to Alexa or Cortana or "Google" is a
limitation or regression for humans. This shows that it could actually be
beneficial.

Thanks for this philosophical rabbit hole just in time for a weekend.

~~~
lunixbochs
As someone who works in voice tech, I think talking to Alexa is setting back
our expectations of voice tech by at least a decade. The actual tech and
capabilities we have available right now are so much better than static
capabilities over a high latency internet connection.

------
tibu
One of my friends was just diagnosed with ALS. Software like that listed in
this thread can make their remaining years useful and enjoyable. Guys, keep up
the good work! I'll definitely check where I could contribute too.

------
mtrimpe
Have you tried VoiceCode? [https://voicecode.io/](https://voicecode.io/)

That in combination with switching to a Lisp (Clojure) almost made it feasible
for me to code with RSI.

I just became a manager instead because I couldn’t work from home and talking
like that in the office was a no-go for me.

If that’s your cup of tea, you’d be surprised at how happy upper management is
to have someone who’s actually good at technology willing to engage with
them.

~~~
lunixbochs
VoiceCode has been defunct for years now.

------
idontevengohere
I'd love to help! If anyone's working on this in this thread, lemme know :)

------
j88439h84
The options are Caster and Talon. Talon is closed source.

------
mpourmpoulis
Though unfortunately I cannot provide the conversational solution you are
looking for, I believe there are some steps you can take, and solutions
currently available, that could help make your voice programming experience
less exhausting, so it might be worth giving them a try:

1) Try to minimize the amount you have to speak by leveraging auto-completion
as much as possible. For me, TabNine [1] has been a great help in that regard.

2) Try to use snippets as much as possible, both to reduce boilerplate code
and because you can simply tab through the various fields. It has been a great
help for me that with Sublime it is possible [2], without installing anything,
to have all of my snippets inside dragonfly grammars, or even generate them
dynamically [10], providing much-needed structural control over what you write
(see the sketch after the links below). I know this is more primitive (at
least for the time being; there are ideas to improve it) than what you are
asking for, but for me it has been enough to make C++ enjoyable again!
Unfortunately my pull request to integrate this into Caster [3] has fallen
behind, but all of the basic functionality, along with various additional
utilities, is there if you want to give it a try. Just be aware of this little
bugger [4], which applies here as well!

3) Not directly related to code generation, but if you find yourself spending
a lot of time and vocal effort on navigation, consider either adding eye
tracking to the mix or utilizing one of the at least three projects that
provide syntactic navigation capabilities. As the author, and more importantly
as a user, of PythonVoiceCodingPlugin [5], I have seen quite a bit of
difference since I got it up to speed, because a) even though it is
command-driven, commands sound natural and smooth; b) though they can get
longer, in practice utterances are usually 3 to 5 (maybe 6) words, which makes
them long enough that you do not have to speak abruptly, but short enough that
you do not have to hurry to finish them before you run out of breath; and c) I
personally need fewer commands compared to using only keyboard shortcuts, so
less load on your voice! The other two projects in this area I am aware of are
Serenade [6] and VoiceCodeIdea [7], so see if something fits your use case!

4) Use noise input where you can to reduce voice strain. Talon [8][9] is by
far the way to go in this field, but you might be able to get inferior but
decent results with other engines as well. For instance, DNS 15 Home can
recognize some 30+ letter-like "sounds" such as "fffp, pppf, tttf, shhh,
ssss/'s, shhp, pppt, xxxx, tttp, kkkp"; you just have to make sure that you
use 4 or more letters in your grammar (so, for instance, "ffp" will not work).
Recognition accuracy is going to degrade if you overload it too much, but it
is still good enough to simplify a lot of common tasks.

5) Give a different engine a try; I was not really that satisfied with WSR
either.

6) See if any of the advice from [11] helps, and seek out professional help!

I realize that my post diverges from what you originally asked for, but I feel
the points raised here might help you lessen the impact of voice strain for
the time being, until more robust solutions like the GPT3 one mentioned in the
comments above are up and running. My apologies if this is completely off
topic!

[1] [https://www.tabnine.com/](https://www.tabnine.com/)
[2] [https://github.com/mpourmpoulis/CasterSublimeSnippetInterfac...](https://github.com/mpourmpoulis/CasterSublimeSnippetInterfaceExample)
[3] [https://github.com/dictation-toolbox/Caster](https://github.com/dictation-toolbox/Caster)
[4] [https://github.com/mpourmpoulis/PythonVoiceCodingPlugin/issu...](https://github.com/mpourmpoulis/PythonVoiceCodingPlugin/issues/15)
[5] [https://packagecontrol.io/packages/PythonVoiceCodingPlugin](https://packagecontrol.io/packages/PythonVoiceCodingPlugin)
[6] [https://serenade.ai/](https://serenade.ai/)
[7] [https://plugins.jetbrains.com/plugin/10504-voice-code-idea](https://plugins.jetbrains.com/plugin/10504-voice-code-idea)
[8] [https://talonvoice.com/](https://talonvoice.com/)
[9] [https://noise.talonvoice.com/](https://noise.talonvoice.com/)
[10] [https://github.com/mpourmpoulis/CasterSublimeSnippetInterfac...](https://github.com/mpourmpoulis/CasterSublimeSnippetInterfaceExample/blob/master/example8.gif)
[11] [https://dictation-toolbox.github.io/dictation-toolbox.org/vo...](https://dictation-toolbox.github.io/dictation-toolbox.org/voice%20strain.html)
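
As mentioned in point 2 above, here is a rough illustration of driving snippet
expansion from a dragonfly grammar (a sketch only, not the actual
CasterSublimeSnippetInterface code; it assumes Sublime's default tab-trigger
behavior):

```python
from dragonfly import Grammar, Key, MappingRule, Text

class SnippetRule(MappingRule):
    mapping = {
        # Type a snippet trigger, then expand it with Tab.
        "for loop": Text("for") + Key("tab"),
        # Hop between the snippet's fields by voice.
        "next field": Key("tab"),
        "previous field": Key("s-tab"),
    }

grammar = Grammar("snippets")
grammar.add_rule(SnippetRule())
grammar.load()
```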

------
redis_mlc
I don't know about your specific RSI case, but moving from Java or C to a
scripting language like Perl or Python can be helpful, since there's up to 10x
fewer LOC.

Also, talk to an ergonomics person about it, and it sounds like notebooks are
out at this point unless you have an external keyboard, mouse and monitor.

------
Vinceo
Are you sure you have RSI and not TMS (tension myositis syndrome)? It's a
condition that causes real physical symptoms (of which wrist pain is a common
one) that are not due to pathological or structural abnormalities. Rather, the
symptoms are caused by stress and repressed emotions.

Check out this success forum of people who have healed from all kinds of
chronic pain symptoms by dealing with stress and changing their mindset:

[https://www.tmswiki.org/forum/forums/success-stories-subforu...](https://www.tmswiki.org/forum/forums/success-stories-subforum.27/)

~~~
winrid
I have definitely found that my RSI-like symptoms are dependent on my
mindset/stress levels.

Rarely does the RSI flare up when I'm doing something I enjoy. Stress is
definitely a component.

