
On Voice Coding - buchuki
https://dusty.phillips.codes/2020/02/15/on-voice-coding/
======
lunixbochs
This post starts out talking about expecting to spend around $1,000.

There are at least two cross-platform projects where the biggest expense is a
microphone instead of software.

1\. My project, Talon. Windows/Linux/Mac support, and a first party local
speech recognition engine that is pretty good and getting better. It’s free,
but the engine is in a private beta (which is $15/mo to support development,
optional if there’s a financial issue).

2\. Serenade. They are VC backed. Currently free, unsure about their longer
term plans. They use cloud based recognition.

~~~
daanzu
This comment fails to mention Dragonfly with my Kaldi Active Grammar backend
[1], which is cross-platform (Windows/Linux now, with Mac functional and to be
released soon), completely free with no private beta features (although I do
accept donations), and 100% open source (unlike Talon). The speech recognition
is local, with extremely low latency. See the video demonstration [2] on the
project page. I think the underlying Kaldi engine delivers unmatched accuracy
as a free non-commercial engine.

I created Kaldi Active Grammar because I didn't trust relying on closed source
software for something so crucial to my productivity, where a decision by an
outside party determines whether I can function. As a bonus, open source means
I can make it work better to fit my needs than closed source ever could.

Furthermore, the original article mentions Caster (which is built on
Dragonfly), but doesn't mention that KaldiAG works with it, and that work is
underway to expand Caster's platform support.

[1] [https://github.com/daanzu/kaldi-active-grammar](https://github.com/daanzu/kaldi-active-grammar)

[2] [https://youtu.be/Qk1mGbIJx3s](https://youtu.be/Qk1mGbIJx3s)

~~~
lunixbochs
I think to claim "unmatched accuracy" in good faith we should actually come up
with a common benchmark and measure against it. I believe there aren't really
any clean benchmarks for command accuracy floating around (it would be ideal
if we used a strict grammar to properly measure the command decoders), and a
wav2letter model holds the state of the art for LibriSpeech WER as of 2019.

I found your measurement here [1] which is against an unknown wav2letter
acoustic+language model pair, as the web demo is at any given point in time
running an arbitrary model based on having users test in-progress models, and
it has never been running the model I am currently shipping with Talon.

(As a small example, the unfinished wav2letter experiment I am training right
now has a 3.17% WER on speech commands and 6.86% WER on LibriSpeech clean,
both without using a language model.)

[1] [https://github.com/daanzu/kaldi-active-grammar/blob/master/docs/models.md](https://github.com/daanzu/kaldi-active-grammar/blob/master/docs/models.md)
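(For reference, WER is word error rate: the word-level edit distance between
the recognizer’s output and the reference transcript, divided by the reference
word count. A minimal sketch of the metric in Python, purely illustrative and
not any project’s actual scoring code:)

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("go to line forty two", "go to lime forty two"))  # 0.2
```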

~~~
daanzu
I am all for devising a good, fair apples to apples comparison. If you have
any suggestions, let me know. In lieu of that, I use what I have available.
While accuracy numbers from papers are informative and interesting, I don't
think they directly apply to our usage particularly well. I would prefer to
use numbers from actual usage.

------
russellbeattie
Just want to point out a very accurate, yet completely inelegant solution to
voice text input: Saying each letter and symbol individually.

If you haven't tried it, you'll think I'm crazy, but it's amazing how fast
computers can recognize individual letter names. You can just blurt them all
out. I discovered this while entering a domain name by voice - just spell it
out and poof, no problem, no corrections.

Not sure I'd want to spend all day doing it that way, but rather than fighting
with voice recognition for misunderstood homonyms, just fall back on
individual keys.

~~~
lunixbochs
You’re exactly right, and this is something I put a lot of work into! I
designed a new phonetic alphabet, the “Talon alphabet” (which isn’t
proprietary to Talon; people have used it with Caster as well).

The trick is using one-syllable words to represent each letter of the
alphabet, chosen for high phoneme diversity (unique sub-syllable sounds). Care
was taken to ensure they can be differentiated even when spoken “too fast” in
a stream without gaps, and I tested them extensively to make sure I picked
words that didn’t clog up your mouth too much. The result is I can type 58 wpm
in short bursts using the alphabet alone, which is enough to play ztype, write
entire words that would be hard to dictate, and control vim directly without
an additional command set.

The base alphabet is:

air bat cap drum each fine gust harp sit jury crunch look made near odd pit
quench red sun trap urge vest whale plex yank zip

Go ahead, see how fast you can read the alphabet in order. Then try reading it
even faster, blurring the word boundaries together a bit. You can still tell
the words apart!

Some people change a couple of words if they don’t work well in their accent,
but the base idea holds strong.
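(Purely as an illustration, not Talon’s code: the alphabet is just a
letter-to-word mapping, so spelling an identifier by voice amounts to reading
off the corresponding words:)

```python
# Hypothetical sketch: map each letter to its Talon-alphabet word,
# then "spell" a string as the sequence of words you would say.
WORDS = ("air bat cap drum each fine gust harp sit jury crunch look "
         "made near odd pit quench red sun trap urge vest whale plex "
         "yank zip").split()
ALPHABET = dict(zip("abcdefghijklmnopqrstuvwxyz", WORDS))

def spell(text: str) -> str:
    """Return the words you would say to spell `text` letter by letter."""
    return " ".join(ALPHABET[c] for c in text.lower() if c in ALPHABET)

print(spell("vim"))  # vest sit made
```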

~~~
22c
> air bat cap drum each fine gust harp sit jury crunch look made near odd pit
> quench red sun trap urge vest whale plex yank zip

How did you settle on jury? Notably it seems to be the only word that is more
than one syllable.

~~~
lunixbochs
I think it’s more like 1.5 syllables in practice? I had a hard time
finding a short word in the J space that was phonetically diverse but also
didn’t clog the mouth. It’s worth noting that when doing the thing I mention
where you blend the words together to go faster, “sit jury crunch” can be
spoken more like “sitch ury crunch” or even “sitch re crunch” and still be
recognized fine.

And see sibling thread suggesting “jree” like “tree”, which is similar to how
it sounds when you’re going quickly.

Edit: For people suggesting new words, please read my comments downthread.
Also the biggest complaint I’ve had about jury in practice is it can sound
similar to “three”. Not a ton of words are spelled with J, and when doing vim
input it’s faster to use numbers or a repeat command than say the letter
repeatedly. So as it’s a lower priority word to trim half a syllable on, it’s
worth getting it right if you’re going to change it.

The alphabet isn’t strict law, but it’s best to replace individual words after
you’ve had specific problems with them and not before, as it’s hard to match
the phonetic feel.

~~~
kylec
How about "joy"?

~~~
lunixbochs
See sibling for testing methodology. It seems good at face value to me;
however, I did stumble a bit when doing the full-alphabet-out-loud test with
it (joy air, joy bat, joy crunch, etc...). I think there’s a
harder-to-quantify component that causes the stumbles/tongue twisting, namely
where your mouth/tongue position ends up when you finish speaking the letter.

------
jorgeer
I had several onsets of RSI a few years back, and had to resort to voice
coding as a last resort, after stretches, pauses and ergonomic everything did
not do the job. It was pretty awful.

But then, after having seen doctors and neurologists, and finally a physical
therapist, I came across my salvation: exercising my hands.

I very rarely see this mentioned for some reason. I exercised regularly, but
only the bigger muscle groups, rarely grip strength and wrist strength. It
felt counter-intuitive to exercise hands that were already aching extremely
painfully when typing, but after using grip weights and other methods to work
out my hands and wrists, the pain went away quickly! If you are not diagnosed
with carpal tunnel, and not already doing this, definitely try it; it saved my
career.

~~~
benterris
Could you please elaborate on the exercises you're finding helpful? I have
mild RSI myself (outer part of the forearm, near the elbow) and have been
trying some eccentric exercises for a few months now, but I'm not seeing a big
improvement.

~~~
ajmarcic
I'll list the low-hanging fruit first:

Look up nerve flossing exercises on YouTube. Routinely doing these had the
largest impact for me. You'll feel the ones that work on whatever nerve is
inflamed.

Try to improve your posture. My general mantra is "lift your head as much as
possible; pull your shoulders back". It gets easier eventually. If your head
sits forward of the spine, you have a hunch. You may notice a small lump of
muscle behind your neck. That's bad. There are exercises to strengthen the
opposing muscles.

If you sleep on your arms try to stop as well. I recommend sleeping on your
back. Fluffy couches and back rests sacrifice posture. Don't use a laptop in
bed.

Sleep, eat nutrient-dense foods, and run or swim. Avoid alcohol. If you do
pushups or bench press, make sure to exercise your upper back equally to avoid
imbalance.

------
random3
I had my second surgery just a few months ago. I can type again, but each time
I had to use my left hand for months. Initially, it feels like your brain
doesn't function properly anymore (not mentioning the psychological effort you
have to make in order to be focused on work when you feel your hands are
falling apart). Keyboard speed is directly related to how fast you can move
your hands to support your thought flow. I tried Kinesis and even a split
vertical keyboard (Keyboardio), but neither avoided the pain and numbness that
came with typing. The other problem with thumb-cluster keyboards is that your
IDE productivity goes to zero. I was faster with just my left hand on a
regular keyboard than with both. I think this would be fixable with a good
amount of time remapping shortcuts, etc. Now that my hand works again, I think
I should start spending time getting used to my Keyboardio and at least try to
buy some time.

The "voice coding" space is maybe not a mess, but it is far from great or even
acceptable. However, there seem to be more recent efforts to make better
tools. I would definitely check out
[https://serenade.ai/](https://serenade.ai/).

The main problem, I think, is that "voice coding" is too focused on editor
typing, which these tools can't do right: combined with code syntax, it
becomes too complex. Instead, they should focus on higher-level actions
(which, btw, Serenade does) along with a different approach to typing. I think
Vim is a good example of where editing should be. IntelliJ refactoring is
where voice coding should _start_. With all the AI buzz, it's unbelievable how
bad voice recognition is. I'm not talking about "Siri set an alarm", but about
separating context from tone, not having to say things 2-3 times, having good
response latency, etc.

Lastly, I wish there was simple voice assistance for code navigation - like go
to definition, find usages, etc. This is much simpler to "parse" than code
structure. Unfortunately, this is not even tackled by any tool as far as I've
seen.

~~~
mwiethoff
I’m one of the creators of Serenade—thanks for mentioning us! We totally agree
about the need for higher-level layers of abstraction, and we’re working on
some of the code navigation functionality you mentioned right now. If you have
any other ideas or feedback, we’d love to chat more, I’m matt@serenade.ai.

~~~
nshm
Among all the demos above, Serenade seems to be the only product that does the
right thing. Why the hell should I say "colon" or "quote" aloud when the tool
can insert them automatically?

------
melling
“I type fast enough that I have never had much use for snippets in my text
editor. Perhaps if I’d used them more effectively, I would not have developed
the RSI symptoms that I have!"

Often we have discussions about advanced code completion on HN. Many
developers feel they don’t need it, or that it gets in their way, for example.

Reading stories like this convinces me even more that our editors (tools) need
to be smarter. There is so much repetition in coding; it’s hard to believe we
can’t do better.

This tool is often mentioned: [https://tabnine.com/](https://tabnine.com/)

~~~
est
I found that Kite is comparable with Tabnine, at least with Python.

[https://kite.com/integrations/kite-vs-tabnine/](https://kite.com/integrations/kite-vs-tabnine/)

~~~
lunixbochs
Because this shows “single token completion” as the TabNine example GIF, I’m
guessing it isn’t comparing against Deep TabNine at all, which is on an
entirely different level.

------
dwiel
I've been voice coding for about 5 years now. For those of you not on Windows,
I use Talon on Mac (the Linux version is in beta). It works quite well and I'm
at least as productive writing code by voice as I ever was by hand. I was
someone who would spend the time to get my emacs and then later vim configs
highly optimized, but there is something liberating about not constraining
yourself to key bindings. I used to type gcC to comment a Python class in vim;
now I say "comment class". For commands you use frequently enough to be in
muscle memory this isn't a huge gain, but for all the things you don't use
regularly, it's so much easier for me to remember normal words than keyboard
shortcuts.

~~~
lunixbochs
At this point all of the projects mentioned in this thread
(caster/talon/serenade) have some option for supporting the three main
(win/lin/mac) platforms.

~~~
gleb
How do you get/request access to Talon on Linux?

~~~
lunixbochs
You can follow the Talon website in my HN profile to the Donate link (Patreon)
and sign up for the beta tier then join the Slack to get access. Or go
straight to the Slack and make your case for not being able to support the
beta on Patreon at this time.

------
ijidak
FYI, I thought my programming career was over due to RSI.

Now, I only type while wearing long-sleeves.

And of course, I still have to take regular breaks.

I no longer suffer RSI symptoms. I'm guessing it's because the sleeves
increase blood flow to the area, and perhaps the warmth helps keep ligaments
and muscles flexible and loose.

Simple solution, but took a while to figure out.

Hopefully this helps someone reading this.

~~~
nemo1618
Same, but for me the cure was John Sarno. Unfortunately RSI seems to be one of
those disorders that everyone cures differently (if at all!) and you just have
to stumble around until you find a method that works for you.

~~~
l_t
Are you referring to the "Sarno method for psychosomatic symptoms" (e.g.
[https://www.morrisonhealth.com/sarno-method-psychosomatic-symptoms/](https://www.morrisonhealth.com/sarno-method-psychosomatic-symptoms/))?

This is the first I've heard of such a therapy, it sounds fascinating.

~~~
geewee
Yeah, he probably is. Sarno wrote a whole range of books, primarily about back
pain ([https://www.saxo.com/dk/healing-back-pain_john-e-sarno_paperback_9780446392303](https://www.saxo.com/dk/healing-back-pain_john-e-sarno_paperback_9780446392303)),
but also dealing with RSI. If you read up on it, a great many people have said
it has helped them with their RSI.

Anecdotally it's done nothing for my RSI, but it has helped me deal with a 7+
year "chronic" upper back pain, though it did take a bit of time to swallow
the concept at first.

------
LexiconCode
Greetings everyone, I help maintain the Caster project. The key difference
from other solutions out there is that we seek to support a completely open
source voice coding stack. Open source is the only way to go long term if
you're going to be using a tool for most of your life. Fortunately, for some
it acts as a bridge until their RSI symptoms become manageable or go into
remission.

We are working towards cross-platform support for Linux and Mac, as well as
adding support for Kaldi. Dragonfly is already cross-platform, so only a few
Windows-specific functions remain to be ported in Caster.

Kaldi via daanzu's Kaldi Active Grammar: [https://github.com/daanzu/kaldi-active-grammar](https://github.com/daanzu/kaldi-active-grammar)

Talon may be free, but it is closed source.

~~~
daanzu
I said this in another comment, but it can't be emphasized enough: I created
Kaldi Active Grammar because I didn't trust relying on closed source software
for something so crucial to my productivity, where a decision by an outside
party determines whether I can function. As a bonus, open source means I can
make it work better to fit my needs than closed source ever could.

For what it's worth, my voice is quite abnormal, so most untrained speech
recognition is terrible for me, and even performing the normal "training" for
Dragon still resulted in very poor accuracy. However, apparently their
training is quite limited, because once I developed Kaldi Active Grammar, and
did my own direct training, the results were fantastic in comparison, with
orders of magnitude better accuracy.

Open source is what allows this.

~~~
lunixbochs
I understand and won't argue on your preference for the core Talon app.
However, as all of my wav2letter code, models, tools, training methodology,
and general advice (e.g. I am very active on the
github/facebookresearch/wav2letter issue tracker helping others) are open
source, and wav2letter as used in Talon is built from the public repository
and dynamically linked, I don't think the speech engine is the place to speak
against Talon's source policy.

~~~
daanzu
Sorry, I was only using the speech engine accuracy as an example. But the
freedom of open source stands for any part of software: Dragon's spectacular
failures for me are only in part because of its engine. Also, is the command
portion of Talon's wav2letter backend open source? Nonetheless, thank you for
releasing some of your work. It is all helpful.

~~~
lunixbochs
Yes, the decoder in the open-source talonvoice/wav2letter/decoder will decode
commands alongside speech if you hand it an NFA blob describing the command
graph. It's up to you to generate that NFA, but it's probably identical to the
graph you're creating with FSTs, and the C structures are described in the
source/header.

------
pfraze
I was curious what this might look like and found a couple videos on YouTube

[https://www.youtube.com/watch?v=fBhBqlQj00Q](https://www.youtube.com/watch?v=fBhBqlQj00Q)

[https://www.youtube.com/watch?v=hGPNs5C1Lp0](https://www.youtube.com/watch?v=hGPNs5C1Lp0)

~~~
gitgreen
Here is a very early (2007) and unintentionally humorous attempt to script in
Perl with voice recognition.

[https://www.youtube.com/watch?v=MzJ0CytAsec](https://www.youtube.com/watch?v=MzJ0CytAsec)

~~~
lunixbochs
I know it’s funny, but I hate the prevalence of this video so much, because
people reference it as a reason voice programming doesn’t work, and I think it
discourages people who may have otherwise had success with voice programming.
You should consider linking to the middle of Emily Shea’s PerlCon video where
she first plays this video then shows that it’s actually possible to dictate
the same code effectively.

~~~
gitgreen
I don't know which video you're referring to and a cursory search didn't turn
it up. Perhaps you could link it?

~~~
lunixbochs
[https://youtu.be/Mz3JeYfBTcY](https://youtu.be/Mz3JeYfBTcY) at 4:15 is when
she plays the Perl Vista video

(Note the fiddling around with video playback isn’t a fundamental issue with
voice input; Google Slides with video embeds turned out to be pretty
unreliable with keyboard input, and I believe that was ironed out in her later
talks.)

------
qixxiq
It's still surprising to see Dragon Speech Recognition as the recommended (and
only) choice here.

Is anyone working on decent speech recognition for Mac/Linux or know good
resources for that? The ideal output is a stream of what could have been said,
as well as some alternatives, each with a confidence.

Every alternative I've tried has not been as effective as the version of
Dragon I used from 2011. I think the focus on accents and training is a big
thing here -- I'm happy to spend a couple hours training it for better
results.

~~~
lunixbochs
The Talon beta ships with wav2letter and a really good many-accent English
model that can handle both arbitrary commands and free form English. All of my
trained models and some information is posted here:
[https://talonvoice.com/research/](https://talonvoice.com/research/)

~~~
woodson
Are you going to release the speech data you collect at
[https://speech.talonvoice.com/](https://speech.talonvoice.com/) or is it
proprietary?

~~~
lunixbochs
I don’t consider it proprietary. As per the agreement I specifically ask for
an open license so I will be able to release it in the future.

Right now it’s about 5 hours total, which isn’t a ton for actually training
on, which is why I haven’t prioritized releasing it and haven’t even trained
on it myself yet. I’ve been mostly using it for evaluation so far.

If someone approaches me and says “I have a compelling need for a bit of
training data in the form of your prompts” I’ll probably prioritize a release
higher.

As another perspective, a majority of the people at this point submitting
their voice are already using Talon and just want the engine to be more
robust.

------
totalthrowaway
Example (with Perl!):

[https://www.youtube.com/watch?v=Mz3JeYfBTcY](https://www.youtube.com/watch?v=Mz3JeYfBTcY)

------
est
Off topic, but I really wish to find out any Alexa-like "smart speakers"
capable of voice programming.

For example:

1\. I would like to command the speaker to listen for a keyword, like the Fizz
Buzz Test[1], if I count to a certain number.

2\. Ask the speaker to remind me of something when hearing certain topics
during a conversation. Much like the "if" keyword in text based computer
programming languages.

3\. Program a poem into the speaker over the microphone, tutor my kids to
memorize it, and correct the wrong parts. Share the snippet with other
parents. Program simple home-made riddles and tests over voice.

4\. The ability to store certain list/map structure as global variables. e.g.
asking the speaker, who is the second oldest son in this family? Who got up
first this morning?

5\. Voice memos and search engine. Stored and indexed securely offline on my
home NAS.

[1]: [https://wiki.c2.com/?FizzBuzzTest](https://wiki.c2.com/?FizzBuzzTest)
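(To make item 1 concrete: the Fizz Buzz Test referenced above is the classic
counting exercise where multiples of 3 and 5 are replaced with words. A
minimal Python version, for reference only:)

```python
# The classic Fizz Buzz rule: counting up, multiples of 3 become "fizz",
# multiples of 5 become "buzz", and multiples of both become "fizzbuzz".
def fizzbuzz(n: int) -> str:
    out = ("fizz" if n % 3 == 0 else "") + ("buzz" if n % 5 == 0 else "")
    return out or str(n)

print([fizzbuzz(n) for n in range(1, 16)])
```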

~~~
jedberg
I think all the big smart-speaker makers would tell you to just write a
serverless function and hook it up to your speaker. It's unlikely they would
ever create such functionality.

Most likely, if you want to make it work, you'd either have to build your own
smart speaker or make a serverless function that uses one of the other voice
programming tools mentioned in this thread as its backend.

~~~
est
> build your own smart speaker or make a serverless function

If I build my own smart speaker, it would surely run on my home server, not so
much serverless. Yeah, but I get it: the voice commands should be counted as
new "keywords" or "functions". Let there be a general "voice programming"
language.

------
azinman2
You gotta be careful not to get RSI in your vocal cords. I almost did when I
tried voice coding years ago. For me at least, the strain tends to shift to
wherever the “work” is being done. The same thing happened when I experimented
with eye tracking.

------
disease
I've had some close calls with RSI and the most helpful things for me were:

1) Getting an ergo keyboard, in my case the Microsoft Sculpt.

2) Remapping my keys to better match my workflow. Left and right parens are
mapped to left and right shift, likewise Ctrl for braces and Alt for brackets.
Mapping Caps Lock to delete one word back was also a big one. Further, I have
the number pad on the left, both to make using the mouse require less movement
and to remap all the numpad keys to useful programming commands.

~~~
lunixbochs
A minor tip for people fiddling with keyboards/layout. Be careful of regular
pinky use for modifiers. Pinkies are weak and overuse can cause ulnar issues.

Corollary, if you have ulnar (pinky side) issues in your forearm, reduce pinky
use.

------
lbj
Just a quick shout-out to Microsoft (no affiliation): their Sculpt keyboard
has taken all the discomfort out of 15-hour coding sessions. If you're a
professional developer, get one.

~~~
yreg
I'd recommend avoiding 15-hour coding sessions over using an ergonomic
keyboard.

~~~
lbj
Probably better. But when the motivation grabs me, I can't stop.

------
cjbassi
For those on Linux, I've been working on a Talon-inspired voice coding program
called Osprey that uses the Google Cloud speech-to-text API:
[https://github.com/osprey-voice/osprey](https://github.com/osprey-voice/osprey).

It's still very much a work in progress but it's already been working very
well for me and I'm actually using it to type out this response right now.

~~~
nshm
Why would you work with Google when there are much more accurate open source
speech recognizers based on Kaldi? For that specific use case it is very easy
to beat Google on accuracy.

~~~
lunixbochs
I think Google as the primary engine results in stuff like this:
[https://github.com/osprey-voice/osprey-starter-pack/blob/master/src/text/formatters.py#L8](https://github.com/osprey-voice/osprey-starter-pack/blob/master/src/text/formatters.py#L8)

On the other hand it’s probably better at general (non command) English.
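(For context on what that formatters file is doing: voice coding tools
typically pipe the recognized words through named “formatters” to produce code
identifiers. A hypothetical sketch of the idea, not Osprey’s or Talon’s actual
code:)

```python
# Hypothetical dictation "formatters": turn recognized words into
# identifiers in a chosen naming style.
def snake(words):
    return "_".join(w.lower() for w in words)

def camel(words):
    head, *tail = words
    return head.lower() + "".join(w.capitalize() for w in tail)

def kebab(words):
    return "-".join(w.lower() for w in words)

words = "parse config file".split()
print(snake(words))  # parse_config_file
print(camel(words))  # parseConfigFile
print(kebab(words))  # parse-config-file
```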

~~~
cjbassi
Yeah, it uses a lot of machine learning and context-based inference, which is
great for dictating phrases but less so for commands.

------
DantesKite
You can solve most carpal tunnel issues by massaging your forearms with a ball
(or your knuckles).

I know it sounds crazy, but I solved a very intense bout of plantar fasciitis
by massaging my calves. Took some time, but eventually the pain went away.

And when I had carpal tunnel, I did the same (although not leaning heavily on
my wrists helped a lot too).

You'll know if you're hitting the right spot because it'll hurt. A lot.

~~~
lunixbochs
* I am not a doctor, I have simply had a lot of experience with treating my own injuries and speaking to others in similar situations

This is partially good, but dangerously incomplete advice. The spectrum of RSI
is far bigger than carpal tunnel, such as cubital tunnel, tennis elbow,
tendinitis, and general neuropathy.

It can be caused by many things, including various muscle groups in your
shoulders and neck, posture, past injuries, nerve adhesion, flexibility, and
lack of strength.

Massage is a wonderful way to address nerve adhesion, flexibility, and general
tightness, but if you have a different issue it’s not really a panacea.

My philosophy is that you should do these things in strict order if you have
symptoms:

1\. reduce use (take breaks, get hobbies that take you away from the screen at
home)

2\. vary conditions of use (standing desk, ergonomics, using more than one
kind of keyboard/mouse)

3\. work out (yoga, swimming are highly recommended)

4\. physical treatment (see a doctor, physical therapist, deep masseuse, etc)

5\. alternative input (voice, eye tracking, etc)

If you do these in a different order or skip any of them you’re probably doing
yourself a disservice.

Also: don’t ice. Don’t take anti-inflammatories or other pain meds while you
continue to work. Blood flow is important to recovery; heat can soothe your
pain similarly to ice, but without causing further damage. And pain meds
during working hours will make you ignore the damage that is causing your
pain, allowing you to cause more damage. Bad.

I’ve also seen some negative studies on steroids and surgery, suggesting if
you get those without changing your habits, your pain will recur within a year
in 90% of cases. Do your own research, but you might as well change your
habits first and only use surgery/drugs as an extreme last resort. (Unless you
have the specific form of carpal tunnel that can be permanently fixed in a day
with a minor surgery in exchange for lower grip strength?)

------
dotasm
Wow DragonFly, that’s a name I haven’t heard in a while. I wonder if the
speech to text space will change now that ML has gained popularity.

~~~
lunixbochs
There’s recently Common Voice from Mozilla, which is a huge free English
dataset (1500 hours and growing), and wer_are_we [1] has shown really
impressive accuracy increases in published research the past few years.
Exciting times.

[1] [https://github.com/syhw/wer_are_we](https://github.com/syhw/wer_are_we)

------
mikob
I have a free Chrome extension that lets you control the browser via voice:
[https://www.lipsurf.com](https://www.lipsurf.com)

Rest is ultimately the best way to prevent hand injuries, and since I spend
most of my time in Chrome, this extension lets me do it hands-free.

------
anotherevan
A good accompaniment to this is a recent presentation at Linux.conf.au this
year by Shervin Emami: Desktop Linux, without a keyboard, mouse or desk.

[https://youtu.be/3aQfwS5pyrg](https://youtu.be/3aQfwS5pyrg)

------
rijoja
How well would this work in an open office setting, especially if several
people are using it?

~~~
PaulRobinson
I suspect this setup would not be ideal: worsening background noise would be
annoying for all concerned, and even more challenging than an open-plan office
already is. Your colleagues might not appreciate it unless you're already
surrounded by people talking all day (e.g. support or sales teams on calls).

That aside, in terms of worrying about your mic picking up other people's
voices and the voice dictation getting confused, most dedicated microphones
these days (i.e. not ones that are built into your phone's headphones), are
pretty good at background noise reduction.

I've not used the one OP recommends - I'd never have considered a table based
mic like that before - but the noise reduction on the Plantronics Blackwire
3215 headset I use is so good that if I move the mic boom a few inches up or
down away from my mouth, people can't really hear me on calls. It's superb at
getting rid of background noises, and if somebody else was in my home office
using voice dictation it would not be picked up by my headset.

~~~
rijoja
Very interesting. But what about people who would simply rather use their
fingers? Would you not rather type than speak, if possible?

~~~
PaulRobinson
I think I personally would, but then I would also rather hear some of my
colleagues talking than hear their mechanical keyboards in an open-plan
office.

I work mostly from home, so I'm interested in this as a programming method, to
see how or if it changes the way I approach my work.

------
singularity2001
That’s why I’m working on Angle, a speakable dialect of Python:

[https://github.com/pannous/angle](https://github.com/pannous/angle)

------
AstralStorm
This article could use a better title.

Voice coding makes me think of audio compression for voice.

This is programming using voice recognition instead.

~~~
jakemal
Isn't that voice _en_ coding?

~~~
aidenn0
Speech coding actually (though a "vocoder" may be used as part of speech
coding):

[https://en.wikipedia.org/wiki/Speech_coding](https://en.wikipedia.org/wiki/Speech_coding)

------
mfrye0
I went through a similar situation and ended up using
[https://voicecode.io](https://voicecode.io) for a few years. It worked well
enough for me, but I always found it hard to use in a public setting.

Ironically, I solved my RSI issue after coming across a book on HN. Everyone
seemed to recommend The Mindbody Prescription, and after trying it, it did the
trick.

[https://www.amazon.com/Mindbody-Prescription-Healing-Body-Pain/dp/0446675156](https://www.amazon.com/Mindbody-Prescription-Healing-Body-Pain/dp/0446675156)

Edit: I've been wanting to try this project for several years now:
[https://www.ctrl-labs.com](https://www.ctrl-labs.com). I think something like
this is the solution moving forward vs voice coding.

