
Writing and coding by voice with Talon - blakewatson
https://www.blakewatson.com/journal/writing-and-coding-by-voice-with-talon/
======
dwiel
I have used Talon (and previously VoiceCode) for nearly all interactions with my
computer for the past few years. Talon+Dragon is definitely the state of the
art and is quickly improving. Controlling the computer by voice has completely
changed my understanding of user interfaces.

One difference between voice and keyboard/mouse is that the number of
'commands' that can be in scope at any moment is much, much larger. With a
mouse you must sacrifice screen space for each command; with keyboard shortcuts
the user must memorize often somewhat arbitrary key combinations for every new
app/command.

With voice, there are still discoverability and some memorization issues, but
they feel a lot different. I can remember the command "rename variable" much
more easily than "command-alt-r", or was it "command-shift-r"? These commands
can also be common across editors, reducing the friction of trying different
editors.
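
A minimal sketch of that idea, assuming a hypothetical table that maps one spoken phrase to each editor's own shortcut. The editor names and key combinations below are placeholders rather than real bindings, and `keys_for` is not part of Talon or any other tool.

```python
# Hypothetical table: one spoken phrase, one shortcut per editor.
# The key combinations are placeholders, not real editor bindings.
SHORTCUTS = {
    "rename variable": {
        "editor_a": "cmd-alt-r",
        "editor_b": "cmd-shift-r",
    },
    "go to definition": {
        "editor_a": "cmd-b",
        "editor_b": "f12",
    },
}


def keys_for(phrase: str, editor: str) -> str:
    """Return the shortcut to press for a spoken phrase in the given editor."""
    try:
        return SHORTCUTS[phrase][editor]
    except KeyError:
        raise LookupError(f"no binding for {phrase!r} in {editor!r}") from None
```

The speaker only has to remember the phrase; which keys get pressed depends on the editor that currently has focus.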

One other observation is how I now think of apps more as (often poorly
defined) sets of APIs whose inputs are keyboard/mouse events. I really wish
more apps had a cleaner method of exposing this API. Some apps have a command
palette or something similar where a keyboard shortcut brings up an
autocompleted list of all possible actions available, which is still clumsy, but
workable. Atom and Jupyter notebooks both have this. Something more direct
would be very nice.
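
As a rough sketch of the difference, here is a hypothetical action registry with both styles of access: a palette-like `complete` that filters action names as you type, and a direct `invoke` by name. The action names and return values are made up for illustration and do not come from Atom, Jupyter, or Talon.

```python
from typing import Callable, Dict, List

# Hypothetical registry of named actions an app might expose.
ACTIONS: Dict[str, Callable[[], str]] = {
    "rename variable": lambda: "renamed",
    "run cell": lambda: "ran cell",
    "restart kernel": lambda: "restarted",
}


def complete(query: str) -> List[str]:
    """List action names containing every word typed so far, sorted by name."""
    words = query.lower().split()
    return sorted(name for name in ACTIONS if all(w in name for w in words))


def invoke(name: str) -> str:
    """Call an action directly by name, with no palette in between."""
    return ACTIONS[name]()
```

A voice tool driving the palette has to simulate the shortcut, type the query, and press enter; with something like `invoke`, it could call the action by name in one step.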

------
geewee
Talon is looking extremely promising, with its zoom mouse that works via gaze-
tracking. Generally coding by voice is a graveyard of abandoned projects. I
wrote an overview of the field last year, if anyone's interested in
alternatives:

[https://medium.com/bambuu/state-of-voice-coding-2017-3d2ff41...](https://medium.com/bambuu/state-of-voice-coding-2017-3d2ff41c5015)

~~~
lunixbochs
I think of Talon more as a tool for building interactions like the zoom mouse
easily, than the sum of its current features. Zoom mouse took me about a day
at a user’s request (someone who had trouble using their neck for head
movement to fine tune the cursor).

------
melling
I have a collection of Programming by Voice links here:

[https://github.com/melling/ErgonomicNotes/blob/master/progra...](https://github.com/melling/ErgonomicNotes/blob/master/programming_by_voice.org)

------
CodeXs
Talon is promising but is not open source at its core, and Nuance has dropped
support for Dragon NaturallySpeaking on OSX.

The Caster project seems up to date, as does Dragonfly, with a new fork that's
actively integrating other speech recognition engines.

Combined with Aenea, I can dictate code on Linux.

~~~
lunixbochs
Hi, I’m the sole developer behind Talon. It’s not open source for several
reasons, one of which is that I work on it full time, live on my savings, and
give it away for free (because the best thing I can get out of releasing Talon
is reducing hand pain in others at scale, not money). It has eye/head
tracking, noise recognition, and an extremely advanced scripting engine, so
it’s not just a speech recognition project. It also might come to Linux sooner
than you think.

It doesn’t depend on Dragon at all, and Nuance dropping support for Dragon 6
isn’t a huge problem yet as long as you can still buy it (Talon has a builtin
speech engine, and also fixes/works around most of the long-standing problems
in Dragon Mac, and I’m probably more able to help with problems than their
support). Integrating more engines is on the roadmap, but slightly after multi
platform support (since there’s already a good free engine on Mac).

Let me know if you have any questions or if there’s anything I can do to make
Talon work better for your use case.

It’s also worth reading back through my HN comments as I’ve talked in detail
about Talon there.

~~~
melling
How good is Talon’s speech recognition? It would be nice to have a better
native Mac solution.

~~~
lunixbochs
Well, even when using it with Dragon, Talon only uses the speech recognition
part. So you get Dragon’s recognition accuracy but don’t need to worry about
most of the bugs.

The builtin engine is a much nicer experience than Dragon (very fast startup,
no config), but simply isn’t as good at recognizing English, and there’s also
a weird behavior in it I haven’t worked around yet that makes it harder to mix
English with commands (it greedily prioritizes commands sometimes). The
accuracy is quite good for American accents at least. I found it to be
consistently more accurate than wav2letter (haven’t tried ++) and DeepSpeech
with some simple test audio.

------
inciampati
Is there a similar system available for Linux?

~~~
shakna
The state of voice on Linux is awful.

I've got something close to Talon's control, without eye tracking, that I work
on now and then, using the Google API for voice interpretation, because
CMUSphinx was awful (50% accuracy per word, whereas Google was closer to 90%).
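
To get a feel for what those per-word figures mean for multi-word commands, here is a rough calculation. It assumes each word is recognized independently, which is a simplification and not something measured in this thread.

```python
def phrase_success(per_word_accuracy: float, words: int) -> float:
    """Chance that every word in a phrase is recognized, assuming independence."""
    return per_word_accuracy ** words


# Compare the two per-word accuracies quoted above on a 4-word phrase.
for accuracy in (0.5, 0.9):
    chance = phrase_success(accuracy, 4)
    print(f"{accuracy:.0%} per word: {chance:.4f} chance for a 4-word phrase")
```

Under that assumption, a four-word phrase comes through intact about 6% of the time at 50% per word and about 66% of the time at 90%.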

I'm hoping that Mozilla Voice when it comes will finally solve this or make it
easy to build a decent control system.

~~~
eggie
Does this mean that Talon is based on private APIs provided in OSX?

~~~
lunixbochs
It’s not locked to macOS at all. I plan to support win/lin/mac equally, and Talon
has its own grammar compiler and engine-independent word parser. The main hold
up for porting is all of the interaction APIs like key simulation, drawing
overlays on the screen, that sort of thing.

------
j88439h84
Hi lunixbochs, I'd love to contribute to a project like this. Is there a plan
to open source it in the future? Thanks!

~~~
lunixbochs
The “core” is not open source, which is mostly low-level platform integration
code. Talon is basically a pile of APIs built around a Python scripting
engine. My goal is to make all the user facing features either completely
defined in open source user scripts/plugins or fully scriptable/configurable.
You can already contribute to the user scripts. Someone could even have a
project like Caster that supports using the Talon APIs but is itself fully
open source.

I have open-sourced over 100 of my projects. My current decision is to not
open-source Talon at least as long as it is my full-time job for no real
salary. I’m putting a lot of work into it and giving it away for free, this is
what you get.

------
jDawg21
Another real user here, I love Talon. I just started using the community
repository which gives you a lot of functionality. I'm currently finding that
code snippets plus talon is the optimal combination for people who cannot
type.

------
malloryerik
It'd be great to have preconfigured rule sets for various programming
languages.

------
tomcam
Really neat project. I hope you decide to commercialize it.

------
swlkr
I noticed the keyword for dash-separated-words is spinal. I always thought it
was called kebab case. I guess it's different for different people.

~~~
lunixbochs
I think spinal was the command used by voicecode, and the community repo was
put together by ex-voicecode users. You can change a command easily by editing
the script that defines it anyway (changes take effect immediately without a
restart).
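
For illustration, a formatter like this can be a few lines of plain Python. This is a sketch, not Talon's actual script or API; the `FORMATTERS` table and the function names are hypothetical. Renaming the spoken keyword is then just a matter of changing a dictionary key.

```python
def spinal(words):
    """Join dictated words with dashes, lowercased."""
    return "-".join(w.lower() for w in words)


# Hypothetical table mapping a spoken keyword to a formatter.
# Adding "kebab" as a second key makes both keywords work.
FORMATTERS = {
    "spinal": spinal,
    "kebab": spinal,
}


def format_phrase(keyword, dictated):
    """Apply the formatter named by `keyword` to a dictated phrase."""
    return FORMATTERS[keyword](dictated.split())
```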

------
rijoja
Any actual users around to testify about this?

~~~
porpoisely
I was wondering about testimonials myself. It sounds wonderful for people with
physical impairments who can't use a keyboard or mouse. But in terms of
productivity or ease of use for most developers, are there any benefits to
using voice over a keyboard and mouse?

~~~
lunixbochs
This is an important question for me to be able to answer, because if I can’t
convince you it’s worth splitting your time between alt input (like voice and
eye tracking) and keyboard/mouse, I can’t reduce your chance of developing
RSI.

There’s the fear approach (I personally believe >50% of people who use
computers full-time will develop at least minor hand injuries), but I don’t
think fear is enough on its own.

The most promising thing I think is workflow improvements. For voice, the
ability to issue commands like “next song” while typing feels _amazing_. You
can also be more specific about many commands, like “focus chrome” is nicer
than mashing cmd-tab a bunch or binding a key to each app. For eye tracking, I
think autoscrolling text as you read and jumping your mouse to the right spot
when you look at your second monitor are two big ones.

~~~
dwiel
Similarly to "next song", I enjoy using it for things I don't use often enough
to justify memorizing a shortcut or short phrase. "Connect to vpn", "move
window desk 5", "rename variable", "activate virtual environment", etc.
Similar to creating aliases in the terminal, but for more than just the
terminal and with easy to remember spoken phrases instead of terse
abbreviations: `lsa -> ls -al`.
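
The alias analogy can be sketched as a lookup from a spoken phrase to the text that would be typed into the terminal. The phrase table below is a hypothetical example, not a real Talon or shell configuration.

```python
from typing import Optional

# Hypothetical phrase table; the commands are examples, not a real config.
PHRASES = {
    "list all": "ls -al",
    "activate virtual environment": "source venv/bin/activate",
}


def expand(phrase: str) -> Optional[str]:
    """Return the command text for a spoken phrase, or None if it is unknown."""
    return PHRASES.get(phrase.strip().lower())
```

The spoken key can be as long and descriptive as you like, since saying it costs far less than typing it.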

That said, once you are proficient, writing code can definitely be as fast as or
faster than skilled keyboard use.

