First is dialect. One would think the Internet/YouTube/video chat would vaporize geographic differences, and it's starting to happen, but that's not (yet) the case. Lots of signs have "synonyms" based on a signer's preference, etymology, who they learned from, who they hang out with, etc.
Another is grammar. What OP is doing in his video, for example, is English grammar, where signs are directly substituted for words. But ASL has its own sign order, modifiers like facial expressions, and idioms for brevity. For example "have you ever been to San Francisco before?" might be "SF TOUCH YOU FINISH" with raised eyebrows at the end, as a modifier. Note also that there are no conjugations or articles, unlike some languages, but there are pronouns. In fact, there are local bindings where you make up a sign name for someone on the fly and then use it during a conversation.
I think with large enough training sets, this will all be mitigated, like Google needed years of speech samples to get Translate working okay.
That said, I think the OP is still on the right track here:
> I put it together so you can train it on your own set of word and sign/gesture combos.
A Deaf person should be able to train the system on each command they want so that it works for them automatically in the dialect they want.
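Per-user training like this is often done with a nearest-neighbor classifier over pose feature vectors: the user records a few examples of each sign, and new gestures are matched against them. A minimal sketch, assuming features are flattened hand-keypoint vectors (the class and feature layout here are hypothetical, not the OP's actual code):

```python
import numpy as np

class GestureTrainer:
    """Hypothetical per-user k-NN gesture classifier.

    Each example is a feature vector (e.g. flattened hand keypoints)
    paired with the user's chosen label for that sign."""

    def __init__(self, k=3):
        self.k = k
        self.examples = []  # list of (feature_vector, label)

    def add_example(self, features, label):
        # The user demonstrates a sign a few times, in their own dialect.
        self.examples.append((np.asarray(features, dtype=float), label))

    def predict(self, features):
        # Find the k stored examples closest to the query gesture,
        # then take a majority vote over their labels.
        x = np.asarray(features, dtype=float)
        nearest = sorted(
            (np.linalg.norm(x - v), label) for v, label in self.examples
        )[: self.k]
        labels = [label for _, label in nearest]
        return max(set(labels), key=labels.count)

trainer = GestureTrainer(k=1)
trainer.add_example([0.0, 0.0], "HELLO")
trainer.add_example([1.0, 1.0], "THANKS")
trainer.predict([0.1, -0.1])  # -> "HELLO"
```

The appeal of this design is exactly the point above: there is no fixed vocabulary baked in, so each user trains the signs they actually use.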
> there are local bindings where you make up a sign name for someone on the fly and then use it during a conversation
My very favourite thing about ASL, having learned programming beforehand, was that I could assign people to variables/registers in space. "John point at spot to my left said to Susan spot on right bla bla bla" and then later be able to just point at my left and everyone knows I mean John. And then if John moves to the right, I've just moved him into that register and can reassign the left one to someone or something else. And with a group of experienced Signers, everyone just comprehends this perfectly.
There's also this thing where you introduce a character, then make up a "temporary name sign" on the fly. Don't know what that's called either :)
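The "variables/registers in space" behaviour above maps almost directly onto a tiny namespace: binding a referent to a locus, dereferencing it by pointing, and rebinding when someone moves. A toy sketch (class and method names are my own, purely illustrative):

```python
class SigningSpace:
    """Spatial loci as a tiny namespace: each locus (e.g. 'left',
    'right') acts like a register you can bind, read, and reassign."""

    def __init__(self):
        self.loci = {}

    def bind(self, locus, referent):
        # Pointing at a spot while naming someone = assignment.
        self.loci[locus] = referent

    def resolve(self, locus):
        # Pointing at the spot later = dereference.
        return self.loci[locus]

space = SigningSpace()
space.bind("left", "John")
space.bind("right", "Susan")
space.resolve("left")        # -> "John"
# John moves to the right: rebind him there, freeing the left locus.
space.bind("right", "John")
space.bind("left", "the coffee shop")
```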
Hand pose tracking is becoming increasingly viable, both with rapidly advancing ML of camera video, and perhaps with VR gloves. So as someone interested in expert UIs for programming inside VR/AR, I ask...
Any suggestions for ASL linguistics resources to mine for non-novice UI idioms/vocabularies/grammars/etc?
Your "3-space as namespace" as one example.
Gaze tracking will similarly be available. Facial expression, at least while wearing HMDs, regrettably not so much (despite prototypes). Is there signing experience with hand-held objects? Fast fine-control "fiddling with a pencil" tracking is available with sub-millimeter, few-degree 6DOF precision, but it's unclear what vocabulary/grammar to use with it.
Big picture: VR/AR seems an opportunity to leverage accumulated insights from signing.
e.g., https://github.com/xinghaochen/awesome-hand-pose-estimation
This page (https://signly.co/, for a sign app that gives prerecorded interpretation of public info) has an example of BSL vs English grammar - see "Show Example".
It appears the user is doing an ASL form of Signed English.
An example might be the sign RAIN, which could mean anything from drizzle to monsoon depending on how energetic and exaggerated the motion is. Differentiating which would need some context, baselining how animated the signer is in the conversation. If you wrote that down you might need RAIN... for one and RAIN!!!!! for the other.
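That baselining idea can be sketched as a normalisation step: compare the sign's motion energy to the signer's conversational average, then bucket the ratio into intensity glosses. The thresholds and function name here are made up for illustration:

```python
def classify_rain(motion_energy, baseline):
    """Hypothetical: map the RAIN sign's motion energy, normalised by
    the signer's conversational baseline, onto an intensity gloss.

    Thresholds are arbitrary placeholders; a real system would learn
    them per signer."""
    ratio = motion_energy / baseline
    if ratio < 0.8:
        return "RAIN..."      # subdued motion: drizzle
    elif ratio < 1.5:
        return "RAIN"         # typical motion: plain rain
    else:
        return "RAIN!!!!!"    # exaggerated motion: downpour

classify_rain(0.5, 1.0)  # -> "RAIN..."
classify_rain(2.0, 1.0)  # -> "RAIN!!!!!"
```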
Web Demo: https://shekit.github.io/alexa-sign-language-translator/
Does ASL-to-text already exist outside of voice assistants, or is that part totally new?
I think BSL uses a variant of Stokoe notation.
Presumably the system of the OP could be modified to recognise dance/skating/snowboarding/martial arts moves.
It seems that once you have the ability to model every aspect of body position from video (as they're doing for enhanced live-action sequences in movies now), this sort of thing becomes much easier.
Most of my experience in the deaf community has been at trivia nights, where the signing is incredibly fast (to me) and grammatically loose (just like everyone who's voicing at the bar).
In the end I think it's an invaluable avenue to explore, the tech is important no matter what the end device looks like. Great work.
plus I don't feel comfortable having a silent listening device lurking 24x7 around all family members, so it's not just that they're useless; I intentionally don't want them on at all 99% of the time.
the only amazon device I use is the Fire TV, for Netflix and Amazon Prime Video; the amazon e-readers, Echo, and Alexa have all been replaced by my cellphone...
to that extent, i feel amazon's stock price is too high: too much hype for the Alexa/Echo/Kindle line, at least in my opinion
They are listening for the wake word. If they were streaming this data anywhere, people would notice. Heck, I would notice as I keep a close eye on them.
I find it amusing that you are not ok with Echo, but you are ok with a cellphone, which not only is listening all the time too ("Ok Google", "Hey Siri") but also has your location information, emails, photos, all sort of messages AND has its own backchannel to upload this information (the cellular data network), which is much more difficult to keep track of.
My Echos are not useless, when you add some home automation they become much more useful. They are still pretty "dumb" though.
I cook a lot and I find it a lot easier to just say "Alexa set a chicken timer for 15 minutes" or "Alexa add black pepper" than it is to get out my phone and do those things.