- being able to spell out a word that is not understood (mainly for proper names)
- being able to teach new commands by voice (hey assistant, when I say “hit it”, play that song and turn the living room lights on)
- being able to understand words in a different language as part of the sentence.
That last one makes Alexa quite a pain to use in some countries. For example, if Alexa is set to French and you wanna listen to an English song, you need to pronounce the song title with a French accent, otherwise it won’t get it. The same is true if Alexa is set to English and you ask for a French or Chinese song title.
It makes it so frustrating that it’s unusable.
This applies the other way around too - please, some satnav app, give me the option of having the directions read out as "take exit 52 Bravo" instead of "take exit 52bee". (To me, 52bee and 53 sound way too similar, and require a screen glance to disambiguate!) I would switch apps for that feature, so long as the navigation was at least okay and it still read out street names.
https://yingtongli.me/blog/2020/01/25/osmand.html does it to customize something else.
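For what it's worth, the rewrite itself is a small text substitution applied before the instruction reaches the speech engine. Here is a minimal sketch in Python, not how any particular satnav app does it: the `speakable` function and the regex are my own assumptions, and it only handles a number followed directly by a single letter.

```python
import re

# NATO spelling alphabet, keyed by uppercase letter.
NATO = {
    "A": "Alfa", "B": "Bravo", "C": "Charlie", "D": "Delta", "E": "Echo",
    "F": "Foxtrot", "G": "Golf", "H": "Hotel", "I": "India", "J": "Juliett",
    "K": "Kilo", "L": "Lima", "M": "Mike", "N": "November", "O": "Oscar",
    "P": "Papa", "Q": "Quebec", "R": "Romeo", "S": "Sierra", "T": "Tango",
    "U": "Uniform", "V": "Victor", "W": "Whiskey", "X": "X-ray",
    "Y": "Yankee", "Z": "Zulu",
}

# A number followed directly by exactly one letter, e.g. "52B" or "52b".
# Ordinals such as "1st" do not match because two letters follow the digit.
EXIT_SUFFIX = re.compile(r"\b(\d+)([A-Za-z])\b")


def speakable(instruction: str) -> str:
    """Hypothetical helper: rewrite '52B' as '52 Bravo' before text-to-speech."""
    return EXIT_SUFFIX.sub(
        lambda m: f"{m.group(1)} {NATO[m.group(2).upper()]}",
        instruction,
    )


print(speakable("take exit 52B"))  # take exit 52 Bravo
print(speakable("take exit 53"))   # take exit 53
```

A real app would want to apply this only to exit numbers, since the pattern as written would also turn something like "5m" into "5 Mike".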
I speak Norwegian. The language has about five million speakers. The Norwegian median wage is perhaps the highest in the world, or at least near the top, so there is a lot of incentive to win Norwegian users. But only a few voice assistants have Norwegian interfaces (Siri was the first, Google may have joined the crowd). I work in IT, and making a conversational interface is rarely on the table; never if you exclude chat bots.
To get an idea of the work involved: I know, through friends, the guy who translated Siri. Allegedly, that was at least six months of work for a single person. I don't know if he only did the translation or if he also did a lot of AI work, but that's a lot in either case.
The result isn't too impressive either. Siri has problems understanding my girlfriend because of her Bergen dialect, which differs from my Oslo dialect mainly in how the R sound is pronounced.
My two cents.
For a language like Norwegian, you are unlikely to find that much labeled and unlabeled data without forking out a large amount of money to literally ask some Norwegians to help. I suspect the work is being done, but it's extremely time-consuming, and researchers are likely spending their time building models that learn high-quality, high-level representations of language that can be transferred to different scripts/languages with a small amount of data.
If your language isn't supported by the platform, you're just out of luck. The best you can do is ask the platform vendor to add it, but that is - as vages described - a lot of work and will take time.
(On a more serious note, maybe some kind of throat mic?)