I must confess I haven't used it yet, but snips.ai [1] is on my list of potential Raspberry Pi projects and claims it will be open-sourced soon. On their FAQ page [2] you can find the reason it's on my list:
"Snips, on the other hand, runs completely on your device, with nothing being sent to the cloud. This means it guarantees your Privacy, works offline and doesn't have variable costs!"
Unfortunately this leads me to wonder whether the service will remain free, but for the moment it is "free for makers and for building prototypes". Commercial costs apply.
>Snips, on the other hand, runs completely on your device, with nothing being sent to the cloud. This means it guarantees your Privacy, works offline and doesn't have variable costs!
Does this mean the language model will not learn as it is used more? I feel the main reason for sending and storing data in the cloud is to train the algorithm to get better at recognizing and answering questions.
None of those are trivial problems, but #4 is notable because it is often cited as one of the ways Google beats Alexa. It boils down to: if a human simply asks for what they want, what are the odds the assistant will have an answer?
That to me is the part least likely to have a good open source alternative.
Jasper [1] is one such project that popped up on HN a while back. I spent a wet weekend getting it going on a Raspberry Pi -- it was quite the effort to get all the moving pieces working together.
At the time you had the option of using AWS or Google to handle the voice, or possibly (if you had the time and knowledge) training it to use a local service; the documentation gently discouraged this (it was referenced, but what it involved was not well explained).
I believe you can now use Watson to offload the voice to text, too.
But in all those cases you're sending data off-site, which may be a concern. And each of those services has some usage constraints that should be enough to cover household use, but I'm not sure what happens when you start to hit those limits.
Inside the Jasper documentation there is some discussion of configuring the speech-to-text engine. They list five engines, two of which do not depend on off-site services.
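If I'm remembering the docs correctly, picking an offline engine was a couple of lines in Jasper's profile.yml; a rough sketch (the model paths below are placeholders for wherever your PocketSphinx models actually live):

```yaml
# ~/.jasper/profile.yml -- select the offline PocketSphinx engine
stt_engine: sphinx
pocketsphinx:
  hmm_dir: '/path/to/pocketsphinx/model/hmm/en_US'   # acoustic model
  fst_model: '/path/to/your/g2p.fst'                 # grapheme-to-phoneme model
```

The other engines (Google, AT&T, Wit.ai) needed API keys instead, which is exactly where the off-site data concern comes in.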
I don't know whether other portions of the code might use off-site data processing. I wish more projects would publish a system block diagram of their software.
As far as I know there are no open source solutions for wake word detection. Most 'open source Alexa' projects require you to press a button to make it listen.
There is a free offline wake word detector here: https://github.com/Kitt-AI/snowboy ... though it was closed source when I last looked! It also looks like you still need to use their website for training.
I have used Snowboy – the code is open, but the models are not. You can train a voice model from three samples via their site, or pay for an API (so you could integrate it into a platform). They will also build you a custom generic voice model if you pay them.
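For anyone wondering what a wake word detector does at a high level: it continuously scores short audio frames against a trained model and fires when a rolling window of scores crosses a threshold. Here's a toy sketch of that loop in plain Python -- this is not Snowboy's API, and the "model" is just frame energy standing in for a real keyword scorer:

```python
from collections import deque


def make_detector(score_fn, threshold=0.5, window=4):
    """Return a closure that consumes audio frames one at a time and
    reports True once the rolling average score reaches the threshold."""
    scores = deque(maxlen=window)

    def feed(frame):
        scores.append(score_fn(frame))
        return len(scores) == window and sum(scores) / window >= threshold

    return feed


def energy(frame):
    """Toy stand-in for a trained model: score is just mean frame energy."""
    return sum(s * s for s in frame) / len(frame)


detect = make_detector(energy, threshold=0.25, window=4)

quiet = [0.0, 0.1, -0.1, 0.0]   # low energy -> should not trigger
loud = [0.7, -0.8, 0.9, -0.7]   # high energy -> should trigger

for _ in range(4):
    assert not detect(quiet)    # window fills with low scores, no trigger

triggered = False
for _ in range(4):
    triggered = detect(loud) or triggered
assert triggered                 # rolling average crosses the threshold
```

A real detector replaces `energy` with a small neural net or keyword-spotting model running on MFCC features, which is the part Snowboy keeps behind their training service.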
"Snips, on the other hand, runs completely on your device, with nothing being sent to the cloud. This means it guarantees your Privacy, works offline and doesn't have variable costs!"
Unfortunately this leads me to wonder whether or not the service will remain free, but for the moment it is "free for makers and for building prototypes". Commercial costs apply.
[1] https://snips.ai/
[2] https://github.com/snipsco/snips-platform-documentation/wiki...