Why do we do this? We want assistants of the future to respect user privacy, and not stream your voice or your most important questions to servers that you do not control.
With Snips, 100% of what we do runs on the device (the platform ships for Raspberry Pi; more platforms are available for enterprise customers, firstname.lastname@example.org).
We are using state-of-the-art deep-learning Automated Speech Recognition and Natural Language Understanding to allow makers to plug a voice assistant into their device in 5 minutes.
We have actually benchmarked our NLP and are outperforming most of the commercially available NLU providers: https://medium.com/snips-ai/benchmarking-natural-language-un...
No reason Google should know what you ask your Voice Assistant to buy, or to remind you of!
I still have no idea if I'm actually interested in voice UI, but I'd actually experiment with this one, unlike any existing implementation.
Edit - any other existing implementation.
> The HomePod uses local recognition to hear the Hey Siri command, and encrypts all your communication.
That's great as far as it goes (at least it's not streaming all audio) but it doesn't rule out the rest of the command from being sent upstream, and encryption just stops third parties from intercepting it - if I don't want the vendor listening in either, it doesn't really solve my problem.
Happy to be proven wrong by some more in-depth info from Apple though!
I'd love it if they released it, and I'd start buying their stuff immediately. But until they put their money where their mouth is... meh.
And you're commenting on an article where the maintainer has said that open source is "coming in the future."
In the meantime:
I think it's safe to say they are betting the company on this, and putting plenty of money into it.
And far from being smoke and mirrors vaporware, this is hardware and software that you can hold in your hand and use and rely on, and millions of people do, every day.
And I don't see how Google is relevant to the conversation. They don't release source, so I'm not really interested in their products either. Just because one company does things in a way I don't like doesn't mean I have to accept it when others do it that way.
"coming in the future" is nice, but I've heard that speech before, and I'm not holding my breath. If they are betting the company on this, then great! I'll put a lot of money into their products. But until they do, meh, I'm not interested.
And by smoke and mirrors I don't mean vapor-ware, I mean distracting from the real issues, and hiding their flaws instead of fixing them.
You don't have to hold your breath. I gave you the links. The open source is already there. Breathe.
And you seriously think Apple is not fixing flaws?
Watch their WWDC sessions. They have fixed tons of flaws.
Your hate is showing through. Maybe set aside your feelings once in a while and look at the facts.
I'm sure they are fixing security flaws, but they are still not addressing the underlying issue. I like what they say their philosophy is, but until they open source all their stuff, I have no reason to trust them.
I definitely hate them, because I really want a company like them to succeed. But they refuse to put their money where their mouth is. I'd invite you to point me to any facts that I've missed, but I don't see where I've missed any.
You could do previous years too and those would be complementary, not redundant, but that's quite a time commitment and a lot to ask. Just this year's would be a really great start. Then I'd love to hear what you think after that.
Yeah they won't be completely open source any time soon, that's a nice dream, lol. But imho you should open your mind to learning about what they are doing now, before being so negative and down on them.
Nope, not going to. Frankly it's a waste of my time because I'm not going to buy their product. They can say nice things on a stage all they want, but until they open source it, then I really don't care. If they open-source it, then I would buy their products in a heartbeat.
You say it's a nice dream, I say it's reality. I have a laptop/desktop system that works great and is FOSS. I don't see how phones or any other computerized devices are any different. None of it matters until they put their money where their mouth is. I'm only being negative towards them because they refuse to fully commit to what they preach. I find nothing wrong with that, and frankly, I don't get how other people don't have a problem with that.
Stuff like stop tracking, limit tracking, don't capture so much data, protect privacy this way, that way, do machine learning on device in such a way that data never makes it to any cloud, use storage techniques that keep the keys away from anyone but the user including Apple, etc.
A lot of developers buy into protecting privacy but to many developers they were not saying nice things on stage, as you unthinkingly imagine.
If you don't get things, and lack understanding, the solution is to seek knowledge. I suggested one good way to do that. Your ignorance is on you, not on them.
Until they do that, why would I look into them? I choose ignorance in a lot of things that waste my time. Looking into a company that won't put their money where their mouth is would also be a waste of that time.
You've yet to give me any answer to that question, and I'd invite you to point me to anything that I have stated that is incorrect.
I can run PocketSphinx on Raspberry Pi fine, and the recognition is fine too - as long as there isn't too much background noise. When I say 'computer open curtains', for example, it opens my curtains through my HA system; but the motors in the curtains make some noise, so I can't say 'stop' halfway through because it just won't pick it up. Another part of the problem is that audio chunks are fed to PocketSphinx by recognizing the end of a sentence through silence in the input; so when the curtains are still going, it doesn't recognize that the command has stopped and that the audio should be passed into PocketSphinx.
So, how does Snips deal with this?
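To make the silence problem above concrete, here is a minimal sketch of energy-threshold chunking. This is not PocketSphinx's actual endpointing code; the function names, the threshold, and the toy frame values are all made up for illustration. It only shows why steady motor noise above the threshold keeps the chunk from ever being closed.

```python
import math

def rms(frame):
    """Root-mean-square energy of one frame of audio samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def segment(frames, threshold, silence_frames):
    """Cut a stream of frames into utterances using silence.

    An utterance is closed once `silence_frames` consecutive frames fall
    below `threshold`. Returns (finished_utterances, still_pending_frames).
    Hypothetical helper, not part of any real library.
    """
    done, current, quiet = [], [], 0
    for frame in frames:
        if rms(frame) >= threshold:
            current.append(frame)   # loud frame: part of the utterance
            quiet = 0
        elif current:
            quiet += 1              # quiet frame after speech has started
            if quiet >= silence_frames:
                done.append(current)
                current, quiet = [], 0
    return done, current

# Toy frames (made-up amplitudes): silence, speech, curtain-motor noise.
SILENCE = [0.0] * 160
SPEECH = [0.5] * 160
MOTOR = [0.2] * 160

quiet_room = [SILENCE] * 2 + [SPEECH] * 4 + [SILENCE] * 3
noisy_room = [SILENCE] * 2 + [SPEECH] * 4 + [MOTOR] * 3

print(segment(quiet_room, 0.1, 3)[0] != [])  # utterance was closed
print(segment(noisy_room, 0.1, 3)[0] != [])  # still waiting for silence
```

In the quiet case the four speech frames are handed off as one utterance; in the noisy case the motor frames count as "loud", so nothing is ever passed to the recognizer while the curtains keep moving.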
This should be paired with a good microphone using a microphone array to locate the speech very precisely!
Our hotword solution is also robust to noise and could be used to detect a word over noise.
Edit: softened the sales pitch
We are comparing what you get by using an alternative service like api.ai (which you need to start by feeding manually) with what you can do with our solution.
Manually feeding 70 queries per intent already takes a good deal of energy and imagination, so we thought it was representative of a good effort from a user. This is why Snips lets its users automatically use generative methods to create high-quality datasets (over 2000 queries per intent).
All hypotheses are clarified in our blog post on performances which I encourage you to read! https://medium.com/@alicecoucke/benchmarking-natural-languag...
We even post all the underlying data and results on GitHub so people will be able to verify our claims: https://github.com/snipsco/nlu-benchmark/tree/master/2017-06...
Only if you have to manually make them up. If you have access to the query stream (which I imagine api.ai etc give you since they're online platforms), it's quite quick and easy.
I can definitely appreciate integrating things together to make things easy, but this is a pretty biased comparison IMO.
It's great that you've made the data available, but the headline metric that most people will see seems misleading.
Then you show the F1 score for Snips on 2000 queries. Obviously you see improvements and that's good. What I don't understand is why you don't show the F1 score for the other competitors on 2000 queries. If you're able to generate 2000 inputs from the original 70, what prevents you from feeding those to the other NLP engines? Do they all get a 93% F1 score?
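For anyone following along, F1 is the harmonic mean of precision and recall, so it only makes sense to compare engines when each one was trained and tested on the same data. A minimal sketch of the metric itself; the counts below are invented for illustration and are not taken from the benchmark.

```python
def f1_score(true_pos, false_pos, false_neg):
    """F1 = 2 * P * R / (P + R), with P = precision and R = recall."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Made-up counts: 90 correct, 10 spurious, 10 missed.
example = f1_score(90, 10, 10)
print(round(example, 3))
```

A fair comparison would compute this for every engine at both training-set sizes, rather than mixing sizes across engines.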
One of the reasons I want to build my own is to fix issues I see in all the commercial ones, namely the lack of conversation back and forth & being so fixed to a single wake word. For example, when my timer is firing, I don't want to say 'alexa, turn off the timer.' I just want to say 'ok', 'stop', 'cancel timer,' etc.
Also do you have any hardware vendors on board who want to include Snips natively?
Subscribe to our newsletter on https://snips.ai to stay tuned and see our blog for more datascience and DIY content!
Already signed up!
edit: Any plans to see Snips natively in end-user devices?
my application would need only around 20 distinct words
2) can they be non-dictionary hotwords?
say "fruznumble"?
3) how do headset mics work? a lot of the hotness is mic arrays for room-scale commands, but how does it work with older-style mics?
The hotword detector uses deep-learning and requires quite a bit of data to be robust to noise and to various accents.
We are working on a version which would allow more people to build hotwords robustly and fast.
The next version is smooth!
It'd be like an artist having a macaroni collage that they made during kindergarten in their portfolio.
What's the pricing though? I have an impression this is a paid product, but no pricing info is present.
EDIT: also, in trained_assistant.json, what does "tfidf_vectorizer_vocab" represent, and why does it include words like "nazi" and "hitler"? ;).
Seems very interesting, reminds me of wit.ai when I first started using it, but the training seems to work pretty well from some small examples (which I had trouble with using wit).
I've been struggling with alexa and google assistant, to make something useful for embedding. So much backwards and forwards with setting up infrastructure for skills.
This is smashing.
Shelved that project while I was busy working on other things. This has me excited to give it another go!
The future is so cool. 1990's childhood me approves greatly.
Edit: I’m referencing the demo video. https://www.youtube.com/watch?v=wThoRtIeExo
You need to add an MQTT listener in your code (there are many libraries in many languages); you can see an example in the documentation: https://github.com/snipsco/snips-platform-documentation/wiki...
pip install paho-mqtt
and writing a dozen lines of Python!
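As a rough idea of what that listener could look like with paho-mqtt, here is a sketch. The broker address, the topic name, and the JSON payload shape are all assumptions made for illustration, not taken from the Snips documentation; check the linked wiki for the real topics and message format.

```python
import json

BROKER_HOST = "localhost"   # assumption: broker runs on the same device
BROKER_PORT = 1883          # default MQTT port
TOPIC = "hermes/intent/#"   # assumption: placeholder topic, see the docs

def describe_intent(payload):
    """Turn a raw MQTT payload (bytes) into a short readable line.

    Assumes a hypothetical JSON payload such as
    {"intent": "OpenCurtains", "slots": {"room": "bedroom"}}.
    """
    try:
        data = json.loads(payload.decode("utf-8"))
    except ValueError:  # covers both bad UTF-8 and bad JSON
        return "unparseable message"
    if not isinstance(data, dict):
        return "unparseable message"
    intent = data.get("intent", "unknown")
    slots = data.get("slots", {})
    return "intent=%s slots=%s" % (intent, json.dumps(slots, sort_keys=True))

def run_listener():
    """Connect to the broker and print every message on TOPIC (blocks)."""
    # Imported here so describe_intent can be used without paho installed.
    import paho.mqtt.client as mqtt

    def on_connect(client, userdata, *args):
        # *args absorbs the extra arguments, which differ between paho versions.
        client.subscribe(TOPIC)

    def on_message(client, userdata, message):
        print(message.topic, describe_intent(message.payload))

    try:
        client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)  # paho-mqtt 2.x
    except AttributeError:
        client = mqtt.Client()  # paho-mqtt 1.x
    client.on_connect = on_connect
    client.on_message = on_message
    client.connect(BROKER_HOST, BROKER_PORT)
    client.loop_forever()

# To start listening, call run_listener() on a machine that can reach the broker.
```

The parsing is kept in its own function so you can swap in whatever your device should do (open curtains, start a timer) without touching the connection code.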
Right now, there doesn't seem to be anything aside from the RPi build (the download page for which also requires me to login with my email address, so I didn't proceed to actually download it). That's a shame.
I'm part of the "privacy and security" demographic. The thing you've got to understand about us is that we've been burned before, many times, by closed source products that are allegedly local only, but are secretly snitching behind our backs.
Market penetration may be difficult until people can see the source. If you're looking for a business model to attach to this, I recommend holding the training data in reserve as your paid product, and open sourcing the platform.
I'm guessing there's not a large demographic of people who want privacy-focused software, but don't care if it's closed source.
The only real argument for on-device is privacy, which most people don't care for.
It's nice to have single-turn interactions for turning the lights on, but not so nice for, e.g., booking a complex flight or navigating unknown options.
Over time we want to add full dialog capacity to the AI.
Snips is an AI.