I'm a co-founder of Snips. We are building a private-by-design voice assistant platform that allows companies and makers to build a smart assistant 100% on-device.
Why do we do this? We want assistants of the future to respect user privacy, and not stream your voice or your most important questions to servers that you do not control.
With Snips, 100% of what we do runs on the device (the platform ships for Raspberry Pi; more platforms are available for enterprise customers: contact@snips.ai).
We are using state-of-the-art deep-learning Automated Speech Recognition and Natural Language Understanding to allow makers to plug a voice assistant into their device in 5 minutes.
This is awesome. Every other assistant AI type product seems to be designed primarily as a pretext for invading its users' privacy. Thanks for building something that might be both useful and trustworthy at the same time!
You should take a look at what Apple is doing if you think that. Especially the developments announced last week. Going forward I would expect to see a vast number of products on that platform both from Apple, and from third party developers, that are doing on-device and privacy-focused implementations of AI applications.
Do you have a good link for technical info? I found the Wired article but all that says is:
> The HomePod uses local recognition to hear the Hey Siri command, and encrypts all your communication.
That's great as far as it goes (at least it's not streaming all audio) but it doesn't rule out the rest of the command from being sent upstream, and encryption just stops third parties from intercepting it - if I don't want the vendor listening in either, it doesn't really solve my problem.
Happy to be proven wrong by some more in-depth info from Apple though!
No need to be snide about it. Of course they aren't going to open source the entire OS. Just like Google hasn't open sourced all of Android. They always hold something back.
And you're commenting on an article where the maintainer has said that open source is "coming in the future."
I think it's safe to say they are betting the company on this, and putting plenty of money into it.
And far from being smoke and mirrors vaporware, this is hardware and software that you can hold in your hand and use and rely on, and millions of people do, every day.
I'm not sure how my comment is "snide". It's just a fact: I don't trust them.
And I don't see how Google is relevant to the conversation. They don't release source, so I'm not really interested in their products either. Just because one company does things in a way I don't like doesn't mean I have to accept it when others do it that way.
"coming in the future" is nice, but I've heard that speech before, and I'm not holding my breath. If they are betting the company on this, then great! I'll put a lot of money into their products. But until they do, meh, I'm not interested.
And by smoke and mirrors I don't mean vapor-ware, I mean distracting from the real issues, and hiding their flaws instead of fixing them.
I really don't think I understand what you mean. Those links are some open source stuff that they've done. Great. That's not the OS, nor the apps. I want them to open source the OS and their apps.
I'm sure they are fixing security flaws, but they are still not addressing the underlying issue. I like what they say their philosophy is, but until they open source all their stuff, I have no reason to trust them.
I definitely don't hate them; I really want a company like them to succeed. But they refuse to put their money where their mouth is. I'd invite you to point me to any facts that I've missed, but I don't see where I've missed any.
Let me know when you've watched their WWDC talks from this year on privacy and security. Especially privacy.
You could do previous years too and those would be complementary, not redundant, but that's quite a time commitment and a lot to ask. Just this year's would be a really great start. Then I'd love to hear what you think after that.
Yeah they won't be completely open source any time soon, that's a nice dream, lol. But imho you should open your mind to learning about what they are doing now, before being so negative and down on them.
> Let me know when you've watched their WWDC talks from this year on privacy and security. Especially privacy.
Nope, not going to. Frankly it's a waste of my time because I'm not going to buy their product. They can say nice things on a stage all they want, but until they open source it, then I really don't care. If they open-source it, then I would buy their products in a heartbeat.
You say it's a nice dream, I say it's reality. I have a laptop/desktop system that works great and is FOSS. I don't see how phones or any other computerized device is any different. None of it matters until they put their money where their mouth is. I'm only being negative towards them because they refuse to fully commit to what they preach. I find nothing wrong with that, and frankly, I don't get how other people don't have a problem with it.
They were not saying just nice things on stage, contrary to your little imagined scenario. They were saying stuff a lot of developers didn't want to hear.
Stuff like stop tracking, limit tracking, don't capture so much data, protect privacy this way, that way, do machine learning on device in such a way that data never makes it to any cloud, use storage techniques that keep the keys away from anyone but the user including Apple, etc.
A lot of developers buy into protecting privacy, but for many others what was said on stage was not "nice things" at all, contrary to what you unthinkingly imagine.
If you don't get things, and lack understanding, the solution is to seek knowledge. I suggested one good way to do that. Your ignorance is on you, not on them.
I think you're missing my point: I don't care what they say. I care what they do. Actions speak louder than words. If they are encouraging privacy, cool. I can respect that. But they aren't actually doing privacy.
Until they do that, why would I look into them? I choose ignorance in a lot of things that waste my time. Looking into a company that won't put their money where their mouth is would also be a waste of that time.
Do you do any pre-processing on the audio input to filter out background noise? I notice that in the demo video, you're in a quiet room.
I can run PocketSphinx on Raspberry Pi fine, and the recognition is fine too - as long as there isn't too much background noise. When I say 'computer open curtains', for example, it open my curtains through my HA system; but the motors in the curtains make some noise, so I can't say 'stop' halfway through because it just won't pick it up. Another part of the problem is that audio chunks are fed to PocketSphinx by recognizing end of sentence through silence in the input; so when the curtains are still going, it doesn't recognize that the command has stopped and the audio should be passed into PocketSphinx.
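For anyone hitting the same wall: PocketSphinx also has a keyphrase-spotting mode that listens continuously instead of waiting for a silence-delimited utterance, which might sidestep the end-of-sentence problem. A minimal sketch, with a threshold value that's just a guess and would need tuning against your motor noise:

    # Continuous keyword spotting with the pocketsphinx Python package;
    # no language model, just one keyphrase, so it can fire mid-noise.
    from pocketsphinx import LiveSpeech

    speech = LiveSpeech(
        lm=False,             # disable the full language model
        keyphrase='stop',     # spot this single phrase instead
        kws_threshold=1e-20,  # detection threshold -- tune for your noise floor
    )
    for phrase in speech:
        print('heard:', phrase)  # e.g. tell the HA system to halt the curtains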
We use state-of-the-art deep-learning models that are very robust to noise (much more robust than the GMM-based models Sphinx uses), and we train the models on data containing noise.
This should be paired with a good microphone array that can locate the speech source very precisely!
Our hotword solution is also robust to noise and can be used to detect a word over background noise.
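To give an intuition for the mic-array point: the simplest technique, delay-and-sum beamforming, estimates each channel's delay relative to a reference microphone, aligns the channels, and averages them, so the speech adds up coherently while uncorrelated noise averages down. A toy numpy sketch for illustration only (not our production pipeline):

    import numpy as np

    def estimate_delay(ref, sig, fs):
        # Lag (in seconds) of `sig` relative to `ref`, via cross-correlation.
        corr = np.correlate(sig, ref, mode='full')
        lag = np.argmax(corr) - (len(ref) - 1)
        return lag / fs

    def delay_and_sum(signals, fs):
        # Align every channel to the first mic and average: speech adds up
        # coherently, uncorrelated noise drops by roughly sqrt(num_mics).
        ref = signals[0]
        out = np.zeros_like(ref, dtype=float)
        for sig in signals:
            shift = int(round(estimate_delay(ref, sig, fs) * fs))
            out += np.roll(sig, -shift)  # crude: ignores wrap-around at edges
        return out / len(signals)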
We've been experimenting with three setups so far: the Matrix Creator, the ReSpeaker, and the Conexant. We will soon publish an in-depth benchmark, as well as (highly needed!) tutorials on how to set them up so that they work optimally as audio capture devices on the Raspberry Pi.
Coming from a signal processing background, this is interesting. Would your company be interested in doing a newsletter with https://enterprisedeeplearning.github.io?
We are comparing what you get from an alternative service like api.ai (which you have to bootstrap by feeding it queries manually) with what you can do with our solution.
Manually feeding 70 queries per intent already takes a good deal of energy and imagination, so we thought it was representative of a good effort from a user. This is why Snips offers its users generative methods to automatically create high-quality datasets (over 2000 queries per intent).
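To give a concrete feel for what "generative" means here, the simplest version is crossing sentence templates with slot values; the templates and slots below are made up for illustration, and our actual generation is more involved:

    # Toy template-based query generation (illustrative slots/templates only).
    import itertools

    templates = [
        'turn {state} the {device} in the {room}',
        'please switch {state} the {room} {device}',
    ]
    slots = {
        'state': ['on', 'off'],
        'device': ['lights', 'heater', 'fan'],
        'room': ['kitchen', 'bedroom', 'living room'],
    }

    queries = [
        t.format(**dict(zip(slots, combo)))
        for t in templates
        for combo in itertools.product(*slots.values())
    ]
    print(len(queries))  # 2 templates x 18 slot combinations = 36 queries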
> Manually feeding 70 queries per intent already takes a good deal of energy and imagination
Only if you have to manually make them up. If you have access to the query stream (which I imagine api.ai etc give you since they're online platforms), it's quite quick and easy.
I can definitely appreciate integrating things together to make things easy, but this is a pretty biased comparison IMO.
It's great that you've made the data available, but the headline metric that most people will see seems misleading.
API.ai and other companies that keep user data on their servers are putting user privacy at risk, and they don't really match the performance you can get when you do data science carefully.
In the links you provide, you compare the F1 score for Snips with other major competitors on 70 queries.
Then you show the F1 score for Snips on 2000 queries. Obviously you see improvements, and that's good. What I don't understand is why you don't show the F1 score for the other competitors on 2000 queries. If you're able to generate 2000 inputs from the original 70, what prevents you from feeding those to the other NLP engines? Do they all get a 93% F1 score?
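For reference, by F1 I mean the usual intent-classification F1 that these benchmarks report, e.g. with scikit-learn (the labels below are made up):

    # Weighted F1 over intent labels -- the headline benchmark metric.
    from sklearn.metrics import f1_score

    y_true = ['SetTimer', 'PlayMusic', 'SetTimer', 'GetWeather']
    y_pred = ['SetTimer', 'PlayMusic', 'GetWeather', 'GetWeather']
    print(f1_score(y_true, y_pred, average='weighted'))  # 0.75 here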
It would be great to have a hands-on tutorial on how a developer or user could set this up to control speakers or a magic mirror like this https://github.com/HannahMitt/HomeMirror :)
Yes, absolutely. In fact we created a dedicated Labs team within Snips (check out http://labs.snips.ai) with exactly that in mind: to create DIY hacks, proofs of concept, benchmarks, and related material for the community. We are just getting started, and have published our first post on building a connected speaker: https://medium.com/snips-ai/how-to-build-a-voice-controlled-.... Stay tuned, many more are coming over the next months.
Are you going to generalize your hotword detection like Snowboy? Or allow me to mix and match, e.g. use Snowboy to wake up, then use you for voice recognition / intent matching?
One of the reasons I want to build my own is to fix issues I see in all the commercial ones, namely the lack of back-and-forth conversation and being locked to a single wake word. For example, when my timer is going off, I don't want to say 'Alexa, turn off the timer.' I just want to say 'ok', 'stop', 'cancel timer', etc.
We will add a way for users to record hotwords with their own voice and train a detector on-device. You can subscribe to our newsletter to be informed when new features arrive!
Thanks! We are committed to open source and transparency, and we want to open-source most of the major inference components of our platform. We have already open-sourced some code that you can look at on GitHub now.
I found your github[1] which includes platform documentation[2] and various API code but I didn't see any trained weights or Tensorflow models. Some users might want to extend the model and open source could be a great way to get momentum behind your platform.
Thanks! We will release a mobile companion app that people can use to connect to their devices, and we are working with partners to embed our platform on native boards!
Subscribe to our newsletter on https://snips.ai to stay tuned, and see our blog for more data science and DIY content!
Thank you! You made the first voice assistant platform I'll actually give a serious shot to. I hope your attitude towards user privacy catches on with your competition.
We will be pushing an update to the platform on Friday that fixes a small bug where the ASR takes a bit of time to start listening after the hotword is detected.
But it's true! My first language was actually Pascal, and then assembly :) I was lucky to have family working with computers at the time, who showed me how to code multiplication tables while I was learning them at school.
I couldn't find a video of it in action (things like response delay are quite important to me) on the actual site, but managed to dig this up on Youtube:
Thanks, we are working on adding more content to the website. You can also subscribe to the newsletter to be informed when we add DIY tutorials for building cool assistants!
I'm especially interested in the synergy between the voice/intent recognition and what was being displayed on the tablet. Is that part of the platform? Or something else?
Thank you both!
I'll be fervently following this. I've got a Home Assistant controlled home, this should fit in beautifully :-) I wish you luck on your enterprise offering!
This is part of the platform, which has been built to be open and extensible. We will release a mobile app that allows makers to easily interact with their device and see what the platform is recognizing when users talk to it!
As far as I can tell, it's free for personal use, and paid if you want to build a product with it.
Seems very interesting, reminds me of wit.ai when I first started using it, but the training seems to work pretty well from some small examples (which I had trouble with using wit).
I was also hunting for pricing info, before deciding if I wanted to pursue something like this further. Is it determined on a case-by-case basis, or same for everyone? If same, any reason not to publish it somewhere on the website?
We are still finalizing the exact pricing, so tell us about yourself and we will see what the best way to work together would be. We are happy to help as many makers as we can! Contact us at contact@snips.ai.
I've been struggling with Alexa and Google Assistant to make something useful for embedding. So much back and forth just setting up infrastructure for skills.
This is exactly what I was looking for a few months ago. I wanted to build a voice activated gym assistant, mostly to help me track my sets, reps, and breaks. I was hoping to have it all on-device using a Raspberry Pi, but I didn't find anything great. It ended up being "easier" to use Google Cloud for speech recognition (easier in quotes, because dealing with Google's APIs is never easy).
Shelved that project while I was busy working on other things. This has me excited to give it another go!
The voice and confirmation sounds in the later part of the video reminded me of Siri. Is that resemblance intentional? Or is this only in the demo video to give Apple an idea of how this compares to Siri (subconsciously pushing for an acquisition)?
Cool, I have been experimenting with Google's VoiceHAT board and the Picroft open-source voice control from Mycroft, which runs on the Pi; both are fairly straightforward to set up to control real-world devices. How does one do this in Snips?
Not very familiar with MQTT, other than that a few micro boards I use, like the ESP8266, can come ready for it. I mean just setting up a voice keyword to toggle a pin on the Raspberry Pi to control a relay. With the VoiceHAT you just set up a word in a Python program, simple. In Picroft, you do a similar thing. Here I toggle an LED with the VoiceHAT, for example: https://plus.google.com/115226830543207487087
MQTT is a simple protocol for publishing/subscribing to events (over a network if you like). The Python clients on https://github.com/mqtt/mqtt.github.io/wiki/libraries look like they all have examples of a simple "run this python function whenever XYZ message happens" program.
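For example, here's a minimal listener with paho-mqtt that toggles a relay pin on the Pi whenever a message arrives; the topic name is a placeholder, not a documented Snips topic:

    # Toggle a relay on GPIO 17 whenever a message arrives on the topic.
    import paho.mqtt.client as mqtt
    import RPi.GPIO as GPIO

    RELAY_PIN = 17
    GPIO.setmode(GPIO.BCM)
    GPIO.setup(RELAY_PIN, GPIO.OUT)

    def on_message(client, userdata, msg):
        # RPi.GPIO lets you read back an output pin's current state.
        GPIO.output(RELAY_PIN, not GPIO.input(RELAY_PIN))
        print('toggled relay:', msg.topic, msg.payload)

    client = mqtt.Client()
    client.on_message = on_message
    client.connect('localhost', 1883)  # broker running on the Pi itself
    client.subscribe('assistant/intent/toggleRelay')  # placeholder topic
    client.loop_forever()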
I'd love to try it out, I just don't know how to do it honestly, nor how to add an MQTT handler for that matter. I can install and run programs on the Pi, and wire relays to pins; that's about it.
It is actually configurable, and since the platform is open and extensible it won't be too hard. For now you can have your broker send some messages on our MQTT bus, but later we will make this fully configurable.
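In the meantime, to test a handler you can publish a message by hand, e.g. with paho's one-shot helper (the topic below is again just a placeholder):

    import paho.mqtt.publish as publish
    publish.single('assistant/intent/toggleRelay', payload='on', hostname='localhost')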
A source code release (even under a non-free license) would be much appreciated for those of us who would rather actually attempt to build this for non-RPi platforms ourselves than wait for the vendor to get around to it (if they get around to it at all).
Right now, there doesn't seem to be anything aside from the RPi build (the download page for which also requires me to log in with my email address, so I didn't proceed to actually download it). That's a shame.
The last time I was at a machine learning meetup where Rand gave a talk about the features of your framework, he told us that for the speech recognition part you were using the Google/Apple APIs, and only the NLP/NLU would be handled by you. Is that still the case? Because I don't think Google does "privacy by design", and when someone sells me a product with that branding I expect it to be 100% private, not half private.
We have now developed our own speech-to-text component, fully offline. We only support US English for the moment, but we are working on supporting more languages. Feel free to try it on our website!
I'm keenly looking forward to that, I've been looking for something like this for a while. Lucida looked nice on the surface, but once you get it running there's not a lot there.
I'm part of the "privacy and security" demographic. The thing you've got to understand about us is that we've been burned before, many times, by closed source products that are allegedly local only, but are secretly snitching behind our backs.
Market penetration may be difficult until people can see the source. If you're looking for a business model to attach to this, I recommend holding the training data in reserve as your paid product, and open sourcing the platform.
Nothing, except that they never cared about this. We want to do AI and privacy right from the start, so we are committed to building AI with state-of-the-art performance that protects users.
HN won't let me respond to your comment.
By hardcoded I meant a hardcoded or rule-based system, rather than a generalized dialog manager with knowledge-base lookup. Sounds like you're not there yet.
We have actually benchmarked our NLP, and it outperforms most of the commercially available NLU providers: https://medium.com/snips-ai/benchmarking-natural-language-un...