Show HN: Snips is a AI Voice Assistant platform 100% on-device and private (snips.ai)
372 points by oulipo 180 days ago | 129 comments



I'm a co-founder of Snips. We are building a private-by-design Voice Assistant platform which allows companies and makers to build a smart assistant 100% on-device.

Why do we do this? We want assistants of the future to respect user privacy, and not stream your voice or your most important questions to servers that you do not control.

With Snips, 100% of what we do runs on the device (the platform ships for Raspberry Pi; more platforms are available for enterprise customers, contact@snips.ai).

We are using state-of-the-art deep-learning Automated Speech Recognition and Natural Language Understanding to allow makers to plug a voice assistant into their device in 5 minutes.

We have actually benchmarked our NLU and are outperforming most of the commercially available NLU providers: https://medium.com/snips-ai/benchmarking-natural-language-un...


This is awesome. Every other assistant AI type product seems to be designed primarily as a pretext for invading its users' privacy. Thanks for building something that might be both useful and trustworthy at the same time!


Thanks! And we take privacy very seriously, as should most companies now, because people will demand this in the future.

No reason Google should know what you ask your Voice Assistant to buy, or to remind you of!


Ditto.

I still have no idea if I'm actually interested in voice UI, but I'd actually experiment with this one, unlike any existing implementation.

Edit - any other existing implementation.


You should take a look at what Apple is doing if you think that. Especially the developments announced last week. Going forward I would expect to see a vast number of products on that platform both from Apple, and from third party developers, that are doing on-device and privacy-focused implementations of AI applications.


Do you have a good link for technical info? I found the Wired article but all that says is:

> The HomePod uses local recognition to hear the Hey Siri command, and encrypts all your communication.

That's great as far as it goes (at least it's not streaming all audio) but it doesn't rule out the rest of the command from being sent upstream, and encryption just stops third parties from intercepting it - if I don't want the vendor listening in either, it doesn't really solve my problem.

Happy to be proven wrong by some more in-depth info from Apple though!


Without source code that seems like all smoke and mirrors to me.

I'd love it if they released it, and I'd start buying their stuff immediately. But until they put their money where their mouth is... meh


No need to be snide about it. Of course they aren't going to open source the entire OS. Just like Google hasn't open sourced all of Android. They always hold something back.

And you're commenting on an article where the maintainer has said that open source is "coming in the future."

In the meantime:

https://opensource.apple.com

https://github.com/apple

http://llvm.org

https://swift.org

https://developer.apple.com/opensource/

I think it's safe to say they are betting the company on this, and putting plenty of money into it.

And far from being smoke and mirrors vaporware, this is hardware and software that you can hold in your hand and use and rely on, and millions of people do, every day.


I'm not sure how my comment is "snide". It's just a fact: I don't trust them.

And I don't see how Google is relevant to the conversation. They don't release source, so I'm not really interested in their products either. Just because one company does things in a way I don't like doesn't mean I have to accept it when others do it that way.

"coming in the future" is nice, but I've heard that speech before, and I'm not holding my breath. If they are betting the company on this, then great! I'll put a lot of money into their products. But until they do, meh, I'm not interested.

And by smoke and mirrors I don't mean vapor-ware, I mean distracting from the real issues, and hiding their flaws instead of fixing them.


Well, you are making assumptions from ignorance.

You don't have to hold your breath. I gave you the links. The open source is already there. Breathe.

And you seriously think Apple is not fixing flaws?

Watch their WWDC sessions. They have fixed tons of flaws.

Your hate is showing through. Maybe set aside your feelings once in a while and look at the facts.


I really don't think I understand what you mean. Those links are some open source stuff that they've done. Great. That's not the OS, nor the apps. I want them to open source the OS and their apps.

I'm sure they are fixing security flaws, but they are still not addressing the underlying issue. I like what they say their philosophy is, but until they open source all their stuff, I have no reason to trust them.

I definitely don't hate them; I really want a company like them to succeed. But they refuse to put their money where their mouth is. I'd invite you to point me to any facts that I've missed, but I don't see where I've missed any.


Let me know when you've watched their WWDC talks from this year on privacy and security. Especially privacy.

You could do previous years too and those would be complementary, not redundant, but that's quite a time commitment and a lot to ask. Just this year's would be a really great start. Then I'd love to hear what you think after that.

Yeah they won't be completely open source any time soon, that's a nice dream, lol. But imho you should open your mind to learning about what they are doing now, before being so negative and down on them.


> Let me know when you've watched their WWDC talks from this year on privacy and security. Especially privacy.

Nope, not going to. Frankly it's a waste of my time because I'm not going to buy their product. They can say nice things on a stage all they want, but until they open source it, then I really don't care. If they open-source it, then I would buy their products in a heartbeat.

You say it's a nice dream, I say it's reality. I have a laptop/desktop system that works great and is FOSS. I don't see how phones or any other computerized device is any different. None of it matters until they put their money where their mouth is. I'm only being negative towards them because they refuse to fully commit to what they preach. I find nothing wrong with that, and frankly, I don't get how other people don't have a problem with that.


They were not saying just nice things on stage, contrary to your little imagined scenario. They were saying stuff a lot of developers didn't want to hear.

Stuff like stop tracking, limit tracking, don't capture so much data, protect privacy this way, that way, do machine learning on device in such a way that data never makes it to any cloud, use storage techniques that keep the keys away from anyone but the user including Apple, etc.

A lot of developers buy into protecting privacy, but for many developers those were not nice things to hear on stage, contrary to what you imagine.

If you don't get things, and lack understanding, the solution is to seek knowledge. I suggested one good way to do that. Your ignorance is on you, not on them.


I think you're missing my point: I don't care what they say. I care what they do. Actions speak louder than words. If they are encouraging privacy, cool. I can respect that. But they aren't doing privacy.

Until they do that, why would I look into them? I choose ignorance in a lot of things that waste my time. Looking into a company that won't put their money where their mouth is would also be a waste of that time.


Meh. Do your homework.


What homework have I not done? What fact am I missing? I have all of the information needed to make an accurate judgment.

You've yet to give me any answer to that question, and I'd invite you to point me to anything that I have stated that is incorrect.


At Snips we are serious about privacy and happy to see so many people taking it as seriously as us!


Do you do any pre-processing on the audio input to filter out background noise? I notice that in the demo video, you're in a quiet room.

I can run PocketSphinx on Raspberry Pi fine, and the recognition is fine too - as long as there isn't too much background noise. When I say 'computer open curtains', for example, it opens my curtains through my HA system; but the motors in the curtains make some noise, so I can't say 'stop' halfway through because it just won't pick it up. Another part of the problem is that audio chunks are fed to PocketSphinx by recognizing end of sentence through silence in the input; so when the curtains are still going, it doesn't recognize that the command has ended and that the audio should be passed into PocketSphinx.

So, how does Snips deal with this?


We use state-of-the-art deep-learning models that are very robust to noise (much more robust than what you get with GMM-based systems like Sphinx), and we train the model on data containing noise.

This should be paired with a good microphone with a mic array to locate the speech source very precisely!

Our hotword solution is also robust to noise and can be used to detect a word over noise.


Are there any off-the-shelf array microphones you recommend, for pairing with Snips on the Pi?


We've been experimenting with three setups so far: the Matrix Creator, the ReSpeaker and the Conexant. We will soon publish an in-depth benchmark, as well as (highly needed!) tutorials on how to set them up so that they work optimally as audio capture devices on the Raspberry Pi.


Looking forward to it! Maybe using the VoiceHat as well?


Coming from a signal processing background, this is interesting. Would your company be interested in doing a newsletter with https://enterprisedeeplearning.github.io?

Edit: softened the sales pitch


Sure, you can send us some info at contact@snips.ai


Great, looking forward to trying this.


So, why does your marketing material compare api.ai trained on 70 examples to Snips trained on 2000 examples?


Hi!

We are comparing what you get by using an alternative service like api.ai (which you need to start by feeding manually) with what you can do with our solution.

Manually feeding 70 queries per intent already takes a good deal of energy and imagination, so we thought it was representative of a good effort from a user. This is why Snips offers its users generative methods to automatically create high-quality datasets (over 2000 queries per intent).

All hypotheses are clarified in our blog post on performances which I encourage you to read! https://medium.com/@alicecoucke/benchmarking-natural-languag...

We even post all the underlying data and results on GitHub so people will be able to verify our claims: https://github.com/snipsco/nlu-benchmark/tree/master/2017-06...
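For intuition, the generative approach can be as simple as expanding slot-filling templates with slot values. Here is a toy sketch; the templates, slot names and values are invented for illustration and are not Snips' actual method:

```python
from itertools import product

# Hypothetical seed templates with slots, and example slot values.
templates = [
    "turn {state} the {device}",
    "please switch {state} the {device}",
    "can you turn the {device} {state}",
]
slots = {
    "state": ["on", "off"],
    "device": ["lights", "heater", "coffee machine"],
}

def generate_queries(templates, slots):
    """Expand every template with every combination of slot values."""
    queries = []
    for tpl in templates:
        for combo in product(*slots.values()):
            queries.append(tpl.format(**dict(zip(slots.keys(), combo))))
    return queries

queries = generate_queries(templates, slots)
# 3 templates x 2 states x 3 devices = 18 queries from a handful of seeds
print(len(queries))
```

Even this crude expansion turns a few seed patterns into dozens of training queries; a real pipeline would also paraphrase and inject noise.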


> Manually feeding 70 queries per intent already takes a good deal of energy and imagination

Only if you have to manually make them up. If you have access to the query stream (which I imagine api.ai etc give you since they're online platforms), it's quite quick and easy.

I can definitely appreciate integrating things together to make things easy, but this is a pretty biased comparison IMO.

It's great that you've made the data available, but the headline metric that most people will see seems misleading.


API.ai and companies which keep user data on their servers are putting user privacy at risk, and they don't really match the performance we can achieve when we do data science carefully.


In the links you provide, you compare the F1 score for snips with other major competitors on 70 queries.

Then you show the F1 score for Snips on 2000 queries. Obviously you see improvements and that's good. What I don't understand is why you don't show the F1 score for the other competitors on 2000 queries. If you're able to generate 2000 inputs from the original 70, what prevents you from feeding those to the other NLP engines? Do they all get a 93% F1 score?
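For readers unfamiliar with the metric being debated: the F1 score for an intent is the harmonic mean of precision and recall over that intent's predictions. A minimal, self-contained sketch (the labels here are invented, not from the benchmark):

```python
def f1_score(true_labels, predicted_labels, intent):
    """Per-intent F1: harmonic mean of precision and recall."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == intent and p == intent)
    fp = sum(1 for t, p in pairs if t != intent and p == intent)
    fn = sum(1 for t, p in pairs if t == intent and p != intent)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Invented example: 4 utterances, one misclassification
true = ["SetTimer", "PlayMusic", "SetTimer", "GetWeather"]
pred = ["SetTimer", "PlayMusic", "PlayMusic", "GetWeather"]
print(f1_score(true, pred, "SetTimer"))  # ≈ 0.667
```

Benchmarks typically report this averaged across intents, which is why the training-set size for each engine matters so much to the comparison.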


Is this a homebrew speech recognition engine, or are you using something existing like sphinx/julius/kaldi?


Their website seems to indicate TensorFlow is being used.


TensorFlow is only mentioned for NLU; no mention of it for ASR.


It would be great to have a hands-on tutorial on how a developer or user could set this up to control speakers or a magic mirror like this https://github.com/HannahMitt/HomeMirror :)


Yes absolutely. In fact we created a dedicated Labs team within Snips (check out http://labs.snips.ai) with that in mind: to create DIY hacks, proof-of-concepts, benchmarks and related material for the community. We are just getting started, and have published our first post on building a connected speaker: https://medium.com/snips-ai/how-to-build-a-voice-controlled-.... Stay tuned, many more are coming over the next months.


Are you going to generalize your hotword detection like snowball? Or allow me to mix and match... e.g. use snowball to wake up then use you for voice recognition / intent matching?

One of the reasons I want to build my own is fix issues I see in all the commercial ones, namely the lack of conversation back and forth & being so fixed to a single wake word. For example, when my timer is firing, I don't want to say 'alexa, turn off the timer.' I just want to say 'ok', 'stop', 'cancel timer,' etc.


We will add a way for users to record hotwords with their own voice and learn a detector on-device. You can subscribe to our newsletter to be informed when new features will come!


Will you publish the model and TensorFlow source? Or is this just a binary install?


Thanks! We are committed to open source and transparency, and we want to open-source most of the major inference components of our platform. We have already open-sourced some pieces that you can look at on GitHub now:

https://medium.com/snips-ai


I found your github[1] which includes platform documentation[2] and various API code but I didn't see any trained weights or Tensorflow models. Some users might want to extend the model and open source could be a great way to get momentum behind your platform.

[1] https://github.com/snipsco/

[2] https://github.com/snipsco/snips-platform-documentation


Are you going to be releasing a Snips app for mobile, or are you planning on leaving this up to enterprise groups to develop themselves independently?

Also do you have any hardware vendors on board who want to include Snips natively?


Thanks! We will release a mobile companion app that people can use to connect to their devices and we are working with partners to embed our platform on native boards!

Subscribe to our newsletter on https://snips.ai to stay tuned and see our blog for more datascience and DIY content!


Great news, thanks.

Already signed up!

edit: Any plans to see Snips natively in end-user devices?


Yes! we are talking to some smart speaker companies and other builders!


Even better news, thanks again


Thank you! You made the first voice assistant platform I'll actually give a serious shot to. I hope your attitude towards user privacy catches on with your competition.


That is a great benchmark. Can you give a rough idea of how your NLU works? Or some keywords to search?


We are using state-of-the-art deep learning with TensorFlow for most of our processing.


Yeah but I meant which algorithms are you using?


1) How many hotwords can you have?

My application would need only around 20 distinct words.

2) Can they be non-dictionary hotwords?

Say "fruznumble"?

3) How do headset mics work? A lot of the hotness is mic arrays for room-scale commands, but how does it work with older-style mics?


For now there is a hotword shipped in the platform, but if you are an enterprise customer we can train a hotword for you.

The hotword detector uses deep-learning and requires quite a bit of data to be robust to noise and to various accents.

We are working on a version which will allow more people to build hotwords robustly and quickly.


We will be pushing an update of the platform on Friday which fixes a small bug where the ASR takes a bit of time to start listening after the hotword is detected.

The next version is smooth!


Is there potential for a MIPS build? Would be excited to try this out on an Onion Omega.


We are working on expanding the platforms, stay tuned!


Do you just magically happen to take CIFRE PhD students in NLP/Machine Learning?


Sure, you can apply at https://snips.ai/jobs!


Is there an API for the web?


what kind of microphone do you recommend for pi?


Many microphones can work because our ASR is robust; ideally you want a microarray with noise cancelling.


Could you link to a "microarray" microphone that's available for purchase?


Here is what we've been playing with:

- Matrix Creator: https://creator.matrix.one/

- ReSpeaker: https://www.seeedstudio.com/s/respeaker.html

- Conexant: http://conexant.com (quite expensive, but powerful)

We are also waiting for the Matrix Voice, still on Indiegogo, due late Summer: https://www.indiegogo.com/projects/matrix-voice-open-source-.... As mentioned earlier, we will soon release a benchmark of the various mic array solutions, as well as tutorials for setting them up.


The Matrix Voice seems the most affordable... for makers that is :-)


I'm extremely tired of seeing "Coding since age 8", especially on a company about page[0].

It'd be like an artist having a macaroni collage that they made during kindergarten in their portfolio.

[0] https://snips.ai/about/


But it's true! And my first language was actually Pascal, and then assembly :) I was lucky to have family members working with computers at the time, who showed me how to code multiplication tables while I was learning them at school.


I don't see anything wrong with either. It shows passion.


If one kid is doing macaroni-level coding while everyone else is literally gluing noodles, that's still a pretty strong predictor of future skills.


I'd argue that programming requires much more critical thinking than macaroni art


I couldn't find a video of it in action (things like response delay are quite important to me) on the actual site, but managed to dig this up on Youtube:

https://www.youtube.com/watch?v=wThoRtIeExo


@oulipo why is this video not discoverable on your site? There should be a lot more videos like it!


Thanks, we are working on adding more content to the website. You can also subscribe to the newsletter to be informed when we add DIY tutorials on building cool assistants!


I'm especially interested in the synergy between the voice/intent recognition and what was being displayed on the tablet. Is that part of the platform? Or something else?


Every component publishes a message on the local network, so you can have other devices like an iPad capture the output


Thank you both! I'll be fervently following this. I've got a Home Assistant controlled home, this should fit in beautifully :-) I wish you luck on your enterprise offering!


This is part of the platform, which has been built to be open and extensible. We will release a mobile app that allows makers to easily interact with their device and see what the platform is recognizing when users talk to it!


Anyone know if that is a ReSpeaker being used as the microphone array on the desk?


Yes, we are using a few ReSpeaker microphones.


@oulipo Awesome, great work. If you have any links/tips to help me get my ReSpeaker array going with Snips, I'd appreciate it.


We'll publish a blog post soon on microphones, subscribe to the blog!


Great! Glad to see someone doing IoT the right way!

What's the pricing though? I have an impression this is a paid product, but no pricing info is present.

EDIT: also, in trained_assistant.json, what does "tfidf_vectorizer_vocab" represent, and why it includes words like "nazi" and "hitler"? ;).


As far as I can tell, it's free for personal use, and paid if you want to build a product with it.

Seems very interesting, reminds me of wit.ai when I first started using it, but the training seems to work pretty well from some small examples (which I had trouble with using wit).


Based on https://en.m.wikipedia.org/wiki/Tf–idf , I'd tend to surmise that whatever corpus was used for training included those words.
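That is indeed how TF-IDF works: the vectorizer's vocabulary is simply every token observed in the training corpus, so any word present in the training data ends up in `tfidf_vectorizer_vocab`, pleasant or not. A tiny pure-Python sketch of the idea (the corpus here is invented):

```python
import math
from collections import Counter

def build_tfidf(corpus):
    """Tiny TF-IDF: the vocabulary is every token seen in the corpus."""
    docs = [doc.lower().split() for doc in corpus]
    vocab = sorted({tok for doc in docs for tok in doc})
    n = len(docs)
    # Document frequency and smoothed inverse document frequency per token.
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}
    vectors = []
    for d in docs:
        counts = Counter(d)
        vectors.append([counts[t] / len(d) * idf[t] for t in vocab])
    return vocab, vectors

corpus = ["set a timer for ten minutes",
          "play a documentary about hitler"]
vocab, vectors = build_tfidf(corpus)
print(vocab)  # every training token appears, pleasant or not
```

Real vectorizers add tokenization rules, stop-word filtering and normalization, but the vocabulary is still just a reflection of the training corpus.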


The product is free for personal use, and paid when you are selling devices


I was also hunting for pricing info, before deciding if I wanted to pursue something like this further. Is it determined on a case-by-case basis, or same for everyone? If same, any reason not to publish it somewhere on the website?


We are still working out the exact pricing, so tell us about your project and we will see what would be the best way to work together; we are happy to help as many makers as we can! Contact us at contact@snips.ai.


The product is free for makers, and paid for commercial usage


Ooo, the intent interface is brilliant

I've been struggling with alexa and google assistant, to make something useful for embedding. So much backwards and forwards with setting up infrastructure for skills.

This is smashing.


This is exactly what I was looking for a few months ago. I wanted to build a voice activated gym assistant, mostly to help me track my sets, reps, and breaks. I was hoping to have it all on-device using a Raspberry Pi, but I didn't find anything great. It ended up being "easier" to use Google Cloud for speech recognition (easier in quotes, because dealing with Google's APIs is never easy).

Shelved that project while I was busy working on other things. This has me excited to give it another go!


Ahh. Finally a realistic chance of cleanly adding "Tea. Earl Grey. Hot." to my benchtop drink dispenser project.

The future is so cool. 1990's childhood me approves greatly.


We'd love for you to build the best Tea Serving intent on the platform. Play with it!


The voice and confirmation sounds in the later part of the video reminded me of Siri. Is that resemblance intentional? Or is this only in the demo video to give Apple an idea of how this compares to Siri (subconsciously pushing for an acquisition)?

Edit: I’m referencing the demo video. https://www.youtube.com/watch?v=wThoRtIeExo


We used an open-source library of sounds :)


glad to see snips finally releasing a product after all those years...


Cool, I have been experimenting with Google's VoiceHAT board, and the Picroft open source voice control from Mycroft which runs on Pi, both of which are fairly straightforward to set up to control real world devices. How does one do this in Snips?


You can code any handler you want using your favorite language; the output of the Natural Language Understanding is sent on an MQTT bus.

You need to add an MQTT listener in your code (there are libraries in many languages); you can see an example in the documentation: https://github.com/snipsco/snips-platform-documentation/wiki...
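As a rough sketch of what such a listener might look like in Python with paho-mqtt — the topic name and payload field names below are guesses for illustration; check the linked documentation for the real ones:

```python
import json

def parse_intent(payload_bytes):
    """Decode one NLU result published on the bus.

    The field names (intentName, slots) are assumptions for illustration;
    the actual schema is in the platform documentation.
    """
    msg = json.loads(payload_bytes.decode("utf-8"))
    return msg.get("intentName"), msg.get("slots", [])

def listen(host="localhost", port=1883):
    # Imported here so parse_intent stays testable without a broker.
    import paho.mqtt.client as mqtt  # pip install paho-mqtt

    def on_message(client, userdata, msg):
        intent, slots = parse_intent(msg.payload)
        print("heard:", intent, slots)

    client = mqtt.Client()
    client.on_message = on_message
    client.connect(host, port)
    client.subscribe("hermes/intent/#")  # guessed topic name
    client.loop_forever()

# listen()  # run this on the device
```

The handler itself is just ordinary application code; once the JSON payload is decoded you can call anything you like.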


Not very familiar with MQTT, other than that a few micro boards I use, like the ESP8266, can come ready for it. I mean to just set up a voice keyword to toggle a pin on the Raspberry Pi to control a relay. With the VoiceHAT you just set up a word in a Python program, simple. In Picroft, you do a similar thing. Here I toggle an LED with the VoiceHAT, for example: https://plus.google.com/115226830543207487087


MQTT is a simple protocol for publishing/subscribing to events (over a network if you like). The Python clients on https://github.com/mqtt/mqtt.github.io/wiki/libraries look like they all have examples of a simple "run this python function whenever XYZ message happens" program.


It is the same with our platform, you can bind to the "hotword detected" message and run the code you want


I'd love to try it out, just don't know how to do it honestly, nor how to add an MQTT handler for that matter. I can install and run programs on the Pi, and wire relays to pins, that's about it.


Take a look at https://github.com/snipsco/snips-platform-documentation/tree..., it is as simple as

pip install paho-mqtt

and writing a dozen lines of Python!
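To make the relay use case above concrete, here is a hedged sketch that routes bus messages to actions. The topic names are guesses (check the platform documentation), and the GPIO call is left as a comment so the routing logic stands on its own:

```python
import json

relay_state = {"on": False}

def set_relay(on):
    """Flip the relay. On a Raspberry Pi you would drive a GPIO pin here,
    e.g. with RPi.GPIO: GPIO.output(RELAY_PIN, GPIO.HIGH if on else GPIO.LOW)."""
    relay_state["on"] = on

# Topic names below are assumptions, not verified against the platform.
ACTIONS = {
    "hermes/hotword/detected": lambda payload: print("wake word detected"),
    "hermes/intent/lightsOn": lambda payload: set_relay(True),
    "hermes/intent/lightsOff": lambda payload: set_relay(False),
}

def dispatch(topic, payload_bytes):
    """Route one MQTT message to the matching action, if any."""
    action = ACTIONS.get(topic)
    if action is not None:
        action(json.loads(payload_bytes) if payload_bytes else None)

def run(host="localhost", port=1883):
    # Imported here so the routing logic above is testable without a broker.
    import paho.mqtt.client as mqtt  # pip install paho-mqtt
    client = mqtt.Client()
    client.on_message = lambda c, u, msg: dispatch(msg.topic, msg.payload)
    client.connect(host, port)
    for topic in ACTIONS:
        client.subscribe(topic)
    client.loop_forever()

# run()  # start on the Pi
```

Wiring a new voice command then amounts to adding one entry to the `ACTIONS` table.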


I already run an MQTT broker for home automation stuff. How hard would it be to tell Snips to talk to that?


It is actually configurable, and as the platform is open and extensible it won't be too hard. For now you can have your broker send some messages on our MQTT bus, but later we will make this fully configurable
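Until that configurability lands, one workable stopgap is a small relay script that mirrors messages from the existing home-automation broker onto the platform's bus. A sketch under the assumption that both are plain MQTT brokers; the hosts and the topic prefix are invented:

```python
def remap_topic(topic, prefix="snips/"):
    """Map a home-automation topic into the assistant bus namespace.

    The "snips/" prefix is a placeholder; pick whatever namespace your
    handlers listen on.
    """
    return prefix + topic

def bridge(home_host, snips_host):
    """Mirror every message from the home broker onto the Snips broker."""
    import paho.mqtt.client as mqtt  # pip install paho-mqtt

    snips = mqtt.Client()
    snips.connect(snips_host, 1883)
    snips.loop_start()  # background thread handles the outgoing side

    def forward(client, userdata, msg):
        snips.publish(remap_topic(msg.topic), msg.payload)

    home = mqtt.Client()
    home.on_message = forward
    home.connect(home_host, 1883)
    home.subscribe("#")  # mirror everything; narrow this in practice
    home.loop_forever()

# bridge("home-broker.local", "raspberrypi.local")
```

A mosquitto bridge configuration would do the same job without custom code, but a script like this keeps the topic remapping fully under your control.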


I've been trying to get a VoiceHAT. Any pointers on buying one without the magazine?


That is a tough one. It was difficult for me to even get the output schematic of the Voice HAT to control hardware. Working with a guy named Shabazz, I eventually figured out a presumptive MOSFET circuit Google must be using. The best teardown guide I have seen is here: http://hackaday.com/2017/05/30/diy-google-aiy/ with some more history on the outputs here: https://www.element14.com/community/community/raspberry-pi/b...


A source code release (even under a non-free license) would be much appreciated for those of us who would rather actually attempt to build this for non-RPi platforms ourselves than wait for the vendor to get around to it (if they get around to it at all).

Right now, there doesn't seem to be anything aside from the RPi build (the download page for which also requires me to login with my email address, so I didn't proceed to actually download it). That's a shame.


Last time I was at a machine learning meetup where Rand gave a speech about the features of your framework he told us that for the speech recognition part you were using Google/Apple APIs, only the NLP/NLU would be handled by you. Is it still the case? Because I don't think Google does "privacy by design" and when someone sells me a product with that brand I expect it to be 100% private, not half private.


We have now developed our own speech-to-text component, all offline. We only support US English for the moment, but we are working to support more languages. Feel free to try it on our website!


Maybe I'm missing the obvious, but where's the source? I don't see a link anywhere?


We will be open sourcing it gradually over the year


I'm keenly looking forward to that, I've been looking for something like this for a while. Lucida looked nice on the surface, but once you get it running there's not a lot there.

I'm part of the "privacy and security" demographic. The thing you've got to understand about us is that we've been burned before, many times, by closed source products that are allegedly local only, but are secretly snitching behind our backs.

Market penetration may be difficult until people can see the source. If you're looking for a business model to attach to this, I recommend holding the training data in reserve as your paid product, and open sourcing the platform.


Cool. Is there a way I can get notified once it is fully open?

I'm guessing there's not a large demographic of people who want privacy-focused software, but don't care if it's closed source.


You can subscribe to the newsletter and get information about cool DIY posts on our blog and the new features!


Doesn't appear that the product is open source, though some building blocks are. https://github.com/snipsco


What is preventing the bigger players from having on device voice assistants as well?


The fact that they perform worse, given they have constraints like size of memory/hard disk and speed of processing to worry about.

The only real argument for on-device is privacy, which most people don't care for


You can't make endless money off something people can buy. Don't worry, as long as you pay rent the cloudlords are great.


Nothing, except they never cared about this. We want to do AI and privacy right from the start, so we are committed to building AI with state-of-the-art performance which protects the users


Are there plans to extend this to a multi-turn (goal based) agent?

It's nice to have single turn interactions for turning the lights on. But not so nice for Eg: booking a complex flight, or navigating unknown options.


Yes! there is a first version of dialog, and the next version will allow multiple turns and customizable dialog!


How are you building out multi-turn/goal based approaches? hardcoding a rule engine, or something clever?


The first version is hardcoded, the next version will allow custom behavior designed by the users

Over time we want to add full dialog capacity to the AI


HN won't let me respond to your comment. When I said hardcoded, I meant a hardcoded or rule-based system rather than a generalized dialog manager with knowledge-base lookup. Sounds like you're not there yet.


Great! I've been thinking for a while we need compelling examples of offline functionality and along this comes.


We all want this. Guess since I'm on parental leave it's my duty to try it out.


  Snips is an AI
not

  a AI


Thanks!


I await Japanese language support.


iOS support?


We are working to add support for various platforms, including iOS & macOS, sure! Stay tuned on our site.



