Picovoice – Embed private voice AI into any product instantly (picovoice.ai)
182 points by ScottWRobinson on Oct 12, 2018 | 75 comments

Link to the github repo: https://github.com/picovoice/porcupine

The keyword generating application: https://github.com/Picovoice/Porcupine/tree/master/tools/opt...

The keyword generating application is free for personal use only and requires a license for commercial use, but there's no pricing available and it only provides an email address for contacting about purchasing a license.

If anyone from the team is reading this, the information above should be front and center on the landing page. I would guess that 99% of your site visitors are going to bounce on your landing page because the relevant information is buried so deeply.

This is Alireza. I am the founder of Picovoice. Thanks a lot for the information and for sharing. I 100% agree that we need to work on the website and its content. It is definitely on our TODO list. Will prioritize. In the meantime, happy to answer any commercial questions via email: contact@picovoice.ai

Thanks again for sharing

You could have answered the pricing question here, directly, in response to someone asking about it.

You said it needed to be addressed and was a priority, yet the question remains unanswered. It gives me cognitive dissonance.

I take it this is one of those "contact us for a quote" type of deals.

The reason the quote is not present is that there are just too many factors involved. What is the platform you are using (iOS, Android, ARM Cortex-M, DSP, etc.)? What is the scale? Are you deploying to ten devices or ten million? Do you have specific runtime requirements? Some people have limited RAM (e.g. I only have 32KB of RAM), some limited CPU, some limited FLASH, etc. How many models do you need? Are they proper words or brand/made-up names? Is it English? How much engineering support do you need?

In reality, it is a lengthy decision tree. I do not want to put that here or on the company's website as it will just maximize the confusion. But it makes sense to put the common easy cases and then ask for contact in the rest. Which is probably the path we take as it will save us some time.

That being said, I challenge you to find a company that offers similar tech and has pricing on its website. I suspect what I mentioned is the reason. I could be wrong.

I see it may be a bit more complex than I had initially thought. Might I suggest a pricing table kind of thing showing some basic plans and usage and price points?

Something as simple as a few basic use cases: Android app, no engineering support, under 10,000 units, x price, for example.

I'm a small developer and tinkerer, so my choice to explore more or not is really price sensitive. However, I do also consult with other groups, and may suggest your product as a fit if it meets the other criteria I look for in "private voice AI" - you've already checked a few boxes!

However, any time I see a "contact us for price inquiries" - I shut down. I know if they can't tell me the price on the page I can't afford it. At that point I don't bookmark the site, I don't research any further, and it reinforces the awesomeness of other projects I've put into my memory for use.

This is not just you, it's a lot of projects/sites on the web.

This is really good feedback. Agreed. What I have seen work is to put the common cases. It probably makes our life on the Picovoice side easier, as answering repeated pricing questions is not a fun way to spend our time. We will address this soon. I promise.

I think it will work better for ya. Certainly you've run across sites that show some small access for cheap, and more access for more money, and "requests over X calls / transaction / users, call for enterprise quote" kind of thing..

I assume these businesses are mainly looking for those high volume / high price clients and or people with that kind of money to buy them out completely.. all the while making a cheap plan for people to tinker with, build an MVP and maybe scale up.. and perhaps get enough mid range sales to prove they are worth something to others..

Of course that's not the game plan for every service out there. Anyhow good luck to ya, glad to see people working on ways to do things more privately one way or another.

> But it makes sense to put the common easy cases and then ask for contact in the rest. Which is probably the path we take as it will save us some time.

That would indeed be great.

Also, if a complex decision tree is your real objection, then consider putting a price range. "Depending on X, Y and other factors, the price ranges from A for a minimal deployment low on Z, to B for large-scale solutions." Or something like that.

Exact prices are always the best, but estimates for typical cases and a price range are second-best. It lets a potential user decide whether to even bother checking your service out.

Agreed. We will try to put the price for easy cases. I would rather not put estimates. That probably ends up making customers unhappy.

How easily would your tech adjust to recognising sound patterns? For example, the idea I have is an intelligent baby monitor that would identify the various noises a baby makes and alert you accordingly to the baby's needs: food, nappy change, distress, pain, etc.

Would that type of stuff be viable with your technology?

Automated nappy changing would be a way off, but an automated rattle or similar toy triggered by the baby may well make some tasks less demanding and more engaging for the baby. Identifying needs through audio recognition would be the start, though.

"I've soiled myself... how embarrassing!"


Sound classification is on our roadmap for 2019.

Braina has pricing, after a few clicks.

Mycroft is all FOSS

Kitt seems all FOSS too

Snips has a big CTA to 'Contact Enterprise Team'

It's clear you're trying to discover the right price and of course it's complex. Please be upfront about it, what you've said here could be copy-pasta to your site already.

I've been watching Pico for a while, it's very interesting but I'm feeling like you keep teasing more FOSS and being unclear on costs.

I want to like your project too but currently the others are doing a bit better on presentation.

Thanks for the feedback :) Hopefully, we find a way to make you like our project!

> Picovoice is a team of applied scientists and engineers who strive to build a future where our lives are enhanced with ambient voice AIs, while respecting your privacy.

Oh wow, thank you!

I also saw a reference that this is all open-source but https://github.com/Picovoice/ does not have a Picovoice repository. Is it https://github.com/Picovoice/Porcupine?

Porcupine is just wake word detection, which in itself is very useful indeed.

Not sure what the plans for open sourcing the rest of the components are though.

Hello. Thanks for the comment. We have plans to open source two more products this year, with licensing similar to Porcupine's.

1- Speech-to-Intent: It allows you to issue complex voice commands in a specific domain and in turn returns the intent. For example, in the case of a coffee maker you can say "Please may I have a single shot espresso with no milk and two sugars". The engine returns a JSON-like object with {"product": "espresso", "milk": "no", "sugar": "two", "# shots": "1"}. It is a tightly coupled domain-specific speech recognition and NLU engine. It is small (less than 3MB and 8% CPU usage on RPi3) and ideal for home automation, industrial applications, the service industry, etc.

2- Speech-to-Text: It is large vocabulary speech recognition software that runs locally. It will support all platforms currently being supported. It allows you to do large vocabulary transcription with high accuracy locally on an embedded platform.
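As a rough illustration of how an application might consume the kind of intent object described in item 1, here is a minimal sketch in Python. The field names come from the coffee-maker example; the shot count is set to "1" here to match the phrase "single shot". The `handle_intent` and `to_count` helpers and the word-to-number table are hypothetical and are not part of any Picovoice API, since the engine's real output format is not documented in this thread.

```python
import json

# Hypothetical intent payload modeled on the coffee-maker example.
# The shot count is "1" here to match the "single shot" phrase.
INTENT_JSON = '{"product": "espresso", "milk": "no", "sugar": "two", "# shots": "1"}'

# Assumed mapping from spoken quantities to integers (illustration only).
WORD_TO_NUMBER = {"no": 0, "one": 1, "two": 2, "three": 3}


def to_count(value):
    """Convert a slot value such as "two" or "2" into an integer."""
    if value in WORD_TO_NUMBER:
        return WORD_TO_NUMBER[value]
    return int(value)


def handle_intent(payload):
    """Turn a JSON intent string into a normalized order dictionary."""
    intent = json.loads(payload)
    return {
        "product": intent["product"],
        "milk": intent.get("milk", "no") != "no",
        "sugars": to_count(intent.get("sugar", "no")),
        "shots": to_count(intent.get("# shots", "1")),
    }


order = handle_intent(INTENT_JSON)
print(order)
```

The point of the sketch is only that a small, fixed set of slots is easy to validate and act on locally, which is what makes a domain-specific engine attractive for something like a coffee maker.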

Are those two multi language? I can think of a lot of use cases for industrial IoT if they support Spanish and Portuguese

Stay tuned. We will add two more repositories (1) speech-to-intent (2) speech-to-text

The open source reference is this phrase: "Picovoice's speech to text repository ranks among top 10 open source machine learning projects."

It's apparently only talking about their benchmarking tool: https://github.com/Picovoice/stt-benchmark

The engine itself is called Cheetah, and appears to be a closed source, "inquire for pricing/licensing" product.

We will open source Cheetah this year with a similar licensing model as Porcupine.

I wonder how this compares to snips[1]. I recently connected snips to my lights and it works pretty flawlessly. Picovoice does look like it's easier to integrate into an app though.

[1] http://snips.ai/

I am the founder of Picovoice. I am NOT going to comment on comparison as obviously I will be biased ;)

With Picovoice you can use the voice control engine to accomplish this. Maybe something similar to this demo?


The cool thing about this engine is that it is tiny. It uses less than 8% CPU on RPi3 and altogether it is less than 2MB (code, model, etc). Technically you can run it on something much smaller and cheaper than RPi.

Alternatively, the speech-to-intent engine could be a good candidate. More information on this along with an interactive demo will be released this weekend.

I would love to see some benchmark numbers on Picovoice. The small size and low CPU is definitely interesting but I'm worried that performance is hindered because of this. Also, is 8% peak usage? Having this loaded on some small IoT chip will be awesome to see.

You should check this out: https://github.com/Picovoice/wakeword-benchmark

We do work with a couple of SoC manufacturers and will disclose some of the results when our partners are ready. In general, we can run on any MCU with a C compiler and 200KB of RAM (maybe less if there is fast FLASH available). We already have models working on ARM Cortex-M and Cadence's HiFi4.

How much work do you think an ESP32 port will take?

I would love to see Picovoice on ESP32. We've looked into this as we get many requests for Picovoice on ESP32. The challenge is to find a commercial request at a reasonable scale to cover the porting effort. I suspect it should be less than a month of work on our side.

Your website says:

> Runs in real-time with only 5.6 MB of memory and 25% CPU usage on a Raspberry Pi 3.

For speech-to-text, right? I provided the figures for the voice control engine: https://picovoice.ai/#wake-word-detection

The voice control engine comes in two variations: standard and tiny. The tiny one consumes even fewer resources. I provided metrics for the standard one. You can check the benchmark repo or benchmark it yourself as well :) https://github.com/Picovoice/wakeword-benchmark

Yep! I see the difference now. I'll give this a try. Thanks.

you are welcome!

So I had a quick poke around on the wake word github repo but it looks like you can't generate custom keywords for the raspberry pi. Is that correct? So you have to use the pre-built ones in the resources directory? There seem to be a lot of files in there but how do I know what each one does?

I suggest doing a quick read on one of the demos (Python for example) that should clear things up. If not you can always open an issue...

Yeh I think I understand now. The filename of each file in the resources folder represents the hot-word that's detected. When you specify multiple files it'll give the index corresponding to the file and hence the word that was detected. So the problem is that since you don't support generating your own hot-words for raspberry pi, you are stuck with the small random set of words in the repo. That's kind of a huge limitation. So while I'm sure this is a great project for use with x86 and mac, it's a non-starter for me. Presumably proper support for raspberry pi is coming at some point so I'll be sure to check back.
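To make the index-to-keyword mapping described above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the file paths and the `<keyword>_<platform>.ppn` naming pattern are hypothetical, the convention that a negative index means "nothing detected" is assumed, and no Porcupine call is used.

```python
import os

# Hypothetical keyword files; the naming pattern "<keyword>_<platform>.ppn"
# is an assumption made for this example.
KEYWORD_PATHS = [
    "resources/keyword_files/alexa_raspberrypi.ppn",
    "resources/keyword_files/bumblebee_raspberrypi.ppn",
    "resources/keyword_files/grapefruit_raspberrypi.ppn",
]


def keyword_from_path(path):
    """Derive the hot-word name from a keyword file name."""
    stem = os.path.splitext(os.path.basename(path))[0]
    # Drop the trailing platform suffix, keeping any underscores in the keyword.
    return stem.rsplit("_", 1)[0]


def label_detection(keyword_paths, index):
    """Map a detector result to a keyword name, or None if nothing fired.

    Assumes the detector reports the position of the matching file in the
    list it was given, and a negative number when no keyword was heard.
    """
    if index < 0:
        return None
    return keyword_from_path(keyword_paths[index])


print(label_detection(KEYWORD_PATHS, 1))
```

Keeping the list of paths in one place means the order passed to the detector and the order used for the lookup cannot drift apart.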

I think whichever privacy centric voice assistant will come to market first will build an unstoppable first mover advantage. There are so many people just waiting for an alternative to the Googles/Amazons of the world.

Snips looks like a promising prospect!

So you said in your only other comment. Hacker News is a community site and it's not just to promote something, so could you please participate community-style instead?


The demo is pretty cool - the thing I liked best was this line:

"You can turn off your internet connection and it will keep working."

That's nice!

It is true! Have you tried it? :)

We will add another demo for speech-to-intent module fairly soon (over this weekend). Stay tuned!

Not yet - my work computer doesn't like being cut off from the world due to financial apps that I run, but I'll try when I'm home this weekend.

Just took a closer look at the examples and played around with some code. It works really well and has a surprisingly low footprint. Basically doing exactly what is promised.

However, I wonder why there is no (visible?) way to generate words for JavaScript? Or at least some documentation on how the format for those byte arrays is built.

Assuming this is a licensing thing, I would really suggest not limiting it in that way. On first impression I assumed that this is usable for free for everything except commercial projects.

Came here to say this. Had a great idea for using this to help train pilots to do the RT (I'm doing my PPL now, and this would be perfect). The lack of a JS optimizer means I can't really hack on what could have become a nice passive income site.

Can I use this for my home automation setup? It seems the github link is only part of the product. Do you license this for individual users?

You can use this for free under the Apache 2 license for personal, non-commercial use. No licensing needed. You don't need to pay us a penny as long as you don't make money off it.

The Apache 2 license allows for non-personal and commercial use...

This is my question too. I looked at Snips but they had no timeline on Windows support and were more interested in talking about their crypto coin than answering basic questions about their software.

I had pretty good luck reaching out to the developers on Discord these past couple of weeks. Not sure when you tried but I recommend trying it again. Personally, I think snips is way more appropriate on something embedded like a RPi. I was up and running with their new sam package within minutes. Their new update (about two months ago?) really made it more user friendly. Unfortunately, their Windows package is still very broken.

E: You should probably try in the late night or early morning time for the States (EST) since they are located in Europe, I believe.

BTW, Picovoice runs on RPi (all variants, not only RPi3).

I'm currently running snips on a RPi 2 (Model B) and it's working so far so good! Not too sure about the original Rpi though. I'm debating on getting a RPi 3 to see if the performance is better.

RPi3 is definitely faster. We also run on RPi 1/zero/etc.

What sort of CPU usage do you see? Picovoice claims 25% (on RPi3).

Where were you enquiring? I've had a lot of help from their Discord channel. You could give that a try.

I was on the Discord channel. Nobody answered my question. In fact, here was my chain of asking questions on Snips:

I went to the website, and had questions, so I tried to email them. I got a reply to the email telling me to join the Discord. I did, and asked in the Discord, and the only person who bothered to reply told me to read the website. ...I wouldn't have asked the questions if they were answered on the website.

We do support Windows!

Is this not just https://github.com/ARM-software/ML-KWS-for-MCU trained on new keywords? Maybe I'm missing what is so special here?

The project you mentioned is wake word for ARM only. Great project, BTW. For wake word, we provide on-demand model generation. We also do more than wake word. Finally, we can run on other CPUs/OSs as well.

How about numbers or complex/unknown words? I would love to proxy everything that isn't known AFTER the offline trigger word to Google voice recognition or something. Is that possible?

Numbers and complex words (I am assuming you mean something like "ok blah"?) are doable easily. I am not sure what you mean by unknown words. Could you elaborate?

Obviously, you can just grab the audio stream after the phrase is detected and route it to whatever you like. Google ASR or even a local one running on the device!
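The routing idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `is_wake_word` stands in for whatever detector you use, `frames_to_forward` is a made-up fixed window (a real setup would more likely stop on silence), and the frames are plain strings rather than audio buffers. None of these names come from Picovoice.

```python
def route_audio(frames, is_wake_word, frames_to_forward):
    """Yield only the frames that follow a wake word detection.

    frames: iterable of audio frames (any objects).
    is_wake_word: hypothetical callable returning True when a frame
        completes the wake phrase.
    frames_to_forward: assumed fixed number of frames to pass on after
        each detection.
    """
    remaining = 0
    for frame in frames:
        if remaining > 0:
            # We are inside the forwarding window: hand the frame on.
            remaining -= 1
            yield frame
        elif is_wake_word(frame):
            # Detection: open the window, but do not forward the wake frame.
            remaining = frames_to_forward


# Toy stream: strings stand in for audio frames.
stream = ["noise", "WAKE", "turn", "on", "lights", "noise"]
forwarded = list(route_audio(stream, lambda f: f == "WAKE", 3))
print(forwarded)
```

Whatever consumes `forwarded` could be a cloud ASR client or a local recognizer; the wake word stage stays offline either way, and only the audio after the trigger leaves the device.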

That is exactly what I meant, if I can get the audio stream out as well.

I have literally 3 projects, one of them a big current one, that depend on or would heavily benefit from this. Can't wait to give it a try!

Sweet! Go for it!

Is this related to picotts?

This seems really neat, but my main, blocking issue is that there is absolutely no way to add pronunciation for a wake word. Therefore, if the developers have not explicitly added the word to their vocabulary, you are completely out of luck.

If you have a commercial application we can build you the model for any wake word. Since this requires some engineering work on our side we can only offer it to commercial customers at this point.

Why not make it possible for anyone to train it instead of you having to do engineering? Is that the profit model? If so, totally valid, but I'm wondering if I'm understanding this right.

Two reasons:

1- The business model.

2- In some cases, it actually needs some engineering. For example, a new brand name, etc.

It would be nice to support a grammar, a la CMUSphinx: https://cmusphinx.github.io/wiki/tutoriallm/#grammars

Agreed. The reason for not supporting grammar in this product is that we want to keep it extremely lightweight. We do have upcoming products that support grammar as well.

Awesome! I look forward to trying out the speech-to-intent and speech-to-text products. Will these also be Apache 2.0 licensed?

@kenarsa any plans for a Go integration?

Pretty impressive tech by the way.

Which would you vote for, Picovoice or Snips, as a voice assistant AI product? I've been meaning to do more research and maybe your comments can help me gain more insight. Thanks

Doesn't work on Android chrome.

The demo uses WebAssembly, which is supported by Chrome. It also uses the Web Audio API, which I believe is also supported by Chrome. I just used the demo on my Android phone using Chrome. I wonder what the problem could be. Is the mic on? :) If yes, maybe tell me the version of Chrome you are using. I will look into it.

Works in Fennec (Firefox but from f-droid), but it took a sec to load the library it seems. Maybe that's the issue?

It works. But didn't on the first or second try for me. Suddenly it just worked. Dunno

This is perfect for a Home Assistant (hass.io) integration.
