
Picovoice – Embed private voice AI into any product instantly - ScottWRobinson
https://picovoice.ai/#voice-control-demo
======
pault
Link to the github repo:
[https://github.com/picovoice/porcupine](https://github.com/picovoice/porcupine)

The keyword generating application:
[https://github.com/Picovoice/Porcupine/tree/master/tools/opt...](https://github.com/Picovoice/Porcupine/tree/master/tools/optimizer)

The keyword generating application is free for personal use only and requires
a license for commercial use, but there's no pricing available and it only
provides an email address for contacting about purchasing a license.

If anyone from the team is reading this, the information above should be front
and center on the landing page. I would guess that 99% of your site visitors
are going to bounce on your landing page because the relevant information is
buried so deeply.

~~~
kenarsa
This is Alireza. I am the founder of Picovoice. Thanks a lot for the
information and for sharing. I 100% agree we need to work on the website
and its content. It is definitely on our TODO list and we will prioritize it.
In the meantime, I am happy to answer any commercial questions via email:
contact@picovoice.ai

Thanks again for sharing

~~~
edoceo
You could have answered the pricing question here, directly, in response to
someone asking about it.

You said it needed to be addressed and was a priority, yet the question
remains unanswered. It gives me cognitive dissonance.

~~~
RussianCow
I take it this is one of those "contact us for a quote" type of deals.

~~~
kenarsa
The reason the quote is not present is that there are just too many factors
involved. What is the platform you are using (iOS, Android, ARM Cortex-M, DSP,
etc)? What is the scale? Are you deploying to ten devices or ten million? Do
you have specific runtime requirements? Some people have limited RAM (e.g. I
only have 32KB of RAM), some limited CPU, some limited FLASH, etc. How many
models do you need? Are they proper words or brand/made-up names? Is it
English? How much engineering support do you need?

In reality, it is a lengthy decision tree. I do not want to put that here or
on the company's website as it will just maximize the confusion. But it makes
sense to list the common easy cases and ask people to contact us for the rest,
which is probably the path we will take as it will save us some time.

That being said, I challenge you to find a company that offers similar tech and
has pricing on its website. I suspect what I mentioned is the reason. I
could be wrong.

~~~
Zenst
How easily would your tech adapt to recognising sound patterns? For example,
the idea I have is an intelligent baby monitor that would identify the
various noises a baby makes and alert you accordingly to the baby's needs:
food, nappy change, distress, pain, etc.

Would that type of stuff be viable with your technology?

Automated nappy changing would be a way off, but an automated rattle or
similar toy triggered by the baby might make some tasks less demanding and
more engaging for the baby. Identifying needs via audio recognition would be
the start, though.

~~~
barbecue_sauce
"I've soiled myself... how embarrassing!"

[https://simpsonswiki.com/wiki/Baby_translator](https://simpsonswiki.com/wiki/Baby_translator)

------
anc84
> Picovoice is a team of applied scientists and engineers who strive to build
> a future where our lives are enhanced with ambient voice AIs, while
> respecting your privacy.

Oh wow, thank you!

I also saw a reference that this is all open-source but
[https://github.com/Picovoice/](https://github.com/Picovoice/) does not have a
Picovoice repository. Is it
[https://github.com/Picovoice/Porcupine](https://github.com/Picovoice/Porcupine)?

~~~
xf86alsa
Porcupine is just wake word detection, which in itself is very useful indeed.

Not sure what the plans for open sourcing the rest of the components are
though.

~~~
kenarsa
Hello. Thanks for the comment. We have plans to open source two more products
this year, with licensing similar to Porcupine's.

1- Speech-to-Intent: It allows you to issue complex voice commands in a
specific domain and in turn returns the intent. For example, in the case of a
coffee maker you can say "Please may I have a single shot espresso with no
milk and two sugars". The engine returns a JSON-like object with {"product":
"espresso", "milk": "no", "sugar": "two", "# shots": "1"}. It is a tightly
coupled domain-specific speech recognition and NLU. It is small (less than 3MB
and 8% CPU usage on RPi3) and ideal for home automation, industrial
applications, the service industry, etc.

2- Speech-to-Text: It is large vocabulary speech recognition software that
runs locally. It will support all platforms currently being supported. It
allows you to do large vocabulary transcription with high accuracy locally on
an embedded platform.
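
To illustrate item 1, here is a minimal Python sketch of how application code
might consume such an intent object. The `handle_intent` function, the
`WORD_TO_INT` table and the `brew` command string are hypothetical, and the
slot names are simply borrowed from the coffee-maker example; none of this is
Picovoice's actual API or output format.

```python
import json

# Hypothetical engine output for the coffee-maker example. The slot names are
# borrowed from the comment above and are not a documented Picovoice format.
RAW_INTENT = '{"product": "espresso", "milk": "no", "sugar": "two", "# shots": "1"}'

# Spoken quantities arrive as strings, so map the words we expect to integers.
WORD_TO_INT = {"no": 0, "one": 1, "two": 2, "three": 3}


def to_int(value: str) -> int:
    """Convert a slot value such as "two" or "1" to an integer."""
    if value.isdigit():
        return int(value)
    return WORD_TO_INT[value]


def handle_intent(raw: str) -> str:
    """Turn the JSON-like intent into a (hypothetical) machine command string."""
    intent = json.loads(raw)
    shots = to_int(intent.get("# shots", "1"))
    sugars = to_int(intent.get("sugar", "no"))
    milk = intent.get("milk", "no") != "no"
    return f"brew {intent['product']} shots={shots} sugar={sugars} milk={milk}"


if __name__ == "__main__":
    print(handle_intent(RAW_INTENT))
```

The point of the design is that the application never sees a transcript, only
a small set of named slots, so the dispatch code stays trivial.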

~~~
fermuch
Are those two multi language? I can think of a lot of use cases for industrial
IoT if they support Spanish and Portuguese

------
syntaxing
I wonder how this compares to snips[1]. I recently connected snips to my
lights and it works pretty flawlessly. Picovoice does look like it's easier to
integrate into an app though.

[1] [http://snips.ai/](http://snips.ai/)

~~~
kenarsa
I am the founder of Picovoice. I am NOT going to comment on comparison as
obviously I will be biased ;)

With Picovoice you can use the voice control engine to accomplish this. Maybe
something similar to this demo?

[https://picovoice.ai/#voice-control-demo](https://picovoice.ai/#voice-control-demo)

The cool thing about this engine is that it is tiny. It uses less than 8% CPU
on RPi3 and altogether it is less than 2MB (code, model, etc). Technically you
can run it on something much smaller and cheaper than RPi.

Alternatively, the speech-to-intent engine could be a good candidate. More
information on this along with an interactive demo will be released this
weekend.

~~~
syntaxing
I would love to see some benchmark numbers on Picovoice. The small size and
low CPU usage are definitely interesting, but I'm worried that accuracy suffers
because of this. Also, is 8% the peak usage? Seeing this loaded on some small
IoT chip would be awesome.

~~~
kenarsa
You should check this out:
[https://github.com/Picovoice/wakeword-benchmark](https://github.com/Picovoice/wakeword-benchmark)

We do work with a couple of SoC manufacturers and will disclose some of the
results when our partners are ready. In general, we can run on any MCU with a
C compiler and 200KB of RAM (maybe less if there is fast FLASH available). We
already have models working on ARM Cortex-M and Cadence's HiFi4.

------
MR4D
The demo is pretty cool - the thing I liked best was this line:

"You can turn off your internet connection and it will keep working."

That's nice!

~~~
kenarsa
It is true! Have you tried it? :)

We will add another demo for the speech-to-intent module fairly soon (over
this weekend). Stay tuned!

~~~
MR4D
Not yet - my work computer doesn't like being cut off from the world due to
financial apps that I run, but I'll try when I'm home this weekend.

------
herbst
Just took a closer look at the examples and played around with some code. It
works really well and has a surprisingly low footprint. Basically doing exactly
what is promised.

However, I wonder why there is no (visible?) way to generate words for
JavaScript, or at least some documentation on how the format for those byte
arrays is built.

Assuming this is a licensing thing, I would really suggest not limiting it in
that way. My first impression was that this is usable for free for
everything except commercial projects.

~~~
simonvc
Came here to say this. Had a great idea for using this to help train pilots
to do the RT (I'm doing my PPL now, and this would be perfect). The lack of a
JS optimizer means I can't really hack on what could have become a nice
passive income site.

------
roel_v
Can I use this for my home automation setup? It seems the github link is only
part of the product. Do you license this for individual users?

~~~
ocdtrekkie
This is my question too. I looked at Snips but they had no timeline on Windows
support and were more interested in talking about their crypto coin than
answering basic questions about their software.

~~~
syntaxing
I had pretty good luck reaching out to the developers on Discord these past
couple of weeks. Not sure when you tried, but I recommend trying again. I
personally think snips is way more appropriate on something embedded like an
RPi. I was up and running with their new sam package within minutes. Their new
update (about two months ago?) really made it more user friendly. Unfortunately,
their Windows package is still very broken.

E: You should probably try in the late night or early morning time for the
States (EST) since they are located in Europe I believe.

~~~
kenarsa
BTW, Picovoice runs on RPi (all variants, not only RPi3).

~~~
syntaxing
I'm currently running snips on an RPi 2 (Model B) and so far so good! Not too
sure about the original RPi though. I'm debating getting an RPi 3 to see if
the performance is better.

~~~
kenarsa
RPi3 is definitely faster. We also run on RPi 1/zero/etc.

------
natvert
Is this not just
[https://github.com/ARM-software/ML-KWS-for-MCU](https://github.com/ARM-software/ML-KWS-for-MCU)
trained on new keywords? Maybe I'm missing what is so special here?

~~~
kenarsa
The project you mentioned is wake word for ARM only. Great project, BTW. For
wake word, we provide on-demand model generation. We also do more than wake
word. Finally, we can run on other CPUs/OSs as well.

------
herbst
How about numbers or complex/unknown words? I would love to proxy everything
that isn't known AFTER the offline trigger word to Google voice recognition or
something. Is that possible?

~~~
kenarsa
Numbers and complex words (I am assuming you mean something like "ok blah"?)
are doable easily. I am not sure what you mean by unknown words. Could you
elaborate?

Obviously, you can just grab the audio stream after the phrase is detected and
route it to whatever you like. Google ASR or even a local one running on the
device!
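
A rough Python sketch of that routing idea follows. Everything in it is an
assumption for illustration: `detect` stands in for whatever wake-word check
the engine exposes, `forward` stands in for a cloud or local ASR client, and
frames are plain lists of integers rather than real microphone audio.

```python
from typing import Callable, Iterable, List

Frame = List[int]  # one chunk of PCM samples (toy representation)


def route_after_wake_word(
    frames: Iterable[Frame],
    detect: Callable[[Frame], bool],
    forward: Callable[[Frame], None],
    max_frames: int = 3,
) -> None:
    """Forward up to max_frames frames that follow each wake-word detection.

    detect and forward are hypothetical callbacks: the first wraps the
    offline wake-word engine, the second sends audio to any recognizer.
    """
    remaining = 0
    for frame in frames:
        if remaining > 0:
            # We are inside a command window, so pass the audio along.
            forward(frame)
            remaining -= 1
        elif detect(frame):
            # The wake phrase itself is not forwarded, only what follows it.
            remaining = max_frames


if __name__ == "__main__":
    stream = [[0], [0], [9], [1], [2], [3], [4], [0]]
    captured: List[Frame] = []
    # Pretend the frame [9] is the wake phrase.
    route_after_wake_word(stream, lambda f: f == [9], captured.append)
    print(captured)
```

A fixed frame count is the simplest way to end the command window; a real
application would more likely stop on silence or on a recognizer callback.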

~~~
herbst
That is exactly what I meant, if I can get the audio stream out as well.

I have literally 3 projects, one of them a current big one, that depend on or
would heavily benefit from this. Can't wait to give it a try!

~~~
kenarsa
Sweet! Go for it!

------
gpm
Is this related to picotts?

------
jscheel
This seems really neat, but my main, blocking issue is that there is
absolutely no way to add pronunciation for a wake word. Therefore, if the
developers have not explicitly added the word to their vocabulary, you are
completely out of luck.

~~~
kenarsa
If you have a commercial application we can build you the model for any wake
word. Since this requires some engineering work on our side we can only offer
it to commercial customers at this point.

~~~
lucb1e
Why not make it possible for anyone to train it instead of you having to do
engineering? Is that the profit model? If so, totally valid, but I'm wondering
if I'm understanding this right.

~~~
kenarsa
Two reasons:

1- the business model

2- in some cases, it actually needs some engineering, for example a new brand
name, etc.

------
bmc7505
It would be nice to support a grammar, a la CMUSphinx:
[https://cmusphinx.github.io/wiki/tutoriallm/#grammars](https://cmusphinx.github.io/wiki/tutoriallm/#grammars)

~~~
kenarsa
Agreed. The reason for not supporting grammar in this product is that we want
to keep it extremely lightweight. We do have upcoming products that support
grammar as well.

~~~
bmc7505
Awesome! I look forward to trying out the speech-to-intent and speech-to-text
products. Will these also be Apache 2.0 licensed?

------
arendtio
@kenarsa any plans for a Go integration?

Pretty impressive tech by the way.

------
KaranRaut
Which would you vote for as a voice assistant AI product, Picovoice or snips?
I've been meaning to do more research and maybe your comments can help me gain
more insight. Thanks.

------
bufferoverflow
Doesn't work on Chrome for Android.

~~~
kenarsa
The demo uses WebAssembly, which is supported by Chrome. It also uses the Web
Audio API, which I believe is also supported by Chrome. I just used the demo
on my Android phone using Chrome. I wonder what the problem could be. Is the
mic on? :) If yes, maybe give me the version of Chrome you are using. I will
look into it.

------
toniprada
This is perfect for a Home Assistant (hass.io) integration.

