
Mycroft – A.I. for everyone - kevlar1818
https://docs.mycroft.ai/
======
ThomPete
I know there have been many discussions about speech interfaces here lately,
but I really think it's going to be as transformative as the touch screen in
that it removes abstraction between the user's intent and the means to input
it. Speech is very natural to us and as soon as the AI becomes more contextual
it will be truly powerful in our everyday lives.

I can of course only speak from personal experience but it completely changed
the way we access various services in our family and allows us to be together
while using Google Home as a fifth member of the family.

Making it open source is great but I do believe the hardware needs to be much
more polished to compete with Amazon and Google.

------
nl
So today I built this same thing[1], using a Logitech universal remote, Bing
Speech recognition, ifttt.com and about 6 lines of Python.

[1] OK, it's the easy part. And I agree the whole system of skills and
features sounds great. But I can say "Turn on TV" and my TV comes on, and
"Turn on Xbox" and the TV and Xbox come on. And.. 6 lines of Python.

~~~
kevlar1818
Interesting. Could you elaborate how the remote, Bing STT, IFTTT, and your
Python code interact with one another?

~~~
daveguy
I expect audio and Bing STT are handled by the Python SpeechRecognition
library:

[https://pypi.python.org/pypi/SpeechRecognition/](https://pypi.python.org/pypi/SpeechRecognition/)

Python to IFTTT here (using maker channels):

[https://github.com/briandconnelly/pyfttt](https://github.com/briandconnelly/pyfttt)

And then using the Logitech universal remote (Harmony remote) from IFTTT:

[https://ifttt.com/harmony](https://ifttt.com/harmony)

Although you can also control a Harmony remote directly from Python:

[https://github.com/jterrace/pyharmony](https://github.com/jterrace/pyharmony)

More about the Harmony remote here:

[http://www.logitech.com/en-us/harmony-remotes](http://www.logitech.com/en-us/harmony-remotes)
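
A rough sketch of the glue between a recognized phrase and IFTTT, assuming the speech-to-text step has already returned plain text. The phrase table, the event names (`tv_on`, `xbox_on`) and the helper names are made up for illustration; only the Maker/Webhooks trigger URL format comes from IFTTT. This is not nl's actual code, and it uses the standard library rather than pyfttt.

```python
import urllib.parse
import urllib.request

# Hypothetical mapping from spoken phrases to IFTTT event names.
PHRASE_TO_EVENT = {
    "turn on tv": "tv_on",
    "turn on xbox": "xbox_on",
}


def event_for(phrase):
    """Return the IFTTT event name for a recognized phrase, or None."""
    return PHRASE_TO_EVENT.get(phrase.strip().lower())


def trigger_url(event, key):
    """Build the IFTTT Maker/Webhooks trigger URL for an event."""
    return "https://maker.ifttt.com/trigger/{}/with/key/{}".format(
        urllib.parse.quote(event, safe=""),
        urllib.parse.quote(key, safe=""),
    )


def fire(event, key):
    """POST to IFTTT; an applet there would then start a Harmony activity."""
    request = urllib.request.Request(
        trigger_url(event, key), data=b"", method="POST"
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

In use, the text returned by the STT library would go through `event_for`, and a non-`None` result would be passed to `fire` together with your own Webhooks key.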

~~~
nl
Exactly this.

------
StavrosK
Does anyone know what the hardware looks like? I haven't managed to find any
info other than "it's a raspberry Pi with LEDs". I'm especially interested in
whether it has a microphone array and how they process audio to remove noise.

~~~
devoply
The key feature of an Echo is its wide microphone array. Those are somewhat
expensive.

~~~
goda90
Are they expensive because they are complicated or because the market for them
has been small? I'm curious how hard it would be to make one.

~~~
StavrosK
I don't know how expensive they are, but I did get a PS3 eye camera with a
four-mic array for $4. I just don't have the software to do noise reduction
for the streams.

------
amelius
Since it's open-source: are the underlying text-to-speech and speech-to-text
engines already available as libraries for Linux?

~~~
kevlar1818
This is the biggest shortcoming of this project IMO; it only uses PocketSphinx
to recognize the trigger phrase, and then uses Google STT for the command
phrase.

A big reason I (and I believe others) am wary of Amazon Echo and Google Home
is that we don't like the idea of having an always-on microphone in our
homes, shuttling everything our families say to these giant companies.

I really hope the MycroftAI-backed OpenSTT[1] project gets off the ground so
they (and others) can divorce themselves from Amazon, Google, Bing, etc.
services.

[1] [https://openstt.org/](https://openstt.org/)

~~~
amelius
Good points. I also think we really should not label software "open source"
unless it is really open source.

But good to see that at least the project has the intention of becoming open
source.

------
alexellisuk
A notable effort; however, at first glance this seems like an imitation of the
Alexa service with like-for-like terminology and backend concepts, even using
the word "intent". Does this not risk stepping on the toes of Amazon's
copyright/trademark?

One thing I do regard highly is the focus on testing vs. Alexa's SDK, where
Amazon doesn't even seem to be vaguely concerned.

~~~
tyingq
>even using the word "intent"

Words like "intent", "sentiment", and "entities" were standard vernacular for
natural language processing long before Amazon decided to jump into the space.

~~~
omash
Like "chrome" in the browser space.

~~~
tyingq
I think Google trying to forbid Firefox from using the word "bookmark" would
be a closer analogy.

------
anexprogrammer
How does this compare for privacy with the recently failed Protonet Zoe?

Can it keep your Speech recognition and Io(broken)T devices to the local net?

------
Dowwie
Great work! Are you collaborating with the OpenAI team? It's exciting to see
how projects will cross-pollinate each other with their best ideas.

------
Tharkun
What are the privacy implications of this thing?

~~~
traverseda
Similar to the implications of Siri and Google Now, I imagine.

~~~
Tharkun
From what I understand Siri isn't always-on, in the sense that you have to
switch it on. This seems to be listening all the time.

Another comment mentioned that they use some Google speech recognition
component, which probably means all of your conversations are delivered to
Google. That's a big red flag for me.

~~~
omash
It first detects its wake word using PocketSphinx on around the last two
seconds of audio. Audio older than two seconds is discarded. After detecting
its wake word it begins recording until it thinks the user has finished
speaking.

I believe the PocketSphinx part is offline so only commands would be delivered
to Google.
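
To make that flow concrete, here is a minimal sketch of a rolling two-second buffer followed by command capture. It is not Mycroft's code: the class and function names are made up, and the wake-word detector and end-of-speech check are passed in as plain callables standing in for PocketSphinx and whatever silence detection the real client uses.

```python
from collections import deque


class RollingAudioBuffer:
    """Keep only the most recent `seconds` of audio, split into frames."""

    def __init__(self, seconds=2.0, frame_seconds=0.1):
        max_frames = max(1, round(seconds / frame_seconds))
        # A deque with maxlen silently drops the oldest frame when full.
        self.frames = deque(maxlen=max_frames)

    def push(self, frame):
        self.frames.append(frame)

    def window(self):
        return b"".join(self.frames)


def listen(frames, detect_wake_word, is_end_of_speech, buffer):
    """Return the audio recorded after the wake word, or None if never heard."""
    frames = iter(frames)
    for frame in frames:
        buffer.push(frame)
        if detect_wake_word(buffer.window()):
            command = []
            # Record everything after the wake word until speech ends.
            for frame in frames:
                if is_end_of_speech(frame):
                    break
                command.append(frame)
            return b"".join(command)
    return None
```

Under these assumptions, only the bytes returned by `listen` would ever be sent to a remote STT service; everything before the wake word lives in the local buffer and is overwritten.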

------
neilellis
Just a note, GPL 3 is not a permissive license, it's a viral license. BSD and
MIT are permissive licenses.

~~~
rvern
The license is LGPL 3, not GPL 3.

