
Please, please, please be a completely open, extensible platform...

I want to be able to control my Apple TV with my Google Home device.

I want to be able to control my Philips Hue and LIFX bulbs.

I want to be able to build my own custom home automation server endpoints and point my Google Home commands at them.

I want to be able to remote start my car with a voice command.

I want to be able to control my Harmony remote, and all of the devices connected to my Harmony hub.

I want to be able to access my Google calendar.

I want to be able to make hands-free phone calls to anyone on my Google contacts.

If my grandmother falls, I want her to be able to call 911 by talking to the Google Home device.

I want to be able to ask wolfram alpha questions by voice.

I want to be able to have a back-and-forth conversation to arrive at a conclusion. I don't want to have to say a perfectly formulated command like, "Add an event to my calendar on Jan 1, 2016 at 2:00 pm titled go to the pool party". I want to be able to say, "Can you add an event to my calendar?", and then answer a series of questions. I hate having to formulate complex commands as a single sentence.

I want to be able to have a Google Home device in each room, without having to give each one its own wake-up word. Just have the closest one to me respond to my voice (based on how well it can hear me).

I want to be able to play music on all of my Google Home devices at the same time, and have the music perfectly synchronized.

This is my wish list. I am currently able to do more than half of these items with Amazon Echo, but I had to do a bunch of hacking and it was a pain in the ass.

If Google Home can deliver on these points, I would switch from Amazon Echo in a heartbeat.




According to Ars Technica, Google Home is actually gonna be more locked down than Amazon Echo.

> Initially, Google says that it will not be creating APIs for Assistant and Home and that as such, any integrations with services and other devices will have to come from Google first. This approach is a contrast with the Echo, which is designed to be extensible.

https://arstechnica.com/gadgets/2016/05/google-assistant-and...

Dreams = crushed :(


Initially doing internal integrations, then releasing API access to trusted partners, then making APIs publicly available is how Google has done lots of things. So, I wouldn't be surprised if that's the route Google takes with this.


Yeah, I suspect the idea is that "public APIs are forever," especially in hardware. They probably want to be able to collect some real user data, make a few mistakes and get a better idea of what role the product is actually going to fulfill before committing to something that they'll have to maintain indefinitely.


That's not what they said in the keynote. They were explicit about the fact that developers would be able to extend it. They used Uber as an example.


I wonder what the point of even announcing it at a dev conference was; save it for CES.


If true.... seriously Google?


The key word there is "initially". Echo didn't have an SDK when it was first released either, and neither did Google Now for a while. That's Google's MO for new services and APIs: a limited initial release to iron out the bugs and avoid flooding the platform with low-effort apps/services.


I have only two wishes:

> Please, please, please be a completely open, extensible platform...

That's one. The second one is, please make it self-hosted. No cloud bullshit.

I know I'll probably never live to see the second one coming true.


How would you make it self-hosted without making it suck? High quality voice recognition in a small box doesn't seem to be a thing that's even remotely possible today, let alone the query processing and knowledge database that comes with it.

You could build this on a Pi with a mic, speakers, some FOSS STT and TTS engines and some basic training data. But it'll suck.


Ten years ago I played with the Microsoft Speech API - which was completely off-line and trained on your voice. In restricted grammar mode, it worked flawlessly - I built a music control application on it, and used it like you would use Amazon Echo - I just said "computer, volume, three quarters" from any place in the room, and the loud music turned down a notch. Etc. That was ten years ago, with a crappy electret microphone I soldered to a cable myself and stuck to my wardrobe with a bit of insulating tape.

I'm not buying that you couldn't make a decent, self-contained, off-line speech recognition system. Sure, it may not be as good as Echo or Google Now (though the latter does suck badly at times; it's nowhere near reliable enough to use, and it doesn't understand shit over a quite good and expensive Bluetooth headset). But it would be hackable, customizable. You could make it do some actual work for you.

Oh, and it wouldn't lag so terribly as Google Now does. Realtime applications and data over mobile networks don't mix.


"In restricted grammar mode"

That's a key limitation, though.

But we're getting close to the point where you can do some of this. For example - http://arxiv.org/pdf/1603.03185.pdf - LSTM speech recognition running on a Nexus 5.

The more serious problem with this is that it's going to be expensive -- and somewhat wasteful. There's a lot of pressure to keep consumer devices as cheap as possible, and the cloud is an awesome way to do that. Having shared cloud-based infrastructure for the speech recognition as opposed to putting it into every device (even though it's only used for ~5 minutes every day) is probably a lot cheaper. Consider the hardware in an Amazon Echo:

https://www.ifixit.com/Teardown/Amazon+Echo+Teardown/33953

256MB DRAM and a TI DSP: http://www.ti.com/product/dm3725 with a single Cortex-A8 core (about $23 + a smidgeon for the dram)

vs. a Nexus 5 (2GB DRAM, 4 core 2.2Ghz Krait 400) -- the N5 has roughly 8x the DRAM and compute of the CPU in the Echo.

Would you pay an extra $150 for a LocalEcho that still had to send most of your queries to a search engine for resolution, or to a cloud music service for music? (You & I might, but most consumers wouldn't.)


> "In restricted grammar mode"

> That's a key limitation, though.

Why would it be? A sophisticated exchange of theorems isn't essential for this scenario, is it?


Depends if you want to support things like "OK Google, invite Pawel Moczydłowski to my barbecue" and "OK Google, how do you spell d'Artagnan?"


> I'm not buying you couldn't make a decent, self-contained, off-line speech recognition system.

I agree. It's not a problem of technology, it's a problem of incentive. There's no money in developing a self-contained, off-line speech recognition system, unfortunately.


> There's no money in developing self-contained, off-line speech recognition system

Nonsense. Self-hosting is highly valued in the enterprise sector. But we're not talking about the sort of products that could be sold to consumers for a few hundred dollars here.


A desktop PC is more than able to do good speech recognition as long as it's able to train the model for individual voices. Getting good results without training the model for the user beforehand is harder, and you would probably never be quite as good as a cloud-based system.

A Pi, though, couldn't do well at all, just like you said. If I wanted to build a system like this for myself, I would target an HTPC form factor.

edit: Another possibility, which was explored elsewhere in this thread, would be to keep the listening device "thin", but have the ability to offload the processing to a machine in my LAN instead of one the "cloud".


Hey, people with experience in speech recognition, please chime in!

Just the other day I was looking at CMU's Sphinx project for speech recognition. It seems quite capable, even of building something like this Google thing, but I haven't tried to actually use it.

Large-vocabulary recognition probably needs something better than a Raspberry Pi... so, just use a more powerful CPU.

Yes, Google has an incomprehensibly enormous database of proprietary knowledge and information. Good for them! If we want to build a home assistant that doesn't depend on Google, we'll have to make tradeoffs. That doesn't mean it has to suck.


I have an RPI running Sphinx. It's OK, not great. The biggest issue I have is that you have to pre-define commands.
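To illustrate what "pre-define commands" means in practice: grammar-mode recognition only ever accepts utterances that match a fixed set of phrases. Here's a toy sketch of that behavior (a made-up matcher over already-transcribed text, not Sphinx's actual API - real setups express this as a JSGF grammar file):

```python
# Toy illustration of restricted-grammar recognition: only utterances
# matching a pre-defined command pattern are accepted; everything else
# is rejected outright.
import re

# Each command is a regex over the (already transcribed) utterance.
# These command names are made up for the example.
COMMANDS = {
    "lights_on":  re.compile(r"^turn (the )?lights? on$"),
    "lights_off": re.compile(r"^turn (the )?lights? off$"),
    "volume":     re.compile(r"^volume (up|down)$"),
}

def match_command(utterance: str):
    """Return the command name if the utterance fits the grammar, else None."""
    text = utterance.lower().strip()
    for name, pattern in COMMANDS.items():
        if pattern.match(text):
            return name
    return None  # anything outside the grammar is rejected
```

Anything not enumerated up front ("play something mellow") simply fails to match - which is exactly the limitation described above, and also why this mode is so much more reliable than open dictation.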


Your own custom software based on Sphinx?

Is it PocketSphinx?

I was mostly interested in automated transcription, didn't look much at the live recognition stuff.


It was pocketsphinx. Automated transcription would probably be pretty sad.


I think the non-pocket version (Sphinx4) should be more capable, no?


That may be. I haven't had a chance to look into that version.


Sirius (http://sirius.clarity-lab.org/) is open source and self-hosted.


I have "Offline speech recognition" with Google Voice Typing that seems to work perfectly well in airplane mode. The downloaded language pack (English) is 39 MB.

Is there something I'm missing?


Here's the problem: not all the devices you could use with it are self-hosted themselves, and many don't work without cloud interactions. Now, if you're talking about Home's dependence on a cloud for purely local interactions, then I get you.

But, on the other side, if it's not open and you can't use any device with it... I'm going to be really upset on a personal level.

The reasons consumer IoT isn't huge yet are: 1) disparate connection types (e.g., I could buy Z-Wave, Wi-Fi, BLE, etc., and they all onboard differently), and 2) I can't choose which device I want to use with which platform because of politics.

Some of these devices (thermostats or security systems for instance) aren't impulse buys. If I have a Honeywell thermostat, and Home doesn't support it, I either buy a new thermostat or don't buy Home.

That's a crummy choice for a consumer.


> please make it self-hosted

I rather suspect that the knowledge graph it uses is a rather hefty dataset. Probably not suitable for a home installation. And how would you keep it up-to-date without the cloud? Would you have it scrape websites and consume feeds itself?


Knowledge graph could be a separate service. It handles only a subset of requests anyway; no reason for the request itself not to make a "pit stop" under my control before it is sent to fetch data. You could also use more than one provider of a knowledge graph in this case.

The more important aspect of it is fixing the problems with said knowledge graph. For instance, Google doesn't have the data on the public transportation in my city. I could easily write a scraper that would fetch me the bus/tram timetables - but there's no way to integrate that source of data with Google Now. It's one example, but in practice Google's knowledge graph is pretty much useless for me. At best, it can answer me some trivia questions sometimes.


> Would you have it scrape websites and consume feeds itself?

Let me introduce you to PuSH: https://en.wikipedia.org/wiki/PubSubHubbub
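For context, the subscriber side of PuSH is tiny: the hub confirms a subscription by GET-ing your callback URL, and you accept by echoing `hub.challenge` back. A framework-agnostic sketch (the feed URL is a placeholder):

```python
# Minimal sketch of a PubSubHubbub subscriber's verification handler.
# The hub sends a GET with hub.mode, hub.topic and hub.challenge; the
# subscriber must echo the challenge with a 200 to confirm. This takes
# the query params as a dict and returns an (http_status, body) pair.

EXPECTED_TOPIC = "https://example.com/feed.xml"  # hypothetical feed URL

def handle_verification(params: dict):
    mode = params.get("hub.mode")
    topic = params.get("hub.topic")
    challenge = params.get("hub.challenge", "")
    if mode in ("subscribe", "unsubscribe") and topic == EXPECTED_TOPIC:
        return 200, challenge  # accept: echo the challenge verbatim
    return 404, ""             # refuse anything we didn't ask for
```

After that handshake, the hub simply POSTs new feed content to the same callback, so a self-hosted box gets pushed updates instead of polling.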


I want subqueries.

1. What's the name of that film that came out around the time of Jane's birthday party, the one with that guy in that I always confuse with Adam Sandler?

2. Where can I go for lunch and sit outside in the sunshine?

3. Play me some music that I'd like but nothing too recent.


The Corporate Integrations Committee will consider these feature requests for a future release.


I would love for it to connect my Sonos and my Spotify together, rather than having to run a Node.js server for the purpose.


Doesn't Sonos already have integration with Spotify? Or is that only available if you're paying for Spotify?


Correct: it does have a Spotify integration, but it is only available for paying Spotify customers.


There is a fully open source voice-controlled platform: Jasper https://jasperproject.github.io/


How will the companies trap you into their proprietary walled garden if they let you change the settings?


Hey can you email/chat with me (info on profile), I'd like to chat more about your use cases!


You can do all of those with Amazon Echo by writing your own skill.
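An Echo custom skill boils down to a handler that maps the JSON request Amazon sends to a JSON response envelope. A minimal sketch of such a handler (e.g. as an AWS Lambda entry point; the "RemoteStartIntent" name is made up for the car-start example in the wish list):

```python
# Minimal sketch of an Alexa custom-skill handler. Amazon POSTs a JSON
# request; the skill replies in the documented response envelope.
# "RemoteStartIntent" is a hypothetical intent you'd define yourself.

def lambda_handler(event, context=None):
    req = event.get("request", {})
    if req.get("type") == "LaunchRequest":
        text = "Hi, what should I do?"
        end = False
    elif (req.get("type") == "IntentRequest"
          and req.get("intent", {}).get("name") == "RemoteStartIntent"):
        text = "Okay, starting the car."  # call your own endpoint here
        end = True
    else:
        text = "Sorry, I can't do that yet."
        end = True
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end,
        },
    }
```

The "bunch of hacking" part is everything behind the hypothetical endpoint call: bridging the skill to Hue, Harmony, the car, and so on yourself.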


> If Google Home can deliver on these points, I would switch from Amazon Echo in a heartbeat.

I think it will mostly deliver ads.



