
Please, please, please be a completely open, extensible platform...

I want to be able to control my Apple TV with my Google Home device.

I want to be able to control my Philips Hue and LIFX bulbs.

I want to be able to build my own custom home automation server endpoints and point my Google Home commands at them.

I want to be able to remote start my car with a voice command.

I want to be able to control my Harmony remote, and all of the devices connected to my Harmony hub.

I want to be able to access my Google calendar.

I want to be able to make hands-free phone calls to anyone on my Google contacts.

If my grandmother falls, I want her to be able to call 911 by talking to the Google Home device.

I want to be able to ask wolfram alpha questions by voice.

I want to be able to have a back-and-forth conversation to arrive at a conclusion. I don't want to have to say a perfectly formulated command like, "Add an event to my calendar on Jan 1, 2016 at 2:00 pm titled go to the pool party". I want to be able to say, "Can you add an event to my calendar?", and then answer a series of questions. I hate having to formulate complex commands as a single sentence.

I want to be able to have a Google Home device in each room, without having to give each one its own wake-up word. Just have the closest one to me respond to my voice (based on how well it can hear me).

I want to be able to play music on all of my Google Home devices at the same time, and have the music perfectly synchronized.

This is my wish list. I am currently able to do more than half of these items with Amazon Echo, but I had to do a bunch of hacking and it was a pain in the ass.

If Google Home can deliver on these points, I would switch from Amazon Echo in a heartbeat.




According to Ars Technica, Google Home is actually gonna be more locked down than Amazon Echo.

> Initially, Google says that it will not be creating APIs for Assistant and Home and that as such, any integrations with services and other devices will have to come from Google first. This approach is a contrast with the Echo, which is designed to be extensible.

https://arstechnica.com/gadgets/2016/05/google-assistant-and...

Dreams = crushed :(


Initially doing internal integrations, then releasing API access to trusted partners, then making APIs publicly available is how Google has done lots of things. So, I wouldn't be surprised if that's the route Google takes with this.


Yeah, I suspect the idea is that "public APIs are forever," especially in hardware. They probably want to be able to collect some real user data, make a few mistakes and get a better idea of what role the product is actually going to fulfill before committing to something that they'll have to maintain indefinitely.


That's not what they said in the keynote. They were explicit about the fact that developers would be able to extend it. They used Uber as an example.


I wonder what the point of even announcing it at a dev conference was; save it for CES.


If true.... seriously Google?


The key word there is "initially". Echo didn't have an SDK when it was first released either, and neither did Google Now for a while. That's Google's MO for new services and APIs: a limited initial release to iron out the bugs and avoid flooding the platform with low-effort apps/services.


I have only two wishes:

> Please, please, please be a completely open, extensible platform...

That's one. The second one is, please make it self-hosted. No cloud bullshit.

I know I'll probably never live to see the second one coming true.


How would you make it self-hosted without making it suck? High quality voice recognition in a small box doesn't seem to be a thing that's even remotely possible today, let alone the query processing and knowledge database that comes with it.

You could build this on a Pi with a mic, speakers, some FOSS STT and TTS engines and some basic training data. But it'll suck.


Ten years ago I played with the Microsoft Speech API - which was completely off-line and trained on your voice. In restricted grammar mode, it worked flawlessly - I built a music control application on it, and used it like you would use Amazon Echo - I just said "computer, volume, three quarters" from any place in the room, and the loud music turned down a notch. Etc. That was ten years ago, with a crappy electret microphone I soldered to a cable myself and stuck to my wardrobe with a bit of insulating tape.

I'm not buying that you couldn't make a decent, self-contained, off-line speech recognition system. Sure, it may not be as good as Echo or Google Now (though the latter does suck badly at times; it's nowhere near reliable enough to use, and it doesn't understand shit over a quite good and expensive Bluetooth headset). But it would be hackable, customizable. You could make it do some actual work for you.

Oh, and it wouldn't lag so terribly as Google Now does. Realtime applications and data over mobile networks don't mix.


"In restricted grammar mode"

That's a key limitation, though.

But we're getting close to the point where you can do some of this. For example - http://arxiv.org/pdf/1603.03185.pdf - LSTM speech recognition running on a Nexus 5.

The more serious problem with this is that it's going to be expensive -- and somewhat wasteful. There's a lot of pressure to keep consumer devices as cheap as possible, and the cloud is an awesome way to do that. Having shared cloud-based infrastructure for the speech recognition as opposed to putting it into every device (even though it's only used for ~5 minutes every day) is probably a lot cheaper. Consider the hardware in an Amazon Echo:

https://www.ifixit.com/Teardown/Amazon+Echo+Teardown/33953

256MB DRAM and a TI DSP: http://www.ti.com/product/dm3725 with a single Cortex-A8 core (about $23 + a smidgeon for the dram)

vs. a Nexus 5 (2GB DRAM, 4 core 2.2Ghz Krait 400) -- the N5 has roughly 8x the DRAM and compute of the CPU in the Echo.

Would you pay an extra $150 for a LocalEcho that still had to send most of your queries to a search engine for resolution, or to a cloud music service for music? (You & I might, but most consumers wouldn't.)


> "In restricted grammar mode"

> That's a key limitation, though.

Why would it be? A sophisticated exchange of theorems isn't essential for this scenario, is it?


Depends if you want to support things like "OK Google, invite Pawel Moczydłowski to my barbecue" and "OK Google, how do you spell d'Artagnan?"


> I'm not buying you couldn't make a decent, self-contained, off-line speech recognition system.

I agree. It's not a problem of technology, it's a problem of incentive. There's no money in developing a self-contained, off-line speech recognition system, unfortunately.


> There's no money in developing self-contained, off-line speech recognition system

Nonsense. Self-hosting is highly valued in the enterprise sector. But we're not talking about the sort of products that could be sold to consumers for a few hundred dollars here.


A desktop PC is more than able to do good speech recognition as long as it's able to train the model for individual voices. Getting good results without training the model for the user beforehand is harder, and you would probably never be quite as good as a cloud-based system.

A Pi, though, couldn't do well at all, just like you said. If I wanted to build a system like this for myself, I would target an HTPC form factor.

edit: Another possibility, which was explored elsewhere in this thread, would be to keep the listening device "thin", but have the ability to offload the processing to a machine in my LAN instead of one the "cloud".


Hey, people with experience in speech recognition, please chime in!

Just the other day I was looking at CMU's Sphinx project for speech recognition. It seems quite capable, even of building something like this Google thing, but I haven't tried to actually use it.

Large-vocabulary recognition probably needs something better than a Raspberry Pi... so, just use a more powerful CPU.

Yes, Google has an incomprehensibly enormous database of proprietary knowledge and information. Good for them! If we want to build a home assistant that doesn't depend on Google, we'll have to make tradeoffs. That doesn't mean it has to suck.


I have an RPI running Sphinx. It's OK, not great. The biggest issue I have is that you have to pre-define commands.
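To illustrate what "pre-define commands" means in practice: grammar-mode recognition only ever accepts utterances that match a fixed set of phrases. Here's a toy sketch of that behavior (a made-up matcher over already-transcribed text, not Sphinx's actual API - real setups express this as a JSGF grammar file):

```python
# Toy illustration of restricted-grammar recognition: only utterances
# matching a pre-defined command pattern are accepted; everything else
# is rejected outright.
import re

# Each command is a regex over the (already transcribed) utterance.
# These command names are made up for the example.
COMMANDS = {
    "lights_on":  re.compile(r"^turn (the )?lights? on$"),
    "lights_off": re.compile(r"^turn (the )?lights? off$"),
    "volume":     re.compile(r"^volume (up|down)$"),
}

def match_command(utterance: str):
    """Return the command name if the utterance fits the grammar, else None."""
    text = utterance.lower().strip()
    for name, pattern in COMMANDS.items():
        if pattern.match(text):
            return name
    return None  # anything outside the grammar is rejected
```

Anything not enumerated up front ("play something mellow") simply fails to match - which is exactly the limitation described above, and also why this mode is so much more reliable than open dictation.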


Your own custom software based on Sphinx?

Is it PocketSphinx?

I was mostly interested in automated transcription, didn't look much at the live recognition stuff.


It was pocketsphinx. Automated transcription would probably be pretty sad.


I think the non-pocket version (Sphinx4) should be more capable, no?


That may be. I haven't had a chance to look into that version.


Sirius (http://sirius.clarity-lab.org/) is open source and self-hosted.


I have "Offline speech recognition" with Google Voice Typing that seems to work perfectly well in airplane mode. The downloaded language pack (English) is 39 MB.

Is there something I'm missing?


Here's the problem: not all the devices you could use with it are self-hosted themselves, and many don't work without cloud interactions. Now, if you're talking about Home's dependence on a cloud for purely local interactions, then I get you.

But, on the other side, if it's not open and you can't use any device with it... I'm going to be really upset on a personal level.

The reasons consumer IoT isn't huge yet are: 1) disparate connection types (e.g., I could buy Z-Wave, Wi-Fi, BLE, etc., and they all onboard differently), and 2) I can't choose which device I want to use with which platform because of politics.

Some of these devices (thermostats or security systems for instance) aren't impulse buys. If I have a Honeywell thermostat, and Home doesn't support it, I either buy a new thermostat or don't buy Home.

That's a crummy choice for a consumer.


> please make it self-hosted

I rather suspect that the knowledge graph it uses is a rather hefty dataset. Probably not suitable for a home installation. And how would you keep it up-to-date without the cloud? Would you have it scrape websites and consume feeds itself?


Knowledge graph could be a separate service. It handles only a subset of requests anyway; no reason for the request itself not to make a "pit stop" under my control before it is sent to fetch data. You could also use more than one provider of a knowledge graph in this case.

The more important aspect of it is fixing the problems with said knowledge graph. For instance, Google doesn't have the data on the public transportation in my city. I could easily write a scraper that would fetch me the bus/tram timetables - but there's no way to integrate that source of data with Google Now. It's one example, but in practice Google's knowledge graph is pretty much useless for me. At best, it can answer me some trivia questions sometimes.


> Would you have it scrape websites and consume feeds itself?

Let me introduce you to PuSH: https://en.wikipedia.org/wiki/PubSubHubbub
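For context, the subscriber side of PuSH is tiny: the hub confirms a subscription by GET-ing your callback URL, and you accept by echoing `hub.challenge` back. A framework-agnostic sketch (the feed URL is a placeholder):

```python
# Minimal sketch of a PubSubHubbub subscriber's verification handler.
# The hub sends a GET with hub.mode, hub.topic and hub.challenge; the
# subscriber must echo the challenge with a 200 to confirm. This takes
# the query params as a dict and returns an (http_status, body) pair.

EXPECTED_TOPIC = "https://example.com/feed.xml"  # hypothetical feed URL

def handle_verification(params: dict):
    mode = params.get("hub.mode")
    topic = params.get("hub.topic")
    challenge = params.get("hub.challenge", "")
    if mode in ("subscribe", "unsubscribe") and topic == EXPECTED_TOPIC:
        return 200, challenge  # accept: echo the challenge verbatim
    return 404, ""             # refuse anything we didn't ask for
```

After that handshake, the hub simply POSTs new feed content to the same callback, so a self-hosted box gets pushed updates instead of polling.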


I want subqueries.

1. What's the name of that film that came out around the time of Jane's birthday party, the one with that guy in that I always confuse with Adam Sandler?

2. Where can I go for lunch and sit outside in the sunshine?

3. Play me some music that I'd like but nothing too recent.


The Corporate Integrations Committee will consider these feature requests for a future release.


I would love for it to connect my Sonos and my Spotify together, rather than having to run a Node.js server for the purpose.


Doesn't Sonos already have integration with Spotify? Or is that only available if you're paying for Spotify?


Correct: it does have a Spotify integration, but it is only available for paying Spotify customers.


There is a fully open source voice-controlled platform: Jasper https://jasperproject.github.io/


How will the companies trap you into their proprietary walled garden if they let you change the settings?


Hey can you email/chat with me (info on profile), I'd like to chat more about your use cases!


You can do all of those with Amazon Echo by writing your own skill.
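An Echo custom skill boils down to a handler that maps the JSON request Amazon sends to a JSON response envelope. A minimal sketch of such a handler (e.g. as an AWS Lambda entry point; the "RemoteStartIntent" name is made up for the car-start example in the wish list):

```python
# Minimal sketch of an Alexa custom-skill handler. Amazon POSTs a JSON
# request; the skill replies in the documented response envelope.
# "RemoteStartIntent" is a hypothetical intent you'd define yourself.

def lambda_handler(event, context=None):
    req = event.get("request", {})
    if req.get("type") == "LaunchRequest":
        text = "Hi, what should I do?"
        end = False
    elif (req.get("type") == "IntentRequest"
          and req.get("intent", {}).get("name") == "RemoteStartIntent"):
        text = "Okay, starting the car."  # call your own endpoint here
        end = True
    else:
        text = "Sorry, I can't do that yet."
        end = True
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end,
        },
    }
```

The "bunch of hacking" part is everything behind the hypothetical endpoint call: bridging the skill to Hue, Harmony, the car, and so on yourself.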


> If Google Home can deliver on these points, I would switch from Amazon Echo in a heartbeat.

I think it will mostly deliver ads.



