I'm not usually one to hop on something and criticize when it's a Show HN.
But I have literally everything that this is doing already set up in my home right now using an Amazon Echo, which is $200, instead of $10,000.
I can say "Trigger lights for projector", and lights dim how I want it. Or "Trigger lights for guests" and bing, lights. I can say "Play my Lumineers station on Pandora", and my Echo will play it. And of course I can ask Echo all kinds of things like "What is the distance to the moon?" or "What is the population of India?"
Plus a lot more.
So what is the value add here? How is this different or better than Echo + IFTTT + a Wink hub? I'm genuinely curious, because this is something I am doing right now and am totally willing to spend money on to do better, but based on this short video I have no idea how this is better...
So I guess take that as constructive feedback from (presumably?) your target audience -- show me something really cool this does to get me to consider spending $10k on it.
One thing I don't like about Echo, IFTTT, and others like them is that everything has to go through "the cloud." If I'm sitting at home and want to dim my lights, why do I need to send my voice to Amazon's server, have it call out to IFTTT, which calls back into my home? Especially since I have a data cap. All of that could easily go through a locally installed system, and you don't need expensive hardware to do it.
That's what's driven me to write my own home automation code. (Though my code's terrible, not HN-worthy.) Anything that insists on a cloud API is out in my book. A local server is potentially worth serious money, especially if it's easy to use.
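For what it's worth, local-only control really can be tiny. Here's a minimal sketch of dimming a light over the LAN, assuming a Philips Hue bridge (whose real local REST API works roughly like this) at a made-up address with a made-up API key; every concrete value below is illustrative:

```python
import json
import urllib.request

BRIDGE_IP = "192.168.1.50"   # hypothetical LAN address of the Hue bridge
API_KEY = "your-api-key"     # issued by the bridge after pressing its link button

def dim_command(light_id, brightness):
    """Build the local REST call that dims one light (no cloud involved)."""
    url = f"http://{BRIDGE_IP}/api/{API_KEY}/lights/{light_id}/state"
    body = json.dumps({"on": True, "bri": brightness}).encode()
    return url, body

def send(url, body):
    """PUT the new state straight to the bridge on the local network."""
    req = urllib.request.Request(url, data=body, method="PUT")
    with urllib.request.urlopen(req, timeout=2) as resp:
        return resp.read()

url, body = dim_command(3, 80)
# send(url, body)  # uncomment on a network with a real bridge
```

No cloud round trip at all, so it costs zero data-cap megabytes: the request never leaves the local network.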
I've been toying with something similar (mostly on paper so far) involving custom GStreamer elements for voice recognition/synthesis, a Wit.ai-like intent resolver for command & control, and ChatScript-like pattern matching for conversational dialogue. My goal is to put it all behind a WebRTC and SIP gateway and have a low-latency personal assistant that I can access from virtually any device (even an old landline telephone) and that runs on my own private server. That's the dream anyway. I'm stuck on the voice synthesizer, at the moment...
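For the "Wit.ai-like intent resolver" piece, the first cut can be plain pattern matching over the recognized text. A toy sketch, where the intents and patterns are all hypothetical:

```python
import re

# Each entry maps a regex with named slot groups to an intent name.
PATTERNS = [
    (re.compile(r"(turn|switch) (?P<state>on|off) the (?P<device>[\w ]+)"), "set_power"),
    (re.compile(r"dim the (?P<device>[\w ]+) to (?P<level>\d+)"), "set_level"),
]

def resolve(utterance):
    """Return (intent, slots) for the first matching pattern, else (None, {})."""
    text = utterance.lower().strip()
    for pattern, intent in PATTERNS:
        m = pattern.search(text)
        if m:
            return intent, m.groupdict()
    return None, {}

print(resolve("Dim the bedroom lights to 40"))
# → ('set_level', {'device': 'bedroom lights', 'level': '40'})
```

Obviously a real resolver needs ranking, synonyms, and fallbacks, but this shape (intent plus slot dictionary) is what the downstream command-and-control code ends up consuming either way.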
I started at device control. Once I get device control where I want it on my home server, I'll expand out to being able to reach it elsewhere, and then voice.
Depending on how often I use it and the audio formats, it could be. Maybe not for someone with a 300GB/mo cap, but I'm stuck with 15, so every megabyte counts. Either way, my point is that there's no reason for it to use any data at all to control devices on my own local network.
All good points. With Josh, we're targeting large homes with complex environments. Our goal is to make it super simple to set up, configure, and use Josh. While Echo + IFTTT + a Wink hub might not sound too complicated to someone on HN, it's far too hacky for a 10,000 sq ft home where the average home owner doesn't want to understand what's truly going on.
We're also focused as much on home monitoring and access when you're away from the home as we are on control from within. It might seem subtle, but wanting eyes and ears, and the ability to change your home's environment while you're away, comes up surprisingly often.
The longer term play which will take some time to realize is all about AI. We see Josh as an invisible companion that grows with you, understands natural language, and can interact with the connected devices around you.
Seems like you're targeting multimillionaires. Just looking at the Portland metro area (nowhere near the largest target market), there are about 6 homes currently for sale that are over 10k sq ft, ranging from $1.5M way out in the burbs to $8M+. It's likely going to need quite a lot of polish to get much uptake from such a small total audience.
Fair enough. In that case I would probably have spent some time in the video showing how easy it was to set up all the devices to work with Josh, since what you're saying is that while the end result is about the same (currently), the setup is easier. As you said, not really for me then, since I like the hackability of using IFTTT to do all sorts of things, but I can see how someone in a 10k sq ft home might not want to deal with it.
Of course that's a pretty small audience, so there you go. "Sell to the classes, eat with the masses", as they say...
Awesome! This is great, man. How long have you been working on this, and what are you using to process the speech (is it internally developed, or are you leveraging an API or something)? This seems like a fun hack just for home.
A couple years ago in my dorm, I set up a clapper and set up a server to play my music when I was out of my dorm, so that when I was coming home I could play the sound of the clapper to get my lights started. Or if I was in my room's WiFi network, I could ask Siri to play "Turn on the Lights", and it would play an MP3 of me clapping and turn on the lights. This is obviously way cooler, and I always thought I'd one day sit and hack out something like this with Wit.ai or something. What are your plans for this?
Thanks! The NLP engine is a couple years in the making, but the app and product are just a few months in. We're initially building for large homes where voice control can be super helpful, but we're already working on developer tools to let others build Josh into their systems. Your hack sounds pretty awesome. If you don't already have one, think about getting a Sonos Play:1 and tinkering with the unofficial API. You can do some pretty fun things with it.
* just a note, we are working with an official private Sonos API, it's just hard to get access if you're starting out tinkering.
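For anyone curious what that tinkering involves: Sonos speakers answer UPnP/SOAP calls on the local network, which is what the unofficial libraries wrap. A rough stdlib-only sketch of the raw "Play" call against the AVTransport service, with a hypothetical speaker address:

```python
import urllib.request

SPEAKER_IP = "192.168.1.60"  # hypothetical LAN address of a Sonos Play:1

# The SOAP action header and envelope for AVTransport's Play command.
SOAP_ACTION = '"urn:schemas-upnp-org:service:AVTransport:1#Play"'
PLAY_BODY = (
    '<?xml version="1.0"?>'
    '<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/" '
    's:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">'
    "<s:Body>"
    '<u:Play xmlns:u="urn:schemas-upnp-org:service:AVTransport:1">'
    "<InstanceID>0</InstanceID><Speed>1</Speed>"
    "</u:Play></s:Body></s:Envelope>"
)

def play_request():
    """Assemble the POST that tells the speaker to start playing."""
    url = f"http://{SPEAKER_IP}:1400/MediaRenderer/AVTransport/Control"
    req = urllib.request.Request(url, data=PLAY_BODY.encode(), method="POST")
    req.add_header("SOAPAction", SOAP_ACTION)
    req.add_header("Content-Type", 'text/xml; charset="utf-8"')
    return req

# urllib.request.urlopen(play_request(), timeout=2)  # run on the speaker's LAN
```

Pause, volume, and queue manipulation are all similar one-envelope calls, which is why a weekend hack can get surprisingly far.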
While that is in the guidelines, it doesn't seem to be very strictly applied. I think the goal of the guideline is primarily to have the opportunity to get to know the project in order to critique it thoroughly, and sometimes a presentation is the only viable option.
If that's the case then the guidelines should either be changed or actually be enforced. Although I've seen dang remove the "Show HN" from posts before. To me it's misleading given the current guidelines.
I don't buy the argument that "sometimes a presentation is the only viable option." In this case and any other, the viable option is to keep working on the product until it's ready for people to test.
Is this "an AI", or "a voice control system preloaded with a bunch of sentence command templates"?
I mean this as a neutral question, because it's cool either way, and does seem like a bizarre little hole in the current "voice control" ecosystem, but there's a difference in what I, at least as an HN reader, expect between the two things, and I wanted to give you that feedback.
We're very much an AI company, and that's the focus, but the product will roll out in stages. The foundation is built on a homemade NLP engine that's pretty flexible; that said, NNs are currently only used for speech recognition. With the NLP engine relatively stable, we're now focused on device integration to control the majority of IoT products our customers might have. We're implementing some learning and pattern recognition models, but it will take some time before those come close to resembling any true "intelligence".
Incidentally, I just complained in another post about the way that needing to "invoke" voice mode makes it much less fluent. You may at least want to pop that one onto the mental back burner to see if there's a solution you can think of. I don't know what, but something that reduces the friction of invoking a voice app would help a lot. (I assume this is where the Amazon Echo idea comes from.)
1. A number of early customers we've spoken with don't like the idea of an always listening device. The perception is an invasion of privacy.
2. There's a fear that someone outside the home could simply yell a command like "unlock the doors" to gain access. There are a number of ways one could solve this, but it's a fear we've heard about.
So the extra friction of taking out your phone and pressing a button so far seems worth the added layer of privacy and security, but we're definitely thinking hard about this one.
> One of the challenges to building a fully automated house, particularly one that grows and evolves over time, is the ability to program with ease. In theory it’s great if your sprinklers turn off when it rains, or if the lights go on at sunset, but these actions today take a skilled programmer many hours to assemble. [...] Even complicated queries can be effortlessly programmed, such as, “At sunrise if I’m home slowly fade the bedroom lights on, open the drapes, turn on the radio, and brew a pot of coffee.”
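One way a spoken rule like that quoted sunrise routine could decompose into structure — this is a purely hypothetical sketch of a trigger/condition/action representation such a system might target, not anything Josh has described:

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    trigger: str                                  # event that starts the rule
    conditions: list = field(default_factory=list)  # guards checked at trigger time
    actions: list = field(default_factory=list)     # (verb, device, params) tuples

# "At sunrise if I'm home slowly fade the bedroom lights on, open the
# drapes, turn on the radio, and brew a pot of coffee" as structured data:
morning = Rule(
    trigger="sunrise",
    conditions=["presence == home"],
    actions=[
        ("fade_on", "bedroom lights", {"duration_s": 600}),
        ("open", "drapes", {}),
        ("power_on", "radio", {}),
        ("brew", "coffee maker", {}),
    ],
)
```

The hard part is the NLP that produces this structure from speech; once you have it, executing the rule is ordinary event-handler plumbing.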
That's interesting, yeah. I mean I spent a few hours recently building a little script that makes my apartment's foyer light change based on the temperature and weather during the day, and go to a dim red at night, and I really wish it was a simpler process than "kludging together a Python script".
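That kind of script boils down to a small mapping from time and weather to a color. A hypothetical sketch of the core function, with the thresholds and colors invented for illustration:

```python
def foyer_color(hour, temp_c):
    """Pick an RGB color for the foyer light: warm/cool by temperature
    during the day, dim red at night (all thresholds hypothetical)."""
    if hour < 7 or hour >= 22:
        return (40, 0, 0)            # dim red overnight
    if temp_c >= 25:
        return (255, 140, 60)        # warm orange on hot days
    if temp_c <= 5:
        return (90, 160, 255)        # cool blue on cold days
    return (255, 255, 255)           # neutral white otherwise
```

The kludge in practice is everything around this function: polling a weather API, talking to the bulb, and keeping the script alive, which is exactly the plumbing a turnkey system would hide.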
If you've got any of that kind of stuff up and running, you should be linking to that, not to a video that is essentially nothing you can't replicate with Siri and a light system that registers with HomeKit, or Echo, Cortana, or Google Now with similar plugins.
Also is there supposed to be any actual lists of brands of connected devices in the 'works with' page? All I see is greyscale photos of various anonymous devices with a word or two and an icon superimposed on them. It works with "LIGHTING" and "OUTDOORS" and "SECURITY", great, does it work with my lighting or outdoors or security?
Also: If the product is named "Josh", why is its default voice female? Has "Josh" shifted from a male-coded name to a female-coded one when I wasn't looking? Or is your product intended to present as a transman who really needs to work on his voice?
Gender: we have a variety of systems set up and we try to give them each a unique personality. My home, for example, is a male named Theodore with a British accent. The LA office is Scarlett and the Denver office is Samantha.
Website: we have a new website launching soon with a lot more information. You stumbled upon a page I hope to finish building today. Sorry for the confusion.
"Voice programming": We chose to focus on getting a product to market and proving customer demand before building out the entire developer portal. It's definitely the plan and we want to do it, it's just not where we see early revenue coming from and as a small team we have to pick our battles. I can't wait to open this stuff up when ready.
Yep, price tag is $10k. There's quite a bit under the hood you're not seeing in this video, particularly considering large 10,000+ sq ft homes and managing full home automation. If you're familiar with Crestron or Control4, we're playing in that space for v1.
I have little to no experience with houses that size (I live in a condo), but I understand the concept and the market segment, yes. It mentions that this functions locally, so I take it there's a server involved, that I imagine is some decent portion of this cost?
You may wish to find some way to clarify that market segment in your marketing, I think a lot of HN visitors might assume this is going to be a $99 hub or something similar. We need to see that value represented, because we are definitely going to dig up that price tag. ;D
It's not that far off when you consider replacing just the light switches with Z-Wave switches in a 2,000-2,500 sq ft home. Figure there are probably 25-30 light switches in a house that size, so you're looking at about $900-$1,200 worth of equipment just to control whole-room lighting. If you went the Philips Hue route, as an example, you'd easily approach that with 2-3 rooms' worth of light bulbs.
Add in control of individual outlets, motion sensing, thermal controls, security ... it adds up pretty quick, even for homes of the "middle class".
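The arithmetic above, spelled out with some ballpark (entirely hypothetical) per-unit prices:

```python
# Back-of-the-envelope retrofit cost for a ~2,500 sq ft home.
switches = 28        # light switches in the house
zwave_switch = 40    # $ per Z-Wave switch
outlets = 10         # individually controlled outlets
zwave_outlet = 35    # $ per Z-Wave outlet module
sensors = 6          # motion / thermal / door sensors
sensor_cost = 30     # $ per sensor

total = switches * zwave_switch + outlets * zwave_outlet + sensors * sensor_cost
print(total)  # 1650
```

And that's before a hub, thermostat, locks, or any labor, so four figures for a "middle class" home is easy to hit.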
Thanks! I am honestly amazed so many people were able to get to that page. We plan on going live with the website next week and right now it takes a fair bit of digging around to get there. That said, it is technically "public" facing so I'm fixing it up as soon as Stripe permits. Will probably disable the button in the meantime if you go back.