
Ask HN: Can we build Ironman's Jarvis with 2019 tech? - hsikka
I was rewatching movies in preparation for the new Marvel movie, and I felt some nostalgia and childhood fascination when I saw the depiction of Jarvis in the Ironman movies.

I work in ML research and am currently in graduate school, and I know we're nowhere near real intelligence. But some of those features (question generation, voice commands, object detection, image reconstruction, and others) are certainly doable.

Do you think we could build something starting to approach Jarvis in 2019, at least in function, i.e. helping with your everyday work?
======
escapologybb
I'm pretty much doing that very thing. I'm quadriplegic, so every single
action I want to take in the world has to be mediated through a third party,
and that party can be anything from another person to one of the digital
assistants.

As I think another poster mentioned, I control pretty much everything in my
house, including but not limited to unlocking doors, windows, climate control,
the lights, and posting to Twitter, all with a cobbled-together solution of
many different components from different companies.

It's done with a combination of all three of the digital assistants, various
scripts, and the very wonderful Home Assistant[^1], which mostly glues it all
together. It is by no means a single complete solution, but when somebody sees
me controlling the house using just my voice, they react as though I'm some
sort of dark wizard.

I love having an automated house, and I think Home Assistant is probably one
of the best solutions at the moment for making all of the different IoT
devices communicate. The further down this road we go, the more likely a
single solution will evolve.

I would have to say, though, that the most important thing it does is the
home automation that enables me to call for help in an emergency. I can flash
the lights different colours when I need help, depending on the level of
assistance I need. Or, if my Fitbit notices that my heart rate has gone below
40 BPM for more than 10 seconds, I get a notification, as that is often
indicative of an attack of Autonomic Dysreflexia. I can honestly say, without
an ounce of hyperbole, that this home automation system has actually saved my
life.
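
For anyone curious, the heart-rate piece can be surprisingly little code. A
simplified sketch of the idea (the host, token, and entity IDs here are
placeholders, not my real config), using Home Assistant's REST API:

    import time
    import requests  # pip install requests

    HA_URL = "http://homeassistant.local:8123"  # placeholder HA host
    HEADERS = {"Authorization": "Bearer YOUR_LONG_LIVED_ACCESS_TOKEN"}

    def flash_lights(times=5):
        # Flash a light red as an emergency signal.
        for _ in range(times):
            requests.post(f"{HA_URL}/api/services/light/turn_on",
                          headers=HEADERS,
                          json={"entity_id": "light.bedroom",  # placeholder
                                "rgb_color": [255, 0, 0],
                                "brightness": 255})
            time.sleep(0.5)
            requests.post(f"{HA_URL}/api/services/light/turn_off",
                          headers=HEADERS,
                          json={"entity_id": "light.bedroom"})
            time.sleep(0.5)

    def on_heart_rate(bpm):
        # Called for each reading pulled from the tracker's API;
        # e.g. on_heart_rate(38) would trigger the flash.
        if bpm < 40:
            flash_lights()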

[^1]: [https://home-assistant.io/](https://home-assistant.io/)

~~~
oulipo
And if you want to add a 100% offline and private-by-design voice AI that can
run on a Raspberry Pi 3 (and iOS, Android, Linux), take a look at what we are
building at [https://snips.ai](https://snips.ai) (it also integrates with Home
Assistant).

It works for English, German, French, Japanese, Spanish, and Italian, with
more coming soon!

~~~
guzik
Looks awesome! Any plans for FreeRTOS?

~~~
dvndvn
Snips.ai runs as a Docker container in Home Assistant. I'm sure you can run
Docker stuff on FreeRTOS.

------
setquk
Siri is still half deaf and tries to send me to the chip shop when I ask it
to send a message, and they have a ridiculous amount of budget behind that, so
probably not.

At the moment our technology is like Zorg's desk in The Fifth Element: stuffed
full of toys, but it will ultimately let you down too often to be relied upon.

~~~
arethuza
"send me to the chip shop"

Do you have an accent, by any chance? Pretty much all voice recognition
technology (Android, Apple, cars...) appears to be utterly flummoxed by my
fairly mild Scottish accent.

[Edit: to be fair, my accent is probably fairly mild by _Scottish_ standards,
not compared to RP.]

~~~
hondadriver
That must be it. A golden oldie about Scottish speech recognition (2010):
[https://youtu.be/NMS2VnDveP8](https://youtu.be/NMS2VnDveP8)

~~~
arethuza
I have to stop myself posting that every time the subject comes up (I know
what it is without opening the link) :-)

------
ovi256
You could probably get 90% of the utility with a janky collection of shell
scripts fronted by a voice recognition engine. Think Alexa/Google
Assistant/Siri/etc. with a bunch of company- or task-specific scripts. I
purposefully ignore the robot-arm-in-an-open-universe capability, because
that's still too far away, AFAIK.

Jarvis in the movie was Hollywood smooth: made to look _great_ in a movie.
Even the goofs were great, and it behaved like a puppy so the audience would
find it likable. You don't need that in a work assistant. But if you remove
that, plus the robot arm, you're left with a voice-controlled Outlook
assistant, which is useful, but not sexy.
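
The glue can be embarrassingly small. A rough sketch of the idea, assuming
the Python speech_recognition package with PocketSphinx for offline
recognition (the phrases and script paths are made up):

    import subprocess
    import speech_recognition as sr  # pip install SpeechRecognition pocketsphinx

    # Task-specific phrases mapped to ordinary shell commands.
    COMMANDS = {
        "build the project": ["make", "-C", "/home/me/project"],
        "check my mail": ["/home/me/bin/check-mail.sh"],
        "deploy to staging": ["/home/me/bin/deploy.sh", "staging"],
    }

    recognizer = sr.Recognizer()
    with sr.Microphone() as mic:
        recognizer.adjust_for_ambient_noise(mic)
        while True:
            audio = recognizer.listen(mic)
            try:
                text = recognizer.recognize_sphinx(audio).lower()
            except sr.UnknownValueError:
                continue  # didn't catch that; keep listening
            for phrase, argv in COMMANDS.items():
                if phrase in text:
                    subprocess.run(argv)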

~~~
threeseed
Actually you could've built Alexa/Siri etc back in 1997 with DragonDictate and
a whole bunch of scripting.

And in many areas it would've been just as good as what we have today.

~~~
TeMPOraL
Back in 2007, I had a better UX for playing music with WinAMP, the Microsoft
Speech API, and a little program I wrote to glue the two together via Win32
WM_ messages than I have today with Google Now.

Today the problem is (a) the voice recognition part, which happens online
though it shouldn't (introducing large latency and an unnecessary Internet
dependency), and (b) scriptability, which none of the voice assistant vendors
want to give you, for business reasons. A better reality would have voice
parsed on-device (possibly with models from online service providers) and
available as a voice-to-text API that local scripts could easily access.
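
Something like this toy daemon is what I mean: it owns the microphone,
exposes the last recognized utterance over localhost, and any script can
poll it. The recognizer is stubbed out here; in the better reality it would
be an on-device model:

    import json
    import threading
    import time
    from http.server import BaseHTTPRequestHandler, HTTPServer

    latest = {"text": "", "ts": 0.0}

    def recognition_loop():
        # Stub: a real daemon would feed results from an on-device
        # speech-to-text model here.
        while True:
            time.sleep(5)
            latest.update(text="turn on the lights", ts=time.time())

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = json.dumps(latest).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)

    threading.Thread(target=recognition_loop, daemon=True).start()
    HTTPServer(("127.0.0.1", 8300), Handler).serve_forever()

Then any local script is one GET request to 127.0.0.1:8300 away from the
transcript, with no cloud round-trip involved.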

------
undecisive
I'm guessing this is where the Dragon Helmet [1] is headed: it has open-
source software that aims to handle the responses and command execution. And
of course Mycroft [2] et al.

I think the real problem is getting the computer to do things that you
haven't programmed it to do. Imagine an Alexa / Google Home where you have a
conversation like:

"Hey Homexa, do a web search for an image of a piece of cheese, enlarge it to
2000x1500 pixels and email it to me please"

If it then came back with "Ok, please tell me how to enlarge an image" and
you could give it step-by-step instructions, that would be amazing. I don't
think the comprehension required is quite there yet. But when it is, and
especially if we can crowdsource the learned commands, things could get very,
very interesting. (A toy sketch of the idea follows the links below.)

[1] [http://dragon.computer/](http://dragon.computer/)

[2] [https://mycroft.ai/](https://mycroft.ai/)
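
A toy sketch of the teaching loop, where learned commands are just named
compositions of known primitives (everything here is hypothetical and
stubbed):

    # Primitives the assistant already ships with (stubbed out).
    known = {
        "web search": lambda arg: print(f"searching for {arg}"),
        "resize image": lambda arg: print(f"resizing to {arg}"),
        "email me": lambda arg: print(f"emailing {arg}"),
    }
    learned = {}  # could be crowdsourced instead of per-user

    def run(command, arg=""):
        if command in known:
            known[command](arg)
        elif command in learned:
            for step, step_arg in learned[command]:
                run(step, step_arg)
        else:
            print(f"Ok, please tell me how to {command}")

    # "Teaching" enlarge-an-image as a composition of known steps:
    learned["enlarge an image"] = [("resize image", "2000x1500 pixels")]
    run("enlarge an image")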

------
basetop
I don't think we are anywhere close to Jarvis, which is general AI (aka real
AI, aka "conscious" AI), any more than we are close to building Ironman's
nanotech suit. But we are improving on "dumb/unconscious" AI like Alexa,
Siri, etc. that could help with day-to-day life whilst spying on you 24/7.

------
harrygeez
I think before we talk about capabilities we should talk about UX.

Even the first version of Jarvis shown in Iron Man 1 was way ahead, in terms
of understanding speech and context, of the current cutting edge, Google
Assistant.

At least Google Duplex sounds super natural now. I think that once digital
assistants can really understand us, adding all those capabilities is just a
piece of cake. But I have a feeling we are probably at most a decade away
from something that vaguely resembles that, and I'm super excited about it.

------
_bxg1
The movies (and the major real-life assistants) tend to focus on the idea that
you're talking to your computer as if it were a person, but this makes the
problem an order of magnitude more difficult than it needs to be. What if we
looked at it more like a vocal command prompt? What if we applied the Unix
philosophy of composing programs and data transformations into more complex
use-cases? You'd have to memorize the commands (though there could be vocal
"man pages" too), but you could finally do more than toggle lights and ask
what the weather is like.

There's some precedent for this: in Avengers: Endgame (I'll keep it vague),
Tony gives some very precise and technical commands to the system when
running a scientific simulation. He's not conversing and quipping; he's
basically calling parameterized functions with his voice. I think that would
be very doable today.
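
As a sketch, a vocal shell could split an utterance on "then" and thread
each clause's output into the next, Unix-pipe style. The verbs below are
invented stand-ins:

    def search(query, _input=None):
        return f"<image of {query}>"

    def enlarge(size, image):
        return f"{image} at {size}"

    def email(_, attachment):
        print(f"emailing: {attachment}")

    VERBS = {"search": search, "enlarge": enlarge, "email": email}

    def run_pipeline(utterance):
        data = None
        for clause in utterance.split(" then "):
            verb, _, arg = clause.strip().partition(" ")
            data = VERBS[verb](arg, data)  # pipe previous output in
        return data

    run_pipeline("search cheese then enlarge 2000x1500 then email me")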

------
guzik
Ha, this is how we've been advertising our wearable personal assistant
(aidlab.com) during the launch on Facebook (such copyright infringement!).
We've been facing hardware limitations in adding NLP directly on the MCU, as
we wanted to avoid third parties. Secondly, adding "reasonable" voice
commands like setting appointments, wiki search, or alarms is nothing we
could have done alone, so we've discontinued that idea for now.

If we define J.A.R.V.I.S. as a pseudo-intelligent assistant with a basic
understanding of voice commands plus a biosignal tracker, then it's certainly
doable, as there are some commercial implementations. Otherwise (e.g. real-
time voice chat, or even accurate suggestions based on everyday habits), it
is still too soon IMO.

------
harperlee
I think we are closer than we think.

The problem is that all voice control up until now has been a closed app
with simplified intents.

In my opinion, what we need is a programming language whose REPL UX is
voice-oriented and that eases the separation between using and extending the
system. Prolog or a Lisp could be quite near, but we would require some
changes, such as: syntax that reads easily aloud (:- in Prolog is not very
good), the ability to handle ums and ahhs, the ability to "play" or "test"
functions through voice, etc.
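
For instance, a voice REPL could drop filler tokens before parsing, so
disfluencies don't break evaluation. A trivial Python stand-in for the idea
(the grammar here is a one-rule toy):

    FILLERS = {"um", "uh", "ahh", "er", "like"}

    def eval_spoken(utterance):
        words = [w for w in utterance.lower().split() if w not in FILLERS]
        # Toy grammar: "add X and Y" stands in for a real language.
        if words[:1] == ["add"] and "and" in words:
            i = words.index("and")
            return float(words[1]) + float(words[i + 1])
        raise ValueError(f"could not parse: {words}")

    print(eval_spoken("add um 2 and ahh 3"))  # 5.0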

~~~
decasteve
We need some type of Structured Query Language, call it SEQUEL, or maybe
abbreviated SQL for short. /s

Jokes aside, maybe a LISP-like Sentence Processing Language might be the
future of this type of HCI.

~~~
harperlee
Look into Attempto Controlled English for an alternative closer to what I was
thinking about :)

------
silversconfused
I wrote a little shell script that lets you move the mouse around with your
voice, ask what time it is, etc. My thinking for future applications is that
specific appliances should take specific commands, so little intelligence
would be needed. Oven: preheat to 350 degrees. Lights: on. Music: off. Etc.
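
That is, route on the appliance name first and let each device own its tiny
command vocabulary. Roughly (the handlers are stubs):

    APPLIANCES = {
        "oven": lambda cmd: print(f"oven: {cmd}"),
        "lights": lambda cmd: print(f"lights: {cmd}"),
        "music": lambda cmd: print(f"music: {cmd}"),
    }

    def dispatch(utterance):
        device, _, command = utterance.lower().partition(" ")
        if device in APPLIANCES:
            APPLIANCES[device](command)

    dispatch("oven preheat to 350 degrees")
    dispatch("lights on")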

------
new_guy
Sure you can. Services like Alexa are just glorified paperweights and aren't
reflective of real-world capabilities.

If you want to build your own Jarvis, don't listen to anyone else; just get
stuck in. You'll be surprised just how far you can get with it.

~~~
kaybe
You could also build on top of one of the open source versions, eg Mycroft.

[https://mycroft.ai/](https://mycroft.ai/)

~~~
krick
That's interesting. Basically the only reason I don't use Alexa or something
like it is that I don't want a 3rd party listening to me 24/7, so an open-
source alternative would be of great use. But I don't understand: it says
it's open source, yet the first thing I have to do is create an account on
_their_ platform. So what is the "open source" thing I ought to download and
install? Is it just a client? Can I use the thing completely self-hosted?

Also, how does it compare to Alexa/Siri in terms of usability? In fact, I'm
not even sure I've seen a good open-source TTS at this point; nothing
comparable to the proprietary stuff. Is this Mycroft thing better?

~~~
ewang1
I've recently started playing with a Home Assistant setup and came across
Snips ([https://snips.ai/](https://snips.ai/)). A bunch of Raspberry Pis with
microphones running Snips is what I'm looking to try next. IMO, the
offline/on-device processing is a key differentiator.

------
sandreas
Well, for the speech recognition part, I think there are some projects that
look promising.

I recently found
[https://github.com/gooofy/zamia-speech](https://github.com/gooofy/zamia-speech),
which works for English and German. German is pretty bad atm, but English
should work fine.

The nice part: you only have to spin up a Docker container and a Python
script, and you can perform offline speech-to-text :-) Microphone input is
also supported.

------
rsyring
If you are really asking for it to be like Jarvis, where "like" implies some
reasonably similar approximation, then:

No. Absolutely not. Not even close.

------
thewhitetulip
Surprised that nobody has mentioned this:

[http://jasperproject.github.io/](http://jasperproject.github.io/)

~~~
synesthesiam
Shameless plug:
[https://github.com/synesthesiam/rhasspy](https://github.com/synesthesiam/rhasspy)

Rhasspy is inspired by Jasper, but works by having you specify intents/voice
commands via a grammar (rather than via Python). It then trains a speech and
intent recognition system together to recognize those commands. Recognized
commands are published over MQTT or POSTed directly into Home Assistant.
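
Consuming those intents from your own script is a few lines with paho-mqtt.
Something like this (the broker address is an assumption; the topic follows
the Hermes-style convention Rhasspy uses):

    import json
    import paho.mqtt.client as mqtt  # pip install paho-mqtt

    def on_message(client, userdata, msg):
        intent = json.loads(msg.payload)
        print("intent on", msg.topic, "->", intent)
        # ...dispatch to scripts or Home Assistant from here...

    client = mqtt.Client()
    client.on_message = on_message
    client.connect("localhost", 1883)  # assumed local broker
    client.subscribe("hermes/intent/#")
    client.loop_forever()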

------
alexheikel
You could integrate Hal ([http://halisback.com](http://halisback.com)),
which is the best assistant out there, with all the different components
from different companies and make him do basically everything.

------
overcode
If we could, we would have. But we haven't, so we can't.

~~~
krageon
Without context, this statement means nothing new should ever be done. It's a
terrible axiom. Even within the context of this question, if we used this
reasoning for the next 500 years we would surely never solve it.

~~~
overcode
Oh sure, but that's why you have the context. My point is that the time gap
between a tool like this becoming technically possible and someone executing
it successfully will be incredibly small. So the chances of the current moment
in time being in that gap are tiny.

------
kthejoker2
Everyone here is focusing on the voice assistant and IoT piece, but the thing
that's really missing is the idea of a contextually rich personal API and
some POSSE mechanism to selectively share that API with others, and vice
versa.

The Quantified Self movement is maybe 30% of the way there.

Until this space matures, we're kind of at a standstill. Commercial options
have to be generic or face the wrath of GDPR, data breaches, and privacy
advocates.

The only way to square the circle is to let people own their data completely
and then incentivize them to share it with your service for the benefits you
provide.

I think the killer app is an Alexa/JARVIS clone that "spies on you 24/7" but
keeps your digital twin 100% offline and owned by you. It's the learning and
the "personal schema" that's compelling.

When it knows _why_ you want to turn on the lights, we're getting somewhere.

