

Sirius: An Open Intelligent Personal Assistant - tectonic
https://github.com/jhauswald/sirius

======
brianshaler
Projects like this are great. Too many valuable services exist within walled
gardens despite being straightforward pieces of software that anyone should be
able to run without having to buy a specific cell phone or sign up for an
account on an ad publisher's site/app.

I can't help but wonder, though, if the way we tend to go about it can really
effect change in a meaningful way. How many people are actually going to go
through the effort of setting this up and running it? And keep it running? And
put up with inevitable bugs? And update it? How well can it interoperate with
other types of services? While they've gone above and beyond to document a
contribution policy and use GitHub Issues, is it only possible to add a
feature by hacking on a huge, monolithic project?

I don't know the best path forward, but it seems like the first few questions
could be addressed by something like Sandstorm or Docker. The rest is
architectural, and it's not a problem with this project so much as a lack of a
standard for extensibility or interoperability in the OSS community, which has
enough trouble getting past issues of language and init-system preferences.

~~~
mraison
_> monolithic project_

Actually it's not. Looking at what is on GitHub, the project is mainly about
bridging a few open-source modules (Kaldi, Sphinx, PocketSphinx, OpenEphyra)
together into an end-to-end system. So if you want to contribute, you can
always work on improving one of the base modules (which handle things like
speech recognition, question answering, etc.).

------
michaelbuckbee
I'm all for seeing the source, but a better starting point to see what Sirius
is capable of is their videos page at:

[http://sirius.clarity-lab.org/category/watch/](http://sirius.clarity-lab.org/category/watch/)

Also, the site was having some DB connection errors under the attention, so it
looks like you can view the same videos directly on YouTube at:
[https://www.youtube.com/channel/UCEiLPIvZiW4jq9ZMonYdhcg](https://www.youtube.com/channel/UCEiLPIvZiW4jq9ZMonYdhcg)

~~~
otoburb
Digging a little deeper (since the Clarity Labs site is still throwing the DB
connection errors), Jason Mars is a professor at University of Michigan. A
Sirius paper[1] is listed on his homepage[2]. The second publication link
about Protean Code is also extremely interesting. It seems Prof. Mars focuses
on Warehouse Scale Computing (WSC), and Sirius and Protean Code[3] are two
noteworthy research results described by his group as having immediate
commercial applicability.

[1] [http://web.eecs.umich.edu/~profmars/wp-content/papercite-data/pdf/hauswald15asplos.pdf](http://web.eecs.umich.edu/~profmars/wp-content/papercite-data/pdf/hauswald15asplos.pdf)

[2] [http://web.eecs.umich.edu/~profmars/index.html%3Fp=8.html](http://web.eecs.umich.edu/~profmars/index.html%3Fp=8.html)

[3] [http://www.cse.umich.edu/eecs/about/articles/2014/Protean-Code.html](http://www.cse.umich.edu/eecs/about/articles/2014/Protean-Code.html)

------
guru_meditation
The Clarity Lab group reuses mature open source libraries for speech
recognition, computer vision AND question answering.

They link these into a whole with what appears to be a minimal amount of glue,
and then optimize certain components by producing multithreaded (hence CPU
multicore) or GPU ports of the bottlenecks.

In my opinion, a major contribution is a GPU-optimized Gaussian mixture model
(GMM) scorer that's plugged back into a well-known, frequently used speech
recognition library (Sphinx).
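For readers unfamiliar with why GMM scoring is the hot loop worth porting to a
GPU: it's dense, regular arithmetic over every (frame, mixture component)
pair. A CPU sketch of diagonal-covariance GMM log-likelihood scoring, written
here in NumPy purely as an illustration (not the project's actual kernel),
shows the shape of the computation:

```python
import numpy as np

# Diagonal-covariance GMM log-likelihood scoring: the inner loop of
# acoustic scoring in Sphinx-style ASR. Each (frame, component) pair is
# independent, which is why this maps so well onto GPU threads.

def gmm_log_likelihood(frames, means, inv_vars, log_weights):
    """frames: (T, D); means, inv_vars: (K, D); log_weights: (K,)."""
    D = frames.shape[1]
    # Squared Mahalanobis terms for diagonal covariances, shape (T, K)
    diff = frames[:, None, :] - means[None, :, :]
    quad = -0.5 * np.sum(diff * diff * inv_vars[None, :, :], axis=2)
    # Per-component Gaussian normalizers, shape (K,)
    log_norm = 0.5 * (np.sum(np.log(inv_vars), axis=1) - D * np.log(2 * np.pi))
    log_comp = log_weights[None, :] + log_norm[None, :] + quad
    # Stable log-sum-exp over mixture components -> per-frame score
    m = log_comp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True))).ravel()

T, K, D = 4, 3, 2
rng = np.random.default_rng(0)
scores = gmm_log_likelihood(rng.normal(size=(T, D)),
                            rng.normal(size=(K, D)),
                            np.ones((K, D)),
                            np.log(np.full(K, 1.0 / K)))
print(scores.shape)  # (4,)
```

Every frame and component is scored with the same few fused multiply-adds, so
a GPU port is mostly a matter of assigning one thread per pair.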

Superficially, one would expect a publication at an AI, ML, or robotics
workshop/conference/journal. However, if you spend time digging through the
source code you will see that there is zero contribution in these areas. Hence
their first publication appears to be a systems research publication. The
conclusion of their paper talks about total cost of ownership in datacenters
by using GPUs, etc.

The amount of marketing and vague biz-dev type speak almost made me discard
this whole site as BS, but now I see that if nothing else, they've given away
a GPU accelerated ASR library under a BSD license. Cool work, dudes and
dudettes.

------
Animats
Nice idea. But the demos are disappointing. The promotional video is awful.
Talking heads and stock shots of a server farm. Please.

The demo video is better. But the question answerer seems rather dumb. There's
no indication that it has any context; that is, it doesn't seem to use what
you said previously in any useful way. The screen message for the recognition
of the Empire State Building says "Parsed metadata", so they may not have
actually recognized the image at all. Also, answering "Where is the Empire
State Building" with "New York" is not particularly useful if you need
directions.

Context is the hard problem. Without conversational context, it's like having
voice input to a search engine. If you don't get the answer you want, you have
to ask a new question, treated independently of the original one. You can't
continue the conversation in the area of interest.

~~~
mixedmath
It seems to me that what you really want is a proper implementation of
coreference resolution [1]. This is extremely hard and, so far, done extremely
poorly. Perhaps with more open sourcing, the problem would be more widely
tackled. Or perhaps not.

[1]: [http://en.wikipedia.org/wiki/Coreference#Coreference_resolution](http://en.wikipedia.org/wiki/Coreference#Coreference_resolution)

~~~
Animats
Coreference resolution is a start. You need that just to disambiguate
questions.

I wonder how well Hello Barbie does at this.

------
frik
question-answer.tar.gz contains JAR files like Javelin.jar, whose code has
vanished from the public web.

_Ephyra also contains some components that have been adopted from the JAVELIN
system_, though the link is dead:
[https://web.archive.org/web/20081203004045/http://durazno.lti.cs.cmu.edu/wiki/moin.cgi/Javelin_Project](https://web.archive.org/web/20081203004045/http://durazno.lti.cs.cmu.edu/wiki/moin.cgi/Javelin_Project)
and it seems no archived source code of it exists:
[https://web.archive.org/web/*/http://durazno.lti.cs.cmu.edu/javelin_public/releases/](https://web.archive.org/web/*/http://durazno.lti.cs.cmu.edu/javelin_public/releases/)
(not archived). Javelin was used in the TREC competition, and there are a lot
of papers about it, e.g. [http://www.cs.cmu.edu/~llita/papers/lita.javelin-trec2005.pdf](http://www.cs.cmu.edu/~llita/papers/lita.javelin-trec2005.pdf)

I wrote the maintainer of Ephyra an email.

~~~
HillRat
Appears that the JAVELIN homepage has now moved to
[https://mu.lti.cs.cmu.edu/moin/Javelin_Project](https://mu.lti.cs.cmu.edu/moin/Javelin_Project).
However, the code and data page appears to be locked behind an authentication
wall.

~~~
frik
Thanks a lot!

I haven't received an answer yet from the Ephyra maintainer.

------
rcpt
Science Friday did a good podcast on a related project:

[http://www.sciencefriday.com/segment/11/21/2014/would-you-trust-a-robot-to-schedule-your-life.html](http://www.sciencefriday.com/segment/11/21/2014/would-you-trust-a-robot-to-schedule-your-life.html)

They discuss "Amy" from x.ai. Turns out that x.ai needs to employ an "AI
trainer" (discussed at 7min in) to handle all the situations that the
algorithms can't.

------
zyxley
So... just saying here, but with "Siri" in the name, the first thing the title
made me think was "oh, is it a clever hack to pipe Siri's processed text into
a different program?".

~~~
cnp
"Siri" -- Apple, closed source -- amended with "us" -- the community. Not a
clever hack but a pretty awesome statement.

------
ribs
I'm worried by the lack of package definitions in the Java source I looked at.
You shouldn't be putting Java classes into the global namespace. I have to
wonder what they were thinking.

~~~
javajosh
Pretty sure they were thinking about implementing AI. You'll often see
computer scientists running roughshod over engineering best practices for the
simple reason that _they aren't engineers_. And you know what? That's okay!

------
PSeitz
They all use Sphinx for speech recognition under the hood, which sadly works
very poorly for me.

