I can't help but wonder, though, if the way we tend to go about it can really effect change in a meaningful way. How many people are actually going to go through the effort of setting this up and running it? And keep it running? And put up with inevitable bugs? And keep updating it? How well can it interoperate with other types of services? While they've gone above and beyond to document a contribution policy and use GitHub Issues, is it only possible to add a feature by hacking on a huge, monolithic project?
I don't know the best path forward, but it seems like the first few questions could be addressed by something like Sandstorm or Docker. The rest is architectural, and it's not a problem with this project so much as a lack of a standard for extensibility or interoperability in the OSS community, which has enough trouble getting past issues of language and init system preferences.
Actually it's not. Looking at what is on GitHub, it seems that the project is mainly about bridging a few open-source modules (Kaldi, Sphinx, PocketSphinx, OpenEphyra) together to make an end-to-end system. So if you want to contribute, you can always work on improving one of the base modules (which handle things like speech recognition, question answering, etc.).
"I can't help but wonder, though, if the way we tend to go about it can really effect change in a meaningful way. How many people are actually going to go through the effort of setting this up and running it? And keep it running? And put up with inevitable bugs? And keep updating it? How well can it interoperate with other types of services? While they've gone above and beyond to document a contribution policy and use GitHub Issues, is it only possible to add a feature by hacking on a huge, monolithic project?"
How do ideas and open source generally work? How does a tree grow?
This project is trying to provide replacements for Google's various Android services:
This fits into that sort of attitude. I have no idea how mature or usable microG is.
Also, the site was having some DB connection errors under all the attention, so it looks like you can view the same vids directly on YouTube at: https://www.youtube.com/channel/UCEiLPIvZiW4jq9ZMonYdhcg
They link these into a whole using what appears to be a minimal amount of glue, and then proceed to optimize certain components by trying out multithreaded (hence CPU multicore) or GPU ports for the bottlenecks.
In my opinion, a major contribution is a GPU-optimized Gaussian mixture model (GMM) implementation that's plugged back into a well-known, frequently used speech recognition library (Sphinx).
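To give a feel for why GMM scoring is the piece worth porting to a GPU: in GMM-based ASR, every audio frame is scored against every Gaussian component of the acoustic model, a dense, embarrassingly parallel computation. The sketch below is a hypothetical, simplified illustration of that inner loop (diagonal-covariance log-likelihoods), not code from Sirius or Sphinx; all names are made up.

```python
import math

def gmm_log_likelihoods(frames, means, inv_vars, log_weights):
    # frames:      T frames, each a list of D acoustic features (e.g. MFCCs)
    # means:       M component means, each length D
    # inv_vars:    M inverse variances (diagonal covariance), each length D
    # log_weights: M log mixture weights (normalization constants folded in)
    # returns:     T x M table of per-frame, per-component log-likelihoods
    scores = []
    for frame in frames:                    # this O(T*M*D) triple loop is the
        row = []                            # data-parallel hot spot that a GPU
        for m, mean in enumerate(means):    # (or multicore) port parallelizes
            maha = sum(inv_vars[m][d] * (frame[d] - mean[d]) ** 2
                       for d in range(len(frame)))
            row.append(log_weights[m] - 0.5 * maha)
        scores.append(row)
    return scores

# Toy usage: 2 frames, 2 components, 2-dim features.
scores = gmm_log_likelihoods(
    frames=[[0.0, 0.0], [1.0, 1.0]],
    means=[[0.0, 0.0], [1.0, 1.0]],
    inv_vars=[[1.0, 1.0], [1.0, 1.0]],
    log_weights=[math.log(0.5), math.log(0.5)],
)
print(scores[0][0])  # frame matches component 0 exactly: log(0.5) ~ -0.693
```

Since the score of each (frame, component) pair is independent of all the others, mapping this onto thousands of GPU threads is straightforward, which is presumably why it showed up as the first optimization target.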
Superficially, one would expect a publication at some AI, ML, or Robotics workshop/conference/journal. However, if you spend time digging through the source code, you will see that there is zero contribution in these areas. Hence their first publication is in what appears to me to be a systems research venue. The conclusion of their paper talks about the total cost of ownership in datacenters when using GPUs, etc.
The amount of marketing and vague biz-dev type speak almost made me discard this whole site as BS, but now I see that if nothing else, they've given away a GPU accelerated ASR library under a BSD license. Cool work, dudes and dudettes.
The demo video is better. But the question answerer seems rather dumb. There's no indication that it has any context; that is, it doesn't seem to use what you said previously in any useful way. The screen message for the recognition of the Empire State Building says "Parsed metadata", so they may not have actually recognized the image at all. Also, answering "Where is the Empire State Building" with "New York" is not particularly useful if you need directions.
Context is the hard problem. Without conversational context, it's like having voice input to a search engine. If you don't get the answer you want, you have to ask a new question, treated independently of the original one. You can't continue the conversation in the area of interest.
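The difference the comment describes can be shown with a toy sketch: a stateless QA loop treats "How tall is it" as an unanswerable fresh query, while even crude dialogue state lets the follow-up resolve against the previous topic. Everything below is hypothetical illustration, not Sirius's actual pipeline, and real coreference resolution is far harder than this string substitution.

```python
def resolve(question, context):
    # Crude anaphora resolution: substitute the last-mentioned entity
    # for pronouns like "it" / "there". A stateless system skips this
    # step entirely and answers each question in isolation.
    if context.get("entity"):
        for pronoun in ("it", "there"):
            question = question.replace(pronoun, context["entity"])
    return question

context = {}

# Turn 1: the answer establishes a topic we remember.
q1 = "Where is the Empire State Building"
context["entity"] = "the Empire State Building"

# Turn 2: the follow-up only makes sense given that state.
q2 = "How tall is it"
print(resolve(q2, context))  # -> "How tall is the Empire State Building"
```

Without the `context` dict, turn 2 degenerates into exactly the "voice input to a search engine" behavior described above: a new, independent question every time.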
I wonder how well Hello Barbie does at this.
Ephyra also contains some components that have been adopted from the JAVELIN system, though the link is dead:
https://web.archive.org/web/20081203004045/http://durazno.lt... and it seems no archived source code of it exists: https://web.archive.org/web/*/http://durazno.lti.cs.cmu.edu/... (not archived). JAVELIN was used in the TREC competition, and there are a lot of papers about it, e.g. http://www.cs.cmu.edu/~llita/papers/lita.javelin-trec2005.pd...
I wrote the maintainer of Ephyra an email.
I haven't received an answer yet from the Ephyra maintainer.
They discuss "Amy" from x.ai. It turns out that x.ai needs to employ an "AI trainer" (discussed at the 7-minute mark) to handle all the situations that the algorithms can't.