http://www.nvidia.com/object/gpu-accelerated-applications-te...
But it is a little bit out of date when it comes to version numbers.
If you're going to try TensorBox (https://github.com/TensorBox/TensorBox) it will get a bit harder still because of conflicts and build issues with specific versions of TensorFlow.
There has to be an easier way to distribute a package.
That said, all this is super interesting and Google really moved the needle by open-sourcing TensorFlow and other ML packages.
See http://reference.wolfram.com/language/guide/NeuralNetworks.h..., also look at Examples > Applications under http://reference.wolfram.com/language/ref/NetTrain.html for some worked examples. Fun example of live visualization during training (very easy to do, will get even easier in future versions): https://twitter.com/taliesinb/status/839013689613254656
The annoying thing about GPU training is handling the cuDNN dependency, which Google's guides cover poorly.
- (Eng) We need to switch to this new NLP framework
- (VP) Ok, why? Which one is it?
- (Eng) Huh, it's called Parsey McParseface, developed by ...
- (VP) WTF? Don't waste my time with jokes, go build your own
- (Eng) But ...
- (VP) Meeting's over.
...and apparently will release a major version update today. Ouch.
I do wish SyntaxNet were a bit easier to use. A lot of people have asked for SyntaxNet as a backend for spaCy, and I'd love to be using it in a training ensemble. When I tried this last year, I had a lot of trouble getting it to work as a library. I spent days trying to pass it text in memory from Python, and it seemed like I would have to write a new C++ TensorFlow op. Has anyone gotten this to work yet?
>On the time-honoured benchmark for this task, Parsey McParseface achieves over 94% accuracy, at around 600 words per second. On the same task, spaCy achieves 92.4%, at around 15,000 words per second. The extra accuracy might not sound like much, but for applications, it's likely to be pretty significant.
If spaCy can improve its accuracy while keeping the large performance gap, it'll still be my go-to NLP framework!
[0] https://explosion.ai/blog/syntaxnet-in-context
[1] https://github.com/tensorflow/models/tree/master/syntaxnet
Alexa, turn the lights on in the kitchen.
Alexa, turn on the kitchen light.
Alexa, light up the kitchen.
Should all accomplish the same task using this framework.
Simply treating the sentence as a bag of words, looking for "on", "off" or "change" (and their synonyms) plus the names of known smart objects, works extremely well. I could say "Hey Marvin, turn on the lights and TV", "Hey Marvin, turn the lights and TV on", or even "Hey Marvin, on make lights and TV."
(It's named Marvin after the android from The Hitchhiker's Guide; my eventual goal is to have it reply with snarky/depressed remarks.)
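A rough sketch of the bag-of-words idea described above, in Python. The vocabularies (`STATE_WORDS`, `DEVICE_WORDS`) and the function name `parse_command` are hypothetical stand-ins, not Marvin's actual code; the point is only that word order is ignored, so all three phrasings resolve to the same command.

```python
import re

# Hypothetical vocabularies: the comment doesn't list the real synonyms or devices.
STATE_WORDS = {
    "on": "on", "enable": "on", "start": "on",
    "off": "off", "disable": "off", "stop": "off",
    "change": "toggle", "toggle": "toggle",
}
DEVICE_WORDS = {
    "light": "lights", "lights": "lights", "lamp": "lights",
    "tv": "tv", "television": "tv",
}

def parse_command(sentence):
    """Return (state, devices) from a bag of words, ignoring word order.

    state is None when no state word was heard; devices keeps first-seen order.
    """
    words = re.findall(r"[a-z']+", sentence.lower())
    state = None
    devices = []
    for word in words:
        if word in STATE_WORDS and state is None:
            state = STATE_WORDS[word]  # first state word wins
        elif word in DEVICE_WORDS:
            device = DEVICE_WORDS[word]
            if device not in devices:
                devices.append(device)
    return state, devices

for s in ["Hey Marvin, turn on the lights and TV",
          "Hey Marvin, turn the lights and TV on",
          "Hey Marvin, on make lights and TV."]:
    print(parse_command(s))  # ('on', ['lights', 'tv']) each time
```

Because nothing depends on grammar, garbled input like "on make lights and TV" still parses; the flip side is that any sentence that happens to mention a device can trigger it.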
Adding 30 seconds of "memory" of the last requested state also made it seem a million times smarter, turning requests into a conversation rather than a string of commands. If it finds a smart object mentioned without a state, it assumes the previous state.
"Hey Marvin, turn on the lights." *lights turn on* "The TV too." *TV turns on*
The downside to this approach: when I was showing it off to friends, it could mis-trigger. "Marvin, turn off the lights." *lights turn off* "That's so cool, so it controls your TV, too?" *TV turns off* But it was mostly not an issue in real usage.
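One way the 30-second memory could work, sketched in Python under assumptions: the input is a `(state, devices)` pair where `state` is `None` if no on/off word was heard, and the class name `StateMemory` is hypothetical. Refreshing the timestamp when a remembered state is reused (so the conversation can keep going) is my own design choice, not something the comment specifies.

```python
import time

class StateMemory:
    """Remember the last requested state for a short window (30 s by default)."""

    def __init__(self, window_seconds=30.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock          # injectable so the behaviour can be tested
        self.last_state = None
        self.last_time = None

    def resolve(self, state, devices):
        """Return the state to apply to `devices`, or None if nothing to do."""
        now = self.clock()
        # A device mentioned with no state: reuse the remembered one if recent.
        if state is None and devices and self.last_time is not None:
            if now - self.last_time <= self.window:
                state = self.last_state
        # Any resolved state (explicit or remembered) refreshes the memory.
        if state is not None:
            self.last_state = state
            self.last_time = now
        return state

# Simulated clock to replay the conversation from the comment.
t = [0.0]
mem = StateMemory(clock=lambda: t[0])
print(mem.resolve("on", ["lights"]))  # "turn on the lights" -> on
t[0] = 5.0
print(mem.resolve(None, ["tv"]))      # "The TV too."        -> on
t[0] = 60.0
print(mem.resolve(None, ["tv"]))      # too late             -> None
```

The same rule explains the mis-trigger: after "turn off the lights", a casual "it controls your TV, too?" is a device with no state, so it inherits "off".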
Ultimately I've got the project on hold for now because I can't find a decent, non-commercial way of converting voice to text. I'd really rather not send my audio out to Amazon/Google/MS/IBM. Not just because of privacy, but also cost and the "coolness" factor (I want as much as possible processed locally and open-source).
CMUSphinx's detection was mostly very bad. I couldn't do complex NLP even if I wanted to, because it picks up broken/garbled sentences. I currently build a "most likely" sentence by looping through Sphinx's 20 best interpretations of the sentence and grabbing all the words that are likely to be commands or smart objects. I tried setting up Kaldi but didn't get it working after a weekend, and haven't tried again since. I don't really know any other options aside from CMUSphinx, Kaldi, or a cloud SaaS.
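A minimal sketch of the "loop over the N-best list and keep the command words" trick, in Python. It assumes the 20 hypotheses have already been pulled out of Sphinx as plain lowercase strings (retrieving them from pocketsphinx is not shown here). `KEYWORDS`, `merge_nbest` and `min_votes` are hypothetical; ordering the result by how many hypotheses contain each word is an addition for illustration, and `min_votes=1` reproduces "grab all the likely words".

```python
from collections import Counter

# Hypothetical vocabulary of command words and smart-object names.
KEYWORDS = {"on", "off", "change", "lights", "light", "tv", "television"}

def merge_nbest(hypotheses, keywords=KEYWORDS, min_votes=1):
    """Collect keywords from an N-best list of recognizer hypotheses.

    Each keyword is counted at most once per hypothesis; words found in at
    least `min_votes` hypotheses are returned, most-voted first (ties keep
    first-seen order, which Counter.most_common guarantees).
    """
    votes = Counter()
    for hyp in hypotheses:
        seen = set()
        for word in hyp.lower().split():
            if word in keywords and word not in seen:
                seen.add(word)
                votes[word] += 1
    return [word for word, n in votes.most_common() if n >= min_votes]

nbest = ["turn on the lights", "turn on the light",
         "turn an the lights", "tern on the nights"]
print(merge_nbest(nbest))               # ['on', 'lights', 'light']
print(merge_nbest(nbest, min_votes=2))  # ['on', 'lights']
```

The merged word list can then go straight into a bag-of-words parser, since word order was never needed.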
I've wanted to add a text messaging UI layer to it. Maybe I'll use that as an excuse to try playing with ParseySaurus.
Same concern here... so my voice-to-text method is Android's Google voice recognition, forced into offline mode. The offline mode is surprisingly good.
Re mis-triggers... I also have OpenCV running on the same Android device. It only activates voice recognition when I am looking directly at the device (an old phone).
May also be interpreted as:
Alexa, set fire to the kitchen
The bit about guessing the part of speech, stem, etc. for previously unseen words should (I think) make it much more useful in contexts that succumb to neologizing, verbing nouns, nouning verbs, and so on (such as business writing, technical writing, academic papers, science fiction & fantasy, slang, etc.).
I wonder how well it would do at parsing something that seems deliberately impenetrable, like TimeCube rants, or postmodern literary criticism.
