
Deep Learning Sentiment Analysis for Movie Reviews Using Neo4j - kennybastani
http://www.kennybastani.com/2014/09/deep-learning-sentiment-analysis-for.html
======
lightsidelabs
How does this compare to other benchmarks? It looks like you're using the
sentence-level dataset out of Cornell, based on your GitHub. Even a naive
unigram baseline easily beats the 70% threshold you mentioned in your post. A
few years ago, I co-authored [1] a publication with very similar graph-based
features on this dataset that achieved 77% accuracy, and the state of the art
has moved beyond that since then. Without a comparison to a baseline, it's
hard to tell whether this (much more sophisticated) technique is adding value.

[1] Shilpa Arora, et al. "Sentiment classification using automatically
extracted subgraph features." Proceedings of the NAACL HLT 2010 Workshop on
Computational Approaches to Analysis and Generation of Emotion in Text.
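For context, a naive unigram baseline of the kind mentioned above can be sketched as a multinomial Naive Bayes over word counts. This is a toy illustration with hypothetical example sentences, not the Cornell dataset:

```python
import math
from collections import Counter

# Hypothetical toy sentences standing in for sentence-level review data.
train_data = [
    ("a moving and beautiful film", "pos"),
    ("brilliant acting and a great script", "pos"),
    ("funny warm and thoroughly enjoyable", "pos"),
    ("a dull and tedious mess", "neg"),
    ("poorly written and badly acted", "neg"),
    ("boring predictable and far too long", "neg"),
]

def train_nb(examples):
    """Count unigrams per class for a multinomial Naive Bayes model."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    vocab = set(counts["pos"]) | set(counts["neg"])
    totals = {c: sum(counts[c].values()) for c in counts}
    return counts, vocab, totals

def classify(text, counts, vocab, totals):
    """Pick the class with the highest smoothed log-likelihood."""
    scores = {}
    for c in counts:
        score = 0.0
        for w in text.split():
            # Add-one (Laplace) smoothing over the shared vocabulary
            score += math.log((counts[c][w] + 1) / (totals[c] + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

counts, vocab, totals = train_nb(train_data)
print(classify("a brilliant and enjoyable film", counts, vocab, totals))  # pos
print(classify("tedious and poorly acted", counts, vocab, totals))        # neg
```

On real data, a baseline like this (or a unigram logistic regression) is the natural point of comparison before adding graph-based features.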

~~~
kennybastani
The benchmarks really depend on the classifier used. The explanation in this
blog post is more about the natural language parsing model. I came up with
that part based on books I read from thought leaders (Gleick, Kurzweil, J.
Hawkins). The value I'm hoping to provide is less in my own research and more
in the use of a graph database. I'll leave the research part to fine people
like you.

As for benchmarks, I've seen differences at different sample sizes during
training. The model seems to do better with more training examples, though
that increases the number of features and the dimension of the vectors when
calculating cosine similarity. I'm really hoping to attract more input like
this as Graphify grows as an open source project. Please feel free to get in
touch with me. Skype is kenny.bastani.

I'll post benchmarks in the next blog post.

------
izyda
Nice post, but when you say you used deep learning, what exactly do you mean?
You describe your method for picking your features, and then say you used deep
learning to find the features that presumably should be the most informative
for classification.

It would be helpful to know what specific deep learning algorithm you used
(convolutional, deep belief?). Or at the very least, whose implementation of
neural nets you used in your model, and how it compares performance-wise to
more conventional NLP tools (when you give them the same original features to
start with).

~~~
kennybastani
Great question. To be honest, I used deep learning as a metaphor for Neo4j's
property graph data model. Graph databases like Neo4j store data as a graph,
which is a data structure similar to a neural network. I store weights in the
relationships based on how frequently a feature has been matched, from the
low-level representations near the bottom of the tree up to the higher-level
representations.
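The weight-incrementing idea can be sketched with a hypothetical in-memory stand-in for the property graph (Graphify does this inside Neo4j; the feature strings and labels here are made up for illustration):

```python
from collections import defaultdict

# Hypothetical stand-in for Neo4j relationships: each (feature, label)
# edge carries a weight, incremented whenever the feature matches a
# training example.
edge_weight = defaultdict(int)

def train_example(text, label, features):
    """Bump the weight on every feature->label edge that matches."""
    for feature in features:
        if feature in text:
            edge_weight[(feature, label)] += 1

# Low-level features (single words) alongside the higher-level
# phrase features built on top of them.
features = ["good", "bad", "really good", "really bad"]

train_example("a really good movie", "positive", features)
train_example("really good fun", "positive", features)
train_example("a really bad movie", "negative", features)

print(edge_weight[("good", "positive")])        # 2
print(edge_weight[("really bad", "negative")])  # 1
```

In the actual extension these counts live on relationships in the database, so the trained model is persisted and queryable rather than held in memory.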

So there are two parts: building a natural language parsing model, and then a
Vector Space Model classifier that uses TF-IDF weights as vectors to calculate
the cosine similarity between inputs.
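The classifier half of that can be sketched in a few lines: build TF-IDF vectors over a shared vocabulary, then compare them with cosine similarity (toy documents, standard TF-IDF with a plain log inverse document frequency):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build TF-IDF vectors over the documents' shared vocabulary."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    # Document frequency: in how many docs each word appears
    df = Counter(w for toks in tokenized for w in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[w] * math.log(n / df[w]) for w in vocab])
    return vectors

def cosine(a, b):
    """Cosine similarity between two vectors; 0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

docs = ["great fun great acting", "great fun", "dull boring plot"]
v = tfidf_vectors(docs)
# The two "great fun" reviews score much higher against each other
# than either does against the "dull boring" one (which shares no terms).
print(cosine(v[0], v[1]))
print(cosine(v[0], v[2]))
```

To classify, an input review's vector is compared against labeled training vectors (or class centroids), and the nearest by cosine similarity wins.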

I explain more about the high-level idea here:
[http://bit.ly/1lMjSm5](http://bit.ly/1lMjSm5)

Let it be known that I've arrived at most of this stuff by means of intuition
and graph data modeling in Neo4j. I'm a hobbyist when it comes to the machine
learning stuff. My goal is to show how amazing a combined
application/persistency solution, like a Neo4j extension, is for solving these
kinds of machine learning problems.

People smarter than me should take a look at it to solve similar problems.

