
Implementing a CNN for Text Classification in Tensorflow - dennybritz
http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/
======
cogware
Great tutorial - well written, with good patterns for TensorFlow usage, e.g.
checkpointing, name scopes for cleaning up the graph visualization, and
summaries/TensorBoard, plus nice explanations of the concepts.

Though I'm curious why you used VALID padding rather than SAME for the conv
layers? It seems like SAME would be simpler.

Also, a minor nit: TensorFlow and TensorBoard should both have two letters
capitalized.

~~~
dennybritz
Thanks! I think VALID and SAME would probably give similar results. The only
reason I used VALID is that the original paper seems to do the same.
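
For concreteness, here's roughly what the two modes look like (shapes are made
up for illustration, not taken from the tutorial):

```python
import tensorflow as tf

# Embedded sentences: [batch, sentence_length, embedding_dim, channels].
x = tf.placeholder(tf.float32, [None, 56, 128, 1])
# Filter of height 3 spanning the full embedding width.
W = tf.Variable(tf.truncated_normal([3, 128, 1, 100], stddev=0.1))

# VALID: no zero-padding; output is [batch, 56 - 3 + 1, 1, 100].
conv_valid = tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding="VALID")

# SAME: zero-pads so the output is [batch, 56, 128, 100]. Note it pads
# along the embedding axis too, so the pooling that follows would need
# to change accordingly.
conv_same = tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding="SAME")
```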

I will fix the capitalization!

------
primaryobjects
Looks neat. Why did you bother using <PAD> words to make the sentences the same
length, when you're using a bag-of-words (document-term matrix) model anyway?

Each sentence vector ends up being the length of the vocabulary, so they're
already the same length. You can probably drop step #3 in this case.

~~~
dennybritz
Hi! It is not using a BoW model. Each input sentence is a vector of size
[sentence_length] (or, in theory, a matrix of size [vocab_size,
sentence_length] with one-hot vectors), so the padding is required.
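
Concretely, the padded input looks something like this (the vocabulary and
lengths here are made up):

```python
import numpy as np

# Hypothetical vocabulary; index 0 is reserved for <PAD>.
vocab = {"<PAD>": 0, "the": 1, "movie": 2, "was": 3, "great": 4, "awful": 5}

def pad_sentence(tokens, max_length, pad_index=0):
    """Map tokens to vocabulary indices and right-pad to max_length."""
    ids = [vocab[t] for t in tokens]
    return ids + [pad_index] * (max_length - len(ids))

# Every sentence becomes a fixed-size index vector, so they can be batched.
batch = np.array([
    pad_sentence(["the", "movie", "was", "great"], max_length=6),
    pad_sentence(["awful"], max_length=6),
])  # shape (2, 6)
```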

There is a way to do it without padding, but it's less efficient from a
training point of view: you could instantiate a new network for each possible
sentence length, share the parameters between them, and then batch by sentence
length.
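
A rough sketch of that parameter-sharing idea, using tf.get_variable (all
names and sizes are hypothetical):

```python
import tensorflow as tf

vocab_size, embedding_dim = 10000, 128  # illustrative sizes

def embed(x, reuse):
    # One graph per sentence length; tf.get_variable shares weights by name.
    with tf.variable_scope("shared", reuse=reuse):
        W = tf.get_variable("W_embed", [vocab_size, embedding_dim])
        return tf.nn.embedding_lookup(W, x)

# One input placeholder per bucketed sentence length, all sharing one W_embed.
x_len10 = tf.placeholder(tf.int32, [None, 10])
x_len20 = tf.placeholder(tf.int32, [None, 20])
emb10 = embed(x_len10, reuse=False)  # first call creates the variable
emb20 = embed(x_len20, reuse=True)   # second call reuses it
```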

Also, the padding isn't strictly necessary in theory. The feature vector will
always end up being the same length, regardless of sentence length, due to the
pooling layer. However, TensorFlow forces you to specify the exact size of the
pooling operation (you can't just say "pool over the full input"), so you need
it if you're using TF.
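
For example, something like this (sizes are illustrative):

```python
import tensorflow as tf

sequence_length, filter_size, num_filters = 56, 3, 100  # illustrative

# Output of a VALID convolution:
# [batch, sequence_length - filter_size + 1, 1, num_filters].
conv = tf.placeholder(tf.float32,
                      [None, sequence_length - filter_size + 1, 1, num_filters])

# The pooling window must be spelled out exactly; there is no
# "pool over whatever is left" option, hence the fixed sentence length.
pooled = tf.nn.max_pool(
    conv,
    ksize=[1, sequence_length - filter_size + 1, 1, 1],
    strides=[1, 1, 1, 1],
    padding="VALID")
# pooled: [batch, 1, 1, num_filters] - one feature per filter,
# independent of the original (unpadded) sentence length.
```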

------
nl
This looks pretty nice. It's worth pointing to the seq2seq TensorFlow example,
which covers a lot of similar topics.

Is there an example anywhere of how to initialize from the word2vec embeddings?

~~~
dennybritz
I'm not sure if there is an example somewhere in the TF docs, but initializing
a variable is pretty easy. All you need to do is:

session.run(W.assign(numpy_word2vec_matrix)), where W is the embedding matrix
created in the first layer of the code. [1]

Of course you'd first need to load word2vec and filter its vocabulary to match
your own vocabulary. That's most of the code and not specific to TensorFlow.
You could use gensim [2] for that.

[1]
[https://www.tensorflow.org/versions/master/api_docs/python/state_ops.html#Variable](https://www.tensorflow.org/versions/master/api_docs/python/state_ops.html#Variable)

[2]
[https://radimrehurek.com/gensim/index.html](https://radimrehurek.com/gensim/index.html)
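
If it helps, a rough sketch of that filtering step (the vocabulary and file
name are made up, and gensim's loading API has changed between versions):

```python
import numpy as np
from gensim.models import Word2Vec

# Hypothetical vocabulary from the tutorial's preprocessing step.
vocab = {"<PAD>": 0, "the": 1, "movie": 2}
embedding_dim = 300

# Load the pretrained vectors (file name is illustrative).
w2v = Word2Vec.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# Pretrained vector where available, small random init otherwise.
init = np.random.uniform(-0.25, 0.25,
                         (len(vocab), embedding_dim)).astype(np.float32)
for word, idx in vocab.items():
    if word in w2v.vocab:
        init[idx] = w2v[word]

# Then, with W being the embedding Variable from the first layer:
# session.run(W.assign(init))
```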

~~~
nl
BTW, your CNN for NLP post is interesting too. You might find _LSTM-based Deep
Learning Models for Non-Factoid Answer Selection_ [1] (from the IBM Watson
team) worth a look.

They combine a CNN with an LSTM for question answering on complex, non-factoid
questions. Their LSTM+Attention model performs slightly better, but it's a
pretty interesting approach.

[1] [http://arxiv.org/abs/1511.04108](http://arxiv.org/abs/1511.04108)

