

Text Understanding from Scratch Using Temporal Convolutional Networks - drewvolpe
http://arxiv.org/abs/1502.01710

======
sgt101
Astonishing? 3% better than bag of words after n days of training on GPUs? I
must have misunderstood, because I am not astonished.

~~~
bradneuberg
It works at the lower character level rather than the word level, which is
unique. The convolutional net also doesn't need to be told what each word's
role is, but rather learns that feature itself.
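To make the character-level idea concrete, here is a minimal sketch of one-hot character quantization; the alphabet, frame length, and function names are illustrative assumptions, not the paper's exact configuration:

```python
# Sketch: feed the net raw characters, not words. Each character in a
# fixed-length frame becomes a one-hot row; the net never sees word
# boundaries or word roles. Alphabet and FRAME are illustrative choices.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 "
CHAR_INDEX = {c: i for i, c in enumerate(ALPHABET)}
FRAME = 16  # fixed input length seen by the convolutional layers

def quantize(text):
    """Return a FRAME x len(ALPHABET) one-hot matrix for `text`.

    Characters outside the alphabet, and padding positions past the end
    of the text, become all-zero rows.
    """
    frame = [[0] * len(ALPHABET) for _ in range(FRAME)]
    for pos, ch in enumerate(text.lower()[:FRAME]):
        idx = CHAR_INDEX.get(ch)
        if idx is not None:
            frame[pos][idx] = 1
    return frame

# "!" at position 9 is outside the alphabet, so its row stays all-zero,
# as do the padding rows after the text ends.
m = quantize("Deep nets!")
```

The point of the sketch is that no word-level annotation ever enters the input; any notion of a word's role has to be learned by the convolutions themselves.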

~~~
ninjin
> It works at the lower character level rather than the word level, which is unique.

No, it is not unique. We have, among other things seen character-level
language models (Sutskever et al. 2011) [1] and character-level part-of-speech
tagging (Santos et al. 2014) [2]. What is unique are the convolutional
aspects.

I am still far from convinced. The baselines are really weak. Sure, these are
new datasets and the authors want to use the same baseline for all tasks, but
a Bag-of-Words model is pretty much the weakest baseline there is for Natural
Language Processing tasks. Also, using the 5,000 most frequent words will hurt
the BoW model for plenty of tasks, since it will cover mostly function words
rather than rare nouns, due to the Zipfian nature of language. It is pretty
much common knowledge that these rare nouns can be far more useful than
function words for tasks such as topic classification.

[1]: http://www.cs.utoronto.ca/~ilya/pubs/2011/LANG-RNN.pdf

[2]: http://jmlr.org/proceedings/papers/v32/santos14.pdf
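The frequency-cutoff effect described above can be sketched directly; the toy corpus and the N=3 cap are illustrative assumptions standing in for a real corpus and the paper's 5,000-word limit:

```python
# Sketch of vocabulary truncation: capping a BoW vocabulary at the N most
# frequent tokens keeps function words and drops rare, topic-bearing
# nouns. Toy corpus and N=3 are illustrative stand-ins for the paper's
# 5,000-word setup.
from collections import Counter

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the neutrino detector on the mountain",  # "neutrino" is the topic word
]
counts = Counter(tok for doc in corpus for tok in doc.split())
vocab = [tok for tok, _ in counts.most_common(3)]  # only "the", "on", "sat" survive

def bow_vector(doc, vocab):
    """Bag-of-words counts over a fixed, frequency-capped vocabulary."""
    c = Counter(doc.split())
    return [c[tok] for tok in vocab]

# The rare topic noun "neutrino" falls outside the capped vocabulary, so
# this document's only surviving feature is the function word "the".
v = bow_vector("the neutrino detector", vocab)
```

Under a Zipfian distribution the head of the frequency list is dominated by exactly these function words, so a frequency cap cuts the rare nouns that would discriminate between topics.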

~~~
fspeech
Not an expert here, but could the fact that character-based techniques work at
all indicate that linguistics-inspired ML may be superfluous? The authors
argued that biological considerations should point to phoneme-based training.
Since Chinese romanization corresponds tightly to phonemes (there are no
irregular pronunciations as in English), the approach worked well with Chinese
pinyin, even though the native Chinese writing system is totally different,
with thousands of characters.

What is interesting to me is that if ConvNets work well both for language and
for visual processing, that may well be because the human circuitry for
processing both is very similar, while formalized grammar sits at a different
level (like logic) above speech, as opposed to the linguistic view of a
universal grammar undergirding speech.

------
eva1984
What surprises me is that a BoW model + logistic regression works just fine on
most of the benchmarks (except for Amazon Review). Interesting paper anyway.
Could it be that, because the vocabulary for BoW is limited to 5,000 words, a
lot of information is lost?

~~~
ameasure
Fascinating paper, but the benchmarks seem incredibly weak. 5,000 features for
a bag-of-words model is nothing; these models normally have tens or hundreds
of thousands of features.

~~~
MayanAstronaut
True, its comparisons do a lot better with more features.

This paper looks to just show the major winning aspect of using ConvNets:
they do not need many hand-engineered features, since the deep net learns its
own representations of the training data. It is more to show that ConvNets
work on more than just vision.

But architecting the pooling layers IS adding complexity to the simple input
feature set. Therefore the comparison should be against state-of-the-art ML
only.

------
sushirain
Open questions:

* Compare to RNNs with character level input.

* Compare to dedicated methods of sentiment analysis and topic categorization.

------
petercooper
This paper is pretty fascinating, thanks! Having trouble visualizing it, but
getting there...

