
Did they release any trained model like Google did for word2vec?



ConceptNet Numberbatch (https://github.com/LuminosoInsight/conceptnet-numberbatch) is a pre-trained model that outperforms the results reported in this paper (and of course far outperforms the pre-trained word2vec models, which are quite dated).
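
If you want to try it, the Numberbatch releases are plain text files in word2vec format, so gensim can load them directly. A minimal sketch (the filename is illustrative; check the repo's releases for the current one, and note that the multilingual file prefixes terms with ConceptNet URIs like /c/en/jazz):

    # Minimal sketch: load ConceptNet Numberbatch with gensim.
    # Assumes an English-only release file; the filename is illustrative.
    from gensim.models import KeyedVectors

    vectors = KeyedVectors.load_word2vec_format(
        'numberbatch-en.txt.gz', binary=False)

    # Nearest neighbors by cosine similarity
    print(vectors.most_similar('jazz', topn=5))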

Here are the almost-comparable evaluations (Spearman correlation on word-similarity benchmarks):

              fastText    Numberbatch
    en:RW          .46           .601
    en:ws353       .73           .802
    fr:rg65        .67           .789
The difference should actually be larger: Numberbatch treats missing vocabulary as its own problem and takes the corresponding loss of accuracy, while fastText simply drops out-of-vocabulary words and reports them as a separate statistic.
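
To make the OOV point concrete, here is a sketch of the two scoring policies on a word-similarity benchmark. The pairs list and the 0.0 stand-in for missing pairs are my own illustration, not either system's exact evaluation code:

    # Sketch: score Spearman correlation under two OOV policies.
    # `vectors` is a gensim KeyedVectors; `pairs` is a list of
    # (word1, word2, human_score) tuples from e.g. ws353.
    from scipy.stats import spearmanr

    def evaluate(vectors, pairs, drop_oov=True):
        gold, predicted = [], []
        for w1, w2, human in pairs:
            if w1 in vectors and w2 in vectors:
                predicted.append(vectors.similarity(w1, w2))
            elif drop_oov:
                continue  # fastText-style: skip, report OOV separately
            else:
                predicted.append(0.0)  # Numberbatch-style: take the hit
            gold.append(human)
        return spearmanr(gold, predicted).correlation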

I'm using their Table 3 here. I don't know how Table 2 relates, or why their French score goes down with more data in that table.

What's the trick? Prior knowledge, and not expecting one neural net to learn everything. Numberbatch knows a lot about a lot of words because of ConceptNet; it knows which words are forms of the same word because it uses a lemmatizer; and it folds in distributional information from word2vec and GloVe.
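
The lemmatizer part is easy to sketch: when a surface form is missing, fall back to its lemma before giving up. This is a rough illustration, not Numberbatch's actual pipeline (the WordNet lemmatizer here is just a convenient stand-in):

    # Rough sketch of lemma fallback for OOV lookups; not the real
    # Numberbatch pipeline. Requires nltk.download('wordnet') once.
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()

    def lookup(vectors, word):
        if word in vectors:
            return vectors[word]
        lemma = lemmatizer.lemmatize(word)  # e.g. 'dogs' -> 'dog'
        if lemma in vectors:
            return vectors[lemma]
        return None  # still out of vocabulary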


It would be nice to have an FB-curated set of classification models, but I wonder if it would be much more than sentiment labels (as mentioned). Those are a dime a dozen.



