
Link to original HN submission: https://news.ycombinator.com/item?id=14337275

It's worth noting for future reference that for supervised learning of labels from text documents, fasttext (https://github.com/facebookresearch/fastText) is leagues ahead of conventional approaches in both accuracy and training speed. There is also a Python interface (https://github.com/salestock/fastText.py) for use with Django/Flask, although recent fasttext changes have unfortunately broken it for now.
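To make the idea behind fasttext's supervised mode concrete, here is a toy, pure-Python sketch of the same kind of model. It does not use the fasttext library or its API. Each word and word bigram gets an embedding, a document is represented by the average of its feature embeddings, and a linear softmax layer on top is trained with SGD on cross-entropy. The names `features` and `TinyFastTextClassifier`, the example documents, and all hyperparameters are hypothetical. The real tool also hashes n-grams into buckets, offers hierarchical softmax, decays the learning rate, and is written in C++ for speed.

```python
import math
import random


def features(text):
    """Hypothetical feature extractor: lowercase words plus adjacent word bigrams."""
    words = text.lower().split()
    return words + [a + " " + b for a, b in zip(words, words[1:])]


class TinyFastTextClassifier:
    """Toy fastText-style classifier (hypothetical): averaged n-gram embeddings + linear softmax."""

    def __init__(self, dim=10, lr=0.3, epochs=100, seed=0):
        self.dim, self.lr, self.epochs = dim, lr, epochs
        self.rng = random.Random(seed)
        self.emb = {}     # feature string -> embedding vector (created lazily)
        self.labels = []  # class index -> label
        self.W = []       # one output weight vector per class

    def _vec(self, feat):
        if feat not in self.emb:
            self.emb[feat] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.dim)]
        return self.emb[feat]

    def _hidden(self, feats):
        # Document vector = mean of its feature embeddings.
        vecs = [self._vec(f) for f in feats]
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    def _probs(self, h):
        # Softmax over class scores W_k . h (max subtracted for numerical stability).
        scores = [sum(w * x for w, x in zip(wk, h)) for wk in self.W]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def fit(self, docs, labels):
        self.labels = sorted(set(labels))
        self.W = [[0.0] * self.dim for _ in self.labels]
        data = [(features(d), self.labels.index(y)) for d, y in zip(docs, labels)]
        data = [(f, y) for f, y in data if f]  # skip empty documents
        for _ in range(self.epochs):
            self.rng.shuffle(data)
            for feats, y in data:
                h = self._hidden(feats)
                p = self._probs(h)
                # d(cross-entropy)/d(score_k) = p_k - [k == y]
                g = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
                # Gradient w.r.t. the document vector, using W before it is updated.
                gh = [sum(g[k] * self.W[k][j] for k in range(len(g))) for j in range(self.dim)]
                for k in range(len(g)):
                    self.W[k] = [w - self.lr * g[k] * x for w, x in zip(self.W[k], h)]
                # Each feature occurrence contributed 1/n of the mean, so it gets 1/n of gh.
                n = len(feats)
                for f in feats:
                    v = self.emb[f]
                    for j in range(self.dim):
                        v[j] -= self.lr * gh[j] / n
        return self

    def predict(self, text):
        # Ignore features never seen in training; return None if nothing is known.
        known = [f for f in features(text) if f in self.emb]
        if not known:
            return None
        p = self._probs(self._hidden(known))
        return self.labels[max(range(len(p)), key=p.__getitem__)]


# Hypothetical toy training data.
DOCS = [
    "the match ended with a late goal",
    "the striker scored a goal",
    "the team won the football match",
    "the new phone has a fast chip",
    "the laptop runs the new software",
    "the chip makes the phone fast",
]
LABELS = ["sports", "sports", "sports", "tech", "tech", "tech"]

if __name__ == "__main__":
    clf = TinyFastTextClassifier().fit(DOCS, LABELS)
    print(clf.predict("striker goal"), clf.predict("phone chip"))
```

The averaging step is what keeps this family of models fast: a document costs one pass over its n-grams plus one small matrix-vector product, no matter how large the vocabulary is.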



Can you suggest any unsupervised learning approaches? I want to take a body of text associated with users and come up with keywords/topics for each user. Thanks! :)


Some fairly widely used techniques include LSI, LDA, and word2vec or doc2vec. There are a lot of different techniques out there! I'm one of the creators of Tagger News, and we used LDA with Python's Gensim package. Here's a good tutorial: https://radimrehurek.com/gensim/tut2.html
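For the "keywords/topics per user" question, one option is to treat all of a user's text as a single document and fit LDA. The following is a minimal pure-Python collapsed Gibbs sampler, written only to show what LDA estimates: per-document topic proportions (`theta`) and the most probable words per topic. It is not Gensim's API, and Gensim's `LdaModel` uses online variational Bayes rather than Gibbs sampling. The function names `lda_gibbs` and `user_keywords`, the toy `USER_DOCS` corpus, and the hyperparameters (`alpha`, `beta`, `iterations`) are illustrative assumptions.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, top_n=5, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of token lists (e.g. one list per user).
    Returns (theta, topics): per-document topic proportions and the
    top_n most frequent words assigned to each topic.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]      # tokens in doc d assigned to topic k
    nkw = [[0] * V for _ in range(K)]  # times word w is assigned to topic k
    nk = [0] * K                       # total tokens assigned to topic k
    z = []                             # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], z[d][i]
                # Remove this token from the counts ...
                ndk[d][k] -= 1
                nkw[k][wid] -= 1
                nk[k] -= 1
                # ... resample its topic from p(z = t | all other assignments) ...
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wid] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # ... and add it back under the sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wid] += 1
                nk[k] += 1

    theta = [
        [(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    topics = [
        [vocab[w] for w in sorted(range(V), key=lambda w: nkw[k][w], reverse=True)[:top_n]]
        for k in range(K)
    ]
    return theta, topics


def user_keywords(user_docs, num_topics=2, top_n=5, seed=0):
    """Hypothetical helper: label each user with the top words of their dominant topic."""
    users = list(user_docs)
    theta, topics = lda_gibbs([user_docs[u] for u in users], num_topics, top_n=top_n, seed=seed)
    return {u: topics[max(range(num_topics), key=row.__getitem__)] for u, row in zip(users, theta)}


# Hypothetical toy corpus: all of one user's text concatenated into one token list.
USER_DOCS = {
    "alice": "goal match striker goal team match score goal striker team".split(),
    "bob": "chip phone laptop software chip phone laptop software chip".split(),
    "carol": "match goal team striker score match".split(),
    "dave": "software laptop chip phone laptop".split(),
}

if __name__ == "__main__":
    for user, words in user_keywords(USER_DOCS).items():
        print(user, words)
```

For real corpora, Gensim (linked above) is the practical choice. With only a handful of short documents, as in this toy example, the topics any sampler finds can vary with the random seed.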


Note that fasttext is the next generation of word2vec/doc2vec, and shares many of the same creators.


How does fasttext compare to vowpal wabbit?



Vowpal Wabbit is approx. 4-8x faster than gensim, but its accuracy will be lower than gensim's.



