
Document Clustering with Python - sytelus
http://brandonrose.org/clustering
======
rjurney
If you like this, you might like this post I wrote a while back:
[http://datasyndrome.com/post/69514893525/yelp-dataset-
challe...](http://datasyndrome.com/post/69514893525/yelp-dataset-challenge-
part-0-geographic)

------
strong_ai
It would be really interesting to see a visualization (maybe using PCA) of the
LDA vectors for each document. The topics listed don't make a convincing case
that the LDA approach worked well.

Other than that, this is a good intro to NLP and calculating document
similarity. Well done!
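
A rough sketch of the PCA idea above, assuming scikit-learn and a made-up toy corpus (the linked post's own data and pipeline are not reproduced here). The helper name `lda_pca_coords` and the `n_topics` and `seed` parameters are hypothetical, picked only for illustration:

```python
from sklearn.decomposition import PCA, LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer


def lda_pca_coords(docs, n_topics=3, seed=0):
    """Return (doc_topic, coords): per-document topic weights and
    their 2-D PCA projection. Hypothetical helper, not from the post."""
    # Bag-of-words counts; LDA expects raw term counts rather than tf-idf.
    counts = CountVectorizer(stop_words="english").fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    doc_topic = lda.fit_transform(counts)  # one row of topic weights per doc
    # Project the topic vectors down to two dimensions for plotting.
    coords = PCA(n_components=2).fit_transform(doc_topic)
    return doc_topic, coords


# Toy corpus, invented purely for this example.
docs = [
    "the pasta and pizza were delicious",
    "great pizza, terrible pasta",
    "the hotel room was clean and quiet",
    "noisy hotel with a dirty room",
    "the flight was delayed for hours",
    "smooth flight and a friendly crew",
]

doc_topic, coords = lda_pca_coords(docs)
for doc, (x, y) in zip(docs, coords):
    print(f"{x:+.3f} {y:+.3f}  {doc}")
```

Feeding `coords` to a scatter plot and coloring each point by its dominant topic (`doc_topic.argmax(axis=1)`) would show whether documents separate into visible groups or just smear together.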

------
stuartaxelowen
I'm very interested in ways we can expand LDA (and other topic models) to
retain more of the meaning of the documents, especially for small feedback
(like reviews), such that a human could explore the results and find
impactful, actionable data.

Using N-grams increases the dimensionality way too much.
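
To make the dimensionality point concrete, here is a small sketch in plain Python that counts distinct word n-grams in a few made-up reviews. The function name `ngram_vocab_size` and the sample reviews are assumptions for illustration only:

```python
def ngram_vocab_size(docs, n):
    """Count distinct word n-grams of every length from 1 to n across docs."""
    vocab = set()
    for doc in docs:
        tokens = doc.lower().split()
        for k in range(1, n + 1):
            # Slide a window of k tokens over the document.
            for i in range(len(tokens) - k + 1):
                vocab.add(tuple(tokens[i:i + k]))
    return len(vocab)


# Invented sample reviews, not real data.
reviews = [
    "great food and great service",
    "the food was cold",
    "service was slow but the food was great",
]

for n in (1, 2, 3):
    print(f"up to {n}-grams: {ngram_vocab_size(reviews, n)} features")
```

Each distinct n-gram becomes its own feature column, so the feature count keeps climbing as longer n-grams are included, and most of the new columns occur only once or twice.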

I've thought about word-vectors as an option, but am unsure how similar terms
could be grouped internally in LDA.

This kind of subject is probably worth a PhD thesis, honestly.

