

LSA: A Solution to Plato's Problem - JMiao
http://lsa.colorado.edu/papers/plato/plato.annote.html

======
urlwolf
I can comment on this, since I did my PhD work with Landauer and Kintsch and
was the webmaster of lsa.colorado.edu for a while.

Statistical semantics is a very active field right now. I'm not
sure that the SVD is easier to implement than the probabilistic versions (LDA,
topic models). The SVD code LSA uses requires sparse matrices and
runs the Lanczos algorithm; it also needs to hold the entire matrix in
memory at some point, which limits the size of the corpus you can handle.
The probabilistic versions are iterative, and while they may take more CPU
time, they are not memory-bound.
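
To make the sparse-matrix point concrete, here is a minimal sketch of a term-by-document count matrix stored as nested dicts, plus the matrix-vector product that a Lanczos-style solver calls repeatedly. This is not the actual LSA code; the toy corpus and the names `build_sparse_term_doc` and `sparse_matvec` are made up for illustration.

```python
from collections import Counter

# Toy corpus (hypothetical); a real LSA corpus has far more documents.
docs = ["the cat sat", "the dog sat", "stocks fell"]

def build_sparse_term_doc(docs):
    """Return (vocab, rows) where rows[i][j] is the count of term i in doc j.

    Only nonzero counts are stored, which is what keeps memory manageable.
    """
    vocab = {}  # term -> row index
    rows = {}   # row index -> {doc index: count}
    for j, text in enumerate(docs):
        for term, count in Counter(text.split()).items():
            i = vocab.setdefault(term, len(vocab))
            rows.setdefault(i, {})[j] = count
    return vocab, rows

def sparse_matvec(rows, x, n_terms):
    """Compute y = A x while touching only the stored nonzero entries."""
    y = [0.0] * n_terms
    for i, row in rows.items():
        y[i] = float(sum(count * x[j] for j, count in row.items()))
    return y

vocab, rows = build_sparse_term_doc(docs)
nonzeros = sum(len(row) for row in rows.values())
dense_cells = len(vocab) * len(docs)
print(f"stored {nonzeros} of {dense_cells} cells")
```

Even on this tiny corpus fewer than half the cells are stored; on a real corpus the fraction of nonzero cells is typically far smaller.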

------
thomaspaine
I believe that PLSA is generally preferred over LSA, and LDA is preferred over
PLSA. PLSA is equivalent to LDA under a uniform Dirichlet prior. I guess LSA
is probably the easiest to implement, since it just involves a singular value
decomposition, but I don't know how often it's actually used in practice
anymore.
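
As a rough illustration of "just involves a singular value decomposition", here is a minimal pure-Python sketch of a rank-k truncated SVD on a toy term-by-document matrix. It uses power iteration with deflation, a much simpler technique than the Lanczos algorithm; the matrix and the function names are assumptions made up for this example, and a real implementation would call a sparse SVD library instead.

```python
import math

def matvec(A, x):
    """Dense matrix-vector product A x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def top_triplet(A, iters=300):
    """Power iteration on A^T A: returns (sigma, u, v) for the largest singular value."""
    At = [list(col) for col in zip(*A)]
    n = len(A[0])
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = matvec(At, matvec(A, v))
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = matvec(A, v)
    sigma = math.sqrt(sum(x * x for x in Av))
    u = [x / sigma for x in Av]
    return sigma, u, v

def truncated_svd(A, k):
    """Top-k singular triplets, found by repeatedly removing the leading component."""
    A = [row[:] for row in A]  # work on a copy
    triplets = []
    for _ in range(k):
        sigma, u, v = top_triplet(A)
        triplets.append((sigma, u, v))
        for i in range(len(A)):
            for j in range(len(A[0])):
                A[i][j] -= sigma * u[i] * v[j]  # deflation step
    return triplets

# Rows are terms (cat, dog, stock, market); columns are four toy documents.
A = [[2.0, 1.0, 0.0, 0.0],
     [1.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0],
     [0.0, 0.0, 1.0, 1.0]]

triplets = truncated_svd(A, 2)
for sigma, u, v in triplets:
    # Coordinates of each document along this latent dimension.
    print(round(sigma, 3), [round(sigma * x, 3) for x in v])
```

On this matrix the two latent dimensions separate the pet documents from the finance documents, which is the kind of structure LSA is meant to recover.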

<http://en.wikipedia.org/wiki/PLSA>

<http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation>

