

Automatically generating a concept index for freeform notes - rsaarelm
http://jsomers.net/blog/semantic-notes

======
samstokes
Worth a read for the vector maths interpretation:

 _I could calculate the vectors for all of my notes and use something like the
k-means algorithm to find semantically-related clusters of notes._

If you're familiar with information retrieval techniques, there's probably
nothing new here, but eye-opening if you're rusty like me.
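The quoted idea (vector per note, then k-means) can be sketched in a few dozen lines of plain Python. This is a minimal sketch, not the post's code: it assumes TF-IDF weighting for the note vectors and a deterministic farthest-first seeding for k-means. The helper names `tokenize`, `tfidf_vectors`, `sq_dist` and `kmeans` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs; no stemming or stop-word removal.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(notes):
    """One sparse, unit-length TF-IDF vector (dict: term -> weight) per note."""
    tokenized = [tokenize(n) for n in notes]
    n_docs = len(notes)
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF so terms present in every note still get a small weight.
        vec = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))


def kmeans(vectors, k, max_iter=50):
    """Plain k-means (Lloyd's algorithm) on sparse vectors; returns one label per note."""
    # Hypothetical deterministic seeding: start from the first note, then keep
    # adding the note farthest from every centroid chosen so far.
    centroids = [dict(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(dict(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                total = Counter()
                for v in members:
                    total.update(v)
                centroids[c] = {t: w / len(members) for t, w in total.items()}
    return labels


notes = [
    "python code python bug",
    "python bug fix code",
    "garden tomato soil water",
    "tomato garden water sun",
]
labels = kmeans(tfidf_vectors(notes), k=2)
print(labels)  # [0, 0, 1, 1]
```

Real note collections would want stemming and a stop list before vectorizing, and a choice of k that is itself a judgment call.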

~~~
thomas11
Look up Latent Semantic Indexing (LSI) for how to group related terms into
concepts.
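A toy sketch of the idea behind LSI: build a term-document matrix, keep its top-k singular vectors, and compare terms in that reduced "concept" space. It assumes raw term counts (LSI setups usually apply TF-IDF or log-entropy weighting first) and a hand-rolled power-iteration SVD so it needs only the standard library; in practice you would call a library SVD. All function names are hypothetical.

```python
import math
import random
import re
from collections import Counter


def term_doc_matrix(docs):
    """Raw term counts: one row per term, one column per document."""
    counts = [Counter(re.findall(r"[a-z]+", d.lower())) for d in docs]
    terms = sorted(set().union(*counts))
    return terms, [[c[t] for c in counts] for t in terms]


def gram(rows):
    """X @ X.T for a list-of-rows matrix X (term-by-term co-occurrence)."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]


def top_eigenpairs(m, k, iterations=300, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    rng = random.Random(seed)
    n = len(m)
    m = [row[:] for row in m]  # work on a copy; deflation modifies it
    pairs = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # generic start vector
        for _step in range(iterations):
            w = [sum(a * b for a, b in zip(row, v)) for row in m]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # nothing left to find
            v = [x / norm for x in w]
        mv = [sum(a * b for a, b in zip(row, v)) for row in m]
        lam = sum(a * b for a, b in zip(v, mv))  # Rayleigh-quotient eigenvalue estimate
        pairs.append((lam, v))
        # Deflate: subtract lam * v v^T so the next pass converges to the next eigenvector.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def lsi_term_vectors(docs, k):
    """Each term's k-dimensional concept vector: its row of U_k * Sigma_k."""
    terms, x = term_doc_matrix(docs)
    pairs = top_eigenpairs(gram(x), k)
    # Eigenvectors of X X^T are X's left singular vectors; sigma = sqrt(eigenvalue).
    return {t: [math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs] for i, t in enumerate(terms)}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


docs = [
    "car engine road",
    "automobile engine road",
    "car engine wheel",
    "tomato garden soil",
    "tomato garden",
]
vecs = lsi_term_vectors(docs, k=2)
print(cosine(vecs["car"], vecs["automobile"]))  # close to 1
print(cosine(vecs["car"], vecs["tomato"]))      # close to 0
```

"car" and "automobile" never appear in the same document, yet they end up pointing the same way in concept space because both co-occur with "engine" and "road". That grouping of terms into concepts is what LSI adds over plain term vectors.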

------
chime
This is very similar to something I did a few years ago with 6-7 years of my
blog entries. I wrote a script that generates timeline-based tag clouds from
plain text: <http://chir.ag/projects/tagline/> and here's an example:
<http://chir.ag/projects/preztags/>

The basic algorithm is nearly the same, and it does use stemming (though not
synonyms, just related spellings). It takes an XML input file and spits out an
HTML file with the required JS embedded.
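A rough sketch of the counting step behind a timeline tag cloud. It assumes entries have already been parsed out of the XML into (period, text) pairs, and it uses a crude suffix-stripping stemmer as a stand-in for real stemming. This is not chime's script; `STOP`, `SUFFIXES`, `stem`, `timeline_tags` and `font_sizes` are hypothetical names.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stop list; a real one would be far longer.
STOP = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "i", "my", "we", "our", "on", "for"}

# Hypothetical crude stemmer: strip a few suffixes so related spellings
# ("taxes", "taxed", "taxing") share one key. Not Porter, and not chime's code.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word


def timeline_tags(entries, top_n=3):
    """entries: iterable of (period, text) -> {period: [(stem, count), ...]}, top_n per period."""
    counts = defaultdict(Counter)
    for period, text in entries:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOP:
                counts[period][stem(word)] += 1
    return {p: c.most_common(top_n) for p, c in sorted(counts.items())}


def font_sizes(tags, min_px=10, max_px=32):
    """Scale one period's counts linearly onto a font-size range in pixels."""
    lo = min(c for _, c in tags)
    hi = max(c for _, c in tags)
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return [(t, min_px + (c - lo) * (max_px - min_px) // span) for t, c in tags]


entries = [
    (2007, "Taxes and taxing the economy. Taxed again."),
    (2007, "Jobs, jobs and the economy."),
    (2008, "Energy policy and energy jobs."),
]
tags = timeline_tags(entries)
print(tags[2007][0])               # ('tax', 3)
print(font_sizes(tags[2007])[0])   # ('tax', 32)
```

Here the three spellings of "tax" collapse into one tag, which is the "related spelling" effect without any synonym handling. Rendering each period's list as sized HTML spans is then a straightforward templating step.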

~~~
LiveTheDream
The presidential speech example is really interesting. Do you have that
cleaned-up dataset?

