

First Pass at Palin Emails by Opani - DrJ
http://opani.com/help/sarah-palin-email

======
onecreativenerd
Here's a first pass at auto-discovering topics in that text using LDA (Latent
Dirichlet Allocation) in R:
<http://opani.com/ryan/sarah-palin-email-topics/148845806827/>

The biggest deficiency right now is that the stopword list isn't working 100%,
so mundane topics are creeping in. I'll have limited time to play with it in
the next few days, so feel free to make some cool discoveries. :)
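
For anyone who wants to poke at the stopword problem, here is a rough sketch of
the kind of pre-filtering that usually keeps mundane words out of the topics.
It is in Python rather than R, and it is not the code behind the link above:
the stopword list, the `clean_corpus` helper, and the `max_doc_frac` cutoff are
all made up for illustration. It drops listed stopwords and also any word that
shows up in too large a share of the emails, then leaves the cleaned tokens for
whatever topic model you feed them to.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration; a real one would be much longer
# and would include email boilerplate specific to this data set.
STOPWORDS = {
    "the", "a", "to", "of", "and", "is", "in", "for", "on",
    "re", "fw", "sent", "from", "subject",
}


def tokenize(text):
    """Lowercase the text and keep only runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def clean_corpus(docs, stopwords=STOPWORDS, max_doc_frac=0.8):
    """Return one token list per document, with noise words removed.

    A word is dropped if it is in `stopwords`, or if it appears in more than
    `max_doc_frac` of the documents (an assumed cutoff, tune it to taste).
    """
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter(w for toks in tokenized for w in set(toks))
    limit = max_doc_frac * len(docs)
    return [
        [w for w in toks if w not in stopwords and doc_freq[w] <= limit]
        for toks in tokenized
    ]


if __name__ == "__main__":
    # Made-up sample emails, not taken from the released data.
    sample = [
        "Subject: RE: the budget for the gasline",
        "FW: gasline meeting sent from the governor",
        "The budget meeting is on Friday",
    ]
    for tokens in clean_corpus(sample):
        print(tokens)
```

Filtering on document frequency catches words that no fixed list anticipates
(signatures, disclaimers, mail headers), which is probably where a lot of the
mundane topics come from.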

------
janesvilleseo
Awesome work, but I find it strange that all of the emails in the example post
are @yahoo.com.

~~~
alphaG77
This has been explained in the news by a number of journalists. I believe it
is because Palin used her Yahoo email account to discuss "off-the-record"
thoughts with various members of her administrative staff. However, her dumb
ass didn't realize that sending from a Yahoo account to staff email accounts
makes those received emails subpoena-able for public review. So the folks
requesting the release of email records were also able to get the inbound
emails that staffers received from Palin's Yahoo account, as well as emails
sent from staffers' personal email accounts to Palin's non-personal account.
What is missing from the entire data set would be emails from Palin's Yahoo
account to staff personal accounts and vice versa.

As one journalist I saw on the news discussing this issue put it, what's
interesting is the stuff that falls between the lines, i.e. what is redacted
or missing from certain conversations that might be referred to.

