
Stop Words as Social Signals (a result that goes against IR common knowledge) - hamilton
http://www.cpdiehl.org/2010/04/social-signaling-and-language-use.html
======
klochner
One of the two linked papers (AnnualReview.pdf) discusses formal settings,
including those with disparities in power:

'Brown & Levinson’s (1987) politeness theory takes into account an
individual’s efforts to preserve the “face(s)” of others with whom one
communicates. For example, they propose impersonalizing the speaker and hearer
by avoiding the pronouns I and you, using past tense to create distance and
time, diminishing the force of speech by using hedge words such as perhaps,
using slang to convey ingroup membership, and using inclusive forms (we and
let’s) to include speaker and hearer.'
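As a rough illustration of how strategies like these could show up as countable word-level signals, here is a minimal sketch. The marker lists are assumptions chosen for illustration, not Brown & Levinson's inventory. Under the quoted theory, a lower rate of "I"/"you" and higher rates of hedges and inclusive forms would read as more face-preserving.

```python
import re
from collections import Counter

# Illustrative marker lists only (an assumption for this sketch), loosely
# following the strategies quoted above; not Brown & Levinson's inventory.
PERSONAL = {"i", "you"}
HEDGES = {"perhaps", "maybe", "possibly"}
INCLUSIVE = {"we", "let's"}


def politeness_markers(text: str) -> dict[str, float]:
    """Return the per-token rate of each marker class in `text`."""
    # Normalize curly apostrophes so "let’s" matches "let's".
    normalized = text.lower().replace("\u2019", "'")
    tokens = re.findall(r"[a-z']+", normalized)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input

    def rate(words: set[str]) -> float:
        return sum(counts[w] for w in words) / total

    return {
        "personal": rate(PERSONAL),
        "hedge": rate(HEDGES),
        "inclusive": rate(INCLUSIVE),
    }


# Six tokens: one hedge ("perhaps"), two inclusive forms ("we", "let's").
print(politeness_markers("Perhaps we could talk? Let’s meet."))
```

Rates rather than raw counts keep short and long messages comparable.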

------
conanite
Amazing that [to, to, on, my, to, on, be, on, to, to, ...] can reveal so
much. One of the linked articles
([http://homepage.psy.utexas.edu/homepage/faculty/pennebaker/r...](http://homepage.psy.utexas.edu/homepage/faculty/pennebaker/reprints/Chung&JWP.pdf))
describes how the relative frequency of "I" in communication is a powerful
indicator of status (less "I" => higher status).

~~~
patio11
Stop word distributions are also sufficient for identifying gender in most
English corpora with about 80% accuracy -- and you don't need much text,
either.

[Edit to add linky: [http://u.cs.biu.ac.il/~koppel/papers/male-female-text-final....](http://u.cs.biu.ac.il/~koppel/papers/male-female-text-final.pdf) ]
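To make "stop word distributions as features" concrete, here is a toy sketch: each text becomes a vector of relative stop word frequencies, and a new text gets the label of the nearest class centroid. This is an assumption-laden illustration, not the method in the linked paper, which uses its own feature set and learning algorithm; the short `STOP_WORDS` list and the labels are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny hypothetical stop word list; real studies use hundreds of function words.
STOP_WORDS = ["the", "a", "of", "to", "and", "in", "on",
              "i", "you", "my", "with", "for", "it", "is"]


def profile(text: str) -> list[float]:
    """Relative frequency of each stop word among all tokens in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in STOP_WORDS]


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def train(labeled: list[tuple[str, str]]) -> dict[str, list[float]]:
    """Average the stop word profiles of the training texts for each label."""
    by_label: dict[str, list[list[float]]] = {}
    for text, label in labeled:
        by_label.setdefault(label, []).append(profile(text))
    return {label: centroid(vs) for label, vs in by_label.items()}


def classify(model: dict[str, list[float]], text: str) -> str:
    """Pick the label whose centroid is nearest (Euclidean) to the text's profile."""
    p = profile(text)
    return min(model, key=lambda label: math.dist(p, model[label]))


model = train([
    ("the cat sat on the mat of the house", "A"),
    ("I think you and I like my dog", "B"),
])
print(classify(model, "the end of the road"))  # A
```

Content words are simply ignored, which is the point of the thread: only the stop word distribution drives the decision.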

~~~
conanite
I wonder whether James Chartrand would have been found out...
<http://news.ycombinator.com/item?id=994284> [Why James Chartrand Wears
Women’s Underpants]

~~~
patio11
I put her four posts written right before the "reveal" post through the first
gender guesser I found on Google, and they all came out female. That proves
essentially nothing, granted, but this has been studied fairly extensively. If
you want to do twenty minutes of work with automation, you can grab everything
she's ever posted on CopyBlogger and run it through your classifier of choice.
Then, heck, run the guys on CopyBlogger against it and see if you peg them
right, too.

Natural language processing remains my first love.

------
tlb
He seems to be claiming that in the phrase _I don't feel comfortable signing
up for a $7,000 extra flight without talking to you guys_, it's the word _to_
that signals the manager/subordinate relationship. It seems to me that _...
without talking ... you_ is the best clue.

In English, the subjunctive mood usually requires a _to_, and people asking
permission use the subjunctive a lot. Is that the signal he's finding? Is that
the best NLP can do?

------
sesqu
I was always uncomfortable with stopwords. Ignoring some data smells like
overfitting, or perhaps a naïve algorithm (that is, one with no concept of
structure, which is often the case for computational reasons).

That said, the last time I dabbled in IR, my algorithm choked on stopwords. I
should revisit it.

------
Vivtek
Good _Lord_, that is counterintuitive!

------
rozim
See also SpotSigs ("...stopwords may however be very good indicators of the
actual interesting parts of a web page...")

[http://infoblog.stanford.edu/2008/08/spotsigs-are-stopwords-...](http://infoblog.stanford.edu/2008/08/spotsigs-are-stopwords-finally-good-for.html)
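The SpotSigs idea, roughly, is to anchor signatures on stop words ("antecedents") and record the content words that follow them, then compare pages by set overlap (Jaccard similarity) to find near-duplicates. Navigation menus and link lists rarely contain antecedents, so the signatures concentrate on running prose. Below is a simplified sketch: the antecedent set, the chain length, and the function names are assumptions for illustration, and the published method has additional parameters and indexing machinery omitted here.

```python
import re
from itertools import islice

# Illustrative parameters (assumptions for this sketch, not the paper's
# tuned settings): a few stop words act as "antecedents" that anchor
# signatures, and other stop words are skipped when building the chain.
ANTECEDENTS = {"the", "a", "an", "is", "was", "there"}
STOP_WORDS = ANTECEDENTS | {"of", "to", "and", "in", "on", "it", "that"}


def spot_signatures(text: str, chain_len: int = 2) -> set[tuple[str, ...]]:
    """Return signatures (antecedent, w1, ..., wk): each antecedent stop word
    followed by the next `chain_len` non-stop words after it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    sigs: set[tuple[str, ...]] = set()
    for i, tok in enumerate(tokens):
        if tok not in ANTECEDENTS:
            continue
        following = (t for t in tokens[i + 1:] if t not in STOP_WORDS)
        chain = list(islice(following, chain_len))
        if len(chain) == chain_len:  # skip incomplete chains at the end
            sigs.add((tok, *chain))
    return sigs


def jaccard(a: set, b: set) -> float:
    """Overlap of two signature sets; 1.0 means identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Different navigation boilerplate, same article text.
page1 = "Home About Contact. The senator said the bill was likely to pass."
page2 = "Login Search Archive. The senator said the bill was likely to pass."
print(jaccard(spot_signatures(page1), spot_signatures(page2)))  # 1.0
```

The menu words produce no signatures at all, so the two pages match on their article text alone.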

------
lallysingh
Outside the old post-2001 TIA/NSA efforts, what can this be used for?

Maybe for determining social hierarchies for mind share in product
recommendations or advertising?

Yeah, I can't think of one noncreepy way to use this.

------
akshaybhat
I think the following phrase is unnecessary: "a result that goes against IR
common knowledge"

IR aims to retrieve information, not sentiment, from text. This kind of work is
related to sentiment analysis, where using stop words as signals is common.
Thus the argument that some holy criterion of IR has been shown to be false is
clearly incorrect. Even the original authors don't make such assertions.

