
Do Social News Sites Deteriorate? (Analysis of 1.8M HN Comments) - tansey
http://blog.effectcheck.com/2011/05/31/do-social-news-sites-deteriorate/
======
pg
"Thus, if we look at the PG analysis as a measure of his mental state, we can
speculate this is due to the prosperity he’s experienced in these last few
years."

Actually it's because the site has become so large that I have to be more
diplomatic. When it was small I could say harsh-sounding things without
worrying about being misinterpreted. Now I have to be more of a politician.

~~~
tansey
That was my other hypothesis, but it didn't really correlate well with the
growth in the site's user base. HN has experienced an exponential growth rate,
but your comments show a linear decline in negative impact. I went with the
"better life" theory because it seems like personal prosperity and happiness
would be more likely to show gradual, linear change. This is also supported by
the trends I saw in other famous top HN users, particularly those that I know
have experienced increased financial prosperity.

~~~
staunch
Instant link bait if you let us look up our own charts.

------
tansey
Hi everyone,

So there seems to be a lot of skepticism about whether our algorithm can
actually measure emotional impact accurately. For the long answer, I'll refer
you to the about page[1] for EffectCheck. For the short answer:

My co-founders are an AI PhD and a Clinical Psychology PhD. They spent three
years curating a huge dictionary of words using a methodology similar to the
Harvard Psychosocial Dictionary [2], but with the twist that they were focused
on lexical impact of words. The dictionary is pretty accurate at measuring
both impact and sentiment [3]. For example, we can predict Amazon reviews as
being positive or negative, using the stars to validate if we are correct--
blog post on that coming soon.

Regarding context of the word usage: Both behavioral studies and fMRI scans
have confirmed that context is not as important as one might believe. Our
brains process multiple meanings of words in parallel, and the emotions
associated with those words linger in our subconscious even after we know the
correct context. Similarly, in cases of reviews and comments, people who use
hostility-evoking words are often hostile themselves (angry people tend to
make others angry) and the same is true of the other five fundamental emotions
we measure.
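To make the dictionary approach concrete, here is a minimal sketch of lexicon-based scoring: tokenize, look each token up, and count hits per emotion. The `TOY_LEXICON` entries and the `score` function are hypothetical and invented for illustration. EffectCheck's actual dictionary and weighting are not public, so this shows only the general shape of the technique.

```python
import re
from collections import Counter

# Hypothetical toy lexicon mapping a word to the emotion it is assumed to
# evoke. These entries are invented; EffectCheck's dictionary is not public.
TOY_LEXICON = {
    "worried": "anxiety", "afraid": "anxiety",
    "kill": "hostility", "kills": "hostility", "hate": "hostility",
    "hopeless": "depression", "sad": "depression",
    "certain": "confidence", "sure": "confidence",
    "care": "compassion", "kind": "compassion",
    "joy": "happiness", "great": "happiness",
}

EMOTIONS = ("anxiety", "hostility", "depression",
            "confidence", "compassion", "happiness")


def score(text):
    """Return, for each emotion, lexicon hits divided by token count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = Counter(TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON)
    n = max(len(tokens), 1)  # avoid dividing by zero on empty text
    return {e: hits[e] / n for e in EMOTIONS}


print(score("That joke kills me!")["hostility"])   # 0.25
print(score("I'm gonna kill you!")["hostility"])   # 0.25
```

Because the lookup ignores context, both sentences get the same hostility score (one hit in four tokens). That is the case hugh3 raises further down the thread, and it is the behavior the "context matters less than you think" argument is defending.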

Happy to answer more questions. Also happy to analyze any data that you would
like to see in order to verify accuracy of the algorithm-- just give me a link
or the text. :)

[1] <http://effectcheck.com/about>

[2] <http://www.wjh.harvard.edu/~inquirer/homecat.htm>

[3] Note that sentiment is correlated to impact only if the writer or speaker
is writing without detailed attention to word choice. For example, political
speech writers comb over every word to make sure they have the desired
impact-- thus, the final text likely has little correlation to the sentiment
of the orator.

~~~
hugh3
I can easily believe that you can predict quite well ratings of Amazon reviews
by a simple keyword-counting algorithm. Just by counting words like "good",
"enjoyed", "fantastic" vs "terrible", "boring", "awful" you're gonna get a
very strong correlation in that limited domain.

But what tests have you done on your broader methodology? What experiments can
you really do to figure out the extent to which the use of the word "nosegay"
is correlated with _actual_ depression?

Also, as someone else said, where are the error bars? If there really _is_ a
correlation between word choice and other metrics, then some simple statistics
should give you error bars on your other metrics, right?

Oh, one more thing: in the example on your website you say that the sentence:

"That joke kills me!"

is "subconsciously" aggressive. My question: would your algorithm rate that at
exactly the same level of aggression as the sentence:

"I'm gonna kill you!"?

cuz, y'know, intuitively one seems rather more aggressive than the other.
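The keyword-counting baseline described above can be sketched in a few lines and validated against star ratings. The word lists come from the comment itself. The split (4-5 stars positive, 1-2 negative, 3-star reviews skipped) and the sample reviews are assumptions for illustration, not anyone's real evaluation.

```python
import re

# Word lists taken from the examples in the comment above.
POSITIVE = {"good", "enjoyed", "fantastic"}
NEGATIVE = {"terrible", "boring", "awful"}


def predict_positive(review):
    """Predict a positive review if positive keywords outnumber negative ones."""
    tokens = re.findall(r"[a-z]+", review.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos > neg


def accuracy(reviews):
    """reviews: list of (text, stars). Assumes 4-5 stars means positive and
    1-2 stars negative; 3-star reviews are skipped as ambiguous."""
    correct = total = 0
    for text, stars in reviews:
        if stars == 3:
            continue
        total += 1
        correct += predict_positive(text) == (stars >= 4)
    return correct / total if total else float("nan")


# Invented sample reviews, for illustration only.
sample = [
    ("Fantastic book, really enjoyed it", 5),
    ("Boring and awful", 1),
    ("Not good at all", 2),
    ("It was okay", 3),
]
print(accuracy(sample))  # 2 of 3 scored reviews classified correctly
```

The third review, "Not good at all", is misclassified because plain counting has no notion of negation. It is a small, concrete example of where a baseline like this breaks down outside easy cases.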

~~~
rhizome
It seems that the point is to introduce their special-sauce black box, with an
argument to authority about its methodology. I think the correlations you ask
for are where the problems will lie, in that there is a value judgement that
is being hidden. If I can put myself out on a limb here, I'd say that that
measurement is going to be fundamentally unscientific.

------
noelsequeira
I don't mean to come across as hostile or undermine the effort that's been
invested, but I can't help but be a tad skeptical about these visualizations,
especially given the fact that the author (in what might be construed as
cavalier fashion) presents seemingly nebulous metrics like they are absolute
matter-of-fact ("Anxiety/Confidence Ratio", "Hostility/Compassion Ratio",
"Depression/Happiness Ratio").

It would certainly help if the algorithm used to compute these metrics were
shared and dissected. I tend to believe that sentiment analysis is much more
art than science, given how tricky and profound context can be.

To conclude, I'm not going to end this comment on a negative note. I'd much
rather reserve my vitriol and caustic criticism for another thread and day and
abstain from calling this an attempt at gaming HN to promote a startup.

I do this not out of a lack of indignation towards what I've just read, but
because I'd like to end this seemingly hateful comment on a note that isn't
bitter or negative but is instead quite the opposite (without using a single
word that would help the OP's algorithm figure this out).

And with that, I throw down the gauntlet. Analyze this!

~~~
tansey
_> And with that, I throw down the gauntlet. Analyze this!_

Sure. Scored with EffectCheck:

Anxiety - Very High

Hostility - High

Depression - Very High

Confidence - Low

Compassion - Low

Happiness - Very Low

I will post a longer explanation detailing how/why it works, since others have
had this question as well.

~~~
sfk
I'm not sure that I understand. You have just demonstrated that the algorithm
does _not_ work.

~~~
ma2rten
It depends on whether you actually meant what you wrote. I think his point is
that if you did, you would have chosen different wording, from a psychological
point of view.

------
rryan
Where are the confidence intervals and error bars? In order to be taken
seriously when aggregating 1.8MM comments, you need those. The variability in
pg's plots makes me think the data for any individual is going to be just as
noisy.
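One standard way to put error bars on a monthly mean without assuming normality is a percentile bootstrap: resample the month's comments with replacement many times and take percentiles of the resampled means. This is a minimal sketch. `bootstrap_ci` is a hypothetical helper and the scores are invented. You would apply it separately to the per-comment scores in each monthly bucket.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap (1 - alpha) confidence interval for the mean.

    Returns (mean, lower, upper). A fixed seed keeps results reproducible.
    """
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int(alpha / 2 * n_resamples)
    hi_idx = n_resamples - 1 - lo_idx
    return statistics.fmean(values), means[lo_idx], means[hi_idx]


# Hypothetical per-comment hostility scores for a single month.
month_scores = [0.02, 0.00, 0.05, 0.01, 0.03, 0.00, 0.04, 0.02]
mean, lower, upper = bootstrap_ci(month_scores)
print(f"mean={mean:.4f}  95% CI=({lower:.4f}, {upper:.4f})")
```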

~~~
tansey
We are looking at a time-based analysis of the monthly means of PG and the HN
community. The null hypothesis is that the slope of a regression line should
be 0. Since we are dealing with a time-based analysis of the mean, the
variance of each bucket is irrelevant. I re-ran the analysis to make sure that
my results were statistically significant:

Anxiety/Confidence PG: p <= 0.0007 HN: p <= 0.0001

Hostility/Compassion PG: p <= 0.0005 HN: p <= 0.0001

Depression/Happiness PG: p <= 0.0067 HN: p <= 0.0001

Note that for the HN comments, even though I used a 2nd order (parabola) fit
for the graphs, the p values above are for a linear regression as that is the
more appropriate fit for determining statistical significance here.
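The test described here (null hypothesis: the regression slope over monthly means is 0) can be approximated in stdlib Python. The sketch below swaps in a permutation test for the usual t-test p-value, so it is not the author's method. `ols_slope` and `permutation_p_value` are hypothetical names, the month index 0..n-1 is assumed as the x-axis, and the monthly values are invented.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def permutation_p_value(y, n_perm=5000, seed=0):
    """Two-sided p-value for H0: slope == 0 against month index 0..n-1.

    Shuffles the monthly means and counts how often a shuffled series has a
    slope at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    x = list(range(len(y)))
    observed = abs(ols_slope(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(ols_slope(x, shuffled)) >= observed:
            extreme += 1
    # The +1 correction keeps the estimate from ever being exactly zero.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical monthly means of a ratio, drifting downward with some noise.
monthly = [1.00, 0.97, 0.99, 0.94, 0.95, 0.91, 0.92, 0.88, 0.89, 0.85, 0.86, 0.82]
print(ols_slope(list(range(len(monthly))), monthly), permutation_p_value(monthly))
```

Shuffling treats months as exchangeable under the null. Like the t-test, it is therefore only trustworthy if neighboring months are not strongly correlated with each other.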

~~~
vecter
p values only make sense when the residuals are normal and independent. Is
that true for this data set?
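For the independence half of that question, one common check on a time-series regression is the Durbin-Watson statistic on the residuals. The sketch below fits a line against the month index and computes it. The helper names and the monthly series are hypothetical. A normality check (for example a Q-Q plot, or `scipy.stats.shapiro` if SciPy is available) would be a separate step and is not shown.

```python
def ols_residuals(y):
    """Residuals of a least-squares line fitted to y against index 0..n-1."""
    n = len(y)
    mx, my = (n - 1) / 2, sum(y) / n
    sxx = sum((i - mx) ** 2 for i in range(n))
    slope = sum((i - mx) * (v - my) for i, v in enumerate(y)) / sxx
    intercept = my - slope * mx
    return [v - (intercept + slope * i) for i, v in enumerate(y)]


def durbin_watson(e):
    """Durbin-Watson statistic on residuals e.

    Near 2: little lag-1 autocorrelation. Well below 2: positive
    autocorrelation, so the independence assumption behind the p-value is shaky.
    """
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(v * v for v in e)
    return num / den


# Hypothetical monthly means.
monthly = [1.00, 0.97, 0.99, 0.94, 0.95, 0.91, 0.92, 0.88, 0.89, 0.85, 0.86, 0.82]
print(durbin_watson(ols_residuals(monthly)))
```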

------
pgroves
What jumps out at me is that the community curves all peak in the months
following May 2009, which is when the stock market bottomed. The community was
least happy when the economy was at its "darkest before the dawn" moment.

In other words, I wonder if these graphs are just tracking general sentiment,
and would look the same for any site.

~~~
tansey
It's a good question, and I guess we'll see how well the correlation holds.
It's worth noting that the stock market bottomed in March 2009, so this may be
a lagging indicator.
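Whether sentiment lags the market could be checked by correlating the two monthly series at several lags and seeing where the relationship is strongest. This is a minimal sketch with invented series and an assumed `max_lag`. Negative-emotion ratios would be expected to move opposite to stock prices, so it ranks lags by absolute correlation. Both real series also trend over time, and two trending series correlate at almost any lag, so detrending first would be advisable.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def lagged_correlation(leader, follower, lag):
    """Correlate leader[t] with follower[t + lag] (lag >= 0, in months)."""
    if lag == 0:
        return pearson(leader, follower)
    return pearson(leader[:-lag], follower[lag:])


def best_lag(leader, follower, max_lag):
    """Lag in 0..max_lag with the largest absolute correlation."""
    return max(range(max_lag + 1),
               key=lambda k: abs(lagged_correlation(leader, follower, k)))


# Invented series: the follower repeats the leader two months later.
leader = [0, 1, 0, 0, 2, 0, 0, 3, 0, 1, 0, 0]
follower = [0, 0] + leader[:-2]
print(best_lag(leader, follower, max_lag=3))  # 2
```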

------
topomorph
In case anyone wants to make their own trends (less sophisticated, but maybe
more transparent), I have an app at <http://hn-trends.heroku.com> that plots
word percentages on HN over time.

For example, some emotive words that appear to have increased since the
beginning:

* sad: <http://hn-trends.heroku.com/trends?q=sad>

* fuck: <http://hn-trends.heroku.com/trends?q=fuck>

* shit: <http://hn-trends.heroku.com/trends?q=shit>

Some words that appear to have decreased:

* great: <http://hn-trends.heroku.com/trends?q=great>

* cool: <http://hn-trends.heroku.com/trends?q=cool>

* lol: <http://hn-trends.heroku.com/trends?q=lol>

* reddit: <http://hn-trends.heroku.com/trends?q=reddit>

* digg: <http://hn-trends.heroku.com/trends?q=digg>

I haven't looked at any of this closely, so make of it what you will.

(I was meaning to wait until I had time to dig deeper and add extra features
like time series smoothers and trend/slope metrics before "releasing" this,
but figured this might be useful now given the post. I still plan on adding
those later.)
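For anyone wanting to reproduce word trends like these, here is a sketch of a per-month word percentage plus a simple trailing moving average, the kind of time series smoother mentioned above. It assumes "percentage" means the share of a month's comments that contain the word. hn-trends may compute it differently (for example as a share of all words), and the sample comments are invented.

```python
import re
from collections import defaultdict


def word_percentage_by_month(comments, word):
    """comments: iterable of (month, text) pairs, month like "2009-05".

    Returns {month: percent of that month's comments containing `word`}
    as a whole word, case-insensitive. This definition is an assumption.
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits, totals = defaultdict(int), defaultdict(int)
    for month, text in comments:
        totals[month] += 1
        if pattern.search(text):
            hits[month] += 1
    return {m: 100.0 * hits[m] / totals[m] for m in sorted(totals)}


def moving_average(series, window=3):
    """Trailing moving average; early points average fewer values."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


comments = [
    ("2009-01", "This is great"), ("2009-01", "meh"),
    ("2009-02", "Great, cool"), ("2009-02", "greatness"), ("2009-02", "great stuff"),
]
pct = word_percentage_by_month(comments, "great")
print(pct)  # 2009-01: 50.0, 2009-02: about 66.7 ("greatness" is not counted)
print(moving_average(list(pct.values()), window=2))
```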

------
ChuckFrank
Certainly we can all argue about the underlying structure of the analysis of
the data. However, I think that @tansey has done two great services here:

1\. He's proposed a full-service taxonomy by naming the phenomenon SND, which
is a much better choice than JTS (Jumping the Shark) or otherwise.

2\. He's asking how we can evaluate that phenomenon, and proposing one solution.

So the question becomes, how else can we evaluate the phenomenon and what can
we do to reduce SND?

Well, having spent time elsewhere, here are a number of clear indicators of
SND:

1\. Shorter, less thoughtful responses, often veering into humor or the
absurd, with chuckles getting the most upvotes.

2\. Less fact checking and less source linking in both posts and comments

3\. More image / pic posting

4\. Linkjacking: reposted material that doesn't link back to the original source.

5\. More community-centered posts (AMAs, etc.).

6\. Fewer news links.

So perhaps simply evaluating the length of comments in that same 1.8M HN
dataset could support PG's claims.

The next question is what can be done to prevent SND?

I think that would be clear: Don't support the characteristics that lead to
the decline.
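The comment-length check suggested above is straightforward to sketch. This version assumes comments are available as hypothetical (month, text) pairs and uses a whitespace word count as a rough proxy for length. It reports the median alongside the mean because a few very long comments can pull the mean up.

```python
import statistics
from collections import defaultdict


def length_by_month(comments):
    """comments: iterable of (month, text) pairs.

    Returns {month: (mean word count, median word count)}, sorted by month.
    """
    lengths = defaultdict(list)
    for month, text in comments:
        lengths[month].append(len(text.split()))
    return {m: (statistics.fmean(v), statistics.median(v))
            for m, v in sorted(lengths.items())}


# Invented sample data.
comments = [
    ("2008-01", "a thoughtful reply with sources"),
    ("2008-01", "agreed, see the paper"),
    ("2011-05", "lol"),
    ("2011-05", "this"),
]
print(length_by_month(comments))
```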

~~~
stcredzero
_1\. He's proposed a full-service taxonomy by naming the phenomenon SND, which
is a much better choice than JTS (Jumping the Shark) or otherwise._

I think "Evaporative Cooling" is descriptive and very apt.

------
nbashaw
How does the author measure anxiety, hostility, depression, confidence,
compassion, and happiness? That seems to be left out.

~~~
tansey
Hi! My startup [1] is based on an algorithm that can measure the emotional
impact of text. Our blog is where we post interesting results from applying
our algorithm.

[1] <http://effectcheck.com>

~~~
raganwald
Speaking of your startup, it appears to have "Oracle Pricing" as its revenue
model. You have to talk to a salesperson to get a price, which is often code
for "We need to know how much you can afford and how much you are willing to
pay before we quote a price."

I like everything else I see.

------
brudgers
Hypothesis: PG's mood is improving because YC, of which HN is a part, is
increasingly successful.

Hypothesis: The net mood of HNers is deteriorating because the ratio of
financially successful members drops as more people join. In addition, as HN
grows, the ratio of preexisting and external relationships among members gets
smaller; e.g., the number of HNers affiliated with YC companies has not kept
pace with the overall growth in HN membership.

------
elblanco
At risk of coming out of HN retirement, a friend forwarded this topic to me. I
wrote an extensive commentary before signing-off:

<http://news.ycombinator.com/item?id=2059012>

(now back to peaceful slumber)

------
bhousel
Any plans to compare against corpora from other social news sites?

BTW, I think what you're doing is brilliant, and I can think of about a dozen
practical applications for your technology. Ignore the naysayers.

~~~
tansey
Thanks! I'd be happy to analyze corpora like reddit or Digg, but I don't
want to scrape the sites as that is a great way to have my IP banned. Are
there publicly-available datasets that I can download somewhere?

------
mef
Thanks for this interesting analysis.

I would challenge the assumption that the symptoms of SND include a decrease
in the positive emotion measured in the average HN comment.

Is it not possible that there is a growing segment of the HN user base which
posts and enjoys the kind of comment that induces negative sentiment in people
like pg? Would that not raise the sentiment of the average HN comment, while
at the same time lowering the sentiment of the average "good old days" user?

------
sidww2
Is the number of votes for each post available? It could be interesting to use
the given metrics to compare posts with high vs. low vote counts, which could
also help validate the metrics.

~~~
tansey
I did originally have a similar idea, where I started out by looking at the
top 10% of comments each month (ranked by upvotes). However, the resulting
trends looked pretty similar to the overall community:
<http://blog.effectcheck.com/top_dep_hap.png>
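Selecting the top 10% of comments per month by upvotes, as described here, might look like the sketch below. It assumes each comment record has hypothetical `month` and `points` fields. It rounds up with integer arithmetic, so the count is not thrown off by floating-point error, and it always keeps at least one comment per month.

```python
from collections import defaultdict


def top_percent_by_month(comments, percent=10):
    """comments: iterable of dicts with hypothetical "month" and "points" keys.

    Returns {month: the top `percent`% of that month's comments by points},
    rounding up and always keeping at least one comment.
    """
    by_month = defaultdict(list)
    for c in comments:
        by_month[c["month"]].append(c)
    top = {}
    for month, items in sorted(by_month.items()):
        items.sort(key=lambda c: c["points"], reverse=True)
        # Integer ceiling of len * percent / 100.
        k = max(1, (len(items) * percent + 99) // 100)
        top[month] = items[:k]
    return top


# Invented sample data.
comments = ([{"month": "2010-03", "points": p} for p in range(20)]
            + [{"month": "2010-04", "points": p} for p in (5, 1, 9)])
top = top_percent_by_month(comments)
print({m: [c["points"] for c in cs] for m, cs in top.items()})
```

The selected comments can then be scored and trended exactly like the full set, which is the comparison shown in the linked chart.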

------
recoiledsnake
Others have done a good job at critiquing the automated analysis, so I will
concentrate on the hypothesis.

>There is a perceived phenomenon among online social news communities (e.g.
Digg, reddit) that as the popularity of the community increases, the average
member becomes baser and the overall quality of discourse decreases.

I think that's not an accurate portrayal. Having been through the "decline" of
Reddit (I've been reading it since before it had comments!), I observed that
the real decline was in the quality (as perceived by me) of stories that made
the front page(s) and in the quality (as perceived by me) of comments that
were upvoted.

In 2006, I would literally read almost every story on the first few pages; the
content was very interesting, and there were long essays and articles on the
front page. Now? After the influx of the general public and the juveniles (not
saying this in a demeaning manner), quality has dropped in every way. Pun
threads and argumentative threads are the ones on top. I do like humor, but not
endless pages of snark and humor. Pictures or story-in-headline posts requiring
an attention span of 3 seconds tend to get the most votes.

>If a community were suffering from SND, there are a few symptoms we might
observe:

>Anxiety, hostility, and depression would rise.

>Confidence, compassion, and happiness would fall.

>The ratios of anxiety/confidence, hostility/compassion, depression/happiness
would rise.

>Since each of the negative emotions has an opposite emotion, the last point
enables us to measure the general negative/positive trend in each of the three
main categories.

How does this analysis address the above points? All the supposed-to-be-funny
short comments probably would count as a positive under this analysis.
Reddit's community now is pretty happy, confident and compassionate. The
analysis is pretty lacking when applied to a supposed-to-be intellectually
stimulating social site.

Anyway, I see that Slashdot (although not as popular now) and HN seem to have
bucked the trend. Although there is some decline on HN, it's not as steep or
as bad as Reddit's so far.

Edit: Here's an idea, do your analysis of all the comments of the first x
posts of Reddit vs. the same on HN. Could be interesting since they both have
almost the exact same format.
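The ratio symptoms quoted above can be computed per month from summed emotion scores, followed by a check on whether each ratio trends upward. This is a minimal sketch. The pairings follow the metrics named elsewhere in the thread (anxiety/confidence, hostility/compassion, depression/happiness). `eps` is an assumed guard against months with no positive-emotion hits, and the monthly totals are invented.

```python
# Opposing-emotion pairs, matching the metrics named in the thread.
PAIRS = (("anxiety", "confidence"),
         ("hostility", "compassion"),
         ("depression", "happiness"))


def emotion_ratios(totals, eps=1e-9):
    """totals: {emotion: summed score for one month}.

    Returns {"anxiety/confidence": ..., ...}; eps guards against months
    with no positive-emotion hits.
    """
    return {f"{neg}/{pos}": totals.get(neg, 0.0) / (totals.get(pos, 0.0) + eps)
            for neg, pos in PAIRS}


def slope(series):
    """Least-squares slope of a series against its index; > 0 means rising."""
    n = len(series)
    mx, my = (n - 1) / 2, sum(series) / n
    num = sum((i - mx) * (v - my) for i, v in enumerate(series))
    den = sum((i - mx) ** 2 for i in range(n))
    return num / den


# Hypothetical monthly emotion totals.
months = [
    {"anxiety": 2, "confidence": 4, "hostility": 1, "compassion": 2,
     "depression": 1, "happiness": 5},
    {"anxiety": 3, "confidence": 4, "hostility": 2, "compassion": 2,
     "depression": 2, "happiness": 4},
    {"anxiety": 4, "confidence": 4, "hostility": 3, "compassion": 2,
     "depression": 3, "happiness": 4},
]
series = [emotion_ratios(m)["hostility/compassion"] for m in months]
print(slope(series) > 0)  # True: the ratio rises, one of the quoted SND symptoms
```

As the comment above points out, a rising or falling ratio only says something about SND if the underlying scores actually track discussion quality. Short joke comments could easily push these ratios in the "healthy" direction.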

~~~
AJ007
Here is a metric that should be measured: reading level.
[http://www.google.com/support/websearch/bin/answer.py?hl=en&...](http://www.google.com/support/websearch/bin/answer.py?hl=en&answer=1095407)

In the 16 or so years I've been online I've watched many communities grow,
expand, and vanish. For better or worse a large community never functions the
same as a small one. Beyond theories and conjectures, it would be interesting
to know why.
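The linked Google feature is not documented in this thread, so the sketch below swaps in a different, well-known readability measure: the Flesch-Kincaid grade level, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The syllable counter is a rough vowel-group heuristic, an assumption that will miscount some words. Averaging the grade per month would give a reading-level trend comparable to the emotion trends.

```python
import re


def count_syllables(word):
    """Rough heuristic: count vowel groups, dropping a trailing silent 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level of text (0.0 for empty input)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)


print(flesch_kincaid_grade("The cat sat."))
print(flesch_kincaid_grade(
    "Notwithstanding considerable methodological disagreement, "
    "researchers nevertheless continued."))
```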

------
ignifero
The solution: divide the community into more specialized subgroups. That's what
<http://textchannels.com> is doing.

