

Show HN: We just launched a Bayesian-Based Sentiment Tracker - spxdcz
http://www.groubalcsi.com/

======
toast76
Call me crazy, but things going UP is generally good. I don't see how you can
logically describe something as having an "increase in worsening
satisfaction".

"AT&T is ranked 1st out of 244 brands"... they must be AWESOME...oh wait, no
they're not.

It says it's a customer satisfaction index, when it's actually a customer
DISsatisfaction index.

Also, I don't mean to sink the boots in but... "that is awesome that you got
SVU to discuss the boycott of the pedophile book on Amazon! I cannot wait to
see how it goes!" Is that a good comment or a bad comment?

------
random42
This is not how sentiment analysis works (or should work). I worked on
something similar (a Naive Bayes based sentiment analyzer:
<https://github.com/mohitranka/TwitterSentiment>). I also work for a company
in the same space as groubalcsi.com (brand/product opinion mining).

Sentiment analysis is not a _classification_ problem (like spam detection)
but an _identification_ problem, because sentiments are always associated
with an entity (and an attribute, if specified).

For example, a tweet saying "Dell is not as good as apple" requires
identifying the entities (Dell and Apple) and associating sentiments with
them (negative and positive, respectively). It is incorrect to try to
associate a sentiment (whatever it may be) with the tweet itself.
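The comparative example above can be sketched in code. This is a toy Python illustration of entity-level identification, not code from the linked project; the single regex pattern and the sentiment labels are my own assumptions:

```python
import re

def compare_sentiment(tweet):
    """Handle the comparative pattern 'X is not as good as Y'.

    Returns {entity: 'positive' | 'negative'}. A real system would need
    actual entity recognition and parsing; this covers only the one
    illustrative pattern from the example tweet.
    """
    m = re.match(r"(\w+) is not as good as (\w+)", tweet.strip(), re.I)
    if not m:
        return {}
    worse, better = m.group(1), m.group(2)
    # The compared-against entity gets the opposite polarity.
    return {worse: "negative", better: "positive"}
```

Run on the example, `compare_sentiment("Dell is not as good as apple")` assigns "negative" to Dell and "positive" to apple, whereas a whole-tweet classifier would have to pick a single label for both.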

------
rayval
Interesting but possibly flawed exercise. It would be good to show the entire
set of brands sorted from bottom (i.e., good) to top (i.e., bad).

I sorted the data and present here two groups:

1. This is a sample of supposedly the most satisfying, from the best on down
(er, up): TGI Fridays, Best Western, Zenith Electronics, JVC, Chili's,
Denny's, Hampton Inn, Olive Garden, Applebee's, Sams Club, Yahoo, AOL.

2. By contrast, here is a sample of some of the worst, listed from the top
(high dissatisfaction) on down: Wikipedia, Apple, Nokia, Facebook, Volkswagen,
YouTube, Amazon, Nike, Sony, Ikea, Range Rover, Rolex, Porsche, Google,
Netflix, Louis Vuitton, CNN, American Express, Wall Street Journal, Intel.

Groups 1 and 2 do not overlap in their scores, meaning that Intel (the best
of the worst), at 404, has a higher dissatisfaction rating than AOL (the
worst of the best).

This grouping does not make sense to me, because if you showed me the two
lists above and asked which of these two sets had better satisfaction scores,
I would have picked Group 2 over Group 1.

What could explain this? Perhaps there is demographic skew, in that down-
market brands (Dennys, Sams Club, Zenith) are not talked about as much among
upscale social media people, who would rather complain about Apple, Sony, and
Porsche.

Or perhaps there is a mismatch of expectations. People expect the premium
brands to deliver more, and complain loudly when they fall short in the
slightest. And conversely, perhaps people expect a mediocre experience with
downmarket brands.

------
tel
What are the units of dissatisfaction used throughout the page? How do they
map to the y-axis of the dissatisfaction graph? What sense of scale do I need
to have to understand the units? Is a 945 bad? How bad? Is hate linear? Since
AAPL scores roughly half as much as AT&T, does that mean that the average
Twitterer hates AAPL half as much? What happens if someone scores a perfect
1000? Can they be hated no further?

What time zone is the next update measured in? What makes your classifier
'Bayesian' besides just using something called a 'Naive Bayes Classifier'?
What is the 90% accuracy determined from? Why should I care? Is a 24-hour
improvement in customer satisfaction a significant thing? How quickly does
hate fluctuate? What is your uncertainty in each of these measurements? Is
there an overall brand hate level that I can compare these things to? How are
they affected by overall sentiment toward companies?

    
    
    

It's an interesting complementary site to your primary interest in Groubal.
I'm just skeptical of sentiment analysis methods in general. Analyzing data
properly is very hard. Applying tools to observe what happens is still
interesting, though.

But I'm not sure I learned a whole lot from seeing graphs proclaiming that
Twitterers dislike AT&T, Time Warner, Banks, Internet Providers, and Zynga.
Tylenol and Enterprise were interesting finds, though I have no idea what it
means for Tylenol to be 100 units less hated.

So perhaps what you should tune your ML stuff to seek out is not just some
difficult-to-quantify measure of dissatisfaction, but cases like Tylenol and
Enterprise, where people might not expect to have such trouble with the
brand. In that case, it becomes automatic, insightful rabble-rousing instead
of methodologically sparse hate-ranking.

------
spxdcz
If anyone's interested, we're using the Google Graph API for all the graphs
(the spark lines and the big transparent ones at the top), and the Bayesian
stuff is based on the PHP work I wrote up here:
<http://danzambonini.com/self-improving-bayesian-sentiment-analysis-for-twitter/>
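For readers who'd rather see the idea than read the PHP, here is a minimal Naive Bayes sentiment classifier sketch in Python (word counts, Laplace smoothing, log probabilities). It is a stand-in for illustration, not the code from the linked write-up:

```python
from collections import defaultdict
import math

class NaiveBayesSentiment:
    """Tiny Naive Bayes sentiment classifier: bag-of-words features,
    Laplace smoothing, log-space scores to avoid underflow."""

    def __init__(self):
        self.word_counts = {"pos": defaultdict(int), "neg": defaultdict(int)}
        self.doc_counts = {"pos": 0, "neg": 0}

    def train(self, text, label):
        self.doc_counts[label] += 1
        for word in text.lower().split():
            self.word_counts[label][word] += 1

    def classify(self, text):
        vocab = set(self.word_counts["pos"]) | set(self.word_counts["neg"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in ("pos", "neg"):
            # Log prior from training-document counts.
            score = math.log(self.doc_counts[label] / total_docs)
            n = sum(self.word_counts[label].values())
            for word in text.lower().split():
                # Laplace-smoothed log likelihood per word.
                score += math.log(
                    (self.word_counts[label][word] + 1) / (n + len(vocab))
                )
            scores[label] = score
        return max(scores, key=scores.get)
```

The "self-improving" part of the write-up would sit on top of this: feed high-confidence classifications back in as new training examples via `train()`.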

EDIT: Also, we're not really using it yet, but I thought it was interesting
how you can also easily calculate the 'agreement' on sentiment by using the
MySQL STDDEV function (or similar) to work out the variance in sentiment.
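The 'agreement' idea works outside MySQL too. A small Python sketch (the sample scores are made up for illustration), where a lower standard deviation of per-mention sentiment means stronger agreement:

```python
import statistics

def agreement_score(sentiments):
    """Population standard deviation of per-mention sentiment scores.

    Lower value = people agree more; equivalent to what MySQL's
    STDDEV() would return over a sentiment column grouped by brand.
    """
    return statistics.pstdev(sentiments)

# Illustrative sentiment scores in [-1, 1] for two hypothetical brands.
mostly_agreed = agreement_score([1, 1, 1, -1])   # near-consensus positive
split_opinion = agreement_score([1, -1, 1, -1])  # opinions divided
```

Here `split_opinion` comes out higher than `mostly_agreed`, flagging the brand people disagree about even if both average out to similar scores.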

~~~
acangiano
Just FYI, clicking on a company name that has been removed leads to a 403
error (e.g., Rogers).

Also, I sort of broke the site by passing a non-existing company name (e.g.,
<http://www.groubalcsi.com/company/ted>).

~~~
spxdcz
Thanks so much - I knew I could rely on HN'ers to find these things. I'm
hoping the Rogers one is a one-off (we're just adding it this morning), but
I'll double check all of this. Thanks again.

EDIT: just fixed the 'TED' (company doesn't exist) issue. Thanks!

EDIT2: just fixed the Rogers issue too. Thanks! (Plus, I _love_ Coda for
making my life easier/faster for versioning and uploading changes!)

------
physcab
Pretty cool. I would change the convention of high meaning bad and low
meaning good, unless it's a rank out of the total; it's a bit
counter-intuitive. Why do you place an emphasis just on dissatisfaction
instead of giving the option to look at both?

~~~
spxdcz
Thanks for the feedback.

Yeah, certainly the 'high = bad' thing is something we grappled with (and
still do). The site is a sister-site to a consumer-complaint/petition website
(<http://www.groubal.com/>), hence we're more interested in
measuring/highlighting who is doing 'badly'. But yes, this could be done in a
more intuitive way (showing the 'bottom' of a graph that had the axis in the
traditional orientation, for example).

~~~
SkyMarshal
I think you only need to change the wording to clarify it:

'Lowest Satisfaction' => 'Most Dissatisfied' or 'Highest Dissatisfaction' or
'Most Complaints'

'Customer Satisfaction Index' => 'Customer Dissatisfaction/Complaint Index'

'High scores are negative' => 'High scores indicate high dissatisfaction' or
'Increasing scores indicate increasing dissatisfaction'

High scores bad, low scores good.

Link 'high' and 'increasing' with 'bad' and 'dissatisfied' a little more
explicitly and consistently, since most people make the opposite association.

~~~
mdda
Amount of dissatisfaction is too mushy IMHO. Just title it "Crapometer" or
"Hate-o-meter".

Better yet, just flip the Y-axis. I'd think that it would be easier to get a
company to pay to improve upwards. Do you really have to match the sister
site?

------
jhamburger
Something came to mind that will skew this heavily. People mention a company
by name mainly for one of two reasons: either to complain or to tell people
about some cool new thing. If someone mentions a ubiquitous company like
Google, Verizon, etc., it's usually to complain; they're probably not
telling the world about the wonders of Google search. On the other hand, if
someone mentions a smaller company, it's probably the cool-new-thing factor.

~~~
endtime
Are you sure about that distinction?

"Google builds robotic car"

"Why Groupon is terrible for merchants"

------
DeusExMachina
Is the time span so short just because of an initial lack of data? If not, I
think it would be useful to extend the span of the graph to more than a
week, to understand the long-term trend. For some of the lines I see high
fluctuations, so the graph is not very meaningful.

------
jhamburger
I like how "iHop" is capitalized as if it were Steve Jobs' take on the pogo
stick.

------
rhizome
I don't know if it's intentional, but I would expand your scope beyond
complaints. If your math is good, you could have a very nice reputation
tracker and analytics package in general.

~~~
spxdcz
It picks up on any Twitter/FB updates that we can discern a 'sentiment'
from, so although it's limited to companies/brands at the moment, it could
certainly be used for other things.

------
rorrr
Google is #7 worst web company? <http://www.groubalcsi.com/sector/web-based-
services>

BMW is the worst in motor vehicle? <http://www.groubalcsi.com/sector/motor-
vehicles>

~~~
spxdcz
Just as an example, here's some of the latest sentiment on Google that would
suggest why it hasn't got a great ranking (though it's also not a terrible
one):

Ok, my phone or Google Voice is unable to pick up calls when I press '1' to
accept, and I can't connect via Gizmo5. WTF?

Google gps sucks all of a sudden

WTF? I open up Google and the first thing i see is "Will Justin Bieber get
naked for Love Magazine?" and im like WHAT???

