

Ask HN: Review ContextSense - deduces context and sentiment for any URL/text - paraschopra
http://www.wingify.com/contextsense/

======
yannis
<http://en.wikipedia.org/wiki/The_Holocaust>

Slightly Negative (0.33)

An extract from the same link (text only, shown below) scores Highly Positive (0.81)!

>The usual German term for the extermination of the Jews during the Nazi
period was the euphemistic phrase Endlösung der Judenfrage (the "Final
Solution of the Jewish Question"). In both English and German, "Final
Solution" is widely used as an alternative to "Holocaust".[16] For a time
after World War II, German historians also used the term Völkermord
("genocide"), or in full, der Völkermord an den Juden ("the genocide of the
Jewish people"), while the prevalent term in Germany today is either Holocaust
or increasingly Shoah.

NLP is H A R D, I know, and defining a range value for sentiment is even harder.

~~~
paraschopra
Yup, I agree -- though more data provides more opportunities to make a better
guess.

------
paraschopra
Hey guys, my startup (Wingify) has exposed an API for determining sentiment
and context from any URL or piece of text. I'll be glad if you could review
it.

ContextSense was made to demonstrate our contextual targeting capabilities --
and the sentiment aspect was added to avoid the traditional blunder of displaying
ads on pages/news about catastrophes. Best to try ContextSense with text-heavy URLs.
Also try it with news items (both positive and negative).

I will be happy to provide API access if any of you are interested in
trying it out.

~~~
elcron
Cool! I tried something like this, it classified blog posts into 100 different
moods... it had ~8% accuracy (It also classified as positive or negative, but
I don't remember the accuracy, I'll have to dig it up at some point...). Good
luck with it.

~~~
paraschopra
Do you remember what it was? I would be really curious to have a look. Thanks.

~~~
elcron
It was accurate around 56% of the time when classifying as positive, negative,
or neutral. I used a naive Bayesian classifier and guessed the highest-scoring
class rather than showing the percent positive.
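
For anyone curious what that approach looks like, here is a minimal sketch of a three-class naive Bayes classifier that returns only the highest-scoring label. It is not the original code: the toy training sentences, the whitespace tokenizer, the Laplace smoothing, and the names `train`, `log_scores`, and `classify` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def log_scores(model, text):
    """Return an unnormalized log-posterior for each label."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids log(0) for unseen pairs.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return scores


def classify(model, text):
    """Guess the single highest-scoring label instead of reporting a percentage."""
    scores = log_scores(model, text)
    return max(scores, key=scores.get)


# Hypothetical toy data, only to make the sketch runnable.
TRAINING = [
    ("great wonderful happy launch", "positive"),
    ("love this excellent product", "positive"),
    ("terrible awful disaster", "negative"),
    ("hate this horrible bug", "negative"),
    ("the meeting is on tuesday", "neutral"),
    ("the report has ten pages", "neutral"),
]
model = train(TRAINING)
print(classify(model, "excellent happy product"))
```

Taking the arg-max throws away how close the runner-up was, which is one reason a hard three-way guess can look worse than a graded score on borderline text.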

------
byoung2
I searched for
[http://school.discoveryeducation.com/schooladventures/slaver...](http://school.discoveryeducation.com/schooladventures/slavery/)

 _Few human practices have provoked such deep and widespread outrage as the
practice of one human being enslaving another. So why has slavery survived for
thousands of years? How did it become so important to civilization? Explore
the ways that slavery has been woven into the fabric of societies in America
and around the world._

Sentiment: Highly Positive (0.77)

Seems off in cases where the tone of the text is neutral but the subject
matter is negative.

------
rwolf
The comments so far refer to the "sentiment deducer" aspect of the service.
They've brought up some examples of how it's not working right now, but they
missed an important question: what in the world is "positive sentiment"
abstracted from context, and why would I want to know if a site had it?

An issue we haven't brought up is tags. Current tags for HN: "ago, points,
comments, com, hours, hour, hacker, discuss, minutes, tornado." I see some
good ones, but it's quite noisy (and it seems to have grabbed the tornado
story as a main topic--scraping the title?). This feature seems the most useful
to me, and the easiest to improve.

~~~
paraschopra
No, not just the title but the whole website. The algo currently works on TF-IDF
of the tags, and HN is full of points, ago, hours, etc. repeated all over it.
This increases the TF. We will be putting an upper limit on it, though.
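
To make the idea concrete, here is a rough sketch of TF-IDF scoring with a cap on term frequency. It is not the ContextSense implementation: the function name `capped_tfidf`, the `tf_cap` parameter, the smoothed IDF formula, and the tiny background corpus are all assumptions, and the "upper limit" is assumed to apply to the raw TF count.

```python
import math
from collections import Counter


def capped_tfidf(page_tokens, background_docs, tf_cap=3):
    """Score each term on a page as min(tf, tf_cap) * idf."""
    n_docs = len(background_docs)
    doc_freq = Counter()
    for doc in background_docs:
        doc_freq.update(set(doc))  # count each term once per document
    scores = {}
    for term, count in Counter(page_tokens).items():
        # Smoothed IDF so terms missing from the background never divide by zero.
        idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
        scores[term] = min(count, tf_cap) * idf
    return scores


# Hypothetical data: a page where "points" repeats far more than the real topic.
page = ["points"] * 30 + ["tornado"] * 3
background = [
    ["points", "ago"],
    ["points", "hours"],
    ["tornado", "storm"],
    ["startup", "funding"],
]

capped = capped_tfidf(page, background, tf_cap=3)
uncapped = capped_tfidf(page, background, tf_cap=10**9)
print(sorted(capped, key=capped.get, reverse=True))
print(sorted(uncapped, key=uncapped.get, reverse=True))
```

With the cap, a page-chrome word repeated thirty times can no longer outscore a rarer topical word on repetition alone; without it, raw repetition dominates.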

~~~
rwolf
I didn't mean the title tag--I meant the first story title (the SEO kids talk
about Google fixating on the first h1 tag of a site, I thought it might be a
similar idea). I have no idea what this upper limit change would do, but good
luck.

------
frig
I put this in:

<http://en.wikipedia.org/wiki/C*-algebra>

The results are better than I would've guessed, but not really so great (this
article is pretty degenerate I think, versus more "human-interest" type
stories).

One thing I'm curious about: the list of topics is here:

People (7.88) Algebra (4.47) Research Groups (3.41) Software (2.42) Math
(2.03) Functional Analysis (1.53) Journals (1.01) Science (0.71) Publications
(0.64) Logic and Foundations (0.6)

...does looking at the content give you any sense for why "People" comes in
first?

~~~
paraschopra
Yes, the false positive rate is not the lowest, and we are working on
decreasing it!

~~~
frig
I guess what I'm asking is: is "People" just a default answer?

~~~
paraschopra
No, People is not a default answer. It is due to underlying taxonomy data.
Working on refining the database.

------
erikwiffin
Reading the other comments, it looks like your biggest problem is false
positives. It might not be a bad idea to put together some kind of feedback
mechanism, so that if I disagree with the sentiment/tags/categories I can
provide better ones, and you can eventually use that data to refine your
algorithm.

That being said, this looks really cool and full of potential. Now I just need
a reason to be auto-generating that kind of metadata.

------
onewland
I think positive/negative is too shallow. I actually do enjoy this, but I
think you need to rate sentiments on other vectors, so if I rate
<http://teddziuba.com/> I get something like:

Anger: 25 Smugness: 50 Vulgarity: 15 Boredom: 5

etc.

I'm not even sure that's a good example or if it's clear, I just think
positive/negative is an oversimplification of the sentiment of most sites.

~~~
wlievens
Whatever you do, Ted Dziuba will break any scale.

------
quizbiz
Have it recommend changes. Tell us what we can do to improve.

------
mishmax
Would be nice if it didn't need http in front of a url.
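
Something along these lines would probably do it. This is only a sketch: the helper name `ensure_scheme` and the choice of `http` as the default are my assumptions, not anything the site is known to use.

```python
def ensure_scheme(raw, default_scheme="http"):
    """Prepend a default scheme when the user typed a bare host or path."""
    raw = raw.strip()
    if "://" in raw:
        return raw  # already has a scheme such as http:// or https://
    return f"{default_scheme}://{raw}"


print(ensure_scheme("www.wingify.com/contextsense/"))
```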

------
DanielStraight
Can't get it to work. Won't load.

~~~
paraschopra
Which browser? Platform? And which URL?

~~~
DanielStraight
Firefox 3.5.3 on WinXP 64 with the URL that comes up when you load the site.
The status spinners just keep spinning forever.

~~~
thorax
What Firefox extensions? Works great here on XP 64.

~~~
DanielStraight
None that should interfere... Firebug, Site Launcher, Colorzilla.

I typically have cookies disabled, but I turned them on and it made no
difference.

