
How Vector Space Mathematics Helps Machines Spot Sarcasm - svenfaw
https://www.technologyreview.com/s/602639/how-vector-space-mathematics-helps-machines-spot-sarcasm/
======
daveguy
Looking at the paper here:
[https://arxiv.org/abs/1610.00883](https://arxiv.org/abs/1610.00883)

Can anyone tell me what the P, R, and F "features" mean in Table 2?

They use F-score (
[https://en.wikipedia.org/wiki/F1_score](https://en.wikipedia.org/wiki/F1_score)
) which doesn't take into account true negatives at all. Maybe not an issue
since they are evaluating the same set of GoodReads quotes across all four.

The unique features contribute to an F-score improvement of around 1%, with
baselines in the 50-80% range. That doesn't seem like a significant
improvement, and the F-score is just a within-test percentage measure of how
good the technique is at separating sarcastic from non-sarcastic quotes. Both
of their unique features give improvements across all encodings, but the gains
aren't additive and they differ depending on the encoding. That may be
statistically significant in some sense (even though the improvements are
around 1%), but I don't see any statistical evaluation.

Table 2 apparently shows baselines for word2vec, but I expect the baselines
are different and the improvements are relative to the appropriate baseline.

The model of how the features determine sarcasm is created using SVM (
[https://en.wikipedia.org/wiki/Support_vector_machine](https://en.wikipedia.org/wiki/Support_vector_machine)
).

Can someone with experience in NLP comment on this paper?

~~~
cerrelio
P = precision, R = recall, F = F-score

F-score is just the harmonic mean of precision and recall.

[https://en.wikipedia.org/wiki/Precision_and_recall](https://en.wikipedia.org/wiki/Precision_and_recall)
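
To make the metrics concrete, here's a quick sketch with made-up confusion counts. Note that the true-negative count never enters P, R, or F1, which is exactly the omission raised above:

```python
# Toy confusion counts for a sarcasm classifier on a held-out set
# (the numbers are invented for illustration).
tp, fp, fn, tn = 80, 20, 40, 860

precision = tp / (tp + fp)  # P: fraction of predicted-sarcastic that really are
recall = tp / (tp + fn)     # R: fraction of actual-sarcastic that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R

# tn was never used above: F-score says nothing about how well
# the classifier rejects non-sarcastic quotes.
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.8 0.667 0.727
```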

The paper isn't that extraordinary. Sarcasm detection is considered a hard
problem. However, if you get a result that's "better" than some other
published paper, you usually publish your work. You get (at most) a 5% boost
in F-score using embedding features. Word embeddings are easy to work with, so
it's not usually difficult to add them in with commonly used NLP features. You
can give your model metrics an effortless bump.
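
For what it's worth, "adding them in" usually just means concatenating averaged word vectors onto whatever hand-crafted features you already have. A minimal sketch (the feature choices and the tiny 4-d "embeddings" are invented for illustration; a real pipeline would use pretrained word2vec or GloVe vectors):

```python
# Hypothetical 4-dimensional embeddings, keyed by token.
toy_embeddings = {
    "i": [0.1, 0.0, 0.2, 0.1],
    "love": [0.9, 0.8, 0.1, 0.0],
    "mondays": [0.0, 0.1, 0.9, 0.7],
}

def average_embedding(tokens, emb):
    """Mean of the word vectors -- a common, cheap sentence representation."""
    dim = len(next(iter(emb.values())))
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def featurize(tokens, emb):
    # Two classic surface features, then the embedding features
    # concatenated on the end.
    hand_crafted = [
        float(len(tokens)),                           # sentence length
        float(sum(t.endswith("!") for t in tokens)),  # exclamation count
    ]
    return hand_crafted + average_embedding(tokens, emb)

x = featurize(["i", "love", "mondays"], toy_embeddings)
print(len(x))  # 6: 2 hand-crafted + 4 embedding dimensions
```

The combined vector then goes to whatever classifier you were already using (an SVM, in the paper's case).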

Also, it looks like there could be a deep learning model that performs better
already ([https://techxplore.com/news/2016-08-deep-neural-network-approach-sarcasm.html](https://techxplore.com/news/2016-08-deep-neural-network-approach-sarcasm.html)).

I've done two deep learning projects at work. I started learning about the
techniques this past spring. They're ridiculously good, especially if you have
a ton of data. Feature engineering (like in the paper) is usually a laborious
process, and it often requires you to be knowledgeable in the problem domain.
With neural nets, if you have a reasonable architecture and choose sensible
data representations (so your backpropagation converges), it can often "just
work" without much tweaking.

------
petters
“Vector Space Mathematics”? I think “linear algebra” is the better-known
term.

~~~
matk
lol, yeah; ML seems to be an area where people like making new terms a lot.

~~~
philipov
It's been my experience that the most common cause of making up new terms is
not knowing the existing ones.

~~~
gohrt
The article's authors don't even seem to know what sarcasm is.

Their example, “A woman needs a man like a fish needs a bicycle,” is not so
much sarcasm as a colorful analogy.

The researchers' word2vec approach is really a search for non-sequiturs: any
sentence that semantically goes off the rails at the end. It's probably
better at detecting humor in general than sarcasm specifically.

~~~
webmaven
Mmm... Seems somewhat sarcastic to me in terms of subverting the simile-as-
framing-device (in roughly the same way as "I need that like I need another
hole in my head"). But in terms of which side of the fuzzy line between irony
and sarcasm this lies, it _may_ be closer to irony.

------
1024core
The flaw in this approach is that many words do not have a single vector
representation. Consider "Time flies like an arrow, fruit flies like a
banana". The word "flies" has 2 distinct meanings; but the representation in
word2vec will be a single, hybrid vector which is neither here nor there. And
often sarcasm relies on this dual meaning to make a point.
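
A minimal sketch of the lookup problem (the 2-d vectors are invented; a trained word2vec model behaves the same way, just in a few hundred dimensions):

```python
# A word2vec-style model stores exactly one vector per surface form,
# so the verb "flies" (time flies) and the noun "flies" (fruit flies)
# share a single hybrid entry. These 2-d vectors are hypothetical.
embeddings = {
    "time":   [0.7, 0.1],
    "flies":  [0.4, 0.5],   # one entry must serve both senses
    "arrow":  [0.8, 0.0],
    "fruit":  [0.1, 0.9],
    "like":   [0.5, 0.5],
    "banana": [0.0, 0.8],
}

verb_flies = embeddings["flies"]  # "time flies like an arrow"
noun_flies = embeddings["flies"]  # "fruit flies like a banana"

# The lookup is keyed on the surface form alone, so both senses
# resolve to the very same vector.
print(verb_flies is noun_flies)  # True
```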

~~~
trendia
sense2vec [0] can distinguish between multiple uses of the same word. I
suspect that the current version would struggle with that sentence, but it's
possible.

[0] spacy.io

------
narrator
This is why I love it when I post a comment to HN and people can't tell if I'm
being sarcastic or not.
([https://news.ycombinator.com/item?id=12698590](https://news.ycombinator.com/item?id=12698590))

I'm thinking that in the future, an adversarial neural network could be used
for just this purpose: take my sarcastic comment and alter it just enough so
that it appears non-sarcastic, or vice versa. It's kind of like using neural
style transfer to mess with the face recognition algorithms of social media
profile pictures.

~~~
webmaven
This seems like a good match of methodology to the problem.

Sarcasm seems rather adversarial by nature: the goal in delivery is often to
signal, with intonation, only _just_ enough to disambiguate intent. In highly
competitive environments this results in deadpan delivery being common or even
the default, and sarcasm having to be inferred entirely from content &
context.

(Consider the opposite, delivery with an _exaggerated_ sarcastic tone: it's
actually insulting and rude regardless of the content.)

People who haven't been exposed to such high sarcasm environments rely much
more strongly on intonation, and correspondingly find it relatively difficult
to identify sarcasm in text (again, this matches well with GAN failure modes).

OTOH, I have personally been inadvertently fooled the _opposite_ way,
listening to a speaker with growing joy at his elaborate windup of the
surrounding audience in a social setting, only to realize "holy shit, this guy
is actually seriously a neo-nazi advocating eugenics. Back away slowly."

------
KasianFranks
Vector space is the place:
[https://www.google.com/?tbm=pts#tbm=pts&q=%22vector+space%22](https://www.google.com/?tbm=pts#tbm=pts&q=%22vector+space%22)

I see vector space approaches as the tip of the spear when it comes to
advancing NLP/AI.

------
mrcactu5
are you kidding? what is the vector space being used in word2vec?

