> sarcasm is labelled by the author
They literally just searched for "/s". Clever. Though I'm guessing the "independently verified" part entailed reading a lot of those comments.
Did they also read through the unlabelled comments to catch any unlabelled sarcasm? (Guessing not, since the pitch is "self-labelled sarcasm".) I wonder if that'll trip up any usage.
From what was said above, it seems this has not been taken into account.
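For anyone curious what "searching for /s" might look like in practice, here is a rough sketch. To be clear, this is my guess, not the authors' actual pipeline: the regex, `label_comment`, and `sarcasm_rate` are all made-up names, and I'm assuming the tag has to sit at the very end of the comment.

```python
import re

# Assumed pattern (not from the paper): a "/s" token at the very end of the
# comment, preceded by whitespace or the start of the text, optionally
# followed by trailing whitespace or punctuation.
SARCASM_TAG = re.compile(r"(?:^|\s)/s[\s.!?]*$", re.IGNORECASE)


def label_comment(text):
    """Return (is_sarcastic, cleaned_text) for one comment.

    A comment counts as sarcastic only if its author tagged it with "/s".
    The tag is stripped from the cleaned text so it cannot leak into a model.
    """
    match = SARCASM_TAG.search(text)
    if match is None:
        return False, text
    return True, text[:match.start()].rstrip()


def sarcasm_rate(comments):
    """Fraction of comments carrying the self-applied tag."""
    if not comments:
        return 0.0
    flagged = sum(1 for comment in comments if label_comment(comment)[0])
    return flagged / len(comments)
```

Note that `label_comment("Oh sure, that will definitely work")` comes back as not sarcastic, which is exactly the unlabelled-sarcasm problem raised above: anything the author didn't tag lands in the negative class.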
I haven't read it in a few years, and my copy is at my parents' house in another country, but his writing always avoided the obtuse, impenetrable style that a lot of linguists are unfortunately guilty of. It is also approachable for anyone without a linguistics background.
I had some fun exploring the data so I wrote a short blog post about it: https://davefernig.com/2015/10/19/the-lowest-form-of-wit-mod...
And this sort of thing happens both with written and oral communication, unless I really focus on providing facial and other body language clues as to my intent, which I find to be somewhat annoying. I am, after all, of Scandinavian extraction, and excessive emotional expression is not only frowned upon culturally, it has also been systematically bred out of my genetic code for dozens of generations.
And to be fair, I find the lack of emotional outbursts to be a rather enjoyable part of society. It lets me relax and not have to keep track of so many "approved" emotions.
So Reddit is 0.2% sarcastic. That sounds accurate.
My daughter ended it with: "I'm afraid we're caught in a sarcasm trap."
I'd be worried two sarcasm bots would end up similarly entangled.
Marginally related: on cs.CL the other day was "Punny Captions: Witty Wordplay in Image Descriptions"[0]. A mashup of these two projects would bring us that much closer to the dream of Social Media In A Box.
[0] https://arxiv.org/abs/1704.08224
