Understanding Cohen's Kappa in Machine Learning (surgehq.ai)
6 points by CarrieLab 54 days ago | 1 comment

I often see subtle misuses of interrater reliability metrics.

For example, imagine you're running a Search Relevance task, where search raters label query/result pairs on a 5-point scale: Very Relevant (+2), Slightly Relevant (+1), Okay (0), Slightly Irrelevant (-1), Very Irrelevant (-2).

Marking "Very Relevant" vs. "Slightly Relevant" isn't a big difference, but "Very Relevant" vs. "Very Irrelevant" is. However, most interrater reliability (IRR) calculations treat the labels as unordered categories, so every disagreement counts the same no matter how far apart the two ratings are.
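To make that concrete, here is a rough sketch comparing plain Cohen's kappa with a linearly weighted variant on the 5-point scale above. The function name `kappa`, the `weighted` flag, and the rating lists are all made up for illustration. Linear weighting is just one common way to account for ordering, not something the linked article necessarily uses.

```python
from collections import Counter

# Scale from the example above: -2 (Very Irrelevant) .. +2 (Very Relevant).
LABELS = [-2, -1, 0, 1, 2]


def kappa(rater_a, rater_b, labels=LABELS, weighted=False):
    """Cohen's kappa as 1 - observed_disagreement / expected_disagreement.

    weighted=False: every disagreement costs 1 (labels treated as unordered).
    weighted=True: cost grows linearly with the distance between labels.
    Assumes the raters do not both use one identical label for every item
    (that would make the expected disagreement zero).
    """
    n = len(rater_a)
    span = max(labels) - min(labels)

    def cost(x, y):
        if weighted:
            return abs(x - y) / span
        return 0.0 if x == y else 1.0

    # Average disagreement cost actually seen between the two raters.
    observed = sum(cost(a, b) for a, b in zip(rater_a, rater_b)) / n

    # Average cost expected if the raters labeled independently,
    # each keeping their own label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(
        count_a[x] * count_b[y] * cost(x, y) for x in labels for y in labels
    ) / (n * n)

    return 1.0 - observed / expected


# Hypothetical ratings: both second raters disagree with rater_a on the same
# two items, by one step (near_miss) or by the whole scale (far_miss).
rater_a = [2, 2, 1, 1, 0, 0, -1, -1, -2, -2]
near_miss = [1, 2, 1, 1, 0, 0, -1, -1, -2, -1]
far_miss = [-2, 2, 1, 1, 0, 0, -1, -1, -2, 2]

for name, rater_b in [("near miss", near_miss), ("far miss", far_miss)]:
    plain = kappa(rater_a, rater_b)
    linear = kappa(rater_a, rater_b, weighted=True)
    print(f"{name}: unweighted={plain:.3f} weighted={linear:.3f}")
```

With this made-up data the unweighted score comes out the same (0.75) for both pairs, while the weighted score is noticeably lower for the far misses (0.50) than for the near misses (about 0.86).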

Cohen's kappa is a rather simplistic and flawed metric, but it's a good starting point for understanding interrater reliability metrics.
