

Analyst, measure thyself - wvanbergen
http://blog.noblemail.ca/2012/05/analyst-measure-thyself.html

======
msellout
Why not use logistic regression to estimate the likelihood of signup rather than
clustering the population into 5 groups? I'd imagine the resulting predictive
model would not only offer more direct ways to measure prediction error but
also provide more business insight.
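
A minimal sketch of the alternative suggested above. It assumes a single hypothetical numeric feature `x` per user and a 0/1 signup outcome. Both are simulated here because the article's actual data and features are not known. The sketch fits the logistic model with plain batch gradient descent so that it needs only the standard library. It then reports log loss and Brier score, two standard per-user measures of prediction error.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(signup | x) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of log loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def log_loss(ys, ps, eps=1e-12):
    # Mean negative log-likelihood of the outcomes; lower is better.
    return -sum(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
                for y, p in zip(ys, ps)) / len(ys)


def brier_score(ys, ps):
    # Mean squared difference between predicted probability and the 0/1 outcome.
    return sum((p - y) ** 2 for y, p in zip(ys, ps)) / len(ys)


if __name__ == "__main__":
    # Hypothetical data: one engagement feature x and a simulated 0/1 signup outcome.
    rng = random.Random(0)
    xs = [rng.uniform(-3, 3) for _ in range(500)]
    ys = [1 if rng.random() < sigmoid(1.5 * x - 0.5) else 0 for x in xs]

    w, b = fit_logistic(xs, ys)
    ps = [sigmoid(w * x + b) for x in xs]
    print(f"w={w:.2f} b={b:.2f}")
    print(f"log loss={log_loss(ys, ps):.3f} brier={brier_score(ys, ps):.3f}")
```

In practice you would more likely reach for a library fit, such as scikit-learn's `LogisticRegression`, with several features. The point is that each user gets a probability, so the error metrics apply directly rather than through 5 hand-picked buckets.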

~~~
carbocation
Absent underlying qualitative differences, there is rarely a good reason to
break a continuous distribution into discrete groups _for modeling purposes_
or for _model performance evaluation_ [1].

But for human consumption (and I believe this article is an example of that),
it can help. It's basically an ocular Hosmer-Lemeshow: not a rigorous or even
consistent approach to model performance evaluation, but often interesting to
those consuming the model's output. For example, we do it here to give
students a sense of what their chances have meant historically:
[https://www.parchment.com/c/college/college-1404-University-...](https://www.parchment.com/c/college/college-1404-University-of-California,-Berkeley_chance-details.html)
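
The "ocular Hosmer-Lemeshow" can be sketched roughly as follows. The sketch assumes you already have a predicted probability `p` and a 0/1 outcome `y` for each person. The function names and the simulated, calibrated-by-construction data are illustrative only. `calibration_table` produces the human-readable view, showing mean predicted probability against observed rate per group. `hosmer_lemeshow` collapses the same groups into the test statistic.

```python
import random


def calibration_table(ys, ps, groups=10):
    """Sort by predicted probability, split into roughly equal-size groups, and
    return (group size, expected events, observed events) for each group."""
    pairs = sorted(zip(ps, ys))
    n = len(pairs)
    rows = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # more groups than observations
        expected = sum(p for p, _ in chunk)  # sum of predicted probabilities
        observed = sum(y for _, y in chunk)  # count of actual events
        rows.append((len(chunk), expected, observed))
    return rows


def hosmer_lemeshow(ys, ps, groups=10):
    """Return the Hosmer-Lemeshow statistic and its degrees of freedom (groups - 2)."""
    rows = calibration_table(ys, ps, groups)
    stat = 0.0
    for size, expected, observed in rows:
        mean_p = expected / size
        variance = size * mean_p * (1 - mean_p)
        if variance > 0:  # skip degenerate groups whose mean prediction is exactly 0 or 1
            stat += (observed - expected) ** 2 / variance
    return stat, len(rows) - 2


if __name__ == "__main__":
    # Hypothetical scores that are calibrated by construction: y ~ Bernoulli(p).
    rng = random.Random(0)
    ps = [rng.random() for _ in range(1000)]
    ys = [1 if rng.random() < p else 0 for p in ps]

    # The "ocular" view: five groups, predicted vs. observed event rate.
    for size, expected, observed in calibration_table(ys, ps, groups=5):
        print(f"n={size:4d}  predicted={expected / size:.2f}  observed={observed / size:.2f}")

    stat, df = hosmer_lemeshow(ys, ps)
    print(f"Hosmer-Lemeshow statistic = {stat:.2f} on {df} df")
```

A p-value can be read from the chi-square distribution with `df` degrees of freedom, for example with `scipy.stats.chi2.sf(stat, df)`. Skipping zero-variance groups is a simplification made in this sketch.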

[1] See the Hosmer-Lemeshow test, now rarely used.
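
For reference, the usual textbook form of the Hosmer-Lemeshow statistic is given below; the notation is chosen here. Observations are sorted by predicted probability and split into G groups. Within each group, the observed event count is compared with the count the model expects.

```latex
% G groups formed by sorting on predicted probability (conventionally G = 10).
% n_g: size of group g; O_g: observed events in g; \bar{\pi}_g: mean predicted probability in g.
\hat{C} = \sum_{g=1}^{G} \frac{\left(O_g - n_g \bar{\pi}_g\right)^2}{n_g \bar{\pi}_g \left(1 - \bar{\pi}_g\right)}
\qquad \hat{C} \;\dot\sim\; \chi^2_{G-2} \text{ when the model is well calibrated}
```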

