The promise of this kind of quantification is that the results are more objective and precise, so everyone can come to rational agreement. In practice, it doesn't deliver that. Which attributes get quantified, how they're scored, how they're weighted: all of it is subject to endless fiddling and debate. "You over-weighted X!" "You didn't consider Y!" Et cetera. It truly never ends, and unless the participants are already well aligned, it doesn't secure genuine consensus.
Results are also highly perturbable. Tweak the weights or scores just a little and they tell an entirely different story: new winners emerge, clear victories become dead heats, and the former Red Lantern Award winner is suddenly in the middle of the pack.
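A minimal sketch of that sensitivity (all scores and weights made up): three options, three criteria, and a modest shift of weight flips the winner.

```python
import numpy as np

# Made-up decision matrix: rows are options A, B, C; columns are criteria.
scores = np.array([
    [7, 9, 4],   # A
    [8, 6, 7],   # B
    [6, 8, 8],   # C
])

def rank(weights):
    totals = scores @ np.asarray(weights)
    order = np.argsort(-totals)  # best first
    return [("ABC"[i], round(float(totals[i]), 2)) for i in order]

print(rank([0.40, 0.35, 0.25]))  # C (7.20) > B (7.05) > A (6.95)
print(rank([0.45, 0.40, 0.15]))  # A (7.35) > C (7.10) > B (7.05): new winner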
Something else to consider is that linear weighting may not make sense. Cost, for example, may barely matter while it stays within budget and then dominate everything once it doesn't, which no fixed linear weight can express.
More importantly, criteria may overlap. In the article's example, I suspect technical ease and scaling ease are highly correlated, which effectively means you're double counting that dimension.
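A toy illustration of the double counting (hypothetical numbers): two perfectly correlated criteria scored separately behave like one criterion at double weight, and collapsing them can flip the outcome.

```python
import numpy as np

# Columns: technical ease, scaling ease (assumed perfectly correlated), cost.
scores = np.array([
    [9, 9, 3],   # option A: easy but expensive
    [5, 5, 9],   # option B: harder but cheap
])

# "Equal" weights secretly give ease 2/3 of the total weight: A wins.
print(scores @ np.array([1/3, 1/3, 1/3]))   # [7.0, 6.33]

# Merge the duplicated criteria into one and re-weight 50/50: B wins.
merged = scores[:, [0, 2]]
print(merged @ np.array([0.5, 0.5]))        # [6.0, 7.0]
```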
Basically, the technique only works when you have fully independent criteria that cover the full spectrum of what matters and that can be weighted objectively on a scale reflecting true relative importance.
Then I set up two conversations. The first was about our collective confidence in those estimated ratings; the second was about the proposed weights and who had proposed them, which I revealed only afterward. That way we could add up what the model produced and have a much faster discussion about whether it pointed to what we actually ought to do.
Nobody wants to be subject to process, yet almost everyone wants to appeal to it to get their way. So I wouldn't use this tool to make decisions by itself, but to help teams make higher-quality decisions faster.
Clearly, someone who disagrees with you will just tell you that you got the weights and scores wrong.
The main value of this exercise is to sit down and think about a problem in a slightly deeper way and try to rationalize why we think solution A is better than solution B. That is good in itself. But people shouldn't mistake a mental exercise for a quantitative analysis.
Aka, the I.T. House of Quality!!!!
I've found this approach generally works well for humans; however, the results may not be Pareto-optimal.
When following this kind of approach, I typically define the criteria before the solutions. This reduces the chance of biasing the criteria toward a favored solution.
A handy mnemonic: Meaning before metric, measure before method (2MBM)
Reminds me of the process of designing scientific experiments. State exactly what you care about and measure it directly (not via proxies). Define the metrics, and a way to aggregate them, before conducting the experiment. Pit different approaches against each other by evaluating them on those metrics, and choose the best one fairly by detaching yourself from any particular approach.
The blog puts forth a nice framework that one can actually remember to apply in real life. I often unintentionally treat scientific decisions differently from real-life decisions (less quantitatively), so this is a nice way to force yourself to define your preferences as explicitly as you can. I also like the idea of using this framework within a team to understand everyone's POV, allowing for more transparency. In larger groups, discussions tend to meander, with viewpoints all over the place; this should force people to start and end with one objective metric at a time. I bet this exercise is quite fun when done with a team.
A strength of Data Envelopment Analysis (DEA) is that criteria are not weighted; instead, alternatives are compared against the efficient frontier (similar to risk/return in a Markowitz portfolio). DEA is very useful for benchmarking many alternatives.
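As a rough sketch of the efficient-frontier idea (real DEA solves a linear program per alternative; this simplified version with made-up scores only checks Pareto dominance):

```python
def is_dominated(candidate, others):
    """An alternative is dominated if some other scores >= on every
    criterion and strictly > on at least one (higher = better)."""
    return any(
        all(o >= c for o, c in zip(other, candidate))
        and any(o > c for o, c in zip(other, candidate))
        for other in others
    )

alternatives = {"A": (7, 9, 4), "B": (8, 6, 7), "C": (6, 8, 8), "D": (6, 6, 4)}
frontier = {
    name for name, s in alternatives.items()
    if not is_dominated(s, [t for n, t in alternatives.items() if n != name])
}
print(frontier)  # {'A', 'B', 'C'}: D is dominated, the rest are efficient
```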
The Analytic Hierarchy Process (AHP) uses pair-wise comparison, which is less prone to manipulation than direct scoring, but it needs a group of people to do the comparisons.
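A small sketch of how AHP turns pairwise judgments into weights, using a hypothetical comparison matrix on Saaty's 1-9 scale and the common geometric-mean approximation to the principal eigenvector:

```python
import numpy as np

# M[i][j] = how much more important criterion i is judged to be than j.
M = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 2.0],
    [1/5, 1/2, 1.0],
])

# Geometric mean of each row, normalized to sum to 1.
geo_means = M.prod(axis=1) ** (1 / M.shape[1])
weights = geo_means / geo_means.sum()
print(weights.round(3))  # roughly [0.648, 0.230, 0.122]
```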
Decisions made by consensus are not necessarily the optimal strategy (as measured by how effectively they achieve targets). This method simply reflects that.
However, the benefit of this method is that all participants are forced to be more rigorous about why they hold certain beliefs, and their preferences are then aggregated in an attempt to drive an outcome.
I'm curious whether it can be modified to find the global maximum while avoiding the McNamara fallacy.
I guess the main issue will be the quality of the input data.