Unskilled and unaware: Misjudgments rise with overconfidence in low performers (frontiersin.org)
61 points by aiNohY6g 9 days ago | 16 comments





Suppose you give a test to a room full of perfectly average B-grade students who know they are average B-grade students. Most will get a B but a few will do a little bit better and a few will do a little bit worse.

Now you focus on everyone who got a C and find that all of them had estimated themselves as B students. From this you conclude that low performers overestimate their ability.

Then you look at the A students and find that they all also thought they were B students. You conclude that high performers underestimate their ability.

But this is just a statistical artifact! It's called regression to the mean, and this study does not account for it. If you isolate the low performers out of a larger group, you will almost always find that they expected to do better (and they were right to expect that). You are just doing statistics wrong!
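To make the artifact concrete, here's a rough simulation (all numbers made up: a "B" is 85 points, test noise SD is 7). Every student has identical ability and an identical, correct self-estimate, yet the bottom quartile of test scores still looks like it "overestimated" itself:

    import numpy as np

    rng = np.random.default_rng(0)

    n = 10_000
    true_ability = np.full(n, 85.0)                    # everyone really is a B student
    test_score = true_ability + rng.normal(0, 7, n)    # noisy measurement of that ability
    self_estimate = np.full(n, 85.0)                   # everyone correctly predicts a B

    # Select the "low performers" after the fact: bottom quartile of test scores.
    low = test_score <= np.quantile(test_score, 0.25)

    print("low performers' mean test score:   ", round(test_score[low].mean(), 1))
    print("low performers' mean self-estimate:", round(self_estimate[low].mean(), 1))
    # The self-estimate (85) exceeds the selected group's mean score purely because
    # we conditioned on a noisy measurement; no overconfidence is in the model.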


That's not what they're doing here. They're asking the students how confident they are that they got what they think they got. It doesn't matter what the C group actually got, or what they think they got: they are still more confident than the B group that their estimated score matches their actual score, while the A group is less confident than the B group.

To be honest, I misunderstood the study when I first read it. However, the study is also not saying what you're saying. The authors had a bunch of students take a test and also predict their own score on it, as well as rate how confident they were in that prediction.

The study says "for low performers, the less calibrated their self-estimates were the more confident they were in their accuracy". By "calibrated" the authors mean that the actual and predicted scores were the same. In other words, the C and D students were very confident that they got As and Bs.

The authors go on to explain:

"In other words, [for low performers] the higher the discrepancy between estimated score and actual scores, the greater participants’ confidence that their estimated scores were close to their actual scores... As expected, high performers showed the opposite pattern. High levels of miscalibration predicted a decreased in SOJ [second-order judgment]..."

Suppose everyone in the class was a B student and knew it. After taking the class, most got Bs, but a few got As and a few got Cs and Ds.

Focusing exclusively on the D students (low performers), we find that they all expected to get a B. For these low-performing students, the more miscalibrated they were, the more confident they were. This makes sense because they expected to get a B and didn't expect to get a C or D.

Now let's look at the A students. It makes sense that the more miscalibrated they are, the less confident they are, because they too all expected to get a B.
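If it helps, here's a rough sketch (with made-up placeholder data, not the study's) of the analysis being described: miscalibration is the gap between predicted and actual score, and you correlate it with the second-order judgment (SOJ, confidence in the prediction) separately within the bottom and top quartiles:

    import numpy as np

    rng = np.random.default_rng(1)

    # Placeholder data standing in for the study's measurements.
    actual = rng.normal(70, 15, 500).clip(0, 100)     # actual test score
    predicted = rng.normal(80, 5, 500).clip(0, 100)   # self-predicted score
    soj = rng.uniform(1, 7, 500)                      # confidence in that prediction

    miscalibration = np.abs(predicted - actual)

    low = actual <= np.quantile(actual, 0.25)    # bottom quartile = "low performers"
    high = actual >= np.quantile(actual, 0.75)   # top quartile = "high performers"

    for name, group in [("low performers", low), ("high performers", high)]:
        r = np.corrcoef(miscalibration[group], soj[group])[0, 1]
        print(f"{name}: corr(miscalibration, SOJ) = {r:+.2f}")
    # The reported pattern is a positive correlation for low performers and a
    # negative one for high performers; random placeholder data won't reproduce
    # it, this just shows where the numbers come from.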


"Overestimation and miscalibration increase with a decrease in performance"

"Common factor: participants’ knowledge and skills about the task performed."

I understand the corporate use case: justifying the impact of low performers and quantifying the potential results.

Still, this kind of research feels tautological. It'd be surprising if anyone actually wondered if adding more low performers helped anything.

Even in tasks that require no skill, adding a person who isn't performing means they won't perform well.


You cannot increase the number of wits by multiplying half-wits.


"I choose not to draw vast conclusions from half-vast data."

I think it is still meaningful, because it's extremely common for management to favor hiring cheaper 'talent'. Pointing out the issues with that in various ways is still valuable.

Had to shorten the original title, which is: "Unskilled and unaware: second-order judgments increase with miscalibration for low performers"

The edited title is more accurate.

I've had the opposite problem. I'm a front-end dev and have worked with a lot of full-stack people: none that I really respect. I recently came across a really personable one, but in the end he suffered from the same issue: he believes his acquired knowledge as a backend dev transfers over to full-stack work. I have my own flaws but am very self-aware: I don't implement anything shiny unless I thoroughly review the DOM validity, responsiveness, accessibility, and finally functionality. Most people only review functionality, and it's sad.

The problem in software is not that the Dunning-Kruger effect exists, but the frequency with which it occurs and how that frequency corresponds to Dunning-Kruger-related research.

Most Dunning-Kruger-related research makes a glaring assumption: that results on a test are evenly distributed enough to divide them into quartiles of equal size, with the resulting groups both evenly sized and evenly distributed within a margin of error.

That is fine for some experiments, but what happens in the real world when those assumptions no longer hold? For example, what happens when there is a large sample size and 80% of the tested population fails the evaluation criteria? The resulting quartiles are three different levels of failure and one segment of acceptable performance. There is no way to account for the negative correlation demonstrated by high performers, and the performance difference between the three failing quartiles is largely irrelevant.
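A toy illustration of that point, assuming a hypothetical pass mark of 70 and a score distribution where roughly 80% land below it: split those results into quartiles and all three quartile boundaries sit below the pass mark.

    import numpy as np

    rng = np.random.default_rng(2)

    # Hypothetical test where most people fail: mean 55, SD 18, pass mark 70.
    scores = rng.normal(55, 18, 1000).clip(0, 100)
    pass_mark = 70

    q1, q2, q3 = np.quantile(scores, [0.25, 0.5, 0.75])
    print(f"quartile boundaries: {q1:.0f}, {q2:.0f}, {q3:.0f} (pass mark {pass_mark})")
    print(f"failure rate: {np.mean(scores < pass_mark):.0%}")
    # With ~80% of scores under the pass mark, the bottom three quartiles are all
    # just different shades of failing; only the top quartile contains passing work.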

Fortunately, software leadership is already aware of this problem and has happily solved it by simply redefining the tasks required to do the work and making heavy use of external abstractions. In other words, simply rewrite the given Dunning-Kruger evaluation criteria until enough people pass. The problem there is that it entirely ignores the conclusions of Dunning-Kruger: if almost everybody can now pass the test, then suddenly the population is majority overconfident.


"software leadership is already aware of this problem"

What makes you so sure? In general, most security certifications HR gets excited about aren't worth the paper they are printed on.

Process people by their very nature are an unsustainable part of a poisoned business model.

The other misconception is that a group of persistent, well-funded, knuckle-dragging troglodytes is somehow less likely to discover something Einstein overlooked.

https://en.wikipedia.org/wiki/Illusion_of_control#By_proxy


> Most research in Dunning-Kruger related experiments makes a glaring assumption that results on a test are evenly distributed enough...

Not only evenly distributed; isn't the very first underlying assumption they make, so fundamental that they never even mention it, that the tests are more accurate than the self-evaluation? Sure, over time and across a population they probably are, but that's not (as I understood it) what they measured here.

Haven't we all been there sometimes -- took a test on something we actually know pretty well, but got questions on the one sub-area we know less about (or just had a bad day), so we got a worse test result than what actually reflects our knowledge? Or the other way, took a test on something we don't know as well as we should, but lucked out with the questions hitting exactly what little we know (or got in some lucky guesses), so the test result is better than we actually deserve? I sure have.

That's another source of uncertainty, and directly relevant to what they're trying to investigate, so it feels like a big minus that they just totally ignore it.


Is there a study that has shown a decrease in the Dunning-Kruger effect as competence varies over time? If the effect is real, then you’d see more accurate self-assessments with increasing competence.

I also think these self-assessment vs actual performance studies don’t control for post-assessment cognitive stress. Stress almost always impairs judgment, and I wonder if asking for a self-assessment on the day of the exam and sometime after the exam would show a difference. If stress is a factor for self-assessment, then both high and low performers will score themselves more accurately given more time after a test.

Looking at the study design of this paper, I am not sure how the authors themselves would assess its strength for the kind of broad claim they’re making… And we’ve already seen many studies on this type of claim, so I am confused why the authors didn’t ask the “next step” type of question I mentioned above.


"Experts, trying to learn, criticize those actually learning"


