
Again, check that table. It says a lot:

https://stanfordmlgroup.github.io/competitions/mura/

On just about every test set, the model is beaten by radiologists. Even the mean performance is underwhelming.



I was referring mainly to this one (from the same group and it actually surpassed humans on average):

https://stanfordmlgroup.github.io/projects/chexnet/

In their paper they even used the "weaker" DenseNet-121 rather than the DenseNet-169 they used for MURA (bones). The DenseNet-BC I tried is another refinement of the same approach.
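For reference, the CheXNet setup is basically DenseNet-121 with its classifier swapped for a 14-way sigmoid head over the ChestX-ray14 findings. A minimal sketch of that kind of model (not the authors' code; the loss and the dummy batch here are just for illustration):

  import torch
  import torch.nn as nn
  from torchvision import models

  NUM_FINDINGS = 14  # ChestX-ray14 pathology labels

  # DenseNet-121 backbone, ImageNet-pretrained, with a new multi-label head
  model = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)
  model.classifier = nn.Linear(model.classifier.in_features, NUM_FINDINGS)

  # Multi-label objective: sigmoid + binary cross-entropy per finding
  criterion = nn.BCEWithLogitsLoss()

  x = torch.randn(8, 3, 224, 224)                         # fake batch of frontal X-rays
  targets = torch.randint(0, 2, (8, NUM_FINDINGS)).float() # fake multi-hot labels
  loss = criterion(model(x), targets)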


Those are some sketchy statistics. The evaluation procedure is questionable (F1 against the other 4 as ground truth? A mean of means?), and the 95% CIs overlap pretty substantially. Even if their bootstrap sampling said the difference is significant, I don't believe them.

Basically, I see this as "everyone sucks, but the AI maybe sucks a little less than the worst of our radiologists, on average."
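To make concrete what I think that evaluation looks like: each reader (the model included) gets an F1 against the majority vote of the other four, and a case-level bootstrap gives a CI on the model-minus-mean-radiologist difference. Here's a rough sketch of that procedure as I read it, not the authors' code; the reader count, tie-breaking rule, and number of replicates are my assumptions:

  import numpy as np
  from sklearn.metrics import f1_score

  rng = np.random.default_rng(0)
  n_cases = 420

  # Placeholder binary labels: 4 radiologists plus the model make 5 "readers".
  radiologists = rng.integers(0, 2, size=(4, n_cases))
  model_preds = rng.integers(0, 2, size=n_cases)
  readers = np.vstack([radiologists, model_preds])

  def majority(labels):
      # Majority vote across readers; ties broken toward positive (arbitrary choice)
      return (labels.mean(axis=0) >= 0.5).astype(int)

  def f1_vs_rest(readers, i):
      # Score reader i against the majority vote of the other readers
      others = np.delete(readers, i, axis=0)
      return f1_score(majority(others), readers[i])

  diffs = []
  for _ in range(10_000):
      idx = rng.integers(0, n_cases, size=n_cases)   # resample cases with replacement
      boot = readers[:, idx]
      rad_mean_f1 = np.mean([f1_vs_rest(boot, i) for i in range(4)])
      model_f1 = f1_vs_rest(boot, 4)
      diffs.append(model_f1 - rad_mean_f1)

  lo, hi = np.percentile(diffs, [2.5, 97.5])
  print(f"95% bootstrap CI, model F1 minus mean radiologist F1: [{lo:.3f}, {hi:.3f}]")

The thing to notice is how much of the result hinges on those choices: the "ground truth" is itself a vote among noisy readers, and averaging per-radiologist F1s before differencing hides the spread between the best and worst radiologist.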



