Hacker News | teendifferent's comments

I miss the era of deep learning where we actually diagnosed failure modes instead of just scaling compute.

I spent the weekend auditing ViT vs. CNN decision boundaries with a custom perturbation pipeline. I bypassed LIME's default Quickshift segmentation (it produces noisy, high-variance superpixels) and swapped in SLIC to force semantically coherent superpixels.
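For anyone wanting to reproduce the core step: the perturbation side of LIME is just sampling binary on/off masks over superpixels and greying out the "off" regions. A minimal numpy-only sketch (the grid segmentation here is a toy stand-in for `skimage.segmentation.slic`, and the grey baseline-fill is my assumption, not necessarily the author's choice):

```python
import numpy as np

def perturb(image, segments, n_samples=8, baseline=0.5, rng=None):
    """Sample binary superpixel masks and render perturbed images.

    image:    (H, W, C) float array
    segments: (H, W) int array of superpixel labels (e.g. from SLIC)
    Returns (masks, perturbed) where masks has shape (n_samples, n_segments).
    """
    rng = rng or np.random.default_rng(0)
    labels = np.unique(segments)
    masks = rng.integers(0, 2, size=(n_samples, len(labels)))
    out = []
    for m in masks:
        img = image.copy()
        for seg_id, keep in zip(labels, m):
            if not keep:
                img[segments == seg_id] = baseline  # grey out this superpixel
        out.append(img)
    return masks, np.stack(out)

# toy example: a 4x4 image split into four 2x2 "superpixels"
image = np.ones((4, 4, 3))
segments = np.repeat(np.repeat(np.arange(4).reshape(2, 2), 2, 0), 2, 1)
masks, perturbed = perturb(image, segments, n_samples=3)
```

With the real `lime` package you'd instead pass a SLIC wrapper via the `segmentation_fn` argument of `explain_instance` and let it handle the sampling.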

The results show that "Clever Hans" is still very much an issue.

Spurious correlations: the ViT predicted "Jeep" (p=0.99) from the muddy-terrain texture alone. The attention map showed it ignored the vehicle geometry entirely.

Hallucinations: EfficientNet hallucinated a "toaster" solely because it detected a white counter + flowers context.

Accuracy metrics are masking the fact that our models are just exploiting dataset biases. Full write-up on the surrogate loss implementation and visual audits here.
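For context on what "surrogate loss" means here: LIME fits a locally weighted linear model mapping the binary superpixel masks to the model's confidence, with sample weights from a proximity kernel. A minimal closed-form ridge version (the kernel form, `kernel_width`, and `alpha` are illustrative choices, not LIME's defaults, and I drop the intercept for brevity):

```python
import numpy as np

def fit_surrogate(masks, preds, kernel_width=0.5, alpha=1e-3):
    """Weighted ridge regression: superpixel masks -> model confidence.

    masks: (n_samples, n_segments) binary; preds: (n_samples,)
    Returns one importance coefficient per superpixel.
    """
    # proximity: how far each sample is from the unperturbed (all-ones) mask
    d = 1.0 - masks.mean(axis=1)                  # fraction of segments removed
    w = np.exp(-(d ** 2) / kernel_width ** 2)     # exponential kernel (assumed form)
    X = masks.astype(float)
    W = np.diag(w)
    # closed-form weighted ridge solution
    return np.linalg.solve(X.T @ W @ X + alpha * np.eye(X.shape[1]),
                           X.T @ W @ preds)

# toy check: the score depends only on superpixel 0 ("muddy terrain")
rng = np.random.default_rng(0)
masks = rng.integers(0, 2, size=(64, 4))
preds = masks[:, 0].astype(float)                 # texture alone drives the score
coef = fit_surrogate(masks, preds)
```

The coefficients are exactly the per-superpixel attributions the visual audit overlays on the image; a texture-only dependence shows up as one dominant coefficient.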


