
It looks like that's the data behind figure 3.7.4 - "LLMs implicit bias across stereotypes in four social categories" - on page 199 of the PDF: https://hai-production.s3.amazonaws.com/files/hai_ai_index_r...

They released a separate PDF of just that figure along with the CSV data: https://static.simonwillison.net/static/2025/fig_3.7.4.pdf

The figure is explained a bit on page 198. It relates to this paper: https://arxiv.org/abs/2402.04105

I don't think they released a data dictionary explaining the different columns though.



Interesting, thanks for the references!

On a second look with a fresh mind, I assume they had the LLM associate certain words (left column) with human traits such as fat vs. thin (right column) in order to measure bias.

For example, if my reading is correct, the LLM associated peace with thin people and laughter with fat people.



