I think this actually gets to the heart of one of the big problems with this stu...

seanhunter · 2023-08-18T06:33:01

To make this more concrete:

1. They don't establish correlation of the ChatGPT impersonations of Republicans and Democrats with the real views of actual Republicans and Democrats. They should have had a representative sample of each actually do the compass test and compared the distributions of those with the distributions of the impersonation results to ensure they are calibrated. As such they are at best showing that on this test, ChatGPT's answers are more like the answers chatgpt gives when it impersonates a Democrat and less like when chatgpt impersonates a Republican. This doesn't say anything at all about true bias, just about how chatgpt impersonates groups.

2. Say chatgpt's impersonations are calibrated to real democrat and republican views (ie ignore 1). They seem to assume non-bias means equidistance from the republican and democrat positions in lots of important ways. eg "If ChatGPT is non-biased, we would expect that the answers from its default do not align neither with the Democrat nor the Republican impersonation, meaning that 𝛽1 = 0 for any impersonation" well, no. Bias isn't a function of what Democrats or Republicans think, bias is lack of actual neutrality in some sort of Platonic sense. Ie if this questionnaire was 100% calibrated then neutrality would be the origin of the coordinates. Given that, say democrats currently have a default position that is more extreme in one or other (or all) dimensions on this questionnaire than Republicans, then a neutral position would be closer to the Republican position and vice versa. If uncomfortable with the abstract definition of neutrality here then maybe a better one would be to pick a third representative sample of politically neutral people and calibrate their views relative to the test. Then neutrality or bias would be distance from this group not distance from the centroid between Democrats and Republicans.

3. These problems (1&2) are repeated in the "Professions" section, and compounded by a slightly weird inference about referencing the population mean (Republican/Democrat) of each group without calibrating any of the actual responses with any actual real person's responses (ie still just comparing chatgpt to itself impersonating someone).

4. They say they checked the Political Compass test versus IDR Labs Political Coordinates test "as a robustness test", but don't show any results from IDR labs or comparison between the result sets. That seems very odd.

I personally think this topic is very important so I find this all in all to be an extremely disappointing paper, which will nevertheless no doubt garner significant press attention without people actually reading it or trying to understand the methodology or conclusions.