Possibly. We don't know exactly what's going on during inference. It might be making up a random plausible-sounding reason, or it might be at least partly re-deriving its original intuitions during the self-attention/encoder step, or it might be both, i.e. the most plausible justification is the most plausible precisely because it is the same intuition the model originally followed.

At any rate, the justification it gives is the one you'd expect, so it may not make much difference.

One question this raises is how it decided what the scientific consensus is. People talk all the time about the risk of LLMs being abused for misinformation; I don't find that particularly persuasive, because webspam has always existed and in reality people don't spend much time reading it. They find information by following links that roughly track trust networks. LLMs, however, do not. They are trained on Common Crawl and similar dumps, which may or may not have been effectively de-spammed. That opens up the question of whether you can make future LLMs believe arbitrary things and withhold arbitrary information just by bulk-generating web pages that claim to be written by scientists representing the global consensus. The prior probability for "Professor ? says ? is a consensus = true" must be very high inside the network, but it has no way to verify any such claim. Perhaps the primary victim of LLM misinformation will be LLMs.