It's still a mystery, yes. It's annoying, but not particularly important, since EleutherAI has created a more open substitute that is large enough to train a GPT-3-like model: The Pile, https://pile.eleuther.ai/ (In some ways it is probably better than OA's mystery-meat corpus.)