It's provably impossible for a model with 1.76 trillion floating-point parameters (like GPT-4) to memorize millions of books?
How many bytes do you think a million compressed books actually take up? And consider that these models are trained to predict the next symbol from the preceding text, which is exactly the core mechanism of prediction-based compressors.
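For scale, a rough back-of-envelope, where the per-book size, compression ratio, and the 1.76T/fp16 figures are all loose assumptions rather than measurements:

```python
# Back-of-envelope: compressed corpus size vs. model weight size.
# All constants below are rough assumptions, not measurements.

BOOKS = 1_000_000
PLAIN_TEXT_PER_BOOK = 500_000   # ~500 KB of plain text per book (assumed)
COMPRESSION_RATIO = 4           # gzip-class ratio for plain text (assumed)

corpus_compressed_bytes = BOOKS * PLAIN_TEXT_PER_BOOK / COMPRESSION_RATIO
print(f"compressed corpus: ~{corpus_compressed_bytes / 1e9:.0f} GB")  # prints ~125 GB

PARAMS = 1.76e12                # rumored GPT-4 parameter count (assumed)
BYTES_PER_PARAM = 2             # fp16/bf16 weights
weight_bytes = PARAMS * BYTES_PER_PARAM
print(f"model weights: ~{weight_bytes / 1e12:.2f} TB")                # prints ~3.52 TB
```

Under those assumptions the weights alone are more than an order of magnitude larger than the compressed corpus, so "provably impossible" doesn't follow from a simple capacity argument.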