Bear in mind that a "1 million token" context window isn't dense attention over a million tokens. You're being sold a sparse-attention model, and sparse attention is guaranteed to drop critical context. Google's TPUs aren't holding a TERABYTE of fp8 key-value cache per request, let alone TWO terabytes at fp16. (The cached tensors are keys and values, not queries; queries are computed fresh each step.)
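A quick back-of-envelope sketch of where those numbers come from, with hypothetical model dimensions (60 layers, an 8192-wide per-layer KV projection, no grouped-query attention; no frontier model's actual config is public):

```python
# Back-of-envelope KV-cache size for dense attention over a 1M-token context.
# All model dimensions below are assumptions for illustration, not any
# real model's configuration.

N_LAYERS = 60          # hypothetical layer count
KV_DIM = 8192          # hypothetical key/value width per layer (n_heads * head_dim)
SEQ_LEN = 1_000_000    # the advertised context window

def kv_cache_bytes(bytes_per_elem: int) -> int:
    # 2x for keys AND values, cached at every layer for every token
    return 2 * N_LAYERS * KV_DIM * SEQ_LEN * bytes_per_elem

for name, width in [("fp8", 1), ("fp16", 2)]:
    print(f"{name}: {kv_cache_bytes(width) / 1e12:.1f} TB")
# fp8: 1.0 TB
# fp16: 2.0 TB
```

Techniques like grouped-query attention shrink this, but the point stands: serving dense attention at that scale per request is not plausible hardware economics.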
Google's marketing wins again, I guess.