The fake detection mechanism (aka discriminator) is usually just another neural network and I bet that's the case here as well. So it must be differentiable and thus, if anyone ever gets a hold of it, it could be easily used to train a generator that will eventually fool the discriminator.
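For illustration, here's a minimal PyTorch sketch of that attack (toy shapes, made-up architectures, nothing specific to any real detector): freeze the leaked discriminator and use it as a differentiable loss for your own generator.

```python
import torch
import torch.nn as nn

D = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))  # the leaked detector
G = nn.Sequential(nn.Linear(16, 128), nn.ReLU(), nn.Linear(128, 64))  # generator we control

for p in D.parameters():
    p.requires_grad_(False)  # treat the detector as a fixed, differentiable loss

opt = torch.optim.Adam(G.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    z = torch.randn(32, 16)           # random latent codes
    fake = G(z)                       # generated samples
    labels_real = torch.ones(32, 1)   # ask D to call them "real"
    loss = bce(D(fake), labels_real)  # gradients flow through frozen D into G
    opt.zero_grad()
    loss.backward()
    opt.step()
```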
It depends on how you set it up. You can use a fairly uninformative valuation that just assigns a scalar score, but then you need reinforcement learning techniques to turn that score into a model update. Alternatively, if your valuation is a differentiable metric of distance from your target, you have a way of going directly from your output to a model update.
The second way usually requires dramatically fewer update steps than the first. So having your adversarial target be differentiable definitely helps, though it is possible to do without one.
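Roughly the difference between the two, in a toy PyTorch sketch; score() and metric() below are hypothetical stand-ins for the same objective, one treated as a black box and one differentiable:

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 4))
opt = torch.optim.Adam(G.parameters(), lr=1e-3)
target = torch.zeros(4)

def score(x):            # black-box scalar reward, no gradient available
    with torch.no_grad():
        return -((x - target) ** 2).sum(dim=-1)

def metric(x):           # same objective, but differentiable
    return ((x - target) ** 2).sum(dim=-1)

# (a) REINFORCE-style: treat G's output as the mean of a Gaussian policy,
#     sample from it, and weight log-probabilities by the observed reward.
z = torch.randn(64, 8)
dist = torch.distributions.Normal(G(z), 1.0)
sample = dist.sample()
reward = score(sample)
loss_rl = -(dist.log_prob(sample).sum(dim=-1) * reward).mean()
opt.zero_grad(); loss_rl.backward(); opt.step()

# (b) Differentiable metric: backpropagate through the objective directly.
z = torch.randn(64, 8)
loss_direct = metric(G(z)).mean()
opt.zero_grad(); loss_direct.backward(); opt.step()
```

In (a) the only learning signal per sample is one noisy scalar, while in (b) every output dimension gets its own gradient, which is why the direct route tends to need far fewer updates.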
An NN is trained to make something as indistinguishable from reality as possible, so it can't tell the difference. The inverse of that will just claim reality is fake too.
What you need is a deepfake and a real video of the same thing, then train on the difference. Clearly this is impossible, which is what makes the problem hard.
You actually don't need that. You only need a set of real videos and a generator for fake ones. Then train the discriminator to tell these two classes apart and make use of its differentiability to update the generator in tandem with the discriminator.
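That's just the standard GAN recipe. A minimal PyTorch sketch, with toy vectors standing in for videos and a made-up real_batch() in place of a real dataset:

```python
import torch
import torch.nn as nn

D = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))  # discriminator
G = nn.Sequential(nn.Linear(16, 128), nn.ReLU(), nn.Linear(128, 64))  # generator
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def real_batch(n=32):                 # stand-in for a dataset of real videos
    return torch.randn(n, 64) + 2.0

for step in range(1000):
    # 1) Discriminator step: real vs. generated, no paired data required.
    real = real_batch()
    fake = G(torch.randn(32, 16)).detach()   # detach: don't update G here
    loss_d = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) Generator step: exploit D's differentiability to push fakes toward "real".
    fake = G(torch.randn(32, 16))
    loss_g = bce(D(fake), torch.ones(32, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

Note that nothing here needs a real/fake pair of the same scene; the discriminator only ever sees unpaired real samples and whatever the generator currently produces.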
Not if you have a balanced dataset. If you don't, you're already running into trouble with GANs, which is part of why they were superseded by diffusion models for image generation.