You'd imagine they could cut a good amount of it by removing duplicate data, e.g. video, audio and images?

Haha, yeah I bet the kind of data they collect dedupes and compresses pretty well. They probably sometimes collect the same packet many times on its journey.
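
For what it's worth, here's a rough sketch of the kind of dedup I mean. It's only an illustration: the `dedupe` function and the sample payloads are made up, and I'm assuming you hash just the payload bytes, since copies of a packet grabbed at different hops would differ in header fields like TTL.

```python
import hashlib


def dedupe(payloads):
    """Keep the first copy of each distinct payload, keyed by its SHA-256 digest."""
    seen = set()
    unique = []
    for payload in payloads:
        digest = hashlib.sha256(payload).digest()
        if digest not in seen:
            seen.add(digest)
            unique.append(payload)
    return unique


# Hypothetical capture: the same payload seen three times along its route.
captured = [b"GET /cat.jpg", b"GET /cat.jpg", b"GET /dog.jpg", b"GET /cat.jpg"]
unique = dedupe(captured)
```

Storing only the digest set plus one copy of each payload is the basic idea; how well it works in practice depends on how much of the traffic really is byte-identical.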

