If software identified the user's face in live video and transmitted just that at high resolution, discarding or compressing the rest, how much efficiency could be gained?
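A rough back-of-envelope sketch of that question, under a deliberately crude assumption: bitrate scales linearly with pixel count times a per-region quality factor. Real codecs don't behave this simply. The function name, the 1280x720 frame, and the 320x320 face box are all made-up numbers for illustration, not measurements.

```python
def roi_bitrate_fraction(frame_w, frame_h, face_w, face_h, background_quality):
    """Fraction of the original bitrate needed if only the face keeps full quality.

    background_quality is the share of bits-per-pixel the non-face area keeps
    (0.0 = discarded entirely, 1.0 = untouched). Assumes bitrate is
    proportional to pixels * quality, which is a simplification.
    """
    frame_px = frame_w * frame_h
    face_px = face_w * face_h
    background_px = frame_px - face_px
    return (face_px + background_px * background_quality) / frame_px


# Hypothetical numbers: 1280x720 frame, 320x320 face bounding box.
for q in (0.0, 0.1, 0.25):
    frac = roi_bitrate_fraction(1280, 720, 320, 320, q)
    print(f"background at {q:.0%} quality -> {frac:.1%} of original bitrate")
```

With these assumed numbers the face is about 11% of the frame, so dropping the background entirely gives about 11% of the original bitrate, and keeping it at a tenth of the quality gives about 20%. The real gain depends heavily on the codec and on how large the face is in the shot.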
Separately: as data gets cheaper and video chat is further optimized, video chat over 3G/4G becomes more feasible. The answer is probably already visible in countries that have abundant bandwidth. What do high schoolers in Korea/Scandinavia use for video chat?
Edit: I mean to suggest that photos might be the optimization.