Edit: I meant to suggest that photos might be the optimization.

Potentially, but I think the gap FaceTime/Skype fills is real-time communication, whereas Snapchat is async no matter how you slice it. It's explicitly a call-and-response method of communication: the receiver can engage with it right away, or view it tomorrow and respond tomorrow. While sending a recorded video or a highly compressed pic will benefit from more efficient data transfer, it's still not quite the same as a live, open video chat.

