But all kinds of post-processing can be (and are) done on readout, pixel by pixel, without storing whole frames and going back through them. You just keep a few parameters calculated from previous frames, not the frames themselves.
You can do a lot of processing this way, including scaling, white balance, color correction, effect filters, etc.
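To make that concrete, here's a rough sketch of white balance done this way. It uses the simple "gray world" method (my choice for illustration, not necessarily what any real camera pipeline does), and the class and method names are made up. Each pixel is corrected the moment it arrives using gains computed from the previous frame, and the only state kept between frames is three running sums and three gains.

```python
class StreamingWhiteBalance:
    """Hypothetical per-pixel white balance with no frame buffer."""

    def __init__(self):
        # Gains applied to the current frame (identity until one frame is seen).
        self.gains = (1.0, 1.0, 1.0)
        # Running channel sums for the frame currently being read out.
        self._sums = [0, 0, 0]

    def push(self, pixel):
        """Take one (r, g, b) pixel from readout and return the corrected pixel."""
        r, g, b = pixel
        # Accumulate statistics from the raw values for the next frame's gains.
        self._sums[0] += r
        self._sums[1] += g
        self._sums[2] += b
        gain_r, gain_g, gain_b = self.gains
        # Apply the previous frame's gains and clamp to the 8-bit range.
        return (
            min(255, round(r * gain_r)),
            min(255, round(g * gain_g)),
            min(255, round(b * gain_b)),
        )

    def end_frame(self):
        """Call at end of frame: turn the sums into gains for the next frame."""
        sum_r, sum_g, sum_b = self._sums
        if sum_r > 0 and sum_b > 0:
            # Gray world assumption: scale R and B so their means match G.
            self.gains = (sum_g / sum_r, 1.0, sum_g / sum_b)
        self._sums = [0, 0, 0]


wb = StreamingWhiteBalance()
frames = [[(100, 200, 50), (100, 200, 50)], [(100, 200, 50)]]
for frame in frames:
    corrected = [wb.push(p) for p in frame]  # list only for printing
    wb.end_frame()
    print(corrected)
```

The first frame comes out uncorrected because there are no statistics yet; from the second frame on, the correction lags by one frame, which is the trade-off for not buffering anything.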
Webcams add a USB interface and an additional controller chip, so there may be some added latency there. But if you connect to the camera sensor directly over CSI, you can get pretty low latency.