If you define raw video as RGB with 1 byte for red, 1 byte for green and 1 byte for blue, then yes, it will be 3 bytes per pixel.
But there are clearly other ways to store a pixel of colour information in fewer than 3 bytes, which is OP's point. It's not really an optimization - it's just a different coding format (just as ASCII isn't an optimization of Unicode).
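To make that concrete, here is a rough sketch comparing three common uncompressed layouts: 24-bit RGB, 16-bit RGB565, and planar YUV 4:2:0. The function names (`pack_rgb565`, `frame_bytes`) and the format labels are made up for illustration, and the YUV 4:2:0 figure assumes 8-bit samples and even frame dimensions.

```python
def pack_rgb565(r, g, b):
    """Pack 8-bit R, G, B into one 16-bit value: 5 bits red, 6 green, 5 blue."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def frame_bytes(width, height, fmt):
    """Size in bytes of one uncompressed frame in the given (hypothetical) format label."""
    pixels = width * height
    if fmt == "rgb24":
        return pixels * 3  # 1 byte each for R, G, B
    if fmt == "rgb565":
        return pixels * 2  # one 16-bit word per pixel
    if fmt == "yuv420":
        # Full-resolution luma plane plus two chroma planes, each subsampled
        # by 2 horizontally and vertically (assumes even width and height).
        return pixels + 2 * ((width // 2) * (height // 2))
    raise ValueError(f"unknown format: {fmt}")


if __name__ == "__main__":
    for fmt in ("rgb24", "rgb565", "yuv420"):
        size = frame_bytes(1920, 1080, fmt)
        print(f"{fmt}: {size} bytes per frame, {size / (1920 * 1080)} bytes per pixel")
```

All three are still "raw" in the sense that no entropy coding or inter-frame prediction is involved; they just spend a different number of bits on each pixel.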