Your display uses 3 bytes per pixel: 8 bits for each of the R, G, and B channels. This is known as RGB888 (ignoring the A, or alpha transparency, channel for now).
YUV420 uses chroma subsampling, which means the color information is stored at a lower resolution than the brightness information. Groups of 4 pixels will have the same color, but each pixel can have a different brightness. Our eyes are more sensitive to brightness changes than color changes, so this is usually unnoticeable.
This is very advantageous for compression: because each group of 4 pixels shares a single color value, YUV420 requires only 6 bytes per 4 pixels (4 Y samples + 1 U + 1 V), or 1.5 bytes per pixel. That's half as many bytes as RGB888.
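To make the arithmetic concrete, here's a small sketch of the buffer-size math for a planar YUV420 layout (function names are mine, just for illustration):

```python
def rgb888_size(width, height):
    # 3 bytes (R, G, B) per pixel
    return width * height * 3

def yuv420_size(width, height):
    # Planar YUV420: full-resolution Y plane, plus U and V planes
    # at half resolution in both dimensions (one sample per 2x2 block).
    y = width * height
    u = (width // 2) * (height // 2)
    v = (width // 2) * (height // 2)
    return y + u + v

print(rgb888_size(1920, 1080))  # 6220800 bytes
print(yuv420_size(1920, 1080))  # 3110400 bytes -- exactly half
```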
When you decompress a JPEG, you first get a YUV420 output. Converting from YUV420 to RGB888 doesn't add any information, but it doubles the number of bits used to represent the image because it stores the color value for every individual pixel instead of groups of pixels. This is easier to manipulate in software, but it takes twice as much memory to store and twice as much bandwidth to move around relative to YUV420.
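The conversion itself is just a per-pixel matrix multiply. A rough sketch using the BT.601 full-range coefficients (the ones JPEG uses), assuming 8-bit values with U and V centered on 128:

```python
def yuv_to_rgb(y, u, v):
    # BT.601 full-range conversion; U and V are offset by 128.
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    clamp = lambda x: max(0, min(255, round(x)))
    return clamp(r), clamp(g), clamp(b)

print(yuv_to_rgb(128, 128, 128))  # (128, 128, 128) -- neutral gray
```

Note that no new information appears: three output bytes per pixel are derived from samples that were partly shared between pixels.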
The idea is that if your application can work with YUV420 through the render pipeline and then let a GPU shader do the final conversion to RGB888 within the GPU, you cut your memory and bandwidth requirements in half at the expense of additional code complexity.
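A sketch of the per-pixel work that final shader pass does, written in Python rather than GLSL for readability (the flat row-major plane layout here is a hypothetical; real shaders sample textures and usually get chroma filtering for free):

```python
def shader_fetch(y_plane, u_plane, v_plane, width, x, y):
    # Fetch Y at full resolution; U/V at half resolution, so
    # 4 screen pixels map to the same chroma sample (nearest-neighbor).
    Y = y_plane[y * width + x]
    U = u_plane[(y // 2) * (width // 2) + (x // 2)]
    V = v_plane[(y // 2) * (width // 2) + (x // 2)]
    # BT.601 full-range matrix, applied only at this last stage.
    r = max(0, min(255, round(Y + 1.402 * (V - 128))))
    g = max(0, min(255, round(Y - 0.344 * (U - 128) - 0.714 * (V - 128))))
    b = max(0, min(255, round(Y + 1.772 * (U - 128))))
    return r, g, b

# A 2x2 image: four Y samples share one U/V pair until this final step.
print(shader_fetch([90, 90, 90, 90], [128], [128], 2, 0, 0))  # (90, 90, 90)
```

Everything upstream of this pass moves 1.5 bytes per pixel instead of 3.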
Wikipedia is a good source of diagrams and details that explain this further: https://en.wikipedia.org/wiki/YUV#Y%E2%80%B2UV420p_(and_Y%E2...
Converting via lookup tables, one for each component, made it very cheap to perform palette-style tricks like tonemapping and smooth color shifts from day to night.
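Something like this, I'd guess: a 256-entry table per component, rebuilt once per frame, so the per-pixel cost of an effect like a day-to-night fade is a single array index (the curve here is purely illustrative):

```python
def make_night_lut(t):
    # t: 0.0 = full day, 1.0 = full night.
    # Darken the luma component as t rises; a real engine would
    # build similar tables for the two chroma components too.
    return [min(255, round(v * (1.0 - 0.6 * t))) for v in range(256)]

luma_lut = make_night_lut(0.5)  # halfway through the fade
print(luma_lut[200])  # 140 -- one lookup per pixel, no math per pixel
```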
[Edit: it may have actually been CIE Lab, not YUV/YCbCr, because the a and b channels tended to have narrower ranges. It's been too long!]
How would you do even trivial things like color blending in 4:2:0?
That said, this (storing JPEGs in YUV420) is just an optimization, and the more images and displays go HDR, the less frequently we'll see YUV JPEGs, though we could maybe see dithered versions in 4:4:4. That's basically the same thing, but once you discard 4:2:0 and 4:2:2 you might as well use standard 8-bit RGB and skip the complexity of YCbCr altogether. If you're curious about HDR as I was: though 10-bit is "required", you can dither HDR down to 8-bit and not notice the difference unless you're doing actual colour grading (where you need the extra detail, of course). For obvious reasons, I've never heard of anyone dithering HDR to YUV 4:2:0, though, and most computer screens look pretty terrible when output to a TV as YUV 4:2:2 or 4:2:0.
I can think of several image processing tasks which are more straightforward in a luma/chroma format. Maybe it's because I'm more used to working with the data in that form?
Probably because it's the native input format for every display technology in existence? If you're going to twiddle an image, you might want to do it in the format that it's displayed in.
I think a twitter account called @JohnCarmackELI5 that just explains all of John's tweets like I have no idea what he's talking about would be invaluable. The man is obviously bursting with good/interesting stuff to say, but I grok it like 5% of the time.