My fuzzy understanding (and I'm not at all an expert on this) is that the main benefit of bf16 is that it's less prone to overflow/underflow during calculation, which causes bigger problems in both training and inference than the simple loss of precision. So once it became widely supported, it became a commonly preferred format for models (whether image gen or otherwise) over FP16.
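To make the range-versus-precision trade a bit more concrete, here's a rough sketch in plain Python (no GPU libraries needed). The helper names `to_fp16` and `to_bf16` are made up for illustration, and the bf16 conversion just truncates the float32 bits instead of rounding to nearest the way real hardware usually does, so treat the exact digits as approximate.

```python
import struct


def to_fp16(x: float) -> float:
    """Round-trip x through IEEE half precision (5 exponent bits, 10 mantissa bits)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack values past the fp16 maximum (65504);
        # real fp16 arithmetic would produce +/- infinity here.
        return float("inf") if x > 0 else float("-inf")


def to_bf16(x: float) -> float:
    """Approximate bfloat16 (8 exponent bits, 7 mantissa bits).

    Assumption: we keep the top 16 bits of the float32 encoding,
    i.e. truncation rather than round-to-nearest.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


if __name__ == "__main__":
    for value in (70000.0, 1e-8, 1.001):
        print(value, "fp16:", to_fp16(value), "bf16:", to_bf16(value))
```

In this sketch, 70000 overflows to infinity in fp16 but survives (coarsely) in bf16, and 1e-8 flushes to zero in fp16 but not in bf16. On the other hand, 1.001 keeps its extra digits in fp16 while bf16 collapses it to 1.0. That's the "more range, less precision" trade in miniature.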
I asked chat for an explanation and it said bfloat16 has a higher range (like fp32) but less precision than fp16.
What does that mean for image generation, and why was bfloat16 chosen over fp16?