Not sure if or what the change was, but fmax(x, 0) only requires checking the sign bit instead of doing a full floating-point comparison (putting aside NaN handling).
A hypothetical relu instruction could probably get away with much less power and die space?
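To make the sign-bit point concrete, here is a rough software sketch of the idea, assuming IEEE 754 binary32 floats. The function name `relu_signbit` is made up for illustration, and this says nothing about how real hardware implements it. It also differs from fmaxf on NaN inputs, which is exactly the case being put aside.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: ReLU that only inspects the sign bit.
 * Assumes float is IEEE 754 binary32 (sign bit is bit 31).
 * NaN caveat: a NaN with a clear sign bit passes through unchanged,
 * and a NaN with the sign bit set becomes +0.0. */
float relu_signbit(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);      /* reinterpret the float's bits safely */

    /* sign = 1 -> mask = 0x00000000; sign = 0 -> mask = 0xFFFFFFFF */
    uint32_t mask = (bits >> 31) - 1u;
    bits &= mask;                        /* negative inputs (and -0.0) collapse to +0.0 */

    memcpy(&x, &bits, sizeof x);
    return x;
}
```

No exponent or mantissa comparison happens anywhere: the whole thing is one bit test and an AND, which is the part a dedicated instruction could presumably exploit.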