Someone else already linked the OpenAI blog post; I personally first heard about this in a post by Facebook (on their experiments with low-precision arithmetic).
I believe it is not used because it requires 16-bit precision, which, nowadays, you effectively only get on GPUs.
People usually train on GPU but then evaluate on CPU (in production), where the discontinuity would be much smaller, since you would be using 32-bit precision.
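To give a rough sense of scale, here is a minimal NumPy sketch (my own illustration, not taken from either post): a mathematically identity computation drifts measurably from the identity under float16 rounding, but far less under float32. The `roundtrip` function and the 1/3 scale factor are arbitrary choices for demonstration.

```python
import numpy as np

def roundtrip(x, dtype):
    # Scale by a non-power-of-two constant and scale back: mathematically
    # the identity, but finite-precision rounding makes it slightly nonlinear.
    scale = dtype(1.0 / 3.0)
    x = x.astype(dtype)
    return ((x * scale) / scale).astype(np.float64)

x = np.linspace(-1e-3, 1e-3, 10001)  # float64 reference inputs

for dtype in (np.float16, np.float32):
    err = np.abs(roundtrip(x, dtype) - x)
    print(f"{dtype.__name__}: eps = {np.finfo(dtype).eps:.2e}, "
          f"max deviation from identity = {err.max():.2e}")
```

The machine epsilon of float16 is about 1e-3, versus roughly 1e-7 for float32, so any rounding-induced discontinuity exploited during 16-bit training all but vanishes at 32-bit inference time.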
Furthermore, I don't know whether, in practice, that kind of discontinuity trains as well as a classical activation function (gradient propagation might be hindered by the limited precision).