Traps (CPU exceptions, such as the traditional FPU exception for division by zero) usually involve a context switch into kernel mode. So if you trap on tags, every operation that hits a tagged value will probably be 3-5 orders of magnitude slower. That's a lot.
Could you explain why?
I thought that trapping was more like a 'slow branch': slow because of the pipeline flush, but why should the kernel be involved (1)?
1: except if you need to swap in a page, but that's just like any other memory reference.
Runtime/language exceptions use different mechanisms that don't require a kernel context switch (though they may involve slow steps such as stack walking).
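To make the kernel's role concrete, here is a minimal sketch in C, assuming a POSIX system (Linux or macOS). A real tag-check trap (for example SPARC's trapping tagged add) can't be triggered portably from C, so this uses a load from a PROT_NONE page as a stand-in CPU trap. The load faults, the CPU enters the kernel, the kernel turns the fault into a SIGSEGV (SIGBUS on some systems) and runs the user-space handler, which siglongjmps back past the load. The function name count_page_traps and its parameters are hypothetical, chosen only for this illustration.

```c
/* Sketch, assuming POSIX (Linux/macOS). A load from a PROT_NONE page stands
   in for a hardware tag-check trap; count_page_traps is a hypothetical name. */
#define _DEFAULT_SOURCE            /* glibc: expose MAP_ANONYMOUS, sigaction, etc. */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

static sigjmp_buf resume_point;           /* where the handler jumps back to */
static volatile sig_atomic_t trap_count;  /* incremented once per delivered signal */
static volatile char sink;                /* keeps the faulting load from being optimized out */

/* Runs in user mode, but only after the CPU trapped into the kernel and the
   kernel chose to deliver a signal for the faulting instruction. */
static void on_fault(int sig) {
    (void)sig;
    trap_count++;
    siglongjmp(resume_point, 1);          /* skip the load instead of re-executing it */
}

/* Executes n loads from an inaccessible page, each of which traps.
   Returns the number of signals the handler saw, or -1 on setup failure.
   If ns_per_trap is non-NULL, stores the average wall time per iteration. */
long count_page_traps(long n, double *ns_per_trap) {
    assert(n >= 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *mem = mmap(NULL, page, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    volatile const char *bad = mem;

    struct sigaction sa, old_segv, old_bus;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    /* Protection faults arrive as SIGSEGV on Linux and may arrive as SIGBUS on macOS. */
    if (sigaction(SIGSEGV, &sa, &old_segv) != 0) {
        munmap(mem, page);
        return -1;
    }
    if (sigaction(SIGBUS, &sa, &old_bus) != 0) {
        sigaction(SIGSEGV, &old_segv, NULL);
        munmap(mem, page);
        return -1;
    }

    trap_count = 0;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (volatile long i = 0; i < n; i++) {
        /* savemask = 1: siglongjmp restores the signal mask, so the fault
           signal is unblocked again for the next iteration. */
        if (sigsetjmp(resume_point, 1) == 0) {
            sink = *bad;                  /* CPU fault -> kernel -> on_fault */
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Put the previous handlers back and release the page. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    munmap(mem, page);

    if (ns_per_trap != NULL) {
        double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
                  + (double)(t1.tv_nsec - t0.tv_nsec);
        *ns_per_trap = n > 0 ? ns / (double)n : 0.0;
    }
    return (long)trap_count;
}
```

For example, count_page_traps(100000, &ns) should return 100000 and store the average cost per trap in ns, which you can compare with the few nanoseconds of a mispredicted branch. Treat the number as a rough upper bound: sigsetjmp/siglongjmp with savemask = 1 also make signal-mask system calls on each iteration, and results vary a lot by CPU, OS, and mitigations, so no figures are quoted here.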