
The goal of SMT isn't to double performance; it's to let more than one thread make better use of the execution units within a CPU core.

The example frequently given is running integer-heavy and floating-point-heavy code at the same time: they don't use the same execution units, so together they make better use of the available resources.
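A toy sketch of that idea, assuming a made-up core with exactly one INT unit and one FP unit, where each thread issues at most one op per cycle in program order. The name `cycles_to_drain` is hypothetical and this does not model any real CPU's pipeline.

```python
# Toy model (hypothetical, not a real CPU): one INT unit and one FP unit per
# core; each thread issues at most one op per cycle, in program order.

def cycles_to_drain(threads):
    """Count cycles until every thread has issued all of its ops."""
    pcs = [0] * len(threads)              # next op index per thread
    cycles = 0
    while any(pc < len(t) for pc, t in zip(pcs, threads)):
        free = {"int", "fp"}              # units still available this cycle
        for i, t in enumerate(threads):
            if pcs[i] < len(t) and t[pcs[i]] in free:
                free.remove(t[pcs[i]])    # this thread claims the unit
                pcs[i] += 1
        cycles += 1
    return cycles


int_heavy = ["int"] * 8
fp_heavy = ["fp"] * 8

print(cycles_to_drain([int_heavy]) + cycles_to_drain([fp_heavy]))  # one at a time: 16
print(cycles_to_drain([int_heavy, fp_heavy]))   # SMT, different units: 8
print(cycles_to_drain([int_heavy, int_heavy]))  # SMT, same unit: 16
```

The mixed pair is the idealized best case; two threads fighting over the same unit gain nothing, and real instruction mixes rarely split this cleanly.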

In the real world, it's more likely that one thread is waiting on a memory access or some other part of the system, and the other thread can proceed during that wait.
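A rough sketch of that latency hiding, assuming a hypothetical single-issue core with one execution unit: a "compute" op takes one cycle, while a "load" issues in one cycle and then stalls its thread for `miss_latency` cycles, like a cache miss. All names and numbers here are illustrative assumptions, not measurements.

```python
# Toy model (hypothetical, not a real CPU): single-issue core, one execution
# unit. "compute" uses the unit for 1 cycle; "load" issues in 1 cycle, then
# its thread stalls for `miss_latency` cycles (a simulated cache miss).

def _run(programs, miss_latency):
    """Interleave threads round-robin; return (total_cycles, busy_cycles)."""
    n = len(programs)
    pc = [0] * n          # next op index per thread
    ready_at = [0] * n    # cycle at which each thread may issue again
    cycle = busy = 0
    start = 0             # round-robin starting thread
    while any(pc[i] < len(programs[i]) for i in range(n)):
        for k in range(n):
            i = (start + k) % n
            if pc[i] < len(programs[i]) and ready_at[i] <= cycle:
                op = programs[i][pc[i]]
                pc[i] += 1
                busy += 1
                stall = miss_latency if op == "load" else 0
                ready_at[i] = cycle + 1 + stall
                start = i + 1
                break     # one op per cycle: the unit is now taken
        cycle += 1
    return max(ready_at), busy   # done when the last op completes


def simulate(programs, miss_latency, smt):
    if smt:
        return _run(programs, miss_latency)
    # Without SMT the threads run one after another on the core.
    results = [_run([p], miss_latency) for p in programs]
    return sum(c for c, _ in results), sum(b for _, b in results)


program = ["compute", "compute", "load"] * 2
for smt in (False, True):
    cycles, busy = simulate([program, program], miss_latency=3, smt=smt)
    print(f"smt={smt}: {cycles} cycles, unit busy {busy / cycles:.0%}")
```

In this toy run the pair finishes in 17 cycles instead of 24 (unit busy about 71% instead of 50%): a real gain from filling the other thread's stall cycles, but well short of 2x, since both threads still share one execution unit.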