SMT in general is the surprise. Is there any other ARM core out there with multiple threads? Of course, once you are able to do multithreading, adding more contexts (4 instead of the 2 common on x86) is "just" a matter of finding the right trade-off between resources and memory latency.
Edit: Apparently the Cortex-A65AE ("AE" = automotive enhanced) was released last year and was the first ARM core with multithreading.
A "thread" in SMT is not the kind of OS-level or user-level thread most people are used to. Think of N threads as N register sets[1] sharing the same core, mostly to hide memory/cache latency. There's still only one set of functional units, and the threads compete for it, unlike separate cores, which each have their own. Strictly speaking, the picture of only one thread being active at a time describes time-multiplexed designs such as Tera's; in SMT proper, instructions from several threads can issue in the same cycle, which is what the "simultaneous" refers to. For the time-multiplexed kind, "time-shared multithreading" or "multiplexed cores" would be the more accurate name.
[1] It's actually more complicated than that, with register aliasing etc., but it's a decent conceptual model.
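
To make the register-set model concrete, here's a toy Python sketch. It is a deliberate simplification, not how any real core works: it assumes a single-issue core that picks one ready context per cycle (round-robin), that every load misses, and that a miss costs a fixed number of cycles. The function name `run`, the 'c'/'m' instruction encoding and the `miss_latency` default are all made up for illustration.

```python
def run(contexts, miss_latency=10):
    """Cycles needed by a toy single-issue core with len(contexts) hardware threads.

    contexts: list of instruction strings; 'c' is a 1-cycle compute op,
              'm' is a load that always misses (hypothetical encoding).
    """
    n = len(contexts)
    pc = [0] * n        # per-context program counter, standing in for its register set
    ready_at = [0] * n  # first cycle at which each context may issue again
    cycle = 0
    last = -1           # context that issued most recently (for round-robin)

    while any(pc[i] < len(contexts[i]) for i in range(n)):
        # Issue from the next context that has work left and is not waiting on memory.
        for k in range(1, n + 1):
            i = (last + k) % n
            if pc[i] < len(contexts[i]) and ready_at[i] <= cycle:
                if contexts[i][pc[i]] == "m":
                    # This context stalls; the others can use the core meanwhile.
                    ready_at[i] = cycle + 1 + miss_latency
                pc[i] += 1
                last = i
                break
        cycle += 1  # if nobody was ready, the core simply idles this cycle

    # Count the time spent waiting for any outstanding miss to return.
    return max(cycle, max(ready_at))


if __name__ == "__main__":
    one = run(["cmcm"])
    four = run(["cmcm"] * 4)
    print("1 context:", one, "cycles")
    print("4 contexts:", four, "cycles, vs", 4 * one, "if run back to back")
```

With one context the core sits idle during every miss. With four contexts doing the same work each, the misses overlap, so the total comes out well short of four times the single-context figure. That gap is the resources-versus-latency trade-off in miniature.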