They do. SPARC64 has hardware contexts. Technically, the architecture wouldn't require this. A process's state would be a mapped linear segment of memory so you can have as many contexts as you can fit in RAM. There is no need then for traditional "save everything" context switching. You just move the CPU's execution context to a different area in RAM and the context is there.
AFAIK the main problem with hyper-threading is cache contention (two hyperthreads on the same CPU thrashing each-other's cache).
Anyway, I fail to see the point of this discussion, TFA states that MRAM attains speed comparable to that of the DRAM, which is much slower than CPU cache (at least one or two orders of magnitude slower), so that won't go away just now.
Also, the article speaks of "write speeds" (whatever that means) of tenth of nanoseconds but says nothing of latency. I suppose there are no refresh periods, which might improve over DRAM a little. It all seems very vague so far, I'm looking forward for some more technical and all-encompassing performance numbers.