Modern CPUs (since around 2000) go faster in large part because they have multiple cores that can do more than one thing at a time. If your program needs to go faster, using more cores is often your best answer, and then you will need these tricks. (SIMD or the GPU are also common answers that might or might not be better for your problem.)
Modern CPUs can run at 4-5 GHz single-threaded. (Sometimes you can even get a higher clock speed by disabling other cores.) This somewhat outpaces "a 1mhz 6502" even without parallelization.
They can, but nobody runs a single process on such CPUs. They run some form of OS, which implements spinlocks, mutexes, and all these other complex things.
I suppose someone somewhere is running an embedded system without an OS on such a processor, but I'd expect they are still using the extra cores and so have all of the above tricks somewhere.
I never get the single-threaded assertions regarding CPU performance; they are mostly useless in the days of preemptive scheduling in modern OSes.
Yes, it matters on MS-DOS-like OS designs, such as some embedded deployments, and that is about it.
It is even impossible to guarantee that a process doesn't get rescheduled onto another CPU, with the performance impact that entails, unless the process explicitly sets its CPU affinity.
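For anyone who hasn't set affinity before, here is a minimal sketch of what that looks like, assuming Linux and Python's `os.sched_setaffinity` (it does not exist on macOS or Windows). The helper name `pin_to_cpu` is made up for illustration, and pinning only stops migration; it does not stop the scheduler from preempting you.

```python
import os


def pin_to_cpu(cpu):
    """Restrict the caller to a single CPU and return the resulting affinity set.

    Hypothetical helper for illustration. Returns an empty set on platforms
    without the Linux-only sched_setaffinity call.
    """
    if not hasattr(os, "sched_setaffinity"):
        # Not Linux: report "unsupported" instead of guessing at another API.
        return set()

    allowed = os.sched_getaffinity(0)  # pid 0 means "the caller"
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")

    # After this call the scheduler may only run us on `cpu`.
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        first_allowed = min(os.sched_getaffinity(0))
        print("now pinned to:", pin_to_cpu(first_allowed))
    else:
        print("CPU affinity is not exposed by the os module on this platform")
```

The C equivalent is `sched_setaffinity(2)`, and from a shell `taskset` does the same job for a whole command.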
Except that ignores the number of times the OS preempts the thread or moves it onto another CPU, trashing all the cache contents in the process, plus the related NUMA effects.
The way it is measured is mostly idealized, assuming that threads run to completion without any of those side effects taking place.
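If you want to see how far from ideal a given run was, a rough sketch (assuming a Unix-like OS where Python's `resource` module is available) is to read the context-switch counters from `getrusage` around the work. The function name `count_context_switches` and the busy loop are only illustrative stand-ins for a real workload; `ru_nivcsw` counts involuntary switches, which is the preemption being described.

```python
import resource
import time


def count_context_switches(work_seconds=0.05):
    """Spin for roughly `work_seconds` and return (voluntary, involuntary) switch counts.

    Hypothetical helper for illustration; the busy loop stands in for real work.
    """
    before = resource.getrusage(resource.RUSAGE_SELF)

    deadline = time.perf_counter() + work_seconds
    spins = 0
    while time.perf_counter() < deadline:
        spins += 1  # pure CPU work, no blocking calls

    after = resource.getrusage(resource.RUSAGE_SELF)
    voluntary = after.ru_nvcsw - before.ru_nvcsw      # we gave up the CPU (e.g. blocked)
    involuntary = after.ru_nivcsw - before.ru_nivcsw  # the scheduler took the CPU away
    return voluntary, involuntary


if __name__ == "__main__":
    vol, invol = count_context_switches(0.5)
    print(f"voluntary switches: {vol}, involuntary (preempted): {invol}")
```

The numbers depend entirely on what else the machine is doing, so treat them as a sanity check on a benchmark rather than a measurement in their own right.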