Then you can get into what you mean by FP: if you share nothing, then your communication is done by copying. This is not a magic bullet and isn't an option for many scenarios. If you use shared state for fork-join parallelism you can cover other scenarios, but now you are sharing data.
Atomics are very fast and work very well when they line up with the problem at hand. Then again, you are creating some sort of data structure that is made to have its state shared.
If the problem were that easy to solve, it wouldn't be nearly as much of a problem. Handwaving with 'just use FP' is naive; it is more a way for people to feel that they have an answer should anyone ask the question, but reality will quickly catch up.
Where is the synchronization in this scenario? You either have to decide how to split up the read-only memory among different threads (fork-join) or you have one thread make copies of pieces and 'send' them to other threads somehow. Arguably these are the same thing. This is one technique, but again, it doesn't cover every scenario.
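To make the two options concrete, here is a rough Python sketch (my own illustration, not anything from the parent comment). `fork_join_sum` gives each worker an index range into one read-only tuple; `copy_and_send_sum` has the main thread copy pieces onto a queue for workers to pick up. The function names, the chunking helper, and the choice of summing integers are all made up for the example.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n, parts):
    """Split range(n) into at most `parts` contiguous (start, stop) pairs."""
    step = max(1, -(-n // parts))  # ceiling division, never zero
    return [(i, min(i + step, n)) for i in range(0, n, step)]


def fork_join_sum(data, workers=4):
    """Fork-join: every worker reads its own range of the same read-only tuple."""
    def partial(bounds):
        start, stop = bounds
        return sum(data[i] for i in range(start, stop))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consuming pool.map waits for every worker: this is the join.
        return sum(pool.map(partial, chunk_bounds(len(data), workers)))


def copy_and_send_sum(data, workers=4):
    """Share-nothing: the main thread copies pieces and 'sends' them via a queue."""
    inbox, outbox = queue.Queue(), queue.Queue()

    def worker():
        piece = inbox.get()          # blocks until a piece arrives
        outbox.put(sum(piece))       # send the partial result back

    bounds = chunk_bounds(len(data), workers)
    threads = [threading.Thread(target=worker) for _ in bounds]
    for t in threads:
        t.start()
    for start, stop in bounds:
        inbox.put(list(data[start:stop]))  # each worker gets a private copy
    for t in threads:
        t.join()
    return sum(outbox.get() for _ in bounds)
```

Both return the same total for the same input. In neither version did the synchronization go away: it sits in the join in the first and inside the queues in the second.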
I don't know if calling it 'immutability' changes anything.
> avoid locking (unless you have to synchronize on a change)
Synchronizing on changes is the whole problem; you can't just hand-wave it away as if it were a niche scenario. Anyone can create a program that has threads read memory and do computations. If you can modify the memory in place with no overlap between threads, even better. These, however, are the real niche scenarios, because the threads eventually need to do something with their results, whether it's sending to video memory, writing to disk, or preparing data for another iteration or stage in the pipeline. Then you have synchronization, and that's the whole issue.
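A small Python sketch of what I mean, with hypothetical names (`count_by_length`, `results`) chosen only for illustration. The compute phase shares nothing and needs no locks; the moment the workers have to put their results somewhere common, a lock shows up.

```python
import threading


def count_by_length(words, workers=3):
    """Count words by length; workers compute privately, then merge under a lock."""
    results = {}              # shared, mutable: where the results have to land
    lock = threading.Lock()

    def work(part):
        # Phase 1: pure computation on private data, no synchronization needed.
        local = {}
        for w in part:
            local[len(w)] = local.get(len(w), 0) + 1
        # Phase 2: publishing the result is a read-modify-write on shared
        # state, so it has to be synchronized.
        with lock:
            for length, n in local.items():
                results[length] = results.get(length, 0) + n

    parts = [words[i::workers] for i in range(workers)]  # disjoint pieces
    threads = [threading.Thread(target=work, args=(p,)) for p in parts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

You can swap the lock for a queue or a final reduce on the main thread, but the merge step still has to be coordinated one way or another.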
I must confess, I have no experience there; it's just years of reading about and writing functional code and seeing a potential trail here.
That said, if you have new thoughts on the subject, please write them :)