I agree with you that Fortran is running on more than just legacy here. At the same time, I also think Julia has caught up a lot as far as SIMD, multicore, MPI and GPU.
For SIMD, Chris Elrod's LoopVectorization.jl [1] is (IMHO) an incredibly impressive piece of work (which incidentally provides the foundation for what I think is the first pure-Julia linear algebra library competitive with BLAS).
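To give a flavor of what that looks like in practice, here's a minimal sketch using LoopVectorization's `@turbo` macro; the function name `axpy!` is mine for illustration, not part of the package's API:

```julia
using LoopVectorization

# y .= a .* x .+ y, with the loop SIMD-vectorized by @turbo
function axpy!(y, a, x)
    @turbo for i in eachindex(y, x)
        y[i] = a * x[i] + y[i]
    end
    return y
end

axpy!(zeros(8), 2.0, ones(8))
```

The nice part is that the annotated loop is still ordinary Julia; removing `@turbo` gives you the same (slower) semantics.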
Multithreading is pretty easy with things like `@spawn`/`@sync` and `@threads for` in the base language, as well as super low-overhead multithreading from the Polyester.jl [2] package (which LoopVectorization also uses to provide a version of its vectorization macro that will multithread your loops in addition to SIMD-vectorizing them).
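For instance, a task-based parallel reduction using only base-language `@spawn` needs just a few lines (start Julia with e.g. `julia -t 4`; the function name and chunking scheme here are mine for illustration):

```julia
using Base.Threads: @spawn

# Parallel sum of squares: split the index range into chunks,
# spawn one task per chunk, then sum the fetched partial results.
function sumsq_parallel(x; ntasks = Threads.nthreads())
    chunks = Iterators.partition(eachindex(x), cld(length(x), ntasks))
    tasks = [@spawn sum(i -> x[i]^2, c) for c in chunks]
    return sum(fetch, tasks)
end

sumsq_parallel(collect(1.0:100.0))
```

Polyester's `@batch` can be dropped in place of `@threads for` on simple loops when the per-iteration work is too small for task overhead to pay off.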
MPI.jl [3] has been totally problem-free for me, though I wouldn't be surprised if the Fortran bindings still have an edge somewhere, and CUDA.jl [4] seems to provide pretty seamless GPU support which should play nicely with MPI.jl's CUDA-aware MPI [5], but I don't work as much with GPUs myself.
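The MPI.jl interface mirrors the classic MPI API closely; a hedged hello-world-plus-Allreduce sketch (run with something like `mpiexec -n 4 julia script.jl`):

```julia
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
nprocs = MPI.Comm_size(comm)

# Each rank contributes its rank number; Allreduce sums across all ranks,
# so every rank receives the same total.
total = MPI.Allreduce(rank, +, comm)
println("rank $rank of $nprocs sees total = $total")

MPI.Finalize()
```

If you've written MPI in Fortran or C, this should look immediately familiar; the main difference is that scalar sends/reductions don't need explicit buffers.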
What about debuggers? For Fortran there are some powerful commercial debuggers available, incl. Parallel debugging and even reverse debugging (being able to step backwards based on snapshots).
I tend not to use debuggers much myself, so I'm not speaking from a ton of expertise, but I think it's probably safe to say the Fortran ones are going to be a lot better on that front. There are a few options for Julia debuggers so far, but no parallel ones that I know of yet.
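For the serial case, the most common option I'm aware of is Debugger.jl, whose `@enter` macro drops you into an interactive stepping prompt in the REPL; a sketch, assuming the Debugger.jl package is installed:

```julia
using Debugger

f(x) = 2x + 1

# Opens the interactive debugger at the first line of f;
# at the prompt, `n` steps to the next line and `c` continues.
@enter f(3)
```

Nothing like TotalView-style parallel or reverse debugging, though, as far as I know.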
[1] https://github.com/JuliaSIMD/LoopVectorization.jl
[2] https://github.com/JuliaSIMD/Polyester.jl
[3] https://github.com/JuliaParallel/MPI.jl
[4] https://github.com/JuliaGPU/CUDA.jl
[5] https://juliaparallel.github.io/MPI.jl/latest/usage/#CUDA-aw...