https://pkg.go.dev/runtime/pprof#hdr-Profiling_a_Go_program
It helped me step up my profiling game quite a bit.
One more tool, which the OP surprisingly doesn't mention, is `go tool trace` for execution tracing. It's really useful, esp. when debugging GC behavior or cgo calls that lock up threads.