I thought the current recommendation was to not have multiple OSDs per NVMe? Tbf I haven’t looked in a while.
I have 3x Samsung NVMe drives (something enterprise w/ PLP; I forget the model number) across 3 nodes, linked with an InfiniBand mesh network. IIRC when I benchmarked it, I could get somewhere around 2000 MB/s, bottlenecked by single-core CPU performance. Fast enough for homelab needs.
I do agree that NVMe-oF is the next hurdle for Ceph performance.