You'd probably be surprised. For reads, there are tons of drives that will come close to saturating PCIe 3.0 x4 with 4kB random reads. Throughput is a bit lower than with large sequential reads because small commands carry more per-command overhead, but it's still several GB/s. Fragmentation won't appreciably slow you down any further, as long as you keep feeding the drive a reasonably large queue of requests (so your software does need to work with a decent degree of parallelism).
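To make "keep feeding the drive a reasonably large queue" concrete, here is a rough Python sketch that keeps several 4kB random reads in flight using a thread pool. The function name `random_reads` and the `queue_depth` default of 32 are my own illustrative choices, not anything the drive requires. It also goes through the OS page cache, so treat it as a picture of the access pattern rather than a benchmark; a real measurement would want something like O_DIRECT or io_uring.

```python
import os
import random
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096  # 4 kB read size, as in the comment above


def random_reads(path, n_reads, queue_depth=32, seed=0):
    """Issue n_reads random 4 kB reads with up to queue_depth in flight.

    Hypothetical helper for illustration; returns total bytes read.
    Assumes the file is at least one block long (Unix only: os.pread).
    """
    rng = random.Random(seed)
    n_blocks = os.path.getsize(path) // BLOCK
    offsets = [rng.randrange(n_blocks) * BLOCK for _ in range(n_reads)]
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=queue_depth) as pool:
            # os.pread takes an explicit offset, so all threads can share one fd
            # without fighting over a file position.
            chunks = list(pool.map(lambda off: os.pread(fd, BLOCK, off), offsets))
    finally:
        os.close(fd)
    return sum(len(c) for c in chunks)
```

With `queue_depth=1` this degenerates into one read at a time, which is the case where a fast drive sits mostly idle waiting on your software.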
What will cause you serious and unavoidable trouble is if you cannot structure things to have any spatial locality. If you only want one 64-bit value out of the 4kB block you've fetched, and you'll come back later another 511 times to fetch the other 64-bit values in that block, then your performance deficit relative to DRAM will be greatly amplified (because with DRAM you would be fetching each 64B cacheline 8 times, whereas with the SSD you are fetching each 4kB block 512 times).
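The arithmetic behind that, as a quick sketch. It assumes the worst case described above, where every fetch yields exactly one useful 64-bit value; `amplification` is just an illustrative helper name.

```python
BLOCK = 4096     # bytes per SSD read
CACHELINE = 64   # bytes per DRAM fetch
VALUE = 8        # one 64-bit value


def amplification(fetch_size, value_size=VALUE):
    """Bytes moved per useful byte when each fetch yields a single value."""
    return fetch_size // value_size


ssd = amplification(BLOCK)       # each 4 kB block ends up fetched 512 times
dram = amplification(CACHELINE)  # each 64 B cacheline ends up fetched 8 times
print(ssd, dram, ssd // dram)    # prints: 512 8 64
```

So on top of the raw latency and bandwidth gap, the no-locality case moves 64x more data per useful byte on the SSD than the same access pattern would in DRAM.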