You want the address to be visible to the CPU somewhat early so that the target might already be in the cache before you use it. I'd expect pointer tagging to obstruct that mechanism - in the worst case, codegen might mask out the tag bits immediately before the memory operation. I don't know how transparent this sort of thing is to the core in practice, and I haven't found anyone else measuring it.
That's not really how out-of-order execution in CPUs works. The address doesn't have to be fully computed X cycles before a load in order for the load to be issued. Loads are issued as soon as their dependencies are computed: requiring an additional operation to compute the address means your address is essentially delayed by 1 cycle - but that's latency, not throughput, and it only actually makes your code slower if your pipeline stalls waiting on that load.
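To make the "one additional operation" concrete, here is a minimal sketch in C. It assumes a hypothetical tagging scheme that keeps the tag in the low 3 bits of an 8-byte-aligned pointer; the names TAG_MASK, tag_ptr, get_tag, untag and load_tagged are made up for illustration and don't come from any particular runtime.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scheme: pointers are 8-byte aligned, so the low 3 bits
   are always zero and can be reused to carry a small tag. */
#define TAG_MASK ((uintptr_t)0x7)

/* Pack a tag into the low bits of an aligned pointer. */
static inline uintptr_t tag_ptr(int64_t *p, unsigned tag) {
    assert(((uintptr_t)p & TAG_MASK) == 0); /* pointer must be 8-byte aligned */
    return (uintptr_t)p | ((uintptr_t)tag & TAG_MASK);
}

/* Read the tag back out of a tagged pointer. */
static inline unsigned get_tag(uintptr_t t) {
    return (unsigned)(t & TAG_MASK);
}

/* Strip the tag to recover the real address. */
static inline int64_t *untag(uintptr_t t) {
    return (int64_t *)(t & ~TAG_MASK);
}

/* Load through a tagged pointer: the mask is a single AND on the
   dependency chain that feeds the load's address. */
static inline int64_t load_tagged(uintptr_t t) {
    return *untag(t);
}
```

With optimizations on, I'd expect load_tagged to come out as one AND followed by the load on mainstream targets, so the cost is that one extra dependent operation before the address is ready, not a separate pass that has to finish ahead of time.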
Data memory-dependent prefetchers are a thing (with the expected side-channel potential), and tagging could conceivably make them non-functional. Realistically, though, I wouldn't expect it to make much difference.