The discussion is about big O, yes? This isn't really relevant to our discussion. Parallel analysis and average case, for the purposes of the discussion here, are fancy games you can try to play to get nice bounds, but they do not really have any impact on the big O of these algorithms.
"You can't possibly have a meaningful comparison operator, for example, that operates in constant time on unbounded integers."
We don't really deal with unbounded integers. The worst case is something like 2^(2^1,000,000,000,000,000,000) before we can't actually store them.
Even then, log of log of N makes random numbers of that size practically constant time to compare. Put another way, there is less than a 1 in 2^64 chance that you need more than 3 comparisons, and less than 1 in 2^128 that you need more than 4: one for the length, one for the first chunk (which might be just one bit), and one more for the next 64 bits. Sure, the worst case might happen, but realistically random input is unlikely to need many comparisons.
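A quick simulation illustrates the point. This is just a sketch, not any real bignum library's code: it compares two integers limb by limb, most significant first, and counts how many limbs it had to look at. For random inputs, the top 64-bit limb almost always decides.

```python
import random

def compare_limbs(a, b, limb_bits=64):
    """Compare two big integers limb by limb, most significant first.
    Returns (cmp_result, limbs_examined) where cmp_result is -1/0/1."""
    mask = (1 << limb_bits) - 1
    # Number of limbs needed for the wider of the two numbers (ceil division).
    n_limbs = -(-max(a.bit_length(), b.bit_length(), 1) // limb_bits)
    examined = 0
    for i in range(n_limbs - 1, -1, -1):
        la = (a >> (i * limb_bits)) & mask
        lb = (b >> (i * limb_bits)) & mask
        examined += 1
        if la != lb:
            return ((la > lb) - (la < lb), examined)
    return (0, examined)

# Count how often random 1024-limb numbers need more than one limb compared.
random.seed(1)
bits = 64 * 1024
trials = 10_000
multi = sum(
    compare_limbs(random.getrandbits(bits), random.getrandbits(bits))[1] > 1
    for _ in range(trials)
)
print(multi)  # the chance any trial needs a second limb is ~10^4 / 2^64
```

The odds of even one of the 10,000 trials needing a second limb are about 5 in 10^16, which is the "practically constant time" claim in concrete form.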
Do you realize that the whole point of asymptotic analysis is that _we do not care_ what integers we "really" deal with? These bounds specifically deal with sorting arbitrary data of unbounded size, and _no other assumptions_.
Do you understand? No assumption that we can use parallelism. No games with integers you see in "real life". None of that is relevant. You can't play games with practicalities to get a better bound. This is the general bound, and if you fiddle with it to get something else, you are changing the scope of the question to be something else entirely.
If you want to have a discussion about those other models, fine, but that's not the discussion we're having.
What you're talking about is a type of overly simplified model that's irrelevant for actual computing. My point is there are other models out there. If you actually want the fastest algorithm, you need to design it for actual hardware: finite data, caches, limited registers, limited RAM, and sometimes an approximation of a real instruction set, etc.
However, by changing your assumptions you can still use O notation with more complex models.
Short-circuiting cuts that to log(log(N)) on average and log(N) worst case.
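A minimal sketch of such a short-circuiting comparison, under the assumption (mine, not any particular library's layout) that non-negative big integers are stored as limb lists, most significant first, with no leading zero limbs:

```python
def bigint_cmp(a_limbs, b_limbs):
    """Short-circuit compare of two non-negative big integers stored as
    lists of limbs, most significant first, no leading zero limbs.
    Returns -1, 0, or 1."""
    # Length check first: with no leading zeros, the longer number is larger.
    if len(a_limbs) != len(b_limbs):
        return 1 if len(a_limbs) > len(b_limbs) else -1
    # Equal lengths: scan from the most significant limb and stop at the
    # first difference. Random inputs almost always stop on the first limb.
    for x, y in zip(a_limbs, b_limbs):
        if x != y:
            return 1 if x > y else -1
    return 0
```

The length check is what gives the log(log(N)) average: comparing lengths is a comparison of exponents, and only equal-length numbers ever reach the limb scan.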
It's actually generally faster to compare X bytes' worth of very large numbers than X bytes' worth of long ints, the degenerate case being two numbers of X/2 bytes each.
Anyway, the point is that treating log(log(N)) as constant is generally reasonable, especially when N is small.