The hash, in this case, is just one deterministic way to shorten the CPU counter to a few bits which can then be used to increase the entropy of the timestamp by replacing the timestamp's stale bits. What's being asked here is not why using some compressed bits of the CPU counter increases the entropy of the timestamp overall. Rather, it's why you'd use a hash of the entropy-providing information to do this, since hashes allow for many:1 mappings (i.e. allow for decreases in entropy) while never providing better than 1:1 mappings (i.e. preserving entropy). At best, the hash is preserving the entropy of the timestamp and counter bits and, at worst, it is throwing some away. The question that follows: is there a better way that preserves the entropy of the inputs without the risk of throwing some away and, if so, was there some other reason to still use the hash? This is what I took amelius to be asking.
Knowing the timestamp delta is ~30ns, even a 1 THz processor would only execute 30,000 cycles before said timestamp increases, so the outcome stays unique anyway. From that perspective, taking the low 16 bits of the cycle counter is already guaranteed to produce perfectly unique timestamps, without the risk of introducing hash collisions. It's also significantly easier to compute and now monotonic*, whereas the hash was neither. So I too don't see what value the hash is supposed to add to the information being fed to it in this case.
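A minimal sketch of what that could look like in C on x86, assuming __rdtsc() from <x86intrin.h> (GCC/Clang) and a nanosecond clock whose bottom 16 bits are effectively stale; the names are illustrative, not from any particular codebase:

    #include <stdint.h>
    #include <time.h>
    #include <x86intrin.h>   /* __rdtsc() on GCC/Clang */

    uint64_t unique_timestamp(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        uint64_t ns = (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;

        /* Replace the stale low 16 bits with the low 16 bits of the cycle
           counter: within one ~30ns tick even a 1 THz core only advances
           ~30,000 cycles, well under 2^16, so the result stays unique. */
        return (ns & ~0xFFFFull) | (__rdtsc() & 0xFFFFull);
    }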
In threaded/distributed/parallel scenarios you may wish to throw the lane/core/node number in the bits as well. In the case where having the same timestamp is supposed to be invalid, this leaves you a simple deterministic way to tie-break the situation as part of the creation of the timestamp itself.
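Sketch of the same idea with a core id packed into the spare bits as the tie-break; sched_getcpu() is Linux-specific (needs _GNU_SOURCE) and the field widths here are arbitrary illustrations:

    #define _GNU_SOURCE
    #include <sched.h>       /* sched_getcpu(), Linux-specific */
    #include <stdint.h>
    #include <time.h>
    #include <x86intrin.h>

    uint64_t unique_timestamp_mt(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        uint64_t ns = (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;

        uint64_t core = (uint64_t)sched_getcpu() & 0xFF;  /* 8 bits of core id */
        uint64_t tsc  = __rdtsc() & 0xFF;                 /* 8 bits of cycle count */

        /* The stale low 16 bits become [core id | cycle bits], so two cores
           that land on the same coarse timestamp still differ deterministically. */
        return (ns & ~0xFFFFull) | (core << 8) | tsc;
    }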
*A 10 GHz CPU running for a little over 58 years could risk a 64-bit counter rollover in the middle of a time delta. If that's too close for comfort, or you want the code to work on e.g. 32-bit counter systems, you'd need to eat more cycles and registers to set another bit to the outcome of whether the current cycle count is < the one at the start of the current timestamp delta.
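A purely illustrative sketch of that extra bit: keep the cycle count captured when the coarse timestamp last changed, and spend one of the 16 bits on whether the counter has wrapped since then.

    #include <stdint.h>
    #include <x86intrin.h>

    /* Captured by the caller whenever the coarse timestamp ticks over;
       an illustrative global, real code would keep this per thread. */
    static uint64_t tsc_at_delta_start;

    static uint64_t low_bits_with_wrap_guard(void)
    {
        uint64_t now = __rdtsc();
        uint64_t wrapped = (now < tsc_at_delta_start) ? 1u : 0u;

        /* One wrap-indicator bit plus 15 counter bits instead of 16 counter bits. */
        return (wrapped << 15) | (now & 0x7FFFull);
    }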
Yeah, when I was thinking about the hash, I thought of it as stuffing to fill the unused portion of the number, something that would look better than using zeroes.
TIMESTAMP010111101010101TSC "looks better" than TIMESTAMP000000000000000TSC even though they contain the same information.
I would drop the hash, it's deceptive.
I don't believe it breaks the monotonicity, though? I mean, it would if it weren't already broken. If you're taking the low 16 bits of the TSC, then a rollover in those 16 bits during the same timestamp will already go backwards. TIMESTAMP0...0 follows TIMESTAMPf...f.
I guess it depends which portion you look at. If you solely look at the time-based portion, you do indeed still have a function which never decreases, but that is true even in the case of reading the raw counter whole on its own anyway. If you look at the whole value, including the hashed portion, it's no longer monotonic.
In the cycle-based case, looking at the whole value is the same thing as looking at a relative timestamp which has more precision than the system clock. In this way, it's "truly" monotonic across the entire value, not just monotonic on a part and unique in another.
Side topic: It also comes with an even stronger guarantee of always increasing instead of just "not changing direction". Is there a name for that?