The argument depends crucially on how the hash space is searched. If you could avoid inputs that yield previously seen hash outputs then it would be the case that time previously spent mining would decrease the expected time until getting a valid hash. That doesn't happen by design.
Analogies to slot machines and flipping coins obscures this fact.
How would one productively store previously seen hash outputs? The problem is multi-fold:
Storing the minimum target nonce seen during mining has a key-space of size 256+96=352bits { midstate + { merkle_root, ntime, nbits } }.
So 1/2^352 chance of ever having a collision. Or 2.725094297605216460742E-107. 107 zeros between the decimal and the first interesting digit.
However you wouldn't have indexed this because ntime (the current time) is always going to be unique, and nbits (the current difficulty) might as well be too.
A single bit change in any part of the key completely changes the result, so there is no pattern you can infer by only storing a prefix.
Such a strategy is doomed.
EDIT: ntime is effectively unique as it is the Unix time and will roll over after many years, but the Bitcoin network requires mined blocks to have an ntime within threshold of the current actual time.
Analogies to slot machines and flipping coins obscures this fact.