Let f(X) be the number of leading zeros in the hash of X. For any given value X, the probability that its hash has at least N leading zeros is 1/2^N (assuming a uniform hash). We're taking the maximum, so if any one of the values has N leading zeros, then max(f(X)) >= N.
If the number of distinct values observed is significantly greater than 2^N, it's very likely that at least one of them has N leading zeros. Conversely, if the number of distinct values is much less than 2^N, it's relatively unlikely. So the expected value of max(f(X)) is approximately log2 of the number of distinct values, up to a constant correction factor that can be derived analytically.
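The idea so far can be sketched in a few lines of Python. This is only an illustration, not a production implementation: the `hash64` helper (a 64-bit slice of SHA-256) and the 64-bit width are assumptions I'm introducing here, not part of the original description; any well-mixed hash would do.

```python
import hashlib

def hash64(value: str) -> int:
    # Illustrative 64-bit hash: top 8 bytes of SHA-256.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

def leading_zeros(h: int, bits: int = 64) -> int:
    # f(X): number of leading zero bits in a `bits`-wide hash value.
    if h == 0:
        return bits
    return bits - h.bit_length()

def pc_estimate(values) -> int:
    # Basic probabilistic counting: 2**max(f(X)) as a raw estimate
    # of the distinct count. High variance, as discussed below.
    max_lz = max(leading_zeros(hash64(v)) for v in values)
    return 2 ** max_lz
```

Note that the result is always a power of two, which is one reason the raw estimate is so coarse.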
This is the basic "probabilistic counting" algorithm of Flajolet and Martin (1985). Its major flaw is that it produces an estimate with very high variance; it only takes a single unusual bitstring to throw the result way off. You can get a much more reliable result by partitioning the input into a bunch of subsets (by slicing off a few bits of the hash), keeping a separate counter for each subset, and then combining the per-subset estimates at the end (HyperLogLog specifically uses a bias-corrected harmonic mean, which damps outliers). This improvement, plus a few other optimizations, gives you HyperLogLog.
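A minimal sketch of the bucketed refinement, again with an assumed 64-bit SHA-256-based hash. For simplicity it combines buckets with an arithmetic mean in the exponent (the LogLog-style estimator, without the bias-correction constant); HyperLogLog proper replaces this with a bias-corrected harmonic mean, so the raw figure below runs a couple of times high.

```python
import hashlib

def hash64(value: str) -> int:
    # Illustrative 64-bit hash: top 8 bytes of SHA-256.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

def bucketed_estimate(values, p: int = 8) -> float:
    # The top p hash bits pick one of m = 2**p buckets; each bucket
    # remembers the largest leading-zero count seen in the remaining bits.
    m = 1 << p
    buckets = [0] * m
    rest_bits = 64 - p
    for v in values:
        h = hash64(v)
        idx = h >> rest_bits                  # bucket index from top p bits
        rest = h & ((1 << rest_bits) - 1)     # remaining bits
        lz = rest_bits - rest.bit_length() if rest else rest_bits
        buckets[idx] = max(buckets[idx], lz + 1)
    # Combine: m * 2**(mean bucket value). A single freak bitstring now
    # only skews one of m buckets instead of the whole estimate.
    return m * 2 ** (sum(buckets) / m)
```

With p = 8 (256 buckets) the relative error is on the order of a few percent once the bias constant is applied, at the cost of only 256 small counters.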