Good hash functions are specifically designed to make hash collisions like that as rare as possible, so in practice the compression ratio will be strongly correlated to the chain length, but you can still have instances where one hash value is slightly more common than another. The result is just that if you use a chain length of 100, the compression ratio in practice will be slightly worse than 100:1.
As to how to choose the heads of the chains, I suspect the answer for getting the absolute optimal choices will either be NP complete or will involve some application of linear algebra. If all else fails you could just keep adding random chains rooted in a value that isn't in any existing chain until you have full coverage. I don't know what people do in practice.