It may not always be possible to get perfect compression. Suppose for some reason you have the property that the hash for 10% of the words is "be" and the hash for a further 50% of the words is one of the words that hashes to "be." Using rainbow tables for this is not going to go well because all your chains are going to end up at "be" inside of a small number of iterations. But a hash with this property will also fail at its intended purpose because Alice could claim the hash for a word she doesn't know is 'be' and have a 10% chance of being right.
Good hash functions are specifically designed to make hash collisions like that as rare as possible, so in practice the compression ratio will be strongly correlated to the chain length, but you can still have instances where one hash value is slightly more common than another. The result is just that if you use a chain length of 100, the compression ratio in practice will be slightly worse than 100:1.
As to how to choose the heads of the chains, I suspect the answer for getting the absolute optimal choices will either be NP complete or will involve some application of linear algebra. If all else fails you could just keep adding random chains rooted in a value that isn't in any existing chain until you have full coverage. I don't know what people do in practice.