Choose every 100 words in the dictionary?
Is there a risk of not getting complete coverage? Example: I keep applying the one-way function to the ciphertext, but the result is never an endpoint of one of my precomputed chains.
password → number ¬ new_password → new_number ¬ ...
Second, both the one-way function '→' and the reduction '¬' are not one-to-one mappings. This process will give you a preimage—i.e. given H it can give some password P such that P → H and you are happy—but that password will not necessarily be the original password! Usually any preimage will do, thankfully, but it's still a problem because this means that two different chains can 'merge' into one. This ties very closely to the question you're asking.
As you have surmised, 100% coverage of the input domain is also really hard to achieve because you can't control which words the one-way function (and therefore the reduction) is going to choose! Many rainbow tables just settle for some inaccuracy -- so if you cover 99.9% of the input domain, you don't worry about trying to get all of the other 0.1%.
The reason these two completely different-sounding things are connected is that if you think about it, if you find one of those 0.1% and feed it to your hash function and then the reduction -- there's actually a 99.9% chance that you end up on a chain you already have! So you have these super-short chains which immediately merge into other chains!
If you were trying to compress by a factor of 1000, then you might choose a chain length of 2000, try for 99.95% coverage, and then store these missing 0.05% as a separate lookup table -- then between the two tables you have 100% coverage. Or, you might choose a chain length of 1000 with, say, a 99% coverage, and just accept a 1% failure rate on hashes. Either one is acceptable depending on how you store the chains and how much work you want to do to ensure coverage.
Good hash functions are specifically designed to make hash collisions like that as rare as possible, so in practice the compression ratio will be strongly correlated to the chain length, but you can still have instances where one hash value is slightly more common than another. The result is just that if you use a chain length of 100, the compression ratio in practice will be slightly worse than 100:1.
As to how to choose the heads of the chains, I suspect the answer for getting the absolute optimal choices will either be NP complete or will involve some application of linear algebra. If all else fails you could just keep adding random chains rooted in a value that isn't in any existing chain until you have full coverage. I don't know what people do in practice.