Good point. I was relying mostly on the words being sufficiently complex that the odds of them appearing by chance would be small, but enforcing that assumption would be a good exercise.
I'd love to exploit ad networks user profiles for this. I.e., store some bits as "interests", by running a few appropriate google searches or hitting a few web sites, read the bits by seeing what ads you're served. This would probably require a bit of learning and a redundant encoding to make it work, but...
A line of thought that's particularly interesting there is that a 2h marathon has implications for 10000m and half marathon speeds, which make it seem quite a bit further off.
One missing feature is enforcing a unique solution, which is something you'd generally expect from a word search puzzle.