Alternately, it's possible to use asynchronous processor design and not worry about clock distribution. The tools aren't really there, but there have been async processors made before, and they work. They handle synchronization with local handshaking, instead of distributing a clock signal everywhere.
Another option is to abandon the cycle-for-cycle lockstep requirements, and just ensure that the synchronization time is bounded, and reasonably low. I know there have been some papers published about using this kind of globally-asynchronous-locally-synchronous architecture for realtime apps.