If you make a bunch of (highly redundant) small processors, then I don't see why it would be much harder than the clock distribution issues in large processors, which also need to keep all their parts in sync.
Alternately, it's possible to use asynchronous processor design and not worry about clock distribution. The tools aren't really there, but there have been async processors made before, and they work. They handle synchronization with local handshaking, instead of distributing a clock signal everywhere.
Another option is to abandon the cycle-for-cycle lockstep requirements, and just ensure that the synchronization time is bounded, and reasonably low. I know there have been some papers published about using this kind of globally-asynchronous-locally-synchronous architecture for realtime apps.