Just think how much redundancy you could get, cheaply, with the advances that have been made with Moore's Law over the years. Computers for space probes don't need to be that fancy. It's totally feasible to build processors that use error-correcting codes in their entire datapaths, have tri-modular redundancy for all their functional units, and then are arranged alongside several other identical processors for ridiculous amounts of redundancy.
This sounds silly, but the vast majority of the cost is non-recurring engineering cost. Manufacturing it would be a relatively cheap matter of sending the design to a fab like TSMC along with a bundle of money. Transistors are dirt cheap.
Aren't the mechanisms of Moore's Law (i.e., smaller transistors running at lower voltages) exactly the same things that make chips more susceptible to radiation? Once you compensate for that by adding more redundancy, you may not end up with a cheaper chip.
Yes. Denser, lower-power chips are more susceptible to radiation.
And anyway, what exactly does "redundancy" mean? If a rocket engine controller is triple redundant, how does that work? Are there three propellant valves in parallel, so each computer controls one-third of the thrust? Are they in series, so that failure of one computer disables the propulsion function? Is there a majority vote system, and is it electronic, electromechanical, or fluidic? Redundancy is not pixie dust that magically makes your system design better.
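For the electronic majority-vote option, the core of the idea is just a 2-out-of-3 voter sitting between the three computers and the single actuator. A minimal sketch (the function name and the valve-command framing are illustrative, not from any real flight system):

```python
def majority_vote(a, b, c):
    """Return the value at least two of the three channels agree on.

    A single faulty channel is outvoted; if all three disagree
    (a double fault), the voter cannot mask it and must fail loudly.
    """
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise RuntimeError("no two channels agree; voter cannot mask the fault")

# One computer commands the wrong valve position and is outvoted:
assert majority_vote("open", "open", "closed") == "open"
```

Of course, in a real system the voter itself is now a single point of failure, which is exactly the kind of design question that makes redundancy more than pixie dust.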
A sample Google interview question is to design the protocols to run a cluster of unreliable computers. Should there be a MIL-SPEC master computer? Should the cluster elect a master? Or several oligarch servers? Where does an outside agent submit a request, and what does it do if the request is not answered? Designing reliable systems is hard.
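To make the "elect a master" option concrete, here's the bully algorithm in miniature: the live node with the highest ID wins. The `alive` oracle is hypothetical; a real system would replace it with timeouts and message passing, which is where all the hard failure cases live.

```python
def elect_master(node_ids, alive):
    """Return the highest-ID node that currently responds, or None.

    `alive` is a stand-in liveness predicate; real elections must
    cope with nodes that answer slowly or die mid-election.
    """
    live = [n for n in node_ids if alive(n)]
    return max(live) if live else None

# Node 3 has crashed, so node 2 becomes master:
assert elect_master([1, 2, 3], lambda n: n != 3) == 2
```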
If you make a bunch of (highly redundant) small processors, I don't see why keeping them in sync would be much harder than the clock-distribution problem in large processors, which also have to keep all their parts in step.
Alternately, it's possible to use asynchronous processor design and not worry about clock distribution. The tools aren't really there, but there have been async processors made before, and they work. They handle synchronization with local handshaking, instead of distributing a clock signal everywhere.
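The local-handshaking idea can be shown in miniature: each transfer is agreed on with a request/acknowledge exchange instead of a shared clock edge. This is a purely illustrative sequential simulation of a four-phase (return-to-zero) handshake, playing both sides in one function:

```python
def four_phase_transfer(channel, data):
    """Simulate one four-phase handshake over a shared channel dict."""
    channel["data"] = data
    channel["req"] = True     # sender: data is valid, raise request
    channel["ack"] = True     # receiver: latch data, raise acknowledge
    received = channel["data"]
    channel["req"] = False    # sender: saw the ack, drop request
    channel["ack"] = False    # receiver: saw request fall, drop ack
    return received           # channel is idle again, ready for next transfer

channel = {"req": False, "ack": False, "data": None}
assert four_phase_transfer(channel, 0xCAFE) == 0xCAFE
```

The point is that correctness depends only on the ordering of the req/ack edges, not on either side's speed, which is why async designs tolerate timing variation that would break a globally clocked chip.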
Another option is to abandon the cycle-for-cycle lockstep requirements, and just ensure that the synchronization time is bounded, and reasonably low. I know there have been some papers published about using this kind of globally-asynchronous-locally-synchronous architecture for realtime apps.
It could be that I just don't know enough about redundant system design, but I'm pretty sure the way Voyager worked was that each computer ran independently and identically, and the results of computations were simply compared across computers. In other words, you run it like Folding@Home or SETI@Home, which send each job to multiple clients. That doesn't seem like a difficult problem to tackle.
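The Folding@Home-style scheme can be sketched like this: run the same job on several independent workers and accept only an answer a quorum of them agrees on. Everything here (the quorum threshold, the flaky worker) is illustrative:

```python
from collections import Counter

def replicated_result(workers, job, quorum=2):
    """Run `job` on each worker; return the most common result,
    provided at least `quorum` workers reported it."""
    tally = Counter(worker(job) for worker in workers)
    result, votes = tally.most_common(1)[0]
    if votes < quorum:
        raise RuntimeError("no quorum: workers disagree")
    return result

good = lambda x: x * x
flaky = lambda x: x * x + 1   # a worker suffering a bit flip
assert replicated_result([good, good, flaky], 7) == 49
```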
Redundancy in hardware is one problem. But then all those CPUs still run the same software.
After Ariane 5 crashed spectacularly due to a software error that affected the two on-board computers and the ground control unit alike (http://en.wikipedia.org/wiki/Ariane_5_Flight_501), there was talk of having the same software developed by multiple independent teams, then using the different versions for error correction. It sounds like a crazy idea and probably wouldn't work, but I don't really know of a better solution either.
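That multiple-independent-teams idea is known as N-version programming: the same specification implemented several ways, with a comparison layer that only trusts agreeing answers. A toy sketch, where three integer-square-root implementations stand in for independently developed flight code:

```python
import math

def isqrt_float(n):                    # "team A": via floating point
    return int(math.sqrt(n))

def isqrt_newton(n):                   # "team B": Newton's method
    x = n
    while x * x > n:
        x = (x + n // x) // 2
    return x

def isqrt_scan(n):                     # "team C": linear scan
    x = 0
    while (x + 1) * (x + 1) <= n:
        x += 1
    return x

def n_version(n, versions=(isqrt_float, isqrt_newton, isqrt_scan)):
    """Run all versions and accept only a unanimous answer."""
    results = [v(n) for v in versions]
    if len(set(results)) != 1:
        raise RuntimeError("versions disagree: %r" % results)
    return results[0]

assert n_version(81) == 9
```

The known weakness, and part of why it's controversial, is that independent teams working from the same (possibly flawed) specification tend to make correlated mistakes, so the versions aren't as independent as you'd hope.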
IIRC, that's more or less how the DNS root servers are managed--they're not just in different locations, they're running different server software on different OSes, to minimize the chance that any one problem could take all of them out.