Many types of bit error are not recoverable without a full system reset. It isn't a matter of a simple "this bit in RAM got corrupted", but more "this floating point unit has got into a state where it will not produce a result, and will therefore hang the entire processor".
Therefore boot time becomes critical - if you end up rebooting due to bit errors multiple times per second, you can't afford to wait for Linux to start up each time...
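To put rough numbers on that, here is a back-of-the-envelope sketch. The figures are made up for illustration (one reset every 0.5 s, a 20 s Linux-style boot versus a 5 ms bare-metal restart), and `availability` is just a helper name I picked, not anything from a real system:

```python
def availability(mean_time_between_resets_s, boot_time_s):
    """Fraction of wall-clock time spent doing useful work rather than booting.

    Assumes each reset costs exactly one full boot and that errors only
    arrive while the system is up (a deliberately simple model).
    """
    return mean_time_between_resets_s / (mean_time_between_resets_s + boot_time_s)


# Hypothetical numbers, chosen only to show the shape of the problem.
linux_like = availability(mean_time_between_resets_s=0.5, boot_time_s=20.0)
bare_metal = availability(mean_time_between_resets_s=0.5, boot_time_s=0.005)

print(f"20 s boot: up {linux_like:.1%} of the time")
print(f"5 ms boot: up {bare_metal:.1%} of the time")
```

Under those assumptions the slow-booting system spends almost all of its time starting up, while the fast one barely notices the resets.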
And that’s why it’s wise to have multiple systems running at the same time: if one fails, you hopefully still have another online. There’s a reason airplanes, and now cars, are designed this way. I’m sure they’re working towards this too.
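A minimal sketch of that idea, assuming three replicas that each either return a result or return nothing because they hung or are mid-reboot. This is generic majority voting, not how any particular aircraft or vehicle implements it, and `majority_vote` is a hypothetical name:

```python
from collections import Counter


def majority_vote(results):
    """Return the value a strict majority of replicas agree on, else None.

    `results` holds one entry per replica; None means that replica
    produced no answer (hung, rebooting, or otherwise offline).
    """
    live = [r for r in results if r is not None]
    if not live:
        return None
    value, count = Counter(live).most_common(1)[0]
    # Require agreement from more than half of ALL replicas, not just live ones.
    return value if count > len(results) / 2 else None


# One replica is down, the other two agree: the system keeps going.
print(majority_vote([42, 42, None]))
# One replica is down and the other two disagree: no trustworthy answer.
print(majority_vote([42, 7, None]))
```

The point is that a single hung or corrupted unit is outvoted instead of taking the whole system down with it.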
This is a problem with floating point operations happening at a lower level than the error correction you're imagining. In principle, that's not at all necessary. Are you arguing that it's infeasibly expensive to design a chip with operations that are error correctable?