Many types of bit error are not recoverable without a full system reset. It isn'...

ajuc · on Feb 28, 2021

Run 9 systems in parallel and reset the ones that give less common results or no results at all.

You still have 10% of the surface area, power usage, and weight of the radiation-hardened ones, and 10 times the speed.
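To make the "reset the ones that give less common results or no results at all" idea concrete, here is a minimal sketch of a majority voter. The function name `vote`, the use of `None` to mean "no result", and the strict-majority rule are all assumptions for illustration, not a description of any real flight software.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def vote(results: Sequence[Optional[Hashable]]) -> Tuple[Optional[Hashable], List[int]]:
    """Return (majority_value, indices_to_reset) for one round of replica outputs.

    Assumption: None stands for "no result at all" (crash or timeout),
    so None can never be a legitimate answer.
    """
    everyone = list(range(len(results)))
    answered = [r for r in results if r is not None]
    if not answered:
        # Nobody answered: no value to trust, reset every replica.
        return None, everyone

    winner, count = Counter(answered).most_common(1)[0]
    # Assumption: require a strict majority of ALL replicas, not just of
    # those that answered; otherwise trust nothing and reset everything.
    if count * 2 <= len(results):
        return None, everyone

    # Replicas that disagreed with the majority or stayed silent get reset.
    to_reset = [i for i, r in enumerate(results) if r != winner]
    return winner, to_reset


# Nine replicas: one returns a corrupted value, one returns nothing.
value, to_reset = vote([42, 42, 42, 41, 42, None, 42, 42, 42])
print(value, to_reset)  # 42 [3, 5]
```

With 9 replicas this tolerates up to 4 wrong or silent units per round. It does not help if the voter itself takes a bit flip, so in practice the voter would need its own protection.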

dawnerd · on Feb 28, 2021

And that’s why it’s wise to have multiple systems running at the same time: if one errors, you hopefully still have another online. There’s a reason airplanes and now cars are designed this way. I’m sure they’re working towards this too.

spockz · on Feb 28, 2021

Well, I suppose they do not have to load all the kernels and drivers that Linux provides today.

I wonder how one could use microkernels to further improve startup time and have a mini distributed OS/kernel for each component.

jessriedel · on March 1, 2021

This is a problem with floating point operations happening at a lower level than the error correction you're imagining. In principle, that's not at all necessary. Are you arguing that it's infeasibly expensive to design a chip with operations that are error correctable?

londons_explore · on March 1, 2021

It's possible, but you'd end up reinventing nearly every step of the IC design process, which would cost a lot.