Verifying your output on another system requires another system, and the cost of that handily outweighs the cost of having ECC, provided the CPU/chipset support it.
As for: "I guess it makes sense to have ECC RAM on the machine building your releases"
That's a very narrow use case. There are many more use cases than that one, and for a lot of them ECC makes good sense: inputs to long-running processes, computations that have some kind of real-world value (bookkeeping, your thesis, experimental data subject to later verification, legal documents, and so on).
> but for your dev machine does it really matter?
Maybe not to you.
> I mean, at this point it's really about a rather subjective perception of risk and particular use cases.
No, it's about a feature that, if adopted widely, would allow us to check off one possible source of errors. It would not meaningfully increase the cost of the average machine, and it would still be optional; nobody would be forced to use anything.
> In my situation I find that memory issues are very low on my list of "things that can go catastrophically wrong".
Good for you.
> Really the only thing I can think about is building a corrupt release on my non-ECC build server.
You are still thinking about just your own use-cases.
> But from experience I'm not exactly in the minority to do that either and yet I don't observe many such issues in the wild.
Likely you also have somewhere between 8 and 32 GB of RAM in your machine.
If I look at my servers, which have been operating for years on end, they do tend to accumulate corrected ECC errors. The only reason I know about it is that there is ECC in there to begin with. If those machines were running without ECC, I'd likely not even be aware of any issues. But maybe the machines, or some application on them, would have crashed (the best possible outcome), or maybe some innocent bits of data would have been corrupted (second best). And at the far end of the spectrum, maybe we'd have had to re-install a machine from a backup (not so good: downtime, extra work), or maybe it would have led to silent data corruption (the worst case).
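As an aside, on Linux those corrected/uncorrected counts are what the EDAC driver exposes under sysfs, one directory per memory controller. Here's a minimal sketch of how one might tally them up; the sysfs layout (`mc*/ce_count`, `mc*/ue_count`) is the standard EDAC interface, but the exact set of controllers present obviously depends on your hardware:

```python
import glob
import os

def ecc_error_counts(root="/sys/devices/system/edac/mc"):
    """Sum corrected (ce_count) and uncorrected (ue_count) ECC events
    across all memory controllers found under the EDAC sysfs tree.
    Returns a (corrected, uncorrected) tuple."""
    corrected = uncorrected = 0
    for mc in glob.glob(os.path.join(root, "mc*")):
        for name in ("ce_count", "ue_count"):
            path = os.path.join(mc, name)
            if not os.path.exists(path):
                continue
            with open(path) as f:
                count = int(f.read().strip())
            if name == "ce_count":
                corrected += count
            else:
                uncorrected += count
    return corrected, uncorrected

if __name__ == "__main__":
    ce, ue = ecc_error_counts()
    print(f"corrected: {ce}, uncorrected: {ue}")
```

On a non-ECC machine the `edac` tree is simply absent, which is exactly the point: without ECC there is nothing to count, so you never find out.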
Now, servers are not workstations, but my workstation has exactly as much RAM as my servers and no ECC, which is highly annoying. Single-threaded performance of the various Intel CPUs is much better on consumer systems than on the Xeons, though, unless you want to be subject to highway-robbery prices.
So for me having the ECC option on consumer hardware would be quite nice, and I suspect anybody else doing real work on their PCs would love that option too.
Yeah, I see where you're coming from. I guess I have a different perspective because I never really considered using consumer-grade hardware for "pro" server use, but I suppose it makes sense if you don't want to pay the premium for a Xeon-type build. I wouldn't be comfortable hosting a critical production database on non-ECC RAM, for instance.
Going off on a tangent, this discussion made me wonder whether ECC memory is common on GPUs (after all, with GPGPU becoming more and more mainstream, what good is ECC system RAM if your VRAM isn't protected?).
Unsurprisingly, it turns out that consumer-grade GPUs don't have ECC. However, I stumbled upon this 2014 paper: "An investigation of the effects of hard and soft errors on graphics processing unit-accelerated molecular dynamics simulations"[0].
Now, obviously it's a rather specific use case, but I thought their conclusions were interesting:
>The size of the system that may be simulated by GPU-accelerated AMBER is limited by the amount of available GPU memory. As such, enabling ECC reduces the size of systems that may be simulated by approximately 10%. Enabling ECC also reduces simulation speed, resulting in greater opportunity for other sources of error such as disk failures in large filesystems, power glitches, and unexplained node failures to occur during the timeframe of a calculation.
>Finally, ECC events in RAM are exceedingly rare, requiring over 1000 testing hours to observe [7, 8]. The GPU-corrected error rate has not been successfully quantified by any study—previous attempts conducted over 10,000 h of testing without seeing a single ECC error event. Testing of GPUs for any form of soft error found that the error rate was primarily determined by the memory controller in the GPU and that the newer cards based on the GT200 chipset had a mean error rate of zero. However, the baseline value for the rate of ECC events in GPUs is unknown.