
Persistent-memory error handling - bootload
https://lwn.net/Articles/684288
======
teddyh
As “clugstj” wrote in a comment, how is this different from the problem that
mmap()ed files have always had?

If this really is a problem which must be solved, why can’t it be solved in
the same way that RAM solves it, i.e. with ECC? Or with something like
forward error correction (dm-verity gained FEC support in Linux 4.5)?

~~~
bsder
The real issue is that a memory error reboots your entire machine.

That probably isn't the right granularity anymore.

A memory error should probably kick off a signal to the application. If the
error isn't caught, then it should probably kill the process.

Memory errors should probably not cause a reboot unless they actually hit a
kernel page.
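
A minimal sketch of that behaviour, assuming Linux and POSIX signals. On Linux, an uncorrected memory error in a user page is reported to the process as SIGBUS (with `si_code` set to `BUS_MCEERR_AR` or `BUS_MCEERR_AO`), and the default action kills the process. A real memory error cannot be triggered on demand, so this sketch provokes SIGBUS the classic mmap() way instead: by truncating a file underneath its mapping. The names `guarded_read` and `demo_truncated_mapping` are hypothetical helpers made up for illustration, not part of any kernel or library API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf recover_point;           /* where to resume after a fault */
static volatile sig_atomic_t last_si_code; /* reason code of the last SIGBUS */

/* SIGBUS handler: record why the access failed, then unwind to the caller.
 * For a hardware memory error on Linux, si_code would be BUS_MCEERR_AR/AO. */
static void on_sigbus(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    last_si_code = info->si_code;
    siglongjmp(recover_point, 1);
}

/* Hypothetical helper: read one byte, returning -1 instead of dying on SIGBUS. */
static int guarded_read(const volatile unsigned char *p)
{
    struct sigaction sa, old;
    int result;

    assert(p != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) != 0)
        return -1;

    /* savemask = 1 so the signal mask is restored when the handler jumps back */
    if (sigsetjmp(recover_point, 1) == 0)
        result = *p;   /* may fault */
    else
        result = -1;   /* reached only via siglongjmp from the handler */

    sigaction(SIGBUS, &old, NULL);
    return result;
}

/* Demo: map a one-page file, shrink it to zero, then touch the mapping.
 * Returns 0 if the first read succeeded and the second was caught as SIGBUS. */
static int demo_truncated_mapping(void)
{
    char path[] = "/tmp/sigbus-demo-XXXXXX";
    long page = sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    if (fd < 0)
        return -2;
    unlink(path);

    if (ftruncate(fd, page) != 0) {
        close(fd);
        return -2;
    }
    unsigned char *map = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -2;
    }

    int before = guarded_read(map);        /* file is zero-filled */
    int shrink = ftruncate(fd, 0);         /* backing store disappears */
    int after = guarded_read(map);         /* access now raises SIGBUS */
    if (after == -1)
        fprintf(stderr, "caught SIGBUS, si_code=%d\n", (int)last_si_code);

    munmap(map, (size_t)page);
    close(fd);
    return (shrink == 0 && before == 0 && after == -1) ? 0 : 1;
}
```

Calling `demo_truncated_mapping()` from a `main()` should return 0 on Linux. Jumping out of the handler only makes sense when the caller can discard or rebuild the data it was trying to read; a process that installs no handler is simply killed, which is the per-process granularity argued for above.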

~~~
mrich
This is already possible for ECC RAM:

http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-e7-family-ras-server-paper.pdf

So an application can even recover from uncorrectable ECC errors.

But it's not feasible to solve this in applications, since there are many
scenarios where such storage should simply be transparent.

