In one usage, "soft" errors are ones that ECC catches and transparently fixes, so they have no effect on a system with ECC memory. "Hard" errors, by contrast, affect multiple bits and can't be corrected by ECC.
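To make that first usage concrete, here's a toy sketch of a SECDED (single-error-correct, double-error-detect) code: Hamming(7,4) plus one overall parity bit, protecting 4 data bits with 4 check bits. This is only an illustration, not what any particular memory controller does; ECC DIMMs commonly apply the same idea to 64 data bits with 8 check bits. The function names `encode` and `decode` are made up for the example. A single flipped bit gets fixed silently; two flipped bits are detected but can't be fixed.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SECDED codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8  # index 0 = overall parity; 1..7 = Hamming(7,4) positions
    code[3], code[5], code[6], code[7] = d[0], d[1], d[2], d[3]
    # Each parity bit covers the positions whose index has that bit set.
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes total parity of the word even
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    overall = sum(code) % 2  # 0 while the overall parity still holds
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume exactly one and fix it.
        # A syndrome of 0 means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Parity still even but syndrome nonzero: two bits flipped.
        return "uncorrectable", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)
one_flip = list(word)
one_flip[6] ^= 1
two_flips = list(word)
two_flips[2] ^= 1
two_flips[5] ^= 1
print(decode(word))       # ('ok', 11)
print(decode(one_flip))   # ('corrected', 11)
print(decode(two_flips))  # ('uncorrectable', None)
```

Note that three or more flips can be silently "corrected" to the wrong value, which is one reason a count of uncorrectable errors doesn't tell you everything.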
In the other usage, which I think is the more technically correct one, a "soft" error is a transient condition (a bit flipped by a cosmic ray, etc.) and the memory cell keeps operating normally from the next cycle on. A "hard" error is one where the cell is stuck in one state or the other, which suggests it's probably time to replace the module. I think you detect a "hard" error by looking for repeated "soft" errors at the same location, although maybe some architectures/chipsets tell the two apart and report them differently...?
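The "look for a series of soft errors" heuristic could be sketched like this, assuming you have a log of the addresses where corrected errors occurred. Both the per-address granularity and the threshold of 3 are arbitrary assumptions for illustration, and `classify` is a hypothetical helper, not any real tool's API. A transient upset is unlikely to hit the same location repeatedly, so an address that keeps showing up is more plausibly a stuck cell.

```python
from collections import Counter

# Hypothetical threshold: how many corrected errors at one address before
# we stop believing they are independent transient upsets.
HARD_THRESHOLD = 3


def classify(corrected_error_addrs, threshold=HARD_THRESHOLD):
    """Label each address in a log of corrected (soft) errors as
    'likely hard' or 'likely soft' according to how often it recurs."""
    counts = Counter(corrected_error_addrs)
    return {
        addr: ("likely hard" if n >= threshold else "likely soft")
        for addr, n in counts.items()
    }


log = [0x1000, 0x2F40, 0x1000, 0x1000, 0x7A88]
for addr, verdict in classify(log).items():
    print(hex(addr), verdict)
```

This ignores timing entirely; a more realistic version would probably only count errors within some time window, since a few scattered upsets over years at one address mean less than a burst in an hour.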
If anyone can substantiate either set of definitions, I'd be interested as well.
Hard errors are permanent (e.g. a bit is always bad) and that's when you throw the DIMM away.