Ask HN: Do you use ECC RAM in your home/personal workstations?
24 points by specktr on Jan 22, 2023 | 25 comments
After coming across this[0] thread yesterday I'm considering upgrading my RAM on my Ryzen 9 3900X workstation to ECC to protect myself from data integrity issues in the future.

Would you suggest this route? It seems to me that there's little reason to upgrade if there have been no issues in the past.

What have your experiences been with running your workstations with ECC ram? Is it worth the additional cost?

[0] https://news.ycombinator.com/item?id=34470900




> It seems to me that there's little reason to upgrade [to ECC RAM] if there have been no issues in the past.

One of the benefits of ECC RAM is the ability to detect single and double-bit memory errors. In the absence of that capability, how would you know if you have a problem with your RAM?
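
To make the "detect single and double-bit errors" point concrete, here is a toy sketch of a SECDED (single-error-correct, double-error-detect) code. It is only an illustration: it protects 4 data bits with an extended Hamming(8,4) code, whereas real ECC DIMMs typically protect 64 data bits with 8 check bits, and the function names `encode` and `decode` are made up for this example.

```python
def encode(nibble: int) -> list[int]:
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Index 0 holds the overall parity bit; indexes 1..7 are the classic
    Hamming positions (1, 2, 4 are parity, 3, 5, 6, 7 carry data).
    """
    code = [0] * 8
    code[3] = (nibble >> 0) & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    for bit in code[1:]:
        code[0] ^= bit  # overall parity over positions 1..7
    return code


def decode(code: list[int]) -> tuple:
    """Return (nibble, status); nibble is None when the error is uncorrectable."""
    bits = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    overall = 0
    for bit in bits:
        overall ^= bit

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume one, located at `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        bits[syndrome] ^= 1
        status = "corrected"
    else:
        # Parity looks even but the syndrome is non-zero: two bits flipped.
        return None, "uncorrectable"

    nibble = bits[3] | (bits[5] << 1) | (bits[6] << 2) | (bits[7] << 3)
    return nibble, status


word = encode(0b1011)
one_flip = list(word)
one_flip[5] ^= 1
two_flips = list(one_flip)
two_flips[2] ^= 1
print(decode(word), decode(one_flip), decode(two_flips))
```

Without the check bits, the one-flip and two-flip cases above would simply be read back as different, valid-looking data.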

> What have your experiences been with running your workstations with ECC ram.

For AMD desktop processors, there is a little more effort involved in building an ECC-capable system, because ECC is a feature provided by the motherboard maker, rather than a feature guaranteed to be present as part of AMD's platform. As such, you need to verify that your motherboard actually supports it.

Also, ECC memory largely lacks memory auto-overclocking features like XMP, EXPO, or whatever AMD called it in the past. As such, the memory will default to conservative bandwidth and latency settings (rather than the faster settings typically offered on enthusiast RAM), and it will be on you to overclock it if you wish to do so.

(On the positive side, ECC memory also largely lacks the "heat spreaders" and other plastic/RGB bits found on enthusiast memory, which means it's more likely to fit properly underneath a tower-style CPU air cooler.)

Lastly, unbuffered ECC can be somewhat difficult to find at typical consumer electronics retailers, and you'll likely have better luck getting it from vendors that specialize in server memory.


When I built my last machine, I looked pretty hard for ECC RAM. What I could find was _very_ slow and something like twice the price.

Do you have any suggestions at all for where to look?

Also, is it really just lack of XMP and having to OC manually? My impression was that you could expect extremely little OC to be stable even if you did it yourself.


> My impression was that you could expect extremely little OC to be stable even if you did it yourself.

Not my experience at all. I bought a Ryzen 3700X with 4x 16GB 2666MT/s Kingston ECC. Memory chips turned out to be Micron revision E, same as used in many popular Gaming XMP sticks.

I could run these at 3333MT/s and lower CL from 19 to 14. Thus, raw latency going from 19*2000/2666=14.2ns to 8.4ns. After recently buying a used Ryzen 5950X I can now push the memory to 3600CL14, or 7.77ns. At this point I'm at the limit of what I can cool without starting to hear my PC.
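
The latency arithmetic above can be reproduced with a few lines of Python. This is just a sketch of the usual rule of thumb (CL cycles times the clock period, where the clock runs at half the MT/s figure); the function name `cas_latency_ns` is made up for illustration.

```python
def cas_latency_ns(cl: int, transfer_rate_mts: float) -> float:
    """First-word CAS latency in nanoseconds.

    DDR memory transfers twice per clock, so the clock period in ns is
    2000 / (transfer rate in MT/s); the latency is CL of those periods.
    """
    return cl * 2000.0 / transfer_rate_mts


for label, cl, rate in [
    ("2666CL19 (stock)", 19, 2666),
    ("3333CL14", 14, 3333),
    ("3600CL14", 14, 3600),
]:
    print(f"{label}: {cas_latency_ns(cl, rate):.2f} ns")
# 2666CL19 (stock): 14.25 ns
# 3333CL14: 8.40 ns
# 3600CL14: 7.78 ns
```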

> What I could find was _very_ slow and something like twice the price.

And at this point I'm running my memory at the same speed as G.Skill Ripjaws V Black, which is the fastest gaming memory I could find available here, but the G.Skill is twice the cost. Though, I see that it's also available as 4600CL19, and I actually never tried decoupling the memory clock from the fabric clock after upgrading my CPU, since on my 3700X I couldn't push 4 sticks fast enough to make up for the decoupling. I should probably try it out. I suspect I won't hit 4600MT/s though, but I don't think I would be able to run the G.Skill at 4600CL19 with 4 sticks either.


> Do you have any suggestions at all for where to look?

I've bought from Memory4Less in the past and had a good experience. ServerSupply was also a source I used for work, although I don't think I've ever bought anything from them for personal use.

> Also, is it really just lack of XMP and having to OC manually?

Yes.

> My impression was that you could expect extremely little OC to be stable even if you did it yourself.

All other things being equal, I would expect ECC RAM to have somewhat lower overclocking headroom than non-ECC memory, because the memory controller has to drive an additional memory chip with a wider memory bus. However, I generally wouldn't expect significantly worse results than the manual overclocking that enthusiasts did in the past. Obviously, as with any overclocking, your mileage may vary.


I don't. In fact, we shouldn't have to. ECC RAM is meant for services that require extremely high RAS (Reliability, Availability and Serviceability). Any normal DDR4 RAM out there already has good enough RAS for a relatively long time.

As an anecdotal reference, my home "NAS", as of a few weeks ago, had been running non-stop for half a year with only 16GB of cheap ADATA DDR4 2666MHz RAM, and I didn't see a single RAS event in dmesg. It could be survivorship bias though.

---

On a typical home/personal workstation we normally run bursty workloads (i.e. used from 9 to 6, with high usage for 2-3 hours of that). It is also okay to shrug off a bit of bitrot, or even tolerate a system crash if the butterfly effect hits.

So another reason I don't use ECC RAM on workstations: remember that ECC RAM is notoriously slow at memory initialization and memory training.

Not sure if it's an AMD thing, but my friend's EPYC server with 256GB of DDR4 ECC RAM took half an hour to boot.

You are using a "workstation", so getting the work done as quickly as possible matters more than having an extremely safe and reliable system. Speed and work throughput are of utmost importance, so I'd say ECC is okay to sacrifice.


> So another reason I don't use ECC RAM on workstations: remember that ECC RAM is notoriously slow at memory initialization and memory training.
>
> Not sure if it's an AMD thing, but my friend's EPYC server with 256GB of DDR4 ECC RAM took half an hour to boot.

This server likely had a setting enabled to perform a full test of the memory on boot.

I'm not sure why server vendors do this (I've never once in my career seen that test catch anything), but the boot times have nothing to do with ECC. My Ryzen 9 3900X-powered PC with 64 GB of ECC boots in a few seconds.

> ECC RAM is meant for services that require extremely high RAS (Reliability, Availability and Serviceability). Any normal DDR4 RAM out there already has good enough RAS for a relatively long time.

Without the ability to detect errors, how do you know it's good enough?


Server BIOS is just very poorly written. It never gets the boot time optimizations that desktops get.

At a former employer, every time we got a new generation of Intel dual socket server boards, the bugs that we had to get fixed in the previous generation would always reappear. Specifically, there were issues with the board properly detecting and configuring PCIe gen 1 boards. One of the custom boards used an older FPGA and only used gen 1 speeds. Intel would eventually get the BIOS fixed about 6 months after the first test systems would arrive. I saw this happen in no fewer than 3 distinct generations of server boards. It's bonkers how broken their processes are. My best guess is that the code for the next generation of CPU always branches off before the bug fixes are reapplied.

Sometimes I hate computers.


> Not sure if it's an AMD thing, but my friend's EPYC server with 256GB of DDR4 ECC RAM took half an hour to boot.

That could be a memory training issue. With 256GB of memory listed as compatible with the Supermicro motherboard, my EPYC server boots in a minute or so. If I configure my workstation memory with weird timings, it can take a while for the motherboard to figure out the remaining Auto timings, possibly adding 4-5 minutes to the boot procedure.

> Speed and work throughput are of utmost importance, so I'd say ECC is okay to sacrifice.

But you don't have to; see my other post in the thread regarding my workstation memory.


I don’t think it’s worth paying a premium as long as you regularly checksum important data and look for changes over time, keep redundant backups, and regularly check for integrity failures. You should see the inconsistencies as an early warning sign. Just don’t set up your workflows so it’s too late by the time you see them.
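
As a rough sketch of what "regularly checksum important data and look for changes over time" could look like, the snippet below builds a SHA-256 manifest of a directory and compares it against an earlier one. The function names and the JSON manifest format are assumptions made up for this example, not an existing tool.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_manifests(old: dict, new: dict) -> dict:
    """Report files whose digest changed, that disappeared, or that are new."""
    return {
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
        "missing": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
    }


def save_manifest(manifest: dict, path: Path) -> None:
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True))


def load_manifest(path: Path) -> dict:
    return json.loads(path.read_text())


# Hypothetical usage (the paths are placeholders):
#   old = load_manifest(Path("manifest.json"))
#   new = build_manifest(Path("/data/important"))
#   print(diff_manifests(old, new))
```

A "changed" entry for a file you did not knowingly edit is the early warning sign; entries for files you did edit are expected and need to be filtered out by hand or by some other bookkeeping.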

That said, I’ve seen a not insignificant number of computers in the wild that couldn’t calculate valid sha256 checksums when utilizing vector optimized implementations. Who knows how bad other hardware issues could be. You just wouldn’t know. I would pay a little extra for ECC memory given the choice.

In fact, I have two Dell Precision T7810 workstations at home with 144 GB of ECC memory and dual Xeons totaling 36 cores on each machine as my two primary personal computers.


> In fact, I have two Dell Precision T7810 workstations at home with 144 GB of ECC memory and dual Xeons totaling 36 cores on each machine as my two primary personal computers.

What are you doing that demands this level of equipment?


DevOps and hosting side projects initially. I never approached full utilization (CPU or memory) with either one as a home server, so now I have a third Precision T7810 with 32 cores (dual Xeon E5-2698 v3 processors) and 32GB of memory that hosts all my side project infrastructure at home (in addition to my website).

Lately I've been doing distributed computing, machine learning, and cybersecurity side projects with them. I can run a variety of environments and tools while still using them to game or for everyday computer tasks (running several multi-core VMs is no problem with this kind of hardware).


All of the machines in my home lab do but sadly none of my workstations do (ThinkCentre and a ThinkPad.) I wish they did.

Some higher-end Dell Precisions and, at least at one time, Lenovo ThinkPads had models with ECC RAM, and if they still do, that's what I'm getting next time I need a new machine.


> they still do

Yes, they still do. Dell Precision 5XXX and 7XXX models come with ECC support. I own a Precision 7560 with a Xeon W-11955M, Quadro A5000, and 2×16 GB of 3200 MHz ECC DDR4.

The problem is that Dell Precisions are very costly compared to 'normal' gaming laptops. But on the bright side, they come with business-class support, including a 24/7 hotline, and next-business-day on site service.


Yeah, last time I looked the cost was the primary barrier, and I decided to wait until I needed an upgrade. I don't do any PC gaming, so I can save a bit by going with the Intel GPU instead of a discrete graphics card.


It used to be the case, when RAM was less reliable, that all workstation RAM was ECC. It fell out of favor due to cost and performance priorities.

However, as both software complexity and RAM availability increase, the likelihood of memory corruption increases as well, even with memory reliability being quite high.

I don't run ECC on my gaming computer (I also run its hard drives in RAID0; I care about performance, capacity and cost-effectiveness).

I do run ECC on my file storage (zfs server) and on my workstation. It might not be strictly needed on my workstation, but I just don't want to think about it, and it's just another parameter towards more reliability.


I have only used ECC memory with IBM Power and HP PA-RISC & Itanium systems in cluster configurations. Typically ECC errors only started appearing when one or more fans stopped working. Generally, other system and application errors lead to fail-overs being triggered.

For personal use, I can't justify the increased mobo + RAM costs for a minimal improvement in reliability. RAID for magnetic drives and UPS are better investments for increased reliability.


It all depends on what each machine is doing and whether you care to pay the premium (and it is a premium unfortunately).

On my personal machine I like to live dangerously - it is running RAID0 and has non-ECC RAM - because everything is reproducible and I'm happy to rebuild it periodically.

On my home NAS, I do the complete opposite (running RAID1, ZFS and have ECC RAM), because it is the primary data store for things I care about.


I wish everything had ECC, but sadly it does not. I do, however, have code on Pi based servers to allocate a few MB of random size every once in a while, give it a random value, and check it in a bit, to catch the worst always present errors. I've never seen it catch anything though, and would be very surprised if it did.
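
A minimal sketch of that kind of check, assuming a plain Python process is good enough for the purpose: allocate a randomly sized buffer, fill it with one random byte value, wait, then count bytes that no longer match. The function names and default sizes are made up for this example. Note that nothing here pins the buffer to physical RAM or bypasses CPU caches, so like the original it would only catch gross, persistent faults.

```python
import random
import time


def allocate_pattern(min_bytes: int = 1 << 20, max_bytes: int = 8 << 20):
    """Allocate a buffer of random size filled with one random byte value."""
    size = random.randint(min_bytes, max_bytes)
    value = random.randrange(256)
    return bytearray([value]) * size, value


def count_mismatches(buf: bytearray, value: int) -> int:
    """Number of bytes in buf that no longer hold the expected value."""
    return len(buf) - buf.count(value)


def run_once(hold_seconds: float = 60.0) -> int:
    """Fill a buffer, let it sit for a while, then verify it."""
    buf, value = allocate_pattern()
    time.sleep(hold_seconds)
    return count_mismatches(buf, value)


# Hypothetical usage, e.g. from a cron job or a long-running loop:
#   bad = run_once(hold_seconds=300)
#   if bad:
#       print(f"memory check: {bad} corrupted byte(s)")
```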


I'm not sure that you can upgrade to ECC: https://community.amd.com/t5/processors/problems-with-my-ryz...

ECC is usually reserved for high-end CPUs like Threadripper/EPYC.


Ryzen desktop CPUs support ECC as long as the motherboard supports it.

What it doesn't support is registered (including load-reduced) DIMMs, which is the most commonly available type of server memory. Unbuffered ECC DIMMs only.


Unless you get yourself a “desktop” Ryzen that is just a rebranded APU with the graphics parts shut off, e.g. Ryzen 5500.

Admittedly, AMD does list ECC as unsupported on that one, but most motherboard vendors will only warn you about APUs not supporting ECC RAM.


No, the performance per penny penalty is not worth it in the slightest for a workstation.


I did when I had a cheese grater Mac Pro because Xeon, otherwise no. Unless I'm mistaken, if your memory controller doesn't support error correction, it is pointless to buy ECC memory.


Yes, for 12 years now. Use ECC whenever you can.

> What have your experiences been with running your workstations with ECC ram?

Correctable bit-flips.


Yes and supported (dual-8 POWER9, Raptor Talos II). Very stable. I'd consider it worth it.



