AMD Announces Ryzen Update: Enables Memory Clocks Up to DDR4-4000 (anandtech.com)
236 points by jjuhl on May 26, 2017 | 79 comments

It looks like their Windows PSP drivers open ports; see https://www.reddit.com/r/Amd/comments/6dinzy/why_do_amds_psp...

Note that AMD's claim that it's only local loopback has been refuted by numerous comments, with no further reply from AMD.

Given AMD's refusal to work with libreboot, or document the on-die Arm cores with DMA memory access, these processors should not be considered secure for any layer 3 networking, financial or security related tasks.

Note to people who aren't running Windows: this is unrelated to the article, which covers some interesting iommu virtualization and memory controller improvements in the latest firmware updates.

And so is Intel, what's left? RISC-V?

From the service configuration file posted on that reddit thread, it is a WCF based service and it seems to have metadata access enabled (the <endpoint address="mex"...> part). One could try to use Visual Studio to "add service reference" and it should discover its methods and let you call them in code.

The eternal question with overclocking is... Which tests and for how long does your system need to pass them for it to be deemed a 100% stable overclock? The different PC enthusiast communities and even individual posters within them all have different answers.

If your system will handle 15-30m of Prime95 SmallFFT then you're pretty much good to go. Prime95 is real sensitive to instability and odds are good that if it's gonna crash it'll crash in the first 10 seconds.

Beyond that, just pick your burner application of choice and let it run for an hour while you go watch a movie. The rest are all kinda interchangeable as far as I'm concerned. Aida64 maybe?

(one note: I would not leave Prime95 running for prolonged periods of time. Modern CPUs are not designed for a program that is so small it fits into uop cache and runs AVX nonstop. With a heavy overclock it can pull 300W+ (potentially above 400W) on HEDT processors and OEMs like Asus have warned that it may cause damage[0]. Electromigration seems like a likely culprit meaning time is a key factor. Running Prime95 for like a half hour is fine and a great stability test, but 24H+ runs are extremely unnecessary and potentially damaging.)

[0] https://rog.asus.com/articles/overclocking/rog-overclocking-...

(Note that while this article is regarding Haswell-E, the same logic really applies to Broadwell-E as well as Ryzen 5 and 7. Running a processor at triple its nominal TDP nonstop for a prolonged period with most of that power going through a few specific execution units simply cannot be healthy for it.)

> Modern CPUs are not designed for a program that is so small it fits into uop cache and runs AVX nonstop.

This is a slippery slope towards CPUs which are only warranted to run certain approved apps. Overclocking aside, I'd consider a CPU broken if it can't run any sequence of instructions 24/7 without damage at the stock speed and voltage.

Besides, Prime95 isn't what I'd consider a pathological case anyway --- FFTs are common in scientific computing and signal processing, and having a CPU continuously perform them is not at all an unusual workload.

It somewhat reminds me of https://news.ycombinator.com/item?id=7205759

I agree with you, but with the caveat that CPUs should be able to run flat-out, indefinitely at rated voltage. In this case, people are over-volting the CPU by up to 40% so running the hardest workloads simply pushes an amount of power (heat) through the CPU that it was never intended to handle.
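A rough back-of-the-envelope sketch of why over-volting matters so much here. It assumes the textbook approximation that dynamic CMOS power scales with V² · f; leakage and temperature effects are ignored, and the +20% clock figure is a made-up illustration, not something from this thread:

```python
def dynamic_power_ratio(voltage_ratio: float, frequency_ratio: float) -> float:
    """Relative dynamic power under the textbook P ~ C * V^2 * f approximation.

    Both arguments are ratios against stock (1.0 means unchanged).
    Leakage power is ignored, so real chips will likely do worse than this.
    """
    return voltage_ratio ** 2 * frequency_ratio


# Hypothetical example: +40% voltage and +20% clock over stock.
ratio = dynamic_power_ratio(1.40, 1.20)
print(f"roughly {ratio:.2f}x stock dynamic power")
```

Because voltage enters squared, a 40% over-volt alone nearly doubles dynamic power before any clock increase is counted.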

The way I see it, an overclock is only stable if it can run the same applications as a stock-clocked chip could, at full bore, for the same time periods. So if you could take a stock 4770k or something and run Prime95 small FFTs, albeit at 90C, for 24 hours with no issues, then your OC should as well. Having a high standard like that means I have more confidence in my OCs and my systems.

> The way I see it is an over clock is only stable if it can run the same applications as a stock clocked chip could at full bore for the same time periods.

AVX has been on a separate clock/offset for a while now, and many OC'ing motherboards will let you tweak it.

Given how much power AVX consumes I wouldn't be surprised if throttling it even let you clock the rest of the core higher.

> run prime95 small FFTs albeit at 90C for 24 hours with no issues then your OC should as well

That's nice in theory but in practice if it's gonna crash it's gonna do it quick. I doubt many CPUs are suddenly erroring out for the first time at 23 hours. So why not just test for an hour and call it good? An hour of super abnormally high stress is more than enough to validate it for standard usage in my opinion. More than OK at 30 minutes too. Usually fails within the first 5 seconds, if you make it to 1 min then you have a very solid chance of making it the whole way. 30-60m is plenty.

If your data assurance requirements exceed one "proof test" shot[0], so to speak, then you have no business using overclocked CPUs at all. Prime95 SmallFFT is way way more stress than you would ever put a CPU under in any real-world load, and like an overpressured proof shot it's not really the best thing in the world for the processor to do this nonstop. Electromigration is a thing that exists.

[0] https://en.wikipedia.org/wiki/Proof_test

I think it's possible with exotic cooling solutions like custom water cooling. The costs are often ludicrous and not something you'd put into a cloud server, but for a home workstation a good water cooling setup can get you better than stock performance such that you could run Small FFTs for hours or days and be fine. That sounds like overkill, but it's peace of mind for me, and I need that or I'll just keep the thing stock, which I've been doing lately. My current 1620v2 Xeon based workstation with a lot of enterprise SAS SSDs and tons of RAM is at stock because the Xeon line is locked down and it's plenty fast.

> If your system will handle 15-30m of Prime95 SmallFFT then you're pretty much good to go.

Wait, SmallFFT? Isn't that the test that focuses heavily on CPU and very little on the memory?

Since this news is about memory timings/overclocking, I'd rather put much more focus on tools like Memtest86+, running that for a few hours to be sure that it touches all the addressable regions and uses various access patterns.

Whatever test you do, you want to exercise all the memory, or else you'll end up with a computer that always seems fine after boot, but accumulates errors and crashes based on uptime and how much concurrent work you throw at it.

Memtest86+ tests the memory itself. I think Prime95 mixed mode is best if your goal is to stress the memory controller and all related subsystems.

Update: I mean blend mode

I'm curious as to your recommendation of SmallFFT in response to the grandparent's post. prime95, iirc, says SmallFFT does not test the RAM much. I think "blend mode" was the recommendation that includes RAM testing, while SmallFFT was more for stress-testing CPU overclocks?

Is it because the question is not whether or not a particular bit of RAM is damaged and rather whether or not the overclock to the communications bus between CPU and RAM is affected?

Yeah, blend is probably better if you're going for RAM overclocking. Which is really what's going on here. Or maybe something that's really hammering throughput or something (database loads come to mind).

Obviously you want to focus your testing on whatever subsystem you're trying to OC. I was just giving generic advice on overclocking. There is pretty broad consensus that Prime95 is a good stability test for CPUs.

Someone also suggested Intel Burn Test as a test that might catch some instabilities that Prime95 misses in a reply and then deleted it for some reason (why?).

In particular though, it's not just the memory subsystem you're overclocking here, you're really overclocking the interconnect as well. It will have its own "silicon lottery" and a particular point of instability, in addition to your individual memory kit's own stability characteristics. There are a lot of moving parts here.

I do wonder how much of this is getting the memory controller stable versus getting the interconnect stable. It seems possible they may have knobs they can twiddle on the interconnect as well.

[Serious question here:] I wonder if Prime95 or burn-test type tools would stress the interconnect with enough cross-core communication to be worth running.

I remember when I built my last gaming PC I unknowingly ran a version of Prime95 that could not pass an OC stress test on certain Intel processors.

The combination of instructions it used meant that even if your cooling solution was sufficient under sustained 100% CPU load with a normal benchmark, within 30 seconds the CPU would hit 100°C and activate its thermal protection before a shutdown.

I went through 3 water cooling setups and numerous thermal paste reapplications before I realized this, since I was driving the CPU hard (4690k at 4.9GHz with a high overvoltage that I recall being considered the limit for non-LN2 setups, 1.4V?) and thought that cooling was the issue.

Yup, 24 hours of smallFFT or bust. And then a similar thing with any overclocked GPU.

Would Prime95 be likely to damage hardware if run for a prolonged period on aws/gce/azure?

Do they have anything in their contracts telling you not to do that?

The hypervisor could throttle the workload even if it lit up the entire physical node (which is unlikely). The CPU could also throttle itself; the instances where Prime fried CPUs were due to overclockers forcing high frequencies and voltages AFAIK.

> Prime95 is real sensitive to instability

You mean 'really sensitive'. 'real' is the adjective form - 'really' is the adverb submodifier form, which is the one you want in this case.

It's acceptable in colloquial sense: https://en.wiktionary.org/wiki/real#Adverb ; Also, I heard it's more common in some US states.

Most of the time I dont use apostrophes.

This isn't exactly overclocking. Rephrased in terms of CPUs: it's as though motherboards always clocked the CPU to 2.1GHz, irrespective of what is indicated on the CPU box. This happens with RAM because there used to be no way for the RAM to advertise its speed and timings to the motherboard. Intel corrected this with XMP on their platform (I'm not sure what the AMD equivalent is called).

If you clock your memory to its advertised speed and timings (i.e. enable the XMP profile) you ~~won't~~ [edit] shouldn't have any problems after booting. What may happen is that your system fails to POST, in which case you need a better motherboard.

Using the correct memory profile is extremely important. Reviewers found that the performance of Ryzen is strongly correlated to the memory clock - something to do with Infinity Fabric. This means that DDR4-4000 is, in theory, absolutely required if you want the most out of Ryzen - making this patch a big deal for Ryzen users.

My understanding (please, someone tell me if I'm off base!) is that the infinity fabric is kind of like Intel's QuickPath Interconnect and the "uncore" in one.

In Intel systems it was* required to have the uncore clock be twice the memory clock-rate. The "uncore" clock rate controlled the clock for caches, the memory controllers, possibly other IO components, and the like.

It makes sense that certain benchmarks would be insensitive to increases in memory frequency if those same components were locked at ~2.1GHz in Ryzen's initial release.

On a related note: does AMD publish block diagrams for their infinity fabric or is there any firm information about how many PCIe 3 lanes will be available on the Threadripper platform?

* - I'm not sure if it still is, to be honest!

> On a related note: does AMD publish block diagrams for their infinity fabric or is there any firm information about how many PCIe 3 lanes will be available on the Threadripper platform?

I'm interested in a block diagram or documentation as well, if anyone knows of any.

PcPer did some tests which show the inter-core latency and some other experimental data.


> My understanding (please, someone tell me if I'm off base!) is that the infinity fabric is kind of like Intel's QuickPath Interconnect and the "uncore" in one.

Yeah that sounds about right. The only thing I will add is that I've heard (can't source) inter-die communication on Threadripper/EPYC multi-chip-modules apparently happens by a different mechanism than intra-die/inter-CCX communication.

As such, my suspicion is that Threadripper and EPYC will basically behave like a multi-socket system.

As for lanes: I think it's official at this point that Threadripper has 44 lanes like Skylake-E does, and EPYC has 128 PCIe lanes. Possibly plus additional lanes from the PCH?


My choice of stability test is Intel Linpack, although it probably doesn't work so well on AMD CPUs; and my rule is that a system which can't survive at least 24h of Linpack is not stable --- I've seen systems fail it with stock voltages and frequencies.

I used to overclock, but stopped after realising the errors that occurred with Linpack; it really made me appreciate the fact that a CPU could miscalculate, and a system that appears "stable" can actually be silently corrupting data.

When I used to tweak settings to the max I found that even heavy testing was no guarantee of stability in the following months or years. With the system set on the edge of stability it can pass heavy testing but still be super susceptible to slow degradation of silicon and electronics and to transient conditions. I like to set things quick and cool running these days - I do quite quick stress testing to find roughly the best voltage/speeds and then add buffers of 5-10% voltage and sacrifice 5-20% speed to help guarantee some years of stability.

That is why you back off from the bleeding edge after endurance testing it.

I have not done overclocking for quite a while, but general strategy was:

0. Get the right hardware - some hardware overclocks better.

1. Depending on what you overclock come up with a short stress test. This would be GPU/CPU/RAM/PSU testing.

2. One component at a time. Identify the top voltage/temp you are willing to tolerate. Overclock in steps of 5-10% of nominal frequency. When the stress test fails, increase the voltage by 5-10% until you reach the voltage limit. Once you have all of the data, look at the frequency/voltage curve and pick a voltage point (generally lower is better), back off a step or two from the frequency that passed, and run an endurance test.

3. Run all components at the same time and stress test the entire system. The caveats here are to look out for power and temperature issues, i.e. make sure nothing gets too hot (this includes the motherboard power regulators!) and no voltage rails drop out of spec.

4. Slightly back off from the setup that passed 3 and you should be good to go for long term usage.

TL;DR: Overclock one component at a time, come up with a frequency/max temp/required voltage chart, and don't try to run the hardware on the bleeding edge.
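Step 2 above could be sketched roughly like this. It is only an illustration under stated assumptions: `passes_stress_test` is a hypothetical callback standing in for whatever short stress test you picked in step 1, the function names are made up, and units are integer MHz and millivolts to keep the arithmetic exact:

```python
from typing import Callable, List, Tuple

Point = Tuple[int, int]  # (frequency in MHz, voltage in mV)


def sweep(
    base_mhz: int,
    base_mv: int,
    max_mhz: int,
    max_mv: int,
    freq_step_mhz: int,
    volt_step_mv: int,
    passes_stress_test: Callable[[int, int], bool],
) -> List[Point]:
    """Walk frequency upward, raising voltage only when the short test fails.

    Returns every (frequency, voltage) pair that passed, in ascending order.
    Stops when a frequency cannot pass without exceeding max_mv.
    """
    passed: List[Point] = []
    mv = base_mv
    for mhz in range(base_mhz, max_mhz + 1, freq_step_mhz):
        while not passes_stress_test(mhz, mv):
            if mv + volt_step_mv > max_mv:
                return passed  # voltage limit reached, stop the sweep
            mv += volt_step_mv
        passed.append((mhz, mv))
    return passed


def pick_daily_setting(passed: List[Point], back_off_steps: int = 2) -> Point:
    """Back off a step or two from the highest passing frequency."""
    if not passed:
        raise ValueError("no setting passed the stress test")
    return passed[max(0, len(passed) - 1 - back_off_steps)]
```

The point returned by `pick_daily_setting` is only a candidate; it still needs the endurance test and the whole-system run from steps 2 and 3.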

edit: list spacing...

I thought I was rock stable after lots of testing in one case: memtest, prime95, 'stability test', even thrashed the floppy drive. Later I wrote a Java applet that hit the CPU cores in both regular and random millisecond patterns - it crashed the machine in seconds. Multicore load effects are something else to look out for then - take a few steps back from the edge and it all matters much less. It's worth it for a solid machine.
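A minimal sketch of that kind of bursty load, written in Python rather than as a Java applet. This is not the poster's program; the function names, the seed parameter, and the slice sizes are all made up for illustration:

```python
import random
import time
from typing import List, Tuple


def burst_schedule(
    total_ms: int, period_ms: int, randomize: bool, seed: int = 0
) -> List[Tuple[int, int]]:
    """Split total_ms into (busy_ms, idle_ms) slices.

    Regular mode alternates fixed busy/idle periods; random mode draws each
    slice length from 1..period_ms so load edges land at irregular times.
    """
    rng = random.Random(seed)
    slices: List[Tuple[int, int]] = []
    remaining = total_ms
    while remaining > 0:
        if randomize:
            busy = rng.randint(1, period_ms)
            idle = rng.randint(1, period_ms)
        else:
            busy = idle = period_ms
        busy = min(busy, remaining)
        idle = min(idle, remaining - busy)
        slices.append((busy, idle))
        remaining -= busy + idle
    return slices


def run_schedule(slices: List[Tuple[int, int]]) -> None:
    """Spin the current core during busy slices and sleep during idle ones."""
    for busy_ms, idle_ms in slices:
        deadline = time.perf_counter() + busy_ms / 1000
        while time.perf_counter() < deadline:
            pass  # busy-wait to load the core
        time.sleep(idle_ms / 1000)
```

To load several cores at once, one would presumably start one process per core (for example with `multiprocessing`), each calling `run_schedule` with its own seed.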

My takeaway from reading into these things was that at certain temperature breakpoints, lifetime decreases a LOT, way too much to want to get close to that.

So I tried to find the rough limits and wrote down idle/load temperatures for various combinations of settings/frequencies, and then picked my "sweet spot", the point at which temperatures started to increase a lot more for each small gain in performance. Granted, I did get a good case and CPU cooler, which I clean religiously, and of course there's always luck involved. But divided by the 8 years it's been running, this current computer is the cheapest I ever had, apart from having to replace the PSU because that was the one bit I paid less attention to. Even the hard drives are still spinning; it's freaky. So the next one will have an even better case and cooling and then not get overclocked much or at all, either, that's for sure.

8 years is good going and Eco friendly :) When a machine is no longer the latest rocket, may as well run it cool too, except in winter if the heat is handy.

I used to overclock a little, until I realized it's just killing the chip faster. These things are made to fail eventually but much later with stock clocks.

There are plenty of Sandy Bridge and FX processors that have been heavily OC'd since day 1 that are still ticking along five years later. If you have decent cooling and don't go too nuts on the voltage it's perfectly safe. And if your CPU dies after 5 years of hard labor do you really care all that much?

FWIW I do agree with you though. I overclocked my 5820K to 4.13 GHz all-core, which is as far as it would go on stock voltages. According to the on-chip sensors I pull 90W during a normal (non-AVX) load. I could probably get another 400-500 MHz if I really pushed it, but getting the last 10% isn't worth turning that 90W processor into 200W+ for me.

For scaredy-cats: I think that's a pretty reasonable approach to overclocking. Just go as far as you can at stock voltages. I bet pretty much any processor can overclock all its cores to max turbo speed with either no extra voltage or a very minor voltage increase.

There's really one need that warrants overclocking -- gaming. I realize I need my processor for other things, and I need it not to fail at them. I can sip coffee awhile longer while it finishes the job.

Thousands of (former) Abit BP-6 owners want to have a word with you... remember that one? For those too young to remember, the BP-6 was a dual-socket motherboard which took two Celeron CPUs and allowed all sorts of overclocking trickery. Both the fact that you could run an SMP configuration using Celeron CPUs - something Intel had not foreseen nor intended - as well as the fact that it was possible to get most 300MHz (remember, this is a while ago) Celerons up to 450-466 MHz made it ideal as a cheap hobby server board.

I still have one lying around, it still works and was in daily use (for mail, file and miscellaneous serving tasks) until a few years ago.

[1] https://en.wikipedia.org/wiki/ABIT_BP6

A hobby :)

Used to be a frequenter of extremesystems.org 10-15 years ago, in the overclocking and water-cooling threads.

Good fun.

Now practicality wise, I had no real use for OC. Wasn't even a big gamer at the time. Purely for fun and just seeing if I could (thought I was the cool kid running water-cooling, overclocked cpu/mem/gpu, installing every nightly patch in Gentoo portage with a tonne of system specific gcc and kernel flags lol).

Now my thousands of dollars of water-cooling parts and a Q6600 processor have been sitting in the garage for 10 years.

I use a standard low-end i7 laptop now; no desktops in the past 10 years.

I still have a Q6600 too. I was thinking of wiring it up again as a home server (or a compile machine like yours?). I've also got a Q8400, an A8-3870K, and a gaming laptop with an i7-6700HQ + GTX 980M. Needless to say, I don't OC the laptop at all.

Definitely. I overclocked my 4930k to 4.5ghz all 6 cores, and up to 1.44-1.448v from 1.35v, using plain old air cooling. Thing has been on most days 24/7 for 4-5 years now.

I have an old AMD Bulldozer 8290 or something like that, the highest end one. Its stock clock speed is capped at 4.0GHz. I have had all 8 cores stuck at 4.7GHz since I finished building it (it was a long and fun project and took several weeks of weekend tinkering).

I started off using the machine for gaming and building C++. Now that machine is part of a Jenkins cluster running several VMs, each building C++. This machine is about as stressed as a desktop CPU can be. I do not believe these are built to fail.

Presumably you'd replace the chip due to speed long before you'd ever hit its EOL.

I overclocked my i7-4790k from 4.0 to 4.7 GHz just using the "easy" button in my ASUS software and called it a day. It's been humming happily along for 24/7 since the chip came out.

mmm people with sandybridge still chugging along. I'd say pentium 4 is now near EOL. Core 2 is still pretty good. I'd say only big addition was AVX/AVX2 on sandybridge/haswell which gave us YMM registers.

What's the overlap though on someone who would aggressively overclock and someone who would still be running such a slow CPU?

So your chip will last 5 years instead of 8 years.

For some people (myself, gamers, hobbyists etc.) that doesn't matter.

There is no easy answer. My approach would be something like this:

* Take 100 identical machines and run them with a specific setting for a year. Take the number of failures and write it down.

* Take another 100 machines and run them with a different setting for a year. Write down the failures.

* Do this for like 10 different speed settings. (You need 1000 machines and have to wait one year of time.)

* Make a graph of the data. Log of failures over linear speed axis.

* Try to see a curve in the graphed data.

* Determine the failure rate that you want to achieve.

* Look up the speed setting in the curve.

* Now you have your speed.

I guess only people like the processor manufacturers actually do it this way. Most other folks just do guesswork.

If you have advanced knowledge about the failure curve you might come up with a law like this:

* Turn up the speed until you hit 1 failure per 24 hours under heavy load.

* Turn down the speed by 30% and sell.
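The graph-and-look-up steps from the first list could be sketched as below. This assumes the curve is a straight line on a log-failures vs. linear-speed plot, which is just one possible shape, and the function names are hypothetical. Any real failure counts would have to come from the fleet experiment described above:

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(
    speeds: Sequence[float], failures: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares fit of log10(failures) = slope * speed + intercept.

    Every failure count must be > 0, since log10(0) is undefined.
    """
    if len(speeds) != len(failures) or len(speeds) < 2:
        raise ValueError("need at least two (speed, failures) pairs")
    ys = [math.log10(f) for f in failures]
    n = len(speeds)
    mean_x = sum(speeds) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in speeds)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(speeds, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def speed_for_failures(target_failures: float, slope: float, intercept: float) -> float:
    """Invert the fitted line: which speed gives the target failure count?"""
    return (math.log10(target_failures) - intercept) / slope
```

Usage would be something like `slope, intercept = fit_log_linear(speeds, failures)` followed by `speed_for_failures(acceptable_failures, slope, intercept)`, with both input lists taken from your own measurements.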

Grand Theft Auto IV.

No, seriously — it crashed my CPU (i5-6400) at 4.5GHz when linpack finished just fine at these settings. Lowered to 4.4 and no more crashes.

I'd say an overnight test (around 8 hours) is enough.

Do this for Memtest86+ and Prime95/OCCT/Intel Burn.

Contrary to some opinions I've seen, playing a game for a while is not a sufficient test for overclock stability.

Uhoh this kind of removes one of the things holding me back from buying -

"""The last addition should excite those interested in virtualization. AMD has announced "fresh support" for PCI Express Access Control Services (ACS), which enables the ability to manually assign PCIe graphics cards within IOMMU groups. This should be a breath of fresh air to those who have previously tried to dedicate a GPU to a virtual machine on a Ryzen system, since it has thus far been fraught with difficulties."""

People have been fairly successful over the years combining their windows games box into their linux workstation by doing this.

I haven't dipped in but I am pretty tempted.

Can anyone comment on the stability of the linux workstation if the windows vm takes a dump via bad drivers etc? My concern is that my linux workstation is highly highly stable and I don't want to borrow problems just to save a box.

You might want to hold out until AMD gets the "Nested Page Tables" issue fixed.

It's supposed to speed up virtual memory handling in the guest by avoiding the need for shadow page tables and/or emulation by the host.

Currently the feature is broken, resulting in crippled GPU performance, allegedly due to the IOMMU not keeping up with the DMA transaction rate. You can disable it entirely (kernel command line kvm-amd.npt=0) at the cost of ~5x increased CPU usage and constant stuttering in games, which is what most people choose to do for now.


Well that will help me push this project off for a few months more :)

Is there a good spot to keep up with this, that is very linux focused?

Brilliant, thank you.

Just to be clear, does this solve the issues with GPU passthrough in virtualization environments (i.e. KVM)?

It seems like it will help if your motherboard doesn't already have convenient IOMMU groupings.

There's nothing to indicate that this will help with the NPT issue, so expect poor performance (CPU or GPU, your choice).

What is the NPT issue?

I posted about it below: https://news.ycombinator.com/item?id=14427098

There's also a thread on the vfio-users mailing list and on a few other mailing lists as well as some discussion on /r/vfio.

I don't think this addresses passthrough in any way whatsoever.

Can these AGESA firmware updates (or board specific firmware upgrades incorporating the AGESA update) be installed from Linux? Is there a standard UEFI method for it?

Edit: looks like you can do it inside the firmware GUI, no OS is required. You can update over the net, or load update files from USB stick.

Does DDR4-4000 really mean that your RAM is now very close to CPU core speed? That could result in a massive performance boost for a lot of things.

Nope, DDR4-4000 is not "running at 4 GHz". SDRAM 133 ran at 133 MHz, but DDR numbers have always multiplied their frequency by "number of simultaneous transfers" or something like that. (The first DDR multiplied by two - "double data rate".)

Wikipedia has a good table for DDR3: https://en.wikipedia.org/wiki/DDR3_SDRAM#JEDEC_standard_modu...

DDR3-2133 has a 266.67 MHz memory clock, and a 1066.67 MHz bus clock (x 4), and a "data rate" of 2133 "mega-transfers per second". And a bunch of other details, it's complicated.

The point is, don't take the "4000" literally. It's complicated. And timings and latency can differ between models of the same "data rate".
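As a worked example of that arithmetic, here is a small sketch. It assumes the usual DDR3/DDR4 layout: two transfers per I/O clock cycle, an 8n prefetch (so the internal array clock is one eighth of the data rate), and a 64-bit channel. The function name and dictionary keys are made up for illustration:

```python
def ddr_figures(data_rate_mts: int, bus_width_bits: int = 64, prefetch: int = 8) -> dict:
    """Derive clocks and peak bandwidth from the number in a DDRn-XXXX name.

    data_rate_mts is mega-transfers per second, e.g. 4000 for DDR4-4000.
    prefetch=8 is assumed for DDR3 and DDR4.
    """
    return {
        # "Double data rate": two transfers per I/O bus clock cycle.
        "io_bus_clock_mhz": data_rate_mts / 2,
        # The memory array itself runs slower by the prefetch factor.
        "memory_clock_mhz": data_rate_mts / prefetch,
        # Peak theoretical bandwidth of one channel, in MB/s.
        "peak_mb_per_s": data_rate_mts * bus_width_bits // 8,
    }


for name, rate in (("DDR3-2133", 2133), ("DDR4-4000", 4000)):
    print(name, ddr_figures(rate))
```

So under these assumptions DDR4-4000 has a 2000 MHz I/O clock and a 500 MHz array clock, nowhere near a "4 GHz" core clock.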

[background, I'm not sure from your reply whether you already know this]: Usually RAM speed doesn't really affect anything consumers do too much (gaming and video encoding). Also, the tradeoff to RAM speed increasing is also that latency usually increases too, so while you get more bandwidth you usually don't get better latency.

Ryzen is getting improvements here because it's designed as a die containing a pair of 4-core complexes (CCX) that are interconnected with a fabric that runs at RAM speed. Faster RAM => better inter-core performance is the primary mechanism here, not actual bandwidth increases or latency improvements. If you need that stuff you'll want Intel HEDT/Xeons or Threadripper/EPYC with more actual memory channels.

[actual answer]: Second hand but: the replies I'm seeing on Reddit are saying that the performance gains on the interconnect are linear up to 2933 and diminishing up to 3466 and you get little additional gain past there. Full thread[0] and a specific reply from someone who claimed to have tested it[1].

[0] https://www.reddit.com/r/Amd/comments/6dgaoz/amd_details_the...

[1] https://www.reddit.com/r/Amd/comments/6dgaoz/amd_details_the...

The key is 'latency'. If the DDR interface speed is increased, the latency "CL" is normally increased too.

It doesn't imply a latency drop though.

Any word on ECC support for the Ryzen chips?

I thought ECC was already supported out of the box on Ryzen? IIRC, the issue was motherboard support (motherboards have been the pain point for Ryzen from the very start).


As a current user of ECC RAM w/ a Ryzen 7 1800, I can tell you it works.

We need faster ECC RAM though: the external RAM speed affects the internal AMD Infinity Fabric, which provides inter core comms.

It works but does it "work"? I've never seen a straight answer on this. It's not validated - does it really work, as in, error corrects?

I don't remember the source, but one of the review sites overclocked some ECC RAM on Ryzen to the point where it was erroring. The system was stable, and they watched the one-bit errors coming in and being corrected until they got a two-bit error and the machine rebooted (as it is supposed to).

No, two-bit failures are supposed to hard fault (to minimize damage of data corruption). Which the Ryzen did not do (in that specific test, at least).

Memory (and other) errors generate a Machine Check Exception, which is handled by the OS and/or BIOS. OSs can be smart enough to accumulate a history of errors and offline affected physical pages (which can result in a very localized single process segmentation violation). You could just log the errors. But you could also dial up the reaction all the way to NMI on these errors, which often triggers a processor reset.

The OS detected the fault and terminated the application which used the faulted memory address range, which is what it was supposed to do.

The guy who ran the exact test OP is referring to specifically states that it handles two-bit errors in a sub-optimal manner.

Sure, you don't have to hard fault. Just like regular memory controllers don't fail on any bit-errors. But it's ideal, when dealing with critical data.

It definitely works on ASRock AM4 boards. Just make sure to get unbuffered ECC RAM.

Ryzen will work with ECC RAM, but it won't actively correct ECC errors (at least on Linux); the interrupt code flags them as deferred.

Currently there aren't Ryzen server chips out, just desktop/enthusiast models.

Yes, it does. It'll correct 1-bit errors and report 2-bit errors. It just doesn't fault on 2-bit errors, as is expected.

Ryzen will send a machine check exception to the OS in the event of a double-bit error. It's up to the OS on how to handle it.

This also isn't a Ryzen-specific behavior. All x86 ECC-capable machines work this way.

Kernel 4.10.something (4.10.6 maybe?) added EDAC support for ECC.
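For anyone wanting to check whether corrections are actually being counted, here is a hedged sketch that reads the corrected/uncorrected counters from the Linux EDAC sysfs tree. The path and the `ce_count`/`ue_count` file names follow the standard EDAC layout as I understand it; whether a given Ryzen board and kernel populate them is an assumption, and the function name is made up:

```python
from pathlib import Path
from typing import Dict


def read_edac_counts(
    edac_root: str = "/sys/devices/system/edac/mc",
) -> Dict[str, Dict[str, int]]:
    """Return {controller: {"ce_count": n, "ue_count": n}} for each mcN dir.

    ce_count is corrected (e.g. single-bit) errors, ue_count is uncorrected.
    An empty dict means no EDAC memory controller is registered at edac_root.
    """
    root = Path(edac_root)
    counts: Dict[str, Dict[str, int]] = {}
    if not root.is_dir():
        return counts
    for mc_dir in sorted(root.glob("mc[0-9]*")):
        entry: Dict[str, int] = {}
        for name in ("ce_count", "ue_count"):
            counter = mc_dir / name
            if counter.is_file():
                entry[name] = int(counter.read_text().strip())
        counts[mc_dir.name] = entry
    return counts


if __name__ == "__main__":
    for controller, entry in read_edac_counts().items():
        print(controller, entry)
```

If nothing is printed, the EDAC driver probably isn't loaded for the memory controller, which by itself says nothing about whether the hardware is correcting errors.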

Any word on why this is, or a source for it? I'm curious if this is just something that can be fixed with an upgraded kernel.
