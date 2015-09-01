Are there interesting features that differentiate them from Intel beyond performance/cost? As in new instructions or things like that?
AM4 will come with USB3.0/3.1 from ASMEDIA and will only support a single NVME drive, so if you are looking for a professional workstation, need high speed storage, ECC, and can utilize quad channel low latency memory then Intel is still likely the way to go even with blue team tax.
If you need multi GPU support and NVME at the same time, as well as other peripherals that use PCIE then you also might need to look at intel still since Zen will only come with 28 PCIE lanes.
Hopefully they've sorted their PCIE performance issues at least, AM3 not only had bandwidth limitation due to PCIE 2.0 support but also had a much higher PCIE latency for unexplained reasons.
Intel's 7700Ks can easily hit the 5ghz mark on aftermarket air and AIO water cooling.
Games aren't optimized well beyond 2 cores, with games that are optimized for more than 4 cores being practically unheard of.
Games aren't an application that supports parallelism that well since your sound, physics, AI and graphics threads all have to be synced within a single frame otherwise everything falls apart.
The worst case for Zen is ending up as a neither-here-nor-there CPU. On one side, price-slashed 7700Ks mop the floor with 8C Zen CPUs in gaming because they can clock to 5ghz and higher, while it's unclear if AMD can even hit 4ghz reliably on all cores. On the other side, the Zen ecosystem isn't mature enough for the prosumer and professional types due to subpar support/performance of storage, peripherals and memory.
The biggest mistake I made is getting a 2nd 5820K for my gaming rig, I got it cheap so I don't mind but performance wise I would be better off with a 6700/7700K.
And I'm lucky as my 5820K hits 4.5-4.6ghz with an AIO cooler.
Most game engine architectures have supported limited parallelism in the form of dedicated game simulation and render threads, often along with asynchronous processing of audio, network, and IO. More modern engines have task-based architectures that allow for parallelism within the game simulation. This is often implemented as a fork-join model around sync points spread throughout the simulation update: pre-physics and post-physics update periods, for instance. While it is true that some game logic relies on knowledge of global state and complicated dependencies between entities in the simulation, games can and do find meaningful reductions in CPU wall time.
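A minimal sketch of the fork-join pattern described above, in Python. The entity fields and the `pre_physics`, `physics_step`, and `post_physics` functions are hypothetical and not taken from any real engine. Python threads are used only to show the structure; a real engine would use a native job system to get actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def run_phase(pool, task, entities):
    """Fork one task per entity, then join by waiting for every result."""
    # list(pool.map(...)) blocks until all tasks finish: this is the sync point.
    return list(pool.map(task, entities))


def pre_physics(entity):
    # Hypothetical per-entity logic: apply acceleration to velocity.
    return {**entity, "vel": entity["vel"] + entity["acc"]}


def post_physics(entity):
    # Hypothetical per-entity logic that needs the integrated position.
    return {**entity, "on_ground": entity["pos"] <= 0.0}


def physics_step(entities, dt):
    # Serial section between the two parallel phases.
    return [{**e, "pos": e["pos"] + e["vel"] * dt} for e in entities]


def update(entities, dt, workers=4):
    """One simulation update: parallel phase, serial physics, parallel phase."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        entities = run_phase(pool, pre_physics, entities)
        entities = physics_step(entities, dt)
        entities = run_phase(pool, post_physics, entities)
    return entities
```

Each phase can only use as many cores as it has independent tasks, and the serial physics step in the middle keeps every other core idle, which is where the limits discussed elsewhere in this thread come from.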
I wish this was increasingly untrue.
Running For Honor just now, a new title built for consoles: one CPU core is at 100%, the rest are <10%.
Out of the latest 10 AAA titles, there is only one I would say might be worth more than 4 cores, and it's WD2.
2 cores on 100%, 2 more on 60-70% and 2 more on 20%.
W/ HT it will be 2, 100% and the 10 left "cores" at about 15%.
And this is by far the best "multithreaded" game that came out in the past 8-12 months.
What devs do for consoles doesn't translate to PC. PCs come with a huge variety of hardware, and unlike consoles, where devs get 6-7 cores out of the 8 exclusively for their game, on a PC they have to live with everything else, from AV scans to streaming.
No one is taking advantage of multicore CPUs because no one can do it right on a fragmented platform where you don't control the run state of the app or what it is co-hosted with, and have zero knowledge about its hardware and configuration.
How much of that is a chicken-and-egg problem? Gamers buying hardware look for clock speed over core count because that's what today's games benefit from, therefore developers don't optimize for higher core counts because customers don't have them now.
> Games aren't an application that supports parallelism that well since your sound, physics, AI and graphics threads all have to be synced within a single frame otherwise everything falls apart.
I've often heard that repeated, but I'm not sure I totally understand why. I admittedly haven't looked at anything resembling a modern game codebase -- my last time in that space was eons ago in programmer years -- but my instincts would say that doesn't necessarily have to be true, or at least not for that reason.
Graphics and sound kinda make sense, in that you're trying to push out pixels/samples at a fast & regular rate and they need to be perfectly aligned. Though does anything not primarily rely on GPU (which is itself massively parallel) for most graphics work nowadays? Given that, I'd assume the CPU's part of graphics is mostly a support role moving stuff on and off the GPU.
For aspects like AI and physics, I'm not convinced they do have to be locked to frame rate like you say. Why can't they run independently and continuously and let the game take advantage of the "most recent" state of the world as often as needed to match the frame rate? There may be design/architectural reasons this isn't done, but I don't see any fundamental reasons that it couldn't.
Maybe there are factors I'm just missing or not aware of, I'd love to see a good solid analysis of why this is true, if it is.
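One common way to partly decouple simulation rate from frame rate is a fixed timestep with interpolation: the simulation always advances in fixed steps, and each rendered frame blends the two most recent states. A minimal single-threaded sketch in Python follows. The 1D motion and all names are hypothetical, and this is only an illustration of the idea, not how any particular engine does it.

```python
def simulate(state, dt):
    """One fixed-size simulation step for a hypothetical 1D object."""
    pos, vel = state
    return (pos + vel * dt, vel)


def run_frames(frame_times, sim_dt=0.25, state=(0.0, 1.0)):
    """Advance the simulation in fixed steps while frames arrive irregularly.

    Returns the interpolated position shown on each rendered frame.
    """
    accumulator = 0.0
    previous = state
    rendered = []
    for frame_dt in frame_times:
        accumulator += frame_dt
        # Run as many fixed steps as the elapsed time allows.
        while accumulator >= sim_dt:
            previous, state = state, simulate(state, sim_dt)
            accumulator -= sim_dt
        # Blend the last two simulation states for display.
        alpha = accumulator / sim_dt
        rendered.append(previous[0] * (1.0 - alpha) + state[0] * alpha)
    return rendered
```

The displayed state trails the simulation by up to one step, which is the price of not locking the simulation to the frame rate. Gameplay-relevant results such as collisions still come from the fixed steps, not from the interpolated values.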
>For aspects like AI and physics, I'm not convinced they do have to be locked to frame rate like you say. Why can't they run independently and continuously and let the game take advantage of the "most recent" state of the world as often as needed to match the frame rate? There may be design/architectural reasons this isn't done, but I don't see any fundamental reasons that it couldn't.
AI and physics define what is going to be displayed on the screen. If, say, you are doing cloth physics, this will change the animation of a flag waving in the wind; if there are thread locks, the animation will break up and be choppy.
It's even worse if the physics has actual gameplay implications: does a barrel hit a player or not if a frame is skipped?
And then we get into the realm of multiplayer where you don't just need to sync things within a single computer but between multiple computers all over the world.
Everything has to be synced to make the game work. Can it be split across 10 cores? Probably, but since most gamers don't have 10 cores, you'd better work with 2-4 cores and make sure it works well.
- Minimal granularity: You can never run more cores than you have problems at the same time, but you also cannot just split problems as long as you want, or the overhead of splitting will kill all the performance gains you would have had (x+y in an extra thread/process is technically possible, in reality pretty stupid).
- Synchronization points: Most problems are not completely separate from each other, so you can do part of the problem in parallel but at some point you have to converge and do something with the combined result.
- Comparable task size: In an ideal world all of your tasks would take exactly as long as the other, because if one task takes far longer than the others you have to wait for it. So, the minimal run time of your parallel problem is bound by the time of your longest sub-problem. If one of the problems takes far longer than the others (and they depend on each other) you've lost.
The last one is more of a "meta" point: Complexity doesn't scale linearly for dependent problems. That's another reason embarrassingly parallel problems are so nice. Not only can you run them all in parallel, you can also think about each one as a completely separate problem. The moment you have dependencies you have to think about how to bring them all together, and that gets hard very fast if you have many moving parts.
So, to sum it up: Many small things conspire against just scaling something which works great for two or four cores up to eight, 16, 32 or whatever. If you happen to have an embarrassingly parallel problem you're golden, but unfortunately only a small subset of interesting problems are that way and for other problems scaling is hard.
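The effect of synchronization points can be put into numbers with Amdahl's law, which is not named in the comment above but describes the same limit: if only a fraction of the work can run in parallel, the serial remainder caps the speedup. A small Python sketch, where the 80% figure is an assumed example value and not a measurement:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumed example: 80% of the frame time is parallelizable.
    for cores in (2, 4, 8, 16, 32):
        print(cores, round(amdahl_speedup(0.8, cores), 2))
```

With 80% parallel work the speedup can never exceed 5x no matter how many cores are added, which is why going from 4 to 16 cores often buys far less than going from 1 to 4.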
Could the cores that would otherwise be waiting for the longest sub-problem to complete start working on some new sub-problems to make the following set of sub-problems finish sooner?
Unfortunately, I had this happen to a program of mine last year. All other tasks were long done but one kept running and running. The run time of the program escalated to hours with the first few minutes being marked by high usage of the system and then only one core in use, while everything else had to wait. It sucked.
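A rough way to see both sides of this: letting idle cores pick up the next available task (greedy scheduling) helps when there is more independent work waiting, but the total run time can never drop below the longest single task. A minimal Python sketch, assuming the tasks are independent and their durations are known up front (which is rarely true in practice):

```python
import heapq


def greedy_makespan(task_times, cores):
    """Total run time when each task goes to whichever core is free first."""
    finish_times = [0.0] * cores
    heapq.heapify(finish_times)
    for duration in task_times:
        # The core that becomes idle earliest takes the next task.
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + duration)
    return max(finish_times)
```

For example, one task of 100 units plus six tasks of 1 unit on 4 cores still takes 100 units in total: the other three cores finish the small tasks almost immediately and then sit idle, much like the program described above.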
I find it hard to find a game that will use all 6c/12t on my 5820K, most of them will use 2-4 with usually 1-2 being pushed fully and 2 more between 10-25%.
The pendulum has swung so far that (some) recent games have begun to run well on the old FX chips, see Watch Dogs 2. If you're not looking at budget chips, having multiple cores is attractive for gamers by now. Maybe the potential of hexa-cores is not used that much yet, but it is obvious that will come, and games like Battlefield 1 do use them already. I think those Ryzen hexa-cores have a very good chance in the market, if their single thread performance is high enough.
Given that AMD is part of the OpenCAPI consortium, they could potentially make monster HPC machines with Infiniband and GPU's connected with cache coherent OpenCAPI instead of PCIE.. But we'll see, I guess..
What does this mean? That it will only support booting from a NVMe device if it is plugged in to one particular PCIe port?
You can add additional NVME drives through the expansion slots but these are not natively supported by the platform.
It's still completely unclear what the hell you mean by that. I've yet to encounter a platform that cares which PCIe lanes a NVMe SSD uses when it comes to boot support. Nobody really cares about whether the software RAID supports NVMe, and I'd count it as a benefit of AMD's platform if they don't implement Intel's obscene hacks that make their NVMe software RAID work. Do you mean that the chipsets for Ryzen will only have enough PCIe lanes or ports to provide one PCIe x4 slot?
Z270 comes with up to 3 NVME x4 slots (over a built in bridge in the PCH) which support IRST and the rest of the Intel crap, meaning that for a motherboard manufacturer to support up to 3 of them they don't need to add anything but the physical connectors.
There is also a mess in FMA support for both Intel and AMD cpus... https://en.wikipedia.org/wiki/FMA_instruction_set
This entire deal pretty much turned AVX into cancer outside of very specific circles, especially considering the weird support for AVX and SSE in OpenCL.
* ML is used for branch prediction,[1] as well as [implied] pre-fetch.
* Automatic clock and power tuning.[2] This is not per-die, but granular across the chip.
* eXtended Frequency Range,[3] unlocks the ranges that [2] is allowed to use. Essentially, the chip overclocks itself when you provide it with more cooling headroom.
[1]: https://youtu.be/X9NNOqzTbKI?t=13m53s
[2]: https://youtu.be/X9NNOqzTbKI?t=14m38s
[3]: https://youtu.be/X9NNOqzTbKI?t=15m22s
The CPU market could become a lot more interesting again after all this time.
Comparing to my current Haswell CPU, that's quite a great value for the money:
(84W TDP, 4 cores, 8 threads, $300+ price soon after launch, 3.4-3.9GHz).
I'd prefer a somewhat higher frequency though, more in line with their X series, but without a major increase in TDP. But I guess +10W is tolerable for a good processing power increase.
The usual argument is that clocks are not comparable, because you don't know that a 4GHz cpu A is not slower than a 1GHz cpu B in doing a specific task. That I assume is known around here, but is not what I mean. AMD is marketing the feature of those processors to overclock automatically above the specified turbo clock. Meaning 3.9GHz should be a lower bound, and it is completely possible those cpus will routinely clock much higher in practice. See http://www.kitguru.net/components/cpu/jon-martindale/amds-ex... (and it is also as XFR in the original article here).
Or maybe they won't overclock well at all. Well, we'll see after they are released.
Edit: I just realized that with regards to the specific comparison I answered to my point is moot, because according to the table in the article, the Ryzen 7 1700 does not have that feature. The Ryzen 7 1700X does. That might make picking the right processors more difficult than usual. I'll let the comment stand regardless, that feature and distinction might not be well known yet.
Does clock speed matter if IPC for your workload is good at a lower frequency?
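As a first-order approximation, single-thread throughput is IPC times clock frequency, so a higher-IPC chip can match a faster-clocked one at a lower frequency. A tiny Python sketch of that arithmetic; the IPC values in the usage note are made-up illustration numbers, not measurements of any real CPU, and real IPC varies by workload:

```python
def instructions_per_second(ipc, clock_ghz):
    """First-order single-thread throughput estimate: IPC times frequency."""
    return ipc * clock_ghz * 1e9


def break_even_clock_ghz(ipc_a, clock_a_ghz, ipc_b):
    """Clock that CPU B needs to match CPU A's single-thread throughput."""
    return ipc_a * clock_a_ghz / ipc_b
```

Under these assumptions, a CPU averaging 2.5 instructions per cycle would need only 3.2GHz to match one averaging 2.0 instructions per cycle at 4.0GHz.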
We'll have to see real world benchmarks by independent 3rd parties to validate the performance/$ but it continues to look quite impressive as long as you are staying in the sub $500 range.
Intel tried to do enough market segmentation that the curve from a Pentium all the way up to dual CPU workstations was fairly continuous in price, with zen it's not the case anymore.
It's nearly impossible to find a desktop CPU that supports ECC ram now even though 5 years ago it was commonplace.
Trying to run a NAS with some sensitive data is now impossible unless you buy their server chips
That's not correct. All 6th Gen Core i3 have ECC support, and the 7th Gen Core i3 that have 'E' in the product name support ECC:
http://ark.intel.com/products/97130/Intel-Core-i3-7101E-Proc...
Quad (and more) core i5 and i7 that do have near-equivalent Xeon parts do have ECC disabled.
Getting an Intel Xeon E3 for socket 1151 isn't particularly hard or expensive either.
Btw, is there any updated paper/source with some stats on bit-flip odds in modern computers?
Last I've read was this one [1] but it's been debated to death. [2] [3]
[1] http://lambda-diode.com/opinion/ecc-memory
[2] https://news.ycombinator.com/item?id=1109401
[3] https://www.reddit.com/r/programming/comments/ayleb/got_4gb_...
Smartphones and laptops should also come with ECC-RAM.
It is that important for reliability.
It's important but it's not that important outside of specific applications/use cases.
They do fail. Lots and lots of times. And it gets worse over time, as chips degrade due to hot-electron effects or electromigration.
You'll have to use your best guess or intuition on this.
On some systems, ECC is flat out broken or silently ignored. On many others ECC errors aren't reported to the OS in granular enough batches to do anything about them.
Edit: relevant portions:
> Before we begin to discuss our results, we must first discuss ECC protected systems and the fact that there are no standards on what constitutes ECC protection or ECC event reporting. At its base level, ECC protection simply means that a server can handle or correct single bit errors, although some systems advertise the ability to correct multiple bit errors. Generally, it is our belief that ECC events should be reported to the operating system so that savvy users can gauge the health of their infrastructure.
> Unfortunately, server vendors routinely use a technique called ECC threshold or the 'leaky bucket' algorithm where they count ECC errors for a period of time and report them only if they reach certain levels of failure. From what we understand, this threshold is commonly above 100 per hour, but this remains a trade secret and varies based on the server vendor. So, to see ECC errors (MCE in Linux or WHEA in Windows), there generally needs to be 100 bit flips per hour or greater. This makes “seeing” Rowhammer on server error logs more difficult.
> In addition, we have observed some server vendors will NEVER report ECC events back to the OS, although they might get logged into IPMI. Typically, users expect to see correctable ECC errors logged directly to the OS or that halt the system when they cannot be corrected. During our investigation into this phenomenon, we even encountered one server that neither reported ECC events to the OS nor halted when bit flips were not correctable. The end result was data corruption at the application level. This is something, in our opinion, that should never happen on an ECC protected server system.
> ...
> Using these advanced techniques, we were able to observe ECC events within the first 3 minutes of test time. Generally, this system would lockup or reboot within 30 minutes. Once again, this was on a Rowhammer mitigated system using both ECC and double refresh. So it follows that dual mitigations, on some systems, appear to be flawed and can be exploited as a denial of service.
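To make the quoted "leaky bucket" thresholding concrete, here is a minimal Python sketch of the general algorithm. The threshold and leak rate are assumed parameters; as the quote says, the actual vendor values and implementations are trade secrets, so this only illustrates why slow error rates never reach the OS.

```python
class LeakyBucket:
    """Report corrected errors only once they arrive faster than the leak rate."""

    def __init__(self, threshold, leak_per_hour):
        self.threshold = threshold          # level at which reporting starts
        self.leak_per_hour = leak_per_hour  # how fast old errors are forgotten
        self.level = 0.0
        self.last_time = 0.0

    def record_error(self, time_hours):
        """Register one corrected error; return True if it would be reported."""
        elapsed = time_hours - self.last_time
        # Drain the bucket for the time that has passed, never below empty.
        self.level = max(0.0, self.level - elapsed * self.leak_per_hour)
        self.last_time = time_hours
        self.level += 1.0
        return self.level >= self.threshold
```

With a threshold of 3 and a leak of 4 per hour, one error per hour is never reported, while three errors in quick succession trigger a report on the third.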
On the fact that it's broken on some systems: isn't this like complaining about air bags that have been disabled? It is, however, very good that you report that it is broken on some systems.
That said if you want hardware level attacks then attacks against the cachelines of the CPU are considerably more dangerous and reliable and there is very little one can do to mitigate against those.
How does ECC compare to the fiat money ponzi scheme?
I don't know how well it's supported by software though.
I don't even want to know how much such a monster would cost.
IIRC when asked, Intel folks say they haven't made a consumer 6+ core part because there's little benefit/demand, but I can't find a citation for that. [1] is a nice discussion about why we haven't gotten more than 4 cores in consumer parts yet, though.
[1] - https://news.ycombinator.com/item?id=12304046
I need to build a new system and am trying to gauge how long I should wait after Zen comes on the market.
A cursory Google search shows AMD first sending Zen (now Ryzen) patches to GCC as early as 2015. So, presumably it has been in a few recent compiler releases already.
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg02311.html
Does that go for motherboards that can support these new chips too?
AMD and Intel engineers have hardware quite early to work on this, long before the consumer product ships.
Then again, for cheap servers, we can get E5-2683v3 here for ¥1900 ($270), 35M cache, 14 cores.
But, given how flaky this can be with Intel configurations, I'd expect the AMD ones to be even flakier. Ryzen's possible popularity may improve the situation.
Then again, I think this is just a very particular use case. I unfortunately have it, but for most people it shouldn't matter.
I guess we'll have to wait until they come out to see how they behave.
With a practical example:
1 - You boot up your Linux base system.
2 - You fire up KVM with GPU PCI passthrough with a Windows guest to play some games.
3 - You shutdown windows.
Now you can't use the GPU anymore, the module was assigned by KVM and you can't - for instance - run a CUDA simulation in the Linux host. You need to reboot X (you don't need to reboot the system).
The workaround/solution is to fire up KVM with GPU PCI passthrough to another guest (This time a Linux guest) and in there you will have full access to the GPU to do CUDA computations (or whatever else you want).
Maybe we had different setups, so this wouldn’t apply – I used to unbind the device with the following script, and then load the `nvidia` module; the device was then available on the host:
for dev in "0000:01:00.0" "0000:01:00.1"; do
    if [ -e "/sys/bus/pci/devices/${dev}/driver" ]; then
        echo "${dev}" > "/sys/bus/pci/devices/${dev}/driver/unbind"
    fi
done
However if you only have one nvidia gpu (and have a different kind of gpu for display), then I think simply running "sudo rmmod vfio" "sudo modprobe nvidia" would work.
What could be of great interest would be if Ryzen came with mobile chips with LPDDR4, that would really put pressure on intel. Then they could also do 50W TDP and still compete with Intel for the Macbook Pros.
The AM4 APUs that are launching with Summit Ridge are still construction cores and they come at 65-90W TDP also.
Until low TDP Zen chips are out and until Zen APUs are out AMD is effectively forfeiting the OEM and mobile markets.
Intel has also managed to scale its current Core based CPUs to as low as 4.5W TDP, which I have a very strong feeling that AMD will not be able to do until 2019-2020 at best, not with the TDP for these chips currently.
That said if AMD does release a 40-45W 4C CPU they'll have a strong chance of being a contender for at least some workstation level laptops, lack of native USB3 and no TB might hurt them but if PCIE over USB-C beats TB then they might still have a chance.
However since the external GPU enclosures became reliable I will never buy a laptop that doesn't support one for personal use unless it's a very very small form factor device and that's unlikely.
Let's see if AMD will be able to come up with a solution as seamless as TB2/3 with Iris graphics and an external GPU is these days.
Once Raven Ridge is out we can discuss it; currently Summit Ridge is nearly unknown.