AMD Ryzen is a good architecture at a good price. But compared to Intel, there are two important differences IMO:
1. pext / pdep are emulated -- it takes many cycles for a pext or pdep instruction to execute, while Intel can execute them once per clock. These are crazy awesome instructions for any low-level programmer, and it's a shame they're effectively unusable on AMD Zen processors.
2. Zen is a bit slower with 256-bit AVX instructions.
On the other hand:
1. Zen offers more cores per dollar
2. Zen offers two AES encryption units per core, which means it can run two AES instructions per clock tick. Dunno why AMD did this, but it's kinda cool in some obscure cases I've coded.
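For what it's worth, the way to exploit two AES units is to keep two independent blocks in flight so both pipes stay busy. A minimal sketch with AES-NI intrinsics (illustrative only -- it skips the initial key whitening and the final aesenclast round):

```c
#include <wmmintrin.h>  // AES-NI: _mm_aesenc_si128

// Two independent AES streams with their rounds interleaved. A core with
// two AES units can keep both dependency chains in flight at once, giving
// roughly twice the throughput of a single chain. Not a real cipher mode:
// key whitening and the final aesenclast round are omitted for brevity.
static void aes_two_streams(__m128i *blk0, __m128i *blk1,
                            const __m128i *round_keys, int rounds) {
    __m128i x = *blk0, y = *blk1;
    for (int i = 0; i < rounds; i++) {
        x = _mm_aesenc_si128(x, round_keys[i]);  // stream 0
        y = _mm_aesenc_si128(y, round_keys[i]);  // stream 1, independent
    }
    *blk0 = x;
    *blk1 = y;
}
```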
1. Limiting it to a subset of chips, and initially not releasing it for client chips at all. Creating ISA extensions for a small part of the market is the best way to ensure they never see any use.
2. Reasoning about AVX512 performance is ridiculously difficult for most workloads because of the clock penalty. Unless you have cases where you are all AVX512 all the time, you will likely see performance drops.
I also created a bitmask representation of relations, which would represent 1 variable in 4 bits, 2 variables in 16 bits, 3 variables in 64 bits, and 4 variables in 256 bits.
Ex: a Texas / Oklahoma / Arizona relation would be a 64-bit number ("true" means a color-set is in the relation; "false" means it is not), and extracting or packing the data for these three variables would be a pdep or pext operation.
Extracting data (pext) would be a "select" operation, while pdep + OR would be an "update" operation over the relation. I've written a join for fun, but I haven't gotten much further than that -- first because pdep / pext were slow on my machine, and second because I figured out an alternative solution to my particular problem.
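Concretely, a select/update over the 3-variable case could look something like this with BMI2 intrinsics (my sketch; the index encoding and function names here are assumptions, not the actual code):

```c
#include <stdint.h>
#include <immintrin.h>  // BMI2: _pext_u64 / _pdep_u64

// 64-bit relation over three 4-color variables: bit (a*16 + b*4 + c) is
// set iff the coloring (a, b, c) is in the relation.

// "Select": gather the 16 bits where variable 3 has color c into a dense
// sub-relation over the remaining two variables.
static inline uint16_t select_var3(uint64_t rel, unsigned c) {
    const uint64_t stride = 0x1111111111111111ull;  // bit 0 of every nibble
    return (uint16_t)_pext_u64(rel, stride << c);
}

// "Update": scatter a 2-variable sub-relation back into the slots where
// variable 3 has color c, then OR it into the full relation.
static inline uint64_t update_var3(uint64_t rel, unsigned c, uint16_t sub) {
    const uint64_t stride = 0x1111111111111111ull;
    return rel | _pdep_u64((uint64_t)sub, stride << c);
}
```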
I think the pext / pdep instructions have HUGE implications for the 4-coloring problem, 3-SAT, constraint solvers, etc. etc. More researchers probably should look into those two instructions.
Just look at Binary Decision Diagrams, and other such combinatorial data structures, and you can definitely see the potential uses of PEXT / PDEP all over the place.
I have used pext/pdep for the iterator implementation (iterate over all subsets of a card set, or over all combinations of n cards).
(e.g. to brute-force over every 10-card combination, filter, and print out all the ones which can be knocked -- evaluating 15_820_024_220 hands -- takes 70 seconds, single-threaded, on my 7th-gen Intel i3.)
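The trick, roughly: pdep maps a dense counter onto the bit positions of the card-set mask, so enumerating subsets is just counting. A sketch of the idea (not the actual code; assumes GCC/Clang builtins and fewer than 64 cards):

```c
#include <stdint.h>
#include <immintrin.h>  // BMI2: _pdep_u64

// Visit every subset of the card-set mask `deck`. A plain counter runs
// through all 2^n compact bit patterns, and pdep scatters each pattern
// onto the actual bit positions of `deck`. For fixed-size n-card combos,
// you'd instead run Gosper's hack on the compact counter before the pdep.
static void for_each_subset(uint64_t deck, void (*visit)(uint64_t)) {
    uint64_t n = (uint64_t)__builtin_popcountll(deck);  // assumes n < 64
    for (uint64_t i = 0; i < (1ull << n); i++) {
        visit(_pdep_u64(i, deck));  // i-th subset as a concrete card set
    }
}
```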
Voxels were stored in a buffer sorted in Morton order. The idea was to balance performance improvements realized by increased spatial locality against the cost of computing the Morton codes. The trade-off was only worthwhile on Intel because of the use of pdep/pext in optimized encode/decode functions.
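Those encode/decode functions are typically one pdep/pext per axis -- something like this standard formulation (not necessarily the exact code in question):

```c
#include <stdint.h>
#include <immintrin.h>  // BMI2: _pdep_u64 / _pext_u64

// 3-D Morton encode/decode: interleave the bits of x, y, z so nearby
// voxels land near each other in the buffer. Each call is a single fast
// instruction on Intel (Haswell+); on Zen 1 pdep/pext are microcoded and
// take dozens of cycles, which is exactly the trade-off described above.
static inline uint64_t morton3_encode(uint32_t x, uint32_t y, uint32_t z) {
    return _pdep_u64(x, 0x9249249249249249ull)   // bits 0, 3, 6, ...
         | _pdep_u64(y, 0x2492492492492492ull)   // bits 1, 4, 7, ...
         | _pdep_u64(z, 0x4924924924924924ull);  // bits 2, 5, 8, ...
}

static inline uint32_t morton3_decode_x(uint64_t m) {
    return (uint32_t)_pext_u64(m, 0x9249249249249249ull);
}
```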
I imagine something similar would probably apply to texture lookups in a software 3D renderer.
Except that for a 3D rasterizer you'd probably be better off calculating Morton codes for 8 pixels at once in a SIMD register and then using vpgatherdd to fetch 8 ARGB pixel values "in parallel" (in theory; in practice, AVX2 gathers might not be any faster than scalar loads).
I don't use them often, but there are cases where I would not want to try to write code without them.
It's a new fundamental bitwise operator. Some other programmers have called it a "bitwise gather (pext) or bitwise scatter (pdep)" (EDIT: had it backwards the first time). It's a very powerful way to think about bits that Intel just invented with those instructions.
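A quick demonstration of the gather/scatter view:

```c
#include <stdint.h>
#include <stdio.h>
#include <immintrin.h>  // BMI2: _pext_u64 / _pdep_u64

int main(void) {
    // pext = bitwise gather: pull the bits of src selected by the mask
    // and pack them contiguously at the bottom of the result.
    uint64_t g = _pext_u64(0xB2, 0xF0);  // high nibble of 0xB2 -> 0xB
    // pdep = bitwise scatter: spread the low bits of src out to the
    // positions selected by the mask.
    uint64_t s = _pdep_u64(0x0B, 0xF0);  // 0xB -> high nibble: 0xB0
    printf("%llx %llx\n", (unsigned long long)g, (unsigned long long)s);
    return 0;
}
```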
If you have any data structure that is bitwise, I can almost guarantee you that pext or pdep will be useful in some operation. These instructions have been used to calculate bishop / rook moves in fewer than five operations.
And yes, remember that bishops and rooks can be blocked by other pieces. So given all the pieces on a chessboard, and the location of the bishop in question, calculate all possible locations the bishop can move (after accounting for "being blocked").
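That technique is known as "PEXT bitboards": pext compresses the occupancy bits along the piece's rays into a dense table index, so "being blocked" becomes a single lookup. A sketch (the mask and attack tables are assumed precomputed elsewhere; names are illustrative):

```c
#include <stdint.h>
#include <immintrin.h>  // BMI2: _pext_u64

// Precomputed elsewhere: for each square, the mask of squares the bishop's
// rays cross (at most 9 bits), and an attack table indexed by the
// pext-compressed occupancy of those squares.
extern uint64_t bishop_mask[64];
extern uint64_t bishop_attacks_table[64][512];

// All squares a bishop on `sq` can move to, given every piece on the board.
uint64_t bishop_attacks(int sq, uint64_t occupancy) {
    uint64_t idx = _pext_u64(occupancy, bishop_mask[sq]);
    return bishop_attacks_table[sq][idx];
}
```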
Tldr: Works for motherboards that support it, but not officially supported/tested/etc.
Shouldn't we have moved to ECC RAM everywhere a long time ago? With economies of scale would it actually be any more expensive or slower? There's no place where the extra safety is a negative, is there?
Also the lost productivity due to main memory errors not being detected probably easily goes into the billions. Thanks, Intel.
There was a time when consumer systems genuinely didn't support ECC for lack of hardware support. This hasn't been the case for many, many years.
Also no other interconnect in your system is as much of a bottleneck as the one to main memory is. It's worth keeping that in mind before entirely blaming this on "market segmentation". ECC RAM does actually slow down the part of a system that is already the bottleneck in most common situations.
I'm not aware of any desktop CPU that doesn't have ECC caches. CPU-internal buses use ECC, and external interconnects (e.g. PCIe, DMI, QPI/UPI) use it as well.
> ECC RAM does actually slow down the part of a system that is already the bottleneck in most common situations.
ECC invariably introduces some additional latency in the memory controller, but I don't see a persuasive argument why it would reduce throughput. It would surprise me if this additional latency is measurable, given that the ECC logic is already in the core and in the data path anyway, and the system configuration (AMD, Intel) / CPU fuses (Intel) only decide whether it is active.
That being said buffered ECC modules are usually not the fastest. I don't think that this is due to any technical limitation per se, but rather market demand (cost, perf per Watt).
I believe all of those are just parity checked and not ECC?
> ECC invariably introduces some additional latency in the memory controller
Latency is a non-trivial factor here, too, though.
> That being said buffered ECC modules are usually not the fastest. I don't think that this is due to any technical limitation per se, but rather market demand (cost, perf per Watt).
Poking around, it looks like ECC RAM tops out at DDR4-2666 @ 1.2V. By comparison, there's no shortage of DDR4-3200+ options at 1.2V. Whether or not this is purely market demand, it doesn't seem like there's a purely power-related reason for it.
But you also can't solely blame Intel for a lack of market demand. Even when the choice is there, nobody seems to be making ECC memory for high-end desktop usage. Where's the DDR4-3200 for Threadripper or Xeon-W workstations, for example? They'd surely benefit from the improved bandwidth, or else they wouldn't have triple- and quad-channel memory. And buyers would surely pay the price of admission, because we're talking $3,000+ entry points for builds.
Right now DDR4 is fairly new, but as old servers get rotated out I expect a good market for cheap ECC DDR4 sticks that come from used servers but are too small to get reused in new servers. (unregistered/unbuffered is still a problem though)
Whether it's needed or not...that's use case dependent.
That's actually a price I would be willing to pay.
Most people would be fine with a $200 Chromebook, $500 bare-bones Windows notebook, or a garden-variety PC from 2007 but millions of us choose to pay more because we value the additional things that newer, more powerful computing devices give us.
Lots of professionals and enthusiasts gladly pay large premiums for higher-spec gear, even when the improvements are quite small, because those small improvements are enjoyed over many thousands of hours of lifetime use.
Perhaps more to the point, you already see gamers paying premiums for higher-specced memory to enable their overclocking and tweaking endeavors.
So I definitely think there's a market of people who'd pay more for ECC...
The cost difference for ECC amortized over the life of the hardware is negligible compared to the annoyance and time spent trying to work out what's causing those random bluescreens/reboots/corruption.
>Whether it's needed or not...that's use case dependent.
Even for gaming, having your game crash/misbehave because of bit flips is at the very least annoying. But maybe this is uncommon enough that it doesn't make enough difference?
My gaming system with non-ECC memory will run memtest all day long without spotting a single error.
I know these errors can & do happen, but I'm fairly certain the number of crashes I've experienced that would have been prevented using ECC memory is single digits at most.
But ECC being in general slower than non-ECC? Well that's very noticeable.
Drive fails and no backups? Potentially terabytes of data vanishes. House burns down and no insurance? Hundreds of thousands of dollars to repair. No ECC and a bit flips? Nothing happens, program crashes, or maybe a single file gets corrupted in an unrecoverable way.
The other thing that annoys me about PC hardware is that the motherboard tries as hard as possible to make your system unstable. I don't want overclocking. I don't want to run the memory at XMP speeds. Just give me a button for "run everything at its conservative spec". (With that in mind, I'm not sure memory manufacturers test anything other than their XMP timings, leaving you to guess whether the non-XMP profile has the right voltage/latency numbers. It's infuriating!)
This wasn't my experience either on memory.net or crucial.com
Yes it is, much more than the 1/8th more RAM would require.
> much slower
To quantify: the fastest ECC currently available is DDR4-2666, while non-ECC goes up to a crazy DDR4-4700.
As far as I can tell, crucial.com shows a total of 2 options, both 16GB UDIMMs -- one tall, one short. I think last time I checked it had none. I didn't see a single option on memory.net. Remember that you need unbuffered/unregistered ECC (UDIMMs with ECC), and there aren't many options for those. RDIMMs will not work.
It's also hard to verify the DIMMs will work, because the motherboard's QVL doesn't list any of the ones I've found so far. Not all retailers carry these either, so they seem hard enough to find to me.
Hard to find -- perhaps. Although these days, with the internet and so many sellers, even hard-to-find things are quite easy to find.
Expensive -- just a little bit more than non-ECC RAM, probably because of the couple of extra chips ECC RAM uses.
Much slower -- slower for sure, but again, I think it's not much slower.
Kingston, Samsung, Crucial / Micron all sell UDIMMs.
There are also posts at other fora complaining that Hardware Canucks is wrong to suggest that an uncorrectable error should result in an immediate system halt - I leave that argument to those who are interested.
It has 6 physical cores (12 virtual) which I think is enough for almost all workloads. No complaints.
They've now been superseded by the i7-9700K and i9-9900K, which have 8 physical cores and are slightly faster single-threaded.
Software is nowhere near making use of all those threads under most circumstances IMO.
Cue all the people saying how they couldn't live without their 32 threads...
I can live without 24 threads, but I love not having my computer become unusable because I'm encoding video or doing some other CPU-heavy task. Having more than 4 threads has opened up a whole new world of thinking about how to parallelize common tasks -- never having to wait for your computer feels like a superpower. Paradoxically, this has freed me up to use an ARM Chromebook for day-to-day usage; when I need firepower, I remote into the TR workstation (smart plug + boot-on-power BIOS + dynamic DNS).
I'll never stop feeling a little exhilaration from typing "make -j 22"
I ended up installing ESXi on the machine and passing through my video card, USB, and NVMe to the Windows VM. It works great. I then use the spare compute for running vSphere.
Why would you expect anything different when JS _does_ run in a single thread? We'll have to wait for WebAssembly to have anything like real multithreading, with good-enough performance, on the Web.
No reason for site A rendering to block site B; no reason for either to block the main UI. No reason for an issue tracker to take 10 seconds to achieve interactivity on LAN (heck, I'd consider 0.5 seconds slow).
Specifically an Erlang/Elixir-ish actor async implementation
I run Debian Stable. When I swapped out the CPU (Ryzen 7 2700X), motherboard, and RAM and powered on Debian, it booted up normally and automatically configured itself for the new CPU, motherboard, and RAM.
Even Windows has gotten better about this; it's usually possible to image drives and boot them in a VM without everything exploding.
In my mind you're doing something incredibly specialized if you notice the difference between AMD and Intel, or between current generation and last generation CPUs. Video encoding is really the only "mainstream" application I can think of.
I mean, theoretically my 2700X has slightly worse performance per core (though not at the same price point; it's not fair to compare a $350 processor with a $600+ one), but it doesn't matter when I have webpack running with 4 threads, type checking on a separate thread, a DB server, and IntelliJ all running without so much as a stutter.
More cores are important, but the i7-8700 uses 65W, its single-threaded performance is faster than AMD's, and it has six cores / 12 threads. To get close performance but with 8 cores / 16 threads I'd have to get the 2700X, which is more expensive, plus a video card, which means much more wattage -- and, if I don't want a throwaway video card, much more money. There are also benchmarks showing AM4 has storage performance deficits of 10-30%, which could affect complex workflows involving builds and containers.
Still, I'm within the return period and trying to find a way to justify AMD, since my work is increasingly about containers. It's just such a huge timesink.
OTOH, AVX2 peak rate is lower for Zen than recent Intel offerings.
Now that I know AMD is planning to release Zen 2 in May(?), I think I should buy the Ryzen 5 2600, because its price should fall at that release.
I could also use tips for a good motherboard :)
Only a select few games are able to use more than 6 cores, and then only in some situations. For compilation and other workstation tasks the 8-cores (and more) are king, but they're expensive, so unless you have money to blow, go for the 2700.
Poorly designed older games. Modern games should be able to use all cores, because Vulkan is available. Something like dxvk for example is using as many cores as reasonable for compiling Vulkan pipelines.
See its config Wiki (dxvk.numCompilerThreads parameter) : https://github.com/doitsujin/dxvk/wiki/Configuration
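For example, a dxvk.conf entry like the following pins the worker count (to my understanding, 0 tells dxvk to pick based on core count; treat the wiki's description as authoritative, not mine):

```
# dxvk.conf
dxvk.numCompilerThreads = 0
```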
So more cores can help for gaming, no doubt. Just depends on the use case.
Funny, it's not many years ago that people said games could use just 1 core, then 2 cores, then 4... and now 6. Isn't it likely this progression will continue?
A work-stealing queue design in games can use as many cores as you can throw at it (at least until it hits the memory bottleneck).
And what about the difference between the 2600X and the 2600 (non-X)? I think I need an aftermarket cooler for both variants... and the 2600 (non-X) is cheaper in power consumption, and the clock losses are acceptable, or not?
And a B450 motherboard will also be good enough? I'm using an Nvidia 1060 6GB graphics card.
It’s an excellent machine as is the 1700X I have at work.
Honestly day to day for dev I can’t tell the difference between the 1700X and the 2700X.
If you want to buy a decent machine and not spend time finding out what is decent right now, get Apple. If you want control and perfect tuning for your particular situation, definitely do not get Apple.
Sticking to a closed platform means many things, including, for instance, that you 'may never get a taste of' building your own box...
I wonder what the benefit of an upgrade would be.
Another bonus: the stock CPU fan, although flashy with its RGB LEDs, is quite formidable and can probably even stand up to a bit of overclocking.
I am glad to see competition in the CPU space again. It's been too long.
I may have to try that when I get back. I have a Ryzen 7 2700X and an AMD R9 Fury.
It's extremely easy to set up the PCI passthrough itself in virt manager, the system-level configuration is a bit more involved. You may also want a KVMFR like Looking Glass, since otherwise you'll need a physically separate keyboard/mouse/video setup.
I'm using Nix, so my system configuration is easy to summarize.
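Roughly, a few lines like these do the job -- a hedged sketch of the VFIO-relevant options, with the vfio-pci device IDs left as placeholders for whatever GPU is being passed through:

```nix
{
  # Enable the IOMMU and load the VFIO modules at boot.
  boot.kernelParams = [ "amd_iommu=on" "iommu=pt" ];
  boot.kernelModules = [ "vfio" "vfio_pci" "vfio_iommu_type1" ];
  # Placeholder IDs -- substitute the guest GPU's vendor:device pairs.
  boot.extraModprobeConfig = "options vfio-pci ids=10de:xxxx,10de:yyyy";
  virtualisation.libvirtd.enable = true;
}
```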
(On that note, I can definitely recommend NixOS, it's hard to even describe how helpful it's been in making my configuration understandable and reproducible.)
There are plenty of guides as well. Here's one for NixOS, but undoubtedly you can find more.
I don't think most commercial VM solutions support this kind of configuration; I'd guess VBox might, but I know for a fact VMware Workstation doesn't (and there's no VMware Workstation package for NixOS yet, so my license is collecting dust at the moment :()
It's worth noting you need a separate GPU for this right now. Intel just recently started supporting something called GVT-g, which lets you split an Intel iGPU across multiple VMs; not as useful for me since I want a better GPU, but maybe useful to others. I have yet to try it.
I'd like a setup like this in my future!
It is indeed a pair of Nvidia cards, but that part only matters a little. I don't really recommend Nvidia for the host, and as far as I know you can run whatever card you want on the Linux host. Looking Glass may care about the guest GPU simply because it's still a bit experimental, but there's no reason I'm aware of that it can't work with AMD or Intel graphics processors.