Is anyone publishing things like that these days? I mean pages like these:
(I noticed that Agner Fog's chapter on Ryzen is conspicuously missing a "Literature" section.)
One of the problems is that the market for this kind of review is very much a niche. And as with all forms of free media, if there aren't enough page views they stop doing it.
I have always thought some of these media outlets would consolidate. I only ever read AnandTech, ServeTheHome and some Ars, and that is about it. I have RSS headline feeds from a few other sources such as Tom's Hardware and Engadget, but if AnandTech covers the same topic I always go there first.
Not only has that not happened, most of these websites manage to stay afloat by catering to different markets. But I have no idea how the market segmentation works. I can tell that a site like Wccftech is more or less a 100% rumour site with very little, if any, technical knowledge in its writing. And yet it draws a huge audience.
While others like Tom's Hardware seem to have retained enough of their news readers to remain sustainable.
They tend to bang the drum when it comes to Intel but in AMD reviews you'll get things like "Due to bad luck and timing issues we have not been able to test the latest Intel and AMD servers CPU in our most demanding workloads". It's a lot like reviewing a Ferrari but due to bad luck you could only test it in city traffic.
2 years ago they forgot to cover the Threadripper launch for 2 weeks while the front page was flooded with dozens of uninteresting half page articles about Intel motherboards being launched around the same time. I love a good tech article regardless of which brand they're talking about but bias will always kill the experience for me. YMMV I guess.
Often, review parts are shipped to sites with a review embargo until a certain date -- if you don't ship your review on that date, you lose out. If the shipment is late because of the vendor or the shipping service, or the reviewer is sick or out of town, or the shipped firmware isn't great and interim firmware makes a big difference, the choices are:
a) take the time to do a full review, but publish late
b) do a cursory review, apologize and publish on time
c) do b, but follow up with a full review as time permits
If C happens more with AMD than Intel, it could be bias, it could be bad luck, or it could be Intel has been delivering more finished things to reviewers.
I get that I can also be biased. But bias should be like noise, taking all of the articles together should average it out. In AT's case it's more like the signal rather than the noise. What really capped it off for me was not covering a public event that every other website covered, like the 2017 Threadripper launch. The signal was that they are even willing to ignore one of the most interesting launches in years to post articles about trivial motherboard announcements. I would never mind if Intel launched some awesome new CPU.
Then the confirmation came the following year, coincidentally also during a Threadripper event when they wrote multiple articles touting Intel's new 5GHz 28 core CPU. They missed the fact that it was a massive overclock chilled by an (admittedly hidden) 1HP chiller and their experience raised no red flags where even the comments did. But worse, when the bubble burst unlike every other publication AT's response was an anemic piece excusing Intel and with the literal conclusion that "the 28-core announcement was not ideally communicated".
I understand Intel's shenanigans to try to steal some of the attention that TR is getting. But as a journalist being played like that should trigger a more visible reaction. Consistently painting them in a good light just raises suspicions for me. And while I still read their articles I no longer take them or the conclusion at face value unless another big site confirms it.
But sometimes I just want two sentences on their front page. Like:
1) Today is the launch of AMD Threadripper, here are the specs. It is exciting to test (hype) and we intend to publish a full review within 2 weeks.
Rather than just staying silent on the issue.
2) Today there is a new Intel threat called X, as published here (Intel official documentation) and here (the bug likely has its own webpage by now). We may cover it in more detail in the future.
I understand they have timing and staffing issues. But two sentences would show they knew of the issue / press release rather than staying silent on it.
Maybe AnandTech wants to be a pure review site, but then it would not have a section called News Pipeline. Staying silent on anything to Intel's disadvantage makes me question whether they have a slight bias towards Intel.
That being said, last year all I could read on their comments section was their bias towards AMD to the point they were being accused of being paid by AMD. They had a ton of AMD coverage, including I believe a one on one with Lisa Su.
So I’m currently taking accusations of bias with a grain of salt.
But if you read my concrete examples above and go to AT's site to confirm their legitimacy I think you will agree that this goes far beyond giving too much attention to one of them during a period of major change. They were willing to do exactly the opposite and refuse attention during a major launch to cover trivial topics for another company, they accepted being repeatedly played by the same company and never publicly held them accountable.
And this last part is arguably the most worrisome because it's no longer about one journalist's personal preference towards one company. It's their journalistic integrity. When you realize you were tricked into deceiving your readers you're expected to take a stand publicly. And at the very least learn from the experience and trust but verify. AT still enthusiastically covered paper launches that never materialized, with no "grain of salt" thrown in there. And it doesn't matter which brand they favor, only that they are not willing to take a stand after being repeatedly played for attention.
I still read them (only as a secondary source) and I'm not recommending against it. It's just that the implicit trust I had when Anand was writing is off the table for me.
A few years ago they seemed to have additional sources of information (they'd talk about things like instruction-to-port assignments and penalties for moving data between integer and FP domains).
Beyond that it gets into pretty deep expertise in both Intel and AMD designs for a comparison of the approaches, and I assume most such high-level experts work for either AMD or Intel, so you would not get an impartial view anyway.
To be honest, though, I don't see a substantial difference between the Haswell article you linked and the Zen 2 article, provided you are willing to look past the AMD slides. The Haswell article is also just "putting the presentation slides into words," just from IDF 2012 instead of AMD Tech Day 2019, and apparently the author felt a need to draw the block diagrams themselves.
(Also, FWIW, Agner's manual does have a literature section for Ryzen, it is just not numbered for some reason).
There's also just the trend of modern designs being tricky enough it's harder to infer as much about them and harder to write accessibly about what you do know; it's not super easy to figure out and describe, say, modern branch predictors simply because they're all layering a lot of strategies on each other.
For example, from Haswell on, Agner Fog essentially said Intel's large-core branch predictors are good at lots of things but there's not much he can say about how they work (p29 at https://www.agner.org/optimize/microarchitecture.pdf). Writing code to beat Cortex-A76 prefetchers, AT's Andrei Frumusanu had difficulty fooling them with anything other than essentially-random access patterns and compared them to "black magic" (https://twitter.com/andreif7/status/1102230575522430977). These aren't just random folks saying "wow, CPUs are complicated"; they successfully figured out lots of stuff about past generations of CPU.
AMD did reference the TAGE family of branch predictors, which there's lots about in public literature. There might be some broadly interesting stuff in the vendors' contributions to gcc/LLVM (machine models and arch-specific optimizations).
Maybe ARM implementors talk a little more about their stuff? That might have something to do with the dynamics of the relatively open/diverse market for ARM SoCs versus the long-running one-on-one-ish x86 rivalry.
Hard to boil all that down to a single point, but if AMD and Intel want to talk more about the guts of their products, I'm sure plenty of grateful wonks would lap it up. :)
I used it in undergrad to run benchmarks with different cache sizes and cache coherence strategies to see which were more effective. I'm sure Intel and AMD have much more advanced simulation tools though. Most likely multiple, or at least multiple levels of granularity (so you could do stuff like, simulate these potential branch predictor designs at a gate level, and then turn around and simulate the entire CPU at a higher level of abstraction.)
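To make the cache-size half of that kind of experiment concrete, here is a minimal sketch of a trace-driven cache model. It is not the API of any real simulator; `hit_rate` and `strided_trace` are hypothetical names, and the 64-byte lines, 4-way associativity and LRU replacement are assumptions chosen only for illustration. Coherence is not modelled at all.

```python
from collections import OrderedDict


def hit_rate(addresses, cache_bytes, line_bytes=64, ways=4):
    """Replay byte addresses through a set-associative LRU cache; return the hit fraction."""
    num_sets = cache_bytes // (line_bytes * ways)
    # One OrderedDict per set: keys are line numbers, ordered oldest to newest use.
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for addr in addresses:
        line = addr // line_bytes
        lines_in_set = sets[line % num_sets]
        if line in lines_in_set:
            hits += 1
            lines_in_set.move_to_end(line)         # mark as most recently used
        else:
            if len(lines_in_set) >= ways:
                lines_in_set.popitem(last=False)   # evict the least recently used line
            lines_in_set[line] = True
    return hits / len(addresses)


def strided_trace(working_set_bytes, passes=4, stride=64):
    """Sweep linearly over a working set several times (hypothetical workload)."""
    return [a for _ in range(passes) for a in range(0, working_set_bytes, stride)]


trace = strided_trace(32 * 1024)
for size_kib in (16, 64):
    print(size_kib, "KiB cache:", hit_rate(trace, size_kib * 1024))
```

With this 32 KiB sweep, the 16 KiB cache never hits (a linear sweep larger than the cache is the classic worst case for LRU), while the 64 KiB cache misses only on the first pass. Real design tools would replay recorded traces of actual workloads, not a synthetic loop.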
One optimized for node.js tasks, one for databases, ...
On the other side of the computing spectrum, there were a couple of papers in the 2010s about offloading the most common mobile tasks (like CSS layout) to specialized mobile CPU subsystems. Maybe this might have been implemented if Google had made their own mobile CPU.
At least two different tasks, audio and graphics, have special application specific processors. Networking and crypto are also often offloaded.
As a DB guy, there's no 'one task' for DBs. The only thing I can think of that is nearly characteristic of the DBs I've worked on is that they're IO bound.
That's possibly true of most things except floating point and graphics.
*which I swear every company has their own version of
Every piece of documentation I've seen is quite light on the branch prediction improvements. Going by the slides, they improved its accuracy by 1/3; I'd be curious to know how.
Side note: if your superscalar is big enough (yeah, those registers use power), couldn't you just get rid of branch prediction at no performance cost (doing something else while waiting for the data)?
My only grudge against Zen (as a consumer) is that the AM4 socket is intended for both APUs and CPUs. While this is a good thing, I have a couple utterly useless video outputs on my motherboard. I would have liked AMD to include some display driver circuitry on every chip. Maybe in the I/O die, if they use such a thing in all of their designs going forward? I mean, I would be quite content with using software rendering when I need to drive a screen, or even spare a bit of memory bandwidth and CPU cycles to drive an extra display from my desktop's graphics card.
In one of the pictures in the article, it says the new architecture uses the TAGE branch predictor. This is likely based on the work of André Seznec. There are many articles on the implementation (but they can be difficult to understand if you are not already familiar with his work).
I've implemented the bare-bones predictor in a computer architecture course; you can see an abridged version of my presentation slides here. Note this only describes the bare-bones predictor; in more recent work André Seznec added a loop predictor and a statistical corrector to increase the accuracy.
There is some work using TAGE with perceptrons in the statistical corrector.
It is nice to see research being applied to new mainstream chips relatively quickly. In complement to your slides, there is a short overview here (this is actually the first search result).
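For readers who have not seen TAGE before, here is a heavily simplified sketch of the bare-bones idea: a bimodal base table plus several tagged tables indexed with geometrically increasing history lengths, where the longest matching table provides the prediction. This is not AMD's implementation and not Seznec's reference code. The class name `TinyTage`, the table sizes, the history lengths and the hash functions are all assumptions made for illustration, and the loop predictor and statistical corrector are left out.

```python
class TinyTage:
    """Simplified TAGE-style predictor (illustrative only, all parameters assumed)."""

    def __init__(self, hist_lengths=(4, 8, 16, 32), log_size=10, tag_bits=8):
        self.hist_lengths = hist_lengths      # geometric series of history lengths
        self.log_size = log_size
        self.tag_bits = tag_bits
        self.size = 1 << log_size
        self.base = [1] * self.size           # 2-bit counters (0..3); >= 2 means taken
        # Tagged entry layout: [tag, ctr in -4..3 (>= 0 means taken), useful in 0..3]
        self.tables = [[[None, 0, 0] for _ in range(self.size)]
                       for _ in hist_lengths]
        self.history = 0                      # global history, newest outcome in bit 0
        self.hist_mask = (1 << max(hist_lengths)) - 1

    @staticmethod
    def _fold(value, bits):
        """XOR-fold an integer down to `bits` bits."""
        folded = 0
        while value:
            folded ^= value & ((1 << bits) - 1)
            value >>= bits
        return folded

    def _slot(self, pc, t):
        """Index and tag for table t, hashed from the PC and the recent history."""
        h = self.history & ((1 << self.hist_lengths[t]) - 1)
        index = (pc ^ self._fold(h, self.log_size)) & (self.size - 1)
        tag = (pc ^ self._fold(h, self.tag_bits)
               ^ (self._fold(h, self.tag_bits - 1) << 1)) & ((1 << self.tag_bits) - 1)
        return index, tag

    def _lookup(self, pc):
        """Return (prediction, alternate prediction, provider table or None)."""
        pred = alt = self.base[pc & (self.size - 1)] >= 2
        provider = None
        for t in range(len(self.tables)):     # shortest to longest history
            index, tag = self._slot(pc, t)
            entry = self.tables[t][index]
            if entry[0] == tag:               # the longest matching table wins
                alt, pred, provider = pred, entry[1] >= 0, t
        return pred, alt, provider

    def update(self, pc, taken):
        """Train on the real outcome; returns the prediction that was made."""
        pred, alt, provider = self._lookup(pc)
        if provider is None:
            b = pc & (self.size - 1)
            self.base[b] = min(3, self.base[b] + 1) if taken else max(0, self.base[b] - 1)
        else:
            entry = self.tables[provider][self._slot(pc, provider)[0]]
            entry[1] = min(3, entry[1] + 1) if taken else max(-4, entry[1] - 1)
            if pred != alt:                   # useful only when it disagreed and was right
                entry[2] = min(3, entry[2] + 1) if pred == taken else max(0, entry[2] - 1)
        if pred != taken:                     # on a mispredict, try a longer history
            longer = range(0 if provider is None else provider + 1, len(self.tables))
            for t in longer:
                index, tag = self._slot(pc, t)
                entry = self.tables[t][index]
                if entry[2] == 0:             # take over an entry not marked useful
                    entry[:] = [tag, 0 if taken else -1, 0]
                    break
            else:                             # nothing available: age the candidates
                for t in longer:
                    entry = self.tables[t][self._slot(pc, t)[0]]
                    entry[2] = max(0, entry[2] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.hist_mask
        return pred


def tail_accuracy(predictor, pattern, branches=4000, tail=1000, pc=0x400):
    """Replay a repeating outcome pattern for one branch; score the last `tail` branches."""
    correct = 0
    for i in range(branches):
        taken = pattern[i % len(pattern)]
        hit = predictor.update(pc, taken) == taken
        if i >= branches - tail:
            correct += hit
    return correct / tail


# A loop branch that is taken seven times and then exits, repeated.
print(tail_accuracy(TinyTage(), [True] * 7 + [False]))
```

A plain 2-bit counter mispredicts the loop exit on every trip through a pattern like this one. The tagged tables can learn it because eight or more bits of history are enough to tell the exit apart from the other iterations.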
Edit: the picture from AMD in this review makes me think it can hit 16 memory channels with the two socket version. Does anyone know if this is true?
Yes, if the motherboard provides all the necessary slots. The inter-socket communication is achieved by re-purposing CPU pins used for PCIe, not pins used for DRAM. Each CPU has the full 8 DRAM channels of its own.
"There are a total of eight DDR4 memory controllers on this hub chip, the same number in total that were on the Naples complex; both support one DIMM per channel and have two channels per controller, but Rome memory runs slightly faster – 3.2 GHz versus 2.67 GHz – and therefore with all memory slots filled, yields a maximum of 410 GB/sec of peak memory bandwidth per socket. That’s 45 percent higher than the Cascade Lake Xeon SP processor, which has six memory controllers for a total of 282 GB/sec of memory bandwidth running at 2.93 GHz and 21 percent higher than the 340 GB/sec that Naples turns in running that 2.67 GHz DRAM. (Those are ratings for two-socket servers.)"
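The figures in that quote can be roughly reproduced with back-of-the-envelope arithmetic. The sketch below assumes each DDR4 channel moves 8 bytes per transfer, reads the quoted "GHz" figures as mega-transfers per second, and takes 2.93 and 2.67 to mean nominal DDR4-2933 and DDR4-2667 rates; the function name is hypothetical. The totals only match the quote when two sockets are counted, which fits its closing parenthetical.

```python
def peak_bandwidth_gb_s(channels, mega_transfers_per_s, sockets=2, bytes_per_transfer=8):
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes)."""
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer * sockets / 1e9


rome = peak_bandwidth_gb_s(channels=8, mega_transfers_per_s=3200)          # 2 x 8 = 16 channels
cascade_lake = peak_bandwidth_gb_s(channels=6, mega_transfers_per_s=2933)  # 2 x 6 = 12 channels
naples = peak_bandwidth_gb_s(channels=8, mega_transfers_per_s=2667)

print(f"Rome         {rome:6.1f} GB/s")
print(f"Cascade Lake {cascade_lake:6.1f} GB/s")
print(f"Naples       {naples:6.1f} GB/s")
print(f"Rome vs Cascade Lake: +{(rome / cascade_lake - 1) * 100:.0f}%")
print(f"Rome vs Naples:       +{(rome / naples - 1) * 100:.0f}%")
```

Under these assumptions Rome comes out at about 410 GB/s and Cascade Lake at about 282 GB/s, a 45 percent gap as quoted. The Naples comparison lands nearer 20 percent than 21; the quoted 21 percent appears to follow from the rounded 410 and 340 figures.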
I would say, though, that if you are bringing this up to the code standards of today then it should really be wrapped in some kind of unit test (https://github.com/sstephenson/bats) for it to pass the PR. That would make the code a bit more maintainable, and the tests can be integrated as a stage in your CI/CD pipeline.
If we do that then the intent is clarified by the input and the expected output of the test. Then the code would at least be maintainable, and the readability problem becomes less of an issue when it comes to technical debt.
I've done this plenty of times with my teams and it's certainly helped.