Really good talk by Bryan Cantrill (co-founder of Oxide Computer Company) on the problems with closed-source firmware and the need to move away from BIOS and EFI, and on how Oxide booted their own custom x86 AMD board with entirely open-source firmware, without using AMD's firmware.
I have booted a few SoCs without a BIOS or anything of the sort before (nothing nearly as big as an AMD Milan chip). Doing this with a huge, fast chip is really impressive work by the folks at Oxide. DDR5 DRAM training is also a ridiculously complicated and touchy exercise, as is some of the PCIe link training that (according to the talk) Linux/Unix handles.
Apparently the DRAM training is one piece of AMD firmware I believe they re-used, though I had trouble following that part. I think they said they only left it alone because they didn't strictly need to replace it and had to pick and choose their battles.
Edit: Yes, I believe he says they used the AMD firmware for the PSP (Platform Security Processor).
Edit2: This post may actually be incorrect. Please go watch the talk. I'm not sure anymore.
No, you're correct: the PSP does DIMM training. Also, note that this is AMD Milan, so it's DDR4, not DDR5 -- DDR5 is still forthcoming from both AMD and Intel.
DDR4 link training is a bit easier to comprehend than DDR5, but still a headache. I think almost every high-performance SoC uses an auxiliary processing core for link training at this point (including large FPGAs).
I have been waiting for someone to find a security vulnerability in one of these cores.
One part I couldn't follow in your talk is which software Oxide designed and which is vendor-supplied. Have you really stripped out every piece of vendor software from the product, or are there still some parts that are vendor-supplied black boxes?
The question after your talk implied that DIMM training was still being done by non-Oxide software.
On AMD, DIMM training is done by the PSP. And you have to run the PSP to run the SoC, so we have no alternative there.
More generally, we have implemented and opened everything we can; there still remain opaque bits like the PSP, as well as some smaller bits scattered through the machine (e.g., SSD firmware, MCU boot ROMs, VR firmware, etc.). We have endeavored to make as open a system as possible within the constraint of not making our own silicon. And while we haven't talked about it publicly, we will also open our schematics when we ship the rack, allowing everyone to see every component in the BOM. While we do have some (necessary) proprietary blobs in the system, we want to at least be transparent about where they are!
What's the "VR" in "VR firmware" stand for? Unless Oxide has invented a revolutionary VR-based user interface for server hardware, I expect I misread that line. ;)
Oh, sorry: "VR" is a "voltage regulator" in this context. You can see our drivers for these parts in Hubris, our all-Rust system that runs on our service processor.[0] All of our drivers are open, but the parts themselves do contain (small) proprietary firmware blobs.

[0] https://github.com/oxidecomputer/hubris
Why not write your own firmware for these? They may be a lot more sophisticated than I am thinking, but voltage regulators usually have a datasheet/manual with a set of control registers, and you likely want to set those registers based on the physical hardware you have.
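Something like this is what I had in mind; a rough sketch, assuming a PMBus-style part behind I2C (the I2c trait here is a hypothetical stand-in; the command codes PAGE = 0x00 and VOUT_COMMAND = 0x21 are from the PMBus spec, but the LINEAR16 exponent of -9 is just an assumption for illustration):

    // Host-side sketch of "set the documented registers" for a PMBus-style
    // voltage regulator. The I2c trait is a hypothetical stand-in; real
    // parts differ in data encoding (check VOUT_MODE on actual hardware).
    trait I2c {
        fn write(&mut self, addr: u8, bytes: &[u8]) -> Result<(), ()>;
    }

    const PAGE: u8 = 0x00;         // PMBus rail-select command
    const VOUT_COMMAND: u8 = 0x21; // PMBus output-voltage command

    /// Select a rail (page) and program its output voltage in volts.
    fn set_vout(bus: &mut impl I2c, addr: u8, rail: u8, volts: f32) -> Result<(), ()> {
        // LINEAR16 with an assumed exponent of -9: code = volts * 2^9.
        let code = (volts * 512.0) as u16;
        bus.write(addr, &[PAGE, rail])?;
        // SMBus word data goes low byte first.
        bus.write(addr, &[VOUT_COMMAND, code as u8, (code >> 8) as u8])?;
        Ok(())
    }

That part is easy; presumably it's everything undocumented around it that's hard.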
The regulators are actually quite sophisticated and have many undocumented registers that control things like how communication with the processor works, the nonlinear control algorithms, etc.
Very interesting. I would assume the parameters for the control algorithms are exactly the things you'd want to set yourself, since that lets you optimize system stability under load steps. Your board might also have more or less inductance or capacitance than other motherboards, and that can significantly affect the stability and performance of the control loop. It's a shame they don't give you documentation on those control systems so you could figure out how to set those parameters yourself.
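For instance, the power stage's output LC filter resonates at f0 = 1/(2*pi*sqrt(L*C)); change the board's effective inductance or capacitance and that corner frequency moves, which shifts the phase margin of the control loop. That's exactly what you'd want to re-tune per board if the compensation registers were documented.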
My very limited understanding of DIMM training is that it's the digital system learning and setting the precise analog timing settings required to talk to the DIMM. Every memory cell has tiny manufacturing differences, so these settings need to be learned on the fly at boot, and they drift over time with use, so they need to be re-determined on every boot.
Timing adjustment per group of data bits has been required since DDR3, but DDR4 also adds internal reference-voltage calibration for the DQ bits (VREF_DQ). This voltage sets the threshold by which the IO cell decides whether a voltage represents a logic high or low. The VREF_DQ value is calibrated per group of bits, in addition to adjusting the timing to find the best place to sample the signal.
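To make the "sweep and pick the middle" idea concrete, here is a toy sketch of per-lane read training; entirely illustrative, not how the PSP actually does it, and sample_ok stands in for programming the PHY's VREF/delay registers and checking a test-pattern readback:

    /// Toy per-lane read training: for each VREF_DQ code, find the longest
    /// contiguous run of delay taps that sample a test pattern correctly,
    /// then return the (vref, delay) point centered in the widest window.
    fn train_lane(sample_ok: impl Fn(u8, u8) -> bool) -> Option<(u8, u8)> {
        let mut best: Option<(u8, u8, usize)> = None; // (vref, center, width)
        for vref in 0..64u8 {
            let (mut run_start, mut run_len) = (0usize, 0usize);
            let (mut cur_start, mut cur_len) = (0usize, 0usize);
            for delay in 0..128u8 {
                if sample_ok(vref, delay) {
                    if cur_len == 0 { cur_start = delay as usize; }
                    cur_len += 1;
                    if cur_len > run_len { run_start = cur_start; run_len = cur_len; }
                } else {
                    cur_len = 0;
                }
            }
            if run_len > best.map_or(0, |(_, _, w)| w) {
                best = Some((vref, (run_start + run_len / 2) as u8, run_len));
            }
        }
        best.map(|(vref, center, _)| (vref, center))
    }

Real training does this per lane (and separately for reads and writes), which is part of why it takes noticeable time at boot.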
This[1] page does a decent job of going into the initialization procedures of DDR4, and why the various steps are needed. Really quite fascinating and complex, due to the high speed.
The essence is that the signals are so high-speed, i.e., each bit occupies a very short time window, that the physical distances between DIMM slots, and between the DRAM chips on a DIMM, start to matter and have to be compensated for so that all relevant signals arrive at a given DRAM chip at the same time.
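To put rough numbers on it: at DDR4-3200 a unit interval is 1/(3.2 GT/s), about 312 ps, and a signal on FR-4 propagates at roughly 170 ps per inch, so even a one-inch mismatch in trace length eats more than half the bit window. Training absorbs that skew with per-lane delay lines instead of requiring perfectly matched routing.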
I've worked on this before. We had explicit help from AMD; their documentation is better than Intel's for this stuff, but still.
If it weren't for Linux's dependence on the BIOS PCI and memory probes, you could really cut out all that crap. You just need to program the memory controllers and bring the kernel in from storage.
Oh no, my recollection was that, at least at the time, Linux used the ACPI PCI traversal to start driver discovery instead of doing its own; obviously after that it assumes complete control of the device tree.
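For contrast, the "doing its own" version is conceptually simple. A minimal sketch of brute-force config-space enumeration over the legacy x86 0xCF8/0xCFC mechanism (illustrative and bare-metal only; modern kernels prefer ECAM via the ACPI MCFG table, and the real probe paths are far more involved):

    use core::arch::asm;

    /// Read a 32-bit PCI config register for (bus, device, function, offset)
    /// via the legacy I/O-port mechanism. Requires ring 0.
    unsafe fn pci_cfg_read32(bus: u8, dev: u8, func: u8, offset: u8) -> u32 {
        let addr: u32 = (1 << 31)              // enable bit
            | ((bus as u32) << 16)
            | ((dev as u32) << 11)
            | ((func as u32) << 8)
            | ((offset as u32) & 0xFC);        // dword-aligned register
        let value: u32;
        asm!("out dx, eax", in("dx") 0xCF8u16, in("eax") addr);
        asm!("in eax, dx", in("dx") 0xCFCu16, out("eax") value);
        value
    }

    /// Walk every bus/device slot and count what answers.
    unsafe fn enumerate() -> u32 {
        let mut found = 0;
        for bus in 0..=255u8 {
            for dev in 0..32u8 {
                // Offset 0 holds vendor (low 16 bits) and device (high 16)
                // IDs; an all-ones vendor ID means the slot is empty.
                if pci_cfg_read32(bus, dev, 0, 0) & 0xFFFF != 0xFFFF {
                    found += 1;
                }
            }
        }
        found
    }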