But on a more practical note, what kind of board and toolchain does one need to get this going on an FPGA? Is there a readme somewhere that would walk one through the process?
You'd need a SoC variety of FPGA with a memory controller as I didn't see one in this code base. Putting this in an FPGA seems feasible.
But I see some challenges. It looks like it's a Harvard-architecture core. That means separate buses for data and instructions, which is not common outside of embedded or specialized systems. I'm sure you can set up GCC to work with this, but it would be a project.
You could build (or find) a memory controller that can multiplex the separate instruction and data buses onto a single memory space: decide data will live in memory range A and instructions in memory range B, and inform the linker where to put code and data.
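To make that concrete, here's a minimal sketch of such a bridge. Every port name here is made up, and a real design also needs ready/stall handshaking back to the core, which I've left out for brevity:

    // Hypothetical sketch: multiplex separate instruction and data buses
    // onto one unified memory port, giving the data side priority.
    module harvard_bridge (
        input  wire        clk,
        // instruction-fetch bus from the core
        input  wire        i_req,
        input  wire [31:0] i_addr,
        output wire [31:0] i_rdata,
        // load/store bus from the core
        input  wire        d_req,
        input  wire        d_we,
        input  wire [31:0] d_addr,
        input  wire [31:0] d_wdata,
        output wire [31:0] d_rdata,
        // unified memory port (one-cycle read latency assumed)
        output wire        m_req,
        output wire        m_we,
        output wire [31:0] m_addr,
        output wire [31:0] m_wdata,
        input  wire [31:0] m_rdata
    );
        // Data accesses win arbitration; fetches wait when they collide.
        wire d_grant = d_req;

        assign m_req   = i_req | d_req;
        assign m_we    = d_grant & d_we;
        assign m_addr  = d_grant ? d_addr : i_addr;
        assign m_wdata = d_wdata;

        // Remember which side owns the read data coming back next cycle.
        reg d_owned;
        always @(posedge clk) d_owned <= d_grant;

        assign i_rdata = m_rdata;   // valid only when !d_owned
        assign d_rdata = m_rdata;   // valid only when d_owned
    endmodule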
I'd probably start by downloading whatever free versions of the FPGA tools the vendors offer and see if I can synthesize the code for any of the targets and how well it fits (assuming someone else hasn't posted that info already). If it isn't going to fit in anything supported by the free versions of the tools, I probably wouldn't go any further with it myself.
Assuming that it did fit, I would switch gears and build a simulation testbench, and start tinkering to see how it worked compared to the docs. If it really is strictly Harvard, I'd build a bridge to the FPGA's memory controller that could map the two buses to a single memory space (along the lines of the sketch above). If I got that far, I'd start working on setting up a compiler and linker to map code and data partitions into that memory space.
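A skeleton of the kind of testbench I mean: drive clock and reset, preload a behavioral memory model, and dump waveforms to compare against the docs. The core instantiation is commented out because the real top-level module and port names come from the repo; everything named here is a placeholder:

    `timescale 1ns/1ps
    module tb;
        reg clk = 0;
        reg rst_n = 0;

        always #5 clk = ~clk;          // 100 MHz clock

        // Hypothetical instantiation of the core under test:
        // riscv_core dut (.clk(clk), .rst_n(rst_n) /*, bus ports... */);

        // Simple behavioral memory model preloaded from a hex file.
        reg [31:0] mem [0:16383];
        initial $readmemh("program.hex", mem);

        initial begin
            $dumpfile("tb.vcd");       // waveform for inspection
            $dumpvars(0, tb);
            repeat (10) @(posedge clk);
            rst_n = 1;                 // release reset after a few cycles
            repeat (100000) @(posedge clk);
            $finish;
        end
    endmodule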
At this point you might be ready to build all this and load the FPGA, but you have no peripherals (like Ethernet or VGA). I'd consider slaving it to a Raspberry Pi or something like that. I saw a debug module in the GitHub repo, so that might be a good thing to expose to the Raspberry Pi. Or pick a simple bus like I2C and use that to get some visibility from the Pi into the RISC-V state and bridge over to the RAM.
Another direction you could take would be to get something like a Snickerdoodle. I believe it can boot Linux on the ARM core alongside the FPGA fabric, and it has the peripherals you need, like Ethernet (WiFi) and access to an SD card. The direction I would take there is trying to supplant the ARM core with the RISC-V: disable the ARM core, which ought to be straightforward, and build a wrapper around the RISC-V core so it can talk to the peripherals in place of the ARM.
Given that it's an ARM core, I'm sure all the internal busing is AMBA (AHB/APB/AXI), so it's probably pretty reasonable to try this.
In my view, RISC-V pretty much precludes a hard Harvard architecture. The fact that RISC-V has a FENCE.I instruction that "synchronizes the instruction and data streams" means that a RISC-V implementation can't really be strictly Harvard: on a strict Harvard machine, writes to data memory could never become visible as instructions, so there would be nothing to synchronize and no reason to invalidate the icache.
This core has an icache, but that alone doesn't make it a Harvard architecture. As long as the backing store of both the icache and the dcache is the same, it won't be any more difficult to work with than any other modern modified Harvard architecture (read: pretty much every desktop, laptop, and phone of the last few decades).
FWIW, AVR is not only Harvard, but code-memory addresses point to 16-bit words while data pointers point to bytes. Yet gcc works great (mostly*). It is mainly a matter of whipping up a good linker script and directing code and data to the appropriate sections. Not particularly hard, although Harvardness does leak into your C code, mostly when taking pointers to functions or to literal data stored in flash (think avr-gcc's PROGMEM attribute and the pgm_read_* macros).
*mostly — gcc long ago stopped taking optimization for code size seriously. Unfortunate for microcontroller users, as for small processors like the AVR, optimizing for size is pretty much also optimizing for speed. Gcc has had some pretty serious code-size regressions in the past — mostly not noticed by people who aren't trying to shoehorn code into a tiny flash space.
Just switching on the wrong SoC feature would push the entire thing outside the power envelope. Even the contents of the passive LCD affected power consumption in adverse ways: showing a checkerboard pattern could make the device fail.
It sounds like AVR, by contrast, has some parts of the address space that are execute-only and others that are read/write-only, a restriction that RISC-V processors generally don't have. That seems odd to me.
Maybe it's not as odd if you think of memory as a device that is mapped into your address space. ROM could be mapped into an x86-64 address space and be read-only -- even if the page table said it was writable it would probably throw some kind of hardware exception if you tried to actually write it.
Having page tables at all is completely optional on RISC-V. This is a chip with no MMU.
AVRs have three separate address spaces (program, RAM/data, EEPROM), i.e. the address "0" exists three times. Additionally, their CPU registers are mapped into the data address space.
I'd point out that WASM is also a Harvard architecture, so that's not so exotic anymore.
Maybe it's a way to help existing experts federate around real use cases and discuss further improvements in the field, which amounts to outsourced R&D for WD.
Eventually it will also mean less work for them, once more people get interested in the subject.
I'm curious how the performance compares to other open-source cores
There are two ALUs.
It seems like a highly performant core.
I'm not familiar with "operand read bypass", is this the same thing as "operand forwarding"? Is this a pipeline optimization? Might you have any link you could share on this?
Edit: Not sure why I am getting downvoted, isn't this a valid question?
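For what it's worth: "operand bypass" and "operand forwarding" are generally two names for the same pipeline optimization. When an instruction needs a value that a slightly older instruction has computed but not yet written back to the register file, the result is routed straight from the later pipeline stage to the ALU input instead of stalling. A generic sketch (signal names are illustrative, not from this core):

    // Forwarding (bypass) mux for one ALU operand: if the instruction in
    // EX or MEM is about to write the register we are reading, take its
    // result directly instead of the stale register-file value.
    module bypass_mux (
        input  wire [4:0]  rs1,          // source register being read
        input  wire [4:0]  ex_rd,        // destination of instr in EX stage
        input  wire        ex_wb_en,     // EX instr will write a register
        input  wire [31:0] ex_result,
        input  wire [4:0]  mem_rd,       // destination of instr in MEM stage
        input  wire        mem_wb_en,
        input  wire [31:0] mem_result,
        input  wire [31:0] regfile_rs1,  // value read from the register file
        output reg  [31:0] alu_in1
    );
        always @(*) begin
            if (ex_wb_en && ex_rd != 5'd0 && ex_rd == rs1)
                alu_in1 = ex_result;      // newest value wins
            else if (mem_wb_en && mem_rd != 5'd0 && mem_rd == rs1)
                alu_in1 = mem_result;
            else
                alu_in1 = regfile_rs1;    // no hazard: register file is current
        end
    endmodule

Good search terms are "pipeline hazard" and "data forwarding"; Hennessy and Patterson's computer architecture books cover it in depth.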
Edit: There is a little more detail available for an existing Marvell ARM SATA controller (similar to what WD uses now), which this processor is probably the replacement for. That indicates it has 2 ARM Cortex R4 cores. On the Cortex R4, it looks like the FPU is an optional feature, and there is no MMU, just an MPU.
Basically, it looks like the BOOM CPU is intended to be a general purpose CPU, and this is intended to be a very fast/capable microcontroller. It wouldn't run Linux, other than maybe something like uClinux.
It does look to be potentially REALLY fast. See https://www.anandtech.com/show/13678/western-digital-reveals...
However: too little, too late, EE industry. An entire generation of top-tier college students has pretty much skipped EE.
However, this is a pretty small sample size (I helped interview at a small company), so I'm not sure if it's a trend.
It doesn't create perfectly mixed skills, but it gives working experience with both sides. For example, I hate semiconductor work, dislike circuit analysis, and much prefer software, but I enjoy working with HDLs and embedded software. I'll take Rust or Haskell over Verilog any day, but I have no problem working with VHDL given the need.
As for a whole generation skipping EE, no evidence.
Umm, what? I just looked at MIT's statistics (the registrar publishes them online) and sure, there are a lot more CS students, but there are certainly a lot of EEs.
From a cursory look, it appears there were around ~55,000 CS graduates, ~11,000 EE graduates, and ~5,000 CE graduates in the US in 2015.
I don't agree that a "generation has skipped EE," but I do think it is probably true that CS graduates now have much less hardware experience/education than has been the case historically (no comment from me on whether that is good or bad).
I've seen feedback correct this: shortly after I graduated, analog fell out of favor (for jobs), so almost all the analog engineers in industry were old guys... so when the worm turned, there was a shortage of experienced analog EEs.
I assume this reversion to the mean will happen more broadly for EEs. Then again, China and Europe are graduating a lot of EEs so it’s not like EE won’t be done, just not in the US.
In Eastern Europe students still study it but then move to software, as web dev pays 2x as much. In Western Europe students have given up on EE completely, since SW companies make HW ones look archaic.
No grad wants to work for us when he comes in for the interview and sees only old guys. My boss has only hired Eastern Europeans, Indians, and Chinese lately, since he couldn't find any EEs. The true problem is that he and other companies complain of a shortage while not offering EEs competitive wages.
Offer competitive wages and grads will flock to this field.
It will end up costing them faaaar more than the salaries and benefits that were saved. Gg
I'd say putting yet another middleman between your logic compiler/netlist and what you actually use is not the best course of action for HW implementations. Even HLS, despite its obvious, immediate advantages, has a lot of friction and is hardly picked up.
“Starting” is a bit ambiguous.
But I suggest that you first simply read the ISA specification. It's surprisingly readable, and it contains justifications for some design decisions.
After that, buy a small FPGA board and run a picorv32 CPU on it. Or one of the many other RISC-V soft cores.
It's a small (tiny!) FPGA board that's still large enough to run a RISC-V CPU. The toolchain is a fully open source flow. And there are plenty of examples.
Get that first LED blinking!
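The classic first design is just a counter dividing the board clock down to an LED. Something like this; the actual clock frequency and pin mapping come from your board's constraint file:

    // Blink an LED: divide the board clock with a free-running counter.
    module blinky (
        input  wire clk,    // e.g. a 12 MHz oscillator on many small boards
        output wire led
    );
        reg [23:0] count = 24'd0;
        always @(posedge clk)
            count <= count + 1'b1;
        assign led = count[23];   // toggles roughly every 0.7 s at 12 MHz
    endmodule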
Oh, and https://www.nand2tetris.org is great. You implement a simple CPU, and though the language isn't a real-world one, you learn enough to be able to implement a similar CPU on an FPGA using VHDL or Verilog.
For actually learning RISC-V, you can check out these books: https://riscv.org/risc-v-books/
The Patterson and Hennessy book is a great starting point, and the RISC-V Reader is a great reference.
https://github.com/freechipsproject/rocket-chip (parametric SoC generator, Chisel)
Thank you for the information.
What do you mean? The core here is entirely Verilog/SystemVerilog
How can an ISA ‘use Chisel’? It’s a spec not an implementation.
This is a nice gesture but hardware is a bit different than software and dumping a bunch of RTL code is not really useful.
From my experience, writing the design RTL code is at most 25% of the man-hours. The rest is verification and some synthesis/backend work.
Or give me some Google keywords? Is there a link about the Pentium bug?
Otherwise it takes 32 cycles to do the division; there is a counter for this.
Edit: This is just the top result from https://duckduckgo.com/?q=pentium+cpu+bug
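On the 32-cycle division mentioned above: that's the classic iterative (radix-2 restoring) divider, which produces one quotient bit per cycle and uses a counter to know when it's done. A generic sketch (not this core's actual implementation; reset handling omitted for brevity):

    // Unsigned 32-bit restoring divider: one quotient bit per clock,
    // a 6-bit counter tracks the 32 iterations.
    module div32 (
        input  wire        clk,
        input  wire        start,      // pulse to begin a divide
        input  wire [31:0] dividend,
        input  wire [31:0] divisor,
        output wire [31:0] quotient,
        output wire [31:0] remainder,
        output reg         busy
    );
        reg [5:0]  count;   // the cycle counter: 32 down to 0
        reg [32:0] rem;     // partial remainder (one spare bit)
        reg [31:0] q;       // quotient shifted in, bit by bit

        assign quotient  = q;
        assign remainder = rem[31:0];

        always @(posedge clk) begin
            if (start) begin
                rem   <= 33'd0;
                q     <= dividend;
                count <= 6'd32;
                busy  <= 1'b1;
            end else if (busy) begin
                // Shift in the next dividend bit, then subtract the
                // divisor if it fits (that decides the quotient bit).
                if ({rem[31:0], q[31]} >= {1'b0, divisor}) begin
                    rem <= {rem[31:0], q[31]} - {1'b0, divisor};
                    q   <= {q[30:0], 1'b1};
                end else begin
                    rem <= {rem[31:0], q[31]};
                    q   <= {q[30:0], 1'b0};
                end
                count <= count - 6'd1;
                if (count == 6'd1)
                    busy <= 1'b0;   // done after 32 iterations
            end
        end
    endmodule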
Pleeeeze I want to have access as an end user. I'd like to do predicate pushdown, and be able to write into the DRAM buffer and then flush a commit. Or pin certain blocks directly into DRAM. Pleeeeze!
Also see: http://spritesmods.com/?art=hddhack&page=1