Verilog sources for Western Digital's open source RISC-V core (github.com)
320 points by obl 27 days ago | 78 comments

First of all, kudos to WD. This makes me feel good about spending $1500 on their spinning drives just last week.

But on a more practical note, what kind of board and toolchain does one need to get this going on an FPGA? Is there a readme somewhere that would walk one through the process?

Depends on what you want to do I guess.

You'd need a SoC variety of FPGA with a memory controller as I didn't see one in this code base. Putting this in an FPGA seems feasible.

But I see some challenges. It looks like it's a Harvard-architecture core. That means separate buses for data and instructions, which is not common outside of embedded or specialized systems. I'm sure you can set up GCC to work with this, but it would be a project.

You could build (or find) a memory controller that can multiplex separate instruction and data buses to a single memory space... decide data will be in memory range A and instructions in memory range B, and inform the linker where to put code and data.
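To make the range idea concrete, here is a minimal Python sketch of such a bridge, as a behavioral toy model and not RTL. The base addresses, the sizes, and the names BusBridge, ifetch, dread and dwrite are all made up for illustration; none of them come from the repo.

```python
# Toy behavioral model: two buses (instruction, data) share one flat memory.
# Every address, size, and name below is a hypothetical choice for illustration.

CODE_BASE, CODE_SIZE = 0x0000, 0x1000  # range B: instructions
DATA_BASE, DATA_SIZE = 0x1000, 0x1000  # range A: data


class BusBridge:
    def __init__(self):
        # Single backing store covering both ranges, starting at CODE_BASE.
        self.mem = bytearray(CODE_SIZE + DATA_SIZE)

    @staticmethod
    def _check(addr, base, size, bus):
        # A 4-byte access must sit entirely inside the range owned by its bus.
        if addr < base or addr + 4 > base + size:
            raise ValueError(f"{bus} bus access out of range: {addr:#x}")

    def _read_word(self, addr):
        offset = addr - CODE_BASE
        return int.from_bytes(self.mem[offset:offset + 4], "little")

    def _write_word(self, addr, value):
        offset = addr - CODE_BASE
        self.mem[offset:offset + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

    def load_program(self, words):
        # Stand-in for a boot loader placing code where the linker put it.
        for i, word in enumerate(words):
            addr = CODE_BASE + 4 * i
            self._check(addr, CODE_BASE, CODE_SIZE, "instruction")
            self._write_word(addr, word)

    def ifetch(self, addr):
        # Instruction bus: may only read from the code range.
        self._check(addr, CODE_BASE, CODE_SIZE, "instruction")
        return self._read_word(addr)

    def dread(self, addr):
        # Data bus: may only read from the data range.
        self._check(addr, DATA_BASE, DATA_SIZE, "data")
        return self._read_word(addr)

    def dwrite(self, addr, value):
        # Data bus: may only write to the data range.
        self._check(addr, DATA_BASE, DATA_SIZE, "data")
        self._write_word(addr, value)
```

The linker half of the job would then be a memory layout that places code sections in the first range and data sections in the second, matching whatever ranges the real bridge decodes.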

I'd probably start by downloading whatever free versions of the FPGA tools the vendors offer and see if I can synthesize the code for any of the targets and how well it fits (assuming someone else hasn't posted that info already). If it isn't going to fit in anything supported by the free version of the tools, I probably wouldn't go any further with it myself.

Assuming that it did fit, I would switch gears and build a simulation testbench, and start tinkering to see how it works compared to the docs. If it really is strictly Harvard, I'd build a bridge to the FPGA's memory controller that could map two buses to a single memory space. If I got that far I'd start working on setting up a compiler and linker to map out code and data partitions to that memory space.

At this point you might be ready to build all this and load the FPGA, but you have no peripherals (like Ethernet or VGA). I'd consider slaving it to a Raspberry Pi or something like that. I saw a debug module in the GitHub repo, so that might be a good thing to expose to the Raspberry Pi. Or pick a simple bus like I2C and use that to get some visibility from the Pi into the RISC-V state and bridge over to the RAM.


Another direction you could take would be to get something like a snickerdoodle. I believe it can boot Linux with the ARM core in the FPGA, and it has the peripherals you need like Ethernet (Wi-Fi) and access to an SD card. So the direction I would take there is trying to supplant the ARM core with the RISC-V. The effort there would be to disable the ARM core, which ought to be straightforward, and build a wrapper around the RISC-V core so it can talk to the peripherals in place of the ARM.

Given that it's an ARM core, I'm sure all the internal busing is AMBA (AHB/APB/AXI), so it's probably pretty reasonable to try this.

I wouldn't call this Harvard. It has ICCM/DCCM (closely-coupled memories) that appear to be strict (i.e. core cannot ld/st to ICCM), but everything is in the same (physical) address space, and the CCMs are just mapped into that space. In that sense, the ICCM is not much different from having a ROM in the memory map, which isn't usually seen as making something a Harvard architecture.

In my view, RISC-V pretty much precludes a hard Harvard architecture. The fact that RISC-V has a FENCE.I instruction that "synchronizes the instruction and data streams" means that a RISC-V implementation can't really be strictly Harvard, since if it were, it wouldn't make much sense to invalidate the icache.

This core has an icache, but that also doesn't make it a Harvard architecture. As long as the backing store of both the icache and the dcache is the same, it won't be any more difficult to work with than any other modern modified Harvard architecture (read: pretty much every computer system in every desktop, laptop, phone, etc in the last decades).
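A toy Python model of why FENCE.I matters when the instruction and data sides share one backing store. The ToyCore class and its method names are invented for illustration, and the model assumes an icache that does not snoop stores; that is an assumption about a generic design, not a claim about this core.

```python
# Toy model: unified memory, separate instruction cache, no snooping.
# ToyCore and its methods are hypothetical names for illustration only.

class ToyCore:
    def __init__(self):
        self.memory = {}  # shared backing store: address -> 32-bit word
        self.icache = {}  # instruction-side cache: address -> 32-bit word

    def store(self, addr, word):
        # Data-side write: updates memory, leaves the icache untouched.
        self.memory[addr] = word

    def fetch(self, addr):
        # Instruction-side read: fill from memory on a miss, else hit.
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]

    def fence_i(self):
        # Simplest legal effect of FENCE.I in this model: drop cached code,
        # so later fetches observe earlier stores.
        self.icache.clear()


NOP = 0x00000013     # addi x0, x0, 0
EBREAK = 0x00100073

core = ToyCore()
core.store(0x100, NOP)
first = core.fetch(0x100)      # caches the NOP
core.store(0x100, EBREAK)      # self-modifying code via the data side
stale = core.fetch(0x100)      # still the cached NOP
core.fence_i()
fresh = core.fetch(0x100)      # now sees the EBREAK
```

Both sides read the same memory, so this is a modified Harvard design in the sense above; the fence only reconciles the cached copy.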

> I'm sure you can set up GCC to work with this, but it would be a project.

FWIW AVR is not only Harvard, but code memory addresses point to 16-bit words, and data pointers to bytes. Yet, gcc works great (mostly). It is mainly a matter of whipping up a good linker script and directing code and data to the appropriate sections. Not particularly hard, although Harvardness does leak into your C code, mostly when taking pointers to functions or literal sections stored in flash.

*mostly — gcc long ago stopped taking optimization for code size seriously. Unfortunate for uCtlr users, as for small processors like the AVR optimization for size is pretty much also optimization for speed. Gcc has had some pretty serious code-size regressions in the past — but mostly not noticed by people not trying to shoehorn code into a tiny flash space.

For ARM, I have seen IAR emit unoptimized code half the size of GCC's output with speed optimizations enabled (which led to slightly smaller code than size optimization!). When optimizing for size, IAR shrank the code size by another 1/3 to 1/2. When you are really struggling to fit functionality into a tight controller because your hw engineers won't put in a bigger controller, this is a dealbreaker.

Agree completely that it is a deal breaker. But don’t be too hard on the hardware engineers, sometimes the BOM can not afford the extra 17 cents.

I guess the tone came out wrong. I understand why we got the HW we got. In this case it wasn't even about cost. The power available to the device as a whole was so low that we were counting microamps. It was all incredibly tight:

Just switching on the wrong SoC feature would bring the entire thing outside the envelope. Even the contents of the passive LCD affected power consumption in adverse ways. Showing a checkerboard pattern could make the device fail.

It's surprising to me that the RISC-V ISA specification is loose enough that a core could be considered RISC-V-compliant and yet also need a linker script to accommodate its peculiarities.

?? linker scripts are about the layout of the executable. The OS (if there is one) is the driver of that. ISA spec is an orthogonal concept.

On x86-64, the page tables control memory access permissions (rwx). The layout and semantics of these page tables are defined by the ISA and work the same on any processor. You wouldn't need to lay out your binary differently on Intel vs. AMD, for example.

It sounds like AVR, by contrast, has some parts of the address space that are (x) only and others that are (rw) only, even though other RISC-V processors don't have this restriction. That seems odd to me.

Maybe it's not as odd if you think of memory as a device that is mapped into your address space. ROM could be mapped into an x86-64 address space and be read-only -- even if the page table said it was writable it would probably throw some kind of hardware exception if you tried to actually write it.

(AVR isn't RISC-V)

Having page tables at all is completely optional on RISC-V. This is a chip with no MMU.

> It sounds like AVR, by contrast, has some parts of the address space that are (x) only and others that are (rw) only, even though other RISC-V processors don't have this restriction. That seems odd to me.

AVRs have three separate address spaces (program, RAM/data, EEPROM), i.e. the address "0" exists three times. Additionally their registers live in RAM.
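A small Python sketch of what "address 0 exists three times" means in practice: an address is only meaningful together with the space it belongs to. The Space enum and ToyAvrMemory class are hypothetical names, the sizes are arbitrary, and this is not a model of any specific AVR part.

```python
from enum import Enum


class Space(Enum):
    PROGRAM = "program"  # flash, addressed in 16-bit words
    DATA = "data"        # RAM, addressed in bytes
    EEPROM = "eeprom"    # EEPROM, addressed in bytes


class ToyAvrMemory:
    def __init__(self, size=16):
        # One independent array per address space; sizes are arbitrary here.
        self.cells = {space: [0] * size for space in Space}

    def write(self, space, addr, value):
        # Program memory holds 16-bit words; the other spaces hold bytes.
        mask = 0xFFFF if space is Space.PROGRAM else 0xFF
        self.cells[space][addr] = value & mask

    def read(self, space, addr):
        return self.cells[space][addr]


mem = ToyAvrMemory()
mem.write(Space.PROGRAM, 0, 0x940C)
mem.write(Space.DATA, 0, 0x12)
mem.write(Space.EEPROM, 0, 0xAB)
# The same numeric address yields three unrelated values.
values_at_zero = [mem.read(space, 0) for space in Space]
```

This is why a plain C pointer is not enough on such a part: the compiler also needs to know which space the pointer refers to.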

Every architecture has a gcc linker script, it's just that most of them have a default one hiding in an "arch" directory somewhere that works for normal use.

It occurs to me that the more likely explanation is that GCC devs aren’t up to date on the specs and that perhaps a unified situation is possible, just not implemented.

> It looks like it's a Harvard-architecture core. That means separate buses for data and instructions, which is not common outside of embedded or specialized systems.

I'd point out that WASM is also Harvard architecture so that's not so exotic anymore

That sounds like considerably more work than I was hoping it would be. So the follow-up question, then. Who is this release for, in your opinion?

I don't know why WD released this, but it would be useful for people building SoC ASICs that don't want to license an ARM core. Depending on the licensing.

Maybe this will be picked as teaching material in universities by students or teachers, which means potential future candidates for WD.

Maybe it's a way to help existing experts to federate around real use cases to discuss further field improvements, which means outsourced R&D for WD.

Eventually, it will be less work in the future once more people get interested in the subject.

Memory controllers can be implemented in the soft logic, so should not require a hard memory controller or SoC FPGA.

The core speaks AXI and AHB-Lite, so for an experienced FPGA/core guy, it probably wouldn't be too much work to integrate into their own FPGA flow. And since it doesn't have any floating-point, it will probably be able to fit on modestly-sized FPGAs.

I'm curious if the FPGA tools support the system verilog syntax that this is using. I'm an FPGA designer, but have not switched to system verilog.

They have an open-source software simulator for this as well: https://github.com/westerndigitalcorporation/swerv-ISS

I'm curious how the performance compares to other open-source cores

From cursory glance they have operand read bypass and branch prediction. Operand read bypass speeds things up considerably (factor is 1/N, where N is pipeline length) and the presence of branch prediction hints at speculative execution.

There are two ALUs.

It seems like a highly performant core.

>"From cursory glance they have operand read bypass and branch prediction."

I'm not familiar with "operand read bypass", is this the same thing as "operand forwarding"? Is this a pipeline optimization? Might you have any link you could share on this?

It is the same thing. Just different name.
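If it helps, here is a toy Python model of what forwarding buys you. It assumes the textbook 5-stage in-order pipeline (IF, ID, EX, MEM, WB) with only ALU instructions and a register file that can be written and read in the same cycle; that is a classroom assumption, not a description of this core's pipeline. The function name and the instruction encoding are made up for the example.

```python
def count_stall_cycles(program, forwarding):
    """Count decode-stage stall cycles for a list of (dest, sources) ALU ops.

    Without forwarding, a consumer may decode no earlier than 3 cycles after
    its producer decoded (the producer's write-back cycle). With forwarding,
    the result is handed from EX to the next EX, so 1 cycle is enough.
    """
    gap = 1 if forwarding else 3
    last_writer_decode = {}  # register name -> decode cycle of latest writer
    decode_cycle = 0
    stalls = 0
    for index, (dest, sources) in enumerate(program):
        # In-order issue: at best one cycle after the previous instruction.
        earliest = decode_cycle + 1 if index else 0
        ready = earliest
        for reg in sources:
            if reg in last_writer_decode:
                ready = max(ready, last_writer_decode[reg] + gap)
        stalls += ready - earliest
        decode_cycle = ready
        last_writer_decode[dest] = decode_cycle
    return stalls


program = [
    ("x1", ("x2", "x3")),  # add x1, x2, x3
    ("x4", ("x1", "x5")),  # add x4, x1, x5   depends on x1
    ("x6", ("x4", "x1")),  # add x6, x4, x1   depends on x4
]
without_bypass = count_stall_cycles(program, forwarding=False)  # 4 stalls
with_bypass = count_stall_cycles(program, forwarding=True)      # 0 stalls
```

In this model a chain of back-to-back dependent ALU ops costs two bubbles per link without the bypass and none with it; loads and longer pipelines change the numbers.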

How does it compare to BOOM [1]? Both are RISC-V and both are open source.

[1] https://github.com/riscv-boom/riscv-boom

Edit: Not sure why I am getting downvoted, isn't this a valid question?

This one doesn't have out-of-order execution, an FPU, or an MMU. It's also 32-bit where BOOM is 64.

Edit: There is a little more detail available for an existing Marvell ARM SATA controller (similar to what WD uses now), which this processor is probably the replacement for[1]. That indicates it has 2 ARM Cortex R4 cores. On the Cortex R4, it looks like the FPU is an optional feature, and there is no MMU, just an MPU[2].

Basically, it looks like the BOOM CPU is intended to be a general purpose CPU, and this is intended to be a very fast/capable microcontroller. It wouldn't run Linux, other than maybe something like uClinux.

It does look to be potentially REALLY fast. See https://www.anandtech.com/show/13678/western-digital-reveals...

[1] https://www.marvell.com/storage/assets/Marvell_88i9441_Solei...

[2] https://developer.arm.com/products/processors/cortex-r/corte...

Because you’re comparing a workstation class out-of-order CPU design to a teeny tiny embedded chip lacking an FPU or even a unified code/data cache. It’s like saying “how does the Intel Atom compare against the AMD Ryzen Threadripper core?”

Compared to most other RV32 soft core implementations, this one is far from being teeny tiny!

It would be their dream come true if the hw oss community (very small) takes this up and develops it into something huge like for sw projects. There is so much manpower needed to develop these kinds of things - what better way than oss.

However, too little, too late, ee industry. An entire generation of top tier college students have pretty much skipped ee.

Nowadays the Computer Engineering degree lets those of us who were torn between EE and CS get both. At least at Virginia Tech, CpE(Computer Engineering) easily outstrips CS and teaches the basics of CS and EE fundamentals as well as practical application of both of them.

Sure, that makes sense in theory, but of the computer engineers I've known, they prefer one end or the other and aren't as good at either as someone dedicated to just one.

However, this is a pretty small sample size (I helped interview at a small company), so I'm not sure if it's a trend.

Ya most of us tend to prefer one side or the other, but this system tends to be a good way to create would-be EEs with a decent bit of exposure to CS and vice versa.

It doesn't create perfect mixed skills but gives working experience with both sides. For example I hate semiconductor work, dislike circuit analysis, and much prefer software but I enjoy working with VHDLs and embedded software. I'll take Rust or Haskell over Verilog any day but I have no problem working with VHDLs given the need.

Lots of unsubstantiated claims here. WD doesn't need the OSS community, the community won't make or break the project. They're a 'nice to have' at best. Remember, if this really means savings of $X (that goes to ARM ??) per unit of storage shipped, given their volumes, they'll gladly fund this to the finish line. Claiming WD needs the OSS community is like claiming chrome/Google needs the community. Sure, the community contributions are valuable, but the core funding and heavy lifting is always in-house.

As for a whole generation skipping EE, no evidence.

Looks like royalties are somewhere in the 2% range for ARM[1]. There's also whatever markup companies like Marvell charge to WD. It sounds like WD sells around 40 million drives per year. So, even a savings of 25 cents per unit would be $10M/year.
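Spelling out that back-of-the-envelope number in Python, using only the two figures above (about 40 million drives per year and a hypothetical 25 cents saved per drive); the function name is just for illustration.

```python
def annual_savings_usd(drives_per_year, savings_per_drive_usd):
    # Straight multiplication; ignores development cost and volume changes.
    return drives_per_year * savings_per_drive_usd


estimate = annual_savings_usd(40_000_000, 0.25)  # 10_000_000.0
```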

[1] https://www.anandtech.com/show/7112/the-arm-diaries-part-1-h...

> An entire generation of top tier college students have pretty much skipped ee.

Umm, what? I just looked at MIT's statistics (the registrar publishes them online) and sure, there are a lot more CS students, but there are certainly a lot of EEs.

I would think MIT might be a bit special, given that both EE and CS are Course 6 - some other schools have a similar setup (i.e. one EECS department), but many do not.

From a cursory look, it appears there were ~55,000 CS graduates in the US in 2015, ~11,000 EE, and ~5,000 CE.

I don't agree that a "generation have skipped EE," but I do think it is probably true that CS graduates in general now probably have much less hardware experience/education than has been the case historically (no comment from me on whether that is good/bad).

The institute breaks out 6-1 and 6-2 (took me a while to figure that because the registrar writes VI-1 :-).

I’ve seen feedback correct this — shortly after I graduated analog fell out of favor (for jobs) so almost all the analog engineers in industry were old guys....so when the worm turned there was a shortage of experienced analog EEs.

I assume this reversion to the mean will happen more broadly for EEs. Then again, China and Europe are graduating a lot of EEs so it’s not like EE won’t be done, just not in the US.

The number of EE grads is going down in Europe too.

In Eastern Europe students still study it but then move to software as Web Dev pays 2x as much. In Western Europe students gave up on EE completely since SW companies make HW ones look archaic.

No grad wants to work for us when he comes for the interview and sees only old guys. My boss has only hired Eastern Europeans, Indians and Chinese lately since he couldn't find any EEs. Also, he and other companies not offering competitive wages to EEs while complaining of a shortage is the true problem.

Offer competitive wages and grads will flock to this field.

This was clear 10 years ago when ee and cs salaries started diverging significantly and ee companies remained fairly stingy, even with food and drinks and perks at the workplace, let alone salaries.

It will end up costing them faaaar more than the salaries and benefits that were saved. Gg

I graduated CE and worked for 11 years with digital design and FPGAs in silicon valley. The salaries haven't kept up. I transitioned to SW and I'm not going back.

Supply and demand. FANGs vertical integration. No more Moore.

Top tiers. Proportions

There are a lot fewer jobs in EE than there are in programming/IT. Half of the EEs I know went into programming due to not finding jobs in what they wanted.

Most of the ee's I knew went into banking, because money (I assume) .

Would have been awesome if they had joined the effort of developing FIRRTL[1][2] and using it as a universal (LLVM-alike) hardware intermediate language, better suited to modern chip design needs than Verilog or VHDL[3].

[1] https://github.com/freechipsproject/FIRRTL

[2] https://aspire.eecs.berkeley.edu/wp/wp-content/uploads/2017/...

[3] https://github.com/SymbiFlow/ideas/issues/19

It's a neat idea, but to me, that's about it. For SW, LLVM serves two ways: 1) target any (supported) arch you like with your language of choice, 2) customize your toolchain to generate native machine code (e.g. GPU shader languages) from any (supported) language.

I'd say putting yet another middleman between your logic compiler/netlist and what you actually use is not the best course of action for HW implementations. Even HLS has a lot of friction in use and is hardly picked up, despite its obvious, immediate advantages.

A HDL that no one in the industry uses?

I've been meaning to get started with RISC-V for some time now but can't find much on it for total beginners online. Can anyone recommend a starting point for a total noob?

What do you want to do with it?

“Starting” is a bit ambiguous.

But I suggest that you first simply read the ISA specification. It’s surprisingly readable and it contains justifications for some design decisions.

After that, buy a small FPGA board and run a picorv32 CPU in it. Or one of the many other RISC-V soft cores.

Ah, I'm mostly hoping to learn how it works and what I can do with it. Trying to find a hardware-based pet project!

Buy yourself a TinyFPGA BX.

It’s a small (tiny!) FPGA board that’s still large enough to run a RISC-V CPU. It’s a full open source flow. And there are plenty of examples.

Get that first LED blinking!

Or an UPduino which is still cheaper. I've had good experiences with one, though I'm only at the "PWM-animated LED" stage right now.

Oh, and https://www.nand2tetris.org is great. You implement a simple CPU, and though the language isn't a real-world one you learn enough to probably be able to implement a similar CPU on an FPGA using VHDL or Verilog.

Since RISC-V uses Chisel instead of Verilog, you will probably need this first: https://github.com/freechipsproject/chisel-bootcamp

For actually learning RISC-V, you can check out these books: https://riscv.org/risc-v-books/

The Patterson and Hennessy book is a great starting point and the RISC-V reader is great reference.

The error is understandable, the main prototype RISC-V core is written in Chisel and the "Rocket Core Generator" is written in Chisel. The RISC-V team at Berkeley has a lot of overlap with the Chisel team. So it is easy to think RISC-V is all Chisel. I believe Chisel, Chisel, RISC-V, Chisel, if that makes any sense.


https://github.com/freechipsproject/rocket-chip (parametric SoC generator, Chisel)

https://github.com/ucb-bar/riscv-sodor (Chisel)


Yup I'm an idiot but I can't seem to delete or edit my original submission :c

Thank you for the information.

Ur good. Chisel.

> Since RISC-V uses Chisel instead of Verilog

What do you mean? The core here is entirely Verilog/SystemVerilog

I presume the parent generalized their experience with the BOOM and Rocket implementations of RISC-V, which do use Chisel, to the entire architecture.

Sorry I am an idiot how do I edit/delete my comment so I can erase this misinformation?

Unfortunately, if you don't see an edit button on the comment then you can't edit it anymore. But props for admitting you're wrong and trying to correct it :-)

> Since RISC-V uses Chisel instead of Verilog

How can an ISA ‘use Chisel’? It’s a spec not an implementation.

So this is the logic of the controller in Verilog. But I don't see any test scripts. I am not an expert in logic design, but it seems to me that validation is at least as expensive and time consuming as the actual creation because you cannot afford mistakes in the ASIC masks. Am I missing something here?

You are correct. The most difficult part of a design is not the design itself, it's all the documentation and the verification environment around it.

This is a nice gesture but hardware is a bit different than software and dumping a bunch of RTL code is not really useful.

A full environment that could get you to a blinking LED off of an FPGA would be the complete dump, but I don't see the value in downplaying publishing a bulk of code that all others have hidden behind lock and key.

From a quick look this is a reference model, not a testbench. There is a lot of work needed (writing a testbench environment and test cases, analyzing and writing coverage models, etc.) before the Verilog code can be considered verified and ready for ASIC tape-out.

From my experience writing the design RTL code is at most 25% of the man hours. The rest is verification and some synthesis backend work.

Great work, many thanks to WD. As far as I see only a subset of the SystemVerilog features is used. Is there a "coding standard" somewhere available specifying this subset with a rationale? Is it mostly to be compatible with Verilator? Is there information available why they used Verilator and how it has proven itself?

Can someone explain a bit what’s happening around the division? https://github.com/westerndigitalcorporation/swerv_eh1/blob/...

Or give me some google keywords? Is there a link with the pentium bug?

I like the small number case: they use a very old logic minimization program to generate the equations for it:


Otherwise it's taking 32 cycles to do the division. There is a count for this.
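For anyone wondering where a figure like 32 cycles comes from: a bit-serial divider produces one quotient bit per step, so a 32-bit divide takes 32 steps. Below is a Python sketch of unsigned restoring division, one loop iteration per cycle. It is the textbook algorithm and only an assumption about the general approach; I have not checked that it matches the RTL, and the function name is made up.

```python
def restoring_divide_u32(dividend, divisor):
    """Unsigned 32-bit restoring division, one quotient bit per iteration."""
    if divisor == 0:
        # Rejected here to keep the sketch short.
        raise ValueError("divisor must be non-zero")
    dividend &= 0xFFFFFFFF
    divisor &= 0xFFFFFFFF
    remainder = 0
    quotient = 0
    for bit in range(31, -1, -1):  # 32 iterations, MSB first
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        # Trial subtraction: keep it only if it does not go negative.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


q, r = restoring_divide_u32(100, 7)  # (14, 2)
```

A fast path for small operands just skips this loop when both inputs fit in a few bits, which is presumably where the generated equations come in.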

Link-> relationship

Western Digital, if you are reading this!

Pleeeeze I want to have access as an end user. I'd like to do a predicate push down and be able to write into the DRAM buffer and then flush a commit. Or pin certain blocks directly into DRAM. Pleeeeze!

This person got pretty far with the Marvell ARM WD drive controller in 2015: https://www.malwaretech.com/2015/04/hard-disk-firmware-hacki...

Also see: http://spritesmods.com/?art=hddhack&page=1

What would you use this for?

Shut up and take my money!

Is this code just for a processor, or an SOC? How is this different from the open source Shakti processor being designed by IIT Chennai?

The link assumes you already know what this is, and so does this thread. Maybe both could have a few short paragraphs of context for those not in the know, so we can appreciate it and say thank you.
