
Show HN: A pipelined RISC-V processor written in VHDL - inforichland
https://github.com/inforichland/freezing-spice
======
nekoeth0
Nice! I wrote a 5-stage pipelined RISC MIPS processor in SystemVerilog last
semester, in one day, drunk as HELL, but hey, it worked amazingly well. Thank
goodness they didn't ask for branch prediction.

------
ajross
Does it do anything different or better than Rocket (or Sodor) that would be
notable? Why VHDL instead of the more conventional (heh, for this ISA) Chisel?
Would be good to see some notes about architecture in the README just to tell
me what I'm looking at.

~~~
inforichland
I am still leery of the "unconventional" HDLs. I don't really see many of them
as much of an improvement over, say, VHDL-2008: you get records and functions,
and instantiation is not as verbose. Yes, the language is a little wordy, but
it's well proven and I already know it; many other people know it as well, so
they can read and understand it more easily w/o learning a new language.

~~~
sklogic
Higher level generator HDLs shine when you want something heavily parametric.
VHDL and Verilog code generation features are way too weak, so having a higher
level meta-language helps a lot.

------
gluggymug
Firstly, I commend you on your work.

Second, I looked into your tests directory and from first impressions there's
not much there. What is there is kinda messy and not conducive to thorough
testing.

If I could make a suggestion: you should look at reusing the work of others.
You are in the lucky position where some work has been done for you! There's a
massive amount of tests for RISC-V that already exist at
[https://github.com/riscv/riscv-tests](https://github.com/riscv/riscv-tests)
(by weird coincidence someone posted links about this in another thread a few
days ago!)

If I were you, I would be going through that stuff to try to figure out how to
get those tests running in my environment. E.g. you have to compile a test and
create a mechanism to load the binary into your testbench memory or whatever.
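As a starting point for that loading mechanism, here is a minimal Python sketch. It assumes the compiled test has already been flattened to a raw image (e.g. with GNU `objcopy -O binary`) and that the testbench memory wants one 32-bit word per line in hex, which a VHDL testbench could then read with textio. The function names and the one-word-per-line format are assumptions, not something riscv-tests provides:

```python
import struct
from pathlib import Path
from typing import List


def bin_to_hex_words(data: bytes) -> List[str]:
    """Split a flat RISC-V image into 32-bit words rendered as 8-digit hex.

    RISC-V is little-endian, hence the "<I" format. The final word is
    zero-padded if the image length is not a multiple of 4 bytes.
    """
    if len(data) % 4:
        data = data + b"\x00" * (4 - len(data) % 4)
    words = struct.unpack(f"<{len(data) // 4}I", data)
    return [f"{w:08x}" for w in words]


def write_hex_file(bin_path: str, hex_path: str) -> int:
    """Hypothetical helper: write one word per line, return the word count."""
    lines = bin_to_hex_words(Path(bin_path).read_bytes())
    Path(hex_path).write_text("\n".join(lines) + "\n")
    return len(lines)
```

Word address N in the testbench memory then corresponds to line N of the file, assuming the image was linked to start at the memory's base address.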

~~~
_chris_
"E.g. You have to compile a test and create a mechanism to load the binary
into your testbench memory or whatever."

God, that's always the hardest part about these things. The core takes you a
weekend, but the connection to the outside world takes forever.

~~~
gluggymug
I think I am one of those who would do it the other way around, working from
the outside in and connecting to the outside world first. I already know ALL
the I/O for the core is somewhere in the RISC-V design code (exact signal names,
etc.). It has to be in Chisel or something.

I would translate that to VHDL to get my ports. It becomes a stub to build my
core-level testbench on. If I can mirror their test environment, I at least
have a start point. Maybe I could even reuse their testbench somehow.

Then I'd start duplicating the sub modules and their interconnections in the
core. And so on.

Do the tough stuff first then enjoy things getting easier as I go along
hopefully!

------
KMag
One thing I've wondered about with recent ISAs is split register files.
Since most CPUs are single-chip implementations these days, why not have
integer, fp, and vector registers unified at the ISA level to allow different
implementation points and less state spilling/loading during context switches:

(1) High performance implementations use register renaming anyway, so they can
easily use a split register file internally without exposing it at the ISA
level.

(2) Low power implementations can use a single register file (at the cost of
fewer I/O ports).

This would also mean that when switching threads, only a little bit more state
than the vector unit registers would need to be stored and loaded.

~~~
_chris_
The RISC-V ISA manual covers your question on page 37 (riscv.org).

> "a split organization increases the total number of registers accessible
> with a given instruction width, simplifies provision of enough regfile ports
> for wide superscalar issue, supports decoupled floating-point unit
> architectures, and simplifies use of internal floating-point encoding
> techniques. Compiler support and calling conventions for split register file
> architectures are well understood, and using dirty bits on floating-point
> register file state can reduce context-switch overhead."

(1) Not really. It's easy to go from an ISA that says "split" to a processor
that uses "unified", but it's much harder to go the other way... the whole
point of a unified ISA register file is that you can trivially write to an "FP"
register and then read it for an "integer" ALU operation. You've made that very
hard if you try to split the RF internally.
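A tiny Python model of the unified case, to show what "trivially" means here: the register file just holds bit patterns, so an integer read of a register written by an FP operation sees the raw IEEE-754 single-precision encoding. The register-file list and the helper names are hypothetical:

```python
import struct

# Hypothetical unified register file: 32 entries of raw 32-bit patterns.
regs = [0] * 32


def write_fp(rd: int, value: float) -> None:
    """FP write: store the IEEE-754 single-precision bit pattern."""
    regs[rd] = struct.unpack("<I", struct.pack("<f", value))[0]


def read_int(rs: int) -> int:
    """Integer read: the ALU sees the same bits, no conversion or move."""
    return regs[rs]


write_fp(5, 1.0)
print(hex(read_int(5)))  # 0x3f800000
```

With a split ISA like RISC-V, the same transfer needs an explicit move instruction (FMV.X.W in the F extension), which lets an implementation keep the two files physically separate.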

------
milspec
MMU?

I see one place seemingly using RISC-V with an MMU in the classic desktop
PowerPC style (Linus Torvalds posted a great rant about the stupidity of that
MMU) and another place that is seemingly using RISC-V with an MMU that is very
much like x86 (the paging part, obviously no segmentation) but with distinct
rwx.

Which is it? Did this not get specified? Constantly changing the MMU greatly
hurt 32-bit SPARC and PowerPC.

FWIW, this is good: Bits 0..11 direct mapped, bits 12..29 are x86-style page
table tree node indexes that are hardware-walked, and bits 30..63 are
software-filled like MIPS (a forest of trees). In the low bits of the bottom
level you get: can read, did read, can write, did write, can execute, did
execute, user/super (exclusive), type ram/framebuffer/mmio/pte (two bits),
reserved, and validity. The "did foo" bits on PTE pages do get updated.
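To make that proposal concrete, here is a Python sketch of how a 64-bit virtual address would be carved up under it. Splitting bits 12..29 into two 9-bit x86-style indexes is an assumption (the comment only gives the 18-bit range), and the field names are hypothetical:

```python
from typing import NamedTuple


class VASplit(NamedTuple):
    offset: int    # bits 0..11: page offset, passed through untranslated
    l0_index: int  # bits 12..20: hardware-walked leaf-table index (assumed 9 bits)
    l1_index: int  # bits 21..29: hardware-walked root-table index (assumed 9 bits)
    sw_tag: int    # bits 30..63: picks a tree; resolved by software, MIPS-style


def split_va(va: int) -> VASplit:
    if not 0 <= va < 1 << 64:
        raise ValueError("not a 64-bit virtual address")
    return VASplit(
        offset=va & 0xFFF,
        l0_index=(va >> 12) & 0x1FF,
        l1_index=(va >> 21) & 0x1FF,
        sw_tag=va >> 30,
    )
```

In this reading, a software refill handler would map sw_tag to the root of one tree in the forest, and the hardware walker would take it from there using the two indexes.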

~~~
_chris_
You can find your answers in the Privileged ISA spec
([http://riscv.org/download.html#tab_spec_privileged_isa](http://riscv.org/download.html#tab_spec_privileged_isa)).

Of course, it's an open ISA, so you can do whatever you want. The style of
virtual memory you choose to use will depend on the target application.

~~~
milspec
Thanks. The following concerns paging, not the base/limit system:

From a security and reliability perspective, I'm saddened to see that rwx got
supported while --x did not get supported. That is backwards. Having to change
permissions after code modification is not bad; this provides a convenient
point for cache flushing and ASLR-enforced address changes. Preventing
executable code from being misused as data is valuable.

I'm also saddened to see that user access implies supervisor access. This too
is exactly backwards; nothing should be both user and supervisor accessible.
Given that data access can be performed at a less-privileged level by setting
MPRV=1, the ability of the supervisor to access user pages normally is
especially strange.

Lack of distinct did-execute and did-read bits is mildly annoying. If a page
is marked as being accessed and executable, one must assume that it is now in
BOTH the instruction cache and the data cache.

I have mixed feelings about having page frame numbers shifted over by two
bits. The win is Sv32 getting a reach of 16 GiB. I suppose this is worth the
minor annoyance when debugging OS kernel code.
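The arithmetic behind that reach: an Sv32 PTE keeps flag and software-reserved bits in bits 9..0 and the 22-bit physical page number in bits 31..10, so the frame address is (pte >> 10) << 12. In other words, the PPN sits two bits lower in the PTE than in the physical address. A small Python sketch of a 4 KiB leaf lookup (superpages ignored; the function name is hypothetical):

```python
PAGE_SHIFT = 12     # 4 KiB pages
PTE_PPN_SHIFT = 10  # Sv32 PTE: PPN in bits 31..10, other bits in 9..0


def sv32_leaf_pa(pte: int, va: int) -> int:
    """Physical address for a 4 KiB leaf PTE plus the VA's page offset."""
    ppn = (pte >> PTE_PPN_SHIFT) & ((1 << 22) - 1)  # 22-bit physical page number
    return (ppn << PAGE_SHIFT) | (va & 0xFFF)


# Largest reachable address + 1: 2**22 pages * 4 KiB = 2**34 bytes = 16 GiB.
reach = sv32_leaf_pa(0xFFFF_FFFF, 0xFFF) + 1
print(reach == 16 * 2**30)  # True
```

That two-bit offset is the debugging annoyance: a raw PTE in a memory dump does not show the frame address directly.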

Other than that, I like it. It's certainly sane. The traditional page table is
pretty good for the middle bits of the virtual address. I think it is less
good for the upper bits due to ASLR, and I hate to see anything that
encourages a failure to use all 64 bits of the virtual address space.

~~~
_chris_
You should read through the RISC-V mailing list archives
([https://lists.riscv.org/lists/](https://lists.riscv.org/lists/)) for
discussions on these topics, and contribute your own thoughts if they haven't
been covered. The ISA-dev list should be the most relevant. They have thought
very carefully about these things and I'm sure they'd appreciate additional
feedback on the topic.

------
alain94040
Interrupts / Exceptions are probably the most difficult piece and may force a
complete redesign. Since they are not done yet, I'd wait a little bit
longer...

~~~
_chris_
It's actually really pretty easy to do. I put together a handful of different
RISC-V cores that all implement the privileged/supervisor spec
([https://github.com/ucb-bar/riscv-sodor/blob/master/src/rv32_...](https://github.com/ucb-bar/riscv-sodor/blob/master/src/rv32_1stage/cpath.scala)).

Basically, detect a few cases in Decode, pass the rest of the instruction down
the pipeline to the commit(memory) stage, and let the commit stage detect
exceptions and redirect the PC as required.
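A rough software model of that scheme, in Python rather than Chisel or VHDL, just to show the data flow: Decode tags an instruction with a cause, the tag rides down the pipeline, and the commit stage is the only place that acts on it. Everything here (the Uop record, TRAP_VECTOR, the dict returned by commit, the simplified legality check) is a hypothetical simplification, not Sodor's actual code:

```python
from dataclasses import dataclass
from typing import List, Optional

TRAP_VECTOR = 0x100  # hypothetical trap handler address


@dataclass
class Uop:
    pc: int
    word: int
    cause: Optional[str] = None  # exception tagged early, carried down the pipe


def decode(pc: int, word: int) -> Uop:
    """Decode-time check: flag encodings this core cannot execute.

    Simplification: assume RV32I without the C extension, so any word
    whose two low bits are not 0b11 is treated as illegal.
    """
    uop = Uop(pc, word)
    if (word & 0b11) != 0b11:
        uop.cause = "illegal_instruction"
    return uop


def commit(uops: List[Uop]) -> dict:
    """Retire instructions oldest-first, as the commit stage would.

    The first instruction carrying a cause traps: its PC becomes the EPC,
    every younger instruction is squashed, and fetch is redirected.
    """
    committed = []
    for uop in uops:
        if uop.cause is not None:
            return {"committed": committed, "epc": uop.pc,
                    "cause": uop.cause, "next_pc": TRAP_VECTOR}
        committed.append(uop.pc)
    next_pc = uops[-1].pc + 4 if uops else None
    return {"committed": committed, "epc": None, "cause": None, "next_pc": next_pc}
```

Because only the commit stage redirects the PC, nothing younger than the faulting instruction has updated architectural state when the trap is taken. Faults found later (e.g. a misaligned load in the memory stage) would be merged into the same cause field before commit.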

~~~
inforichland
Exactly, that's how I would (hopefully will soon) implement it as well; it
takes a few extra cycles but avoids extra overhead.

------
Gladdyu
If you want to generate some nice block diagrams showing the components at
different abstraction levels and the signals connecting them, you could try
synthesizing your design in Altera's Quartus (a free version is provided on
their website) and then using the Netlist / RTL viewers. It exports to PDF.

