Hacker News
Open source RISC-V implemented from scratch in one night (github.com)
272 points by guigg 9 months ago | 109 comments

This is good news for RISC-V! The fact that it can be implemented so easily by hobbyists furthers the cause of trusted hardware. Sure, it won't be implementable by hobbyists at the speeds necessary for modern desktop computing, but a trusted security core (something like a YubiKey) could be implemented on completely from-scratch hardware while still using existing trusted/vetted software (OpenSSL), just a cross-compile away.

"Modern" desktop computing is perhaps less resource hungry than you think once you cut away all the cruft and telemetry.

No games, or barely any, of course. But word processing, email, and pure HTML browsing without JavaScript? Maybe not HTML5's fancy features, but general rich text?

I think it's very achievable.

My software "stack" hasn't changed in maybe 15 years now. It's emacs + firefox + terminal and a couple utilities here and there. It ran fine on my first computer with a single core ~1GHz CPU and 128MB of RAM (although multi-core architectures and more RAM did make it a lot easier to multi-task later). The only reason I upgrade my computer these days is to run modern games, for everything else except compilation I could probably use a computer from 10 years ago and not feel the difference.

I'd gladly switch to a completely open source hardware architecture even if it meant losing a significant amount of raw performance provided that the hardware and OS are stable and it's not prohibitively expensive.

I don't know what "prohibitively expensive" is in your case, but there's a possibility today:

Grab one HiFive Unleashed for $999 [1]:

- 4 cores up to 1.5GHz

- 8GB DDR4 ECC RAM

- 1Gbps Ethernet

Then grab one HiFive Unleashed Expansion Board for $1,999 [2]:

- SSD M.2 Connector

- SATA3 Connector

- x16 PCIe connector (4 lanes of PCIe 2)

- A bunch of other cool stuff you probably wouldn't use (SPI, FPGA, etc.)

Finally grab some M.2 Drive and a graphics card ($500 maybe?).

This would set you back a grand total of $3500, which is definitely way more expensive than the current mainstream but may fit within "non-prohibitively expensive" for some. The whole platform should be open source [3].

[1] https://www.crowdsupply.com/sifive/hifive-unleashed [2] https://www.crowdsupply.com/microsemi/hifive-unleashed-expan... [3] https://www.sifive.com/products/hifive-unleashed/

While the HiFive is an awesome proof of concept, $3.5k for something performing roughly as well as a Raspberry Pi seems prohibitively expensive for now.

Seems worth it.

My issue would be paying all that, and then strapping closed source hardware to it.

That sounds nice actually.

I'm currently dealing with our legacy system which means I have multiple virtualised systems running in a replica of our production system.

Currently using ~25 gig of RAM just to run the environments and an IDE.

It isn't the telemetry that's slowing us down, but layers upon layers of (mostly needless) abstraction and buffer bloat.

Windows 95 had most of the things we use in a GUI environment today, and it ran on a 486 with 8MB of RAM. Even packaged in Electron with a JavaScript x86 emulator it is still smaller than many modern text editors. Plenty of GUIs ran on significantly less.

Since the modern web is basically one of the worlds most overengineered and crufty VM platforms, I don't expect it would run very well, and that would probably be enough to doom the system in the eyes of many, sadly.

Never mind that Win95 didn't need a GPU that could push millions of polygons just to draw its UI...

What layers of abstraction are we stacking? Sure, there's Electron/V8, but that's just one layer. You said yourself that notepad in an Electron-based x86 VM is still pretty small, so clearly Electron itself isn't the problem.

Maybe people just demand more from their software these days, and all these little conveniences just add up more than you'd realize? That would also explain why the system is not "doomed in the eyes of many" as you claim it ought to be.

What conveniences? Most web apps (e.g. Google Docs) have less functionality than their Win95 counterparts, notwithstanding the stupendous amounts of added bloat.

A lot of the conveniences of the modern computing landscape are anything but.

Laggy response times in aforementioned text editors, frivolous UI animations on certain OSs, terrible web apps like JIRA...

The complexity doesn't necessarily mean better software. It's quite often worse along many dimensions: performance, usability, maintainability, portability.

The web is incredibly bloated unfortunately. I had to upgrade my RAM recently because I couldn't open too many browser tabs without making the system hang, even with javascript disabled. Apart from that, you can do almost everything from the command line so even an old Raspberry Pi could be usable as a desktop computer.

You could still have a lot of RAM with RISC-V. The web does require lots of RAM nowadays, yes. But it is not a problem in this regard.

Take a look at Tab Wrangler extension. It closes tabs you haven’t used in a while.

I am curious how this compares to The Great Suspender? I also don't understand why Chrome and Firefox don't build tab managers directly into the browser.

Firefox used to have Tab Groups, it got removed but the webextension API got some new features specifically to support similar functionality in an extension.

i.e. https://addons.mozilla.org/en-US/firefox/addon/basic-panoram...

In reality I think it's just not a priority for browser devs because the overwhelming majority of users do not use huge numbers of tabs.

Thanks. Yeah, I suspect tab junkies are probably a minority. Do you know if the latest Firefox is on par with Chrome in terms of tab usage and memory footprint?

Memory usage in FF with lots of tabs seems a bit better than Chrome, CPU usage seems a bit worse. I think the UI does a significantly better job of handling them though, since Chrome just shrinks everything until there's no text left.

About a year ago FF was really awful with lots of tabs; CPU usage was very high and the UI would occasionally freeze up for 10+ seconds at a time. They've really been making some big performance improvements lately.

> But word processing, email, and pure HTML browsing without javascript? Maybe not HTML5's fancy features, but general rich text?

All things I've seen achieved on a 486... In fact, come to think of it, I've seen people doing all of the above on m68k-based systems too.

> pure HTML browsing without javascript

I highly doubt this is achievable now. Turn on NoScript and the vast majority of web sites refuse not just to work properly, but even to load their content.

I wonder what the "vast majority" is that you mention. For me, with JS off by default, the majority of sites I come across when searching for things are perfectly readable without JS.

The sites that need it are a small whitelist of ones I trust (e.g. the banks a sibling comment mentioned), which I enable.

...That said, a 75MHz RISC-V will be approximately comparable in performance to a 100MHz 486DX4, or a 40MHz Pentium.

Then don't use shit websites that require javascript.

... like my banks?

I actually like/need to use websites that use javascript, imagine that

Those things only require a couple hundred MHz of general purpose computing to make sense.

Maybe less, should specific purpose assists be available.

And RAM address space sufficient to contain the tasks.

I think the best feature of the RISC-V instruction set is that it is very clear and very simple. The RV32I instruction set is composed of ~40 different instructions and, depending on the environment and the implementation, you don't even need to implement all of them, as long as the compiler never generates the missing instructions. A simpler core means faster clock rates, less logic, and more cores per chip, which means much more performance. That way, although the performance in an FPGA is not as good as in an ASIC, the results are not so bad:


With 1680 RISC-V cores running in parallel at 250MHz, the result is impressive, even working in an FPGA!
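To give a feel for how small the decode problem is, here is an illustrative sketch (not taken from the project) of classifying an RV32I instruction word by its major opcode field. Every base instruction falls into one of a handful of encoding groups selected by bits [6:0], which is part of why a decoder fits in so little logic:

```python
# Major opcode groups of the RV32I base ISA, selected by bits [6:0].
MAJOR_OPCODES = {
    0b0110111: "LUI",
    0b0010111: "AUIPC",
    0b1101111: "JAL",
    0b1100111: "JALR",
    0b1100011: "BRANCH",    # beq, bne, blt, bge, bltu, bgeu
    0b0000011: "LOAD",      # lb, lh, lw, lbu, lhu
    0b0100011: "STORE",     # sb, sh, sw
    0b0010011: "OP-IMM",    # addi, slti, xori, ori, andi, shifts
    0b0110011: "OP",        # add, sub, sll, slt, xor, or, and, ...
    0b0001111: "MISC-MEM",  # fence (one of the groups darkriscv skips)
    0b1110011: "SYSTEM",    # ecall, ebreak, csr* (also skipped)
}

def major_opcode(insn):
    """Return the encoding group of a 32-bit RV32I instruction word."""
    return MAJOR_OPCODES.get(insn & 0x7F, "UNKNOWN")

# addi x1, x0, 5 encodes as 0x00500093
print(major_opcode(0x00500093))  # -> OP-IMM
```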

Sorry to burst any bubbles here, but this is a very incomplete implementation.

You couldn’t run anything but small toy programs on this machine. This is more like what a student would build in an undergraduate course in computer architecture.

For example, there is no MMU, no debug support, no traps, no interrupts, no exception handling, no OS privilege levels, no FP, no memory controller, etc.

Of course, one wouldn’t implement all of these in a few hours.

The fact that this is RISC-V is somewhat of a red herring, as you could do a similar thing with a restricted subset of MIPS or ARM or even x86, as they do in UT Austin's comp arch class.

I fully agree with you about the missing features! However, as long as my objective is to replace some legacy 680x0/683xx/ColdFire processors running small toy programs, I see no need for those complex features. Please keep in mind that although 25% of the RV32I instruction set is missing from my implementation, there is no side effect, since those instructions are not relevant for this specific environment (fencex, exxx and csrxxx) and gcc by default generates exactly the implemented subset and nothing more. I think this is a very important advantage of the RISC-V architecture compared with others, and I don't think gcc will have the same benevolent behaviour in the case of ARM or x86. Maybe for MIPS it's possible, but I am not sure about it. Anyway, why not use RISC-V? :)

Yes. Part of the motivation for RISC-V was to enable designers to do one-off experiments like this or small production runs with custom ISAs based on RISC-V, without having to worry about licenses or intellectual property issues. RISC-V even leaves some op codes uncommitted so designers can add their own custom instructions. Dave Patterson (RISC-V designer and Turing award winner) explained this at a talk I attended a few years ago.

An Arduino has none of the missing parts you mention (except interrupts), yet it's quite a useful device even in non-toy applications.

The cortex m0/m3 CPUs most certainly do have breakpoints, svc (trap), interrupts and an optional memory protection unit.

The Atmel CPU is more constrained but still has hardware breakpoints, IO instructions, watchdog timers and interrupt support. It also has far more complex addressing modes (more CISC-y) to save on instruction counts and a variable length instruction set encoding where memory space to store code is a first order concern.

I’m also sure there is a memory controller to control the SRAM.

So, even if you were to build a simple micro controller, you’d need a lot more features and most likely higher performance (and power efficiency) than you would get from a trivial 2-stage pipeline. Not to mention there are no instruction or data caches in this RISC-V machine.

Good point, but unfortunately the focus of the project is only the RISC-V core running in an FPGA. Everything else is already widely available on the internet and can be easily integrated.

That class was one of my favorites at UT. I was lucky enough to have a guy from Intel teach it as an adjunct.

What book did you use?

It’s been about 14 years so not entirely sure. The textbook was using a RISC architecture from what I remember. We also had a semester long project where we had to research and invent a new x86 instruction to speed up a program of our choice. It was... intense.

You have pretty high expectations for one night of work

I love reading about stuff like this. There's something magical about having an inspiration fire so hot you can't put it out without 6 hours of hard coding at an ungodly hour.

I wonder how easy it would be to port https://github.com/xoreaxeaxeax/sandsifter to the RISC-V instruction set.

Would probably be a decent step in the right direction for validating/verifying the future of trusted computing.

Although... this gives rise to a 2nd thought. If it was _this easy_ to build a RISC-V implementation, is it all that special, technically speaking? I ask as someone naive about processor design. Is implementation relatively straightforward, but design hard?

RISC-V is pretty nice in that the ISA gets out of your way, and you can focus on the techniques to build a high-performance processor without wasting effort on legacy and other weird corner case behaviors. A++, would recommend.

However, if you want to build really high performance cores, there are plenty of challenging techniques you have to employ that add a lot of complexity that is hidden below the ISA abstraction layer (speculation for example).

So if you want to make RISC-V go fast, you have to employ more design tricks like "macro-op fusion". For example, scan for two load instructions in the fetch stream and fuse them into a single "load-pair" micro-op if they access adjacent addresses. There are a whole bag of tricks like this that are irrespective of the ISA and add a fairly high "skill-ceiling" to processor design.
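The load-pair case above can be sketched in software, purely as an illustration (the tuple format and function name are my own; real fusion happens in the decode hardware, not in software like this):

```python
# Hypothetical sketch of macro-op fusion: fuse two consecutive word
# loads from the same base register at adjacent offsets (off, off + 4)
# into a single "load-pair" micro-op. Instructions are modeled as
# (mnemonic, dest, base, offset) tuples.

def fuse_load_pairs(stream):
    fused, i = [], 0
    while i < len(stream):
        a = stream[i]
        b = stream[i + 1] if i + 1 < len(stream) else None
        if (b is not None
                and a[0] == b[0] == "lw"   # both word loads
                and a[2] == b[2]           # same base register
                and b[3] == a[3] + 4):     # adjacent addresses
            fused.append(("ldp", (a[1], b[1]), a[2], a[3]))
            i += 2                         # consumed both instructions
        else:
            fused.append(a)
            i += 1
    return fused

stream = [("lw", "x10", "x2", 0),
          ("lw", "x11", "x2", 4),
          ("add", "x12", "x10", "x11")]
print(fuse_load_pairs(stream))
# -> [('ldp', ('x10', 'x11'), 'x2', 0), ('add', 'x12', 'x10', 'x11')]
```

The hard part in a real core is doing this scan within the fetch/decode cycle budget, which is where the "skill-ceiling" mentioned above comes in.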

> RISC-V is pretty nice in that the ISA gets out of your way

Except for the C variant, where they went to 110% complexity for maximum I-cache efficiency: 32-bit instructions aligned to 16 bits?? I wonder if there are other RISC ISAs which made the same choice.

Yes, while the C extension helps for high performance I think it feels like a major hack to instruction encoding.

The RISC-V ISA does great with a very small number of instructions, so playing around with encodings is rather easy. I'm reaching the conclusion that fixed 24-bit opcodes are close to optimal if immediate constants are allowed after.

Take a good look at this deep dive on the Samsung M3 core. It's an ARMv8 processor, which is a fairly orthodox RISC in its capabilities. A RISC-V processor of similar capabilities would have a similar structure. The ISA simply doesn't help all that much at the very high end.


The advantage of working with RISC-V over ARM right now is that you can configure a RV core to have far less capability than an ARM core. You can license RTL from SiFive for RV64imc (64-bit address and data, baseline integer instruction set, hardware multiply, and 16-bit compressed instructions). Such a core simply does not exist in the ARM marketplace today, partially because NEON is mandatory in ARMv8.
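Names like RV64imc follow the ISA naming convention from the spec: "rv" plus register width plus one letter per standard extension. A rough sketch of how such a string breaks down (single-letter extensions only; multi-letter ones like Zicsr are omitted for simplicity):

```python
# Standard single-letter RISC-V extensions (abbreviated list).
EXTENSIONS = {
    "i": "base integer ISA",
    "m": "integer multiply/divide",
    "a": "atomics",
    "f": "single-precision float",
    "d": "double-precision float",
    "c": "compressed 16-bit instructions",
}

def parse_isa(isa):
    """Split an ISA string like 'rv64imc' into (xlen, extension list)."""
    assert isa.startswith("rv")
    xlen = int(isa[2:4])                      # 32 or 64
    exts = [EXTENSIONS[ch] for ch in isa[4:]]
    return xlen, exts

print(parse_isa("rv64imc"))
```

The point of the comment above is that SiFive will license you exactly this small combination, whereas an ARMv8 core drags in mandatory features like NEON.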

It's not too hard to build a basic processor that can execute enough of an ISA to do something 'real' such as boot Linux, especially if you can craft the kernel build to keep it as simple as possible.

First the complexity comes in fully implementing the whole ISA. Yes RISC-V has an advantage over ARM/x86 in that it will have less cruft, but the complexity in ARM/x86 doesn't exist for no reason. It's driven by real software requirements so RISC-V will need similar complexity if it wants to seriously challenge either architecture (with the advantage they can learn from the previous mistakes made and thus implement things more cleanly).

Second it comes in verifying your design, especially around the fun corner cases. There are plenty of bugs that occur when a series of rare things happens at once (maybe involving some obscure areas of the ISA) which can be very hard to track down. Unless you can successfully hunt down and fix these you'll end up with a phone or a computer that occasionally crashes for no good reason.

Third is making it hit your power and performance targets. It's one thing to say you have a 256-bit data path and can issue 4 instructions per cycle with out of order execution. It's quite another to build such a design so it can actually sustain decent throughput on real software whilst hitting a decent clock frequency and remaining within power budget, especially when you have all of the various complex bits of ISA to deal with.

> but the complexity in ARM/x86 doesn't exist for no reason. It's driven by real software requirements so RISC-V will need similar complexity if it wants to seriously challenge either architecture

Like what? Are you talking about extensions that don't exist for RISC-V, like Transactional Memory?

I would say most of the reason for complexity is legacy, and there are few actual software requirements in that regard if you're doing a new architecture.

I'm more talking about common things. Take the humble load, give a memory address, get some data into a register, what could be simpler?

Well for one thing your load may come in multiple sizes, can target different kinds of memory (e.g. device memory, non-cacheable memory, fully-cached memory; the ARM architecture actually allows you to get quite specific about differing levels of shareability and cacheability too), and can be unaligned with respect to the access size (but you still need full performance with them). There are ordering requirements with respect to other loads and stores in the system (even within the same CPU, avoiding read-after-read ordering issues may not be simple) and various different kinds of barriers that can affect loads. You get exclusive loads or atomics (some variations return the data seen, so they perform a load on top of the atomic op). It may be a vector load that needs to quickly feed the vector register file as opposed to the 'standard' register file. In a multi-processor system you can have various different types of snoop operation coming in that could affect the load. You also need to work out if you're actually allowed to do that load; modern page tables are pretty complex affairs. The page table itself could be changing as the load is executing. A decent chunk of the complexity is for virtualisation support, but even without that there are various fiddly bits.

You also have speculative execution attacks to worry about. Certain loads may need to be very sure they should execute before doing anything, others may be free to speculate away and forward data into further speculative execution.

You can certainly build a perfectly functional ISA that avoids a lot of this (and avoid other things by keeping to a simple in-order microarchitecture), but that will lose you a lot of performance.

Honestly, I believe those things don't actually improve performance that much. ARM has much of this for historic reasons, not because the market demanded it. I think ARM has more problems because of its incredibly weak memory architecture. RISC-V quite deliberately and explicitly avoided overly complex semantics and went with a memory model somewhere between x86 and ARM.

x86 has TSO and is still the fastest, so I think overall you are doing much better for yourself if you avoid massive complexity in your memory model, because that is going to cost you complexity that you could be using for optimizations.

RISC-V has a privileged architecture and a vector architecture as well, and they of course do add complexity, but they are still simpler than the corresponding functionality in ARM/x86 while doing many things better.

RISC-V was specifically designed the way it was because the ISA really does not impact performance that much, and having something simple and understandable was not going to be a huge performance hit.

My understanding of sandsifter is that it works by executing an instruction next to a page boundary, which gives either a page fault error (the instruction executed correctly and then execution passed over to the next page which is presumably non-executable) or an error indicating the instruction was malformed. So, for it to work on RISC-V you need paging, and also interrupt vectors to work. The MMU was only recently standardized so I'm not sure if it's actually implemented in any hardware or software yet, but once you have those two things it should be fairly straightforward.

Sandsifter is dealing with the problem of "how do I find valid instructions in any 1-15 byte sequence, which yields 2^120 combinations?"

In RISC-V there are currently only 2-byte and 4-byte instructions, which you can brute force your way through. The specification does technically allow for longer sequences, in which case sandsifter would work just fine. [And the RISC-V privileged specification has existed for years.]
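The brute force is tractable because the instruction length is declared up front by the low bits of the first halfword, so the search space is 2^16 + 2^32 rather than 2^120. A sketch of that length rule (per the base spec; the reserved longer formats are simply flagged here):

```python
# Decode the declared length of a RISC-V instruction from its first
# 16-bit halfword. bits[1:0] != 0b11 -> 16-bit compressed instruction;
# bits[1:0] == 0b11 with bits[4:2] != 0b111 -> standard 32-bit
# instruction; everything else is a reserved longer encoding.

def insn_length(first_halfword):
    if first_halfword & 0b11 != 0b11:
        return 2          # C-extension (compressed) instruction
    if first_halfword & 0b11100 != 0b11100:
        return 4          # standard 32-bit instruction
    return None           # reserved >=48-bit formats, not used today

print(insn_length(0x0001))  # c.nop -> 2
print(insn_length(0x0093))  # low halfword of addi x1,x0,0 -> 4
```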

>And the RISC-V privileged specification has existed for years

It is a draft which has not been finalized and contains a disclaimer that it might be modified in a non-backwards compatible way prior to final release.

> The MMU was only recently standardized so I'm not sure if it's actually implemented in any hardware or software yet, but once you have those two things it should be fairly straightforward

Chips with MMUs have been taped out for years. Likewise the Linux port has existed for years.

I don't know anything about the RISC-V ISA yet, but I am wondering if its ISR is great for learning computer architecture in schools compared to MIPS, the dominant ISR at colleges for introductory computer architecture classes?

The HDL (Verilog) code looks quite short and simple. If the partial implementation of the ISR is like that, it shouldn't be so bad for learning...

Yes. Berkeley started using it to teach introductory computer architecture: http://www-inst.eecs.berkeley.edu/~cs61c/sp18/

A major project of the course is to build a RISC-V emulator and implement a 2-stage pipeline in Logisim.

The Technical University of Denmark also transitioned to teaching computer arch using RISC-V. Some of the most popular textbooks are also being converted to RISC-V editions.

Well, Berkeley would certainly want to teach RISC-V since Berkeley made the architecture. But it is still great to see they are actually using it in classes!

Wow this is great! Do you or anyone else happen to know if any of the lecture videos for this course might be available online?

Yes, it was designed with educational applications in mind.

What does ISR stand for in this context?

Oops, typo ;) It's ISA. Sorry for my English. Maybe it's because I'm used to typing ISR (Interrupt Service Routine) at work.

It's very similar to MIPS.

As the possibilities for smaller transistors get narrower, companies will get more open towards smaller clients with small budgets, so taping out homebrew CPUs or SoCs or integrated circuits will become possible.

so far, the opposite is happening in the "cellphone"/pocket pc space... it is getting hard to get a phone that you can root without having to resort to "holes" that can get patched at any time...

Can you elaborate please?

Well, most people aren't going to make their own chips any time soon... so you have to buy what is available and cellphones are getting more restricted/locked down everyday in the name of security (they are getting closer to what they used to be before Android came out)... not sure if that answers your question though.

Whoa. This is very impressive!

It does seem to highlight an increasing unease I have with RISC-V. Implementations are many and cheap, but reusable verification is rare and people don't use what is out there. They have maybe the riscv-tests suite working, but that's not enough to call your new CPU usable for anything other than a hobby project.

FWIW, the riscv-formal package from Clifford Wolf is the closest thing to a turn-key solution for verifying a RISC-V core, even if people must remember it doesn't cover everything.

hmmm... I promise to investigate this topic in the future!

Impressive, I like this! More generally, I'm curious how widely RISC-V is already being used in industry today (they mentioned a steep adoption curve from academia to industry). Looking at the foundation members, there are quite a large number; given that RISC-V's range is from small devices to supercomputers, are there any examples where it's used today and has shown benefits over other archs?

There'll be a few RV32 (RISC-V 32 bit) cores in every WD hard drive starting next year.

Go to the risc-v website, they have constant news and updates about people using it.

Could be fun in near future. Plenty of room for improvement so it’s a good base for exploration of Fpgas.

And with this https://bellard.org/riscvemu/ and similar projects already spun up in other ways, risc-v is becoming more interesting as time goes by.

> and the best feature: BSD license

IIRC this isn't the first open source RISC-V core but it's great to see another implementation.

It is not even the first RISC-V with a BSD license, but I love to make some GPL friends cry! ;D

> works up to 75MHz

How much would this increase if it used an ASIC instead of FPGA? And how much would it cost for different batch-sizes?

IIUC it's critically dependent on the manufacturing process used.

> how much would it cost

A CPU IC isn't very interesting until it has some I/O, so it's much more meaningful to talk about an SoC with one or more of these darkriscv cores.

Sorry, I don't have an answer other than to say "this isn't quite complete enough for it to be useful for most tasks." That said, there's probably tons of open source implementations of DDR/SPI/PCI/USB interfaces (on opencores.org, e.g.). So it's "only" a matter of integrating these.

Fastest speeds depend on two things. First, the manufacturing technology used (smaller is faster because parasitic capacitance is smaller, although FinFET is getting to have large parasitic resistance). The second factor is the datapath between flip-flops. Some architectures handle very fast clock rates because the data path is simple (or pipelined). Others are quite slow. I've seen some architectures top out at 500MHz when another architecture is running over 1GHz in the same chip.

I am not sure it is recommendable to integrate this design in an ASIC, because it is not so stable yet, but the synthesis tool reported 133MHz in a Xilinx Artix-7 FPGA, running at 1 clock/instruction. A more pipelined and stable RISC-V design, such as the VexRiscv, can easily reach 346MHz in the same FPGA and uses less logic, but with only 0.5 instructions per clock. A performance-optimized VexRiscv runs at 183MHz in the same FPGA with an impressive 1.44 instructions per clock, but uses more logic. There are lots of RISC-V implementations, each one with different features.
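A quick sanity check on those figures: effective throughput is roughly clock rate times instructions per clock, which is why a slower clock with a wider pipeline can still come out ahead:

```python
# Effective throughput (MIPS) = clock rate * instructions per clock,
# using the three configurations quoted above.

cores = {
    "darkriscv (Artix-7)":       (133e6, 1.0),
    "VexRiscv (small)":          (346e6, 0.5),
    "VexRiscv (perf-optimized)": (183e6, 1.44),
}

for name, (hz, ipc) in cores.items():
    print(f"{name}: {hz * ipc / 1e6:.0f} MIPS")
```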

Microsemi can use a 111MHz clock for their RISC-V implementation but they recommend a 70MHz limit on the milspec FPGAs. If you scroll down in the readme, they list some other max clock rates for other chips. Also, you can usually mess around a bit with the timing by either changing the seed that the synthesizer is using to place and route or by manually placing and routing signals if you have a high LUT utilization and the tool is struggling with automatic placement.

Quite a bit in theory but memory latency wouldn't decrease so there would have to be some added complexity in interfacing with it.

Well, I tested three different configurations for memory:

darkriscv@75MHz cache=off 0-wait-states 2-stage pipeline 2-phase clock: 6.40us

darkriscv@75MHz cache=on 3-wait-states 3-stage pipeline 1-phase clock: 9.37us

darkriscv@50MHz cache=on 3-wait-states 2-stage pipeline 2-phase clock: 13.84us

The first configuration works in a zero wait-state environment with separate instruction and data high-speed synchronous memories working on different clock phases (weeeeeird!). As long as there is no latency, this configuration works at 75MIPS with a 2-stage pipeline, which means only one clock is lost when the pipeline is flushed by a branch.

The second configuration uses a small high-speed cache with 256 bytes for instructions and 256 bytes for data, a 3-stage pipeline (which means two clocks are lost when the pipeline is flushed by a branch), a more conventional single-phase clock architecture, and a memory with 3 wait states or so. Although peaking at 75MIPS, the cache misses and the longer pipeline decrease the performance to around 51MIPS.

The third configuration is the core configuration from the first scenario, but with the small high-speed cache from the second scenario and the 3 wait states. In this configuration, the clock decreased to 50MHz and, according to my calculations, the performance is around 34MIPS.
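Those figures are mutually consistent, assuming all three runs execute the same benchmark. A back-of-the-envelope check: at a sustained 75 MIPS, a 6.40us run is about 480 instructions, and the other two run times then imply roughly the ~51 and ~34 MIPS quoted:

```python
# Instruction count implied by the first run: 75 MIPS * 6.40 us ≈ 480.
insns = 75e6 * 6.40e-6

# Instructions / microseconds gives MIPS directly.
for label, us in [("75MHz, cache off", 6.40),
                  ("75MHz, cache on ", 9.37),
                  ("50MHz, cache on ", 13.84)]:
    print(f"{label}: {insns / us:5.1f} MIPS")
```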

This way, if it is possible to work only with the internal FPGA memory, the first configuration is better; otherwise you can use the second configuration.

I guess it is possible to create a fourth configuration with the 3-stage pipeline and zero wait states (no cache), but I need to implement a two-clock load instruction. In this case, I guess it is possible to peak around 100MHz.

I applaud your hackerishness in trying out all this.

wow! I didn't expect to get so many comments and questions about my weird project! thank you all! :)


"after one week of exciting sleepless nights of work (which explains the lots of typos you will found ahead), the darkriscv reached a very good quality result"

Not commenting on the actual quality of the code, but I wonder how can one make typos due to sleep deprivation, and yet produce "good quality results" in software.

I wonder when will we, as a community, stop praising all nighters and rushed work.

There's a big difference between doing short bursts of work with little sleep - particularly when you're young - vs doing it constantly for months/years.

In this case it just seems to indicate enthusiasm for the project rather than dangerous overwork.

in fact: 50% enthusiasm for the project, 50% dangerous overwork! hehehe

> I wonder when will we, as a community, stop praising all nighters and rushed work.

I agree, but it sounds like the author wanted to claim that he did something impressive in a small timeframe, thus suggesting to the reader some level of technical prowess. If the author had claimed instead that he did it while well-rested over a couple of months, then the achievement wouldn't be so impressive.

I wish open source projects would stop being hosted on the Microsoft-owned GitHub.

If wishes were horses, beggars would ride.

"The main motivation for the darkriscv is create a migration path for some projects around the 680x0/coldfire family."

On the Amiga we don't need a replacement for MC680## processors because we have the Vampire 2+ accelerator, which gives us a superscalar, 64-bit MC68080 with AMMX extensions.

Coming to ATARI ST and Amiga 1200 near you if the Apollo team keeps this momentum.

Yeah, I know about it! But unfortunately the 68080 is too large for my FPGA applications, which are cost driven. Other open source 680x0 projects don't work either, for the same reason. For some years I wondered how to create a compact implementation of the 680x0 in an FPGA, but with no success. At some point I started working on a subset of the 680x0, something like a RISC version of the 68000, with a minimal instruction set and a very optimized pipeline, but then the problem moved to the toolchain, and making gcc work well is not so easy... Defeated by all those problems and limitations, I started testing lots of new architectures and found RISC-V.

What about OpenSPARC, since it has low power consumption and can be cut down to less cores and threads?

The available implementation of OpenSPARC is too complex: a single core with 4 threads requires 59350 LUTs and runs at 62.5MHz in a Virtex FPGA (according to [1], slide 21). Although it is possible to remove some features, I don't think it is possible to reduce the logic without impacting compatibility. I think the Leon3 [2] is a far better option for FPGAs, since it requires only around 3500 LUTs and runs at 125MHz in a Virtex FPGA. In a low-cost FPGA, the performance of the Leon3 is around 66MHz, which is enough to replace the 680x0 and ColdFire v2 processors. The requirement for 3500 LUTs is not so bad, as the TG68 (an open source 68000 replacement in VHDL) has similar requirements. However, the typical RISC-V implementation uses less than 1/3 of the logic compared with Leon3 and TG68. Also, although RISC-V provides almost the same performance as the Leon3, the extra logic can be used for more parallel RISC-V cores, resulting in an increase of 2 or 3x in total performance.

Finally, there is an additional problem with OpenSPARC, Leon3 and TG68: the GPL license. On the other hand, most RISC-V implementations use the BSD license, as the RISC-V instruction set itself uses the BSD license. Of course, the GPL covers only the existing implementations of OpenSPARC, Leon3 and TG68: there is no obstacle, other than the technical complexity, to developing a new OpenSPARC, Leon3 or TG68 from scratch, with a friendlier license, better performance or better use of the logic. In that case, the question is: how much time do you need to implement a minimal viable core with one of those architectures? In the case of RISC-V, it is perfectly possible to implement a small core in an FPGA with the RV32I instruction set in a few hours, because the RV32I set of instructions is really very poor and primitive, which makes it especially friendly to the hardware and explains why it is so compact.

[1] http://ramp.eecs.berkeley.edu/Publications/OpenSPARC%20T1%20... [2] http://ramp.eecs.berkeley.edu/Publications/LEON3%20SPARC%20P...

Writing the RTL takes one night, but how long will the verification take? How much will it cost to tape out? Who will pay for it?

Your only comments on HN seem to be to criticize RISC-V.

The poster's intention is not to start producing hardware; at no point does his project mention taping it out and manufacturing. Obviously it is just a fun side project to implement the RISC-V core ISA in an FPGA. It was then made open source on GitHub so anyone else interested can look at it. Chill out, mate.


If you named anything specific about the RISC-V ISA that you think is retarded, you might contribute something to the conversation.

Again: have you looked at the assembler code? Are you able to compare MC68000 assembler and RISC-V assembler? Are you able to compare SPARC assembler and RISC-V assembler?

Without being able to do that, it's going to be exceptionally difficult for me to contribute any more to the discussion, especially typing on a mobile telephone.

That's my contribution for now, I'm pointing out what to compare with. That point was obviously missed.

You have to bring something to the table too, instead of just demanding everything be spoon-fed and served on a silver platter to you.

I carefully studied the 2.1 and 2.2 versions of the spec. I wrote, in Racket, a miniature interpreter for lists of RISC-V instructions (just symbolic lists, not the byte strings they assemble into), then wrote programs to compute triangular numbers and Fibonacci, and verified they worked. I've built the RISC-V toolchain and compiled a few test C programs and run them in "spike". I don't think I took more than a glance at the "gcc -S" output. I have no familiarity with SPARC or Motorola assembly, but I have decent familiarity with x86-64 assembly, having written what looks like a couple dozen miniature test programs and one medium-sized program, and having tried a few times to write a decent x86-64 assembler.
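For a sense of what such a symbolic-list interpreter involves, here is a toy in the same spirit, in Python rather than Racket (my own sketch, not the original's code; it supports only the three instructions the example program needs, and branch offsets are instruction indices rather than the byte offsets of the real ISA):

```python
MASK = 0xFFFFFFFF

def to_signed(x):
    """Reinterpret a 32-bit unsigned value as signed."""
    return x - (1 << 32) if x & (1 << 31) else x

def run(program, regs):
    """Interpret a symbolic list of RV32I-style instructions.
    Registers are x0..x31; x0 is hardwired to zero."""
    pc = 0
    while 0 <= pc < len(program):
        op, *args = program[pc]
        if op == "addi":
            rd, rs1, imm = args
            regs[rd] = (regs[rs1] + imm) & MASK
        elif op == "add":
            rd, rs1, rs2 = args
            regs[rd] = (regs[rs1] + regs[rs2]) & MASK
        elif op == "bge":  # branch if rs1 >= rs2, signed compare
            rs1, rs2, off = args
            if to_signed(regs[rs1]) >= to_signed(regs[rs2]):
                pc += off
                regs[0] = 0
                continue
        regs[0] = 0  # x0 always reads as zero
        pc += 1
    return regs

# Triangular number: with n in x10, leaves 1 + 2 + ... + n in x11.
triangular = [
    ("addi", 11, 0, 0),   # x11 (acc) = 0
    ("addi", 12, 0, 1),   # x12 (i)   = 1
    ("add", 11, 11, 12),  # loop: acc += i
    ("addi", 12, 12, 1),  #       i += 1
    ("bge", 10, 12, -2),  # if n >= i, goto loop
]
```

The symbolic form sidesteps encoding entirely, which is what makes this kind of interpreter a one-evening exercise.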

I found the RISC-V spec enjoyable to read, especially with all the rationales; it felt clean, minimal, and well-thought-out. Writing a program in it was probably not much different from writing in x86-64 assembly. The encodings seemed much easier to deal with than x86-64, though as mentioned I didn't go as far as writing an encoder.

What kinds of important features are present in SPARC or MC68k but not in RISC-V? Are they absent from x86-64 as well?

There are a few instructions that nearly every modern processor ended up adopting, but RISC-V still lacks. Off the top of my head (with the RV ISA developers' response in parentheses):

- bitwise rotate (let them eat macro-op fusion)

- byte and bit swapping (strictly missing from RV, although proposals exist)

- leading zero count, trailing zero count, and popcount (strictly missing from RV, proposals exist)

- efficient multiword arithmetic (let them eat macro-op fusion, or long dependency chains)

- base + [scaled] index addressing modes (you don't really need those)

- multi-register save/restore instructions (ARMv8 doesn't have them / RVC is equivalent in density to ARMv8, nanoMIPS is irrelevant, let them eat millicode)

So yeah, there are deficiencies. None of them are crippling, but I wouldn't say that RV is super-wham-o-dyne, either.
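The multiword-arithmetic point is worth making concrete. With no carry flag and no add-with-carry, the idiomatic RISC-V sequence recovers the carry with an unsigned compare (sltu), which is exactly the dependency chain complained about above. A Python model of the 32-bit sequence (my own sketch):

```python
MASK = (1 << 32) - 1

def add64_on_rv32(a_lo, a_hi, b_lo, b_hi):
    """64-bit add out of 32-bit ops, RISC-V style.

    With no carry flag, the carry out of the low half is recovered by
    an unsigned compare: if the truncated sum is smaller than one of
    its operands, the add wrapped around.
    """
    lo = (a_lo + b_lo) & MASK       # add  t0, a0, b0
    carry = 1 if lo < a_lo else 0   # sltu t1, t0, a0
    hi = (a_hi + b_hi + carry) & MASK  # add hi halves, then add carry
    return lo, hi
```

Each extra word of precision costs an extra compare-and-add pair, serialized on the previous word's result, unless the microarchitecture fuses them.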

Motorola assembler is what is known as an orthogonal instruction set. The flow is logical: move.b for byte, move.w for word, move.l for longword, from source to destination, which reflects real life.

The assembler reads almost like a high level programming language. The register scheme is intuitive as well, from a0-a7 being the address, to d0-d7 being the data registers.

Now, let's do a mental exercise: I'm going to load an effective address relative to the program counter, 32-bits wide, into the 1st address register. Then, I'm going to load an arbitrary value from an arbitrary memory location into the second address register. Do the same in RISC-V; compare intuitiveness.

  lea MemoryAddress(pc), a0
  move.l $00bfe001, a1
  MemoryAddress:  DC.l 0

Indeed, in RISC-V, full 32-bit constants must be broken across two instructions. The translation goes:

  ; 1. lea MemoryAddress(pc), a0
  auipc a0, [upper 20 bits of (MemoryAddress - label)]
  addi a0, a0, [lower 12 bits of (MemoryAddress - label)]
  ; 2. move.l #$00bfe001, a1
  lui a1, 0x00bfe000
  addi a1, a1, 0x001
  ; 3. rts
  jalr x0, x1, 0
It is cumbersome in that sense. But it will probably be handled by an assembler (as a macro or builtin). The spec contains an appendix of "pseudoinstructions", which gives the first translation above. There isn't even a dedicated "move" instruction—it's just "addi dst, src, 0" by convention! Clearly anyone writing assembly will use at least that pseudoinstruction.
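One wrinkle hiding in the lui+addi split: addi sign-extends its 12-bit immediate, so when bit 11 of the low part is set, the upper part must be incremented to compensate. A small Python model of the convention (names are mine; the example constant 0x00bfe001 from above happens to need no adjustment, which is why the translation looks like a straight bitfield slice):

```python
def split_imm32(value):
    """Split a 32-bit constant into (lui_imm20, addi_imm12).

    addi sign-extends its 12-bit immediate, so if bit 11 of the low
    part is set, the low part acts as a negative number and the upper
    part must carry by one to compensate.
    """
    value &= 0xFFFFFFFF
    lo = value & 0xFFF
    hi = value >> 12
    if lo & 0x800:               # low part will sign-extend negative
        hi = (hi + 1) & 0xFFFFF
        lo -= 0x1000             # the value addi actually adds
    return hi, lo

def materialize(hi, lo):
    """Simulate lui + addi: what ends up in the register."""
    reg = (hi << 12) & 0xFFFFFFFF   # lui  rd, hi
    reg = (reg + lo) & 0xFFFFFFFF   # addi rd, rd, lo
    return reg
```

Assemblers hide all of this behind the "li" pseudoinstruction, which is the practical answer to the cumbersomeness.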

If there's a canonical format for "pseudoinstructions", and all assemblers handle them in the same way, and the abstraction doesn't leak in any way (i.e. the only temporary registers you use are ones you overwrite fully by the end; it is true that now some "instructions" have longer encodings, but that comes with the compressed instructions anyway; and it is true that an interrupt could happen between the two halves of the "instruction", but I think that shouldn't make a difference), then I don't think there's much of a problem.

"Indeed, in RISC-V, full 32-bit constants must be broken across two instructions."

Any RISC processor has that issue, because encoding is fixed at 32-bits to keep the hardware simple. That's not what I'm referring to.

Look at that retardation, "auipc". Because "auipc" is intuitive, right? (For the record, I'm being sarcastic.) What the hell was the instruction designer smoking? Then there is the square bracket notation, like on intel, and intel has some of the most idiotic, non-conventional assembler syntax -- and this thing mimics something so bad? Every other processor uses parentheses; that's a well-understood norm.

Then there is more intel retardation in the form of dst, src (or dst, src, src on RISC). What kind of a warped, twisted mind came up with that? What was going on in that person's head to work in such twisted ways?

Then there's the "cherry on top":

  jalr x0, x1, 0 ; because "jalr" is intuitive as well, it immediately tells you what it does?
You'll know a bad processor design by its instruction set. Always and forever.

All right, I count myself disappointed. Surely you know that "processors" don't use parentheses; the parentheses and brackets are in assembler syntax, which is long gone by the time the processor is executing byte sequences. There is nothing inherent about Intel-style syntax in physical RISC-V processors; nor in Intel processors, for that matter, as demonstrated by the fact that gcc outputs its x86-64 assembly in AT&T syntax by default. The fact that Intel syntax appears in the RISC-V spec might, at the very worst, demonstrate that the creators have bad taste in that area. If that taste has led to bad decisions elsewhere, it would be more substantive to point out those bad decisions. If not, you can configure your toolchain to use AT&T syntax and take pride in your superiority.

Bad names are harder to work around. However, about those names: "auipc" is certainly letter salad, but (a) it stands for "Add Upper Immediate to Program Counter", and its effect is to add an immediate value (multiplied by 2^12) to the program counter and put the result in a register, so the name is entirely logical given its task; and (b) except in the rare case where the PC-relative offset is exactly a multiple of 2^12, an AUIPC will be immediately followed by an add (to load a full PC-relative address), or perhaps a load-with-offset (to load a value at a full PC-relative address), or a jump-with-offset (to jump to a label at a full PC-relative address); and all three of these AUIPC+(add|load|jump) combinations are given as pseudoinstructions as well (e.g. "la" for "load address"), so the programmer will probably never need to write a bare "auipc". As for "jalr", well, it stands for "jump and link register", which jumps to the "register" argument and stores the return address in the "link" argument. There's a set of pseudo-instructions based off x0 being the "always-zero" register and x1 being the conventional "return address" register:

  Pseudo      Real            Description
  j offset    jal x0, offset  Jump 
  jal offset  jal x1, offset  Jump and link
  jr rs       jalr x0, rs, 0  Jump register
  jalr rs     jalr x1, rs, 0  Jump and link register
  ret         jalr x0, x1, 0  Return from subroutine
Seems logical enough to me. You can just use "jal offset" and "ret" if you prefer.
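The jalr semantics themselves are two steps, which a tiny Python model makes plain (my own sketch, not the spec's reference code; x0 being hardwired to zero is what turns "jalr x0, x1, 0" into a plain return):

```python
def jalr(regs, pc, rd, rs1, imm):
    """RV32I jalr: link, then jump.

    Writes the return address pc+4 into rd, and returns the jump
    target regs[rs1]+imm with the low bit cleared. Writes to x0 are
    discarded, so linking into x0 makes it a plain indirect jump.
    """
    target = (regs[rs1] + imm) & 0xFFFFFFFE
    if rd != 0:
        regs[rd] = (pc + 4) & 0xFFFFFFFF
    return target
```

So "jalr x0, x1, 0" reads mechanically once the register conventions are known: jump to the address in x1 (the conventional return-address register), link nowhere.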

Anyway, if your goal was to convince me there are serious flaws in RISC-V, then criticizing the naming conventions and surface syntax has led me away from the hypothesis that you know any.

You can have a "phenomenal" processor design (Itanium was "phenomenal") in that sense, but if it's crap to program, that's not going to do it much good. As you've so aptly demonstrated, RISC-V is crap to program; you aren't likely to see any scene demos written for it, but that aside, we've yet to see whether the performance of this "phenomenal" processor will live up to the hype. Right now the closest this "phenomenal" processor has come to reality is in a tinkertoy. That's a long way from servers and production. I still stand by my position: the mnemonics suck, and compared to MC68000 or OpenSPARC the programming model is retarded, even for a RISC processor. Good luck with the hype.

Programming RISC-V will for most people be practically just like programming Thumb2, x86, etc., because they will use a standard high-level language and compiler toolchain.

Calling things ‘retarded’ as an insult is a very juvenile thing to do.

Retardation, by definition, is something backward, and the RISC-V instruction set is backward, since it is emphatically not an improvement over what we already have in OpenSPARC or MC68000 in terms of clean design or intuitiveness. Perhaps you rely on the childish and not the engineering use of retardation, whereby I suspect that it's a cultural thing.

> Writing RTL takes one night but, how long will the verification take?

Obviously more than a night, but hey development is completely open so you can see for yourself.

> How much will it take to tapeout?

0 days. It is running on an FPGA.

> Who will pay for it?

See previous comment. Presumably the author paid for the FPGA, but maybe it was a gift?

Almost! Most of the time I work only in the simulator (Xilinx ISim), but luckily I have lots of FPGA boards available at work too!

As someone living in hardware land, I have a lot of respect for all the effort that goes into those steps of the process and I’m glad you brought it up, but presumably they aren’t that important to the point of a hobby project like this.

Another hardware lander here. Did a lot of verification of processors in the past, including one that was Mips-like. Parent comment is correct in its suggestion that verification is a much bigger deal than getting HDL that will compile and pass a few test cases. But for a hobby project, yeah, no need to throw a wet blanket on it.

After two weeks, I hope most critical problems are already solved, since more complex software is working without problems, thanks to the clear and simple RV32I instruction set! Of course, as noted in the project description, I am working only with FPGAs, which makes it far faster and easier to simulate with real software and find problems. Since most of the complex work is done by the FPGA tool and I have a well-behaved clock environment, always according to the specifications, the verification is purely logical.
