RISCVEMU: 128 bit RISC-V emulator (bellard.org)
170 points by ingve on Dec 19, 2016 | 67 comments

Unfortunately it can't boot the Fedora/RISC-V[1] disk image out of the box because the supplied Linux kernel doesn't have all the config enabled[2] for systemd to run properly :-(

It gets as far as the "Welcome to Fedora 25 (Twenty Five)!" banner but then gives a bunch of systemd errors around "Function not implemented". inotify seems to be the main one.

[1] https://fedoraproject.org/wiki/Architectures/RISC-V

[2] https://github.com/rwmjones/fedora-riscv-kernel/blob/master/...

Oh the hilarity :D

Because Red Hat is home to systemd, did I get it right?

The serious answer might be that a hardware RISC-V deployment is very unlikely to be running Fedora in production. All the devices planned so far are MCUs, which are much too small (and in many cases lack MMUs).

The more glib one is that if you're interested in building a CPU emulator from source but don't want to rebuild a kernel, you've probably skipped a few important skills along the way.

Fedora is targeting 64 bit server-class hardware, which is on its way -- in production as we speak.

I'm quite capable of rebuilding the kernel, just lazy. The main problem is that the kernel is embedded in the bootloader, and the bootloader for this emulator is patched because the RISC-V HTIF [obsolete device] emulation is different from the other RISC-V implementations, which makes rebuilding the whole thing somewhat tedious.

> in production as we speak

Really?! Got a link? I thought the ISA was still in flux, isn't it a bit early to be making hardware?

Follow the links in: https://news.ycombinator.com/item?id=13213342 and have a look at the latest workshop proceedings: https://riscv.org/2016/12/5th-risc-v-workshop-proceedings/

The user-level ISA is fixed. The privileged-level ISA is in flux, but people obviously think there's enough there to make real hardware. To be fair the main problems with the privspec are to do with virtualization, and likely the first crop of hardware won't support that or will have only interim support.

At least one team is developing a not-tiny design with Linux in mind [1].

> BOOM supports atomics, IEEE 754-2008 floating-point, and page-based virtual memory. We have demonstrated BOOM running Linux, SPEC CINT2006, and CoreMark.

[1] https://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-...

There are workstation RISC-V machines in development.

Who is developing them?

SiFive.com seem to be closest to having 64 bit hardware. They came out with 32 bit hardware a few weeks back (which doesn't run Linux -- it's a sort of Arduino type board). LowRisc.org may have 64 bit hardware next year. There are a few others, best to see the recent conference talks for details: https://riscv.org/2016/12/5th-risc-v-workshop-proceedings/

Does anyone know where Fabrice Bellard works for his day job?

He used to work for Netgem [0], and according to his French wikipedia page [1] he co-founded Amarisoft where he is listed on their about page [2]:


We are delighted to bring some affordable tools and software to the 4G mobile community to unleash creativity and at the end expand communications among people. Accessible technology is the basement of possible success story. We at Amarisoft are working on helping all size of company or individuals being a player in mobile networks of now and future generation. We hope you'll enjoy the opportunity.


Fabrice Bellard

Fabrice is a computer programmer who is best known as the creator of the FFmpeg and QEMU software projects. He studied at Ecole Polytechnique, specializing at Telecom Paris in 1996. Fabrice is an amazing person bringing creativity to the whole team.

[0] https://news.ycombinator.com/item?id=2557422

[1] https://fr.wikipedia.org/wiki/Fabrice_Bellard

[2] http://www.amarisoft.com/?p=about

He is one of the most brilliant programmers of our time, for sure. He has achieved more great things in his career than most people do in a lifetime. See: https://en.wikipedia.org/wiki/Fabrice_Bellard

Don't judge it too much by the speed of the JavaScript demo. The (host) native emulator is ridiculously fast.

Considering it's a RISC emulator running on top of an x86 emulator running in JS, the slowness is perfectly understandable.

But when running the 128-bit demo I get a funny value for sqrt(2) in fp64 mode: 1.728102266788482

Well, it's definitely fast enough not to notice excessive latency in the terminal and to run simple applications, but the instructions are interpreted so expect a serious slowdown compared to recompiling or dynamically translating the code.

On the other hand, this makes the code portable and indeed the 32-bit and 64-bit targets work pretty well even on my ARM AArch32 machine.

It would be nice to add the 64- and 128-bit DFP (Decimal Floating Point) support to this. The Intel DFP library can be dropped in, similar to how SoftFP was used for floating point: https://software.intel.com/en-us/articles/intel-decimal-floa...

The RISC-V spec mentions it supports this in the “L” Standard Extension for Decimal Floating-Point.
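For illustration, here is why decimal floating point matters in the first place: binary FP cannot represent most decimal fractions exactly. Python's decimal module implements IEEE 754-2008 decimal arithmetic in software, which is roughly what a hardware "L" extension (or the Intel library mentioned above) would provide:

```python
# Binary floating point cannot represent 0.1 or 0.2 exactly, so the sum
# carries a rounding error; decimal floating point represents them exactly.
from decimal import Decimal

print(0.1 + 0.2)                        # 0.30000000000000004
print(Decimal("0.1") + Decimal("0.2"))  # 0.3
```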

There's no actual spec.

Although there is discussion about creating one if you want to join the ISA-dev mailing list.

There certainly is: https://riscv.org/specifications/

I read it about a year ago and found it straightforward and clean.

Are you making a technical point about stuff missing?

I think they mean that the decimal floating point extension ('L') isn't in the spec. Check out the user ISA v2.1. The 'L' name is reserved, but it's not fleshed out.

I'll wait while you go and actually read the "L" section of the spec you linked to.

Got it. You were indeed making a technical point and chose snark over clarity. Thanks.

I thought it was clear that "there's no actual spec" referred to decimal and not risc-v as a whole. It's an important non-nitpick point.

I don't [edit: think] "there is no spec [for the CPU]" was an unreasonable interpretation for that comment, even if "[for decimal floating point]" was the intended meaning.

There was some ambiguity, and the interpretation is understandable.

Dismissing it as a "technical point" and implying that kobeya never should have commented is not.

Yes, that is also true.

Ah interesting.. I just assumed it was actually spec'd out-of-document, so shame on me :)

Well if you are interested in making it happen I encourage you to join the ISA dev mailing list. There are others discussing informal proposals for the L spec as recently as the last few days. Helping craft a spec and reference implementation would make actual silicon implementations a near term prospect.

I suspect the in-browser version of this running Linux is probably considerably more powerful than the machine I first used Unix on - an HLH Orion.

Fabulous work! Is there a way to change the size of the disk image? Or do I have to bake a new one, using the Yocto/OpenEmbedded RISC-V port for instance?


He quietly updated both the source tarball and the disk image after this story was posted (from "2016-12-18" to "2016-12-19"). It looks like there were mostly some changes to the block device code.

The disk image has been updated again, to "2016-12-19.1".

Fabrice Bellard is a genius.

He's done so much work on QEMU I would just naturally assume it would be included there. Why create a separate emulator?

Fabrice is not active in QEMU anymore (and has not been for almost 8 years now).

Without having looked at the code, I would guess that it wasn't worth the effort of being the first 128-bit target on a core that is based on JIT compilation and supports 32-bit hosts.

No github/bitbucket/gitlab/etc.?

I think this is an interesting question. Does Bellard use a VCS?

QEMU used svn, but that was when everybody did, so don't judge him from that. :-)

Bellard strikes again :)

I understand that designing processors is fun, but I hold several things against RISC-V:

- the assembler dst, src backward syntax (like intel);

- the fact that this processor design is being aggressively pushed here (featured several times already);

- the fact that the instruction set is non-orthogonal (32-bit fixed encoding makes the decoder simple, but creates the same load-store problem as on SPARC - hello non-existent, synthetic instructions!);

- the fact that they could have extended the open source UltraSPARC T-series design, but decided to just re-invent the wheel all over again. How does 128-bit support justify starting from scratch, and not re-using what is already there and open sourced?

The last point is their biggest sin, in my view. There is already an open source processor design, and a good, solid design, and they just went ahead and invented their own anyway. All the lessons about reuse went out of the window. We're the only industry that I know of that keeps nuking itself and starting from scratch, throwing away all the work which had been done before.

Oh, and I seriously dislike the instruction mnemonics. To the authors of RISC-V:

couldn't you have made the mnemonics MC68000 compatible, or at least make them similar?

To address a couple of these points:

- Assembler syntax is a matter of personal preference. I also loved programming the 68K, writing OS-9/68k drivers back in 1990. However the vast majority of even kernel/embedded programmers rarely touch assembler these days.

- SPARC & register windows. A very unfortunate design choice in hindsight. The RISC-V ISA developers have paid very close attention to existing designs, and other ISAs and microarchitectures are frequently referenced. Have a look at the discussions on the ISA-Dev mailing list: https://groups.google.com/a/groups.riscv.org/forum/#!forum/i...

- RISC-V is trying to avoid patents, so they cannot necessarily reuse existing ISAs, even open ones.

Based on your behavior in this comment thread -- downvoting comments that give valid criticism of the RISC-V ISA -- I have to conclude that the reason the RISC-V folks chose not to reuse a patent-unencumbered ISA such as SH2 or MIPS32 (minus the unaligned load-store instructions) is simple vanity.

What does your comment add to the discussion, and what is the proof of what you are attributing to the user?

For one, MIPS32 and SH2 are both 32-bit ISAs. Extending them to 64-bit would require just as much compiler re-working as building a new RISC ISA -- and it'd no longer be MIPS32 or SH2 anyway.

I haven't downvoted a single thing in this thread. Other people are doing the downvoting.

- SPARC & register windows. A very unfortunate design choice in hindsight.

What do you mean, unfortunate design choice? The register windows help compilers and provide 256 virtual registers, thus significantly boosting performance. It's one of the biggest, best features of the Scalable Processor ARChitecture.

And that's the exact feature RISC-V considers "unfortunate". I cannot believe it!

Considering that the OpenSPARC cores are under the GPL, what impact do patents have in that case?

Register renaming is a better way to provide large numbers of hardware registers.

Another aim of RISC-V is to provide an architecture which is both easy to implement at the low end, and can be scaled at the high end. Register windows are not easy to implement. Register renaming OTOH can be left out of low end designs and included (as an invisible microarchitectural detail) at the high end.
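As a toy illustration (all names here are hypothetical, not from any real core), register renaming amounts to a table mapping the small architectural register file onto a larger physical one, with each write allocating a fresh physical register:

```python
# Toy register-rename table: 32 architectural registers (as in RISC-V)
# backed by 128 physical registers. Freeing registers at instruction
# retirement is omitted for brevity.
class RenameTable:
    def __init__(self, num_arch=32, num_phys=128):
        self.map = list(range(num_arch))           # arch reg -> phys reg
        self.free = list(range(num_arch, num_phys))  # unallocated phys regs

    def rename_dest(self, arch_reg):
        """Each new write gets a fresh physical register, so earlier
        in-flight readers of the old value are unaffected."""
        phys = self.free.pop(0)
        self.map[arch_reg] = phys
        return phys

    def lookup_src(self, arch_reg):
        """Source operands read the current mapping."""
        return self.map[arch_reg]
```

Two back-to-back writes to the same architectural register land in different physical registers, which is what lets an out-of-order core keep many more values in flight than the ISA names.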

The smallest RISC-V implementation is PicoRV32 which is ridiculously small (https://github.com/cliffordwolf/picorv32 - smallest is 725 LUTs in an FPGA). It couldn't have been done if the architecture had register windows.

BTW I'm fairly certain that high end SPARC implementations must be doing register renaming, otherwise each function is limited to 8 or 16 registers.

If RISC-V is supposed to scale down to the low end, why doesn't it have delay slots? On an implementation with a simple instruction pipeline, delay slots not only improve performance, they also simplify the hardware.

Because you shouldn't expose such micro-architectural details in an ISA which is trying to be stable for the next 50 years (yes really -- see discussions on the mailing list).

The very very low end PicoRV32 is not pipelined (unsurprisingly). More ordinary low-end designs (e.g. Rocket) have the canonical Hennessy-and-Patterson 5-stage pipeline and still don't need to expose branch delay slots in the ISA.

I should say also that the classical MIPS -- ie. no instruction dependencies -- was another mistake in hindsight.

Because you shouldn't expose such micro-architectural details in an ISA which is trying to be stable for the next 50 years

...and earlier you wrote

However the vast majority of even kernel/embedded programmers rarely touch assembler these days.

It is unclear to me why this double standard is okay: on one hand, that which is really important, like assembler syntax and mnemonics, is exposed and yet not considered relevant, while "micro-architectural details", which really don't hurt me as a coder one way or the other, must be hidden.

I am not sure who the RISC-V designers expect their audience to be, nor why this double standard exists, but in light of "the 50-year plan", it appears the designers haven't given it much thought, if any at all.

In the end it comes to this: we have a RISC-V processor design which is not human-friendly as far as mnemonics go, and with such processors, relying on compilers will be the only practical solution. Those who have not learned the lessons of history are in for another bitter teaching: such attempts have failed miserably in the past. As evidence, I point to PA-RISC and Itanium. If it were but for a better compiler!

Microarchitecture has a technical meaning here - it means the implementation of the chip. The ISA should be separate from the microarchitecture, although very often they are not (common examples being: branch delay slots, register windows, non-interlocking pipeline stages).

The microarchitecture is nothing to do with the assembler syntax or instruction set.

The microarchitecture is nothing to do with the assembler syntax or instruction set.

I know that, my point is that this processor design has completely screwed up priorities in terms of features and usage.

I should say also that the classical MIPS -- ie. no instruction dependencies -- was another mistake in hindsight.

What do you mean by "no instruction dependencies"?

I actually meant "without Interlocked Pipeline Stages" which is what the "IPS" originally stood for in MIPS. Again it's a microarchitectural detail which was exposed through the instruction set, and later MIPS abandoned it.

Okay, so if we assume that implementing register windows is hard, why tackle the problem again, instead of reusing what is already there? The entire work appears to me to be the authors' personal bias against SPARC, possibly wanting to implement an entire processor from scratch, at all costs, no matter what. The patent argument seems very illogical for a code base under the GPL. Something just doesn't seem right.

> possibly wanting to implement an entire processor from scratch

Wanting to facilitate others to do so, sure. This is a common thing to do in a computer engineering program, and I believe it was an explicit design goal that student projects could be real RISC-V implementations instead of targeting the toy ISAs commonly used for this purpose.

My understanding is that the very large number of registers on the SPARC made context switching very slow --- you need to flush up to several hundred registers onto the physical stack and then restore them again later; while at the same time, processors rarely had enough registers to run non-trivial programs without causing overflow traps. So, the extra complexity wasn't really considered worth it, and it was better to devote silicon to other optimisations rather than exposing what is basically a processor implementation detail in the ISA.

However, my knowledge is years old. Have these problems been solved?

However, my knowledge is years old. Have these problems been solved?

They must have been. I used some Snoracle T4 hardware, and the CPUs were blazingly fast. How they did it, though, I don't know, as I didn't have a chance to look into it.

Oracle is the steward of SPARC, who knows what might happen to it. RISC-V is controlled by a non-profit industry consortium.

SPARC International is the steward of SPARC.

That's funny, I don't much like RISC-V either, because of its focus on minimalism (1), but the points I dislike are totally different from yours...

1: no trap-on-integer-overflow instructions? Come on:

a) the MIPS (a processor designed for simplicity too) has those!

b) in the 21st century, security features should not be optional.
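For what it's worth, the RISC-V manual's rationale is that overflow can be detected with a short branch sequence (roughly: add, then compare the sum against one operand) instead of dedicating trapping instructions. A minimal Python sketch of that check, with a made-up helper name, assuming both inputs are already valid signed 32-bit values:

```python
def add32_checked(a, b):
    """Signed 32-bit add with the branch-style overflow check the
    RISC-V spec suggests doing in software:
    overflow iff (b < 0) != (wrapped_sum < a)."""
    mask = 0xFFFFFFFF
    s = (a + b) & mask
    # reinterpret the wrapped result as a signed 32-bit value
    sum_signed = s - (1 << 32) if s >> 31 else s
    if (b < 0) != (sum_signed < a):
        raise OverflowError("signed 32-bit add overflowed")
    return sum_signed
```

The trade-off being debated: this costs a couple of extra instructions per checked add, whereas a trapping add instruction (as on MIPS) makes the check free but bakes the policy into the ISA.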

> 32-bit fixed encoding makes the decoder simple, but creates the same load-store problem as on SPARC - hello non-existent, synthetic instructions!

I'm not familiar with this. Could you explain the problem or point me in the right direction for background info?

If you have 32-bits available to encode the instruction and the operand, how do you encode a load of a 32-bit address, or even a 64-bit address?

That's why the SPARC assembler, as(1), accepts a completely made-up, non-existent, synthetic instruction, set. In reality, what happens is that as(1) assembles this made-up fantasy into two instructions, sethi and or. This is the cold, hard reality of loading a 32-bit value into a register:

  String0: .asciz "Counter is %.2d\n"
  .align 4

  main: sethi %hi(String0), %l0
        or %l0, %lo(String0), %l0
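For illustration, a Python sketch (helper names are made up) of the bit split those two instructions perform: sethi supplies the upper 22 bits of the constant and or fills in the low 10:

```python
def sparc_set(value):
    """Split a 32-bit constant the way SPARC's synthetic `set` does:
    sethi loads the upper 22 bits, `or` supplies the low 10 bits."""
    hi = (value >> 10) & 0x3FFFFF   # %hi(value): upper 22 bits
    lo = value & 0x3FF              # %lo(value): lower 10 bits
    return hi, lo

def reassemble(hi, lo):
    # What the sethi/or pair computes at run time.
    return (hi << 10) | lo
```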

Looking at the RISC-V spec, you probably should be using the AUIPC instruction to address data in your .text, .data or .bss sections. As a bonus, your code becomes position-independent so you can actually use it in shared libraries. To use arbitrary 32-bit and 64-bit pointers known at compile-time (which should be pretty rare), you also have the option of loading them from memory, i.e. using AUIPC to generate a pointer to the value in your .text and then loading it to a register in addition to your approach.
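A sketch of the corresponding RISC-V split, assuming the standard %pcrel_hi/%pcrel_lo convention (helper names are made up): AUIPC adds a 20-bit upper immediate, shifted left 12, to the PC, and an ADDI supplies a *signed* 12-bit low part, so the upper part needs a +0x800 rounding adjustment whenever bit 11 of the offset is set:

```python
def riscv_hi_lo(offset):
    """Split a 32-bit PC-relative offset into AUIPC and ADDI parts.
    ADDI sign-extends its 12-bit immediate, so round the upper part
    up by adding 0x800 first."""
    hi = (offset + 0x800) >> 12     # 20-bit upper immediate for AUIPC
    lo = offset - (hi << 12)        # signed 12-bit remainder for ADDI
    assert -2048 <= lo <= 2047
    return hi, lo

def reassemble(hi, lo):
    # What auipc + addi compute (modulo 2**32), relative to the PC.
    return ((hi << 12) + lo) & 0xFFFFFFFF
```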

Also, check out chapter 2 of this paper for the technical and licensing motivation to not go with SPARC or several other RISC ISAs: https://people.eecs.berkeley.edu/~krste/papers/EECS-2016-1.p...

Keep in mind, the UltraSPARC T1 and T2 are only two cores suited to specific workloads. They'd still be starting from scratch on the RTL if they wanted to do anything different with the microarchitecture. Now they've (UCB-BAR anyway) got at least one open out-of-order core suited to the higher end, a flexible in-order core for microcontroller up to cellphone operation; not to mention the gaggle of open RISC-V cores from outside UCB-BAR.
