What's new for RISC-V in LLVM 17 (muxup.com)
51 points by todsacerdoti on Oct 11, 2023 | 39 comments



LLVM needs a way to turn off compressed instructions entirely at configuration time. Some of the RISC-V server platforms that will appear in the next few years won't have them.


You could certainly change the default `-march` to rv64g (plus whatever extensions you want) rather than rv64gc. That provides mostly what you'd want, though there's some small chance that inline asm or hand-written assembly using `.option rvc` or `.option arch, +c` inappropriately causes issues. I just posted some thoughts on this here: https://lists.riscv.org/g/tech-profiles/message/348
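
For example, something like this (quick untested sketch; foo.c is just a placeholder file):

    # Build the same file with and without the C extension and compare:
    clang --target=riscv64-unknown-linux-gnu -march=rv64gc -O2 -c foo.c -o foo_rvc.o
    clang --target=riscv64-unknown-linux-gnu -march=rv64g  -O2 -c foo.c -o foo_norvc.o
    llvm-objdump -d foo_norvc.o    # should show only 4-byte instructions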


Using 'clang -march=rv64g' does work, but it's a pain in the neck for Linux distros. [Edit] The question is: is there a way to change this at LLVM build time? I wasn't able to find a way. For GNU binutils it is relatively simple.
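
On the GNU side it's roughly the following, if I remember the configure option correctly (flag name from memory, may be slightly off):

    # Bake the default ISA string into binutils/gas at configure time:
    ./configure --target=riscv64-linux-gnu --with-arch=rv64g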

BTW I've been writing a document on the topic of how Linux distros will support hardware with and without compressed: https://github.com/rwmjones/rhrq-riscv-extensions/blob/main/...


99.9% of my clang invocations are from the build dir of the git tree I'm working on, so I could be missing something known to others on the distribution side - but let me give my best answer.

Firstly, as you probably know, the Clang/LLVM model is quite different from gcc/binutils in terms of build-time configuration. Any clang can target any architecture (unless it was disabled at build time) using --target=$TRIPLE (see <https://clang.llvm.org/docs/CrossCompilation.html#target-tri...>). In practice it's not as useful as it sounds, of course, because you need appropriate sysroots. You can control the default with LLVM_DEFAULT_TARGET_TRIPLE, but without patching clang it's going to enable compressed for the riscv64-unknown-linux-gnu triple.

I think the standard way of controlling other default flags would be to deploy a configuration file <https://clang.llvm.org/docs/UsersManual.html#configuration-f...> - though I'd need to check which logic takes precedence if you explicitly passed --target=riscv64-unknown-linux-gnu. Ultimately I think we'd need to change the meaning of the current Linux triples, or decide upon new ones.
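
To make the configuration-file route concrete, I believe (I'd want to double-check the exact lookup rules and variable names) a distro build could do something like:

    # At LLVM build time, point clang at a system config directory:
    cmake ... -DLLVM_DEFAULT_TARGET_TRIPLE=riscv64-unknown-linux-gnu \
              -DCLANG_CONFIG_FILE_SYSTEM_DIR=/etc/clang
    # Then ship /etc/clang/riscv64-unknown-linux-gnu-clang.cfg containing:
    -march=rv64g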

Thanks for sharing that doc - really helpful. I don't know if it's different with GNU as, but note that if a file contains `.option rvc`, that _isn't_ sufficient to set the EF_RISCV_RVC ELF flag with LLVM. Additionally, I'd imagine that in general people using .option in inline asm may see different behaviour between Clang and GCC. Clang doesn't generate assembly output that then gets passed to the assembler - ELF emission happens directly (with the assembler being invoked as needed for inline asm blocks), and so it's rather difficult to replicate the effect you'd see with GCC generating a .s file that's then assembled.
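
A minimal way to see what I mean (untested sketch): assemble a trivial file that uses `.option rvc` and look at the ELF header flags.

    # test.s
        .option rvc
        .text
    f:
        add a0, a0, a1
        ret

    # Assemble with C left out of -march, then inspect the header:
    clang --target=riscv64-unknown-linux-gnu -march=rv64g -c test.s -o test.o
    llvm-readelf -h test.o    # see whether the Flags line reports RVC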


Why would RISC-V servers not want to support compressed instructions? I imagine compressed instructions would improve code memory locality, perhaps at the cost of some instruction decode overhead.

On the flip side, Minimax is a RISC-V design that only implements compressed instructions: https://news.ycombinator.com/item?id=33422717


One thing that comes to mind: compressed instructions reduce the alignment requirement for all instructions to 2 bytes, which actually allows a 4-byte instruction to span two pages.
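
A contrived sketch of what I mean (hand-written, not from any real codebase): with C enabled, a single 2-byte instruction is enough to push a following 4-byte instruction across a 4 KiB page boundary.

    .option rvc
    .balign 4096          # start of a page
    .skip   4092          # pretend the rest of the page is code
    c.nop                 # 2 bytes: offsets 0xffc-0xffd
    add     a0, a1, a2    # 4 bytes: starts at 0xffe, ends on the next page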

EDIT: Looks like I'm not alone: https://news.ycombinator.com/item?id=29667135

He is completely right and I have had those exact problems: not being able to figure out where you are in the instruction stream annoys me when it comes to RVC. I actually dislike it so much that I intentionally disregard it in my RISC-V emulator and focus on other things, e.g. RV64GVB etc. I do support the C extension, but it's hard to make it run fast. I wish they had solved compression another way. We have very fast compressors/decompressors that can use dedicated dictionaries, which could have been used at ELF loading time instead.


Decompressing during loading wouldn’t help memory/cache usage once the code is loaded


I didn't think about the potential benefits of fitting in icache.

I managed to find this PDF, which does admit a modest 2-3% benefit whenever the code wouldn't otherwise fit in icache: https://forums.macrumors.com/attachments/a-case-to-remove-th...


The servers in development have huge i-caches and gobs of memory bandwidth so compressed instructions don't really benefit there. (They are much more appropriate in embedded designs.)

The real issues are two-fold. Firstly, compressed instructions consume 75% of the instruction encoding space. The thought is that without compressed instructions occupying that space, more 32-bit extensions can be added before we need to use longer instructions.
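
For anyone wondering where the 75% figure comes from, this is my paraphrase of the ISA spec's length encoding, where the low two bits of the first 16-bit parcel determine the instruction length:

    bits [1:0] = 00  ->  16-bit (compressed quadrant 0)
    bits [1:0] = 01  ->  16-bit (compressed quadrant 1)
    bits [1:0] = 10  ->  16-bit (compressed quadrant 2)
    bits [1:0] = 11  ->  32-bit (the base ISA and most extensions)

Three of the four values are taken by C, hence 75%.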

Secondly, verifying designs which use unaligned instructions is difficult because there are many different cases to consider (especially instructions crossing cache lines and pages). The verification point comes from a particular vendor who already designs very high-end chips - you likely have several in your possession - and I have to assume they know what they're talking about.


> Secondly verifying designs which use unaligned instructions is difficult because there are many different cases to consider (especially instructions crossing cache lines and pages).

I don't see how this applies to RISC-V compressed instructions? As far as I understand, they're always 16-bit and always come in pairs, such that 32-bit instructions are still perfectly aligned... no?

I'm not entirely convinced by the "huge i-cache" argument either. Fitting more instructions into the instruction cache is always a benefit. Moving less data around is always a benefit. And the RISC-V compressed instruction scheme is so simple it doesn't really cost much. Put another way, if they supported compressed instructions maybe they wouldn't need such big I-caches and could fit another core or accelerators instead?

> Compressed instructions consume 75% of the instruction encoding space. The thought is that without compressed instructions occupying that space, more 32-bit extensions can be added before we need to use longer instructions.

This argument I can buy though. I could see the point that there are more complex instructions that they want to use the encoding space for in a high end server architecture. But can they? Is that supported by the standard?


> such that 32-bit instructions are still perfectly aligned.. no?

Nope, 32-bit insns are naturally aligned if you don't use C, but only 16-bit aligned if the C extension is present. So a single insn can indeed cross a cacheline or page boundary. This would be fixed if C insns "always came in pairs" (at the expense of some extra C.NOPs), but they don't. (Though this could be made an optional extension of its own.)

> But can they? Is that supported by the standard?

C has always been an optional extension, and when not using it you free up that part of the encoding space. Additionally, the 3 blocks of encoding space that make up C are largely self-contained so the extension itself could be split up, which would allow for freeing plenty of space while still keeping most of the benefit of compression.


> This would be fixed if C insns "always came in pairs" (at the expense of some extra C.NOP's) but they don't.

The ISA manual hints at this: with regular instructions allowed to be 16b aligned, regular & C instructions can be mixed freely. This also helps improve the code density.

Requiring C instructions to come in pairs reduces that advantage.

So it's a kind of all-or-nothing affair.

The RISC-V designers probably ran a bunch of code simulations and concluded that "mix regular & C instructions freely" outweighed "simplify RISC-V cpu designs that support the C subset".

Note that cpu design effort is a one-time cost (when designing a new cpu), whereas adding spurious NOPs would be a recurring cost for all software using the C subset.


> Secondly verifying designs which use unaligned instructions is difficult because there are many different cases to consider (especially instructions crossing cache lines and pages).

Is this a practical concern? You can trap the case where an insn is potentially spanning two pages and handle it in software. As for cache lines, these are small enough that the issue could come up regardless, e.g. from a vector load insn, which is something you do want to support.


I've never got close to designing cutting edge mobile & server chips so I have to trust that they know what they're talking about, and after multiple conversations this is one thing they feel really strongly about.


> Compressed instructions consume 75% of the instruction encoding space.

How much space is spare after the rest of the (mainstream) extensions are included?


It depends. There are lots of partially filled-in blocks in the encoding space, so it depends on the kinds of insns you need and whether you're going for a standard extension (since a non-standard use of those blocks could itself clash with future standard extensions). Some standard extensions do use up a lot of space - surprisingly, F and D, because of the FMADD insns, which need to take 3 registers as input. So if you dispense with those in your implementation you can have plenty of extra space.
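
To spell out the FMADD point (my back-of-the-envelope reading of the R4-type encoding):

    rd + rs1 + rs2 + rs3 :  4 x 5 bits = 20 bits
    rm (rounding mode)   :  3 bits
    fmt                  :  2 bits
    opcode               :  7 bits
    total                :  32 bits

There's no funct field left over, so FMADD/FMSUB/FNMSUB/FNMADD each occupy an entire major opcode.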


Those server platforms will make sure LLVM supports whatever they have, because a loss of Rust would be a very large blow, unless rustc_codegen_gcc is production-ready by then.


This FUD keeps popping up every now and again.

Yes, compressed instructions rely on variable size opcodes.

No, it isn't like x86 where it's really complex for the decoder to know where instructions start and end. It has been designed to avoid that.

As for actual implementations, every server chip that's been announced so far supports them.

This includes Tenstorrent Ascalon, the one with the 8-wide decoder.

Higher code density means more code fits in cache, or less cache is needed (and can thus be clocked higher). It helps very high performance implementations.


I dream of a library, for example named libcpu, that has backends compiled from Verilog/VHDL (or whatever better language eventually gets traction) so people could play with existing CPUs by implementing them and/or creating something completely different, without the need for expensive FPGAs and without being constrained by their size. You would build an emulated machine using some connecting code (maybe a framework), with the hard part for the major chips hidden in libcpu.


You may want to check out Verilator:

https://verilator.org/


What's with RISC-V and these peculiar extension names?


Someone thought Intel was too clear in their naming of ISA extensions?

On a more serious note, it looks like RISC-V extensions are single letters, except for those prefixed with X (vendor extensions), S (supervisor-mode extensions), and Z (multi-letter user-mode extensions).


And RISC-V is designed to be so highly modular that the "base" instruction set is really tiny - it even lacks integer multiplication and division (provided via the M standard extension). For this reason there are a tremendous number of extensions, and most recent extensions use the Z prefix, where the second letter is often a relevant single-letter extension (e.g. `Zmmul` is a subset of M with only multiplication) or a mnemonic (e.g. the `Zk` prefix is dedicated to cryptographic extensions). No wonder extension names look so crowded.
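
For a flavour of how this looks in practice, an -march string spells out the base plus extensions, with multi-letter (Z*) names separated by underscores - for example (extension names from memory, foo.c a placeholder):

    clang --target=riscv64-unknown-linux-gnu -march=rv64gc_zba_zbb_zbs -c foo.c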


The poor souls having to deal with that c++ monstrosity which is LLVM for legacy support in the RISC-V world have my sympathy.

The RISC-V world was made mostly for assembly-written software (without abusing a preprocessor, namely trying to write c++ or even C with a preprocessor...). That's because it is a worldwide standard, free from royalties and licensing (not like arm and intel).


> The RISC-V world was made mostly for assembly written software

I can assure you this is not the case. In fact RISC-V is somewhat annoying to write software for directly in assembly because of the lack of support for loading large constants (directly), and making large jumps.
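
To make that concrete, here is roughly what materialising a large constant and a far call look like (a sketch from memory; the exact expansion the assembler picks can differ, and far_away_function is just a placeholder symbol):

    # `li` is a pseudo-instruction; for a full 64-bit constant it expands
    # into a chain of lui/addi/slli instructions:
    li    t0, 0x123456789abcdef0
    # `call` likewise expands to auipc + jalr so it can reach far targets:
    call  far_away_function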


I don't agree. It is fine if you use a multi-instructions-per-line assembler.


Why is that different from "abusing a preprocessor"? Because it's built into the assembler?


The preprocessor can be independent from the assembler. For instance, when I code x86_64 assembly I use a C preprocessor (and I am not doing weird things with it)... but since I use 3 assemblers (fasmg/nasm/gas), I still use their internal preprocessors, though only sparingly, just to smooth over their syntax so I have only 1 source file I can assemble with fasmg/nasm/gas.

The main issue with a "preprocessor" is when it's ultra complex and powerful, for instance with fasmg and to a lesser extent nasm: devs go wild using as much of the preprocessor's complexity as possible, making their assembly code hardly assembly anymore, and you end up having to learn the whole new language they built with it. The really annoying thing is that their "assembly" code becomes strongly dependent on that specific preprocessor, and the "technical exit cost" is astronomical. This is no better than depending on a c++ compiler.


I guess that makes sense. RISC-V assembly doesn't actually seem that difficult to hand write, especially given that pseudo-instructions exist.

I don't see why you wouldn't just write in C, though, aside from a few functions where hand-written assembly is necessary. C compilers are more complex than assemblers, but they're still really common, and they're doable for the average person to learn to write on their own.


You pinpointed the issue: a C compiler is already way above the complexity of a normal assembler, but you can still write "reasonable" real-life alternatives with a small team (cproc/tinycc/more?). The problem is that C code is quite sensitive to planned obsolescence and abuse via extensions. ISO keeps adding stuff, which slowly but surely increases the cost of writing a new real-life alternative, and some code that was once plain C ended up specialized to specific C compilers of cosmic cost, namely way more expensive (i.e. not reasonable for even a small group of devs) to reimplement: Linux kernel code (even though llvm keeps chasing the never-ending gcc extensions required to compile it), glibc and much GNU software code (the "GNU jail"), etc. You can be almost certain some C software will get "feature creep" that forces you to update your C compiler - a sort of steady and creepy planned obsolescence.

We took the example of C, but with other languages it could be much worse and more acute - I guess you get the overall picture. For c++, it is just out of control, beyond sanity; this language is actually toxic.

I still write some software in C, but I am careful to avoid extensions and the latest ISO tantrums as much as possible, and I try NOT to use gcc. I force myself to write C programs which compile with tinycc & cproc/qbe & gcc (if I can get my hands on other "real life" C compilers, I may try to add them). But I am writing more and more assembly, because I realize the cost of a compiler is not really worth it (it actually gives me more trouble than an assembler in my use cases), and I get much better stability (and control over the actual machine code that runs). It is even reaching the point where I think it would be better to write the same code for different ISAs than to use any compiler... it all depends on the type of code and its expected life cycle.

Writing assembly that abuses a complex macro-preprocessor is not that much "better" than using a C compiler in the end, since the overall costs and drawbacks of such a complex macro-preprocessor would not be that much "better" than the costs and drawbacks of a C compiler.

So I already think there should be much more assembly out there, and with RISC-V being a worldwide free standard - the holy grail of machine ISA interoperability - even more so.


In what way is RISC-V more suited for writing assembly than, say, AArch64?

From what I know of each, they're fairly close together, but a few small things like the clever encoding of logical immediates (e.g. and x1, x1, 0x3fff70000000 is a single instruction; you don't need to waste a register to load the immediate) in AArch64 make it seem friendlier to me.
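
For comparison, on the RISC-V side that mask has to be materialised into a scratch register first (sketch; the exact `li` expansion depends on the assembler, and the register names are just illustrative):

    # AArch64, per the parent comment: and x1, x1, 0x3fff70000000 (one instruction)
    # RISC-V equivalent:
    li    t0, 0x3fff70000000   # pseudo-instruction, expands to several instructions
    and   a1, a1, t0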


100%. RISC-V at least requires a macro assembler of some kind to be more pleasant.

That's not to say RISC-V isn't a good ISA. The whole point of RISC is to be a good target for compilation, not humans.


There's no real reason to code in machine code, or to give up the convenience of a macro assembler and use a micro assembler.

But you can do both with RISC-V, there's nothing to stop you.


Absolutely, and I've done it. But as far as various assemblers go, RISC-V is one of the least ergonomic. It's consistent, certainly. But tedious.


I have done it, too. RISC-V feels like MIPS, except better.

m68k used to be my favorite, until RISC-V happened.


I strongly disagree: I think that a "good enough", "free", worldwide ISA standard is meant for assembly programming. And I meant a lot more than that: all system components and very high level language interpreters (javascript/lua/python/insert_your_high_level_language_here).

I did not mean zero preprocessor; I meant conservative/frugal usage of a simple pre-processor, to avoid spawning yet another insanely complex SDK dependency for the code, one which would be a pain to create a real-life alternative to. Basically, RISC-V is meant for multi-instructions-per-line assemblers (semi-colon...).


That's traditional for the MIPS CPU lineage: many instructions that are common in other ISAs only exist as assembler macros (pseudo-instructions).
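
A few examples of what I mean - standard RISC-V pseudo-instructions and the real instructions they map to (from memory):

    mv   a0, a1    # really: addi a0, a1, 0
    not  a0, a0    # really: xori a0, a0, -1
    neg  a0, a0    # really: sub  a0, x0, a0
    nop            # really: addi x0, x0, 0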


>In what way is RISC-V more suited for writing assembly than, say, AArch64?

Not them, but I would say it's backwards compatibility guaranteed for RV64GC, which happens to be very nice and clean, free of the baggage other ISAs have.


aarch64 is not a worldwide free standard. You have to acquire a licence from arm if you want to create a cpu supporting aarch64. That's what apple is doing.

The licence fee is only valid in some countries though.





