
What??

Ofc, if your program uses floating point calculations, you will want to use the hardware machine instructions for that.

Here, we were talking about all those machine instructions which do not bring much more on top of the core ISA. Those would be implemented using fusion, which is appropriate for R(educed)ISC silicon. The trade-off is code density, and code density matters on modern silicon probably only in very specific niches; there, the program machine instructions would be generated (BTW, probably hand-written rather than generated for those niches...) with those very specific niches in mind.

And RISC-V hardware implementations, with proper publishing of the most common and pertinent machine instruction fusion patterns, will be able to "improve" step by step, targeting what they actually run and what would make a real difference. Sure, this will require a bit of coordination to agree on machine instruction fusion patterns.



You said "programs will want probably to stay conservative and will implement only the core ISA". I'm saying that the core ISA is very very limited and most programs will want to use more than the core ISA.


What???

Re-read my post, please.

The problem is those machine instructions which do not bring much more than the core ISA, and which do not require an ISA extension.


Integer multiply requires an ISA extension. The core ISA does not have integer multiply.


Alright, now this is ridiculous.

Stop using AIs and/or trolling, thx.


I genuinely do not understand what part of my comments you take issue with. You said that programs will assume the core RISC-V ISA. I said that no, most programs will assume the existence of some extensions, including integer multiply/divide and floating point.

There are two possibilities here:

* Either I'm misunderstanding what you're saying, and you did not mean that most programs will use only the core ISA.

* Or you're trying to say that integer multiply/divide and floating point is part of the core ISA.

Which one is it?

If it's the first one, could you try to clarify? Because I can't see another way to interpret the phrase "programs will want probably to stay conservative and will implement only the core ISA".


Okay, I’m neither party in this back and forth and I don’t know either of you. I have an idea what the misunderstanding might be, but I could be entirely wrong.

I think sylware doesn’t mean the core ISA exactly, but the core with the standard extensions rather than manufacturer-specific extensions.


It is sort of obvious and 101: with a heavy technical context that is not spelled out explicitly, LLMs fail hard and end up trolling. Usually they completely miss the point, several times in a row, like here.

Let's start over for microsoft GPT-6.

It all depends on the program: if it does not need more than a conservative use of the ISA to run at a reasonable speed on targeted hardware, it should not use anything else. The people pushing for new instructions tend to forget that large implementations of RISC-V will probably be heavy on machine instruction fusion.

In the end, adding 'new machine instructions' is only to be thought about after a proper machine instruction fusion investigation.

They are jumping the gun way too easily on 'adding new machine instructions', forgetting completely about machine instruction fusion.


There's not much sign that RISC-V will be extremely-fusion-focused; indeed it'd be good for the base ISA, but Zba, Zbb and Zicond add a bunch of common patterns as distinct instructions, and things often fused on other architectures (compare + branch) are a single instruction even in base RV64I. That largely leaves fusing constant computation as a fusable thing, and... that's kinda it, to achieve what current x86 & ARM cores do. (There's then of course doing crazier things like fusing multiple bitwise/arith ops together, but at that point having a too-minimal base ISA comes back to bite you again, meaning that some should-be-cheap fusions would actually need to fuse ≥3 instrs instead of just two.)
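For concreteness, the constant-computation case that's left over looks basically like this (an illustrative sketch, not any particular core's published fusion table):

    # the classic fusable pair in the base ISA: 32-bit constant materialization
    lui   t0, 0x12345        # t0 = 0x12345000 (upper 20 bits)
    addi  t0, t0, 0x678      # t0 = 0x12345678; a fusing core can issue lui+addi as one op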

In any case, "force hardware to do an extremely-stupid amount of fusion to get back the performance lost from intentionally not adding/using useful instructions" isn't a sane thing to target in any universe no matter one's goals; you're just wasting silicon & hardware development time that would be better spent actually doing useful things. Fusion is neat (esp. for fixing past mistakes or working around fixed-size instructions (i.e. all x86 & ARM use fusion for, but a from-scratch designed ISA with variable-length instrs (e.g. RISC-V) should need neither)), but it's still very unquestionably strictly worse than just having more actual instructions and using them.


There is a rationale for (compare + branch) in one instruction if I recall properly: no status flags register, which makes out-of-order CPU design much easier, among other things.

Again, the bulk of the programs out there don't need those extensions to be reasonably performant on modern silicon hardware. In other words, all programs out there will want to stick to a conservative usage of the ISA anyway ("core-ish").

Programs requiring floating point hardware in order to be "usable" will probably mandate a cache-line-sized vector ISA extension silicon block (they won't even use the FPU ISA extension). Who would even use an FPU silicon block nowadays for floating point calculations (outside of niche and small hardware implementations)?

(x86 and arm are out: they have strong IP locks in many places in the world, so they are not to be considered for any sane future. Those are just legacy burden, full of "marketing" instructions.)


Avoiding flags is indeed a decision backed by reason; but to do so, you don't necessarily need to have `beq a0, a1, label`, you can just do `xor t0, a0, a1; beqz t0, label`. Having full `beq` instead of just `beqz` is exactly as unnecessary as `sh3add` from Zba, except some mild difference in frequency of those, depending on codebase. Having just beqz would even have the benefit that the label could be 17-bit instead of 12-bit!

Indeed, most sane software doesn't need most extensions to be "reasonably performant"; in fact, most sane software is reasonably-performant even on two decades old hardware!

But, unfortunately, there's a ton of software doing things quite inefficiently, and it will continue to exist forever unless something crazy happens, like a non-insignificant number of humans starting to care (impossible) or LLMs becoming functional enough to rewrite entire codebases (more possible than humans caring, at least).

You're extremely heavily underestimating how much software does random garbage in floating point (using it to compute a square root or multiplying an integer by 0.4 or something; ad-hoc game logic/physics that isn't written in a vectorizable way; doing a bunch of things in FP where integers would do (esp. languages which expose floats as the main datatype, esp. JavaScript)).

It may be neat to dream about a hypothetical world where none of that garbage exists, but that dream isn't coming true today, nor is there any sign that it will at any point in the future. Basing architecture/compiler/configuration decisions around this hypothetical is just purely entirely stupid.

And even in that dream world a lot of code would benefit from sh1add/sh2add/sh3add from Zba, Zbb's min/max is useful in a ton of places, memory managers might want clz for computing bucket from size, anything doing bitwise stuff would benefit from andn and much of Zbs. And of course ideally the vast majority of code would be running in RVV instead of scalar code.
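To be concrete about the Zba case, the array-indexing pattern looks like this (an illustrative sketch):

    # addressing element a1 of an array of 8-byte elements at a0
    sh3add  t0, a1, a0       # Zba: t0 = a0 + (a1 << 3), one instruction
    # base-ISA equivalent, i.e. what would otherwise have to be fused:
    slli    t0, a1, 3
    add     t0, t0, a0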


If what I read was right, the flags decision was made because the RISC-V designers knew it is an awful pain for large out-of-order implementations. It is basically feedback from experience. I guess there are more advantages than just that.

It also seems to be why the core ISA has only 32-bit instructions: smaller instructions hardly bring anything, based on the same feedback from experience. Maybe only on super small, ultra tiny embedded micro-controllers with a very old silicon process... This smells more like aggressive marketing using super niche or broken programs to justify itself.

Of course, there is no perfect REDUCED ISA: trade-offs were made based on the designers' experience. Expect arm people to press hard on the bad side of those trade-offs (a trade-off has good sides and bad sides, by definition), because risc-v is a death sentence for them (and they are making a push right now on HN, I can tell you...). Yep, arm and x86_64 have strong IP locks all around the globe... RISC-V, none, free for all to implement.

Nowadays, programs requiring floating point hardware acceleration for reasonable performance use vector machine instructions. I think this is a mistake of RVA2x: the FPU extension should not be there. Only cache-line-sized vector machine instructions should be there. The FPU extension would be for niche/specialized/small hardware. And a scalar is just a vector with one used dimension... and the "synchronous"/"inline" handling of floating point operations... yummy.
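(Rough sketch of what I mean by that, assuming an RVV block is present; spellings per the vector spec, untested:)

    # a "scalar" double add done through the vector unit, vl = 1
    vsetivli zero, 1, e64, m1, ta, ma   # request exactly one 64-bit element
    vle64.v  v0, (a0)                   # load one double from [a0]
    vle64.v  v1, (a1)                   # load one double from [a1]
    vfadd.vv v2, v0, v1                 # add it as a one-element vector
    vse64.v  v2, (a2)                   # store the result to [a2]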

I have a lot of doubts on compressed instructions, because I don't see code density being that much of a killer feature (it sounds more like arm marketing to me), and I recall reading numbers going this way, not the other way, for the general case.

What I am very sure of: nobody wants to design a clean and modern ISA to handle the "bad" programs, come on. And in the worst-case scenario it will fit only some "bads", not all of them anyway; choices will have to be made on which "bads" get accelerated... sane? Nope.

All that said, I am coding RISC-V and x86_64 assembly, and did a little bit of arm64: for the code I wrote, arm and risc-v were nearly the same.

What I am keeping an eye on is the memory reservation/ZACAS stuff though. Because hart (and IO) synchronization, in a world of hart read/write queues and cache memory coherency, seems to become critical for "normal" performance very quickly.
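(The two flavors I have in mind, sketched from the specs rather than from real code: the classic reservation loop versus the single Zacas instruction.)

    # compare-and-swap of the word at (a0): expected value in a1, new value in a2
    retry:
        lr.w     t0, (a0)          # load-reserved
        bne      t0, a1, done      # value changed under us, give up
        sc.w     t1, a2, (a0)      # store-conditional the new value
        bnez     t1, retry         # reservation lost, retry
    done:

    # ...versus Zacas, where a single instruction does it: a1 holds the
    # expected value and receives the old value, a2 is the new value
        amocas.w a1, a2, (a0)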

And another thing people tend to forget: RISC-V is standard across vendors/implementors, namely it is appropriate and reasonable to write fast code path variants in assembly... and that could change A LOT of things, well at least in the "system/kernel area" (making planned obsolescence extremely hard to do is a killer feature...).


> Nowadays, programs requiring floating point hardware acceleration for reasonable performance use vector machine instructions.

While some very-important software like video codecs, and various sporadic projects where some drive-by open-source dev decided to add an optimized path, will use vector, that's, like, on the order of 0.001% (number out of my ass) of all software that runs slowly enough to be noticeable, and the remaining 99.999% remains slow. Much as I like working with SIMD/vector, it's a very tiny minority of people that do.

RVV does also actually make good use of scalar FP, with .vf instruction variants which take one operand from the scalar registers, allowing storing constants in the scalar registers instead of wasting vector registers (and with LMUL it's very easy to exhaust the entire RVV register file). Especially important in matmul kernels.
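A minimal sketch of the kind of inner loop I mean (an axpy-style kernel; register choices arbitrary, not a tuned implementation):

    # y[0..n) += a * x[0..n): scalar a in fa0, x pointer in a1, y pointer in a2, n in a3
    axpy:
        vsetvli   t0, a3, e32, m4, ta, ma  # take as many 32-bit elements as fit
        vle32.v   v8,  (a1)                # load a chunk of x
        vle32.v   v16, (a2)                # load a chunk of y
        vfmacc.vf v16, fa0, v8             # y += a * x, scalar taken straight from fa0
        vse32.v   v16, (a2)                # store y back
        slli      t1, t0, 2                # bytes consumed this round
        add       a1, a1, t1
        add       a2, a2, t1
        sub       a3, a3, t0               # elements left
        bnez      a3, axpy
        ret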

> What I am very sure of: nobody wants to design a clean and modern ISA to handle the "bad" programs

And yet that's what RVA23 & co basically have to be, and are. And as such they have scalar FP, vector FP, and basically everything else that's potentially useful (other than 3-source-operand instructions).

I do wonder how much compressed actually benefits perf-wise, but it's very clearly true that, at least icache-wise, reducing code size by, say, 20%, is equivalent to adding 20% more icache; and 20% of a typical L1 icache is quite a lot of area to save.


dav1d is an AV1 decoder with C code just for posture: nearly everything is assembly using vector machine instructions, from arm64 to x86_64 avxNNN, and of course risc-v.

I don't even mention ffmpeg.

RVAxx looks more like a grab bag to match x86_64 and aarch64 feature-wise, and it includes bad features: this is very probably only to ease porting.

In a risc-v world, there would be much more assembly of code path variants (cross-vendor neutral-ish), and high level languages with assembly written interpreters.

No more C42+, only ultra-stable-in-time core-ish ISA assembly... and Big Tech hates that, because planned obsolescence becomes excruciatingly harder to do.


dav1d is still in the "some very-important software" group; the vast, vast majority of software doesn't and will not write everything in assembly. If you think RISC-V is gonna in any way change that, you're... just trivially, plainly wrong: 99.999% of people will not bother learning assembly regardless of how good of an idea you think that'd be. (Never mind that even if more people learned assembly, 99.999999% of said assembly would be full of system-killing bugs.)

RVA23 doesn't in any way ease porting, no clue what you're on about there; besides vectorizable code, where you necessarily do simply just need RVV or similar to get good performance, base RV64G does just cover everything needed for software to be able to run. All RVA23 does is just provide a baseline with extensions to be able to achieve good performance, and cheap instructions that software should've been already utilizing for decades but hasn't generally been able to due to legacy hardware not supporting them (importantly clz/ctz/cpop, but also to a smaller extent min/max, zicond, bit rotates).

Granted, RVA23 does have some more questionable (though still actually useful) inclusions like the cache block ones that force 64-byte cache lines to not be dysfunctional, but that also brings up the massive wart of mandatory 4K pages in base RISC-V, not even RVA23, that's explicitly chosen for ease-of-portability.

Maybe RV64 is the end-of-line for CPUs, but, for all we know, RV64 today might be what Intel 8086 was in 1978, and RV64 is to be extended and grown and eventually just replaced in the future.


To be clear, I do not use and have never used language models or other forms of "AI" in writing online comments. Not that you'll believe me, but that's the truth.

In an effort to show that I'm sincere and that this topic genuinely interests me, let me show you my RISC-V CPU implemented in Logisim: https://github.com/mortie/rv32i-logisim-cpu. For this project, I did actually only implement (most of) the core ISA; so in order to run C programs compiled with clang, I actually had to tell clang to generate code for the core RV32I. That means integer multiplication and division in the C source code was turned into loops which used addition, subtraction, shifts and branches to implement multiplication and division.
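(For illustration, the shift-and-add pattern such code boils down to looks roughly like this; a hand-written sketch, not clang's actual output:)

    # unsigned 32x32->32 multiply on bare RV32I: returns a0 * a1 in a0
    mul32:
        li    t0, 0             # accumulator
    1:
        andi  t1, a1, 1         # low bit of the multiplier set?
        beqz  t1, 2f
        add   t0, t0, a0        # yes: add the (shifted) multiplicand
    2:
        slli  a0, a0, 1         # multiplicand <<= 1
        srli  a1, a1, 1         # multiplier  >>= 1
        bnez  a1, 1b
        mv    a0, t0
        ret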

> It all depends on the program: if it does not need more than a conservative use of the ISA to run at a reasonable speed on targeted hardware, it should not use anything else.

Essentially all programs will benefit significantly from at the very least integer multiply and divide. And every single CPU that's even capable of running anything like a mainstream "phone/laptop/desktop/server class" operating system has the integer multiply and divide extension.

So to say that most programs will use the core ISA and not extensions is wild. Only a tiny minority of executables compiled for the absolute tiniest of RISC-V MCUs (or, y'know, my own Logisim RV32I CPU) will be compiled for the core RISC-V ISA.


You are still ignoring what I say.

Stop using AI, thx.


No, you're the one ignoring what I say. I asked a very clear question in a good-faith attempt to clear up confusion. You ignored it.

Honestly you're acting like an LLM instructed to produce antagonistic, bad-faith arguments. You're certainly not acting like a human who has any idea what he's talking about.


Well, stop missing the point from light years away like LLMs each time there is a strong non-explicit technical context.


I gave you ample opportunity to make yourself clear. I will give you one more. Please answer the question this time, or don't bother responding at all.

* Either I'm misunderstanding what you're saying, and you did not mean that most programs will use only the core ISA.

* Or you're trying to say that integer multiply/divide and floating point is part of the core ISA.

Which one is it?


It seems microsoft GPT oX is still using its bullet-point output, still completely missing the point without explicit technical context (here the technical context is heavy and implicit).


Okay, I give up. I have given you plenty of chances. You're stuck in a loop in your dialog tree. This conversation is over, and I will not comment further.


Please, be my guest.



