Seventh RISC-V Workshop: Day One (lowrisc.org)



For those not following RISC-V closely, 2018 promises to be an interesting year:

* 64-bit hardware will be available from SiFive. It'll be low-ish end (4 application cores), but it'll run real Linux distros. SiFive are already shipping early hardware to various partners.

* Linux 4.15 will ship with RISC-V support. It's available in RC releases now. (-rc1 was 3 days ago I think)

* glibc will ship RISC-V support. That'll happen in February 2018. I think it's not appreciated how important this is. It means there will be a stable ABI to develop against, and we won't need to re-bootstrap Linux distros again.

* GCC and binutils have been upstream for a while.

* A lot of other projects have been holding off integrating RISC-V-related support and patches until RISC-V "gets real", i.e. support is actually in the kernel and real hardware exists. These projects are now unblocked.

* Fedora and Debian bootstrapping will kick off (again). [Disclaimer: I'm the Fedora/RISC-V maintainer, but I'm also getting everything upstream and coordinating with Debian]

* There'll be finalized specs for virtualization, no hardware though.

* There should be at least a solid draft of a spec for industrial/server hardware. Of course no server-class hardware available for a while.


Is the RISC-V community seeking to create a bus standard? By analogy, the PC architecture is not just about the amd64 instruction set, but also PCI. I expect that broader standardisation would make life easier for operating system efforts, but maybe you will correct me.


I sincerely hope not! PCIe is the obvious choice to use in servers.


Has SiFive released any dates for their dev board that can run Linux? I haven't been following it too closely but would love to get one.


My reading of the blog post is that it'll be released in Q1 of 2018. Until then, they're giving out a few FPGA stand-in boards. The dev board itself will also use an FPGA to implement a few SoC peripherals, like USB and HDMI.


Actually, they are sampling the real chips to some developers now. However, you are correct in saying that an FPGA is used to implement the Southbridge, which is mainly for practicality of getting a board out quickly, not because of awesome self-modifying hardware(!)



Also: "Didn’t have the resources to support a customised register file. A synthesised register file resulted in huge congestion when routing the wires from the flip-flops. Instead, black-boxed the register file and hand-wrote some Verilog to instantiate specific flip-flops, muxes, and tri-state buffers. Effectively hand-crafting their own bit block out of standard cells."

That just sounds insane to me that the tools can't handle synthesis of straightforward stuff like this. My personal fantasy startup is a next-generation, mostly open-source EDA toolchain for simulation, synthesis/optimization, and place & route. You can expose the analog cell design to the world, but generate revenue from the per-fab, process-specific partnering required to walk customers through the process of making real hardware. Who's got funding for me?


My comment on the register file is this: instead of making it 6r3w (6 read, 3 write), just make it 2r3w and plop down 3 copies of it - each copy has 2 read ports feeding a local ALU, and writes go to all instances. Then, to add another ALU, you make a 2r4w register file and plop down 4 instances of it - one connected to the new ALU. That's still only 6 ports per instance, compared to 9 on the one they've got. Routing within each instance should be much easier. The tradeoff is the area for multiple instances.
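Roughly what I'm picturing, as a behavioral sketch (module and port names are purely illustrative, not anything from the actual BOOM source): one 2-read/3-write slice per ALU, with the write ports wired in parallel across every copy so the replicas stay coherent.

    // Illustrative 2-read / 3-write register file replica. Instantiate one
    // copy per ALU: its two read ports feed that ALU; the three write
    // ports are broadcast to every copy so all replicas hold the same state.
    module regfile_2r3w #(parameter XLEN = 64, parameter NREGS = 32) (
        input  wire                     clk,
        // local read ports (feed one ALU)
        input  wire [$clog2(NREGS)-1:0] raddr0, raddr1,
        output wire [XLEN-1:0]          rdata0, rdata1,
        // write ports, shared by all replicas
        input  wire                     wen0, wen1, wen2,
        input  wire [$clog2(NREGS)-1:0] waddr0, waddr1, waddr2,
        input  wire [XLEN-1:0]          wdata0, wdata1, wdata2
    );
        reg [XLEN-1:0] regs [0:NREGS-1];

        // x0 reads as zero, per the RISC-V convention
        assign rdata0 = (raddr0 == 0) ? {XLEN{1'b0}} : regs[raddr0];
        assign rdata1 = (raddr1 == 0) ? {XLEN{1'b0}} : regs[raddr1];

        // Writes are assumed to target distinct registers in any given cycle
        always @(posedge clk) begin
            if (wen0) regs[waddr0] <= wdata0;
            if (wen1) regs[waddr1] <= wdata1;
            if (wen2) regs[waddr2] <= wdata2;
        end
    endmodule

Each copy is only 5 ports, and the only wires that have to reach all three instances are the write buses.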


Hi phkahler, the ultimate critical path is the load data coming back to the register file. Replicating the register file doesn't end up being a win (there would be so much area to broadcast it out to); otherwise everybody would do it.


Hi Chris! But it must alleviate the routing problem within the register file, no? And the scalability would be better, right? I mean, you can't possibly expect to add another ALU to BOOM and require 4 write & 8 read ports, can you? Does AMD use 12r6w? It seems there has to be a better way ;-)


Split up the physical register file so each pipe has its own fraction of it, and have the register renaming logic "route" to the correct fraction? Well, that's what BOOMv2 partially does by splitting the INT and FP register files, so I guess I'm proposing splitting the INT file further [1]?

[1] This has probably been thought of and dismissed a million times over by actual experts. Say, losing whatever advantages it may have due to extra micro-ops needed to move stuff back and forth to the correct register file fraction?


> Split up the physical register file so each pipe has its own fraction of it, and have the register renaming logic "route" to the correct fraction?

Have I got the paper for you! From my adviser's previous student: https://dspace.mit.edu/handle/1721.1/34012

But you're right -- the win isn't obvious. There's pain and advantages in both directions. I've always wanted to build it, but I only have so much time in the day. :(


Oh sure. But the text as I read it seemed to imply that the issue was the library they were using had hugely wasteful flip-flops, or was unable to route above them, or something crazy. It seems (and I'm no expert or professional in this space) that everyone who's used these EDA tools has experiences like this, and hates them. Seems like a great target for innovation at a lower margin, especially now that ASIC development is becoming commoditized.


Area is the new clock speed.


Although not obvious to consumers, area has always been the bottom line - Gordon Moore used to call the business 'expensive real estate' (about $1bn/acre).


If you've actually got the EDA expertise for something like that, I can introduce you to people who can get you the funding. Check my profile and email me if you were serious.


Heh, no, it was a joke. Rather, the idea is serious, but my own background wouldn't drive any interest in funding absent a team and a product. I threw it out there half hoping to hear a "me too!" from someone already working on something like it.


Well if you do find someone who is working on that please let me know, the future wave of semi startups will thank you :)


Check these guys out:

http://www.symbioticeda.com/


Cool, thanks. Though they seem focused on the digital side: synthesis and formal verification of HDL code. IMHO that world is served fairly well by existing tooling, as it's fundamentally a software problem that looks like other software problems.

Whereas what I want to see (or do, at least in my dreams) is closer to the analog end: design of parametrized cells, GPU-parallel large-circuit analog simulation, circuit-dependent optimization (i.e. choose "speed" if the cell is on a critical path, "area" or "power" if not; optimally choose drive strength and buffers based on the latency needs of the path; swap flip-flop implementations likewise; etc.), place and route, eventually mask file generation, reverse synthesis, and design rule checking (though obviously that part becomes fab-specific).

None of this stuff is "hard" in a fundamental way, but it's been hidden behind bad tooling for so long that no one seems to have tried to innovate much in the past few decades. Anyway, I got ideas aplenty.


The security improvements are interesting. I hope they take DJB's advice into account when implementing the new crypto instructions and features:

https://blog.cr.yp.to/20140517-insns.html


"Boomv2 achieves 3.92 CoreMark/MHz (on the taped out BOOM), vs 3.71 for the Cortex-A9." - it's a bit of a letdown. I was hoping it'd be closer to x64 performance than to Cortex-A, but it's probably not achievable for RICS-V budget.


Nobody is going to try to compete with Intel Core i7 or AMD Ryzen 7 from year one of a new ISA, for the same reason ARM chip makers didn't try to do that for years. It's just too risky and the investments need to be much bigger. Instead you "grow your way up" as ARM chips have done.

Now, Cavium's Vulcan-based ThunderX2 seems to be beating Intel's Skylake server chips:

https://www.nextplatform.com/2017/11/27/cavium-truly-contend...


Cavium is still nowhere near Intel on single-core performance; they just threw many more, simpler cores onto a chip.


I can still see it as a great option for, say, shared hosting providers. Multiple times as many customers could be supported for the price they pay for Intel chips.


Absolutely.

Just saying that in the context of performance/cycle/core, they're closer to a bunch of Atoms than a Xeon (which, again, is totally fine for some use cases).


Sorry to disappoint, AllSeeingEyes. What I taped out is a fairly modest instantiation of BOOM. I was trying to reduce risk and we had a very small area to play with, so I settled for ~4 CM/MHz. One potential win here would have been to use my TAGE-based predictor, which is an easy >20% IPC improvement on Coremark. Of course, a lot more reworking would be needed to achieve x86-64 clock frequencies.
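(For scale: 3.92 / 3.71 ≈ 1.06, so even this modest instantiation is about 6% ahead of the A9 per clock - the remaining gap to x86-64 parts is mostly the clock frequency mentioned above, plus the IPC still left on the table.)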


> which is an easy >20% IPC improvement on Coremark. Of course, a lot more reworking would be needed to achieve x86-64 clock frequencies.

Maybe these should be expressed as Instructions Per Second (at peak and at the point of diminishing returns?) or something like that, rather than two independent numbers. Higher clock frequency actually seems like a bad thing, all else being equal. It seems to me that throughput ought to trend toward infinity, clock frequency toward zero. ;- )


Of course it's important to always keep the "Iron Law" in mind, but it's far easier to compare ideas and talk about things in terms of IPC. For example, if I switch out one branch predictor for another, we're talking about an algorithmic change (implemented in hw) that will have an effect on IPC, and no effect on clock period (assuming we didn't screw something up).

This is particularly useful when talking about a processor design, and not a specific processor in particular. As you said, there's a lot of good about slower clock frequencies, so you'll see the same ARM design being deployed at a variety of frequencies. Far easier to talk separately about a design's IPC from its achievable clock frequency (although both are important!).
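For anyone not familiar with it, the "Iron Law" referred to above is just the standard textbook decomposition:

    Time/Program = (Instructions/Program) x (Cycles/Instruction) x (Seconds/Cycle)

i.e. IPC (the inverse of the middle term) and clock period are separate factors, which is why a branch predictor swap can be discussed purely as an IPC change.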


No worries, Chris, I didn't expect entirely new processor design to play in Intel league :)

Really hoping to get my hands on RISC-V-based SBCs next year.


Me too!


I'm sure Chris will drop by to expand on this, but it should be noted that figure was for a particular instantiation. There's no reason you can't explore different power/performance/area trade-offs, or invest more development effort to further improve performance.


There is still a lot of low-hanging fruit in the BOOM design. Hopefully Chris will still be contributing some improvements to BOOM.


I've been following developments in RISC-V since I first heard about it a few months ago and I want to get involved! What can I do? I do embedded systems development as my day job and am really eager to dig into architecture and hardware design.

Can somebody associated with the project reach out to me?


The lowRISC project [1] tries to be sort of like a hardware Linux. They are working on a Linux-capable SoC and they have a number of innovative ideas you might be interested in.

Maybe listen to this: https://www.youtube.com/watch?v=it3vVtnCYiI

[1] http://www.lowrisc.org/


Practically, what's stopping someone from making a privacy-oriented ARM implementation (meaning, while I can't "make my own" ARM board due to patents, why can't I make a board which is "source-available" and doesn't contain a ME or PSP-like engine)?

Is there something in the ARM license?


A ton of ARM boards like this already exist. You can bring up most Allwinner chips with 100% open source code. The boot ROMs for many have even been dumped and disassembled.

The trouble is they all suck in some slightly different way. Maybe only 1GB of RAM, maybe a bottleneck bus in front of your network or storage I/O. Graphics are the big pain point right now. Etnaviv/Vivante is your only choice for free accelerated graphics, and you'll only find it in a few chips. Mali and PowerVR are all around, but have absurdly difficult-to-work-with closed source drivers.

The nicest OSS-all-the-way-down ARM SoCs are i.MX6, which is expensive and frankly old/slow.

To be clear: RISC-V doesn't solve any of these concerns. That doesn't mean it isn't an amazing project.


So how will RISC-V help? Do ARM patents cost that much to license?


RISC-V doesn't directly help with any of these concerns.

But it is an alternative to ARM, and has a governance model that parties interested in open standards might be more willing to join.

Right now, ARM CPUs on the low end have little standardization at the SoC level. They have a few cores from ARM (what you think of as the main CPU, maybe a GPU, maybe a low-end micro for power management, and a couple more for real-time tasks). Then you have the non-ARM IP in the form of image processors for cameras, video decoding, etc. The SoC mfg is responsible for gluing together all these parts, and every mfg has their own proprietary take on the process, meaning a different initialization sequence, different firmware layouts, etc.

The SoC mfg's goal is to ship a product. It isn't to define a standard, and since there is no standard to follow, 'anything' goes.

Because ARM costs enough that it isn't viable to do a SoC layout unless you're going to ship millions of the chip, academic research (where the seeds for standards are often planted) just doesn't happen.


>Because ARM costs enough that it isn't viable to do a SoC layout unless you're going to ship millions of the chip, academic research (where the seeds for standards are often planted) just doesn't happen.

Is it licensing or fab issues? I thought the license is $0.X per chip, so it shouldn't make a difference (license-wise) if you make Y chips or 100,000 * Y chips. Fab costs change with overhead, but how would RISC-V help there?


It's a mix of upfront costs and per-chip royalties, with upfront costs for the IP being in the millions.

Here's a somewhat accurate article on the topic: https://www.anandtech.com/show/7112/the-arm-diaries-part-1-h...
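To make the volume point concrete (the numbers below are purely illustrative, not actual ARM pricing), the upfront IP cost dominates at low volume:

    license cost per chip = upfront / volume + royalty
    $5M / 50k  + $0.10  ≈ $100.10 per chip
    $5M / 10M  + $0.10  =   $0.60 per chip

which is why small-volume or academic SoCs rarely pencil out.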


If, like WD, you need 1B cores in your products each year, and you're paying (number pulled out of thin air) 25c a core to ARM, you're saving $250M.


I know at least Allwinner were using OpenRISC for their embedded controller on recent chips, and it wouldn't surprise me if other companies were doing the same, so I do wonder to what extent RISC-V is replacing that rather than commercial cores for this application.


These notes are wonderfully detailed and compact. Please do this for all my meetings!


I'm glad you find them useful. Really I'm indebted to the presenters for giving such clear and well explained presentations.


I wonder what Microsoft will do with RISC-V. Hopefully they don't think it is too risk-y.



