Since there's always confusion about it, I'll start off by clarifying the difference between the two. RISC-V is an open-source ISA standard. Rocket is an implementation of this ISA which also happens to be open source. We do not intend for the ISA to be tied to a single reference implementation. The intention for RISC-V is to enable many different implementations (whether open-source or proprietary). These different implementations can all run an open-source software ecosystem, which currently consists of a GNU toolchain, LLVM, and Linux kernel port.
Of course, somebody is also FREE to implement something that's "totally not RISC-V", and call it whatever they want. They can change one tiny thing or even nothing.
The major wins here are:
a) if somebody WANTS to call their thing "RISC-V compatible", it must implement the same core ISA instruction set as everybody else and
b) all of these companies are pooling their resources regarding lawsuits or patent fights against the core ISA.
Got a source on that?
That's because the ARM ISA is not small either, by any stretch of the imagination. On the other hand, the instruction listing of the base RISC-V ISA and the standard extensions can fit on a single PowerPoint slide.
I wasn't involved in any of the recent tape-outs, so I can't say exactly how big the decoder is. But it's quite small relative to the other chip components. Currently, the integer pipeline of the chip is roughly the same size as the FPU, and these two together are roughly the same size as the L1 cache. All of those components together are smaller than the L2 cache (depends on the size of the L2 cache, though). So decoder size doesn't really matter in the grand scheme of things.
Decoder speed probably does matter, though. Currently, we can decode an instruction in a single cycle (1 ns). The x86 decoder, on the other hand, can take multiple cycles depending on the instruction. But maybe this isn't a fair comparison since the instructions are decomposed into uops.
I have no idea about the performance of ARM decoders.
> From the original Core 2 through Haswell/Broadwell, Intel has used a four-wide front-end for fetching instructions. Skylake is the first change to this aspect in roughly a decade, with the ability to now fetch up to six micro-ops per cycle. Intel doesn’t indicate how many execution units are available in Skylake’s back-end, but we know everything from Core 2 through Sandy Bridge had six execution units while Haswell has eight execution ports. We can assume Skylake is now more than eight, and likely the ability to dispatch more micro-ops as well, but Intel didn’t provide any specifics.
That is, each baby takes 9 cycles to form, but the population can produce more than one baby per 9 cycles - latency and throughput are different things.
I would assume GNU/Linux, Android, Solaris because of the companies involved. Am I correct?
It really would be interesting to know why Oracle got involved, though, since their involvement is not exactly a universal sign of impending success for an open source project.
Sometimes it is writing software that interoperates (BI and the like), other times it is direct need (Java in the case of Google and Oracle, and I'm sure at least some backend systems).
Sooner or later they will port Android to it so it can be used in smart phones and tablets.
Being involved in alternative architectures seems like a sound defensive move.
> Currently RISC-V runs Linux and NetBSD, but not Android, Windows or any major embedded RTOSes. Support for other operating systems is expected in 2016.
No mention of Solaris, so I assume that would be forthcoming?
I live in the SFBA. Is it possible to come to Berkeley and see the reference implementation running?
Nothing special - I would just be fascinated to see something like this in person.
I'm a cofounder of lowRISC, a not-for-profit working to produce a fully open source SoC implementing the RISC-V ISA, in volume silicon. If you have questions then fire away.
One reason for push-back on implicit overflow checking is that it complicates superscalar designs by adding another exception source. The good news is that with an open ISA like RISC-V with high quality reference implementations, we can finally perform meaningful experiments to test these assumptions - adding different overflow checking semantics to a realistic implementation, quantifying the difference when putting it through the ASIC flow for a real process, and making the matching changes to the compiler. It seems ridiculous that those of us in the computer architecture community haven't had the ability before.
See http://danluu.com/integer-overflow/ for a quick and dirty benchmark (which shows a penalty of less than 1% for a randomly selected integer heavy workload, when using proper compiler support -- unfortunately, most people implement this incorrectly), or try it yourself.
People often overestimate the cost of overflow checking by running a microbenchmark that consists of a loop over some additions. You'll see a noticeable slowdown in that case, but it turns out there aren't many real workloads that closely resemble doing nothing but looping over addition, and the workloads with similar characteristics are mostly in code where people don't care about security anyway.
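To make that concrete, here's a sketch of that kind of microbenchmark in Rust (the iteration count is arbitrary); a tight loop like this is close to the worst case, which is exactly why it exaggerates the cost:

```rust
fn main() {
    // Arbitrary iteration count, chosen only for illustration.
    let n: u64 = 100_000_000;
    let mut sum: u64 = 0;
    for i in 0..n {
        // Checked version: a branch on every add. Swapping in
        // `sum.wrapping_add(i)` removes the branch and lets the
        // compiler vectorize the loop, which is why loops like
        // this show a much bigger delta than real workloads.
        sum = sum.checked_add(i).expect("overflow");
    }
    println!("{}", sum);
}
```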
TL;DR: There exists a compiler flag that controls whether or not arithmetic operations are dynamically checked, and if this flag is present then overflow will result in a panic. This flag is typically present in "debug mode" binaries and typically absent in "release mode" binaries. In the absence of this flag overflow is defined to wrap (there exist types that are guaranteed to wrap regardless of whether this compiler flag is set), and the language spec reserves the right to make arithmetic operations unconditionally checked in the future if the performance cost can be ameliorated.
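A minimal sketch of that behavior (values chosen only to force an overflow):

```rust
use std::num::Wrapping;

fn add_one(x: u8) -> u8 {
    // With debug assertions on (the `cargo build` default), overflow
    // here panics with "attempt to add with overflow"; with them off
    // (`cargo build --release`), it wraps.
    x + 1
}

fn main() {
    println!("{}", add_one(254)); // 255 in either mode
    println!("{}", add_one(255)); // panics in debug, prints 0 in release

    // The guaranteed-wrapping type mentioned above wraps in both modes:
    let z = Wrapping(255u8) + Wrapping(1);
    println!("{}", z.0); // always 0
}
```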
Note that there's even pushback in this thread about enabling overflow checks in debug mode due to performance concerns...
If you click through to the RISC-V mailing list linked to elsewhere in this discussion, you'll see that the C++17 standard library is planning on doing checked integer operations by default. If that's not a "performance focused language", I don't know what is.
> the C++17 standard library is planning on doing checked integer operations by default
Note that the Rust developers want arithmetic to be checked, they're just waiting for hardware to catch up to their liking. The Rust "specification" at the moment reserves the right to dynamically check for overflow in lieu of wrapping (Rust has long since provided types that are guaranteed to wrap for those occasions where you need that behavior).
> someone implemented integer overflow checks and found that the performance penalty was low, except for
Even if the performance penalty were nonexistent in reality, the fact is that people are making decisions which are bad for security because they perceive a problem, and adding integer overflow traps would fix that.
And the answer is that there's a tradeoff. All of the no-brainer tradeoffs were picked clean decades ago, so all we're left with are the ones that aren't obvious wins. In general, if you look at a field and wonder why almost no one has done this super obvious thing for decades, maybe consider that it might not be so obvious after all. As zurn mentioned, there are actually a lot of places where you could get 5% and it doesn't seem worth it. I've worked at two software companies that are large enough to politely ask Intel for new features and instructions; checked overflow isn't even in the top 10 list of priorities, and possibly not even in the top 100.
In the thread you linked to, the penalty is observed to be between 1% and 5%, and even on integer heavy workloads, the penalty can be less than 1%, as demonstrated by the benchmark linked to above. Somehow, this has resulted in the question "If you could make your processor 5% faster ...". But you're not making your processor 5% faster across the board! That's a completely different question, even if you totally ignore the cost of adding the check, which you are.
To turn the question around, if people aren't willing to pay between 0% and 5% for the extra security provided, why should hardware manufacturers implement the feature? When I look at most code, there's not just a 5% penalty, but a one-to-two order of magnitude penalty over what could be done in the limit with proper optimization. People pay those penalties all the time because they think it's worth the tradeoff. And here, we're talking about a penalty that might be 1% or 2% on average (keep in mind that many workloads aren't integer heavy) that you don't think is worth paying. What makes you think that people who don't care enough about security to pay that kind of performance penalty would pay extra for a microprocessor that has this fancy feature you want?
This is not true. One problem is that language implementations are imperfect and may have much higher overhead than necessary. An even bigger problem is that defaults matter. Most users of a language don't consider integer overflow at all. They trust the language designers to make the default decision for them. I believe that most people would certainly choose overflow checks if they had a perfect implementation available, and perfect knowledge of the security and reliability implications (i.e. knowledge of all the future bugs that would result from overflow in their code), and carefully considered it and weighed all the options, but they don't even think about it. And they shouldn't have to!
For a language designer, considerations are different. Default integer overflow checks will hurt their benchmark scores (especially early in development when these things are set in stone while the implementation is still unoptimized), and benchmarks influence language adoption. So they choose the fast way. Similarly with hardware designers like you. Everyone is locally making decisions which are good for them, but the overall outcome is bad.
> if people aren't willing to pay between 0% and 5% for the extra security provided
The current RFC seems to be https://github.com/rust-lang/rfcs/blob/master/text/0560-inte... which seems to avoid taking a stand on the performance issue.
1. Memory safety is Rust's number one priority, and if this were a memory safety concern then Rust's hands would be tied and it would be forced to use checked arithmetic just as it is forced to use checked indexing. However, due to a combination of all of Rust's other safety mechanisms, integer overflow can't result in memory unsafety (because if it could, then that would mean that there exists some integer value that can be used directly to cause memory unsafety, and that would be considered a bug that needs to be fixed anyway).
2. However, integer overflow is still obviously a significant cause of semantic errors, so checked ops are desirable due to helping assure the correctness of your programs. All else equal, having checked ops by default would be a good idea.
3. However however, performance is Rust's next highest priority after safety, and the results of using checked operations by default are maddeningly inconclusive. For some workloads they are no more than timing noise; for other workloads they can effectively halve performance due to causing cascading optimization failures in the backend. Accusations of faulty methodology are thrown around and the phrase "unrepresentative workload" has its day in the sun.
4. So ultimately a compromise is required, a new knob to fiddle with, as is so often the case with systems programming languages where there's nobody left to pass the buck to (and you at last empathize with how C++ got to be the way it is today). And there's a million different ways to design the knob (check only within this scope, check only when using this operator, check only when using this type, check only when using this compiler flag). In Rust's case, it already had a feature called "debug assertions" which are special assertions that can be toggled on and off with a compiler flag (and typically only enabled while debugging), so in lieu of adding any new features to the language it simply made arithmetic ops use debug assertions to check for overflow.
So in today's Rust, if you compile using Cargo, by default you will build a "debug" binary which enables checked arithmetic. If you pass Cargo the `--release` flag, in addition to turning on optimizations it will disable debug assertions and hence disable checked arithmetic. (Though as I say repeatedly elsewhere, Rust reserves the right to make arithmetic unconditionally checked in the future if someone can convincingly prove that the performance impact is small enough to tolerate.)
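And for code that needs one specific behavior regardless of build mode, the standard library exposes explicit operations; a small sketch (the values are arbitrary), including the checked-indexing analogue mentioned earlier:

```rust
fn main() {
    let x: u8 = 250;

    // Explicit-intent arithmetic, independent of debug/release mode:
    assert_eq!(x.checked_add(10), None);          // None on overflow
    assert_eq!(x.checked_add(5), Some(255));
    assert_eq!(x.wrapping_add(10), 4);            // always wraps
    assert_eq!(x.saturating_add(10), 255);        // clamps at the max
    assert_eq!(x.overflowing_add(10), (4, true)); // value plus a flag

    // The indexing analogue: `v[10]` would panic, `get` returns Option.
    let v = [1, 2, 3];
    assert_eq!(v.get(10), None);
}
```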
The check failures trigger a panic?
Is there any work to enable an ASan-like feature for unsafe blocks BTW?
There isn't as strong a need for ASan in Rust because so little code is unsafe. Most of the time, the only reason you drop down to unsafe code is because you're trying to do something compilers are bad at tracking (or that is a pain in the neck to encode to a compiler). It's usually quite well-contained, as well.
You can work with uninitialized memory, allocate and free memory, and index into arrays in Safe Rust without concern already (with everything but indexing statically validated).
IMHO the kind of stuff `unsafe` is used for is very conducive to aggressive automated testing.
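As a toy illustration of that containment (the function here is invented for the example): the `unsafe` block is a single expression whose precondition is checked right before it, so tests can hammer the boundary directly:

```rust
/// Safe wrapper around a small, well-contained unsafe block.
fn first_byte(bytes: &[u8]) -> Option<u8> {
    if bytes.is_empty() {
        None
    } else {
        // SAFETY: we just checked that index 0 is in bounds.
        Some(unsafe { *bytes.get_unchecked(0) })
    }
}

#[cfg(test)]
mod tests {
    use super::first_byte;

    #[test]
    fn boundary_cases() {
        assert_eq!(first_byte(&[]), None);
        assert_eq!(first_byte(&[7, 8]), Some(7));
    }
}
```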
Not quite 'no'. Of course there are languages that dynamically switch to bignums, but I doubt that is what you mean.
Swift does detect overflow (by aborting, IIRC). https://developer.apple.com/library/ios/documentation/Swift/...:
"If you try to insert a number into an integer constant or variable that cannot hold that value, by default Swift reports an error rather than allowing an invalid value to be created. This behavior gives extra safety when you work with numbers that are too large or too small."
In particular, they can have weaker ordering semantics and they can be buffered and elided among themselves (obviously with some sort of inter-core snooping).
And possibly support for "tagged" numbers, e.g. add integers if high bit is not set, call function otherwise, same for floats if not NaN, with a predictor for them.
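Sketched in software, assuming the high-bit tag described above and a hypothetical slow_path helper, the fast path such an instruction would fold into hardware looks roughly like:

```rust
const TAG: u64 = 1 << 63; // assumed convention: high bit set = not a plain integer

fn tagged_add(a: u64, b: u64, slow_path: fn(u64, u64) -> u64) -> u64 {
    if (a | b) & TAG == 0 {
        // Both operands are untagged small integers: plain add.
        a.wrapping_add(b)
    } else {
        // At least one operand is tagged (boxed value, float, etc.):
        // the proposed hardware would reach this via a predicted
        // call/trap instead of an explicit software branch.
        slow_path(a, b)
    }
}
```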
Atomic reference counting is slow and gets even worse the more CPU cores and especially CPU sockets you have. If you can afford to make as expensive operation as an atomic add, you can definitely afford to add overflow checks. Atomic add is 50-1000+ clock cycles depending on contention, core/socket count and "moon phase" -- latency is somewhat unpredictable.
> In particular, they can have weaker ordering semantics and they can be buffered and elided among themselves (obviously with some sort of inter-core snooping).
I'm not sure how weak ordering semantics and fetch-and-add (atomic add) could mix. Aren't atomics about strong ordering by definition? Maybe there's something I don't understand.
> And possibly support for "tagged" numbers, e.g. add integers if high bit is not set, call function otherwise, same for floats if not NaN, with a predictor for them.
You'd still get branch mispredicts, which I guess is what you're trying to avoid. There'd be no performance improvement.
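For what it's worth, weak ordering and reference counting do mix in practice; this is roughly the scheme Rust's Arc uses today, where increments are relaxed because the caller already holds a reference, and only the final decrement needs release/acquire to order the destructor:

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};

struct RefCount(AtomicUsize);

impl RefCount {
    fn incr(&self) {
        // Relaxed is enough: the caller already owns a reference, so
        // the object can't be freed out from under this increment.
        self.0.fetch_add(1, Ordering::Relaxed);
    }

    /// Returns true if this was the last reference.
    fn decr(&self) -> bool {
        // Release so earlier writes to the object happen-before the
        // free; the Acquire fence on the final decrement pairs with it.
        if self.0.fetch_sub(1, Ordering::Release) == 1 {
            fence(Ordering::Acquire);
            return true; // safe to drop the payload now
        }
        false
    }
}
```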
All languages would benefit from instructions to more efficiently support tracing of native code. A pair of special purpose registers (trace stack and trace limit registers) to push all indirect and conditional branch and call targets would really speed up tracing of native code a la HP's Project Dynamo. Presumably upon trace stack overflow the processor would trap to the kernel or call a userspace interrupt vector entry.
A small pseudorandom number generator and another pair of special purpose registers (stack and limit register) for probabilistically sampling the PC would make profiling lighter weight, both for purposes of human analysis of code and also for runtime optimization in JITs or HP Dynamo-like native code re-optimization.
Traps (CPU exceptions, such as traditional FPU exceptions like division by zero) usually involve a kernel-mode context switch. So if you trap on a tag, the performance for tagged values will probably be 3-5 orders of magnitude slower. That's a lot.
Could you explain why?
I thought that trapping was more like a 'slow branch': slow due to flushing the pipeline, but why should the kernel be involved (1)?
1: except if you need to swap in a page, but that's just like any other memory reference.
Runtime/language exceptions have different mechanisms that don't require kernel context switches (but might involve slow steps like stack walk).
What's the current status? (I read http://www.lowrisc.org/blog/2015/06/second-risc-v-workshop-d... (June 30, 2015))
Your site mentions a lot of "FPGA", do you have some actual silicon prototypes?
What's the current raw speed in MHz or FLOPS?
I read about OpenRISC, OpenSPARC, RISC-V, Z-Scale, BOOM - which are furthest along in testing? Can we buy some of them in 2017? Or will it take longer?
We haven't yet produced a silicon prototype, but will be taping one out this year. The Berkeley Rocket implementation has been silicon-proven multiple times, as has the ETH Zurich PULP core which we also hope to use. The aim of this test chip is to integrate an LPDDR3 memory controller+PHY, plus a USB host controller+PHY.
I don't have the link handy, but the Rocket implementation has clocked at 1.5GHz on a 45nm process.
For the final question, perhaps it's useful to define some of these terms:
* OpenRISC: an older 32-bit open ISA.
* OpenSPARC: The open-sourced design from Sun (now Oracle). GPL-licensed. I don't know of anyone planning to produce a commercially available ASIC using it.
* Z-scale and BOOM are both RISC-V implementations from Berkeley. Z-Scale is a microcontroller-class RISC-V implementation and BOOM is an out-of-order implementation. Both make use of parts of the Rocket implementation (essentially using the codebase as a library). I believe only the base Rocket design has been produced in silicon so far. For lowRISC, we hope to discuss the status of BOOM at the upcoming RISC-V workshop (this Tuesday and Wednesday), and whether it will make sense to use it for our application cores.
I hope we'll see commercially available lowRISC chips towards the end of 2017, but we'll be able to make a better judgement about how realistic that is once we reach our first test chip.
Hi Alex, I look forward to chatting with you guys about BOOM. =)
That's not bad. How many instructions can this issue per cycle?
Can't really comment, since these aren't our projects. I haven't really seen any developments on these fronts, though.
> RISC-V, Z-Scale, BOOM
RISC-V is the ISA. Rocket, Z-Scale, and BOOM are implementations of the ISA we've produced at Berkeley. Rocket is our reference implementation. It is a 64-bit in-order core. Z-scale is a small 32-bit core with no MMU intended for microcontrollers. BOOM is an out-of-order 64-bit core. They all share some common code, but BOOM is a bit behind the other two.
We have taped out different Rocket and Z-scale chips. But these run as tethered systems and were only meant for our research. As a university research lab, we do not really have any intentions for mass manufacture. ASB can answer better about when lowRISC chips will be commercially available.
What specific license will you be releasing under? Is that still TBD?
Also, who is putting up the money for dev time and fabrication? Partners?
Exciting stuff. Thanks for sharing!
We have received backing from a private individual, which has got us going, and some additional donations from some of our project partners.
The Linux Foundation provides "Foundation as a Service" to several of these. I work for Pivotal, so I'm most familiar with the Cloud Foundry Foundation, which is administered by the Linux Foundation on behalf of members.
From what I can see, it's a smooth way to offload the administrative stuff to specialists while retaining the core openness.
Notably, Lattice manufactures the only FPGA for which a full open source design flow currently exists (albeit unsupported by Lattice themselves), and has its own set of open source soft processor cores.
So I don't see this sector changing with an open-source core.
If so, it will probably come from China - they're building all the parts, are hungry (even at the state level) to achieve dominance, don't care much about legacies, and have the market (probably backed by the government) to support such a strategy.
Lattice, on the other hand, is a more general-purpose chip maker, so they may not want to bite the hand that feeds them. They've got a lot of products other than FPGAs, so keeping those lines working is probably more strategic than anything else.
Incidentally, I think the first consumer devices to adopt RISC-V will be home wireless routers: the stock firmwares for those are closed, vendor-controlled stacks with no third-party binary ecosystem to break, so switching ISAs just means the vendor recompiles with a new toolchain. And those devices don't need a GPU.
I don't think we'll be able to compete with Intel's x86 chips for a while. Mostly because Intel's silicon processes are more advanced than those offered by other foundries.
> The RISC-V authors aim to provide several freely available CPU designs, under a BSD license. This license allows derivative works such as RISC-V chip designs to be either open and free like RISC-V itself, or closed and proprietary, (unlike the available OpenRISC cores, which under the GPL, requires that all derivative works also be open and free).
So if it's not GPL (as OpenRISC is), then it's not guaranteed to be fully open source. You could still have 99% of the chip as open source and the other 1% as a proprietary backdoor.
(Which, of course, means that no sane manufacturer would touch GPLed hardware designs.)
Last time I complained here about the verification infrastructure of this project, someone gave that link.
I didn't like much of what I could find there.
What exactly are you trying to do?
In addition, it would be good to have a set of assertions to maintain sanity (read: functional correctness) while experimenting with the design.
How are these implementations verified right now? All I see are a few assertions in Chisel and a small set of tests.
Comparing a real CPU against an ISA simulator is VERY HARD. There are counter instructions, there are interrupts, timers will differ, multi-core will exhibit different (correct) answers, Rocket has out-of-order write-back + early commit, floating point registers are 65-bit recoded values, and there is some (required) ambiguity in the spec that can't be reconciled easily (e.g., storing single-precision FP values using FSD puts undefined values in memory, the only requirement being that the value is properly restored by a corresponding FLD).
We also use a torture tester that we'll open source Soon (tm).
Not sure. I'd look out for videos to show up at the RISC-V workshop that's ongoing (http://riscv.org/workshop-jan2016.html).
The problem is that verification is where the $$$ is, so even amongst people sharing their CPU source code, they're less willing to share the true value provided by their efforts. A debug spec is being developed and will be added to Rocket-chip to make this problem easier.
With that said, MIT gave a good talk at the RISC-V workshop about their work on verification, and we open sourced our torture tester (http://riscv.org/workshop-jan2016.html).
The tests are not extensive. Just hand written assembly code testing one thing at a time AFAIK. As someone who used to lead ASIC verification projects for a living, I expected a lot more at a minimum.
I don't know what the Chisel stuff checks. I don't believe Chisel does anything at the simulation stage.
I guess if you are just playing around with CPU designs, you can use this stuff.
I would never sign off on going to tape-out with just this stuff. Apparently they've taped out over 11 times, though!
You want to run the RTL and compare against the ISA simulator? I think you're on your own...
Are there legal issues that make it more favourable?
IMO, compared to OpenSPARC, this is a fresh design that leverages lessons learned from everything that preceded it. There are many problems with SPARC that make it hard and expensive to scale up and down.
And yes, the RISC-V ISA is simpler than the SPARC ISA. We also have clearly separated the base integer ISA from the various extensions, such as floating-point, atomics, supervisor, etc.
I don’t see how they’re going to match raw performance against those 1 TFLOPS+ 20-core Xeons.
I don’t see how they’re going to match performance/watt against the 7 W quad-core C2338.
I don’t see how they’re going to match price against $20 x5-Z8300.
If none of the above, then why should Google/HP/Oracle/anyone else buy that?
The numbers I've seen suggest Intel ships around 300-400 million processors per year. In 2014, 12 billion (with a B) devices with ARM cores shipped. In 2011, around 500 million MIPS-based cores shipped.
These are obviously very different businesses. But it's clear there is an enormous market for licensable low-end IP cores that are used in everything from cars, cable boxes, and TVs to cellphones (many phones have multiple cores internally to run functions like the cellular baseband). Most of the licensees are just gluing together licensed cores and contracting foundries like TSMC or GlobalFoundries to build them. The margins tend to be pretty thin (we're talking chips that cost single-digit dollars), so saving a little money on IP licensing is attractive. Since these are often custom applications, binary compatibility is less of an issue.
That seems like a more interesting market for an open ISA and open core processors.
Maybe these RISC-V chips will never get out of the labs of Google, but they could still serve to force Intel to keep prices the same on its new generations in the future, unless they want Google to really get serious about making its own chips.
Have customizable cores you can add HW accelerators to, like Cavium's Octeon IIIs do.
Things might get cheaper over time as big vendors buy in volume and that money goes back into enhancements.