- Where can I get the ISA specification?
- Where can I get a compiler?
- Is there a link to the "softcore model"?
With RISC-V you can start very simple and small (micro-controller) and work your way up in understanding and implementation to a very large core (application class). POWER is a monster of an architecture, designed more for "big iron". I guess that might limit the "hobbyist" factor RISC-V has.
1. This I think, all 1200 pages of it: https://openpowerfoundation.org/?resource_lib=power-isa-vers...
Even Gentoo has one!
Do you have a particular use case that makes POWER make sense over x86, or do you share my paranoia and love of non-mainstream ISAs?
Do you now? There is not even a hidden embedded micro-core running a "secure operating system"?
The biggest problem remaining is whatever blobs are in devices. That's being rapidly worked on.
You can't trust any modern computer to not be subverted. So, you have to change how you use them. True secrets should be kept out of computers or rooms with technology. Go old school.
But disks? Isn’t their firmware closed?
So the firmware that matters -- the firmware that can subvert the system due to privilege level, etc. -- is open. No other vendor aside from some lower end ARM toy SoCs can say that.
Encrypt your data in-memory with a file system feature (or something like LUKS/dm-crypt) before it's sent down the SATA cable to the disk.
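To make that concrete, here's a toy sketch (mine, using OpenSSL's EVP API; dm-crypt/LUKS does the equivalent at the block layer) where the plaintext only ever exists in RAM, and only the ciphertext would ever be handed to write():

    /* compile with: gcc demo.c -lcrypto */
    #include <openssl/evp.h>
    #include <stdio.h>

    int main(void) {
        unsigned char key[32] = {0}, iv[16] = {0}; /* demo only: derive the key with a real KDF */
        unsigned char plain[] = "secret data never hits the platter unencrypted";
        unsigned char cipher[sizeof(plain) + 16];  /* room for CBC padding */
        int len = 0, total = 0;

        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        EVP_EncryptInit_ex(ctx, EVP_aes_256_cbc(), NULL, key, iv);
        EVP_EncryptUpdate(ctx, cipher, &len, plain, sizeof(plain));
        total = len;
        EVP_EncryptFinal_ex(ctx, cipher + total, &len);
        total += len;
        EVP_CIPHER_CTX_free(ctx);

        /* only now would `cipher` be written out toward the disk */
        fprintf(stderr, "%d ciphertext bytes ready for the SATA cable\n", total);
        return 0;
    }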
The NSA has gone after disk firmware:
The interesting question rather is: how many of these simply cannot afford it and how many think that this is not worth it?
And no, ME cleaner does NOT (and cannot) fully remove a modern ME. The PSP "disable" toggle in the UEFI configuration does NOT disable the PSP from running during startup.
Because a cloud machine is rented and not owned. And because of the ping latency: there is a reason why there is for example still hardly any cloud gaming.
Put another way, would you call a car that I kept duplicate keys and retained title for, but said you could use and maintain at your sole expense for a single upfront payment, rented or owned?
Latency is being solved, Google etc. are working that problem. I'm playing devil's advocate here, but fundamentally, if you don't care about actually controlling or being able to modify something, and renting is cheaper, why own?
Not a perfect solution, but such a problem can be mitigated by a firewall that blocks such incoming/outgoing packets.
> I'm playing devil's advocate here, but fundamentally, if you don't care about actually controlling or being able to modify something, and renting is cheaper, why own?
Since I love to tinker with my computers, the answer is obvious to me.
Your understanding is wrong. For instance, running Java workloads on servers is a major Power9 use case.
The thing to remember, though, is that the Talos is only two four-core CPUs, for eight total. These benchmarks are comparing it to the Epyc 7742, which is a 64 core chip.
Naturally the Epyc will kill it on most highly threaded benchmarks. The individual cores on Power9 are quite fast, though.
Are there any benchmarks for single-thread performance that I could see?
EDIT: Forgot to mention the open argument which is quite amazing as well (I've followed what Talos does).
18-cores with 4x SMT == 72 threads per Power9. That's a lot of threads, no matter how you look at it.
But only 1x divider, 1x crypto unit per SMT4 core.
The chief downside to Power9 is that it only supports 128-bit vectors, and these 128-bit vectors are executed by ganging-together the ALU units. (so 4x 64-bit ALUs == 2x 128-bit vectors processed per clock tick). Compared to AMD Zen (4x 128-bit pipelines), AMD Zen 2 (4x 256-bit pipelines), and Intel Skylake-X (3x 512-bit pipelines), Power9's SIMD capabilities are tiny.
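To make the width difference concrete, a minimal sketch (mine, assuming GCC with -mvsx/-maltivec on a POWER box) of what those 128-bit vectors look like from C -- each vec_add moves 4 floats, where a single Zen 2 or Skylake-X AVX op would move 8 or 16:

    #include <altivec.h>
    #include <stdio.h>

    int main(void) {
        __attribute__((aligned(16))) float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        __attribute__((aligned(16))) float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
        __attribute__((aligned(16))) float out[8];

        for (int i = 0; i < 8; i += 4) {
            __vector float va = vec_ld(0, &a[i]);  /* aligned 128-bit load */
            __vector float vb = vec_ld(0, &b[i]);
            vec_st(vec_add(va, vb), 0, &out[i]);   /* one 128-bit add: 4 floats */
        }
        for (int i = 0; i < 8; i++)
            printf("%g ", out[i]);
        printf("\n");
        return 0;
    }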
Another oddity: most instructions take 2 clock ticks to execute, even simple ones like XOR or add. This increased latency is likely the reason it performs so poorly on Python / PHP code.
But when code is written for Power9, it works quite well. Stockfish chess seems to work extremely well on Power9, likely because Stockfish scales to many "cores" well (fully taking advantage of SMT4), and only has 64-bit operations.
One more wildcard: Power9 has 10MB (!!!) of L3 cache for every 2 cores. That's 90MB of L3 cache on the 18-core. I presume that real-life database applications would benefit greatly from this oversized L3 cache.
EDIT: It should be noted that the L3 caches serve as victim-caches of other L3 caches. So Power9 core-pair 01 can have its 10MB L3 cache serve as a "L3.1 cache" of core-pair 23. AMD Zen / Zen2 L3 cache CANNOT use this functionality. So AMD Zen2 64-core may have 128MB of L3 cache, but each core only "really" can go up to 16MB of L3 cache (because the other 112MB of L3 cache is only for other cores/module)
EDIT: Also note, Power9 came out a few years ago at 14nm, while Zen2 came out on the 7nm node a month ago. I think a new 7nm Power9 update is planned, but I don't know what its timeframe is.
In effect, you could have 1-program using the entire 90MB L3 cache for itself on Power9. While AMD Zen2 requires (at minimum) 8-programs, each program using only 16MB L3. This design decision is clear in the intended use of the chips: Zen2 is clearly targeted at the cloud-market, while Power9 is big-iron / databases.
Unfortunately, most of the benchmarks these days show that AMD EPYC / Rome is just the better overall processor. Still, 18-core Power9 is relatively cheap: a complete 18-core / 72-thread system for $4000ish: https://secure.raptorcs.com/content/TLSDS3/purchase.html
Cheap for Power9 anyway. AMD EPYC is also relatively cheap. You can get a 16-core / 32-thread / 32MB L3 cache AMD Ryzen 9 3950x for only $700 these days (and maybe a complete system build for only $2500).
I see it more as a single big massively wide OoO core with 23 execution units (putting Skylake's 10 execution units to shame). The slices are more there for design reasons, to simplify the design process by making it more symmetrical.
Bulldozer is clearly two integer cores sharing some execution units between them, a thread can only exist on one of the two integer units.
In contrast, a thread on POWER9 can simultaneously use all 4 slices, all 23 execution units. The dispatcher can dynamically mix and match which slice it's sending a thread's instruction stream to based on slice utilization.
That single difference puts it in a completely different class of CPU architecture from Bulldozer.
My reading of the documentation is different.
> The most significant partitioning related to threads occurs when more than two threads are active, placing the core in SMT4 mode. In SMT4 mode, the decode/dispatch pipeline, shown in the blue shaded area in Figure 25-1 on page 321, is split into two pipelines, each pipeline is three iops wide and each pipeline serves two threads. The split decode/dispatch pipes each feed one of the two superslices, shown in the green shaded box in Figure 25-1, providing two execution slices for each pair of threads. The branch slice and LS-slices are shared between all threads.
Page 322 of 496: https://ibm.ent.box.com/s/8uj02ysel62meji4voujw29wwkhsz6a4
The left superslice serves two threads, while the right superslice serves the other two. All 4 threads are "behind" the singular decoder.
It seems very "Bulldozer-esque" to me, especially in SMT4 mode.
You are correct that there is an SMT1 mode where one thread could potentially utilize the entire processor. But with 2-cycle latency on even add / XOR instructions (see Appendix A), I don't foresee SMT1 code being very useful on Power9. The processor is clearly designed to run most effectively in SMT2 or SMT4 mode.
I'm not even sure how easy or hard it is to switch between SMT1, SMT2, and SMT4 modes. I don't think Linux can switch a core's mode while running; it may need a reboot, for instance. Maybe AIX can switch between the modes on the fly?
I guess if your code has enough instruction-level parallelism (ILP) available in its instruction stream, it could benefit from SMT1 mode. But I'd imagine that most 64-bit CPU code wouldn't have much ILP.
It's only in SMT4 mode that it starts statically partitioning the threads onto superslices. Even then, it's two threads sharing two slices.
I assume the static partitioning is an optimisation: that performance increases due to the split L1d caches (and I'm guessing there is a delay cycle when one slice depends on data from another; I haven't read the documentation that closely).
It's the fact that a thread can be dynamically scheduled across all four slices which makes it "not Bulldozer" in my mind, and I don't think the presence of a mode that does statically partition superslices should make it "like Bulldozer", even if that is the most common mode. It's just an optimisation.
> I'm not even sure how easy or hard it is to switch into SMT1 to SMT2 or SMT4 modes.
Ideally, the CPU core would dynamically drop down to SMT1 or SMT2 mode whenever the extra threads are executing idle instructions.
Well, it's certainly a Bulldozer-like mode of operation :-)
Power9 is obviously a very different chip than Bulldozer. So I guess it all comes down to opinion, whether or not the chip is similar enough to warrant a comparison.
I believe 7nm POWER10 will be the next move, they had announced Samsung as the partner for their next chips back in December if I remember right.
The Power9 18-core / 72-thread is going to come in at under 150W total.
The main advancement the past decade has been in power-efficiency. Cloud-scale providers keep their computers running at max load as well, so 500W does add up over months / years into a sizable amount of money.
Especially when you consider that 500W computer needs 500W of Air-conditioning, so the "True cost" of a 500W computer is roughly ~1200W or so (500W from the computer, 700W to power an air-conditioner to move 500W of heat)
A 12-core / 24-thread AMD Ryzen 3900x is just $500, with a total system cost under $1500. The big advantage of a Ryzen 3900x would be a max clock rate of 4.6 GHz, while your Nehalem 2009 computer is... what? 2.5 GHz? Probably? And computers of that age didn't have deep sleep capabilities, wasting even more power than usual. Modern computers idle at 20W, even servers and desktops. Tons of power-saving features these days which add up.
I think a typical $1500 computer these days would be more than twice as fast with 1/4th the power usage. I don't think anybody seriously in this hobby should be using anything as old as Nehalem these days.
IMO, the price/performance sweet spot for "old computers" is Haswell (~2014-era servers), if people want to buy old equipment. But 2009 is definitely too old; there are lots of used servers that are a little more expensive but a LOT more power-efficient / faster in practice.
I thought air conditioners/heat pumps were supposed to be substantially better than 1w of heat moved outside per watt of electricity?
15 BTU/hr per watt of input works out to roughly 4.4 watts of cooling per watt of input (1 BTU/hr ≈ 0.29 W).
So it appears you are correct. To move 500 watts of heat, you only need a bit over 100W of air-conditioner power.
For example, the Dell PowerEdge R630 (2014-era) server is in and around $600 to $1000 on Ebay, and will be more power-efficient and faster than any 2009-era system.
I think 2014-era servers are where the price/performance point is for the home-server enthusiast, especially if we're talking about sub $1000 price points.
2x 8-core dual-socket Intel Xeon E5-2640 v3 (Haswell) with 64GB of RAM. It's an auction, so it will probably go up another $100 or $200 from there, but I would expect it to sell well south of $1000.
2014-era equipment is the current price/performance king for home hobbyists. Obviously, a modern desktop with all the bells and whistles is a bit more expensive at $1500, but for $600 to $700, you can get a pretty good 2014-era system.
My rule of thumb is to buy something 5-years out of date. That's roughly the time when businesses get rid of old equipment and upgrade. So 5-years old equipment tends to win in price/performance.
Well, I guess the Mac Pro is fine for that, as long as you're fine with macOS. The Mac Pro line hasn't really had many updates, so maybe the 5-year heuristic doesn't really apply.
The TDP on the 18 core (and 22 core as well) is 190W as listed on Raptor’s website.
Where it's a big win is for pointer chasing workloads or big databases, where the working set isn't going to fit in cache anyway and then it's effectively like having really fast context switches. You have four threads and three of them are waiting on main memory while you keep the core busy with the fourth, then that thread has a cache miss but by then one of the other threads has the data it was waiting on.
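A minimal sketch (mine, not from any particular workload) of what such a pointer-chasing loop looks like -- every iteration's load depends on the one before it, so a lone thread mostly stalls:

    #include <stdio.h>

    struct node { struct node *next; long payload; };

    static long chase(const struct node *n) {
        long sum = 0;
        while (n) {
            sum += n->payload;  /* trivial ALU work... */
            n = n->next;        /* ...then a dependent load: on a big working
                                   set this is a cache miss, ~100s of cycles */
        }
        return sum;
    }

    int main(void) {
        struct node c = {NULL, 3}, b = {&c, 2}, a = {&b, 1};
        printf("%ld\n", chase(&a));  /* prints 6 */
        return 0;
    }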
To make SMT-4 perform well you want to have larger caches so that cache contention between the threads doesn't become the bottleneck, but that eats a lot of transistors. It's essentially a brute force trade off between performance and manufacturing cost and IBM is more willing to say "damn the cost" than Intel.
There's also the matter of who needs a machine like that. There is a lot of ugly pointer-chasing code in the world, but to take advantage of SMT-4 it has to be well-threaded ugly pointer-chasing code. You basically need a customer that needs their application to scale and is willing to do the bare minimum necessary to make that possible, but not spend a lot of resources actually optimizing the code once they get it to the point that throwing more hardware at it is a viable alternative. That's the enterprise market in a nutshell right there, and that's where IBM lives.
This is what the parent comment means when referring to pointer chasing -- XML documents are a big random access graph in memory, CPU cache and prefetch is close to useless in that environment, so when walking the DOM as part of some parsing task, much of the time is spent waiting on memory, with the execution units lying idle.
OTOH many 'genuinely computational' jobs like say, an ffmpeg encode have very noticeable slowdowns with hyperthreading enabled. In those kinds of jobs where the code is already highly optimized to keep the CPU pipeline busy, there will be contention for the single set of execution units shared by both threads, and so the illusion is destroyed.
As to why it results in a measurable slowdown, someone else would need to answer that, but it is at least conceivable that software overheads to manage the increased task partitioning might account for some of it
Bear in mind that this is only true if you parse with the DOM model. If you care about efficiency and it's at all possible, then the SAX model is much faster: you won't be bound by pointer chasing, as there's very little in memory at once. IME the next big gain comes from eliminating string comparisons with hash values. By that point XML parsing is entirely limited by how fast you can stream the documents.
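A hedged sketch of the hashing idea (the callback and tag names here are hypothetical, not any particular SAX library's API): precompute the hashes of the tag names you care about, then compare one integer per element instead of strcmp'ing every name in the hot callback.

    #include <stdint.h>
    #include <stdio.h>

    /* FNV-1a, used purely as an example hash */
    static uint64_t fnv1a(const char *s) {
        uint64_t h = 0xcbf29ce484222325ULL;              /* offset basis */
        while (*s) { h ^= (unsigned char)*s++; h *= 0x100000001b3ULL; }
        return h;
    }

    static uint64_t h_item, h_price;  /* precomputed once at startup */

    /* imagine this is your SAX startElement callback */
    static void on_start_element(const char *name) {
        uint64_t h = fnv1a(name);     /* one hash, then integer compares;
                                         real code would strcmp on a match
                                         to guard against collisions */
        if (h == h_item)       puts("got an <item>");
        else if (h == h_price) puts("got a <price>");
    }

    int main(void) {
        h_item  = fnv1a("item");
        h_price = fnv1a("price");
        on_start_element("item");
        on_start_element("price");
        return 0;
    }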
The advantage is not losing access to lovely tooling like XPath for parsing
(If anyone hasn't seen this trick before: the key to avoiding deleting elements out from under the parser is to keep a small history of elements to be deleted later. For an array, it's only necessary to save the node describing the previous array element.)
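A tiny sketch (mine) of that one-element-history trick in C terms: never free the node the iteration is still standing on; free its predecessor instead.

    #include <stdlib.h>

    struct node { struct node *next; /* ...element data... */ };

    static void walk_and_release(struct node *head) {
        struct node *prev = NULL;
        for (struct node *cur = head; cur; cur = cur->next) {
            /* process(cur) would go here */
            free(prev);   /* safe: nothing references prev anymore */
            prev = cur;   /* one-element history */
        }
        free(prev);       /* release the final element after the walk */
    }

    int main(void) {
        struct node *a = malloc(sizeof *a), *b = malloc(sizeof *b);
        a->next = b; b->next = NULL;
        walk_and_release(a);
        return 0;
    }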
This is an extreme version of yield on memory access
Not at $6,089.00; they can forget that. It has to cost no more than $500 USD or this will be a repeat of the same mistake Sun Microsystems made. Will these companies ever learn?
One cannot charge enterprise prices if one wants to build an upward spiral. Intel systems dominate because they are dirt cheap and convenient to buy.
This company is repeating the same mistake IBM, HP, SGI, and Sun made before it.
Those who do not learn from history are doomed to repeat mistakes of those who came before them.
Have you bought one of those Talos systems?
Well, a fork of luajit. LuaJIT proper has been abandoned for months…
Sounds weak, one of the versions of ARMv8 has a spec that's exactly 6666 (!) pages. I would expect IBM to be more detailed lol
It's actually a document generated from machine-readable XML files https://alastairreid.github.io/ARM-v8a-xml-release/
FWIW: the original RS/6000 devices were 20-40 MHz in-order CPUs with architectures objectively simpler than a RISC-V microcontroller like the E310.
It's the same architecture as PowerPC, designed for desktops, isn't it? Have things really changed so much since then?
I'm a hardware nostalgic and have both gathering dust in my basement. So I can't wait for a PowerPC revival of any kind.
I also can't wait for a PowerPC revival. Saving up for one of them Talos workstations as my next major hardware purchase (but it's really hard to pull the trigger when the motherboard or CPU alone costs as much as I paid for the entire Threadripper rig I built last year...).
PGI has a free POWER compiler https://www.pgroup.com/products/community.htm
So this is an opening up of the POWER ISA. Since there are quite a few different versions or revisions of it, I assume it's the one being used in POWER9 and in the future POWER10?
And is it open source more in the way the RISC-V ISA is, rather than the MIPS opening? (I believe POWER was previously opened, but with corporate-protection speak all over it.)
And this does not include implementations, like POWER9?
I mean, even if all of this is true, without an implementation, or at least licensing one for cheap, it still doesn't change the market one bit.
There's also a cool vector library that bridges the gap between different versions of the ISA and different compiler versions.
Facebook and Google already have their own compute projects and, like Amazon, have access to custom versions of silicon from a variety of vendors.
With a properly open CPU design we'll start to see the first tightly integrated, vertical "cloud" products that maybe still have a "commodity" API on the top (or maybe not?) but are custom all the way down from there.
With the end of Dennard Scaling, if not Moore's Law, Open ISAs and Open CPU designs will radically change both the hardware and compute markets and ecosystems over the next 5 to 15 years, similar to what we saw with Open Source in the 1990s.
Of course, it's not clear that POWER will be the one to do that, and RISC-V isn't going to be making a grab for Intel's crown any time soon, but this looks like IBM's bid to lead in that area.
When the cloud vendors start building systems like this they'll not look too much different from mainframes and IBM wants to continue to own that market.
It was much earlier, but OpenSPARC's impact was limited-- and that was full RTL.
If POWER is open, does anyone really want to make competing high-performance designs-- let alone open them? Better to take something like RISC-V and come up with the first high performance design.
This is especially true when you consider IBM's vertical integration: IBM is the only real POWER OEM and the only real POWER semiconductor vendor.
(If we really assume a reduction of innovation in processors, and a 15 year time horizon... expiration of IP becomes a significant factor, too. Why not just make generic ARM?)
The problem is that RISC-V mnemonics and programming model is so retarded (as compared to MC68000 or UltraSPARC) that one needs a compiler to abstract and hide that mess away. The other problem is that in several years in which RISC-V has been hyped, nobody came up with a 19" rack server design, let alone sold one priced competitively with a 1U P. C. tin bucket server. RISC-V is all hype, but without serious hardware, its impact will be and remains questionable at best.
And the fact that an ISA that new doesn't have off-the-shelf servers has nothing to do with problems of the ISA; rather, making mass-market products for a new ISA is incredibly difficult.
RISC-V has barely been out of the lab for a couple of years, and the growth of software and hardware has been impressive so far. Saying it is 'all hype' is serious nonsense and speaks more about your expectations than about RISC-V.
Could you provide some examples instead of a slur?
And it's not too bad; it's basically very close to a modernized MIPS. There are legitimate complaints, though.
Probably the most controversial is that integer divide by zero can't be made to raise an exception.
Similarly, omitting condition codes is something that will be distasteful to many.
Also, there are so many combinations of legal instruction subsets that compatibility may suffer. Most everything is in a large set of optional extensions (and some important optional extensions aren't really finished yet).
lui, auipc -- because materializing a constant in two instructions (upper 20 bits, then the low 12) is better than a simple move.b or move.w. Really, what nonsense.
sx, ux - I'm speechless at that nonsense.
bltu, bgeu -- because blt and bge just weren't enough -- who designs a processor like this?
lb, lh, lhu, lbu, sltiu instead of move.b, why? I challenge the sales pitch of making more nonsensical instructions amounting to a simpler processor design! (Boy does this make me mad.)
It's not a slur, it really is utterly retarded, especially if one used to program an elegant microprocessor like the UltraSPARC or the Motorola 68000; even the MOS 6502 is more elegant.
But to each his own, live and let live, right? Well then, why must this botched processor constantly be sold and paraded as the greatest thing since sliced bread, the non plus ultra of processors, when it isn't?
 Unless you're talking about progress or watch mechanisms.
And expecting people outside of the Puritan U. S. to abide by the same political correctness norms is extremely rude, inconsiderate and exclusionist -- by those same politically correct norms, no less. Which is to say, the U. S. should ban political correctness, and should have done so yesterday, for the benefit of everyone.
I'm not American and I don't live in the US.
> mnemonics and programming model is so retarded
...you are going to get downvoted. This is because people who speak English as a first language understand you to mean "this is stupid, like a retard". They don't understand you to mean "this is delayed, like a watch mechanism would be adjusted".
You can keep arguing that you didn't mean what you said, but at least two people are telling you how your words are being interpreted.
I would be a sad excuse of a being if I feared what some people on a random forum think of me, or whether they "downvote" me in some arbitrary, imaginary system. The entire thing is a delusion.
Not singling out anyone in particular, but I'm a formed adult and have been for several decades, and I do not require upbringing, id est, anyone telling me how to behave or what not to write.
I will write how I want, and I shall not fear arbitrary decisions based on some arbitrary policies someone somewhere thought up. If that gets me down-voted or even banned, I will not let it bother me, as life does not revolve around arbitrary websites trying to tell one how to behave and think, and I will damn myself into oblivion before I allow someone to impose such a thing on me. Lest we forget: I'm the only one who decides that, and I'm not allowing anyone to control my thinking or writing.
I think far too many people seem to think that the instruction set is something you can just drop in to a chip and start stamping it out, without any appreciation for the amount of device-specific engineering that has to happen. The reason things like a "true open source" Raspberry Pi haven't happened is the $5m - $10m of work required. And for high end devices that would be required to be competitive in the cloud, that number goes up a lot.
I've not heard of Facebook, Google or Amazon doing significant custom silicon projects themselves, as opposed to just working with vendors for some customisation. The only FAANGM in that space is Apple.
IBM are like the pastoralists living in the ruins of Rome in ~1000 AD. They're a consulting firm with a grand name and history.
I guess what I'm saying is, even if a relatively modern 2-issue, OoO core with SMT and 256b vectors came out open source, would anybody really bother to integrate it and fab it?
From what I see, FB and Google work with silicon vendors because they don't want to be silicon vendors.
More historically, Google have been building their own networking gear for some time https://www.wired.com/2015/06/google-reveals-secret-gear-con...
I'm focussing on Google in particular because they have always had a strong preference for open components wherever possible, and they've traditionally taken advantage of that openness wherever they think they need to, even if that goes against common practice. (There's a story I can't find the link to where, in the very early days, they wrote their own patches to Linux to work around some bad RAM chips that they'd scavenged from somewhere.)
If Google can get an advantage then they will take it. They will also invest heavily, over years, to research these advantages and opportunities.
Their attitude to things like ARM is still fairly accurate at the scale of their datacenters: https://research.google.com/pubs/archive/36448.pdf
I agree. It's only that POWER does not appear to be very high end to me. At best it performs acceptably for the energy it consumes. Lowering energy consumption is what drives the margins. As a cloud vendor I would stay as far away from POWER as possible.
POWER9 still has two advantages -- security and speed. Yes, speed -- the core is quite weak on some tasks and very strong on others. If you're buying this to primarily run an AVX intensive type workload, don't (unless you need the security aspects). Those massively wide, vector dependent workloads aren't exactly common in multitenant cloud though, unless you're using GPU offload where POWER again beats even the newest AMD chips for pure GPU offload performance.
So much for the good... the ugly is that POWER9 was fundamentally late and not at the performance levels we wanted, but that's a transient state. Every CPU vendor puts a chip like that out from time to time, and IBM is acutely aware of the problems here. I see no reason to go to even more problematic architectures (the x86 duopoly with master vendor keys, RISC-V with fragmentation and weak cores / immature toolchains) when we now have a better option available.
I really see it two ways, the fact that Talos has real hardware that isn't priced up in the stratosphere (it's not cheap, but it's not insane) and then the ISA being opened. Those are giant steps for a company like IBM. At the same time, as big as those steps are for IBM, they seem like pretty small steps in terms of taking on the world with this stuff.
Throw in something like the full G5 design? We might be talking about something different.
Just opening the ISA doesn't mean that new players can start spitting out processors based on it tomorrow or even next year. And why would they want to? Power was never in remotely the same position that x86 is/was re: binary compatibility so being able to say 'Power compatible' doesn't carry much weight. An ISA which has been a minority player but around for a long time is more likely a liability than an asset.
I can, however, buy a wide range of PowerPC CPUs, for a wide range of applications. From embedded applications, like routers, to laptops, desktops, workstations, high-end servers, up to super-computer class CPUs.
I think all the major non-IBM POWER folks are at Apple these days and you know which architecture they are working on!
The IBM folks really, REALLY understand how to design a secure core and chip, plus the decades learning how to make a fast and relatively efficient core. RISC-V is simply in a far more nascent state, trying to push it to POWER9 performance (let alone AMD performance) is like saying a toddler just learning to walk will win a 10k marathon tomorrow. Eventually that may happen, but not in one day, more like 20 years. ;). And when you start chasing performance, who is doing the actual hard, tedious work of verification and making sure security flaws aren't being accidentally introduced into the implementation?
POWER is interesting to me because we get mature tooling on a proven ISA that can be built and run on high-performance chips today. No more cross compiling, no more pure emulation required to do soft-core work. That in and of itself is huge in the embedded space, and honestly I'd love to see the experimental and interesting cores currently decoding RISC-V ported to decode ppc64 -- all of a sudden, real comparisons on performance etc. for identical binaries become possible, allowing proper comparison of core design ideas under real-world loading. No more guessing and having to take on pure faith that the performance difference is down to ISA or compiler performance -- either your core is faster / more efficient on the same binary, or it's not!
But IBM’s announcement is simply the ISA itself being open sourced (with some patent IP). Apart from an FPGA soft core there’s no RTL/VHDL. If you want to make silicon you’ll be starting basically from scratch.
What if the existing RISC-V and other academic cores are already good enough for a lot of people? The instruction decoder is a relatively small part of the CPU, swap that out and you suddenly get new SoCs that can run the existing POWER software base (that means proven toolchains, vector accelerated applications, etc.). Right now RISC-V doesn't even have vector instructions per se; adding all that support to the entire tooling seems like a lot of effort for not much gain when you can simply implement VSX in the hardware and use the existing tooling for it.
I keep hearing the open RISC-V cores are going to be very fast very soon. If that's true, how would an IBM provided core help versus an instruction decoder swap on one of those and some tuning?
Fujitsu had already built SPARC64-based supercomputers, and Oracle recently ported their Red Hat clone to AArch64 (no legacy 32-bit or Thumb) and produced an ISO for the Raspberry Pi 3.
Fujitsu is saying that their ARM implementation is the fastest server processor available, ahead of Intel.
I mean it should be obvious. The ISA does not dictate memory performance, micro architecture, clock frequencies, manufacturing processes, number of cores, maximum allowed power consumption, etc. All of those affect performance but are independent of the ISA.
Back in the day IBM ran a "System on Chip" factory based on PowerPC that gave us the Bluegene/L supercomputer, the GameCube/Wii/Wii U, the Playstation 3 and the Xbox 360. All of these combined one or more cores, coprocessors and tweaks to hold its own against x86.
RISC-V is meant to be used like that, but memory-management support is not yet finalized. They are sampling prototype RISC-V chips with an MMU that you can put in a dev box to develop Linux on. Other than that, you are not running Linux or Windows on it.
If you think mainstream OSes are bloated, then RISC-V has your number. If you want very low cost, it would be exciting to cut RISC-V down to have fewer and narrower registers. The other day I saw an article about a guy who wants to build RISC-V out of vacuum tubes and thought: cripes, with all of those wide registers, that is a lot of tubes.
POWER is good-to-go right now for high end applications and can stay relevant against ARM and x86 by staying open.
If IBM wants an uptake of POWER systems and people to develop on them and for them, the only thing which might make a dent are sub-$500 USD complete workstations and rack mountable servers. Otherwise, they will repeat the same mistake which Sun made, that is, they open sourced their UltraSPARC T1 under GNU GPL but the uptake was nil, because few had the knowledge to design systems around the processor. People want cheap, ready made toys they can tinker with immediately.
I don't see the point of this effort for IBM. These things need communities, and POWER simply doesn't have the community; it was a proprietary architecture for so long that nobody really decided to buy POWER -- rather, they wanted some device/ecosystem/price point and POWER was how IBM could deliver it.
The article mentions RISC-V, which still has a nascent ecosystem and no significant design wins (yet!!). But if you want to design a chip with it you can find designers with some experience with it, people developing some IP you might want to use, etc. Even that has more momentum.
- Will Huawei be able to use this processor design (now that it is open sourced) to build its own chips, bypassing ARM restrictions & US IP?
- Are these processor designs usable in mobile devices, or only in workstations and servers (using too much power, for example)?
For mobile processors it seems about as good as 64 bit ARM but with a bit less software support in the mobile world, though a good history of software support in general.
Freescale hasn't made a new low-power Power chip in a while, but... historically speaking, there were a lot of low-wattage / efficiency-focused embedded POWER designs.
I don't know what happened politically between the companies to use ARM instead. But I would imagine that ARM's instruction set was cheaper (or maybe easier) to engineer than Power ISA. Hopefully Freescale engineers can chime in on the discussion, because I'm really just shooting from the hip here.
I would expect most issues to come down to business politics. IBM open sourcing the PowerISA is also a business politics move (I guess they hope to recapture the lost ground in the embedded space).
PowerISA means operating within IBM's ecosystem: GCC, Linux, etc. Remember IBM has acquired Red Hat, so there's a lot of promise for Linux support that ARM and RISC-V don't necessarily provide. I think this is a good move.
Including radiation-hardened chips appropriate for satellites (if we count PowerPC).
Uh, ARMv8 and RISC-V were developed with Linux in mind from the beginning; they never had anything other than Linux/BSD/various RTOSes, unlike IBM with AIX.
Who is writing the RISC-V compiler? If the RISC-V compiler for GCC or CLang messes up, who do you call?
If the Power9 GCC / CLang compilers mess up, you call Red Hat for support. Red Hat / IBM are now the same company, so they'll offer end-to-end services.
ARM has okay support: the ARM foundation seems to be taking care of their compiler kits / Linux patches / etc. etc. pretty well. But I don't think you can buy an ARM support package from anybody... really.
I think the ARM / Linux ecosystem is still nascent. You get good support through the Rasp. Pi community, and maybe the occasional Android Phone gets a big community around it. But ARM / Linux ecosystem is quite poor outside of Rasp. Pi.
ARM, as a company, is clearly designed as an "embedded" company. It provides the documentation and compilers, but doesn't provide too many OS-level services above that.
uh, where and when exactly did they offer that? Actually I don't remember anyone anywhere offering commercial support for GCC or LLVM/clang.
Well, I'm not the type of person to look for commercial support for anything ever, but I've heard of several companies that provide support for DBMSes like PostgreSQL. Not so for compilers.
I just googled "gcc commercial support" and the results are the GCC FAQ, a mailing list post about it from 2005 (!), GCC on Wikipedia, "Office 365 GCC" (lol) and so on. Looks like it's just not a thing at all.
But IBM's XL Compiler: https://www-01.ibm.com/support/docview.wss?uid=swg21110831
I think I confused it with ARM: ARM has a CLang-based compiler with official ARM support IIRC. https://developer.arm.com/tools-and-software/server-and-hpc
I think the hobbyist (who won't get much support even if they're a paying customer) benefits from free tools / free support / communities.
But it seems like a number of professionals prefer having a degree of professional support in the products they use.
I haven't heard of commercial clang support though.
The scale-out POWER9 scales down to 4-core.
It's all about shoving a ton of hot power hungry multithread cores as close together as you can and running them at full bore.
4xx and 75x were OK for embedded a decade ago, but today they're hot and power hungry. You can use them in devices where you can burn 10+ watts to maintain backwards compatibility with existing PPC code, but they're way the fuck too hot for a phone.
There are differences in details about uncommon instructions, irrelevant assembly-language changes, some instructions privileged on one arch and not the other, that kind of thing.
But for the bulk of the ISA, it's the same. You probably can create a single userspace binary compatible with both? Not sure but seems doable.
The microarch is likely different, but then it is also different between several members of each category, so the word does not really designate the microarch, but really the ISA. And then you have other brand names using that, and they are so similar that e.g. Freescale switched from PowerPC to Power while incrementing PowerQUICC II to III. I remember Linux has an eieio macro that just emits the aforesaid instruction for PPC; the opcode does something similar on Power (mbar) and IIRC the assembler is happy to emit it regardless of the ISA.
So it was kind of messy when you reached the differences, but everything was quickly workable and you got used to it. The reference manuals from Freescale are very good, and the "[...]Programmer's Reference Manual for Freescale Power Architecture Processors" (EREF_RM) often directly points at the few differences from PowerPC.
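For the curious, that Linux macro boils down (modulo kernel-version details; this is a sketch from memory, not a verbatim kernel quote) to a one-line inline-asm barrier:

    /* "Enforce In-order Execution of I/O": orders storage accesses,
       typically around memory-mapped device registers. The "memory"
       clobber also stops the compiler reordering across it. On Book E
       parts the same ordering intent is expressed with mbar. */
    #define eieio() __asm__ __volatile__ ("eieio" : : : "memory")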
Which operating system would they use? Is that supported with the Power instruction set?
Windows also ran on the PowerPC architecture at various points.
In 1996 I worked briefly at a company that had a bunch of "PReP" ("PowerPC Reference Platform") machines lying around, given to them by IBM for porting applications to Windows NT on PowerPC. For kicks, I put Linux on them, so they'd actually be useful for something.
PowerPC is not strictly identical to the Power architecture, but it is related, and most tools and OSes can be made to work on either.
For the last three or four years phones have had ca. 2-4 GB of RAM and 32-64 GB of storage,
and if we are talking about a RISC-V phone, it would be produced no earlier than 2020, so I'm not sure code density is a significant factor.
I don't think
See also OpenSPARC. 
I'm curious if anyone will do anything interesting with this.
Anyway, it will be interesting to see what "non-expired" patents they bundle with this. Mostly because of https://archive.fosdem.org/2019/schedule/event/patent_exhaus...
which would allow people in the US to use these patents in any context, as long as they derived their work from an open-source POWER processor that exhausted the patents, and did not violate the processor's open-source license.
What is less widely considered is that emulation is considered an implementation of a CPU. So e.g. research labs at universities have been told in no uncertain terms by CPU vendor legal notices to stop working on research that is emulating the x86 or ARM instruction sets. Now with the POWER ISA being open, everything about the ISA is fair game for research in emulation, soft cores, hardware, etc., which does put it in a space that only academic "toy" ISAs and RISC-V really sat in before.
I find it hard to believe and hard to enforce since emulators of various quality are practically everywhere (and not only for x86). QEMU?
IANAL, this is simply what legal folks are saying on this topic.
Now with EPYC Rome, I wonder just how many takers IBM will have.
Because it's hard to beat the x86 mammoth for so many reasons (off the top of my head):
- huge market share in servers/workstations
- Intel has more resources than pretty much anyone else
- AMD is now back in the game and has started a core/performance/price war with Intel
- x86 is "cheap"
- market shares for "cheaper" stuff will probably be taken by ARM and RISC-V
- so much time was invested in optimizing compiler, code and so on for x86 because that's what everyone has
- the Torvalds argument, which is to say developers "will happily pay a bit more for x86 cloud hosting, simply because it matches what you can test on your own local setup, and the errors you get will translate better". So as long as you don't have cheap POWER workstations, it'll be a moot point. I remember working on AlphaPC, and pretty much nothing was 64-bit clean back then; it was a huge mess. Now that part is solved, but not everything else...
I definitely get the appeal for the Googles of the world to challenge Intel and for niche (internal) products, and for myself because honestly I don't really need an intel compatible CPU but in the long run, I am not sure it'll go anywhere...
Well, the local machines are coming, it's totally feasible to have a Blackbird at home and host at IntegriCloud…
but both of those are really expensive, so instead I have a MACCHIATObin at home & AWS Graviton in the cloud. ARM is winning :P
> optimizing compiler, code and so on
Fun fact, IBM is paying large amounts of cash on BountySource for SIMD optimizations of various things for POWER: https://www.bountysource.com/teams/ibm
But ARM is winning again: many things, especially the more user-facing ones, are already optimized thanks to smartphones. For POWER, the TenFourFox author is I think still working on SpiderMonkey's baseline JIT. For ARM (AArch64), IonMonkey (full smart JIT) is already enabled, developed by Mozilla, thanks to both Android phones and the new Windows-Qualcomm laptops: https://bugzilla.mozilla.org/show_bug.cgi?id=1536220
Yeah but there's a HUGE but: the motherboard and CPU (1S/4C/16T) and heatsink alone are $1.4k, no RAM no case no HD no nothing (I found a guy who spec'ed one for $2.1k with everything you'd need for a reasonable workstation). So unless you have a massive good reason or interest (political, because POWER, your company runs on POWER, "f*ck" x86, ...) to run your code on POWER, I don't see why you'd spend that much while you could get better for a lot less.
And the only way it'll get cheaper is to mass produce it: let's be realistic, as much as I'd want to have a POWER workstation or laptop (hey, there were SPARC and Thinkpad PowerPC laptops so why not), I won't be holding my breath while I wait...
(okay, not only because of the price, also because I just like the A64 ISA and UEFI)
The SolidRun MACCHIATObin is not nearly as powerful — it's ultrabook-grade performance, not server-grade — but it works fine for coding & browsing, and it's also quite open — the only blob in the firmware is something tiny and irrelevant (and I'm pretty sure for some secondary processor), everything on the ARM cores post-ROM (including RAM training code) I have built from source.
Yeah, it's low volume and Raptor needs to pay their employees — but $1100 for a mainboard? Come on. Maybe they should have dropped PCIe Gen 4 from the Blackbird at least.
If you trust Intel and AMD, without an SLA, to keep your data private, all I'll say is that's quite naive. Even the HDCP master key leaked; do you really expect the ME and PSP signing keys not to fall into the wrong hands at any point?
Yes, the mainboards are expensive. That's the price of making them blob-free and still retaining high performance. Blackbird lowers that barrier to entry some as well.
Again, Rome has a mandatory PSP blob that cannot be removed (any UEFI toggles that say otherwise are not accurate -- the PSP must run before the x86 cores even come out of reset). If you're OK with that loss of control, my gut impression is that use of Linux etc. is just being done to avoid Microsoft licensing fees, not because of security or owner control concerns ;). At that point, why not just lease cloud space on a major provider that can offer that compute power even cheaper than a local machine which sits idle overnight?
> local machine which sits idle overnight
um, I thought we're talking about workstations here. I power mine off when unused.
> use of Linux etc. is just being done to avoid Microsoft licensing fees, not because of security or owner control concerns
This is based on two rather odd assumptions:
- Microsoft as the default: No, I grew up with Unix, Unix is my default choice just because I know it and I'm used to it;
- owner control on all levels being equally important: meh, there's a lot more that you'd want to tweak in the kernel and up the stack. I wouldn't know what to change in firmware. I have changed many little things in the FreeBSD kernel (and contributed them). The only thing I ever changed in the UEFI firmware on my ARM box is some ACPI tables to fix compatibility.
> That's the price of making them blob-free and still retaining high performance
That sounds vague ;)
Also, what's "high performance" about the board anyway? PCIe Gen 4? On a typical developer workstation that's kind of a waste, Gen 3 is plenty.
Good providers will still allow you to run an accelerated VM inside the leased VPS, so you could still do your kernel hacking there.
I'm simply saying there's something interesting here -- you care enough about owning (I use that term loosely) a machine to spend more on a local system, but not enough to obtain one that you can freely modify as desired. Clearly there is a threshold, and I'm curious where it lies. :)
> accelerated VM inside the leased VPS
Does that work on POWER?
> they can provide lower cost
They can but they won't. They like having huge profits. Even if they offer the base VPS for cheap (Spot instances) they rip you off on storage, bandwidth, IP addresses, etc.
Also, again, desktops. I like developing directly on a desktop workstation. I can't exactly insert my Radeon into a PCIe slot in the cloud and run a DisplayPort cable from the cloud to my monitor :)
Stadia seems to think it can push a high-resolution, monitor-like stream over a network interface. I'm playing devil's advocate of course, but fundamentally, if you don't have control of the hardware, there's no long-term advantage to local compute, at least not with current market trends. Everything points to a move back to dumb terminals for consumer use at this point -- in the past it would have at least been possible to hack those terminals to run some minimal (for the time) OS, but crypto locking of the terminal hardware stops that quite cold.
Enough people are buying, which means the price is just right (Capitalism 101).
But still, that didn't prevent POWER9 from being in one of the largest supercomputers. And super wide SIMD has its disadvantages (hello AVX Offset downclocking)
They have cornered at least 85% of that market...
And most likely, it's the nVidia connection with NVLink which matters most in there if we talk about SIMD...
They said it better than I could (in June of 2018): https://www.top500.org/news/new-gpu-accelerated-supercompute...
> In the latest TOP500 rankings announced this week, 56 percent of the additional flops were a result of NVIDIA Tesla GPUs running in new supercomputers – that according to the Nvidians, who enjoy keeping track of such things. In this case, most of those additional flops came from three top systems new to the list: Summit, Sierra, and the AI Bridging Cloud Infrastructure (ABCI).
> Summit, the new TOP500 champ, pushed the previous number one system, the 93-petaflop Sunway TaihuLight, into second place with a Linpack score of 122.3 petaflops. Summit is powered by IBM servers, each one equipped with two Power9 CPUs and six V100 GPUs. According to NVIDIA, 95 percent of the Summit's peak performance (187.7 petaflops) is derived from the system's 27,686 GPUs.
(Emphasis mine; Summit being a POWER9 supercomputer with 4,608 nodes, each with 2 POWER9 CPUs and 6 V100 GPUs.)
There are already manufacturers who have licensed the EPYC IP from AMD, but a for-free design could be compelling.
Seeing the number of design wins m68k racked up, it would have been the logical choice (and ISTR that IBM actually liked it better, but it and its peripherals were more expensive). Disclaimer: not a fan of any of these architectures.
1) Open source from the start. Develop in the open. Maybe lock some features behind a paywall.
2) Open source when something is not hot anymore. Take the formerly private stuff you charged a lot of money for, and because so much better stuff has come out... meh, let's open-source it.
This is clearly a case of #2...
In the late 80s/early 90s when POWER appeared, and RISC fever was in full swing, if someone, as a VP or C-suite level decision maker at a mega-corp like IBM, declared "let's just release all the IP of our high performance processor design to anyone who wants it!", they would have their coworkers and superiors questioning their sanity, at very, very least.
Well, that was basically the PPC consortium. I'm not sure how much Apple/Motorola/etc. paid to be part of it, but the idea was to build a common ISA from multiple vendors.
POWER is hot; IBM is probably just confident that someone else producing competing compatible chips and taking all their customers is not a real threat / is completely outweighed by the benefits of an open ecosystem (including compatible chips in different segments, like low power).
Don't downplay the work that goes into complex software.
I'm talking about the barrier to entry.
You don't think Torvalds could have gotten paid for programming in the time he spent writing Linux?
What is the sum of all the time invested into Linux * the average hourly rate?
I can write a tiny blog engine in a day on my existing computer. I can't walk into Global Foundries and ask them to make me a single wafer of my tiny microcontroller on their 14nm process.
Hobby software is on the same playing field as pro software, and can smoothly become pro software like Linux did. Hobby silicon is on the 1970s playing field - Jeri Ellsworth and Sam Zeloof making a few transistors with size measured in micrometers. There is literally no way to make your own "hello world" in modern performance silicon.
ISA royalties are for chip vendors. QEMU implements like most ISAs out there and I don't think anyone ever had to explicitly acquire any rights for that…
Guessing Microsoft will start with putting mainframes in the West Des Moines data center since so many insurance companies are still dependent on DB2 batch crunching.
Google is using POWER9 servers in their data centers:
Of course, most of their servers are still x86; as far as I am aware, the reason why Google also uses POWER9 servers is that they don't want to be too dependent on the two manufacturers of x86 CPUs.