Hacker News
AMD launches Kaveri processors aimed at starting a computing revolution (venturebeat.com)
313 points by mactitan on Jan 14, 2014 | 187 comments

AMD is doing some interesting work with Oracle to make it easy to use HSA in Java:

* http://semiaccurate.com/2013/11/11/amd-charts-path-java-gpu/

* http://www.oracle.com/technetwork/java/jvmls2013caspole-2013...

* http://developer.amd.com/community/blog/2011/09/14/i-dont-al...

* http://openjdk.java.net/projects/sumatra/

It is intended that the GPU will be used transparently by Java code employing Java 8's streams (bulk collection operations, akin to .Net's LINQ), in addition to more explicit usage (compile Java bytecode to GPU kernels).
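
For anyone who hasn't seen the streams API yet, here is a minimal sketch of the kind of bulk operation being talked about. It only shows the programming model: on a stock JVM this runs on CPU threads, and whether a Sumatra-style runtime would offload such a pipeline to the GPU is an assumption this code does not demonstrate. The class and method names are made up for illustration.

```java
import java.util.stream.IntStream;

// Hypothetical example class; the names are illustrative and not taken from Sumatra or Aparapi.
class StreamSketch {
    // Sum of the squares of 1..n, written as a data-parallel bulk operation.
    static long sumOfSquares(int n) {
        return IntStream.rangeClosed(1, n)
                .parallel()                      // allow the runtime to split the range across workers
                .mapToLong(i -> (long) i * i)    // widen before multiplying to avoid int overflow
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1000));
    }
}
```

The point is that the code states what to compute per element and leaves the scheduling to the runtime, which is what would make transparent offloading plausible in the first place.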

This is one of the things that makes Java 8 really exciting. I think it is going to show where JIT-compiled languages shine in comparison to AOT-compiled ones: multi-architecture code in one program without the developer needing to jump through hoops.

Graal/JVM (like PyPy) is a really nice way to bring many languages to advanced VMs. See for example node.jar/Nashorn/"fast JS on the JVM", or Topaz and Truffle (Ruby on PyPy and on Graal/JVM).

Not only that, replacing Hotspot by Graal will also reduce the amount of C++ code in the standard JVM towards the goal of having a production quality meta VM.

Totally agree. I never wanted to get involved in HotSpot's C++, but Graal development (even to the point of generating assembly) looks much more welcoming.

Speaking of Java, does anyone know if there are any plans of having "real" generics in Java?

It is part of the wishlist for post-Java 8 as presented at JavaOne 2013, but the wishlist came without any guarantee of what the exact focus will be.

I was thinking about looking into Go, but this changed my plans. Aparapi + an HSA chip looks far more fun and interesting. Combine that with Java 8, and easy GPU programming is finally here.

The Java Chips are here! I remember reading about these back in the day. (Byte cover story, November 1996).

Do you mean the chip that the JVM was put on?

The late-90s project I'm thinking of was called picoJava. Apparently it lives on in some form.




Thank you for your references. I remember that a lot of people complained that Java was too slow because it is an interpreted language, as opposed to compiled languages like C/C++, which execute machine code directly. So a hardware JVM interpreter made a lot more sense.

Somehow, Sun aborted that plan, but the JIT has become faster since. Still, I don't get what it has to do with AMD's new APU.

This reaffirms for me again that we really need AMD to keep Intel from falling asleep at the wheel. I was certainly intrigued by what I saw in the Xbox One and PS4 announcements and being able to try some of that tech out will be pretty awesome.

It is fascinating for me how FPUs were "always" co-processors but GPUs only recently managed to get to that point. Having GPUs on the same side of the MMU/Cache as processors is pretty awesome. I wonder, though, what it means for the off-chip GPU market if that continues.

In older PCs the FPU was a separate chip.

That it was (a separate chip), but what is perhaps less well known is that Intel also produced an I/O co-processor chip called the 8089. That chip was interesting to write code for, as it tried to offload various I/O operations. At the time, Intel was on something of a "systems" tear, building their own small systems to compete with other computer vendors like DEC, Motorola, and TI.

The 8089 was a total flop relative to its development cost and Andy Grove declared Intel would not do any more Graphics or I/O processor chips. (I was the Systems Validation Engineer on the 80782 at the time, so much for my project!) As it turned out I think it was just too early for a specialized co-processor.

I wouldn't say too early. There were a fair number of temporarily successful co-processors in that general era of many different sorts; aside from float (and smaller markets for FFT chips and boards and some efforts at vector co-processors), the most famous were probably the multimedia coprocessors on the Atari ST and on the Amiga (Jay Miner etc).

If that's too late for your tastes, look at the nearly universal support chips for 8080/z80 like DMA. Possibly one could count UART and PIO too.

But Moore's Law kept killing them, and eventually it became common wisdom that such was the case, so they faded.

Over a long time period, external devices to offload the primary cpu come and go, under different guises, as Ivan Sutherland noticed already in 1968.

Intel's 8089 and related may have been partially motivated by the success of IO processors in the mainframe and supercomputer world in the 1960s and later.

P.S. Hi Chuck :)

On the 486, that chip was actually a full-blown CPU; there was no true 487 co-processor. Instead, Intel sold a full 486DX chip labeled as a 487 that disabled and replaced the already-mounted 486SX. (ref: https://en.wikipedia.org/wiki/Floating-point_unit#Add-on_FPU...)

>> we really need AMD to keep Intel from falling asleep at the wheel

100% full ack! I too remember the co-processors. Do you remember those giant ISA slots that held math co-processors, which helped some scientists speed up their code? =) Aww... and the beloved Turbo button on your tower, hehe. Makes me wonder if our Mimzy [1] from the future will have an Intel inside too. Back then, it seemed a 100% sure bet that Intel would live forever and grow into the largest company on earth. Today things are different: a little company coming up with self-assembling nano-materials and an optical, neuronal, graphene or silicene chip design could beat Intel quite fast. The only thing holding them back would be the time required for the OS landscape to become more adaptive to the hosting hardware. I see that future within my lifetime.

I'm also thinking about Bitcoin mining with this rig, or do you know something better? (Those damn ASICs...)


[1] http://www.youtube.com/watch?v=MSHhmwGzN8w

Regarding Bitcoin, at this point you won't be able to find anything "better" than ASICs; they more or less define the maximum refinement.

From a systems architecture perspective it is interesting to compare this to the Intel Larrabee project (lots of parallel cores), but that seems to me to have suffered from a coding challenge similar to the one the Cell architecture has.

[1] http://en.wikipedia.org/wiki/Larrabee_(microarchitecture)

Indeed. I've found a manufacturer of a 1024-core manycore chip, but I don't know its price per unit. However, ASICs that support scrypt are not available yet, and even if they were, you can't use an ASIC for anything other than Bitcoin mining, which is pretty bad, because in a year the best ASIC will be outdated and an investment of several thousand dollars is wasted. It would be better if those ASICs worked like a co-processor or something.

That is essentially the FPGA value proposition. While they are more expensive than an ASIC because they are re-programmable, you don't have a huge up front investment in masks and wafers. So a decent FPGA set up would probably set you back $10K but you could re-design at will into different sort of 'coin' miners. The book 'Cracking DES'[1] talks about building an array of FPGAs to brute force DES keys. If you are at all interested in ASIC/FPGA based crypto 'mining' you should have and read that book.

[1] http://www.amazon.com/Cracking-Des-Encryption-Research-Polit...

What about Scrypt-based crypto currencies like Litecoin? Will this tech help?

No. The performance of scrypt-based mining is bound by memory bandwidth, and GDDR5's bandwidth is significantly higher than DDR3's.

The turbo button slowed your computer down so that older programs (namely games) that relied on counting cycles would still be playable.

Among other things, this has lots of applications for molecular dynamics (computational chemistry simulations) [1]. Before, you had to transfer data over to the GPU, which is no big deal if you're dealing with small data sets and are only computationally limited. But when you get bigger data sets, that becomes a problem. Integrating the GPU and the CPU means they both have access to the same memory, which makes parallelization a lot easier. If, as someone else here said, AMD is partnering with Oracle to abstract the HSA architecture with something more high-level like Java [2], then you don't need to go learn CUDA or Mantle or whatever GPU language gets cooked up just for using that hardware.

I'm personally hoping that not only will we get to see more effective medicines in less time, maybe some chemistry research professors will get to go home sooner to spend time with their kids.

[1] http://www.ks.uiuc.edu/Research/gpu/

[2] http://semiaccurate.com/2013/11/11/amd-charts-path-java-gpu/

As to your last point, I suspect there's a sort of Jevons' Paradox for scientific research (and creative problem solving fields in general).

Which is to say that when you're seeing progress more quickly, you'll spend more hours working on the problem, not less.

Determining whether this is good or bad is left as an exercise to the reader :)

Doesn't that kind of imply we're screwed when it comes to sustainability?

No, that would be the 2nd law of thermodynamics which guarantees we're fucked when it comes to sustainability.

Yes, with regards to the demand side; no, with regards to the supply side.

In general yes, unless we manufacture 100% of our energy sustainably.

HSA will also be compatible with LLVM, so you get all the languages supported by that, too.

Can you give examples of medicines developed thanks to computational chemistry simulations?

I'm rather ignorant about this area of research, but it always seemed to me kinda fruitless? Sorta like high throughput screening - a ton of resources and computation is dedicated because it sounds like a good idea... but ultimately there is very little to show for it all.

Way too little signal and way too much noise.

There are lots, and a quick Google search would have cured your unnecessary scepticism. It would be more surprising if this method didn't work than that it does.

Imatinib[1] is a good, and pretty old example:

Imatinib was developed by rational drug design. After the Philadelphia chromosome mutation and hyperactive bcr-abl protein were discovered, the investigators screened chemical libraries to find a drug that would inhibit that protein. With high-throughput screening, they identified 2-phenylaminopyrimidine. This lead compound was then tested and modified by the introduction of methyl and benzamide groups to give it enhanced binding properties, resulting in imatinib

[1] https://en.wikipedia.org/wiki/Imatinib#History

Your example is about experimental HTS, not computational simulations which the parent requested.

I'm not giving you what you want, but here's a visual example of using a computational microscope to understand how a drug actually works and finds its binding site:


If you can't see a difference between that and say, X-Ray crystallography then well..

I would be interested in seeing publications on this. The last time I checked, I think the algorithms and software were being treated as in-house intellectual property by Pfizer, Novartis, and co.

Actually, the "algorithms" are usually fairly straight-forward and well documented integrators for Newtonian dynamics, usually some form of Verlet Integration. The most popular and efficient Molecular Dynamics packages are open source, some GPL (e.g., NAMD, Gromacs). The technology is nowhere near in silico drug design, but we're converging on useful tools to measure important quantities like free energies of binding. Pharmaceutical companies have some interest in this technology, but at this point, don't direct too much effort into atomistic simulations.
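
To make "some form of Verlet integration" concrete, here is a minimal velocity-Verlet sketch for a single particle on a 1D harmonic spring. This is a toy under stated assumptions: the spring force, the parameter values, and all names are made up for illustration, and it is not how NAMD or Gromacs are actually structured.

```java
// Hypothetical sketch: the names and the spring force are illustrative only.
class VerletSketch {
    // Force of a 1D harmonic spring with stiffness k: F = -k * x.
    static double force(double k, double x) {
        return -k * x;
    }

    // Advances position x and velocity v by `steps` velocity-Verlet steps of size dt.
    // Returns the final state as {x, v}.
    static double[] integrate(double x, double v, double m, double k, double dt, int steps) {
        double a = force(k, x) / m;
        for (int i = 0; i < steps; i++) {
            x += v * dt + 0.5 * a * dt * dt;   // position from current velocity and acceleration
            double aNew = force(k, x) / m;     // acceleration at the new position
            v += 0.5 * (a + aNew) * dt;        // velocity from the average of old and new acceleration
            a = aNew;
        }
        return new double[] { x, v };
    }

    // Total energy (kinetic plus spring potential), handy as a sanity check.
    static double energy(double x, double v, double m, double k) {
        return 0.5 * m * v * v + 0.5 * k * x * x;
    }

    public static void main(String[] args) {
        double[] s = integrate(1.0, 0.0, 1.0, 1.0, 0.01, 10_000);
        System.out.println("x=" + s[0] + " v=" + s[1] + " E=" + energy(s[0], s[1], 1.0, 1.0));
    }
}
```

A real MD code evaluates the force over many interacting atoms at every step, and that force loop is the part that maps well onto a GPU. The integrator itself stays about this simple.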

ah thanks. My only experience with this was going to a lecture back in school when they brought in someone from Pfizer's in-house modeling team, but the presentation was a bit light on references/citations...

Mantle is a graphics API, not a compute API. For compute AMD uses OpenCL and HSAIL.

I'd be surprised if Mantle did not have a way to issue compute shaders.

I've worked with CUDA before (and hit issues with large datasets, like you've mentioned). While HSA is presumably going to make things a lot easier, I'm curious about AMD's partnership with Oracle - how much of a performance hit do you reckon we'll see if Java is implemented? And wouldn't it have been easier to go with something like Python that also supports multithreading and (to the best of my limited knowledge) has better scientific libraries?

It's likely their implementation of Java will add constructs to aid parallelization, but Java already has very robust threading support with no GIL.

Yeah perhaps I could have worded my comment better. I meant Python (just like Java) supports parallelization as well, plus it seems to have better scientific libraries.

For a review of a couple of the processors in the Kaveri range: http://www.anandtech.com/show/7677/amd-kaveri-review-a8-7600...

Anandtech article is infinitely more helpful than the one in OP. Hopefully it floats atop.

Not too likely. It's aimed at a relative handful of pretty narrow specialists.

I have a question: Previous systems with discrete GPU memory had some pretty insane memory bandwidths which helped them be way faster than software rendering. Now GPU and CPU share memory. Doesn't that mean the GPU is limited to slower system RAM speeds? Can it still perform competitively with discrete cards? Or is system RAM now as fast as discrete-card bandwidth? If so does that mean software rendering is hardware-fast as well? Bit confused here...

Yes, these APUs are limited to fairly slow DDR3 RAM. It has been suggested that making the GPU part of the chip bigger wouldn't help because it would be bandwidth limited. There are a couple of possible solutions to this: The PS4 uses GDDR so it's more like a traditional GPU with a CPU added on. Intel's Iris Pro uses a large fast L4 cache. The Xbox One has some fast graphics RAM on chip. AMD will need to do something to increase memory bandwidth if they want to sell processors for more than $200.
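
To put rough numbers on the gap, theoretical peak bandwidth is just the transfer rate times the bus width. The sketch below assumes dual-channel DDR3-2133 (2 x 64-bit) for the APU and a 256-bit GDDR5 interface at 5.5 GT/s for a PS4-style setup. Those configurations are my assumptions for illustration, not figures from this thread, and sustained bandwidth in practice is lower than the peak.

```java
// Hypothetical helper; the class name and the example configurations are assumptions.
class BandwidthSketch {
    // Theoretical peak in GB/s: transfers per second times bytes moved per transfer.
    static double peakGBps(double megaTransfersPerSec, int busWidthBits) {
        return megaTransfersPerSec * 1e6 * (busWidthBits / 8.0) / 1e9;
    }

    public static void main(String[] args) {
        double ddr3 = peakGBps(2133, 128);   // assumed: dual-channel DDR3-2133, 2 x 64-bit
        double gddr5 = peakGBps(5500, 256);  // assumed: 256-bit GDDR5 at 5.5 GT/s
        System.out.println("DDR3 peak:  " + ddr3 + " GB/s");
        System.out.println("GDDR5 peak: " + gddr5 + " GB/s");
    }
}
```

Under those assumptions the shared DDR3 tops out at roughly 34 GB/s against 176 GB/s for the GDDR5 setup, about a 5x difference, which is why a bigger on-die GPU would likely end up starved for bandwidth.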

It's a trade-off. The off-chip GPU will have the most memory bandwidth and the most crunching power (larger thermal envelope). The on-chip GPU will have wildly superior latency, lower cost, and lower power consumption. It also has access to the on-die L2 cache now, although I'm not sure what that means for bandwidth: how does CPU L2 compare to GPU L2?

High-end on-chip GPUs can compete with midrange off-chip GPUs.

This is an interesting development indeed. In light of http://images.anandtech.com/doci/7677/04%20-%20Heterogeneous... I wonder if we'll soon see a rise in cheap, low-power consumption dedicated servers meant for GPU-accelerated tasks (e.g., for an image host to run accelerated ImageMagick on to resize photographs). Do you think this would be viable in terms of price/performance?

And in case you were, like me, wondering how much the new AMD CPUs improve on their predecessors' single-thread performance, you can find some benchmarks at http://www.anandtech.com/show/7677/amd-kaveri-review-a8-7600....


The comparison is hardly disingenuous: the i5 may not be given Intel's highest branding designation, but it is an enthusiast processor and only a slight step down from the top-of-the-line i7-4770k, lacking only hyperthreading.

And this is completely irrelevant, since the i5-4670k ships with Intel's highest integrated graphics option for desktop chips, which is what is being compared to the A10-7850k.

At the moment AMD's processors can't compete with Intel at the high end. It makes no sense to berate a company for not doing what it can't.

Also, AMD has chosen to fab Kaveri in a bulk process that trades off clock frequency for density (-> more chips/wafer -> cheaper).

The HSA stuff could really be something exciting if they can get the software and OS support. Winning both latest-gen consoles is bound to get some clever people to spend cycles on it.

Also the Intel K chips are crippled - they lack some virtualization technologies (VT-d) for no good reason.

I'm actually of the thought that K-series CPUs are just regular CPUs with a non-working IOMMU, so instead of throwing them out they unlock the multiplier and sell them as unlocked CPUs.

Binning is a hell of a thing.

Though from the research I did, unless you're running a Type 1 hypervisor, VT-d is not necessary (or usable). Type 2 hypervisors will use VT-x only.

What are your sources for that? VirtualBox seems to fall under "type 2", but supports PCI passthrough using VT-d on Linux hosts.

About 10 hours of crawling through forums, VMWare documentation and a couple of conversations with people about ESXi. So all of it pretty unreliable ;)

I'd love to see sources/evidence to the contrary.

An IOMMU is needed for VGA and PCI passthrough in general.

VT-d is still useful without any hypervisor, to guard against DMA attacks by malicious connected devices.

Enthusiasts aren't generally using the onboard HD 4600 graphics: given the power budget of a CPU, integrated graphics remain seriously underpowered compared to standalone dedicated GPUs drawing hundreds of watts. But for those who do want to maximize the integrated graphics with Intel, the highest option for desktop chips is the Iris Pro 5200 (on the i7-4770R, for instance), which is some 60%+ faster than the 4600.

Nonetheless, these are usually targeted at business PCs and the like (hence the "9 out of 10". Businesses consume the overwhelming majority of PCs).

This article reads like a really bad press release. For instance-

"The new chips show that AMD is moving in a very different direction from Intel"

How so? Intel is fully embracing compute, is increasingly improving the onboard integrated graphics, and already has the beginnings of unified memory (http://software.intel.com/en-us/blogs/2013/03/27/cpu-texture...). Where is the big divide?

fair point. Here's a more interesting benchmark [1]

Thus we’ve discovered and confirmed Kaveri’s biggest advantage over Richland, performance per watt. At the high-end Kaveri doesn’t have a lot to offer non-gamers but once you bring TDPs down into standard small form factor or laptop ranges the performance profile of AMD’s newest chip is a lot more competitive. At the present time Kaveri’s performance appears to be a little behind, but still near what we’ve seen from Intel’s ~45 Watt Iris Pro or GT3e graphics solution.

or [2]

It is interesting to note that at the lower resolutions the Iris Pro wins on most benchmarks, but when the resolution and complexity is turned up, especially in Sleeping Dogs, the Kaveri APUs are in the lead.

It seems that Kaveri might not beat Intel on the desktop, but might do so on the laptop.

[1] http://semiaccurate.com/2014/01/14/difference-50-watts-make-... [2] http://www.anandtech.com/show/7677/amd-kaveri-review-a8-7600...

The 45 Watt version of Kaveri isn't even out yet and is nothing more than a paper launch. The 15 Watt laptop version hasn't even paper launched yet. I really don't think that this is going to provide any advantage in the laptop space.

One nitpick: it is not possible to really get Iris Pro on the desktop. The 4770R (all the R chips, actually) are FCBGA and not sold retail.

Maybe the big brand guys sell some desktops with Iris Pro, and I know Gigabyte has it in one of their NUC alternatives, but otherwise 'enthusiasts' can't get their hands on one

> it is not possible to really get Iris Pro on the desktop

Current iMacs come with Iris Pro, and of course as you mentioned there are integrated products with it: While you can't buy it as a discrete chip at retail, you can certainly get Intel-equipped desktops with it, which was the point I was discussing.

Which is certainly by design on Intel's part, based upon an understanding of their market: they put higher-performance graphics in their mobile and FCBGA chips because those markets are where it is actually likely to be demanded -- from companies like Apple, or on mobile, where it is the primary graphics. When they sell a chip retail, it is overwhelmingly likely the buyer is going to be coupling it with a stand-alone graphics card, so there really isn't much of a point.

Which is going to be the issue that AMD is going to come up against. They are selling something as an enthusiast chip while providing graphics capabilities that lie in that no-man's land of being overpowered for a standard business desktop, but underpowered for the market that is likely paying attention.

I'm sorry that I completely ignored Apple. Although I am not sure I would consider someone buying an iMac (or any brand name prebuilt desktop system) to be an 'enthusiast.'

Why not?


* It isn't a great gaming computer

* It isn't great value for money


* It does give pretty decent performance (notably, it outperforms the new Mac Pro on some workloads)

* It looks attractive

* A prebuilt OS X system means less futzing around with drivers etc.

I don't play games (beyond the occasional Minecraft session with my son) and I'm not particularly price sensitive. I've built many (!) of my own computers, going back to a 386DX40, and I'm happy to do it again if I see a good reason. But at the moment I don't.

Desktop computers that seem attractive to me at the moment:

* Intel NUC

* iMac

What am I missing?

This came off as much more judgmental than I intended, especially towards Apple which I respect as a company and whose products I admire from a design and integration perspective. I was also not trying to belittle Apple fans or customers of any of the other big name manufacturers.

I also really like OS X. As a FreeBSD user for many years, seeing OS X be successful is even a little gratifying because I know there's a lot of cross-pollination going on behind the scenes. I'm not a mobile/laptop kind of guy, but did use a MBP for a couple of years and it was without question the nicest laptop I've ever used. If I were to buy a laptop today it would probably be a Macbook Pro.

I've been building computers from parts for 30 years. I enjoy the research, part selection and construction aspect of the process. I like that I can go into the process with a specific set of criteria and come out with something that satisfies them exactly or, barring that, that I'm in control of the compromises. I like that if these criteria change or I find I made a mistake (more likely), I can just swap out a part and continue on. This is possible with most of the name-brand PC desktops, less so with the Apple products, but I like building it all myself the most.

Also, as a FOSS user, it's typical for hardware support to be an issue. Sometimes it feels like various industries either do not care about me as a user or actively want me to suffer; constructing a modern PC that doesn't have support issues is a challenge that brings a small amount of satisfaction when overcome. I understand if people think this is silly.

So, I consider myself an enthusiast. Given this explanation, hopefully my original comment makes more sense.

You're forgetting about portable (notebook, tablet, etc.), where most personal computers are sold.

One reason I even consider buying a desktop anymore is that discrete-GPU notebooks tend to be enormous and/or awful. If AMD can produce an APU with reasonable graphics performance, I'll gladly buy a notebook with that in it rather than a new desktop rig.

This chip can't be used in a tablet and I would only consider this if I wanted a laptop to game on.

AMD already released Jaguar a few months ago, which is their GCN-based ULP line: https://en.wikipedia.org/wiki/AMD_Jaguar

Yes, I know, but it is a completely different architecture, so that's a completely different topic.

Do enthusiasts include those looking to build a Home Theater PC?

I just did this. Unless you're targeting something really high end with a lot of 4K and gaming, 4600 is more than enough to handle typical HTPC duties.

I'd certainly look at Kaveri today instead of Haswell for HTPC if I was doing it all over again though.

I've gone integrated on my last two HTPC builds.. my current one is going on 5 years old now.. and have been considering a jump... ironically it's by far the slowest computer in my apartment, and the one I use the most.

Something like this really appeals to me.. I'd been keeping an eye on the F2 line, and will probably go that way, or maybe NUC for my next HTPC, it just feels like the NUC options are just a little under powered.

The Gigabyte BRIX has an Iris Pro-equipped model if that's something you want.

In my case I wanted in-box optical and tv tuner, which eliminates the NUC stuff. There are a handful of right-sized 'htpc' mini-itx cases that enable this and a whole bunch of nice mini-itx boards now.

> AMD: You compare the highest end version of your brand-new chip with Intel's mid-level desktop processor.

I don't see why not if they have similar prices, and Intel's Core i7, while more powerful, is also more expensive.

They're making gaming their focus, it seems, and that is the go-to gamer chip hands down.

If you're looking to do a small gaming rig for under $600, this is probably the chip to get... the APUs from AMD have been really good options price wise.. and the new 45W parts are really compelling for the parents/grandparents. Also thinking this would be a good option for my next HTPC, current one is about 5yo now.

Kaveri means 'Buddy' in Finnish. I guess the CPU and graphics are buddies in this case.

Also a river in India (http://en.wikipedia.org/wiki/Kaveri).

I think this is named after the Indian river. 'Kabani' is also a river in India.

Any initial insights as to whether this new CPU/GPU combo will play any nicer with Linux than previous AMD GPUs?

Setting up Catalyst and getting my ATI Radeon cards to work properly is probably my least favorite step in setting up a Linux computer.

They updated the binary driver today [1], but it looks like open-source support is not working properly [2]:

"However, when the X.Org Server started, the screen remained black and nothing appeared ever on the display nor was anything outputted to the X.Org Server log after reporting it was using RadeonSI and initializing GLAMOR. This was with the Mesa 10.0.1 driver packages in Ubuntu 14.04. Lastly, I tried adding in Mesa from Git master (Mesa 10.1-devel) but here when launching the X.Org Server and going with GLAMOR for 2D acceleration, there was a segmentation fault."

[1]: http://www.phoronix.com/scan.php?page=news_item&px=MTU3MTE

[2]: http://www.phoronix.com/scan.php?page=article&item=amd_kaver...

Personally, I find installing the AMD driver on common Linux distros pretty easy. 1 command to build itself into a package for your distro, then 1 command to install. Reboot and you're good.

Sure, you probably can't fix the screen tearing. And their VDPAU equivalent isn't the greatest. But getting up and running? It's always been really easy.

In my personal experience (I have a ThinkPad X120e with an E-350) it used to be the case that I needed the proprietary drivers to get anything respectable. However, things have changed recently (in the last year or so) and I've been getting good performance out of the open-source drivers.

I can install the open source ATI package on Arch Linux with a single command:

pacman -S xf86-video-ati

...but it doesn't actually work. The card doesn't get used to its full potential. Which two commands are you talking about?

Open source drivers don't have full 3D support yet.

Also, you can change your power profiles and specify a few custom configs.

But typically open source drivers are still not as good as proprietary ones regarding 3D support.

Probably not. It doesn't seem very high on their priority list, and as long as they ignore Linux, Nvidia (and their CUDA) will continue to dominate them. With the big push by Steam, as well as GPU computing and Linux's dominance in supercomputing, you'd think AMD would be a little more motivated...

AMD has begun to ship drivers for SteamOS, and they have begun to pick up their game.


As someone totally unfamiliar with hardware, can someone explain what exactly I'm looking at here? What do the holes and different colored sections mean?

Teal squares on right: CPUs

Teal rectangles on right (in between squares): L2

Orange mass on left: GPU

Blueish rectangle on bottom: DDR interface (?) Possibly L3, I forget if these actually have L3.

They don't have L3 cache.

You are looking at a picture of the surface of the silicon chip with the circuits etched on it. I'm not sure what the holes are, there are too many of them to be wire pads, perhaps this design uses holes to aid in heat transfer. The colors are post-processing added to the image to highlight different functional units of the chip - I'm not sure what they all mean in this case, but on CPUs, the largest/most uniform areas are always the L2/L3 cache.

If you pick up a packaged CPU, you can't see this surface unless you rip the package apart. In a package, this surface is mounted face down on a tiny PCB which connects the wires to pads or pins for the socket that the CPU is inserted into, and also, a metal lid is slapped on top of the silicon chip (with thermal paste in between). Then a heatsink is slapped on top of the metal lid.

Holes? Do you mean the grid of dots? Those are the flip-chip equivalent of wire pads.

There are too many of them to be wire pads.

Just had an idea. Put holes in the die for gas flow, if there is a keep out region around the hole the flow will remain laminar at the edges provided the pressure balance is correct and not interfere with the chip. Cooling gas could flow directly through the chip.

Next up, microfluidic channels and on chip expansion chambers.

You're looking at a very high-level picture of the CPU. The different coloured sections represent the different logical segments of the CPU. I can't tell you specifically what each section is, and maybe someone more informed could clear that up, but among them will be: registers (memory), ALU (math), CU (control, brain), buses (transport), the GPU, and some other miscellaneous sections (IR, PC, etc.). As for the holes, I don't know what they mean; I have never seen those on a CPU die.

Unbelievable! The holes are for thermals, but how can they be so well aligned in a grid? Don't they run into the wires?

Dat silicon.

This is interesting, but my experience is that Intel's CPUs are so monumentally superior that it will take a lot more than GPU improvements to make me start buying AMD again.

Specifically I'm dealing with compile workloads here: compiling the Linux kernel on my Haswell desktop CPU is almost a 4x speedup over an AMD Bulldozer CPU I used to have. I used to think people exaggerated the difference, but they don't: Intel is really that much better. And the Haswells have really closed the price gulf.

Did you use "make -j $number_of_cores" ?

Give me some credit man. ;)

The speedup was similar but slightly less when $number_of_cores included hyperthreads (or whatever AMD calls them), as I recall. I don't have the Bulldozer machine to play with anymore, unfortunately.

Bulldozer's cores are closer to full cores than hyperthreads, especially in workloads that don't use floating point operations, like the compilations you mentioned.

I'm actually expecting Nvidia and Apple to catch up to Intel on CPU performance before AMD does, and I think it will happen within a year or two, after they switch to 16nm FinFET. This can happen, among other reasons, because Intel has stopped focusing much on increasing performance since Sandy Bridge. They mainly focus on power consumption and increasing GPU performance these days, which obviously leads to a compromise on CPU performance.

You need to appreciate the economics to realize why it won't happen: a new chip foundry for these types of processors costs something like $5 billion, and that price has only gone up over the years (since we keep wanting to put more and more mass-produced nanotechnology into these things).

Every single time you change a process in any way, millions of dollars of equipment - minimum - is being ripped out, retooled and replaced. And that's fine, because this industry is all about economies-of-scale, but it means Intel has a huge advantage: they can build more chips. As in, they can convert several fabrication lines to build chips, and simply have more out the door and on the market than their competitors, which means they can afford a price drop which other people can't - because they need to pay for the upkeep, running and loans to build those fab plants in the first place.

Intel is focusing on power and GPU because that's where the gains are to be had and what the market needed, and because they have to - current gen high-end CPUs have more thermal output density than a stove hotplate. Power use had to drop to have any hope of running higher performance into the future, and anyone hoping to compete has the exact same problems to contend with. And since new battery technology isn't happening, mobile has to find power savings on the demand side.

TSMC, which is where Nvidia and others make their chips, already has a 16nm factory running.

You probably won't see anything on their 16nm node until some time in mid to late 2015, and it should be pointed out that it is not a real node shrink but a fake one where they use the same 20nm node but have introduced FinFET. While this gets them increased energy efficiency, it does not give them the other benefit of a node shrink (increased transistor density), and so there will be fewer chips made per wafer than with a true shrink, driving up the cost.
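
To put rough numbers on the density point, here is a minimal sketch using the usual first-order dies-per-wafer estimate (wafer area over die area, minus an edge-loss term). The 300 mm wafer, the 100 mm^2 die and the assumption that a true shrink would halve die area are all made up for illustration; none of them are TSMC figures.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """First-order estimate: gross dies by area, minus dies lost at the round edge."""
    gross = math.pi * (wafer_diameter_mm / 2) ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)


# Hypothetical inputs, chosen only to show the shape of the argument.
WAFER_MM = 300.0
DIE_AREA_MM2 = 100.0

# FinFET on the same density: the die stays the same size.
same_density = dies_per_wafer(WAFER_MM, DIE_AREA_MM2)

# A true shrink, assumed here to halve the die area.
true_shrink = dies_per_wafer(WAFER_MM, DIE_AREA_MM2 * 0.5)

print(same_density, true_shrink)
```

With the wafer cost roughly fixed, the cost per chip scales with one over the die count, which is why a "shrink" that doesn't shrink the die doesn't help on price.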

If they don't shift that focus in 5-7 years they'll find themselves kicking ass at the top of the scale in a market no one can afford. Buying 20 of their competitors' chips would be cheaper, just as fast and save a boatload of cash in power, cooling and equipment design compromise.

No one will beat Intel at their own game; competitors need to keep them going sideways.

I really doubt this to be honest as everything I have seen indicates that Intel foundries are only pulling ahead even further from the competitors.

Impossible, their CPUs today are about 3-4 times slower than Haswell. That's too big a gap to make up in two or even three years.

Hey, they finally built an Amiga-on-a-chip!

I know that you're joking, but as someone working in software in the semiconductor industry, building systems on chips, I can't help but see parallels between modern SoCs and the Commodore Amiga with the "Original Chip Set".

In Amiga, you had the choice between using the CPU, the Copper or the Blitter for certain tasks, e.g. a memset or a memcpy might have been faster to do using the facilities of the chip set, which had direct access to memory.

Similarly, at work we are often making a choice which unit of the SoC (cpu, gpu, copy engine) should be used for a particular task, e.g. a big memset, an image blit or perhaps an alpha blend. Like the OCS chip set in the Amiga, all these various units on modern SoCs have direct access to physical memory.

My educated guess is that most computers will be based on some kind of SoC architectures very soon. There are some experiments being done in putting the DRAM silicon in the same package as the SoC and it may be that in the future, 99% of your computer is just a single silicon chip and the motherboard is there only to connect to peripheral devices.

I never owned an Amiga, but I thought one of its key features was its modularity? Wouldn't Amiga-on-a-chip inherently be a contradiction then?


The major difference between an Amiga and a PC (which was essentially "cpu only") was that the Amiga had a chipset with some units that were somewhat programmable.

This could be an interesting solution for a compact steambox, essentially very similar to the hardware in the ps4 & xbox one, though I wonder if the lack of memory bandwidth would hurt performance noticeably.

"AMD says Kaveri has 2.4 billion transistors, or basic building blocks of electronics, and 47 percent of them are aimed at better, high-end graphics."

This sentence would have been so much better off if they'd just punted on the weak explanation of "transistor" and left it to anyone unsure to look it up.

Old ATI chips were named Rage. Kaveri seems to be a river in India... but it would've been much cooler if it was named Kolaveri, which according to my poor translation skills must mean Rage in Indian (or one of its dialects - possibly Tamil).

And then there is the song... :)

You are right! Kaveri is a river in India and Kolaveri does mean something synonymous with Rage or 'intense passion/anger' in colloquial Tamil. Only Tamil is not a dialect of Indian. An interesting tidbit: there are hundreds of languages spoken in different parts of India.

From Wikipedia: "Individual mother tongues in India number several hundreds; the 1961 census recognized 1,652 (SIL Ethnologue lists 415). According to Census of India of 2001, 30 languages are spoken by more than a million native speakers, 122 by more than 10,000."

All that aside, I'm curious to know about the origin of this name.

Yes. Kaveri is indeed a river in southern India. AMD has used more river names like Kabini (another river in southern India), Temash, Llano, Desna etc. for their APU chipsets.

Kaveri is also Finnish for a friend.

Friend, yes, but colloquial, so a more accurate translation might be "buddy". To a Finn, it's rather amusing to see it used as a processor codename.

Aver (plural - averi) - means friend in Bulgarian too (I'm Bulgarian, but more commonly people would use "priatel" or "drugar")

The words may be etymologically related. The word kaveri has two suggested etymologies; the first is that it's originally a variation of toveri (which is more formal; also means "friend", but has a leftist connotation, like "comrade"). The latter is a fairly straightforward loan from the Russian товарищ (tovarištš), which is probably closely related to the Bulgarian aver. Interestingly, the ka- in kaveri may originate from the Swedish kamrat, quite clearly related to the English "comrade".

The other hypothesis is that kaveri is from the Yiddish חבֿר (khaver), which is a direct loan from Hebrew.

Thanks! I forgot about tovarish (and Russian was the first foreign language I learned in school).

I wish Nvidia would join HSA already, and stop having such a Not Invented Here mentality.

I wish Nvidia would join Mesa already, and stop having such a Not Invented Here mentality.

I wish Nvidia would drop Cuda and focus on openCL already, and stop having such a Not Invented Here mentality.

I wish Nvidia would use Miracast already, and stop having such a Not Invented Here mentality. (with regards to their proprietary game streaming)

I wish Nvidia would push edp (/ displayport 1.4 variable refresh) instead of their in-house proprietary gsync already, and stop having such a Not Invented Here mentality.

I wish Nvidia would standardize unencumbered physx, and stop having such a Not Invented Here mentality.

"I wish 3dfx would drop Glide and focus on OpenGL already, and stop having such a Not Invented Here mentality" - Nvidia at the end of the 90s.

I still remember their rhetoric about open standards and how they are good for consumers and that's why we should purchase their GPUs and that's why game developers shouldn't use just Glide.

Somehow I feel like I was tricked by them. As I was in my early teens back then I was somewhat naïve and thought they were serious. Oh how wrong I was.

Nice name. A majestic river in South India.. https://en.wikipedia.org/wiki/Kaveri

« The A-Series APUs are available today. »

It's nice to read a tech article about a new tech that is available now, and not in an unknown point in the future.

Are there open-source drivers or will the driver builders have to reverse engineer the thing?

AMD releases public documentation [1] and employs several full time open source driver developers.

[1] http://developer.amd.com/community/blog/2013/10/18/amd-gpu-3...

Is that documentation sufficient for a full-featured open-source driver on par with their proprietary ones?

I ask because, for the past decade or so, I've been using Intel CPUs and GPUs exclusively for their excellent Linux support. If AMD can provide the same or better level of support, I'd consider switching.

These new chips aren't supported in their Mesa driver, no. They probably will be in a few months, albeit in a buggy state where some display outs may not work and the HSA isn't in use at all.

And since it's radeonSI based, you don't have opengl past 3.1 or opencl, and you won't likely ever see the bulleted features like Mantle or TrueAudio.

Though, on the other side of the aisle, Intel just got support for opengl 3.3 in their driver, and they don't support opencl at all on their IGP parts.

The only real toss between them when comparing gpu freedom is that Intel uses wholly foss drivers while AMD ships proprietary boot firmware that they have staunchly opposed getting rid of.

Then again, AMD supports coreboot on all their chipsets and don't use proprietary signed microcode payloads on their cpus. And even in the driver space, Intel ships firmware blobs for their wireless NICs, so they aren't saints there either.

And Intel pushed uefi, which is such a colossal PITA that it makes me angry on every board I've dealt with it on. And even when Google pressures them into coreboot support on some boards only for Chromebooks, they still use firmware blobs to obfuscate the chipset anyway.

Though Intel is pushing Wayland forward, mostly for Tizen, but still they are paying a lot of Wayland devs, which is a good thing. AMD participates in kernel development, but not nearly as much as Intel. Then again, Intel is an order of magnitude larger as a company and has wiggle room on their budgeting since they dominate the industry so much with their ISA stranglehold, so I have to give AMD some credit there.

In the end neither company is "great" for open source while the other is bad. They both do good and evil in the ecosystem (unlike Nvidia, where publishing 2d documentation is supposed to be good enough). I try to support amd when I can, if I have just an "A or B without preference" choice, since they are the underdog. Also, they produce a lot more open standards - they pushed opencl, they are supporting edp for variable refresh screens, etc - whereas Intel keeps making proprietary technologies only for their stuff like smartconnect or rapid storage.

Though some recent AMD technologies like TrueAudio and Mantle haven't been open at all, so once again, it is a toss.

Depends on how good the open source drivers for GCN are. At last check they weren't great on performance but worked well enough for day-to-day tasks, while r600 worked almost perfectly.

AMD needs to die shrink their R9 chip to 20nm or less and put four of them on a single pci-e board.

They'd make a fortune.

Well, first, they need to spend the several hundred million to build a 20nm fab plant since nobody else has one.

Then they need to change quantum mechanics so they could cool 4 300w 300mm die packages on one pcb without liquid nitrogen or liquid epeen.

Sounds pretty expensive.

Even 14nm fabs exist: http://en.wikipedia.org/wiki/14_nanometer

Vendors are pretty tight-lipped about yield, though, and your other points stand as well, obviously.

Maybe encased in non-conductive oil with a waterblock-like circulation?

i think there's a good chance that board would melt. the current R9 is 275W TDP? even a process shrink won't be enough to cut it down sufficiently.

> It is also the first series of chips to use a new approach to computing dubbed the Heterogeneous System Architecture

Are these not the same sort of AMD APU chips used in the PS4, i.e. the PS4 chips already have HSA?

According to the following article, The PS4 has some form of Jaguar-based APU: http://www.extremetech.com/extreme/171375-reverse-engineered...

This is great progress, and the inevitable way we're going to head for compute heavy workloads. Once the ability to program the GPU side really becomes commonplace then the CPU starts to look a lot less important and more like a co-ordinator.

The question is, what are those compute bound workloads? I'm not persuaded that there are too many of them anymore, and the real bottleneck for some time with most problems has been I/O. This even extends to GPUs where fast memory makes a huge difference.

Lack of bandwidth has ended up being the limiting factor for every program I've written in the last 5 years, so my hope is while this is great for compute now the programming models it encourages us to adopt can help us work out the bandwidth problem further down the road.

Still, this is definitely the most exciting time in computing since the mid 80s.

> The question is, what are those compute bound workloads? I'm not persuaded that there are too many of them anymore, and the real bottleneck for some time with most problems has been I/O. This even extends to GPUs where fast memory makes a huge difference.

The bottlenecks in the problems themselves shouldn't be underestimated though. Some types of problems are intrinsically difficult or outright impossible to reformulate so as to take advantage of vectorized processing.

That being said, there are a lot of other problems which do. I'm quite enthusiastic about this.

All of Intel's recent mass market chips have had built in GPUs as well. That's not particularly revolutionary. The article itself states "9 out of 10" computers sold today have an integrated GPU. That 9 out of 10 is Intel, not AMD.

The integrated GPUs make sense from a mass market, basic user point of view. The demands are not high.

But for enthusiasts, even if the on die GPU could theoretically perform competitively with discrete GPUs (which is nonsensical if only due to thermal limits), discrete GPUs have the major advantage of being independently upgradeable.

Games are rarely limited by CPU any more once you reach a certain level. But you will continue to see improvements from upgrading your GPU, especially as the resolution of monitors is moving from 1920x1200 to 2560x1440 to 3840x2400.

I believe the revolutionary aspect of this is the programmatic access to the GPU as a co-processor.

AMD's APUs have crossfire support built in. So if/when your graphics needs get to the point where you need more oomph, you can add a discrete gpu and take advantage of both the on-die and the plug-in gpu.

Well, not exactly. AMD calls this Dual Graphics and it only works with certain graphics cards. In the case of Kaveri this is like 2 entry level R7 series cards that use DDR3 memory instead of the traditional GDDR5 for more powerful graphics cards.

This makes it almost pointless in my opinion, as you can get a discrete card that is more powerful on its own than the combination for not much more money.

> AMD now needs either a Google or Microsoft to commit to optimizing their operating system for HSA to seal the deal, as it will make software that much easier to write.

I'd say this is perfect for Android, especially since it deals with 3 architectures at once: ARM, x86, MIPS (which will probably see a small resurgence once Imagination releases its own MIPS cores and on a competitive manufacturing process), and AMD is already creating a native API for JVM, so it's probably not hard to do it for Dalvik, too. It would be nice to see support for it within a year. Maybe it would convince Nvidia to support it, too, with their unified-memory Maxwell-based chip next year, instead of trying to do their own thing.

I also don't get why AMD needs Google or MS to do anything. Do they mean getting HSA in Java / C#? Because it seems to me that getting a gpu to do HSA just requires the drivers and library infrastructure (libgl, libcl) to use it.

Do AMD even have Android drivers, or are they just using their Mesa or Catalyst one? Even then, why not just contribute HSA support to the kernel / their drivers?

Here's something that confuses me, and maybe someone with better know-how can explain this:

1: The one demo of Mantle I have seen so far[1] says they are GPU bound in their demo, even after underclocking the CPU processor.

2: Kaveri supports Mantle, but claims to be about 24% faster than Intel HD processors, which are decent, but hardly in the ballpark of the type of powerful graphics cards used in the demo.

So combining those two, aren't these two technologies trying to pull in different directions?

[1] Somewhere around the 26 minute mark: http://www.youtube.com/watch?v=QIWyf8Hyjbg

I think you may have misunderstood the purpose of Mantle and that demo. Or he may have explained it poorly in the video.

Saying "we are GPU bound even when we underclock the processor" is an attempt to illustrate how cheap Mantle makes issuing tons of instructions to the GPU. Mantle doesn't make the GPU faster, it makes submitting tasks to the GPU faster in terms of CPU-time.

I'm pretty sure his point was that an integrated gpu has no need for this because it is the limiting factor and not the CPU. This will probably not be true in the future, as CPUs are not getting much faster and instead more cores are being used, which is one of the problems Mantle is intended to help with.
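
A toy model of that argument, in case it helps. It assumes CPU submission and GPU rendering overlap, so whichever side is slower sets the frame time. The draw-call count and per-call costs are invented numbers, not measurements of Mantle or any other API.

```python
def cpu_submit_ms(draw_calls: int, cpu_us_per_call: float) -> float:
    """CPU time spent issuing draw calls for one frame, in milliseconds."""
    return draw_calls * cpu_us_per_call / 1000.0


def frame_time_ms(draw_calls: int, cpu_us_per_call: float, gpu_ms: float) -> float:
    """Assumes submission and rendering overlap: the slower side wins."""
    return max(cpu_submit_ms(draw_calls, cpu_us_per_call), gpu_ms)


def bottleneck(draw_calls: int, cpu_us_per_call: float, gpu_ms: float) -> str:
    """Name the side that limits the frame rate."""
    return "cpu" if cpu_submit_ms(draw_calls, cpu_us_per_call) > gpu_ms else "gpu"


# Hypothetical scene: 100,000 draw calls, GPU needs 16 ms to render them.
CALLS = 100_000
GPU_MS = 16.0

high_overhead = bottleneck(CALLS, 0.40, GPU_MS)            # 40 ms of CPU work
low_overhead = bottleneck(CALLS, 0.05, GPU_MS)             # 5 ms of CPU work
low_overhead_half_clock = bottleneck(CALLS, 0.10, GPU_MS)  # same API, CPU at half speed

print(high_overhead, low_overhead, low_overhead_half_clock)
```

In this model the cheap-submission case stays GPU bound even with the CPU at half speed, which is all the "underclocked" remark is claiming. The GPU term never changes.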

Even if the integrated GPU is the limiting factor, it still saves you CPU cycles to spend on other things, and has the additional bonus of simply being compatible with code that you would run on discrete GPUs. T'would be awful if games built with Mantle just couldn't run on APUs.

In particular, Mantle on an APU will give you much better control of memory allocation, usage, transfer, and caching. Which is something that having the GPU on die will excel at. Mantle is also good at decreasing the overhead of issuing new items on the command queue.

I think that is probably the case yes, if you are not using a dedicated graphics card more than likely you are gpu bound already. If you are going to use a dedicated graphics card then I would go with an Intel chip as they have a much better CPU and you could still get an AMD graphics card and use Mantle.

It's really nice to see AMD getting back into being a game changer.

The problem I see with AMD's APUs is the GPU performance, even if it's twice as fast as Intel's GPUs, both Intel & AMD's integrated GPUs are totally adequate for 2D graphics, low end gaming, and light GPU computing. Both require a discrete card for anything more demanding. IMO AMD is sacrificing too much CPU performance. Users with very basic needs will never notice the GPU is 2x faster and people with more demanding needs will be using a discrete GPU either way.

To explain the gap better an i3 dual core Intel chip for 100 dollars has more performance than these quad core chips in the CPU department.

This looks really cool. However it suffers from the same issue as their Mantle API suffers from. The actual interesting features are still just hype with no way of us accessing them.

Yeah, the HW supports them, but until the drivers are actually out (HSA drivers are supposedly out in Q2 2014) nothing fancy can be done. It'll probably be the end of 2014 before the drivers are performant and robust enough to be of actual use.

> the power consumption will range from 45 watts to 95 watts. CPU frequency ranges from 3.1 gigahertz to 4.0 gigahertz.

I was fairly dispassionate until the last paragraph. My last Athlon (2003-ish) system included fans that would emit 60dB under load. Even if I haven't gotten exactly the progress I would have wanted, I have to admit that consumer kit has come a long way in a decade.

I'm a bit slow on the uptake ... but does this remind anyone of the Cell architecture? How different are those two architectures?

Cell was neither x86 for the main cores, nor had sufficient industry standards and tooling ready (OpenCL, LLVM, OpenGL, DirectX..) for the accelerator part. AMD's new offering is fully intended for mass market, while Cell was a strange mixture of HPC architecture and PS3 processor. I'd say AMD has a significantly higher chance for success, these new chips should be pretty much a no brainer for mid-end media/ gaming PCs. If they can scale it down to a much lower TDP, it could also become interesting for 'Surface Pro class' (if you can call that a class) tablets.

iirc, Cell architecture was FPUs with a stack swapper. I.e. using scheduling to maximize use of limited FPUs on a large amount of memory.

The main innovation here is a tightly coupled cpu and gpu memory space. Prior to this, CPU and GPU mem were separate. To get data into the GPU (and out), it had to be shoved under the door via DMA. To maximize this, some Cell-like scheduling could be used....

But now, CPU and GPU share the same memory. e.g. malloc(1<<20) and either GPU or CPU can work on that buffer! No crazy scheduling needed. Also, the cache can be configured so that both CPU and GPU share it.
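
A rough analogy in plain Python, since the real thing needs HSA hardware and drivers. This is not HSA code: gpu_kernel is a made-up stand-in for work the GPU would do, a bytearray copy plays the part of the old DMA upload/download, and a memoryview plays the part of the shared address space.

```python
def gpu_kernel(data) -> None:
    """Hypothetical stand-in for GPU work: bump the first byte of the buffer."""
    data[0] = (data[0] + 1) % 256


# The 1 MiB buffer from the comment above, i.e. malloc(1 << 20).
buf = bytearray(1 << 20)

# Old model: copy to "device memory", run the kernel, copy the result back.
device_copy = bytearray(buf)      # upload: duplicates the whole buffer
gpu_kernel(device_copy)
seen_before_download = buf[0]     # the CPU-side buffer has not changed yet
buf[:] = device_copy              # download: second full copy
seen_after_download = buf[0]

# Shared model: the "GPU" gets a view of the very same bytes, no copies.
shared = memoryview(buf)
gpu_kernel(shared)
seen_through_shared = buf[0]      # the write is visible to the CPU side at once

print(seen_before_download, seen_after_download, seen_through_shared)
```

The two full-buffer copies in the first half are the "shoved under the door" cost; the second half has none.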

About as different as the standard CPU/GPU combo is from Cell.

This is literally a CPU, with a GPU, with a cache coherent memory bus.

The Cell, on the other hand, is a PowerPC unit, mated with a few Vector FPU units, with no direct memory access. Then a separate MMU is set up that's able to handle DMA copies for everything.

The wheel of reincarnation [1] keeps spinning. I hardly see anything revolutionary behind the barrage of hype produced by AMD's marketing department.

[1] http://www.catb.org/jargon/html/W/wheel-of-reincarnation.htm...

"Kaveri" is the name of one of the major rivers in India. The project must have involved (or been headed by) an Indian guy.


I suppose the graphics cards could only have been headed by a Pacific Islander then?

Will the APU and graphics card cooperate to form a multi-GPU with a single output? It sounds as if it could create a more effective gaming platform than a CPU and GPU combo.



Of course it has all the issues of multi-GPU setup, so ymmv

Hope to see AMD back in its glory days since the Athlon XP

So we finally get to see what HSA can bring to the table.

Is there support at the OS level for that? Something that rewrites existing binaries on the fly and parallelises where possible? Is it even possible?

There is no OS level support, and there is nothing that rewrites existing libraries but I am hoping that programming languages themselves (Ruby, Python) get optimizations built in. An example would be hash lookups in Ruby: Why couldn't the GPU do this for us in certain use cases? You could see large performance increases for all apps written for the language with no code changes needed for thousands of developers.

> An example would be hash lookups in Ruby

You mean for a hash table? I don't think you'll be seeing that any time soon. Hash computation will almost certainly be faster on the primary cpu than just the scheduling and waiting overhead. And then the GPU isn't particularly good at any pointer chasing required for the rest of the lookup.
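
A back-of-the-envelope version of that, as a sketch. Offloading only wins once the batch is big enough to amortise the fixed dispatch cost. All three timings below are hypothetical placeholders (nanoseconds), not measurements of Kaveri or any real runtime.

```python
from typing import Optional


def offload_pays_off(n_items: int, cpu_ns_per_item: int,
                     gpu_ns_per_item: int, dispatch_overhead_ns: int) -> bool:
    """True if dispatching n_items to the GPU beats doing them on the CPU."""
    cpu_total = n_items * cpu_ns_per_item
    gpu_total = dispatch_overhead_ns + n_items * gpu_ns_per_item
    return gpu_total < cpu_total


def break_even_items(cpu_ns_per_item: int, gpu_ns_per_item: int,
                     dispatch_overhead_ns: int) -> Optional[int]:
    """Smallest batch size at which the GPU is strictly faster, or None if never."""
    saving_per_item = cpu_ns_per_item - gpu_ns_per_item
    if saving_per_item <= 0:
        return None
    return dispatch_overhead_ns // saving_per_item + 1


# Hypothetical costs: 50 ns per lookup on the CPU, 5 ns amortised on the GPU,
# and 10,000 ns to schedule the work and wait for the result.
CPU_NS, GPU_NS, OVERHEAD_NS = 50, 5, 10_000

single_lookup = offload_pays_off(1, CPU_NS, GPU_NS, OVERHEAD_NS)
threshold = break_even_items(CPU_NS, GPU_NS, OVERHEAD_NS)

print(single_lookup, threshold)
```

A single hash lookup is nowhere near the threshold under these assumptions, and the model doesn't even charge the GPU extra for pointer chasing.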

Scheduling and waiting overhead? What waiting?

I assumed that in most cases when you deal with a hash table you want some data returned. That's perhaps not true in the case of adding to the hash table, but if you don't need the result you can just add it to a queue, and do it in any old thread, since it's clearly not performance sensitive.

No. From how I understand HSA to work, you would be able to throw raw memory at the gpu, but you'd still need to write the code to tell what memory the gpu needs to look at. I'm not certain how much work is involved in redesigning your app for this -- the dev tools AMD has available are from November, and I don't think they'll have all the Kaveri stuff in them yet.

I wonder how well they can be used for mining scrypt.

Want to buy, now! Can someone give me a hand at choosing a motherboard or something that allows using about 4 to 8 of these APUs?

So the next computing revolution is based on more power hungry chips for gamers?

How do I get one?

Holy cow, I am not used to products launching so close to the announcement.

I mean, so close to release announcement. I swear sometimes I hear, "Such and such a product has been released!" and then you can't buy it for 6 months.
