Obviously Intel (once led by Andy Grove, author of "Only the Paranoid Survive") is aware of the threat posed by ARM. If someone could explain how Intel will fail to meet the challenges of getting x86's performance-per-watt to match ARM's, and how this compares to the challenges ARM vendors face in order to get raw performance up to Intel's level, I would love to read it. However, this post offers little such insight.
The other big technical issue is the end of Dennard scaling. For most of the last three decades, scaling CMOS processes bought you three things: more transistors, higher frequency and lower power. Things are different now. We can't really scale frequency any more because we've run into the power wall. We used to get lower power at the same frequency by scaling the supply voltage, but this also required us to scale the threshold voltage (a device parameter). Unfortunately we can't scale the threshold voltage willy-nilly like in the past because leakage power increases for lower threshold voltages and is now a significant contributor to total power. We still get more transistors per unit area, but it's not clear whether the economic costs of building up new fabs and switching to a new process are offset by the benefits of having more transistors to play with.
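The arithmetic behind this is easy to sketch. Here is a toy Python illustration (normalized units and an illustrative scale factor only, not real process data) of why classic Dennard scaling kept power density flat, and why it breaks once the supply voltage can no longer follow the feature size down:

```python
# Toy illustration of Dennard scaling (illustrative numbers, not real process data).
# Per-transistor dynamic power: P = C * V^2 * f

def scaled(C, V, f, k, scale_voltage=True):
    """Apply one process shrink by linear factor k (~1.4 per node)."""
    C2 = C / k                       # capacitance shrinks with dimensions
    V2 = V / k if scale_voltage else V
    f2 = f * k                       # gate delay shrinks, so frequency rises
    return C2, V2, f2

C, V, f = 1.0, 1.0, 1.0  # normalized starting point
k = 1.4

# Classic Dennard scaling: supply (and threshold) voltage scale with dimensions.
C2, V2, f2 = scaled(C, V, f, k)
# Power per transistor: (1/k) * (1/k)^2 * k = 1/k^2, while transistor
# density grows by k^2 -- so power density stays constant.
print(C2 * V2**2 * f2)   # ~= 1/k^2 ~= 0.51

# Post-Dennard: Vt can't scale (leakage), so Vdd is stuck too.
C3, V3, f3 = scaled(C, V, f, k, scale_voltage=False)
# Power per transistor: (1/k) * 1 * k = 1 -- no improvement per device,
# and power density rises by k^2 per node: the "power wall".
print(C3 * V3**2 * f3)   # ~= 1.0
```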
The bottom line is that it's not clear whether Intel's biggest competitive advantage, having a manufacturing process superior to everyone else's, is still that much of an advantage.
PS. One thing I find truly amazing is that Dennard predicted that we'd run into all these problems back in his landmark paper in 1974!
Exactly. It's also not obvious that the mobile chip market will pay a premium for Intel-calibre fabs. If it won't, then the question becomes whether a TSMC-made Atom is better than a TSMC-made ARM.
It's also fairly common to have custom hardware added to SoCs. Is Intel prepared to open up their processes to that sort of thing?
Unfortunately we can't scale the threshold voltage willy-nilly like in the past because leakage power increases for lower threshold voltages and is now a significant contributor to total power.
This one cuts both ways though. With leakage dominating active power, within a given node, the fabrication process will be relatively more important than microarchitecture, which is a point in Intel's favour.
This is kind of nitpicking, but I'm not sure leakage will ever dominate active power. We still have the ability to reduce leakage if we want; we just have to give up frequency for it. In the past we didn't have to make this trade-off, but even now I don't think it ever makes sense to run your chip so fast that leakage exceeds dynamic power.
I do agree that at any given node, Intel is still going to be ahead of the rest. It'll be interesting to see how much this helps them.
Active power is the one that's related to the frequency (P ~= CV^2f). Leakage power will "leak" even if the transistor is not switching.
Maybe you're referring to some papers that came out a few years ago suggesting that leakage power will dominate total power. As I said above, this is unlikely to happen. It doesn't make sense to operate at a combination of supply voltage (Vdd) and threshold voltage (Vt) where leakage dominates total power. I think these papers overlooked the fact that threshold voltage, and hence leakage itself, is a knob that the device manufacturing folks can control.
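For anyone following along, here is a toy Python sketch of the distinction (the constants are purely illustrative, not from any real process): dynamic power tracks the clock, while subthreshold leakage flows whether or not anything switches:

```python
import math

# Illustrative-only constants; real values depend on process and design.
C   = 1e-9       # switched capacitance (F)
Vdd = 1.0        # supply voltage (V)
I0  = 1e-3       # leakage pre-factor (A)
Vt  = 0.3        # threshold voltage (V)
vT  = 0.026      # thermal voltage kT/q at room temperature (V)
n   = 1.5        # subthreshold slope factor

def dynamic_power(f):
    # P_dyn ~= C * Vdd^2 * f: scales linearly with clock frequency.
    return C * Vdd**2 * f

def leakage_power():
    # Subthreshold leakage ~ exp(-Vt / (n * vT)): present even when idle,
    # completely independent of frequency.
    return Vdd * I0 * math.exp(-Vt / (n * vT))

for f in (0.0, 1e9, 2e9):
    print(f, dynamic_power(f), leakage_power())
# dynamic_power(0) is zero, but leakage_power() is the same on every line.
```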
Active power is the one that's related to the frequency (P ~= CV^2f). Leakage power will "leak" even if the transistor is not switching.
If you're implying that leakage power doesn't affect frequency, you are wrong. Transistor speed depends on the gate overdrive which, for modern velocity-saturated devices, is proportional to Vdd - Vt. Leakage power itself is proportional to exp(-Vt). There is a clear trade-off here between how fast you run your chip and how much it will leak.
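A toy model of that trade-off (illustrative constants only; real curves depend on the device) makes the knob concrete: lowering Vt buys a modest increase in gate overdrive but an exponential increase in leakage:

```python
import math

# Toy model of the speed/leakage trade-off (illustrative, not a real device).
# Speed ~ gate overdrive (Vdd - Vt); subthreshold leakage ~ exp(-Vt / (n * vT)).
Vdd = 1.0     # supply voltage (V)
n   = 1.5     # subthreshold slope factor
vT  = 0.026   # thermal voltage kT/q at room temperature (V)

def relative_speed(Vt):
    # Velocity-saturated delay ~ 1 / (Vdd - Vt), so speed tracks the overdrive.
    return Vdd - Vt

def relative_leakage(Vt):
    return math.exp(-Vt / (n * vT))

for Vt in (0.4, 0.3, 0.2):
    print(Vt, relative_speed(Vt), relative_leakage(Vt))
# In this toy model, each 100 mV cut in Vt buys roughly 15% more speed
# but roughly 13x more leakage -- hence the trade-off.
```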
Threshold voltage is not really an effective knob, unless you treat feature size as a knob and go against Moore's law, or assume that a brand-new, once-in-ten-years process innovation is something designers can pick out of a hat. I don't think anyone's clamoring for a return to 130nm parts on a smartphone. At each new process node, you're going to lose out on the amount of control you have over Vth.
This is basically what Intel did with the tri-gate transistors, which gives them a longer lease on life until they bump against subthreshold leakage. TSMC is on their first generation of high-k metal gates, and is still a process node or two away from joining the tri-gate party.
Threshold voltage is not really an effective knob
Why is it not an effective knob? Most modern designs include sleep transistors in an attempt not to leak when a circuit is inactive. These would not work unless we could engineer high-Vt transistors.
So far, Intel has shown that they aren't very good at simultaneously developing two parallel product lines of CPUs. Their tick/tock strategy of alternating process shrinks and microarchitecture updates has been working great for years, but Atom has clearly been neglected. Prior to that, they had the P4 NetBurst architecture and the P6-based Pentium Ms on the market at the same time, but NetBurst hit a wall and the company lost a lot of ground to the Athlon 64 before they could come up with a high-performance successor to the Pentium M.
You realize that Intel's EM64T is effectively AMD64, right? Intel, which loathed that cross-patent deal, is now reaping AMD's rewards. Core 2 and later processors are all using AMD's intellectual property (legally).
I am, however, a Software Engineer. I know that most of the perceived lag on a modern desktop is not due to the CPU, but to inefficient I/O to the hard disk or network. One must only look at the iPad 2 to see that it's very possible to make a fast computer with beautiful 60 FPS animations and snappy applications using only an 800 MHz dual-core ARM CPU. Ironically, my iPad feels way faster than my MacBook Pro most of the time.
You don't need to be an expert in the microprocessor industry to know that the CPU performance race is over. It's all about power consumption now, and x86 fails miserably at low-power computing. Unless you know something I don't.
x86 currently doesn't scale down to the level required for smartphones.
However, it is getting close in the tablet space. Estimates for the Tegra 3 are around 3-4W TDP, while the Cedar Trail Atoms are around 5.5W TDP. In early 2012 Intel will release their Medfield Atom chips, which will make the competition even more interesting.
Based upon history, reports of x86's death have been greatly exaggerated - since the late 1980s.
Here's a nice 1999 article from Ars: [http://arstechnica.com/cpu/4q99/risc-cisc/rvc-1.html]
and the archive.org version for those without IE4 or Netscape Navigator: [http://web.archive.org/web/19991129051550/http://arstechnica...]
Floating-point operations are an example of the hurdle faced by RISC processors such as ARM - RISC ideology suggests that dedicated FPU hardware and instructions should not be used, despite the performance hit that software implementations incur.
On the other hand, the x86 CISC approach has allowed for increased integration based on changing market demands over the past 20 years (e.g. FPU integration with the 80486 in 1989 and MMX in 1996 on the Pentium).
That sort of flexibility has advantages.
edit: My apologies, I couldn't access the cited Ars Technica article.
Most perceived lag on a modern desktop comes from excessive abstraction, which results in poor coding practices. You could certainly argue that I/O bottlenecks or a lack of system resources have an impact, but that impact won't be felt until the environment is somewhat saturated. A simple solution to the hard drive bottleneck is to throw in a SATA3 SSD, or to give a system more RAM to boost disk caching; problem solved. On the other hand, no amount of system resources will alleviate a performance hit caused by shoddy coding. This is the reason I refuse to use Google Docs: the performance is about as good as WordPerfect on Windows 95, because of all the abstraction insanity.
One must only look at the iPad 2 to see that it's very possible to make a fast computer with beautiful 60 FPS animations and snappy applications using only an 800 MHz dual-core ARM CPU.
The iPad 2 is about as powerful as my Pentium 4 was back in the early 2000s. Shrinking it down to that level is certainly an accomplishment, but it isn't worth the shock and awe you present it with. It's nice to have a device like the iPad 2 to fill the time when you wish you had a computer, but it is in no way a full desktop substitute.
Ironically, my iPad feels way faster than my MacBook Pro most of the time.
Your MacBook is a fundamentally different device than your iPad. They may feel similar but this is purely superficial, the underlying operations are vastly different. If your MacBook is that sluggish, it's either because you're using an Apple product or you've got a PEBKAC error.
You don't need to be an expert in the microprocessor industry to know that the CPU performance race is over
Yes you do. The CPU performance race has been over for the past 5 years, but not for the reason you think. The CPU performance race is over because AMD choked and threw in the towel. In 2007 AMD's flagship Phenom processor was bested by Intel's then worst-in-class Core 2 Quad Q6600 in almost every benchmark (if not all of them). In 2011 AMD's flagship octal-core Bulldozer processor was beaten by Intel's worst-in-class quad-core i7 920 from two years earlier, which also had the added handicap of only having 2 of its 3 memory channels loaded with DIMMs. Don't blame AMD's failures on the market, or on Intel; blame them on AMD.
The fact that the CPU performance race is over doesn't mean that Intel has won, it merely means that Intel is the only competitor since AMD is effectively now a non-contender. It also doesn't mean that there is room in the desktop market for ARM CPUs, or that desktop hardware manufacturers are suddenly going to start writing drivers for two completely different architectures.
While it is certainly true that ARM is gaining on Intel in the performance space, it is still a long, long way behind, and that gap is only going to get harder to close as time goes on. This is going to be doubly difficult when ARM manufacturers try to catch up to Intel in the general-purpose execution department. It's easy enough to say that ARM has a lead in performance per watt if you ignore all of the special hardware capabilities that Intel CPUs have which are mostly absent on ARM, or if you forget that power consumption scales quadratically with voltage and that voltage is necessary to maintain a higher frequency.
It's all about power consumption now, and x86 fails miserably at low-power computing. Unless you know something I don't.
I do know something you don't. Architectures aren't designed to scale infinitely in both directions on the power scale, yet Intel still manages to ship full-featured dual-core processors in the 17-watt range that will destroy any dual- or quad-core ARM processor put up against them. Also, I'm not sure how you can justify the statement "it's all about power consumption", because for 95% of the desktop market heat is a non-issue, whereas a lack of performance certainly is. If you live in a datacenter, the constant whine of fans and AC units can certainly get annoying, but as I mentioned above, there are already low-power solutions that can be had without reinventing the wheel.
This feels like throwing the baby out with the bathwater.
This is worthless without actual numbers, which I doubt you have. Hardware people blame software, software people blame hardware, as it has always been, so mote it be, amen.
Here is John Carmack talking about his troubles with the lack of PC performance due to the multitude of APIs needed to reach the hardware:
John Carmack: ... That's really been driven home by this past project by working at a very low level of the hardware on consoles and comparing that to these PCs that are true orders of magnitude more powerful than the PS3 or something, but struggle in many cases to keep up the same minimum latency. They have tons of bandwidth, they can render at many more multi-samples, multiple megapixels per screen, but to be able to go through the cycle and get feedback... “fence here, update this here, and draw them there...” it struggles to get that done in 16ms, and that is frustrating.
Later in the article John expands on the thick software problem.
The article is here:
Let's say it is possible. That would mean current systems are about ten thousand times bigger than they could be. That's 4 orders of magnitude. And even if it isn't 4 full orders of magnitude, I'm willing to bet on 3.
It is not yet about raw speed, or latency. But when a system is at least 3 orders of magnitude bigger than it could be, it does mean that something there is vastly suboptimal. And runtime performance could very well be part of that "something".
But that's kind of a straw man. Even if you convince me that feature creep really is valuable, lack of features explains but 1 order of magnitude out of 4. There's still 3 to go. I have two explanations for those.
First, they reuse their code. A lot. When they write a compiler, all phases (parsing, AST to intermediate language, optimizations, code generation) are done with the same tool (augmented Parsing Expression Grammars; search for the OMeta language for more details). When they draw something on the screen, be it a window frame, a drawing, or text, they again use a single piece of code. Mere factorization goes a long way. I'd say it explains about 1 order of magnitude as well.
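For readers who haven't seen PEGs, here is a hypothetical minimal sketch of the combinator idea in Python (OMeta itself is far richer than this): because a PEG just matches sequences, the same few combinators can be pointed at characters, token streams, or AST nodes, which is the reuse being described:

```python
# Minimal parsing-expression-grammar combinators (a hypothetical sketch of
# the "one tool for every phase" idea; OMeta itself is much richer).

def char(c):
    # Match a single item equal to c.
    def parse(s, i):
        return (c, i + 1) if i < len(s) and s[i] == c else None
    return parse

def seq(*ps):
    # Match each sub-parser in order, collecting results.
    def parse(s, i):
        out = []
        for p in ps:
            r = p(s, i)
            if r is None:
                return None
            v, i = r
            out.append(v)
        return (out, i)
    return parse

def alt(*ps):
    # PEG ordered choice: try alternatives, first match wins.
    def parse(s, i):
        for p in ps:
            r = p(s, i)
            if r is not None:
                return r
        return None
    return parse

# The same combinators work on any sequence: characters (lexing),
# tokens (parsing), or AST nodes (transformation passes).
ab_or_ac = alt(seq(char('a'), char('b')), seq(char('a'), char('c')))
print(ab_or_ac("ac", 0))   # (['a', 'c'], 2)
```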
To sum up, we could argue that current systems are about 4 orders of magnitude too big. Of the 4, 1 may be debatable (lots of features). Another (not reusing and factorizing code) is obviously something that has Gone Wrong™ (I mean, it could have been avoided if we cared about it). The remaining 2 (DSLs) are a Silver Bullet. Not enough to kill the Complexity Werewolf, but it sure makes it much less frightening. By the way, we should note that the idea of DSLs has been around for quite some time. Not using them so far may count as something that has Gone Wrong as well, though I'm not sure.
Adding an SSD seemed to make no difference, but with luck, the software will get some love and speed up.
That being said, in terms of 'customer-facing' CPUs being shipped that are programmable with applications from multiple third parties, ARM chips in 'smart' phones and tablets are taking up a bigger chunk of the pie than any previous instruction set architecture (ISA). That includes both PowerPC (Apple products) and Motorola's 68K architecture (Sun and Apple products).
However, what Andrew misses out on completely is the distinction between systems and processors and the effect that has on adoption rate. The 'secret weapon' that guards the x86 ISA from death, like the charm on Harry Potter's head, was put there by IBM in 1981.
In 1981 IBM shipped its first "Personal Computer", and because doing so was new to IBM and they expected mostly hobbyists to buy them, the 'hardware information' manual came with schematics, a BIOS listing, and details of where all the various chips were addressed and how those chips worked. Then, as its popularity soared, it was 'cloned' (and this is very important) right down to the register level and with identical BIOS code. The parts were available from non-IBM sources, and there was really nothing preventing an engineer from doing it except the off chance that IBM would sue them for something.
As it turned out, they did sue for copyright violation on the BIOS code, but that was really all they could do; the schematic could be copyrighted, but implementations of the schematic were not. Once someone had implemented a BIOS in a 'clean room', and the legitimacy of that BIOS was successfully litigated, the door was opened and the 'PC' business was born. The key here, however, was that every single one of them was register and peripheral compatible.
Another event happened at this time which helped seal the charm. Microsoft started selling MS-DOS, which was software compatible (which is to say, had the same APIs) with PC-DOS but could run on hardware that was not register compatible. Intel made a high-integration chip, the 80186, which you could think of as an ancestor of today's system-on-a-chip (SoC) ARM parts. It ran MS-DOS, but because its registers and peripherals were slightly different (better, engineering-wise, but different), programs that ran on PCs would not run on it if they talked to, say, the interrupt controller or the keyboard processor. Thus the term 'well-behaved' programs was born, and they were few and far between. On the other side, Microsoft Flight Simulator, which talked almost exclusively to the bare metal in order to get any sort of performance at all, became the barometer of 'clone'-ness. The question "Can it run Flight Simulator?" was a buyer discriminator, and if the answer was 'no' then sales were disappointing.
Those two events cemented for almost two decades the definition of what it meant to be a 'PC'.
Into those decades billions of person-hours were invested in software and tools and programs and features. Microsoft and Intel regularly got together with OEMs and chip makers and system builders to define all of the details, the same details that were originally in the PC hardware manual, that everyone would agree constituted a "PC". These became known as the "PC-98" standard (for PCs built after 1998) or the "PC-2000" standard. Things like power supplies, keyboards, board form factors and slot configurations all became sub-processes within that ecosystem and followed the lead of this over-arching standard. Obscure stuff like the thread pitch of the screws that sealed the cabinet; not-so-obscure stuff like the dimensions of the cut-out for built-in peripheral ports. And during all that time the basic registers, the boot sequence, what BIOS provided, and the set of things that could be counted on to exist so that you could boot far enough to discover the new stuff all remained constant.
ARM doesn't have any of that. ARM, as an ISA, is controlled by a company that doesn't build chips, doesn't sell systems using those chips, and is not affected by 'stupid' choices in their architecture. All of that is offloaded to the 'ARM licensees.' And since anyone can license an ARM core, they do. And that means you have ARM chips in FPGAs, ARM chips from embedded processor manufacturers, and ARM chips from video graphics companies. They are all different. Worse, they all boot differently, they all have different capabilities, they don't talk to a standard graphics configuration, they don't have a standard I/O configuration, they don't have a place where USB ports are expected to appear, or a standard way of asking 'what device is booting me, and can I ask it for data?' Quite simply, there is no standard ARM system.
And because they don't have a standard system, there isn't any leverage. It's like running a race in lead shoes: possible, but very tiring.
Now some folks, and Andrew here is clearly one of them, think the system problem is solved by 'Android.' They believe that because software developers can write to Android APIs and have their code run on all Android machines, they are done. Except that getting Android to run on an ARM system is painful. And worse, the 'high volume' Android systems have features in different places (where the accelerometer is, how the graphics work, whether it can do 2D acceleration or not). There is no Android 'PC' which gets to define all the detail bits and thus free manufacturers from the grip of having to hire expensive software types to figure this out.
In the end I agree with Stephen's comment that "If someone could explain how Intel will fail to meet the challenges of getting x86's performance-per-watt to match ARM's...." points at a red herring, since Intel has literally years of runway to do that. Meanwhile, ARM platforms are dying (PlayBook, anyone?) because the cost to make them pushes them beyond what the market will bear (and yes, the iPad/iPhone are keeping a lid on what you can charge for one of these things).
Now ARM could come up with a spec for the 'ARM System Standard' and license/certify that. That has some possibility if someone like Google made sure that the Android kernel always ran on the 'reference design' standard. But that level of strategic thinking has been very hard to co-ordinate to date.
As you say, Google is in a similar position, so perhaps a Microsoft/Google jointly supported standard makes a certain amount of sense, as odd as that sounds...
Look, I still think microkernels are better than monolithic kernels, but you don't see me claiming Linux is doomed just because the L4 microkernel is running on 300 million mobile phones worldwide (http://en.wikipedia.org/wiki/Open_Kernel_Labs).
Monolithic kernels aren't going anywhere, and neither is x86.
ARM doesn't have to beat Intel in raw performance. They just need to make them irrelevant to most people. And they are succeeding.
In classic disruptive-innovation fashion, Intel will continue to move up-market, into servers and supercomputers, where it will be more profitable for them and where ARM won't be able to reach them (for now). This will become obvious when ARM-based Windows 8 and OS X computers become available at the end of next year or in 2013. As soon as that happens, laptops will start becoming a low-margin business for Intel.
As a "normal" person, why would you get a $1000 ultrabook if a similar-looking Transformer-like device is available for $500 and has almost the same "perceived" performance? Dual-core and quad-core 2.5 GHz Cortex-A15 chips will show that that kind of performance is enough for most people.
Personally, I don't want a Transformer device. The main reason is that it runs Android and has few connectivity options. It cannot do anything CLOSE to what I do on the laptop.
I own some tablets (!) and I rarely use them. I like to test stuff on them and sometimes browse the web or the like, but they're not very useful.
I like the phone better (it's smaller!) and the light laptop better (it does everything without compromises! And I'm talking about using Word, the web, IM, etc., not coding. Heck, IM plus web plus copy-pasting around in Android is such a pain. Not even talking about getting a proper video out, or copying files to a USB stick.)
At the high end, look at the http://top500.org/. #1 is based on SPARC VIIIfx. #10 is based on the PowerXCell 8i. #2 and #4 both derive much of their power from GPUs. Even that understates the situation, because many of the most powerful computers next year - Blue Waters, Mira, Sequoia - will also be based on non-x86 architectures. Then look at what Tilera or Adapteva are doing with many-core, what Convey is doing with FPGAs, what everyone is doing with GPUs. Intel is going to be a minority in the top ten soon, and what happens in HPC tends to filter down to servers.
So Intel has already lost mobile and HPC. Even if Intel keeps all of the desktop market, what percentage of the laptop and server markets could they afford to lose before they follow AMD? Maybe it will happen, maybe it won't, but anybody who can see beyond the "Windows and its imitators" segment of the market would recognize that as a realistic possibility.
Is this conventional wisdom? How does a petaflop race affect app servers and databases? It seems like most traditional server workloads could get by without a single FPU. The only thing they have in common is I/O. Are there many data centers using InfiniBand? (Maybe there are; I don't know.)
The Cell architecture is an evolutionary dead end. SPARC is no more of a threat to x86 now than before. GPUs may be the next big thing for HPC, but they've got a long way to go to get out of their niche in the server market. (That niche being... face detection for photo-sharing sites? Black-Scholes? Help me out here.)
I mean, I agree with your overall point, but I think it's more likely that ARM will steal all the data center work before anything from the HPC world does. They are too focused on LINPACK.
I'm not quite sure it's valid to write off SPARC as an architectural dead end when the current fastest computer in the world uses it, and the next crop of US competitors for that crown are all based on the Cell/BlueGene lineage. GPUs are also more broadly applicable than you might think. Besides video and audio processing, they can be used for many crypto-related tasks (witness their popularity for Bitcoin mining), various kinds of math relevant to data storage (e.g. erasure codes or hashes for dedup), and so on. Many of their architectural features are also being copied by more general-purpose processors as core counts increase.
Yes, high-end HPC is too obsessed with LINPACK. Nonetheless, it remains a good place to look when trying to predict the future of commodity servers. Even if ARM does displace x86 instead, many features besides the ISA are likely to come from HPC. Perhaps more relevantly, either outcome is still very bad for Intel.
Additionally, Intel still has plenty of time to get up to pace in the mobile market. The tablet market is as yet largely untapped, especially globally. I wouldn't be surprised if next-gen Atom processors made their way into leading-edge tablets in the next few years, for example.
Generally speaking: forecasts that require Intel to roll over and take a massive beating while billions upon billions of business leaks away to its competitors don't tend to pan out in reality. The only way that works is if Intel goes bankrupt the instant a competitor comes on the scene, and that's just fantasy.
I've seen this claim often, yet I could not find any sources that could back up this claim. Can you post a link to an article or some research that compares performance/watt (as well as actual power usage) between ARM and x86? I'm genuinely interested in this.
At what point did "consumer" start meaning "low end" or "handheld"? The 27" quad-core i7 iMac is a consumer computer. Gaming PCs are consumer computers.
A quad-core machine with a million gigs of RAM for email and a web browser?
Sure there are LOTS of good reasons for having legitimate CPU power, but a lot of times any random Ghz level processor is going to provide plenty of responsiveness for daily tasks. The only thing I can think of that people typically do that is processor intensive is HD playback, and that is easily accelerated nowadays.
It's not always about absolute performance, it's about "good enough" performance. If ARM is going to supply good enough performance with the additional benefits of being cheaper and more portable, then why NOT use it?
This isn't about ARM versus Intel. This is about having adequately powered portables.
Intel is losing the low-end CPU market. That much is true. But the low-end CPU market is the new middle-end CPU market. I think we are going to see an age where more and more people have "low-end" portables as their main computers. The barrier between low-end, middle-end, and high-end has shifted significantly I think. A few years ago, we all had uses for high-end computers. Nowadays, what would be considered high-end is a waste for most people.
Also, we can't forget the impact of the cloud on this. We don't need a lot of computing power locally now. For many of the types of applications that one would need high cpu for, the cloud potentially provides those solutions for us.
I, for one, don't see myself trading in my desktop at work anytime soon. But I do see myself using my laptop a lot more than my desktop at home. My couch is a lot more comfortable than my computer chair.
I hear this sentiment a lot, and in general I agree, but the fact is the app with the largest RAM footprint on my laptop is Firefox. Given the proliferation of web-based apps, I don't see the complexity of web browsers going down. We can always use more power.
What people fail to see is that "just a browser" is a completely idiotic and misinformed statement.
Browsers are probably among the _most_ complex and powerful apps on the system.
Browsers are basically running entire applications, virtualized, in a sandbox per tab!
Heck, some websites are just not viewable on mobile right now (unfortunately) because mobile just isn't nearly fast enough. Think WebGL, for example. Few mobile browsers support it, and when they do, it's pretty slow if the author didn't make a super-low polygon and texture count version...
And that's the real problem for Intel. Mobile computing power is improving at an incredible rate -- probably faster than anyone could have predicted -- and soon enough it will reach a level where ARM and Intel are actually competing at the same level. We're not there yet, but it's close. At that point we'll see if Intel has what it takes to stay in the game.
While this may be true, it is not necessarily so. Just pull out your old 4GHz P4 and see how it stacks up against your modern 3GHz desktop.
The point being that microarchitecture has come a long way for x86, and even if ARM matches frequency, it will not necessarily come out of the gate capable of matching that.
Sure, web browsers nowadays do much more than render HTML. The browser is actually among the most complex software packages your computer runs, and for a heavy web user, the large amounts of memory it uses do not go to waste.
Also, note that the savings/benefits can come in ways other than more battery life, e.g. price, a fanless and sealed design, etc.
Without 64-bit support from its competitors, Intel doesn't have much to fear, especially in the datacenter space, where performance per watt is a powerful selling point.
Flagged for no meaningful content and inflammatory title.