I would not be surprised if they made a Thumb32 to match Thumb. It's a rather useful mode for certain kinds of programming, from what I've heard from coders I know on ARM platforms.
My prediction: once this architecture is in the release channels, Intel is out of them.
In a matter of years, Intel will be the dinosaur of microprocessors, unless Intel also (re-)embraces ARM. It's simply a matter of price and configurability. ARM is simply unbeatable when it comes to price, let alone energy efficiency. It's time to short Intel shares.
ARM is unbeatable in terms of price right now, but that's because they have much smaller parts that are much cheaper to produce. What makes you think that will still be true if they want to approach the performance of mainstream Intel parts?
Historically, low-end parts have managed to kill high-end parts by selling in larger volumes and amortizing their fixed costs over a larger base. ARM sells in huge volumes, but how is selling $1 embedded chips going to help them amortize the cost of their high end stuff? Decades ago, doing that sort of thing would have helped Intel fill up their idle fab time while competitors like DEC struggled to pay for their fabs, but ARM has never owned its own fabs, and selling completely different low-end parts doesn't really help amortize the cost of masks, post-silicon debug, etc. of their high end parts.
I'm also skeptical that ARM will retain its performance:power advantage if it tries to scale up performance to match low-end mainstream Intel parts (e.g., core i3). The main reason ARM has better performance:power is that there are diminishing returns to performance: each extra increment of speed costs disproportionately more power. Another is that Atom is a new product line, and Intel doesn't have much experience trying to optimize for low-end parts. That won't be true going forward. Yet another transient advantage ARM has is that Intel uses a high performance process, but they've said that they're going to have both a high performance and a power optimized process in the future. If anything, Intel's process wizardry should give them an advantage there, once they decide to push it. What other inherent advantages does ARM have over x86? It's a PITA to decode x86 instructions, but the ARM instruction set isn't very nice, either (e.g., look at how they ran out of opcode space, and overlaid some of their "new" NEON instructions on top of existing instructions by using unused condition codes for existing opcodes).
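To make the "unused condition codes" remark concrete, here is a minimal sketch of pulling the condition field out of a 32-bit ARM instruction word. The condition field is the top four bits; 0b1110 means "always", and 0b1111 is the value that was later reused as extra opcode space. The function names and the third sample word are made up for illustration, and this is not a real decoder.

```python
# Sketch only: extracts the 4-bit condition field from a 32-bit ARM
# (A32) instruction word. Names here are hypothetical, not from any library.

COND_ALWAYS = 0b1110         # "AL": execute unconditionally
COND_REPURPOSED = 0b1111     # reused as additional opcode space

def cond_field(word):
    """Return bits [31:28] of a 32-bit instruction word."""
    return (word >> 28) & 0xF

def in_repurposed_space(word):
    """True if the word sits in the cond == 0b1111 encoding space."""
    return cond_field(word) == COND_REPURPOSED

samples = [
    0xE1A00000,  # MOV r0, r0 with cond = AL
    0x01A00000,  # same operation with cond = EQ (0b0000)
    0xF2000000,  # arbitrary word with cond == 0b1111 (illustrative only)
]

for w in samples:
    print(f"{w:#010x}: cond={cond_field(w):04b} "
          f"repurposed={in_repurposed_space(w)}")
```

The point of the complaint is visible in the last sample: those instructions give up conditional execution because the field that would hold the condition is being used to select the opcode space instead.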
I wouldn't bet against ARM but, if they win, it will be because of the usual reasons: better marketing, better business practices, and better engineering, not because there's some inherent advantage that lets them be cheaper than Intel. And, by the way, the configurability advantage you mention actually hurts them in terms of cost, by increasing the fixed cost per part.
The flipside in terms of selling in large volumes to amortize fixed costs is simple: the iPad. Monumental volume tied to one single SKU of CPU.
If they needed to, Apple could afford to front someone the entire cost of building a leading-edge fab (or, I suppose, build one for themselves, except that I doubt they would want to operate a fab).
So if Apple wanted to make some super advanced ARM chip available at a consumer price, they could probably change the volume aspect of the equation radically.
However, all modern x86 cores (P2 and up, PPro, K7 and up, etc) are micro-op RISC-ey cores, and the only x86-ness in the entire CPU is in the instruction decoder and scheduler block.
Atom is interesting only in the sense that Intel stripped everything they could off the P6 core and then redesigned stuff purely to use less power. In the end, they produced a horridly slow waste of time, money, and power.
About 6 months later AMD came out with a cut down K8 (while the desktop chips were second generation K10s) that has the same power usage, runs several times faster, supports dual channel DDR3-1333 (vs the Atom's single channel DDR2-533), and comes with a GPU that can decode H264 off Blurays (Atom's paired GPU, a low end GMA, can't decode anything) and can do DX11 (vs DX9) and can do Aero (vs nada).
Intel is coming out with newer Atoms that suck less, but they're nothing compared to what the new Bulldozer-based mobile chips are rumored to be.
Unfortunately, it looks like it all was a fluke. I haven't heard of any notebooks with Atom Z6xx series, presumably because Windows won't run on them because they don't have a PCI bus. With the architecture consisting of a single model (Vaio P-Series) the possibility of running Linux is vanishing: there was a special build[sic!] of Ubuntu that should run on it, but that's it.
This is mostly myopia, based on the technology you see every day and think is important. CPU cores aren't remotely all of Intel's business, just as ARM Ltd. gets no revenue from semiconductor manufacturing. Intel has hands-down the best process technology out there, and gets insane margins on their parts (average of $2/mm^2 or something like that). They aren't going anywhere. The PC market is stable, not shrinking. The server world is booming.
Intel clearly wants to be a player in the consumer SoC market. And they've had... let's say rather mixed success there. But they'll keep trying. Similar challenges were seen in the early 90's vs. the RISC world and '99-02 years vs. AMD. They reworked, adapted, and clobbered the competition.
As someone else posted, their death has been predicted so many times over the years that it's almost a running gag in the industry.
Lynnfield is the 45nm Nehalem quad-core, kind of old to make a comparison. Intel doesn't sell anything as big as 300mm^2 in their current line except for $2k monsters like Westmere EX.
I honestly don't know where I got that number from. I was thinking something like $200 average for the 2-core Sandy Bridge die, which is 90mm^2 I think. I'm obviously not privy to sales volume numbers nor to what their wholesale prices look like. So feel free to apply a factor of N to that intuition.
And obviously I meant revenue, not margin. Apologies.
But then you have things like the Core i7-980, which is also about 300 mm^2 but retails for up to $1200. I don't know what the average is, but $1-$2/mm^2 is in the right ballpark at least.
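For what it's worth, the back-of-envelope numbers in this subthread can be checked in a few lines. This is only a sketch using the figures guessed above ($200 for a ~90 mm^2 die, $1200 retail for a ~300 mm^2 die); the posters themselves flag them as rough, and retail price overstates what the manufacturer actually receives.

```python
# Back-of-envelope price per mm^2 from figures quoted in this thread.
# All inputs are the posters' rough guesses, not measured data.

def dollars_per_mm2(price_usd, die_area_mm2):
    """Return price divided by die area, in $/mm^2."""
    return price_usd / die_area_mm2

# (label, price in USD, die area in mm^2), as guessed upthread
parts = [
    ("2-core Sandy Bridge (guessed average price)", 200, 90),
    ("Core i7-980 (top retail price)", 1200, 300),
]

for label, price, area in parts:
    print(f"{label}: ${dollars_per_mm2(price, area):.2f}/mm^2")
```

Both results come out above $2/mm^2, so after discounting from retail to wholesale, a range of $1-$2/mm^2 is at least plausible as an order of magnitude.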
Proof that it is myopia: Those cell phones everyone loves, that are bringing the future? They depend on the cloud, the other darling herald of the future. What drives the cloud?
Yes, in exactly the same way that Intel is "making" SoC chips for consumer electronic devices. Having a part that boots and runs isn't the same thing as having a competitive product in the market.
Didn't we hear this with mips, sparc, ppc, transmeta and every other cpu architecture of the moment? Yes, ARM has a place, but it's not going to displace x86. What will slowly displace x86 is the transition to post-pc devices. But these will run whatever has the current sweet spot between cost and power.
The most important reason x86's are relevant today (and why we are stuck with them) is because they run Windows. ARM doesn't need that and, even if it did, Microsoft needs ARM enough they are promising Windows 8 and Office for it.
>The most important reason x86's are relevant today (and why we are stuck with them) is because
they had better price/performance at the start and just plain better performance a bit later. Alpha ran Windows and was a nice CPU. Unfortunately it did _cost_ .
Can ARM run Java for enterprise users? Nehalems with DDR3 have a hard time doing it.
There is of course the market for energy efficiency that comes with ARM's straight in-order execution, and if we multiply these simple cores, with programmers and compilers doing better work on parallelization, we can even get decent performance, yet I doubt it will beat out-of-order superscalars.
On the other side it isn't [directly] about price, performance or efficiency because it isn't direct competition. x86 runs desktops, which were the explosively growing computing segment of yesterday. ARM runs mobile, the explosively growing computing segment of today. The arguments of big iron CPUs vs. x86 back then were similar to the arguments of desktop/server x86 vs. ARM today. These arguments are technically valid, yet they just don't matter today, like they didn't matter yesterday, as they talk about different computing devices and the tasks they do. A mobile device's primary task isn't Excel. It is maps, photos and Facebook/Twitter. This mobile computing looks primitive the way Visicalc looked primitive to the computing of big iron. Yet it is the largest piece of tomorrow's computing, and ARM is just more suitable for it than x86, the way x86 was more suitable than Alpha/Sparc/Cray/etc. for desktop.
ARM can run Java. In fact, they used to have a (sadly undocumented) instruction set extension called Jazelle, which accelerated common things JVMs do (such as exception handling, stack management, and bytecode decoding/execution).
When using a JVM that is based on ARM's JVM (which does NOT include Android's), a lot of bytecodes directly translated into existing ops and Java ran much faster.
Due to the lack of documentation, open source JVMs (or even non-Java VMs) could not take advantage of it.
However, Jazelle has been replaced with ThumbEE, which does the same thing, but generically so any language that could benefit from it can use it, and it is also documented and FOSS friendly.
Java (->Jazelle RCT, Java over ThumbEE), Perl, and Python have VMs that take advantage of ThumbEE now.
I don't think ARM can outdo Intel given Intel's tech and advanced fab capability, but I think business model wise Intel is in serious trouble with this announcement.
Intel has done its best to lock companies out of adding value to Intel's chips. Intel forced Nvidia into a more limited role and even limits the usage of Atoms. Intel also is failing at keeping the wattage of its chips down.
Companies are getting very good at adding their own custom designs to the ARM core (something Intel is kinda / sorta conflicted about). With a 64-bit core, some of Intel's safe space might go away as companies go with a solution where they can decide on fab partners and on-chip design elements. Don't want this type of port? Then don't put it on. Need more GPU? Add it.
The game really isn't technology, it's business model.
Is there any backstory on why this is only happening now? The address space situation is already borderline in some areas of ARM usage, and cell phones are about to break 1GB of physical RAM (if some haven't already?).
Seems like this is a couple years overdue to have anything close to a comfortable margin.
1. Cell phones may cross over the 2 GB boundary soon,
but ARM can already comfortably address terabytes
of RAM through segments, as mbell notes. At worst,
we're limiting individual applications to under
2 GB of RAM--not the end of the world.
2. While being able to address 64 bits at once is useful
for some computing tasks, *most* of those (scientific
computing being a biggie, but also some parts of video
editing or sound mixing) just aren't ones people
currently do on a cell phone. They likely will
eventually, but the hold-up there is as much about
getting the data onto the phone in the first place as
actually doing anything significant with it.
3. Adding 64-bit wide paths adds a *lot* of silicon for
something that would currently not get used at all,
and in the future won't get used much for quite some
time. More silicon means a shorter battery life and
more heat--not exactly what you want in a cell phone.
Look at how much effort ARM's going to in big.LITTLE
just to let them go superscalar, let alone 64-bit.
4. Finally, the one case where you might get good
benefit--games--is better served through GPUs and
ARM's existing NEON instructions, which already work
on, I believe, 128-bit-wide values.
So those are all the reasons why I don't see the lack of 64-bit ARMs right now as a real problem. And, indeed, if ARM were only getting used in cell phones, I wouldn't honestly see a pressing need to go to 64-bit chips, period.
But ARM wants to move into servers, and there, the situation's obviously very different. 64-bits is a no-brainer for any moderately sized database, and that's where I suspect ARM really needs this ISA. But those will take a few years to build and gain acceptance, anyway; they've got time.
> Adding 64-bit wide paths adds a lot of silicon for
something that would currently not get used at all,
and in the future won't get used much for quite some
time.
I can't find the quote, but John Mashey said that adding 64 bit support to MIPS processors cost something like 5% area. Datapaths and architectural registers really aren't that expensive in area or power.
It does take more power to move 64 bits across the pins, and 64 bit addresses take more space. I suspect that those factors are far more important wrt battery and cell phone cost than the processor costs.
Although not without subtle problems, the hardware was generally
straightforward, and not that expensive—the first commercial
64-bit micro’s 64-bit data path added at most 5% to the chip
area, and this fraction dropped rapidly in later chips. Most
chips used the same general approach of widening 32-bit
registers to 64 bits. Software solutions were much more complex,
involving arguments about 64/32-bit C, the nature of existing
software, competition/cooperation among vendors, official
standards, and influential but totally unofficial ad hoc groups.
I think it's talking about the MIPS R4000. It's nice to mention registers, but what about the increase in size of the comparators, adders, multipliers/dividers (ouch), cache system, bus interface, decoder, etc.? Perhaps the point is that being 64-bit compatible does not take a lot of surface, even though being fully 64-bit would be expensive. The article mentions how being forward-compatible can be a good thing (even if it costs a little performance), so I suppose it would make sense to support 64-bit right now and make it fast later, without breaking ISA compatibility.
Edit: reading about the MIPS R4000, it's definitely not a fully 64-bits processor. Some Wiki quotes:
The shifter is a 32-bit barrel shifter. It performs 64-bit shifts in two cycles, stalling the pipeline as a result. This design was chosen to save die area. The multiplier and divider are not pipelined and have significant latencies: multiplies have a 10- or 20-cycle latency for 32-bit or 64-bit integers, respectively; whereas divides have a 69- or 133-cycle latency for 32-bit or 64-bit integers, respectively. Most instructions have a single cycle latency. The ALU adder is also used for calculating virtual addresses for loads, stores and branches.
The memory management unit (MMU) uses a 48-entry translation lookaside buffer to translate virtual addresses. The R4000 uses a 64-bit virtual address, but only implements 40 of the 64-bits for 1 TB of virtual memory. The remaining bits are checked to ensure that they contain zero. The R4000 uses a 36-bit physical address, thus is able to address 64 GB of physical memory.
(By the way, thanks for the quotation, that was quite an interesting read. I didn't expect forward-compatibility to be so cheap.)
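The address-width figures in the quote above are easy to sanity-check. A minimal sketch that converts a number of address bits into addressable bytes follows; the helper names are made up for illustration.

```python
# Convert address widths (in bits) to addressable memory sizes.
# Helper names are hypothetical; the bit widths come from the thread.

def addressable_bytes(bits):
    """Number of distinct byte addresses for a given address width."""
    return 1 << bits

def human(n):
    """Format a power-of-two byte count using binary units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    i = 0
    while n >= 1024 and i < len(units) - 1:
        n //= 1024
        i += 1
    return f"{n} {units[i]}"

for label, bits in [
    ("32-bit address", 32),
    ("R4000 physical (36 bits)", 36),
    ("R4000 virtual (40 bits)", 40),
    ("48-bit address", 48),
    ("full 64-bit address", 64),
]:
    print(f"{label}: {human(addressable_bytes(bits))}")
```

This reproduces the 1 TB (40-bit) and 64 GB (36-bit) figures from the quote, and shows how far short of a full 64-bit space those implemented widths are.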
64 bit operations don't significantly affect the decoder (which isn't a big deal anyway). The option to pull 64 bits out of the cache at a time isn't a big deal given the ability to pull 8, 16, and 32.
> the MIPS R4000, it's definitely not a fully 64-bits processor
Sure it is. It does 64 bit arithmetic with a single instruction. Yes, instructions for different lengths may take different amounts of time, but so what?
I note that various 586/686 implementations have had faster operations on shorter data types (and in at least one case, had slower operations on shorter datatypes). (I'd argue that the first 386 in 32 bit mode was a 32 bit processor, but surely the pentium-class machines are.)
> only implements 40 of the 64-bits for 1 TB of virtual memory
No current x86-64 implementation implements more than 48 bits for either virtual or physical addresses.
Legal alpha implementations could implement as few as 43 bits.
Care to name an implementation of any processor that has 64 virtual and or physical address bits?
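As an illustration of how the unimplemented upper bits are policed: the quoted R4000 text says the remaining bits must be zero, while x86-64 requires them to be copies of the highest implemented bit ("canonical" addresses). A rough sketch of the x86-64 style check, assuming 48 implemented virtual address bits; the function name is made up.

```python
# Sketch of an x86-64 style canonical-address check.
# Assumes 48 implemented virtual address bits; the name is hypothetical.

VA_BITS = 48

def is_canonical(addr, va_bits=VA_BITS):
    """True if bits 63..(va_bits-1) of a 64-bit address are all 0 or all 1."""
    top = (addr & 0xFFFFFFFFFFFFFFFF) >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones

print(is_canonical(0x00007FFFFFFFFFFF))  # top of the lower half
print(is_canonical(0xFFFF800000000000))  # bottom of the upper half
print(is_canonical(0x0000800000000000))  # in the hole between the halves
```

Either way, pointers are 64 bits wide in registers and memory while the hardware only translates the low 40 to 48 of them, which is the point being argued here.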
It is such a shame that ARM has taken over MIPS's place in the embedded world, and that SGI crashed and killed it in the server space. MIPS is a much nicer architecture than ARM or PPC (and never mind SPARC or, god forbid, IA64).
Well, even 32bit x86 can address more than 4GB memory, but at least there you have to use horrible hacks like PAE. When the native data size is smaller than the address space, taking advantage of the full address space and working with large data sets becomes a hassle.
Oh? How does it handle this? Something similar to PAE? To PSE? Explicitly using cooperative coprocessors? A completely different scheme?
Also, isn't it relevant to process-addressable (virtual) memory? Barring explicit APIs such as AWE. As far as I know, ARM registers are 32b so a process should not be able to load data from an address higher than the 4GB mark (in its virtual memory), am I wrong?
Two separate issues: memory per process (which is limited to 32 bits of address space) and total physical address space available to the MMU. It is similar to PAE; in fact, ARM calls it LPAE.
No, it can do glorified bank-switching. That's not an increase in address space, it's a hack that assumes applications will still keep their memory usage to no more than 2-3GB.
It's not bank switching, it's the MMU mapping the 32-bit virtual address space of each process to a much larger physical address space. You are right that each process is still limited to 32 bits of virtual address space, but bank switching has nothing to do with it.
Having extra hardware to help speed it up doesn't stop it from being bank switching. That is its essential nature, it doesn't matter how heavy the lipstick is.
By that same logic, all of paged memory management is bank switching. I might also point out that current x86 CPUs map 64-bit addresses into 43-bit physical addresses using pages. Is that reverse-bank switching?
It's purely coincidental if your CPU was designed to map N-bit virtual-space pointers into N-bit physical-space pointers. It's much more common, both now and historically, to map from N-bit to M-bit. There is no hack here.
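A toy model may help show what "map N-bit virtual to M-bit physical" means in practice. This is a sketch only: the class, the 4 KiB page size and the 40-bit physical width are illustrative assumptions, and a real MMU walks multi-level tables in hardware rather than a dictionary.

```python
# Toy model of paged translation from a 32-bit virtual address space
# to a wider physical address space. All names and sizes are assumptions.

PAGE_SHIFT = 12                      # 4 KiB pages (illustrative)
PAGE_MASK = (1 << PAGE_SHIFT) - 1
VA_BITS = 32                         # per-process virtual address width
PA_BITS = 40                         # assumed wider physical width

class ToyMMU:
    """One instance stands in for one process's page table."""

    def __init__(self):
        self.page_table = {}         # virtual page number -> physical frame number

    def map(self, vpn, pfn):
        if vpn >= 1 << (VA_BITS - PAGE_SHIFT):
            raise ValueError("virtual page number out of range")
        if pfn >= 1 << (PA_BITS - PAGE_SHIFT):
            raise ValueError("physical frame number out of range")
        self.page_table[vpn] = pfn

    def translate(self, vaddr):
        if vaddr >> VA_BITS:
            raise ValueError("virtual address wider than 32 bits")
        vpn = vaddr >> PAGE_SHIFT
        pfn = self.page_table[vpn]   # a KeyError here models a page fault
        return (pfn << PAGE_SHIFT) | (vaddr & PAGE_MASK)

# Two processes use the same virtual address but land in different
# physical frames, both above the 4 GiB mark.
proc_a, proc_b = ToyMMU(), ToyMMU()
proc_a.map(0x10, 0x123456)
proc_b.map(0x10, 0x200000)
print(hex(proc_a.translate(0x10ABC)))
print(hex(proc_b.translate(0x10ABC)))
```

Each process still sees at most 4 GiB at a time, which is the limit the other side of this argument is pointing at; the extra physical bits only help the system as a whole.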
It's worth mentioning also that 64-bit x86 CPUs still use PAE; in fact they have to, since it's mandatory in long mode ("real 64 bit mode"). Additionally, the version of PAE used in 64 bit mode is an extension of the original 32 bit PAE, adding an extra directory layer to support up to 52 bits of physical address space, although I don't think any processor has implemented more than 48 bits of address space so far.
I wonder how these two new modes will relate to existing modes like Thumb2?