I wonder how these two new modes will relate to existing modes like Thumb2?
The answer is that there is only one 64-bit ISA, but all current 32-bit modes are still supported.
Historically, low-end parts have managed to kill high-end parts by selling in larger volumes and amortizing their fixed costs over a larger base. ARM sells in huge volumes, but how is selling $1 embedded chips going to help them amortize the cost of their high-end stuff? Decades ago, doing that sort of thing would have helped Intel fill up its idle fab time while competitors like DEC struggled to pay for their fabs, but ARM has never owned its own fabs, and selling completely different low-end parts doesn't really help amortize the cost of masks, post-silicon debug, etc. for its high-end parts.
I'm also skeptical that ARM will retain its power:performance advantage if it tries to scale up performance to match low-end mainstream Intel parts (e.g., the Core i3). The main reason ARM has better performance:power is that there are diminishing returns to performance. Another is that Atom is a new product line, and Intel doesn't have much experience trying to optimize for low-end parts. That won't be true going forward. Yet another transient advantage ARM has is that Intel uses a high-performance process, but they've said that they're going to have both a high-performance and a power-optimized process in the future. If anything, Intel's process wizardry should give them an advantage there, once they decide to push it. What other inherent advantages does ARM have over x86? It's a PITA to decode x86 instructions, but the ARM instruction set isn't very nice, either (e.g., look at how they ran out of opcode space and overlaid some of their "new" NEON instructions on top of existing instructions by using unused condition codes for existing opcodes).
I wouldn't bet against ARM but, if they win, it will be because of the usual reasons: better marketing, better business practices, and better engineering, not because there's some inherent advantage that lets them be cheaper than Intel. And, by the way, the configurability advantage you mention actually hurts them in terms of cost, by increasing the fixed cost per part.
If they needed to, Apple could afford to front someone the entire cost of building a leading-edge fab (or, I suppose, build one for themselves, except that I doubt they would want to operate a fab).
So if Apple wanted to make some super-advanced ARM chip available at a consumer price, they could probably change the volume aspect of the equation radically.
However, all modern x86 cores (PPro/P2 and up, K7 and up, etc.) are micro-op RISC-ish cores, and the only x86-ness in the entire CPU is in the instruction decoder and scheduler block.
Atom is interesting only in the sense that Intel stripped everything they could off the P6 core and then redesigned stuff purely to use less power. In the end, they produced a horridly slow waste of time, money, and power.
About 6 months later, AMD came out with a cut-down K8 (while their desktop chips were second-generation K10s) that has the same power usage, runs several times faster, supports dual-channel DDR3-1333 (vs. the Atom's single-channel DDR2-533), and comes with a GPU that can decode H.264 off Blu-rays (the Atom's paired GPU, a low-end GMA, can't decode anything), can do DX11 (vs. DX9), and can do Aero (vs. nada).
Intel is coming out with newer Atoms that suck less, but they're nothing compared to what the new Bulldozer-based mobile chips are rumored to be.
Unfortunately, it looks like it was all a fluke. I haven't heard of any notebooks with the Atom Z6xx series, presumably because Windows won't run on them since they don't have a PCI bus. With only a single model (the Vaio P-Series) using the architecture, the possibility of running Linux is vanishing: there was a special build [sic!] of Ubuntu that was supposed to run on it, but that's it.
Intel clearly wants to be a player in the consumer SoC market. And they've had... let's say rather mixed success there. But they'll keep trying. They faced similar challenges in the early '90s against the RISC world and in '99-'02 against AMD. They reworked, adapted, and clobbered the competition.
As someone else posted, their death has been predicted so many times over the years that it's almost a running gag in the industry.
A mid-level Lynnfield CPU of about 300 mm^2 would mean a margin of $600, but those processors typically sell for $200-$500.
I honestly don't know where I got that number from. I was thinking something like $200 average for the 2-core Sandy Bridge die, which is 90mm^2 I think. I'm obviously not privy to sales volume numbers nor to what their wholesale prices look like. So feel free to apply a factor of N to that intuition.
And obviously I meant revenue, not margin. Apologies.
They had better price/performance at the start, and just plain better performance a bit later. Alpha ran Windows and was a nice CPU. Unfortunately, it did _cost_.
Can ARM run Java for enterprise users? Nehalems with DDR3 have a hard time doing it.
There is, of course, a market for the energy efficiency that comes with ARM's straight in-order execution, and if we multiply these simple cores, with programmers and compilers doing better work on parallelization, we can even get decent performance. Still, I doubt it will beat out-of-order superscalars.
On the other side, it isn't [directly] about price, performance, or efficiency, because it isn't direct competition. x86 runs desktops, which were the explosively growing computing segment of yesterday. ARM runs mobile, the explosively growing computing segment of today. The arguments of big-iron CPUs vs. x86 back then were similar to the arguments of desktop/server x86 vs. ARM today. These arguments are technically valid, yet they just don't matter today, like they didn't matter yesterday, because they're about different computing devices and the tasks they do. The mobile device's primary task isn't Excel. It is maps, photos, and Facebook/Twitter. This mobile computing looks primitive the way VisiCalc looked primitive to big-iron computing. Yet it is the largest piece of tomorrow's computing, and ARM is just more suitable for it than x86, the way x86 was more suitable than Alpha/SPARC/Cray/etc. for the desktop.
When using a JVM based on ARM's JVM (which does NOT include Android's), a lot of bytecodes translated directly into existing ops, and Java ran much faster.
Due to the lack of documentation, open source JVMs (or even non-Java VMs) could not take advantage of it.
However, Jazelle has been replaced with ThumbEE, which does the same thing, but generically so any language that could benefit from it can use it, and it is also documented and FOSS friendly.
Java (->Jazelle RCT, Java over ThumbEE), Perl, and Python have VMs that take advantage of ThumbEE now.
That's really cool!
Intel has done its best to lock companies out of adding value to Intel's chips. Intel forced Nvidia into a more limited role and even limits the usage of Atoms. Intel is also failing to keep the wattage of its chips down.
Companies are getting very good at adding their own custom designs to the ARM core (something Intel is kinda/sorta conflicted about). With a 64-bit core, some of Intel's safe space might go away as companies opt for a solution where they can choose their fab partners and their on-chip design elements. Don't want a particular type of port? Then don't put it on. Need more GPU? Add it.
The game really isn't technology, it's business model.
Perhaps Version 4 of the Raspberry Pi will use it!
Seems like this is a couple years overdue to have anything close to a comfortable margin.
1. Cell phones may cross over the 2 GB boundary soon, but ARM can already comfortably address terabytes of RAM through segments, as mbell notes. At worst, we're limiting individual applications to under 2 GB of RAM--not the end of the world.
2. While being able to address 64 bits at once is useful for some computing tasks, *most* of those (scientific computing being a biggie, but also some parts of video editing or sound mixing) just aren't ones people currently do on a cell phone. They likely will eventually, but the hold-up there is as much about getting the data onto the phone in the first place as actually doing anything significant with it.
3. Adding 64-bit-wide paths adds a *lot* of silicon for something that would currently not get used at all, and in the future won't get used much for quite some time. More silicon means a shorter battery life and more heat--not exactly what you want in a cell phone. Look at how much effort ARM's going to with big.LITTLE just to let them go superscalar, let alone 64-bit.
4. Finally, the one case where you might get a good benefit--games--is better served by GPUs and ARM's existing NEON instructions, which already work on, I believe, 128-bit-wide values.
But ARM wants to move into servers, and there, the situation's obviously very different. 64-bits is a no-brainer for any moderately sized database, and that's where I suspect ARM really needs this ISA. But those will take a few years to build and gain acceptance, anyway; they've got time.
I can't find the quote, but John Mashey said that adding 64 bit support to MIPS processors cost something like 5% area. Datapaths and architectural registers really aren't that expensive in area or power.
It does take more power to move 64 bits across the pins, and 64-bit addresses take more space. I suspect those factors matter far more for battery life and cell phone cost than the processor does.
> Although not without subtle problems, the hardware was generally straightforward, and not that expensive—the first commercial 64-bit micro's 64-bit data path added at most 5% to the chip area, and this fraction dropped rapidly in later chips. Most chips used the same general approach of widening 32-bit registers to 64 bits. Software solutions were much more complex, involving arguments about 64/32-bit C, the nature of existing software, competition/cooperation among vendors, official standards, and influential but totally unofficial ad hoc groups.
I think it's talking about the MIPS R4000. It's nice to mention registers, but what about the increase in size of the comparators, adders, multipliers/dividers (ouch), cache system, bus interface, decoder, etc.? Perhaps the point is that being 64-bit compatible does not take a lot of die area, even though being fully 64-bit would be expensive. The article mentions how being forward-compatible can be a good thing (even if it costs a little performance), so I suppose it would make sense to support 64-bit right now and make it fast later, without breaking ISA compatibility.
Edit: reading about the MIPS R4000, it's definitely not a fully 64-bit processor. Some quotes from Wikipedia:
The shifter is a 32-bit barrel shifter. It performs 64-bit shifts in two cycles, stalling the pipeline as a result. This design was chosen to save die area. The multiplier and divider are not pipelined and have significant latencies: multiplies have a 10- or 20-cycle latency for 32-bit or 64-bit integers, respectively, whereas divides have a 69- or 133-cycle latency for 32-bit or 64-bit integers, respectively. Most instructions have a single-cycle latency. The ALU adder is also used for calculating virtual addresses for loads, stores, and branches.
The memory management unit (MMU) uses a 48-entry translation lookaside buffer to translate virtual addresses. The R4000 uses a 64-bit virtual address, but only implements 40 of the 64-bits for 1 TB of virtual memory. The remaining bits are checked to ensure that they contain zero. The R4000 uses a 36-bit physical address, thus is able to address 64 GB of physical memory.
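The quoted address-space figures are easy to sanity-check; here's the arithmetic in Python (the function name is mine, for illustration only):

```python
# Quick check of the R4000 figures quoted above: 40 implemented
# virtual-address bits and a 36-bit physical address.

def addressable_bytes(bits: int) -> int:
    """Bytes reachable with an address of the given width."""
    return 1 << bits

TiB = 1 << 40
GiB = 1 << 30

print(addressable_bytes(40) // TiB)  # 1  -> 1 TB of virtual memory
print(addressable_bytes(36) // GiB)  # 64 -> 64 GB of physical memory
```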
(By the way, thanks for the quotation, that was quite an interesting read. I didn't expect forward-compatibility to be so cheap.)
64 bit operations don't significantly affect the decoder (which isn't a big deal anyway). The option to pull 64 bits out of the cache at a time isn't a big deal given the ability to pull 8, 16, and 32.
> the MIPS R4000, it's definitely not a fully 64-bit processor
Sure it is. It does 64 bit arithmetic with a single instruction. Yes, instructions for different lengths may take different amounts of time, but so what?
I note that various 586/686 implementations have had faster operations on shorter data types (and in at least one case, had slower operations on shorter datatypes). (I'd argue that the first 386 in 32 bit mode was a 32 bit processor, but surely the pentium-class machines are.)
> only implements 40 of the 64-bits for 1 TB of virtual memory
No current x86-64 implementation implements more than 48 bits for either virtual or physical addresses.
Legal alpha implementations could implement as few as 43 bits.
Care to name an implementation of any processor that has 64 virtual and/or physical address bits?
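For context on the 48-bit point above: x86-64 enforces the limit through "canonical" addresses, where the unimplemented upper bits must all be copies of the top implemented bit. A toy check in Python (the function name is my own; real hardware simply faults on a non-canonical access):

```python
def is_canonical(addr: int, bits: int = 48) -> bool:
    """True if `addr` is canonical for `bits` implemented address bits:
    bits 63..bits-1 must all be equal (all zeros or all ones)."""
    top = addr >> (bits - 1)                 # bit 47 and everything above it
    return top == 0 or top == (1 << (64 - bits + 1)) - 1

print(is_canonical(0x00007FFFFFFFFFFF))      # True  (top of the lower half)
print(is_canonical(0xFFFF800000000000))      # True  (bottom of the upper half)
print(is_canonical(0x0000800000000000))      # False (falls in the hole)
```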
The Cortex A15 can already address up to 1TB of memory.
Also, isn't the relevant limit process-addressable (virtual) memory, barring explicit APIs such as AWE? As far as I know, ARM registers are 32-bit, so a process shouldn't be able to load data from an address above the 4 GB mark in its virtual address space. Am I wrong?
By that same logic, all of paged memory management is bank switching. I might also point out that current x86 CPUs map 64-bit addresses into 43-bit physical addresses using pages. Is that reverse-bank switching?
It's purely coincidental if your CPU was designed to map N-bit virtual-space pointers into N-bit physical-space pointers. It's much more common, both now and historically, to map from N-bit to M-bit. There is no hack here.