
ARM Goes 64-bit - enos_feedler
http://www.realworldtech.com/arm64
======
ajross
It's interesting to see a very x86_64-like attempt to shake off the weirdness
of the ancestral architecture here. The PC is no longer an addressable
register. Thumb has been dropped. The predication bits are no more. The weird
register aliasing thing done by NEON is gone too. The register banking (and it
seems most of the interrupt architecture) is entirely different.

And, just like in the Intel world, market pressures have introduced all new
CISC quirks: AES and SHA256 instructions, for example.

But of course an architecture document does not a circuit make. All the
weirdness (old and new) needs to be supported for compatibility (OK, maybe
they can drop Jazelle), so the fact that they no longer talk about some things
doesn't really save them any transistors in practice.

Honestly, this is sounding _more_ like an Intel or AMD part, not less.

~~~
pjscott
Is having hardware acceleration for AES and SHA256 really a "CISC quirk", or
just a really specialized set of arithmetic instructions? The classic RISC
idea of making the core simple and fast doesn't really apply here; internally,
it's all simple micro-operations driving special-purpose hardware. It seems
similar to having a fused multiply-accumulate operation: they've figured out
how to accelerate the core of a common task, and this is the API they've
decided to give it.

~~~
simcop2387
That's actually an almost reasonable definition of CISC for laypeople. Take a
look at the definition on Wikipedia.

    
    
      A complex instruction set computer (CISC, /ˈsɪsk/)
      is a computer where single instructions can execute several 
      low-level operations (such as a load from memory, an arithmetic 
      operation, and a memory store) and/or are capable of multi-step
      operations or addressing modes within single instructions. 
    

Now the reality is that on the whole things are not quite so cut and dried. In
this case they're most likely doing it to give access to dedicated hardware
for power gains, which is why something that's typically close to RISC would
add something like that. As time has gone on, both CISC and RISC systems have
moved toward a blend of the two in order to get the best of both worlds. From
what I've heard, most x86 chips actually work like a RISC chip internally;
they just translate between the two in the instruction decoder.

~~~
anamax
> A complex instruction set computer (CISC, /ˈsɪsk/) is a computer where
> single instructions ... and/or are capable of multi-step operations ...
> within single instructions.

What's a "multi-step operation"?

I ask because I worked on the microarchitecture (read "implementation") of a
microprocessor that had what was generally regarded as a very RISC instruction
set.

Yet, almost every instruction had multiple steps. Yes, including integer add.

Were we doing something wrong?

And no, "one cycle fundamental operations" doesn't change things. Dividing
things into cycles is a design choice. For example, one might reasonably do
integer adds in two steps.
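
To make "an add in two steps" concrete, here is a rough sketch in shell arithmetic: a 16-bit add done as two 8-bit adds, with the carry from the low half feeding the high half. The function name `add16_two_step` and the 8/16-bit widths are made up for illustration; this says nothing about how any particular chip splits its adder.

```shell
#!/bin/sh
# Hypothetical illustration: a 16-bit add performed as two 8-bit steps.
add16_two_step() {
    # Step 1: add the low bytes and note whether they carried out.
    lo=$(( ($1 & 255) + ($2 & 255) ))
    carry=$(( lo >> 8 ))
    # Step 2: add the high bytes plus the carry from step 1.
    hi=$(( (($1 >> 8) & 255) + (($2 >> 8) & 255) + carry ))
    # Recombine, discarding any carry out of bit 15 (wraps like hardware).
    echo $(( ((hi & 255) << 8) | (lo & 255) ))
}

add16_two_step 511 1      # prints 512
add16_two_step 65535 1    # prints 0 (wraps around)
```

From the programmer's side it is still one "add"; how many steps it takes underneath is an implementation decision.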

~~~
sliverstorm
It's a fuzzy distinction, but it becomes more clear if you look at the x86
instruction set and its extensions.

A very RISC chip usually just has ADD, OR, AND, LOAD, STORE, etc. But in x86
(CISC) we have things like these:

UNPCKLPS: (sse1) Unpack and Interleave Low Packed Single-FP Values

MOVSHDUP: (sse3) Move Packed Single-FP High and Duplicate

AAM: ASCII Adjust AX After Multiply

~~~
anamax
If those ops are register-register, how are they necessarily not-RISC?

Yes, division is inherently more complex than bitwise NAND, but it's not
obvious to me where the line is that you find so clear.

FWIW, I've seen a very serious architecture proposal that used two
instructions for memory-reads. (It had one instruction for memory writes.)
Along those lines, register-value fetch can be moved into a separate
instruction....

~~~
sliverstorm
The sse1 instruction provides the option of register-register, but also supports
register-memory. I didn't realize it supported register-register mode, so now
I see why it would be less obvious to you.

~~~
anamax
Why is copying a value from register to memory (or memory to register) "RISC"
while performing some logical operation on the value as it moves is
"not risc"?

I'd agree that memory to memory is "not risc", but given the amount of work
necessary to do a register access, it's unclear why doing work on a value is
"not risc".

Datapaths are NOT the complex part of a microprocessor.

------
klodolph
> The one surprise in ARMv8, is the omission of any explicit support for
> multi-threading. Nearly every other major architecture, x86, MIPS, SPARC,
> and Power has support for multi-threading and at least one or two multi-
> threaded implementations.

What does this even mean? Are they talking about atomic operations?
Hyperthreading?

~~~
enos_feedler
I think they are talking in the hardware sense: hyperthreading/SMT

~~~
duaneb
How much does that actually help? In my extremely fuzzy memory, it only worked
out to around a 30% increase _in ideal situations_. I'd rather see them work
on features that can be exploited with less voodoo.... like hardware 64-bit
support, or SIMD support, or HTM, or hell, clock rate.

~~~
tedunangst
But it's 30% you get for basically free. I kind of thought HT was mostly a
gimmick (look, now with 256 virtual CPUs), but changed my mind since it
doesn't cost anything (in terms of die space) to add it to a chip. 30% more
performance for 1% more cost is a better deal than 100% more performance for
100% more cost, assuming you can live with only 30% more performance.
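
The arithmetic behind that comparison, as a quick sketch. The 30%/1% and 100%/100% figures are just the round numbers from the paragraph above, not measurements, and `perf_per_cost` is a made-up helper.

```shell
#!/bin/sh
# Performance per unit cost, relative to a baseline chip at 1.0 / 1.0.
# Arguments: percent performance gain, percent cost increase.
perf_per_cost() {
    awk -v p="$1" -v c="$2" 'BEGIN { printf "%.2f\n", (1 + p/100) / (1 + c/100) }'
}

perf_per_cost 30 1      # +30% performance for +1% cost   -> prints 1.29
perf_per_cost 100 100   # +100% performance for +100% cost -> prints 1.00
```

So under these assumed numbers the first option gets about 29% more performance per unit of cost, while doubling everything leaves the ratio where it started.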

I should add I think what AMD is doing with Bulldozer (claiming two virtual
cores are actually full cores) is bullshit.

~~~
duaneb
> I should add I think what AMD is doing with Bulldozer (claiming two virtual
> cores are actually full cores) is bullshit.

I think AMD is doing whatever it can to get people to buy its CPUs. If it
weren't for their ATI purchase, I think they'd be basically dead by now. It
still amazes me how far they've fallen: I built my first computer with an AMD
X2 when I was 15 (6 years ago now) - they looked like they were going to upset
Intel as deciding the future of x86 chips. They did for a while - we got a
sane 64-bit architecture out of it. I'm not sure where they went wrong: was it
marketing, was it manufacturing tech, was it profit margins, was it Apple? I
don't even know if their current processors are competitive or not in the
performance market - things like "Bulldozer" make me think not.

Anyway, could SMT be implemented on top of ARM v8? My knowledge of hardware
doesn't include multithreading. However, from my limited understanding of it,
I don't see SMT making much difference in tight RISC code, which is designed
to have a high instruction throughput per cycle, leaving little for
instruction reordering to optimize.

~~~
tedunangst
One way to think of SMT is context switches for free, and lots of them. What
happens when you run two processes on one core? Every 10ms the kernel copies
out all the registers from one process to memory, copies in the regs for the
other, and switches. What happens when you use SMT? Every "2" instructions the
CPU switches from one process to the other, transparently, without hitting
memory. After 20ms, the same amount of work is done, possibly a little more,
and if process two only had 1ms of work to do, it doesn't have to wait the
full 10ms timeslice of process one.

SMT is not about instruction reordering at all (within one process). Just like
the OS switches between processes whenever you wait for disk, now the CPU
switches processes whenever you wait for memory. It just happens that virtual
cores are the way the OS programs the CPU scheduler.

------
Symmetry
A question about how HN works. I'd submitted the same article to HN at a much
less opportune time so it fell off the "new" page before it got its first
upvotes. [1]

Normally when somebody then resubmits the same article at a better time I
thought they had to add a '~' at the end of the URL or something, but I don't
see anything like that in this case. So how'd they do it?

(And I should say I'm glad that you all get to see this article, so thank you
enos_feedler).

[1]<http://news.ycombinator.com/item?id=4380604>

~~~
parenthesis
This submission has url <http://www.realworldtech.com/arm64>, yours has
<http://www.realworldtech.com/arm64/>.
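
A small sketch of why that matters, assuming the duplicate check compares URLs as plain strings (that is a guess about HN, not something confirmed here). `strip_trailing_slash` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: drop one trailing slash so both spellings compare equal.
strip_trailing_slash() {
    printf '%s\n' "${1%/}"
}

a='http://www.realworldtech.com/arm64'
b='http://www.realworldtech.com/arm64/'

# Compared as raw strings, the two URLs differ.
if [ "$a" = "$b" ]; then echo "raw: same"; else echo "raw: different"; fi

# After normalizing, they match.
if [ "$(strip_trailing_slash "$a")" = "$(strip_trailing_slash "$b")" ]; then
    echo "normalized: same"
else
    echo "normalized: different"
fi
```

This prints "raw: different" and then "normalized: same".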

------
rogerbinns
I wonder if it is possible to make the CPU be 64 bit only, and how much die
space/power it would save by not including the 32-bit cruft.

I'd also be curious if it was possible to software translate 32 bit arm
binaries into 64 bit while retaining comparable performance.

------
lunarscape
Honest question since I'm confused about terminology: Why is ARMv8 described
in places as having "backwards compatibility for existing 32-bit software"
when some existing instructions will be removed in AArch64?

~~~
Symmetry
Because the ARM front end is so simple, it isn't hard to have multiple ones.
One of these will be able to run existing 32-bit software.

------
rblackwater
It looks like an interesting article, so it is a shame that it was split into
5 pages with no way to view everything on one page. I have no recourse but to
not read the article at all.

~~~
dangrossman
> I have no recourse but to not read the article at all

You could click the next button 4 times and read the full article. There's
lots of content on each page. It would've taken 100% less typing than this
complaint, and you would've spent that time learning instead of grumbling.

It's a real shame you can't read books either. Whole libraries of documents
split into pages with no "view all" button.

~~~
rblackwater
There's a big difference in turning a page the size of your hand when you are
already holding the book and trying to click a micro button the size of a word
when you use the arrow keys to move the browser window.

I'll just wait until the exact same information appears on a single page. I
was expressing sincere regret because I liked the first page, but I absolutely
will not read paginated articles.

Also: you're obviously irritated by my grumbling, but grumbling about it is
just a massive load of hypocrisy, so please realize I'm not going to be taking
any of your comments all that seriously.

~~~
waterhouse
I agree with your sentiment. Hence:

    
    
      ~ $ curl -s 'http://www.realworldtech.com/arm64/'{1..5}'/' --compressed > a.html; open a.html
    

("open" is OS X-specific; "nautilus-open" might have a similar function on
Linux or something.) Interestingly, that website seems to deliver gzip-
compressed output no matter what you request.

~~~
steve19
That is clever. What is the {1..5} syntax called? I am trying to figure out
what the zsh equivalent is.

~~~
waterhouse
Relevant terms are "brace expansion" and "range". And, um, at least for me,
the command I wrote works verbatim in zsh. (I think zsh is supposed to be
bash-compatible like that.) Brace expansion works like this (in zsh and bash):

    
    
      % echo {1..5}
      1 2 3 4 5
      % echo meh{1..5}
      meh1 meh2 meh3 meh4 meh5
      % echo {1..5}{1..5}
      11 12 13 14 15 21 22 23 24 25 31 32 33 34 35 41 42 43 44 45 51 52 53 54 55
      % echo {1,2,4}{1,3,9}
      11 13 19 21 23 29 41 43 49
    

I have observed one difference in brace expansion: {a..f} -> "a b c d e f" in
bash, but "{a..f}" in zsh. Curious. Oh well.
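
If you would rather not depend on brace expansion at all, a plain loop does the same job in any POSIX shell. A sketch, assuming the page URLs follow the same pattern as in the curl command above; `page_urls` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the URLs of pages 1..$1, one per line.
page_urls() {
    i=1
    while [ "$i" -le "$1" ]; do
        printf 'http://www.realworldtech.com/arm64/%s/\n' "$i"
        i=$((i + 1))
    done
}

page_urls 5
# To fetch them the way the one-liner does (needs network access):
#   page_urls 5 | xargs curl -s --compressed > a.html
```

It is wordier than {1..5}, but it behaves the same in bash, zsh, and plain sh.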

