
Porting GHC: A Tale of Two Architectures - lelf
http://www.chiark.greenend.org.uk/ucgi/~cjwatson/blosxom/2014-04-15-porting-ghc-a-tale-of-two-architectures.html
======
rwmj
Interesting. I packaged up OCaml native aarch64 (arm64) and ppc64le backends
for Fedora a few weeks ago. (Most of the hard work was done by Benedikt Meurer
and Michel Normand.) Neither required cross-compilation, since the native
compiler compiles itself using the OCaml bytecode interpreter.

It did reveal an actual bug in [edit: a branch of] qemu. It wasn't emulating
the aarch64 RET xN instruction correctly. Apparently no other software in the
whole of Linux uses this strange variant of RET.

[https://bugs.launchpad.net/qemu/+bug/1263747](https://bugs.launchpad.net/qemu/+bug/1263747)

~~~
pm215
Ha, I'd forgotten about that bug report (just closed it) -- it was only in the
SuSE version of the aarch64 QEMU, not in the mainline one.

I expect most code generators use BR for "jump to destination in arbitrary
register" -- BR and RET behave identically except for RET providing a hint
for branch prediction and for debuggers that it's a return-from-subroutine
rather than a random jump; so "RET LR" is really common and "RET <any other
register>" is kind of weird.

~~~
rwmj
This is the function using ret x19:

[https://git.fedorahosted.org/cgit/fedora-ocaml.git/tree/asmrun/arm64.S#n272](https://git.fedorahosted.org/cgit/fedora-ocaml.git/tree/asmrun/arm64.S#n272)

What do you think? Looks like a return to me, albeit using an unusual
register. Note that this code is especially speed critical because every time
OCaml calls a C function that could allocate GC'd memory, it has to go through
this function.

~~~
pm215
Yeah, that's a legitimate use; it is (judging by the code) relying on the fact
that it's glue between two calling conventions where the outer one has more
callee-saves registers, so we can save LR in a register rather than putting it
on the stack.

------
pedrocr
Does anyone have any good pointers on why Ubuntu is adding a little-endian PPC
port? I thought they had dropped the regular PPC port a while ago.

Googling around, it seems the use for little-endian PPC is to be more easily
compatible with GPUs. Is there really a market for running GPU computing with
PPC64 CPUs?

~~~
cliffbean
This document lists some reasons [0]:

• Growing interest in running entire OS in little-endian mode
  – Ease porting of programs from other architectures
  – Ease porting of programs which access files containing LE binary data
  – Ease communication with GPUs
• New OpenPower Consortium
  – IBM, Google, Tyan, Nvidia, Mellanox

Also, see [1].

[0] [http://www.linux-kvm.org/wiki/images/7/70/Kvm-forum-2013-Mackerras.pdf](http://www.linux-kvm.org/wiki/images/7/70/Kvm-forum-2013-Mackerras.pdf)

[1] [https://www.ibm.com/developerworks/community/blogs/fe313521-...](https://www.ibm.com/developerworks/community/blogs/fe313521-2e95-46f2-817d-44a4f27eba32/entry/confessions_of_a_recovering_proprietary_programmer_endianness?lang=en)
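
To make the "files containing LE binary data" point concrete, here is a minimal Python sketch. The four-byte header and its "record count" field are made up for illustration; nothing here comes from the slides. It shows what a program that reads the field in the host's native byte order would see on a little-endian versus a big-endian machine, next to a read that spells the byte order out.

```python
import struct

# Hypothetical file header: a 32-bit record count stored little-endian.
header = bytes([0x10, 0x27, 0x00, 0x00])

# Portable read: the byte order is stated explicitly ("<" means little-endian).
count_explicit = struct.unpack("<I", header)[0]

# What a "just read it in native order" program effectively computes
# on a little-endian host and on a big-endian host, respectively.
count_on_le_host = int.from_bytes(header, "little")
count_on_be_host = int.from_bytes(header, "big")

print(count_explicit)    # 10000
print(count_on_le_host)  # 10000
print(count_on_be_host)  # 270991360
```

Code that silently relies on native order works unchanged on a little-endian PPC64 but has to be audited and fixed for a big-endian one, which is presumably what "ease porting" refers to.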

------
spatulon
Most PPC chips have been bi-endian for a long time but, as far as I know,
everyone treated them as big-endian processors. For example, it's big-endian
all the way in the automotive industry, where PPC is incredibly popular.

Where has the demand come from to start supporting little-endian mode
suddenly? I assume the existence of a Debian/Ubuntu port is evidence of such
demand.

~~~
pedrocr
> Where has the demand come from to start supporting little-endian mode
> suddenly?

I found a Debian discussion on this and the only reasons presented were GPU
computing and porting apps that assume little-endian. I assume the GPU
computing is the big reason. As we move to unified memory architectures
between CPU and GPU they need to both use the same endianness and I guess GPUs
are usually little-endian to match x86.

~~~
dman
Can you post the link about GPU computing? I am a bit perplexed about what
powerpc has to do with GPUs.

~~~
_delirium
> I am a bit perplexed about what powerpc has to do with GPUs.

Since, as others note, IBM is funding most of the work, my guess is that the
main use case here is IBM's POWER-based HPC clusters, which augment PowerPC
CPUs with a bunch of GPU coprocessors for CUDA/OpenCL offloading. IBM is trying
to position POWER clusters as competing with x86-based clusters for certain
kinds of work (especially scientific computing), and to match x86-based
clusters decked out with GPUs they probably need to have GPU options for their
POWER-based clusters as well.

The idea, as I read it (I have not had occasion to encounter it myself), is that
things are a lot easier if your CPU and GPU have the same endianness, or else
you have to byte-swap whenever transferring data to/from the GPU (and there
are even more complications if you have unified memory spaces). Since most
(all?) commercially available GPUs are little-endian, if you want PowerPC CPUs
along with GPUs for auxiliary processing, and want the endianness matched, the
little-endian mode of PowerPC becomes important.
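
As a rough illustration of the byte-swapping cost, here is a small Python sketch. It does not use any real GPU API; `swap32` is a hypothetical helper standing in for the conversion a big-endian host would have to apply to every buffer of 32-bit values before handing it to a little-endian device (and again on the way back).

```python
import struct

values = [1, 0x12345678, 0xDEADBEEF]

# The same three 32-bit integers laid out in each byte order.
big = struct.pack(">3I", *values)     # as a big-endian host stores them
little = struct.pack("<3I", *values)  # as a little-endian device expects them


def swap32(buf: bytes) -> bytes:
    """Reverse the bytes of every 32-bit word in buf (hypothetical helper)."""
    if len(buf) % 4:
        raise ValueError("buffer length must be a multiple of 4")
    return b"".join(buf[i:i + 4][::-1] for i in range(0, len(buf), 4))


# The second word, 0x12345678, in each layout.
print(big[4:8].hex())     # 12345678
print(little[4:8].hex())  # 78563412

# Swapping every word converts one layout into the other.
print(swap32(big) == little)  # True
```

With both sides little-endian the swap step disappears and buffers can be copied (or shared) as-is. Note that the swap also depends on element width: 16-bit or 64-bit data would need a different word size, which is part of why mixed-endian shared memory gets complicated.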

~~~
dman
Interesting!

