
Exercises in Emulation: Xbox 360’s FMA Instruction - nikbackm
https://randomascii.wordpress.com/2019/03/20/exercises-in-emulation-xbox-360s-fma-instruction/
======
udp
_> I left the Xbox team long before the Xbox One shipped and I haven’t paid
any attention to it since then, so I don’t know what they decided to do._

From experience trying to play 360 games on an Xbox One, the console reads
nothing from the DVD and instead downloads the game from the Internet. It also
only works for specific games. I therefore assume they gave up on emulation
and simply recompiled certain 360 games for x64.

~~~
MikusR
[https://www.eurogamer.net/articles/digitalfoundry-2017-xbox-...](https://www.eurogamer.net/articles/digitalfoundry-2017-xbox-one-x-back-compat-how-does-it-actually-work)

~~~
Asooka
This makes me wonder if the Halo Master Chief Collection release on PC will be
based on emulation.

~~~
WorldMaker
The Xbox One is x86/64, so emulation shouldn't be necessary for the MCC. The
remaining question is how much of a "port" it is from what was released on
the Xbox One.

\- Supposedly the early Xbox One games used a fork of DirectX that was never
quite released in that form in Windows 10. Rumor has it that this has been
corrected in the upcoming April Release.

\- Also, the early Xbox One games used very different Microsoft Store asset
servers. The State of Decay Insider tests seem to indicate that the April
Release of Windows 10's Microsoft Store can now speak to the Xbox asset
servers directly. (There are indications that State of Decay installed from
possibly the exact same assets in the April Release as it would on an Xbox
One.)

\- Finally, early Xbox One games sometimes assumed that they ran in
lightweight VMs with almost no multitasking (except for Kinect-related
restrictions when that was mandatory) and some very specific performance
characteristics, versus the wild west of available PC hardware. Rumors abound
that Hyper-V improvements in Windows 10 solve some of the VM/multitasking
issues, leaving only, perhaps, the need to test on a variety of PC hardware
configurations.

Obviously, a lot of speculation there. Presumably a lot more information will
surface as the April Release happens and as E3 approaches.

~~~
WorldMaker
Also, it doesn't appear that the MCC itself uses 360 emulation. It predates
the 360 emulation, for one thing, and you can easily contrast how the MCC
runs with Rare Replay, which clearly does boot up the 360 VM for some of its
games (and those show up outside the combined Replay launcher in the Games
Library as back-compat 360 games).

Supposedly a true x64 port of the multiple generations of Halo engines was an
important goal for the 343 team: it let them feel they knew the engines
inside and out, and gave them a good handle on using that engine for Halo 5,
buying a longer lead time on their next engine. (Halo Infinite [6] supposedly
will use a brand new engine codenamed "Slipspace".)

Other signs that MCC was most likely a very "proper" port of the engines to
x64 include some of the ways bugs riddled the early post-launch phases,
especially in multiplayer code.

------
anisppp
Personally I found this article overly dramatic and a bit dumb. All this wall
of text just to point out that FMA needs to be emulated... OK, but it's a
solved problem. It's also not a new problem: fmaf is defined in the C99
standard, so there have been multiple open-source library implementations of
FMA using round-to-odd for years.

You can just read that code and see how it's done. Despite the ridiculous
ending there was no mystery to begin with.

[https://www.lri.fr/~melquion/doc/08-tc.pdf](https://www.lri.fr/~melquion/doc/08-tc.pdf)
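
For reference, a minimal sketch of the round-to-odd approach for
single-precision FMA (my own illustration of the technique, not code from
the paper or any particular library; it ignores NaN/infinity edge cases and
assumes round-to-nearest-even double arithmetic with compiler contraction
disabled, e.g. -ffp-contract=off):

    #include <math.h>
    #include <stdint.h>
    #include <string.h>

    static float fmaf_via_round_to_odd(float a, float b, float c)
    {
        /* Exact: a 24-bit x 24-bit product fits in a 53-bit double. */
        double p = (double)a * (double)b;

        /* Knuth TwoSum: s is the rounded sum, err its exact error. */
        double s = p + (double)c;
        double bb = s - p;
        double err = (p - (s - bb)) + ((double)c - bb);

        /* Round-to-odd: if the sum was inexact and the last mantissa
           bit of s is even, nudge s one ulp toward the exact value so
           the final rounding cannot be fooled by the lost bits. */
        if (err != 0.0) {
            uint64_t bits;
            memcpy(&bits, &s, sizeof bits);
            if ((bits & 1) == 0)
                s = nextafter(s, err > 0.0 ? INFINITY : -INFINITY);
        }

        /* One final rounding; 53 >= 2*24 + 2 bits, so it is correct. */
        return (float)s;
    }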

~~~
eindiran
That is weird. The paper was published in 2009 by Microsoft Research, and the
Xbox One entered development in early 2011 according to Wikipedia
([https://en.wikipedia.org/wiki/Xbox_One](https://en.wikipedia.org/wiki/Xbox_One)).
So when the author started working on the problem for Microsoft, Microsoft
already knew how to solve the problem.

~~~
brucedawson
Author here: I don't remember exactly when I did my investigation but I left
the Xbox team in early 2009, so it would have been some time before that.

The 2011 date from Wikipedia is for when hardware development began.
Investigations into what hardware would be used would obviously predate that
so there's no contradiction.

Anyway, I'll have to digest that FMA emulation article and see what sort of
performance their algorithm achieves. A branchless and vector-math compatible
implementation would be important for performance.

~~~
eindiran
Thank you for the response and for clarifying the timeline. I didn't realize
that development on the non-hardware parts of the system would happen so much
earlier, but it makes sense that you'd want to know whether changing
architectures would make your catalog unplayable!

------
lifthrasiir
As far as I know, a correct emulation of FMA involves the double-double
approach [1]: split a logical mantissa that is potentially larger than the
native mantissa into pieces and merge them later. This is of course expensive
and probably not a good fit for the OP's purpose anyway.

[1] [https://hal-ens-lyon.archives-ouvertes.fr/inria-00080427v2/d...](https://hal-ens-lyon.archives-ouvertes.fr/inria-00080427v2/document) has a verified proof.
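
For illustration, the error-free multiplication that double-double arithmetic
builds on (Veltkamp splitting plus Dekker's TwoProduct) looks roughly like
the sketch below; this is my own rendition of the classical algorithm, not
the paper's verified code, and it assumes no overflow or underflow:

    /* Veltkamp splitting: break a double into two ~26-bit halves so
       that products of the halves are exact in double precision. */
    static void split(double a, double *hi, double *lo)
    {
        double t = 134217729.0 * a;   /* 134217729 = 2^27 + 1 */
        *hi = t - (t - a);
        *lo = a - *hi;
    }

    /* Dekker's TwoProduct: p + e == a * b exactly, provided the error
       term e does not fall below the subnormal range. */
    static void two_product(double a, double b, double *p, double *e)
    {
        double ah, al, bh, bl;
        *p = a * b;
        split(a, &ah, &al);
        split(b, &bh, &bl);
        *e = ((ah * bh - *p) + ah * bl + al * bh) + al * bl;
    }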

~~~
pascal_cuoq
double-double or quad precision is necessary to emulate double-precision FMA.
The article is talking about emulating single-precision FMA with double-
precision. The relevant paragraph is:

> Luckily the vast majority of floating-point math in games is done to float
> (32-bit) precision, and I was quite happy to use double (64-bit precision)
> instructions in the emulation of FMA.

One big difference between quad-precision and double-double is that quad-
precision has a much wider exponent range. If you use double-double, you need
to worry that the result of the multiplication may underflow a double. In the
article you cited:

> First, the value ul has to be the error term of the multiplication a · b, in
> order to avoid some degenerate underflow cases: the error term becomes so
> small that its exponent falls outside the admitted range. […] The algorithm
> will behave correctly even if some computed values are not normal numbers,
> as long as ul is representable.

… implying that the algorithm may not compute the FMA if the error of the
multiplication is not representable, which can happen when it is below the
normal range.

~~~
lifthrasiir
Perhaps I was not careful enough in my choice of the word "double-double". My
point was that the error recovery (or, as the paper calls it, the error-free
transformation) seems crucial for FMA emulation in general. You are entirely
right that double-double has many pitfalls.

My thinking was that in this particular case we need around 24 × 3 = 72 bits
of mantissa (I haven't verified the exact number, but it clearly exceeds 60
bits) to avoid the double rounding, which double precision cannot provide.
The verified algorithm gives more than enough headroom for this particular
setting: ExactMult is just a normal double multiply and ExactAdd will recover
the error from the double addition. It might even be possible to optimize the
later steps. But it seems to me that you can't really get rid of the
error-recovery procedure itself. Well, I may be wrong.

EDIT: Oh, I see your neighboring replies. So I _was_ wrong! The glibc
solution, however, looks pretty expensive, and it is unfortunate that no
faster alternative is known.
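
(For concreteness: the error recovery that ExactAdd performs is, as I
understand it, essentially Knuth's branchless TwoSum; a sketch of my own:)

    /* Knuth's branchless TwoSum: after the call, s + e == a + b holds
       exactly, with s = RN(a + b). No magnitude test on a and b is
       needed, which also makes it straightforward to vectorize. */
    static void two_sum(double a, double b, double *s, double *e)
    {
        *s = a + b;
        double bb = *s - a;
        *e = (a - (*s - bb)) + (b - bb);
    }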

~~~
pascal_cuoq
There is no reason to estimate the required precision as 3 times the original
precision, because floating-point addition does not work like that.

If you want to compute the exact result of a floating-point addition, you need
approximately emax - emin bits of precision. Floating-point addition is never
computed this way.

On the other hand, multiplication does have the property that the required
precision for representing the result of multiplying numbers with precisions p
and q is p+q.

~~~
lifthrasiir
We don't compute the exact sum; we just need enough precision to make the
double rounding harmless. The most pathological cases are therefore either:

\- the product is _just_ above the ULP of the addend, or

\- the addend is _just_ above half the ULP of the product.

I'm not sure about the latter (the possible bit patterns of the product are
constrained) but the former clearly requires 3 times the original precision,
and beyond that there is no possibility of double rounding. The same thing can
be said for the latter.

Of course all these points are moot when it is known that rounding-to-odd can
be used to avoid error recovery at all.
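
To make the double rounding concrete, here is a hypothetical example
(operands of my own choosing) where rounding a*b + c first to double and then
to float differs from a true single-precision FMA; compile with contraction
disabled (e.g. -ffp-contract=off) so the naive expression is not itself
fused:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* Exact a*b + c = 1 + 2^-11 + 2^-24 + 2^-55, which sits just
           above the float tie point 1 + 2^-11 + 2^-24. */
        float a = 0x1.001p+0f;   /* 1 + 2^-12 */
        float b = 0x1.001p+0f;
        float c = 0x1p-55f;

        /* Rounding to double first drops the 2^-55 bit and lands
           exactly on the tie; round-to-nearest-even then rounds DOWN. */
        float naive = (float)((double)a * (double)b + (double)c);

        /* A true FMA rounds once and correctly rounds UP. */
        float fused = fmaf(a, b, c);

        /* Prints 0x1.002p+0 vs 0x1.002002p+0. */
        printf("naive = %a\nfused = %a\n", naive, fused);
        return 0;
    }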

------
robbiet480
> If you apply for a mortgage when your job title is emulation ninja then you
> are in a quandary. If you write that on the mortgage application then you
> look like a lunatic. If you write “software engineer” then you get in
> trouble when the mortgage broker calls your employer. You know.
> Hypothetically.

~~~
chrisseaton
Do mortgage brokers call your employer? What’s your mortgage got to do with
your employer?

~~~
colechristensen
Verifying the things you put on your mortgage application are true, like
employment details.

~~~
chrisseaton
I’ve never had that happen - they just looked at my bank account. I wouldn’t
want a mortgage provider talking to my employer.

~~~
mcculley
As an employer, I have had mortgage brokers call our HR lead quite often to
verify loan applications.

For those who are concerned about privacy: I have found that those loan
providers are then selling that data to websites that offer aggregated salary
data.

~~~
SmellyGeekBoy
Just wanted to add another data point and confirm that I've also had to do
this as an employer (in the UK). Salary, length of employment and job title at
least.

Edit: Also for rental contracts.

------
SlowRobotAhead
The whole time I was reading I was interested in what they chose to do... a
lot of build-up and no solution :/

Maybe they did recompile the games for backwards compatibility, maybe they do
have an emulator, IDK, but I want to know how they would have fixed this
issue.

~~~
monocasa
They ended up shipping processors that support FMA on the xbone, so they
probably just used that.

~~~
dogen
Everywhere I looked said Jaguar doesn't, though.

~~~
monocasa
Oh weird, you're totally right.

Looks like they removed FMA4 in the Bobcat-to-Jaguar transition.

------
pedrocr
Do games really rely on the rounding behavior of floats to not break? It
seems like there should always be plenty of margin around that. But maybe
something runs a loop with these instructions over and over and the error
compounds?

And wouldn't the solution on x86 be to use the extended-precision (more than
double) floats that are available on the platform?

[https://en.wikipedia.org/wiki/Extended_precision](https://en.wikipedia.org/wiki/Extended_precision)

~~~
kevingadd
They do. Slight differences in floating-point behavior are the cause of many
historical problems with GameCube and Wii games in the Dolphin emulator
(which also has to emulate PPC on x86/x64).

For a simpler example from personal experience: I had a textbook
implementation of triangulation via ear clipping to turn polygons into a list
of triangles I could send to the GPU. On Windows it worked great, but when I
ran it on an Xbox 360 it looped infinitely. This turned out to be because the
rounding behavior on the 360 was different and the algorithm is fundamentally
unstable using floats.

~~~
pedrocr
> This turned out to be because the rounding behavior on the 360 was different
> and the algorithm is fundamentally unstable using floats.

That sounds like a bug, though. But I guess that's the point: unless you're
bug-for-bug compatible, a reasonable amount of code will fail a significant
amount of the time...

~~~
unwind
Performance is often a critical feature in games.

Nobody will play a janky game and go "yeah, it's a bit rough, but I'm sure
the floating-point operations would be done sensibly if it were to run under
emulation two console generations in the future".

Console games are not like ordinary/business applications, where being 15 ms
late is almost always better than tying the code to the hardware.

------
pascal_cuoq
Bruce Dawson always does an excellent job of explaining subtle floating-point
behaviors simply, but there is one sentence I do not agree with in this
particular post:

> However, for any rounding rule that you might come up with there is a case
> where the double rounding will give you a different answer from a true FMA.

For every “directed” rounding mode (up, down, towards zero), rounding the
result of one operation first to higher precision and then to the intended
precision is identical to rounding directly to the intended precision. For
this reason, computing the FMA as “first compute the multiplication in higher
precision so that no rounding happens in this step, then add the third
operand at the same precision, then round to the nominal precision” suffers
from double-rounding issues in none of these rounding modes (which are all
the rounding modes defined by IEEE 754 other than “round to nearest”).

So you do not even need to “come up with” them. They already exist: they are
all the standardized rounding modes other than “round to nearest”.

Note: the reasoning above assumes the result of the multiplication is
representable as a normal number in the higher-precision format. It is a
property of IEEE 754 formats that the next more precise one can always
represent the product of two finite numbers from the format below it as a
normal number.
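
As a quick sanity check of this property (a sketch of my own; it assumes the
platform's fmaf honors the dynamic rounding mode, as IEEE 754 requires, and
that compiler contraction and constant folding are disabled):

    #include <fenv.h>
    #include <math.h>
    #include <stdio.h>

    #pragma STDC FENV_ACCESS ON

    int main(void)
    {
        /* Operands that DO exhibit double rounding under
           round-to-nearest (see the example earlier in the thread). */
        float a = 0x1.001p+0f, b = 0x1.001p+0f, c = 0x1p-55f;

        /* Under a directed rounding mode, double rounding is harmless:
           double-then-float rounding agrees with the true FMA, and
           both print 0x1.002002p+0 here. */
        fesetround(FE_UPWARD);
        float naive = (float)((double)a * (double)b + (double)c);
        float fused = fmaf(a, b, c);
        printf("upward: naive = %a, fused = %a\n", naive, fused);

        fesetround(FE_TONEAREST);
        return 0;
    }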

