
Doubling Mono’s Float Speed - ot
http://tirania.org/blog/archive/2018/Apr-11.html
======
aetherspawn
Wow, I hadn’t imagined that because the coordinates are sent to the GPU as
floats, that means triangles start voxelizing when you get further away from
origin.

Seems like it could make an awesome shader-free animation if you translated
the entire world’s position by a ridiculously large (and steadily increasing)
float.
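
For a rough sense of scale, here is a minimal Python sketch of how coarse 32-bit floats get as coordinates grow. It assumes one unit is one metre, which neither the article nor the comment states, and `f32` / `f32_spacing` are helper names made up for this illustration.

```python
import struct

def f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def f32_spacing(x):
    """Gap between f32(x) and the next larger 32-bit float (for x > 0)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    next_up = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_up - f32(x)

# Assumption: 1.0 == 1 metre.
for metres in (1.0, 1_000.0, 200_000.0, 10_000_000.0):
    gap = f32_spacing(metres)
    print(f"{metres:>12,.0f} m -> nearest neighbour is {gap:.3g} m away")
```

At 200 km the representable positions are about 1.6 cm apart, and at 10,000 km they are a full metre apart, so geometry stored directly in such coordinates would snap to an increasingly coarse grid.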

~~~
Const-me
> that means triangles start voxelizing when you get further away from origin

They can in a few edge cases, but in most cases they don’t.

Simplifying a lot: to render a model, a game engine uploads two things to the
GPU:

1. The model’s vertex buffer + index buffer. The vertices are in the mesh’s own
coordinate system, and most 3D designers don’t design their meshes placed
100km away from the origin.

2. A single 4x4 matrix containing the ( world * view * projection ) transform.
World transforms the model from the local to the world coordinate system, view
from world to camera-relative, and projection from camera-relative to 2D
screen coordinates + depth.

If you’re 200km from the origin, and are looking at a model near the camera,
world transform will contain large values because you’re very far, view
transform will also contain large values because camera’s also very far, but
multiplied together they won’t have very large values, because model is near
the camera.

And if your model is very far from the camera, so the ( world * view *
projection ) transform contains huge values, you won’t notice precision
degradation, because the whole model will occupy a single pixel at most.
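
The cancellation described above can be sketched in one dimension, with plain translations standing in for the 4x4 world and view matrices. This is a simplified illustration and not how any particular engine is written; the positions and the `f32` helper are assumptions chosen for the example.

```python
import struct

def f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Hypothetical scene: model and camera both about 200 km from the origin.
model_origin = f32(200_000.0)   # "world" translation
camera_pos = f32(199_998.0)     # "view" translation is -camera_pos
local_vertex = f32(0.123)       # vertex in the mesh's own coordinates

# Naive: move the vertex into world space in float32, then into camera space.
world_vertex = f32(model_origin + local_vertex)
naive = f32(world_vertex - camera_pos)

# Combined: fold world and view together first, then apply once per vertex.
world_view = f32(model_origin - camera_pos)
combined = f32(world_view + local_vertex)

print("naive   :", naive)
print("combined:", combined)
```

The naive path rounds the vertex to the coarse float grid at 200 km before the large values cancel, so it is off by about 2 mm here. The combined path only ever handles small numbers on the per-vertex side and stays within float rounding error of 2.123.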

~~~
dahart
> They can in a few edge cases, but in most cases they don’t. [...] world
> transform will contain large values because you’re very far, view transform
> will also contain large values because camera’s also very far, but
> multiplied together they won’t have very large values, because model is near
> the camera.

This is a good point; this problem is more prone to show up in a ray tracer
than a rasterizer, since rasterizers have to apply the camera transform to the
geometry, and ray tracers don't.

It's pretty easy to see this problem while using Maya though. Z-buffer
resolution in the editor drops off from the origin.
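
One generic way depth resolution falls away is the perspective depth mapping itself, which spends most of its codes close to the near plane. The sketch below assumes a 0..1 depth mapping, a 24-bit integer depth buffer, and made-up near/far planes; it says nothing about how Maya's viewport is actually implemented.

```python
def depth(z, near, far):
    """Map a camera-space distance z in [near, far] to a depth in [0, 1]."""
    return far / (far - near) * (1.0 - near / z)

def depth_code(z, near, far, bits=24):
    """Quantize the depth to an integer code, as a fixed-point buffer would."""
    return round(depth(z, near, far) * (2 ** bits - 1))

NEAR, FAR = 0.1, 10_000.0  # hypothetical clip planes

# Two surfaces 1 cm apart, first close to the camera, then 1000 units away.
print(depth_code(1.0, NEAR, FAR), depth_code(1.01, NEAR, FAR))
print(depth_code(1000.0, NEAR, FAR), depth_code(1000.01, NEAR, FAR))
```

Under these assumptions the near pair lands on clearly different codes, while the far pair collapses onto the same code, which is the kind of situation that shows up as z-fighting.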

We might see this issue crop up with increasing frequency as more and more
people use GPUs for ray tracing...

~~~
1wd
The problem is old and well known, e.g. among game developers. For example, "A
Real-Time Procedural Universe, Part Three: Matters of Scale" by Sean O'Neil
from 2002 [1] discusses rendering problems at planetary, star-system and even
galaxy scale, including Z-buffer precision and various options regarding
32-bit float vs. 64-bit double vs. 128-bit fixed-point integer + float offsets
(for vertex coordinates of a mesh) etc.

[1] https://www.gamasutra.com/view/feature/131393/a_realtime_procedural_universe_.php?print=1
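
The "integer + float offset" option can be sketched roughly as below. This is a guess at the general shape of the idea, not code taken from O'Neil's article: `SECTOR_SIZE`, `split`, and `relative` are hypothetical names, and a Python int stands in for the wide fixed-point integer.

```python
import struct

SECTOR_SIZE = 1024.0  # hypothetical sector edge length, in world units

def f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def split(coord):
    """Split a coordinate into (integer sector index, float32 offset)."""
    sector = int(coord // SECTOR_SIZE)
    offset = f32(coord - sector * SECTOR_SIZE)
    return sector, offset

def relative(a, b):
    """Position of a relative to b: sectors subtract as ints, offsets as floats."""
    sector_delta = a[0] - b[0]
    return f32(sector_delta * SECTOR_SIZE + (a[1] - b[1]))

far = 1.0e9  # a billion units from the origin
obj = split(far + 0.123)
cam = split(far - 2.0)

print("sector + offset:", relative(obj, cam))
print("plain float32  :", f32(far + 0.123) - f32(far - 2.0))
```

With plain float32 coordinates both positions round to the same value and the difference is zero, while the split representation still recovers roughly 2.123. A smaller sector size would tighten the remaining error further.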

------
Retr0spectrum
"Keanu wonders: is Minecraft chunky purely because everything’s rendered
really far from the origin?"

Actually, Minecraft used to have some interesting float-precision-induced
artifacts once you got far from the origin:

https://minecraft.gamepedia.com/Far_Lands

~~~
Amelorate
Minecraft's Far Lands aren't actually related to floating point precision,
although movement near the Far Lands is affected.

My understanding of what causes the Far Lands isn't that great, but I think
it's caused by one of six or eight shorts overflowing in the generation
algorithm.

Edit: It seems I needed to read more of the Minecraft Wiki link for the
Far Lands.
https://minecraft.gamepedia.com/Far_Lands#Cause
covers it much better than I could.

------
drew-y
I've run into a similar issue writing a ray tracer in Swift! We are even both
using Peter Shirley's "Ray Tracing in One Weekend" as a reference! To this day
I'm struggling to get the performance near the level of the C++ reference
implementation. I've made some improvements, but overall the Swift version is
around 5x slower.

If anyone is interested, the source code is available here:
https://gitlab.com/youngwerth/ray-tracer/settings/repository

Compiled using "swift build -c release".

~~~
CyberDildonics
You should look at the assembly to know for sure (after you have profiled and
narrowed it down).

With this sort of speed difference my guess is that there is either excess
memory allocation going on, pointer hopping when you think you are using
something by value on the stack, or both.

~~~
maxxxxx
I'll second looking at the assembly code. I had a similar problem a few years
ago and an assembly expert told us that we were moving data in and out of the
cache all the time instead of leaving it there. We had to change the way we
looped through arrays and got a huge improvement. This could be wrong, but I
would also guess that the Swift optimizer is not as good as the ones C++ has,
mainly because it's newer.

~~~
CyberDildonics
You don't need to look at assembly to avoid cache misses, you just need to
understand what the language is doing under the hood in a general sense, then
make sure you access memory in a way that it can be prefetched.
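
As a generic illustration of a prefetch-friendly access order (the thread does not say what the actual loops looked like), consider a 2D grid stored row-major in one flat array. The grid size and the `offset` helper are made up, and Python is used only to show the order of offsets touched; it will not reproduce the timing difference a compiled language would show.

```python
ROWS, COLS = 3, 4  # hypothetical small grid, stored row by row in one flat array

def offset(row, col):
    """Index of (row, col) inside the flat row-major array."""
    return row * COLS + col

# Inner loop over columns: offsets are consecutive, so memory is read in order.
row_major_walk = [offset(r, c) for r in range(ROWS) for c in range(COLS)]

# Inner loop over rows: every step jumps COLS elements ahead.
col_major_walk = [offset(r, c) for c in range(COLS) for r in range(ROWS)]

print(row_major_walk)
print(col_major_walk)
```

The first walk visits offsets 0, 1, 2, ... in order, which is the pattern hardware prefetchers handle well. The second visits 0, 4, 8, 1, ... and on a large array each of those jumps may land on a different cache line.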

My guess is that it is either the program being written slightly differently
or that Swift is doing some sort of indirection under the hood.

~~~
maxxxxx
"you just need to understand what the language is doing under the hood in a
general sense"

Isn't the best way to find out to look at the assembly? How else can you know
what the compiler and optimizer are doing?

~~~
CyberDildonics
No, understanding what the language is doing is not the same as looking at
assembly or even understanding what the optimizer is doing.

If Swift is creating some variables on the heap and/or creating virtual tables
for inheritance (I don't know much about it) then you don't need to look at
the assembly to know that you are creating indirection or doing too many heap
allocations.

------
bjourne
It pains me to hear about developers doing the wrong thing just to beat a dumb
benchmark. If you really care about cpu raytracing performance, you need to
write handcrafted simd code and C# default float handling is of no consequence
to you. Correctness > speed.

~~~
vanderZwan
Did you even read the article?

> _In Mono, decades ago, we made the mistake of performing all 32-bit float
> computations as 64-bit floats while still storing the data in 32-bit
> locations._ (...) _Applications did pay a heavier price for the extra
> computation time, but [in the 2003 era] Mono was mostly used for Linux
> desktop application, serving HTTP pages and some server processes, so
> floating point performance was never an issue we faced day to day._ (...)
> _Nowadays, Games, 3D applications image processing, VR, AR and machine
> learning have made floating point operations a more common data type in
> modern applications. When it rains, it pours, and this is no exception.
> Floats are no longer your friendly data type that you sprinkle in a few
> places in your code, here and there. They come in an avalanche and there is
> no place to hide. There are so many of them, and they won’t stop coming at
> you._

The raytracer is just a good performance test.
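
For what the quoted design means in practice, here is a small sketch that emulates both strategies: rounding to 32 bits after every operation versus computing in 64 bits and rounding only when the result is stored. It is an emulation in Python with a made-up `f32` helper, not Mono's actual code, and the input values are picked to make the difference visible.

```python
import struct

def f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

a = f32(16_777_216.0)  # 2**24, where float32 spacing becomes 2.0
b = f32(1.0)

# True 32-bit arithmetic: every intermediate result is rounded to float32.
stepwise = f32(f32(a + b) + b)

# 64-bit arithmetic with 32-bit storage: round only when storing the result.
widened = f32(a + b + b)

print("float32 at every step :", stepwise)
print("double, stored as f32 :", widened)
```

The two strategies give 16777216.0 and 16777218.0 respectively, so switching between them changes results slightly in addition to changing the cost of each operation.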

~~~
bjourne
Yes I did, and fwiw, comments asking people whether they read the article or
not are off-topic on HN. A performance test is exactly the same thing as a benchmark.
In any real world code, slower floats doesn't matter at all. None of you who
have commented have been able to or even tried to prove me wrong on that
point.

~~~
kibibu
> In any real world code, slower floats doesn't matter at all

This is the same reasoning that tanked the Cyrix 6x86.

If it were true, why don't we just ditch hardware floating point altogether
and just emulate it with integer arithmetic instead? I'm sure chip
manufacturers would appreciate having the die-space back.

~~~
bjourne
That's a straw man. I explained that I meant that a slower builtin floating
point _TYPE_ doesn't matter. "If you really care about cpu raytracing
performance, you need to write handcrafted simd code and C# default float
handling is of no consequence to you."

