
My work for CERN on gather-scatter for AVX in software 2012 - zpiro
Most of the content was sent to ARM and AMD, along with Intel investor relations, since it was not being shared by CERN and only Intel was using it.

"Attached is my draft for publication from my time as a summer student at CERN OpenLab in 2012. There we discovered the problem of transient data when pipelining data from memory. It stood in the way of making full use of the AVX instruction set, because data pointers in arrays would be created and destroyed, leaving mostly empty data that would need to be searched through.

I discovered that it would save a lot of assembly memory instructions to loop over the data once and create an index map of where the real data was located, so that JMP could be used.

It was particularly useful to study the algorithm and data structures used in the GEANT prototype project for this, because it uses multiple geometries to calculate particle trajectories through, along with different types of particles that lack particle-particle interactions.

This makes it possible to take advantage of these semantics and this structure to keep track of real data for efficient pipelining of data through vector instructions.

I recently discovered that this paper wasn't published, and I'm not sure my last draft was even the final one, for these explanations seem a bit unclear. This work was done to study and make sure that the world of High Energy Physics could keep taking advantage of developments in this direction."
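
To make the index-map idea a bit more concrete, here is a minimal sketch. It is not the code from the draft: it assumes a hypothetical layout with one byte flag per slot marking whether the slot holds real data, and the names `build_index_map` and `sum_live` are made up for illustration. One pass over the flags records where the real data lives, and later passes walk that map instead of testing every slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout (hypothetical): alive[i] != 0 means slot i holds real data.
// One pass over the flags yields the positions of the real data, ascending.
std::vector<std::uint32_t> build_index_map(const std::vector<std::uint8_t>& alive) {
    std::vector<std::uint32_t> index_map;
    index_map.reserve(alive.size());
    for (std::size_t i = 0; i < alive.size(); ++i) {
        if (alive[i]) {
            index_map.push_back(static_cast<std::uint32_t>(i));
        }
    }
    return index_map;
}

// Later passes visit only the recorded positions and never touch empty slots.
double sum_live(const std::vector<double>& values,
                const std::vector<std::uint32_t>& index_map) {
    double total = 0.0;
    for (std::uint32_t idx : index_map) {
        assert(idx < values.size());  // the map must match the array it indexes
        total += values[idx];
    }
    return total;
}
```

The map costs one extra pass whenever slots are created or destroyed, so it only pays off when the same set of live slots is visited several times before it changes.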
======
zpiro
Actually, it was for auto-vectorization. Given the data structure and
algorithm used, that wasn't feasible, and I ended up proposing
gather-scatter/scatter-gather in software: although different math is
involved for different geometries, it is still embarrassingly parallel, but
not for single simulated collision events. And it was for GEANT, to take
advantage of future hardware and increase instructions per clock where memory
bandwidth is a problem:
[http://geant.cern.ch/](http://geant.cern.ch/)
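
For anyone unfamiliar with the pattern, a rough sketch of gather-scatter in software follows. It is an illustration under my own assumptions, not GEANT code: the function names and the `x += v * dt` update are hypothetical stand-ins for the real per-geometry math. Live elements are gathered into a dense buffer, a plain contiguous loop runs over them, and the results are scattered back to their original slots.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gather: copy the elements named by index_map into a dense, contiguous buffer.
std::vector<float> gather(const std::vector<float>& sparse,
                          const std::vector<std::size_t>& index_map) {
    std::vector<float> dense(index_map.size());
    for (std::size_t k = 0; k < index_map.size(); ++k) {
        assert(index_map[k] < sparse.size());
        dense[k] = sparse[index_map[k]];
    }
    return dense;
}

// Hypothetical kernel: x += v * dt over contiguous data, with no branches
// and no indirection, which is the loop shape a compiler can vectorize.
void advance(std::vector<float>& x, const std::vector<float>& v, float dt) {
    assert(x.size() == v.size());
    for (std::size_t k = 0; k < x.size(); ++k) {
        x[k] += v[k] * dt;
    }
}

// Scatter: write the dense results back to the slots they came from.
void scatter(const std::vector<float>& dense,
             const std::vector<std::size_t>& index_map,
             std::vector<float>& sparse) {
    assert(dense.size() == index_map.size());
    for (std::size_t k = 0; k < index_map.size(); ++k) {
        assert(index_map[k] < sparse.size());
        sparse[index_map[k]] = dense[k];
    }
}
```

The indirect accesses are confined to the gather and scatter steps, so the arithmetic in between sees only packed data.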

------
zpiro
Worth mentioning a Haswell system with QDR that I helped build. I discovered
that an older version of libpsm was needed to avoid a bandwidth-boost tweak
that came at the cost of higher latency, and bandwidth isn't the competitive
advantage of IB QDR. It was also highly disappointing that they downclocked
Haswell as soon as AVX2 was touched. Given that there was no temperature
increase, there isn't a thermal argument for downclocking just because more of
the micro-instructions on the die are in use. It was an early system, and it
needed an end-of-life CentOS release for the correct library versions.

------
zpiro
So "gather-scatter" is slightly inaccurate, as it is really more detailed
memory management.

~~~
zpiro
Although it was, quite obviously, used to accelerate memory access in AVX2.
Sorry for adding so much afterwards; it was 6 years ago.

