
How to Program a Supercomputer - rbanffy
https://www.cray.com/blog/how-to-program-a-supercomputer/
======
eatbitseveryday
The title of this blog post is overblown. The text is a setup to advertise the
authors' book and an invitation to meet the company at the Supercomputing
conference.

------
0xfaded
I've been writing some ARM assembly lately to solve a very specific
non-linear least-squares problem for SLAM (specifically bundle adjustment).

I started with the library everyone uses (g2o), attempted a vectorised C++
implementation, and ultimately decided it would be easier to develop in
assembly. I've been taking measurements as I go.

    
    
      C++ (g2o)             ~ 10-16x
      C++ (NEON intrinsics) ~ 1.5-2x
      ASM                     1x
    

In this case register layout was very important, and GCC isn't able to, for
example, arrange for two results to wind up in adjacent 64-bit registers and
form a 128-bit register. This business of combining registers is an ARMv7
wart and not an issue in AArch64.

In other cases, such as some bilinear interpolation code I wrote, GCC was able
to generate code that was more or less optimal.

------
pcunite
What language is shown in the initial graphic? Python?

~~~
submeta
You can tell from this that it's gotta be Python: `"".join(...`

~~~
tyingq
os.makedirs() gives it away as Python as well.

------
fermienrico
Why not show a frequency graph instead of clock cycles? I don't know what
1x10^-9 means in terms of CPU speed.

~~~
Arelius
Because clock cycles are a linear measurement of time.

Also, for the record, 1x10^-9 seconds should be a nanosecond.

~~~
fermienrico
Yes, I know it is a nanosecond, but that wasn't my point.

My point is that frequency is what's widely used in reporting CPU speeds. No
one "knows" what a 3 ns/clock speed relates to. Everyone knows 4.0 GHz. You
see my point?

~~~
Arelius
Yes, I see your point, which is what the primary part of my comment was
directly addressing. seconds/clock is a _linear_ measurement of time, graphs
better, and for people who _actually_ deal with timing of performance
sensitive code, time/op is a much more natural measurement that is generally
preferred when measuring and profiling code.

~~~
fermienrico
Thanks, I think that makes sense, since 1/x is a non-linear function.

I have a follow-up question: what, then, is the advantage of measuring speed
as ops/unit-time (frequency)? How come we don't see 0.25 ns/op advertised as
the speed of a processor?

~~~
Arelius
Exactly.

Honestly, I don't know. I suspect it's mostly historical and marketing
driven. CPU clocks are driven by some sort of oscillator, which is naturally
thought of as a frequency. And ops/unit-time may be a bit ambiguous: if we
try to choose, say, instructions, not all instructions take the same number
of clock cycles.

Additionally, from a marketing perspective, I suspect "more Hz is better"
gives you a nice ever-increasing number.

Honestly though, I really don't know; this is all just speculation.
Similarly, video games are often "marketed" in frames/second, which has
exactly the same problem.

~~~
fermienrico
You're kind of blowing my mind. The more I think, the more stupid I feel.

If frequency is a non-linear function of period, which it is... then period
is also a non-linear function of frequency. They're inverses of each other.

So if we're talking about frequency, the same argument could be made that
period is non-linear and cannot be used to compare speed, since period is
1/f, a non-linear function.

wtf is going on here... my brain hurts.

~~~
Arelius
Yes, that's very true. It can sometimes be hard to ground yourself in what
you actually care about. This is where experience trying to use the values
is helpful, hence my video game example in a sibling thread.

