VonTum's comments

If you don't somehow make something useful, or build a thing people actually use, then what was the point of it all? To have fun and then be forgotten?

Though, with AI looming to take this last shred of human dignity too, maybe having a bit of fun along the way isn't such a bad idea.


Lennart here,

Both fundamental techniques were known beforehand, and in both cases the major innovation was adapting the algorithm to the specific compute platform. For us, it was exploiting FPGAs to compute these P-coefficients; for Jäkel, his major contribution was restructuring the summation as large matrix multiplications, which could then be executed very efficiently on a GPU.
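To illustrate the kind of restructuring meant here (a hedged toy sketch, not Jäkel's actual formula): a weighted double summation can be rewritten so that its inner loops become dense linear-algebra primitives, which is exactly the shape GPUs execute well. The names `naive_sum` and `as_matmul` are illustrative, not from either paper.

```python
# Toy illustration: a double sum
#   S = sum over a, b of f[a] * G[a][b] * h[b]
# evaluated naively, then restructured as (f^T G) h --
# two dense linear-algebra passes instead of one scalar loop nest.

def naive_sum(f, G, h):
    # Direct scalar loop over all (a, b) pairs.
    return sum(f[a] * G[a][b] * h[b]
               for a in range(len(f)) for b in range(len(h)))

def as_matmul(f, G, h):
    # First pass: row vector f times matrix G.
    fG = [sum(f[a] * G[a][b] for a in range(len(f)))
          for b in range(len(h))]
    # Second pass: dot product with h.
    return sum(fG[b] * h[b] for b in range(len(h)))

f = [1, 2, 3]
h = [4, 5]
G = [[1, 0], [2, 1], [0, 3]]
assert naive_sum(f, G, h) == as_matmul(f, G, h)  # both give 75
```

The restructured form computes the same value, but each pass is a matrix-vector or dot-product primitive that maps directly onto GPU matmul hardware.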


Lennart here,

While Noctua 2 is certainly not one of the larger clusters out there, it does stand out as one of the only FPGA superclusters in Europe. This is the unique capability that made our project possible there.

And yes, assigning unique credit in this case isn't trivial. I think the only good answer here is: "It was discovered simultaneously, independently."


Hi, I'm Lennart, the author of the FPGA paper.

You can't really compare them on a FLOPS basis. First, because our algorithm and Jäkel's are completely different. In fact, within the FPGA accelerator itself we don't use a single multiply; all our operations are boolean logic and counting. Jäkel, by contrast, was able to exploit the GPU's strong preference for matrix multiplication: all his operations were integer multiplications.
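To make the contrast concrete, here is a hedged toy sketch of the multiply-free style of compute described for the FPGA side: purely bitwise logic plus bit-counting (popcounts). This is not the actual P-coefficient pipeline, just an illustration of counting over bitsets without a single multiplication.

```python
# Illustrative only: counting with AND + popcount, no multiplies.
# FPGAs are well suited to this, since boolean logic and counters
# map directly onto LUTs, while GPUs favor wide multiplications.

def popcount(x):
    # Number of set bits; a single hardware primitive on FPGAs.
    return bin(x).count("1")

def count_disjoint_pairs(masks):
    # Count pairs of bitsets with empty intersection,
    # using only bitwise AND and bit-counting.
    total = 0
    for i, a in enumerate(masks):
        for b in masks[i + 1:]:
            if popcount(a & b) == 0:
                total += 1
    return total

masks = [0b0011, 0b1100, 0b0101]
assert count_disjoint_pairs(masks) == 1  # only (0011, 1100) is disjoint
```

On an FPGA, a loop body like this unrolls into a wide combinational circuit of AND gates and adder trees, which is why a FLOP-based comparison with a multiply-heavy GPU kernel says little.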

In fact, in terms of raw operation count, Jäkel actually did more. There appears to be a tradeoff within the Jumping Formulas: any jump you pick keeps the same fundamental complexity, with even a slight preference towards smaller jumps. It is just that GPU development is several decades ahead of FPGA development, thanks to ML and rendering hype, which more than compensates for the slightly worse fundamental complexity.

As a sidenote, raw FLOP counts from FPGA vendors are wildly inflated. The issue with FPGA designs is that reaching this theoretical FLOP count is nigh-impossible, because getting all components running at the theoretical clock-frequency limit is incredibly difficult. Compare that with GPUs, where at least your processing frequency is a given.


Hello, thanks for taking the time to reply here! My own Master's thesis was also about optimising and implementing algorithms to count certain (graphical) mathematical objects, but you picked a much more famous problem than I did. I'm very surprised I didn't know the definition of Dedekind numbers, although it's related to things I touched on.

I'm not too familiar with FPGAs but hope to have a use for them some day. Measuring their performance in FLOPs seems strange. How close to those theoretical limits does one typically get? Are there a lot of design constraints that conspire against you, or is it just that whatever circuit you want can't be mapped densely to the gate topology?

