The classic answer is "C with inline assembly". Nowadays I would add "Rust with inline assembly" as an option, but either way you're going to spend a lot of time working at a level of abstraction significantly lower than a typical contemporary game engine like Unity or Unreal.
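To give a feel for what that level looks like, here is a minimal sketch of inline assembly in C. It assumes GCC or Clang targeting x86-64, and falls back to plain C on anything else. The function name `add_u64` is made up for illustration. In practice you would only drop down to this for hot loops where the compiler's output isn't good enough.

```c
#include <stdint.h>

/* Hypothetical helper: add two 64-bit integers.
   On x86-64 with GCC or Clang this issues an ADD instruction through
   extended inline assembly (AT&T syntax: source first, destination second).
   On any other compiler or architecture it falls back to plain C. */
static uint64_t add_u64(uint64_t a, uint64_t b)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__("addq %1, %0"
            : "+r"(a)   /* operand 0: a, read and written, in a register */
            : "r"(b)    /* operand 1: b, read only, in a register */
            : "cc");    /* ADD modifies the condition flags */
    return a;
#else
    return a + b;
#endif
}
```

Calling `add_u64(40, 2)` returns 42 on either path. The constraint strings are the part that takes getting used to: they tell the compiler which registers the assembly reads and writes, so it can schedule the surrounding code safely.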
Depending on exactly what calculations your engine will be performing, you might be able to reuse one of the libraries designed for AI work. For example, I recently had a good experience with Intel's OpenVINO, which is able to run Stable Diffusion at acceptable speed on my (AMD) CPU.
Also, given that your problem is so computationally difficult that it can't be run on current CPUs, it seems odd to add a secondary goal of running on cheap machines without a powerful GPU. Maybe try to get it to run at all first (which might take a few years), then see what the market looks like and decide at that point how much you want to support legacy hardware.