C++ could add an explicit conditional move, I suppose. The types of `x` and `y` would have to be restricted:
x = std::cmove(y,flag);
The compiler would be slightly more compelled to use a hardware conditional move than in the following case:
if (flag) x = y;
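To make the idea concrete, here is a rough sketch of what such a helper might look like as a library function. Everything in it is an assumption: `cmove` is a hypothetical name (there is no `std::cmove`), it takes the current value as an extra argument so it can return it when the flag is false, and it is restricted to non-bool integral types. The branchless mask select only nudges the compiler; nothing here guarantees a hardware conditional move gets emitted.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical helper, NOT part of the standard library.
// Returns `candidate` when flag is true, `current` otherwise, without a branch
// in the source. The compiler is still free to lower it however it likes.
template <typename T>
constexpr T cmove(T current, T candidate, bool flag) noexcept {
    static_assert(std::is_integral_v<T> && !std::is_same_v<T, bool>,
                  "sketch is restricted to non-bool integral types");
    using U = std::make_unsigned_t<T>;
    // mask is all ones when flag is true, all zeros when it is false
    const U mask = static_cast<U>(static_cast<U>(0) - static_cast<U>(flag));
    const U picked = static_cast<U>((static_cast<U>(candidate) & mask) |
                                    (static_cast<U>(current) & static_cast<U>(~mask)));
    return static_cast<T>(picked);
}

// Usage: equivalent in effect to `if (flag) x = y;`
inline void cmove_example() {
    int x = 1;
    const int y = 2;
    x = cmove(x, y, false);  // x is unchanged
    assert(x == 1);
    x = cmove(x, y, true);   // x takes the value of y
    assert(x == 2);
}
```

Restricting the helper to trivially copyable scalar types is what makes this sane: both operands are always evaluated, so anything with side effects or expensive copies would behave differently from the `if` version.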
The other option is to do something more like what CUDA / SIMD kernels do: every line gets executed, but each instruction inside the "false branch" becomes a no-op. Of course this requires hardware support.
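For a feel of what that execution model means, here is a scalar emulation in plain C++. It is only a sketch under stated assumptions: the lane count `kLanes`, the type aliases, and the function `predicated_step` are made up for illustration, and real hardware does this with mask registers rather than a ternary per lane. The source-level kernel being emulated is `if (x is odd) x = 3*x + 1; else x = x / 2;`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical lane count; real SIMD widths and CUDA warp sizes differ.
constexpr std::size_t kLanes = 8;

using Lanes = std::array<std::uint32_t, kLanes>;
using Mask = std::array<bool, kLanes>;

// Both sides of the branch run on every lane; the mask decides which lanes
// actually commit a result. Unsigned arithmetic keeps wraparound well defined.
inline Lanes predicated_step(Lanes x) {
    // Evaluate the condition once, for all lanes, before anything is modified.
    Mask odd{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        odd[i] = (x[i] & 1u) != 0u;
    }
    // "Then" side: computed everywhere, committed only where the mask is set.
    for (std::size_t i = 0; i < kLanes; ++i) {
        const std::uint32_t r = 3u * x[i] + 1u;
        x[i] = odd[i] ? r : x[i];
    }
    // "Else" side: computed everywhere, committed only where the mask is clear.
    for (std::size_t i = 0; i < kLanes; ++i) {
        const std::uint32_t r = x[i] / 2u;
        x[i] = odd[i] ? x[i] : r;
    }
    return x;
}

inline void predicated_example() {
    const Lanes in{1, 2, 3, 4, 5, 6, 7, 8};
    const Lanes expected{4, 1, 10, 2, 16, 3, 22, 4};
    assert(predicated_step(in) == expected);
}
```

The cost model follows directly: a divergent branch costs the sum of both sides rather than the taken side only, which is why this pays off mainly when the branches are short.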
I've used it (sparingly) for vectorizable, branchy code, and it has mostly been a simple process that produces very efficient binaries (often beating hand-written intrinsics from an intermediate-level coder, which themselves beat the autovectorizer).
I don't know about using it in prod across multiple hardware generations, though.