I’ve moved on to other things, so I can’t really give details anymore. I understand that’s annoying to hear when you work on that library, but I also want to say that your comment is annoying too, for different reasons. Those reasons mostly answer your question, so I’ll explain anyway.
Highway is (I feel not very controversially) kind of like a compiler but worse at its job. It’s not meant to be as general and it only targets a limited set of code, namely code that is annotated to vectorize well. But looking at it as a compiler is kind of useful: it’s supposed to make writing faster code easier and more automatic. Sometimes compilers are not able to do this, and sometimes Highway can’t either. Maybe its design lacks the expressiveness to represent the algorithm people want. Perhaps it doesn’t quite lower to the optimal code. Maybe so little of the operation maps to the library’s constructs that a huge amount needs to go through the escape hatch you offer, at which point it’s not really worth using the library anyway. In that situation, given an existing and friendly relationship, I would be happy to reach out. But this is a cost to me, because I need to simplify and generalize the thing I want. Then I hand it to you and you decide how you want to tackle it, if at all. All the while I’m waiting and I have code that needs to be written. This is a cost, and something that as an engineer I weigh against just using the intrinsics directly, which I know do exactly what I need but with higher upfront and maintenance costs. When you see someone write their own assembly instead of letting the compiler do it for them, they’re making their version of the same tradeoff.
> it’s supposed to make writing faster code easier and more automatic
Agree with this viewpoint. I suppose that makes it compiler-like in spirit, though much simpler.
I also agree that waiting for input/updates is a cost. What still surprises me is that you seem to be able to do something differently with intrinsics while believing this is not possible as a user of Highway.
It is indeed possible to call _mm_fixupimm_pd(v1.raw, v2.raw, v3.raw, imm), and the rest of your code can be portable.
I would be surprised if heavy usage were made of such escape hatches, but it's certainly interesting to discuss any cases that arise.
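To make the escape-hatch pattern concrete, here is a rough sketch. It does not use Highway itself: `Vec2d`, `Load`, `Store`, `Add`, `SqrtViaEscapeHatch` and `SqrtOfSum` are hypothetical stand-ins for a wrapper type with a public `raw` member, and `_mm_sqrt_pd` is substituted for `_mm_fixupimm_pd` only because the latter needs AVX-512 support. The point is just that a single op can reach for a raw intrinsic while the kernel around it stays portable.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical stand-in for a library vector type (NOT Highway's own type):
// two doubles, with the native register exposed as a public `raw` member.
struct Vec2d {
#if defined(__SSE2__)
  __m128d raw;
#else
  double raw[2];  // scalar fallback for targets without SSE2
#endif
};

// Portable ops of the kind a wrapper library would provide.
inline Vec2d Load(const double* p) {
#if defined(__SSE2__)
  return Vec2d{_mm_loadu_pd(p)};
#else
  return Vec2d{{p[0], p[1]}};
#endif
}

inline void Store(Vec2d v, double* p) {
#if defined(__SSE2__)
  _mm_storeu_pd(p, v.raw);
#else
  p[0] = v.raw[0];
  p[1] = v.raw[1];
#endif
}

inline Vec2d Add(Vec2d a, Vec2d b) {
#if defined(__SSE2__)
  return Vec2d{_mm_add_pd(a.raw, b.raw)};
#else
  return Vec2d{{a.raw[0] + b.raw[0], a.raw[1] + b.raw[1]}};
#endif
}

// Escape hatch: user code applies a raw intrinsic to `.raw` on x86 and
// supplies its own fallback elsewhere. Only this function is target-specific.
inline Vec2d SqrtViaEscapeHatch(Vec2d v) {
#if defined(__SSE2__)
  return Vec2d{_mm_sqrt_pd(v.raw)};
#else
  return Vec2d{{std::sqrt(v.raw[0]), std::sqrt(v.raw[1])}};
#endif
}

// Kernel written against the portable ops: out[i] = sqrt(a[i] + b[i]).
// For brevity this sketch requires n to be a multiple of the vector width (2).
void SqrtOfSum(const double* a, const double* b, double* out, std::size_t n) {
  assert(n % 2 == 0);
  for (std::size_t i = 0; i < n; i += 2) {
    Store(SqrtViaEscapeHatch(Add(Load(a + i), Load(b + i))), out + i);
  }
}
```

The tradeoff discussed above shows up in how many functions end up looking like `SqrtViaEscapeHatch`: one or two is cheap, but if most of a kernel has to be written that way, the wrapper is no longer buying much.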
I do respect your decision, and that you make clear that raw intrinsics have higher upfront and maintenance costs. I suppose it's a matter of preference and of estimating the return on the investment of learning the Highway vocabulary (i.e. searching x86_128-inl.h for the intrinsic you know).
Personally, I find the proliferation of ISAs makes a clear case against hand-written kernels. But perhaps in your use case, x86 will continue to be the only target of interest. Fair enough.