Do you use any special hardware (like FPGAs) to mitigate the increase in computational cost or do you rely on standard hardware?
I'm very interested in FHE in the context of machine learning models that never require access to unencrypted data at any stage (be that training or inference). So far, the performance hit makes this impractical, so I was wondering whether hardware solutions exist to deal with that.
We use standard hardware, with special attention paid to maximally leveraging AVX2/AVX512. The computation is embarrassingly parallel and very simple, so specialized hardware doesn't seem likely to decrease costs much. If anything would help, our bet is on GPUs, which offer better memory bandwidth and are widely commercially available.