
Ask HN: What are you using to serve ML models at low latency? - avin_regmi
https://panini.ai/ is the easiest and fastest way to serve ML/DL models at low latency: it deploys a model to Kubernetes in a few minutes and handles load balancing, caching, and batching of user inputs. What are you using to serve ML models at low latency?
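
By batching of user inputs I mean micro-batching: briefly buffering incoming requests so several can go through the model in a single forward pass. A rough sketch of the idea (simplified and illustrative; the model function and timing parameters here are made up, not our actual implementation):

    import asyncio
    import time

    # Stand-in for a real model: maps a batch of inputs to outputs.
    def model_predict(batch):
        return [x * 2 for x in batch]

    class MicroBatcher:
        # Collects requests for up to max_wait seconds (or max_size
        # items, whichever comes first) and runs them as one batch.
        def __init__(self, predict, max_size=32, max_wait=0.005):
            self.predict = predict
            self.max_size = max_size
            self.max_wait = max_wait
            self.queue = asyncio.Queue()

        async def infer(self, x):
            fut = asyncio.get_running_loop().create_future()
            await self.queue.put((x, fut))
            return await fut

        async def run(self):
            while True:
                x, fut = await self.queue.get()
                batch, futs = [x], [fut]
                deadline = time.monotonic() + self.max_wait
                while len(batch) < self.max_size:
                    timeout = deadline - time.monotonic()
                    if timeout <= 0:
                        break
                    try:
                        x, fut = await asyncio.wait_for(self.queue.get(), timeout)
                    except asyncio.TimeoutError:
                        break
                    batch.append(x)
                    futs.append(fut)
                # One forward pass serves the whole batch.
                for f, y in zip(futs, self.predict(batch)):
                    f.set_result(y)

    async def main():
        b = MicroBatcher(model_predict)
        worker = asyncio.create_task(b.run())
        print(await asyncio.gather(*(b.infer(i) for i in range(10))))
        worker.cancel()

    asyncio.run(main())

The trade-off is a small, bounded wait (max_wait) in exchange for much higher throughput per model invocation.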
======
malux85
Low latency for us means we can't spend 100+ ms on a round trip to an external
server / hosted solution.

If your unique selling point is low latency, you should at least show some
numbers / benchmarks on your homepage.

And finally, there's no way we or our clients would allow our models to be
uploaded to an external provider; it would have to be on-prem.

~~~
avin_regmi
How big an issue is latency for you? What happens if it exceeds 100 ms?
Also, we do offer our software for deployment in your own Kubernetes cluster
via Helm.
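
Something like this (the chart repo URL and chart name below are illustrative, not our actual published chart):

    helm repo add panini https://charts.panini.ai   # hypothetical repo URL
    helm repo update
    helm install panini panini/panini-serving \
      --namespace ml-serving --create-namespace

Everything then runs inside your own cluster, so models never leave your infrastructure.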

