
Show HN: Panini AI – A platform to serve ML/DL models at low latency - avin_regmi
https://panini.ai/
======
nl
This claim (3x faster than TF Serving) and the metrics on the site (~500
predictions per second vs ~200 for TF Serving) seem more a function of scaling
than of any technology.

Given that you can horizontally scale model prediction infinitely, the only
sensible way to compare is to include price.

I agree that this looks compelling while it is free! But will it be
price-competitive later?

And if price competitiveness is claimed, then how is it possible? Yes, you can
do the whole spot-instance thing, but that is difficult to make reliable
enough at scale.

~~~
avin_regmi
Hey, the predictions for both TF Serving and Panini were made in a single
thread on machines with the same specification. We used a simple
image-classification model on the CIFAR dataset. Roughly 500 predictions per
second were served by Panini versus 200 by TF Serving.
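
For context, here is a minimal sketch of that kind of single-threaded
throughput measurement (the predict call is a hypothetical stand-in for the
real client, not our actual harness):

    import time

    def measure_throughput(predict, images, duration_s=10.0):
        # predict(image) is a hypothetical blocking call to the serving
        # system (Panini or TF Serving); it returns one prediction.
        done = 0
        start = time.monotonic()
        while time.monotonic() - start < duration_s:
            predict(images[done % len(images)])
            done += 1
        return done / (time.monotonic() - start)  # predictions per second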

You can always download the entire Panini stack to your own private server and
pay nothing, i.e. use Helm to install it in your own Kubernetes cluster, or
pull it from DockerHub. For now, we're making it free for models under 2GB.
Our main goal is to make it usable, and we don't want cost to be a factor.

~~~
nl
So you claim that TF Serving (written in C++ I believe) has over double the
overhead compared to Panini?

This seems surprising. What makes it so much faster?

Edit: Unless of course you are hitting the cache for a lot of the predictions?

~~~
avin_regmi
An optimized TF Serving build would perform similarly to Panini; however, it's
really hard to find good documentation on tuning TF Serving's compilation
parameters. Panini automatically finds the right batch size to maximize
throughput, and it adapts that size as conditions change. We also have a
technique to bound tail latency. I would love for you to try it and give me
some feedback. Thanks
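
Roughly, the batch-size search works like this (a simplified sketch of the
general idea, not our production code; serve_batch and the doubling heuristic
are illustrative assumptions):

    def tune_batch_size(serve_batch, sample_inputs, max_batch=256):
        # serve_batch(batch) is a hypothetical call that runs one batched
        # prediction and returns its wall-clock latency in seconds.
        best_throughput, batch = 0.0, 1
        while batch <= max_batch and batch <= len(sample_inputs):
            latency = serve_batch(sample_inputs[:batch])
            throughput = batch / latency  # predictions per second
            if throughput <= best_throughput:
                return max(1, batch // 2)  # back off to the last size that helped
            best_throughput, batch = throughput, batch * 2
        return min(batch // 2, max_batch)

In practice a loop like this is re-run periodically so the batch size can
track changing load.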

------
tedivm
Is it actually 3x faster, or does it just scale more?

In other words, if I had a model that previously took three seconds to return
a response, would this platform respond in one second?

~~~
dang
We changed the title from "Show HN: 3x Faster Than Tensorflow Serving" to what
the page says, which is less baity.

[https://news.ycombinator.com/newsguidelines.html](https://news.ycombinator.com/newsguidelines.html)

~~~
tedivm
That's definitely an improvement, but I'm hoping someone from the Panini team
will step in and clarify regardless.

------
etaioinshrdlu
This looks fishy on so many levels!

It is very hard to believe that deploying your models on GKE is going to be
cost-saving for anyone involved.

~~~
avin_regmi
Hey, you don't have to deploy on GKE, and it's not GKE that makes it faster.
We also give you the option to deploy in your own private Kubernetes cluster
via Helm, or on a private server via DockerHub. GKE may not be the right
option for you depending on your application. Your feedback would be very
valuable to us. Please tell me why you think it's fishy; we're always trying
to make it better.

~~~
tedivm
It sounds super fishy, especially since you won't answer the question above
confirming whether you're talking about latency or throughput.

Google has also dumped a lot into TensorFlow Serving, so if you are
outperforming it by that much, it would be great to know how.

~~~
avin_regmi
Sorry, I should've been clearer. The predictions for both TF Serving and
Panini were made in a single thread on machines with the same specification.
We used a simple image-classification model on the CIFAR dataset. Roughly 500
predictions per second were served by Panini versus 200 by TF Serving. The
graph on the website is for throughput. I'm planning to write a Medium post
soon about the benchmark test. There are many other projects getting higher
throughput than TF Serving. I've heard TF Serving can be optimized to be more
efficient, but that optimization process is not documented properly. We're
planning to make it open source if there is enough interest from the
community!
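
To make the throughput vs. per-request latency distinction concrete, both
numbers fall out of the same single-threaded run (again a simplified sketch,
with predict as a hypothetical stand-in for the real client):

    import time

    def run_benchmark(predict, inputs):
        # predict(x) is a hypothetical blocking call to the serving system.
        latencies = []
        start = time.monotonic()
        for x in inputs:
            t0 = time.monotonic()
            predict(x)
            latencies.append(time.monotonic() - t0)
        elapsed = time.monotonic() - start
        latencies.sort()
        return {
            "throughput_qps": len(inputs) / elapsed,  # the graph's metric
            "p50_latency_s": latencies[len(latencies) // 2],
            "p99_latency_s": latencies[int(0.99 * (len(latencies) - 1))],  # tail
        }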

~~~
ScoutOrgo
What is your business model if your platform is free? Either that price has to
change, or you plan on making money on the same thing all other free services
run on: data.

The site isn't very upfront about it, which is the sketchy part. Other than
that, it looks much more straightforward than other options (I did watch the
YouTube tutorial). I like the idea, just question the motives.

~~~
avin_regmi
Our platform is free for the beta users to try it with limit of 2GB per model.
We are just starting and we haven't decided on our business model yet.

If a user downloads panini to their private server and use it that will always
be free since there is not infrastructure cost for us. If you're deploying it
in our website we will be charging you to pay for the infrastracture cost.

Our main goal currently is to find out if people find this product useful and
if it's worth for us to spend more time working on it. Thanks for watching the
YouTube tutorial and if you have further questions, please contact us. Thanks

~~~
ScoutOrgo
Fair enough, thanks for answering.

