We improved Tensorflow Serving performance by over 70% (mux.com)
172 points by craigkerstiens 26 days ago | 17 comments



FWIW, I believe the current state of the art for batch-size-1, fp32 ResNet-50 inference on Intel CPUs is AWS's work in https://arxiv.org/abs/1809.02697. Once the low-hanging fruit outside of model execution is picked, this kind of work is probably quite relevant.


Hey! Author here, thanks for linking the paper. The article was written from an infrastructure perspective, but we're definitely diving deeper into graph-execution optimizations after this :)


Are there any particular optimizations you are looking into?


Have a look here: https://github.com/IntelAI/OpenVINO-model-server/blob/master... You can replace TF Serving with OpenVINO Model Server to get even better throughput and latency when running on CPU.
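
For what it's worth, OpenVINO Model Server exposes a TensorFlow Serving-compatible gRPC API, so an existing TF Serving client can usually be repointed at it with little more than a different target address. A minimal sketch (the 0.0.0.0:9001 target is an assumption, not from the article):

```
import grpc
from tensorflow_serving.apis import prediction_service_pb2_grpc

# OpenVINO Model Server implements the same PredictionService gRPC API as
# TF Serving, so the client stub is unchanged; only the target address differs.
channel = grpc.insecure_channel('0.0.0.0:9001')  # placeholder OVMS address/port
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
```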


What useful models run at decent speed on a CPU these days?

Even basic image classifiers tend to be 100x faster on a GPU or TPU...


Inference is not that slow on CPU, especially for network requests that already carry quite a bit of latency, so plenty of companies use CPUs in the cloud for lambda/flexible loads where GPUs aren't available.


https://www.microsoft.com/en-us/research/publication/deepcpu...

TensorFlow has some known inefficiencies.


Cool work! It feels like the improvement is a little overstated because of how you're measuring: your measurements include client import/setup time, so you get big gains just by improving imports. In reality you won't be creating a new client for each request, and client import/setup time is unrelated to TF Serving performance. TF Serving performance is really about the time elapsed between the request being received and the response being returned.
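
Concretely, a measurement that isolates serving latency would look something like this (a rough sketch; the stub and request are assumed to be built once, outside the timed path):

```
import time

def timed_predict(stub, request):
    # Time only the request/response round trip to TF Serving; channel
    # creation, imports, and request construction are done beforehand.
    start = time.perf_counter()
    response = stub.Predict(request, 10.0)  # 10 second deadline
    return response, time.perf_counter() - start
```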


> containers are run on a 4 core, 15GB, Ubuntu 16.04 host machine

What CPU is being used?

Assuming the benchmark is done with something like an EC2 C5 instance, the results in this post are quite slow. Somewhere around 14x slower than benchmarks from a year ago on EC2 C5 instances. [1]

[1] https://dawn.cs.stanford.edu/benchmark/ImageNet/inference.ht..., using the c5.2xlarge benchmark and assuming linear scaling


Hi bwasti, the host's CPU platform is Intel Broadwell. While the CPU architecture of our production hosts is the same, they are allocated far more than 4 cores. This post gives an overview of the relative improvements that can be made starting from a vanilla setup :)

-masroor (author)


You may want to check out Intel's optimized version of TensorFlow Serving[1] for further improvements (on the order of 2x for ResNet-50[2]).

As an aside, I took into account the resource allocation mentioned in the parent comment. The c5.2xlarge has 8 cores, 16GB RAM [3] and does a single fp32 inference in ~17ms. If we chop that down to 4 cores and assume linear scaling, we could expect ResNet-50 to run in ~35ms, compared to the ~500ms achieved here. I'd recommend comparing against a known baseline rather than a "vanilla setup" to ensure you aren't missing simple changes that may dramatically improve performance.

[1] https://github.com/IntelAI/models/blob/master/docs/general/t...

[2] https://www.intel.ai/improving-tensorflow-inference-performa...

[3] https://aws.amazon.com/ec2/instance-types/c5/
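
To spell out that back-of-the-envelope estimate (assuming latency scales linearly with core count):

```
# Rough scaling estimate; assumes latency scales linearly with core count.
latency_8_cores_ms = 17  # c5.2xlarge fp32 ResNet-50 inference, per the DAWNBench numbers above
estimated_4_cores_ms = latency_8_cores_ms * (8 / 4)

print(estimated_4_cores_ms)        # ~34 ms
print(500 / estimated_4_cores_ms)  # ~14x gap vs the ~500 ms reported in the post
```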


@bwasti, really good points - this is something we look forward to evaluating! Our post does indeed outline optimizations going from the tensorflow/serving image to tensorflow/serving:*-devel [1]. The next logical improvement (given the Intel architecture and the docs linked) is to start building on top of the *-devel-mkl image.

-masroor (author)

[1] https://github.com/tensorflow/serving/tree/master/tensorflow...


The grpc.beta code elements are deprecated and may go away at any time. (gRPC 1.0.0 is also very old and unsupported.)


Good point - we're still in the process of migrating to >= 1.17. The gRPC connection and client stub should still translate (with a few semantic updates).

```
import grpc
from tensorflow_serving.apis import prediction_service_pb2_grpc

# Non-beta gRPC channel and client stub for TF Serving's PredictionService.
channel = grpc.insecure_channel('0.0.0.0:9000')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
```
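
For completeness, a full request against the non-beta API would look roughly like this (a sketch: the model name, signature, input key, and input shape are placeholders, not the values from our setup):

```
import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

channel = grpc.insecure_channel('0.0.0.0:9000')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Build a PredictRequest; model/signature/input names and shape are placeholders.
request = predict_pb2.PredictRequest()
request.model_spec.name = 'resnet'
request.model_spec.signature_name = 'serving_default'
image = np.random.rand(1, 224, 224, 3).astype(np.float32)
request.inputs['input'].CopyFrom(tf.make_tensor_proto(image, shape=image.shape))

response = stub.Predict(request, 10.0)  # 10 second deadline
```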

-masroor (author)


There is an optimized version of TensorFlow based on Clear Linux and MKL-DNN: https://clearlinux.org/stacks. It would be interesting to see the performance difference between the natively compiled version and this.


Hey! That's super interesting - so far we've gone with TensorFlow's Ubuntu-based official Docker devel image, but a Clear Linux base definitely looks worth looking into!

-masroor (author)


This is amazing.




