
Tensor Compilers: Comparing PlaidML, Tensor Comprehensions, and TVM - hedgehog
http://vertex.ai/blog/compiler-comparison
======
byefruit
These guys need to be clearer that they're the developers of PlaidML - I
don't think it's made very obvious.

Worth pointing out for anyone else that it seems PlaidML is AGPL licensed - so
maybe not worth getting too excited about if you have any commercial
applications in mind.

~~~
denfromufa
AGPL would be a restriction if you need to deploy a model on top of PlaidML
in production. It is still very useful during training, after which the
neural network can be offloaded to a production framework such as
TensorFlow.

~~~
forresti
Heh, I think part of the point of PlaidML is to avoid the speed/efficiency
limitations of TensorFlow in deployment/production.

------
masahi
The TVM results on resnet50 and mobilenet seem a bit off. On GTX 1070 Ti, with
an input of size (1, 3, 224, 224)

TVM result

Resnet50 : 100 inference/sec (0.009983 sec per each run)

Mobilenet: 450 inference/sec (0.002220 sec per each run)

PlaidML result

Resnet50 : 107 inference/sec (0.009302 sec per each run)

Mobilenet: 473 inference/sec (0.002112 sec per each run)

My benchmark script for tvm is here
[https://gist.github.com/masahi/a386c2ce5b5f8c2d9f7af5e09a8d8...](https://gist.github.com/masahi/a386c2ce5b5f8c2d9f7af5e09a8d880b)
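In case it helps anyone cross-check: the inference/sec figures above are just the reciprocal of the per-run times (a quick sketch of mine, not part of the gist):

```python
# Sanity check: inferences/sec = 1 / (seconds per run).
def throughput(sec_per_run):
    return 1.0 / sec_per_run

for name, t in [("TVM ResNet50", 0.009983), ("TVM MobileNet", 0.002220),
                ("PlaidML ResNet50", 0.009302), ("PlaidML MobileNet", 0.002112)]:
    # Truncating to an integer reproduces the figures quoted above.
    print(name, int(throughput(t)))
```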

~~~
b33pr
Thank you so much for pointing this out. We'll get updated numbers out soon.
How did you benchmark PlaidML, out of curiosity? The error I correct here
([https://github.com/brianretford/nnvm-
rocm/blob/master/mxnet_...](https://github.com/brianretford/nnvm-
rocm/blob/master/mxnet_imagenet_inference.py)) was caused by a desire to
roughly approximate how Keras does things, and plaidbench with Keras is the
easiest way for us to evaluate things, though it definitely adds a lot of
overhead. My script roughly matches the numbers I get out of your script,
though I will say that, to be fair, I think the TVM time_evaluator should be
calling Sync inside its loop (which I patched it to do to compare against
your methodology). It doesn't make a huge difference, but the difference is
there.
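To illustrate what syncing inside vs. outside the timing loop means (this is a toy model of mine, not TVM's actual time_evaluator code): GPU kernel launches are asynchronous, so where the host synchronizes determines what each timed interval covers.

```python
# Toy model of asynchronous kernel launches: launch() returns immediately,
# and queued work only completes when the host calls sync().
class FakeAsyncDevice:
    def __init__(self):
        self.clock = 0.0    # simulated device time
        self.pending = []   # queued kernel durations

    def launch(self, duration):
        self.pending.append(duration)    # enqueue and return immediately

    def sync(self):
        self.clock += sum(self.pending)  # host blocks until the queue drains
        self.pending.clear()
        return self.clock

def mean_time_sync_outside(dev, n, kernel_time):
    # One sync at the end: only the total is observable, not per-run times.
    start = dev.sync()
    for _ in range(n):
        dev.launch(kernel_time)
    end = dev.sync()
    return (end - start) / n

def mean_time_sync_inside(dev, n, kernel_time):
    # Sync each iteration: every timed interval covers exactly one run,
    # at the cost of extra host/device round-trips.
    per_run = []
    for _ in range(n):
        t0 = dev.sync()
        dev.launch(kernel_time)
        per_run.append(dev.sync() - t0)
    return sum(per_run) / n

dev = FakeAsyncDevice()
print(round(mean_time_sync_outside(dev, 10, 0.001), 6))  # 0.001
print(round(mean_time_sync_inside(dev, 10, 0.001), 6))   # 0.001
```

In this deterministic model the means agree; on real hardware the per-iteration sync adds launch-gap overhead, which is the small difference being discussed.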

If I just pull the overall kernel runtime from our logs, I get ~525
inferences/sec.

~~~
masahi
for plaid, I used

plaidbench keras mobilenet

plaidbench keras resnet50

time_evaluator is what the TVM/NNVM folks use for benchmarking. See their
benchmark script here:
[https://github.com/dmlc/nnvm/blob/master/examples/benchmark/...](https://github.com/dmlc/nnvm/blob/master/examples/benchmark/gpu_imagenet_bench.py)

