
I didn't understand it that way: they didn't build TensorFlow Lite with a QNNPACK "backend". They compared both versions on the same benchmarks, but they didn't "merge" the two solutions.

So, in theory, QNNPACK could be used to implement a TensorFlow Lite interpreter. However, it seems the most interesting implementations will use hardware-specific acceleration, such as Nvidia's TensorRT / Tensor Cores or Google's TPUs, whereas QNNPACK seems to only target SIMD optimizations on CPUs.

That's still a good amount of work to identify the optimizable building blocks, or to validate other approaches such as TFLite, but each mobile processor vendor (Qualcomm, ARM, Intel) already provides an implementation of the Android NN API that maximizes usage of its own hardware.

That's why I'm not sure how QNNPACK integrates with the entire ecosystem.

Edit: as I see it, to consume a model in an application, the stack looks like this: developer <-> TFLite interpreter API <-> Android NN API (if the target is Android) <-> vendor-provided accelerated implementation (a black box / binary blob, where most of the acceleration is supposed to happen)

Edit 2: now that I think about it, it doesn't quite make sense to benchmark against "TensorFlow Lite" as such. TensorFlow Lite is only an API and a file format spec, not one specific implementation, from what I understand.
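For concreteness, the developer-facing end of that chain (the TFLite interpreter API from the diagram above) looks roughly like this with the Python bindings; the model path and the dummy input are placeholders, and on Android the equivalent Java/Kotlin calls may or may not get delegated to an NN API driver underneath:

    import numpy as np
    import tensorflow as tf  # the tf.lite bindings ship with TensorFlow

    # Load a converted .tflite model (path is hypothetical)
    interpreter = tf.lite.Interpreter(model_path="mobilenet_quant.tflite")
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Feed a dummy input matching whatever shape/dtype the model declares
    dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
    interpreter.set_tensor(input_details[0]["index"], dummy)
    interpreter.invoke()

    scores = interpreter.get_tensor(output_details[0]["index"])
    print(scores.shape)

Everything below that call boundary (reference kernels, QNNPACK-style kernels, or a vendor NN API driver) is invisible at this level, which is the point of the diagram.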




Replying to your other comments (about how QNNPACK integrates, and about vendor implementations of the Android NN API):

I'm not entirely sure what they're aiming for there. Usually when you see talk about "kernels" it's about how particular filters/convolutions/low-level operations are optimized, and the implication is that the kernels run on the GPU (most of the time). They do talk a lot about microarchitectural details, cache sizes, and ARM/NEON operations, so it all seems to be implemented on the CPU, but I don't really grasp how it ties in with the vendor-specific implementations you mention.
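For what it's worth, "kernel" in that post seems to mean exactly these CPU inner loops: a quantized convolution or fully-connected layer boils down to 8-bit multiply-accumulates into 32-bit accumulators, followed by requantization. A toy sketch of that arithmetic (not QNNPACK's actual code; shapes and quantization parameters are made up):

    import numpy as np

    def quantized_matmul(a_q, b_q, a_zero, b_zero, out_scale, out_zero):
        # uint8 inputs, int32 accumulation, uint8 output -- the gemmlowp-style
        # scheme quantized mobile kernels implement; a real kernel would
        # vectorize this inner loop with NEON and tile it for the caches.
        acc = (a_q.astype(np.int32) - a_zero) @ (b_q.astype(np.int32) - b_zero)
        out = np.round(acc * out_scale) + out_zero  # requantize to uint8 range
        return np.clip(out, 0, 255).astype(np.uint8)

    rng = np.random.default_rng(0)
    a = rng.integers(0, 256, size=(8, 64), dtype=np.uint8)   # activations
    b = rng.integers(0, 256, size=(64, 16), dtype=np.uint8)  # weights
    print(quantized_matmul(a, b, a_zero=128, b_zero=128,
                           out_scale=0.01, out_zero=128).shape)

How that CPU path coexists with the vendor NN API drivers is, as you say, a separate question.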

It could be that these are new algorithms/implementations that play to the strengths of the whole system (not just the CPU or the microarchitecture) and try to "go easy" on memory bandwidth, for example, to get better performance out of otherwise equivalent (maybe?) code.

This reminds me a bit of the numexpr [0] project, which accelerates numpy computations in Python by rearranging data in memory to be more cache-friendly.

[0] https://github.com/pydata/numexpr
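To make the comparison concrete, numexpr's trick is to evaluate a compound expression in cache-sized blocks instead of materializing full-size temporaries the way plain numpy does (the arrays below are just an example):

    import numpy as np
    import numexpr as ne

    a = np.random.rand(10_000_000)
    b = np.random.rand(10_000_000)

    # Plain numpy materializes full-size temporaries for 2*a, 3*b and the sum,
    # so every intermediate makes a round trip through main memory.
    r1 = 2 * a + 3 * b

    # numexpr compiles the expression and streams over the arrays in
    # cache-sized chunks, so intermediates stay in cache.
    r2 = ne.evaluate("2 * a + 3 * b")

    assert np.allclose(r1, r2)

Same idea as the memory-bandwidth angle above: the arithmetic is identical, only the data movement changes.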


You're right; I was skimming over parts of the text and didn't read carefully the first time. They're using QNNPACK + Caffe2 to outperform TensorFlow Lite.



