
I didn't understand it that way: they didn't build TensorFlow Lite with a QNNPACK "backend". They compared both versions on the same benchmarks, but they didn't "merge" the two solutions.

So, in theory, QNNPACK could be used to implement a TensorFlow Lite interpreter. However, it seems the most interesting implementations will use hardware-specific acceleration, such as Nvidia's TensorRT / Tensor Cores or Google's TPUs, whereas QNNPACK seems to only target SIMD optimizations on CPUs.

That's still a good amount of work to identify the optimizable building blocks, or to validate other approaches such as TFLite, but each mobile processor vendor (Qualcomm, ARM, Intel) already provides an implementation of the Android NN API that maximizes usage of its own hardware.

That's why I'm not sure how QNNPACK integrates with the entire ecosystem.

Edit: as I see it, to consume a model in an application, the stack looks like this: developer <-> TFLite interpreter API <-> Android NN API (if the target is Android) <-> vendor-provided accelerated implementation (a black box / binary blob, where most of the acceleration is supposed to happen)

Edit 2: now that I think about it, it doesn't quite make sense to benchmark against "TensorFlow Lite" as such. TensorFlow Lite is only an API and a file format spec, not one specific implementation, from what I understand.
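For concreteness, the developer-facing end of that chain (the TFLite interpreter API from the diagram above) looks roughly like this with the Python bindings; the model path and the dummy input are placeholders, and on Android the equivalent Java/Kotlin calls may or may not get delegated to an NN API driver underneath:

    import numpy as np
    import tensorflow as tf  # the tf.lite bindings ship with TensorFlow

    # Load a converted .tflite model (path is hypothetical)
    interpreter = tf.lite.Interpreter(model_path="mobilenet_quant.tflite")
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Feed a dummy input matching whatever shape/dtype the model declares
    dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
    interpreter.set_tensor(input_details[0]["index"], dummy)
    interpreter.invoke()

    scores = interpreter.get_tensor(output_details[0]["index"])
    print(scores.shape)

Everything below that call boundary (reference kernels, QNNPACK-style kernels, or a vendor NN API driver) is invisible at this level, which is the point of the diagram.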




Replying to your other comments (about how QNNPACK integrates, and about vendor implementations of the Android NN API):

I'm not entirely sure what they're aiming for there. Usually when you see talk about "kernels" it's about how particular filters/convolutions/low-level operations are optimized, and the implication is that the kernels run on the GPU (most of the time). They do talk a lot about microarchitectural details, cache sizes, and ARM/NEON operations, so it all seems to be implemented on the CPU, but I don't really grasp how it ties in with the vendor-specific implementations you mention.
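For what it's worth, "kernel" in that post seems to mean exactly these CPU inner loops: a quantized convolution or fully-connected layer boils down to 8-bit multiply-accumulates into 32-bit accumulators, followed by requantization. A toy sketch of that arithmetic (not QNNPACK's actual code; shapes and quantization parameters are made up):

    import numpy as np

    def quantized_matmul(a_q, b_q, a_zero, b_zero, out_scale, out_zero):
        # uint8 inputs, int32 accumulation, uint8 output -- the gemmlowp-style
        # scheme quantized mobile kernels implement; a real kernel would
        # vectorize this inner loop with NEON and tile it for the caches.
        acc = (a_q.astype(np.int32) - a_zero) @ (b_q.astype(np.int32) - b_zero)
        out = np.round(acc * out_scale) + out_zero  # requantize to uint8 range
        return np.clip(out, 0, 255).astype(np.uint8)

    rng = np.random.default_rng(0)
    a = rng.integers(0, 256, size=(8, 64), dtype=np.uint8)   # activations
    b = rng.integers(0, 256, size=(64, 16), dtype=np.uint8)  # weights
    print(quantized_matmul(a, b, a_zero=128, b_zero=128,
                           out_scale=0.01, out_zero=128).shape)

How that CPU path coexists with the vendor NN API drivers is, as you say, a separate question.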

It could be that these are new algorithms/implementations that play to the strengths of the whole system (not just the CPU or the microarchitecture) and try to "go easy" on memory bandwidth, for example, to get better performance out of otherwise equivalent (maybe?) code.

This reminds me a bit of the numexpr [0] project, which accelerates numpy computations in Python by rearranging data in memory to be more cache-friendly.

[0] https://github.com/pydata/numexpr
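To make the comparison concrete, numexpr's trick is to evaluate a compound expression in cache-sized blocks instead of materializing full-size temporaries the way plain numpy does (the arrays below are just an example):

    import numpy as np
    import numexpr as ne

    a = np.random.rand(10_000_000)
    b = np.random.rand(10_000_000)

    # Plain numpy materializes full-size temporaries for 2*a, 3*b and the sum,
    # so every intermediate makes a round trip through main memory.
    r1 = 2 * a + 3 * b

    # numexpr compiles the expression and streams over the arrays in
    # cache-sized chunks, so intermediates stay in cache.
    r2 = ne.evaluate("2 * a + 3 * b")

    assert np.allclose(r1, r2)

Same idea as the memory-bandwidth angle above: the arithmetic is identical, only the data movement changes.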


You're right; I was skimming over parts of the text and didn't read carefully the first time. They're using QNNPACK + Caffe2 to outperform TensorFlow Lite.



