
Developer preview of TensorFlow Lite - runesoerensen
https://developers.googleblog.com/2017/11/announcing-tensorflow-lite.html
======
rajatmonga
__TensorFlow Lite__ is TensorFlow’s lightweight solution for mobile and
embedded devices! TensorFlow has always run on many platforms, from racks of
servers to tiny devices, but as the adoption of machine learning models has
grown over the last few years, so has the need to deploy them on mobile and
embedded devices. TensorFlow Lite enables low-latency inference of on-device
machine learning models.

Looking forward to your feedback as you try it out.

~~~
linuxkerneldev
> Looking forward to your feedback as you try it out.

Thanks Rajat. We use typical Cortex-A9/A7 SoCs running plain Linux rather than
Android. We would use it for inference.

1\. Platform choice

Why make TFL Android/iOS-only? TF works on plain Linux. TFL even uses the NDK,
and it would appear the inference part could work on plain Linux.

2\. Performance

I did not find any info on the performance of TensorFlow Lite; I'm mainly
interested in inference performance. The phrase "low-latency inference"
catches my eye. Just how low is low latency here? Milliseconds?

~~~
rajatmonga
1\. The code is standard C/C++ with minimal dependencies, so it should be
buildable even on non-standard platforms. Linux is easy.

2\. The interpreter is optimized for low overhead, and the kernels are better
optimized, especially for ARM CPUs at the moment. Performance varies by model,
but we have seen significant improvements on most models going from TensorFlow
to TensorFlow Lite. We'll share benchmarks soon.

~~~
linuxkerneldev
> The code is standard C/C++ with minimal dependencies so it should be
> buildable on even non-standard platforms. Linux is easy.

Glad to hear that, Rajat. Since it is easy, as you say, I look forward to your
upcoming release with Linux as a standard platform. :-)

------
pjmlp
So it uses Bazel on Android....

Google devs, could you please get yourselves together in one room and agree on
ONE BUILD SYSTEM for Android?!?

Gradle stable, CMake, ndk-build, the Gradle unstable plugin, GN, Bazel, ...,
whatever someone else does with their 20% time.

I keep collecting build systems just to build Google stuff for Android.

~~~
d4l3k
It might be a tad annoying, but the rest of TensorFlow uses Bazel, so it makes
sense that TensorFlow Lite also uses it. It also probably matches the internal
Google workflow better, since Google uses Blaze internally.

~~~
pjmlp
I thought it was to be used outside Google, not that we have to learn every
single build system they happen to use inside.

~~~
solipsism
If all Google teams who make use of external build systems were going to agree
on one (not likely), it would be Bazel.

------
m3kw9
Why would I use this for iOS when I can use CoreML and convert a TensorFlow
model into a CoreML model, for which there is already native support?

~~~
dr1337
CoreML doesn't actually support TensorFlow. Its support for TensorFlow is only
through Keras, which is fine if you just want to build stock-standard models,
but if you're doing crazy research implementations then that's not going to
work.
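
For the stock-standard case, the Keras route is roughly a one-liner with
Apple's coremltools (a minimal sketch; the file name and tensor names below
are just placeholders):

    # Rough sketch of the Keras -> CoreML route. Assumes coremltools is
    # installed and 'model.h5' is a saved Keras model; names are placeholders.
    import coremltools

    coreml_model = coremltools.converters.keras.convert(
        'model.h5',
        input_names=['image'],
        output_names=['probs'])
    coreml_model.save('MyModel.mlmodel')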

~~~
m3kw9
It's all in the converter tool: if the converter can get the TF file into a
.mlmodel properly, then it will be supported. Inside, it's just a bunch of
weights and layers and parameters. We just need a proper script to translate
it.

~~~
dgacmu
"just a bunch of weights and layers and parameters" \-- I think you and the GP
are agreeing. That's the definition of standard: If the model can be expressed
using the currently-blessed set of layer definitions in CoreML, then yes. But
if you're doing nonstandard stuff with weird control flow behavior, or RNNs
that don't map into some of the common flavors, then all bets are off.

An example: some of my colleagues put a QP solver in tandem with a DNN, so
that the neural network could 'shell out' to the solver as part of its
learning, and it learned to solve small Sudoku problems from examples alone:
[https://arxiv.org/abs/1703.00443](https://arxiv.org/abs/1703.00443). The
PyTorch code for it is one of the examples I like to use as a stress test for
doing funky things in the machine learning context.

TensorFlow is a very generic dataflow library at its heart, which happens to
have a lot of DNN-specific functionality as ops. It's possible to express
arbitrary computations in it, whereas CoreML and similar frameworks make
stronger assumptions that the computation will fit a particular mould, and
optimize accordingly.
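
To make that concrete, here's a tiny, purely illustrative graph with
data-dependent control flow; something like this is trivial in TensorFlow but
doesn't map onto a fixed layer vocabulary like CoreML's:

    import tensorflow as tf

    # Keep doubling the value until it exceeds 100. The number of iterations
    # depends on the data, not on a fixed stack of layers.
    x = tf.constant(1.0)
    result = tf.while_loop(lambda v: v < 100.0, lambda v: v * 2.0, [x])

    with tf.Session() as sess:
        print(sess.run(result))  # final value: 128.0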

~~~
m3kw9
Looks like you are right; CoreML only supports three kinds of DNN:
feedforward, convolutional, and recurrent. I suppose capsule nets are not any
one of those, if they were implemented in TF.

------
mtgx
How would this differ from uTensor? Did they make uTensor redundant?

[https://github.com/neil-tan/uTensor](https://github.com/neil-tan/uTensor)

~~~
infnorm
We developed TensorFlow Lite to be small enough to target really small devices
that lack MMUs, like the ARM Cortex-M MCU series, but we haven't done the
actual work to target those devices yet. That being said, we are excited when
the ecosystem and community around machine learning expands.

~~~
ianhowson
Cortex-M compatibility was literally my first thought when I read this --
especially low-memory systems. Might have to hack it up myself.

------
barbolo
Would this be a viable option for deploying TensorFlow models in serverless
environments (Lambda, Functions)?

~~~
rasmi
You can deploy TensorFlow model binaries as serverless APIs on Google Cloud ML
Engine [1]. But I would also be interested in seeing a TensorFlow Lite
implementation.

[1] [https://cloud.google.com/ml-engine/docs/deploying-models](https://cloud.google.com/ml-engine/docs/deploying-models)

Disclaimer: I work for Google Cloud.

~~~
barbolo
Thanks, @rasmi. I have some feedback for you guys. The pricing for prediction
inference in GCP is not very fair. If I deploy a small model (like SqueezeNet
or MobileNet) I pay almost the same price as someone deploying a large model
(like ResNet or VGG). That’s why I’m deploying my models in serverless
environments and paying about 5 dollars for 1 million inferences.

The pricing of GCP is: $0.10 per thousand predictions, plus $0.40 per hour.
That’s more than 100 dollars for 1 million inferences.

~~~
rasmi
I see what you mean. To some companies, ML Engine's cost as a managed service
may be worth it. To others, spinning up a VM with TensorFlow Serving on it is
worth the cost savings. If you've taken other approaches to serving TensorFlow
models to get around ML Engine's per-prediction cost, I'm curious to hear
about them.

------
MBCook
Is it possible that a future version may be able to leverage CoreML on iOS?

~~~
rajatmonga
With TensorFlow and TF Lite we are looking to provide a great experience
across all platforms, and are exploring ways to provide a simpler experience
with good acceleration on iOS as well.

------
therealmarv
But on iOS we still cannot use Swift with that, see
[https://github.com/tensorflow/tensorflow/issues/19](https://github.com/tensorflow/tensorflow/issues/19)?!
Btw, what about Kotlin?

UPDATE: It seems some third-party developers have developed some
Swift-compatible APIs.

------
nightsd01
On iOS, does TensorFlow Lite utilize the GPU for inference when needed, or is
it CPU-only?

If it does use the GPU, does it go through OpenCL or something?

------
aprao
Is this the next iteration of TensorFlow for Mobile? Is on-device training
something planned for the future?

~~~
runesoerensen
Yes to your first question, from the article: _”As you may know, TensorFlow
already supports mobile and embedded deployment of models through the
TensorFlow Mobile API. Going forward, TensorFlow Lite should be seen as the
evolution of TensorFlow Mobile, and as it matures it will become the
recommended solution for deploying models on mobile and embedded devices.”_

Also check out this post for more info and examples:
[https://research.googleblog.com/2017/11/on-device-conversati...](https://research.googleblog.com/2017/11/on-device-conversational-modeling-with.html)

------
ausjke
This is pretty much Android/iPhone-only. I wish it were more flexible so it
could be used on other edge devices such as home routers or other embedded
products where resources are constrained.

~~~
rajatmonga
The current examples talk about Android/iPhone; however, the core runtime is
pretty lightweight, with the goal of supporting all kinds of embedded
products.

Do let us know if you build/run on other platforms.

------
qhwudbebd
I was hoping this link might be to a version of TensorFlow that sheds the
heavyweight Java dependency for building. Sadly not; still Bazel-infested.

------
tadeegan
How does this compare to using XLA for AOT compilation?

~~~
rajatmonga
XLA for AOT is useful for cases where you know exactly what architecture you
are shipping to, and are OK with updating the code whenever the model changes.

TF Lite addresses the segment where you need more flexibility:

\- you ship a single app to many types of devices

\- you would like to update the model independently of the code itself, e.g.
updating the model over the wire with no change to the Android APK.

Even with this generality, TF Lite is still quite fast and lightweight, as
that was the focus in building it.

------
ralphc
How do I know which handsets or tablets have "New hardware specific to neural
networks processing" for the NNAPI?

------
thepoet
Is the Lite converter also doing some sort of quantization, or is it purely
for file format conversion?

~~~
d4l3k
TensorFlow has supported quantization for a long time (and it is recommended
for mobile devices), so it very likely is.

~~~
infnorm
Quantization comes in many different forms. TensorFlow Lite provides optimized
kernels for 8-bit uint quantization. This specific form of evaluation is not
directly supported in TensorFlow right now (though TensorFlow can train such a
model). We will be releasing training scripts that show how to set up such
models for evaluation.
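
Purely as an illustration of the idea (not the upcoming scripts themselves):
TensorFlow already has fake-quantization ops that simulate 8-bit rounding
during training and record the min/max ranges an 8-bit runtime needs. The
shapes and ranges below are made up:

    import tensorflow as tf

    # Illustrative only: simulate 8-bit quantization of a weight tensor during
    # training so the exported graph carries the ranges an 8-bit runtime needs.
    w = tf.Variable(tf.truncated_normal([3, 3, 3, 16], stddev=0.1))
    w_q = tf.fake_quant_with_min_max_args(w, min=-1.0, max=1.0, num_bits=8)

    x = tf.placeholder(tf.float32, [None, 224, 224, 3])
    y = tf.nn.conv2d(x, w_q, strides=[1, 1, 1, 1], padding='SAME')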

------
amq
What are the minimum requirements? Would something like an ARM Cortex-M4F at
72 MHz with 512 KB of RAM work?

------
kau_mad
I would like to know how small an Inception-V3 model becomes when converted
into the .tflite format.

------
cyberpunk0
Didn't they announce this at Google I/O, where it was supposed to be available
that day?

~~~
theDoug
Definitely announced at I/O, but all the language I'm finding from around that
time is of the "want to" and "will" variety, like this Wired piece:

[https://www.wired.com/2017/05/google-really-wants-put-ai-poc...](https://www.wired.com/2017/05/google-really-wants-put-ai-pocket/)

> “Google won't say much more about this new project. But it has revealed that
> TensorFlow Lite will be part of the primary TensorFlow open source project
> later this year”

------
1_over_n
How does this relate to other hardware beyond iOS / JavaScript, i.e. Raspberry
Pi, Nvidia Jetson, etc.? And what's the likelihood of libraries that sit on
top of TF, like Keras and PyTorch, supporting this? Just some questions that
spring to my mind.

------
amelius
I'm wondering if TF has something like PyTorch's autograd. Does anyone know?

~~~
igorbark
I only just briefly read the doc for autograd, but automatic differentiation
is the strong default in TF if that's what you're asking.

~~~
rajatmonga
Yes, it has had auto-differentiation from day one. There's also a new
autograd-like functional API as part of eager execution. See
[https://research.googleblog.com/2017/10/eager-execution-impe...](https://research.googleblog.com/2017/10/eager-execution-imperative-define-by.html)
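
As a minimal sketch of the classic graph-mode API (nothing Lite-specific, just
plain TF):

    import tensorflow as tf

    # Symbolic differentiation in graph mode: dy/dx = 2x + 3.
    x = tf.placeholder(tf.float32)
    y = x * x + 3.0 * x
    dy_dx = tf.gradients(y, x)[0]

    with tf.Session() as sess:
        print(sess.run(dy_dx, feed_dict={x: 2.0}))  # 7.0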

------
piratebroadcast
Any React Native APIs?

~~~
infnorm
Not at this time. However, in principle it would be possible to create such
bindings.

------
fiatjaf
So we'll start to see more and more battery-consuming "AI" apps in mobile
devices?

~~~
dgacmu
And we'll start to see more battery-efficient hardware to run those apps
without consuming all of your battery. :)

(I'm saying that glibly, but I'm dead serious -- look at what we've seen
emerge just this year in Apple's Neural Engine, the Pixel Visual Core, rumored
chips from Qualcomm, and the Movidius Myriad 2. The datacenter was the first
place to get dedicated DNN accelerators in the form of Google's TPU, but the
phones -- and even smaller devices, like the Clips camera -- are the clear
next spot. And this is why, for example, TensorFlow Lite can call into the
Android NNAPI to take advantage of local accelerators as they evolve.)

Being able to run locally, if battery life is preserved, is a huge win for
latency, privacy, and potentially bandwidth. It'll be good, though it does
need advances in both the HW and the DNN techniques (things like MobileNet,
but we need far more).

~~~
fiatjaf
Thank you.

