
XLA: The TensorFlow compiler framework - hurrycane
https://www.tensorflow.org/versions/master/resources/xla_prerelease
======
ktta
This page has the first mention of Google TPUs since the initial announcement
from Google.

Anyone know what the status is? When will TPUs be available to use in Google
Cloud?

I'm confused as to why they announced it at a big event like Google I/O
rather than in a paper or even a simple blog post if they aren't going to give
people access to them. There's some hint of it being offered in conjunction
with TF and other ML cloud offerings in the blog post [1], and this 'XLA
compiler framework' looks like it's related. But I'm still wondering how long
people will have to wait.

[1]: https://cloudplatform.googleblog.com/2016/05/Google-supercharges-machine-learning-tasks-with-custom-chip.html

~~~
visarga
> When will TPUs be available to use in Google Cloud?

TPUs are only useful for prediction, not training, and for very high volumes
of work where energy use is a major cost, so they're for Google- and
Facebook-type situations. They are not designed as general-purpose
accelerators for deep learning.

Cheap inference can be done on CPUs as well, because production models can be
optimized down to a small fraction of their original computation (20x
reductions are not impossible) by pruning neurons, quantizing weights, and
other techniques.
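
As a rough sketch of the quantization idea (my own NumPy illustration, not
Google's actual scheme): store each float32 weight as a uint8 plus a
per-tensor scale and offset, and dequantize on the fly at inference time.

    import numpy as np

    def quantize(w):
        # Linearly map float32 weights onto the uint8 range [0, 255].
        lo, hi = float(w.min()), float(w.max())
        scale = max((hi - lo) / 255.0, 1e-8)  # guard against constant arrays
        q = np.round((w - lo) / scale).astype(np.uint8)
        return q, scale, lo

    def dequantize(q, scale, lo):
        # Recover approximate float32 weights for inference.
        return q.astype(np.float32) * scale + lo

    w = np.random.randn(4, 4).astype(np.float32)
    q, scale, lo = quantize(w)
    print(np.abs(w - dequantize(q, scale, lo)).max())  # error <= ~scale/2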

~~~
halflings
What leads you to think that TPUs are not useful for training?

~~~
ogrisel
The aforementioned Google Translate paper suggests so.

My hunch is that TPUs work with a mix of 8-bit and 16-bit integer arithmetic
on quantized networks with quantized data [1].

As far as I know nobody has managed to make SGD work properly with integer
weights and that could be a reason why TPUs are not used for training (yet).
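
To make the rounding problem concrete (a toy example of mine, not a claim
about TPU internals): a typical SGD step is much smaller than one 8-bit
quantization level, so an integer-quantized weight never moves.

    # One quantization level for 8-bit weights spanning [-1, 1]:
    level = 2.0 / 255.0                # ~0.0078

    w = 0.5
    lr, grad = 0.001, 0.3
    update = lr * grad                 # 0.0003, far below one level

    w_float = w - update                         # float weights accumulate it
    w_int = round((w - update) / level) * level  # quantized weights round it away
    print(w_float, w_int, w_int == round(w / level) * level)  # ..., True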

[1] https://petewarden.com/2016/05/03/how-to-quantize-neural-networks-with-tensorflow/

I could be completely wrong though.

------
Seanny123
Neato! I'm surprised they went with a JIT compiler over a full ahead-of-time
compiler, but that might just be me not understanding: a) compilers, and b)
how a JIT compiler would apply to this situation.

My lab-mate Jan Gosmann recently did something similar for our spiking neural
network software Nengo [1]. Although it isn't deep learning, it also builds a
computational graph of operations. He ended up optimising the layout of the
operations in memory to increase the efficiency of NumPy operations and reduce
the amount of time spent in Python. He's in the process of writing a paper
about it.

[1]
[https://github.com/nengo/nengo/pull/1035](https://github.com/nengo/nengo/pull/1035)

~~~
BenoitP
From my limited understanding:

JITs can take into account the actual data being processed, and most
importantly here, its size.

Knowing the size helps with making chunks of work fit into the L1/L2/L3
caches, generating sensible SIMD operations, and choosing what goes into a
warp.

Also, it is sometimes better to rematerialize a computation than to store it,
and the threshold for doing so depends on the relative costs of memory and
compute.

Languages like Halide [1] let you hand-tune this threshold. I guess this is
the kind of work XLA does here.

[1] [http://halide-lang.org/](http://halide-lang.org/)
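
A toy sketch of the cache-blocking point (my illustration; it says nothing
about XLA's actual heuristics): once the matrix sizes are known at compile
time, the compiler can pick a tile size so each block of work fits in cache.

    import numpy as np

    def blocked_matmul(a, b, tile=64):
        # A shape-specializing compiler could choose 'tile' so that three
        # tile x tile blocks fit in cache; here it is just a constant.
        n, k = a.shape
        _, m = b.shape
        c = np.zeros((n, m), dtype=a.dtype)
        for i in range(0, n, tile):
            for j in range(0, m, tile):
                for p in range(0, k, tile):
                    c[i:i+tile, j:j+tile] += (
                        a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile])
        return c

    a = np.random.rand(256, 256).astype(np.float32)
    b = np.random.rand(256, 256).astype(np.float32)
    print(np.allclose(blocked_matmul(a, b), a @ b, rtol=1e-4))  # True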

~~~
emu
Precisely. Just-in-time compilation allows the compiler to specialize the
generated code for the shapes of the Tensors that appear at run time. This
allows us to generate better code.

XLA also has an experimental ahead-of-time mode, which we think will be
particularly interesting for some production and mobile deployments. This is
all work in progress though, and we're looking forward to getting the
community involved.
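
For anyone who wants to try it once the code is up: if I'm reading the
experimental docs right, something along these lines should turn the JIT on
globally in the Python API (it's experimental, so the knob may well move):

    import numpy as np
    import tensorflow as tf

    # Ask TensorFlow to JIT-compile eligible subgraphs through XLA.
    config = tf.ConfigProto()
    config.graph_options.optimizer_options.global_jit_level = (
        tf.OptimizerOptions.ON_1)

    x = tf.placeholder(tf.float32, shape=[128, 128])  # fixed shapes help XLA
    y = tf.matmul(x, x) + 1.0

    with tf.Session(config=config) as sess:
        print(sess.run(y, {x: np.ones((128, 128), np.float32)}).mean())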

------
mafribe
The compiler does not appear to be open source at this point. Anybody know
when this will change? Which team at Google is writing the compiler?

~~~
emu
I'm on the Tensorflow team.

Unofficial answer: no promises, but it should be open-source soon. It may even
be released in the next day or two. Watch this space!

~~~
emu
The code is up:

https://github.com/tensorflow/tensorflow/commit/1e67c90e2caceeff82d09793d1ef5fa0300d219b

https://github.com/tensorflow/tensorflow/tree/master/tensorflow/compiler

The corresponding documentation hasn't been pushed yet; I'll post a link when
it is up.

Note that XLA is a work in progress; we're releasing the code early because
we want to get the community involved. The GPU backend is in good shape, and
improving by the day. We haven't had as much time to devote to the CPU
backend, and it only has limited support for parallelism. Contributions
welcome!

~~~
emu
The documentation is now up too:
[https://www.tensorflow.org/versions/master/experimental/xla/](https://www.tensorflow.org/versions/master/experimental/xla/)

------
pilooch
It bears some similarities to NVIDIA's TensorRT, which is closed source.

