
TF-Coder, a tool that writes tricky TensorFlow expressions - homarp
https://blog.tensorflow.org/2020/08/introducing-tensorflow-coder-tool.html
======
YeGoblynQueenne
>> First, the tool asks for an input-output example of the desired tensor
transformation. Then, it runs a combinatorial search to find TensorFlow
expressions that perform that transformation.

Ouch! The combinatorial space of arbitrary programs is huge. This can't work
for programs of any but the smallest size.

Indeed:

>> There are limitations to TF-Coder. It can currently find solutions
involving 3-4 operations within a minute of searching, but solutions involving
6 or more operations are too complex to find in a reasonable amount of time.

Btw, my PhD research is pretty much on something like this, but learning Prolog
programs from examples rather than Python. The problem is extremely hard and
all approaches so far look for ways to restrict the large hypothesis search
spaces, mainly by imposing some kind of strong inductive or language bias
(i.e. restricting what can be learned and how), but there is always a point
beyond which any attempt to search for an arbitrary program will fall
into a huge search space and disappear down a myriad winding paths that lead
nowhere. So while it's interesting to see this effort (taking hints from
natural language is a nice touch), program synthesis or program learning will
not be solved by any search-based algorithm.

Edit: Actually, the bit about how solutions involving 6 or more operations
take forever to return, is eerily familiar. An algorithm I was working on at
the start of my PhD was pretty much limited to programs of 5 clauses (roughly,
5 function calls). I'm sure it's a coincidence but, golly.

~~~
im3w1l
I think the intended usage is not to find your whole program from examples.
Instead, the programmer breaks the program down into simple operations and
uses this tool to remember what the name of each operation is. And finding
short compositions is a bonus.

~~~
YeGoblynQueenne
There is certainly a lot of use that one can make of a tool with this
kind of limitation, but it is a limitation - if it weren't, there wouldn't be a
warning about it in the article.

In any case, if there were no such limitation, most people here would be out of
a job by now, so it's an important limitation.

>> And finding short compositions is a bonus.

You mean because they can be easily inspected, debugged and edited by hand
if required? If so, yes, I agree. But take, for example, the algorithm I worked
with, that I discuss above: a big criticism of the underlying technique has
always been that _in practice_ it's limited to learning small programs.

~~~
im3w1l
No, I mean finding short compositions is a bonus feature over finding a single
function from example.

~~~
YeGoblynQueenne
OK, I don't understand what you mean. Can you explain?

~~~
ghj
I can give another example use case.

Say I know math but don't know tf api. I already know how to matrix multiply
with `@` but don't know the function for matrix power.

Then in theory it should be fairly quick for me to create a random matrix and
feed it `mat @ mat`, `mat @ mat @ mat`, `mat @ mat @ mat @ mat`, etc until it
spits out the correct tf function I should call for matrix_power.

This is even better than googling since it guarantees that the function it
found is correct for your example (googling returns tf.pow tf.exp tf.expm,
none of which are actually what you want!)

~~~
YeGoblynQueenne
Thanks, I see what you two mean. It didn't occur to me that programming with
tensors could be so challenging; I thought the authors had just talked it up a
little to motivate the work - as one does :)

~~~
ghj
It's more that programming with _math_ is hard.

When communicating in mathematics you often assume the reader can understand
_precisely_ which mathematical object you're talking about by extrapolating
from ellipses "...". For example "the naturals are 1,2,3,4,..." or "the evens
are 2,4,6,8,..."

Or I might tell you a shift matrix is something that looks like:

    
    
      [0 1 0 0 0]
      [0 0 1 0 0]
      [0 0 0 1 0]
      [0 0 0 0 1]
      [0 0 0 0 0]
    

And that single example, together with the suggestive name, should be enough
for most people to understand what a shift matrix is for arbitrary sizes.

But if you want to explain what a shift matrix is to a computer, you must use
a formal definition (and to do it idiomatically you need to know esoteric
details like that numpy.diagonal takes in an offset parameter so you can do
something like
[https://stackoverflow.com/a/30230801](https://stackoverflow.com/a/30230801)).
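
To make that concrete, here is a minimal numpy sketch (not the linked answer
verbatim) of defining a shift matrix formally using the diagonal offset
parameter `k`; the names are just illustrative:

    import numpy as np

    n = 5
    # Ones on the first superdiagonal, zeros everywhere else:
    shift = np.eye(n, k=1)
    # Equivalently, built from an explicit off-diagonal:
    shift_alt = np.diag(np.ones(n - 1), k=1)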

It would be nice if we could talk math with a computer in the same intuitive
language we use with humans: just give it a few examples and it will generate
the rest and let you double-check that the formula is correct.

So I see stuff like tf-coder as a great step, not for synthesizing programs we
don't know how to solve, but for language translation: between math notation
and programming notation.

~~~
YeGoblynQueenne
I wonder whether the same effect could be achieved with a better-designed
programming language, though. If the problem is ambiguity, there's nothing
stopping a programming language from accepting ambiguous statements, which of
course it would then interpret unambiguously, according to context. It's just
that this takes a lot of work and it's really not the done thing in programming
language design as far as I can tell - also for reasons of parsing efficiency,
I think.

Anyway, I guess I'm turning into the typical stuffy academic who can't even
begin to think of real-world applications, hah. Thanks for pointing out the
practical uses of the work described in the article.

------
nl
This looks _amazing_ and, if it works as advertised, will be a huge
productivity breakthrough.

The number of times I've sat with 2 developers looking over my shoulder trying
to work out _WTF_ my TF (or Numpy or Pytorch) manipulations aren't ending up
with the correct shape is huge. It's probably $50-100K/year of time for the
last 4-5 years.

It's always obvious in retrospect... until the next time you have to read that
code.

~~~
mcabbott
> until next time you have to read that

But how will this tool help? If you put the sample you used in a comment, then
your confusion becomes wondering whether that comment is still in sync... and
whether the real input looks like what you wrote.

IMO the sweet spot that's easy both for humans & computers is index notation.
Instead of `tf.add(X, tf.expand_dims(Y, 1))` (their example, which I think
also applies to 4-tensors in ways not trivial to see from the sample)
something like "out_{r,c} = X_r + Y_c" is completely unambiguous, and quicker
to type than making a sample.

I don't think you can do that in tf.einsum, nor einops. Is there a Python
package which does this? In Julia you can write `@cast out[r,c] := X[r] +
Y[c]` (with my package, but several others share this notation).

~~~
im3w1l
I recommend

    X + Y[:, tf.newaxis]

For higher dimensions of Y, you can use (but it might be ill-advised)

    X + Y[:, tf.newaxis, ...]

~~~
mcabbott
Sure! The desire for fancier notation is from more complicated examples, the
kind where you write elaborate comments to explain why you're permuting 3rd &
4th dim of B to line up with C and part of D.

------
thereisnospork
Combinatorial function look-up by example transformation - seriously slick, in
the 'how has no one thought of this before'[0] kind of way.

[0] imo always a hallmark of something special. Or maybe someone has thought of
it and I'm just out of the loop, but either way this approach seems seriously
useful for streamlining the syntactic side of generalizing a function from a
set of known inputs/outputs.

~~~
YeGoblynQueenne
Sorry to burst your bubble, but that's basically Inductive Programming in a
nutshell :)

[https://en.wikipedia.org/wiki/Inductive_programming](https://en.wikipedia.org/wiki/Inductive_programming)

The wikipedia article doesn't make it very clear, but basically it's what you
say. The paper accompanying the system in the article goes into a bit more
detail on Program Synthesis, of which Inductive Programming is sometimes
considered a sub-field; in any case, Program Synthesis is generating programs
from complete specifications, while Inductive Programming is learning programs
from incomplete specifications - i.e. input-output examples of the target
program, often supplemented by constraints.

The field goes back to the 1970s and today comprises two sub-fields: Inductive
Functional Programming, i.e. learning programs in functional languages, and
Inductive Logic Programming, i.e. learning programs in logic programming
languages (and, incidentally, the subject of my PhD research).

In particular, the way you describe TF-Coder reminded me of Algorithmic
Debugging:

[https://en.wikipedia.org/wiki/Algorithmic_program_debugging](https://en.wikipedia.org/wiki/Algorithmic_program_debugging)

In short, this was a technique first described in 1982 by Ehud Shapiro that
assisted a programmer in finding bugs in a logic program (in Prolog) by
taking in examples of true and false statements that the program should prove
or disprove, respectively, and identifying program statements that made a true
statement unprovable, or a false statement provable. The funny thing is that,
once you can do this sort of thing - identify which parts of a program lead to
coverage of true and false statements - you can go the other way and also
generate those program statements.

In any case, it's a huge field that goes way back, and there are many pointers
to more information in the wikipedia article.

------
bquinlan
_Work for Google - have never worked on Tensorflow_

I tried:

    
    
      inputs = {
          'matrix': [[10, 20, 30],
                     [5, 5, 10],
                     [2, 2, 1]]
      }
      output = [[85/10, 85/20, 85/30],
                [85/5, 85/5, 85/10],
                [85/2, 85/2, 85/1]]
    

And got:

    
    
      tf.cast(tf.divide(tf.reduce_sum(matrix), matrix), 
              tf.float32)
    

But now I realize that is pretty close to one of the examples. Did anyone try
something complex?

~~~
ghj
I tried giving it a problem that required matrix power and it couldn't solve
it. (this was a somewhat real task where I couldn't google how to raise a
matrix to a power in tf).

    
    
        # A dict mapping input variable names to input tensors.
        inputs = {
            'matrix': [[1, 1], [1, 0]],
            'vector': [[1], [0]],
        }
    
        # The corresponding output tensor.
        output = [[13],
                  [8]]
    
        # A list of relevant scalar constants, if any.
        constants = [6]
    
        # An English description of the tensor manipulation.
        description = 'Whatever the equivalent of numpy.linalg.matrix_power is in tf. Calculates fibonacci.'
    

I was hoping it would at least return

    
    
        matrix @ matrix @ matrix @ matrix @ matrix @ matrix @ vector
    

But it just didn't find anything. (the constant for the exponent is large
because smaller numbers give nonsense results)

------
hgoury
This is neat. However, I'd rather use einsum and einops [0] to reshape stuff
in practice.

[0]
[https://github.com/arogozhnikov/einops](https://github.com/arogozhnikov/einops)
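
For readers unfamiliar with them, a rough sketch of what that looks like (the
tensors and axis names here are purely illustrative):

    import tensorflow as tf
    from einops import rearrange

    x = tf.random.normal([2, 3, 4, 5])        # batch, height, width, channels
    y = rearrange(x, 'b h w c -> b c (h w)')  # transpose + reshape in one readable line
    z = tf.einsum('bij,bjk->bik',
                  tf.random.normal([2, 3, 4]),
                  tf.random.normal([2, 4, 5]))  # batched matrix multiply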

------
blueblisters
How does the English description of the task help in running the combinatorial
search? Is it running some form of keyword search or natural language
inference? Or is it merely to tag the problem for reference/future use?

~~~
cf
Well, there is a dedicated paper on it. The short answer is that the English
description helps guide the search:

[https://arxiv.org/abs/2003.09040](https://arxiv.org/abs/2003.09040)

------
blackbear_
It would be nice to know how reliable it is as the number of operations needed
grows, since the examples shown are quite trivial.

~~~
croon
Under "Caveats":

> There are limitations to TF-Coder. It can currently find solutions involving
> 3-4 operations within a minute of searching, but solutions involving 6 or
> more operations are too complex to find in a reasonable amount of time.
> Furthermore, TF-Coder currently does not support complex or string tensors,
> or RaggedTensors. The full list of supported operations can be found in the
> Colab notebook.

------
emcq
At some point, shouldn't they focus on optimizing for loops and other
expressions rather than tensor broadcasting?

~~~
habitue
The way the python interpreter works, there's no way for tensorflow to know
you're using a for loop in order to optimize it. With method calls and
operator overloading etc., tensorflow can hold off evaluating your graph, see
what the final thing looks like, and then compile it efficiently.

You can think of tensorflow like a different language where the syntax tree is
constructed of python objects. For loops end up being like a preprocessor
macro on that syntax tree. They can add a predetermined set of extra nodes,
but the "for loop" part isn't represented in the final syntax tree itself, so
tensorflow doesn't know it has anything to optimize.
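
A minimal sketch of that macro-like expansion, building a graph by hand just so
the result can be inspected (illustrative only):

    import tensorflow as tf

    graph = tf.Graph()
    with graph.as_default():
        x = tf.constant(1.0)
        # The Python loop runs only at graph-construction time: each
        # iteration appends ops to the graph, but no loop construct
        # ends up in the graph itself.
        for _ in range(3):
            x = x + 1.0

    # Only constant/add nodes here - nothing that says "loop":
    print([op.type for op in graph.get_operations()])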

There are ways out of this: one is dynamic graph execution. But this is less
optimizable. Tensorflow doesn't get to see the whole graph so it can't do as
many optimizations, and it can't optimize for loops because it's actually
executing eagerly on each loop iteration.

Another way is this article: write whatever python code you want to generate
some transformation, and tensorflow can try to guess which efficient
operations produce it. Then you can paste the efficient operations into your
final program.

~~~
brilee
Disclaimer: I worked on AutoGraph @ google.

Your first two paragraphs describe TF 1 perfectly, and you also correctly
point out that TF 2 has eager mode, which executes op by op.

The part you missed is that @tf.function lets users tell TensorFlow that TF
should execute the entire function, inspect the graph that it produces, and
optimize it. On top of this JIT functionality that tf.function provides,
AutoGraph will inspect the source code of your program and see that there's a
for loop. If the for loop executes over a Python object, then it acts as a
preprocessor macro. If it executes over a tensor, then AutoGraph transforms it
into a tf.while_loop. So you can have both Eager-style convenience and graph-
style optimizations.
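
A rough sketch of the difference, assuming the standard tf.function/AutoGraph
behaviour described in the guide linked below:

    import tensorflow as tf

    @tf.function
    def unrolled(x):
        # Loop over a Python object (range): unrolled at tracing time
        # into a fixed number of add ops, macro-style.
        for _ in range(3):
            x = x + 1.0
        return x

    @tf.function
    def graph_loop(x, n):
        # Loop over a tensor (tf.range): AutoGraph rewrites it into a
        # tf.while_loop inside the traced graph.
        for _ in tf.range(n):
            x = x + 1.0
        return x

    print(unrolled(tf.constant(0.0)))                    # 3.0
    print(graph_loop(tf.constant(0.0), tf.constant(5)))  # 5.0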

Of course, some caveats apply; check out
[https://www.tensorflow.org/guide/function](https://www.tensorflow.org/guide/function)
for the full guide.

~~~
habitue
Ah wow, that is interesting. I didn't know about tf.function; thank you for
the correction.

