Hello, I'm the person spearheading this Theano fork! Your comments match my expe...

albertzeyer · on Dec 16, 2020

Hey! Thanks for the answer!

By graph building, I actually meant graph compilation. In TF the first `session.run`, or in Theano the `theano.function`.

I did not get too much into the internals of the graph compilation + optimization (despite writing a couple of simple own optimization passes), so I don't really know whether sth is done really inefficient, but I can easily believe that. I agree, if sth is inefficient there, it should be rewritten in a more efficient way. But I also think that even if you have it as efficient as it can be, it still would be slow, compared to a C/C++/Rust implementation, easily by a factor of 100 or so. And even in C/C++ it can still be slow, when I consider how much time LLVM or GCC takes in their optimization passes.

Yes, TensorFlow does not have much optimization, although I think the idea was always to extend that. But then, as you say, this also is one of the reasons the graph compilation is so fast. But comparing the runtime performance of Theano vs TF, in most cases, TF was just as fast or faster (which is likely dependent on the specific model; but as far as I remember, that was the general observation by the community). So because of that, I was questioning whether all that heavy graph optimization is really worth it. Numerical stability is another topic, of course. But you can also have some simple logic for that, e.g. implement your own `safe_log`, which checks if the input is `softmax(x)`, and then directly returns `log_softmax(x)`. See e.g. here: https://github.com/rwth-i6/returnn/blob/6cd6b7b3b3d3beb33140...

Btw, graph rewriting in TF is certainly also possible, and not so complicated. But it's not really optimized for that. You cannot rewrite parts of the graph inplace. You would need to create a new copy. (Although, technically, I think it would not be too complicated to allow for more graph rewriting, also inplace. But it was/is just not a high priority.)

About `Scan`: I think the main problem is the API itself. I think it is easier if the underlying op would be `WhileLoop` or so, very similar to `tf.while_loop`. Then everything becomes very natural. However, then you would need some good way to accumulate your outputs, if you actually want to have the logic of `scan`. Sth like `ys = concat(ys, [y])` inside the loop. And then it probably is necessary to have specific optimizations on that to make that efficient. Or introduce sth like `TensorArray`. But in both cases, I think this is easier than working with `Scan` as the underlying op for loops.

Btw, in the blog post, it is written that TF is focusing on dynamic graphs now. While this indeed was an important focus when TF2 was introduced, I'm not sure whether they might take a step back again. Of course this is just speculation. But I think even internally, they are seeing the problems with dynamic graphs, and many groups still use the non-eager mode with static graphs and don't have any intention to switch away from that.