
Writing RNNs in Tensorflow - jacobianjacob
http://n-s-f.github.io/2017/07/10/rnn-tensorflow.html
======
ganeshkrishnan
Are RNNs a good fit for processing non-textual time series data?

We are looking to replace our ARIMA models with RNNs, and the results so far
have been far from satisfactory.

The use case is: based on the sale quantities over the past year, predict the
sale quantity for tomorrow.

Regression does not account for weekdays, weekends, or similar bumps, and we
thought an RNN with LSTM cells would be well suited to this problem.

~~~
xgb84j
I would try to include additional indicator features for weekdays or weekends
directly in your ARIMA model.

Also, I believe that RNNs are useful mainly for highly non-linear problems.
Non-linearities in a problem such as sales forecasting are best handled
through interaction terms or by including non-linear transformations (e.g. the
logarithm) of existing features in your model.
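
Roughly what I have in mind, as a minimal sketch with made-up data and orders
(not your actual model):

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    # Hypothetical daily sales series; the point is the exogenous weekend
    # indicator and the log transform, not the numbers themselves.
    sales = pd.Series(np.random.poisson(100, 365),
                      index=pd.date_range("2016-01-01", periods=365))
    exog = pd.DataFrame(
        {"is_weekend": (sales.index.dayofweek >= 5).astype(int)},
        index=sales.index)

    # Log-transforming the target is one simple non-linear transformation.
    model = SARIMAX(np.log1p(sales), exog=exog, order=(1, 1, 1)).fit(disp=False)

    # Forecast tomorrow, telling the model whether tomorrow is a weekend day.
    forecast = np.expm1(model.forecast(1, exog=[[1]]))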

~~~
ganeshkrishnan
The problem with ARIMA is that it's more of an exponential regression than
deep learning. We have already added weekday, weekend, and month variables to
it, but it's still regression rather than deep learning.

So are you saying that an RNN won't be suited to non-linear sales forecasting?

~~~
xgb84j
I don't have the details of your exact use case, but I cannot imagine any
complex non-linearities involved in your sales process. If ARIMA models
produce decent results for your use case I would try to improve the ARIMA
model through additional data, rather than switching to deep learning.

If you are convinced that there are complex non-linearities that an ARIMA
model cannot describe, then I would try to use an RNN to find patterns in your
ARIMA model's residuals and augment the ARIMA model that way.
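
Something like this rough sketch (assumed data and orders, not your setup):

    import numpy as np
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    # Stand-in for your daily sale quantities.
    sales = np.random.poisson(100, 365).astype(float)

    linear = SARIMAX(sales, order=(1, 1, 1)).fit(disp=False)
    residuals = linear.resid

    # Train any sequence model (e.g. an LSTM) on sliding windows of
    # `residuals`; the combined forecast for tomorrow is then
    #     linear.forecast(1)[0] + rnn_prediction_of_next_residual
    # so the RNN only has to capture whatever structure ARIMA missed.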

~~~
nonbel
>"I cannot imagine any complex non-linearities involved in your sales process"

Knowing nothing about his sales process and features, I would assume there are
many complex non-linearities (which could potentially be leveraged for better
predictions). I find this statement bizarre.

~~~
xgb84j
What I mean by this statement is that there are lots of "tricks", such as
interaction terms, regime switching, and non-linear transformations of
features, for handling non-linearities in linear models (e.g. different food
sold before Christmas).

But if you can give me an example of a non-linearity in sales forecasting that
cannot be fit by a linear model but can be by an RNN, I'd honestly be really
interested in that.
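
For instance, a toy version of the Christmas effect as an interaction term
(made-up columns, just to show the trick):

    import pandas as pd

    df = pd.DataFrame({"is_weekend":    [0, 1, 1, 0],
                       "pre_christmas": [0, 0, 1, 1]})

    # The product is a new feature; a linear model fit with it can learn
    # "weekend sales jump extra in the run-up to Christmas" without an RNN.
    df["weekend_x_pre_christmas"] = df["is_weekend"] * df["pre_christmas"]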

------
greato
I read the article and it seems to be well written, though lacking.

For even more customized RNNs, such as attention mechanisms or beam search as
in Seq2Seq, you'll need to skip the tf.dynamic_rnn abstraction and use a
symbolic loop directly: tf.while_loop.

~~~
fdrdrive
I think that's covered in the article - there's a passage on using `tf.scan`
when the `tf.dynamic_rnn` abstraction won't cut it. `tf.scan` is more flexible
than `tf.dynamic_rnn`, but provides a little more scaffolding for RNNs than
using `tf.while_loop` directly.
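
As a minimal sketch of what that scaffolding looks like (TF 1.x graph mode,
with illustrative shapes and names rather than the article's code):

    import tensorflow as tf

    inputs = tf.placeholder(tf.float32, [None, None, 8])  # [batch, time, features]
    cell = tf.nn.rnn_cell.GRUCell(32)

    # tf.scan walks over the leading axis, so make the inputs time-major.
    inputs_tm = tf.transpose(inputs, [1, 0, 2])           # [time, batch, features]
    init_state = cell.zero_state(tf.shape(inputs)[0], tf.float32)

    def step(state, x_t):
        _, new_state = cell(x_t, state)
        return new_state

    # tf.scan threads the hidden state (the accumulator) through every
    # timestep and stacks the per-step states: shape [time, batch, 32].
    states = tf.scan(step, inputs_tm, initializer=init_state)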

~~~
greato
Using tf.scan is a _bad_ idea.

scan implements strict semantics, so it will always execute the same number of
timesteps no matter what the accumulator holds (even NaN).

while_loop implements dynamic execution (it quits once the cond is no longer
met) and at the same time allows parallel execution of ops that do not depend
on the accumulator.
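
A toy illustration of the early exit (nothing to do with the article's code):

    import tensorflow as tf

    max_steps = tf.constant(100)

    def cond(t, done):
        # The loop stops as soon as this returns False; tf.scan has no such
        # knob and always consumes every element of its input.
        return tf.logical_and(t < max_steps, tf.logical_not(done))

    def body(t, done):
        # In a real decoder, `done` would flip to True once an
        # end-of-sentence token is emitted; this is just a stand-in.
        done = tf.greater(t, 10)
        return [t + 1, done]

    t_final, _ = tf.while_loop(cond, body, [tf.constant(0), tf.constant(False)])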

If you read the code for `dynamic_rnn` and the contrib.legacy Seq2Seq model,
you'll find while_loop. I have yet to see TensorFlow library code using
tf.scan anywhere!

Also, internally, scan is defined using while_loop. In my own code, I find
scan lacking for RNNs and always have to fall back to while_loop.

Here is a video of a talk by the RNN/Seq2Seq author himself:

[https://youtu.be/RIR_-Xlbp7s?t=16m3s](https://youtu.be/RIR_-Xlbp7s?t=16m3s)

~~~
fdrdrive
I don't follow. tf.scan will execute as many time steps as there are elements
in the input series, which is the same behavior you'd get with tf.while_loop
or tf.dynamic_rnn. It does not execute for a fixed number of time steps, which
I think is what you're implying?

The difference from using tf.while_loop directly is that tf.scan handles the
logistics of an accumulator to keep track of hidden states, so you don't have
to implement that piece yourself.

As you say, tf.scan uses tf.while_loop internally; it's not particularly
different from something you might build using tf.while_loop yourself.

~~~
greato
In neural translation seq2seq, using while_loop in the decoder RNN saves a lot
of GPU time because it can quit early when a sentence ends.

~~~
fdrdrive
I see - you're talking about a use case like this:
[https://github.com/google/seq2seq/blob/4c3582741f846a19195ac...](https://github.com/google/seq2seq/blob/4c3582741f846a19195ac62a5867cfc90a9aa903/seq2seq/contrib/seq2seq/decoder.py#L279-L288)

I agree that you have to use a tf.while_loop in those cases. But then tf.scan
isn't an option, so I don't understand what you mean by 'quit early' or 'saves
time'.

When tf.scan is possible, i.e. when you have an input sequence you want to
scan over, it is a perfectly good option.

~~~
greato
Unless you want to execute the structure on multiple GPUs.

~~~
fdrdrive
I don't understand how that's related.

