
[1901.11390] MONet: Unsupervised Scene Decomposition and Representation - pplonski86
https://arxiv.org/abs/1901.11390
======
rsaha
Is the second equation in the paper correct? Shouldn't the mask be just the
output of the attention network?

