
GitHub Repository for Video Generation with Deep Neural Networks Released - bernhard2202
https://github.com/bernhard2202/improved-video-gan/
======
unwind
Mods: please edit the title; it has both a typo ("Repositoriy") and is quite
unclear. The repo seems to be related to a research paper: code and data to
help reproduce the results, I guess.

That said, the readme needs some editing too. This sentence made me recurse
for a while:

 _Every frame needs to be at least 64x64 pixels and contains between 16 and 32
frames._

------
amelius
I was just looking at Figure 5 in the paper, and I'm not terribly impressed.
The fixed-boxes example has boxes that overlap mostly with sky, and the system
reconstructs that the sky is blue, but all of the surroundings come out blue
as well. In the other examples, part of an airplane is masked out, but the
reconstruction looks rather vague. And worse, you can easily tell where the
original masked-out area was.
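
For context, the fixed-boxes experiment masks out the same rectangular region
in every frame and asks the model to fill it in. Something like this toy
masking step, written as illustrative PyTorch with made-up box coordinates and
tensor layout (not taken from the repo):

    import torch

    # toy video batch: (batch, channels, frames, height, width)
    video = torch.rand(4, 3, 16, 64, 64)

    # zero out the same 24x24 box in every frame
    mask = torch.ones_like(video)
    mask[:, :, :, 20:44, 20:44] = 0.0
    masked_video = video * mask  # the model only ever sees this corrupted input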

~~~
deepmonkey
Yes, but this is entirely unsupervised, and it happens with GANs that generate
the entire clip from scratch instead of preserving its spatial dimensionality.
I think the point was more to show that this is possible at all, even when the
video has a moving background.

------
alew1
Am I missing something in the paper, or is iVGAN actually just a vanilla
Wasserstein GAN? Is the contribution of the paper “we tried a GAN on video and
it worked”? That’s still a useful result, but what justifies the new iVGAN
name? Perhaps the specific network architecture they’re using?

~~~
firefred
As I understood it, the name refers to this NIPS 2016 paper
([http://carlvondrick.com/tinyvideo/](http://carlvondrick.com/tinyvideo/)), in
which video generation happens in a two-stream architecture and therefore
relies on a static background. From what I see, the approach posted here
generates videos in a single stream, so it does not rely on that assumption.
This is made possible by their architecture on the one hand, and by
optimization within the WGAN framework on the other. Dropping the
static-background assumption also allows the architecture to be used for
different applications, if you look at their homepage.
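
For what it's worth, here is a rough sketch of what I mean by "single stream":
one generator that maps a noise vector straight to a full video tensor with 3D
transposed convolutions. This is just an illustrative PyTorch toy with made-up
layer sizes, not the actual architecture from the repo:

    import torch
    import torch.nn as nn

    class SingleStreamVideoGenerator(nn.Module):
        """Toy generator: z -> (channels, frames, height, width), all pixels generated."""
        def __init__(self, z_dim=100):
            super().__init__()
            self.net = nn.Sequential(
                # project noise to a 2-frame 4x4 spatio-temporal volume
                nn.ConvTranspose3d(z_dim, 256, kernel_size=(2, 4, 4)),
                nn.BatchNorm3d(256), nn.ReLU(inplace=True),
                # each block doubles the temporal and spatial resolution
                nn.ConvTranspose3d(256, 128, 4, stride=2, padding=1),
                nn.BatchNorm3d(128), nn.ReLU(inplace=True),
                nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1),
                nn.BatchNorm3d(64), nn.ReLU(inplace=True),
                nn.ConvTranspose3d(64, 3, 4, stride=2, padding=1),
                nn.Tanh(),  # pixel values in [-1, 1]
            )

        def forward(self, z):
            # z: (batch, z_dim) -> add singleton spatio-temporal dims before upsampling
            return self.net(z.view(z.size(0), -1, 1, 1, 1))

    g = SingleStreamVideoGenerator()
    video = g(torch.randn(2, 100))  # (2, 3, 16, 32, 32): 16 frames of 32x32 RGB

The two-stream model in the linked paper instead generates a static background
image plus a foreground-and-mask stream and composites them, which is exactly
the assumption this repo drops.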

~~~
alew1
Thanks. I can see that this is a useful experiment, and I'm glad that the
authors did it. It's just not clear to me that their model is new or needs a
special name; isn't this single-stream approach just the simplest
interpretation of "using a WGAN to generate video" (just generate all the
pixel data)? If anything, the two-stream architecture you link to seems like
the special case/new idea.

~~~
firefred
From my experience, I would say it's the regular approach, yes. But since all
pixels are generated rather than copied from a static background image, it is
also much harder and probably unstable. Apparently it was not possible so far;
otherwise I cannot explain the gap in publications since the NIPS 2016 paper.
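
To be concrete about "optimization within the WGAN framework": my guess, based
on the repo name rather than a close read of the code, is that the critic is
trained with the improved WGAN (gradient penalty) objective of Gulrajani et
al., roughly like this hypothetical sketch in PyTorch:

    import torch

    def critic_loss_wgan_gp(critic, real, fake, gp_weight=10.0):
        # Wasserstein estimate: the critic should score real clips high, fake ones low
        loss = critic(fake).mean() - critic(real).mean()

        # gradient penalty on random interpolates between real and fake clips;
        # real/fake assumed to be video batches of shape (B, C, T, H, W)
        eps = torch.rand(real.size(0), 1, 1, 1, 1, device=real.device)
        interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
        grads = torch.autograd.grad(critic(interp).sum(), interp, create_graph=True)[0]
        grad_norm = grads.flatten(1).norm(2, dim=1)
        return loss + gp_weight * ((grad_norm - 1) ** 2).mean()

The gradient penalty keeps the critic roughly 1-Lipschitz, which is the usual
trick for stabilizing exactly this kind of training.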

------
almostdigital
Great work!

