
Visualizing Music with GANs - who-knows
https://twitter.com/xsteenbrugge/status/1188785427295195136
======
reikonomusha
I had trouble connecting the music to the transitions and morphs. I perceived
it just as music overlaid on fun-to-see GAN-generated images, though clearly
that’s not the aim. The music is beautiful and the imagery is intriguing,
however.

~~~
McIceT
This is something I've been working on
[https://www.youtube.com/watch?v=52qWiLoOeIQ](https://www.youtube.com/watch?v=52qWiLoOeIQ)
you should clearly see the connection between the morphs and the music

~~~
jressey
I'm blown away.

The glasses are always present when the bass/808s are hitting, so is there
something that maps the sound to the images?

What is it about the algorithms that makes the images 'dance' so quickly
between the 3.5 beat and the 1? Is it because there are static risers that
move so quickly through the wave spectrum?

Wait... is light skin mapped to when highs dominate and dark skin to lows?

~~~
McIceT
I'm glad you like it! Actually, compared to the linked post, I don't do any
manual selection of latent-space representations. It's just a bit of "smart"
signal processing. I've written a framework that makes it really simple to do
these visualizations (not open-source yet). Here's one more example:
[https://www.youtube.com/watch?v=X4r4njUjE2M](https://www.youtube.com/watch?v=X4r4njUjE2M)
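
For anyone wondering what "smart" signal processing driving a latent walk could look like, here is a minimal NumPy sketch (my own guess at the general idea, not the author's actual framework): per-frame band energies from a short-time FFT modulate how fast we interpolate between two latent vectors, so the imagery morphs faster when the track is louder.

```python
import numpy as np

def band_energies(audio, sr, frame=2048, hop=512):
    """Per-frame energy in low/mid/high bands via a short-time FFT."""
    freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
    bands = [(0, 200), (200, 2000), (2000, sr / 2)]
    out = []
    for start in range(0, len(audio) - frame, hop):
        spec = np.abs(np.fft.rfft(audio[start:start + frame] * np.hanning(frame)))
        out.append([spec[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in bands])
    return np.array(out)

def latent_walk(energies, dim=512, seed=0):
    """Step through latent space faster on louder frames."""
    rng = np.random.default_rng(seed)
    a, b = rng.standard_normal(dim), rng.standard_normal(dim)
    total = energies.sum(axis=1)
    speed = total / (total.max() + 1e-9)      # per-frame step size in 0..1
    t, latents = 0.0, []
    for s in speed:
        t = min(t + 0.05 * s, 1.0)            # loud frames advance t faster
        latents.append((1 - t) * a + t * b)   # each row would feed G(z) -> image
    return np.array(latents)

# toy signal: a 440 Hz tone that ramps from quiet to loud over one second
sr = 22050
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr) * np.linspace(0.1, 1.0, sr)
E = band_energies(x, sr)
Z = latent_walk(E)
```

A real pipeline would feed each row of `Z` to a pretrained generator and pick interpolation targets per musical section, but the beat-to-motion coupling reduces to scaling the step size by frame energy as above.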

~~~
eufouria
Cool stuff! Would be interested in knowing if and when this library becomes
open source. Where could I follow you?

~~~
McIceT
You could follow me on twitter @tsmcalister. I'll post there once it's
released. Depends a lot on how much time I have to work on it. Hopefully by
the end of the month!

------
sillysaurusx
GANs are an interesting frontier. Videos are much more engaging than photos.
The next logical step is to make a real-time GAN, like a videogame you can
walk around in.

Imagine using a Vive to explore a GAN interactively. You'd be able to control
the GAN with the Vive controllers and by walking around your room.

Right now it takes 163 ms to render a 1024x1024 frame on a K80 GPU. That's
about 6 FPS, which is within an order of magnitude of 60 FPS.

I haven't timed a 256x256 GAN, but presumably it would be 16x faster to
generate. If so, you'd be able to achieve roughly 98 FPS.
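
The arithmetic behind those numbers, as a quick sanity check (assuming render cost scales linearly with pixel count, which is only an approximation for a convolutional generator):

```python
ms_per_frame_1024 = 163.0                  # reported time for one 1024x1024 frame on a K80
fps_1024 = 1000.0 / ms_per_frame_1024      # frames per second at 1024x1024
pixel_ratio = (1024 * 1024) / (256 * 256)  # a 256x256 frame has 16x fewer pixels
fps_256 = fps_1024 * pixel_ratio           # optimistic linear-scaling estimate
print(round(fps_1024, 1), round(fps_256))  # ~6.1 and ~98
```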

The above timings are based on the 1024x1024 FFHQ GAN model, which generates
portraits of humans:
[https://github.com/pbaylies/stylegan-encoder](https://github.com/pbaylies/stylegan-encoder)

And indeed, it looks like the author uploaded an FFHQ music video 14 minutes
ago!
[https://www.youtube.com/watch?v=3TLEfOMBbMw](https://www.youtube.com/watch?v=3TLEfOMBbMw)
It looks cool.

Someone should train a 256x256 FFHQ and make a 90FPS interactive renderer for
it.

Unfortunately it's not possible to take a large GAN like 1024x1024 FFHQ and
only generate a 256x256 image. Each GAN is trained for a specific size, so
you're stuck with 6 FPS at 1024x1024. I wish the FFHQ authors had saved a
256x256 checkpoint during training.

Training a 256x256 GAN from scratch costs somewhere in the range of $150 in
GCE credits. But you might be able to bootstrap a 256x256 FFHQ using the weights
from the 1024x1024 FFHQ (aka transfer learning). That might train a lot
faster.

There is also the recent NoGAN technique, which skips progressive growing by
pretraining the generator:
[https://github.com/jantic/DeOldify/#what-is-nogan](https://github.com/jantic/DeOldify/#what-is-nogan)
Supposedly it speeds
up GAN training by a huge amount.

------
alexcnwy
This is so incredible and well executed. It's crazy to see how quickly GANs
are moving (e.g. check out this tweet [1] by Ian Goodfellow on 4.5 years of
GAN progress) ... excited to see what they can do a few years from now!

[1]
[https://twitter.com/goodfellow_ian/status/108497359623614464...](https://twitter.com/goodfellow_ian/status/1084973596236144640?lang=en)

------
RootReducer
This is a totally different way of visualizing a track - no GAN here, all done
by hand, but also very cool:
[https://www.youtube.com/watch?v=LvEGgMbTW1s](https://www.youtube.com/watch?v=LvEGgMbTW1s)

------
BasDirks
Insanely cool. Also see this work by the same author:
[https://www.youtube.com/watch?v=6du033n-J3s](https://www.youtube.com/watch?v=6du033n-J3s)

------
autokad
Is there shareable / reproducible code behind it? It's cool, but without
seeing how it was produced, it could just as well be fancy output from video
editing software.

------
manuaero
This is nice. Great choice of track too!
([https://www.youtube.com/watch?time_continue=137&v=K-AmcJhqV7...](https://www.youtube.com/watch?time_continue=137&v=K-AmcJhqV7w))

------
captn3m0
So who’s converting this to Milkdrop?

------
withparadox2
Amazing! How was this made?

~~~
FpUser
This may be GANs, as the author stated, but the end result looks surprisingly
similar to a bunch of pixel shaders transitioning between source and target
images, with the transitions driven either by pure algorithms or derived from
blurred versions of the images themselves.

I implemented a music visualizer ages ago using similar concepts (pure
algorithms though, no real images). It happened when nVidia released the first
affordable consumer video card with decent shader support; I think it was the
6600GT. The animation part that made the video dance to the music was a bit
more sophisticated, though.
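
For readers who never saw those formula-driven visualizers: the classic demoscene "plasma" effect is the kind of per-pixel math a fragment shader evaluates. A tiny CPU sketch in NumPy, as an illustration of the general idea rather than anything from the visualizer described above:

```python
import numpy as np

def plasma_frame(t, size=64):
    """One frame of a 'plasma' image computed purely from formulas.

    Each pixel's value comes from summed sine waves of position and
    time t; a pixel shader would evaluate the same math per fragment,
    with t (or the sine frequencies) driven by the music.
    """
    y, x = np.mgrid[0:size, 0:size] / size
    v = (np.sin(10 * x + t)
         + np.sin(10 * (x * np.sin(t / 2) + y * np.cos(t / 3)) + t)
         + np.sin(np.hypot(x - 0.5, y - 0.5) * 20 + t))
    return (v - v.min()) / (v.max() - v.min())  # normalize to 0..1 grayscale

frame = plasma_frame(t=1.0)
```

Rendering successive frames with `t` advanced by beat-synced amounts gives imagery that "dances" with no source bitmaps at all.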

~~~
pierrec
Regarding the music synchronization: OK, that part is ancient stuff.

However, in terms of graphics, this strikes me as different from anything that
was possible before the recent advances in GANs. During the era you're talking
about, the art of shader-based music visualizers was being pushed by projects
like Milkdrop 2, and nowadays a lot of similar research still happens on
Shadertoy, and the demoscene, of course, hasn't stopped blowing people's
minds.

But this is on another level entirely. It's as if the content and seemingly
human concepts themselves are being smoothly animated.

~~~
FpUser
_this strikes me as different from anything that was possible before the
recent advances in GANs_

\- well, that's because you didn't see my vis. It looked just like the one in
the GAN-related link, with similar transitions, except that all the imagery
was generated by math formulas running in pixel shaders instead of ready-made
bitmaps/videos.

Here is the actual screenshot:
[https://exsotron.com/exs_files/exvis-0003.jpg](https://exsotron.com/exs_files/exvis-0003.jpg)

I also played with actual music video clips as a source of imagery, and the
results were really cool, but beyond experimenting at home I couldn't really
do that part due to copyright etc.

------
carlbarrdahl
Beautiful! I enjoy how the morphs surprise me; they're hard to predict.

------
sladix
Amazing work! Congrats!

