
Autoencoding Blade Runner: reconstructing films with artificial neural networks - arto
https://medium.com/@Terrybroad/autoencoding-blade-runner-88941213abbe
======
astrange
> The model also struggles to make a recognisable reconstruction when the
> scene is very low contrast, especially with faces.

It could be getting this wrong if his error function treats the pixel values
from the image as linear light, when they're actually in the decidedly
non-linear sRGB colorspace. That would make it badly underestimate any error
in a dark image.

Quick check of the PIL docs doesn't mention gamma compensation, so they
probably forgot about it. People usually do.
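
For illustration, the standard sRGB decode step that would have to happen before computing a linear-light error looks something like this (a minimal sketch of the sRGB transfer function only; the article doesn't show the model's actual loss):

```python
# Standard sRGB EOTF: map an encoded value v in [0, 1] to linear light.
def srgb_to_linear(v):
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4

# A "dark" sRGB value corresponds to a much smaller linear intensity,
# so errors measured in one space look very different in the other.
print(srgb_to_linear(0.2))   # ~0.033
```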

~~~
svantana
Actually, it would be even worse if it were run on linear (proportional to
light intensity) pixels. The sRGB space is designed to approximate human
perception, which is actually an advantage for this application.

------
leecho0
The autoencoder converts an image to a reduced code then back to the original
image. The idea is similar to lossy compression, but it's geared specifically
for the dataset that it's trained on.

According to the defaults in the code, it uses float32 arrays of the following
sizes:

    image: 144 x 256 x 3 = 110,592
    code:  200

Note that the sequence of codes that the movie is converted to could possibly
be further compressed.
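
As a back-of-the-envelope check (both arrays are float32, so the byte sizes cancel out), the per-frame reduction implied by those defaults is:

```python
# Values per frame at the defaults quoted above.
image_floats = 144 * 256 * 3   # 110,592
code_floats = 200

ratio = image_floats / code_floats
print(ratio)   # 552.96, i.e. a ~553x reduction per frame
```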

~~~
fratlas
This should be mentioned in the article; I was looking for a simple
mathematical comparison.

~~~
terencebroad
Point taken! I have edited the article now; there has obviously been some
confusion, and it was an oversight on my part not to have explained that
properly.

------
argonaut
Correct me if I'm wrong (I haven't looked into this closely), but one glaring
problem is that all the results are from the training set. So it's not
surprising you get something movie-ish by running the network over a movie _it
was trained on_; the network has already seen what the output of the movie
should look like.

~~~
aab0
He only trained it on Blade Runner, I think. (This is doable because a single
movie has a lot of frames.) So all of the other movies should be out of
sample.

~~~
argonaut
Some of the other movies (the really bad examples) are out of sample. The last
example is also from the training set, though.

------
DennisAleynikov
This is incredible, and almost sounds like the algorithm dreamed up by Pied
Piper's middle-out. Incredible application of machine learning technology.

------
e12e
I'm a little confused by the article: it appears to me that the input to the
neural net is a series of frames, and the output is a series of frames? So it
works as a filter? Or is the input key-frames, and so the net extrapolates
intermediary frames from keyframes?

[ed: it does indeed appear from the GitHub page that the input is a series of
png frames, and the output is the same number of png frames, filtered through
the neural net. No compression, but rather a filter operation?]

~~~
raverbashing
When one talks about autoencoding, it usually means "compression".

I think it's doing something like this (but more complex):
[https://cs.stanford.edu/people/karpathy/convnetjs/demo/autoe...](https://cs.stanford.edu/people/karpathy/convnetjs/demo/autoencoder.html)
where you have a bottleneck in the network.
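
The bottleneck idea can be sketched in a few lines of NumPy (toy sizes and untrained random weights, purely to show the data flow; the real model learns its weights by minimizing reconstruction error):

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_code = 64, 8                    # toy sizes, not the article's
x = rng.random((10, n_in))              # batch of 10 flattened "frames"

# Encoder/decoder weights; training would tune these so that
# recon approximates x despite the 8-number bottleneck.
W_enc = rng.normal(scale=0.1, size=(n_in, n_code))
W_dec = rng.normal(scale=0.1, size=(n_code, n_in))

code = np.tanh(x @ W_enc)               # encode: 64 -> 8
recon = code @ W_dec                    # decode: 8 -> 64
print(code.shape, recon.shape)          # (10, 8) (10, 64)
```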

------
failrate
What I found most interesting was that A Scanner Darkly, which is rotoscoped,
looked live action in several of the coherent frames that had been filtered
through his Blade Runner-trained network.

------
stared
Correct me if I am wrong, but it is not so much "reconstruction" as
"compression". (Or I got it wrong. Or the description is utterly unclear:
reconstruct what from what?)

If it is the compression case, I am curious about the size of the compressed
movie.

------
renaudg
I'm a developer who knows nothing about AI but is fascinated by the recent
painting/music/"dreaming" applications of it.

What would be some good resources for 1. getting the bare minimum knowledge
required for using existing libraries like Tensorflow 2. going a bit further
and having at least some basic understanding of how most popular ML/AI
algorithms work ?

------
aardshark
What is the difference between using a neural network to do this and using a
filter that obtains the same or similar effect by distorting the frames of the
input randomly?

I guess I feel like there's no practical result here. It's only interesting
from an aesthetic point of view.

Am I being unfair?

~~~
joe_the_user
After reading the article, I'm still not sure what the purpose of the training
is. If they're trying to reconstruct a film from stills, it seems like a
failure, since it looks like they wind up with all sorts of swirly stuff
rather than, say, the original film.

If they're trying to create interesting swirly stuff, where do they intend to
go after that?

I mean, sure, it's aesthetic, though not on the level of weirdness of Deep
Dream modifications.

~~~
visarga
I am not an expert, but: this approach creates powerful embeddings for images.
It can convert an image into the embedding space and, vice versa, generate
images back from embeddings. It is built to function like a perception and
imagination module. The embeddings are much lower dimensional, and the latent
variables are disentangled: there is a component for "has glasses" which you
can flip to get the same image plus glasses, for example. It is obvious this
would be very useful in building all sorts of classifiers, image generators
and agents (because agents need to compute reward over the state and action
space, and disentangled representations of the state space are good for this
task).
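
If the latent variables really were disentangled like that, an attribute edit would be plain vector manipulation in code space (a toy sketch with made-up dimensions; real embeddings are rarely this clean):

```python
# Toy 4-dim embedding; pretend dimension 2 encodes "has glasses".
face_code = [0.3, -1.2, 0.0, 0.7]
GLASSES_DIM = 2                        # hypothetical attribute index

with_glasses = list(face_code)
with_glasses[GLASSES_DIM] = 1.0        # flip the attribute on
print(with_glasses)                    # [0.3, -1.2, 1.0, 0.7]
```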

~~~
argonaut
The article does not evaluate the quality of the embeddings, so there's not
much to say about all of that.

------
xt00
It's a cool thing this guy did. It would be interesting to see how small the
files are that are generated in this process. Just low-pass filtering the
video, like somebody else suggested, would probably achieve a similarly lossy
image. I guess what I take away from this is that maybe the way we store info
in our brains looks kind of like this? Kinda fuzzy versions of the real thing?

Would be interesting if somebody one of these days can actually reconstruct to
a high level of fidelity what our brain is "seeing". I bet it would look kind
of like this.

~~~
nathancahill
2011: [http://news.berkeley.edu/2011/09/22/brain-movies/](http://news.berkeley.edu/2011/09/22/brain-movies/)

There's probably newer research out there too.

------
Shivetya
On a side note, I had never heard this voiceover version before. I am only
used to the Harrison Ford voiceover. This one lets me understand why many
didn't like the VO.

Now back to the article: can someone explain how many passes it takes before
it gets to near film quality? Can it extrapolate missing frames eventually?

~~~
pvg
That particular voiceover is from the theatrical trailer, it wasn't in any of
the 125050123.07 versions of the movie. You can see it here:

[https://www.youtube.com/watch?v=4lW0F1sccqk](https://www.youtube.com/watch?v=4lW0F1sccqk)

------
alivarys
The article lacks some details (I guess many can be found in the cited
papers), but it definitely seems to be a giant step toward usable large-scale
image analysis (providing a meaningful description). Maybe this could benefit
Google's new cpu...

------
listic
What are the potential applications for an autoencoder, aside from an exercise
in neural networks?

Lossy compression with super high compression ratios?

------
bcheung
Very interesting. This makes me wonder if a similar technique can be used for
compression?

~~~
aab0
Absolutely. The point of an autoencoder is dimensionality reduction: boil a
big set of data down to a few hundred or thousand numbers in a vector which
summarizes it. You could treat it either as lossy compression and store just
the encoding, or you can treat it as a hybrid format in which the autoencoder
lossy encoding is then corrected to lossless by additional bits in the stream.

In practice, even the hyper-efficient compression algorithms used in something
like zpaq tend to use only very small shallow predictive neural networks
because no one wants to wait days for their data to be compressed or ship
around big neural nets as part of their archives, so it's more of an
information-theoretic curiosity. Few enough people will even use 'xz'.
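
The hybrid scheme described above amounts to storing the code plus a residual that corrects the lossy reconstruction back to exact (illustrative numbers, not a real codec):

```python
# Pretend these are pixel values: the original frame and what the
# autoencoder reconstructs from its stored code.
original    = [12, 200, 37, 90]
lossy_recon = [10, 198, 40, 88]

# The archive stores the code plus this residual stream.
residual = [o - r for o, r in zip(original, lossy_recon)]

# Decoding: run the decoder, then add the residual back.
restored = [r + d for r, d in zip(lossy_recon, residual)]
print(restored == original)   # True
```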

~~~
vintermann
You could, but I don't think this would be competitive on compression ratio at
all, even allowing for an order of magnitude more time.

~~~
sp332
The technique is very effective; after all, a variant of paq holds a
compression record. [http://prize.hutter1.net/](http://prize.hutter1.net/) You
can try paq yourself in PeaZip.
[http://filehippo.com/download_peazip_64/](http://filehippo.com/download_peazip_64/)

~~~
vintermann
Last I checked, PAQ only uses a shallow (two layer) neural network as a last
step to weight the predictions from the multiple handmade next-bit prediction
models it contains.

~~~
sp332
So, you think the additional accuracy of the larger neural network isn't
enough to overcome the storage size of the network itself?

