
FastPhotoStyle from Nvidia - scraft
https://github.com/NVIDIA/FastPhotoStyle/blob/master/README.md
======
dbranes
I'm probably missing something obvious here, maybe someone can explain the
following to me.

\- Their approach is a composition of 2 steps, what they call "stylization"
and "smoothing".

\- Top left of 2nd page they claim: "Both of the steps have closed-form
solutions"

\- Equation 5 is the closed-form solution for the "smoothing" step.

My question: Where's the closed-form solution for the stylization step that
they're claiming?

Are they calling equation 3 a closed-form expression? In this case the title
and the claim in the introduction are rather misleading, because computing
equation 3 requires you to train autoencoders.

~~~
saurik
You don't train it for every image; in that sense, a neural network often is a
"closed-form solution": it provides you with an equation, admittedly a very
convoluted one, which can be used to obtain its solution, admittedly usually
an approximation, in a finite amount of time. The normal solution to this
problem (according to the paper) is an iterative technique "to solve an
optimization problem of matching the Gram matrices of deep features extracted
from the content and style photos", whereas this one is simply two passes:
stylization and smoothing.
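
For reference, the Gram-matrix statistics those iterative methods match can be
sketched in a few lines (a toy numpy illustration with made-up shapes, not the
paper's code):

```python
import numpy as np

# Deep features for one image: C channels over H*W spatial positions,
# flattened into a (C, H*W) matrix. Values here are random stand-ins.
C, HW = 3, 8
F = np.random.rand(C, HW)

# The Gram matrix G[i, j] = <channel_i, channel_j> captures which feature
# channels co-activate -- the "style" statistics that the iterative methods
# optimise an image to match.
G = F @ F.T
print(G.shape)  # (3, 3)
```

The iterative approach runs gradient descent on the image itself until its
Gram matrices match the style photo's; the feed-forward approach skips that
loop entirely.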

~~~
dbranes
Not sure if I understand; doesn't every neural network produce some
approximation in finite time? In what sense is this approach "closed-form"?

~~~
nmca
Previous stylisation was slow because it needed to run SGD optimisation _for
each image to be stylised_. This uses a NN _trained once_. When you've trained
a NN it is precisely a closed-form solution, in the style of y = max(0, 3x + 4).
However, they are normally a little longer to write down :P
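
Spelled out in code, the point is that a trained network is just a fixed
function (a toy example using the comment's numbers, nothing to do with the
actual model):

```python
# Once training is done the weights (here 3 and 4) are frozen constants,
# so evaluating the "model" is a single closed-form expression per input,
# with no further optimisation.
def relu(v):
    return max(0.0, v)

def trained_net(x):
    return relu(3 * x + 4)

print(trained_net(1.0))   # 7.0
print(trained_net(-2.0))  # 0.0
```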

~~~
dbranes
Ah okay, right, this is the answer. Previous approaches [1] are deep generative
models that you have to optimize for each input, whereas here you just run a
forward evaluation on a model you've trained beforehand.

I would still argue the term closed-form is misleading here, because:

\- Even during training, at any given time you can read off a "closed-form
expression" of a neural network of this type, so closed-form in this broad
sense really doesn't mean much. Furthermore, any result of any numerical
computation is also a closed-form solution according to this, on the grounds
that it results from a computation that completed in a finite number of steps.
So really, whenever you ask a grad student to run some numerical simulation,
expect them to come back saying "Hey, I found a closed-form expression!"

\- The reason the above is absurd is that these trained NNs aren't really
_solutions_ to the optimization problem, but _approximations_. So this is
really saying: I have a problem, I don't know how to solve it, but I can
produce an infinite sequence of approximations. Now I'm going to truncate this
sequence of approximations and call it a closed-form solution.

The analogy in high-school math would be an infinite sum that doesn't
converge: just add up to some large N, and call that a closed-form solution.
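
That truncation analogy as code (a hypothetical partial sum standing in for
the "add up to some large N" step):

```python
# A partial sum is a perfectly explicit, finite expression, but it is an
# approximation rather than a closed-form value of the infinite series --
# here the harmonic series, which doesn't even converge.
def partial_harmonic(n):
    return sum(1.0 / k for k in range(1, n + 1))

print(partial_harmonic(10))    # ~2.93
print(partial_harmonic(1000))  # keeps growing with n
```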

[1] e.g.
[https://arxiv.org/pdf/1508.06576.pdf](https://arxiv.org/pdf/1508.06576.pdf)

~~~
nmca
Actually, I agree with you. Initially you seemed to object to the term "closed
form"; this now highlights the more pertinent point - these models are 100%
closed form, but 0% "solution" in the formal sense.

------
scribu
Notice that all of the examples illustrated in the paper contain similar
scenes. The content image is a building, while the style image is also a
building. Or an image of trees is styled using another image of trees.

But how well does it fare when you give it an image of a house and an image of
something completely different, like a dog or a slipper?

~~~
JackFr
What would you expect the outcome to be?

What is the correct answer to a question that's not well-formed?

~~~
semi-extrinsic
The interesting question, then, is how far off can this be and still work? Is
the limit "reasonable", or is there room for improvement of the algorithm?

E.g. I think most humans would say taking this content picture:

[https://wallpapershome.com/images/pages/pic_hs/10150.jpg](https://wallpapershome.com/images/pages/pic_hs/10150.jpg)

and styling it with this picture:

[https://c2.staticflickr.com/4/3499/3876547311_c2e32759d9_z.j...](https://c2.staticflickr.com/4/3499/3876547311_c2e32759d9_z.jpg?zz=1)

is a pretty well-posed operation. How does that look using this algorithm?

~~~
IanCal
Your first link just redirects to their homepage for me. Can you explain which
picture it was?

~~~
yorwba
It shows a red crab on a beach in front of the bright blue ocean with a blue
sky and white clouds.

I guess transfer of the wooden house amidst yellow fields with a reddening sky
might lead to a wooden crab on a yellow field in front of a reddish-yellow
ocean with red sky and clouds, or something.

~~~
yorwba
It actually looks better than expected:
[https://imgur.com/a/5BjvC](https://imgur.com/a/5BjvC)

~~~
IanCal
Looks nice!

Did you have to do anything extra to get it working? I've set things up
according to the documentation (I think), but I get dimension size errors when
running it.

~~~
yorwba
Haha, yes, I had to rewrite their code a bit. All the
.unsqueeze(1).expand_as(...) calls in photo_wct.py need to be replaced by just
.expand_as(...), and the return value of __feature_wct needs to be wrapped in
torch.autograd.Variable.

I'm going to submit a PR, but it took me a bit of experimentation to fix these
errors, so the code is a bit messier than I'd like.
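
The shape problem described is a classic broadcasting-rank mismatch; a numpy
sketch of the same failure mode (illustrative only, not the photo_wct.py
code):

```python
import numpy as np

x = np.random.rand(4, 6)                    # stand-in feature map
m = x.mean(axis=1, keepdims=True)           # per-row mean, shape (4, 1)

# Expanding (4, 1) to (4, 6) works, like calling .expand_as(x) directly:
ok = np.broadcast_to(m, x.shape)
print(ok.shape)  # (4, 6)

# Inserting an extra axis first (the old .unsqueeze(1)) yields (4, 1, 1),
# which has more dimensions than the target and cannot be expanded:
try:
    np.broadcast_to(m[:, :, None], x.shape)
except ValueError:
    print("dimension error, like the ones described above")
```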

~~~
IanCal
Ahh, that looks like the error I was hitting, thanks. I might try replacing
those bits as well, though I just upgraded pytorch from 0.1.12 to 0.3 and it
became much slower (I killed it after 5-6 minutes of setup).

~~~
yorwba
My fork is here:
[https://github.com/Yorwba/FastPhotoStyle](https://github.com/Yorwba/FastPhotoStyle)

I was using the pytorch 0.1.12 installed with conda (following their USAGE.md)
and it took ~30s total for the transfer.

~~~
IanCal
Much appreciated thanks!

For some reason it's taking me about 4-5 minutes for the transfer, but the
code now runs and the rest of the runtime is only a few seconds.

------
ttoinou
Those interested in this technology: I made two videos 18 months ago. No
optical flow, and YouTube compression kills everything, but it's still decent
if watched in 4K on a big screen :)

[https://www.youtube.com/watch?v=2YRVt80g2Ek](https://www.youtube.com/watch?v=2YRVt80g2Ek)

[https://www.youtube.com/watch?v=i69cBYI6f-w](https://www.youtube.com/watch?v=i69cBYI6f-w)

------
TD-Linux
I'm going to be that person - why a non-OSI approved license? Given that it's
CUDA-specific, I'd expect NVIDIA to want people to use it.

~~~
dingo_bat
> Licensed under the CC BY-NC-SA 4.0 license

Seems fine to me. If you want to develop something commercial you'd roll your
own anyway. Nothing else is restricted by this license.

~~~
daeken
Consider artists. There's a tremendous potential in using technology like this
in art, and preventing someone from selling their works will often put them
off of using it at all.

~~~
mfgmfg
What does the license of the product have to do with the output of the
product? You can use GIMP and GCC commercially, for example and libraries used
with GCC often have runtime exemptions for their output

~~~
daeken
Because this tool is licensed non-commercial. Using it for art that you sell
would be a commercial use, and a violation of the license.

~~~
andybak
Hmmmm. Does the licence of the tool affect the output from the tool? Photoshop
is proprietary, but Adobe doesn't have to explicitly grant me rights to the
work I create with it.

~~~
bb88
Usually no, unless say, the tool put some part of itself in the output.

The license of GCC doesn't affect the license of your binaries.

The license of python doesn't affect the license of your software.

etc.

------
SeanBoocock
I wonder whether this could be applied to a real-time scenario. Modern real-
time renderers for games often have a tone-mapping step that lets artists
color grade the final output. The paper cites an 11+ second runtime for 1K
inputs, which is orders of magnitude off what it would need to be, but perhaps
a simpler version run on the GPU is feasible.
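
For scale, the per-pixel cost of a typical tone-mapping pass is tiny compared
to a network forward pass; a minimal sketch of one well-known curve
(Reinhard's operator, not anything from this paper):

```python
# Reinhard tone mapping: maps HDR luminance [0, inf) into [0, 1).
# Real engines typically bake artist colour grades into LUTs applied at
# this stage of the frame, which is why the step is real-time friendly.
def reinhard(luminance):
    return luminance / (1.0 + luminance)

print(reinhard(0.0))  # 0.0
print(reinhard(1.0))  # 0.5
```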

~~~
josephpmay
Notice that the research was done by Nvidia

~~~
piracykills
Nvidia is pretty big in the machine learning space in general, not game-
specific these days. GPUs are pretty general-purpose, highly multithreaded
number crunchers, and Nvidia has been making moves further in this direction
with their own CUDA-based training tools, the DGX-1, the Jetson and other
products.

------
supermdguy
Paper this is based on:
[https://arxiv.org/pdf/1802.06474](https://arxiv.org/pdf/1802.06474)

It's really great that NVIDIA is releasing code for their deep learning
research.

------
milanfar
(a) This problem has long been known as color/contrast transfer, and it was
solved more than 10 years ago; (b) the results shown in this paper aren't
objectively or subjectively better/more photo-realistic than Kokaram et al.'s
far simpler work; and (c) I question whether this task even requires deep
learning at all.

[https://francois.pitie.net/colour/](https://francois.pitie.net/colour/)

------
p1necone
These very low res example images aren't particularly useful for judging how
good this actually is.

------
grondilu
Only tangentially related, but has anyone ever tried to apply style-transfer
on human faces for artificial aging or rejuvenation? Like for the movie
industry or something?

~~~
robbomacrae
FaceApp does this (including gender swapping) and it's quite fun for an hour
or so of messing around.

------
dingo_bat
The examples seem too good to be true. Unfortunately I don't have a GPU lying
around, so I can't try it.

~~~
medhir
paperspace provides pretty easy setup cloud GPUs for ~$0.40/hr if that's of
interest :)

------
andybak
The only machines I have access to with decent GPUs run Windows, and Windows
Subsystem for Linux doesn't allow GPU access. Other than dual-booting or
running Linux in VirtualBox, is there any way I can try this?

~~~
exDM69
None of the dependencies seemed Linux-specific at a quick glance. You might be
able to install all of that on Windows (not sure how pleasant an experience
it'll be).

Virtualbox won't help you, because you can't give the VM guest proper access
to the GPU unless you set up PCI-e passthrough and dedicate your whole GPU to
the guest (and use your integrated graphics for the host). Not sure if this is
even possible when Windows is the host.

If you don't feel like setting up a Linux install on your box, you could try
some of the GPU cloud services.

~~~
ATsch
Also, I am told the proprietary Nvidia drivers have a software lock that
prevents you from using GPU passthrough unless you buy certain more expensive
models.

~~~
mtreis86
There is a workaround. A number of GeForce cards have the exact same chipset
as a Quadro card, but with a resistor pulling down an external pin. That
resistor can be changed to make the card identify as a Quadro.

[http://www.eevblog.com/forum/chat/hacking-nvidia-cards-into-...](http://www.eevblog.com/forum/chat/hacking-nvidia-cards-into-their-professional-counterparts/msg207550/#msg207550)

Apparently this can also be done from software

[http://archive.techarp.com/showarticleefc1.html](http://archive.techarp.com/showarticleefc1.html)

~~~
exDM69
This is just spoofing the PCI VID:PID numbers to the driver and relying on
driver bugs(?) to function. You could do the same with a few lines of kernel
hacks, far more easily than by soldering. It does not enable any features that
are fused off in the hardware, and this setup is not reliable.

Also, these posts are from 2008 and 2013, i.e. 5 and 10 years old. These hacks
probably don't work any more.

------
JeffreyKaine
Is it really all that hard to have a demo site for these things? It would be a
lot of fun to play with crossing pictures. I'm guessing it's because using a
graphics card in the browser isn't good enough yet?

~~~
volker48
I'm not sure how fast their FastPhotoStyle approach is, but a TensorFlow
implementation of the original neural style transfer can take upwards of 20
minutes to create the final stylized image. If someone had the pre-trained
model and neural-net code in JS to read it, and you could do it all client-
side, then it would be possible, but still very slow.

~~~
ehsankia
The tech has come a long, long way since the original, even before this
FastPhotoStyle project.

A few months ago there was TensorFire [0], which was able to do it in the
browser. A quick google also gives other results [1]. There are also many apps
that can do it in seconds. Speed definitely isn't an issue anymore, but
getting it to work in the browser can be tricky.

[0] [https://tenso.rs/demos/fast-neural-style/](https://tenso.rs/demos/fast-neural-style/)

[1] [https://reiinakano.github.io/fast-style-transfer-deeplearnjs...](https://reiinakano.github.io/fast-style-transfer-deeplearnjs/)

------
krn1p4n1c
That top left style will be perfect for the family xmas photo.

------
limaoscarjuliet
Is there some research doing the same in the voice area?

\- Fix/change accent

\- Improve a person's voice

\- Perhaps even make one person sound like another

~~~
STRiDEX
I saw a clip from adobe a while ago

[https://www.youtube.com/watch?v=I3l4XLZ59iw](https://www.youtube.com/watch?v=I3l4XLZ59iw)

------
abledon
Has anyone had luck using this for their tinder profile?

~~~
skocznymroczny
Unfortunately, it only transfers style, not attractiveness

------
jczhang
I assume you need an Nvidia card for this? Also, has anyone tested it and seen
how long it takes to render?

------
ivanceras
> Preparation 1: Setup environment and install required libraries

> Python Library Dependency

> conda install pytorch torchvision cuda90 -y -c pytorch

What is conda? How do I install it on Ubuntu 16.04?

~~~
dagw
Conda is basically an alternative to pip and virtualenv, used by the Anaconda
Python distribution that's really popular in the data science and machine
learning community. The easiest way to get it is to install Miniconda:
[https://conda.io/docs/user-guide/install/linux.html](https://conda.io/docs/user-guide/install/linux.html)

------
ttoinou
Is it faster than previous implementations?

~~~
koverda
Looks like it's a lot faster. They compare their approach to the Luan et al.
approach, and for a 1024x512 image they are about 30-60x faster. They also
seem to be more accurate, with better results.

~~~
ttoinou
Oooooh no, I'm going to get back to nerding out on this 100% of my time :'(

------
dharma1
what's the max resolution with this?

------
simonhamp
What witchcraft is this?

~~~
sannee
Of course there is a relevant XKCD:
[https://xkcd.com/1838/](https://xkcd.com/1838/)

