
Deep Photo Style Transfer - mortenjorck
https://github.com/luanfujun/deep-photo-styletransfer
======
dvcrn
This is super impressive and something that I didn't think would be possible
without someone very skilled in photoshop going over the images.

As a photo enthusiast, I am very excited about this, but also a little worried
that soon very simple apps are capable of doing the craziest of edits through
the power of neural nets. Imagine the next 'deep beauty transfer', able to
copy perfect skin from a model onto everyone, making everything a little more
fake and less genuine.

The engineer in me now wants to understand how to build something like this
from scratch but I think I'm probably lacking the math skills necessary.

~~~
hnarayanan
Here is an (in progress) article I am working on that might help you:
[https://harishnarayanan.org/writing/artistic-style-
transfer/](https://harishnarayanan.org/writing/artistic-style-transfer/)

Repository of explanatory notebooks: [https://github.com/hnarayanan/artistic-
style-transfer](https://github.com/hnarayanan/artistic-style-transfer)

~~~
goldenkey
You need to make a correction: the map isn't linear with the addition of the b
vector. It's an offset linear map.

~~~
chestervonwinch
a.k.a.,
[https://en.wikipedia.org/wiki/Affine_transformation](https://en.wikipedia.org/wiki/Affine_transformation)
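To make the distinction concrete, here's a minimal NumPy sketch (with made-up A and b) showing that the offset b is exactly what breaks strict linearity:

```python
import numpy as np

# Made-up A and b for illustration.
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
b = np.array([1.0, -1.0])

def f(x):
    return A @ x + b   # affine map: linear part A plus offset b

x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# Strict linearity requires f(x + y) == f(x) + f(y); the offset breaks it:
gap = f(x + y) - (f(x) + f(y))   # equals -b, not zero
```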

~~~
goldenkey
Yes thank you for the proper name!

------
jjcm
One interesting thing that may help the final output quality is preserving the
detail layer of the original image, and then applying that to the output
image. Here's my quick attempt at it:
[http://dev.jjcm.org/tonetransfer/](http://dev.jjcm.org/tonetransfer/)

I basically just used a technique called frequency separation. It's extremely
quick, with the most computationally expensive part being a Gaussian blur, and
it lets you separate detail from tone into two separate layers. From there I
just took the detail layer of the original image and applied it to the tone
layer of the output image.
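For anyone curious, the separation step can be sketched in a few lines of NumPy (a simple box blur here stands in for the Gaussian, and the array names are made up):

```python
import numpy as np

def box_blur(img, k=15):
    # Separable box blur as a cheap stand-in for the Gaussian blur.
    kern = np.ones(k) / k
    out = img.astype(float)
    for axis in (0, 1):
        out = np.apply_along_axis(
            lambda a: np.convolve(a, kern, mode="same"), axis, out)
    return out

rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))   # stand-in photo

tone = box_blur(img)            # low-frequency "tone" layer
detail = img - tone             # high-frequency "detail" layer
# By construction, tone + detail reconstructs the original image exactly.
```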

~~~
epigramx
Their implementation appears to lose a lot of detail in many examples. This is
especially true on human faces or pictures that include human faces. Can you
reproduce an improvement there?

~~~
jjcm
I didn't see any examples of human faces, can you link me to one? Happy to try
it out and see the results.

------
vwcx
I am a professional photo editor for a major magazine. Despite what the title
sounds like, I spend much of my day sending photographers on assignment and
sourcing images rather than manipulating files. I often wonder how my career
will evolve in ten years; the writing is on the wall for my job, given the
state of the media market.

Looking at this, I now see a future as a forensic imaging specialist. At some
point, the algorithm will get pretty damn good, and it will be my job to look
for tells -- cracks in the generated image where reality doesn't quite line
up. The question will be whether I am seeking out these abnormalities to cover
them up or to call them out.

~~~
lucidrains
Soon you may need to train some powerful discriminator networks to do that
job, instead of relying on the one in your head.

------
jshmrsn
It almost gives me goosebumps to think that if someone asked me "imagine this
clear sky, but with this red sunset over here", I could very plausibly come up
with a result similar to those shown in these examples.

The transfer of the flame to the blue perfume bottle looks like a very
practical way to prototype marketing images.

~~~
aorth
Yes! I imagine being like, "Damn, I always wanted to see Venice when it was
snowing, but alas it was sunny during my short visit."

~~~
paradite
Or even better, snow in a tropical country, which can never happen in real
life.

~~~
maury91
You are wrong, it happened this year:
[http://www.italymagazine.com/sites/default/files/styles/940x...](http://www.italymagazine.com/sites/default/files/styles/940xauto/public/neve_sardegna_alghero.jpg?itok=NqGQKioM)

~~~
dualogy
Sardinia isn't a tropical locale by a long shot

------
rcthompson
This is really cool, but the only thing I can think about right now, the
question that's eating away at my soul, is: why did it color-cycle the apples?

~~~
isoprophlex
There are also semantic masks applied (they can be found in the GitHub repo);
I think that's how the color swapping is done.

~~~
rcthompson
Ah, I see. They're right here:

Input mask: [https://github.com/luanfujun/deep-photo-
styletransfer/blob/m...](https://github.com/luanfujun/deep-photo-
styletransfer/blob/master/examples/segmentation/in23.png)

Output mask: [https://github.com/luanfujun/deep-photo-
styletransfer/blob/m...](https://github.com/luanfujun/deep-photo-
styletransfer/blob/master/examples/segmentation/tar23.png)

So, presumably these masks were drawn manually. The three different colors in
the input mask tell it to train 3 different models from the masked regions,
and then the same three colors in the output mask tell it where to apply each
trained model. So it trains a model for each apple and then applies that model
to the next apple over. It's not just randomly swapping colors on the apples.
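As a toy illustration of the routing (not the paper's actual per-region models; simple per-channel mean/std color statistics stand in for the trained transforms), this sketch shows how each mask label sends a region to its own transfer:

```python
import numpy as np

def per_region_stats_transfer(content, style, content_mask, style_mask, labels):
    # Toy per-region transfer: for each mask label, shift the masked
    # content pixels toward the mean/std of the same-labeled style region.
    out = content.astype(float).copy()
    for lbl in labels:
        c = content_mask == lbl
        s = style_mask == lbl
        for ch in range(content.shape[-1]):
            c_px = out[..., ch][c]
            s_px = style[..., ch][s].astype(float)
            norm = (c_px - c_px.mean()) / (c_px.std() + 1e-8)
            out[..., ch][c] = norm * s_px.std() + s_px.mean()
    return out

# Tiny synthetic example: two regions, two labels.
content = np.zeros((4, 4, 3)); content[:2] = 10.0; content[2:] = 200.0
style = np.zeros((4, 4, 3)); style[:2] = 100.0; style[2:] = 50.0
mask = np.zeros((4, 4), dtype=int); mask[2:] = 1

out = per_region_stats_transfer(content, style, mask, mask, labels=[0, 1])
# Each region now carries the color statistics of its matching style region.
```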

------
gedy
Wow. Our kids may be able to walk around with glasses and see the world styled
in real-time as they please - perpetual sunshine, gloom, night, whatever..

~~~
drusepth
I actually really hope this will turn out to be true.

I'm unfamiliar with the processing power required to run this on a single
image, let alone a video feed. Does anyone have any back-of-the-napkin
estimates that might show just how far away AR glasses are from this?

~~~
coldtea
> _I actually really hope this will turn out to be true._

Yeah, screw reality.

~~~
drusepth
Reality is nice. Being able to augment it in various ways would also be nice,
and has a ton of applications across a ton of fields.

AR games and hollywood filters are the obvious go-tos, but I wouldn't be
surprised to see seemingly random industries like therapy see benefits (a few
applications that come to mind are brightening up the world to combat SAD,
tinting views to ease aggression, or even Black Mirror-esque "blocking" of
people who upset or anger you IRL if you want to trend towards dystopian
tropes).

Personally, I just want to up the world's saturation to max and chill with
rainbows and bright colors while I'm otherwise productive. Seems like having
fun filters would (or could) make life mildly better for many people going
about their usual day.

~~~
coldtea
> _Personally, I just want to up the world's saturation to max and chill with
> rainbows and bright colors while I'm otherwise productive_

That would mostly lead to overstimulation and desensitization. People get
bored with everything they have immediate and easy access to.

~~~
drusepth
Of course, moderation in anything is important.

As long as it were an optional thing (i.e. not forced on people like the
traditional dystopian take), I don't see how letting people choose to change
how they see the world would be a guaranteed negative outcome. Especially
since that's pretty much literally the advice in every self-help book: look at
things from a new perspective, get a new perspective on life, surround
yourself with positive things, go where the scenery makes you most happy, be
more cheerful, fake it 'til you make it, etc.

Obviously it could be abused by some (overstimulation/desensitization don't
sound so bad compared to alternatives), but I imagine overall it'd be a net
positive on the lives of more.

I guess we'll just have to wait and see!

------
npgatech
I think it would be amazing if Adobe incorporated some of these projects
(Neural Style, etc.) as part of their "Creative Cloud" offerings...actually
compute it in the cloud and return the result back to Photoshop.

~~~
Jhsto
I don't know whether you noticed, but the paper reveals it is co-authored by
people from Adobe. There very well might be intentions toward that later on.

~~~
ttoinou
That might take years

~~~
ttoinou
@dagw @coldtea True. However, the technology is not ready yet: it takes too
long to render* and needs too much VRAM (which limits resolution, depending on
the current GPU).

But a consumer-grade plugin with an online renderer might be feasible for now.

* Fast solutions include a pre-training step that can take days. It would be weird to ask the user to do this.

~~~
wyldfire
> Would be weird to ask the user to do this

Alternatives to training include downloading the trained model. It could be a
big download but an overnight download is not that weird to ask users to do.

~~~
hoschicz
> Alternatives to training include downloading the trained model. It could be
> a big download but an overnight download is not that weird to ask users to
> do.

Or keep the pretrained network on super-powerful cloud machines.

------
vwbuwvb
Notice how the lighting in the style-transferred images isn't physically
plausible given the target images. For example, in the fifth example the house
is still lit as in the original, as if by sunlight rather than by spotlights.
Maybe that's why they've chosen to showcase examples with flat or ambiguous
lighting (like the nighttime scenes or the autumn scene). The DNN doesn't
model the physical reality of the scene; it doesn't get that it's a 3D world.
It simply transports a high-dimensional vector (the 2D image) from one space
to another. What our imaginations do is map that 2D image back into 3D before
transforming it.

~~~
phkahler
And the city images that are styled into night were originally taken near
dusk, so the lights inside the buildings were already on, just not very
pronounced. It will not take a full daylight scene and decide where to turn
the lights on at night.

------
agumonkey
I feel overwhelmed by the domain; as an ex-Photoshop addict, it's already
above "complex job" level. I wonder if people feel bad about AI "stealing
their jobs" (not a joke).

~~~
intoverflow2
I feel Photoshop/Illustrator, and 2D design in general, is one of the only
industries that has been pretty much stagnant for 15 years or so due to
monopolies.

Compare it to 3D, where artists are drowning in new tools and techniques every
year; any movement in this space is a good thing.

Worth remembering there was a time when the heal brush didn't exist, a time
when the clone brush didn't exist, and even a time when the physical airbrush
didn't exist.

~~~
agumonkey
Illustration doesn't need much tech. It has more to do with human emotions and
symbols.

~~~
intoverflow2
FYI although it's called Illustrator it's used for everything from actual
illustration to iconography, to web/app design to even basic type layout.

~~~
agumonkey
I kinda know that; I've spent extreme amounts of time on everything graphical,
be it 2D, 3D, animation, synthesis, procedural generation. And 2D work is not
really tech-related; it's closer to an old art. Kinda like logos: it's hard to
express a logical relationship in a composition. A few details away and you go
from amazing to lame.

------
sixQuarks
What was up with the "apple" photo?

~~~
GordonS
All 3 looked the same to me, but I guessed it was because I'm colour blind.
It's not just that, then?

~~~
stpe
The colors are shifted in the output: red, yellow, green => yellow, green,
red.

------
rl3
Someone should create an entire film using this technique. Shoot two films in
parallel with similar scenes, then see what happens when they're blended.

Give it a creepy and/or surrealist plot so the ethereal-looking output suits
the film. Perhaps the visuals could be the result of viewing the world through
a robot or AI with imperfect cognition. Would be an interesting twist on the
old unreliable narrator trope.

------
johndough
Does anyone know how long it takes per image? The only piece of information I
could find was that the run script uses 8 GPUs, which suggests that it takes a
while.

~~~
jjcm
Probably a significantly long time. That said, I do wonder if you actually
need to run this at the resolution of the output image. Since this is really
just changing the tones in the image and not altering the details, you could
probably optimize the algorithm heavily so that it works quickly for
high-resolution images.

As an example, I used frequency separation to split the detail layer from the
tone layer in the original, high resolution stock photo of SF. From there I
took the lower res (25% size) output from this script and used it as my tone
layer. The results are OK:
[http://i.imgur.com/oakLUiE.jpg](http://i.imgur.com/oakLUiE.jpg)

It has the same overall tones, and some of the sharpness is preserved at a
high resolution. My 30 second approach suffers from some edge glow, but I'm
sure it could be greatly improved in an efficient, automated way.
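Roughly, the recombination step I mean looks like this (hypothetical names; nearest-neighbour upsampling stands in for whatever resampler you'd actually use, and the 4x factor matches the 25%-size output):

```python
import numpy as np

def upsample4x(img):
    # Nearest-neighbour upsample; the script output was 25% size.
    return img.repeat(4, axis=0).repeat(4, axis=1)

def recombine(lowres_tone, highres_detail):
    # Stylized tones from the low-res output, plus the detail layer
    # separated from the original high-res photo.
    return upsample4x(lowres_tone) + highres_detail

rng = np.random.default_rng(0)
lowres_tone = rng.random((8, 8, 3))            # stand-in low-res output
highres_detail = rng.random((32, 32, 3)) - 0.5 # stand-in detail layer
result = recombine(lowres_tone, highres_detail)
```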

------
tabeth
Wow, this is ridiculously impressive. I wonder what the audio equivalent of
this would be.

Anyone know how long this takes per image?

~~~
visarga
> I wonder what the audio equivalent of this would be.

Voice transfer. Imagine your words in the voice of any celeb.

~~~
zild3d
or timbre transfer in general. Drum recording on a cheap beginner drumset ->
high end Tama kit

------
veli_joza
Some impressive results! I would like to see some heavily stylized examples
like achieving Sin City and Scanner Darkly visuals.

Was it really necessary to involve Matlab, Python and Lua?

------
jtraffic
I wondered what was new about this particular implementation (since I've seen
several others, notably the one this code is partially based on, from Justin
Johnson). From the paper:

"Our contribution is to constrain the transformation from the input to the
output to be locally affine in colorspace, and to express this constraint as a
custom CNN layer through which we can backpropagate. We show that this
approach successfully suppresses distortion and yields satisfying
photorealistic style transfers in a broad variety of scenarios, including
transfer of the time of day, weather, season, and artistic edits"

I was skeptical at first (I even posted, then deleted, a sort-of-negative
comment), but now that I've read this, I see the value. The images _are_ much
crisper and distortion-free.
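"Locally affine in colorspace" just means each local patch's output colors are A·p + b for that patch. A tiny NumPy sketch with synthetic (made-up) values shows how such a map can be fit and verified on one patch:

```python
import numpy as np

# Synthetic 16-pixel patch with a known, hypothetical affine color map.
rng = np.random.default_rng(0)
patch_in = rng.random((16, 3))                  # 16 pixels, RGB in [0, 1]
A_true = np.array([[0.9, 0.1, 0.0],
                   [0.0, 1.1, 0.0],
                   [0.05, 0.0, 0.8]])
b_true = np.array([0.02, -0.01, 0.05])
patch_out = patch_in @ A_true.T + b_true        # locally affine output

# Recover the affine map by least squares: augment inputs with a ones column.
X = np.hstack([patch_in, np.ones((16, 1))])
coef, *_ = np.linalg.lstsq(X, patch_out, rcond=None)

# Residual is ~0 exactly when the patch transform really is affine.
residual = np.abs(X @ coef - patch_out).max()
```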

------
VMG
Premium Instagram filters as-a-service?

------
thecopy
Really cool! This could prove useful for interior design.

------
hayksaakian
now if you implemented this as a web app, i'd be sharing it with everybody

really cool samples, would be cool to upload my own photos and run it through
this

~~~
Joeboy
There's [http://neuralstyle.com/](http://neuralstyle.com/), based on
different code that does a similar thing. It's quite GPU-intensive, so it
takes a while, and I think the free options are limited.

~~~
Sunset
I can donate free time on a GeForce 1060 6GB and a 12-core Xeon, if anyone
wants to make a web frontend for the code in the linked repo.

~~~
sho12121
I'd like to spend a bit of time on this and try it out! Write me. We would
need a MATLAB license to use on the server.

~~~
lucidrains
Ugh Matlab. Someone needs to rewrite it in Julia

------
mistercow
Jeez, how many style transfer papers have there been in the last year? It's
awesome, but what an odd thing to become its own subfield.

------
pseudobry
For anyone interested in a sci-fi book about how a society filled with this
kind of technology might work, I recommend reading Rainbow's End by Vernor
Vinge.

[https://en.wikipedia.org/wiki/Rainbows_End](https://en.wikipedia.org/wiki/Rainbows_End)

------
gtani
That's funny, I was just going over an OS X makefile with somebody yesterday,
and I came here to look for the Thunderbolt external GPU adapter.

The repo author is super responsive if you're RAM-constrained:

[https://www.reddit.com/r/CUDA/comments/61gpzd/make_error_whe...](https://www.reddit.com/r/CUDA/comments/61gpzd/make_error_when_trying_to_compile_cu_file_on_osx/)

and [https://github.com/luanfujun/deep-photo-
styletransfer/issues...](https://github.com/luanfujun/deep-photo-
styletransfer/issues/5)

________________

Similar project: [https://github.com/jcjohnson/neural-style](https://github.com/jcjohnson/neural-style)

~~~
gtani
Oh, that similar project repo seems to be the inspiration/genesis of this
thread's repo.

Also, that repo's author did the Stanford lecture comparing Theano,
TensorFlow, and Torch, which is very worth digging out of the various
archives.

------
wyldfire
Wow -- Lua, Python, Matlab, and CUDA.

I see now that the image segmentation is probably the key element to getting
these stunning results. Other style transfers I've seen took abstract elements
from the donor picture, but this really captures the donor and transforms the
recipient significantly.

------
bedros
Awesome. I wish there were no Matlab requirement; can the Matlab code be
converted to Python?

~~~
johndough
Python is Turing Complete, so yes.

~~~
prashnts
While correct, I think GP is more interested in knowing whether there are
sufficient libraries available in Python that porting the MATLAB code would be
easy.

Looking at the source, almost everything in the MATLAB code can be ported,
though.
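For what it's worth, most MATLAB idioms map directly onto NumPy. A few illustrative equivalents (made-up values, not from the repo's code):

```python
import numpy as np

# MATLAB: x = A \ b  ->  np.linalg.solve (square, nonsingular A)
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)

# MATLAB: C = A .* B  ->  elementwise multiply is plain *
C = A * A

# MATLAB: v = A(:)  ->  column-major flattening needs order="F"
v = A.flatten(order="F")
```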

------
guepe
Funny, I saw a very similar attempt at the Fitchburg Art Museum (MA) over the
weekend. I seem to recall it was an MIT project: is this related in some way?
It came with a rather complex (at least at first sight) interface allowing
quite a few operations/transformations, which does not appear here, so it
might be a separate attempt.

Looks like this thing is "in the air".

------
hamilyon2
This is where neural network-generated images might start feeding into other
neural networks. Imagine a limited dataset of tagged pictures and a vast
number of styles: we could generate permutations on demand and train networks
that recognize them much more accurately than they could with the original
limited dataset.

------
boxcardavin
This is super fun and impressive, I'm going to get it going on my machine and
start playing immediately.

~~~
starik36
A write-up on how to get it going would be appreciated.

~~~
taherchhabra
+1

------
double051
Those results look amazing! I'm curious how well it holds up at higher
resolutions and closer inspection.

Also, there doesn't seem to be any mention of the distribution license for the
project. Would any of the maintainers be able to add a license to the
repository? Thanks!

~~~
oarfish
There is an issue discussing that; the problem is that the work was done at
Adobe labs, so they will determine the license.

------
Cofike
Holy cow, this is amazing. The results are way better than I would ever have
expected to be possible.

------
markab21
It seems this could have application in conceptual interior design. Think room
makeovers.

------
folli
Anyone's got a link to a good write-up on how such a style transfer works on a
high level?

------
kowdermeister
Now imagine what we could do if it were computed in real time, at 60 FPS and
100x the resolution/detail, embedded in AR or contact lenses.

Yep, sci-fi. But I often imagine that sometime soon I'll have a server farm in
my pocket (I know I already have it via the cloud, but it's a whole new game
if you can do the computation on-site, with low latency).

------
a7958479
A ball

------
drodil
So cool :)

------
draw_down
This is so cool, such an interesting and great idea. Really impressive and
well-done.

