Deep Photo Style Transfer (github.com/luanfujun)
1088 points by mortenjorck on Mar 27, 2017 | 173 comments

This is super impressive and something that I didn't think would be possible without someone very skilled in photoshop going over the images.

As a photo enthusiast, I am very excited about this, but also a little worried that soon very simple apps are capable of doing the craziest of edits through the power of neural nets. Imagine the next 'deep beauty transfer', able to copy perfect skin from a model onto everyone, making everything a little more fake and less genuine.

The engineer in me now wants to understand how to build something like this from scratch but I think I'm probably lacking the math skills necessary.

Here is an (in progress) article I am working on that might help you: https://harishnarayanan.org/writing/artistic-style-transfer/

Repository of explanatory notebooks: https://github.com/hnarayanan/artistic-style-transfer

This is an excellent blog post, thanks!

Edit: the most comprehensive, best-illustrated treatment of this topic I have seen so far.

I think you need a little more information on the Gram matrix (maybe even ditch some of the elementary "this is gradient descent" stuff and just assume the reader knows a bit more about convnets so that you can dive deeper into the style transfer specifics -- there are plenty of other sources that cover the former already).

Coming soon! This got way more popular than I expected well before it was finished.

You need to make a correction. The map isn't linear once the b vector is added; it's an offset linear (affine) map.
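For illustration (my own toy example, not from the article): with f(x) = Wx + b, the b term breaks the additivity that defines a linear map, which is exactly what makes the map affine rather than linear.

```python
import numpy as np

# Toy check that f(x) = Wx + b is affine rather than linear:
# linearity requires f(x + y) = f(x) + f(y), which fails whenever b != 0.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3))
b = rng.standard_normal(3)

def f(x):
    return W @ x + b

x, y = rng.standard_normal(3), rng.standard_normal(3)
print(np.allclose(f(x + y), f(x) + f(y)))                  # False: b is counted twice
print(np.allclose(f(x + y) - b, (f(x) - b) + (f(y) - b)))  # True: removing b restores linearity
```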

I predict cameras with some built in style transfer filters.

This specific blog post made me realize how great some of the pictures look after the edits on colors.

Yes, thank you for the proper name!

Oh sweet, thanks! I'm definitely bookmarking this

This is awesome, thanks!

> I didn't think would be possible without someone very skilled in photoshop going over the images

And this is similar to how deep learning will likely erode the need for programming (IMHO).

Deep learning won't necessarily write programs (any more than this AI manipulates images via Photoshop). Folks who say "don't worry, we can't easily write programs with AI" are missing the point: writing programs isn't necessary for there to be widespread disruption.

Most programming is essentially hooking up I/O (of which UIs are a subset) to APIs, data stores and data manipulation. The "goal" of programming is not the code, but the functionality it provides.

AI's don't need to learn to code any more than they need to learn to use photoshop. They need to learn to provide functionality (or in this case manipulate image data).

> "AI's don't need to learn to code any more than they need to learn to use photoshop. They need to learn to provide functionality (or in this case manipulate image data)."

This is interesting. My counterpoint would be that if you rely on AI over programs you lose human-editability and determinism. So fixing a bug or adding a new feature might mean diving into some opaque model rather than adding a few lines of code. You couldn't do anything where consistency is important, like security, manipulating a database with important information, or GUI design. I think that at least protects large swaths of software development.

Even this example seems less like a replacement for Photoshop and more like a cool new feature Photoshop could add

In the real world we rely on humans for lots of stuff, and humans aren't actually deterministic either. Sure, if you train someone to perform a task they'll probably do a good job, or they might suddenly come into work distracted and cause a problem. Diagnosing problems with people is often similarly hard, and we've had all of civilization to work on it.

This hasn't caused the sky to fall, yet. So, perhaps we'll just learn to make AI behave properly under most circumstances, and deal with failures and glitches as we always have with people.

_Coding_ is more about understanding what your boss/clients want and turning it into something more concrete, so it's merely an NLP problem. This will, I think, see adoption in "app/website builder" tools like Squarespace.

Then there is real programming, which IMHO will get automated in the far future.

It's an NLP problem if your boss/client wants to talk with you. At the end of the day, they don't really "want" to TALK WITH YOU; they want the functionality they get as a result of talking with you.

If there are other ways for them to efficiently get the functionality, they are good with that (as much as they might like you).

Similar to you wanting a pizza. You could call and talk with someone (which you don't really want, it was a necessary step), or you fill out the right form/app. Either way, you want the result, not the process.

Your boss/client wants the result of your work, not necessarily the work process required to get it.

It seems likely to me that modern deep learning enabled tools will make it easier for your boss/client to get the result they want directly.

Deep learning plus more graphically oriented data-flow UIs seem like they will heavily erode the need for traditional programming, as users will be able to more directly achieve the functionality they are looking for.

There are many scenarios where we still need provable code/security, e.g. health- and safety-related matters.

Not sure I would fully entrust a trained AI to control even an elevator door, where failure could result in bodily harm.

The planning that an AI could take over from existing elevator controllers already uses constrained access to the motors. Nothing about AI demands stupid system design.

No, but AIs will override all elevator scheduling code. They just need to keep tuning all the knobs until everyone gets to their floor as fast as possible.

Please don't build an elevator whose AI is instructed to get people to their floor "_as fast_ as possible"

Someone still has to program the AI for the foreseeable future.

If you're writing machine learning/intelligence, please just consider how to do so without condemning us to a dystopian future.

or if it's a dystopian one, at least make it a cool dystopian one.

So while maximising paperclip production, it should also manipulate the stock market to bring down the price of black leather pants?

no, don't, please. programming is a means to an end.

programming is fun the same way long division by hand is fun.

i can't wait for the robots to release me from the monotony.

Not everyone would agree with you :) Although I think even most of us who enjoy programming would be happy to have some form of automation as an option.

I suspect it's not going to be long until there's a rather creepy "nudity from source image" generator.

There's your killer app.

Feed it a million or so porn images of people with all kinds of different body types. Then have it guess the closest match. Finally run this. Presto change-o! It's those x-ray glasses kids everywhere wanted for years.

I could easily imagine it being done live-motion and 3-d. Run the whole thing on a set of AR glasses.

Or morph a face onto existing footage. I'm surprised this doesn't more obviously exist already, although I guess I haven't been actively looking for it.

The tech is there to do a rough approximation of a dozen combinations of this. I could imagine an intermediate step where a wire-frame mesh is constructed around the image. As I understand (And I know nothing of this stuff), there was already an app to take a picture of breasts and jiggle them.

I think it's just that nobody has put all of the pieces together yet. (Or if they have, the mainstream media hasn't heard about it)

Amazing how technology comes together, this seems like the type of thing people were dreaming about just a mere 10 years ago.

It's all fun and games until someone feeds it pictures of children.

Now you have an illegal information firehose.

I can't help but think that DiscoGAN (https://arxiv.org/abs/1703.05192 "Learning to Discover Cross-Domain Relations with Generative Adversarial Networks", Kim et al 2017) would be perfect for this. Simply feed it a ton of photos of regular clothed people and other naked people (don't have to be the same people), and it'll learn a mapping on its own. Scale that up, refine it for a few years...

Discogan is amazing and the name, lol

Cue John Travolta?

I'm okay with creepy. Let me know when this comes out if you find it first!

Looks like I can't edit this to avoid gaining more down votes. I'm not sure if I'm losing points for asking to be informed of technological advancements or that I'm okay with being creepy. Any insight is appreciated.

It might not be obvious, but there are actual people whose lives are impacted negatively by creeps who will go to any length to get their fix.

Your comment would have been fine without the "I'm okay with creepy" part.

> able to copy perfect skin from a model onto everyone, making everything a little more fake and less genuine

Everybody will do it for a couple of years, then it will get old and people will have to think of something original. Just like autotune.

Only we still have autotune everywhere.

And 20+ years on, all cover shots etc are still photoshopped.

Yes, but autotune is a style choice for artists. It's not something every musician uses. It hasn't taken over music, and it's not like anyone can become a musician because autotune exists.

Disclaimer: I'm not a music producer, and my ear isn't that great, but I did write software for one of the major pitch-correction vendors for many years, and I think I have a better-than-average ear for it after listening to it for many years.

Pitch correction is something which is used on many, many professionally-produced tracks, and often without the knowledge or consent of the performers. Whether you can hear it or not is a stylistic choice (provided adequate skill from the production team: see [1]). But just because the pitch correction isn't in your face, T-Pain- or Cher-style, doesn't mean it isn't there. The software is better than that, and in the right hands, it just makes people sound more skilled than they are, and you can't hear it.

Producers generally are pretty quiet about where they use it to mask blemishes in the performance, probably because they don't want to embarrass anyone. But the producers we sold to would certainly say how much they used it, without naming artists or tracks.

[1] http://productionadvice.co.uk/aretha-autotune/

I believe I hear pitch correction whenever it's used. Do you have an example where it is used and I would struggle to hear it?

I was involved in the recording and production for a top 40 producer, and can confirm that there was autotune on every single vocal track that left the studio.

Here are a few that I was in the room when the artist was recording, and can confirm pitch correction:




It would've been much better if you posted 6 links, 3 with and 3 without autotune. See if people can figure out which is which.

The first one has that metallic sound that is a dead giveaway. The first falsetto is quieter than the second one, and you can really hear the metallic quality kick in as he increases the loudness of his voice: https://youtu.be/450p7goxZqg?t=1m27s

Second one has a "Cher moment" almost straight away, just after "wandering the desert a thousand days" the following "mmmm" has a glissando between two notes where we clearly hear the hard edge on what I assume is an auto tune lookahead. I don't actually know how they work, I just assume there's a lookahead for the next note approximation which makes glissandos sound funny. https://youtu.be/M8uPvX2te0I?t=31s

The last one I can't really fault for too much autotune, more a lack of it. The bridge is especially intense https://youtu.be/E0oyglKjbFQ?t=1m51s

Say the singer loses the pitch slightly for half a second on a held note. If that fluctuation is corrected, what auditory information could be left for you to detect the modification?

I believe I hear pitch correction when it's obviously used, and it's a lot. Pretty much most of the "top forty" pop pablum from the last 20 years. I believe there is pitch correction that I don't spot: the "dark matter" of pitch correction that is done less cheesily.

The worst of it sounds almost like packet loss concealment in a G.722 voice stream: the sustained part of a vocal note basically sounding synthesized.

>Yes but autotune is a style choice for artists. It's a not something every musician uses. It hasn't taken over music, and it's not like anyone can become a musician because autotune exists.

You'd be surprised on both fronts :-)

On the first, because autotune is prevalent regardless of genre and style choices (even in rock, country, etc.). It's just the Cher/T-Pain effect that has been toned down, but autotune is very much in use in the industry for vocal correction.

On the second, because almost any crap-singer pop idol with nice looks can pretend to be in tune and turn out bearable results because of autotune.

And it's awesome. Music shouldn't be singing olympics; self-expression and original ideas are worth more than technical skill.

We disagree on that, and that's OK. To me there is something special about a live performance, even more so when it's challenging for the performer. When a singer demonstrates range, or a musician displays technical excellence or provides emotional depth through expression, it adds a LOT in my opinion. Knowing this is all faked in recorded music takes something away from it.

Ditto for photography. To take an image and retouch it, or to artificially saturate colors can make a great picture. But with a raw photo it's even more interesting to think that scene actually existed and someone captured it for us to look at.

In either case, I can enjoy the work but will only be impressed if I know that it's authentic. This is more true than ever today.

Holography, you can't currently fake that.

But then again, where do you draw this line for what is authentic and isn't?

In music are you allowed to use amplifiers and speakers? They can add a lot of color and distortion. How about reverb? Rooms that aren't there. EQ to remove unwanted frequencies? Synthesizers? Digital effects? At what point is it not authentic anymore?

Same for photography. Are you allowed to touch the aperture? ISO? Shutter speed? Flash? Digital camera? At what point is it not authentic?

The thing is, the subject of a photo, the scene and its subjects, are usually not the artists. The photographer is the artist. Photography is processing from the get-go: how the film or CCD responds to light and so on. The grain from low-light on sensitive film can be part of the art and so on.

If you mess with the colors of a scene, you're not taking away artistic control from that scene.

You also don't put limitations on the post processing art; you're not doing it to fool anyone.

There is post processing in music that is obvious art in an analogous way, like taking sampled sounds and re-mixing them to create new stuff. There are effects that are obviously effects. I'm not going to scoff at a great studio reverb, or some echo applied to a vocal or whatever. Nobody is saying that this was recorded in some fjord in Sweden with real echo bouncing off a distant ice wall; there is no lie.

In that case, we are just transferring artistic control from one human to another. In the past, recording audio involved fairly little artistic control, and the subjects of the recording had most of the control. Now, with better audio manipulation software, the person doing the recording has artistic tools at their disposal. They are the 'photographers' of the scene, while the singer is the subject.

That's right, and the subject might as well be a dog, or any other audio signal source, just like anything that reflects light can be a suitable one for the photographer's creative process. Cute puppies; water lilies; sunsets ...

The thing is, I somehow don't hear the studio's creative input either when I hear the latest auto-tuned Fido or Bowser. They're just applying some automatic something that's supposed to make the dog sound like a more able dog.

This is like when people just batch-apply the same color enhancement and sharpening to their Florida vacation pictures. I've seen one instance, I've seen them all.

This is where auto-tune actually can fall down. Singing is not all about hitting the exact equal-temperament pitch all the time.

Take the fantastic singer with great technical skill. Most pitch correction algorithms, as far as I know, are strictly based on equal temperament / 12TET. Fantastic singers are capable of hitting the right harmonics, some of which are not 12TET. Fantastic singers slide into notes, they use vibrato, they add "blue notes" (https://en.wikipedia.org/wiki/Blue_note). If you over-apply pitch correction, in other words, you could easily make a fantastic singer sound worse.

Let's also take the singer who is technically a bit pitch deficient, but has "character" that makes up for it. You don't want to make this type of singer too in-tune, either. Too much tuning might remove the "character".

I understand that in the industry there are some engineers who are good enough to selectively apply auto-tune, to fix only obvious issues and avoid the pitfalls. There are also some productions that just apply auto-tune to everything with no consideration of the content. The latter will probably work for glossy pop productions, but if I were a really good singer (or a singer with "character") I probably wouldn't like the results.
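As a concrete sketch of the 12-TET snapping discussed above (a naive toy assuming A4 = 440 Hz, not any real product's algorithm):

```python
import math

A4 = 440.0  # reference pitch assumed for this sketch

def snap_to_12tet(freq_hz):
    """Quantize a frequency to the nearest equal-temperament semitone."""
    semitones = 12 * math.log2(freq_hz / A4)    # signed distance from A4
    return A4 * 2 ** (round(semitones) / 12)    # snap and convert back to Hz

# A note sung a quarter-tone sharp of E5 gets pulled down to exactly E5,
# erasing the expressive detuning (e.g. a "blue note" or a slide).
sharp_of_e5 = A4 * 2 ** (7.25 / 12)             # 7.25 semitones above A4
e5 = A4 * 2 ** (7 / 12)                         # ~659.26 Hz
print(abs(snap_to_12tet(sharp_of_e5) - e5) < 1e-9)  # True
```

A real corrector works on short windows of a detected pitch track and applies the shift gradually, but the quantization target is the same 12-TET grid.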

That's not true; Melodyne is capable of arbitrary tunings, and even has a feature where you can create custom scales.

Thanks for the information; that's one product I'm not too familiar with. I've demoed the Antares product and a couple of freebies. (It seems like there are a couple of newer plugins out there as well, e.g. Synchro Arts Revoice Pro.)

The problem is, I'm not sure that even a pure alternate tuning can work for all examples. E.g., for blue notes, what is "correct" varies with performer and style.

Now, I'm more talking about the "automatic modes"; I understand Melodyne offers a pretty impressive level of editing control (Antares did too from what I remember). So it would certainly be possible to get a really great take, and then hand-correct any truly off notes to whatever frequency you wanted.

As in many things (see: Photoshop and model photos), a lot of the reaction to the tool is less on how it could be used, and more on how it is being used in glossy "crap singer pop idol" productions.

>Music shouldn't be singing olympics; self-expression and original ideas are worth more than technical skill.

Only it's about neither, but more about hot bodies and rich marketing campaigns.

Depends on what kind of music you're listening to

I'm sure a lot of the music you hear is using autotune without you realizing it. It's not some fad.

I'm sure all of it is.

One interesting thing that may help the final output quality is preserving the detail layer of the original image, and then applying that to the output image. Here's my quick attempt at it: http://dev.jjcm.org/tonetransfer/

I basically just used a technique called frequency separation - it's extremely quick with the most computational part being a gaussian blur, and it allows you to separate detail from tone into two separate layers. From there I just took the detail layer of the original image and applied it to the tone layer of the output image.
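A minimal sketch of that frequency-separation step (my own version using a Gaussian blur via SciPy, not the code behind the linked demo):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def split(img, sigma=8.0):
    """Split an HxWx3 float image into tone (low-freq) and detail (high-freq)."""
    tone = gaussian_filter(img, sigma=(sigma, sigma, 0))  # blur spatial dims only
    return tone, img - tone

def transfer_detail(original, stylized, sigma=8.0):
    """Keep the stylized image's tones but restore the original's detail."""
    _, detail = split(original.astype(np.float64), sigma)
    tone, _ = split(stylized.astype(np.float64), sigma)
    return np.clip(tone + detail, 0, 255)

# The split is lossless by construction: tone + detail reproduces the input.
img = np.random.default_rng(0).uniform(0, 255, (64, 64, 3))
tone, detail = split(img)
print(np.allclose(tone + detail, img))  # True
```

The sigma controls where "tone" ends and "detail" begins; a retoucher would pick it per image, just as with a frequency-separation blur radius in Photoshop.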

Their implementation appears to lose a lot of detail in many examples. This is especially true on human faces or pictures that include human faces. Can you reproduce an improvement there?

I didn't see any examples of human faces, can you link me to one? Happy to try it out and see the results.

That's pretty good, a nice improvement. Frequency separation is one of the most common retouching techniques, but works wonders in this case as well.

I really like decomposing an image using wavelets (more than 2 layers; in general I use 7). You can make some good corrections to skin imperfections in the middle layers, keeping it natural (high frequencies untouched).

I'm thinking about how it would be if I could use your method (or the OP's) on the lower layers, which mainly control colors...

I am a professional photo editor for a major magazine. Despite what the title sounds like, I spend much of my day sending photographers on assignment and sourcing images rather than manipulating files. I often wonder how my career will evolve in ten years; the writing is on the wall for my job, given the state of the media market.

Looking at this, I now see a future as a forensic imaging specialist. At some point, the algorithm will get pretty damn good, and it will be my job to look for tells -- cracks in the generated image where reality doesn't quite line up. The question will be whether I am seeking out these abnormalities to cover them up or to call them out.

You may soon need to train some powerful discriminator networks to do that job instead of relying on the one in your head.

"Forensic imaging specialists" already exist in modern GAN setups and are called discriminators. They estimate how likely it is that an image would have come from a human, and then provide feedback to the generator network on what a human would do differently.

Very interesting. I work in AI. Curious about trends in your industry. Please ping me at the email address in my profile.

It almost gives me goosebumps to think that if someone asked me "imagine this clear sky, but with this red sunset over here," I could very plausibly come up with a result similar to those shown in these examples.

The transfer of the flame to the blue perfume bottle looks like a very practical way to prototype marketing images.

Yes! I imagine being like, "Damn, I always wanted to see Venice when it was snowing, but alas it was sunny during my short visit."

Unfortunately I don't think this will work with snow. It changes the colors of the objects and the snow too will be changed. That being said, being able to change the sky color and building colors looks so amazing.

Or even better, snow in a tropical country, which can never happen in real life.

Sardinia isn't a tropical locale by a long shot

Italy isn't tropical.

This is really cool, but the only thing I can think right now, the question that's eating away at my soul, is: why did it color-cycle the apples?

There are also semantic masks applied (they can be found in the GitHub repo); I think that's how the color swapping is done.

Ah, I see. They're right here:

Input mask: https://github.com/luanfujun/deep-photo-styletransfer/blob/m... Output mask: https://github.com/luanfujun/deep-photo-styletransfer/blob/m...

So, presumably these masks were drawn manually. The three different colors in the input mask tell it to train 3 different models from the masked regions, and then the same three colors in the output mask tell it where to apply each trained model. So it trains a model for each apple and then applies that model to the next apple over. It's not just randomly swapping colors on the apples.
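A toy numpy sketch of that per-region idea (my own illustration: a simple mean-color shift per mask label stands in for the trained per-region models):

```python
import numpy as np

def masked_mean_transfer(content, style, content_mask, style_mask):
    """For each mask label, shift the content region's mean color
    to the matching style region's mean color."""
    out = content.astype(np.float64).copy()
    for label in np.unique(content_mask):
        c_sel = content_mask == label
        s_sel = style_mask == label
        shift = style[s_sel].mean(axis=0) - content[c_sel].mean(axis=0)
        out[c_sel] += shift
    return np.clip(out, 0, 255)

# Two regions: label 0 is pushed toward red, label 1 toward green.
content = np.full((4, 4, 3), 100.0)
style = np.zeros((4, 4, 3))
style[:, :2] = [200.0, 0.0, 0.0]   # style pixels under label 0
style[:, 2:] = [0.0, 200.0, 0.0]   # style pixels under label 1
mask = np.zeros((4, 4), dtype=int)
mask[:, 2:] = 1
result = masked_mean_transfer(content, style, mask, mask)
print(result[0, 0], result[0, 3])  # left half pushed red, right half green
```

Swapping which label maps to which style region is what produces the apple color rotation: each apple's "model" is applied to a different apple's mask.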

I'm afraid the answer, like the answers to so many other soul-eating questions, is that it's random.

Wow. Our kids may be able to walk around with glasses and see the world styled in real-time as they please - perpetual sunshine, gloom, night, whatever..

Rose tinted glasses to shelter them from the legacy of a devastated planet. How about the Black Mirror episode Men Against Fire for a real dystopian vision of this future.

The excellent short film MORE by Mark Osborne is about similar reality-editing glasses: https://www.youtube.com/watch?v=cCeeTfsm8bk

I actually really hope this will turn out to be true.

I'm unfamiliar with the processing power required to run this on a single image, let alone a video feed. Does anyone have any back of the napkin estimations that might show just how far away AR glasses are from this?

It's effectively doable now. Facebook has already shipped a realtime smartphone style transfer app using Caffe2go (no word on how much it hammers battery life, though). That was for a fixed style transfer network, but a variable style transfer only needs to tack on an additional smaller network to process the style image as well as the input, so <2x as much computation/model size.

Yes. Go look at the thread starting here: https://news.ycombinator.com/item?id=13949385

and a figure of 20+ years comes up, assuming that you want it to be truly indistinguishable from reality.

It could be sooner with major algorithmic improvements, and the display technology itself could be 5-10 years out.

Thanks for the link. This makes me excited for the future.

>I actually really hope this will turn out to be true.

Yeah, screw reality.

Reality is nice. Being able to augment it in various ways would also be nice, and has a ton of applications across a ton of fields.

AR games and hollywood filters are the obvious go-tos, but I wouldn't be surprised to see seemingly random industries like therapy see benefits (a few applications that come to mind are brightening up the world to combat SAD, tinting views to ease aggression, or even Black Mirror-esque "blocking" of people who upset or anger you IRL if you want to trend towards dystopian tropes).

Personally, I just want to up the world's saturation to max and chill with rainbows and bright colors while I'm otherwise productive. Seems like having fun filters would (or could) make life mildly better for many people going about their usual day.

>Personally, I just want to up the world's saturation to max and chill with rainbows and bright colors while I'm otherwise productive

That would mostly lead to overstimulation and desensitization. People get bored with everything they have immediate and easy access to.

Of course, doing anything in moderation is obviously important.

As long as it were an optional thing (i.e. not forced on people like the traditional dystopian take), I don't see how letting people choose to change how they see the world would be a guaranteed negative outcome. Especially since that's pretty much literally the advice in every self-help book: look at things from a new perspective, get a new perspective on life, surround yourself with positive things, go where the scenery makes you most happy, be more cheerful, fake it 'til you make it, etc.

Obviously it could be abused by some (overstimulation/desensitization don't sound so bad compared to alternatives), but I imagine overall it'd be a net positive on the lives of more.

I guess we'll just have to wait and see!

We can't see reality, we can only see a very limited representation of a small slice of reality. Why is that particular representation any better than others?

Found the philosopher lol

Oh, no, it's a physical fact, that I'm reminded of every time I put on my glasses; in a very real way, my vision is already being mediated by technology.

in some way, we are just piecing together representations of each other from the photons that fly into our eyes. so maybe 'reality' is only as good as these photons that we receive anyhow.

If the computing power doesn't fit in a headgear, you could just put a massive computer (e.g. iPhone size) in your pocket and have it transmit the images using WiFi/Bluetooth to your glasses.

I think it would be amazing if Adobe incorporated some of these projects (Neural Style, etc.) as part of their "Creative Cloud" offerings...actually compute it in the cloud and return the result back to Photoshop.

I don't know whether you noticed it, but the paper reveals it is co-authored by people from Adobe. There very well might be intentions towards that later on.

That might take years

Adobe moves pretty fast when they want to. Take content aware fill for example. It went from a presentation at ACM SIGGRAPH to Adobe hiring some of the people who wrote the paper to a working feature in Photoshop in about a year.

@dagw True. @coldtea True. However, the technology is not ready yet. It takes too long to render* and needs too much VRAM (=> limited resolution, varying with the current GPU).

But a consumer-grade plugin with an online renderer might be feasible for now.

* Fast solutions include a pre-training step that can take days. It would be weird to ask the user to do this.

> Would be weird to ask the user to do this

Alternatives to training include downloading the trained model. It could be a big download but an overnight download is not that weird to ask users to do.

> Alternatives to training include downloading the trained model. It could be a big download but an overnight download is not that weird to ask users to do.

Or keep the pretrained network on superpowerful cloud machines.

Adobe has incorporated research stuff they've shown to the public as mere papers/demos into PS in just 1-2 versions several times.

That would be great. I was actually thinking of coding a plugin like that

code one for darktable or gimp.

Notice how the lighting in the style-transferred images isn't physically plausible given the target images. For example, in the 5th example the house is still lit as in the original, as if by sunlight rather than by spotlights. Maybe that's why they've chosen to showcase examples with flat or ambiguous lighting (like the nighttime scenes or the autumn scene). The DNN doesn't model the physical reality of the scene; it doesn't get that it's a 3D world. It simply transports a high-dimensional vector (the 2D image) from one space to another. What our imaginations do is map that 2D image back into 3D before transforming it.

And the city images that are styled into night were originally taken near dusk so the lights inside the buildings were already on but not very pronounced. It will not take a full daylight scene and decide where to turn the lights on at night.

I feel overwhelmed by the domain; as an ex photoshop addict, it's already above "complex job" level. I wonder if people feel bad about AI "stealing their hobs" (not a joke).

I feel Photoshop/Illustrator, and 2D design in general, is one of the only industries that has been pretty much stagnant for 15 years or so due to monopolies.

Compare it to 3D where artists are drowning in new tools and techniques every year, any movement in this space is a good thing.

Worth remembering there was a time when the heal brush didn't exist, a time when the clone brush didn't exist, and even a time when the physical airbrush didn't exist.

Illustration doesn't need much tech. It has more to do with human emotions and symbols.

FYI although it's called Illustrator it's used for everything from actual illustration to iconography, to web/app design to even basic type layout.

I kinda know that, spent extreme amounts of time on everything graphical be it 2D, 3D, animated, synthesis, procedural generation. And 2D work is not really tech related; it's closer to an old art. Kinda like logos, it's hard to express a logical relationship in a composition. A few details away you go from amazing to lame.

If they feel this way, they could try to learn more about this new world. It is fun and opens up so many possibilities!

Here is an (in progress) explanation I am working on aimed at beginners that might help: https://harishnarayanan.org/writing/artistic-style-transfer/

Nice work! Definitely bookmarking.

the paper is co-authored by adobe folks

What was up with the "apple" photo?

The paper states that it shows swapping of texture between apples by using a manual segmentation input. A manual mask is also used to transfer the fireball texture to the perfume bottle.

Presumably it is a demonstration of the network extracting the texture of an image while ignoring "spatiality", such that when a style/texture transfer is applied to one and the same image, the texture may very well end up arranged differently than in the original image.

More specifically, ignoring spatiality is the effect of matching the Gram matrix (the uncentered covariance) of the feature vectors, which was the key insight in the original style transfer paper by Gatys.
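For anyone curious, the Gram matrix bit is only a few lines. A hedged NumPy sketch (in practice the feature maps would come from a convnet layer, and you'd compare Gram matrices of the style and output images in the loss):

```python
import numpy as np

def gram_matrix(features):
    """features: (C, H, W) activations from one convnet layer.
    Flattening the spatial dims and taking F @ F.T keeps only which
    features co-occur, not where -- which is why the style loss
    ignores spatial arrangement."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (h * w)
```

You can sanity-check the "ignores spatiality" claim directly: shuffling the spatial positions of a feature map leaves its Gram matrix unchanged.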

All 3 looked the same to me, but I guessed it was because I'm colour blind. It's not just that then?

The colors are shifted in the output. red, yellow, green => yellow, green, red.

As in: the input and reference have the same color order, which makes it weird that the output has a different one.

I think the input and the style are the same photo, which is odd. But if you look at the output; you can see that even though the colors have switched, the patterns and shapes have stayed the same. The middle apple has dots on it, but it is now green. The right apple is kinda spotty, but it is now red.

If you look at the URLs, most of them have the input in the 'examples/input/' dir, and the reference in the 'examples/style/' dir. The apple reference is in the 'examples/input/' dir. Perhaps that really does mean that the input and style were identical, and they deliberately re-used the same image URL, but to me it feels like a copy-and-paste error :)

The "style" image (tar23.png) is a symlink to the input image[0], so I guess it's intentional. I'm still not sure why that ended up swapping the colors in the output, though.

[0] https://github.com/luanfujun/deep-photo-styletransfer/tree/m...

Someone should create an entire film using this technique. Shoot two films in parallel with similar scenes, then see what happens when they're blended.

Give it a creepy and/or surrealist plot so the ethereal-looking output suits the film. Perhaps the visuals could be the result of viewing the world through a robot or AI with imperfect cognition. Would be an interesting twist on the old unreliable narrator trope.

Does anyone know how long it takes per image? The only piece of information I could find was that the run script uses 8 GPUs, which suggests that it takes a while.

Probably quite a long time. That said, I wonder if you actually need to run this at the resolution of the output image. Since this really just changes the tones in the image without altering the details, you could probably optimize the algorithm heavily so that it works on high-resolution images quickly.

As an example, I used frequency separation to split the detail layer from the tone layer in the original, high resolution stock photo of SF. From there I took the lower res (25% size) output from this script and used it as my tone layer. The results are OK: http://i.imgur.com/oakLUiE.jpg

It has the same overall tones, and some of the sharpness is preserved at a high resolution. My 30 second approach suffers from some edge glow, but I'm sure it could be greatly improved in an efficient, automated way.
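A rough sketch of that frequency-separation trick in NumPy (a crude box blur stands in for whatever low-pass filter you'd actually use; `recombine` assumes the low-res stylized tone layer has already been upscaled to match the detail layer):

```python
import numpy as np

def box_blur(img, k=15):
    """Crude low-pass filter: average over a k x k window with edge
    padding. A Gaussian blur would be the usual choice."""
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape[:2]
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def frequency_separate(img, k=15):
    """Split an image into a smooth tone layer and a residual detail layer."""
    tones = box_blur(img, k)
    detail = img - tones
    return tones, detail

def recombine(stylized_tones_upscaled, detail):
    """Paste the high-res detail back on top of the stylized tones."""
    return np.clip(stylized_tones_upscaled + detail, 0.0, 1.0)
```

By construction, `tones + detail` reconstructs the original exactly; the edge glow I mentioned comes from the blur bleeding tones across hard edges, which an edge-aware filter would reduce.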

With the linked implementation (there is at least one fork rewriting the Matlab bits in Python), the three steps needed per image probably take over an hour on a single K80. You'll also probably need the -backend cudnn and -cudnn_autotune flags to avoid running out of memory. Downscaling the images from the current 700px to, say, 500px speeds things up significantly as well. Still, it definitely takes longer than neural-style.

Wow, this is ridiculously impressive. I wonder what the audio equivalent of this would be.

Anyone know how long this takes per image?

> I wonder what the audio equivalent of this would be.

Voice transfer. Imagine your words in the voice of any celeb.

or timbre transfer in general. Drum recording on a cheap beginner drumset -> high end Tama kit

Same rhythm with a different instrument? The problem is that audio is (for now) very dense for this kind of approach.

Some impressive results! I would like to see some heavily stylized examples like achieving Sin City and Scanner Darkly visuals.

Was it really necessary to involve Matlab, Python and Lua?

I wondered what was new about this particular implementation (since I've seen several others, notably the one this code is partially based on, from Justin Johnson). From the paper:

"Our contribution is to constrain the transformation from the input to the output to be locally affine in colorspace, and to express this constraint as a custom CNN layer through which we can backpropagate. We show that this approach successfully suppresses distortion and yields satisfying photorealistic style transfers in a broad variety of scenarios, including transfer of the time of day, weather, season, and artistic edits"

I was skeptical at first (even posted then deleted a sort of negative comment), but now that I read this, I see the value. The images are much more crisp and distortion free.
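To make the "locally affine in colorspace" constraint concrete: within each small patch, the output colors should be an affine function of the input colors. A hedged toy version of that check (the paper's actual photorealism term is built from the matting Laplacian over tiny windows, not an explicit least-squares fit like this):

```python
import numpy as np

def affine_residual(inp, out):
    """inp, out: (n_pixels, 3) RGB values from corresponding pixels of
    one local patch in the input and output images. Fit
    out ~= inp @ A + b by least squares and return the squared
    residual -- a rough proxy for the distortion the paper penalizes."""
    n = inp.shape[0]
    X = np.hstack([inp, np.ones((n, 1))])  # bias column makes the map affine
    coef, *_ = np.linalg.lstsq(X, out, rcond=None)
    return float(np.sum((out - X @ coef) ** 2))
```

Patches where the transfer only re-tones colors score near zero; patches where edges get warped or textures painted in score high, which is exactly the distortion this paper suppresses.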

Premium Instagram filters as-a-service?

Really cool! This could prove useful for interior designing.

now if you implemented this as a web app, i'd be sharing it with everybody

really cool samples, would be cool to upload my own photos and run it through this

There's http://neuralstyle.com/, based on different code that does a similar thing. It's quite GPU-intensive, so it takes a while, and I think the free options are limited.

I can donate free time on a GeForce 1060 6GB and a 12core XEON. If anyone wants to make a web frontend for the code in the linked repo.

I'd like to spend a bit of time on this and try it out! Write me. Would need a MatLab license to use on the server.

Ugh Matlab. Someone needs to rewrite it in Julia

Ping me too, I'd be happy to help build whatever

Can all this be stuck in a docker image?

Same question here – is it possible via docker and without CUDA?

Well, does docker even do CUDA? I wanted to have it run in an isolated container of sorts. Because I still do use the same machine for other stuff.

Jeez, how many style transfer papers have there been in the last year? It's awesome, but what an odd thing to become its own subfield.

For anyone interested in a sci-fi book about how a society filled with this kind of technology might work, I recommend reading Rainbow's End by Vernor Vinge.


That's funny, i was just going over OS X makefile with somebody yesterday and i came here to look for the thunderbolt external GPU adapter.

The repo author is super responsive if you're RAM constrained:


and https://github.com/luanfujun/deep-photo-styletransfer/issues...


similar project https://github.com/jcjohnson/neural-style

oh, that similar project repo seems to be the inspiration/genesis of this thread's repo.

Also, that repo's author gave the Stanford lecture comparing Theano, TensorFlow, and Torch, which is well worth digging out of the various archives.

Wow -- lua, Python, matlab and CUDA.

I see now that the image segmentation is probably the key element to getting these stunning results. Other style transfers I've seen took abstract elements from the donor picture, but this really captures the donor and transforms the recipient significantly.

Awesome. Wish there were no Matlab requirement; can the Matlab code be converted to Python?

I'm sure it can, but I personally am going to go for MXNet in Julia (which is syntactically similar to MatLab) if I find the time.


Python is Turing Complete, so yes.

While correct, I think GP is more interested in knowing whether there are sufficient libraries available in Python to make porting the MATLAB code easy.

Looking at the source, almost everything in MATLAB can be ported though.

Someone did it already: martinbenson/deep-photo-styletransfer

Funny, I saw a very similar attempt at the Fitchburg Art Museum (MA) over the weekend. I seem to recall it was an MIT project: is this related in some way? It came with a rather complex (at least at first sight) interface allowing quite a few operations/transformations, which does not appear here, so it might be a separate attempt.

Looks like this thing is "in the air".

This is where neural-network-generated images might start feeding into other neural networks. Imagine a limited dataset of tagged pictures and a vast number of styles. We could generate permutations on demand and train networks that recognize the originals much more accurately than they could with the limited dataset alone.
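The augmentation loop itself is trivial; here's a hypothetical sketch where `transfer` stands in for any style-transfer function and every (image, style) pair keeps the original tag:

```python
from itertools import product

def style_augment(tagged_images, styles, transfer):
    """Hypothetical data augmentation: apply every style to every
    tagged image, preserving the tag, so a small labeled dataset
    grows by a factor of len(styles)."""
    return [(transfer(img, style), tag)
            for (img, tag), style in product(tagged_images, styles)]
```

With N images and S styles you get N*S training examples; the open question is whether the stylized variants actually help the recognizer generalize rather than just memorizing the styles.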

This is super fun and impressive, I'm going to get it going on my machine and start playing immediately.

A write-up on how to get it going would be appreciated.


Don't get your hopes up just yet. It's not usable in its current state unless you have Matlab (Octave doesn't work; see the closed and open GitHub issues). There are efforts under way to port the Matlab code to Python, but I haven't tried the fork yet. Also, processing a single image on a K80 takes several hours, no comparison to neural-style. Likewise, generating the Laplacians also takes a loooong time, at least with Octave. Maybe this will be faster in a Python version.

Those results look amazing! I'm curious how well it holds up at higher resolutions and closer inspection.

Also, there doesn't seem to be any mention of the distribution license for the project. Would any of the maintainers be able to add a license to the repository? Thanks!

There is an issue discussing that; the problem is that the work was done at Adobe labs, so they will determine the license.

Holy cow, this is amazing. The results are way better than I would ever expect as possible.

It seems this could have application in conceptual interior design. Think room makeovers.

Anyone got a link to a good write-up on how such a style transfer works at a high level?

Now imagine what we could do if it were computed in real time at 60 FPS and 100x the resolution/detail, embedded in AR or contact lenses.

Yep, sci-fi. But I often imagine that sometime soon I'll have a server farm in my pocket (I know I already have one via the cloud, but it's a whole new game if you can do the computation on-site, low latency).

A ball

So cool :)

This is so cool, such an interesting and great idea. Really impressive and well-done.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact