As a photo enthusiast, I am very excited about this, but also a little worried that soon very simple apps are capable of doing the craziest of edits through the power of neural nets. Imagine the next 'deep beauty transfer', able to copy perfect skin from a model onto everyone, making everything a little more fake and less genuine.
The engineer in me now wants to understand how to build something like this from scratch, but I think I'm probably lacking the necessary math skills.
Repository of explanatory notebooks: https://github.com/hnarayanan/artistic-style-transfer
Edit: the most comprehensive, best-illustrated treatment of this topic I have seen so far.
This specific blog post made me realize how great some of the pictures look after the color edits.
And this is similar to how deep learning will likely erode the need for programming (IMHO).
Deep learning won't necessarily write programs (any more than this AI manipulates images via Photoshop). Folks who say "don't worry, we can't write programs easily with AI" are missing the vector. Writing programs isn't necessary for there to be widespread disruption.
Most programming is essentially hooking up I/O (of which UIs are a subset) to APIs, data stores and data manipulation. The "goal" of programming is not the code, but the functionality it provides.
AIs don't need to learn to code any more than they need to learn to use Photoshop. They need to learn to provide functionality (or, in this case, manipulate image data).
This is interesting. My counterpoint would be that if you rely on AI over programs you lose human-editability and determinism. So fixing a bug or adding a new feature might mean diving into some opaque model rather than adding a few lines of code. You couldn't do anything where consistency is important, like security, manipulating a database with important information, or GUI design. I think that at least protects large swaths of software development.
Even this example seems less like a replacement for Photoshop and more like a cool new feature Photoshop could add
This hasn't caused the sky to fall, yet. So, perhaps we'll just learn to make AI behave properly under most circumstances, and deal with failures and glitches as we always have with people.
Then there is real programming, which IMHO will get automated in the far future.
If there are other ways for them to efficiently get the functionality, they are good with that (as much as they might like you).
Similar to you wanting a pizza. You could call and talk with someone (which you don't really want; it was just a necessary step), or you fill out the right form/app.
Either way, you want the result, not the process.
Your boss/client wants the result of your work, not necessarily the work process required to get it.
It seems likely to me that modern deep learning enabled tools will make it easier for your boss/client to get the result they want directly.
Deep learning + more graphically oriented data flow UIs seems like it will heavily erode the need for traditional programming as users will be able to more directly achieve the functionality they are looking for.
Not sure I would fully entrust a trained AI to control even an elevator door, where failure could result in bodily harm.
programming is fun the same way long division by hand is fun.
i can't wait for the robots to release me from the monotony.
Feed it a million or so porn images of people with all kinds of different body types. Then have it guess the closest match. Finally, run this. Presto change-o! It's those x-ray glasses kids everywhere have wanted for years.
I could easily imagine it being done live-motion and 3-d. Run the whole thing on a set of AR glasses.
I think it's just that nobody has put all of the pieces together yet. (Or if they have, the mainstream media hasn't heard about it)
Now you have an illegal information firehose.
Your comment would have been fine without the "I'm okay with creepy" part.
Everybody will do it for a couple of years, then it will get old and people will have to think of something original. Just like autotune.
And 20+ years on, all cover shots etc. are still photoshopped.
Pitch correction is something which is used on many, many professionally-produced tracks, and often without the knowledge or consent of the performers. Whether you can hear it or not is a stylistic choice (provided adequate skill from the production team). But just because the pitch correction isn't in your face, T-Pain- or Cher-style, doesn't mean it isn't there. The software is better than that, and in the right hands, it just makes people sound more skilled than they are, and you can't hear it.
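For intuition, here's a toy sketch of the core idea (my own minimal Python, nothing like the real software): detect a pitch, pull it toward the nearest equal-tempered note, with a strength knob controlling how audible the correction is. Full strength gives the hard Cher/T-Pain snap; low settings are the inaudible kind described above.

    import math

    def snap_to_12tet(f_hz, strength=1.0):
        """Naive 'autotune': pull a detected pitch toward the nearest
        equal-tempered (12TET) semitone. strength=1.0 snaps fully."""
        midi = 69 + 12 * math.log2(f_hz / 440.0)   # Hz -> fractional MIDI note
        target = round(midi)                        # nearest 12TET semitone
        corrected = midi + strength * (target - midi)
        return 440.0 * 2 ** ((corrected - 69) / 12)

    print(snap_to_12tet(450.0, strength=1.0))   # ~440.0 Hz (hard snap to A4)
    print(snap_to_12tet(450.0, strength=0.3))   # ~447 Hz, a gentle nudge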
Producers generally are pretty quiet about where they use it to mask blemishes in the performance, probably because they don't want to embarrass anyone. But the producers we sold to would certainly say how much they used it, without naming artists or tracks.
Here are a few where I was in the room when the artist was recording, and can confirm pitch correction:
The second one has a "Cher moment" almost straight away: just after "wandering the desert a thousand days", the following "mmmm" has a glissando between two notes where you can clearly hear the hard edge of what I assume is an autotune lookahead. I don't actually know how they work; I just assume there's a lookahead for the next note approximation, which makes glissandos sound funny. https://youtu.be/M8uPvX2te0I?t=31s
The last one I can't really fault for too much autotune, more a lack of it. The bridge is especially intense https://youtu.be/E0oyglKjbFQ?t=1m51s
The worst of it sounds almost like packet loss concealment in a G.722 voice stream: the sustained part of a vocal note basically sounding synthesized.
You'd be surprised on both fronts :-)
On the first, because autotune is prevalent regardless of genre and style choices (even in rock, country, etc.). It's just the Cher/T-Pain effect that has been toned down, but autotune is very much in use in the industry for vocal correction.
On the second, because almost any crap-singer pop idol with nice looks can pretend to be in tune and put out bearable results because of autotune.
Ditto for photography. Retouching an image or artificially saturating colors can make a great picture. But with a raw photo, it's even more interesting to think that the scene actually existed and someone captured it for us to look at.
In either case, I can enjoy the work but will only be impressed if I know that it's authentic. This is more true than ever today.
Holography, you can't currently fake that.
In music are you allowed to use amplifiers and speakers? They can add a lot of color and distortion. How about reverb? Rooms that aren't there. EQ to remove unwanted frequencies? Synthesizers? Digital effects? At what point is it not authentic anymore?
Same for photography. Are you allowed to touch the aperture? ISO? Shutter speed? Flash? Digital camera? At what point is it not authentic?
If you mess with the colors of a scene, you're not taking away artistic control from that scene.
You also don't put limitations on the post processing art; you're not doing it to fool anyone.
There is post processing in music that is obvious art in an analogous way, like taking sampled sounds and re-mixing them to create new stuff. There are effects that are obviously effects. I'm not going to scoff at a great studio reverb, or some echo applied to a vocal or whatever. Nobody is saying that this was recorded in some fjord in Sweden with real echo bouncing off a distant ice wall; there is no lie.
The thing is, I somehow don't hear the studio's creative input either when I hear the latest auto-tuned Fido or Bowser. They're just applying some automatic something that's supposed to make the dog sound like a more able dog.
This is like when people just batch apply the same color enhancement and sharpening of their Florida vacation pictures. I've seen one instance, I've seen them all.
Take the fantastic singer with great technical skill. Most pitch correction algorithms, as far as I know, are strictly based on equal temperament / 12TET. Fantastic singers are capable of hitting the right harmonics, some of which are not 12TET. Fantastic singers slide into notes, they use vibrato, they add "blue notes" (https://en.wikipedia.org/wiki/Blue_note). If you over-apply pitch correction, in other words, you could easily make a fantastic singer sound worse.
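To put numbers on that: the just-intonation intervals a skilled singer naturally locks onto sit measurably off the 12TET grid that a naive corrector snaps to. A quick check (using the 7/4 "harmonic seventh" as a stand-in for a blue note is my own loose association):

    import math

    def cents(ratio):
        return 1200 * math.log2(ratio)

    # Just-intonation intervals vs. their 12TET equivalents
    # (100 cents per semitone):
    intervals = {
        "major third (5/4)":      (cents(5 / 4), 400),
        "perfect fifth (3/2)":    (cents(3 / 2), 700),
        "harmonic seventh (7/4)": (cents(7 / 4), 1000),
    }
    for name, (just, tet) in intervals.items():
        print(f"{name}: just = {just:.1f} cents, 12TET = {tet}, "
              f"off by {just - tet:+.1f}")

The major third comes out 13.7 cents flat of 12TET and the harmonic seventh 31.2 cents flat; snap those to the grid and the "fantastic" tuning is gone.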
Let's also take the singer who is technically a bit pitch deficient, but has "character" that makes up for it. You don't want to make this type of singer too in-tune, either. Too much tuning might remove the "character".
I understand that in the industry there are some engineers who are good enough to selectively apply auto-tune, fixing only obvious issues and avoiding the pitfalls. There are also some productions that just apply auto-tune to everything with no consideration of the content. The latter will probably work for glossy pop productions, but if I were a really good singer (or a singer with "character"), I probably wouldn't like the results.
The problem is, I'm not sure that even a pure alternate tuning can work for all examples. E.g., for blue notes, what is "correct" varies with performer and style.
Now, I'm more talking about the "automatic modes"; I understand Melodyne offers a pretty impressive level of editing control (Antares did too from what I remember). So it would certainly be possible to get a really great take, and then hand-correct any truly off notes to whatever frequency you wanted.
As in many things (see: Photoshop and model photos), a lot of the reaction to the tool is less on how it could be used, and more on how it is being used in glossy "crap singer pop idol" productions.
Only it's about neither, but more about hot bodies and rich marketing campaigns.
I basically just used a technique called frequency separation - it's extremely quick with the most computational part being a gaussian blur, and it allows you to separate detail from tone into two separate layers. From there I just took the detail layer of the original image and applied it to the tone layer of the output image.
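A minimal sketch of that recipe (assuming float RGB arrays in [0, 1], scipy for the blur, and a sigma I made up; the right blur radius depends on image size):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def frequency_separation(img, sigma=5.0):
        """Split an HxWx3 float image into tone + detail layers.
        tone   = low-frequency color/luminosity (the gaussian blur);
        detail = whatever the blur removed, so tone + detail == img."""
        tone = gaussian_filter(img, sigma=(sigma, sigma, 0))  # don't blur across channels
        detail = img - tone
        return tone, detail

    # Recombine: detail from the sharp original, tone from the stylized
    # output (upscaled to the original's resolution first):
    # _, detail = frequency_separation(original)
    # tone, _   = frequency_separation(stylized_upscaled)
    # result = np.clip(tone + detail, 0.0, 1.0)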
I'm thinking how it would be if I could use your method (or the OP's) on the lower layers, which mainly control colors...
Looking at this, I now see a future as a forensic imaging specialist. At some point, the algorithm will get pretty damn good, and it will be my job to look for tells -- cracks in the generated image where reality doesn't quite line up. The question will be whether I am seeking out these abnormalities to cover them up or to call them out.
The transfer of the flame to the blue perfume bottle looks like a very practical way to prototype marketing images.
Input mask: https://github.com/luanfujun/deep-photo-styletransfer/blob/m...
Output mask: https://github.com/luanfujun/deep-photo-styletransfer/blob/m...
So, presumably these masks were drawn manually. The three different colors in the input mask tell it to train 3 different models from the masked regions, and then the same three colors in the output mask tell it where to apply each trained model. So it trains a model for each apple and then applies that model to the next apple over. It's not just randomly swapping colors on the apples.
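Conceptually (a hypothetical sketch; the file names and region loop are mine, not the repo's actual code), the colored masks work something like this:

    import numpy as np
    from PIL import Image

    in_mask = np.array(Image.open("in_mask.png").convert("RGB"))
    out_mask = np.array(Image.open("out_mask.png").convert("RGB"))

    # Each unique color marks one semantic region in both masks.
    colors = np.unique(in_mask.reshape(-1, 3), axis=0)
    for c in colors:
        src_region = np.all(in_mask == c, axis=-1)   # where style is learned
        dst_region = np.all(out_mask == c, axis=-1)  # where it gets applied
        # ...match style statistics from src_region pixels
        # onto dst_region pixels only...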
I'm unfamiliar with the processing power required to run this on a single image, let alone a video feed. Does anyone have any back of the napkin estimations that might show just how far away AR glasses are from this?
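One hedged back-of-envelope (all numbers are rough assumptions, not measurements): a VGG-19 forward pass at 512x512 is on the order of 100 GFLOPs, and optimization-based transfer runs hundreds of forward+backward iterations per image.

    flops_per_pass = 100e9   # assumed: VGG-19 forward at 512x512
    iterations = 500         # typical count for optimization-based transfer
    backward_factor = 2      # a backward pass costs roughly 2x a forward pass

    total = flops_per_pass * iterations * (1 + backward_factor)
    print(f"{total / 1e12:.0f} TFLOPs per frame")  # ~150 TFLOPs

At ~10 TFLOPS of desktop GPU, that's roughly 15 seconds per frame, so real-time AR would need feed-forward approximations (one forward pass per frame) plus mobile-class hardware on top of that.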
A figure of 20+ years comes up, assuming that you want it to be truly indistinguishable from reality.
It could be sooner with major algorithmic improvements, and the display technology itself could be 5-10 years out.
Yeah, screw reality.
AR games and hollywood filters are the obvious go-tos, but I wouldn't be surprised to see seemingly random industries like therapy see benefits (a few applications that come to mind are brightening up the world to combat SAD, tinting views to ease aggression, or even Black Mirror-esque "blocking" of people who upset or anger you IRL if you want to trend towards dystopian tropes).
Personally, I just want to up the world's saturation to max and chill with rainbows and bright colors while I'm otherwise productive. Seems like having fun filters would (or could) make life mildly better for many people going about their usual day.
That would mostly lead to overstimulation and desensitization. People get bored with everything they have immediate and easy access to.
As long as it were an optional thing (i.e. not forced on people like the traditional dystopian take), I don't see how letting people choose to change how they see the world would be a guaranteed negative outcome. Especially since that's pretty much literally the advice in every self-help book: look at things from a new perspective, get a new perspective on life, surround yourself with positive things, go where the scenery makes you most happy, be more cheerful, fake it 'til you make it, etc.
Obviously it could be abused by some (overstimulation/desensitization don't sound so bad compared to alternatives), but I imagine overall it'd be a net positive on the lives of more.
I guess we'll just have to wait and see!
But a consumer-grade plugin with an online renderer might be feasible for now.
* Fast solutions include a pre-training step that can take days. It would be weird to ask the user to do this.
Alternatives to training include downloading the trained model. It could be a big download but an overnight download is not that weird to ask users to do.
Compare it to 3D where artists are drowning in new tools and techniques every year, any movement in this space is a good thing.
Worth remembering there was a time when the heal brush didn't exist, a time when the clone brush didn't exist, and even a time when the physical airbrush didn't exist.
Here is an (in progress) explanation I am working on aimed at beginners that might help: https://harishnarayanan.org/writing/artistic-style-transfer/
More specifically, ignoring spatiality is the effect of matching the Gram matrix (the uncentered covariance) of the feature vectors, which was the key insight in the original style transfer paper by Gatys.
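Concretely, the Gram matrix of a feature map correlates channels while averaging over all spatial positions, so it literally cannot see layout. A small numpy sketch:

    import numpy as np

    def gram(features):
        """Gram matrix of a CNN feature map (C x H x W): channel-to-channel
        correlations averaged over every spatial position."""
        C, H, W = features.shape
        F = features.reshape(C, H * W)
        return F @ F.T / (H * W)

    # Shuffling spatial positions leaves the Gram matrix unchanged,
    # which is exactly why matching it captures style but not layout:
    feats = np.random.rand(64, 32, 32)
    perm = np.random.permutation(32 * 32)
    shuffled = feats.reshape(64, -1)[:, perm].reshape(64, 32, 32)
    assert np.allclose(gram(feats), gram(shuffled))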
Give it a creepy and/or surrealist plot so the ethereal-looking output suits the film. Perhaps the visuals could be the result of viewing the world through a robot or AI with imperfect cognition. Would be an interesting twist on the old unreliable narrator trope.
As an example, I used frequency separation to split the detail layer from the tone layer in the original, high resolution stock photo of SF. From there I took the lower res (25% size) output from this script and used it as my tone layer. The results are OK: http://i.imgur.com/oakLUiE.jpg
It has the same overall tones, and some of the sharpness is preserved at a high resolution. My 30 second approach suffers from some edge glow, but I'm sure it could be greatly improved in an efficient, automated way.
Anyone know how long this takes per image?
Voice transfer. Imagine your words in the voice of any celeb.
"Our contribution is to constrain the transformation from the
input to the output to be locally affine in colorspace, and
to express this constraint as a custom CNN layer through
which we can backpropagate. We show that this approach
successfully suppresses distortion and yields satisfying photorealistic style transfers in a broad variety of scenarios, including transfer of the time of day, weather, season, and artistic edits"
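To unpack "locally affine in colorspace": within any small patch, the output RGB should be an affine function of the input RGB. A toy numpy illustration of the constraint itself (the per-patch least-squares fit and all names here are mine for illustration; the paper actually enforces this via a Matting Laplacian regularizer it backpropagates through):

    import numpy as np

    def fit_local_affine(inp, out, y, x, r=3):
        """Fit a least-squares affine map (RGB -> RGB) on one
        (2r+1)x(2r+1) patch of a pair of float HxWx3 images."""
        pi = inp[y - r:y + r + 1, x - r:x + r + 1].reshape(-1, 3)
        po = out[y - r:y + r + 1, x - r:x + r + 1].reshape(-1, 3)
        A = np.hstack([pi, np.ones((pi.shape[0], 1))])  # add bias column
        M, *_ = np.linalg.lstsq(A, po, rcond=None)      # (4, 3) affine map
        residual = np.linalg.norm(A @ M - po)           # low => photorealistic patch
        return M, residual

A stylization that keeps these residuals small everywhere can recolor the photo but can't warp or smear its structure, which is why the outputs stay photorealistic.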
I was skeptical at first (even posted then deleted a sort of negative comment), but now that I read this, I see the value. The images are much more crisp and distortion free.
Was it really necessary to involve Matlab, Python and Lua?
Really cool samples; it would be cool to upload my own photos and run them through this.
The repo author is super responsive if you're RAM constrained:
similar project https://github.com/jcjohnson/neural-style
Also, that repo's author gave the Stanford lecture comparing Theano, TensorFlow, and Torch, which is very worth digging out of the various archives.
I see now that the image segmentation is probably the key element to getting these stunning results. Other style transfers I've seen took abstract elements from the donor picture, but this really captures the donor and transforms the recipient significantly.
Looking at the source, almost everything in MATLAB can be ported though.
Looks like this thing is "in the air".
Also, there doesn't seem to be any mention of the distribution license for the project. Would any of the maintainers be able to add a license to the repository? Thanks!
Yep, sci-fi. But I often imagine that sometime soon I'll have a server farm in my pocket (I know I already have one via the cloud, but it's a whole new game if you can do the computation on-site, with low latency).