
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks - bottlek
https://junyanz.github.io/CycleGAN/
======
soVeryTired
I've only recently started reading about deep neural networks, and the thing
that strikes me the most about the literature is the _lack of mathematics_.

Open a NIPS paper from 2010 or so, and you'll see extremely dense mathematics:
nonparametrics, variational approximation, sampling theory, Riemannian
geometry. But from my (admittedly small) sampling of the convnet / RNN
literature there really doesn't seem to be much maths there. The typical paper
seems to run along the lines of "We tried this, and it worked".

I'm not sure whether there's anything to learn from this observation, but I
think it's striking all the same.

~~~
habitue
So what we're seeing is that these fields, pre-deep learning, were
mathematical disciplines making steady progress on well-understood
foundations. Then deep learning came in, was exceptionally effective at
problems that had been difficult to crack, and people shifted focus, because
it seems weird to keep diddling around with incremental gains on techniques
that are significantly less effective.

What this created though, was a clean break from a field with a theoretical
foundation like what we expect in CS usually, to an empirical foundation, like
what we see in psychology, sociology, medicine etc. The theory will eventually
catch up, but the fact that "it works" really trumps having a full
understanding of why it works (at least for now).

~~~
kcorbitt
> The theory will eventually catch up

Maybe... but the emergent behavior in a complicated system (and these networks
are only getting more complicated, not less) is likely to quickly become more
complex than the human mind can reasonably be expected to understand, given
any amount of time.

We actually know a lot less about biology, for example, than your typical
"biology 101" course would lead you to believe. It's pretty cool that we
figured out how DNA translates into proteins, but actually figuring out how a
cell decides _which_ DNA to express at any given time, much less what any
arbitrary protein actually does, takes you right back into guess-and-check
territory. There are heuristics you can use to eliminate some guesses more
quickly than others, but we're so far away from a unified explanatory theory
of biology that I'm doubtful one will ever be developed.

~~~
jszymborski
I mean, that's the nature of empirical sciences. You establish a model that
fits the observations you've made, but inevitably there is some chaotic,
sometimes imperceptible noise, averaged out, that represents some further
level of complexity.

I wouldn't argue that it means we don't know much; by that standard you could
argue we don't know anything, because all empirical sciences rest on
abstractions built on further imperceptible abstractions until you (don't)
reach the bottom of the stack of turtles.

------
MR4D
My one takeaway from this is that the future will be a scary place. This is
phenomenal work, so please don't take that as a knock against it.

This shows that computers soon will have the ability to fool our senses so
well that we may not even believe reality when it is right in front of us.
Some of the pictures, when I was just viewing them (before reading the
captions or titles), looked real. I was astounded to see that they were
derived from paintings.

The implications are significant, not just in things like gaming or finance,
but especially in psychology, where the delicate aspects of the mind may be
easily disrupted. I expect considerable growth in neuroses over the coming
decades. Technology will have surpassed natural evolution by such a margin
that it could be difficult to recover.

~~~
ooqr
We'll probably learn how to exploit human psychology faster than we learn how
to treat or understand it.

~~~
theossuary
I'm pretty sure we already have.

~~~
aisofteng
Indeed. See Cambridge Analytica:
[https://en.wikipedia.org/wiki/Cambridge_Analytica](https://en.wikipedia.org/wiki/Cambridge_Analytica)

------
Ace_Archer
It wasn't highlighted on the github readme, but I think that the satellite
photo to map and map to satellite photo (!!!) is incredible as well:
[https://taesung89.github.io/cyclegan/2017/03/25/maps-comparison.html](https://taesung89.github.io/cyclegan/2017/03/25/maps-comparison.html)

~~~
empath75
I don't even know how that's possible. How could it recover a golf course?

~~~
schoen
In all of these cases, the software is supplying details that are likely in
context on the basis of its prior training, rather than details that are
somehow known to be right. One analogy might be asking a human painter to
complete a partial portrait of a person. The painter might be able to guess at
the person's likely posture and plausible items of clothing based on the
information of the unfinished portrait, but of course the real person who was
the model might have been wearing something else entirely. The fact that the
completion is plausible and self-consistent doesn't mean that it's correct.

------
eriknstr
Current title: Unpaired Image-to-Image Translation using Cycle-Consistent
Adversarial Networks

Current link:
[https://junyanz.github.io/CycleGAN/](https://junyanz.github.io/CycleGAN/)

Previous title: Berkeley's software turns paintings into photos, horses into
zebras, and more

Previous link:
[https://github.com/junyanz/CycleGAN/](https://github.com/junyanz/CycleGAN/)

The previous title, which was based on the repo description of the previous
link, was much more informative to me.

~~~
waqf
I would be very interested to see software that turns horses into zebras, but
I don't believe this is that; this is just software that manipulates images.

------
king_magic
This is really incredible. Maybe a neat idea: artist draws/paints frames for
animation, frames are converted into semi-photorealistic images through this
software, and assembled into a movie.

~~~
rememberlenny
I am a painter and did something similar. I took two images, one of nature and
another of street art. Using style transfer, I combined a series of images.
After identifying a few results that I liked, I used actual paint to make
large canvases inspired by the digitally produced images.

Link:
[https://twitter.com/rememberlenny/status/825026441603592193](https://twitter.com/rememberlenny/status/825026441603592193)

------
baq
it feels like there's a new deep learning paper each week, ever so slightly
bringing me closer to an existential nervous breakdown.

~~~
errantspark
Yeah, I feel that. I tell myself they're only tools, really no stranger than
time <-> frequency domain transformations to someone unfamiliar with the
Fourier transform.

Yet there really is something disturbing about seeing a computer resurrect so
much of the mind of an artist who's been dead for nearly a century. I know the
GPU doesn't understand what it's doing, but did Monet?

When I paint I don't really understand. There may be occasional moments of
clarity, I like to think that I make deliberate choices based on the emotions
and thoughts within me that I'd like to reflect to the people experiencing my
art, but... Honestly am I so different from a neural network?

I am a neural network. Much of what goes into creating a piece of art is based
on intuition, on experience and practice, pathways eroded into my mind from
years of thoughts traversing the same landscape. The art I create is unique to
me, not in that it cannot be reproduced but in that every action I take is a
reflection and an echo of all the moments remembered and forgotten that create
me.

How much of Monet's mind is in a GPU in Berkeley?

~~~
abelhabel
I think that is only true at first glance. If you look a little deeper, none
of the photo->painting results are really accurate. The painters had more than
a color scheme as information. Van Gogh, for example, did not always use
correct perspective. In general, the deeper meaning of art is not in the
technique but in the perspective of the artist.

That said, I am amazed by the results even so.

I also want to note that I don't see any reason why machine intelligence
couldn't produce meaningful works of art, but it will require a new way of
looking at it.

------
kore
This has also been making the rounds, Deep Photo Style Transfer:
[https://github.com/luanfujun/deep-photo-styletransfer](https://github.com/luanfujun/deep-photo-styletransfer)

~~~
oarfish
Unfortunately, no one seems to be able to run it.

~~~
dkural
Would appreciate a link showing this.

~~~
thoughtpalette
Believe you'll find supporting comments on the original HN thread:
[https://news.ycombinator.com/item?id=13958366](https://news.ycombinator.com/item?id=13958366)

Matlab requiring a commercial license seems to be at least one barrier.

------
jimmies
While this is undoubtedly very impressive, I think it's just another logical
step to what we've been seeing so far:

\- In the past, you needed a pianist at home to perform a song; with the music
box and then the phonograph, you don't need to hire anyone anymore. It's
probably not as good as a live performance (maybe?), but it's good enough for
many people, and much much cheaper, faster, and available.

\- You needed advanced knowledge and equipment at home to produce magazine-
style tri-fold leaflets or wedding invitations; with modern word processors
you can use a template and be alright. It's probably not as good as a
professionally customized design (maybe?), but it's good enough for many
people, and much much cheaper, faster, and available.

\- You used to hire a photographer or an artist to have your portrait
photographed/painted, now you can do with your NVIDIA card at home. It's
probably not as good as a professionally painted one, but it's good enough for
many people, and much much cheaper, faster, and available.

~~~
cr0sh
The next step would be to have it turned into an actual canvas rendering,
with the texture and such of actual paint. Could probably be done using 3D
printer technology, inkjet, and/or robotics in some manner...

...in fact, I know you can already get photos printed to canvas - but taking
it to the next stage of texture would be amazing - right now, I think the best
you can get is to have a trained person "highlight" areas of the canvas with
paint. From what I understand, there's a whole "village" or small city in
China that specializes in custom painted images (which a lot of online places
use); I would be surprised if there isn't an effort to automate this work.

~~~
vitro
What for? Really, an honest question. If you look at it as an achievement of
technology, then it's fine. But if you look at it as a real thing from a real
person, then it's fake.

~~~
philipkglass
I prefer the aesthetic qualities of paint applied by strokes over an inkjet
print of a "painterly" image. I'd enjoy a service that allows you to upload
source images, pick a stylization, and then buy a painted-by-robotically-
wielded-brush version.

By way of analogy: I own some furniture from Ikea and some nice antique
furniture. I don't know or care much about the provenance of the antique
furniture. It's made up of bargains from yard/estate sales. If a big box store
could sell me an inexpensive antique-alike bureau that was indistinguishable
from my existing one (to unaided human senses), I'd happily buy it. I want the
thing more than the story behind the thing.

------
thriftwy
The next step is "turning pencil drawings into photos" and using it to
fabricate evidence on a grand scale.

Why bother catching politicians doing something when you can just draw them
in? Will wreak havoc on societies with weak politics/reporting culture.

~~~
fritzo
This will hurt journalists reporting on facts more than it will hurt targets
of smear campaigns.

Journalist: "Here is a compromising photo of a politician."

Politician: "Here are 1M photos of every politician doing every imaginable
illegal act. Prove that your one photo is not similarly fabricated."

~~~
oconnore
Good journalism is typically not dependent on "gotcha" images. It might hurt
amateur twitter reporting, though.

~~~
jackpirate
What?!

Iconic photos are crucial for telling news in a way people remember. For
example:

* We remember Tiananmen Square because of [tank man]([https://en.wikipedia.org/wiki/Tank_Man](https://en.wikipedia.org/wiki/Tank_Man))

* We remember napalm in Vietnam because of [Phan Thi Kim Phuc]([https://en.wikipedia.org/wiki/Phan_Thi_Kim_Phuc](https://en.wikipedia.org/wiki/Phan_Thi_Kim_Phuc))

* There are too many pictures from the civil rights movement, so I'll [just link to the Getty's gallery]([http://www.gettyimages.com/event/the-american-souths-troubli...](http://www.gettyimages.com/event/the-american-souths-troubling-history-of-racism-560585453#freedom-riders-on-a-greyhound-bus-sponsored-by-the-congress-of-racial-picture-id154784547)). These pictures tell the story of violence against Black Americans far better than any article ever could.

------
wonderous
Direct link to research paper:
[https://arxiv.org/pdf/1703.10593.pdf](https://arxiv.org/pdf/1703.10593.pdf)

~~~
tyingq
And a direct link to the teaser screenshot... it's scaled down on the github
readme:
[https://raw.githubusercontent.com/junyanz/CycleGAN/master/imgs/teaser.jpg](https://raw.githubusercontent.com/junyanz/CycleGAN/master/imgs/teaser.jpg)

------
lettergram
At this point, I'm wondering "What's real any more..."

Seriously, if this continues, I don't know how to keep up with this field. I
spend at least an hour a day just reading about the work that has been done
(i.e. reading the research).

~~~
haloboy777
I guess you've already read about the Prisma app. Here is the research paper
it's based on:
[https://arxiv.org/pdf/1508.06576.pdf](https://arxiv.org/pdf/1508.06576.pdf)

~~~
aisofteng
Strange to see a paper on this topic that appears to have been written in
Microsoft Word.

------
coreyp_1
I thought that the iPhone photo to DSLR was interesting. (The others are, too,
but I thought that this effect was particularly well done.)

~~~
soylentcola
I guess I'm a bit jaded, since adding depth of field via software is a bit
more common these days, but overall this project is still really impressive.
Even with obvious artifacts, it's not something I'd have considered possible
5 or 10 years ago.

------
nsxwolf
One of my favorite UX problems - click on a photo and watch it get smaller.

------
lucidrains
Similar work: DiscoGAN
[https://github.com/SKTBrain/DiscoGAN](https://github.com/SKTBrain/DiscoGAN)

~~~
gwern
/r/machinelearning asks how it's any different from DiscoGAN:
[https://www.reddit.com/r/MachineLearning/comments/62hzqc/r170310593_unpaired_imagetoimage_translation/](https://www.reddit.com/r/MachineLearning/comments/62hzqc/r170310593_unpaired_imagetoimage_translation/)
No answer so far.

~~~
aisofteng
Reply posted 11h after your comment:

>All DiscoGAN experiments are on 64x64, this is high resolution. I don't know
whether this is an important difference though.

~~~
gwern
I don't think it is. Working on 64x64 makes global coherence easier and the
NNs smaller/faster but shouldn't make a qualitative difference. I believe one
guy using DiscoGAN/CycleGAN on my current Danbooru anime dataset (
[https://www.gwern.net/Danbooru2017](https://www.gwern.net/Danbooru2017) ) is
doing it at 128px without any major changes.

------
felippee
This style transfer idea, though eye candy and sometimes impressive, seems to
be the core application of deep learning these days. François Chollet tweeted
something like that two years ago about the Prisma app (yes, it was 2015).
Back then he anticipated many other killer apps around the corner, but it
seems not much has materialized. It's 2017 now and people are still super
excited about yet another style transfer network. I'm not even sure where this
would be practically useful, aside from being yet another photoshop/instagram
filter. Am I the only one skeptical about this?

~~~
rmellow
I understand your position, but it comes from a false assumption that this is
just about images. This is a demo that uses images as a way to "wow" the
audience.

However, the kind of network used here (a GAN) can be applied in domains
other than images (text, financial data).

Imagine if you trained a network to generate fraudulent financial data and
another to become an expert at catching fraud, each feeding back into the
other's skill. This is the concept of GANs at heart and definitely disruptive
if correctly executed.
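
The adversarial setup described above can be made concrete with a toy one-dimensional version of the game (a hypothetical numpy sketch, not any real fraud-detection system or the paper's network): a "generator" learns to shift noise toward the real data's mean, while a logistic "discriminator" tries to tell real samples from generated ones.

```python
import numpy as np

# Toy 1-D GAN: real data ~ N(4, 0.5); generator g(z) = z + mu tries to
# imitate it; discriminator D(x) = sigmoid(w*x + b) tries to tell them apart.
rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

mu = 0.0          # generator parameter
w, b = 0.0, 0.0   # discriminator parameters
lr, batch = 0.05, 64

for step in range(3000):
    real = rng.normal(4.0, 0.5, batch)
    fake = rng.normal(0.0, 1.0, batch) + mu

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0
    # (manual gradients of the standard cross-entropy GAN loss).
    s_real, s_fake = sigmoid(w * real + b), sigmoid(w * fake + b)
    w += lr * np.mean((1 - s_real) * real - s_fake * fake)
    b += lr * np.mean((1 - s_real) - s_fake)

    # Generator step: move mu so the discriminator scores fakes as real.
    s_fake = sigmoid(w * fake + b)
    mu += lr * np.mean((1 - s_fake) * w)

# After training, mu has drifted from 0 toward the real mean (~4), even
# though the generator never saw a "paired" target value for any sample.
```

Each side's improvement feeds the other's gradient signal, which is the "each feeding back into the other's skill" dynamic in a nutshell.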

~~~
felippee
It is not entirely obvious that such extrapolations would succeed. GANs are
not a new idea (only new in the context of deep learning), and automatic
generation of adversarial samples is a known and exploited subject (even,
e.g., in the field of evolutionary strategies). So this is still just the
promise: look, it works on images, and there are all these wonderful
applications possible. It's 2017 now; I'll check back in 2019 to see if
anything has changed.

------
Terribledactyl
My favorites among the results tended to be [anything] -> Ukiyo-e/Cezanne, I
think because these are easier problems: mapping lots of detail to less. The
transfiguration and painting -> photo examples have me firmly in the uncanny
valley, but I suspect this harder problem will be solved given more training.

~~~
keehun
I also really liked the Ukiyo-e outputs. I wish they would release
full-resolution versions at some point!

------
martokus
Is it April Fools' Day already somewhere?

~~~
bbcbasic
It is here. ... no honestly!

------
TheCoreh
Really impressive stuff. Could this same technique be used on human photos to
transfer traits like gender, age, ethnicity?

Could be really useful for those age-progressed photos used in missing persons
listings, for instance.

~~~
aisofteng
Style transfer lies outside my specialization, so take my comment as
speculation informed by intuition from variously related works that may not
correctly carry over to this one.

If you look at the nature of the transformations achieved in this paper, you
might note that they are changes in, well, style. That is, the presentations
of the objects in an image are represented using a different _style_, but they
remain the same object.

As an example, take a look at the horse/zebra transformation; the horse
obtains a zebra's stripes, but it _structurally_ still looks like a horse.
That is to say, a zebra and a horse have identifiably different bodily
proportions, and the horse's bodily proportions are not changed by the style
transfer. Similarly, the trees in the summer -> winter transformation do not
have the slightly saggy branches that they would due to the weight of snow on
them.

With that in mind, I would be surprised if the approach taken in this work,
taken as-is, would be able to change the gender or ethnicity of a person in a
photo. There are structural differences between men's and women's faces, and
similarly between races. I would imagine that an attempt of a race transfer,
as I suppose you would call it, would largely amount to changing skin tone.

The age progression application, though, might work under the limitations I
have speculated, at least for aging photos of a person that is already more or
less mature. A person's facial structure does not significantly change once
adulthood is reached, so simply transferring stylistic features might add
wrinkles and other age-related changes in a realistic way.

Again, speculation. And even if the speculation is correct, that is not to say
that some modifications to the approach would not be able to lift the
constraints I made up.

Edit: reading through the rest of the paper[0] behind this work, it looks
like I might not be far off:

>Although our method can achieve compelling results in many cases, the results
are far from uniformly positive. Several typical failure cases are shown in
Figure 17. On translation tasks that involve color and texture changes, like
many of those reported above, the method often succeeds. We have also explored
tasks that require geometric changes, with little success.

The quote is from Section 6 ("Limitations and Discussion"), and example
"limitations" are given in Figure 17.

[0]
[https://arxiv.org/pdf/1703.10593.pdf](https://arxiv.org/pdf/1703.10593.pdf)

------
skarap
Now they just need to feed it its own source and turn it into Skynet source.

------
dnel
I'll believe it if it's still around in 48 hours

------
amelius
This comes just a bit too late for the movie "Loving Vincent" [1,2], for which
artists laboriously translated frames into the style of Van Gogh.

[1]
[https://www.youtube.com/watch?v=47h6pQ6StCk](https://www.youtube.com/watch?v=47h6pQ6StCk)

[2] [http://www.imdb.com/title/tt3262342](http://www.imdb.com/title/tt3262342)

------
beautifulfreak
I want to see mammals turned into reptiles or birds. Horses and zebras are
already alike. Show me horses with feathers.

------
jordache
TIL: remap green pixels to grey/white and you go from summer to a contrived
notion of winter.

This is done easily in Photoshop:

\- select the green pixels, smooth the selection a bit, then paint white over
it

\- apply a blue cast

[https://s17.postimg.org/q68dz04sf/test.jpg](https://s17.postimg.org/q68dz04sf/test.jpg)

~~~
MattRix
Yes but I think you're missing the point that the neural network didn't need
to be reprogrammed to specifically "winterize" the photos like your script
did. It's much more general in its applications, and does a decent job no
matter the lighting conditions etc.

~~~
jordache
I have doubts about how agnostic their solution is...

------
bitL
Can't wait to try using this and overfitted RNNs on getting Beethoven & Mozart
symphonies with samples and transitions from Armin van Buuren and Ferry
Corsten 8-)

------
knicholes
So this is it, right? Can we use this for clothed -> naked? Or naked ->
clothed, of course, for nsfw filtering or for... clean movies, or something.

------
nullc
Next up: your dog on twitter, with unpaired nerve-impulse-to-text translation
using cycle-consistent adversarial networks.

------
Jack000
Very, very cool. One thing I noticed is that with image-to-image translation
tasks, the output tends to be a bit "organic" looking, like the photo to map
example. With photographic output it's not noticeable, but it's very jarring
for graphical output.

I wonder if there is a way to fix this, possibly by stacking another GAN on
top?

------
intrasight
I foresee a future in which we're all photo-enhanced in VR-space. Less demand
for plastic surgeons? I wonder.

------
rattray
Why is it that Neural Net-based ML only seems to be claiming results with
images and natural language?

Maybe I'm out of the loop, but I haven't seen anything demonstrating results
on "data" – the kinds of challenges that are actually valuable to businesses.

Why is that? Are those just less sexy / more proprietary in nature, or is
there something about those challenges that make NN's less useful to them?

~~~
317070
First off, there are. These NNs are good at exploiting the 'spatiality prior'
in some types of data, like text and images. It means that features in the
data which are close together should be combined as you climb the hierarchy
of features. Databases with columns and rows don't have that prior, for
instance.

Second, there is also the peer review problem. You are trying to explain a
very abstract concept to your peers in a paper which is usually limited to 6
or 8 pages. Text and images make for very graspable examples in such a short
paper. That's the reason why other data with a spatial prior, like time
series or EEG data, is not used as often.

So, there is a combination of those two elements at play.
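
The spatiality prior can be illustrated with a tiny numpy sketch (not from the discussion above, just an illustrative example): a convolution combines adjacent values, which is meaningful for a signal where neighbors are related, but meaningless once you permute the feature order, as you freely could with database columns.

```python
import numpy as np

# A step signal: the "edge" is defined entirely by adjacent values.
signal = np.array([0, 0, 0, 1, 1, 1, 0, 0], dtype=float)

# A tiny convolution kernel that combines neighbouring features
# (a first-difference edge detector).
kernel = np.array([1.0, -1.0])
edges = np.convolve(signal, kernel, mode="valid")
# edges -> [0, 0, 1, 0, 0, -1, 0]: +1 at the rising edge, -1 at the falling.

# Permuting the feature order (as you could with database columns, where
# column order carries no meaning) destroys what the convolution detects,
# even though no values changed.
rng = np.random.default_rng(0)
shuffled = rng.permutation(signal)
edges_shuffled = np.convolve(shuffled, kernel, mode="valid")
```

A fully connected layer is indifferent to such a permutation; a convolutional layer is built around exactly the locality that the permutation destroys.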

~~~
argonaut
Only the first reason is correct (NNs are good at data with dimensional
relationships).

The second reason is pretty bogus (text/images being more graspable). It's
valid if you're talking about mass media / the popular press. But for research
papers, 1) images and large snippets of text are actually a negative, since
they take a lot of space, and 2) the people doing peer review are expert
scientists. They know the benchmarks and the theory.

------
dharma1
Very cool. What's the max resolution (on a 12GB GPU)?

~~~
ddrager
I'm curious too, since the max resolution appears to be 220x220 on a 2GB GPU
in my testing. If that is a linear relationship, it seems like it would be ~
1080x1080 for a 12GB GPU.

~~~
koala_man
If it's linear with pixel count, then it would be ~500x500.
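
The arithmetic behind that estimate, sketched out (assuming, as a rough model rather than a measured fact, that GPU memory use scales linearly with pixel count):

```python
import math

# Extrapolate from the reported 220x220 on a 2 GB GPU to a 12 GB GPU,
# assuming memory scales linearly with pixel count (side^2), not side length.
base_side, base_mem_gb, target_mem_gb = 220, 2, 12

max_pixels = base_side ** 2 * (target_mem_gb / base_mem_gb)
side = math.sqrt(max_pixels)  # ~539, i.e. roughly 500x500, not 1080x1080
```

Scaling the side length by 6x (to ~1080x1080) would instead imply 36x the pixel count, hence the lower estimate.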

------
jp555
Some of that functionality looks a lot like Pikazo.

------
scott_s
The project page is perhaps a better place to directly link:
[https://junyanz.github.io/CycleGAN/](https://junyanz.github.io/CycleGAN/)

~~~
dang
Thanks! Changed from
[https://github.com/junyanz/CycleGAN](https://github.com/junyanz/CycleGAN).

