Hacker News
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (junyanz.github.io)
667 points by bottlek on March 31, 2017 | 135 comments

I've only recently started reading about deep neural networks, and the thing that strikes me the most about the literature is the lack of mathematics.

Open a NIPS paper from 2010 or so, and you'll see extremely dense mathematics: nonparametrics, variational approximation, sampling theory, Riemannian geometry. But from my (admittedly small) sampling of the convnet / RNN literature, there really doesn't seem to be much maths there. The typical paper seems to run along the lines of "We tried this, and it worked".

I'm not sure whether there's anything to learn from this observation, but I think it's striking all the same.

So what we're seeing is that, pre-deep learning, these fields were mathematical disciplines making steady progress on well-understood foundations. Then deep learning came in, was exceptionally effective at problems that had been difficult to crack, and people shifted focus, because it seems weird to keep diddling around with incremental gains on techniques that are significantly less effective.

What this created, though, was a clean break from a field with the kind of theoretical foundation we usually expect in CS to one with an empirical foundation, like what we see in psychology, sociology, medicine, etc. The theory will eventually catch up, but the fact that "it works" really trumps having a full understanding of why it works (at least for now).

> The theory will eventually catch up

Maybe... but the emergent behavior in a complicated system (and these networks are only getting more complicated, not less) is likely to quickly become more complex than the human mind can reasonably be expected to understand, given any amount of time.

We actually know a lot less about biology, for example, than your typical "biology 101" course would lead you to believe. It's pretty cool that we figured out how DNA translates into proteins, but actually figuring out how a cell decides which DNA to express at any given time, much less what any arbitrary protein actually does, takes you right back into guess-and-check territory. There are heuristics you can use to eliminate some guesses more quickly than others, but we're so far away from a unified explanatory theory of biology that I'm doubtful one will ever be developed.

I mean, that's the nature of empirical sciences. You establish a model that fits the observations you've made, but inevitably there is some chaotic, sometimes imperceptible noise that gets averaged out, representing some further level of complexity.

I wouldn't argue that it means we don't know much; in the end you can argue we don't know anything, because all empirical sciences rest on abstractions that can be attributed to some further imperceptible abstraction, until you (don't) reach the bottom of the stack of turtles.

Also, deep net problems are different from traditional CS problems because we can mathematically define when, say, a list of numbers is sorted, but not when "this image looks photorealistic".

Effectively, deep nets are good at tricking the human visual cortex, so I'm not sure there's a deeper mathematical reason they work well; in the same way, the reason mp3 compression works so well is that it was tested against human hearing.

But yes, all deep learning is based on gradient descent, which is a greedy heuristic algorithm, and it's not a priori apparent that it should do anything interesting. But it does.
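For readers who haven't seen it spelled out, the whole "greedy heuristic" really is just this: step downhill along the negative gradient and hope for the best. A minimal sketch (the quadratic objective and step size are arbitrary choices for illustration):

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Greedily follow the negative gradient; no global guarantees."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is f'(x) = 2 * (x - 3).
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
# x_min ends up very close to 3, the true minimum.
```

On a convex bowl like this, the greedy step provably converges; the surprising part is that it also does something useful on the wildly non-convex loss surfaces of deep nets.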

> in the same way that the reason mp3 compression works so well

But there are deeper reasons that mp3 works well, besides "tested against humans". It's just that those reasons are about psychoacoustics. The most obvious example is that humans are bad at discerning amplitude and phase differences at high frequencies.

I expect that as we get deeper into the theory around why particular algorithms work, we'll find similar answers. We'll be able to say things like "when an image has the following statistical properties, we're likely to interpret it as a cat". Of course, right now we can trivially say that if by "the following statistical properties" we mean "activates this specifically trained network in a particular way", but the point is that we can probably boil that down into deeper (and simpler) statements about how human image recognition works.

I can't find a link about it now, but I remember reading that for their Translate app, Google created a way to lossily compress their character recognition neural net. That kind of research seems likely to me to lead to a better understanding of how to boil down the insights that these learning algorithms are gleaning.

This Two Minute Papers video has some thinking along the same lines: https://www.youtube.com/watch?v=ZBWTD2aNb_o

But I'd also point out that the human visual cortex is itself a bunch of models about the real world. It's gotten that way through a combination of evolution and learning. When engineers train neural networks to do tricks like TFA shows, those networks are effectively modeling the real world in two different ways: directly, and by proxy in modeling human visual systems. So the insights we can find by studying them are likely to tell us not only about our own vision, but about the world itself.

Finally, it's worth noting that graphics algorithms have always been about human visual systems and the real physical world, as opposed to deep mathematical truths. Look at this paper for example: http://www.jiansun.org/papers/Dehaze_CVPR2009.pdf

The oversimplified plain-English insight those researchers had is "For a photo that doesn't contain haze, you usually can find a fully saturated pixel – one whose darkest color channel is near zero – within some reasonably small neighborhood of any point in the image" (the paper's "dark channel prior"). There's no deep mathematical insight there, but there is an important insight into the statistical nature of natural images – a fact about the physical world. And they use that insight to great effect: they can remove haze from photographs, and even use the haze as a way of estimating a depth map.
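To make the insight concrete, here's a rough sketch of computing a dark channel in NumPy. The patch size, test images, and naive sliding-window loop are my own illustrative choices, not the paper's implementation:

```python
import numpy as np

def dark_channel(img, patch=3):
    """Min over color channels, then min over a local square patch."""
    mins = img.min(axis=2)                 # per-pixel min across R, G, B
    h, w = mins.shape
    pad = patch // 2
    padded = np.pad(mins, pad, mode="edge")
    out = np.empty_like(mins)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

# Haze-free scene: saturated colors mean some channel is near zero everywhere.
clear = np.zeros((8, 8, 3)); clear[..., 0] = 1.0   # pure red image
hazy = 0.5 * clear + 0.5                           # blend toward white "airlight"
```

Haze lifts every channel toward the airlight color, so the dark channel of `hazy` is bright where `clear`'s is zero, which is exactly the signal their dehazing exploits.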

So sure, while you're right that there probably aren't many deep mathematical insights to find from the results of deep learning algorithms (which is generally true of graphics algorithms anyway), I wouldn't gloss over the deep practical insights that they encode.

It's data-driven... rather than theory-driven.

In some respects this is good, in other respects it's awful.

Agreed with the first paragraph, but I think this sort of behavior is normal for most disciplines. That is, it's not CS looking like psychology or sociology, it's normal for CS to do this. I think we're seeing a similar thing right now in biology with CRISPR.

FWIW, CRISPR isn't a great analogy. It's a well understood mechanism, and not really all that different from standard genome editing techniques like restriction enzymes, zinc-finger nucleases, etc. It's new and flashy and super powerful -- and the applications of CRISPR are still being discovered -- but the mechanism is understood and simple.

That wasn't the aspect I was going for. More for a period of incremental improvements in understanding, from which emerges a particular thing with enormous potential, and then the field is dominated by applications of that new thing.

> The typical paper seems to run along the lines of "We tried this, and it worked".

This sounds like science to me, is "we tried this, and it worked" something to shy away from for some reason? Of course, it's super important to publish "we tried this, and it didn't work." But that's another topic altogether...

If less math really is a trend, do you see it as a bad thing? You didn't state it strongly, but you've hinted at that, and it seems like the replies assume that's what you're suggesting.

I love math and mathy papers, and I would still welcome a trend toward less math in papers in return for more effort spent on making simple ideas plain and easily understood. But academics don't always operate that way. Math in papers is often used to obscure simple ideas, sometimes on purpose, and sometimes it's an indicator that the author doesn't understand the domain clearly enough but still wants to sound smart. Sometimes a paper really requires dense math, but not very often. Dense math almost always makes a paper more difficult to reproduce. Either way it is harder for even experts to evaluate the quality of dense math than of expository writing that strives for simplicity and clarity.

Neural networks are really simple math under the hood: well-understood algorithms and simple linear algebra. Why not write great papers that work, don't rehash the math, and instead focus on clarity, reproducibility, and results?

I agree with everything you've said. But my suspicion is that there's less maths in the papers because no-one knows how to use it to analyse neural networks properly.

That means current research is operating via trial and error. One man's trial and error is another's blind search. Without maths to point researchers in the right direction, neural net research could easily hit a wall once the low-hanging fruit has been picked.

A friend of mine from academia was considering going into industry, so he went to some data science meetups. Someone was giving a presentation about convolutional networks, yet did not know what a convolution was. At first I was startled, but in the long run, machine learning applications will be decided just as much on user experience and design features as on algorithmic choices.

I'm not a web programmer, but I imagine few developers could remember the mathematics[0] of the sorting algorithms that are fundamental underpinnings of their work (if they ever learned them at all). Yet I'm not sure it matters, even to great developers. The same thing will probably ultimately be true of machine learning. Honestly, you need not know what a convolution is to build a perfectly usable convnet. (And ultimately you may not need to even build your own if you can use a nifty amazon API.)
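For what it's worth, the operation itself is only a few lines; here's a naive 2D "convolution" (strictly, cross-correlation, which is what deep learning libraries actually compute) in NumPy. The kernels are arbitrary examples:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid-mode 2D cross-correlation: slide the kernel, take dot products."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
identity = np.zeros((3, 3)); identity[1, 1] = 1.0              # passes input through
edge = np.array([[0., 0., 0.], [-1., 1., 0.], [0., 0., 0.]])   # horizontal difference
```

A convnet just learns the kernel entries instead of hand-picking them, which is arguably why you can get away with never deriving one yourself.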

Whether NIPS should care or not is a separate story. It seems a little sad - I took all this hard pure math as an undergrad, and it doesn't seem to be important if all I'm doing is changing a few layer parameters (even if the change is ingenious).

[0] mathematics as in proof of sorting, proof of bounds on space/time complexity, etc.

Math needs to fulfill a purpose in order to succeed in INDUSTRY. Image recognition is first and foremost about recognizing images; the math behind it may be mystical, but the accuracy is measurable.

If you want your knowledge to be held as useful, then you need to find a use case for it. Simple as that. This is not only true for math, but for all other techniques as well. Otherwise, so-called knowledge is yet another self-indulgent toy, disconnected even further from being useful.

Contrary to what the OP states here, the recent development of WGAN and LSGAN was pretty much math-driven, and it led to very useful real-world extensions of the original model that improve it quite a bit.

I think once the underpinnings were created via math, it became a question of what network geometries to use.

We don't have much math, that I'm aware of, that can describe the capabilities of different network configurations from first principles. Even though we constructed these networks, it feels like we're back to the beginnings of science with this one.

Let's change this and see what happens.

The thing is, the underpinnings have been there for decades. The convnet was invented in the '90s. The only thing that's changed is the availability of data and processing power.

ConvNet was indeed invented a long time ago. But a lot of things have changed since then other than data and processing power. There have been new developments in optimizers, nonlinearities, architectures, loss functions, and theoretical understanding.

NIPS is more math/theory heavy as a conference while ICCV/CVPR/other vision conferences will tolerate less math and more pictures. You have probably been reading mostly vision papers but if you read more papers targeted at NIPS you will find more math. This looks to be an ICCV submission, whereas if you read the original GAN paper (a NIPS paper) there are theorems and proofs. Your observation is correct, you are just comparing different subsets of the field.

I think this is because, at its core, the math behind deep learning is pretty simple and there's not much of it: linear algebra, some simple activation functions, and gradient descent. Implementing a net from scratch in Python is pretty concise. The simplicity-to-effectiveness ratio is what I find so interesting about deep learning.
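To illustrate how concise, here's a rough from-scratch sketch: a two-layer sigmoid network trained with hand-derived backprop on XOR. The layer sizes, learning rate, and iteration count are arbitrary choices for the toy problem:

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: not linearly separable, so it needs the hidden layer.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward(X):
    h = sigmoid(X @ W1 + b1)
    return h, sigmoid(h @ W2 + b2)

_, out0 = forward(X)
loss_before = np.mean((out0 - y) ** 2)

lr = 1.0
for _ in range(3000):
    h, out = forward(X)
    # Backprop through the MSE loss and both sigmoid layers.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

_, out1 = forward(X)
loss_after = np.mean((out1 - y) ** 2)
```

Everything past this point in the field is, roughly, bigger matrices, better nonlinearities, and cleverer architectures around the same loop.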

> I'm not sure whether there's anything to learn from this observation

I think it is our duty to explore all the possibilities of neural networks. They are still not fully understood yet. Theory will catch up once we practically understand the beast. It might not be pure math, but it is a necessary exploration nonetheless.

Neural networks are very much math based (more so than I imagined when I first started reading about them), it's just the underlying math hasn't changed much - only the architectures and techniques.

Wait until we have neural networks designing neural networks.

Neural Architecture Search with Reinforcement Learning


Too bad, it is already there :P

Been there, done that. Beats humans :)

https://arxiv.org/abs/1606.02492 ^^

Disclaimer: I am the author

Too bad that math-heavy approaches just cannot compete.

We have a winner.

Yes we do. Rather than spending time mathurbating over what is fundamentally limited and incapable, I suggest we spend more time playing with the shiny new toys that do fulfill their promises.

My one takeaway from this is that the future will be a scary place. This is impressive work, so please don't take that as a knock against this phenomenal work.

This shows that computers will soon have the ability to fool our senses so well that we may not even believe reality when it is right in front of us. Some of the pictures, when I was just viewing them (before reading captions or titles), looked real. I was astounded to see that they were derived from paintings.

The implications are significant, not just in things like gaming or finance, but especially in psychology, where the delicate aspects of the mind may be easily disrupted. I expect considerable growth in neuroses over the coming decades. Technology will have surpassed natural evolution by such a margin that it could be difficult to recover.

We'll probably learn how to exploit human psychology faster than we learn how to treat or understand it.

I'm pretty sure we already have.

Indeed. See Cambridge Analytica: https://en.wikipedia.org/wiki/Cambridge_Analytica

I wonder if all generated images should somehow be watermarked so that they can someday be identified as fabricated. Otherwise we could have a lot of false images around the web used for confirmation bias.

It wasn't highlighted on the github readme, but I think that the satellite photo to map and map to satellite photo (!!!) is incredible as well: https://taesung89.github.io/cyclegan/2017/03/25/maps-compari...

The map-to-photo only seems to work because there is a limited training set built with the exact same photographic tile as the vector map. It's reproducing features that are not possible to infer.

I don't even know how that's possible. How could it recover a golf course?

In all of these cases, the software is supplying details that are likely in context on the basis of its prior training, rather than details that are somehow known to be right. One analogy might be asking a human painter to complete a partial portrait of a person. The painter might be able to guess at the person's likely posture and plausible items of clothing based on the information of the unfinished portrait, but of course the real person who was the model might have been wearing something else entirely. The fact that the completion is plausible and self-consistent doesn't mean that it's correct.
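This matches how the model is trained: the paper's cycle-consistency loss only asks that translating to the other domain and back reproduces the input, so any plausible, self-consistent detail is as good as the truth. A toy sketch of the L1 cycle loss with stand-in 1D "generators" (the functions here are invented for illustration, not the paper's networks):

```python
import numpy as np

def cycle_loss(x, G, F):
    """L1 cycle-consistency: translating there and back should return the input."""
    return np.mean(np.abs(F(G(x)) - x))

# Toy stand-ins for the two generators:
G = lambda x: 2 * x + 1          # "photo -> map"
F_good = lambda y: (y - 1) / 2   # exact inverse of G: zero cycle loss
F_bad = lambda y: y / 2          # not an inverse: positive cycle loss

x = np.linspace(0, 1, 5)
```

Nothing in this loss compares the generated "photo" to the real scene, which is why a golf course can come back looking right without actually being recovered.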

Did it recover a golf course? If you're talking about examples #1 and #5, the golf course is the input, and the output is a map.

Current title: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

Current link: https://junyanz.github.io/CycleGAN/

Previous title: Berkeley's software turns paintings into photos, horses into zebras, and more

Previous link: https://github.com/junyanz/CycleGAN/

The previous title, which was based on the repo description of the previous link, was much more informative to me.

I would be very interested to see software that turns horses into zebras, but I don't believe this is that, this is just software that manipulates images.

This is really incredible. Maybe a neat idea: artist draws/paints frames for animation, frames are converted into semi-photorealistic images through this software, and assembled into a movie.

I am a painter and did something similar. I took two images, one of nature and another of street art. Using style transfer, I combined a series of images. After identifying a few results that I liked, I used actual paint to make large canvases inspired by the digitally produced images.

Link: https://twitter.com/rememberlenny/status/825026441603592193

I expect this is going to be the future of animation, since drawing in-between frames and backgrounds is very labor-intensive and expensive.

It'll be interesting to watch how copyright law treats training data. Suppose I want to make an animation in the style of Disney or Studio Ghibli or Kyoto Animation, and so I use their entire body of work as training data to generate output in the same style from my own sketches. Is that now a derivative work? Is it different when a human copies a style while drawing by hand (requiring a degree of effort and skill that few possess) versus if a computer does most of the work (requiring low effort and ordinary artistic skills)? Would animation be treated any differently than, say, training an AI to write songs like Bob Dylan or write stories like J.R.R. Tolkien or host a radio show like Garrison Keillor?

Welcome to HN...

I wish they had more samples of that. I'd be interested to see where the threshold is in terms of the detail needed to get something vaguely photorealistic.

edit: in the other post there's a link to the paper with some good examples. I think the amount of detail you'd need would be pretty prohibitive in terms of artistic effort. On the other hand, it would be interesting to use this on something like a rough CG rendering to try to correct some of the shading imbalance and such.

It feels like there's a new deep learning paper each week, ever so slightly bringing me closer to an existential nervous breakdown.

Yeah, I feel that. I tell myself they're only tools, really no stranger than time <-> frequency domain transformations to someone unfamiliar with the Fourier transform.

Yet there really is something disturbing about seeing a computer resurrect so much of the mind of an artist who's been dead for nearly a century. I know the GPU doesn't understand what it's doing, but did Monet?

When I paint, I don't really understand either. There may be occasional moments of clarity; I like to think that I make deliberate choices based on the emotions and thoughts within me that I'd like to reflect to the people experiencing my art, but... honestly, am I so different from a neural network?

I am a neural network. Much of what goes into creating a piece of art is based on intuition, on experience and practice, pathways eroded into my mind from years of thoughts traversing the same landscape. The art I create is unique to me, not in that it cannot be reproduced but in that every action I take is a reflection and an echo of all the moments remembered and forgotten that create me.

How much of Monet's mind is in a GPU in Berkeley?

I think that is only true on first glance. But if you look a little deeper, none of the photo -> painting translations are really accurate. The painters had more than a color scheme as information. Van Gogh, for example, did not always use correct perspective. In general, the deeper meaning of art is not in the technique but in the perspective of the artist.

That said, I am amazed by the results all the same.

I also want to note here that I don't see any reason why machine intelligence could not produce meaningful works of art, but it will require a new way of looking at it.

Or.. we just greatly overestimated what our brains were doing all along..

>How much of Monet's mind is in a GPU in Berkeley?

Very little, arguably none. Imitation is far removed from creation. If you see an art student reproduce Monet's paintings or redo an existing image in Monet's style, would you ask "how much of Monet's mind is in the art student?"

Are you a graphic artist or a cab/truck driver? :)

I started buying AI textbooks circa 2010 anticipating something was about to change dramatically.

Sadly, I didn't work through the books like I had planned at all. Lacked the discipline to come home and work through them at the time. I really regret it now as things are blowing up and, as you said, it seems like every week there is something new and interesting

Luckily, while deep learning is built on complicated mathematical foundations, the practice isn't. You can quickly get up to speed with a library like Keras and be productive, building things and trying them out without needing years of theoretical training. I'd highly recommend looking into it; most of the papers are relatively easy to read, relying largely on empirical results and intuition rather than deep theoretical proofs.

This has also been making the rounds, Deep Photo Style Transfer: https://github.com/luanfujun/deep-photo-styletransfer

I inquired about the license for that one but unfortunately they can't allow commercial use.


These are amazing.

I wonder if it could take as input a normal portrait of you, use some celebrity of your gender as the style, and output what you would look like with that style applied? It doesn't seem beyond the examples shown at your link...

It would bring a whole new meaning to the word "filter" (instagram, etc.) I particularly like that the original is very much present in the output: it would still be "you".

But maybe there are subtle problems that I don't notice because I don't look at the subject matter as carefully. People pay a lot of attention to recognizing each other, perhaps the effect would not transfer as well as these examples presented.

Unfortunately, no one seems to be able to run it.

This person ran it on a hackintosh (I helped a little) https://github.com/luanfujun/deep-photo-styletransfer/issues...

Would appreciate a link showing this.

Believe you'll find supporting comments on the original HN thread: https://news.ycombinator.com/item?id=13958366

Matlab needing a commercial license seems to be at least one barrier.

While this is undoubtedly very impressive, I think it's just another logical step to what we've been seeing so far:

- In the past, you needed to have a pianist at home to perform a song for you; with the music box and then the phonograph, you don't need to hire anyone anymore. It's probably not as good as a live performance (maybe?), but it's good enough for many people, and much much cheaper, faster, and available.

- You needed advanced knowledge and equipment at home to produce magazine-style tri-fold leaflets or wedding invitations; with modern word processors you can use a template and be alright. It's probably not as good as a professionally customized design (maybe?), but it's good enough for many people, and much much cheaper, faster, and available.

- You used to hire a photographer or an artist to have your portrait photographed/painted, now you can do with your NVIDIA card at home. It's probably not as good as a professionally painted one, but it's good enough for many people, and much much cheaper, faster, and available.

The next step would be to have it actually turned into an actual canvas rendering, with the texture and such of actual paint. Could probably be done using 3d printer technology, inkjet, and/or robotics in some manner...

...in fact, I know you can already get photos printed to canvas - but taking it to the next stage of texture would be amazing - right now, I think the best you can get is to have a trained person "highlight" areas of the canvas with paint. From what I understand, there's a whole "village" or small city in China that specializes in custom painted images (which a lot of online places use); I would be surprised if there isn't an effort to automate this work.

What for? Really, honest question. If you look at it as an achievement of technology, then it's fine. But if you look at it as real thing from real person, then it's fake.

I prefer the aesthetic qualities of paint applied by strokes over an inkjet print of a "painterly" image. I'd enjoy a service that allows you to upload source images, pick a stylization, and then buy a painted-by-robotically-wielded-brush version.

By way of analogy: I own some furniture from Ikea and some nice antique furniture. I don't know or care much about the provenance of the antique furniture. It's made up of bargains from yard/estate sales. If a big box store could sell me an inexpensive antique-alike bureau that was indistinguishable from my existing one (to unaided human senses), I'd happily buy it. I want the thing more than the story behind the thing.

"Technologists have produced a 3D-printed painting in the style of Rembrandt"

The next step is "turning pencil drawings into photos" and using it to fabricate evidence on grand scale.

Why bother catching politicians doing something when you can just draw them in? Will wreak havoc on societies with weak politics/reporting culture.

This will hurt journalists reporting on facts more than it will hurt targets of smear campaigns.

Journalist: "Here is a compromising photo of a politician."

Politician: "Here are 1M photos of every politician doing every imaginable illegal act. Prove that your one photo is not similarly fabricated."

Good journalism is typically not dependent on "gotcha" images. It might hurt amateur twitter reporting, though.


Iconic photos are crucial for telling news in a way people remember. For example:

* We remember Tiananmen Square because of [tank man](https://en.wikipedia.org/wiki/Tank_Man)

* We remember napalm in Vietnam because of [Phan Thi Kim Phuc](https://en.wikipedia.org/wiki/Phan_Thi_Kim_Phuc)

* There are too many pictures from the civil rights movement, so I'll [just link to Getty's gallery](http://www.gettyimages.com/event/the-american-souths-troubli...). These pictures tell the story of violence against black Americans far better than any article ever could.

Verifying the source of an image via a blockchain is probably going to have to be a thing.

When all you have is a hammer, everything looks like a nail. What do blockchains have to do with this?

Blockchain entries are timestamped, cryptographically linked, and immutable.
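The tamper-evidence part, at least, is simple to sketch: link each record to a hash of everything before it, so altering any record invalidates the rest of the chain. (This is just a hash chain; a real blockchain adds consensus on top, and the record strings here are made up.)

```python
import hashlib

def chain(records):
    """Link each record to the hash of everything before it."""
    prev, out = "0" * 64, []
    for r in records:
        h = hashlib.sha256((prev + r).encode()).hexdigest()
        out.append((r, h))
        prev = h
    return out

def verify(chained):
    """Recompute every link; any altered record breaks the check."""
    prev = "0" * 64
    for r, h in chained:
        if hashlib.sha256((prev + r).encode()).hexdigest() != h:
            return False
        prev = h
    return True

ledger = chain(["img1:abc123", "img2:def456", "img3:789aaa"])
tampered = list(ledger)
tampered[1] = ("img2:EVIL", tampered[1][1])  # alter a record, keep the old hash
```

The open problem isn't the crypto; it's convincing anyone that the hash was registered at capture time rather than after fabrication.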

> "turning pencil drawings into photos"

errr.... I hate to break it to you but... https://github.com/phillipi/pix2pix/blob/master/imgs/example...

This is very impressive tech and has a lot of good applications for movies and art, but I'm with you that it scares me that its most common use case may turn out to be as a tool for propaganda by governments.

How long until personal testimony and non-repudiable crypto signatures are the only admissible evidence in court?

Maybe it's a good thing. Law worked like this for ages, and in some ways they managed.

Of course it relied heavily on everybody knowing everybody else in much smaller societies, but that we can do also by doing an AR face lookup on every person we ever interact with.

Since the provenance of other evidence usually is a matter of personal testimony except where the provenance is not in dispute, I'm not sure why that would be necessary; the reliability of evidence is already determined by evaluation of personal testimony.

Of course, eyewitness testimony is notoriously unreliable even when not intentionally deceptive, and testimony is much easier to falsify even without any technical aid than any other form of evidence, so relying solely on personal testimony wouldn't really help to avoid fabricated evidence, anyway.

This is the first thing I thought, really scary.

So... photoshop?

I think beyond photoshop - more like the ability to seamlessly insert or "imagine" situations that never occurred in real life. For instance, imagine if you could feed in a ton of images of person A - then a ton of images of porn - then have the system imagine a porn scene.

...or a drug-usage scene.

...or any other thing you could think up. It's even possible this could be pushed into a generated movie. You could even work the "voices" in (have the person say virtually anything you want, nearly perfectly, synched perfectly with the facial expressions/movements/body language).

There's a lot of scary, yet interesting possibilities here!

Hmm - think of the possibilities for CGI effects? Rather than having to build a 3D model, just have the software imagine the scene (I know - much easier said than done - but I think the possibility is there, now).

Another idea: could this kind of software take a movie or still image and "imagine" a 3D model from it? I don't see why not...

...and yes, I understand this software as being a particular form of a neural network, and that for any of this it takes massive compute power in the form of multiple-GPU time and a ton of memory - and likely isn't yet in the realm of doing it yourself at home except in simpler examples - which might still take days or weeks to train, and a good amount of time to render; I just want to indicate that I have a certain minor level of understanding of what this software is about and that I don't believe for a second that it is a basic "photoshop" filtering system.

I've seen some work on using GANs to generate 3D models from single images, but the results aren't particularly high-quality yet.[0]


Direct link to research paper: https://arxiv.org/pdf/1703.10593.pdf

And a direct link to the teaser screenshot...it's scaled down on the github readme: https://raw.githubusercontent.com/junyanz/CycleGAN/master/im...

At this point, I'm wondering "What's real any more..."

Seriously, if this continues, I don't know how to keep up with this field. I spend at least an hour a day just reading about the work that has been done (i.e. reading the research).

I guess you've already read about the Prisma app. Here is the research paper it is based on: https://arxiv.org/pdf/1508.06576.pdf

Strange to see a paper on this topic that appears to have been written in Microsoft Word.

I thought that the iPhone photo to DSLR was interesting. (The others are, too, but I thought that this effect was particularly well done.)

I guess I'm a bit jaded since adding depth of field via software is a bit more common these days but overall, this project is still really impressive. Even with obvious artifacts it's not something I'd have considered possible 5 or 10 years ago.

One of my favorite UX problems - click on a photo and watch it get smaller.

Similar work DiscoGAN https://github.com/SKTBrain/DiscoGAN

/r/machinelearning asks how it's any different from DiscoGAN: https://www.reddit.com/r/MachineLearning/comments/62hzqc/r17... no answer so far.

Reply posted 11h after your comment:

>All DiscoGAN experiments are on 64x64, this is high resolution. I don't know whether this is an important difference though.

I don't think it is. Working on 64x64 makes global coherence easier and the NNs smaller/faster but shouldn't make a qualitative difference. I believe one guy using DiscoGAN/CycleGAN on my current Danbooru anime dataset ( https://www.gwern.net/Danbooru2017 ) is doing it at 128px without any major changes.

This style transfer idea, though eye candy and sometimes impressive, seems to be the core application of deep learning these days. François Chollet tweeted something like that two years ago about the Prisma app (yes, it was 2015). Back then he anticipated many other killer apps around the corner, but it seems not much has materialized. It's 2017 now and people are still super excited about yet another style transfer network. I'm not even sure where this would be practically useful, aside from being yet another Photoshop/Instagram filter. Am I the only one skeptical about this?

I understand your position, but it comes from a false assumption that this is just about images. This is a demo that uses images as a way to "wow" the audience.

However, the kind of network used here (GAN) can be used in domains other than images (text, financial data).

Imagine if you trained a network to generate fraudulent financial data and another to become an expert at catching fraud, each feeding back into the other's skill. This is the concept of GANs at heart and definitely disruptive if correctly executed.
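Stripped to a toy, that adversarial loop looks like this. Everything below is illustrative: a made-up 1-D "generator" with a single shift parameter, a logistic "discriminator", and hand-rolled gradients, just to show the two networks feeding back into each other:

```python
import math
import random

random.seed(0)

REAL_MEAN = 4.0   # "real data" distribution: N(4, 1)
lr = 0.05
w, b = 0.0, 0.0   # discriminator D(x) = sigmoid(w*x + b)
theta = 0.0       # generator: fake = theta + z, z ~ N(0, 1)

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

for step in range(2000):
    real = random.gauss(REAL_MEAN, 1.0)
    fake = theta + random.gauss(0.0, 1.0)

    # Discriminator step: push D(real) -> 1 and D(fake) -> 0.
    s_real = sigmoid(w * real + b)
    s_fake = sigmoid(w * fake + b)
    w -= lr * ((s_real - 1.0) * real + s_fake * fake)
    b -= lr * ((s_real - 1.0) + s_fake)

    # Generator step: push D(fake) -> 1 (non-saturating loss -log D(fake)).
    fake = theta + random.gauss(0.0, 1.0)
    s_fake = sigmoid(w * fake + b)
    theta -= lr * (s_fake - 1.0) * w  # d(fake)/d(theta) = 1

print(theta)  # drifts toward REAL_MEAN as the two players compete
```

Swap the scalar shift for a net generating transactions and the logistic unit for a fraud classifier, and you have the shape of the fraud scenario above.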

It is not entirely obvious that such extrapolations would succeed. GANs are not a new idea (only new in the context of deep learning), and automatic generation of adversarial samples is a known and exploited subject (even, e.g., in the field of evolutionary strategies). So this is still just the promise: look, it works on images, but there are all these wonderful applications possible. It's 2017 now; I'll check back in 2019 to see if anything has changed.

It's 2017 and cars drive themselves. Isn't that exciting?

They don't. If you've ever chanced upon one of Uber's prototypes, the engineer in the driver's seat takes over often.

My favorite of the results tended to be [anything] -> Ukiyo-e/Cezanne, I think because those are easier problems: going from lots of detail to less. The transfiguration and painting -> photo examples have me firmly in the uncanny valley, but I suspect those harder problems will be solved given more training.

I also really liked the Ukiyo-e outputs. I wish they would release full-resolutions at some point!

Is it April Fools' Day already somewhere?

It is here. ... no honestly!

Really impressive stuff. Could this same technique be used on human photos to transfer traits like gender, age, ethnicity?

Could be really useful for those age-progressed photos used in missing persons listings, for instance.

Style transfer lies outside my specialization, so take my comment as speculation informed by intuition from variously related works that may not correctly carry over to this one.

If you look at the nature of the transformations achieved in this paper, you might note that they are changes in, well, style. That is, the presentations of the objects in an image are represented using a different _style_, but they remain the same object.

As an example, take a look at the horse/zebra transformation; the horse obtains a zebra's stripes, but it _structurally_ still looks like a horse. That is to say, a zebra and a horse have identifiably different bodily proportions, and the horse's bodily proportions are not changed by the style transfer. Similarly, the trees in the summer -> winter transformation do not have the slightly saggy branches that they would due to the weight of snow on them.

With that in mind, I would be surprised if the approach taken in this work, as-is, could change the gender or ethnicity of a person in a photo. There are structural differences between men's and women's faces, and similarly between races. I would imagine that an attempt at a race transfer, as I suppose you would call it, would largely amount to changing skin tone.

The age progression application, though, might work under the limitations I have speculated, at least for aging photos of a person that is already more or less mature. A person's facial structure does not significantly change once adulthood is reached, so simply transferring stylistic features might add wrinkles and other age-related changes in a realistic way.

Again, speculation. And even if the speculation is correct, that is not to say that some modifications to the approach would not be able to lift the constraints I made up.

Edit: reading through the rest of the paper[0] behind this work, looks like I might not be far off:

>Although our method can achieve compelling results in many cases, the results are far from uniformly positive. Several typical failure cases are shown in Figure 17. On translation tasks that involve color and texture changes, like many of those reported above, the method often succeeds. We have also explored tasks that require geometric changes, with little success.

The quote is from Section 6 ("Limitations and Discussion"), and example "limitations" are given in Figure 17.

[0] https://arxiv.org/pdf/1703.10593.pdf

There is a similar paper on unsupervised image-to-image translation that has been applied to gender conversion.


The full project can be seen here: https://github.com/SKTBrain/DiscoGAN

Is someone also thinking of CRUD on clothing on humans?

Now they just need to feed it its own source and turn it into Skynet source.

I'll believe it if it's still around in 48 hours

This comes just a bit too late for the movie "Loving Vincent" [1,2], for which artists laboriously translated frames into the style of Van Gogh.

[1] https://www.youtube.com/watch?v=47h6pQ6StCk

[2] http://www.imdb.com/title/tt3262342

I want to see mammals turned into reptiles or birds. Horses and zebras are already alike. Show me horses with feathers.

TIL - remap green pixels to grey/white and you go from summer to a contrived notion of winter.

This is done easily in Photoshop:

- select the green pixels, smooth the selection a bit, then paint white over it
- apply a blue cast on top
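The recipe above can even be sketched as a per-pixel function (pure Python; the threshold and color values are made up for illustration):

```python
def winterize_pixel(rgb, fuzz=30):
    """Crudely remap green-dominant pixels toward white, following the
    Photoshop recipe above. The fuzz threshold is an arbitrary choice."""
    r, g, b = rgb
    if g > r + fuzz and g > b + fuzz:   # "select green pixels"
        return (235, 240, 245)          # "paint white over it"
    # apply a mild blue cast to everything else
    return (max(r - 10, 0), max(g - 5, 0), min(b + 10, 255))

print(winterize_pixel((60, 160, 50)))   # grass green -> (235, 240, 245)
print(winterize_pixel((120, 90, 70)))   # tree trunk -> (110, 85, 80)
```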


Yes but I think you're missing the point that the neural network didn't need to be reprogrammed to specifically "winterize" the photos like your script did. It's much more general in its applications, and does a decent job no matter the lighting conditions etc.

I have doubts in how agnostic their solution is...

I'm still up in the air on how much of this is an amazing breakthrough in cognitive image manipulation and how much is a parlor trick. It is very cool, but the scope seems a little too narrow in their examples.

Me too. In the example where they showed the conversion of Yosemite Valley from winter to summer, how did they know to make the grass yellow instead of green? Did they index enough pictures of Yosemite Valley to assume the valley's grass is more often yellow than green?

Okay, now do the Zebra one.

It's a lot cheaper to get an ai to do it than a person.

I'm demonstrating that it doesn't require AI. It's all procedural. A script with https://www.imagemagick.org/script/index.php would suffice.

Can't wait to try using this and overfitted RNNs on getting Beethoven & Mozart symphonies with samples and transitions from Armin van Buuren and Ferry Corsten 8-)

So this is it, right? Can we use this for clothed -> naked? Or naked -> clothed, of course, for nsfw filtering or for... clean movies, or something.

Next up: your dog on Twitter, with unpaired nerve-impulse-to-text translation using cycle-consistent adversarial networks.

Very, very cool. One thing I noticed is that with image-to-image translation tasks the output tends to look a bit "organic", like in the photo-to-map example. With photographic output it's not noticeable, but it's very jarring for graphical output.

I wonder if there is a way to fix this, possibly by stacking another GAN on top?

I foresee a future in which we're all photo-enhanced in VR-space. Less demand for plastic surgeons? I wonder.

Why is it that Neural Net-based ML only seems to be claiming results with images and natural language?

Maybe I'm out of the loop, but I haven't seen anything demonstrating results on "data" – the kinds of challenges that are actually valuable to businesses.

Why is that? Are those just less sexy / more proprietary in nature, or is there something about those challenges that make NN's less useful to them?

First off, there is. These NNs are good at exploiting the 'spatiality prior' in some types of data, like text and images: features that are close together in the data should be combined as you climb the hierarchy of features. Databases with columns and rows don't have that prior, for instance.

Second, there is the peer-review problem. You are still trying to explain a very abstract concept to your peers in a paper that is usually limited to 6 or 8 pages. Text and images make for very graspable examples in such a short paper. That's also why some other data with a spatial prior, like time series or EEG data, is not used as often.

So, there is a combination of those two elements at play.
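To make the 'spatiality prior' concrete: a convolution slides one small shared set of weights over neighbouring positions, so only nearby features get combined at each step. A minimal 1-D sketch in plain Python (the kernel is an illustrative edge detector):

```python
def conv1d(signal, kernel):
    """Valid 1-D convolution (really cross-correlation, as in deep learning):
    each output combines only a local window, with one shared weight set."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

# The [-1, 1] kernel fires on local changes, wherever they occur in the input.
print(conv1d([0, 0, 0, 1, 1, 1, 0, 0], [-1, 1]))  # [0, 0, 1, 0, 0, -1, 0]
```

A table of database rows has no analogous notion of "neighbouring columns", which is why this weight-sharing trick buys you nothing there.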

Only the first reason is correct (NNs are good at data with dimensional relationships).

The second reason is pretty bogus (text/images more graspable). It's valid if you're talking about mass media / popular press. But for research papers 1) images / large snippets of text are actually a negative since images take a lot of space and 2) the people doing peer review are expert scientists. They know the benchmarks and the theory.

Because this is B2B business and usually done by specialized software companies. These companies do not publish or open source their solutions because it is either against their own or their customer's interest.

Examples are the whole predictive maintenance sector, the medical sector (computer-aided diagnosis), and insurance companies, which use NNs for all kinds of analyses.

There are many good answers already, but my two cents is that many standard statistical methods already work well on basic "data"-type problems. If your data is spreadsheet-style, with some number of float inputs leading to a float result, you will probably use an ordinary method like boosting. NNs might be able to achieve the same results, but they probably won't do much better.

In fact, I remember talking to an ML guy with a PhD who was working on one of these types of problems, and I asked "why not try NNs on this problem?" He looked at me with disgust and said something akin to "it's provable that NNs can never do better than boosting, so why use them?"

However, boosted decision trees don't work on image analysis at all, so these types of problems have become the standard for NNs.

It is also worth noting that people are more likely to try image problems if everyone else is trying image problems, because then it is easy to compare multiple algorithms together.

It's not just images, audio and text. There's also behavior (reinforcement learning), such as in Atari games and AlphaGo.

Very cool. What's the max resolution (on a 12GB GPU)?

I'm curious too, since the max resolution appears to be 220x220 on a 2GB GPU in my testing. If that is a linear relationship, it seems like it would be ~ 1080x1080 for a 12GB GPU.

If it's linear with pixel count, then it would be ~540x540 (220·√6 ≈ 539).
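Back-of-the-envelope, assuming memory scales linearly with pixel count (a simplification; parameters and some activations don't scale that way):

```python
import math

def max_side(known_side, known_mem_gb, target_mem_gb):
    """Scale the max square-image side, assuming memory is proportional
    to pixel count (side squared)."""
    return known_side * math.sqrt(target_mem_gb / known_mem_gb)

print(round(max_side(220, 2, 12)))  # 539 -- i.e. ~540x540, not 1080x1080
```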

Some of that functionality looks a lot like Pikazo.

The project page is perhaps a better place to directly link: https://junyanz.github.io/CycleGAN/

Thanks! Changed from https://github.com/junyanz/CycleGAN.
