
Image-to-Image Translation with Conditional Adversarial Nets - cruisestacy
https://phillipi.github.io/pix2pix/
======
jawns
The "sketches to handbags" example, which is buried toward the bottom, is
really cool. It's basically an extension of the "edges to handbags," but with
hand-drawn sketches.

Even though the sketches are fairly crude, with no shading and a low level of
detail, many of the generated images look like they could, in fact, be real
handbags. They still have the mark of a generated image (e.g. weird mottling)
but they're totally recognizable as the thing they're meant to be.

The "sketches to shoes" example, on the other hand, reveals some of the
limitations. Most of the sketches use poor perspective, so they wouldn't match
up well with edges detected from an actual image of a shoe. Our brains can
"get the gist" of the sketches and perform some perspective translation, but
the algorithm doesn't appear to perform any translation of the input (e.g.
"here's a sketch that appears to represent a shoe, here's what a shoe is
actually shaped like, let's fit to that shape before going any further"), so
you end up with images where a shoe-like texture is applied to something that
doesn't look convincingly like a real shoe.

~~~
ape4
This could be a popular shopping website. Sketch your perfect handbag. See an
image of the product. Click to buy.

~~~
daveguy
"Sketch your perfect handbag" may be a bit much to ask of most people.

~~~
sharemywin
Draw your perfect handbag to share with friends. You only need 10 buyers to
have it created.

~~~
fudged71
I doubt friends want the same handbag though! It's an anti-viral feature:
personalization.

------
aexaey
Truly impressive overall. Unfortunately, it looks like the training set was way
too small. Look, for example, at the reconstruction of #13 here:

[https://phillipi.github.io/pix2pix/images/index_facades2_los...](https://phillipi.github.io/pix2pix/images/index_facades2_loss_variations.html)

Notice the white triangles (image crop artifacts) present in the original image,
yet completely absent from the net's input image. They reappear in the output
of 3 (maybe even 4) of the 5 nets despite the lack of any corresponding cue in
the input image. It looks like the network cheated a bit here, i.e. it took
advantage of the small set size and memorized the input image as a whole, then
recognized and recalled this very image (already seen during training) rather
than actually reconstructing it purely from the input.

The same happens (less prominently) for other images where the "ground truth"
image was cropped.

------
mshenfield
Just want to throw out that none of these applications are new. What is novel
about their approach is that, instead of learning the mapping with a
hand-picked loss function that quantifies accuracy for each problem, they also
have a mechanism for learning the function that quantifies accuracy. I haven't
fully grokked the paper, but that is pretty neat IMO.
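
My rough reading (sketched below in PyTorch-style Python rather than the
authors' actual Torch code, with every name made up for illustration) is that
the discriminator itself plays the role of the learned loss: it scores
(input, output) pairs, the generator is trained to fool it, and a plain L1
term keeps outputs close to the ground truth:

    # Toy conditional-GAN training step, roughly in the spirit of pix2pix.
    # G and D are trivial stand-ins, not the paper's U-Net / PatchGAN nets.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    G = nn.Conv2d(1, 3, 3, padding=1)   # stand-in generator: edge map -> RGB image
    D = nn.Conv2d(4, 1, 3, padding=1)   # stand-in discriminator on (edges, image) pairs

    opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)

    edges = torch.rand(8, 1, 64, 64)    # dummy batch of edge maps
    photos = torch.rand(8, 3, 64, 64)   # dummy batch of target photos
    lam = 100.0                         # weight on the L1 term

    # Train D: tell real (edges, photo) pairs from generated ones.
    fake = G(edges).detach()
    real_logits = D(torch.cat([edges, photos], dim=1))
    fake_logits = D(torch.cat([edges, fake], dim=1))
    loss_D = F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) \
           + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Train G: fool D (the "learned" part of the loss) + L1 to the ground truth.
    fake = G(edges)
    fake_logits = D(torch.cat([edges, fake], dim=1))
    loss_G = F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits)) \
           + lam * F.l1_loss(fake, photos)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()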

------
ragebol
Interesting.

What I like about the "Day to Night" example is that it clearly demonstrates
that these sorts of networks lack common sense. It puts lights where there are
clearly (to humans with common sense, at least) no things that can produce
light, e.g. in the middle of a roof or in a tree. Of course, there can be, but
it's fairly uncommon.

And the opposite as well: no lights where a human would totally expect a
light, e.g. on the front of buildings or on top of, well, light poles.

~~~
drcode
I'd guess the problem is that the daytime pictures allow for easy feature
detection (tree, building, etc.) but the nighttime pictures are washed out. We
humans look at the daytime picture first, then say "that nighttime picture
must have a tree there," which involves feature detection across both pictures
(in the training phase).

I suspect a neural network better specialized for this task (i.e. one that has
the data interlaced for both day and nighttime during training) would have no
problem detecting tree features and leaving them unlit.

------
sebleon
This is awesome!

Makes me wonder how this could apply to image and video compression. You could
send over the semantic segmentation version of an image or video, and the
system on the other end would use this technique to reconstruct the original.
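
Back-of-the-envelope (purely illustrative numbers: a 1080p frame and, say, 20
segmentation classes), the label map alone is already several times smaller
than the raw pixels, before any entropy coding:

    # rough arithmetic, assuming a 1920x1080 frame and ~20 segmentation classes
    import math

    pixels = 1920 * 1080
    raw_bits = pixels * 24                          # 8 bits per RGB channel
    label_bits = pixels * math.ceil(math.log2(20))  # ~5 bits per pixel for a class id
    print(raw_bits / label_bits)                    # ~4.8x smaller, and label maps also
                                                    # compress far better than photo texture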

~~~
espadrine
You can perform extremely good compression this way, but the computational and
energy cost would be prohibitive.

There are even more traditional tricks that don't make it into things like
H.265 because they are too costly.

------
verytrivial
Does anyone else have the feeling that, on the current trajectory, with
something exactly like this but perhaps a million times the amount of feedback
and data, _thought_ will just _emerge_? Yes, this is all 2D with
abstract/selective training sets, etc., but what if AI is the ultimate
fake-it-until-you-make-it?

~~~
gallerdude
I don't see this happening. What I do see happening is it figuring us out.
Somewhere out there, there's a function which explains exactly how our society
is organized, in every way.

From that, the AI could generate books, movies, and do a lot of things.

~~~
rm_-rf_slash
Reminds me of the novel-rewriting-apparatus from 1984, except with more
friggin' superheroes and remakes.

------
willcodeforfoo
The Aerial-to-Map example suggests this may be useful for automatic
map/satellite rectification/georeferencing, but I'm not sure how efficient
it'd be if it has to compare against a large area.

Does anyone have any experience in this area?

------
bflesch
I feel this can potentially revolutionize creative processes, for example in
the clothing industry. You just draw up a purse or a shoe, let the machines
generate dozens of variants (with pictures), and then you only have to filter
and rank them.

You can pipe these product sketches directly into focus groups who tell you
which product is most likely to sell. You don't need massive staff to come up
with product variants any more.

~~~
nathancahill
I feel like we would end up here:
[http://www.gianlucagimini.it/prototypes/velocipedia.html](http://www.gianlucagimini.it/prototypes/velocipedia.html)

~~~
zelpa
I wonder, if you were to average the designs of the bicycles, whether it would
actually produce something that works.

~~~
dTal
I would have thought that, if you are smart enough to find a "bicycle vector
space" in which averaging sketches of a bicycle produces another valid sketch
of a bicycle, then you probably already know enough about bicycles to design
one without the input of imperfect sketches.
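
As a throwaway illustration of why the latent space is the hard part
(untrained stand-in encoder/decoder modules below, nothing from the paper):
averaging in pixel space just gives you a smudge, and averaging in a learned
latent space only gives you a plausible bicycle if the encoder/decoder already
know what a bicycle is.

    # toy sketch: pixel-space average vs. latent-space average of crude sketches
    import torch
    import torch.nn as nn

    enc = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 32))                  # stand-in encoder
    dec = nn.Sequential(nn.Linear(32, 64 * 64), nn.Unflatten(1, (1, 64, 64)))  # stand-in decoder

    sketches = torch.rand(10, 1, 64, 64)   # 10 crude "bicycle" sketches

    pixel_avg = sketches.mean(dim=0)       # typically just a grey smudge
    latent_avg = dec(enc(sketches).mean(dim=0, keepdim=True))
    # latent_avg is only a valid sketch if enc/dec were trained well enough to
    # define a "bicycle vector space" in the first place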

------
iraphael
Besides being a cool new application of GANs, I don't see how this architecture
is much different from normal GANs. Anyone else have thoughts?

------
amelius
I wonder how well this scales to a larger domain of interest. So, e.g., if the
neural net needs to know not only about cars and nature, but about more topics
such as people, faces, computers, gastronomy, santa claus, halloween,
etcetera, how does the neural net scale? And how should its topology be
extended under such scaling?

~~~
visarga
It's being researched with great interest. Building models from text and
images, describing internal structure and relations between objects, building
rich prior knowledge about the world in order to do inference and guide
behavior.

I see lots of papers that go in this direction, of creating a rich, semantic,
predictive representation of images, video and text and then using it as the
basis for reinforcement learning. Learning to understand the world and to act
based on that understanding.

------
romaniv
Kudos for providing proper examples of the network doing its thing, both good
and bad. This is what all researchers ought to do. Too many papers these days
handpick a couple of the coolest-looking results and stop there.

...

I get a feeling this could be used in game design to do some really cool stuff
with map and texture generation.

~~~
31reasons
It could cut the game size way down if it can generate textures on the fly.

------
rosstex
I'm enrolled in Efros' computational photography course this semester, and
Tinghui and Jun-Yan are the GSIs. It's fantastic to experience the bridge
between teaching and cutting-edge research!

------
mmastrac
This is an absolutely incredible result. All of this stuff would have been
considered insanely advanced AI ten years ago, but now we look at it and say
"this is just stuff computers can do".

We've got the pieces of visual processing and imagination here and the pieces
of language input/output as part of Google's work. It feels like we just need
to make some progress on an "AI executive" before we can get a real,
interactive, human-like machine.

------
hanoz
I'm interested in having a play. As an out-and-out ML newbie, is there such a
thing as an AWS image I could run on a GPU instance and then just git clone
and go?

~~~
gregn610
Try one of the bitfusion AMIs on a g2.2xlarge instance.

~~~
hanoz
Thanks very much. If anyone else is interested, I can confirm that the
Bitfusion Boost Ubuntu 14 Torch 7 AMI on a g2.2xlarge instance does offer a
relatively painless way to get going with this, although I couldn't get the
Python image combiner to work so I had to prepare those separately. Have just
trained my first neural net, most exciting!

------
oluckyman
Neural nets! Is there anything they can't do?

