
Using Deep Learning to Create Professional-Level Photographs - wsxiaoys
https://research.googleblog.com/2017/07/using-deep-learning-to-create.html
======
wsxiaoys
For those who think it's just another lame DL based instagram filter...

The method proposed in the
paper ([https://arxiv.org/abs/1707.03491](https://arxiv.org/abs/1707.03491))
mimics a photographer's work: from taking the picture (image composition) to
post-processing (traditional filters like HDR and saturation, but also GAN-powered
local brightness editing). In the end it also picks the best photos (aesthetic
ranking).

Selected comments from professional photographers at the end of the paper are
very informative. There's also a showcase of model-created photos at
[http://google.github.io/creatism](http://google.github.io/creatism)

[Disclaimer: I'm the second author of the paper]

~~~
bluetwo
Did you have anyone say "Professional photographers don't apply HDR filters to
single photos"?

~~~
hoorayimhelping
Why would they say that? You can't have HDR with a single exposure.

I think you're trying to say professional photographers don't apply tone
mapping to single exposures, but that is false. There are times when it's not
'technically proper' to apply tone mapping to a single exposure, but it makes
for a better picture.

Really it sounds like you're trying to make a snarky comment referencing how
some professional photographers complain about novices overusing HDR on
flickr. But who cares. You can have a professional quality photograph that
uses what some pros think is incorrect tone mapping. Just like how you can
have a novice hack together a working professional-level product that many
professional engineers think is trash. If it works and meets the criteria, who
cares.

~~~
wl
> You can't have HDR with a single exposure

14 stops of dynamic range in a single shot (RAW on a modern DSLR) counts in my
book.

~~~
barrkel
HDR worthy of the name seems to require multiple levels of light gathering, to
me. Either exposure time or aperture needs to vary. If you don't, you either
blow out on bright light, or you have too much noise in dim areas. You might
not notice if the scene is well and reasonably evenly lit though.

~~~
wl
You never vary aperture when taking bracketed exposures to merge to HDR. The
depth of field would vary from exposure to exposure, making things
inconsistent when you merge them. It's shutter speed you vary.
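
To make the bracketing arithmetic concrete, here is a minimal sketch of how the shutter times work out when the aperture stays fixed and only exposure time changes. The base exposure of 1/60 s and the -2/0/+2 EV spacing are assumed values for illustration, and `bracket_shutter_speeds` is a hypothetical helper, not any camera's API.

```python
from fractions import Fraction

def bracket_shutter_speeds(base, stops):
    """Return shutter times in seconds for each EV offset, aperture held fixed.

    Each +1 EV doubles the exposure time; each -1 EV halves it.
    """
    return [base * Fraction(2) ** ev for ev in stops]

# Assumed base exposure of 1/60 s, bracketed at -2, 0 and +2 EV.
speeds = bracket_shutter_speeds(Fraction(1, 60), [-2, 0, 2])
print([str(s) for s in speeds])  # ['1/240', '1/60', '1/15']
```

Because only the time changes, depth of field is identical across the three frames, which is what makes them mergeable.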

My point is that with 2005-era DSLRs, I had to use HDR techniques (or strobes)
to, say, correctly expose a room and the environment out a window. Shooting
into the sun, I'd have to use HDR, polarizers, and/or ND grads to expose both
the sky and the foreground. With my D750? I've got 4 stops more dynamic range
and the situations where I need HDR have almost all gone away. I can get the
same thing from a single RAW exposure.

These days, HDR only seems necessary if you want to have ridiculous tone
mapping where you pull in your global highlights to be within a stop or two of
your global shadows and still keep the noise levels down.

~~~
CWuestefeld
I still find a need to bracket occasionally.

But to your point... it seems that "HDR" means two different things these
days.

1\. A tool to expand the dynamic range.

2\. A style of tone mapping.

These two things frequently coincide, but there's no reason for #1 to have
crazy surreal tones, and #2 frequently comes from a single exposure.
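
The split between the two meanings can be sketched in a few lines: tone mapping (#2) is just a curve applied to luminance, and nothing about it requires merged exposures (#1). The example below uses Reinhard's simple global operator, L / (1 + L), as a stand-in; it is not the method from the paper, and the luminance values are made-up numbers spanning 14 stops in a single exposure.

```python
def reinhard_tonemap(luminances):
    """Compress linear luminance values into [0, 1) using L / (1 + L)."""
    return [lum / (1.0 + lum) for lum in luminances]

# Assumed linear luminances from ONE exposure: 2**-7, 1 and 2**7,
# i.e. a 2**14 : 1 contrast ratio (14 stops).
scene = [2.0 ** ev for ev in (-7, 0, 7)]
mapped = reinhard_tonemap(scene)

for lum, out in zip(scene, mapped):
    print(f"{lum:10.4f} -> {out:.4f}")
```

The curve squeezes highlights far more than shadows, which is the "look" people call HDR, regardless of how the input dynamic range was captured.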

------
Lagged2Death
When a topic like self-driving vehicles comes up, the Hacker News crowd is
mainly in favor: “Creative destruction! Disruption! Go go gadget robots!” Not
surprising. How many Hacker News readers drive trucks or taxis for a living?
How many regard commuting as an enjoyable hobby?

Photography, on the other hand, is a very common hobby in the tech community.
And the comments here seem to reflect that this effort strikes a little close
to home: “Those pictures are lousy, if you find them appealing you have no
taste! Just because they're 'professional' doesn't mean they're good! Machines
can’t replace human judgment, they have no soul! I bet that machine had a lot
of human help!”

Tech people may tell you great stories about meritocracy and reason, but in
the end we are just emotional monkeys. Like the rest of humanity.

Those of us who can accept this may at least aspire to be wise monkeys.

~~~
sho
Ha, I'm one of those techies you're talking about. Spent an embarrassing
amount on camera gear, and I'm always "that guy" with the ridiculous rig at
someone's casual party.

But you know what? I love this shit. I like photography for the good photos,
not because I'm building my self esteem on top of it. I want to capture the
scene or the moment in a way that, IMO, does it justice. If this technology
makes that easier, and gives it to more people - great. Progress.

And what I really love is how it subtracts a huge amount from the price of
entry. It's like when software synths came out and made it possible to make
music without $10k+ worth of MIDI hardware. What followed? An explosion of
creativity which I've been luxuriating in ever since. Get the ability to
create beautiful images in as many hands as possible, says I. For me, at
least, "more beautiful images in the world" was the whole idea.

~~~
soundwave106
Honestly, this "creative destruction" has already to some extent entered the
photography field. You can take very capable, often professional looking
photographs for the vast majority of circumstances with just your phone.

There are still many situations where more professional expertise is required
(portrait photography is probably one of the best examples, knowledge of
proper lighting is still required for best results). But I've gotten the
impression that overall demand for photographers -- and pay -- is slower these
days, largely due to technological advances. (See articles like this:
[http://www.nytimes.com/2010/03/30/business/media/30photogs.h...](http://www.nytimes.com/2010/03/30/business/media/30photogs.html))

In a way, this is fine though -- as you say, it gives power to more people,
which is great.

Per the parent post's point, to be honest, I perceived much of the criticism
to be more that they really couldn't think of a good way to use it in their
own hobby level work. I personally can't think of a good use case. On the
other hand, the most useful application I can think of for this is more in
reverse: social media companies with huge image libraries their users have
graciously "donated" to them, could use this AI algorithm or something similar
to pick, or construct, the most "professional" looking image of anything they
can identify from their vast library, and sell it or license it to media /
content publishers for a relatively low cost. Such could be but one more
competitor to any professional landscape / scenery photographers out there.

~~~
CWuestefeld
It's true that a phone today can produce a photo that's technically pretty
good. I've got a friend who's had shows of iPhone photos, and won numerous
awards with them.

This doesn't mean that someone armed with their phone will take good photos.
There's still the matter of composition, as well as capturing "the
decisive moment"[1].

The tool described in the OP provides an automated means of finding some
worthwhile composition within a library of images, thus providing an aid to
someone less skilled at composition - although it's not going to produce
something from nothing. If the photographer failed to point the camera in the
direction of the best photo, then they've failed.

But I can't see any way that automated tools can, after the fact, help us to
capture that decisive moment.

[1]
[http://truecenterpublishing.com/photopsy/decisive_moment.htm](http://truecenterpublishing.com/photopsy/decisive_moment.htm)

~~~
ebalit
Frames extracted from a 120 fps video might help with this. If it's
possible to learn composition, I think that selection of the decisive moment
is reachable.
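
As a rough sketch of the idea, frame selection reduces to scoring every frame and keeping the best one. Everything here is an assumption for illustration: `pick_best_frame` and `contrast_score` are hypothetical names, the frames are tiny made-up grayscale grids, and pixel variance is only a placeholder for whatever learned aesthetic model would actually judge the moment.

```python
def pick_best_frame(frames, score):
    """Return (index, frame) for the frame with the highest score."""
    best_index = max(range(len(frames)), key=lambda i: score(frames[i]))
    return best_index, frames[best_index]

def contrast_score(frame):
    """Placeholder scorer: variance of pixel intensities (NOT an aesthetic model)."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)

# Three made-up 2x2 grayscale frames standing in for video frames.
frames = [
    [[10, 10], [10, 10]],
    [[0, 255], [255, 0]],
    [[100, 120], [110, 130]],
]
index, frame = pick_best_frame(frames, contrast_score)
print(index)  # 1
```

Swapping `contrast_score` for a trained ranking model is the hard part; the selection loop itself stays this simple.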

------
andreyk
Talking as a semi-pro (I've put in some money into cameras and lenses and
spent a good bit of time on photo editing), this is a bit underwhelming. For
landscapes (which this seemed to focus on), I've found that opening up the
Windows photo editing programs and clicking 'enhance' or GIMP and clicking
some equivalent already gets you most of the way there in terms of editing for
aesthetic effect. The most tricky bit is deciding on the artistic merit of a
particular crop or shot, and as indicated by the difference between the
model's and photographer's opinion at the end of the paper, the model is not
that great at it. Still, pretty cool that they did that analysis.

~~~
dheera
The other huge thing that is lacking is content. Image processing is only 10%
of good photography. The rest is about conveying an idea to your audience. It
can be humor, documenting an event, making people aware about societal
problems, taking people to somewhere they would normally not be able to
access, or seeing the world in a way that most people don't see.

Truly good photographers don't just produce beautiful photos; they produce
_meaningful_ photos.

The tech is great but I'm not a fan of the title of the article ;)

~~~
kinkrtyavimoodh
That depends on what kind of photography it is.

Photojournalism is almost always about the story and what the picture evokes.
It could be a simple pic of a boy covered in dust, but made evocative because
the dust is from a bomb explosion that killed his father and not from playing
in the park.

But in case of a lot of landscape photography, the content is intrinsic to the
picture itself—the photo does not get its value from anything external to it.
Likewise for portraits.

~~~
dheera
Not sure I agree with that. There's good landscape/portrait photography, and
there's really great landscape/portrait photography.

Great landscape photography takes you to an interesting location or
interesting angle. A post-processing algorithm cannot _take you_ somewhere
interesting.

Good landscape photography makes you say, "Wow, that's beautiful!"

Great landscape photography makes you say, "Holy crap, what is that? Where is
that?"

[https://500px.com/photo/4505244/rainbow-glass-by-junya-haseg...](https://500px.com/photo/4505244/rainbow-glass-by-junya-hasegawa)

[https://1x.com/photo/41156/portfolio/60972](https://1x.com/photo/41156/portfolio/60972)

[https://500px.com/photo/103746295/the-white-silence-by-danie...](https://500px.com/photo/103746295/the-white-silence-by-daniel-kordan)

[http://ngm.nationalgeographic.com/2008/08/photo-contest/stei...](http://ngm.nationalgeographic.com/2008/08/photo-contest/steinmetz-text)

[http://hongwrong.com/verical-horizon-hk/](http://hongwrong.com/verical-horizon-hk/)

[https://500px.com/photo/103960697/one-thousand-and-one-night...](https://500px.com/photo/103960697/one-thousand-and-one-nights-by-%C4%B0lhan-eroglu)

[https://500px.com/photo/5211062/harvest-by-rob-kruyt](https://500px.com/photo/5211062/harvest-by-rob-kruyt)

~~~
Retric
I significantly prefer the article's photos to those.

Buildings are boring, as is snow; a photo of a person or an animal or shadows
is not a landscape; and they're over-edited. Now, I suspect that, as someone
who's looking at a lot of photos, those pinged some novelty feeling in you, but
that does not mean they are objectively good.

By comparison the article's photos (and those linked:
[http://google.github.io/creatism](http://google.github.io/creatism))
generally evoked that "I wish I was there" feeling.

~~~
eru
I see what you mean, though I do like buildings. (But I always had a soft spot
for Hong Kong.)

I predict a similar divide will crop up in computer generated music: very soon
normal people will prefer their pleasant sounds, even while more discerning
(semi-) pros will deride their lack of artistic merit for a lot longer.

Computer generated art will shine as Gebrauchskunst first---eg get a
'professional level' soundtrack and video editing for your youtube video shot
on a phone.

------
jff
Automatically selecting what portion to crop is impressive, but just slamming
the saturation level to maximum and applying an HDR filter is the sign of
"professional" photography rather than good photography.

~~~
sp332
This is mentioned on page 6. They agree it's a limitation of the current tuning
method. [https://arxiv.org/abs/1707.03491](https://arxiv.org/abs/1707.03491)

------
d-sc
As someone who lives in a relatively rural area with similar geography to much
of the mountains and forests in these pictures I have noticed previously how
professional pictures of these areas have a similar feeling of over saturating
the emotion.

It's interesting to see algorithms catching up to being able to replicate
this. However when you mention these kind of abilities to photographers, they
get defensive, almost like you are threatening their identity by saying a
computer can do it.

~~~
atarian
> they get defensive, almost like you are threatening their identity by saying
> a computer can do it.

It's a fairly common reaction from most people. Hayao Miyazaki, director of
Spirited Away, got very upset after being shown a demo of AI doing animation:

[https://qz.com/859454/the-director-of-spirited-away-says-ani...](https://qz.com/859454/the-director-of-spirited-away-says-animation-made-by-artificial-intelligence-is-an-insult-to-life-itself/)

~~~
sp332
I feel like that misses a lot of the context of the video. It was animating
mutilated bodies in a prototype for a horror game. It's no wonder it was
upsetting.

~~~
averagewall
At the end of the video he seems to show the other side - "I feel like we are
nearing the end of times. We humans are losing faith in ourselves."

------
brudgers
It is an interesting project and shows significant accomplishment. I'm not
sold on the idea of "professional level" except in so far as people getting
paid to make images. I am not sold because the little details of the images
don't really hold up to close scrutiny (and I don't mean pixel peeping).

1\. The diagonal lines in the clouds and the bright tree trunk at the extreme
right of the first image are distractions that don't support the general
aesthetic.

2\. The bright linear object impinging on the right edge of the cow image and
the bright patch of the partial face of the mountain on the extreme left.
Probably the gravel at the left too since it does not really support the
central theme.

3\. The big black lump that obscures the 'corner' where the midground mountain
meets the ground plane in the house image.

4\. The minimal snow on the peaks in the snow capped mountain image is more
documenting a crime scene than creating interest. I mean technically, yes
there is snow and the claim that there was snow would probably stand up in a
court of law, but it's not very interesting snow.

For me, it's the attention to detail that separates better than average
snapshots from professional art. Or to put it another way, these are not the
grade of images that a professional photographer would put in their portfolio.
Even if they would get lots of likes on Facebook.

Again, it's an interesting project and a significant accomplishment. I just
don't think the criteria by which images are being judged professional are
adequate.

~~~
arcticfox
But the emphasis was on the processing, not the content of the images
themselves, other than cropping. I don't think the intention at this point was
to get into adding / removing actual physical elements from the scene.

Edit: I guess it depends if there were other images in their corpus that
didn't have any flaws, but who's to know?

~~~
brudgers
With the exception of the weakly snow capped mountains (part of photography is
showing up at the right time) different crops would have eliminated the
conditions I described. But to unpack my critique a bit further, an
unwillingness to make compromises like the weakly snowcapped mountains and
paying the price of an extra week in the field waiting on foul weather is part
of how professional photographers get professional shots...and part of how they
maintain their professional reputation is not publishing shots that don't live
up to a high standard.

Or to put it another way, compare the images in the blog post to Carter Gowl's
[http://www.gowlphoto.com](http://www.gowlphoto.com). Gowl produces a couple
of images a year and those are good enough to charge a couple of hundred
dollars per print. Even if the images in the blog post were appropriately high
resolution, I don't see them commanding a similar price. YMMV.

~~~
ma2rten
The blog post itself actually confirms what you are saying. They say they
asked professional photographers to rate their creations. 40% of their
creations were rated at semi-pro or pro level. Looking at their plot only very
few of them got a solid 4.0 (professional rating). So, the headline is kind of
misleading.

~~~
brudgers
Out of curiosity, I looked at the linked photos of Jasper National Park (one
of the locations) on Google maps. They make an interesting point of
comparison,
[https://www.google.com/search?q=jasper+national+park](https://www.google.com/search?q=jasper+national+park)

------
matthewvincent
I don't know why but the "professional" label on this really irritates me. I'm
curious to know how the images that got graded on their "professional" scale
were selected for inclusion in the sample. Surely by a human who judged them
to be the best of many? I'd love to see the duds.

~~~
sp332
They use images with high aesthetic ratings from the AVA dataset (Aesthetic
Visual Analysis). Scoring is described in section 5 in the article.

------
fudged71
Very impressed by the results.

I hope that one day our driverless cars will alert us when there is a pretty
view (or a rainbow) so we take a moment to look up from our phones. Every
route can be a scenic route if you have an artistic eye.

~~~
damontal
our driverless cars will just take a photo of the pretty view and notify us
that it's ready to view on our phone.

~~~
motoboi
Probably just tell us how many likes the trip received on the internet.

~~~
d23
Other algorithms could detect whether a picture your friend posted on their
trip is the sort of thing you would like and automatically like it for you.

~~~
dsnuh
[http://supersadtruelovestory.com](http://supersadtruelovestory.com)

Interesting book that centers around some of these themes. Worth a read if you
haven't yet.

------
wonderous
Interesting how hi-res the photos of a small section of Google Street Car
photo can be compared to what users see online; here's an example from the
linked article:

[https://2.bp.blogspot.com/-6bVWUgA8NEI/WWe1uoW8ayI/AAAAAAAAB...](https://2.bp.blogspot.com/-6bVWUgA8NEI/WWe1uoW8ayI/AAAAAAAAB4Q/PeoM8jc_xMwYvNYc5HJAnrJi0GrrjvKMQCEwYBhgL/s1600/image7.png)

~~~
jonas21
You can see the hi-res version in street view if you zoom in.

[http://i.imgur.com/4raWafu.jpg](http://i.imgur.com/4raWafu.jpg)

~~~
wonderous
Maybe I wasn't clear enough, my point was the hi-res photos used in the
research appear to be of higher resolution than what the public version of
Google Street Car zoom offers.

------
jtraffic
When a photographer takes or edits a picture, she doesn't need to predict or
simulate her own reaction. There is no model or training necessary, because
the real outcome is so easily accessible. However, she is only one person, and
perhaps can't proxy well for a larger group.

The model has the reverse situation, of course: it cannot perfectly guess the
emotional response for any one person, but it has access to a larger
assortment of data.

In addition, in different contexts it may be easier/cheaper to place a machine
vs. a human in a certain locale to get a picture.

If my theorizing makes any sense, it suggests that this technology would be
useful in contexts where: the locale is hard to reach and the topic is likely
to evoke a wide variety of emotional responses.

~~~
sp332
A photographer does have to predict their reaction. I'm not a pro but I take
dozens of photos before one I really like. I've heard pros take hundreds or
thousands for each one that they let other people see. And it makes sense to
automate cases like this where you have 40,000 photos that you want to edit.

------
bitL
Retouching is another field to play with - I am experimenting with CNN/GANs to
clone styles of retouchers I like. If you are a photographer, you know that
most studio photos look very bland and retouching is what makes them pop; for
that everyone has a different bag of tricks. If you use plugins like
Portraiture or do basic manual frequency separation followed by curves and
dodge/burn adjustments, you leave some imprint of your taste. This can be
cloned using CNN/GANs pretty well; the main issue is to prevent spills of
retouched area to areas you want to stay unaffected.

------
seasonalgrit
"Someday this technique might even help you to take better photos in the real
world."

So what? Maybe I missed it, but what are some potentially meaningful
applications of this technology? What motivated this to begin with? Or are
these questions that we even bother asking anymore?

I remember the first time someone showed me the Snapchat app -- it would make
them look like a cartoon dog, or all these other real-time overlays. I
thought, 'jesus, so glad we're all getting advanced computer science degrees
so we can work on utterly useless shit like this...'

~~~
dewitt
Well I don't really get Instagram or Snapchat either, _but_ they created
something that has made tens of millions of people happy, and turned a
thousand other people into millionaires (or billionaires!) basically
overnight. By that measure, what have you or I ever done...?

I think this research is spot on, and can't wait to have it on my phone. And I
love taking photos the old fashioned way, too.

------
mozzarella
this is amazing, but 'professional photographers' aren't really the best
arbiters of what a 'good' photograph is. Also, training on national parks
binds the results to a naturally bland subject, no pun intended. While an
amazing achievement, nothing shown here demonstrates ability beyond a
photographer's assistant/digital tech adjusting settings to a client's tastes
in Capture One Pro. Jon Rafman's 9 Eyes project comes to mind as something
that produced interesting photographs, as does the idea to find a more
rigorous panel of 'experts' (e.g. MoMA), or training the model on
streets/different locations than national parks.

------
agotterer
Related: Arsenal ([https://www.kickstarter.com/projects/2092430307/arsenal-the-...](https://www.kickstarter.com/projects/2092430307/arsenal-the-intelligent-camera-assistant-0/description)) is trying to build a hardware
camera attachment that uses ML to find the perfect levels for your photo in
realtime.

------
parshimers
This is cool but I really don't get why one could call this actually creating
"Professional-Level" photographs. It's more like a very good auto-retouch.
There's still the matter of someone actually being there, realizing it is a
beautiful place, and dragging a large camera with them and waiting for the
right light.

~~~
averagewall
Not quite. As the article says, the algorithm did the "realizing it is a
beautiful place" and Google Streetview did that "dragging a large camera".

------
zemotion
I think some of these results are really lovely, the one at Interlaken is a
perfect travel photo. Would be interesting to see more types of work this
could apply to.

Saw a few people talking about retouching and studio work - I do a lot of
studio shoots and retouching on my own, and would be happy to help or
participate in projects. Feel free to reach out.

------
campbelltown
The first thought after going through all these photos was: incredibly
stilted. It's amazingly impressive, but the human photographer will always be
able to capture the subtleties that AI will miss. But very cool nonetheless

------
descala
Instead of augmented reality I would call this "distorted reality". People
will prefer to visit places with Street View rather than being there. Real
reality is uglier.

------
tuvistavie
Up to what point can the output be controlled? Can complex conditions be
created? e.g. a lake with a mountain background during the evening

------
k__
Is deep learning comparable to perceptual exposure?

------
wingerlang
In the future maybe we can just hook up a drone to this and have it fly around
taking nice pictures.

------
BasDirks
I find the colors in the results images consistently worse than in the
original images.

------
known
ML = Wisdom of Crowds

------
seany
Would be interesting to see how well you could train this kind of thing off of
a large catalog of Lightroom edit data, to then mimic a specific editor's
style.

------
anigbrowl
_For example, whether a photograph is beautiful is measured by its aesthetic
value, which is a highly subjective concept._

Oh really.

------
olegkikin
[deleted]

~~~
evv
You're implying that the source photo was taken by a professional
photographer, but these are all clearly sourced from Google Maps. There was no
pre-existing framing because it is a 360 photo, and the exposure is probably
automatically set by the camera. I doubt the people hiking trails with a
camera backpack really spend much time configuring their gear- because it
would take too much time.

------
cooervo
wow automation isn't leaving any fields untouched

------
Kevorkian
Lately, there has been lots of talk of deep learning applied to create tools
which can generate requirements – designs – software code – create builds –
test builds as well as help with deploying builds to various environments. I'm
excited for the future developments possible with ML.

------
mozumder
If they're doing dodging/burning, then they could really use the processing on
raw files instead of jpegs. The dynamic range is obviously limited when
dodging/burning jpegs, as you can see from the flat clouds and blown
highlights on the cows.

------
mtgx
Great, now all we need is specialized machine learning inference accelerators
in our mobile phones. I wonder if Google has even considered making a mobile
TPU for its future Pixel phones.

~~~
jorgemf
Qualcomm recently added some improvements for running deep learning on
mobile, and Google is also working on mobile nets that use fewer operations. I
don't think you will see TPUs or something similar in mobile soon, but there
are other small improvements that are being done now.

Unless someone can put a huge battery in a small mobile, forget about running
big (and good) networks in mobile.

------
jonbarker
From the article the caption of the first picture was interesting: "A
professional(?) photograph of Jasper National Park, Canada." Is that the open
scene from The Shining? If so I wonder why the question mark, is Stanley
Kubrick not a professional photographer?

~~~
jonknee
Keep reading, it's from Street View!

