
Learning to Predict Depth on the Pixel 3 Phones - SirVeza
https://ai.googleblog.com/2018/11/learning-to-predict-depth-on-pixel-3.html
======
fanzhang
Non-linear combination of PDAF information, along with the contents of the
images themselves, is an interesting demonstration of the power of neural
networks. The general concept of non-linear combination is likely how our
brains work too.
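
A minimal sketch of what that kind of fusion could look like, purely as an assumption (the post doesn't publish the architecture): feed a coarse PDAF-derived disparity map into a small convolutional network alongside the RGB image and let the convolutions combine the two cues non-linearly.

```python
import torch
import torch.nn as nn

class DepthFromPdafAndImage(nn.Module):
    """Toy network fusing a coarse PDAF disparity map with the RGB image.

    Hypothetical sketch only; not the architecture used in the blog post.
    Input: 4 channels (R, G, B, coarse disparity), output: 1-channel depth.
    """
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),
        )

    def forward(self, rgb, pdaf_disparity):
        # Non-linear fusion: the network can weigh the PDAF parallax cue
        # against monocular cues (texture, familiar object sizes) it learns
        # from the image content itself.
        x = torch.cat([rgb, pdaf_disparity], dim=1)
        return self.net(x)

model = DepthFromPdafAndImage()
rgb = torch.rand(1, 3, 128, 96)    # normalized image batch
disp = torch.rand(1, 1, 128, 96)   # coarse disparity derived from PDAF
depth = model(rgb, disp)           # (1, 1, 128, 96) predicted depth map
```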

Given that they built just one testing rig, and they use information like
"commonly known object sizes" for depth, I wonder if they are using any
transfer learning at all? I imagine whoever is bringing that rig around can't
look at all objects, and transferring the knowledge of "common object sizes"
and even what an object is can be hugely valuable for such a project.

I bet this is also a sneak peek at how powerful self-driving cars can be at
Waymo. Not only do such cars have access to human-style data (sights, sounds),
they can have far more sensors, which a neural network can combine to predict
things far better than humans can. For example, a NN could infer the chance
that a pedestrian might veer from the sidewalk into the street by analyzing:

- The psychological state of a person, using a high-res camera on the face.

- Chromographs to see if the pedestrian is drunk.

- Infrared cameras to see the person's temperature / if she's excited, etc.

- Limb modeling and likely walking paths.

------
mherdeg
I didn't realize that the word "bokeh" is fairly new to English-language
discussion of photography -- only since about 1998 (
[https://en.wikipedia.org/wiki/Bokeh](https://en.wikipedia.org/wiki/Bokeh) ).

I had this series of thoughts about smartphone camera "portrait mode" which
were like this:

(1) Ha, pretty funny that we're spending all this human effort and R&D time to
try to realistically simulate what is essentially a defect in camera lenses
that makes them unable to have a wide enough focal plane to accurately
represent what is in front of them.

(2) But wait, isn't that snobbish and a little shortsighted? (ha ha). What is
a camera for, anyway: is it for making art, or faithfully recording what it
sees, or something else? There are whole academic disciplines about this, hmm, okay.

(3) Well, no, I mean, a photo which has been selectively blurred like this has
_less information_ than we started with, so it's objectively telling us less,
and that's why it's funny, that we are spending so much time removing
information.

(4) But maybe that's just the same as editing. Surely a good editor of texts
takes a lot of information and removes information to make a better story; in
the same way surely a "portrait mode" is really about figuring out what the
most important parts are of a picture and selectively obscuring the rest, so
that the 'good stuff' can shine through. That's important and hard; with an
85mm lens at f1.8 you'd do it by controlling focus carefully, but doing it
after the fact is definitely interesting.

So okay, the reason this feature is interesting (and hard) is that software is
trying to guess what parts of a picture are noise that should be gently
deemphasized.

Now, it seems like the best way that software companies can think of to
deemphasize unimportant parts of a picture is to mimic what a physical camera
does with a portrait lens -- choose parts not on the same focal plane, pretend
those parts were out of focus, then soften the image in the areas that "should
have been" out of focus and blur them.
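
As a rough illustration of that "pretend it's out of focus" step: here's a sketch that blurs each pixel in proportion to how far its depth is from a chosen focal depth, assuming a per-pixel depth map is already available. The Gaussian kernels and band count are made-up simplifications; a real pipeline would use disc-shaped kernels and treat occlusion edges around the subject much more carefully.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fake_bokeh(image, depth, focal_depth, max_sigma=8.0, n_bands=6):
    """Blur each pixel in proportion to its distance from the focal plane.

    image: HxWx3 float RGB in [0, 1]; depth: HxW floats in the same units
    as focal_depth. Illustrative only.
    """
    # Normalized distance from the focal plane, in [0, 1].
    defocus = np.abs(depth - focal_depth)
    defocus = defocus / (defocus.max() + 1e-8)

    # Pre-blur the image at a few discrete strengths, then pick per pixel.
    sigmas = np.linspace(0.0, max_sigma, n_bands)
    blurred = [image.copy() if s == 0 else gaussian_filter(image, sigma=(s, s, 0))
               for s in sigmas]

    band = np.clip((defocus * (n_bands - 1)).round().astype(int), 0, n_bands - 1)
    out = np.empty_like(image)
    for b in range(n_bands):
        mask = band == b
        out[mask] = blurred[b][mask]
    return out
```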

Now that they have shipped products that do that I'd be interested to see what
the folks at Apple & Android do next. Do their teams still exist? Have they
thought about how they would emphasize the important parts of a portrait if
they _weren't_ required to simulate the mechanical effect of a camera lens?
What _else_ could they do, maybe a little more creatively, to make portraits
shine?

(And have they reinvented anything that Photoshop plugin makers have been
doing for a decade?)

~~~
devadvance
It's important to keep in mind that while it seems to be approached as
"simulating" effects traditionally achieved with the mechanics of a camera,
it has a basis in fundamental optics. Creating a sense of focus in a portrait
with a camera lens is analogous to how the optics in our eyes work. Thus, this
particular effect is more natural than some of the more Photoshop-esque
techniques.

A good example of "what else" might be the color pop feature in the Google
Photos app. It uses the depth information to selectively decolor the "out of
focus" portions of the photo:
https://www.androidpolice.com/2018/05/11/google-photos-color-pop-images-rolling/
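
The core of a color-pop effect is simple once a subject mask is available from the depth/segmentation step. A sketch of the general idea (the mask source and the luma weights here are assumptions, not Google Photos' actual pipeline):

```python
import numpy as np

def color_pop(image, subject_mask):
    """Keep the subject in color and desaturate everything else.

    image: HxWx3 float RGB in [0, 1]; subject_mask: HxW bool (True = subject).
    """
    # Rec. 601 luma as the grayscale value for background pixels.
    gray = image @ np.array([0.299, 0.587, 0.114])
    return np.where(subject_mask[..., None], image, gray[..., None])
```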

~~~
mherdeg
Yeah that's a good point. Until a month ago when I got an eye exam and
glasses, I didn't really understand that the way a telephoto lens worked at
wide apertures was very similar to the way the eye was supposed to work at
_most_ distances.

(I was familiar with focusing on one thing and other things being out of
focus, but with lens corrections this now works much better at 10+ meter
distances, which is something I really didn't appreciate until I tried it.)

------
a-dub
Ahhh, cool. I didn't know they had this PDAF thing to get stereo images. I
remember when they released their first round of this for the old cameras,
where you had to smoothly sweep the camera over to the side and (I'm assuming)
they used optical flow + accelerometer data to get two source viewpoints for
the stereo computation.
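
For anyone curious what that PDAF signal looks like as data: the sensor gives two slightly shifted views of the scene, so the most naive way to turn them into a depth cue is one-dimensional block matching along the baseline. The window size and search range below are made-up numbers for illustration; the real views are so close together that the resulting disparity is tiny and noisy, which is exactly why combining it with other cues helps.

```python
import numpy as np

def pdaf_disparity(left, right, max_shift=3, window=8):
    """Very naive block matching between the two PDAF half-images.

    left, right: HxW grayscale float arrays. Returns an HxW map of the
    best horizontal shift per block, a crude stand-in for the weak
    parallax signal described in the post.
    """
    h, w = left.shape
    disparity = np.zeros((h, w), dtype=np.int32)
    for y in range(0, h - window, window):
        for x in range(max_shift, w - window - max_shift, window):
            patch = left[y:y + window, x:x + window]
            errors = [np.sum((patch - right[y:y + window, x + d:x + d + window]) ** 2)
                      for d in range(-max_shift, max_shift + 1)]
            disparity[y:y + window, x:x + window] = int(np.argmin(errors)) - max_shift
    return disparity
```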

Interesting application of object detection/segmentation to make up for a weak
depth map from the PDAF, but if you're constraining yourself to portraits and
are doing a pretty good job of detecting the person, I have to wonder how well
you could do without the stereo depth data at all...

~~~
a-dub
A fun experiment would be to try to build something that predicts complete
depth maps from single images alone (i.e. train on images paired with stereo
depth maps, then try to predict those maps from just one side).
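
A minimal sketch of that experiment, assuming stereo-derived depth maps are available as training targets; the tiny encoder-decoder and plain L1 loss are placeholders, not what any particular published monocular-depth model uses.

```python
import torch
import torch.nn as nn

class MonoDepthNet(nn.Module):
    """Tiny encoder-decoder regressing a depth map from a single RGB image."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = MonoDepthNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()

def train_step(image, stereo_depth):
    # image: (B, 3, H, W) from one side of the stereo pair;
    # stereo_depth: (B, 1, H, W) depth computed from both sides.
    optimizer.zero_grad()
    pred = model(image)
    loss = loss_fn(pred, stereo_depth)
    loss.backward()
    optimizer.step()
    return loss.item()
```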

~~~
sorenjan
That's called depth estimation from a single image. There's ongoing research;
here's one example: [0]

Using only the segmented 2D image is what the selfie cam does, by the way, at
least on the Pixel 2 [1].

[0] https://papers.nips.cc/paper/5539-depth-map-prediction-from-a-single-image-using-a-multi-scale-deep-network.pdf

[1] https://ai.googleblog.com/2017/10/portrait-mode-on-pixel-2-and-pixel-2-xl.html

~~~
a-dub
Cool!

------
yomritoyj
I'm waiting for the light-field cameras: https://en.wikipedia.org/wiki/Light-field_camera

~~~
erikpukinskis
Me too. Apparently the Lytro folks were absorbed by Google.

------
jcims
Can anyone explain to me why I'm not allowed to zoom in on this web page in
Chrome on iOS? Edit: works in Safari.

