
TensorFlow Graphics: Computer Graphics Meets Deep Learning - lelf
https://medium.com/tensorflow/introducing-tensorflow-graphics-computer-graphics-meets-deep-learning-c8e3877b7668
======
wpietri
This is a really sharp approach, where synthetic scenes are rendered and used
to train a system so it can better understand real scenes. It reminds me of
the bit with Deep Thought in HHGTG, where "even before the data banks had been
connected up it had started from 'I think therefore I am' and got as far as
the existence of rice pudding and income tax before anyone managed to turn it
off." This isn't quite that, but it's a good step in that direction.

~~~
hn_throwaway_99
To me it sounded really similar to a Generative Adversarial Network (a GAN).
With a GAN you have one network whose job is to classify an image (is this
picture _really_ a person?) and another whose job is to essentially fool the
classifier (generate an image that looks like a person).

This case is a little bit of the reverse, in that it's focused on making the
computer vision component (the discriminator) try to match the visual content
that has already been generated.

Seems like these types of "adversarial" approaches will be used in lots of
different domains, as so far they've produced some pretty amazing results.

~~~
codetrotter
Speaking of doing things the other way around, could ML techniques be useful
for tuning shader parameters and light placements in order to make a 3d scene
modeled by a human look as close as possible to a reference photo?

(If so, it should probably be done with multiple reference photos from
different angles, to ensure the shaders and lights aren’t tuned badly so that
the scene only looks good from the single angle the computer was looking from
while it was tweaking.)

~~~
jeeceebees
Would be even nicer if it could be trained on unpaired datasets (à la CycleGAN
[https://arxiv.org/abs/1703.10593](https://arxiv.org/abs/1703.10593)).

------
andbberger
Pretty cool to see mainstream support arrive for differentiable graphics. It's
been incubating in research for years.

~~~
adw
This is exciting as the first "industrially supported" framework for this
stuff. There's research code out there, but it really _is_ research code – it
was written to publish, not written to build products around.

------
rsp1984
This looks very cool, but can someone with insight into the topic explain why
the graphics part needs to be differentiable?

Couldn't we just automatically generate lots of graphics renderings (that are
by definition already perfectly labelled) and use them to train a ML model?
That wouldn't require any differentiability, would it?

So how is this approach different?

~~~
PeterisP
Differentiable graphics rendering lets you transform the difference (what you
got vs. what you wanted) in the resulting image back into a difference in the
underlying scene data you rendered.

It allows you to get from "these 1000 output pixels were wrong" to "the truck
in the scene model actually was two inches to the left compared to what I
expected/predicted" or "the texture of that apple should be changed this way
to match reality".

You wouldn't use it to tweak labels for image classification tasks, but to
learn better underlying models of physical reality and behavior.
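As a toy illustration of that "pixels wrong → scene wrong" step (everything below is an invented 1-D "renderer", not the TensorFlow Graphics API): the renderer draws a Gaussian blob at scene parameter `x`, and because each pixel is a differentiable function of `x`, the pixelwise errors can be pushed back through the chain rule to recover where the object actually is.

```python
import math

WIDTH = 4.0  # blob width; the "scene" has a single parameter: position x

def render(x, n=32):
    # Toy differentiable "renderer": a 1-D Gaussian blob centered at x.
    return [math.exp(-((i - x) ** 2) / WIDTH) for i in range(n)]

def loss_and_grad(x, target):
    img = render(x)
    # Pixelwise squared error...
    loss = sum((p - t) ** 2 for p, t in zip(img, target))
    # ...and its analytic gradient w.r.t. x, via the chain rule through
    # the renderer: d(pixel_i)/dx = pixel_i * 2 * (i - x) / WIDTH.
    grad = sum(2 * (p - t) * p * 2 * (i - x) / WIDTH
               for i, (p, t) in enumerate(zip(img, target)))
    return loss, grad

target = render(20.0)          # "reality": the object sits at x = 20
x = 17.0                       # prediction: a few units too far left
for _ in range(200):
    _, grad = loss_and_grad(x, target)
    x -= 0.5 * grad            # gradient descent on the scene parameter
# x has moved from 17 to ~20: "1000 pixels were wrong" has become
# "the object was three units to the left of where I predicted"
```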

~~~
rsp1984
I think I need a higher level explanation.

Is the goal to tweak the rendering parameters until it matches a given input
image? If yes that would be inference, not learning. What am I missing?

~~~
dimatura
Yeah, you can do that as an inference step, like you described, and that has
its uses without any learning required. But you can also make the output of
the renderer depend (differentiably) on parameters beyond whatever you have as
input - and those parameters can be learned via gradient descent. For example,
you could have a conventional CNN take an image of, say, a cube as input and
predict its 3D pose w.r.t. the camera, feed that prediction to the renderer,
and have the renderer render the cube. Then, during training, a pixelwise
error can be computed and backpropagated to the CNN. At the end of the
process, ideally you would have a network capable of predicting the pose of a
cube (and rendering a close match) in one shot, without iterative parameter
tweaking.
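A runnable toy version of that pipeline, with the "CNN" collapsed to a single learned weight and the renderer replaced by a 1-D stand-in (all names here are hypothetical, chosen just to show the gradient path network → renderer → pixel loss):

```python
import math
import random

WIDTH = 4.0

def render(pose, n=32):
    # Stand-in differentiable renderer: a 1-D blob centered at `pose`.
    return [math.exp(-((i - pose) ** 2) / WIDTH) for i in range(n)]

def centroid(img):
    # The "network's" input feature, extracted from the observed image.
    return sum(i * p for i, p in enumerate(img)) / sum(img)

w = 0.9                  # one-weight "network": predicted pose = w * centroid
lr = 0.001
random.seed(0)
for step in range(2000):
    true_pose = random.uniform(8.0, 24.0)
    observed = render(true_pose)            # training image
    c = centroid(observed)
    pred_pose = w * c                       # network forward pass
    rendered = render(pred_pose)            # renderer forward pass
    # Backprop: pixelwise error -> d(loss)/d(pose), through the renderer...
    dL_dpose = sum(2 * (r - o) * r * 2 * (i - pred_pose) / WIDTH
                   for i, (r, o) in enumerate(zip(rendered, observed)))
    # ...then d(pose)/dw = c, into the network weight.
    w -= lr * dL_dpose * c
# w -> ~1.0: after training, the network predicts pose in one shot,
# with no per-image iterative tweaking
```

The key line is the `dL_dpose` sum: it only exists because the renderer is differentiable, which is exactly the point made downthread.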

~~~
mov
So, in that case, we could learn without needing a differentiable renderer or
graphics operators, right? Or maybe the overhead of communicating with an
external renderer is too high compared with the iterative parameter tweaking
that happens inside the CNN's loss function?

~~~
PeterisP
> we could learn without needing a differentiable renderer or graphics
> operators, right?

No. The step in the parent post where "a pixelwise error can be computed and
backpropagated to the CNN" is possible only if the renderer is differentiable.

~~~
mov
Got it. So suppose we have an external renderer: we could learn parameters
that tweak the rendered scene, get the rendered pixels, and calculate
pixelwise errors between them and some target image we're trying to optimize
for. In that case, do we still need a differentiable renderer, in your
opinion?

Update:

It would require more training cycles and would not be as "atomic" as
iterative tweaks, but it seems possible.

At the same time, I wonder whether making the loss function talk to an
external renderer would make it possible to mix both approaches.

~~~
PeterisP
_How_ would you learn parameters to tweak the rendered scene if the renderer
is not differentiable, and you can't backpropagate through the renderer to
calculate the appropriate parameter adjustments from the pixelwise errors?

I suppose you could theoretically do it with some trial-and-error method, grid
search, or the like, but it's going to be computationally infeasible in the
general case; the pixelwise errors only become practically useful if you have
an uninterrupted differentiable ('backpropagatable') path from your parameters
to the pixels.
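For intuition on why the black-box route scales so badly: the generic fallback for a non-differentiable renderer is finite differences, which needs one extra full render per scene parameter, versus roughly one backward pass total with backprop. A sketch (the quadratic `fake_render_loss` is just a stand-in for "render the scene, then sum the pixelwise errors"):

```python
def finite_difference_grad(loss_fn, params, eps=1e-4):
    # Black-box gradient estimate: perturb one parameter at a time.
    # Cost: len(params) + 1 full renders -- hopeless for a scene with
    # thousands or millions of parameters.
    base = loss_fn(params)
    grads = []
    for k in range(len(params)):
        bumped = list(params)
        bumped[k] += eps
        grads.append((loss_fn(bumped) - base) / eps)
    return grads

def fake_render_loss(params):
    # Stand-in for "render, then compare pixels against a target".
    return sum((p - t) ** 2 for p, t in zip(params, [1.0, 2.0, 3.0]))

g = finite_difference_grad(fake_render_loss, [0.0, 0.0, 0.0])
# the analytic gradient here is [2*(p - t)] = [-2.0, -4.0, -6.0];
# g approximates it, at the price of one render per parameter
```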

~~~
mov
Yes, the overhead would be larger and we would lose the backprop path, like
you said, but it seems practical in some cases and is actually guiding
approaches like
[https://nv-tlabs.github.io/meta-sim/](https://nv-tlabs.github.io/meta-sim/)

------
ehsankia
Why does Light and Materials use the Google Keep logo? :)

------
dimatura
Nice! I played with OpenDR
([http://files.is.tue.mpg.de/black/papers/OpenDR.pdf](http://files.is.tue.mpg.de/black/papers/OpenDR.pdf))
a few years ago, and got really excited about it. Unfortunately it uses a
custom autodiff implementation that made it hard to integrate with other deep
learning libraries. PyTorch still seems to be lagging in this area, but there
are some interesting repos on GitHub (e.g.
[https://github.com/daniilidis-group/neural_renderer](https://github.com/daniilidis-group/neural_renderer)).

------
state_less
Wonderful work. I’m curious about the representation. Does it take a scene
graph and infer a scene graph? Can you predict future scenes, given a sequence
of previous scenes?

Seems like a nice way to debug and create more complex models.

------
yogrish
Really cool. The “analysis by synthesis” approach can probably increase the
performance of networks. I am looking at autonomous driving scenarios, where
it could remove false object detections.

------
pizza
I wonder if capsule networks would be another good, maybe even better,
approach for scene reconstruction, as opposed to pooling convnets.

~~~
mov
Could you elaborate more?

