
RayTracer.jl: A Differentiable Renderer That Supports Parameter Optimization [pdf] - robomaster12
https://github.com/JuliaCon/proceedings-papers/blob/jcon.00037/jcon.00037/10.21105.jcon.00037.pdf
======
GistNoesis
There is at least one fundamental problem with these kinds of approaches:
working in pixel space makes the problem non-differentiable.

Imagine a simple scene with a single red point on a white background. You are
currently rendering a point on the left of your virtual screen and trying to
update your camera parameters so it matches the target point on the right of
the screen. Your gradients are zero. You are already screwed.
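
A minimal sketch of that failure mode (my own toy illustration, not code from
the paper): rasterize a one-pixel dot onto a 1-D "screen" and take a
pixel-space MSE loss. The loss is piecewise constant in the dot's position, so
the gradient is zero almost everywhere:

```python
import numpy as np

def render(x_pos, width=64):
    img = np.ones(width)             # white 1-D "screen"
    img[int(round(x_pos))] = 0.0     # dark dot at the (rounded) position
    return img

target = render(50.0)                # dot on the right of the screen

def loss(x_pos):
    return np.mean((render(x_pos) - target) ** 2)

# Finite-difference gradient at a dot far from the target: exactly zero,
# because nudging x_pos by eps doesn't change which pixel the dot lands in.
x, eps = 10.0, 1e-3
grad = (loss(x + eps) - loss(x - eps)) / (2 * eps)
print(grad)  # 0.0 -- no signal pointing toward x = 50
```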

With more complex scenes the situation gets a little better, because you can
get lucky: if you happen to have overlapping areas of similar colors (between
the current and target renders), then the gradient will tend to make them
overlap more and hopefully drag you towards the correct camera parameters.

Then you try to mitigate the initial fundamental problem with little tricks
that increase the likelihood of being in the lucky configuration. You increase
the blurriness to turn a single point into a little circle, so you have more
hope of having overlapping areas. Then you add spatial pyramids (i.e.
rendering at a range of resolutions) to allow large movements across the
screen instead of moving one pixel per iteration. Then, because it still often
doesn't work, you add random restarts.
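
The blur trick is easy to see in the same toy setup: replace the hard dot
with a Gaussian blob (the sigma below is just an assumed blur width), and the
loss becomes smooth in the dot's position:

```python
import numpy as np

def render_blurred(x_pos, width=64, sigma=12.0):
    # Gaussian "blob" instead of a hard one-pixel dot, centered at x_pos.
    xs = np.arange(width)
    return 1.0 - np.exp(-((xs - x_pos) ** 2) / (2 * sigma ** 2))

target = render_blurred(50.0)

def loss(x_pos):
    return np.mean((render_blurred(x_pos) - target) ** 2)

x, eps = 20.0, 1e-3
grad = (loss(x + eps) - loss(x - eps)) / (2 * eps)
print(grad)  # nonzero: the blobs' tails overlap, so there is a slope to follow
```

A spatial pyramid is the same idea applied at several blur widths /
resolutions at once.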

Among the other issues:

- Making the object model non-parametric makes the search harder.

- Making the rendering process too "good" makes the search harder (you are
considering advanced lighting effects, i.e. details, before the global
picture).

- Even using triangles is already a bad choice. Triangles were an optimization
of the rendering pipeline to make the forward process faster; here we are
tackling the inverse problem, so that kind of premature optimization now works
against us.

So what are better approaches? There are several.

The old-school way is keypoint registration. You render your scene, extract
keypoints (possibly densely) along with their local statistics, and match them
with keypoints of the target. This lets you jump almost directly towards the
solution, which you then finish aligning with Iterative Closest Point-style
algorithms.
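
The alignment step at the heart of ICP has a closed form. A sketch of the
Kabsch/Procrustes solve for matched 2-D keypoints (my own illustration, not
from the paper):

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t with dst ~ src @ R.T + t."""
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    d = np.sign(np.linalg.det(U @ Vt))      # guard against reflections
    R = (U @ np.diag([1, d]) @ Vt).T
    t = dst.mean(0) - src.mean(0) @ R.T
    return R, t

# Toy usage: recover a 30-degree rotation plus translation from matches.
theta = np.deg2rad(30)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
src = np.random.rand(10, 2)
dst = src @ R_true.T + np.array([1.0, 2.0])
R, t = best_rigid_transform(src, dst)
print(np.allclose(R, R_true), np.allclose(t, [1.0, 2.0]))  # True True
```

ICP alternates this closed-form solve with re-matching closest points, so a
good initial keypoint match makes it converge in a few iterations.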

The new-school way: you use an invertible GAN model to render an image. This
gives you a latent vector for your image. You do the operation you want on
this latent vector, then run the generator forward to get a new image. The
latent vector contains information about the scene, so you can train a neural
network (possibly with attention) to answer questions about your scene
parameters from those latent parameters.
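
In pseudocode-level terms the loop is just this (a sketch only; `encoder`,
`generator`, and `edit_direction` are hypothetical stand-ins for a
GAN-inversion pair, not anything from the paper):

```python
def edit_scene(image, encoder, generator, edit_direction, strength=1.0):
    z = encoder(image)                         # image -> latent scene code
    z_edited = z + strength * edit_direction   # operate in latent space
    new_image = generator(z_edited)            # latent -> re-rendered image
    return new_image, z_edited                 # z_edited also feeds the Q&A net
```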

~~~
one-more-minute
This is less a problem with differentiable rendering than a problem of
defining image distance, which I agree is hard (though there are plenty of
interesting approaches, like losses defined by conv nets).
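
One common instance of the conv-net losses alluded to here is a perceptual
(VGG-feature) distance. A minimal sketch, assuming PyTorch/torchvision (the
cut at layer 16 is an arbitrary choice of mine):

```python
import torch
import torchvision.models as models

# Frozen VGG-16 feature extractor; distances in this feature space are
# smoother than raw pixel MSE.
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def perceptual_loss(img_a, img_b):
    # img_* : (N, 3, H, W) tensors, ImageNet-normalized
    return torch.nn.functional.mse_loss(vgg(img_a), vgg(img_b))
```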

More interesting though is when the renderer is part of a larger pipeline, and
the loss is defined some other way (such as how well the agent is driving, in
our DuckieTown simulator). Backprop through the renderer lets us do BPTT,
which is very powerful, but we don't ever have to compare images, avoiding
that non-differentiability issue. Another example would be the
autoencoder-like setups you can use to bootstrap CV models.
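
A toy sketch of that setup, assuming PyTorch (all names are illustrative
stand-ins, not the DuckieTown API): the loss is defined on behavior, not on
images, yet gradients still reach the scene parameters through the renderer.

```python
import torch

scene_params = torch.randn(8, requires_grad=True)

def differentiable_render(params):        # stand-in differentiable renderer
    return torch.tanh(params).repeat(4)   # fake "image" tensor

def policy(image):                        # stand-in driving policy
    return image.sum()                    # fake steering command

steering = policy(differentiable_render(scene_params))
driving_loss = (steering - 0.0) ** 2      # "how well the agent drives"
driving_loss.backward()                   # gradients flow back to the scene
print(scene_params.grad is not None)      # True: no image comparison needed
```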

It remains to be seen how effective these things are in practice, of course,
but there are plenty of interesting directions.

~~~
GistNoesis
The main objection I have is that a differentiable renderer gives you the rope
to hang yourself: it helps you work in the wrong space.

There is some special structure to the rendering problem that can probably be
exploited better. One inherent difficulty for efficient solving is working
with multiple hypotheses at the same time (which a differentiable renderer
makes you miss). There is also some special structure in the probability
space: once you know the position of the camera it is easy to render, and once
you know the position of the objects it is easy to locate the camera. This
means the algorithms used for SLAM are much better able to tackle the problem.

Techniques like Rao-Blackwellized Particle Filters, for example, are more
appropriate. They don't need a differentiable renderer, and you can combine
them with neural networks, for example for pose estimation:
[https://arxiv.org/abs/1905.09304](https://arxiv.org/abs/1905.09304)
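
For a flavor of the multiple-hypothesis idea, here is a minimal bootstrap
particle filter sketch (my own illustration, not code from the linked paper; a
real Rao-Blackwellized filter would additionally marginalize part of the state
analytically):

```python
import numpy as np

rng = np.random.default_rng(0)
n_particles = 500
particles = rng.uniform(0, 10, n_particles)   # camera-pose hypotheses (1-D)
weights = np.full(n_particles, 1.0 / n_particles)

def observe(pose):
    """Stand-in for rendering each hypothesis and comparing to the image."""
    true_pose = 7.0
    return np.exp(-0.5 * ((pose - true_pose) / 0.5) ** 2)  # likelihood

for _ in range(10):
    particles += rng.normal(0, 0.1, n_particles)   # motion / diffusion step
    weights *= observe(particles)                  # reweight by likelihood
    weights /= weights.sum()
    # Resample: keeps many hypotheses alive until evidence disambiguates.
    idx = rng.choice(n_particles, n_particles, p=weights)
    particles = particles[idx]
    weights = np.full(n_particles, 1.0 / n_particles)

print(particles.mean())  # concentrates near the true pose, ~7.0
```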

What I'm trying to say is that there is not a lot of useful extra information
in a differentiable renderer, and the information doesn't flow well. These
gradients are at the detail scale, when what matters is at the large scale.

Sure, they can be used to stitch models together, but that's an unreliable
stitch, and you won't know whether your agent isn't driving because it can't
make the right decision or because it can't process the scene correctly.

The gradients will also flow a lot better if you use an RL model-based
approach where you ask the agent's model to render what it thinks it sees and
train it to match the output of the non-differentiable renderer. Those
gradients will be a lot smoother and operate on a larger scale.
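
A minimal sketch of that training signal, assuming PyTorch (names are
illustrative): the agent's learned "renderer" is trained to reproduce frames
from the real, non-differentiable renderer, so gradients only ever flow
through the learned model.

```python
import torch

class LearnedRenderer(torch.nn.Module):
    def __init__(self, latent_dim=16, img_dim=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(latent_dim, 128), torch.nn.ReLU(),
            torch.nn.Linear(128, img_dim))

    def forward(self, state):
        return self.net(state)

model = LearnedRenderer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

state = torch.randn(32, 16)        # agent's belief about the scene
real_frame = torch.randn(32, 64)   # frames from the non-differentiable renderer
loss = torch.nn.functional.mse_loss(model(state), real_frame)
opt.zero_grad()
loss.backward()                    # gradients flow through the learned model
opt.step()
```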

------
dnautics
My SO does ray tracing of sound to do spatial auralizations. I feel like a
differentiable ray tracer could be useful, for example, to help architects
make sure their designed spaces meet code. Is it reasonable to expect this
package to help with those sorts of applications?

~~~
marmaduke
The discussion mentions that they don't optimize for geometry, because the
discretization into triangles is non-differentiable.

------
pella
github repo: [https://github.com/avik-pal/RayTracer.jl](https://github.com/avik-pal/RayTracer.jl)

------
mpoteat
So, two questions. First, how does this compare to the naive method of SGD
with the differentiation defined using something like simplex search?

Second, if this is fully differentiable, how do you define the existence of a
polytope in a scene in a differentiable way? Is the input space for a given
parameter optimization limited to material or Phong/specular parameters?

~~~
marmaduke
Gradient-based algorithms are more efficient for large parameter spaces.

For the second question, the discussion mentions that there are lots of
non-differentiable parameters and that they don't handle those.

