Implicit Differentiation

shoyer · on June 3, 2021

The general case of implicit differentiation, i.e., for functions y(x) defined by the constraints F(x, y(x)) = 0, where x and y are vectors, is solved by "implicit function theorem": https://en.wikipedia.org/wiki/Implicit_function_theorem

∂ y(x) = -(∂_1 F(x, y(x)))^{-1} (∂_0 F(x, y(x)))

where ∂ denotes partial differentiation.

This turns out to be an incredibly useful identity for calculating derivatives. No matter how you calculated a solution to the equation, computing derivatives is "just" a matter of performing a linear solve.

If the calculation you performed is a solution to solving an equation, implicit differentiation is typically much faster, less memory intensive and more accurate than calculating derivatives by differentiating through your solver. For examples, you might check-out a recent paper I co-authored with colleagues at Google: https://arxiv.org/abs/2105.15183

PTOB · on June 3, 2021

Paul is going to single-handedly make my Alma Mater famous. ... or at least people won't say, "Never heard of it," when I tell them I got a CS degree there.

edflsafoiewq · on June 3, 2021

Assume f(x, y(x)) = 0.

Differentiation wrt. x gives ∂f/∂x + ∂f/∂y y'(x) = 0.

Solving, y'(x) = - (∂f/∂x) / (∂f/∂y).

Tainnor · on June 3, 2021

The result is correct, but that's not (IMHO) "partial differentiation wrt. x", it's finding the (regular) derivative of g := x -> f(x, y(x)), specifically:

We can write g = f . (id, y)^T ("." being function composition).

By the multivariable chain rule

g'(x)

= f'(x, y(x)) * ((x, y(x))^T)'

= (∂f(x, y(x)) / ∂x, ∂f(x, y(x)) / ∂y) * (1, y'(x))^T

= ∂f(x, y(x)) / ∂x + (∂f(x, y(x)) / ∂y) * y'(x).

zwaps · on June 3, 2021

Seems like both of you essentially apply the implicit function theorem

https://en.wikipedia.org/wiki/Implicit_function_theorem

I always found that theorem very useful, not only for computing implicit derivatives, but also for understanding if such things actually exist.

Another way to see it is to use the total derivative and then (somewhat heuristically) solve it for dy/dx. Mathematicians may yell at you, tho

Tainnor · on June 3, 2021

Sort of. Although, for the part that I showed, you don't need the extra assumptions of the implicit function theorem. You only need local invertibility once you actually want to solve the equation. But analysis has been a while for me.

edit: to be more precise, nothing in my answer requires knowing the implicit function theorem. I'm just basically executing the same steps as for the proof of the implicit function theorem, but only for the concrete case of f: R^2 -> R (the implicit function theorem works for any f: R^n -> R^m). If you solve the equation, you need local invertibility (in this concrete example: the partial derivative of f wrt. y needs to be invertible in a neighbourhood of (x, y(x)).

> Another way to see it is to use the total derivative and then (somewhat heuristically) solve it for dy/dx. Mathematicians may yell at you, tho

If you do it properly (i.e. concretely state your assumptions), I don't think mathematicians will yell at you

paulpauper · on June 3, 2021

implicit differentiation can be complicated and counter-intuitive

for example:

9=y^2+x^2 find y' the answer is NOT 2y

so if you jsut studied calculus, this new rule invalidates what you just learned

I hardly use it even though I do use calculus in some of my work. Usually I just need to get the differential of a single variable of a function of the form w=f(x,y,z) or such. I think implicit notation makes things more confusing.

BlackFly · on June 3, 2021

It doesn't invalidate the rule, it just checks if you understood differentiation as an operator and not as a procedure.

The general approach is to take an equation involving a function and differentiate that equation then factor the equation to find the derivative. It usually only comes up if finding a closed form expression for the function is intractable but factoring the derivative of a known equation involving the function is tractable. If the function is vector valued or tensor valued, even that can be intractable since it might be difficult to factor the equation after differentiating to isolate the derivative you want an expression for. I also don't use it very often, but it is invaluable in differential geometry and provides (for example) an easy way to derive an expression for the derivative of a determinant or the derivative of an inverse with respect to a tensor.

whimsicalism · on June 3, 2021

Assuming y' = dy/dx, then 0 = 2 y y' + 2 x -> -x/y = y'

I've honestly never understood what exactly implicit differentiation means or what is implicit about it, it seems to just be the normal rules of calculus?

Why would you expect dy/dx to be 2y?

e: ah, didn't see you were referencing a problem from the page

Tainnor · on June 3, 2021

Because he's for some reason differentiating with respect to y.

You're right, "implicit differentiation" is nothing more than just differentiating both sides of an equation. Pedagogically, it is a bit of a step up from "here's a single expression that denotes a function, differentiate it". You have to recognize that differentiation is an operation on functions, and then you have to realize that something like "9=y^2+x^2" really means "9=y(x)^2 + x^2 for all values of x" (or equivalently as an equation of functions, the constant function 9 = y^2 + id^2) and that it's valid to apply the differentiation operator to both sides.

Many calculus courses probably gloss over subtleties like this or expect students to be able to "intuitively" grasp them (and many very well may), but I can see it leading to confusion.

kragen · on June 3, 2021

An "explicit function" is something like y = √(x³ + 1), where you have a formula like √(x³ + 1) that tells you how to compute a value of the function for any given value of the independent variable. An "implicit function" is the function implicitly defined by an equation like y² - x³ = 1; you don't have a formula. In this case, it actually defines a relation, because for any value of x there are two values of y. You could sort of reasonably claim that by sticking "y =" before a formula, you are defining an implicit function, just a boring degenerate one, but we conventionally reserve the term "implicit function" for things that we can't handle in the explicit-function fashion.

It's called "implicit differentiation" because we're differentiating the implicit-function representation—instead of just taking the derivative of one formula to find the answer, we take the derivatives of the formulas on both sides of the equation. (We know this is safe: if a = b, then we can substitute them for each other in any context, for example da/dx = db/dx.) Or, equivalently, we subtract them from each other, take the derivative of the resulting expression, and set it to 0.

klipt · on June 3, 2021

> what is implicit about it

The relationship between y and x is implicit rather than explicitly y = f(x)

curtisf · on June 3, 2021

The equation itself is called "implicit" because it describes a set of points without making it obvious what that set of points looks like.

For example, consider an equation like `x^(2y) + y^(x^2) = -1`. This is a well-formed equation that describes a unique subset of the (real) plane, but it's not obvious what points are in it (or even, if any points are in it) at first glance.

The "normal" rules of calculus apply to _functions_. But `x^(2y) + y^(x^2)` is not a function; neither syntactically (since it refers to both x and y) nor functionally (since the relation ignores the sign of x).

Yet, despite not being a function, you can still compute an "implicit derivative" of the "equation". The reason this ends up working, hinted at by writing `y` as `y(x)`, is that the equation is equivalent to the graph of unknown "implicit function"(s).

But even this is slightly different -- when you do `𝒟(left) = 𝒟(right)` -- the derivative is an operator that operates on _functions_, so this is actually an equation between two functions, something that doesn't normally come up in "normal" calculus.

It turns out that all of the same algorithmic steps work out to be correct in this slightly different setting, but it does involve some things that are new, if you look at why the manipulations are happening and not just the algorithmic steps.

Tainnor · on June 3, 2021

> But `x^(2y) + y^(x^2)` is not a function; neither syntactically (since it refers to both x and y) nor functionally (since the relation ignores the sign of x).

It is a function, if you interpret it as f(x) = x^(2y(x)) + y(x)^(x^2).

Of course, we don't know what kind of function y is. And if y is given implicitly, there might be multiple possible ys. But that doesn't stop us from manipulating the composite function. Specifically, if there exist multiple ys, then the manipulated equations will be valid for all of these ys.

So the only extra step is thinking about functions abstractly, instead of being given concrete functions. But otherwise it really is the "normal rules of calculus.

Tainnor · on June 3, 2021

If you do differential equations, techniques like this are essentially your bread and butter.

mananaysiempre · on June 3, 2021

I agree (differential geometry as well). Ironically, though, implicit differential equations tend to melt my brain, or rather modern approaches to them melt my brain (contact form on the first jet bundle anyone?) and classic ones look sensible at any single step but utterly magical on the whole (what the hell is a discriminant set? why do we have exceptional solutions?).

mananaysiempre · on June 3, 2021

Well duh, it’s the (slope of the) tangent to a circle, of course it’s not 2y (or 2x).

OK, not duh, of course, or else you wouldn’t have written this, but that is the desired state, so let me try to get you there.

Stop thinking about the equation F(x,y)=0 as specifying a function, start thinking about it as specifying a curve: the set { (x,y) | F(x,y)=0 } of its solutions.

Let us handle the “explicit” case first, because you know what the answer is going to be and it only remains to see how we’re going to get it. The equation is y=f(x), the curve is evidently what you would otherwise call “graph of f”. Our interest is going to be in the tangents to that curve, that is straight lines that most closely approximate the curved shape near some point.

OK, so we must have a point, let’s take one. It’s a point, it has coordinates, (x,y). But not just any coordinates—it must lie on the curve we’re studying, which in our special case means that y is equal to f(x), in particular that y is uniquely determined by x: a graph of a normal “single-valued” function intersects any vertical line only once. It thus makes sense to go from “what’s the tangent to the curve at its point (x,y)?” to “what’s the tangent line to the graph at whichever (unique!) point has horizontal coordinate x?”.

Now it should be apparent how to tie in your knowledge of single-variable differentiation: you know this tangent line has slope f'(x), and given you also know it passes through (x,f(x)), you know everything there is to know about it. (Exercise: work an equation that determines which points lie on it, for example by working out its intercept.)

On the other hand, we can also proceed in a completely general (if hand-wavey) fashion. Gather all terms on the left-hand side for consistency: y−f(x)=0. If we shift a little bit from (x,y) to (x+dx,y+dy), y−f(x) (which is presumably zero because we start on the graph) changes into y+dy−f(x+dx) ≈ y+dy−f(x)−f'(x)dx = dy−f'(x)dx. For the shifted point to lie approximately on the graph (and thus exactly on the tangent), this would have to be zero as well, thus the offset (dx,dy) lies on a line with slope f'(x) (through zero, because it’s an offset). This is the same thing!

Now suppose I wanted to work out the tangents in your example x²+y²−9=0. It’s not of the form y−[something with x]=0, but so what, we’ll proceed anyway. Shifting a little bit from point (x,y) (which we assume is on the curve) to point (x+dx,y+dy) (which we want to be there as well, at least approximately), we go from x²+y²−9, which is zero, to (x+dx)²+(y+dy)²−9 ≈ x² + 2x dx + y² + 2y dy − 9 = 2x dx + 2y dy, which thus also needs to be zero.

But saying this is the same thing as saying that x dx + y dy = 0, or that the vectors (x,y) (from zero to the chosen point) and (dx,dy) (from the chosen point to the shifted point) are orthogonal. This is as nice a description of this condition as you’re likely to get, I think: you can also write it as dy/dx = −x/y if you chose a point with x≠0 (line slope, compare with first example) or as dx/dy = −y/x if you chose a point with y≠0 (also line slope but with axes swapped) but these exceptional points are annoying.

Just for fun, let’s look a bit more closely at what just happened (no more implicit differentiation but may ground you a bit better). The original equation was x²+y²=3². Pythagoras says this means “point (x,y) is 3 units away from the origin”, that is, the “curve” I was talking about is actually just a circle, and indeed you might’ve learned in geometry that a tangent to a circle is perpendicular to its corresponding radius (no differentials required, apparently, but the geometric definition of a tangent that you used then was probably rather dodgy). The annoying exceptional points above are simply those where the tangent is vertical or horizontal (where the circle intersects the axes), so of course it didn’t have a finite slope if you preferred the wrong axis.

On the other hand, if you cut out the points with x=0 (say), the circle falls apart into two pieces which are graphs of functions, one of which is f(x) = √(9−x²) (the upper part) and the other has a minus in front (the lower part). There’s no reason you can’t compute the slope of the tangent in this “explicit” way; a standard single-variable calculation gets you [slope at (x,f(x))] = f'(x) = 1 / 2√(9−x²) × (−2x) = −x / √(9−x²) which happens to be = −x / f(x), same as −x/y you got before. It also works out for the bottom part and even for the exceptional points on the sides, but now you have to guess the pretty expression, whereas the “implicit” approach gives it to you immediately and is less painful to boot (no differentiating square roots!).

It is also better in that, if you just look at a circle, the “exceptional” points on the sides... aren’t! They’re only special because of how you chose your axes. Except going the “explicit” route forces you to choose axes. I would thus say that the implicit approach is less confusing, not more.

(The terminology and historical notation around it can be tremendously confusing, but here I’ve tried to use a consistent set of terms.)

Everything we’ve just done with one equation in two variables (getting a tangent line of dimension one specified by one linear equation) can of course be done with k equations in n variables (getting a tangent plane of dimension n−k specified by a system of k linear equations), but it’s better to first get some intuition on how higher-dimensional planes work in an abstract/geometric/non-calculational linear algebra course.