
I realized that derivatives are linear - jasonszhao
https://codesmoothie.com/blog/derivatives-are-linear.html
======
dan-robertson
So the point of this article is that _differentiation_ is linear. That is, the
operator D which takes f to d f/d x is linear. The author points out that one
can write this down as a matrix with respect to a basis of polynomials,
which is nice for suitably well behaved functions and I think nice for
understanding. Other linear operators one might look at are integration,
Fourier or Laplace transforms, or more exotic integral transforms. One can
view a Fourier transform as a change of basis.
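To make the matrix picture concrete, here is a minimal sketch (plain NumPy; monomial basis 1, x, x^2, ... assumed, and the basis size n is an arbitrary choice for illustration):

```python
import numpy as np

# Differentiation as a matrix acting on polynomial coefficient vectors.
# Basis: 1, x, x^2, ..., x^(n-1); a polynomial is its coefficient vector.
n = 5
D = np.zeros((n, n))
for k in range(1, n):
    D[k - 1, k] = k  # d/dx x^k = k * x^(k-1)

# p(x) = 3 + 2x + x^2  ->  p'(x) = 2 + 2x
p = np.array([3.0, 2.0, 1.0, 0.0, 0.0])
print(D @ p)  # [2. 2. 0. 0. 0.]
```

Applying D twice gives the second-derivative matrix, which is just D @ D — linearity makes composition of operators matrix multiplication.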

In another sense, _derivatives_ themselves are linear: for a function f: U ->
V between vector spaces, the derivative (at some point) is a linear map from U
to V (i.e. the derivative of the function is a function Df: U -> L(U,V)), and
this extends the concept of derivative to multiple dimensions via f(x+h) =
f(x) + (Df)(x)h + o(h).
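A rough numerical sketch of that expansion (the map f: R^2 -> R^2 and all names here are made up for illustration; the Jacobian is estimated by central finite differences):

```python
import numpy as np

def f(v):
    # An arbitrary smooth map R^2 -> R^2 for demonstration.
    x, y = v
    return np.array([x * y, x + y ** 2])

def jacobian(f, v, eps=1e-6):
    # Finite-difference estimate of the linear map (Df)(v).
    m = len(f(v))
    J = np.zeros((m, len(v)))
    for j in range(len(v)):
        e = np.zeros(len(v))
        e[j] = eps
        J[:, j] = (f(v + e) - f(v - e)) / (2 * eps)
    return J

v = np.array([1.0, 2.0])
h = np.array([1e-3, -1e-3])
lhs = f(v + h)
rhs = f(v) + jacobian(f, v) @ h  # f(x) + (Df)(x)h
print(np.linalg.norm(lhs - rhs))  # tiny: the o(h) remainder
```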

This seems fine for first derivatives, but it can become unwieldy for higher
derivatives, which become tensors of higher rank.

Another question one might ask on learning that differentiation is a linear
operator is what its eigenfunctions are. For differentiation these are the
functions of the form f(x) = exp(ax), with eigenvalue a. But one can
construct other linear operators, and from this you get Sturm–Liouville
theory, which is fantastic.
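A quick numerical check of the eigenfunction claim (stdlib only; a = 0.7 and the sample points are arbitrary choices):

```python
import math

# Check that exp(a*x) behaves like an eigenfunction of d/dx:
# at every point, the derivative is a times the function value.
a = 0.7
eps = 1e-6
ratios = []
for x in [-1.0, 0.0, 2.5]:
    f = math.exp(a * x)
    df = (math.exp(a * (x + eps)) - math.exp(a * (x - eps))) / (2 * eps)
    ratios.append(df / f)
print(ratios)  # each ratio is ~ a = 0.7
```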

One final note is that much of this multidimensional derivatives and tensor
stuff becomes a lot easier if one learns suffix notation (aka Einstein
notation, aka index notation, aka summation convention), as well as perhaps a
few identities with the Kronecker delta or Levi-Civita symbol. Notation can
break down a bit with arbitrary rank tensors: $a_{i_1,...,i_k}$ becomes
unwieldy but writing $a_{pq...r}$ is ok.

------
sampo
The derivative is a linear operator, but it's not a bounded operator. That is,
for example, the vector norm of f(x) = k·sin(x/k) → 0 when k→0, but the norm
of d/dx f(x) does not. This also means that it's not continuous.

Of the mappings between vector spaces, the best behaved are the bounded
linear operators, and the derivative isn't one of them. But yes, it's
linear.

Edit: Originally wrote f(x) = k·sin(k·x), but meant f(x) = k·sin(x/k).
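A sketch of that unboundedness in the sup norm on [0, 2π] (NumPy assumed; the grid is just for sampling the norms numerically):

```python
import numpy as np

# Sup norms of f_k(x) = k*sin(x/k) and f_k'(x) = cos(x/k) on [0, 2*pi]:
# ||f_k|| -> 0 as k -> 0 while ||f_k'|| stays at 1, so d/dx is
# unbounded (hence discontinuous) in the sup norm.
x = np.linspace(0, 2 * np.pi, 100001)
norms = {}
for k in [1.0, 0.1, 0.01]:
    f_norm = np.abs(k * np.sin(x / k)).max()
    df_norm = np.abs(np.cos(x / k)).max()
    norms[k] = (f_norm, df_norm)
    print(k, f_norm, df_norm)
```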

~~~
hgibbs
Depends on what space you define the derivative on. It is of course a bounded
(and therefore continuous) operator from C^k to C^{k-1} for any positive
integer k.

Additionally, it only really makes sense to talk about bounded operators
between topological vector spaces (as you need to make sense of what it means
to be bounded), of which the most commonly dealt with are Banach spaces.

~~~
adament
It also depends on what topology you put on your space; while the derivative
operator is defined on all of C^k, that does not make it continuous. In fact,
as the GP's example shows, the topology you put on C^k and C^{k-1} must be
such that either uniform convergence does not imply convergence in C^k (which
differs from many people's intuitive notion of convergence of functions) or
cos(x/k) converges to 0 in C^{k-1} (which is just plain weird).

------
azernik
This was an example used in my linear algebra class as soon as they started
introducing the vector spaces in an abstract sense.

I think this post may still be _too_ wedded to the idea of linear spaces and
vectors being arrays of objects - specifically in insisting on decomposing
functions like sin and cos to Taylor Series. In fact, you can have a vector
space where, in addition to polynomial terms, there are also dimensions for
sin(x), tan(x), sin(x - pi), e^x, etc. The fact that you can't _enumerate_
these dimensions, or even describe the set of them until given a set of
vectors you're trying to describe, doesn't keep this from being a vector
space.

~~~
tzahola
Hm. Interesting.

I always viewed real functions as infinite-dimensional vectors in the
"canonical" basis, that is, shifted Dirac impulses. I guess it can be
transformed into your representation with a change of basis with some
handwaving.

~~~
azernik
What you're describing may not be a subspace of the space I'm describing. Mine
is definitely a subspace of yours - e.g. you can project functions into sums
of shifted Dirac impulses by representing each function dimension as a linear
combination of the "Dirac vectors" for that function's values.

~~~
tzahola
Hmmm. I think it’s easy to fix that:

Suppose that there’s a function f, that can be written as an infinite sum
(integral) of shifted Dirac impulses, but cannot be written in your
representation as a sum of those “base functions”. Then simply add a new
dimension to your representation that will correspond to f, so that f will be
represented as 1 at this new dimension, and zero everywhere else. (In other
words: add f to the base functions)

Repeat until you have covered every function.

------
chombier
Well this is the whole point of derivatives (i.e. tangent maps): to be linear
approximations of functions.

So yes, a linear approximation of a linear function is the function itself.

~~~
thaumasiotes
If a function f passes through some point (a,b), then the tangent to f through
that point is given by

    (y-b) = f'(a)·(x-a)
and that function is affine but usually not linear. (For the tangent curve to
be a linear function, you would need a·f'(a) = b, so that the tangent goes
through the point (0,0).)
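For instance, a tiny sketch with the hypothetical choice f(x) = x² and a = 1:

```python
# Tangent line to f(x) = x^2 at a = 1: T(x) = 1 + 2*(x - 1) = 2x - 1.
# It is affine but not linear: T(0) != 0 and additivity fails.
def T(x):
    return 2 * x - 1

print(T(0))                    # -1; a linear map would give 0
print(T(1 + 2), T(1) + T(2))   # 5 vs 4: T(x1+x2) != T(x1)+T(x2)
```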

It's not at all obvious to me that this means that the function d(f) = df/dx
is linear. It is linear, but I don't see how the tangent curve demonstrates
it.

~~~
giomasce
It is common in this context to say "linear" to also mean "affine", because
after all affine functions are not much more complicated than linear
functions.

------
wodenokoto
The first headline for this was something along the lines of "I realized that
derivatives are linear", making it clear that this is not a new discovery,
but rather a person sharing a lightbulb moment.

I feel a lot of comments are saying "well of course they are!", not realizing
that this is not about a new discovery.

------
anujsharmax
Be careful when using knowledge from this post. These are special cases, not
the general rules of differentiation (or calculus).

For example, for multi variable calculus, the results would be very different.

Let's take the example of W.X

d/dx (W.X) = X.d/dx(W) + W.d/dx(X)

since W is not dependent on x, the first term is zero and we get the answer
the author got.
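A numerical sketch of that special case (NumPy assumed; W and X are drawn at random purely for illustration):

```python
import numpy as np

# With W held constant, the gradient of W.X with respect to X is just W:
# the X.(dW/dx) term in the product rule vanishes.
rng = np.random.default_rng(0)
W = rng.standard_normal(3)
X = rng.standard_normal(3)

eps = 1e-6
grad = np.array([
    (W @ (X + eps * e) - W @ (X - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
print(np.allclose(grad, W))  # True
```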

Before drawing conclusions from the post, please remember the assumptions the
author has taken.

------
ohazi
Integrals and Fourier Transforms are also linear...

------
ndh2
There's a ² missing. Should be dC/dx = sum d/dx |...|².

~~~
jasonszhao
Yep, fixed!

------
vole
>Most of the other non-polynomial functions have an equivalent Taylor
polynomial

_Analytic_ functions have a Taylor _series_, but it would be incorrect to say
that "most" functions have a Taylor series, and a Taylor series is not a
polynomial.

------
speedplane
There seems to be a misconception that linear transformations have to look
like lines.

~~~
blattimwind
One easy way to see that "linear" does not mean "line-like" is looking at
matrix transforms: All matrix transforms are linear. And we can do a lot of
stuff easily with those, like rotating, distorting, even perspective (with
w-normalization).
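For instance, a rotation-matrix sketch (NumPy assumed; the angle and vectors are arbitrary) checking the defining linearity property:

```python
import numpy as np

# A rotation is linear but hardly "line-like": it satisfies
# R(a*u + b*v) = a*R(u) + b*R(v) for all vectors u, v and scalars a, b.
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
u = np.array([1.0, 0.0])
v = np.array([0.5, 2.0])
print(np.allclose(R @ (2 * u + 3 * v), 2 * (R @ u) + 3 * (R @ v)))  # True
```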

~~~
speedplane
All matrix transforms are linear, and yet few people understand matrix
transforms. The term "linear transform" itself stems more from the contrast
with nonlinear equations, but it still "feels" like a bit of a misnomer.
Perhaps it should be called a "proportional" or "relatable" transform.

Ehh, then again, it doesn't really matter, people who don't understand it
still won't with a nomenclature change.

------
fwdpropaganda
HN continues to confuse me to no end.

Mention some mathematically advanced idea: out come the pitchforks about how
you don't need that, all you need is code/market size/scalability/product
fit/investment/execution.

Mention a banality that anyone who studied algebra knows: frontpage.

~~~
viraptor
You may be overestimating the banality. I studied CS with many math courses,
and some of the article goes over my head. (It would have been clearer 10
years ago.) I don't expect many people here actually studied math as their
main subject.

~~~
fwdpropaganda
That aside I think there's something bigger here.

I think learning is really hard work, and so most people's first reaction to
hard work is to say No, and then go and construct an a posteriori rationale
for why actually they shouldn't do that hard work (it's not that useful,
you're never gonna use it, you're an expert at something else, etc).

It's a similar story with why applicants think asking about data structures
in job interviews is a bad idea (while the people who have been hired and are
doing the hiring think it's good to ask).

~~~
steamer25
Yeah it seems analogous to bike shedding or Paul Graham's 'blub' paradox.
I.e., "we don't understand that and haven't [realized that we've] needed it to
date therefore it's probably not important and certainly not of interest".

I like your conservation-of-mental-energy interpretation.

~~~
fwdpropaganda
TIL about the Blub paradox
[https://en.wikipedia.org/wiki/Blub_paradox#The_Blub_paradox](https://en.wikipedia.org/wiki/Blub_paradox#The_Blub_paradox)

Thanks for that.

------
anonytrary
This is why you take linear algebra and calculus _before_ doing machine
learning.

~~~
mduerksen
On the contrary - ML is a great motivator to finally grapple with the
"prerequisites".

During school I never understood what the math was for, so my unconscious
brain never saw the necessity to actually learn it. Now I _want_ to learn -
with hugely better results.

This mechanism should be utilized much more often, instead of shoving
seemingly unrelated knowledge into people's ears without letting them feel
the need for it first.

~~~
bbeonx
Right. There are a lot of comments along the lines of "Duhhh, that's the point
of calculus" here. But you know what? Screw that. People learn when people
learn, and discovering it for yourself is a lot cooler than someone telling
you about it, especially if it's related to investigating something you care
about.

I can't tell you the number of 'trivial' math facts that I have (re)discovered
because they were in the context of something I cared deeply about.

The point isn't to remember D_x is a linear operator--math isn't about
memorization. It's about understanding the context where this is a useful fact
and knowing how to figure it out.

Learn it in Calc I and you can half-heartedly reference it (...isn't
differentiation linear? I feel like I remember that from senior year of high
school...).

Figure it out on your own and you own it for life.

Post it to the internet and you get ridiculed and mocked for it so that you
wish you could forget it.

~~~
acheron
_> The point isn't to remember D_x is a linear operator--math isn't about
memorization. It's about understanding the context where this is a useful fact
and knowing how to figure it out._

Totally agree with this part.

 _> Post it to the internet and you get ridiculed and mocked for it so that
you wish you could forget it._

Telling everybody you meet about a basic fact that you just learned is mildly
cute when a 6-year-old does it.

It's great that this person learned something that was new to them, sure. But
that doesn't mean that they need to shout about it to the tens of thousands of
their closest friends who read the front page of HN.

------
zeofig
Heyyy man, what if the universe is LINEAR?

~~~
Ace17
"Classification of mathematical problems as linear and nonlinear is like
classification of the Universe as bananas and non-bananas."

