
Affine transformations - signa11
https://eli.thegreenplace.net/2018/affine-transformations/
======
dahart
Nice article, but a tiny bit weird that it calls affine transforms
categorically non-linear, and then shows an affine matrix, which is linear by
definition.

Affine transforms are linear intermediate transforms in one dimension higher
than the source & target spaces. They're intuitively (rather than
algebraically) non-linear because we move from 2d to 3d then back, or from 3d
to 4d then back.

It can be helpful to think of the translation as a (linear) skew in the added
dimension.
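
A minimal sketch of that view (assuming NumPy): a 2D translation by (5, -2) expressed as a 3x3 shear of the added third dimension.

```python
import numpy as np

# Augmented 3x3 matrix: a 2D translation by (tx, ty) expressed as a
# linear shear in the added third dimension.
tx, ty = 5.0, -2.0
M = np.array([
    [1.0, 0.0, tx],
    [0.0, 1.0, ty],
    [0.0, 0.0, 1.0],
])

p = np.array([3.0, 4.0, 1.0])   # 2D point (3, 4) lifted onto the z=1 plane
q = M @ p                       # linear in 3D ...
print(q[:2])                    # ... but a translation in 2D: [8. 2.]

# The 3D origin stays fixed, as any linear map requires.
print(M @ np.zeros(3))          # [0. 0. 0.]
```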

~~~
FabHK
For linear functions (in the article's second, linear algebra sense), the
image of zero must be zero, and that doesn't hold for (most) affine functions,
so the categorization makes sense.

Note that he shows the trick of making affine functions linear by tacking on
one more dimension.
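
A quick check with a made-up affine function:

```python
# An affine map f(x) = 2x + 3 fails both linearity tests:
f = lambda x: 2 * x + 3

print(f(0))            # 3, not 0: the image of zero is not zero
print(f(1 + 2))        # 9
print(f(1) + f(2))     # 12: additivity fails too
```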

~~~
dahart
> the image of zero must be zero, and that doesn't hold for (most) affine
> functions

It does hold, always, in the higher dimension. And I feel like that's the most
important thing to clearly understand about affine transforms. The entire
beauty of the augmented matrix is that you get a class of non-linear
transforms in 3d by using linear transforms in 4d. The article was nice; I'm
nitpicking something that was nearly there. It'd just be nice to be one
teensy bit more explicit about what's going on here.

~~~
syndev
But affine transformations are indeed not linear. The augmentation trick
creates a new, linear transform in n+1 dimensions, which is related to but
different from the affine transform in question.

The categorization in the article seems correct.

~~~
dahart
Yes, exactly, you're right. The article is correct, it's just not telling
quite the whole story. Affine transforms aren't linear, and their augmented
matrices that do the same thing are linear. In practice, it's both, and it's
precisely cool because it's both.

------
noelwelsh
Really liked this post. Although I knew the "correct" definition of linearity,
I'd never considered that linear regression is not in fact linear (though you
can transform it, as blt says).

~~~
srean
Yeah, it's one of those handy hacks. I like these because they let changing
the input data substitute for a different algorithm/formulation.
Transforming the input is often easier to do under a deadline than deploying a
well-tested new algorithm.

All that said, the typical regularized version of linear regression with the
'append 1' trick is no longer equivalent to the affine version one may have in
mind. The difference is that the weight corresponding to the appended
dimension would be regularized by a typical implementation of regularized
linear regression. Unless, of course, special care is taken to remove
regularization on the appended dimension.
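
A sketch of that difference, assuming NumPy and closed-form ridge regression (the data and penalty here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))
y = 3.0 * x[:, 0] + 10.0 + rng.normal(scale=0.1, size=100)  # large intercept

X = np.hstack([x, np.ones((100, 1))])   # the 'append 1' trick
lam = 100.0

# Naive ridge: regularizes the appended-1 weight along with everything else.
w_naive = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

# Careful ridge: zero penalty on the appended dimension.
w_nobias = np.linalg.solve(X.T @ X + lam * np.diag([1.0, 0.0]), X.T @ y)

# The naive intercept is shrunk toward 0; the careful one stays near 10.
print(w_naive[1], w_nobias[1])
```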

------
blt
w.r.t. the "affine regression" comment - you can reduce affine regression to
linear regression by adding a constant feature whose value is 1 for every data
point, or by centering the data on its mean. So it's a fairly mild misnomer :)

edit: of course, I agree that we should not say "linear function" when we mean
"affine function" in general.

~~~
setzer22
That reduction is in fact how I was taught linear regression; now I understand
why!

------
Leszek
I really love the translation "hack". The 2D plane is mapped onto the z=1
plane, and then we apply an x-y shear in 3D space. This keeps the 3D origin at
the origin, but shifts around the z=1 plane without distorting it, so that
when we project back to 2D it looks like a translation.

------
thanatropism
So.

In geometry an "affine vector" (c,v) is said to be the vector v tangent to c
-- no longer a vector, a "tangent vector". In your neural network this would
be the weights tangent to each bias. If you keep c fixed

    Tc[V] = {(c,v) | c fixed, forall v in V}

is clearly a vector space.

Here's the fun part: if you tried to consider

    T[V] = {(c,v) | forall c in R, v in V}

this is no longer a vector space. But it's a fun structure: at each c, Tc[V]
is sorta like (homeomorphic to) the Cartesian product

    {c} x V

this is a fiber bundle. For example: near each c, a torus is the Cartesian
product of a point and a circle; a "circle squared" is a donut.

Now, we don't need to restrict ourselves to c + v·x -- we can consider
sigmoid(c + v·x), which will have tangents (derivatives) to c at each v. For
fixed c we still have a vector space with the derivatives dsigmoid/dv, and
varying c you have a sigmoid fiber bundle.

------
SomewhatLikely
Going on the thought that interaction aids learning, I recently put together a
very crude interactive jsfiddle to help visualize 2-d matrix transforms here:
[https://jsfiddle.net/holoopooj/31yt1ytp/6/](https://jsfiddle.net/holoopooj/31yt1ytp/6/)
You click and drag in the left graph area and see the transformed vector on
the right. The matrix values can be changed at the top. I've only tested in
Chrome on desktop. You may have to vertically expand the lower-right pane to
see all of the drawing area.

------
kurthr
A nice discussion of the extension of the linear transformation matrix. The
classic example and use case of this is 3D transformations that include
Translation (as well as the linear shear, scale, rotate, and reflect).

[https://en.wikipedia.org/wiki/Transformation_matrix](https://en.wikipedia.org/wiki/Transformation_matrix)
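
The classic 3D case, sketched in NumPy as a 4x4 homogeneous matrix combining a rotation about z with a translation (the angle and offsets are illustrative):

```python
import numpy as np

theta = np.pi / 2                  # 90-degree rotation about the z axis
c, s = np.cos(theta), np.sin(theta)
T = np.array([
    [c, -s, 0, 1.0],   # upper-left 3x3: rotation; last column: translation
    [s,  c, 0, 2.0],
    [0,  0, 1, 3.0],
    [0,  0, 0, 1.0],
])

p = np.array([1.0, 0.0, 0.0, 1.0])  # point (1, 0, 0) in homogeneous form
print(T @ p)                        # rotated to (0, 1, 0), then translated
```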

I vaguely remember that Minkowski space can also be written as an affine
space, but not particularly how or why, since it should be translation
independent? It seemed that the raising/lowering tensor notation was always
used.

------
killjoywashere
Can someone unpack the definition of an affine subspace for me? It's been a
while since I took point set topology:

A subset U ⊂ V of a vector space V is an _affine space_ if there exists a u ∈
U such that U - u = {x - u | x ∈ U} is a vector subspace of V.

I'm unpacking this to read

A subset U of a vector space V is an _affine space_ if there exists an
element u such that U - u, which is exactly the set of x - u for all x in U,
is a vector subspace of V.

If I'm reading that right, the right side of the equation is a parenthetical
expression, so is it necessary?
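
A concrete instance, sketched in Python: the line y = x + 1 doesn't pass through the origin of R^2, but subtracting any one of its points gives the line y = x, which is a vector subspace:

```python
import numpy as np

# U = the line y = x + 1, an affine subspace of R^2 not through the origin.
U = [np.array([t, t + 1.0]) for t in np.linspace(-2, 2, 5)]

u = U[0]                      # any fixed element of U works
shifted = [x - u for x in U]  # U - u = {x - u | x in U}

# The shifted set lies on y = x, a genuine subspace through the origin.
print(all(np.isclose(p[0], p[1]) for p in shifted))   # True
print(shifted[0])                                     # [0. 0.]
```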

------
Chris2048
Excuse my ignorance, but in calculus I've seen some "affine solution" - does
this just refer to its form? What is the relevance?

~~~
thanatropism
So. In calculus both the derivative and integral are linear operators:

    D[a*f(x)] = a*D[f(x)]
    D[f(x)+g(x)] = D[f(x)] + D[g(x)]

and the indefinite (without limits) integral is an "antiderivative", right?
I.e. y(x) = I(f(x)) is the solution to

    D[y(x)] = f(x)

Here's the problem: there are multiple solutions to the case where f(x) = 0.
Indeed, for any constant y(x) you have

    D[y(x)] = 0

This is why you're drilled in engineering classes to always add a + C to your
indefinite integral. The solution to an indefinite integral is always a class
of functions -- the part without the + C continues to be linear, but you have
to tack the constant on.

This is also why Initial Value Problems like

    D[f(x)] = g(x)

always need an initial condition.
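
A numeric spot check of the two linearity identities at the top, using a central finite difference (f, g, and the values are made up):

```python
# Spot-check linearity of D with a central finite difference.
def D(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda t: t ** 2
g = lambda t: 3 * t
a = 5.0
x = 2.0

print(D(lambda t: a * f(t), x), a * D(f, x))           # both ~ 20
print(D(lambda t: f(t) + g(t), x), D(f, x) + D(g, x))  # both ~ 7
```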

------
SubiculumCode
I'm tempted to go through this carefully given that affine transformations
tagged with cost functions are used in MR image registration/normalization all
the time.

~~~
thanatropism
Affine transformations are also cost functions used in taxis and nightclubs.
You pay for an entrance and then a fee per drink.

