This is a 2000-page monograph on the topics in the title. I was hoping it would go deeper into optimization theory, but it seems that, while the monograph builds up to optimization, it only covers select topics in it. The title says it is "for computer science and machine learning." As such, deeper mathematical treatments of optimization theory are understandably not the intent behind the monograph.
I'm confused as to the intended audience for this. It's not a textbook: in the bits I looked at, it doesn't attempt to teach the material, it just bangs out definitions and theorems.
Also, the table of contents doesn't have hyperlinks to the actual content, which makes navigation a massive annoyance. Similarly, there aren't hyperlinks to the references.
An impressive amount of work went into this, but why?
The book is still in progress [0]. For example, the introduction has yet to be written, which presumably would answer some of these questions about the intended audience.
The author also has books [1] on tangentially related subjects, several with a good amount of overlap in material.
I feel this is a good foundation for someone with a CS background who is interested in more advanced ML and math. At least for me, reading the ToC is exciting, since I have touched many of those concepts over the years but never went deeper.
The space of permissible transformations is of great interest, since moving beyond linear combinations à la ANNs has led to great performance strides in deep learning.
Yep, this is an extended survey of interrelated concepts organized according to increasing complexity.
Similar to textbooks in the organizational principle, but differentiated in substance in that it assumes familiarity with the underlying topics and focuses on the relationships between the concepts rather than the concepts themselves. Good for practitioners I think.
Chapter 6, at page 163, says that direct sums can't be built directly from the Cartesian product because, instead of being ordered pairs with (a, b) != (b, a), the pairs are unordered, so E ∐ F = F ∐ E.
Actually I'm confused, because Wikipedia says "The direct sum of two abelian groups A and B is another abelian group A ⊕ B consisting of the ordered pairs (a, b) where a ∈ A and b ∈ B. To add ordered pairs, we define the sum (a, b) + (c, d) = (a + c, b + d)", and, while vector spaces have a little more structure than abelian groups, the definition should match up; and this definition seems just wrong, since it implies that direct sums are just Cartesian products!
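The quoted Wikipedia definition is easy to sketch in code. A minimal Python illustration (the helper name `ds_add` is mine, not from Wikipedia or the book):

```python
# Sketch of the external direct sum A ⊕ B per the quoted definition:
# ordered pairs with componentwise addition. As a *set* it is exactly
# the Cartesian product A × B -- the extra content is the operation.

def ds_add(x, y):
    """(a, b) + (c, d) = (a + c, b + d)"""
    (a, b), (c, d) = x, y
    return (a + c, b + d)

p = (1, 2)
q = (3, 4)
print(ds_add(p, q))   # (4, 6)

# Order matters: (1, 2) and (2, 1) are different elements in general.
print(p == (2, 1))    # False
```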
PS: in my understanding, direct sums in math are analogous to sum types in programming (like in Haskell, data X = One Float | Another Float Float, X is a direct sum between ℝ and ℝ²), or better yet, anonymous sum types like the polymorphic variants of OCaml.
From my experience, in the literature you will usually find the (external) direct sum of spaces defined as a Cartesian product. But as the comment before `Proposition 6.1` says, there is a bijection between the book's definition and the 'standard' definition, so it doesn't really matter as long as we are consistent with the choice.
(I guess this nonstandard choice was made because it behaves nicely from the point of view of internal sums.)
It is an unfortunate coincidence, but Haskell sum types don't correspond to sums of spaces. A sum type is a disjoint union of types, while a sum of (vector) spaces is isomorphic to the Cartesian product of those spaces (and therefore corresponds to product types in Haskell).
> It is an unfortunate coincidence, but Haskell sum types don't correspond to sums of spaces.
Since types on computers are discrete, it makes more sense to name them according to the fact that if |x| is the number of values that a variable of type x can take on, |x+y|=|x|+|y| and |x*y|=|x|*|y|.
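That cardinality argument can be checked directly with finite sets standing in for types. A minimal Python sketch (the set names `x` and `y` are just my placeholders):

```python
from itertools import product

# Finite "types" modeled as finite sets of values.
x = {False, True}            # |x| = 2
y = {"a", "b", "c"}          # |y| = 3

# Sum type: tagged (disjoint) union -- tag each value by its origin
# so overlapping values can't collide.
sum_xy = {(0, v) for v in x} | {(1, v) for v in y}

# Product type: Cartesian product of the value sets.
prod_xy = set(product(x, y))

assert len(sum_xy) == len(x) + len(y)    # |x + y| = |x| + |y| = 5
assert len(prod_xy) == len(x) * len(y)   # |x * y| = |x| * |y| = 6
```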
> It is an unfortunate coincidence, but Haskell sum types don't correspond to sums of spaces.
It depends on the perspective. Both are a form of coproduct (https://en.wikipedia.org/wiki/Coproduct), which is the disjoint union for types (and sets and topological spaces), while being the direct product for vector spaces. Confusingly, the vector space sum is a product of the underlying sets (but you can see that it is an addition when you look at the resulting dimensions). The product operation for vector spaces is the tensor product.
It's the unique thing that makes the commutative diagram from the Wikipedia article work. As in: you want V⊕W to be a vector space that contains the vector spaces V and W, and you want that, for any linear maps V → U and W → U, you get a linear map V⊕W → U which agrees on the inclusions.

Since these maps can be arbitrary, you know that dim(V⊕W) ≥ dim(V)+dim(W), since you can't have any colinearities between the images of V and W in V⊕W. Plus, you want the map V⊕W → U to be unique. This means that the images of V and W span V⊕W; otherwise you'd have another vector that you can send to arbitrary places.

This all means that V⊕W must be a vector space with dim(V⊕W) = dim(V)+dim(W). Now all you have to do is provide a concrete candidate for V⊕W, and the set of ordered pairs (v,w) ∈ V×W with coordinate-wise addition (and scalar multiplication) works.
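The dimension count at the end can be verified concretely for finite dimensions. A minimal Python sketch with V = R^2 and W = R^3, representing vectors as tuples (the helper `basis` is my own invention, not from the thread):

```python
# Concrete candidate for V ⊕ W: ordered pairs (v, w) with
# coordinate-wise operations. The inclusions of the bases of V and W
# give a basis of the pair space, so dim(V ⊕ W) = dim V + dim W.

def basis(dim):
    """Standard basis vectors of R^dim, as tuples."""
    return [tuple(1 if i == j else 0 for j in range(dim)) for i in range(dim)]

m, n = 2, 3
zero_v, zero_w = (0,) * m, (0,) * n

# Include V via v -> (v, 0) and W via w -> (0, w).
included = [(v, zero_w) for v in basis(m)] + [(zero_v, w) for w in basis(n)]

# Flattening each pair recovers the standard basis of R^(m+n),
# so the images of V and W span the direct sum.
flat = [v + w for (v, w) in included]
assert flat == basis(m + n)
print(len(included))   # 5 = dim V + dim W
```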
> From my experience, in literature you will usually found (external) direct sum of spaces defined as Cartesian product. But as comment before Proposition 6.1 says, there is a bijection between definition from book and 'standard' definition
I don't get it, does this mean that the direct sum and the cartesian product are in some sense equivalent?
There are different sum- and product-like operations for different types of things: For sets (and topological spaces), the sum operation is the disjoint union and the product is the Cartesian product. For vector spaces, the sum operation is the Cartesian product and the product operation is the tensor product.
If you think of finite dimensional vector spaces as functions on finite sets, then these all match up. i.e. if V (vector space) is the functions on S (set) and W (vector space) is the functions on T (set), then the functions on S⊔T are V⊕W because you just list the values of the functions on S and then on T. Further, you can see that functions on S×T are V⊗W since the natural basis for V⊗W consists of vectors e_s⊗e_t that pick out elements (s,t) ∈ S×T.
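That functions-on-finite-sets picture is easy to play with in code. A minimal Python sketch, with functions modeled as dicts (the set and function names `S`, `T`, `f`, `g` are my placeholders):

```python
from itertools import product

S = ["s1", "s2"]            # functions on S form V, so dim V = 2
T = ["t1", "t2", "t3"]      # functions on T form W, so dim W = 3

# A function on the disjoint union S ⊔ T is a pair of functions,
# i.e. an element of V ⊕ W: list the values on S, then on T.
f = {"s1": 1.0, "s2": 2.0}              # an element of V
g = {"t1": 3.0, "t2": 4.0, "t3": 5.0}   # an element of W
f_plus_g = [f[s] for s in S] + [g[t] for t in T]
assert len(f_plus_g) == len(S) + len(T)   # dim(V ⊕ W) = dim V + dim W

# A rank-one function on S × T, h(s, t) = f(s) * g(t), is an
# elementary tensor; such tensors span V ⊗ W.
f_tensor_g = {(s, t): f[s] * g[t] for s, t in product(S, T)}
assert len(f_tensor_g) == len(S) * len(T)  # dim(V ⊗ W) = dim V * dim W
```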
And actually.. since a vector space has an underlying set, it seems to me that to do a direct sum on a vector space, you must first do a direct sum on the set (that is, do a disjoint union), and then do other things to map the rest of the structure of a vector space into the sum.
So, somehow, a disjoint union (of the underlying set) becomes a cartesian product (of the whole vector space)?
Also: the direct sum of an abelian group is also a cartesian product, right? Of any algebraic structure, not only vectors? Why are sets so special to have a different direct sum than algebraic structures built on top of sets?
(Is there somewhere to read about this to gain intuition?)
> And actually.. since a vector space has an underlying set, it seems to me that to do a direct sum on a vector space, you must first do a direct sum on the set (that is, do a disjoint union), and then do other things to map the rest of the structure of a vector space into the sum.
No.
> So, somehow, a disjoint union (of the underlying set) becomes a cartesian product (of the whole vector space)?
No. If anything it is the other way: the cartesian product of the underlying sets becomes the (equivalent of) "disjoint union" of the vector spaces.
> (Is there somewhere to read about this to gain intuition?)
I don't think you want that. But there definitely are some category theory textbooks and introductory PDFs. E.g. Tom Leinster's Basic Category Theory is a textbook, and it aims to give you intuition: https://arxiv.org/abs/1612.09375
No, there is nothing all that specific about vector spaces. This is true about general algebraic structures.
What you want to care about here is preserving structures: How would you define addition on a disjoint union? e.g. If you have V⊔W you can add two things in V and you can add two things in W, but what about something in V with something in W? Which zero vector is "the" zero vector?
If you don't have all those properties that make a vector space, then it's just a set. Then, yes, if for some reason you want to do the sum of vector spaces V and W as sets, you will get the disjoint union of these sets. That would be a pretty odd thing to do though; in that case, why are they vector spaces?
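The structure-preservation problem above can be made concrete. A minimal Python sketch of why the disjoint union fails to carry an addition while pairs succeed (the tag representation and helper names are my own, not from the thread):

```python
# On V ⊔ W you can only add within each summand: there is no way to
# add an element of V to an element of W, and there are two candidate
# zero vectors. With ordered pairs, addition is total and
# (0_V, 0_W) is the unique zero.

v = ("V", (1.0, 2.0))          # tagged element of V inside V ⊔ W
w = ("W", (3.0, 4.0, 5.0))     # tagged element of W inside V ⊔ W

def add_disjoint(x, y):
    tag_x, vec_x = x
    tag_y, vec_y = y
    if tag_x != tag_y:
        raise TypeError("no addition across the summands of V ⊔ W")
    return (tag_x, tuple(a + b for a, b in zip(vec_x, vec_y)))

def add_pairs(x, y):
    (v1, w1), (v2, w2) = x, y
    return (tuple(a + b for a, b in zip(v1, v2)),
            tuple(a + b for a, b in zip(w1, w2)))

# add_disjoint(v, w) raises TypeError; the pair version is total:
p = ((1.0, 2.0), (3.0, 4.0, 5.0))
zero = ((0.0, 0.0), (0.0, 0.0, 0.0))
assert add_pairs(p, zero) == p
```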
> For vector spaces, the sum operation is the Cartesian product and the product operation is the tensor product.
For vector spaces the sum operation is the Cartesian product, and the product operation is also the Cartesian product (at least for finitely many vector spaces).
I believe that this UPenn text doesn't use the two-variable direct sum operation at all (which is not commutative), but abuses its notation to denote the direct sum of an indexed family. (But then he can't talk about E ∐ F, which was his original goal :/)
Yes, this looks like an error. You don't want unordered pairs, which do not form a vector space, but rather an isomorphism under permutation of the order of the terms. Roughly speaking, V⊕W is not the same as W⊕V, but if you swap the order everywhere then the math won't notice.
I think it's correct, and it also seems pretty deliberate. I also think that his way is inferior to the common way (I can't think of a single advantage of it, really).
Same here. I’ve been having fun learning a lot of math from first principles in recent years but it’s 1000 times easier when presented in notation/framework I already have internalized from code. I’ve found some of the work in type theory a solid bridge between these worlds as a sort of Rosetta Stone. Works like HoTT are challenging and require (me) to keep learning other areas along the way to fill in knowledge gaps but it provides me with a common framework to grok math concepts in line with computational intuitions without sacrificing rigor.
Remotely related - If you want to get a feel for topology optimization, the section of Topology Optimization at my university made a couple of cool little apps for exploring the concept in relation to structural mechanics: https://www.topopt.mek.dtu.dk/apps-and-software.