I wish I could upvote this twice. This is the best basic introduction to concepts in functional analysis that I've seen. Another great overview that goes deeper into the math is [1].
Another fantastic application that the website doesn't mention is the composition / Koopman operator. In control theory (e.g. autonomous drones, cars, robot arms, etc.), most real-world systems are described by nonlinear dynamics which are very difficult to work with (e.g. safety/stability guarantees, optimizing over forward horizons using NMPC, state estimation, etc.) The Koopman operator however gives a globally relevant linear approximation of non-linear systems. In other words, you can treat a nonlinear system as a linear system with fairly high accuracy. This greatly simplifies control and estimation from a computational perspective. You can also learn these linearizations from data. Steve Brunton has some good materials on Koopman theory [2][3], and there are some great applications to control of systems such as soft robots [4].
My first thought was about its striking conceptual similarity to Fourier transforms. Truly fascinating. Going to explore this a bit more. Thanks for sharing.
I am so thankful for Steve Brunton's content. Had I been able to access his content a decade ago during my M.Sc, I would have probably pursued a PhD out of the passion and quality he brings to this domain - instead I just felt done with academia, strugglign to find grants, reading yet again terse books all alone, and just moved on.
Solid YouTube educators are creating immense future opportunities and we'll all benefit from that. Control theory connects the dots between all sorts of things and can be a joy for those of us who love seeing patterns and structures everywhere (iirc Steve has a recent video about control theory for social models also).
The realization that functions can be treated as elements in an abstract vector space (with infinitely many dimensions) is a turning point in the history of mathematics that led to the emergence of the sub-field known as functional analysis.
The significance of this paradigm shift is that it allowed mathematicians to apply some of the geometric intuition developed from the study of finite-dimensional spaces (such as the 3D Euclidean space) to difficult questions involving functions, such as the existence of solutions to certain differential equations.
The history of this change of perspective is absolutely fascinating and can be traced back to the end of the 19th century and beginning of the 20th century. At the time, work on axiomatic foundations of mathematics was driving a systematization of the study of mathematical objects by capturing their structure with a concise list of axioms. This is for example how the concept of an abstract vector space was born, encompassing not only Euclidean spaces but also infinite-dimensional spaces of functions.
An early reference already demonstrating this change of perspective, albeit in a primitive form, is a memoir by Vito Volterra from 1889 [1]. The PhD thesis of Maurice Fréchet from 1906 [2] is arguably the work that was most influential in crystalizing the new paradigm and presenting it in a modern form that served as a key reference for the first half of the 19th century. Of course, these are only two among a multitude of works around that time. Looking at later developments in the 19th century, it is hard not to also mention the book by Stefan Banach from 1932 [3].
My friend, you don't even need it to be in vector space for functional analysis. Truly what is needed is just an inner product. I will grant you the inner product must be linear and hence in a vector space.
I dont understand the point of this comment. You obviously need it to be a vector space before you can define an inner product. Inner product spaces are very special examples of vector spaces.
I agree. The GP comment contains some inaccuracies: most of the spaces of functions considered in functional analysis do not have an inner product defined on them, but are still vector spaces. The existence of an inner product presupposes a vector space structure, but the converse is not true…
Perhaps the most famous example is provided by the Lp spaces [1] consisting of functions whose pth power is absolutely integrable. For p≥1, these spaces are Banach spaces (complete normed spaces) but it is only when p=2 that the norm is associated with an inner product.
Not saying that the vector space bit isn't neat, but it's called functional analysis because you can take limits of various forms and define (semi-) continuity, have completions of spaces, and all that has nice properties.
So to me, a crucial thing is that these vector spaces are indeed topological.
I always liked this viewpoint a lot. I'm enjoying with abandon some dusty lectures that Vito Volterra gave in Madrid on differential and integrodifferential equations, while helping also to create Functional Analysis (a Functional being the analogue of a dual vector). He is constantly exploiting this analogy method from finite variable constructions to infinite, also uncountable variables. Even up to showing some embarrassment of being too repetitive with the idea! People in teaching should join and take a peek.
Givental uses this viewpoint too in his differential equations class notes. It may soothe students who notice a disjunction between linear independence of functions and that of vectors.
I have never seen these index functions used as a transfinite basis for a vector space. And it seems like the function is not a limit point of finite sequences of basis functions, but some weird transfinite sum with mostly zero entries? Clearly there is no Fourier transformation possible on all functions? I think diagonalization methods would be easy to disprove any useful result.
Even Hilbert spaces are usually just indexed by the ints. And such a basis gives you zero continuity or differential conditions. All the functional analysis I have seen uses some continuity conditions and has some countable basis. Other than that, it is a very useful perspective on functions, and kind of the required start to understanding quantum mechanics formalism.
This is my major gripe with the article. While useful for intuition, that "..." after the sum is mathematically meaningless. This is also a common issue with quantum mechanics as taught at an introductory level. But it seems the article, similarly to intro QM courses, is more about motivating functional analysis concepts, which is useful as exposition even if not rigorous.
The article is some summary of a book with chapters. At some point they limit the space to the subspace of functions periodic over (b-a) and change the basis (with proof) from dirac delta to sines of frequency 2pi*k/(b-a) [with k in N]. In this subspace all functions have Fourier transformations.
For something to be a basis each element needs to be the finite linear combination. Of course there are notions for different kinds of choices of basis, where you have countable linear combinations.
The article (maybe for good reason) completely ignores the (usually quite hard) choice in functional analysis which vectorspace to use.
And the one outlined here, where functions are defined pointwise, is amost always the most useless one to consider. Although I get that it was done to teach the general outline of the subject, which is quite valuable on its own.
>Clearly there is no Fourier transformation possible on all functions?
You don't even get a useful metric for such a space.
This touches on the actual definition of a function, which is a mapping between sets where every element of the first set maps to exactly one element of the second set. The problem with using vectors is that vectors aren't as general as sets, so there's functions that can't be expressed using vectors. For example, vectors can't be used to handle undefined values or non-numeric elements.
Functions can’t be used to handle undefined values either. A function is a mapping, f, from one set to another such that each element of the originating set gets mapped to one and only one element of the target set. This, by definition, requires that all values in the originating set get mapped to something in the target set. So there are no undefined values as such.
Function spaces can’t always be viewed as vector spaces because there may be no concept of addition of functions or a concept of scalar multiplication on the functions that behaves well with whatever notion of addition the functions satisfy.
That's not the definition of a function, what you're describing would be called a bijective function. A simple function that is not bijective and maps to two distinct values would be sqrt(x)
You may be confused because a bijective function must be one-to-one, while in general, a function may be many-to-one. However, the standard definition of a function excludes one-to-many mappings.
> A function f from a set X to a set Y, denoted f: X -> Y, is a relation from X, the domain, to Y, the co-domain, that satisfies two properties: (1) every element in X is related to some element in Y, and (2) no element in X is related to more than one element in Y.
Susanna S. Epp. 2010. Discrete Mathematics with Applications (4th. ed.). p.384
Are you defining sqrt(x²) to be +x and -x? If you are, then sqrt(x) is indeed not a function.
You could make it a function by defining sqrt : N -> Z×Z, so sqrt(4) = (-2,2), and we're back to the property that any value in the domain (the natural numbers) is mapped to a single value in the co-domain (a unique pair of integers). Of course, this is not the traditional sqrt function that people normally use, since it doesn't obey the property (sqrt(x))² = x (since obviously taking the square of a pair of integers will not return an integer, for usual definitions of squaring).
Note that a bijective function has both this property and the opposite, that for any value in the co-domain there is also a unique value in the domain that maps to it. Sqrt, both in the traditional definition where it only returns positive roots, and in this definition where it returns a pair of roots, is actually also bijective. Squaring is a non-bijective function, though.
The codomain should not be Z×Z — firstly, the square root is usually not an integer, but more importantly, there is not a canonical ordering of the pairs you get, especially if we allow negative/complex inputs. The map we really want is
sqrt: C -> (C×C)/~,
where C is the complex numbers and ~ is the equivalence relation generated by (a,b) ~ (b,a) for all a,b in C.
You're right, I should have said sqrt : N -> R×R. I don't think the order is a problem, we can define the function such that sqrt(x)={a | a²=x}. Sets are inherently unordered, {-2,2} = {2,-2}.
Of course, this is not a surjective function, but it is a function nevertheless.
The issue with sets is that they don’t capture the notion of multiplicity. One would like to say sqrt(0) = {0,0}, but {0,0} is just equal to {0} as a set. The CxC/~ formalism captures the possibility of multiplicity as well as the fact that the output always has size 2.
As an aside, the notation R x R means ordered pairs of reals. So for your definition, you would instead want 2^R as the codomain (i.e. the set of subsets of R).
As others comments have pointed out - you may want to brush up on those definitions. Correct semantics are crucial. They're a form of compression with no/low error correction.
Any claims of injective or surjective properties (and by extension bijection) are moot without the domain and co-domain being specified.
To map math ↦ dev semantics: functions need to be pure, deterministic and declared with strong types. The properties of square root in particular will vary wildly based on what those types are.
That is the definition of a function, but it's purely a formalism. In reality, and in theoretical math, we are frequently interested in functions that generate multiple values, and it's very easy to represent those in terms of the formal definition:
1. When we want to consider both positive and negative square roots, we can just say that instead of the function sqrt(x): ℝ ⟶ ℝ which always gives a nonnegative real number, we are using a function full_sqrt(x): ℝ ⟶ ℝ² which gives an ordered pair of real numbers.
2. We mostly define arcsine as yielding a single value between -pi/2 and pi/2, but it's just as easy to define the equivalence relation "x ~ y when sin(x) = sin(y)" and then say that the unique value given by the arcsine function is one of the equivalence classes which that relation induces over the real numbers -- or that it is the subset of the reals included in such an equivalence class.
3. If we are concerned about graphing a multi-valued function, we can do that too; for your parabola, we can define the three-dimensional function "z(x,y): ℝ² ⟶ {0, 1} = 1 when x = y² and 0 otherwise" and the graph of this function is the parabola you want.
So you're wrong about the definition of a function, except that you're right in spirit; the way we normally think about functions, as opposed to the way we define them, definitely allows them to yield multiple values for one input.
You're catastrophically wrong about bijective functions; those are an important concept and your definition is not related to the actual one. A bijective function is one for which all values in the target set are reached by exactly one, no more and no less, value in the source set.
We don’t allow multiple values for a given input of a function. Ever. This is never done. f(x) = x^2 is not invertible as a real valued function of a real variable. On the restricted domain of [0, infinity) it is invertible and that inverse is sqrt(x). If you want to talk about negative roots you use the function -1*sqrt(x). You can construct a new function as you did to encapsulate this fact but one never has a function with multiple values for a given input. I don’t know of any mathematician that thinks of functions as allowing them to have multiple values for a given input. You will not find this in the literature.
Don't be ridiculous. It's a common need, it's easily satisfied, and so it is commonly done. Compare the concept of the "inverse element" of a value given by an invertible function with the "inverse image" of a value given by a non-invertible function. Of course the function that gives you inverse images is nothing more than an inverse function that is allowed to give multiple values. Why do we do that? Because our need to invert functions has nothing to do with whether those functions are one-to-one, and because in most contexts it isn't actually important that functions yield a single value. It's a formalism.
There is no high principle that says "collecting multiple values and giving them a collective name that implies they are a single value is metaphysically superior to collecting multiple values and admitting that they are multiple values".
In mathematics (and in programming) it is essential to use non-ambiguous names, i.e. names whose meaning is certain when you see them independently of any context, without having to add an explanation of what is meant by them.
The concept of a special kind of mapping between two sets, where to any element of the first set corresponds a unique element of the second set is very important and it needs a special name.
The choice of the names is arbitrary and one could use for instance the term "univocal function" to mean a mapping like described above and "function" for any mapping between two sets.
Nevertheless, by far the most widespread convention in mathematics is to use the term "relation" for any mapping between two sets and the term "function" only for those relations where to any element of the first set corresponds a unique element of the second set.
There exists no reason for not following this convention, from which it also results that an invertible function is a function where for any element of the second set corresponds a unique element of the first set, so this convention also provides a simple meaningful name for another important concept that needs a special name.
The functions in programming languages that return multiple values, unless they return partially or totally random values (in which case they are not functions of only the input arguments, but also of an internal state or of time), are just functions that return values which belong to the set that is the Cartesian product of the types of the individual values. So the name "function" is usually correct in the mathematical sense even for such functions. If they had not been functions, the programmer would not have known what values to return, when writing them.
Moreover, I disagree that in most contexts when you want to invert a function "it isn't actually important that functions yield a single value".
In the overwhelming majority of the cases that appear in engineering and science when you want to solve equations a.k.a. to invert functions, you want to obtain a unique solution that can be directly implemented in practice. Whenever you cannot obtain a unique solution, you need to add extra criteria that allow the selection of a unique solution that is usable. Those extra criteria are actually equivalent with transforming the original non-invertible function into a function that can be inverted.
There is no high principle other than the agreed upon definition of what a function is.
Of course the function that gives you inverse images is nothing more than an inverse function that is allowed to give multiple values.
You can’t use the word function for something that isn’t a function. It makes no sense to say inverse function that is allowed to give multiple values. Hence the need for terms like pre-image of a set.
In all contexts it is important that a function yield a single value for a given input because that is the definition of the word.
Seems like a useful formalism though, at least coming from a programming background.
Maybe some kind of "it's all of them at once" mental model is useful in math, but object oriented programming gives you a hierarchal mindset, everything is in a container of some sort, so multiple return values having a container like a set makes perfect sense and just a bunch of loose values isn't a very familiar concept.
But perhaps if you're actually doing math, things are different?
It's useful in many places and less useful in others.
Over in the programming world, Common Lisp allows you to return multiple values without wrapping them in a container. If you want the primary value, you just treat the function as if it returned one value. If you want additional return values, you use multiple-value-bind.
(I believe the general idea is along these lines: your function does a certain amount of computation, producing a set of related values. Most of the time, the caller will be interested in just one of those, which you return as the primary value. But some of the time, the caller will be interested in more than just that one value, and you have to compute the secondary values whether the caller wants them or not, so you return those too.)
For a different example of formalism in math, it is conventional to say that there are two boolean logical operators, negation and implication. You can still write about conjunction and disjunction, but everyone understands that when you write "p and q are both true", what you really meant to write was "it isn't the case that the truth of p implies the falsity of q". The point of the formalism is that you can do your proofs by considering negation and implication and then ignoring everything else. (It isn't conventional to say that there's just one logical operator NAND. You might think that would be even better, but the effort saved by only considering how one operation works ends up being less than the extra effort involved in doing proofs about NAND.)
The situation with functions is more or less the same thing; at many points we want to rely on the assumption that when a = b, f(a) = f(b). So we define functions that way, and functions that give multiple values have to be treated as giving a single composite value instead. But in a context where you have some value a and what you want to know is "what is f(a)?", the fact that the answer may consist of multiple values won't bother you.
Now, I have never seen someone take the position that boolean conjunction and disjunction don't exist as concepts just because that is how logic is normally defined, but the analogous position seems to be more popular for multiple-valued functions.
I can think of two simple, practical answers to your query:
The expressive power of single value functions is very powerful and the constraint is not necessarily restrictive but may even drive a stronger analysis. (Where does this function have multiple returns? Is it for the whole domain? Etc)
By contrast the expressive power of negation and implication are relatively low given very intuitive and well defined alternatives exist.
Second, there already exist good enough paradigms for dealing with multi-valued functions. Splitting the function up into multiple functions, mapping to an ordered pair, etc.
Defining a function the traditional way is more than just notational convenience. The single value constraint allows for many simplifying assumptions, enough that is worth to pay the cost when dealing with relations that you want to talk about in functional ways
A multivalued function is not a function. Hence the need to call it something else. In complex analysis you have branch cuts and things like that. We don’t call something a function that doesn’t have a unique output for a given input.
EDIT: Complex analysis was developed well before the formal definition of functions, relations, set theory. As such there are legacy terms in use. Mathematics has a lot of abuse of notation. For instance x=2 can mean assignment or an equation to solve. It depends on context.
In each branch of a multi-function the mapping is a function and, as a convenience, a single term “multi-function” is used to encapsulate all this. It cuts out the verbosity. A multi-function is not a function though.
My pedantry in this comes from the blog post. It was poorly written and left out some, from a mathematical point of view, important details. For instance the author uses a Schauder basis and has an uncountable sum at one point whilst ignoring the need distinguish between a Hamel basis and Schauder basis or the need to use a linear ordering in the reals to properly make sense of the uncountable sum.
A set-valued function is a perfectly valid kind of function, and a multivalued function in complex analysis is just a set-valued function with certain continuity and other restrictions.
If set up that way then yes it meets the definition of a function. I’m not a complex analyst but I imagine they think of a function from C to C on a given branch rather than multi-function being a function from C to the countable product of C (or however one wants to formally define the set value). I think this because they tend to care about analytic properties and saying function from C to countable product of C includes far too many functions. Saying multifunction will immediately make one think of branch cuts.
They don’t think of Log(z) as a set of values. At least I don’t. I assume one means the principal branch when writing Log(z) unless specifically stated otherwise. Maybe in their mind they do think of an infinite set when seeing Log(z). I’m skeptical of this possibility though.
From the amount of complex analysis I've seen, I think I agree with you.
The insight behind multi-valued functions seems to be that for certain types of functions that seem unambiguous in the real case (roots, logarithms, ...) there's no clear candidate for which one is "the" logarithm function etc. So theorems look more like "let l be a logarithm function".
If you actually want to do complex analysis, it would not be very useful to have a vector valued function (even if that's technically possible) for logarithm etc. because most of the theory is developed for (in particular, differentiable) functions C -> C. So you'd rather just pick a branch.
The point GP was making is that you can also define the function sqrt(x) as a function from reals to pairs of reals, and say that the sqrt(4) is the pair (-2, 2). With a proper definition of squaring a pair of reals, you could even have this be the inverse of the square function (we could define square((x, y)) = x²-xy+y², so square((-2,2))=4-4+4=4).
Of course, these are still functions which take exactlt one element in their input set (the reals for sqrt, or pairs of reals for square) and return for each a single element in their output sets (the pairs of reals for sqrt, or the reals for square).
> That is the definition of a function, but it's purely a formalism...
Isn’t the function definition meant to be interpreted as “maps consistently to exactly one element of the codomain”? So an ordered pair of R^2 is still one element of R^2
It seems the parent has just mixed up the domain and codomain, because under that assumption he would be right about both the definition and bijectivity
I can't tell what you mean by your emphasis on the word "consistently". Functions are not stochastic; f(y) is f(y) regardless of how many times you ask what f(y) is.
The formal definition guarantees that whenever a = b, f(a) = f(b). You use it when you need that guarantee.
An ordered pair drawn from ℝ² is in some sense a single value. In another sense, it is two values. Which way you want to think about it depends on what you're going to do with it; if you're thinking about square roots of real numbers, it will be more useful to think of it as two values.
> It seems the parent has just mixed up the domain and codomain, because under that assumption he would be right about both the definition and bijectivity
He still wouldn't be right about bijectivity; you also need the assumption that a function is defined over its entire domain.
>is in some sense a single value. In another sense, it is two values.
What I tryed to say is that the sense in which “exactly one element” is used in the definition of function is inclusive of codomain being R^n, so it confused me why you would provide a function that has a codomain of R^2 as something that suggests deviation from the formalism. It just seemed misleading to phrase it that way
I thought consistently would convey the idea I had, nevermind if it doesn’t
My bad about bijectivity, I see it now, you’re right
I feel like you're being technically correct but in a kind of useless way.
Yes, nothing stops you from defining the square root function such that it returns an element of R^2. It's perfectly legal mathematics, functions can map arbitrary domains to arbitrary codomains after all.
But I've never seen it used anywhere. Maybe there is an area where such a definition makes sense... but regular real (or complex) analysis doesn't really normally do such a kind of thing.
The problem is that you want your typical functions like square root (and logarithms, which in the complex case also aren't unique) to be functions R -> R (or C -> C). Otherwise you can't compose them and then everything is kinda weird. In particular, you don't get the vector space structure that the article is all about.
When working in a structure, one often doesn't want to leave that structure and go to another one, but keep the same structure (or at least one that is related in an interesting way, such as a quotient structure). That just leads to more interesting mathematics.
Hi thaumasiotes,
Mathematician here, and somewhow interested in this discussion. I am curious of your definition of full_sqrt, what would be the value of full_sqrt(-3) ?
My pedant nature strikes but I think your definition of bijective can also be confusing. Did you mean for each value in the target set rather than all values in the target set?
A bijective function f: A ⟶ B satisfies ∀b∈B ∃a∈A ∀x∈A (f(a) = b ∧ (f(x) = b ⟶ x = a)).
(In fact, this definition is incomplete: it must also be the case that the function is defined over the entire set A; each value in A maps to a value in B.† This is a common assumption to make about functions, but since I violate it earlier by talking about sqrt(x): ℝ ⟶ ℝ, I really should make it explicit. The definition I gave is motivated by the synonymous term "invertible function".)
"All values in the target set have property X" and "Each value in the target set has property X" are exactly equivalent English-language statements.
† "f: A ⟶ B satisfies ∀b∈B ∃a∈A ∀x∈A (f(a) = b ∧ f(x) ∈ B ∧ (f(x) = b ⟶ x = a))", but while that technically works, it feels kludgy.
Right. Specifically, the codomain needs to be an abelian group. And even that is not sufficient as one needs also an action of the field of scalars on that codomain with the right properties.
This looks absolutely fantastic and I want to go over it in more detail later. You'll go over most of this in a typical physics degree. But like a good movie or book, the concepts are interesting enough to go over more than once.
I will say that as a programmer some of these techniques look a lot like hacks. You start off with a perfectly reasonable integer index. And then you realize you can generalize the index, effectively cramming more information into the index than was originally intended. The really shocking thing is that these stupid, abusive ideas seem to always lead to something really insightful and useful down the road. It's a little bit magical.
Plug for the Funsor library, written by Eli Bingham and me for use in the Pyro and NumPyro probabilistic programming languages. We tried to take the "functions are tensors" perspective and make a numpy-like library for functions, aimed mostly at the log-density functions of probability distributions.
I think the article has it backwards and provides bad intuition. It is not input that makes functions form a vector space, it is the output. Functions from any set X to a field F can form a vector space, even if X is unordered.
This is a fascinating take as far as I can follow it, which unfortunately is not that far. But does any of this formal logic help with deriving a function that describes a vector? Because it seems like the greatest inefficiencies and bottlenecks in big data analysis e.g. training networks still boils down to how to find functions that approximate output comparable to expected vectors, whether that's done by symbolic regression or layers of transformations. It would be "magic" if you could operate only on the vectors-as-functions without needing to distill or compress the relationships between their inputs and outputs somehow.
Sure. You can Fourier transform a vector, drop some number of the least contributing terms (frequencies) and store only the remaining coefficients. Which is essentially the basic idea behind MP3 and JPEG compression. You're trading space for time, of course, as now to get an approximation of the original vector you have to apply reverse Fourier transform first.
You may be thinking of a vector as a concrete collection of values, like a vector in R^3: [x y z]. This piece is about abstract vector spaces, their properties (vector addition, scalar multiplication, etc.) and specifically how functions meet the definition, giving you vector spaces of functions (function spaces).
So the idea is that if you two functions, f and g, and a scalar b, then you can do stuff like:
f + g = g + f
b(f + g) = bf + bg
The existence of (-f) so you have:
f + (-f) = 0
Where 0 is the zero function (which also must exist for function spaces).
I was reading here earlier today about the naming of the constant for light as c, and I had a question which I was too embarrassed to ask. It is this: In e=mc^2, what are the units, and if the units aren't defined and it's just a relationship, then why specify c^2? What's the point of squaring a constant, since it's just another constant?
Not that I understand a damn thing about that equation. But the idea that two functions can be thought of operating additively on a vector - or space - seems... trivial.
This is where I fear I am too stupid to understand the value of this.
It's not tautological — to pick one of the properties, you can conceive of spaces of functions where there is no -f to a function f. For example, consider the space of all positive-valued real functions.
Consider otherwise functions which take colours and output letters or the alphabet. Letters of the alphabet can't be added or subtracted, so there's no vector space structure on that.
On the flip side, vector spaces can be defined not just on real numbers, but complex numbers as well, or even other sets — specifically, any "field", that is, any set with addition, subtraction, multiplication, and division defined on it in a self-consistent way. There are even finite fields; vector spaces over three are relevant in cryptography.
If these rules seem silly and abstract to you (shouldn't a function just be a function?!), well, that's mathematics! By elucidating the very specific conditions under which results hold, and abstracting away all the irrelevant details, you end up with results of incredible generality.
But the idea that two functions can be thought of operating additively on a vector - or space - seems... trivial.
No, the functions are not operating on a vector, the functions themselves are vectors, which means you can do things like define a linear transformation L and then you have things like this:
L(af + bg) = aL(f) + bL(g)
You may also define a norm on a function space (such as an L^P space [1]), ||f||_p, which maps functions to non-negative real numbers and obeys the properties of norms that we expect, such as the triangle inequality:
Thanks. I'm trying to understand this. So take a function that returns a random shuffled deck of cards with 52! possible real number vector outputs. What is the generalized insight from treating this function itself as a vector space... is there a shortcut to monte carlo-ing a million random shuffles if you can "divide" that function by one which produces a straight flush?
> So take a function that returns a random shuffled deck of cards with 52! possible real number vector outputs.
that is not a function in the (mathematical) sense that the article is talking about. a function is a mapping from a set of inputs to a set of outputs, and the same input will always map to the same output. (in programming terms it's what you would call a "pure function")
not quite; in the most general sense the arguments to and result of a function (the 'domain' and 'range' in mathematical jargon) need not be numbers but any set of mathematical objects. the heavy work went into proving properties of these generalised functions that were universally true, and showing that they were isomorphic to structures built up in other branches of mathematics.
as a side note, one very important technique/idea in mathematics (in general, not just in the area of functional analysis) is describing something in terms of a set of properties that is both as general as possible and as minimal as possible. for instance numbers can be added, subtracted, multiplied and divided, with "obvious" real-world interpretations. mathematicians then asked themselves what properties exactly the numbers had to possess in order for those operations to be defined, and then they proceeded to find other classes of mathematical objects that also had those properties, and suddenly we were able to "add" and "multiply" things that had no obvious physical interpretation for those operations. but since their structure was mapped to the structure of the numbers, those operations could be mechanically defined over them, and you had all sorts of mathematical tools at your disposal.
here a similar thing was done with functions. there had been a lot of work put into studying the operations you could do on "vector spaces", a mathematical structure that generalised the notion of a vector as a collection of numbers. then mathematicians noticed that if you took the minimal collection of properties something needed to have in order to be a vector space, functions satisfied all those properties. and voila - everything that you could prove about vector spaces (and again, it was a whole lot) was suddenly applicable to functions as well.
(why some of this seems a bit tautological is that it also follows the properties of the real numbers, and even non-mathematicians have had a lot of intuition built up about how numbers behave. but it is by no means guaranteed that every mathematical construct will have these same properties.)
yes it was in total good faith. soooo... the metric system just happens to perfectly conform to meters per second squared times kilograms equaling joules? That seems... mind-blowing since it was invented before kilograms and joules would have been interchangeable.. ??.
The kinetic energy of a body, in purely Newtonian mechanics, is Ec = (m × v²) / 2. Since energy is measured in J, mass in kg, and speed in m/s, it follows nicely that J = kg × m² / s². Or, we can also take the potential energy of a body at some height h above the Earth: Ep = m × h × g, where g is the gravitational acceleration, ~9.8m/s², and we get the same units.
It's really not in any way surprising - this is basically the definition of the Joule. The link between units of kinetic energy and units of thermal energy is actually more surprising.
The surprising thing about E=mc² is that it gives a definition of energy for a completely stationary body outside any external field.
One way to look at it is actually that this is simply the kinetic energy of the body, and that all "stationary" bodies are in fact moving with speed c on the t coordinate in 4D space-time ("a body which is not moving in space at all is moving with speed c towards the future"). [Note that of course speeds are all relative to some reference frame.]
If you express it in a different unit system all E, m, and c will take different numerical values, but the relationship will still be true.
So there’s nothing special about the metric system. When we want to discuss this kind of relationship without reference to human convention we talk about a quantity’s dimension (not geometric like 3D). A Meter and a foot both have dimensions of length. c has units of length/time. 1 kilogram and 1 gram both have units of mass. And so on.
Yep. This was going to be the next phase: there isn't anything special about the SI system, but it is consistent. The poster didn't seem prepared for "or any other velocity units you want".
For the original poster, you could equally use c in miles per hour, mass in "pounds", and get whatever that produces for energy (I'm sure there's a name for it).
This is also why Imperial units produce such amusing things as Foot-Pounds (for torque). The math all works out in the end, but you get some amusing numbers along the way.
Kilograms and joules have been interchangable (well, they've had a defined relationship) for a long time though, you can calculate work in a completely Newtonian context and see that the energy requires is linearly related to the mass and quadratically to velocity (like with a spring).
That the direct conversion from mass to energy follows the same shape isn't really surprising, it sort of has to.
That said, the joule was only explicitly defined as kgm/s^2 in 1946 (or 1935), after Einstein and nuclear physics.
If the data set is large enough then there is no way to represent it as a functional relationship between finite dimensional vector spaces. In fact, this problem already is visible in existing large neural networks because they can only work with data that conforms to the dimensional constraints of the input space. It's why image transformers trained on NxM images don't work on any other grid size.
ah. I mean, the dataset can be infinitely large and still be
covered perfectly by a function, if it was generated from a function. That's why absurd functions can be found that overfit for some subset of basic real world results. (Ask anyone who built a funky physics system for a videogame in the 90s). I think what's more interesting is the question of what that essential function is for a stop sign or a pedestrian, as opposed to the function for finding it in a 512 grid or something.
I suspect the idea is that there can be many functions that fit a particular vector, because there are more functions than vectors, but I nay be wrong.
I haven't read the article yet, but I've known that functions are (infinite) vectors for some years.
However, there's something that has been bothering me: most of my understanding of linear algebra comes from 2D and 3D spaces, and then in different context of machine learning, datasets that have from tens to even millions of dimensions.
In the former, geometric context, the connection between the dimensions are clear: they are orthogonal, but conceptually exactly similar. They are just a 90 degree rotation away from each other.
On the other hand, in ML datasets some dimensions are conceptually very similar, and some are totally different. Some are correlated (nearby pixels of an image), some are not, but represent the same unit of quality, and some represent totally different, unrelated things. And as we go toward the mid-layer representations, it becomes very unclear and fuzzy what they represent.
In the case of functions, there's usually a clear connection between the dimensions: they are of the same unit (the domain and the image (the outputs and inputs) of the function are sets, and those tend to be made of similar-ish – or same type of – things, at least in well-behaved math). And there's often a similarity metric between the elements of the sets.
The 2D/3D linear algebra that I know doesn't bother with the "connectedness" of the input dimensions; it only cares the connections from the inputs to the outputs. But surely there is a lot of interesting math that is concerned with the connectedness of the input and output spaces themselves, in context of there still existing a mapping between the input and output. What is that field of math called? What are the interesting results, theorems and such? I love learning more so I'm kind of just looking for some pointers/keywords.
This is bothering me a little as well. Perhaps someone more knowledgeable could guide us into the light.
As a concrete example, I am toying with a reinforcement learning to solve the game of Yahtzee. When trying to formulate the state space, I have to lump in together the states of the dice (1..6), the current state of the score card (13 boolean values, and 13 integer scores in the range 0..50), and the current turn (first attempt, second attempt). This could be formulated as a 5+13+13+2 = 34 dimensional vector, but I might as well use one-hot encoding for some aspects, and even go all the way up to 56+13+56+13+30+30+1+1+1+1+30 = 180 dimensions. Which formulation would be the most "natural"?
Then again, what may seem (un)natural to a human, might be the other way around for a neural network.
(And, yes, I am aware of papers out there describing how to solve Yahtzee with reinforcement learning, but I'm still at a stage where avoiding those seems to increase the amount of fun.)
In general, you want to do one-hot encoding whenever your number doesn’t really represent a quantity.
For example, for the current state of a rolled die in Yahtzee, ‘2’ mostly (see below) isn’t smaller than ‘3’, isn’t between ‘1’ and ‘3’, and isn’t closer to ‘3’ than it is to ‘4’.
So, ‘2’ isn’t a quantity there; it’s just one of the markings on the dice.
About that ‘mostly’: the straights require the ability to order the values on the dice, but even then, it’s just an order, not anything about them being equidistant, or about (2,6) showing more eyes than (3,4) (again, there’s an exception there, with the ‘Chance’ score)
I am not sure I fully understand your question, but I’ll try to answer the question I understood. Very generally, every field of math I know concerns itself with mappings and what they do (e.g. matrixes are mappings, as are metrics and norms). The differences between the different areas of math (oversimplified) is that they concern themselves with different kinds of mappings. The area I specialized in was partial differential equations, here the mappings of interest are solutions to partial differential equations (usually generalized functions). One of the important questions to ask is of these functions are continuous, differentiable, or any other of the many variations. All of these properties are defined over how a function maps an input to an output.
BTW: There’s usually an infinite number of solutions to a differential equation, and IIRC (it’s been a while) those solutions too form a vector space.
> But surely there is a lot of interesting math that is concerned with the connectedness of the input and output spaces themselves, in context of there still existing a mapping between the input and output.
Topology.
Topology studies neighborhoods in your space and how they relate to things like functions, limits, etc.
Yeah, topology seems to have a lot to do with it! However, I hesitated to jump into that conclusion, because my impression of topology is that it's concerned of "whether" something is connected (and often about the directedness of surfaces/spaces), because topology allows "stretchiness" of its study matter. The connections between things are then kind of binary things - either something is connected or not. However, in the world of "connected dimensions", there seems to be degrees of connectedness. Linear correlation being from the simplest end, but more generally, shared entropy. So it feels like just "topology" doesn't quite catch the field and its pecularities.
I think what you're getting at is two things. You have a dataset. This consists of some set of observations, each of which contains features that have been observed. These may be things like education level, age, income, favorite color, gender, and height. Three of these variables are continuous. One is ordinal. One is nominal. One is binary. Can you do any kind of meaningful matrix operations with this data?
Traditionally, in regression analysis, you'd use dummy variables for the nominal/binary data, turning all of them into binary {0, 1}. This is GF(2), a bonified vector space. Education level you've got some choices. You can also encode these as binary, potentially losing some information given they're ordered. But if you choose to map them to integers or real numbers, what is the justification for the distance function you're defining? Is MD > PhD? How much greater is PhD than BA than high school diploma?
If you stick to one-hot encoding, there's justification in performing regression analysis still, because you end up in a case where most of the values are 0 and drop out and you get multiple models, one for each case, so the matrix operations being performed to generate the model only operate on the remaining real-valued vectors, and the one case that is 1 becomes the constant term. This is just basic ANCOVA.
You have another issue, though, right? Are "height" and "income" and "age" really from the same vector space? Obviously not. Some modelers choose to ignore this. Some will assume they're all normally distributed and tranform the individual values to a Z-score or something and then they're all from the same vector space. But this still may be dubious. For one thing, they're not actually normally distributed. Height and age have a strictly zero lower-bound. They have some upper-bound, whatever it may be. Income is severely right-skewed. But they're probably all at least approximately normal over some restricted range of the most common values.
We're starting to see the issue. Modeling in this way might work reasonably well within some restricted range of common values, but extreme cases are left out. We need some other techniques that don't rely on matrix math, because once we have to admit all of our elements from not from the same field, we admit we don't really have a matrix, even if a computer will gladly let us pretend anything we can encode as a number is actually a number.
I think your question is how can we encode data such that it all ends up in the same vector space? It's easy if your data is imagery. Color channel intensity of each pixel is all real-valued, possibly within different ranges, but easily normed. If all you have is text, representing the corpus by an incidence matrix puts all of your elements in GF(2). Plenty of other word embeddings are well-justified. But when you start looking at longitudinal data from epidemology, the shit they do in econometrics, it starts to get less math and more black magic. By the time you're a data scientist at Netflix, I'm sure they have reason to believe whatever they're doing works, but it may be hard to reason about why it works.
I don't really know if there any field of mathematics studying ways of encoding data such that doing things like adding and multipying height by favorite color actually means anything, but it's an interesting question.
> I think your question is how can we encode data such that it all ends up in the same vector space?
No, it's not. Yes, that is an example of a single question we could ask when doing work in that context. My question was:
> What is that field of math called? What are the interesting results, theorems and such? I love learning more so I'm kind of just looking for some pointers/keywords.
What makes the author say that functions are infinite dimensional? Seems like the space of functions might be infinite dimensional but one function is usually not.
“AND” is 0001 for 00 01 10 11. 2^4=16 binary Boolean functions, in ternary it blows up, but it’s not infinite.
I think I understand it, let's see if I can explain it. Hopefully I'll say something useful.
Take a vector for normal space, [x, y, z]. We say each component of this vector is one dimension, so this one is 3D, and each of its three components can vary. Two such vectors are different if one or more components differ between them. Treating a function as a vector means treating each distinct possible input to the function as a distinct component.
For example, consider the integer function f(x) = x^2. This can be represented as the vector [..., 16, 9, 4, 0, 4, 9, 16, ...] Where the complete vector has as many components as integers. Since there's infinitely many integers, there's infinite components, so instead of 3D like the three component vector above, this vector is ∞D.
Any single function is representable in this way, so each distinct function has its own unique infinitely long vector.
So each different function is a different "point" in an infinite dimensional vector space.
A function on the reals maps any real to another [or maybe the same] real. Given some systematic way to order the inputs, you could describe the function as a vector lookup table with an infinite number of elements -- one output for each possible input.
That vector describes a single point in an infinite-dimensional space. Thus every function from R to R is a single point in an infinite-dimensional space.
Now you can use linear algebra to move these points around in the infinite-dimensional space, measure how far two points [functions] are from each other, etc. That's functional analysis.
The linear operators that do this moving around and measuring are called functionals to indicate that they take functions as arguments. (Like higher-order functions in a programming language.) "Functional Analysis" is thus "The analysis of the objects known as functionals".
The author lives in a context of real calculus, as such he declares that the field from the second section onwards will be reals. The functions over the field of booleans can be equally interesting including the Fourier transformation (multiplication in n log n, iirc)! But they are less intuitive and less known.
The article is by no means perfect, but that "howler" sees completely fine to me. If you want to prove something is a vector space the standard way would be to prove that all the vector space axioms hold for it.
In mathematics, a function is a mapping from input values to output values. A vector is a mapping from a set of integers to a value at the specified index, therefore it is a function.
try to look at it as a kind of format were you are storing equations (by the moment)
You process the operations between such equations -sum, inner, outer- by "unpacking" them into Matrix equations format (populating the Matrix with the vector values), doing the operations, and returning back to the vector format.
So in essence you are working with Matrix equations, but with a more compact format as vector.
For being able to work with such equations' compact format, its needed to follow some rules restricted to the purpose, case contrary, as in reality they are vectors per se, the geometries could mix dimensions were the equations' proprieties are different.
I’d probably have titled this something including either the term “linear” or “functional analysis”. Because submitted here, we will first interpret “functions” in the context of a function in computer programming, where the statement is more provocative and thus clickbaity.
The problem is many real world functions and problems are nonlinear. But they may have linear components. For example, a dog can be recognized by its outline and the texture of the fur, pattern of the face, etc within that outline. Deep neural nets solve this by composing vector operations, hence the universal approximation theorem and existence of NNs that can recognize dogs (though I could have picked a better example as dog recognition is not continuous I think).
In the context of computer programming, it is not really a helpful statement to say that functions are vectors. But because of the universal approximation theorem and its relatives you could say that “functions are (can be approximated as) compositions of vector operations”
The title is actually perfect, but requires some context, after which you might appreciate its tongue-in-cheek beauty:
The author studied at CMU, where the proudly-paraded slogan for an introductory functional programming classes is "Functions are values", which has an almost cult-like status - appearing on their TA hoodies, laptop stickers, and so on.
Other classes soon caught on, first with the imperative programming class declaring that "Functions are pointers", then the introductory discrete math class's "Functions are tuples", and even "Functions are relations" from the databases class.
So viewed in this lens, passing up the opportunity to title it what it was would have been unthinkable.
For some reason computer programmers took the mathematical definition of "functions", used it for their things which are emphatically not mathematical functions, and are now complaining that they get confused by people talking about the old math definition.
Another fantastic application that the website doesn't mention is the composition / Koopman operator. In control theory (e.g. autonomous drones, cars, robot arms, etc.), most real-world systems are described by nonlinear dynamics which are very difficult to work with (e.g. safety/stability guarantees, optimizing over forward horizons using NMPC, state estimation, etc.) The Koopman operator however gives a globally relevant linear approximation of non-linear systems. In other words, you can treat a nonlinear system as a linear system with fairly high accuracy. This greatly simplifies control and estimation from a computational perspective. You can also learn these linearizations from data. Steve Brunton has some good materials on Koopman theory [2][3], and there are some great applications to control of systems such as soft robots [4].
[1]: https://arxiv.org/abs/1904.02539
[2]: https://youtube.com/playlist?list=PLMrJAkhIeNNSVXUvppZTYNHKQ...
[3]: https://arxiv.org/abs/2102.12086
[4]: https://arxiv.org/abs/1902.02827