Thank you for your diligence in explaining complicated topics in a way that makes them accessible to people trying to learn what people before them figured out. You are a true educator who seems to have found a place that supports you in continuing to do that.
Are you concerned that your git exposition is much longer than the other guides you have produced?
This is incomplete, incorrect, and irrelevant. Standard notation already exists. I'm sure it is fun to draw squiggly lines and some people enjoy reinventing the wheel. Spend some time learning what others have taught us before striking out on your own lonely path.
I'm hoping the Tensor Cookbook can become as engaging to read for others as Jordan Taylor's paper was to me. If you have any thoughts on where I lose people, please share!
Tensor diagrams are standard, but some notation is missing. My goal was to be able to handle the entire Matrix Cookbook.
For this I needed a good notation for functions applied to specific dimensions and broadcasting over the rest. Like softmax in a transformer.
The function chapter is still under development in the book, though. So if you have any good references for how it's been done graphically in the past that I might have missed, feel free to share them.
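To make the "applied to specific dimensions, broadcasting over the rest" idea concrete, here's a minimal sketch of a row-wise softmax (the function name and plain-list representation are mine, just for illustration): the nonlinear map acts along the last axis, and is independently broadcast over each row.

```python
import math

def softmax_rows(m):
    """Apply softmax along the last axis of a matrix, broadcasting over rows."""
    out = []
    for row in m:
        e = [math.exp(x - max(row)) for x in row]  # subtract max for stability
        s = sum(e)
        out.append([x / s for x in e])
    return out

probs = softmax_rows([[1.0, 1.0], [0.0, 10.0]])
print([round(sum(r), 6) for r in probs])  # each row sums to 1
```

This is exactly the pattern that is awkward to draw with product-and-sum wiring alone, since softmax is not multilinear in its inputs.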
You can do broadcasting with a tensor, at least for products and sums. The product is multilinear, and a sum can be done in two steps, the first using a tensor to implement fanout. Though I can see the value in representing structure that can be exploited more efficiently, versus just another box for a tensor. Beyond that, something like softmax seems awkward, since you're outside the domain of your "domain-specific language". I don't see why it's needed to extend the Matrix Cookbook to tensor diagrams.
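The two-step sum described above can be sketched in a few lines (helper names are mine): first "fanout" copies the vector along a new axis, which is what contracting with a copy/diagonal tensor does, and then the broadcasted sum becomes an ordinary elementwise sum of two same-shape tensors.

```python
def fanout(vec, n):
    """Copy `vec` along a new leading axis of length n (the copy-tensor step)."""
    return [list(vec) for _ in range(n)]

def add(a, b):
    """Elementwise sum of two same-shape matrices (the ordinary sum step)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

A = [[1, 2], [3, 4]]
b = [10, 20]
# Broadcast b over the rows of A: fanout, then a plain same-shape sum.
print(add(A, fanout(b, len(A))))  # [[11, 22], [13, 24]]
```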
I come back to this every few months and do some work trying to make sense of how tensors are used in machine learning. Tensors as used in physics, whose notation these tools inherit, are there for coordinate transforms and nothing else.
Tensors, as used in ML, are much closer to a key-value store with composite keys and scalar values, with most of the complexity coming from deciding how to filter on those composite keys.
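As an illustration of that analogy (names and representation are mine): a sparse tensor is literally a key-value store whose keys are index tuples, and "filtering on composite keys" is what slicing amounts to.

```python
# A 2x2 tensor as a key-value store: keys are index tuples, values are scalars.
T = {(0, 0): 1.0, (0, 1): 2.0, (1, 1): 3.0}  # entry (1, 0) implicitly zero

def slice_axis(t, axis, value):
    """All entries whose key has `value` at position `axis` -- a tensor slice."""
    return {k: v for k, v in t.items() if k[axis] == value}

print(slice_axis(T, 0, 0))  # {(0, 0): 1.0, (0, 1): 2.0} -- the "row" T[0, :]
```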
Drop me a line if you're interested in a chat. This is something I've been thinking about for years now.
The Wikipedia page on this is sufficient. If F:X -> Y is a function between normed linear spaces then DF:X -> L(X,Y), where L(X,Y) is the vector space of linear operators from X to Y, satisfies F(x + h) = F(x) + DF(x)h + o(h). A function is differentiable if it can be locally approximated by a linear operator.
Some confusion arises from the difference between f:R -> R and f':R -> R. Its Fréchet derivative is Df:R -> L(R,R), where Df(x)h = f'(x)h. Row vectors and column vectors are just a clumsy way of thinking about this.
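A quick numerical sanity check of the definition above, for f(x) = x^2 whose derivative at x is the linear map h -> 2xh (the function names here are just for illustration): the remainder F(x+h) - F(x) - DF(x)h, divided by |h|, goes to zero.

```python
def f(x):
    return x * x

def Df(x):
    # Df(x) is a linear operator (here an element of L(R, R)), not a number.
    return lambda h: 2 * x * h

x = 3.0
for h in (1e-1, 1e-2, 1e-3):
    remainder = f(x + h) - f(x) - Df(x)(h)
    print(h, remainder / abs(h))  # tends to 0 as h -> 0, i.e. remainder is o(h)
```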
Let S = {1, 2}. Every distance function is determined by d(1,2) = a, a >= 0. Define f(d) = {{1,2}} if a = 0 and f(d) = {{1},{2}} otherwise. Isn't this a clustering algorithm that is scale invariant, rich, and consistent?
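A direct transcription of the example (identifying a distance function on S = {1, 2} with the single number a = d(1,2); the function name is mine), with the scale-invariance property checked by brute force:

```python
def f(a):
    """The proposed clustering on S = {1, 2}: merge iff the distance is zero."""
    return [{1, 2}] if a == 0 else [{1}, {2}]

# Scale invariance: f(c * a) == f(a) for any c > 0.
assert all(f(2.5 * a) == f(a) for a in (0, 0.1, 7))
print(f(0), f(1))  # [{1, 2}] [{1}, {2}]
```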
Looks that way to me, yeah, though this is obviously a super simple case. It's clearly scale invariant, and there are only two partitions, both of which your algorithm hits, so it's rich. Consistency is trivially satisfied in both cases too.
I think I found the issue: the paper says the distance function is 0 iff the elements are equal. So for this example you cannot define d(1,2) = 0. So it is not rich, as this is the only way to get the partition {{1,2}}.
Oh, I see, it's not a true metric. That's fair enough, though I wonder if the result depends critically on that assumption.
(you can pass to equivalence classes to recover a true metric, and I didn't see anything obviously incompatible with that in the paper, but I admit I didn't look very deeply)
That tax case didn't concern their alpha. It was a result of the ruling body deciding that they had inaccurately grouped short-term profits with long-term profits through the use of basket options. The ruling was simply that the short-term gains grouped into basket options couldn't be taxed like the long-term gains they were construing them as.
We agree. The case was about Renaissance breaking the law. The best lawyers their money could buy lost and did not avail themselves of their right to appeal. The ruling body found they were tax cheats.
Philanthropic work is a great way to get shills to come out of the woodwork.
They didn't exactly cheat. According to the rules as written, what they did was acceptable. Others who were slightly less blatant about it were permitted, and continue to be permitted, to do the same thing.
Category theory is just another way of looking at math besides the impoverished notion "everything is a set". Mathematics is used in computer science. Rust is a great example. Jean-Yves Girard invented linear logic to make Gerhard Gentzen's sequent calculus symmetric, similar to Paul Dirac's theory that led to the discovery of positrons. Girard's concern about using a proposition exactly once in a proof led to borrow checking.
Putting on category theory glasses can help discover and clarify new facts. Thinking in terms of objects and arrows leads to duality: reverse the direction of the arrows.
The category Set is only one of many categories. The objects are sets and the arrows are functions. A function I -> S that is 1-1/injective/mono[1] corresponds to the set theory notion of a "subset". The dual is a function S -> I that is onto/surjective/epi[2]. What set theory notion does this correspond to?
Hint: Look into David Ellerman. He is the von Neumann of our times.
[1] f is mono if fg = fh implies g = h.
[2] f is epi if gf = hf implies g = h.
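One way to see the answer in Set (the helper name is mine): an epi S -> I is a surjection, and grouping S by image value, its fibers, yields a partition of S. So the dual of "subset" is "partition" (equivalently, quotient set).

```python
def fibers(func, domain):
    """Group the domain by image value: the partition induced by a surjection."""
    part = {}
    for s in domain:
        part.setdefault(func(s), set()).add(s)
    return list(part.values())

S = [0, 1, 2, 3]
print(fibers(lambda s: s % 2, S))  # [{0, 2}, {1, 3}] -- a partition of S
```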
This. Motivation is the key to getting employees to produce over a period of time. Give the good ones the maximum amount of ownership they can handle and they will set an example for everyone else. Grow the talent over time.
Fire the whiny shit-talkers as soon as possible. They drag everyone down.
And, like, a LARPing mode I could use while sitting in my mom's basement that would automagically post every brain fart I had to HN. That would be totally cool!
When I worked at Bloomberg their first coding guideline was "Don't sweat the small stuff." I was mystified by that when I first read it. Then I sat in meetings where someone would drone on about how clever they thought their code was until someone asked "Is this small stuff?" It would shut them up so we could move on to more important matters.