Thank you for your diligence in explaining complicated topics in a way that makes them accessible to people trying to learn what people before them figured out. You are a true educator who seems to have found a place that supports you in continuing to do that.
Are you concerned that your git exposition is much longer than the other guides you have produced?
This is incomplete, incorrect, and irrelevant. Standard notation already exists. I'm sure it is fun to draw squiggly lines and some people enjoy reinventing the wheel. Spend some time learning what others have taught us before striking out on your own lonely path.
I'm hoping the Tensor Cookbook can become as engaging to read for others as Jordan Taylor's paper was to me. If you have any thoughts on where I lose people, please share!
Tensor diagrams are standard, but some notation is missing. My goal was to be able to handle the entire Matrix Cookbook.
For this I needed a good notation for functions applied to specific dimensions and broadcasting over the rest. Like softmax in a transformer.
The function chapter is still under development in the book, though. So if you have any good references for how it's been done graphically in the past that I might have missed, feel free to share them.
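To make the "applied to specific dimensions, broadcasting over the rest" idea concrete, here's a minimal sketch of a row-wise softmax (the function name and plain-list representation are mine, just for illustration): the nonlinear map acts along the last axis, and is independently broadcast over each row.

```python
import math

def softmax_rows(m):
    """Apply softmax along the last axis of a matrix, broadcasting over rows."""
    out = []
    for row in m:
        e = [math.exp(x - max(row)) for x in row]  # subtract max for stability
        s = sum(e)
        out.append([x / s for x in e])
    return out

probs = softmax_rows([[1.0, 1.0], [0.0, 10.0]])
print([round(sum(r), 6) for r in probs])  # each row sums to 1
```

This is exactly the pattern that is awkward to draw with product-and-sum wiring alone, since softmax is not multilinear in its inputs.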
You can do broadcasting with a tensor, at least for products and sums. The product is multilinear, and a sum can be done in two steps, the first using a tensor to implement fanout. Though I can see the value in representing structure that can be exploited more efficiently, versus just another box for a tensor. Beyond that, something like softmax seems awkward, since you're outside the domain of your "domain-specific language". I don't see why it's needed to extend the Matrix Cookbook to tensor diagrams.
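The two-step sum described above can be sketched in a few lines (helper names are mine): first "fanout" copies the vector along a new axis, which is what contracting with a copy/diagonal tensor does, and then the broadcasted sum becomes an ordinary elementwise sum of two same-shape tensors.

```python
def fanout(vec, n):
    """Copy `vec` along a new leading axis of length n (the copy-tensor step)."""
    return [list(vec) for _ in range(n)]

def add(a, b):
    """Elementwise sum of two same-shape matrices (the ordinary sum step)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

A = [[1, 2], [3, 4]]
b = [10, 20]
# Broadcast b over the rows of A: fanout, then a plain same-shape sum.
print(add(A, fanout(b, len(A))))  # [[11, 22], [13, 24]]
```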
I come back to this every few months and do some work trying to make sense of how tensors are used in machine learning. Tensors as used in physics, whose notation these tools inherit, are there for coordinate transforms and nothing else.
Tensors, as used in ML, are much closer to a key-value store with composite keys and scalar values, with most of the complexity coming from deciding how to filter on those composite keys.
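As an illustration of that analogy (names and representation are mine): a sparse tensor is literally a key-value store whose keys are index tuples, and "filtering on composite keys" is what slicing amounts to.

```python
# A 2x2 tensor as a key-value store: keys are index tuples, values are scalars.
T = {(0, 0): 1.0, (0, 1): 2.0, (1, 1): 3.0}  # entry (1, 0) implicitly zero

def slice_axis(t, axis, value):
    """All entries whose key has `value` at position `axis` -- a tensor slice."""
    return {k: v for k, v in t.items() if k[axis] == value}

print(slice_axis(T, 0, 0))  # {(0, 0): 1.0, (0, 1): 2.0} -- the "row" T[0, :]
```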
Drop me a line if you're interested in a chat. This is something I've been thinking about for years now.
The Wikipedia page on this is sufficient. If F:X -> Y is a function between normed linear spaces then DF:X -> L(X,Y), where L(X,Y) is the vector space of linear operators from X to Y, satisfies F(x + h) = F(x) + DF(x)h + o(h). A function is differentiable if it can be locally approximated by a linear operator.
Some confusion arises from the difference between f:R -> R and f':R -> R. Its Fréchet derivative is Df:R -> L(R,R), where Df(x)h = f'(x)h. Row vectors and column vectors are just a clumsy way of thinking about this.
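A quick numerical sanity check of the definition above, for f(x) = x^2 whose derivative at x is the linear map h -> 2xh (the function names here are just for illustration): the remainder F(x+h) - F(x) - DF(x)h, divided by |h|, goes to zero.

```python
def f(x):
    return x * x

def Df(x):
    # Df(x) is a linear operator (here an element of L(R, R)), not a number.
    return lambda h: 2 * x * h

x = 3.0
for h in (1e-1, 1e-2, 1e-3):
    remainder = f(x + h) - f(x) - Df(x)(h)
    print(h, remainder / abs(h))  # tends to 0 as h -> 0, i.e. remainder is o(h)
```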
Let S = {1, 2}. Every distance function is determined by d(1,2) = a, a >= 0. Define f(d) = {{1,2}} if a = 0 and f(d) = {{1},{2}} otherwise. Isn't this a clustering algorithm that is scale invariant, rich, and consistent?
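A direct transcription of the example (identifying a distance function on S = {1, 2} with the single number a = d(1,2); the function name is mine), with the scale-invariance property checked by brute force:

```python
def f(a):
    """The proposed clustering on S = {1, 2}: merge iff the distance is zero."""
    return [{1, 2}] if a == 0 else [{1}, {2}]

# Scale invariance: f(c * a) == f(a) for any c > 0.
assert all(f(2.5 * a) == f(a) for a in (0, 0.1, 7))
print(f(0), f(1))  # [{1, 2}] [{1}, {2}]
```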
Looks that way to me, yeah, though this is obviously a super simple case. It's clearly scale invariant, and there are only two partitions, both of which your algorithm hits, so it's rich. Consistency is trivially satisfied in both cases too.
I think I found the issue: the paper says the distance function is 0 iff the elements are equal. So for this example you cannot define d(1,2) = 0. So it is not rich, as this is the only way to get the partition {{1,2}}.
Oh, I see, it's not a true metric. That's fair enough, though I wonder if the result depends critically on that assumption.
(you can pass to equivalence classes to recover a true metric, and I didn't see anything obviously incompatible with that in the paper, but I admit I didn't look very deeply)
That tax case didn't concern their alpha. It was a result of the ruling body deciding that they had inaccurately grouped short-term profits with long-term profits through the use of basket options. The ruling was simply that the short-term gains grouped into basket options couldn't be taxed like the long-term gains they were construing them as.
We agree. The case was about Renaissance breaking the law. The best lawyers their money could buy lost and did not avail themselves of their right to appeal. The ruling body found they were tax cheats.
Philanthropic work is a great way to get shills to come out of the woodwork.
They didn't exactly cheat. According to the rules as written, what they did was acceptable. Others who were slightly less blatant about it were permitted, and continue to be permitted, to do the same thing.
Category theory is just another way of looking at math besides the impoverished notion "everything is a set". Mathematics is used in computer science. Rust is a great example. Jean-Yves Girard invented linear logic to make Gerhard Gentzen's sequent calculus symmetric, similar to Paul Dirac's theory that led to the discovery of positrons. Girard's concern about using a proposition exactly once in a proof led to borrow checking.
Putting on category theory glasses can help discover and clarify new facts. Thinking in terms of objects and arrows leads to duality: reverse the direction of the arrows.
The category Set is only one of many categories. The objects are sets and the arrows are functions. A function I -> S that is 1-1/injective/mono[1] corresponds to the set theory notion of a "subset". The dual is a function S -> I that is onto/surjective/epi[2]. What set theory notion does this correspond to?
Hint: Look into David Ellerman. He is the von Neumann of our times.
[1] f is mono if fg = fh implies g = h.
[2] f is epi if gf = hf implies g = h.
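One way to see the answer in Set (the helper name is mine): an epi S -> I is a surjection, and grouping S by image value, its fibers, yields a partition of S. So the dual of "subset" is "partition" (equivalently, quotient set).

```python
def fibers(func, domain):
    """Group the domain by image value: the partition induced by a surjection."""
    part = {}
    for s in domain:
        part.setdefault(func(s), set()).add(s)
    return list(part.values())

S = [0, 1, 2, 3]
print(fibers(lambda s: s % 2, S))  # [{0, 2}, {1, 3}] -- a partition of S
```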
This. Motivation is the key to getting employees to produce over a period of time. Give the good ones the maximum amount of ownership they can handle and they will set an example for everyone else. Grow the talent over time.
Fire the whiny shit-talkers as soon as possible. They drag everyone down.
And, like, a LARPing mode I could use while sitting in my mom's basement that would automagically post every brain fart I had to HN. That would be totally cool!
When I worked at Bloomberg their first coding guideline was "Don't sweat the small stuff." I was mystified by that when I first read it. Then I sat in meetings where someone would drone on about how clever they thought their code was until someone asked "Is this small stuff?" It would shut them up so we could move on to more important matters.