
Agree. So attention is like a hierarchy of graphs, where the nodes are tokens and the edges are attention scores, one graph per head.
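
A minimal numpy sketch of that picture, with my own naming (nothing here is a specific library's API): each head's softmaxed score matrix is literally a weighted adjacency matrix over the token nodes.

    import numpy as np

    def attention_graphs(X, Wq, Wk, n_heads):
        """Return one weighted adjacency matrix (graph) per head.

        X  : (seq_len, d_model) token features -- the graph nodes
        Wq : (n_heads, d_model, d_head) query projections
        Wk : (n_heads, d_model, d_head) key projections
        """
        graphs = []
        for h in range(n_heads):
            Q = X @ Wq[h]                              # (seq_len, d_head)
            K = X @ Wk[h]                              # (seq_len, d_head)
            scores = Q @ K.T / np.sqrt(Q.shape[-1])
            # softmax over keys: row i holds the edge weights leaving token i
            A = np.exp(scores - scores.max(axis=-1, keepdims=True))
            A /= A.sum(axis=-1, keepdims=True)
            graphs.append(A)                           # (seq_len, seq_len) adjacency
        return graphs

    # toy example: 5 tokens, 8-dim features, 2 heads
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 8))
    Wq = rng.normal(size=(2, 8, 4))
    Wk = rng.normal(size=(2, 8, 4))
    for h, A in enumerate(attention_graphs(X, Wq, Wk, 2)):
        print(f"head {h}: edge weights from token 0 ->", A[0].round(2))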

Now what's trippy is that each node also carries position data. So the node's feature vector, together with the position it appears at, is used to build an operator that projects a sequence into a semantic space.
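
A hedged sketch of that "feature + position" step, assuming the classic sinusoidal encoding from the original Transformer paper; Wq/Wk/Wv and project_to_semantic_space are illustrative names of mine, not anyone's actual code. The attention matrix A is the data-dependent operator, and applying it to the (position-augmented) values is the projection.

    import numpy as np

    def sinusoidal_positions(seq_len, d_model):
        """Classic sinusoidal positional encoding: position -> d_model vector."""
        pos = np.arange(seq_len)[:, None]
        i = np.arange(d_model)[None, :]
        angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
        return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

    def project_to_semantic_space(X, Wq, Wk, Wv):
        """Node features + their positions define an operator A that
        maps the sequence into a new space: out = A @ (Xp @ Wv)."""
        Xp = X + sinusoidal_positions(*X.shape)    # node feature + position
        Q, K = Xp @ Wq, Xp @ Wk
        scores = Q @ K.T / np.sqrt(Q.shape[-1])
        A = np.exp(scores - scores.max(axis=-1, keepdims=True))
        A /= A.sum(axis=-1, keepdims=True)         # the data-dependent operator
        return A @ (Xp @ Wv)                       # projected sequence

    d = 8
    rng = np.random.default_rng(1)
    X = rng.normal(size=(5, d))
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    print(project_to_semantic_space(X, Wq, Wk, Wv).shape)   # (5, 8)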

This seems to work for any modality of data, so there is something about the order in which data appears that is linked to semantics, and for me it hints at some deep causal structure being latent in LLMs.
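
On the modality point, here's a toy ViT-style patchify (my own illustrative helper, not from any library) showing how an image becomes an ordered token sequence just like text, with the patch index supplying each node's position:

    import numpy as np

    def patchify(image, patch=4):
        """Any modality becomes a sequence of 'token' vectors.
        image: (H, W, C) -> (num_patches, patch*patch*C) sequence,
        plus the raster-scan index as each node's position."""
        H, W, C = image.shape
        rows, cols = H // patch, W // patch
        tokens = (image[:rows * patch, :cols * patch]
                  .reshape(rows, patch, cols, patch, C)
                  .transpose(0, 2, 1, 3, 4)
                  .reshape(rows * cols, patch * patch * C))
        positions = np.arange(rows * cols)         # raster-scan order
        return tokens, positions

    img = np.random.rand(32, 32, 3)
    seq, pos = patchify(img)
    print(seq.shape, pos[:5])                      # (64, 48) [0 1 2 3 4]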


