Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I built something similar in structure, if not in method, using a hierarchy of cross attention and learned queries, made sparse by applying L1 to the attention matrices.

Discrete hierarchical representations are super cool. The pattern of activations across layers amounts to a “parse tree” for each input. You have effectively compressed the image into a short sequence of integers.





Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: