Has there been research into some hierarchical attention model that has local attention at the scale of sentences and paragraphs that feeds embeddings up to longer range attention across documents?
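
For concreteness, here's a minimal PyTorch sketch of the two-level idea described above (names, chunk size, and dimensions are all illustrative, not from any particular paper): local self-attention runs within fixed-size chunks standing in for sentences/paragraphs, each chunk is mean-pooled to a single embedding, and a second attention layer runs across those chunk embeddings.

    # Minimal two-level "hierarchical attention" sketch, assuming PyTorch.
    import torch
    import torch.nn as nn

    class TwoLevelAttention(nn.Module):
        def __init__(self, d_model=256, n_heads=4, chunk_len=64):
            super().__init__()
            self.chunk_len = chunk_len
            self.local_attn = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.global_attn = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)

        def forward(self, x):                       # x: (batch, seq_len, d_model)
            b, t, d = x.shape
            assert t % self.chunk_len == 0          # pad in real code
            chunks = x.view(b * t // self.chunk_len, self.chunk_len, d)
            chunks = self.local_attn(chunks)        # attention within each chunk only
            pooled = chunks.mean(dim=1)             # one embedding per chunk
            pooled = pooled.view(b, t // self.chunk_len, d)
            return self.global_attn(pooled)         # attention across chunk embeddings

    x = torch.randn(2, 512, 256)                    # 2 documents, 512 tokens each
    print(TwoLevelAttention()(x).shape)             # torch.Size([2, 8, 256])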


There’s the Hierarchical Reasoning Model (HRM), https://arxiv.org/abs/2506.21734, but it’s very new and largely untested.
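
Roughly, the paper’s pitch is a fast low-level recurrent module that updates every step, conditioned on a slow high-level module that updates only every T steps. Here’s a toy sketch of that two-timescale loop (my reading of the idea, not the paper’s code; the module internals and the training procedure are placeholders):

    # Toy two-timescale recurrence, assuming PyTorch. Not the HRM reference code.
    import torch
    import torch.nn as nn

    class TwoTimescaleLoop(nn.Module):
        def __init__(self, d=128, T=4, steps=16):
            super().__init__()
            self.T, self.steps = T, steps
            self.low  = nn.GRUCell(2 * d, d)   # fast module: sees input + high-level state
            self.high = nn.GRUCell(d, d)       # slow module: sees low-level state

        def forward(self, x):                  # x: (batch, d)
            z_l = torch.zeros_like(x)
            z_h = torch.zeros_like(x)
            for step in range(1, self.steps + 1):
                z_l = self.low(torch.cat([x, z_h], dim=-1), z_l)   # updates every step
                if step % self.T == 0:
                    z_h = self.high(z_l, z_h)                      # updates every T steps
            return z_h

    print(TwoTimescaleLoop()(torch.randn(8, 128)).shape)   # torch.Size([8, 128])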

Though honestly I don’t think new neural network architectures are going to get us over this local maximum; I think the next steps forward involve something that’s:

1. Non-lossy

2. Readily interpretable


The ARC Prize Foundation ran extensive ablations on HRM for their suite of reasoning tasks and noted that the "hierarchical" part of the architecture contributes little over a vanilla transformer of the same size with no extra hyperparameter tuning:

https://arcprize.org/blog/hrm-analysis#analyzing-hrms-contri...


By now, I seriously doubt any "readily interpretable" claims.

Nothing about the human brain is "readily interpretable", and artificial neural networks - which, unlike brains, can be instrumented and experimented on easily - tend to resist interpretation nonetheless.

If there were an easy way to reduce ML to "readily interpretable" representations, someone would have done so already. If there were architectures that performed similarly but were orders of magnitude more interpretable, they would be used, because interpretability is desirable. Instead, we get what we get.


From what I’ve seen, neurology is very readily interpretable, but it’s hard to get data to interpret. For example, the visual cortex areas V1-V5 are very well mapped out, but other “deeper” structures are hard to get to and measure meaningfully.


They're interpretable in much the same way CNNs are. Not by coincidence.

For CNNs, we know very well how the early layers work - edge detectors, curve detectors, etc. That understanding decays the deeper you go into the model. In the brain, V1/V2 are similarly well studied, but understanding breaks down deeper into the visual cortex - and the sheer architectural complexity there sure doesn't help.
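
You can see the edge-detector claim for yourself in a few lines, assuming torchvision and matplotlib are installed: the 64 first-layer filters of a pretrained ResNet-18 are mostly oriented edges, color blobs, and simple gratings.

    # Visualize the first-layer filters of a pretrained ResNet-18.
    import matplotlib.pyplot as plt
    from torchvision.models import resnet18, ResNet18_Weights

    model = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
    w = model.conv1.weight.detach()            # shape (64, 3, 7, 7)
    w = (w - w.min()) / (w.max() - w.min())    # rescale to [0, 1] for display

    fig, axes = plt.subplots(8, 8, figsize=(8, 8))
    for ax, f in zip(axes.flat, w):
        ax.imshow(f.permute(1, 2, 0))          # each filter as a 7x7 RGB image
        ax.axis("off")
    plt.show()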


Well, in terms of architectural complexity, you have to wonder what something intelligent is going to look like. It’s probably not going to be very simple, but that doesn’t mean it can’t be readily interpreted. For the brain we can ascribe structure to evolutionary pressure; IMO there isn’t as powerful an organizing principle at play with LLMs and transformer architectures. How does minimizing reconstruction loss help us understand the 50th or 60th layer of a neural network? It becomes very hard to interpret, compared to, say, the function of the amygdala or hippocampus in the context of evolutionary pressure.



