
I wish this were written with more care. None of the symbols are defined.

Worst of all, they use "channel dimension" in a sequence model. What even is a channel in a sequence of tokens? This happens as soon as you have a single person with CNN background on the team and it makes zero sense. What if you actually have channels in your data? What then?



If you have more specific feedback, like a specific diagram or page and how it could be made better, I will gladly forward it to improve the paper draft.

Channel mixing is a core component of this architecture, so the keyword "channel" is all over the place. I have no idea what specifically you are critiquing (I could not find any mention of "channel dimension" in the paper).
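For anyone with the same question as the parent: in sequence-model papers, "channel" usually means the feature (embedding) dimension of each token. The term is borrowed from MLP-Mixer-style terminology, where "token mixing" (across positions) alternates with "channel mixing" (across features within one token). Below is a minimal pure-Python sketch, assuming a (seq_len, d_model) input with one row per token. The function names are hypothetical, and this is not the paper's actual channel-mixing block, only the axis distinction.

```python
# Hypothetical illustration of "channel" vs "token" mixing.
# Assumption: x is a (seq_len, d_model) matrix, one row per token;
# the d_model columns are the "channels" (features) of each token.

def matmul(a, b):
    """Plain matrix product a @ b for lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def channel_mix(x, w):
    """Mix features within each token: every row (token) is multiplied by the
    same (d_model, d_model) matrix w, so tokens never exchange information."""
    return matmul(x, w)

def token_mix(x, w):
    """Mix across positions: each column (channel) is multiplied by the same
    (seq_len, seq_len) matrix w, so features never exchange information."""
    return matmul(w, x)

# 3 tokens, 2 channels each.
x = [[1, 2], [3, 4], [5, 6]]
print(channel_mix(x, [[0, 1], [1, 0]]))                # swaps channels per token
print(token_mix(x, [[0, 0, 1], [0, 1, 0], [1, 0, 0]])) # reverses token order
```

Swapping the two features inside each row only touches the channel axis, while reversing the rows only touches the sequence axis. That is the whole distinction the word "channel" is meant to carry in this context.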


They said the paper is still a work in progress and that they will improve it.

https://twitter.com/AiEleuther/status/1660811180901019648



