Hacker News

This looks like a potential NeurIPS submission.

But it will probably be rejected. The quality bar for NeurIPS is quite high.

Some reasons:

The experiments are very weak: there are just a few figures (basically Figures 1 and 5) showing results, and no tables with numbers.

But more importantly: There are no comparisons (in terms of experiments/numbers) to similar models, like:

- Block-Recurrent Transformers (https://arxiv.org/abs/2203.07852) and related approaches to make the Transformer recurrent, so effectively getting infinite context length.

- All the work on sparse or linear attention (Longformer, etc.), which should also allow for such context lengths.
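(For context on why linear attention allows such context lengths: it replaces the softmax with a feature map so the key-value summary can be computed once, dropping the cost from quadratic to linear in sequence length. A minimal, non-causal NumPy sketch in the style of kernelized attention; the feature map and names here are illustrative, not taken from the paper under discussion.)

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: materializes an (n, n) score matrix -> O(n^2) in length n.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1.0):
    # Kernelized attention: phi(Q) @ (phi(K)^T V) costs O(n * d^2), linear in n.
    # phi is any positive feature map (here relu + 1, an illustrative choice).
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V               # (d, d) summary of all keys and values
    Z = Qp @ Kp.sum(axis=0)     # per-query normalization, shape (n,)
    return (Qp @ KV) / Z[:, None]
```

The (d, d) summary is what makes recurrent/streaming variants possible: it can be updated token by token without ever storing the full attention matrix. A causal version would accumulate KV and Z prefix-wise instead.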

I don't mean that they should just mention them in related work (they partly do, although the section looks very short, so I'm fairly sure a lot of other related work is missing, without my looking too deeply into it). I mean that they should actually run experiments and report numbers, and also compare the model definitions of those alternatives and analyze the differences.

The analysis also seems a bit weak, and I'm not sure about the novelty.

So, while the presented approach could be interesting, it's hard to tell how well it really performs and how it compares to the alternatives.

(This is a 10-minute review by me. If I had to review this formally, I would usually spend more like 1-2 hours at least and give much more detail. But I don't expect my initial impression would change too much.)




The original architecture used in this model was accepted at last year's NeurIPS: https://proceedings.neurips.cc/paper_files/paper/2022/file/4...

That paper is written very differently.


I feel like this is a thing in ML land: everyone is in such a rush to publish something "revolutionary" because the whole field is already moving at warp speed. Under that pressure, it's easy to lose focus on metrics and comparisons.


A publishing model in which the authors of related work are compensated (citations, coauthorship, ...) would let new approaches and ideas be disseminated more easily. The main factor here would be the novelty and possible applications of the new approaches.



