I really think the numbers were inflated because of the pervasive benchmarkism that goes on in ML. Basically, if you don't beat SOTA, you don't get published. Usually you need SOTA on MULTIPLE datasets. Which is problematic, because plenty of non-SOTA methods are useful (never mind novel). Given the results Ken/ks2048 calculated, I am pretty confident the work wouldn't have made it in. BUT I think the results, given the other features, do make the work quite useful! I agree, Ken, that it unfairly boosts their work, but I understand why they're bending over backwards to defend it. I wish people would just admit mistakes, but that risks losing a paper (though probably it wouldn't). This is probably also why they didn't think to double-check suspicious results like the Filipino dataset (btw, it's not uncommon for datasets to be spoiled, people. Always be suspicious!).
I'm not trying to give them a pass, but we do need to discuss the perverse incentives we've set up that make these kinds of things so common. The work should stand on its own, but good work doesn't mean it'll get published in a journal. And frankly, it doesn't matter how many citations your arXiv paper has, people will still say "it isn't peer reviewed," and it won't help you get a job, graduate, or advance in academia. Which I think we should all agree is idiotic, since citations are a form of peer review too.
I don't blame them for failing to double check their results.
I blame them for giving obviously incorrect excuses on GitHub when such an obvious mistake is pointed out.
There is no way they could be at the stage they claim to be in their program (having just defended their thesis) and think the excuses they gave on GitHub are reasonable.
Yeah, I fully agree. They should just admit the mistake rather than try to justify it. I was just trying to explain the incentive structure around them that encourages this behavior. Unfortunately no one gives you points for admitting your mistakes (in fact, you risk losing points) and you are unlikely to lose points for doubling down on an error.
> There is no way they could be at the stage they claim to be in their program (having just defended their thesis) and think the excuses they gave on GitHub are reasonable.
Unfortunately it is a very noisy process. I know people from top-3 universities with good publication records who can't tell probabilities from likelihoods. I know students and professors at these universities who think autocasting your model to fp16 cuts your memory in half (versus fp32) and are confused when you explain that halving is a theoretical, not practical, lower bound (the fp32 master weights, gradients, and optimizer state don't go away). Just the other day someone opened an issue on my GitHub (someone who has a PhD from one of these universities and is currently a professor!) expecting me to teach them how to load a pretrained model. This is not uncommon.
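For anyone curious, here's a minimal sketch of the autocast point (a toy nn.Linear with made-up sizes, just for illustration): under torch.autocast only the selected ops and their outputs run in fp16, while the stored parameters remain fp32, which is why memory doesn't simply halve.

    import torch
    import torch.nn as nn

    # Toy model purely for illustration; sizes are arbitrary.
    model = nn.Linear(4096, 4096).cuda()      # parameters are stored in fp32
    x = torch.randn(64, 4096, device="cuda")

    print(next(model.parameters()).dtype)     # torch.float32

    with torch.autocast(device_type="cuda", dtype=torch.float16):
        y = model(x)                          # the matmul runs in fp16
        print(y.dtype)                        # torch.float16 (just the activation)

    print(next(model.parameters()).dtype)     # still torch.float32

And that's before you count gradients and optimizer state, which also stay fp32 in typical mixed-precision training.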