
That's not a good metric.



Many applications don't want to host inference in the cloud and would ideally run things locally. Hardware constraints are clearly important.

I'd actually say it's the most important metric for most open models now, since the price-per-performance of closed cloud models is so competitive with open cloud models that competitive edge inference is a clear value add.


It's not that memory usage isn't important, it's that dividing error by memory gives you a useless number. The benefit of an incremental error decrease is highly nonlinear, and the same goes for memory: improving error by 1% matters a lot more starting from 10% error than from 80%. Also, a model that used no memory and got everything wrong would have the best score.
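
To make that concrete, here's a minimal sketch (all numbers made up for illustration, not from any benchmark) of how the ratio flattens very different operating points onto the same score:

    # Hypothetical models: (name, error rate, inference memory in GB).
    # Numbers are placeholders, not real benchmark results.
    models = [
        ("near-sota", 0.09, 9.0),   # 9% error using 9 GB
        ("mediocre",  0.80, 80.0),  # 80% error using 80 GB
    ]

    for name, err, mem in models:
        print(f"{name}: error/memory = {err / mem:.4f}")
    # Both print 0.0100 -- the ratio cannot tell a near-SOTA model
    # apart from one that gets 80% of answers wrong.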


I see, and I agree with you. But I'd imagine the useful metric would be “error rate below X GB of memory”. We really just need memory and/or compute reported when these evaluations are run so that metric can be compiled. People effectively do it for training reports, since compute and memory are implicit in training time (people saturate the hardware and report what they're using). But for inference, no such details :\
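
A rough sketch of what that budgeted metric could look like once memory gets reported alongside scores (model names and numbers below are placeholders, not real eval results):

    # Hypothetical eval reports: (model, error rate, peak inference memory in GB).
    reports = [
        ("model-a", 0.12, 6.5),
        ("model-b", 0.09, 14.0),
        ("model-c", 0.25, 3.8),
    ]

    def best_error_under(budget_gb, reports):
        """Lowest error rate among models whose inference fits in budget_gb."""
        fitting = [err for _, err, mem in reports if mem <= budget_gb]
        return min(fitting) if fitting else None

    for budget in (4, 8, 16):
        print(f"best error under {budget} GB: {best_error_under(budget, reports)}")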



