"Undervolting" is a thing for 3090s where they get them down from 350 to 300W at 5% perf drop but for your case it's irrelevant because your lane budget is far too little!
> I know a Youtuber that ran LLMs on a 4090, and the actual power draw was only 130W on the GPU.
Well, let's see his video. He must be using a really inefficient backend implementation if the GPU was that underutilised.
I'm not running e-waste. My cards are L40S, and even in basic inference with no batching, using the ggml CUDA kernels, they hit 70% utilisation immediately.
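If anyone wants to sanity-check the 130W claim on their own card, here's a minimal sketch that polls power draw and utilisation via nvidia-ml-py while an inference job runs elsewhere (run it in a second terminal; the device index 0 and 0.5s interval are just assumptions):

```python
# Poll GPU power draw and utilisation while an inference job runs elsewhere.
# Assumes nvidia-ml-py (pynvml); device index and poll interval are arbitrary.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

try:
    while True:
        power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000  # mW -> W
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)      # .gpu / .memory in %
        print(f"power={power_w:6.1f} W  gpu_util={util.gpu:3d}%  mem_util={util.memory:3d}%")
        time.sleep(0.5)
except KeyboardInterrupt:
    pynvml.nvmlShutdown()
```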