LLM inference systems are like high-performance engines: complex, powerful, and full of intricate settings. Deploying them efficiently to maximize GPU utilization is a challenge typically tackled by experts at organizations like OpenAI and Meta, drawing on deep tribal knowledge. Our latest research at Microsoft Research and Georgia Tech shows that suboptimal configurations can more than double your deployment costs! Vidur is the first full-stack LLM inference simulator. Designed to find optimal deployment settings in minutes, it lets you get the most out of your GPUs: https://bit.ly/vidur