TensorRT-LLM runtime now open-source (github.com/nvidia)
4 points by mmoskal 5 months ago | 1 comment


Previously, the "Executor" runtime was shipped only as binary blobs. This is the bit that schedules requests and manages the KV cache (similar to the vLLM or SGLang server).
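For readers unfamiliar with what "schedules requests and manages the KV cache" involves, here is a minimal, hypothetical sketch of the general idea: a pool of fixed-size KV-cache blocks plus a first-come-first-served scheduler that admits new requests into the running batch between decode steps. None of these names (`BlockAllocator`, `Scheduler`, `Request`, `BLOCK_SIZE`) come from TensorRT-LLM, vLLM, or SGLang, and this is not how any of them is actually implemented. It assumes each request reserves blocks for its full prompt plus maximum output at admission, and it replaces the model forward pass with a counter.

```python
from collections import deque
from dataclasses import dataclass, field

BLOCK_SIZE = 16  # tokens per KV-cache block (hypothetical value)


def blocks_needed(num_tokens: int) -> int:
    """Number of BLOCK_SIZE-token blocks needed to hold num_tokens (ceiling division)."""
    return -(-num_tokens // BLOCK_SIZE)


class BlockAllocator:
    """Hands out fixed-size KV-cache blocks from a free list."""

    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))

    def can_allocate(self, n: int) -> bool:
        return len(self.free) >= n

    def allocate(self, n: int) -> list[int]:
        if n > len(self.free):
            raise MemoryError("out of KV-cache blocks")
        blocks, self.free = self.free[:n], self.free[n:]
        return blocks

    def release(self, blocks: list[int]) -> None:
        self.free.extend(blocks)


@dataclass
class Request:
    rid: str
    prompt_len: int
    max_new_tokens: int
    generated: int = 0
    blocks: list[int] = field(default_factory=list)


class Scheduler:
    """FCFS in-flight batching: admit waiting requests while batch slots and KV blocks remain."""

    def __init__(self, allocator: BlockAllocator, max_batch: int):
        self.allocator = allocator
        self.max_batch = max_batch
        self.waiting: deque[Request] = deque()
        self.running: list[Request] = []

    def submit(self, req: Request) -> None:
        self.waiting.append(req)

    def step(self) -> list[str]:
        """Run one iteration; return the ids of requests that finished in it."""
        # 1. Admit waiting requests, reserving blocks for prompt + max output up front.
        while self.waiting and len(self.running) < self.max_batch:
            req = self.waiting[0]
            need = blocks_needed(req.prompt_len + req.max_new_tokens)
            if not self.allocator.can_allocate(need):
                break  # head-of-line request waits until blocks are freed
            req.blocks = self.allocator.allocate(need)
            self.running.append(self.waiting.popleft())
        # 2. "Decode" one token for every running request (stand-in for a forward pass).
        done = []
        for req in self.running:
            req.generated += 1
            if req.generated >= req.max_new_tokens:
                done.append(req)
        # 3. Retire finished requests and return their blocks to the pool.
        for req in done:
            self.running.remove(req)
            self.allocator.release(req.blocks)
            req.blocks = []
        return [r.rid for r in done]
```

With 4 blocks and requests `a` (20 + 2 tokens, 2 blocks), `b` (10 + 3, 1 block) and `c` (30 + 1, 2 blocks), `c` cannot be admitted until `a` finishes and frees its blocks; it then joins `b` in the running batch mid-flight. Reserving the worst case up front avoids ever having to evict a running request; a real engine can instead grow each request's cache lazily and preempt or evict when memory runs out, which packs more requests into the same cache.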



