ubermenchh's comments

ubermenchh · 2025-12-31T22:24:15 1767219855

Yes, it does continuous batching along with paged attention and prefix caching. I am also going to be adding some more inference techniques.

ubermenchh · 2025-12-29T18:10:07 1767031807

Haha, I just wanted my repo to be out here. If someone finds it interesting, they can always just check the repo. And you're close: it's about getting faster responses from the model by manipulating the request queues and memory.