Hacker News
ubermenchh's comments
ubermenchh | 15 days ago | on: Show HN: Mini-vLLM in ~500 lines of Python
Yes, it does continuous batching along with paged attention and prefix caching. I am also going to be adding some more inference techniques.
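For readers unfamiliar with the terms, here is a rough sketch of how paged attention and prefix caching can fit together on the memory side. It is a generic illustration, not code from the Mini-vLLM repo: `PagedBlockManager`, `allocate`, and `release` are hypothetical names. The KV cache is split into fixed-size blocks, each request gets a "block table" of block ids, and full blocks are keyed by the entire token prefix they cover so a second request with the same prefix reuses them instead of recomputing.

```python
from collections import deque


class PagedBlockManager:
    """Toy paged KV-cache allocator with prefix caching (hypothetical, illustrative only)."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free_blocks: deque[int] = deque(range(num_blocks))
        self.ref_count = [0] * num_blocks
        # Maps the full token prefix ending at a block boundary to the block
        # holding its KV entries. Keying on the whole prefix is exact because
        # a block's KV values depend on every earlier token.
        self.prefix_to_block: dict[tuple[int, ...], int] = {}
        self.block_to_prefix: dict[int, tuple[int, ...]] = {}

    def _take_free_block(self) -> int:
        block = self.free_blocks.popleft()  # IndexError here means the cache is full
        # The block may still be registered from an earlier request; evict that entry.
        stale = self.block_to_prefix.pop(block, None)
        if stale is not None:
            del self.prefix_to_block[stale]
        return block

    def allocate(self, token_ids: list[int]) -> list[int]:
        """Build a block table for a prompt, reusing cached full blocks."""
        table = []
        for start in range(0, len(token_ids), self.block_size):
            end = start + self.block_size
            is_full = end <= len(token_ids)
            key = tuple(token_ids[:end])
            if is_full and key in self.prefix_to_block:
                block = self.prefix_to_block[key]
                if self.ref_count[block] == 0:
                    # Idle but still cached: pull it back off the free list.
                    self.free_blocks.remove(block)
            else:
                block = self._take_free_block()
                if is_full:
                    self.prefix_to_block[key] = block
                    self.block_to_prefix[block] = key
            self.ref_count[block] += 1
            table.append(block)
        return table

    def release(self, table: list[int]) -> None:
        """Drop one reference per block; idle blocks stay cached until reused."""
        for block in table:
            self.ref_count[block] -= 1
            if self.ref_count[block] == 0:
                self.free_blocks.append(block)
```

With `block_size=4`, a 10-token prompt takes three blocks, and a second prompt that shares its first 8 tokens reuses the first two and only allocates one new block. The sketch only covers prompt allocation; growing a sequence during decoding would need an extra append step.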
ubermenchh | 17 days ago | on: Show HN: Mini-vLLM in ~500 lines of Python
Haha, I just wanted my repo to be out here. If someone finds it interesting, they can always just check the repo. And you're close: it's about getting faster responses from the model by manipulating the request queues and memory.
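To make the "request queues" part concrete, here is a minimal sketch of continuous batching. It assumes a stand-in `fake_decode_step` in place of a real model forward pass, and `Request` and `continuous_batching` are hypothetical names, not the repo's API. The point is that waiting requests join the running batch as soon as a slot frees up, and finished requests leave immediately, rather than the whole batch waiting for its slowest member.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    rid: int
    max_new_tokens: int  # assumed to be >= 1
    generated: list[int] = field(default_factory=list)

    @property
    def done(self) -> bool:
        return len(self.generated) >= self.max_new_tokens


def fake_decode_step(batch: list[Request]) -> list[int]:
    """Hypothetical stand-in for one forward pass: one new token per request."""
    return [len(r.generated) for r in batch]


def continuous_batching(requests: list[Request], max_batch_size: int) -> list[tuple[int, int]]:
    """Run all requests; return (step, rid) in the order requests finish."""
    waiting = deque(requests)
    running: list[Request] = []
    finished = []
    step = 0
    while waiting or running:
        # Admit waiting requests whenever a slot is free, not only between batches.
        while waiting and len(running) < max_batch_size:
            running.append(waiting.popleft())
        for req, tok in zip(running, fake_decode_step(running)):
            req.generated.append(tok)
        step += 1
        # Retire finished requests right away so their slots can be reused.
        for req in running:
            if req.done:
                finished.append((step, req.rid))
        running = [r for r in running if not r.done]
    return finished


print(continuous_batching([Request(0, 1), Request(1, 3), Request(2, 1)], max_batch_size=2))
```

This prints `[(1, 0), (2, 2), (3, 1)]`: request 2 takes request 0's slot at step 2 and finishes there, whereas a static batch of `[0, 1]` would make it wait until request 1 finished at step 3.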