Efficient and Lossless MoE Diffusion LLM Inference with I/O-Aware Expert Offload | Hacker News


		Efficient and Lossless MoE Diffusion LLM Inference with I/O-Aware Expert Offload (tide-paper.vercel.app)
		1 point by imalomder 1 day ago | 1 comment


imalomder 1 day ago

Hi HN, this is my research project, which lets people deploy MoE diffusion LLMs locally more efficiently. With this method, you can fit the 100B LLaDA2.0-flash model on a PC with an RTX 5090 and run it faster than with other methods.
