Built the tool that kind of builds the tools. Just publicly made available yesterd...

hasteg · 2026-06-09T12:27:07 1781008027

This seems really cool actually. Just read thru the README and watched the demo gif. Any examples of what you've built with it so far? Might have to play around with it tonight. I've been getting pretty heavy into local LLMs since getting my 5090. Amazing what Qwen can do given how small it is (running 35B at Q4 quant).

Duaard · 2026-06-09T12:39:52 1781008792

It's dumb and very specific to my use case, but: an agent that runs every evening and pulls both my calendar and my time logs (a Google Sheet). Every morning I time-block my day in the calendar, then throughout the day I record what I actually did and categorize it by Eisenhower's matrix. The agent then compares map vs. territory, and I hopefully adjust my priors to plan better or flag time sinks.

Cool thing is I built the tools with the [Tool Builder](https://github.com/fabritorio/fabritorio/blob/main/docs/node...) as well, pretty small CLI helpers. https://imgur.com/a/XeYYbzC
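A rough sketch of what the map-vs-territory step could look like, not the actual agent or Tool Builder output. It assumes the calendar and the time log have already been reduced to `(category, hours)` pairs; the function names, the category labels, and the 1-hour threshold are all made up for illustration.

```python
from collections import defaultdict


def hours_by_category(entries):
    """Sum hours per category from an iterable of (category, hours) pairs."""
    totals = defaultdict(float)
    for category, hours in entries:
        totals[category] += hours
    return dict(totals)


def plan_vs_actual(planned, actual):
    """Return {category: actual_hours - planned_hours} for every category seen."""
    p = hours_by_category(planned)
    a = hours_by_category(actual)
    return {c: a.get(c, 0.0) - p.get(c, 0.0) for c in sorted(set(p) | set(a))}


def time_sinks(diff, threshold=1.0):
    """Categories that ran over plan by more than `threshold` hours (hypothetical cutoff)."""
    return [c for c, delta in diff.items() if delta > threshold]


# Made-up sample day: the calendar plan (map) vs. the logged time (territory).
planned = [
    ("important/urgent", 2.0),
    ("important/not urgent", 4.0),
    ("not important/urgent", 1.0),
]
actual = [
    ("important/urgent", 3.0),
    ("important/not urgent", 1.5),
    ("not important/urgent", 2.5),
    ("not important/not urgent", 1.0),
]

diff = plan_vs_actual(planned, actual)
sinks = time_sinks(diff)
```

With that sample data, `diff` shows the deep-work quadrant coming in 2.5 hours under plan and `sinks` flags the one quadrant that ran more than an hour over.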

mystraline · 2026-06-09T12:44:55 1781009095

Check out Krasis https://github.com/brontoguana/krasis

It enables something similar to unified memory. I've got a 5060 (16GB) card and 96GB of DDR5.

I can run qwen3.5-122b int4 at 25 tok/sec. And now it even does image ingestion!

I've been bulk transliterating and translating foreign-language books into English. And all completely local.

hasteg · 2026-06-09T13:02:38 1781010158

Wait so this makes it so I can use my DDR5 as well as my VRAM combined? This is actually sick if so. Maybe I will actually have to go out and buy some more DDR5 (currently only have 32GB...)

mystraline · 2026-06-09T16:14:43 1781021683

Yep, that's what it does. Only works with NVIDIA.

The difference is that it uses safetensors, not GGUFs. But it does dynamically requant to int4, int8, or bf16.
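Back-of-envelope for why the quant level matters when pooling VRAM and system RAM: weights-only size is roughly params × bits / 8. This is just arithmetic under loud assumptions. It ignores the KV cache, activations, and quantization overhead, and says nothing about how Krasis actually splits a model between GPU and system memory. `weight_gb` and `fits` are hypothetical helpers, not part of any tool.

```python
# Bits per weight for the three formats mentioned above.
BITS_PER_WEIGHT = {"int4": 4, "int8": 8, "bf16": 16}


def weight_gb(params_billion, quant):
    """Approximate weights-only size in GB (decimal) for a model of the given size."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def fits(params_billion, quant, vram_gb, ram_gb):
    """True if the weights alone fit in VRAM + system RAM combined (nothing else counted)."""
    return weight_gb(params_billion, quant) <= vram_gb + ram_gb


# The 16GB card + 96GB DDR5 setup from this thread, with a 122B model.
sizes = {q: weight_gb(122, q) for q in BITS_PER_WEIGHT}
int4_fits = fits(122, "int4", vram_gb=16, ram_gb=96)
int8_fits = fits(122, "int8", vram_gb=16, ram_gb=96)
```

By this estimate a 122B model is about 61GB of weights at int4, which fits in 112GB combined, while int8 (about 122GB) does not. Real usage will be higher once context and runtime overhead are added.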

hasteg · 2026-06-09T19:26:10 1781033170

Wow that's actually sick as hell, somehow hadn't heard of this. Maybe I will go and blow $700 on a new RAM kit... thanks for sharing!

mystraline · 2026-06-09T19:59:06 1781035146

Glad to share!

But go try it out now with a 35B model on your current hardware.

Right now I have qwen3.6-35B-A3B loaded at int8, with 128k context, a 2.5GB KV cache, and thinking on.

It's using 11.5GB of GPU RAM and 42GB of system RAM.

I don't want to oversell. All-GPU would be faster, but creating a semi-unified system is deffo a game changer for me.